HTMLNormalizer
class HTMLNormalizer(config)
Bases: Normalizer
The HTML Normalizer removes HTML markup.
Input - all input fields need to be of type str.
Output - all output fields are filled with data of type str.
Parameters:
Example
{ "step": "normalizer", "type": "html", "input_fields": ["body"], "output_fields": ["normalized_body"] }
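The configuration above pairs each entry in "input_fields" with the entry at the same position in "output_fields". As a rough illustration of that mapping, the following minimal sketch strips markup with Python's standard-library html.parser. It is not the library's implementation: the helper names `_TextExtractor`, `strip_html`, and `normalize_fields` are hypothetical, and the whitespace handling is an assumption.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and discards the tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text: str) -> str:
    """Return `text` without HTML tags, with runs of whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


def normalize_fields(doc: dict, input_fields, output_fields) -> dict:
    """Assumed behaviour: write the stripped text of each input field
    to the output field at the same position, leaving the input intact."""
    result = dict(doc)
    for src, dst in zip(input_fields, output_fields):
        result[dst] = strip_html(doc[src])
    return result


doc = {"body": "<p>Hello <b>world</b> &amp; friends</p>"}
normalized = normalize_fields(doc, ["body"], ["normalized_body"])
print(normalized["normalized_body"])
```

Note that this sketch keeps the text content of every element, including script and style blocks. The actual normalizer may treat such elements differently.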
Methods Summary
process_doc(doc) - Process a document
Methods Documentation