The content cleanup enrichment cleans up incoming text and removes potentially malicious content from the HTML body.
The Content Standardization step cleans content as it comes in. HTML tags are stripped from text fields such as the summary, and potentially harmful tags and attributes, such as script tags, are removed from the HTML body field.
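The two cleaning rules can be illustrated with a minimal Python sketch built on the standard library's html.parser. It assumes items are plain dicts with a plain-text summary field and an HTML body field. The drop list, the on* and javascript: attribute rules, and the names strip_tags, sanitize_html and clean_item are hypothetical and are not the step's actual implementation. A hand-rolled filter like this is far from complete, so a vetted, maintained HTML sanitizer is the safer choice for real untrusted input.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical rules for illustration only; not the step's real configuration.
TEXT_FIELDS = ("summary",)  # plain-text fields whose tags are stripped
DROP_WITH_CONTENT = {"script", "style", "iframe", "object"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "wbr"}


class _TextExtractor(HTMLParser):
    """Keeps only text content; every tag is discarded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts, self.skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip:
            self.parts.append(data)


class _Sanitizer(HTMLParser):
    """Re-emits HTML without dropped elements, event handlers or javascript: URLs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out, self.skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip += 1
        if self.skip:
            return  # inside a dropped element such as <script>
        kept = []
        for name, value in attrs:
            if name.startswith("on"):  # event handlers: onclick, onerror, ...
                continue
            if name in ("href", "src") and (value or "").strip().lower().startswith("javascript:"):
                continue
            kept.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join([tag, *kept]) + ">")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip = max(self.skip - 1, 0)
        elif not self.skip and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip:
            # Entities were decoded by the parser, so re-escape text on output.
            self.out.append(escape(data, quote=False))


def _run(parser, html):
    parser.feed(html)
    parser.close()
    return parser


def strip_tags(text):
    return "".join(_run(_TextExtractor(), text).parts)


def sanitize_html(html):
    return "".join(_run(_Sanitizer(), html).out)


def clean_item(item):
    """Returns a cleaned copy of a hypothetical dict-shaped item."""
    cleaned = dict(item)
    for field in TEXT_FIELDS:
        if cleaned.get(field):
            cleaned[field] = strip_tags(cleaned[field])
    if cleaned.get("body"):
        cleaned["body"] = sanitize_html(cleaned["body"])
    return cleaned


item = {
    "summary": "<b>Quarterly</b> results &amp; outlook",
    "body": '<p onclick="steal()">Hello <a href="javascript:alert(1)">link</a>'
            "<script>alert(1)</script><br/>world</p>",
}
print(clean_item(item)["body"])  # <p>Hello <a>link</a><br>world</p>
```

Comments, doctypes and unknown markup are silently dropped by this parser, which errs on the side of removing content rather than passing it through.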
This step also ensures that the summary item field is set, so that items display well in result lists. If the summary is mapped manually when loading data, this step is not needed for that purpose.
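How a missing summary might be filled in can be sketched in Python as well. This assumes the summary is derived from the visible body text and truncated at a word boundary when no summary was mapped. The 200-character limit, the "..." suffix, the block-tag list, and the names html_to_text and ensure_summary are hypothetical, because the step's actual summarization rules are not described here.

```python
from html.parser import HTMLParser

SUMMARY_MAX_CHARS = 200  # hypothetical limit, not taken from the product
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextOnly(HTMLParser):
    """Collects visible text, inserting spaces at block boundaries."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts, self.skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self.skip:
            self.parts.append(data)


def html_to_text(html):
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse the whitespace runs left behind by removed markup.
    return " ".join("".join(parser.parts).split())


def ensure_summary(item, max_chars=SUMMARY_MAX_CHARS):
    """Returns a copy of a hypothetical dict item with a summary set."""
    if item.get("summary"):
        return item  # summary already mapped, e.g. at load time
    text = html_to_text(item.get("body", ""))
    if len(text) > max_chars:
        # Cut at the last space within the limit, then mark the truncation.
        text = text[:max_chars].rsplit(" ", 1)[0] + "..."
    return {**item, "summary": text}


item = {"body": "<p>Quarterly <b>results</b> are in.</p><p>Revenue grew.</p>"}
print(ensure_summary(item, max_chars=20)["summary"])  # Quarterly results...
```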
When dealing with data from untrusted sources, this step should always be used.
This step does not take any configuration.