Content Standardization


The Content Standardization enrichment cleans up incoming text and removes potentially malicious content from the HTML body.

Overview

The Content Standardization step cleans content as it comes in. In text fields, such as the title or summary, all HTML tags are removed. In the HTML body field, potentially harmful tags and attributes, such as script tags, are removed.
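
To make the two kinds of cleaning concrete, the following Python sketch shows the general idea using only the standard library. It is not the step's actual implementation: the function names `strip_tags` and `sanitize_body`, the `BLOCKED_TAGS` list, and the attribute rules are assumptions chosen for illustration, and the real step may block more or different tags and attributes.

```python
from html import escape
from html.parser import HTMLParser

# Assumed for illustration only; the step's real block list is not documented here.
BLOCKED_TAGS = {"script", "iframe", "object"}


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops every tag (for fields like title or summary)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


class _BodySanitizer(HTMLParser):
    """Rebuilds HTML while dropping blocked elements and risky attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.blocked_depth = 0  # > 0 while inside a blocked element

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKED_TAGS:
            self.blocked_depth += 1
            return
        if self.blocked_depth:
            return
        kept = []
        for name, value in attrs:
            if name.startswith("on"):  # inline event handlers such as onclick
                continue
            if value is not None and value.strip().lower().startswith("javascript:"):
                continue
            kept.append(name if value is None else f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join([tag] + kept) + ">")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags: emit once if allowed, drop entirely if blocked.
        if tag not in BLOCKED_TAGS:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in BLOCKED_TAGS:
            if self.blocked_depth:
                self.blocked_depth -= 1
            return
        if not self.blocked_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.blocked_depth:
            self.out.append(escape(data, quote=False))


def strip_tags(value: str) -> str:
    """Return the plain text of `value` with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()
    return " ".join("".join(parser.parts).split())


def sanitize_body(value: str) -> str:
    """Return `value` without blocked elements, event handlers, or javascript: URLs."""
    parser = _BodySanitizer()
    parser.feed(value)
    parser.close()
    return "".join(parser.out)


print(strip_tags("<b>Hello</b> <i>world</i>"))
print(sanitize_body('<p onclick="x()">Hi<script>alert(1)</script></p>'))
```

With these assumed rules, the first call yields `Hello world` and the second yields `<p>Hi</p>`. A hand-written filter like this is only a teaching aid; it does not cover every way HTML can carry harmful content.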

This step also ensures that the summary field of each item is set, so that items display well in result lists. If the summary is mapped manually when loading data, this can be omitted.

When dealing with data from untrusted sources, this step should always be used.

Configuration

This step does not take any configuration.