EmailParseNormalizer

class EmailParseNormalizer(config)

Bases: Normalizer

The email parse Normalizer parses a string that contains an email and extracts the email body. In more detail:

Find a regex match for “Body:” in the non-HTML email string.
Extract the body using the Python email parser.
Given a list of footer strings as discard_footers (e.g. [“Best regards”, “Warm Regards”]), discard the part of the body that starts at the first appearance of a footer string.
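
The three steps above can be approximated with the Python standard library. The following is only a minimal sketch, not the Normalizer's actual implementation: the function name parse_email_body is hypothetical, and the way the “Body:” match and the email parser are combined (marker first, parser as fallback) is an assumption.

```python
import email
import re
from typing import List, Optional


def parse_email_body(raw: str, discard_footers: Optional[List[str]] = None) -> str:
    """Hypothetical helper that mimics the documented steps."""
    # Step 1 (assumed behavior): if a "Body:" marker is present,
    # treat everything after it as the body.
    match = re.search(r"Body:", raw)
    if match:
        body = raw[match.end():]
    else:
        # Step 2: let the standard-library email parser separate
        # the headers from the body.
        payload = email.message_from_string(raw).get_payload()
        body = payload if isinstance(payload, str) else ""

    # Step 3: cut the body at the earliest occurrence of any footer string.
    cut = len(body)
    for footer in discard_footers or []:
        index = body.find(footer)
        if index != -1:
            cut = min(cut, index)
    return body[:cut].strip()


raw_email = (
    "From: Squirro\nTo: Hi,\n"
    "I hope to find you well. In the emails before you've learned "
    "more about our Insights Engine.\nBest regards, Squirro"
)
print(parse_email_body(raw_email, ["Best regards", "Warm Regards"]))
```

The footer match in this sketch is case-sensitive and plain substring based; whether the Normalizer matches footers the same way is not stated in this reference.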

Input - all input fields need to be of type str. Example:

From: Squirro\nTo: Hi,\nI hope to find you well. In the emails before you've learned more about our Insights Engine.\nBest regards, Squirro

Output - all output fields are filled with data of type str. Example:

I hope to find you well. In the emails before you've learned more about our Insights Engine.

Parameters:

type (str) – email_parse
discard_footers (list) – Footer strings; the text from the first occurrence of any of these strings onward is discarded.

Example

{
    "step": "normalizer",
    "type": "email_parse",
    "input_fields": ["email"],
    "output_fields": ["parsed_email"],
    "discard_footers": ["Best regards", "Warm Regards"]
}
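
To illustrate how the fields of this step configuration relate to each other, the sketch below applies a simplified version of the step to a plain dictionary. It is an assumption-laden illustration only: apply_step and strip_footer are hypothetical names, a dict stands in for the real Document class, input and output fields are assumed to be paired by position, and only the footer-discarding part of the Normalizer is reproduced.

```python
from typing import Dict, List

# Same values as the step configuration above, written as a Python dict.
step_config = {
    "step": "normalizer",
    "type": "email_parse",
    "input_fields": ["email"],
    "output_fields": ["parsed_email"],
    "discard_footers": ["Best regards", "Warm Regards"],
}


def strip_footer(text: str, footers: List[str]) -> str:
    """Hypothetical helper: drop everything from the earliest footer onward."""
    positions = [text.find(f) for f in footers if f in text]
    return text[: min(positions)].strip() if positions else text.strip()


def apply_step(doc: Dict[str, str], config: dict) -> Dict[str, str]:
    """Hypothetical stand-in for process_doc, operating on a plain dict."""
    # Assumption: input and output fields are paired positionally.
    for source, target in zip(config["input_fields"], config["output_fields"]):
        value = doc[source]
        if not isinstance(value, str):
            # The reference states that all input fields must be of type str.
            raise TypeError(f"field {source!r} must be of type str")
        doc[target] = strip_footer(value, config.get("discard_footers", []))
    return doc


result = apply_step({"email": "Hello there.\nBest regards, Squirro"}, step_config)
print(result["parsed_email"])
```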

Methods Summary

process_doc(doc)

Process a document

Methods Documentation

process_doc(doc)

Process a document

Parameters: doc (Document) – Document
Returns: Processed document
Return type: Document
