EmailParseNormalizer

class squirro.lib.nlp.steps.normalizers.EmailParseNormalizer(config)

Bases: squirro.lib.nlp.steps.normalizers.Normalizer

The email parse normalizer takes a string that contains an email and extracts the email body. In detail:

  1. Find a regex match for “Body:” in the non-HTML email string.

  2. Extract the body using the Python email parser.

  3. Given a list of footer strings in discard_footers (e.g. [“Best regards”, “Warm Regards”]), discard everything in the body from the first appearance of a footer string onward.
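The three steps above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function names (extract_body, discard_after_footer, parse_email), the exact regex, the case-sensitive footer match, and the way steps 1 and 2 are combined (use the text after “Body:” if the marker is present, otherwise fall back to the standard-library email parser) are all assumptions.

```python
import re
from email import message_from_string

# Step 1 marker. The exact pattern used by the library is not shown in this
# document, so this one is an assumption.
BODY_MARKER = re.compile(r"Body:")


def extract_body(raw_email):
    """Steps 1 and 2: return the body of a plain-text (non-HTML) email string."""
    match = BODY_MARKER.search(raw_email)
    if match:
        # Assumption: everything after the "Body:" marker is the body.
        return raw_email[match.end():]
    # No marker: let the standard-library parser split headers from body.
    message = message_from_string(raw_email)
    if message.is_multipart():
        # Use the first text/plain part of a multipart message.
        for part in message.walk():
            if part.get_content_type() == "text/plain":
                charset = part.get_content_charset() or "utf-8"
                return part.get_payload(decode=True).decode(charset, errors="replace")
        return ""
    return message.get_payload()


def discard_after_footer(body, discard_footers):
    """Step 3: cut the body at the earliest occurrence of any footer string."""
    # Empty footer strings are skipped so they cannot truncate the whole body.
    positions = [body.find(f) for f in discard_footers if f and f in body]
    if positions:
        body = body[:min(positions)]
    return body.strip()


def parse_email(raw_email, discard_footers=()):
    """Hypothetical end-to-end helper combining the three steps."""
    return discard_after_footer(extract_body(raw_email), discard_footers)


raw = (
    "From: sender@example.com\n"
    "To: reader@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Hi,\nI hope to find you well.\nBest regards,\nSquirro"
)
print(parse_email(raw, ["Best regards", "Warm Regards"]))
```

Cutting at the earliest match across all footers, rather than at the first footer in list order, is one reading of “first appearance of a footer string”; the actual step may behave differently.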

Input - all input fields need to be of type str. Example:

From: Squirro\nTo: Hi,\nI hope to find you well. In the emails before you've learned more about our Insights Engine.\nBest regards, Squirro

Output - all output fields are filled with data of type str. Example:

I hope to find you well. In the emails before you've learned more about our Insights Engine.

Parameters
  • type (str) – email_parse

  • discard_footers (list) – Footer strings. The body text from the first occurrence of any of these strings onward is discarded.

Example

{
    "step": "normalizer",
    "type": "email_parse",
    "input_fields": ["email"],
    "output_fields": ["parsed_email"],
    "discard_footers": ["Best regards", "Warm Regards"]
}
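To make the role of input_fields and output_fields concrete, the following sketch applies a configuration like the one above to a plain dictionary. It does not use the Squirro API: apply_step and cut_at_footer are hypothetical helpers, a dict stands in for the Document type, and the positional pairing of input fields with output fields is an assumption.

```python
import json

# Same configuration as in the example above, loaded from JSON.
config = json.loads("""
{
    "step": "normalizer",
    "type": "email_parse",
    "input_fields": ["email"],
    "output_fields": ["parsed_email"],
    "discard_footers": ["Best regards", "Warm Regards"]
}
""")


def cut_at_footer(text, footers):
    """Hypothetical helper: drop everything from the earliest footer onward."""
    positions = [text.find(f) for f in footers if f and f in text]
    if positions:
        text = text[:min(positions)]
    return text.strip()


def apply_step(doc, step_config):
    """Hypothetical stand-in for the step, operating on a plain dict.

    Assumption: the i-th input field is written to the i-th output field.
    """
    footers = step_config.get("discard_footers", [])
    pairs = zip(step_config["input_fields"], step_config["output_fields"])
    for source, target in pairs:
        value = doc.get(source)
        if isinstance(value, str):  # input fields are expected to be str
            doc[target] = cut_at_footer(value, footers)
    return doc


doc = {"email": "I hope to find you well.\nBest regards, Squirro"}
result = apply_step(doc, config)
print(result["parsed_email"])
```

Only footer removal is shown here; header and body extraction are left out to keep the field mapping in focus.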

Methods Summary

process_doc(doc)

Process a document

Methods Documentation

process_doc(doc)

Process a document

Parameters

doc (Document) – Document

Returns

Processed document

Return type

Document