EmailParseNormalizer

class squirro.lib.nlp.steps.normalizers.EmailParseNormalizer(config)

Bases: squirro.lib.nlp.steps.normalizers.Normalizer

Parses an email string and extracts the email body. Given a non-HTML email string, the normalizer extracts the body with the Python email parser, then applies additional cleaning to remove footers that the parser does not strip.

Parsing rules:

  1. Find a regex match “Body:” in the email string.

  2. Extract body using python email parser.

  3. Given a discard_footers list (e.g. [“Best regards”, “Warm Regards”]), discard the part of the body that starts at the first appearance of a footer string.
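
The parsing rules above can be approximated with the standard library alone. This is a minimal sketch, not the Squirro implementation: the function name extract_email_body is hypothetical, and the handling of the “Body:” marker (treating everything after it as the body) is an assumption, since the rules only say that a match is searched for. Multipart messages are not handled here.

```python
import re
from email import message_from_string


def extract_email_body(raw, discard_footers=()):
    """Approximate the three parsing rules on a plain-text email string."""
    # Rule 1: look for an explicit "Body:" marker.
    # Assumption: when present, everything after it is treated as the body.
    marker = re.search(r"Body:", raw)
    if marker:
        body = raw[marker.end():]
    else:
        # Rule 2: let the standard-library parser split headers from body.
        payload = message_from_string(raw).get_payload()
        # A multipart message yields a list; this sketch ignores that case.
        body = payload if isinstance(payload, str) else ""

    # Rule 3: cut the body at the earliest occurrence of any footer string.
    cut = len(body)
    for footer in discard_footers:
        idx = body.find(footer)
        if idx != -1:
            cut = min(cut, idx)
    return body[:cut].strip()


# Hypothetical sample message with a blank line separating headers and body.
sample = "From: a@example.com\nTo: b@example.com\n\nHello there.\nBest regards,\nA"
cleaned = extract_email_body(sample, ["Best regards", "Warm Regards"])
```

Cutting at the earliest footer match, rather than the first footer in list order, keeps the result independent of how discard_footers is ordered.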

Example input:

From: Squirro\nTo: Hi,\nI hope to find you well. In the emails before youve learned more about our Insights Engine.\nBest regards, Squirro

Example output:

I hope to find you well. In the emails before youve learned more about our Insights Engine.

Parameters
  • type (str) – email_parse

  • field (str, None) – Field containing the email string to parse

  • fields (list, None) – List of fields containing email strings to parse

  • discard_footers (list) – Footer strings; the text from the first occurrence of any of these strings onward is discarded.

  • input_fields (list) – Fields containing the email text to clean

  • output_fields (list) – Fields in which to store the cleaned email body
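
As an illustration of how these parameters fit together, a step configuration might look like the following. This is a sketch under assumptions: the "step" key and the field names body and clean_body are hypothetical, and only type, input_fields, output_fields and discard_footers come from the parameter list above.

```python
# Hypothetical configuration for the email_parse normalizer step.
email_parse_step = {
    "step": "normalizer",             # assumption: step family key, not listed above
    "type": "email_parse",            # documented type value
    "input_fields": ["body"],         # hypothetical field holding the raw email text
    "output_fields": ["clean_body"],  # hypothetical field receiving the cleaned body
    "discard_footers": ["Best regards", "Warm Regards"],
}
```

Pairing each input field with one output field leaves the original email text untouched, which makes it easier to compare the raw and cleaned values.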

Methods Summary

process_doc(doc)

Process a document

Methods Documentation

process_doc(doc)

Process a document

Parameters

doc (Document) – Document

Returns

Processed document

Return type

Document