Documents Connector#

Profile: Project Creator

This page provides an overview of the Documents connector.

Project creators use this connector to upload documents to their Squirro projects.

Overview#

The Documents connector is one of the most frequently used Built-In Data Connectors.

It is found under the Data Import tab of Data Sources and is used to upload common office document file types to a Squirro project.

Reference: For general information on how to load data into Squirro, see How to Load Data Using the UI.

Acceptable Document Types#

The Documents connector allows you to upload the following common document formats, including Microsoft Office and OpenOffice file types:

  • .CSV

  • .DOC, .DOCX

  • .ODP, .ODS, .ODT

  • .PDF

  • .PPT, .PPTX

  • .RTF

  • .TXT

  • .XLS, .XLSX

Warning

Remove password protection from files before uploading them. Password-protected files will not load correctly and will not be indexed.
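As a convenience before uploading a large batch, the accepted extensions listed above can be checked locally. The following is a minimal sketch, not part of Squirro: the names `ACCEPTED_EXTENSIONS` and `split_by_acceptance` are hypothetical, and the check looks at file extensions only.

```python
from pathlib import Path

# Extensions listed on this page as accepted by the Documents connector.
# (Hypothetical helper constant; not part of any Squirro API.)
ACCEPTED_EXTENSIONS = {
    ".csv", ".doc", ".docx", ".odp", ".ods", ".odt",
    ".pdf", ".ppt", ".pptx", ".rtf", ".txt", ".xls", ".xlsx",
}


def split_by_acceptance(paths):
    """Return (accepted, rejected) lists of paths, judged by extension only."""
    accepted, rejected = [], []
    for path in map(Path, paths):
        # Compare case-insensitively so "REPORT.PDF" and "report.pdf" both match.
        if path.suffix.lower() in ACCEPTED_EXTENSIONS:
            accepted.append(path)
        else:
            rejected.append(path)
    return accepted, rejected
```

This sketch does not detect password protection, so protected files still need to be identified and unlocked separately.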

CSV and Excel Files#

Squirro offers standalone CSV and Excel connectors for working with CSV, XLS, and XLSX files in a more advanced way.

If these file types are uploaded using the Documents connector, they are converted to PDFs for indexing.

If end users only need to view the files in the Squirro UI, this is sufficient. However, if you plan to use these files as the basis for labels or machine learning models, or in other advanced use cases, use the dedicated CSV or Excel connector instead.

Note: When the Excel and CSV connectors are used, every row in those uploaded files is indexed as a separate Squirro item. However, with the Documents connector, each document is indexed as a single PDF item.
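The difference in granularity can be illustrated with a small local calculation. This is only a sketch of the counting logic described in the note, not Squirro's implementation: the function name `items_per_connector` is hypothetical, and the sketch assumes the first CSV row is a header that does not become an item, which this page does not confirm.

```python
import csv
import io


def items_per_connector(csv_text):
    """Estimate how many items one CSV file yields under each connector."""
    # Assumption: the first row is a header, so only data rows count as items.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "csv_connector": len(rows),   # one item per data row
        "documents_connector": 1,     # the whole file becomes a single PDF item
    }
```

For a file with a header and three data rows, the sketch reports three items for the CSV connector and one for the Documents connector.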

ZIP Files#

You can also upload ZIP files containing multiple documents. The ZIP file will be extracted and each document will be indexed separately.

ZIP files can mix document types. For example, a single ZIP file could contain a PDF, a DOCX, and a TXT file.

Warning: Although ZIP files can contain multiple different document types, they cannot contain sub-folders.
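One way to satisfy the no-sub-folders restriction is to build the archive with Python's standard `zipfile` module and store every file under its bare file name. The following is a minimal sketch under that assumption: `build_flat_zip` and `has_subfolders` are hypothetical helper names, not Squirro functions.

```python
import zipfile
from pathlib import Path


def build_flat_zip(files, zip_path):
    """Write the given files into a ZIP archive with no folder structure."""
    seen = set()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in map(Path, files):
            # Flattening can make names collide, so fail loudly on duplicates.
            if file.name in seen:
                raise ValueError(f"duplicate file name after flattening: {file.name}")
            seen.add(file.name)
            # arcname=file.name drops the directory part of the source path.
            archive.write(file, arcname=file.name)
    return zip_path


def has_subfolders(zip_path):
    """Return True if any entry in the ZIP sits inside a folder."""
    with zipfile.ZipFile(zip_path) as archive:
        # ZIP entry names always use "/" as the separator.
        return any("/" in name for name in archive.namelist())
```

Running `has_subfolders` on an existing archive before upload gives a quick check that it is flat.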