File Upload

Upload PDF documents directly to Squirro Chat for ad hoc analysis without indexing them permanently. This feature lets users analyze PDF documents on demand, extract relevant information, and get answers from the file contents in real time during a conversation.

Currently, Squirro Chat is designed to work with PDF files. Other formats may work in some cases, but only PDF is guaranteed to deliver consistent results. Support for additional file formats is planned for future iterations.

Overview

The file upload feature allows you to:

  • Upload documents directly to chat conversations for immediate analysis.

  • Ask questions about uploaded content without indexing the files permanently.

  • Analyze multiple files simultaneously in a single conversation.

Files are stored temporarily and associated with the specific conversation, making this ideal for quick document reviews, contract analysis, research tasks, and other use cases where permanent indexing is not needed.

How to Use

  1. Open Squirro Chat in your project.

  2. Click the file upload button or drag and drop files into the chat interface.

  3. Wait for the upload confirmation.

  4. Start asking questions about the uploaded files.

Processing Strategies

Squirro Chat uses three strategies to process uploaded files and automatically selects the most appropriate one based on your query.

Sequential Reading Strategy

Best for reading through documents systematically, browsing page by page, or when you need to understand the full context.

How it works:

  • Returns content in document order (page by page).

  • Supports pagination with automatic continuation.

  • Shows progress (for example, “Showing chunks 1-10 of 50”).

  • The agent can continue reading with subsequent queries.

Search Strategy

Best for finding specific information, locating keywords, or answering targeted questions.

How it works:

  • Searches document content using keyword and semantic matching.

  • Returns the 15 most relevant chunks with relevance scores.

  • Includes surrounding context chunks for better understanding.

  • Highlights page numbers for easy reference.
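The search behavior described above can be sketched as a ranking step followed by a context-expansion step. This is an illustration under stated assumptions, not Squirro's ranking: a simple keyword-overlap score stands in for the combined keyword and semantic matching, the `Chunk` class and `search` function are hypothetical, and `context` is assumed to count neighboring chunks on each side of a hit (how the real `chunk_context` setting is applied is not specified here).

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    index: int   # position of the chunk in the document (0-based)
    page: int    # page number the chunk was taken from
    text: str


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms found in the text; a stand-in for real ranking."""
    terms = set(re.findall(r"\w+", query.lower()))
    words = set(re.findall(r"\w+", text.lower()))
    return len(terms & words) / len(terms) if terms else 0.0


def search(chunks: list[Chunk], query: str, top_k: int = 15, context: int = 1) -> list[dict]:
    """Return up to top_k matching chunks with score, page number, and neighbors."""
    scored = [(keyword_score(query, c.text), c) for c in chunks]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    results = []
    for score, hit in hits[:top_k]:
        # Neighboring chunks on each side of the hit, clipped to the document bounds.
        lo, hi = max(0, hit.index - context), min(len(chunks), hit.index + context + 1)
        results.append({
            "score": round(score, 2),
            "page": hit.page,
            "text": hit.text,
            "context": [c.text for c in chunks[lo:hi] if c.index != hit.index],
        })
    return results


doc = [
    Chunk(0, 1, "Introduction"),
    Chunk(1, 2, "Payment terms: net 30 days"),
    Chunk(2, 2, "Late fees apply after 30 days"),
    Chunk(3, 3, "Termination"),
]
for r in search(doc, "payment terms"):
    print(r["page"], r["score"], r["text"], r["context"])
```

Sorting on the score alone keeps ties in document order, which is one reasonable choice; a production ranker could break ties differently.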

Full Document Strategy

Best for small documents, complete summaries, or when you need the entire content.

How it works:

  • Extracts and returns complete document text.

  • No chunking or pagination.

  • Processes the entire file at once.

  • Includes page markers for PDF documents.
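Assembling the full text with page markers can be as simple as the sketch below. The `full_document` function and the `[Page N]` marker format are hypothetical; the per-page strings could come from any PDF text extractor, for example PyMuPDF's `Page.get_text()`.

```python
def full_document(pages: list[str]) -> str:
    """Join every page into one string, each preceded by a page marker."""
    # No chunking or pagination: the whole document is returned in one piece.
    return "\n\n".join(
        f"[Page {n}]\n{text.strip()}" for n, text in enumerate(pages, start=1)
    )


print(full_document(["Terms and conditions", "Signatures\n"]))
```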

Configuration

For Administrators

The file upload tool can be configured at the agent level with the following parameters.

Parameter                  Default      Description
-------------------------  -----------  ---------------------------------------------------------------
default_search_strategy    sequential   Default strategy: search, sequential, or full.
max_chunks                 100          Maximum number of chunks returned by the sequential strategy.
include_metadata           False        Show chunk metadata (page numbers, scores, indices).
chunk_context              4            Number of surrounding chunks included with search results.
max_tokens_per_file        50000        Maximum tokens per file (approximate limit).
use_fallback               True         Enable fallback to PyMuPDF when chunks are unavailable.
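For readers who script agent setups, the parameters and defaults in the table can be mirrored in a small typed structure like the one below. This is only a sketch: the `FileUploadConfig` class and its validation rules are assumptions made for illustration and do not reflect how Squirro stores or validates agent configuration.

```python
from dataclasses import dataclass

STRATEGIES = {"search", "sequential", "full"}


@dataclass
class FileUploadConfig:
    """Hypothetical mirror of the file upload tool parameters and their defaults."""
    default_search_strategy: str = "sequential"
    max_chunks: int = 100
    include_metadata: bool = False
    chunk_context: int = 4
    max_tokens_per_file: int = 50000
    use_fallback: bool = True

    def __post_init__(self) -> None:
        # Assumed sanity checks; the real tool may validate differently.
        if self.default_search_strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {self.default_search_strategy!r}")
        if self.max_chunks < 1 or self.max_tokens_per_file < 1:
            raise ValueError("max_chunks and max_tokens_per_file must be positive")
        if self.chunk_context < 0:
            raise ValueError("chunk_context must not be negative")


config = FileUploadConfig(default_search_strategy="search", include_metadata=True)
print(config)
```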

To configure these settings:

  1. Navigate to Project Settings > Agents.

  2. Select the agent that uses file upload.

  3. Expand the File Upload tool configuration.

  4. Adjust the parameters.

  5. Save changes.

For Developers

Custom tools can integrate with the file upload feature using the attachments placeholder.

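from pydantic import Field

# deploy_as_agent_tool and ToolBase are provided by the Squirro agent SDK;
# their import path is not shown on this page.


@deploy_as_agent_tool(
    "custom_file_tool",
    # Placeholders supplied by the chat runtime: the uploaded files and the conversation ID.
    define_placeholders={
        "attachments": "attachments",
        "conversation_id": "conversation_id",
    }
)
class CustomFileToolFactory(ToolBase):
    # Files uploaded to the current conversation; empty when nothing is attached.
    attachments: list[dict] | None = Field(default_factory=list)
    # Identifier of the conversation the files belong to.
    conversation_id: str = "default"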

Files are passed as a list of dictionaries containing file_id, filename, and content_type.

Considerations

File Storage

  • Temporary storage only

    The uploaded files are not permanently indexed in Squirro.

  • Conversation-specific

    The uploaded files are associated with the conversation and not shared across chats.

  • Session-based

    The uploaded files remain available during the conversation session.

File Size and Processing

  • Token limits apply

    The uploaded files are truncated if they exceed the configured token limit (max_tokens_per_file). The default limit is 50’000 tokens, which corresponds to approximately 200’000 characters.

  • Context window dependency

    The maximum document size that can be effectively processed depends on the context window of the attached LLM.

  • Processing time

    Large files may take longer to chunk and process.
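The default limit implies a rule of thumb of about 4 characters per token (200’000 characters / 50’000 tokens). The sketch below uses that ratio to estimate token counts, truncate a file, and check whether several files fit into a model's context window. The helper names, the 128’000-token window in the usage lines, and the `reserved` budget for the prompt and answer are assumptions; real token counts depend on the model's tokenizer.

```python
CHARS_PER_TOKEN = 4  # 200_000 characters / 50_000 tokens


def approx_tokens(text: str) -> int:
    """Rough token estimate from the character count (not a real tokenizer)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def truncate(text: str, max_tokens_per_file: int = 50_000) -> str:
    """Keep roughly max_tokens_per_file tokens' worth of characters."""
    return text[: max_tokens_per_file * CHARS_PER_TOKEN]


def fits_context(texts: list[str], context_window: int, reserved: int = 4_000) -> bool:
    """True if all texts plus a reserve for the prompt and answer fit the window."""
    return sum(approx_tokens(t) for t in texts) + reserved <= context_window


long_file = "x" * 300_000
short = truncate(long_file)
print(len(short), approx_tokens(short))                    # 200000 50000
print(fits_context([short], context_window=128_000))       # True
print(fits_context([short] * 3, context_window=128_000))   # False
```

The second check shows why the context window matters: three files at the default limit already exceed a 128’000-token window even though each file is individually within its limit.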

Access and Permissions

  • Project-level permissions

    Every user with read access or above can use the file upload feature.

  • User-specific access

    The uploaded files are accessible only via the user’s authentication token.

  • No sharing

    The uploaded files cannot be shared with other users or conversations.
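Taken together, the storage and access rules above amount to keying every upload by both the user and the conversation. The in-memory `AttachmentStore` below is a hypothetical sketch of that scoping, with a plain `user_id` standing in for the user's authentication token; it is not Squirro's storage backend.

```python
from dataclasses import dataclass, field


@dataclass
class AttachmentStore:
    """Hypothetical in-memory store scoping uploads to one user and one conversation."""
    _files: dict[tuple[str, str], dict[str, bytes]] = field(default_factory=dict)

    def put(self, user_id: str, conversation_id: str, file_id: str, data: bytes) -> None:
        self._files.setdefault((user_id, conversation_id), {})[file_id] = data

    def get(self, user_id: str, conversation_id: str, file_id: str) -> bytes:
        # A file is found only under the exact (user, conversation) pair that uploaded it.
        try:
            return self._files[(user_id, conversation_id)][file_id]
        except KeyError:
            raise PermissionError(
                f"{file_id} is not available to this user in this conversation"
            ) from None

    def end_conversation(self, user_id: str, conversation_id: str) -> None:
        # Temporary storage: drop every file once the conversation is over.
        self._files.pop((user_id, conversation_id), None)


store = AttachmentStore()
store.put("alice", "conv-1", "f-001", b"%PDF-1.7 ...")
print(store.get("alice", "conv-1", "f-001")[:8])  # b'%PDF-1.7'
try:
    store.get("bob", "conv-1", "f-001")  # another user cannot see Alice's file
except PermissionError as err:
    print(err)
```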

Example Uses

Contract Analysis

Quickly review contract terms without permanent storage.

User: [Uploads contract.pdf]
User: What are the payment terms in this contract?

Chat: [Searches for "payment terms"]
Chat: According to Section 3.2 on page 5, the payment terms are...

Research Document Review

Analyze research papers for specific findings.

User: [Uploads research-paper.pdf]
User: What methodology did the researchers use?

Chat: [Searches for "methodology"]
Chat: The study used a mixed-methods approach described in Section 2...

Multi-File Comparison

Compare information across multiple documents.

User: [Uploads proposal-v1.pdf and proposal-v2.pdf]
User: What are the differences in pricing between these two proposals?

Chat: [Searches both files for "pricing"]
Chat: In proposal-v1.pdf, the total is $50,000 (page 3), while
       proposal-v2.pdf shows $47,500 (page 4)...