convert.ini

The convert.ini config file, located at /etc/squirro/convert.ini, configures content extraction (see Content Extraction).

Memory Limit

New in version 3.6.9: Squirro 3.6.9 and later use a dedicated external Tika service by default. In that setup, the auto-spawner settings described in this section do not apply.

If no Tika server is already running at http://localhost:9998, the Squirro Ingester service auto-spawns one. A memory limit can be configured for the spawned Tika server.

The memory is limited to avoid running into out-of-memory exceptions when converting. For large documents, the default setting may be too low. When this happens, there will be aborted items in the index with SQ-05105 processing errors. See Processing Errors to learn more.

To set the limit, use the vmargs option in the apache-tika section.

Key: vmargs

Usage: Java VM parameters, given as a comma-separated list. See Oracle’s Java HotSpot VM Options reference for the options.

Default: -Xmx512M,-Xms64M

Example:

[apache-tika]
# comma-separated list of additional Java Virtual Machine command-line options
# to use
vmargs = -Xmx512M,-Xms64M
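
If large documents are aborted with SQ-05105 errors, raising the maximum heap size is the natural adjustment. The following is a minimal sketch, assuming the host has about 2 GB of memory to spare for the spawned Tika server. The 2 GB figure is an illustrative assumption, not a value recommended by the Squirro documentation.

```ini
[apache-tika]
# Illustrative values only (assumption): raise the maximum heap (-Xmx) to 2 GB
# and keep the initial heap (-Xms) at its default of 64 MB.
vmargs = -Xmx2G,-Xms64M
```

This setting only affects a Tika server that the Squirro Ingester spawns itself. It has no effect when a dedicated Tika service is in use.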

Using External Tika Service

The Tika web service can be configured to run separately for more fine-grained control.

New in version 3.6.9: Squirro 3.6.9 and later releases use the dedicated Tika service by default.

Key: tika-url

Usage: URL of the Tika server to use.

Default: http://localhost:9998

Example:

[apache-tika]
tika-url = http://localhost:9998
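
To use a Tika server that runs on a different machine, the same option can point at that address. This is a sketch only: the host name tika.example.internal is hypothetical, and the port is assumed to match the 9998 used in the default URL.

```ini
[apache-tika]
# Hypothetical host name (assumption); replace it with the address of your own Tika server.
tika-url = http://tika.example.internal:9998
```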

The tika systemd service is provided by the squirro-tika-server package, and tika-server uses the /etc/tika-config.xml configuration file. For all available options, see Tika Server Configuration.

Changing the default Tika configuration is not recommended: it must remain compatible with the Squirro Ingester, and the required configuration may change in future releases.