convert.ini
The convert.ini config file, located at /etc/squirro/convert.ini, configures content extraction (see Content Extraction).
Memory Limit
New in version 3.6.9: Squirro 3.6.9 and later use an external, dedicated Tika service by default. As such, the auto-spawner settings do not apply.
If no Tika server is already running at http://localhost:9998, the Squirro Ingester service auto-spawns one. A memory limit can be configured for the spawned Tika server.
The memory is limited to avoid running into out-of-memory exceptions when converting. For large documents, the default setting may be too low. When this happens, there will be aborted items in the index with SQ-05105 processing errors. See Processing Errors to learn more.
In the apache-tika section, use the vmargs option.
Key | Usage | Default | Example |
vmargs | Java VM parameters. See Oracle’s Java HotSpot VM Options reference for the options. | | See the example below. |

Example:

[apache-tika]
# comma-separated list of additional Java Virtual Machine command-line options
# to use
vmargs = -Xmx512M,-Xms64M
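As an illustration of raising the limit for large documents, the fragment below increases the maximum heap from the 512 MB used in the example above to 2 GB. This is a minimal sketch: the 2G value is an assumption chosen for the example, not a Squirro recommendation, and it should fit within the memory actually available on the host. It only has an effect when the Ingester auto-spawns Tika.

```ini
[apache-tika]
# Illustrative values only (assumption): allow the spawned Tika JVM up to 2 GB
# of heap, keeping the initial heap at 64 MB.
vmargs = -Xmx2G,-Xms64M
```

Here -Xmx sets the maximum Java heap size and -Xms the initial heap size.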
Using External Tika Service
The Tika web service can be configured to run separately for more fine-grained control.
New in version 3.6.9: Squirro 3.6.9 and later releases use the dedicated Tika service by default.
Key | Usage | Default | Example |
tika-url | URL of the Tika server to use. | http://localhost:9998 | See the example below. |

Example:

[apache-tika]
tika-url = http://localhost:9998
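As a sketch of running Tika separately, the fragment below points tika-url at a Tika server on another machine. The hostname tika.internal.example is hypothetical, and the port is assumed to stay at 9998, as in the default URL.

```ini
[apache-tika]
# Hypothetical host (assumption): replace with the address of the machine
# that actually runs the Tika server.
tika-url = http://tika.internal.example:9998
```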
The tika systemd service is provided by the squirro-tika-server package, and tika-server uses the /etc/tika-config.xml configuration file. For all available options, see Tika Server Configuration.
Please note that changing the default configuration is not recommended, as it must be compatible with the Squirro Ingester and required configurations may change in the future.