Thumbnail Extraction

The Thumbnail Extraction pipeline step finds a thumbnail image to represent the Squirro item.

Enrichment name: Thumbnail Extraction, internally referred to as “webshot”

Stage: processing

Overview

Squirro finds the thumbnail for an item in one of two ways:

  • If the webshot_picture_hint field points to a valid image URL, that image is used as the thumbnail.

  • Otherwise, the item's web page is downloaded and analyzed to find the most prominent image.
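
The two cases above can be sketched as a simple fallback. This is a minimal illustration, not Squirro's implementation: the helpers `is_valid_image_url` and `find_prominent_image` are hypothetical and passed in as callables, and the use of a `link` field for the page address is an assumption.

```python
from typing import Callable, Optional


def choose_thumbnail(
    item: dict,
    is_valid_image_url: Callable[[str], bool],
    find_prominent_image: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return a thumbnail URL for the item, or None if none is found.

    Both callables are hypothetical stand-ins for the real checks.
    """
    hint = item.get("webshot_picture_hint")
    # Case 1: an explicit hint wins when it points to a valid image.
    if hint and is_valid_image_url(hint):
        return hint

    # Case 2: otherwise analyze the item's web page.
    # Assumption: the page address is stored in a "link" field.
    link = item.get("link")
    if link:
        return find_prominent_image(link)
    return None


if __name__ == "__main__":
    item = {
        "webshot_picture_hint": "https://example.com/cover.png",
        "link": "https://example.com/article",
    }
    # Stub helpers, for illustration only.
    print(choose_thumbnail(item, lambda url: url.endswith(".png"), lambda page: None))
```

Passing the helpers in as arguments keeps the fallback order visible and lets the logic be exercised without network access.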

Configuration

Thumbnail extraction relies on an Amazon Web Services S3 configuration to store images for thumbnails and to retrieve thumbnails for display. Configure the following files:

/etc/squirro/common.ini

[services_external]
thumbler = //thumbler-testing.squirro.net

[thumbler_salt]
thumb = <salt_1>
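
The thumbler value above starts with // and carries no scheme. Such a scheme-relative reference is resolved against the URL of the page that uses it, so the resulting address inherits that page's protocol. The Python sketch below illustrates this resolution with the standard library only. The page URLs are hypothetical, and how Squirro itself consumes the setting is not shown here.

```python
from urllib.parse import urljoin

# Value taken from the common.ini example.
THUMBLER = "//thumbler-testing.squirro.net"


def resolve_thumbler(page_url: str, thumbler: str = THUMBLER) -> str:
    """Resolve a scheme-relative thumbler URL against the page that uses it."""
    return urljoin(page_url, thumbler)


if __name__ == "__main__":
    # Hypothetical page URLs, for illustration only.
    print(resolve_thumbler("https://squirro.example.com/app/"))
    print(resolve_thumbler("http://squirro.example.com/app/"))
```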

/etc/squirro/webshot.ini

[aws]
access_key = <key_1>
secret_key = <key_2>
s3_bucket = webshot.testing.squirro.net
s3_base_url = http://webshot.testing.squirro.net.s3-website-eu-west-1.amazonaws.com/

[webshot]
use_thumbler = True
thumbler_config = thumb
thumbler_bucket = webshot
thumbler_salt = <salt_1>

Then restart the sqwebshotd service.

/etc/squirro/thumbler.ini

[bucket_webshot]
is_s3 = True
access_key = <key_1>
secret_key = <key_2>
s3_bucket = webshot.testing.squirro.net

[config_thumb]
operation = scale
salt = <salt_1>

Then restart the sqthumblerd service.
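
In the examples above, the three files appear to reference each other: thumbler_config = thumb matches the [config_thumb] section and the thumb key under [thumbler_salt], thumbler_bucket = webshot matches [bucket_webshot], and the same salt and bucket name recur. The following Python sketch checks that consistency before the services are restarted. It is an assumption drawn from the example values, not a documented rule, and the function names are hypothetical.

```python
import configparser
from typing import List


def load_ini(text: str) -> configparser.ConfigParser:
    """Parse INI text; interpolation is off so '%' in secrets is kept as-is."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def check_thumbler_settings(common, webshot, thumbler) -> List[str]:
    """Return a list of inconsistencies between the three parsed files."""
    problems = []
    ws = webshot["webshot"]
    config_name = ws["thumbler_config"]   # "thumb" in the example
    bucket_name = ws["thumbler_bucket"]   # "webshot" in the example
    salt = ws["thumbler_salt"]

    # Assumed naming scheme: [config_<name>] and [bucket_<name>] in thumbler.ini.
    config_section = f"config_{config_name}"
    bucket_section = f"bucket_{bucket_name}"

    if not thumbler.has_section(config_section):
        problems.append(f"thumbler.ini lacks [{config_section}]")
    elif thumbler.get(config_section, "salt", fallback=None) != salt:
        problems.append(f"salt in [{config_section}] differs from webshot.ini")

    if not thumbler.has_section(bucket_section):
        problems.append(f"thumbler.ini lacks [{bucket_section}]")
    elif thumbler.get(bucket_section, "s3_bucket", fallback=None) != webshot.get(
        "aws", "s3_bucket", fallback=None
    ):
        problems.append(f"s3_bucket in [{bucket_section}] differs from webshot.ini")

    if common.get("thumbler_salt", config_name, fallback=None) != salt:
        problems.append(f"common.ini [thumbler_salt] {config_name} differs from webshot.ini")

    return problems
```

To run the check against a live installation, each file could be read from disk and passed to `load_ini`, for example `load_ini(open("/etc/squirro/webshot.ini").read())`. An empty result means the cross-references line up as they do in the examples.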

Web server configuration to forward requests for the thumbler URL to the Squirro cluster

Example based on nginx: /etc/nginx/conf.d/thumbler.conf

upstream thumbler-testing {
    server ip-squirro-cluster-node:443;
}

server {
    listen 443 ssl;
    server_name  thumbler-testing.squirro.net;

    ssl_certificate <ssl_certificate_1>;
    ssl_certificate_key <ssl_key_1>;

    location / {
        proxy_pass https://thumbler-testing/service/thumbler/;
        proxy_set_header Host $host;
        proxy_set_header Connection Close;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_redirect    off;
        proxy_read_timeout 60;
    }

    # redirect server error pages to the static page /50x.html
    #
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }
}

Then reload nginx, or whichever web server you are using.