Thumbnail Extraction
The Thumbnail Extraction pipeline step finds a thumbnail image to represent the Squirro item.
Enrichment name | Thumbnail Extraction, internally referred to as “webshot” |
Stage | processing |
Overview
There are two ways that Squirro can find the right thumbnail for the item:
If the webshot_picture_hint field points to a valid image URL, that image is used as the thumbnail.
Alternatively, the website is downloaded and analyzed to find the most prominent image.
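The two-step selection described above can be sketched as follows. This is only an illustration, not Squirro's actual implementation: the helper names, the `link` item field, the extension-based URL check, and the "largest declared width × height" heuristic for prominence are all assumptions made for the example. A real service would also download and validate the image, which the sketch skips.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: a "valid image URL" is approximated by scheme and file extension.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def looks_like_image_url(url):
    """Return True if url is an absolute http(s) URL ending in an image extension."""
    if not url:
        return False
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.path.lower().endswith(
        IMAGE_EXTENSIONS
    )


class _ImageCollector(HTMLParser):
    """Collect (declared area, src) for every <img> tag on the page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if not src:
            return
        try:
            area = int(attributes.get("width", 0)) * int(attributes.get("height", 0))
        except (TypeError, ValueError):
            area = 0  # missing or non-numeric dimensions count as smallest
        self.images.append((area, src))


def choose_thumbnail(item, page_html=""):
    """Hypothetical helper: prefer the hint, else the largest image in page_html."""
    hint = item.get("webshot_picture_hint")
    if looks_like_image_url(hint):
        return hint

    collector = _ImageCollector()
    collector.feed(page_html)
    if not collector.images:
        return None
    _, src = max(collector.images, key=lambda pair: pair[0])
    # Resolve relative image paths against the item's URL (assumed field: "link").
    return urljoin(item.get("link", ""), src)
```

For example, calling `choose_thumbnail({"link": "https://example.com/news/1"}, html)` on a page with several `<img>` tags returns the absolute URL of the one with the largest declared dimensions, while an item carrying a usable `webshot_picture_hint` short-circuits the page analysis entirely.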
Configuration
Thumbnail extraction relies on an Amazon Web Services S3 configuration to store images for thumbnails and to retrieve thumbnails for display. Configure the following files:
Configuration File |
Example |
|
/etc/squirro/common.ini |
[services_external]
thumbler = //thumbler-testing.squirro.net
[thumbler_salt]
thumb = <salt_1>
|
|
/etc/squirro/webshot.ini |
[aws]
access_key = <key_1>
secret_key = <key_2>
s3_bucket = webshot.testing.squirro.net
s3_base_url = http://webshot.testing.squirro.net.s3-website-eu-west-1.amazonaws.com/
[webshot]
use_thumbler = True
thumbler_config = thumb
thumbler_bucket = webshot
thumbler_salt = <salt_1>
Then restart the sqwebshotd service. |
|
/etc/squirro/thumbler.ini |
[bucket_webshot]
is_s3 = True
access_key = <key_1>
secret_key = <key_2>
s3_bucket = webshot.testing.squirro.net
[config_thumb]
operation = scale
salt = <salt_1>
Then restart the sqthumblerd service. |
|
URL and web server configuration to forward requests for thumbler-testing.squirro.net to the thumbler service |
Example based on nginx, in /etc/nginx/conf.d/thumber.conf:
upstream thumbler-testing {
    server ip-squirro-cluster-node:443;
}
server {
    listen 443 ssl;
    server_name thumbler-testing.squirro.net;
    ssl_certificate <ssl_certificate_1>;
    ssl_certificate_key <ssl_key_1>;
    location / {
        proxy_pass https://thumbler-testing/service/thumbler/;
        proxy_set_header Host $host;
        proxy_set_header Connection Close;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_redirect off;
        proxy_read_timeout 60;
    }
    # redirect server error pages to the static page /50x.html
    #
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root /usr/share/nginx/html;
    }
}
Then reload the nginx service or other web server you may be using. |
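In the examples above, the same `<salt_1>` placeholder appears in all three INI files, `thumbler_config = thumb` lines up with the `[config_thumb]` section, and `thumbler_bucket = webshot` lines up with `[bucket_webshot]`. Assuming these values are meant to match (an inference from the examples, not a documented rule), a small check like the following may help catch copy-and-paste mistakes before restarting the services. The function name and the wording of the messages are hypothetical.

```python
import configparser


def _parse(text):
    # Interpolation is disabled so that "%" characters in salts or keys are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def check_thumbnail_config(common_text, webshot_text, thumbler_text):
    """Return a list of problems; an empty list means the files look consistent."""
    common = _parse(common_text)
    webshot = _parse(webshot_text)
    thumbler = _parse(thumbler_text)
    problems = []

    # Names that webshot.ini uses to refer to sections in thumbler.ini.
    config_name = webshot.get("webshot", "thumbler_config", fallback="")
    bucket_name = webshot.get("webshot", "thumbler_bucket", fallback="")
    config_section = "config_" + config_name
    bucket_section = "bucket_" + bucket_name

    for section in (config_section, bucket_section):
        if not thumbler.has_section(section):
            problems.append("thumbler.ini has no [%s] section" % section)

    # Assumption: the salt must be identical in all three files.
    salts = {
        common.get("thumbler_salt", config_name, fallback=None),
        webshot.get("webshot", "thumbler_salt", fallback=None),
        thumbler.get(config_section, "salt", fallback=None),
    }
    if None in salts or len(salts) != 1:
        problems.append("salt values are missing or differ between the three files")

    # Assumption: webshot and thumbler must point at the same S3 bucket.
    buckets = {
        webshot.get("aws", "s3_bucket", fallback=None),
        thumbler.get(bucket_section, "s3_bucket", fallback=None),
    }
    if None in buckets or len(buckets) != 1:
        problems.append("s3_bucket is missing or differs between webshot.ini and thumbler.ini")

    return problems
```

To run it against a real installation, read each file with `pathlib.Path("/etc/squirro/common.ini").read_text()` (and likewise for `webshot.ini` and `thumbler.ini`) and print the returned list.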