Pancake Thumbnailer Infrastructure

From MozillaWiki

See the diagrams attached to https://bugzilla.mozilla.org/show_bug.cgi?id=731228

Pancake Thumbnailer API

What does it do

It implements an API to generate website thumbnails (screenshots). You submit a list of links and it returns a list of URLs to images, each containing a screenshot of one of those links.

What does it store

It stores thumbnail jobs in RabbitMQ.

It stores the state of the thumbnail request in a Redis database. This data expires within a few minutes.

What does it talk to

It talks to a RabbitMQ server to store and distribute thumbnail jobs. This data is not persistent and expires as soon as the job has been processed. Each job contains:

  • Thumbnail Job ID
  • Site URL
  • Site URL Hash
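
The job fields listed above could be represented as a small message that is serialized before it is published to the queue. The following is a minimal sketch, not the actual Pancake code: the field names, the use of SHA-1 for the site URL hash, and the JSON encoding are all assumptions made for illustration.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Optional


def hash_site_url(site_url: str) -> str:
    """Return a hex digest of the URL (SHA-1 is an assumption, not from the source)."""
    return hashlib.sha1(site_url.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ThumbnailJob:
    """Hypothetical shape of one queued job: the three fields listed above."""
    job_id: str
    site_url: str
    site_url_hash: str


def make_job(site_url: str, job_id: Optional[str] = None) -> ThumbnailJob:
    """Create a job, generating a random job ID when none is supplied."""
    return ThumbnailJob(
        job_id=job_id or uuid.uuid4().hex,
        site_url=site_url,
        site_url_hash=hash_site_url(site_url),
    )


def encode_job(job: ThumbnailJob) -> bytes:
    """Serialize the job to a JSON message body (the encoding is an assumption)."""
    return json.dumps(asdict(job), sort_keys=True).encode("utf-8")


def decode_job(body: bytes) -> ThumbnailJob:
    """Rebuild the job from a message body, as a worker would on receipt."""
    return ThumbnailJob(**json.loads(body.decode("utf-8")))
```

A worker that receives the message body can call decode_job to recover the same three fields the API put on the queue.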

It talks to a Redis database to store the state for the thumbnail request. The following data is stored in Redis:

  • Thumbnail Job ID
  • All sites that are part of the job
  • Status of the sites (processing, error, ready)
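
The request state described above can be pictured as a mapping from each site to its status, stored under the job ID and dropped once it expires. The sketch below uses an in-memory store as a stand-in for Redis so that it runs without a server. The class name, the method names, and the five-minute lifetime are assumptions (the source only says the data expires "within a few minutes").

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

STATUSES = ("processing", "error", "ready")  # the three states listed above
REQUEST_TTL_SECONDS = 300  # assumption: "a few minutes" modelled as 5 minutes


class RequestStateStore:
    """In-memory stand-in for the Redis database, with per-job expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, Dict[str, str]]] = {}

    def create(self, job_id: str, site_urls: Iterable[str]) -> None:
        """Register every site of the job as 'processing' and start the TTL."""
        sites = {url: "processing" for url in site_urls}
        self._data[job_id] = (self._clock() + REQUEST_TTL_SECONDS, sites)

    def get(self, job_id: str) -> Optional[Dict[str, str]]:
        """Return a copy of the site statuses, or None if unknown or expired."""
        entry = self._data.get(job_id)
        if entry is None:
            return None
        expires_at, sites = entry
        if self._clock() >= expires_at:
            del self._data[job_id]  # mimic Redis dropping an expired key
            return None
        return dict(sites)

    def set_status(self, job_id: str, site_url: str, status: str) -> None:
        """Update one site's status; reject unknown statuses, jobs, or sites."""
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        sites = self.get(job_id)
        if sites is None or site_url not in sites:
            raise KeyError((job_id, site_url))
        self._data[job_id][1][site_url] = status
```

In a real deployment the same shape could map onto a Redis hash keyed by job ID with an expiry set on the key, but that mapping is not confirmed by the source.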

It talks to Amazon S3 to find out if thumbnails for a specific site already exist.
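
The existence check described above lets the API skip work for sites that already have a thumbnail. A minimal sketch of that decision follows. The object key layout, the SHA-1 hash, the bucket URL, and the exists callable (which in practice would wrap an S3 lookup) are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple

BUCKET_URL = "https://thumbnails.example.com"  # hypothetical public bucket URL


def thumbnail_key(site_url: str) -> str:
    """Derive the object key from the site URL hash (layout is an assumption)."""
    digest = hashlib.sha1(site_url.encode("utf-8")).hexdigest()
    return f"{digest}.png"


def split_cached(
    site_urls: Iterable[str],
    exists: Callable[[str], bool],
) -> Tuple[Dict[str, str], List[str]]:
    """Separate sites with an existing thumbnail from those that need a job.

    `exists` answers "is this key already stored?"; it is injected so the
    sketch does not depend on any particular S3 client.
    """
    cached: Dict[str, str] = {}
    missing: List[str] = []
    for url in site_urls:
        key = thumbnail_key(url)
        if exists(key):
            cached[url] = f"{BUCKET_URL}/{key}"
        else:
            missing.append(url)
    return cached, missing
```

Only the sites returned in the second element would need a new thumbnail job. The others can be answered immediately with the existing image URL.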

Pancake Thumbnailer Worker

What does it do

It processes a thumbnail request. It uses 'phantomjs' to render the site and create an image. The resulting image is then stored in Amazon S3 and the status of the thumbnail request is updated in Redis.
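
The worker steps above (render, store, update status) can be sketched as one function with its collaborators passed in. This is an illustration under stated assumptions rather than the real worker: the job dictionary keys, the render, upload, and set_status callables, the object key layout, and the rendering script passed to phantomjs are all hypothetical.

```python
from typing import Callable, Dict, List


def phantomjs_command(script_path: str, site_url: str, output_path: str) -> List[str]:
    """Build a `phantomjs SCRIPT ARGS...` command line.

    The rendering script and its argument order are hypothetical.
    """
    return ["phantomjs", script_path, site_url, output_path]


def process_job(
    job: Dict[str, str],
    render: Callable[[str], bytes],
    upload: Callable[[str, bytes], None],
    set_status: Callable[[str, str, str], None],
) -> str:
    """Render one site, store the image, and record the outcome.

    Any failure while rendering or uploading marks the site as 'error';
    otherwise it is marked 'ready'. The final status is also returned.
    """
    job_id, site_url = job["job_id"], job["site_url"]
    try:
        image = render(site_url)  # e.g. run phantomjs and read the image file
        upload(f"{job['site_url_hash']}.png", image)  # key layout is an assumption
    except Exception:
        set_status(job_id, site_url, "error")
        return "error"
    set_status(job_id, site_url, "ready")
    return "ready"
```

Passing the collaborators in as callables keeps the control flow testable without a browser, a bucket, or a database.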


What does it talk to

It talks to a RabbitMQ server to poll for thumbnail jobs.

It talks to Redis to maintain the state of the thumbnail request.

It talks to Amazon S3 to store the resulting thumbnail images.

It talks to the site that is being thumbnailed.

What does it store

It stores/updates the thumbnail request status in Redis. This data expires within a few minutes.

It stores the images in Amazon S3. They expire after 24 hours.