French Toast Backend Pieces

From MozillaWiki

The purpose of this page is to get an idea of what kind of backend pieces are needed for new features in the French Toast release. They are organized by feature name, and each section describes what is needed on the backend.

This is just the result of a first discussion about these features. Nothing is set in stone.

French Toast Features

Pin Matching/Searching

Pin Matching is very similar to the page/stack matching that we currently have. It will show the top-ranked Pins for the search term, grouped by their Boards.

This can be done completely with the existing infrastructure of Lattice, Neo4J and Elastic Search.
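
As a rough illustration of the grouping step, here is a Python sketch that assumes the search layer already returns each matching Pin with a relevance score and the name of its Board. The `Pin` type and `group_pins_by_board` function are hypothetical and not part of Lattice; ranking Boards by their best Pin is just one plausible choice.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Pin:
    # Hypothetical shape of a search hit; field names are assumptions.
    url: str
    board: str
    score: float


def group_pins_by_board(hits, max_boards=5, pins_per_board=3):
    """Group scored pins by board, ordering boards by their best pin score."""
    groups = defaultdict(list)
    for pin in hits:
        groups[pin.board].append(pin)
    # Within each board, highest-scoring pins come first.
    for pins in groups.values():
        pins.sort(key=lambda p: p.score, reverse=True)
    # Rank boards by the score of their top pin.
    ranked = sorted(groups.items(), key=lambda kv: kv[1][0].score, reverse=True)
    return [(board, pins[:pins_per_board]) for board, pins in ranked[:max_boards]]
```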

Search Autocomplete

The purpose of this feature is to provide something similar to Google Search Complete: a list of suggestions that appears as you start typing in the search box. We can offer generic autocomplete (maybe even using the Google API) and also complete based on your own data and other people's data in Pancake.

We would run a server that exposes a public suggest API to the front-end and maintains an index of possible suggestions.

Ideally the index is updated as soon as we discover new meta data that can be used for autocomplete. So we likely want to hook it up to an event system to get notifications when people visit new pages, etc.

  • Stuart suggested Cleo, a project from LinkedIn
  • Can we use Google Suggest as one of the providers?
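
A minimal in-memory sketch of the suggestion index, assuming suggestions are plain strings weighted by how often they were seen. `SuggestIndex` and its methods are hypothetical; this is not Cleo's or Google Suggest's API, and a real service would persist the index and sit behind the public suggest API.

```python
import bisect


class SuggestIndex:
    """Hypothetical in-memory prefix index for autocomplete.

    Terms are kept sorted, so all terms sharing a prefix form a
    contiguous range that bisect can locate quickly.
    """

    def __init__(self):
        self._terms = []    # sorted, lowercase, unique
        self._weights = {}  # term -> popularity count

    def add(self, term, weight=1):
        # Called when an event reports new meta data (e.g. a page title).
        term = term.lower()
        if term not in self._weights:
            bisect.insort(self._terms, term)
            self._weights[term] = 0
        self._weights[term] += weight

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        start = bisect.bisect_left(self._terms, prefix)
        matches = []
        for term in self._terms[start:]:
            if not term.startswith(prefix):
                break  # left the contiguous prefix range
            matches.append(term)
        # Most popular first; ties broken alphabetically.
        matches.sort(key=lambda t: (-self._weights[t], t))
        return matches[:limit]
```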

Search Suggestions

Currently the Bing search results contain suggestions: new search terms or topics that relate to the original search term and are usually close variations of it. For example, when the user searches for 'Cheese', 'Cheese Plate' and 'Cheese Making' are suggested.

We were thinking of using Elastic Search for this. We can index page titles, extracted meta data and maybe content to generate suggestions with a Suggestions Plugin for Elastic Search.

This would likely run on a separate server dedicated to indexing and suggesting.

To keep the index up to date, it will need to be hooked up to an event system that broadcasts when page meta data has been retrieved for new pages that users are pinning.
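
Before a suggestions plugin for Elastic Search is chosen, the idea can be prototyped without it. The sketch below swaps in a deliberately simple technique, counting two-word phrases in indexed page titles that contain the search term; `related_phrases` is a hypothetical helper and does not reproduce any plugin's behavior.

```python
from collections import Counter


def related_phrases(titles, term, limit=5):
    """Suggest two-word phrases from indexed titles that contain `term`.

    Counts how often each adjacent word pair (bigram) containing the
    term appears across all titles.
    """
    term = term.lower()
    counts = Counter()
    for title in titles:
        words = title.lower().split()
        for first, second in zip(words, words[1:]):
            if term in (first, second):
                counts[f"{first} {second}"] += 1
    # Most frequent first; ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:limit]]
```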

Boards / Pin to Board

We need to introduce the concept of Board nodes in the Lattice data model.

Boards have the following properties:

  • A short name
  • A creation date
  • A longer description?

Boards have the following relationships:

  • They must have one owner (the person who created the board)
  • They may have many followers (if the board is public)
  • They may have many contributors (if the board is shared)
  • They may have many pages

Questions from discussion:

  • Are pages the primary entity that can be put on boards? (There was talk about putting images and text on boards.)
  • Is the owner/follower/contributor model good?

Cluster Boards

Or 'smart boards'? These boards show a generated (calculated) collection of pages.

We are not yet sure how these would work, so we skipped them in the discussion.

Are Clustered Boards based on a search term? What is the starting point for them?

Updates from People

We assumed that this feature will show a timeline of shares and changes that your Pancake friends have made recently, much like the timelines we have for the social feed.

Items that should appear on this timeline are:

  • Facebook and Twitter updates like Pancake currently displays
  • New public Boards created by your Pancake friends?
  • New pages added to public Boards by your Pancake friends?

The first item is not Pancake specific and is the same as the social feed we have now.
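
If the timeline is assembled from several feeds that are each already sorted newest-first (for example Twitter, Facebook and Pancake board activity), combining them is a k-way merge. This sketch uses Python's `heapq.merge`; the `TimelineItem` type and `merge_timeline` function are hypothetical.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime
from itertools import islice


@dataclass
class TimelineItem:
    # Hypothetical item shape; `source` is e.g. "twitter", "facebook", "pancake".
    when: datetime
    source: str
    text: str


def merge_timeline(*feeds, limit=20):
    """Merge feeds that are each sorted newest-first into one newest-first timeline."""
    merged = heapq.merge(*feeds, key=lambda item: item.when, reverse=True)
    return list(islice(merged, limit))
```

Because `heapq.merge` is lazy, only the first `limit` items are ever compared, which suits a timeline that is paged.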

If we want the last two items, we will need to maintain a social graph in Lattice. This graph will be the subset of your Twitter/Facebook friends who also use Pancake, and it must be kept in sync with those external networks. Keeping it in sync requires a lot of care. This means:

  • Every time a user connects to or disconnects from Pancake, we need to update the graphs of all the people they are connected with
  • Every time a user friends/follows someone new on the external social media site, we have to check if that person is a Pancake user and update our graph
  • Every time a user defriends/unfollows someone on the external social media site, we have to update our graph
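
The three sync rules can be sketched as event handlers on an in-memory graph. This assumes directed edges (A follows B) so that Twitter-style follows are covered, and it stores each user's external friend list so that someone joining Pancake can be linked to existing users. `PancakeSocialGraph` and its methods are hypothetical; in Lattice the edges would live in Neo4J.

```python
from collections import defaultdict


class PancakeSocialGraph:
    """Hypothetical sketch: Pancake friendships kept in sync with external lists.

    A directed edge a -> b exists when both use Pancake and a friends or
    follows b on an external site.
    """

    def __init__(self):
        self.pancake_users = set()
        self.external = defaultdict(set)  # user -> people they friend/follow externally
        self.edges = defaultdict(set)     # user -> Pancake users they are linked to

    def _link(self, a, b):
        if a in self.pancake_users and b in self.pancake_users:
            self.edges[a].add(b)

    def user_connected(self, user, external_friends):
        self.pancake_users.add(user)
        self.external[user] = set(external_friends)
        # Link the new user to friends who already use Pancake...
        for friend in self.external[user]:
            self._link(user, friend)
        # ...and link every existing user who already follows them.
        for other, friends in self.external.items():
            if user in friends:
                self._link(other, user)

    def user_disconnected(self, user):
        self.pancake_users.discard(user)
        self.edges.pop(user, None)
        self.external.pop(user, None)
        for friends in self.edges.values():
            friends.discard(user)

    def followed(self, user, friend):
        self.external[user].add(friend)
        self._link(user, friend)  # only creates an edge if both use Pancake

    def unfollowed(self, user, friend):
        self.external[user].discard(friend)
        self.edges[user].discard(friend)
```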

Board Name Suggestions

When the user is looking at a page and brings up the 'Pin to Board' dialog we need to make suggestions for possible board names. These suggestions must be relevant to the page that the user is looking at.

The initial idea is to use the AlchemyAPI and Diffbot services to find interesting keywords. Both return a range of concepts, entities and keywords that can be good suggestions for a Board name. Other sources can be board names the user has used before, or even global board names. The combined results can then be ranked or sorted in an interesting way.
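
One way to combine the sources is a weighted sum of relevance scores, so that a name proposed by several sources rises to the top. This sketch assumes each source has already been reduced to `(name, relevance)` pairs with relevance between 0 and 1; it does not call AlchemyAPI or Diffbot, and the source keys and weights are made-up values.

```python
from collections import defaultdict

# Hypothetical per-source weights; tuning these is where the ranking gets "interesting".
SOURCE_WEIGHTS = {
    "alchemyapi": 1.0,
    "diffbot": 1.0,
    "user_boards": 2.0,   # names the user already used get a boost
    "global_boards": 0.5,
}


def rank_board_names(candidates, limit=5):
    """Rank board-name candidates gathered from several sources.

    `candidates` maps a source name to a list of (name, relevance) pairs.
    A name suggested by several sources accumulates weighted relevance
    from each of them.
    """
    scores = defaultdict(float)
    display = {}
    for source, items in candidates.items():
        weight = SOURCE_WEIGHTS.get(source, 1.0)
        for name, relevance in items:
            key = name.strip().lower()  # merge "Cheese" and "cheese "
            scores[key] += weight * relevance
            display.setdefault(key, name.strip())  # keep first spelling seen
    ranked = sorted(scores, key=lambda k: (-scores[k], k))
    return [display[k] for k in ranked[:limit]]
```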

For the backend we will need a web service that the front-end can call to obtain this list of suggestions.

The suggestions need to be available as soon as possible. This means that as soon as the backend knows that a page has been visited, it must submit the page to the available APIs and collect meta data for it.

There is a chance that the meta data is not yet available when the user opens the Pin to Board dialog, so the UI should take this into account.
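
The web service can report partial results so the dialog never has to block on slow meta data providers. In this sketch, a hypothetical in-memory store tracks which providers are still pending for each page, and the response carries a `complete` flag the UI can use to decide whether to poll again. The endpoint path, function names and response shape are all assumptions.

```python
import json

# Hypothetical in-memory store: url -> {"pending": set of providers, "keywords": list}
METADATA = {}


def page_visited(url, providers=("alchemyapi", "diffbot")):
    # As soon as a visit is known, mark every provider as pending.
    METADATA.setdefault(url, {"pending": set(providers), "keywords": []})


def provider_finished(url, provider, keywords):
    entry = METADATA[url]
    entry["pending"].discard(provider)
    entry["keywords"].extend(keywords)


def board_name_suggestions(url):
    """Body of a hypothetical GET /board-suggestions?url=... endpoint.

    Returns whatever is known so far plus a `complete` flag, so the UI can
    show partial suggestions and poll again instead of waiting.
    """
    entry = METADATA.get(url)
    if entry is None:
        return json.dumps({"url": url, "suggestions": [], "complete": False})
    return json.dumps({
        "url": url,
        "suggestions": sorted(set(entry["keywords"])),
        "complete": not entry["pending"],
    })
```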

Backend Infrastructure

Many of the above pieces will have to respond to new content appearing in the system. We are thinking about introducing some kind of message bus on which we can publish events as they happen. Example events:

  • A new user signed up
  • A user changed their social settings
  • A user has a changed social graph
  • A user has subscribed to a board
  • A user has created a board
  • A user has pinned a new page
  • We have processed a new page and all its meta-data has been retrieved
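
To make the publish/subscribe idea concrete, here is a minimal in-process bus with one event type per item in the list above. It is a stand-in for whatever broker the backend ends up using (the page does not pick one); `EventType` and `MessageBus` are hypothetical names.

```python
from collections import defaultdict
from enum import Enum, auto


class EventType(Enum):
    # One member per event listed above; the names are illustrative.
    USER_SIGNED_UP = auto()
    SOCIAL_SETTINGS_CHANGED = auto()
    SOCIAL_GRAPH_CHANGED = auto()
    BOARD_SUBSCRIBED = auto()
    BOARD_CREATED = auto()
    PAGE_PINNED = auto()
    PAGE_METADATA_READY = auto()


class MessageBus:
    """Minimal in-process publish/subscribe bus (a stand-in for a real broker)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Deliver synchronously to every handler registered for this type.
        for handler in list(self._subscribers[event_type]):
            handler(payload)
```

A real deployment would more likely use an out-of-process broker so that worker processes on other machines can subscribe; here, handlers run synchronously in the publisher's thread.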

Many events are generated directly by user actions. These will likely be handled by a pool of worker processes that do tasks like:

  • Make a thumbnail
  • Ask Diffbot to process the page
  • Ask AlchemyAPI to process the page
  • Do our own processing

Those tasks can then generate more events to which other systems can respond. For example, the Search Autocomplete server can subscribe to some of these events to keep its databases up to date. When it receives a 'hey, we have the raw meta data for a new page available' event, it can fetch that meta data, process it and index it in its own way.
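
The chaining described above (a worker handles one event and publishes another that Search Autocomplete consumes) can be sketched with a simple queue-driven dispatcher. The event names, the `Pipeline` class and the fake meta data step are hypothetical; a real worker would make the thumbnail and call Diffbot and AlchemyAPI where the comment indicates.

```python
from collections import defaultdict, deque


class Pipeline:
    """Hypothetical event-driven worker chain.

    Events are queued and dispatched one at a time, so a handler can
    publish follow-up events without recursing into other handlers.
    """

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()

    def on(self, event, handler):
        self.handlers[event].append(handler)

    def publish(self, event, payload):
        self.queue.append((event, payload))

    def run(self):
        while self.queue:
            event, payload = self.queue.popleft()
            for handler in self.handlers[event]:
                handler(payload)


pipeline = Pipeline()
autocomplete_terms = set()


def process_page(payload):
    # Stand-in for thumbnailing and the Diffbot / AlchemyAPI calls;
    # this fake just lowercases and splits the page title into keywords.
    metadata = {"url": payload["url"], "keywords": payload["title"].lower().split()}
    pipeline.publish("page_metadata_ready", metadata)


def index_for_autocomplete(metadata):
    # Search Autocomplete reacts to the follow-up event and updates its index.
    autocomplete_terms.update(metadata["keywords"])


pipeline.on("page_pinned", process_page)
pipeline.on("page_metadata_ready", index_for_autocomplete)
```

Publishing `("page_pinned", {"url": ..., "title": "Cheese Making"})` and calling `pipeline.run()` leaves `autocomplete_terms` holding `"cheese"` and `"making"`. Queuing follow-up events, rather than calling subscribers directly, keeps long chains from growing the call stack and mirrors how separate worker processes would pick events off a shared bus.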