User:Mconnor/Past/PlacesFrecency

From MozillaWiki

"Frecency" is a concept that combines of frequency and recency.

The overhead of calculating frecency on each request is simply too high, as is the overhead of continuously recalculating it. So I would propose that we calculate a frecency rating via one of the methods below, and at sort time apply a weighting based on the lastVisited date. Done right, this should just be integer math, and quite fast.

All ranges and weights should be tuned; this is a first-cut approximation.

Range 1: 0-4 days (weight 1.0)
Range 2: 5-14 days (weight 0.7)
Range 3: 15-31 days (weight 0.5)
Range 4: 32-90 days (weight 0.3)
Range 5: 91+ days (weight 0.1)
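
A minimal Python sketch of the sort-time weighting, assuming a page's age is measured in whole days since its lastVisited date and that the weights above are stored as integer percentages (1.0 becomes 100, 0.7 becomes 70, and so on) so the adjustment stays in integer math. The names AGE_WEIGHTS, age_weight_percent and sort_key are hypothetical, not part of any Places API.

```python
from datetime import date

# Hypothetical table: (max_age_in_days, weight_percent) for Ranges 1-4.
# Weights are integer percentages (assumption) so sorting stays integer math.
AGE_WEIGHTS = [
    (4, 100),   # Range 1: 0-4 days
    (14, 70),   # Range 2: 5-14 days
    (31, 50),   # Range 3: 15-31 days
    (90, 30),   # Range 4: 32-90 days
]
OLDEST_WEIGHT = 10  # Range 5: 91+ days


def age_weight_percent(age_days: int) -> int:
    """Return the weight (in percent) for something that is age_days old."""
    for max_age, weight in AGE_WEIGHTS:
        if age_days <= max_age:
            return weight
    return OLDEST_WEIGHT


def sort_key(rating: int, last_visited: date, today: date) -> int:
    """Weight a precomputed frecency rating by the age of the last visit."""
    age_days = (today - last_visited).days
    return rating * age_weight_percent(age_days) // 100


if __name__ == "__main__":
    today = date(2008, 3, 1)
    pages = {
        "a": (300, date(2008, 2, 28)),  # 2 days old  -> weight 100 -> 300
        "b": (500, date(2008, 1, 1)),   # 60 days old -> weight 30  -> 150
    }
    ranked = sorted(pages, key=lambda p: sort_key(*pages[p], today), reverse=True)
    print(ranked)  # prints ['a', 'b']
```

Page "b" has the higher stored rating but sorts below "a" because its last visit falls in Range 4.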

Starred pages get a 40% bonus; bookmarks get a 100% bonus.
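
One way to apply the bonuses, assuming they multiply the page's rating (a 40% bonus as x 140 / 100, a 100% bonus as x 200 / 100) and that each page is classified as exactly one kind, so bonuses never stack. PageKind, BONUS_PERCENT and apply_bonus are hypothetical names used only for illustration.

```python
from enum import Enum


class PageKind(Enum):
    # Hypothetical classification; the proposal only names the two bonuses.
    PLAIN = 0
    STARRED = 1
    BOOKMARKED = 2


# Bonus as an integer percentage added on top of the base rating.
BONUS_PERCENT = {
    PageKind.PLAIN: 0,
    PageKind.STARRED: 40,
    PageKind.BOOKMARKED: 100,
}


def apply_bonus(rating: int, kind: PageKind) -> int:
    """Scale a frecency rating by the starred/bookmarked bonus (integer math)."""
    return rating * (100 + BONUS_PERCENT[kind]) // 100
```

For example, a starred page rated 250 becomes 350, and a bookmarked page rated 250 becomes 500.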

Option 1 (max overhead/best accuracy)

Each visit is worth 100 points, scaled by the weight of the range that visit's date falls in. Sum the weighted points of every visit; the total is the frecency rating. The advantage is that sites you still visit, but less often, will lose rank faster than with Option 2. The disadvantage is that iterating through all visits will not be fast for heavily visited sites.
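
A sketch of Option 1, assuming each visit's 100 points are scaled by the integer-percentage weight of the age range that visit falls in. visit_dates stands in for the rows of the visit table; frecency_option1 and the other names are hypothetical.

```python
from datetime import date
from typing import Iterable

# Hypothetical table: (max_age_in_days, weight_percent); 91+ days falls through.
AGE_WEIGHTS = [(4, 100), (14, 70), (31, 50), (90, 30)]
OLDEST_WEIGHT = 10
POINTS_PER_VISIT = 100


def age_weight_percent(age_days: int) -> int:
    """Weight (in percent) for a visit that is age_days old."""
    for max_age, weight in AGE_WEIGHTS:
        if age_days <= max_age:
            return weight
    return OLDEST_WEIGHT


def frecency_option1(visit_dates: Iterable[date], today: date) -> int:
    """Sum the age-weighted points of every visit (touches every visit row)."""
    total = 0
    for visited in visit_dates:
        age_days = (today - visited).days
        total += POINTS_PER_VISIT * age_weight_percent(age_days) // 100
    return total
```

Visits today, 5 days ago and 121 days ago score 100 + 70 + 10 = 180. The loop over every visit is exactly the cost this option trades for accuracy.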

Option 2 (lowest overhead/good accuracy)

Each visit is worth 100 points. Divide the total points by the number of days since the first visit to get the frecency rating (rounded to the nearest 10). The advantage is that this is a fairly fast calculation. The disadvantage is that heavily visited sites that are no longer visited as frequently will continue to rank highly.
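
A sketch of Option 2, which needs only the visit count and the first-visit date. Clamping the day count to a minimum of one (so a page first visited today does not divide by zero) and rounding halves up are assumptions; frecency_option2 is a hypothetical name.

```python
from datetime import date

POINTS_PER_VISIT = 100


def frecency_option2(visit_count: int, first_visit: date, today: date) -> int:
    """visits * 100 / days since first visit, rounded to the nearest 10."""
    # Assumption: a page first visited today counts as 1 day old, to avoid /0.
    days = max((today - first_visit).days, 1)
    points = visit_count * POINTS_PER_VISIT
    # Integer round-half-up of points / days to the nearest multiple of 10.
    return (points + 5 * days) // (10 * days) * 10
```

For example, 7 visits over 20 days gives 35, which rounds to 40.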

Option 3 (lowish overhead/very good accuracy)

Option 2, but pull the last 5 (10?) visits from the visit table to build a better multiplier: average the weights of those visits and apply the average to the visits * 100 / days result.
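
A sketch of Option 3, assuming the multiplier is the plain average of the age weights of the most recent RECENT_VISITS visits, applied to the rounded Option 2 result. recent_visits stands in for the dates pulled from the visit table; RECENT_VISITS, frecency_option3 and the helper names are hypothetical.

```python
from datetime import date
from typing import Iterable

# Hypothetical table: (max_age_in_days, weight_percent); 91+ days falls through.
AGE_WEIGHTS = [(4, 100), (14, 70), (31, 50), (90, 30)]
OLDEST_WEIGHT = 10
POINTS_PER_VISIT = 100
RECENT_VISITS = 5  # hypothetical constant; the proposal suggests 5 (or 10)


def age_weight_percent(age_days: int) -> int:
    """Weight (in percent) for a visit that is age_days old."""
    for max_age, weight in AGE_WEIGHTS:
        if age_days <= max_age:
            return weight
    return OLDEST_WEIGHT


def frecency_option3(visit_count: int, first_visit: date,
                     recent_visits: Iterable[date], today: date) -> int:
    """Option 2's rating scaled by the average weight of the latest visits."""
    # Option 2: visits * 100 / days, rounded to the nearest 10 (days >= 1).
    days = max((today - first_visit).days, 1)
    base = (visit_count * POINTS_PER_VISIT + 5 * days) // (10 * days) * 10
    # Keep only the most recent RECENT_VISITS dates.
    sample = sorted(recent_visits, reverse=True)[:RECENT_VISITS]
    if not sample:
        return 0  # no visits recorded, so nothing to weight
    weight_sum = sum(age_weight_percent((today - v).days) for v in sample)
    # Multiply by the average weight without leaving integer math.
    return base * weight_sum // (100 * len(sample))
```

A page with an Option 2 rating of 50 whose last five visits are 0, 2, 10, 40 and 100 days old averages a weight of 62%, giving 31. Only the last few visits are read, so the cost stays close to Option 2.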


Right now, Option 3 seems to be the best compromise for aging data without too much overhead, but Option 2 is an acceptable first step.