Overview

One of the important questions for the Mozilla and Firefox platform is:

How many add-ons make up the Mozilla add-ons ecosystem?

It's important to get the answer to this so that:

We can understand how pervasive Mozilla add-ons are
We can help users find ALL the add-ons available on the net (not just those on AMO)
We can use this number to show a groundswell of support for the platform to encourage others to develop for it
We can start to index this information in a central location (AddonSearch)

This actually turns out to be quite hard to answer. AMO is one of the main distribution points for add-ons, but it's certainly not the only one. The goal of this project is to gather and index information about add-ons "in the wild".

Here are a few ideas about where add-ons can be hiding.

Aggregation Sources

Mozilla AMO (public & sandboxed)
Mozilla AMO Update Service (some authors don't include an update URL which means that Firefox attempts to get updates from AMO and the GUID is logged)
AMO-like sites: AMI, Sociz, China, Mozilla Japan Addons, erweiterungen.de, Addons.pl, other locale-specific sites?
Source Repos: MozDev projects, Google Code & SourceForge
Search results: Google ("filetype:xpi", "firefox add-ons", "firefox extensions"), Yahoo, etc...
Those mentioned in Google Alerts (blogs & news) on a regular basis
Blog aggregators: Foxiewire
Addon-specific sites for XUL Apps (Songbird Nest, Flock Extensions, ...)

Individual Sources

Corporations (Google Toolbar, Google Labs)
Inside of Installers (Symantec Anti-Virus, McAfee, Skype, Java)
Individual authors' blogs and websites

Project Definition

Write a crawler that gathers info from some of the sources named above
Index the collected info and try to extract metadata from page context and the install.{js/rdf}
Allow "manual entries" to be entered into the index (e.g. for add-ons bundled in Installers)
Build a search/advanced search UI on top of the index
Initial focus should be on Firefox, Thunderbird, SeaMonkey, Flock, Songbird and Nvu only

Tech Notes

Thankfully, most add-ons have a .xpi file extension, so they should be easier to identify
.xpi files are ZIP files and usually contain either an install.js or an install.rdf, which has info about what the add-on does
The install.{js/rdf} contains a GUID which (hopefully) uniquely identifies the add-on - we may be able to use this as the primary index key
install.rdf also lists targetApplication IDs and the supported version range for each
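
As a rough illustration of the notes above, here is a minimal sketch that opens an .xpi as a ZIP archive and pulls the GUID and targetApplication entries out of install.rdf. It assumes the manifest writes its properties as child elements (em:id, em:version, ...) rather than as attributes, and it ignores install.js entirely. The function name read_install_rdf and the returned dictionary layout are made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by install.rdf manifests.
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
EM = "{http://www.mozilla.org/2004/em-rdf#}"


def read_install_rdf(xpi_bytes):
    """Return metadata from install.rdf inside an .xpi, or None if absent.

    Hypothetical helper: only handles properties written as child elements.
    """
    with zipfile.ZipFile(io.BytesIO(xpi_bytes)) as archive:
        if "install.rdf" not in archive.namelist():
            return None
        root = ET.fromstring(archive.read("install.rdf"))

    # The install manifest is the top-level Description element.
    manifest = root.find(RDF + "Description")
    if manifest is None:
        return None

    targets = []
    path = EM + "targetApplication/" + RDF + "Description"
    for desc in manifest.findall(path):
        targets.append({
            "id": desc.findtext(EM + "id"),
            "min_version": desc.findtext(EM + "minVersion"),
            "max_version": desc.findtext(EM + "maxVersion"),
        })

    return {
        "guid": manifest.findtext(EM + "id"),
        "version": manifest.findtext(EM + "version"),
        "name": manifest.findtext(EM + "name"),
        "targets": targets,
    }
```

Older packages that only ship install.js would come back as None here and would need a separate (probably messier) parser.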

Crawling

Crawling and parsing would probably be an intensive and time-consuming process

Google search results (filetype:xpi) are limited. For example, a Google search for (filetype:xpi site:addons.mozilla.org) only returns 62 hits. They are probably best used to supplement our data rather than as the primary source.
How much do we crawl? How deep?

Aggregate Sites
- Two Kinds
  1. Hosting (AMI, AMO)
  2. Linking (FoxieWire)
- Site specific. Maybe only a specific subdomain (e.g. addons.mozilla.org/* instead of all of mozilla.org). Add-on authors sometimes have links on their add-on page to their personal website with a more up-to-date add-on.
- Mozdev/others/..
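
One possible shape for the site-specific crawl described above, as a sketch only: a breadth-first walk that stays on the starting host, stops at a configurable depth, collects .xpi links, and notes off-site links (such as an author's personal site) as candidates for later. The names LinkCollector, classify_links and crawl are invented for this example, and fetch is a caller-supplied function that returns the HTML for a URL or None, so no particular HTTP library is assumed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def classify_links(page_url, html, allowed_host):
    """Split a page's links into .xpi files, on-site pages, off-site pages."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    xpi, follow, external = [], [], []
    for link in collector.links:
        parsed = urlparse(link)
        if parsed.path.lower().endswith(".xpi"):
            xpi.append(link)
        elif parsed.hostname == allowed_host:
            follow.append(link)
        else:
            external.append(link)
    return xpi, follow, external


def crawl(start_url, fetch, max_depth=2):
    """Breadth-first crawl limited to the start URL's host and max_depth."""
    allowed_host = urlparse(start_url).hostname
    seen = {start_url}
    queue = deque([(start_url, 0)])
    found_xpi, off_site = set(), set()
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        xpi, follow, external = classify_links(url, html, allowed_host)
        found_xpi.update(xpi)
        off_site.update(external)
        if depth < max_depth:
            for link in follow:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return found_xpi, off_site
```

The off-site set is not followed here; whether to crawl it (and how deep) is exactly the open question in this section.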

Individual Sites
- Wordpress/Blogspot (can extensions be uploaded here?)
- Google/Yahoo search
  - Rich sources of information. But too much information, or lacking quality

Bouncer
- What kind of information does bouncer collect?
- Probably does not give context, rating or URL
- Good/Bad source?

GUID Collisions

Same extension different version
Same extension, same version, different website (hash comparisons?)
Same GUID, different extensions (different name)
Different extension, possibly malicious or coincidence
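
The cases above could be told apart with a comparison along these lines. This is a sketch under assumptions: the Sighting record, the compare function and the label strings are all invented for illustration, names are compared as exact strings, and SHA-256 is just one reasonable choice for the hash comparison.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """One observed copy of an add-on (hypothetical record layout)."""
    guid: str
    name: str
    version: str
    url: str
    sha256: str


def sha256_of(data: bytes) -> str:
    """Hex digest of the raw .xpi bytes, used to detect identical files."""
    return hashlib.sha256(data).hexdigest()


def compare(a: Sighting, b: Sighting) -> str:
    """Label the relationship between two sightings."""
    if a.guid != b.guid:
        return "unrelated"
    if a.name != b.name:
        # Same GUID, different name: malicious reuse or coincidence.
        return "same-guid-different-name"
    if a.version != b.version:
        return "different-version"
    if a.sha256 == b.sha256:
        # Same file hosted on more than one site.
        return "mirror"
    # Same GUID, name and version but different bytes: needs a closer look.
    return "same-version-different-content"
```

A renamed add-on would be flagged as "same-guid-different-name" by this check, so that label should be read as "needs review", not as proof of a malicious copy.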

What to Track

Addon url (where did we find it?)
Filename
Supported Applications and versions
Developer(s)
GUID changes
Locales it supports
Context (the entire paragraph around the link)
Ratings? (Site-specific)
Categories (How?)
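
To make the list above concrete, the tracked fields might be held in a record like the following. This is only a sketch: the class name AddonRecord, the field names and the types are assumptions, and the open questions (how ratings from different sites compare, how categories are assigned) are left as plain optional fields.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class AddonRecord:
    """One indexed add-on; field names are illustrative, not a fixed schema."""
    guid: str
    url: str                       # where did we find it?
    filename: str
    applications: List[dict] = field(default_factory=list)   # app id + versions
    developers: List[str] = field(default_factory=list)
    previous_guids: List[str] = field(default_factory=list)  # GUID changes
    locales: List[str] = field(default_factory=list)
    context: str = ""              # paragraph surrounding the link
    rating: Optional[float] = None          # site-specific, may be missing
    categories: List[str] = field(default_factory=list)


def to_json(record: AddonRecord) -> str:
    """Serialize a record so it can be stored or handed to an indexer."""
    return json.dumps(asdict(record), sort_keys=True)
```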

Tools

Something to extract a ZIP archive (the .xpi file).
- Look for chrome.manifest
- Look for install.{rdf|js}
- Parse those files (rdf is xml, chrome.manifest should be simple, but what about install.js?)
Something to crawl
Something to store (database for better querying?)
List of websites to crawl
Crawler's settings (eg. How deep)
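
For the "something to store" item, a small relational database keyed on the GUID would already make counting and querying easy. The following is a minimal sketch using SQLite; the table and column names, and the helper functions open_index, record_sighting and count_addons, are assumptions made for this example. The source column distinguishes crawled entries from manual ones.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS addons (
    guid TEXT PRIMARY KEY,
    name TEXT
);
CREATE TABLE IF NOT EXISTS sightings (
    guid    TEXT NOT NULL REFERENCES addons(guid),
    url     TEXT NOT NULL,
    version TEXT,
    source  TEXT NOT NULL,          -- 'crawler' or 'manual'
    UNIQUE (guid, url, version)
);
"""


def open_index(path=":memory:"):
    """Open (or create) the index database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_sighting(conn, guid, name, url, version, source="crawler"):
    """Store one observation; repeated identical observations are ignored."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO addons (guid, name) VALUES (?, ?)",
            (guid, name),
        )
        conn.execute(
            "INSERT OR IGNORE INTO sightings (guid, url, version, source) "
            "VALUES (?, ?, ?, ?)",
            (guid, url, version, source),
        )


def count_addons(conn):
    """Number of distinct GUIDs seen so far."""
    return conn.execute("SELECT COUNT(*) FROM addons").fetchone()[0]
```

Keying on the GUID means that GUID collisions have to be resolved before insertion, which this sketch does not attempt.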

Technical Resources

Manual Extensions

Extensions that are bundled with an installer, and therefore must be added to the index manually

http://free.grisoft.com/ww.faq.num-1241#faq_1241
Symantec noting their poor addon
McAfee SiteAdvisor
Ubuntu includes an os-integration addon which is pre-installed in Firefox.
Yahoo, Amazon, Ask.com, etc. toolbars
MediaWrap, installed in some distros to give Firefox ActiveX support
XPL LinkScanner
Thinkvantage Password Manager Extension installed on Lenovo computers
Skype extension for Firefox
RealPlayer browser record

Update:WildOn
