User:Bhashem/WildOnAddons: Difference between revisions

Revision as of 23:23, 5 May 2008

1,975 bytes added, 5 May 2008

no edit summary

Cdolivei (78 edits)

@@ Line 41: / Line 41: @@
 * The install.{js/rdf} file contains a GUID that (hopefully) uniquely identifies the add-on; we may be able to use it as the primary index
 * [https://addons.mozilla.org/en-US/firefox/pages/appversions TargetApplication IDs and versions]
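As a minimal sketch of pulling that candidate index out of a manifest, the Python below reads the add-on's em:id (its GUID), its em:version and its em:targetApplication entries from install.rdf using the standard library's ElementTree. It assumes the usual RDF and em-rdf namespaces and accepts values written either as child elements or as attributes, since both forms appear in manifests. The function names are our own, and install.js is not handled.

```python
import xml.etree.ElementTree as ET

# Namespaces used by install.rdf manifests.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EM_NS = "http://www.mozilla.org/2004/em-rdf#"
NS = {"RDF": RDF_NS, "em": EM_NS}
MANIFEST_ABOUT = "urn:mozilla:install-manifest"


def _value(node, name):
    """Read em:<name> from a child element or, failing that, from an attribute."""
    child = node.find(f"em:{name}", NS)
    if child is not None and child.text and child.text.strip():
        return child.text.strip()
    return node.get(f"{{{EM_NS}}}{name}")


def _about(desc):
    # "about" may be written as RDF:about or, under a default namespace, as a bare attribute.
    return desc.get(f"{{{RDF_NS}}}about") or desc.get("about")


def parse_install_rdf(xml_text):
    """Return the GUID, version and (app id, minVersion, maxVersion) targets of an install.rdf."""
    root = ET.fromstring(xml_text)
    manifest = next(
        (d for d in root.iter(f"{{{RDF_NS}}}Description") if _about(d) == MANIFEST_ABOUT),
        None,
    )
    if manifest is None:
        raise ValueError("no urn:mozilla:install-manifest description found")
    targets = []
    for target in manifest.findall("em:targetApplication", NS):
        for desc in target.findall("RDF:Description", NS):
            targets.append((_value(desc, "id"), _value(desc, "minVersion"), _value(desc, "maxVersion")))
    return {"guid": _value(manifest, "id"), "version": _value(manifest, "version"), "targets": targets}
```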
+== Crawling ==
+Crawling and parsing will probably be intensive and time-consuming.
+* Google search results (filetype:xpi) are limited. For example, a Google search for (filetype:xpi site:addons.mozilla.org) returns only 62 hits, so Google is probably best used to supplement our data rather than as the primary source.
+* How much do we crawl? How deep?
+* Aggregate Sites
+** Two Kinds
+**# Hosting (AMI, AMO)
+**# Linking (FoxieWire)
+** Site-specific crawling: perhaps restrict a crawl to a single host (e.g. addons.mozilla.org/* rather than all of mozilla.org). Note that add-on authors sometimes link from their add-on page to a personal website with a more up-to-date version.
+** Mozdev/others/..
+* Individual Sites
+** WordPress/Blogspot (can extensions be uploaded here?)
+** Google/Yahoo search
+*** Rich sources of information, but often too much of it, or of poor quality
+* Bouncer
+** What kind of information does Bouncer collect?
+** Probably does not give context, rating, or URL
+** Good or bad source?
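The scope questions above (how deep to go, and whether to stay on one host such as addons.mozilla.org) could be expressed as a depth-limited breadth-first crawl. This is a sketch only: fetch_links is a hypothetical hook standing in for downloading a page and extracting its links, and the choice to record .xpi links on any host while only following pages on allowed hosts is our assumption, not a settled decision.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit


def in_scope(url, allowed_hosts):
    """True if the URL's host is one we chose to crawl, e.g. addons.mozilla.org but not all of mozilla.org."""
    return urlsplit(url).hostname in allowed_hosts


def crawl(seeds, fetch_links, allowed_hosts, max_depth=2):
    """Breadth-first crawl that stops max_depth links away from the seed pages.

    fetch_links(url) is a hypothetical hook returning the hrefs found on a page.
    Returns (pages visited in order, set of absolute .xpi URLs seen).
    """
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    visited, xpis = [], set()
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        for href in fetch_links(url):
            link = urljoin(url, href)
            if urlsplit(link).path.lower().endswith(".xpi"):
                # Record add-on files wherever they are hosted (e.g. an author's own site).
                xpis.add(link)
            elif depth < max_depth and link not in seen and in_scope(link, allowed_hosts):
                # Only follow pages on allowed hosts, and only up to max_depth.
                seen.add(link)
                queue.append((link, depth + 1))
    return visited, xpis
```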
+== GUID Collisions ==
+* Same extension, different version
+* Same extension, same version, different website (hash comparisons?)
+* Different extension (possibly malicious, possibly a coincidence)
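One way to tell these cases apart, following the "hash comparisons?" idea, is to group downloads by GUID and compare version strings and content hashes. The sketch assumes each record is a (guid, version, url, xpi_bytes) tuple and uses SHA-1 as an arbitrary choice of digest. A hash mismatch only flags a case for review; on its own it cannot distinguish a malicious copy from a coincidental GUID reuse or a repackaged build.

```python
import hashlib
from collections import defaultdict


def classify_collisions(records):
    """Group records by GUID and label how each pair of sightings relates.

    records: iterable of (guid, version, url, xpi_bytes).
    Returns {guid: set of labels} for GUIDs seen more than once.
    """
    by_guid = defaultdict(list)
    for guid, version, url, data in records:
        by_guid[guid].append((version, url, hashlib.sha1(data).hexdigest()))
    result = {}
    for guid, entries in by_guid.items():
        labels = set()
        # Compare every pair of sightings that share this GUID.
        for i, (v1, u1, h1) in enumerate(entries):
            for v2, u2, h2 in entries[i + 1:]:
                if v1 != v2:
                    labels.add("different version")
                elif h1 == h2:
                    # Identical file: the same site twice, or a mirror elsewhere.
                    labels.add("mirror" if u1 != u2 else "duplicate")
                else:
                    # Same GUID and version but different bytes: needs a closer look.
                    labels.add("same version, different contents")
        if labels:
            result[guid] = labels
    return result
```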
+== What to Track ==
+* Add-on URL (where did we find it?)
+* Filename
+* Supported applications and versions
+* Locales it supports
+* Context (the entire surrounding paragraph)
+* Ratings? (site-specific)
+* Categories (how?)
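If the results go into a database, the fields above map naturally onto one table. The SQLite schema below is a hypothetical layout, not a decided one: it stores list-valued fields as JSON text and keys rows on (guid, version, url) rather than on the GUID alone, because the same GUID can appear at several sites and in several versions.

```python
import json
import sqlite3

# Hypothetical layout: one row per (GUID, version, URL) sighting of an add-on.
SCHEMA = """
CREATE TABLE IF NOT EXISTS addon (
    guid     TEXT NOT NULL,
    version  TEXT NOT NULL,
    url      TEXT NOT NULL,  -- where we found it
    filename TEXT,
    targets  TEXT,           -- JSON list of [app_id, min_version, max_version]
    locales  TEXT,           -- JSON list of locale codes
    context  TEXT,           -- surrounding paragraph from the page
    rating   REAL,           -- site-specific, NULL if the site has none
    category TEXT,
    PRIMARY KEY (guid, version, url)
);
"""


def store(conn, rec):
    """Insert or refresh one sighting; rec is a dict with at least guid, version and url."""
    conn.execute(
        "INSERT OR REPLACE INTO addon VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            rec["guid"], rec["version"], rec["url"], rec.get("filename"),
            json.dumps(rec.get("targets", [])), json.dumps(rec.get("locales", [])),
            rec.get("context"), rec.get("rating"), rec.get("category"),
        ),
    )
```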
+== Tools ==
+* Something to extract an XPI (a ZIP archive)
+** Look for chrome.manifest
+** Look for install.{rdf|js}
+** Parse those files (install.rdf is XML and chrome.manifest should be simple, but what about install.js?)
+* Something to crawl
+* Something to store the results (a database for better querying?)
+* List of websites to crawl
+* Crawler settings (e.g. how deep to crawl)
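Because an XPI is a ZIP archive, the extraction step needs only Python's zipfile module. The sketch assumes install.rdf, install.js and chrome.manifest sit at the archive root, treats chrome.manifest as whitespace-separated instructions with # comments, and pulls locale codes from its locale lines, which would also fill the locales field under What to Track. Parsing install.js, which is JavaScript, is left open, as in the notes; the function names are hypothetical.

```python
import io
import zipfile


def read_xpi(data):
    """Return the text of install.rdf / install.js / chrome.manifest found at the root of an XPI."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(data)) as xpi:
        names = set(xpi.namelist())
        for name in ("install.rdf", "install.js", "chrome.manifest"):
            if name in names:
                found[name] = xpi.read(name).decode("utf-8", errors="replace")
    return found


def parse_chrome_manifest(text):
    """Split chrome.manifest into (instruction, args) pairs, skipping blank lines and # comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        entries.append((parts[0], parts[1:]))
    return entries


def locales(entries):
    """Locale codes taken from 'locale <package> <code> <path>' lines."""
    return sorted({args[1] for kind, args in entries if kind == "locale" and len(args) >= 2})
```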
 = Technical Resources =
 * [http://www.robotstxt.org Writing a robot/crawler]
 * [http://www.silfreed.net/blog/2008/04/XUL-extension-parsing XUL Extension Parsing]
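The robotstxt.org pages linked above describe the robots.txt convention a crawler is expected to honor. Python's urllib.robotparser already implements it; the sketch below assumes the robots.txt text has been fetched separately, and the user-agent string is a hypothetical name for our crawler.

```python
from urllib.robotparser import RobotFileParser


def make_robot_checker(robots_txt, user_agent="WildOnAddonsBot"):
    """Return can_fetch(url) for one site, given that site's robots.txt content.

    user_agent is a hypothetical name for our crawler.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)
```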
+= Manual Extensions =
+Extensions that are bundled with another program's installer and therefore must be added manually:
+* http://free.grisoft.com/ww.faq.num-1241#faq_1241
+* [http://service1.symantec.com/SUPPORT/norton360.nsf/0/e1be9e4560c11b466525728900757836?OpenDocument Symantec noting their poor add-on]