Reporter: Phishing Protection Integration Discussion
Reporter is designed to collect feedback from users about the websites they visit. It has done its job rather well thus far (and should get better once the upgraded webtool is deployed). It may be wise to partner it with Safe Browsing so that it serves more than just evangelism efforts. It could then move into an electronic form of community policing.
Data Collection
The data would immediately be available (raw and unconfirmed) via the reporter webtool for anyone (including but not limited to Google) to harvest for use in their blacklist.
If we implemented this in a 1.5.0.x release, we could start data collection earlier, and have a better database by the time 2.0 ships.
Humans (structures that process data without executable code) would need to review submissions to reporter to weed out any false reports.
All suspects would be innocent until proven guilty in the court of internet scams/phishing attacks.
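The review flow above (raw reports come in unconfirmed, a human confirms or rejects them, and nothing counts as phishing until confirmed) could be modelled roughly like this. It is only a sketch: `Status`, `Report`, `review` and `blacklist` are hypothetical names for illustration, not anything in the existing reporter webtool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Set


class Status(Enum):
    UNCONFIRMED = "unconfirmed"  # raw report, nobody has reviewed it yet
    CONFIRMED = "confirmed"      # a human reviewer agreed it is phishing
    REJECTED = "rejected"        # a human reviewer marked it a false report


@dataclass
class Report:
    url: str
    # Innocent until proven guilty: every report starts out unconfirmed.
    status: Status = Status.UNCONFIRMED


def review(report: Report, is_phishing: bool) -> Report:
    """Record a human reviewer's verdict on a single report."""
    report.status = Status.CONFIRMED if is_phishing else Status.REJECTED
    return report


def blacklist(reports: Iterable[Report]) -> Set[str]:
    """Only URLs a human has confirmed make it into the blacklist."""
    return {r.url for r in reports if r.status is Status.CONFIRMED}
```

Anyone harvesting the raw data would still see every report, including the unconfirmed ones. Only a blacklist built this way would be limited to confirmed entries.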
Reporter as a Service (theoretical)
I’ve been playing with the idea that reporter could essentially operate as a service and provide end users with our own blacklist. I have essentially hacked both reporter’s webtool and the new safe browsing extension to give this concept a test. It’s purely theoretical, but it does show promise. My proposal for such a plan is as follows:
- Implement the above data collection as early as possible. We should '''strongly''' consider supporting Thunderbird with the ability to report phishing emails to further strengthen the database.
- Infrastructure. From what I can tell, this would be a rather hungry service. Feasible, but hungry. There are a lot of potential users out there.
- Safe browsing would need to allow alternate providers (see Bug 329786).
- Safe browsing would need to allow alternate branding based on selected provider.
- There’s likely more, but these are the ones that initially strike me.
- Privacy issues? Legal issues? (E.g. a reported site objecting: “we’re not scammers, we’re a legitimate business who happens to share the same name as a popular banking institution trying to collect account information for profit”.)
Google Safe Browsing Server Implementation
In no particular order, these are my questions and comments from reading the Safe Browsing: Design Documentation and looking at how to implement a server:
- Charset - Is the charset UTF-8? Should it be (for IDN domains)?
- Protocol4 - It’s uber simple and seems to do what it needs to do. Lovely.
- Grey Listing – Right now it seems the only possible options are phishing or not phishing. Is there any benefit to implementing a grey list (for those with lots of submissions, but no confirmation)?
- Threat types - Phishing, Spammers? Do we differentiate? Will we? Should we?
- Can’t view a sample report using a url such as: [1]
- In a case where no data is returned (such as above), a content.
- URL Canonicalization – just ugly, best effort seems about the best anyone can do.
- ARC4? Wikipedia affirms my initial thought that it isn’t recommended for new applications. Better alternatives? Dunno.
- Haven’t implemented ARC4 in my tests yet. Need to find a PHP implementation, or use an existing library (mcrypt? PEAR?).
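On the ARC4 point: the cipher itself is small enough to write by hand for testing. Below is a minimal sketch of plain ARC4 (key scheduling plus keystream generation). It is in Python rather than PHP purely for illustration, `arc4` is a hypothetical helper name, and it does not cover how Safe Browsing derives or exchanges its keys, which would need to come from the design documentation.

```python
def arc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with plain ARC4 (the operation is symmetric)."""
    if not key:
        raise ValueError("key must not be empty")

    # Key-scheduling algorithm: permute the state array using the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Pseudo-random generation: XOR each input byte with a keystream byte.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)
```

Because encryption and decryption are the same operation, `arc4(key, arc4(key, data))` returns the original `data`, which makes a quick sanity check when porting this to PHP.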