Input receives a non-trivial amount of hate speech and other abusive responses. Per our community guidelines and our legal notices (both are linked in the footer of Input), we should be deleting or redacting egregious examples.

Up until now, we had no good way of identifying and dealing with such content. This project will build the infrastructure we need to measure and deal with inappropriate Input content.

History
  • FIXME: Fill in pre-history
  • September 3rd, 2014: Wrote up project page
  • September 8th, 2014: Pushed prototype classifier into production to see how well it works. Pulling data now.


internship project (2014q3)

Goals:
  1. build a Python library that can be used to classify texts as spam/ham/abuse
  2. integrate it into Fjord for testing/honing and figuring out our options
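Goal 1 calls for a library that labels texts as spam, ham or abuse. The page doesn't describe how spicedham works internally, so the following is only a minimal sketch of one common technique for this kind of multi-label text classification, multinomial naive Bayes with add-one smoothing; it is not spicedham's implementation or API. The class and method names (`NaiveBayesClassifier`, `train`, `classify`) and the training examples are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical multinomial naive Bayes classifier with Laplace smoothing."""

    def __init__(self):
        self.label_counts = Counter()            # number of training documents per label
        self.word_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()                       # every token seen during training

    def train(self, text, label):
        """Add one labeled example (e.g. "spam", "ham" or "abuse")."""
        tokens = tokenize(text)
        self.label_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def classify(self, text):
        """Return the label with the highest log posterior, or None if untrained."""
        tokens = tokenize(text)
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.label_counts.items():
            # Log prior: how common this label was in the training data.
            score = math.log(doc_count / total_docs)
            label_total = sum(self.word_counts[label].values())
            for token in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    clf = NaiveBayesClassifier()
    clf.train("Firefox crashes when I open a new tab", "ham")
    clf.train("The new bookmarks menu is hard to find", "ham")
    clf.train("cheap pills buy now click here", "spam")
    clf.train("win money fast click this link", "spam")
    clf.train("you are idiots and I hate you", "abuse")
    clf.train("stupid idiots hate this", "abuse")
    print(clf.classify("click here to buy cheap pills"))  # spam
    print(clf.classify("I hate you idiots"))              # abuse
```

Naive Bayes is a reasonable baseline for a research phase because it trains quickly on small labeled sets and its per-token scores are easy to inspect; a production classifier would need far more training data than this toy example.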

Non-goals:
  1. we won't remove or otherwise change responses based on classification; this is purely a research phase


Tracker bug:

1062436 classifier flags for responses -- RESOLVED
1062439 post_save celery task for classifying responses -- RESOLVED
1062444 generate classifier training data -- RESOLVED
1062453 create analyzer view for examining classification data -- RESOLVED
1062455 add spicedham to vendor/ -- RESOLVED
1063825 implement spicedham backend for fjord -- RESOLVED
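Bugs 1062436 and 1062439 describe classifying each response after it is saved and recording flags on it. The sketch below simulates that flow in plain Python so it runs without a Django project: a hypothetical hook registry (`on_post_save`) stands in for Django's post_save signal, the hook runs inline instead of as a Celery task, and a keyword check stands in for the trained classifier. `Response`, `save`, `classify` and `classify_response` are all hypothetical names, not Fjord's code. Consistent with the non-goal above, the hook only adds flags and never edits or deletes the response.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    """Hypothetical stand-in for a Fjord feedback response."""
    id: int
    description: str
    flags: set = field(default_factory=set)


# Hypothetical hook registry standing in for Django's post_save signal.
_post_save_hooks = []


def on_post_save(hook):
    """Register a function to run after a response is saved."""
    _post_save_hooks.append(hook)
    return hook


def save(response, store):
    """Store the response, then run every registered post-save hook."""
    store[response.id] = response
    for hook in _post_save_hooks:
        hook(response)


def classify(text):
    """Placeholder classifier; a real one would be trained on labeled data."""
    return "abuse" if "idiot" in text.lower() else "ham"


@on_post_save
def classify_response(response):
    # In production this would be queued as a Celery task; here it runs inline.
    label = classify(response.description)
    if label != "ham":
        # Record a flag only: the response text is never altered or deleted.
        response.flags.add(label)
```

Keeping classification in a post-save hook means the save path stays fast once the work is moved to a background task, and storing results as flags leaves the original data intact for later analysis.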

v1 (2014q4)

Depends on the outcome of the internship project.

Future possibilities

  • "Flag as spam/abuse" buttons on the dashboard that let authenticated, authorized users flag responses as spam or abuse