Support/Intern/2011/Brinda/Summer Internship

From MozillaWiki
Jump to navigation Jump to search

Things needed

Please make sure you have the following items

  1. MySQL Workbench
  2. VPN access to SUMO database
  3. Carrot2 Workbench
  4. Xml editor- Oxygen (good to start with to get an idea on how to use it). Other examples are jedit (limited version of Oxygen), Saxon (command line processor)


Project

I designed a mobile feedback gathering process. It’s only 4 main steps and basically automated (running scripts and queries). It groups all the main questions together, provides the vote numbers into a presentable format. It summarizes all of the mobile questions on SUMO on a single page making feedback reporting very fast.

These steps are:

1. Run a defined SQL query to get the questions and export the file.

a) Connect MySQL Workbench to sumo_dump table and run the queries

Query for New Questions asked

select CONCAT(CONCAT(qq.title," ::"),count(distinct(qqv.id))) as title , CONCAT("http://support.mozilla.com/en-US/questions/",CAST(qq.id as char)) AS url, qq.content as snippet
from questions_questionvote qqv
JOIN questions_question qq ON qq.id= qqv.question_id
JOIN taggit_taggeditem tti ON qq.id= tti.object_id
JOIN taggit_tag tt ON tt.id= tti.tag_id
where qq.created > "2011-08-01"
and
(tt.name like "%mobile%")
group by qq.id
order by count(distinct(qqv.id)) desc


Popular questions with votes > 3

select CONCAT(CONCAT(qq.title," ::"),count(distinct(qqv.id))) as title , CONCAT("http://support.mozilla.com/en-US/questions/",CAST(qq.id as char)) AS url, qq.content as snippet
from questions_questionvote qqv
JOIN questions_question qq ON qq.id= qqv.question_id
JOIN taggit_taggeditem tti ON qq.id= tti.object_id
JOIN taggit_tag tt ON tt.id= tti.tag_id
where qqv.created > "2011-08-01"
and
(tt.name like "%mobile%")
group by qq.id
having count(distinct(qqv.id))>3
order by count(distinct(qqv.id)) desc


b) Export the result in XML format. You can do this by clicking on that small floppy icon right above the result area

c)Open and input XML file and FIND and REPLACE all <ROW>, </ROW> with <document>, </document> respectively. Similary change <DATA>, </DATA> to <searchresult>, </searchresult>

d) Top three lines of your XML file should look like this. Add the two extra lines to look like this

<?xml version="1.0" encoding="UTF-8"?>
<searchresult>
<query>data</query>


2. Run the file through Carrot with defined parameters.
3. Run a script I wrote to remove duplicates and clean up the output.
4. Run a script to format the data for easy presenting.

  • Because the process doesn’t work for desktop -- too much data to process for Carrot. I made a new process that finds new issues (and potentially new issues) using a combination of Carrot and SQL. It’s a little more complicated but I think it works a lot better than what Cheng is currently doing.
    • Run a defined SQL query to look at top voted questions that were asked this week. -- Right now Cheng is only looking at the Most Requested ones on the site this helps filter out a lot of the less useful questions and generic “Firefox sucks” questions.
      ** Run Carrot with defined parameters (which are different from the Mobile parameters) [CW: This gives you a list of clusters but you have to make human decisions here... you can't just take the top ones and go with it... it's not like mobile.]
      ** Carrot provides keywords that you can then use in a defined SQL script. -- This lets you get much more accurate numbers for the number of people with a problem -- not just the count off the top 20 pages but from all the questions.
      ** For investigating issues, I made another Carrot parameter set that lets you take generic keywords (like hotmail) and helps break that down into specific issues. You need to use this with the dedupe script -- it provides a lot more insight on those keywords. It can tell you (for example) that on top of the usual hotmail questions, we suddenly see more of one specific issue.
      ** You can use the formatting script here if you need to present information as well (although it doesn’t work as well as in the mobile case).

In addition, I worked on a number of useful SQL queries that aren’t part of the usual feedback gathering.

  • I have a query that gives votes on a day-wise basis for specific keywords/issues/threads to see how these problems are trending among users and if anything is flaring up. This was particularly useful after the Firefox 5 release.
    * I have a query that provides top threads for any issue which helps identify good things to link developers to.
    * I have a query that lets you break down questions by browser version and OS version to provide more information on specific issues or just in general.
    * I have a query to see how often certain articles are used in forum replies.