Support/Intern/2011/Brinda
Project overview
The goal of the project is to build a system that datamines the support forums and gathers insights about the top issues on SUMO. In particular, the project will involve investigating different methods of text analysis, choosing the top candidate, optimizing its parameters, and then using it to develop insights that improve SUMO support.
The deliverables are: a system (a series of steps, a script, or an algorithm) for analyzing the contents of support forum threads; and a report, produced with this tool, that suggests steps to improve SUMO based on what users most commonly ask about.
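As a rough illustration of the simplest version of such a system, the sketch below counts how many threads mention each term, which is one naive way to surface candidate top issues. It assumes thread titles have already been exported from the SUMO database into a list of strings; the stop-word list, the function names (`tokenize`, `top_terms`), and the sample titles are all hypothetical, and the text-analysis methods the project evaluates would replace this baseline.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real run would need a larger, tuned list.
STOP_WORDS = {"a", "an", "the", "is", "my", "i", "to", "in", "on", "of",
              "and", "not", "after", "how", "do", "does", "when", "with"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def top_terms(threads, n=5):
    """Return the n terms mentioned by the most threads, with their counts."""
    counts = Counter()
    for text in threads:
        counts.update(set(tokenize(text)))  # count each term once per thread
    return counts.most_common(n)


# Made-up thread titles, for illustration only.
threads = [
    "Firefox crashes on startup",
    "Bookmarks disappeared after update",
    "Firefox crashes when opening a PDF",
    "How do I restore my bookmarks?",
    "Flash plugin crashes",
]
for term, count in top_terms(threads):
    print(term, count)
```

Counting each term once per thread (document frequency) keeps a single long, repetitive post from dominating the ranking.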
Project timeline
- Weeks 1-2: Get settled and do a lot of forum support. Any kind of data analysis requires being very familiar with the data set and with what is available. So that the first weeks are not only about answering questions, we should also have Brinda make a list of first-impression suggestions and thoughts from both a user's and a contributor's point of view. The aim is to do support with an eye to understanding the data, but also with a critical eye to what needs improving.
- Weeks 3-5: Evaluate a couple of text-analysis and datamining tools and get familiar with using them (adjusting parameters, piping information in, reading information out). This will also involve getting familiar with our database system.
- End of week 5: Checkpoint. Decide whether the datamining approach is worth pursuing or whether the available options don't suit our needs.
- Weeks 6-9: Optimize the models and parameters to get the best matching and the fewest false positives. If the best model is one that requires training (e.g. a Bayesian classifier), this time will be spent training the model and evaluating the training data.
- Weeks 10-11: Demonstrate the value of a good datamining algorithm by using the knowledge gained from grouping issues/threads to compile a list of canned responses that would answer the most forum questions, or to make some other SUMO improvement.
- Week 12: Report on findings during a brownbag. Wind down and end the project.
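Weeks 6-9 mention a Bayesian model that has to be trained. As a sketch of what training such a model and checking its false positives could look like, here is a small multinomial naive Bayes classifier written with the Python standard library only. It assumes threads have already been hand-labelled with an issue category; the class and function names, the categories (`crash`, `bookmarks`), and the sample threads are hypothetical, and the project may well settle on a different model or an existing library.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split text into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes classifier with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; one of the parameters to tune

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        return self

    def predict(self, text):
        """Return the label with the highest log posterior for `text`."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior
            for w in tokenize(text):
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][w]
                score += math.log((count + self.alpha) /
                                  (self.total_words[label] + self.alpha * v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def false_positive_rate(model, texts, labels, target):
    """Share of held-out threads NOT labelled `target` that the model assigns to `target`."""
    negatives = [t for t, lbl in zip(texts, labels) if lbl != target]
    if not negatives:
        return 0.0
    wrong = sum(1 for t in negatives if model.predict(t) == target)
    return wrong / len(negatives)


# Made-up, hand-labelled training threads, for illustration only.
train_texts = [
    "Firefox crashes on startup",
    "Crash when opening a PDF",
    "Browser crashes after update",
    "Bookmarks are gone",
    "Restore lost bookmarks",
    "Bookmarks toolbar missing",
]
train_labels = ["crash", "crash", "crash", "bookmarks", "bookmarks", "bookmarks"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("Firefox crashes when I open a PDF"))
```

Tuning would then mean varying `alpha` (and the tokenizer) and comparing `false_positive_rate` on held-out threads that were not used for training, since measuring it on the training data would be overly optimistic.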
Alternate plan
If there is no good datamining library or system (determined either before the project starts or at the week 5 checkpoint), the backup plan is as follows:
- Weeks 6-8: Using a combination of Webtrends data, SUMO database information, and a lot of manual grouping of threads, come up with a list of concrete suggestions, such as new links to put in the AAQ form, links for the start page, and canned responses.
- Week 9: Implement one of those suggestions (not the canned responses, of course, but the AAQ or start-page links).
- Weeks 10-12: Collect data on the impact of this change, analyze it, and present the findings during a brownbag.
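One simple way to analyze the impact of a change like a new AAQ or start-page link is to compare the number of new threads on the targeted issue before and after it went live. The sketch below assumes daily thread counts for that issue can be pulled from the SUMO database; the function names and the counts are hypothetical, and the permutation test is just one reasonable choice for judging whether the difference is more than noise, not a method the plan specifies.

```python
import random
from statistics import mean


def relative_change(before, after):
    """Percent change in the mean daily thread count, from `before` to `after`."""
    return (mean(after) - mean(before)) / mean(before) * 100


def permutation_p_value(before, after, n_iter=10000, seed=0):
    """Two-sided permutation test on the difference in means.

    Shuffles the pooled counts and asks how often a random split produces a
    difference at least as large as the one observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(after) - mean(before))
    pooled = list(before) + list(after)
    k = len(before)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        if abs(mean(pooled[k:]) - mean(pooled[:k])) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)  # +1 avoids ever reporting p = 0


# Hypothetical daily counts of new threads on the targeted issue,
# one week before and one week after the change.
before = [12, 15, 11, 14, 13, 16, 12]
after = [9, 8, 10, 7, 9, 11, 8]
print(f"change: {relative_change(before, after):.1f}%")
print(f"p-value: {permutation_p_value(before, after):.4f}")
```

A single week on each side is a short window and ignores weekly cycles and external events such as traffic spikes around a Firefox release, so comparable periods on both sides of the change would give a fairer comparison.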