SocorroRoadmap2010: Difference between revisions
Latest revision as of 20:16, 17 August 2010
DRAFT
The content of this page is a work in progress intended for review.
Please help improve the draft!
Ask questions or make suggestions in the discussion
or add your suggestions directly to this page.
Dates
- Dates for the following goals can be found in Google Docs: http://spreadsheets.google.com/ccc?key=0AhiX365xacl1dDZ6eVdMc1Vpbld3elMyX2lnUzJnclE&hl=en
Related Quarterly Goals
- Metrics Q2 goals (for background):
  - Replace NFS in production
  - Have the cluster do background processing of 100% of crash reports
  - Provide a replacement for the Postgres big table
  - [stretch] Developer API, likely to slide to Q3
- Client team goal: Gather more information from crashes (bug 528657)
Milestones
1.7: HBase, part I
- Get individual crash reports into HBase
- Begin rewriting the Python middleware to support UI -> HBase (transparent to the UI at this stage)
- Support OOPP (out-of-process plugin) hang reports
- End of NFS
1.8: HBase, part II
- Daemonize processor/MDSW and run on HBase worker nodes (architecture diagram coming)
1.9: Middleware API
- Create an API to HBase/Solr to replace most PostgreSQL queries in the webapp
2.0: new UI
- Rewrite the web UI to use the new middleware
- (stretch goal, may slip to 2.1) Implement general-purpose full-text search. It should be able to search on any data associated with a crash, e.g. any part of the stack trace and/or module list, or any combination of field values
  - Existing search bugs: http://tinyurl.com/socorro-search
  - PRD needed (implicit in bugs, make explicit)
  - UX work needed here [chowse]
2.01 Cleanup
- Post 2.0, do a cleanup release to take care of housekeeping
- Perform a team survey of the unit testing landscape
  - Define unit testing needs
    - Hadoop
    - Python
    - PHP
  - Define integration testing needs
  - Define acceptance testing needs
- Define unit testing strategy
  - Assign people to champion each area of testing
- Perform a team survey of the documentation landscape
  - Define documentation needs
    - http://code.google.com/p/socorro
    - Python
    - PHP
  - Assign people to champion each documentation area
- Better app monitors / business level monitoring
- Subversion
  - Decide whether to change to a branch release system - https://bugzilla.mozilla.org/show_bug.cgi?id=481479
2.(x+1) Trend Reports, part 1: Explosive bugs
- Explosive Bugs Analysis
  - Automated detection of explosive bugs
  - First stage is bug 519423
  - PRD is needed here [chofmann/laura]
  - UX is needed here
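One possible reading of "automated detection of explosive bugs" is flagging crash signatures whose latest daily count jumps far above their recent baseline. The following is a minimal sketch under that assumption, not the approach defined in bug 519423. The function name, the input shape (a dict of per-signature daily counts), and the `factor` and `min_count` thresholds are all hypothetical and would need to come from the PRD.

```python
from statistics import mean


def find_explosive(daily_counts, factor=3.0, min_count=50):
    """Return signatures whose latest daily count spikes above their baseline.

    daily_counts maps a crash signature to a list of daily crash counts,
    oldest first. Both thresholds are placeholder assumptions.
    """
    explosive = []
    for signature, counts in daily_counts.items():
        if len(counts) < 2:
            continue  # no history to compare against
        *history, latest = counts
        # Floor the baseline at 1 so brand-new signatures can still be flagged.
        baseline = max(mean(history), 1.0)
        if latest >= min_count and latest > factor * baseline:
            explosive.append(signature)
    return sorted(explosive)
```

A signature that goes from roughly 11 crashes a day to 200 would be flagged, while one that moves from 105 to 120 would not. A production version would likely need to normalize by active users or total crash volume per build.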
2.(x+2) Trend reports, part 2: better correlations
- Other cloud-based correlation reports (https://bugzilla.mozilla.org/buglist.cgi?quicksearch=component%3Asocorro+whiteboard%3Acloud):
  - Between one report and other related reports: what are the logical correlations? (PRD needed)
  - Correlation between any single piece of data and another (e.g. plugins, time, etc.)
    - Replace current correlations HACK with cloud version (bug 554373)
2.(x+3)
- Draft goal: smarter analysis
Process
These improvements will be made over the course of Q2 and will likely continue into Q3.
Better release process
- See DevProcess
Testing and QA
- Better code review practices: commits to mailing list
- Add QA to release cycle
- See the Test Plan for UI testing: https://wiki.mozilla.org/QA/Execution/Web_Testing/Socorro/Test_Plan
- More unit tests, more integration tests
- Validate data sources against each other (e.g. bug 552539, bug 553144) - also look back at similar fixed bugs for test cases
- Run tests automatically on checkin (Hudson?)
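The "validate data sources against each other" item could be approached by comparing the same aggregate (for example, crash counts per day) as reported by two stores. The sketch below assumes each source has already been reduced to a dict of key to count. The function name, the input shape, and the 1% tolerance are hypothetical and are not taken from bug 552539 or bug 553144.

```python
def compare_sources(a, b, tolerance=0.01):
    """Return keys whose counts differ between two sources by more than tolerance.

    a and b map a key (e.g. a date string) to a count. The difference is
    measured relative to the larger of the two counts; a key missing from
    one source is treated as a count of 0 there.
    """
    mismatches = {}
    for key in sorted(set(a) | set(b)):
        x, y = a.get(key, 0), b.get(key, 0)
        largest = max(x, y)
        if largest and abs(x - y) / largest > tolerance:
            mismatches[key] = (x, y)
    return mismatches
```

Run as part of an automated test suite, a non-empty result would fail the build and list the keys to investigate.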
Monitoring
- Write scripts for app-level monitoring for IT to hook up to Nagios
- Implement "business logic" monitors: check things like hourly volume via webapp, db, etc.
- Expand application health dashboard: http://crash-stats.mozilla.com/status
- Some existing bugs on this. What granularity? What is "normal"?
- [deinspanjer] HBase monitoring to be expanded
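A "business logic" monitor for hourly volume could follow the standard Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL), and 3 (UNKNOWN). The sketch below covers only the threshold decision. The function name and thresholds are hypothetical, and how the hourly count is fetched (webapp, db, HBase) is left open because the roadmap has not yet defined what "normal" is.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_hourly_volume(count, warn_below, crit_below):
    """Map an hourly crash-report count to a Nagios status code and message.

    count is the number of reports seen in the last hour, or None if it
    could not be retrieved. Thresholds are placeholder assumptions.
    """
    if count is None or count < 0:
        return UNKNOWN, "UNKNOWN - hourly volume unavailable"
    if count < crit_below:
        return CRITICAL, f"CRITICAL - {count} reports in the last hour"
    if count < warn_below:
        return WARNING, f"WARNING - {count} reports in the last hour"
    return OK, f"OK - {count} reports in the last hour"
```

A wrapper script would print the message and call `sys.exit(code)` so that Nagios can pick up the status.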
Staging
- Staging closer to production/more realistic
- Perf/load test before deployment
- Better access to staging for testing
  - Best:
    - database write access
    - ability to run scripts
  - Acceptable:
    - log viewing
    - database browsing
    - view config files
    - view automated test output (Hudson?)
  - Install/write some admin tools to accomplish this (may also be useful in production)