SocorroRoadmap2010

Latest revision as of 20:16, 17 August 2010

DRAFT
The content of this page is a work in progress intended for review.



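Dates

  • Dates for the following goals can be found in google docs: http://spreadsheets.google.com/ccc?key=0AhiX365xacl1dDZ6eVdMc1Vpbld3elMyX2lnUzJnclE&hl=en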

Related Quarterly Goals

  • Metrics Q2 goals (for background):
    • Replace NFS in production
    • Have cluster doing background processing of 100% of crash reports
    • Provide replacement for Postgres big table
    • [stretch] Developer API, likely to slide to Q3
  • Client team goal: Gather more information from crashes bug 528657

Milestones

1.7: HBase, part I

  • Get individual crash reports into HBase
  • Begin rewriting the Python middleware to support UI -> HBase (transparent to the UI at this stage)
  • Support OOPP hang reports
  • End of NFS

1.8: HBase, part II

  • Daemonize processor/MDSW and run on HBase worker nodes (architecture diagram coming)

1.9: Middleware API

  • Create an API to HBase/Solr to replace most PostgreSQL queries in the webapp

2.0: new UI

  • Rewrite the web UI to use the new middleware
  • (stretch goal, may slip to 2.1) Implement a general-purpose full-text search. It should be able to search on any data associated with a crash, e.g. any part of the stack trace and/or module list, or any combination of field values
    • Existing search bugs: http://tinyurl.com/socorro-search
    • PRD needed (implicit in bugs, make explicit)
    • UX work needed here [chowse]

2.01: Cleanup

  • After 2.0, do a cleanup release to take care of housekeeping
  • Perform a team survey of the unit testing landscape
    • Define unit testing needs
      • Hadoop
      • Python
      • PHP
    • Define integration testing needs
    • Define acceptance testing needs
  • Define unit testing strategy
    • Assign people to champion each area of testing
  • Perform a team survey of the documentation landscape
    • Define documentation needs
      • http://code.google.com/p/socorro
      • Python
      • PHP
    • Assign people to champion each documentation area
  • Better app monitors / business level monitoring
  • Subversion
    • Decide whether to change to branch release system - https://bugzilla.mozilla.org/show_bug.cgi?id=481479

2.(x+1) Trend reports, part 1: Explosive bugs

  • Explosive Bugs Analysis
    • Automated detection of explosive bugs
    • First stage is bug 519423
    • PRD is needed here [chofmann/laura]
    • UX is needed here
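
As a starting point for the PRD discussion, automated detection could be sketched as a simple spike test on per-signature daily counts. This is only an illustration: the function name, the z-score rule, and the default thresholds are assumptions, not a method defined on this page or in bug 519423.

```python
from statistics import mean, stdev


def is_explosive(daily_counts, threshold=3.0, min_history=7):
    """Return True if the latest daily count spikes above its own history.

    daily_counts: crash counts for one signature, oldest first; the last
    entry is the day under test. All names and defaults are hypothetical.
    """
    history, latest = daily_counts[:-1], daily_counts[-1]
    if len(history) < min_history:
        return False  # not enough data to say what "normal" looks like
    baseline = mean(history)
    # Floor the spread at 1.0 so a perfectly flat history does not divide by zero.
    spread = max(stdev(history), 1.0)
    return (latest - baseline) / spread > threshold
```

A call such as `is_explosive([10, 12, 9, 11, 10, 13, 11, 95])` flags the final day, while a final count of 12 on the same history does not. A real detector would probably also need to account for release dates and overall crash volume.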

2.(x+2) Trend reports, part 2: better correlations

  • Other cloud-based correlation reports:
    • Between one report and other related reports: what are the logical correlations? (PRD needed)
    • Correlation between any single piece of data and another (e.g. plugins, time, etc.)
      • Replace current correlations HACK with cloud version bug 554373

2.(x+3)

  • Draft goal: smarter analysis


Process

These improvements will be made over the course of Q2 and will likely continue into Q3.

Better release process

Testing and QA

  • Better code review practices: send commits to a mailing list
  • Add QA to release cycle
  • See Test Plan for UI testing
  • More unit tests, more integration tests
  • Validate data sources against each other (e.g. bug 552539, bug 553144) - also look back at similar fixed bugs for test cases
  • Run tests automatically on check-in (Hudson?)
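
Validating data sources against each other could be sketched as a comparison of per-key counts (for example per signature or per day) drawn from two stores. The function name, the dictionary inputs, and the tolerance parameter are assumptions made for illustration; they do not describe the checks in bug 552539 or bug 553144.

```python
def compare_counts(source_a, source_b, tolerance=0.0):
    """Return {key: (count_a, count_b)} for keys whose counts disagree.

    source_a, source_b: dicts mapping a key (e.g. a crash signature) to a
    count, as might be pulled from two different stores. tolerance is the
    allowed relative difference, as a fraction of the larger count.
    """
    mismatches = {}
    for key in sorted(set(source_a) | set(source_b)):
        a = source_a.get(key, 0)  # a key missing from one source counts as 0
        b = source_b.get(key, 0)
        if abs(a - b) > tolerance * max(a, b):
            mismatches[key] = (a, b)
    return mismatches
```

Running this from an automated test would turn a silent disagreement between, say, the database and the webapp into a visible failure.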

Monitoring

  • Write scripts for app-level monitoring for IT to hook up to Nagios
  • Implement "business logic" monitors: check things like hourly volume via webapp, database, etc.
  • Expand application health dashboard (http://crash-stats.mozilla.com/status)
    • Some existing bugs on this. What granularity? What is "normal"?
  • [deinspanjer] Expand HBase monitoring
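
A "business logic" monitor for hourly volume could follow the standard Nagios plugin convention, where the exit code carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line of output is the status text. The sketch below shows only the decision logic. The function name and the thresholds are hypothetical, since what counts as "normal" volume is still an open question above.

```python
# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_hourly_volume(count, warn_below, crit_below):
    """Map an hourly crash-report count to an (exit_code, message) pair.

    count: reports received in the last hour, or None if the value could
    not be fetched. warn_below and crit_below are hypothetical thresholds.
    """
    if count is None:
        return UNKNOWN, "UNKNOWN - hourly volume unavailable"
    if count < crit_below:
        return CRITICAL, "CRITICAL - %d reports in the last hour" % count
    if count < warn_below:
        return WARNING, "WARNING - %d reports in the last hour" % count
    return OK, "OK - %d reports in the last hour" % count
```

A wrapper script would fetch the count from the webapp or database, print the message, and exit with the returned code so that IT can register it as an ordinary Nagios check.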

Staging

  • Make staging closer to production and more realistic
  • Run performance/load tests before deployment
  • Better access to staging for testing
    • Best:
      • database write access
      • ability to run scripts
    • Acceptable:
      • log viewing
      • database browsing
      • view config files
      • view automated test output (Hudson?)
    • Install/write some admin tools to accomplish this (may also be useful in production)