SocorroRoadmap2010

From MozillaWiki

DRAFT
The content of this page is a work in progress intended for review.

Please help improve the draft!

Ask questions or make suggestions in the discussion
or add your suggestions directly to this page.


Dates

Related Quarterly Goals

  • Metrics Q2 goals (for background):
    • Replace NFS in production
    • Have cluster doing background processing of 100% of crash reports
    • Provide replacement for Postgres big table
    • [stretch] Developer API, likely to slide to Q3
  • Client team goal: Gather more information from crashes (bug 528657)

Milestones

1.7: HBase, part I

  • Get individual crash reports into HBase
  • Begin rewriting the Python middleware to support UI -> HBase (transparent to the UI at this stage)
  • Support OOPP (out-of-process plugin) hang reports
  • End of NFS

1.8: HBase, part II

  • Daemonize processor/MDSW and run on HBase worker nodes (architecture diagram coming)

1.9: Middleware API

  • Create an API to HBase/Solr to replace most PostgreSQL queries in the webapp

2.0: new UI

  • Rewrite the web UI to use the new middleware
  • (Stretch goal, may slip to 2.1) Implement general-purpose full-text search. It should be possible to search on any data associated with a crash, e.g. any part of the stack trace or module list, or any combination of field values

2.01: Cleanup

  • Post 2.0, let's do a cleanup release to take care of a bunch of housekeeping
  • Perform a team survey of the unit testing landscape
    • Define unit testing needs
      • Hadoop
      • Python
      • PHP
    • Define integration testing needs
    • Define acceptance testing needs
  • Define unit testing strategy
    • Assign people to champion each area of testing
  • Perform a team survey of the documentation landscape
  • Better app monitors / business level monitoring
  • Subversion

2.(x+1) Trend Reports, part 1: Explosive bugs

  • Explosive Bugs Analysis
    • Automated detection of explosive bugs
    • First stage is bug 519423
    • PRD is needed here [chofmann/laura]
    • UX is needed here
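
Since the PRD for explosive bug detection is still open, the following is only a minimal sketch of one possible approach, not the planned design: flag a crash signature when today's count is far above its recent daily baseline. The function name `is_explosive`, its parameters, and the thresholds are all hypothetical.

```python
from statistics import mean, pstdev


def is_explosive(history, today, min_count=50, z_threshold=3.0):
    """Return True if today's crash count is a sharp spike over the baseline.

    history      -- daily crash counts for one signature over recent days
    today        -- today's crash count for the same signature
    min_count    -- ignore signatures with too few crashes (assumed cutoff)
    z_threshold  -- standard deviations above the mean that count as a spike
    """
    if today < min_count or not history:
        return False
    baseline = mean(history)
    # Floor the spread at 1.0 so a perfectly flat history does not
    # cause a division by zero or flag trivial increases.
    spread = max(pstdev(history), 1.0)
    return (today - baseline) / spread >= z_threshold
```

For example, `is_explosive([10, 12, 9, 11, 10], 80)` flags the signature, while a busy but stable signature such as `is_explosive([100, 110, 90, 105], 108)` does not.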

2.(x+2) Trend reports, part 2: better correlations

  • Other cloud-based correlation reports:
    • Between one report and other related reports: what are the logical correlations? (PRD needed)
    • Correlation between any single piece of data and another (e.g. plugins, time, etc.)
      • Replace current correlations HACK with cloud version (bug 554373)

2.(x+3)

  • Draft goal: smarter analysis


Process

These improvements will be made over the course of Q2 (and will likely continue into Q3).

Better release process

Testing and QA

  • Better code review practices: commits to mailing list
  • Add QA to release cycle
  • See Test Plan for UI testing
  • More unit tests, more integration tests
  • Validate data sources against each other (e.g. bug 552539, bug 553144); also look back at similar fixed bugs for test cases
  • Run tests automatically on checkin (Hudson?)
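
As an illustration of validating data sources against each other, here is a minimal sketch that compares per-key counts (for example, crashes per signature) from two sources and reports the keys that disagree. The function name `compare_counts` and the dict-based inputs are assumptions made for the example; the real sources would need their own query code.

```python
def compare_counts(source_a, source_b, tolerance=0):
    """Return {key: (count_a, count_b)} for keys whose counts disagree.

    source_a, source_b -- dicts mapping a key (e.g. a crash signature)
                          to a count; a missing key is treated as 0
    tolerance          -- largest absolute difference that is still accepted
    """
    mismatches = {}
    for key in sorted(set(source_a) | set(source_b)):
        a = source_a.get(key, 0)
        b = source_b.get(key, 0)
        if abs(a - b) > tolerance:
            mismatches[key] = (a, b)
    return mismatches
```

A check like this could run as one of the automated tests, failing whenever the returned dict is non-empty.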

Monitoring

  • Write scripts for app-level monitoring for IT to hook up to Nagios
  • Implement "business logic" monitors: check things like hourly volume via webapp, db, etc.
  • Expand application health [dashboard]
    • Some existing bugs on this. What granularity? What is "normal"?
  • [deinspanjer] Expand HBase monitoring
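
A "business logic" monitor for hourly volume could follow the standard Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). The sketch below only shows the decision logic; the function name `check_hourly_volume` and the thresholds are hypothetical, and how the hourly count is obtained (webapp, db, etc.) is left open.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_hourly_volume(count, warn_below, crit_below):
    """Map an hourly crash count to a (exit_code, message) pair.

    count      -- crashes received in the last hour, or None if unavailable
    warn_below -- counts under this value produce a WARNING (assumed threshold)
    crit_below -- counts under this value produce a CRITICAL (assumed threshold)
    """
    if count is None:
        return UNKNOWN, "UNKNOWN - hourly crash volume unavailable"
    if count < crit_below:
        return CRITICAL, "CRITICAL - %d crashes in the last hour (< %d)" % (count, crit_below)
    if count < warn_below:
        return WARNING, "WARNING - %d crashes in the last hour (< %d)" % (count, warn_below)
    return OK, "OK - %d crashes in the last hour" % count
```

A wrapper script would print the message on one line and exit with the returned code so that Nagios can pick up the status.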

Staging

  • Make staging closer to production (more realistic)
  • Perf/load test before deployment
  • Better access to staging for testing
    • Best:
      • database write access
      • ability to run scripts
    • Acceptable:
      • log viewing
      • database browsing
      • view config files
      • view automated test output (Hudson?)
    • Install/write some admin tools to accomplish this (may also be useful in production)