Socorro/Pre-PHX Smoketest Schedule


Revision as of 20:04, 12 January 2011

bug 619817

  • What we are going to test, and how, in terms of load
    • what:
      • at what point do collectors fall over?
        • start with 40 test nodes at 1k crashes each, versus one socorro collector
          • assuming/hoping this will overwhelm one collector
          • back down nodes/crashes until we find a stable place
          • check ganglia to see where our bottlenecks are
      • crashes are collected without error
      • all submitted crashes are collected and processed
        • check apache logs for collector (syslog not reliable)
        • check processor and collector logs for errors
        • confirm that all crashes are stored in hbase
    • how:
      • grinder (bug 619815) + 20 VMs (bug 619814) [struck out in the wiki source]
      • Lars added stats and iteration to submitter.py for initial smoke-test bug 622311
      • 40 seamicro nodes standing by to test, using socorro-loadtest.sh (https://bug619814.bugzilla.mozilla.org/attachment.cgi?id=503222)
      • pool of 240k crashes, taken over 10 days from MPT prod (Jan 1st through 10th)
    • when:
      • waiting on deps in tracking bug 619811
      • tentative start date - Wednesday Jan 12 2011
        • minimum 2-3 days testing; as much as we can get
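
The back-down procedure in the load plan above (start at 40 nodes with 1k crashes each, then reduce until the collector is stable) could be sketched as a simple loop. This is only an illustration: `find_stable_load` and `run_is_stable` are hypothetical names, `run_is_stable` stands in for actually launching the test nodes and checking the collector logs and ganglia, and halving the node count is an assumption, since the plan does not say how large each back-down step should be.

```python
from typing import Callable, Optional, Tuple


def find_stable_load(
    run_is_stable: Callable[[int, int], bool],
    start_nodes: int = 40,
    crashes_per_node: int = 1000,
) -> Optional[Tuple[int, int]]:
    """Back down from the starting load until one run is stable.

    run_is_stable(nodes, crashes_per_node) is a hypothetical callback that
    performs one test run and reports whether the collector survived it.
    Returns the first stable (nodes, crashes_per_node) pair, or None if
    even a single node overwhelms the collector.
    """
    nodes = start_nodes
    while nodes >= 1:
        if run_is_stable(nodes, crashes_per_node):
            return nodes, crashes_per_node
        nodes //= 2  # assumed step size: halve the node count each time
    return None


if __name__ == "__main__":
    # Stand-in for a real run: pretend one collector copes with
    # at most 12,000 crashes per run (an invented threshold).
    def fake_run(nodes: int, crashes: int) -> bool:
        return nodes * crashes <= 12_000

    print(find_stable_load(fake_run))
```

In a real run the callback would wrap socorro-loadtest.sh and the log checks; the invented threshold in the demo is there only so the loop has something to converge on.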
  • what component failure tests we will run
    • disable individual components to test failure and recovery
      • hbase
      • postgresql
      • monitor
      • processor
    • postgresql failover test
      • failover master01->master02
      • will require manual failover of all components
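
After a load run, and possibly after each failure/recovery test as well, the check that every submitted crash was collected, processed, and stored in hbase comes down to comparing sets of crash IDs. The following is a minimal sketch, assuming the IDs have already been extracted from the submitter output, the collector's apache logs, the processor logs, and hbase. How to extract them is not covered by this plan, and the function name, stage names, and sample IDs are all hypothetical.

```python
from typing import Dict, Iterable, Set


def find_missing(
    submitted: Iterable[str],
    stages: Dict[str, Iterable[str]],
) -> Dict[str, Set[str]]:
    """For each stage, return the submitted crash IDs that never showed up there."""
    submitted_ids = set(submitted)
    return {name: submitted_ids - set(ids) for name, ids in stages.items()}


if __name__ == "__main__":
    # Invented sample data: three submitted crashes, one lost after collection.
    submitted = ["crash-a", "crash-b", "crash-c"]
    stages = {
        "collector": ["crash-a", "crash-b", "crash-c"],
        "processor": ["crash-a", "crash-c"],
        "hbase": ["crash-a", "crash-c"],
    }
    for stage, missing in find_missing(submitted, stages).items():
        print(stage, sorted(missing))
```

An empty set for every stage means nothing was lost. The first stage with a non-empty set shows where in the pipeline to start looking.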