Auto-tools/Goals/2012Q2

From MozillaWiki

Official Q2 Goals

  • [DONE] Extend Mobile Platform Automation for B2G and Fennec Native to extend our automation systems to work with specific phone hardware and new development boards for both products.
  • [DONE] Deploy Datazilla new graph server UI into production to make it easier and simpler to track all our performance numbers across our growing sets of performance and endurance data we are collecting on our products.
    • NOTE: We revised this deliverable in light of developer feedback, changing course mid-quarter to deliver a system that turns TBPL orange/green based on per-push regression detection. This means we won't deliver a UI by the end of the quarter, but the necessary block and tackle for the UI will land, and we should have a system on production hardware in early Q3 (it's just waiting on hardware delivery).
  • [DONE] Signal From Noise Phase II - Make the same noise-reduction changes we made on Tp5 on all the other page-load tests and ensure all performance tests are sending raw observations to Datazilla.
  • [DONE] Refactor how rapid-release tracking flags are implemented for improved performance and maintainability
    • NOTE: Code complete, awaiting deployment with IT
  • [DONE] Upgrade bugzilla.mozilla.org to Bugzilla version 4.2
    • NOTE: We are code complete, should be deploying with IT in early weeks of Q3
  • [DONE] Reduce android test automation instability and make it easier for the web QA and desktop QA teams to write and run automated tests.
    • NOTE: If you look at the metrics, it looks like we attained this goal and then lost it. What actually happened is that we attained this, driving android automation failures down to less than 10%, but then we enabled 250,000 more tests and our instability percentage went back up. The big shift here is that for the first time in Fennec's history, we went from running "only what will pass" on mobile to running "everything except the known failures" and we have developers working on the failures.

Official Q2 Goals

These projects must be completed to achieve the above goals.

GOAL: Extend Mobile Platform Automation for B2G and Fennec Native

  • Q2 Outcomes
    • [SKIPPED] Create build-and-flash automation for b2g builds on a specific hardware platform (likely nexus s)
      • Deferred in light of the hardware arriving very late in the quarter and this no longer being a priority
    • [DONE] Run Mochitest and reftest on b2g hardware machines
      • reftest at risk, mochi ok
    • [DONE] Write tutorial on how to write and run marionette tests
    • [DONE] Write 3 simple performance tests based on Marionette for b2g
    • [SKIPPED] Add Fennec Native support for marionette (stretch)
      • deferred
    • [SKIPPED] Create power-profile tests (For fennec native and b2g)
      • deferred in favor of shifting to running all tests except known failures
    • [MISSED] Allow Panda ES Boards to be used as automation platforms (for fennec native)
    • [DONE] Increase stability of Noah's Ark (automation on mobile phones for fennec native)
    • [SKIPPED] Add power test analysis to Noah's Ark
      • deferred (see above - power tests)
    • [DONE] Architect a system for B2G crash reporting
      • TODO: pretty sure this got done, need to follow up with Ted
  • Stakeholders
    • B2G Team
    • Fennec Native Developers
  • Depends On
    • IT for obtaining and deploying phones into haxxor
    • B2G team to define hardware automation platform
    • Datazilla (to capture data from Noah's Ark/power tests)
    • "Taming Panda" project
    • Noah's Ark

GOAL: Deploy Datazilla new graph server UI into production

  • Q2 Outcomes
    • [DONE] Provide generic interfaces/web app plugability for new harnesses to reuse the same infrastructure backend
    • [SKIPPED] Provide Compare-Talos tools to drill into talos regressions (from OS/Changeset to individual page contributions to overall Talos metric)
    • [SKIPPED] Ensure that new UI is based on extensible statistics package that can be used both by developers and the graphserver UI.
      • The skipped goals here were deferred in favor of re-architecting toward a system that could turn TBPL orange/green using per-push regression detection. The elements of compare talos that are still needed even with such a tool, and the overall UI, will be done after the per-push detection backend and UI land.
  • Stakeholders
    • Every automation project producing performance related data
    • Every developer at Mozilla - particularly Firefox/Platform developers consuming Talos data
  • Depends On
    • IT for deployment of VMs/pushing to production via puppet
    • Infrasec for sec review
    • Firefox/Platform dev focus group for early UI review and feedback
    • Signal From Noise Phase II
    • Support from Metrics to ensure visualizations in UI are accurate
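
The per-push regression detection mentioned above can be illustrated with a minimal sketch. This is not the Datazilla implementation: it assumes a simple z-score comparison of one push against a window of recent baseline observations, and the function name, the sample values, and the threshold of 3 standard deviations are all hypothetical.

```python
from statistics import mean, stdev


def classify_push(baseline, push_values, threshold=3.0):
    """Return "orange" if the push looks like a regression, else "green".

    baseline: raw observations from recent pushes (hypothetical input).
    push_values: raw observations from the push under test.
    threshold: number of baseline standard deviations treated as a
               regression (an assumed value, not taken from Datazilla).
    """
    if len(baseline) < 2 or not push_values:
        raise ValueError("need at least two baseline values and one push value")
    base_mean = mean(baseline)
    base_sd = stdev(baseline)
    push_mean = mean(push_values)
    # Higher page-load times are worse, so only increases are flagged.
    if base_sd == 0:
        return "orange" if push_mean > base_mean else "green"
    z_score = (push_mean - base_mean) / base_sd
    return "orange" if z_score > threshold else "green"


# Made-up page-load times in milliseconds, for illustration only.
baseline = [100.0, 102.0, 98.0, 101.0, 99.0]
print(classify_push(baseline, [100.5, 99.5]))
print(classify_push(baseline, [120.0, 118.0]))
```

A production system would likely use a proper statistical test over the raw replicates rather than a fixed z-score cutoff; the sketch only shows the shape of a per-push green/orange decision.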

GOAL: Signal From Noise Phase II

  • Q2 Outcomes
    • [DONE] Perform experiments to extend row-major methods to other page-load style tests
    • [DONE] Implement changes to pave the way for mozharness on mobile and desktop
    • [MISSED] Create tools to monitor noise in talos numbers so that we know when talos numbers become unacceptably noisy again
    • [DONE] Ensure that all raw data values from all talos tests flow into Datazilla database backend
      • NOTE: We worked with Metrics on the "noisiness" tool, but it will only work with full raw data in Datazilla. We therefore dropped this support in favor of getting the per-push calculations correct, which will take this idea of increasing noisiness into account.
  • Stakeholders
    • Datazilla Project
    • Firefox/Platform developers that depend on talos data
  • Depends On
    • Support from Metrics to analyze the data of the experiments to extend row-major run order
    • Support from Releng to deploy changes to talos automation infrastructure
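
As an illustration of what monitoring noise in talos numbers could look like, here is a minimal sketch that flags a set of raw observations as too noisy when its coefficient of variation exceeds a limit. This is an assumption about one possible approach, not the tool discussed with Metrics; the function names, the sample values, and the 5% limit are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of the raw observations."""
    average = mean(values)
    if average == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(values) / average


def is_too_noisy(values, limit=0.05):
    """True when relative spread exceeds the (assumed) acceptable limit."""
    return coefficient_of_variation(values) > limit


# Made-up replicate timings in milliseconds, for illustration only.
quiet = [200.0, 202.0, 198.0, 201.0, 199.0]
noisy = [200.0, 260.0, 150.0, 230.0, 170.0]
print(is_too_noisy(quiet))
print(is_too_noisy(noisy))
```

This kind of check needs the raw per-replicate values rather than a single aggregated number, which is consistent with the note above that the tool depends on full raw data being available.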

GOAL: Enhance Bugzilla Performance

  • Q2 Outcomes
    • [DONE] Refactor how rapid-release tracking flags are implemented for improved performance and maintainability
    • [DONE] Upgrade bugzilla.mozilla.org to Bugzilla version 4.2
    • [DONE] Pulse/Push API Completion -- stretch goal
      • These are code complete as of the end of Q2. Waiting on IT resources to do one by one deployments (it's safer).
  • Stakeholders
    • Entire Mozilla project
  • Depends On
    • IT for software upgrades/testing on bugzilla system
    • IT for move to SCL3 colo
    • IT for 4.2 test site deployment

GOAL: Reduce test instability and make it easier to write automated tests

  • Q2 Outcomes
    • [DONE] Fix top 3 Android infrastructure related oranges.
      • gone from 33% failure rate to 10% failure rate
    • [AT RISK] Create an on-demand VM system for selenium grid (increases stability of web QA automation)
    • [DONE] Complete refactor of Mozmill API/Automation wrappers (increase stability/ease of use of Desktop QA automation)
  • Stakeholders
    • Fennec Native Developers
    • Web QA Team
    • Desktop QA Team
  • Depends On
    • Releng aid to deploy changes to Android toolchains
    • IT to provision on-demand VMs (maybe - we may be able to create VMs ourselves, since we own these ESXi hosts)

P1 Projects

These are projects that we desperately need to finish in Q2 because they also open doors for us (like the high level goals) but they are smaller in scope so they did not attain "official goals" status. They are listed in order of highest priority to least priority.

Stone Ridge

  • Aid the Necko team in deploying their network test system and running on-change builds through it, allowing it to report to either a templeton dashboard or the Datazilla database (depending on whether Datazilla's generic interface is online in time)
  • Q2 Outcomes
    • [DONE] Wrap Necko Tests in mozbase script so that they can be easily automated - may not be needed if necko tests are run via xpcshell or some variant of existing harness
    • [SKIPPED] Create Pulse listener to download builds for testing
    • [DONE] Create JSON upload to templeton/datazilla
    • [DONE] Create dashboard for analysis of results (either as standalone templeton or as Datazilla plugin view)
      • NOTE: Not sure if the pulse listener was ever needed. Never heard any mention of it.
  • Stakeholders
    • Necko Team
  • Depends On
    • Datazilla

Pulse Enhancements

  • Improve stability, performance, and security of Pulse system.
  • Q2 Outcomes
    • [DONE] Move to new Pulse hardware in PHX
    • [DONE] Improve durable Queue system
      • NOTE: Added monitoring to handle durable queues but the true fix is in the new library which is not done
    • [SKIPPED] Create new library for Mozilla pulse not dependent on carrot
      • NOTE: Deferred in favor of B2G work
  • Stakeholders
    • All Pulse based Automation systems
  • Depends On
    • IT Support for new hardware move

Taming Panda Boards

  • Ensure that panda boards are a stable, viable automation solution for Fennec Native
  • Q2 Outcomes
    • [MISSED] Run mochitest from end to end
    • [MISSED] Resolve MAC address issue
    • [MISSED] Resolve reboot issues
      • NOTE: We could not resolve the panda MAC address issues or the reboot issues this quarter. We did get mochitests running on pandas, but we got derailed by constantly looking for kernel solutions to our other problems with the pandas (hands-off flashing, stable MAC addresses, etc.). Going forward, we are going to re-tailor the goals and expectations of what is possible on panda dev boards so we can deploy these in Q3.
  • Stakeholders
    • Fennec Native Developers
    • Releng
    • IT

NSS Automation

  • Aid the NSS team so that their tests can be automated in our existing automation systems
  • Q2 Outcomes
    • [SKIPPED] Define and provide tools to ensure an acceptable workflow as we transition to more modern SCM system
    • [SKIPPED] Work with releng and NSS developers to tailor short-running tests for inclusion in buildbot automation
    • [SKIPPED] Create pulse-based system for occasional execution of long-running NSS tests (outside of buildbot automation)
    • [SKIPPED] Deploy NSS short-running tests to full automation (stretch)
      • If we can define the system to run the short-running tests in buildbot that is enough. It is pushing the envelope to code these tests as well as deploy them into buildbot in one quarter. Nonetheless, that is our stretch goal.
      • NOTE: The stakeholders seemed to lose steam around these goals through the quarter and we got pulled into higher priority tasks with mobile and B2G. This is still something we'd like to work on going forward.
  • Stakeholders
    • NSS Team
    • Security Team
  • Depends On
    • Releng availability for deployment of tests into buildbot automation
    • Defining workflows to support

Noah's Ark

  • Provide a stable automation system for "on-phone" automation across hardware types for Fennec Native
  • Q2 Outcomes
    • [DONE] Identify instabilities and fix them
    • [DONE] Achieve 90% uptime
  • Stakeholders
    • B2G automation
    • Fennec Native Developers

Mozharness Support

  • Provide the building blocks required for Mozharness support on android and desktop environments to simplify deployments and reduce the occurrence of infrastructure oranges.
  • Q2 Outcomes
    • [MISSED] Create a device checkout system for foopy-less management of systems
      • NOTE: It was a nice to have, deferred in favor of work on android reliability
    • [DONE] Aid with cultivating best of breed SUT management tools (in coordination with releng)
    • [DONE] Fix dependencies so that Mozbase tools (like talos) can be easily integrated with Mozharness
    • [DONE] Deploy simple pypi server inside our infrastructure so that slaves can easily perform dependency management at runtime (versus at slave-image time via puppet)
      • NOTE: The remaining work to deploy the pypi server is all IT side now.
  • Stakeholders
    • Releng - we will work closely with releng to aid them in achieving this goal

Mobile Blockers Dashboard

  • Provide a simple high-level dashboard indicating the current status of Fennec, sourced from Bugzilla.
  • Q2 Outcomes
    • [DONE] Public web pages, updated daily, indicating total number of open and closed Fennec blockers by day, with links to Bugzilla.
    • [DONE] Public web pages, updated daily, indicating total number of closed blockers and nonblockers for each person involved in Fennec, split by team, with links to Bugzilla.
    • [SKIPPED] Include number of comments-on-blocker-bugs per person.
      • NOTE: This comments-per-person feature was not easily obtained via the Bugzilla API, so this is a stretch goal.
  • Stakeholders
    • Engineering management, particularly damons.
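
The per-person closed-bug counts described above amount to a simple aggregation over bug records. The sketch below assumes the records have already been fetched from Bugzilla and reduced to dictionaries; the field names, the sample records, and the choice of which statuses count as closed are assumptions for illustration, not the dashboard's actual code.

```python
from collections import Counter

# Statuses treated as "closed" in this sketch (an assumption).
CLOSED_STATUSES = {"RESOLVED", "VERIFIED", "CLOSED"}


def closed_counts_by_person(bugs):
    """Count closed bugs per assignee from already-fetched bug records."""
    counts = Counter()
    for bug in bugs:
        if bug["status"] in CLOSED_STATUSES:
            counts[bug["assignee"]] += 1
    return dict(counts)


# Hypothetical records; real data would come from Bugzilla.
bugs = [
    {"id": 1, "status": "RESOLVED", "assignee": "alice"},
    {"id": 2, "status": "NEW", "assignee": "alice"},
    {"id": 3, "status": "VERIFIED", "assignee": "bob"},
    {"id": 4, "status": "RESOLVED", "assignee": "alice"},
]
print(closed_counts_by_person(bugs))
```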

Supporting Projects

Many of our Goals and P1 Projects depend on several building blocks. These are important, but if we find ourselves needing to prioritize our time, we should prioritize time on these such that they serve the goals above, and push any further advancements on these projects to future quarters. The projects here are listed in no particular order.

Bughunter

  • Teach QA and the Crash Kill team to effectively use Bughunter to diagnose and discover reproducible crashes.
  • Q2 Outcomes
    • [DONE] Create VMs for people to use the system safely
    • [DONE] Evangelize the use of the system
    • [MISSED] Attend crash-kill work week to find out how to best support that team with BugHunter
      • NOTE: Communication Failure: Did not get invited soon enough to book travel
  • Stakeholders
    • Crash kill/Project Mgmt
    • QA

MozTrap - Manual Test Case Management System

  • Teach and help migrate QA teams to the new test case management system
  • Q2 Outcomes
    • [DONE] Automation API integration so automated tests can track their action in the system
      • This feature is done and merged to master. It will be rolled out to Production in Q3 after a week-long QA cycle.
    • [DONE] Integrate with browserID for user creation
    • [DONE] Complete security review and roll out to production
    • [DONE] Aid QA in migrating to system and evangelize use (create tutorials etc)
  • Stakeholders
    • QA

A11y Automation

  • Mentor the A11y developers as they use our tools to fit the Speclinium accessibility testing framework into the Mozilla automation infrastructure.
  • Q2 Outcomes
    • [DONE] Aid the A11y team with using mozbase as a basis for their automation
    • [DONE] Aid the A11y team with using marionette
    • [SKIPPED] Create VMs for running the automation using pulse
    • [SKIPPED] Tailor automation to be buildbot-ready by end of quarter
    • [SKIPPED] Deploy into buildbot automation (stretch)
      • NOTE: Dburns has been helping out with this, but we deferred several tasks because their automation won't be ready for deploy by end of Q.
  • Stakeholders
    • A11y developers
    • A*team (the A11y developers will find bugs in our software we'll need to fix)

JetPerf Deployments

  • Complete deployment of Addon SDK project JetPerf. (Talos performance metrics for AddonSDK developed addons)
  • Q2 Outcomes
    • [DONE] Implement enough of Mozharness infrastructure to deploy jetperf on desktop talos
    • [MISSED] Deploy jetperf talos into buildbot automation
      • NOTE: We will likely just miss this goal because mozharness support for desktop talos is at risk, which this depends on. We are code complete from the Talos side, however.
  • Stakeholders
    • Addon SDK (jetpack) team
  • Depends On
    • Releng to deploy into buildbot automation

Eideticker

  • Use the Eideticker automation to track our progress on rendering and checkerboarding performance, particularly compared to our competition.
  • Q2 Outcomes
    • [DONE] Get chrome checkerboarding measurements working with galaxy nexus
    • [DONE] Automate checkerboarding analysis
  • Stakeholders
    • Fennec Native Dev Team
    • Fennec Marketing team

WOO

  • Add the orange seed feature to help developers discover when a test first went intermittent. Also, move to a modern staging/production system.
  • Q2 Outcomes
    • [DONE] Implement Orange Seed feature
    • [MISSED] Deploy to production using staging/production VMs
      • NOTE: Our staging system was a miss because the transfer of ES control to IT is still pending. That transfer is needed before a development ES database can be made available for OF to use. We need IT to decide whether to upgrade capacity on IT's dev ES cluster or create a dev ES cluster on the metrics ES instance.
        • ES also needs to be upgraded (IT needed)
        • Also needs to be re-indexed (IT needed)
  • Stakeholders
    • Platform developers trying to Juice Oranges
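
To make the Orange Seed idea concrete, here is a minimal sketch of finding the point at which a test first went intermittent. It assumes the seed is simply the earliest failing run in a chronologically ordered history; the function name, the input shape, and the push identifiers are hypothetical, and the real feature may use a different definition.

```python
def find_orange_seed(runs):
    """Return the push id of the earliest failing run, or None if all passed.

    runs: list of (push_id, passed) tuples ordered oldest first
          (a hypothetical input shape).
    """
    for push_id, passed in runs:
        if not passed:
            return push_id
    return None


# Made-up history for one test, oldest push first.
history = [
    ("a1", True),
    ("b2", True),
    ("c3", False),  # first intermittent failure
    ("d4", True),
    ("e5", False),
]
print(find_orange_seed(history))
```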

Speedtests

  • Enhance the speedtests with the Kraken JS test as well as mobile measurements.
  • Q2 Outcomes
    • [DONE] Add Kraken to the test matrix
    • [DONE] (new) Add V8 to the test matrix
    • [SKIPPED] Add mobile support to the tests so that we run the canvas demo on phones
      • NOTE: Skipped in favor of getting autophone more reliable to be used as the basic platform here.
  • Stakeholders
    • JS team
    • Fennec Native team

Peptest & Telemetry

  • Wire peptest and telemetry together so that we can use telemetry data to annotate what happened during the unresponsive moments that peptest detected.
  • Q2 Outcomes
    • [DROPPED] Integrate Peptest with telemetry probes to better measure responsiveness
      • NOTE: Dropped in favor of other higher priority work for mobile
    • [SKIPPED] Aid developers with writing peptest patches
      • NOTE: Skipped in favor of meeting with developers, figuring out their new needs and resolving to work toward that with a small project in Q3.
    • [DONE] Complete peptest-talos style reporting system
    • [DONE] Integrate Peptest reporting into Datazilla
      • NOTE: Verified that peptest will write to datazilla, but did not turn on default peptest writing to datazilla yet.
  • Stakeholders
    • Platform/Firefox Developers
    • Snappy team
  • Depends On
    • Datazilla project

Mozhttpd

  • Investigate whether android test stability and mochitest turn around time can be improved by replacing httpd.js with a python webserver.
  • Q2 Outcomes
    • [DONE] Investigate using Mozhttpd for mochitest webserver to see if turnaround time can be decreased
      • Have a POC working for some directories. Still investigating.
  • Stakeholders
    • Releng (frees up slave time if this works)
    • Developers (improves end-to-end result time if it works)
    • Fennec Developers (significantly simplifies running Fennec tests by hand)
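
For context on what serving test files from a Python web server involves, here is a minimal sketch using only the Python standard library. It does not use the Mozhttpd API and is not the proof of concept mentioned above; the helper name, the temporary docroot, and the test file name are all made up for illustration.

```python
import functools
import http.client
import tempfile
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def start_server(docroot, host="127.0.0.1", port=0):
    """Serve docroot over HTTP on a background thread.

    Passing port 0 lets the operating system pick a free port.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(docroot))
    server = ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Serve a throwaway directory containing one hypothetical test page,
# fetch it once, then shut the server down.
with tempfile.TemporaryDirectory() as docroot:
    Path(docroot, "test_example.html").write_text("<p>hello</p>")
    server = start_server(docroot)
    port = server.server_address[1]
    connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    connection.request("GET", "/test_example.html")
    body = connection.getresponse().read().decode()
    connection.close()
    server.shutdown()
    server.server_close()

print(body)
```

Because the server runs in the harness's own process, it can be started and stopped per test run without launching a separate browser-side server, which is one reason a Python server might simplify running tests by hand.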

W3C Test Mirroring for CSS WG

  • Provide a semi-automated mechanism to help the Layout team submit reftests to the CSS working group as well as enable them to easily incorporate the CSS working group's tests in our on-change testing.
  • Q2 Outcomes
    • [MISSED] Complete code for mirroring solution
    • [MISSED] Deploy to VM for automation
      • NOTE: Missed due to poor leadership from mentor on this project (ctalbert).
  • Stakeholders
    • Layout Team (fantasai is our main customer)

Powerball

  • Analyze game design as a means to improve community engagement across development/qa.
  • Q2 Outcomes
    • [DONE] Create design plan for community building game
  • Stakeholders
    • Ourselves, at the moment

Addons Automation

  • We have two separate systems for automated addon testing. Neither system solves the entire problem. We need a plan in place to correct this. Intended as preparation for a Q3 goal.
  • Q2 Outcomes
    • [SKIPPED] Drive consensus around a comprehensive architecture for addon test automation
      • NOTE: Attempted to start this conversation, but the addons and related teams were not interested in the automation. Left the conversation to be revisited in the future and focused on higher priority mobile, b2g, and bugzilla work instead.
  • Stakeholders
    • AMO team
    • AMO developers
    • Platform developers
    • Releng

BuildFaster

  • The BuildFaster dashboard is currently down. It's an important tool for tracking whether our end-to-end build/test times are regressing, so we should try to get it back up.
  • Q2 Outcomes
    • [DONE] The GoFaster dashboard (in particular the end-to-end times and buildcharts view) should be publicly accessible again.
  • Stakeholders
    • Releng
    • Developers

Community Involvement Goals

  • [DONE] Establish best practices to become the best community integrated development team at Mozilla
  • [MISSED] Blog about those practices (as we prove them)
    • NOTE: We didn't blog about our practices any better than we did last quarter
  • [DONE] Promote two community folks to Mentor status.
  • [MISSED] Set individual blogging targets and meet them.
    • NOTE: See above. The blogging superstars on our team did their blogging, but the rest of us didn't.