Auto-tools/Meetings/2015-01-26

Notices, Highlights, Roundtable

  • Significant Contributions
  • [jmaher] A-Team [community day] - Tomorrow!
    • hack a bit on our tasks
    • expect to have folks help out with docs/etc.
    • the bootcamp

Newsgroup and Blog Posts

Goal Updates

Note: Items belonging to Supporting Tasks and Backlog are not part of Q1 goals and may not be completed this quarter.

Also see our Trello board

Marionette

Implement the set of features needed to support conversion of all prioritized mozmill tests to Marionette [chmanchester, AutomatedTester]

  • details: implement modal dialog support, separation of Marionette client into a separate package, release of older version of Marionette client to support update tests, modification of harness to dynamically load appropriate version of Marionette client
  • bugs: bug 906712, bug 1109183, bug 1107336
  • progress since last update:
    • bug 906712 - Largely busy with other tasks, but have incorporated feedback on the API and requested review on the patch.
    • bug 1107336 - WPTRunner patch created; new try pushes made after it landed and was backed out.

Support the conversion of targeted P1 mozmill tests to Marionette and get them running in CI [hskupin]

  • details: Perform the CI work necessary to get mozmill tests converted to Marionette running for update tests, tabbed browser tests, and awesomebar tests. This entails writing new Marionette test libraries for these features, and getting converted tests running on the existing mozmill CI systems.
  • stretch goal: get test results reported to Treeherder
  • stretch goal: support the conversion of Search tests
  • progress since last update:
    • Still blocked by some bugs in Marionette, but workarounds work pretty well so far
    • Currently working on chrome window handling code. WIP is up at https://github.com/mozilla/firefox-ui-tests/pull/50, and a reviewable patch should follow later today or early tomorrow.
    • Next is the implementation of tab handling
    • Barbara started investigating the default preferences for Marionette (bug 1123683)
    • Chris is working on getting the location bar UI module implemented
    • Chris is preparing training materials for the training next week

Resolve P1 bugs blocking the release of Marionette 1.0 [AutomatedTester, ato, jgraham]

Supporting Tasks

  • train QA on writing Marionette Greenlight Tests

MozReview and Autoland (joint with RelEng)

Add support for autolanding from MozReview to try [mcote, dminor, mdoglio]

Better Bugzilla integration with MozReview [mcote]

  • details: Make MozReview data in Bugzilla more useful by creating a Bugzilla field that contains dynamic information about a MozReview review request
  • bug: bug 1102428
  • progress since last update:
    • MozReview web API in review.
    • Extension plumbing finished.
    • Web API complete.
    • Front end code asynchronously fetching data from MozReview API.
    • Remaining: formatting of extension UI, switching MozReview to write to extension web API.

Supporting Tasks

  • continue improvements to the multi-commit UI
    • mconley landed bug 1064111, Move contents of "Commits" tab in rbmozui to the main review request page.

Perfherder

Ingest all Talos data with Treeherder and develop a UI that can be used to view current and historical data [wlach]

  • details: We want to use Treeherder to store and display Talos performance data, because Datazilla is being deprecated and Graphserver doesn't support the kind of performance analysis we'd like to perform on the data.
  • progress since last update: Really starting to come together. Initial pull request filed (https://github.com/mozilla/treeherder-ui/pull/289), should be landing pending resolution of mdoglio's requests. Lots of polish / refinement required but we're getting there. Screenshot: treeherder-graphs-2014-01-21.png

Treeherder

Distinguish between Tier 1 and Tier 2 jobs [mdoglio]

  • details: Tier 1 and Tier 2 jobs will have different sheriffing guidelines and different expectations. Accordingly, we need to display them and allow users to interact with them differently.
  • bug: bug 1113322
  • progress since last update: bug 1097090 is now under review. Not much progress because I was working on MozReview last week.

Develop a prototype structured log viewer [camd]

  • details: Develop a minimal structured log viewer which can be used to view the structured logs produced by the test harnesses. For the initial implementation, the user should be able to toggle between the parsed log viewer and the structured log viewer, for those harnesses that produce both.
  • bug: bug 1113873
  • progress since last update:
    • Didn't get a chance to work on this in this period. No new progress.

Develop a minimal UI that sheriffs can use to file new intermittents [edmorley]

  • details: develop a usable but minimal UI that sheriffs can use to quickly file new intermittent issues; the UI should automatically fill out some of the details that sheriffs normally have to find manually. Future iterations can improve on this by auto-filling more fields.
  • bug: bug 1117583
  • progress since last update: None - worked on supporting tasks below (and to a much lesser extent, my other personal deliverable). Infra/reliability & other issues still remain that are higher priority than new features.

Supporting Tasks

  • Continue to improve the performance and operational aspects of the system
    • Dealt with tree-closing issues due to DB locks
    • Fixing the fact that we never used the DB read host (awaiting testing/deployment)
    • While debugging the tree closure, also discovered that jobs could get stuck during ingestion in the 'loading' state; bug filed
    • Fixed log parsing exceptions seen on production
    • Limited max number of revisions ingested to avoid timeouts with merge day pushes
    • Reduced data lifecycle from 5 to 4 months to alleviate shortage of DB disk space & discovered performance artefacts were not being expired (PR ready to land)
  • Identify and resolve remaining issues blocking TBPL EOL - treeherder parts: bug 1059400, other parts: bug 1054977
    • Survey emailed out: newsgroup post
    • Several developer-reported issues fixed (correct timestamp in TBPLbot comments, missing TinderboxPrints, Persona login for emails longer than N characters, reduced target_blank usage, missing bug suggestions, talos panel for e10s talos, broken "push not processed" refresh, filtering by author on Try)

Backlog

  • Change the authentication model to separate credentials from repos
  • Implement OrangeFactor within Treeherder
  • Create a developer-centric view

Bugzilla

Implement an alternate bug view [glob]

  • details: Implement an alternative view of bugs, to provide UX and responsiveness improvements, and as a foundation for task-/team-centric views.
  • bug: bug 1068655
  • progress since last update:

Implement versioning framework for REST API [dkl]

  • details: Version the REST API to provide stable endpoints for users and a place for unstable development.
  • bug: bug 1051056
  • progress since last update:
    • Made some progress last week after the upstream 5.0rc1 release went out but got called back to do another spin due to a missed error. Hopefully finishing the respin today/tomorrow and then back full time on the goal. Should hopefully have a working prototype by end of week.

GitHub authentication [dylan]

  • details: Allow users to authenticate with Bugzilla using their GitHub account. This will encourage more contributors and allow us to better integrate GitHub into the Mozilla workflow.
  • bug: bug 1118365
  • progress since last update:
    • I've been able to pass auth tokens between GitHub and my dev server. There are some questions about the mechanics of how this authentication ties into Bugzilla accounts that need to be discussed at the next BMO team meeting (Tuesday).

DevTools Harness

Get the DevTools harness running in continuous integration [ted]

  • details: Take the prototype that was developed in 2014 Q4 (https://github.com/luser/luciddream) and get it running in continuous integration and visible in Treeherder. It's TBD whether this will be run in buildbot or TaskCluster, but we should get it running somewhere per-commit this quarter against linux desktop Firefox and a B2G emulator.
  • progress since last update:
    • Met with James Lal to talk about TaskCluster, looks viable for running this harness in CI. Did some work on a mozharness script to run the Luciddream harness.
    • This goal will be postponed so that I can work on another project.

CloudServices Automation

Supporting Tasks

  • Take existing client/server automation that is already running and get it reporting to Treeherder using the Tier 2 UI/workflow; bug 1108259

Test Infrastructure

Define and document Tier 2 jobs [bc]

  • details: Define and document all aspects of Tier 2 jobs: rationale, criteria, necessary enhancements to Treeherder and automation frameworks.
  • bug: bug 1121655
  • progress since last update: None. I've completed my 'must-do' items for Autophone and will focus on this goal this week.

Prototype a retrigger-based bisection tool [armenzg, jmaher]

  • details: Create a prototype of a command-line tool that can be used by sheriffs and others to automate retrigger-based bisection. This could be used to help bisect new intermittent oranges, and to backfill jobs that have been skipped due to coalescing. Integration with Treeherder or other service will be done later.
  • Project repo: https://github.com/armenzg/mozilla_ci_tools
  • progress since last update:
    • I've done the first release of the library that allows us to trigger jobs in automation.
    • We will have a second release this week in preparation for the bisection tool
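
The retrigger-based bisection described above can be sketched roughly as follows. This is only an illustration, not the code in mozilla_ci_tools: `run_job` is a hypothetical callback that retriggers the job on a revision and reports whether it passed, and the rule "bad if any of N retriggers fails" is an assumption chosen to cope with intermittent failures.

```python
from typing import Callable, Sequence


def is_bad(revision: str, run_job: Callable[[str], bool], retriggers: int) -> bool:
    """Treat a revision as bad if any of `retriggers` runs of the job fails.

    `run_job` is a hypothetical callback: it runs the job once on the given
    revision and returns True on success. any() stops at the first failure,
    so no further retriggers are spent once a revision is known to be bad.
    """
    return any(not run_job(revision) for _ in range(retriggers))


def find_first_bad(revisions: Sequence[str],
                   run_job: Callable[[str], bool],
                   retriggers: int = 5) -> str:
    """Return the earliest bad revision using binary search.

    Assumes `revisions` is ordered oldest to newest, that the newest one is
    known to be bad, and that every revision after the first bad one is
    also bad.
    """
    lo, hi = 0, len(revisions) - 1  # revisions[hi] is known bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid], run_job, retriggers):
            hi = mid        # first bad revision is at mid or earlier
        else:
            lo = mid + 1    # first bad revision is after mid
    return revisions[lo]
```

A higher `retriggers` value lowers the chance of mistaking a bad revision for a good one when the failure is rare, at the cost of more jobs per step.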

Store high-resolution testcase data ("ActiveData") [ekyle, ahal]

  • details: Create a Proof of Concept “big data” project which will store information about every test file we run: test status, error details, test machine and test duration to begin with. We will use this project to develop schemas and queries that work with data this large, and we will use this data to normalize chunk sizes and provide details about which tests never fail.
  • progress since last update:
    • First version of the ETL is working well enough. It has some long-term stability issues, and will inevitably need improvements once we start charting the results.
    • ahal's structured catalog worker management code will help with the stability issues found in my ETL code
    • Single-node 'cluster' is being filled with unittest data. Current concern is price: the pricing chart says $0.40/hour, or almost $300/month, so looking into using spot instances for the other nodes. The single node is weak: it will OoM under full load. If it does not crash, it slows to a crawl; my current guess is that the EBS volumes are too slow and are acting as a bottleneck
  • Next Steps
    • Hopefully Fabric along with boto will make setting up EC2 instances for both the ES cluster and the ETL daemons easy
    • Integrate the ETL code into structured catalog
    • Add a front end to the ES cluster, like esFrontLine, that protects the cluster from updates, provides logging, and (hopefully) provides a simpler query interface.

Implement the ability to normalize chunk durations in mochitest [ahal]

  • details: For mochitest variants on desktop and B2G, modify manifestparser and the test harnesses to be able to specify which tests are run in specific chunks.
  • stretch goal: Implement the same feature for Android mochitest, which still uses old-style JSON manifests.
  • bug: bug 1124182
  • progress since last update:
    • refactored manifestparser into different files (landed)
    • implemented filtering system for manifestparser (up for review)
    • started work porting chunking algorithms to python (in progress)
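
As a rough illustration of what normalizing chunk durations could look like, the sketch below assigns tests to chunks greedily, longest test first, always adding to the chunk with the smallest total runtime so far. It is not the algorithm being ported to manifestparser; the function name `chunk_by_duration` and the input format (a mapping of test path to duration in seconds) are assumptions made for the example.

```python
import heapq
from typing import Dict, List


def chunk_by_duration(durations: Dict[str, float], total_chunks: int) -> List[List[str]]:
    """Split tests into `total_chunks` lists with roughly equal total runtime.

    `durations` maps a test name to its measured runtime (hypothetical
    input; in practice this would come from recorded test data).
    """
    # Min-heap of (total runtime so far, chunk index), one entry per chunk.
    heap = [(0.0, i) for i in range(total_chunks)]
    heapq.heapify(heap)
    chunks: List[List[str]] = [[] for _ in range(total_chunks)]

    # Longest tests first; ties are broken by name to keep output stable.
    for test in sorted(durations, key=lambda t: (-durations[t], t)):
        total, index = heapq.heappop(heap)   # currently lightest chunk
        chunks[index].append(test)
        heapq.heappush(heap, (total + durations[test], index))
    return chunks


if __name__ == "__main__":
    example = {"a": 10, "b": 8, "c": 5, "d": 4, "e": 3}
    print(chunk_by_duration(example, 2))  # [['a', 'd'], ['b', 'c', 'e']]
```

In the example the two chunks total 14 and 16 seconds. This greedy heuristic is simple but not guaranteed to be optimal.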

Create Android 4.4 emulator image for automated tests [gbrown]

  • details: Continue the work in bug 1062365 to build an emulator image based on Android 4.4 that is capable of running automated tests.
    Deliverable includes:
    • a prototype image
    • instructions for re-creating the image
    • demonstration that tests can be run on image
    NOT included in this deliverable:
    • tests running in continuous integration
    • "greening" of tests
  • progress since last update:
    • bug 1123443 Allow android emulator tests to use adb devicemanager (reviewed)
    • bug 1124913 Allow android emulator tests to download emulator (reviewed)
    • working on setting up a try push with adb + new emulator + new avds

Help Releng reduce test load [jmaher]

  • details: This quarter, we’ll validate the data from SETA and provide some recommendations to Releng about which jobs/platforms we could schedule less often in order to reduce test load. We’ll monitor the impact of these changes in terms of sheriffing burden and the number of retriggers this demands, and may adjust as needed. In subsequent quarters, we’ll use additional data from the high-resolution testcase data project and OrangeFactor to provide more finely-tuned scheduling changes.
  • progress since last update:
    • SETA work continues; ETA for the dev.platform post is this week, ETA for changes is next month
    • items to address: web UI to view results/status, buildbot changes to reduce jobs, retrigger tool to reduce sheriff load when missing a test
    • in addition, osx 10.6 jobs have been reduced by 50%, talos jobs on esr, b2g*, and the release branch are turned off, and android jobs will be turned off on ESR

Supporting Tasks

  • Help green up tests on OSX 10.10
  • Apply --run-by-dir to all mochitest harnesses
  • Remove legacy JSON manifests in favor of manifestparser manifests
  • Provide alternate solutions for the last consumers of Datazilla and work to decommission it
  • Work with devs to introduce more dynamic analyzers (like Ehsan’s setTimeout check) in test harnesses
  • Automate Windows symbol fetching, bug 1117741 [ted]
  • Add ssltunnel support to Android tests, bug 1084614

Performance Testing

Deliver training to at least 2 people for Talos performance sheriffing [jmaher]

  • details: We want to expand the pool of people who can perform performance sheriffing to make it scale better, and to reduce the bus factor problem.
  • progress since last update:
    • One person is about 90% trained, and a contributor is adding much-needed functionality to the tool
    • ETA for full set of docs/FAQs/process by Feb 13th

Supporting Tasks

  • Continue sheriffing Talos performance regressions
  • Add new benchmarks as needed to mozbench
  • Create a new UI for mozbench results that doesn’t require Datazilla
  • Improve e10s support for Talos tests and infrastructure
  • Move Talos into the tree
  • Get rid of talos.zip
  • Make running Talos locally easier
  • resources: jmaher, dminor (for mozbench primarily), and contributors

Community

Increase 'contributor friendliness' of our projects [jmaher, all]

  • details: Ensure that all ongoing projects have a friendliness rating of at least 6, as shown on https://wiki.mozilla.org/Auto-tools/Projects/Everything
  • progress since last update:
    • No progress, been working on talos, alertmanager, SETA
    • community day tomorrow! Will make some progress on this then

Supporting Tasks

  • Start tracking at least three community-related metrics over time

Other Project Updates

charts.mozilla.org

Update

  • attempted a few versions of a dashboard to visualize/manage releases for FxOS devices (engineering project management and release management team).

Next Step

  • use a variation on the Platform dashboard

Alerts

Update

  • Updated dzAlerts to digest the options_collection_hash (pgo/debug) and e10s. Currently running on a public dev server at home.

Next Step

  • Put Talos data in the Active Data ES cluster: dzAlerts has needed a home for its cluster for a while.

Holidays and Trips

  • [mcote] on PTO starting Jan 28. Back Feb 8.

Misc