== Requirements for being shown in the default TBPL view ==
This page clarifies the existing requirements that a platform/test-suite has to meet before its jobs can be shown in the default [https://tbpl.mozilla.org/ TBPL] view. Common sense will apply where some of the requirements are not applicable to a particular platform/build/test type.

To propose changes to this policy, please speak to the sheriffs and/or post to [https://lists.mozilla.org/listinfo/dev-platform dev.platform].
=== 1) Has an active owner ===
=== 5) Easily run on try server ===
* Otherwise developers who have had their landing backed out for breaking the job type will be unable to easily debug/fix the failures, particularly if they only reproduce on our infrastructure.
* Developers should not have to guess TryChooser options, so http://trychooser.pub.build.mozilla.org/ should be updated where appropriate.
=== 6) Outputs failures in a TBPL-starrable format ===
* New test harnesses are strongly encouraged to reuse parts of MozBase (eg: mozcrash) wherever possible rather than reinventing the wheel; speak to the A-Team for more info.
* Failures must appear in the TBPL annotated summary (ie: they must match the [https://hg.mozilla.org/webtools/tbpl/file/tip/php/inc/GeneralErrorFilter.php log parsing regexp]), otherwise the full log has to be opened for every failure.
* Failure output must be in the format expected by TBPL's [https://hg.mozilla.org/webtools/tbpl/file/tip/php/inc/AnnotatedSummaryGenerator.php bug suggestion generator] (otherwise sheriffs have to search Bugzilla manually when starring intermittent failures):
** For in-tree/product issues (eg: test failures, crashes):
*** The pipe symbol is used as the delimiter.
*** 1st token: one of {TEST-UNEXPECTED-FAIL, TEST-UNEXPECTED-PASS, PROCESS-CRASH}.
*** 2nd token: a unique test name/filepath (not a generic test loader that runs 100s of other test files, since otherwise bug suggestions will return too many results).
*** 3rd token: the specific failure message (eg: the test part that failed, the top frame of a crash or the leaked objects list for a leak).
** For non-test-specific issues (eg: infra/automation/harness):
*** TBPL falls back to searching Bugzilla for the entire failure line (excluding the mozharness logging prefix), so the line should be both unique to that failure type & repeatable (ie: it should not contain process IDs or other per-run values, which will rarely match an existing bug summary).
** Exceptions & timeouts must be handled with appropriate log output (eg: the failure line must state in which test the timeout occurred, not just that the entire run has timed out).
* The sheriffs will be happy to advise on TBPL log output compatibility.
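As an illustration of the in-tree failure-line format described above, here is a minimal Python sketch that builds and splits such a line. The helper names, the single spaces around the pipe delimiter and the example test path are assumptions made for illustration only; TBPL's log parsing regexp and bug suggestion generator remain the authority on what is actually accepted.

```python
# Hypothetical helpers illustrating the pipe-delimited failure line:
#   <result token> | <unique test path> | <specific failure message>
# The " | " spacing is an assumption; check TBPL's parser for the real rules.

# Result tokens listed above for in-tree/product issues.
FAILURE_TOKENS = {"TEST-UNEXPECTED-FAIL", "TEST-UNEXPECTED-PASS", "PROCESS-CRASH"}


def format_failure_line(token: str, test_path: str, message: str) -> str:
    """Build a failure line for one specific test (not a generic test loader)."""
    if token not in FAILURE_TOKENS:
        raise ValueError(f"unexpected result token: {token!r}")
    # A pipe inside the test path would shift the message into the wrong token.
    if "|" in test_path:
        raise ValueError("test path must not contain the '|' delimiter")
    return f"{token} | {test_path} | {message}"


def parse_failure_line(line: str) -> tuple[str, str, str]:
    """Split a failure line into (token, test path, message).

    maxsplit=2 keeps any pipes inside the message as part of the 3rd token.
    """
    parts = [part.strip() for part in line.split("|", 2)]
    if len(parts) != 3 or parts[0] not in FAILURE_TOKENS:
        raise ValueError(f"not a recognised failure line: {line!r}")
    token, test_path, message = parts
    return token, test_path, message
```

Splitting with at most two splits means only the test path has to be pipe-free; a crash signature or leak list in the 3rd token can contain pipes without breaking the parse.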
=== 7) Low intermittent failure rate ===
* A high failure rate:
** Causes unnecessary sheriff workload.
** Affects the ability to sheriff the trees as a whole, particularly during times of heavy coalescing.
** Undermines developers' confidence in the platform/test-suite, which, as demonstrated by Firefox for Android, permanently affects their willingness to believe future failures, even once the intermittent-failure rate is lowered.
* A mozilla-central push results in ~400 jobs. Across all trunk trees, the OrangeFactor (the average number of intermittent failures per push) is normally 3-4, excluding the recent spike, ie: a failure rate of ~1%.
* Therefore, as a rough guide, a new platform/test-suite must have a failure rate of at most 5% initially, and ideally <1% longer term.
* However, sheriffs will make the final determination of whether a job type has too many intermittent failures, based on a combination of factors including failure rate, how long the failures have been occurring, owner interest in fixing them & whether TBPL is able to make bug suggestions.
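The rough guide above is simple arithmetic, spelled out in the Python sketch below. The ~400 jobs per push comes from the text; the function names and the example counts (eg: 6 failures in 200 runs) are hypothetical. It is only a sanity check, since sheriffs make the final determination using the other factors listed.

```python
# Back-of-the-envelope check of the intermittent-failure figures above.
# Thresholds mirror the rough guide: at most 5% initially, ideally <1% later.

JOBS_PER_PUSH = 400  # approximate number of jobs per mozilla-central push


def failure_rate(failures: float, runs: float) -> float:
    """Fraction of runs that failed intermittently."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return failures / runs


def meets_initial_guide(rate: float) -> bool:
    """New platform/test-suite: failure rate of at most 5%."""
    return rate <= 0.05


def meets_long_term_goal(rate: float) -> bool:
    """Longer-term target: failure rate below 1%."""
    return rate < 0.01
```

For example, an OrangeFactor of 4 gives failure_rate(4, JOBS_PER_PUSH), ie: 0.01 (the ~1% quoted above), while a hypothetical job type that failed 6 of its last 200 runs has a rate of 0.03: acceptable initially, but short of the longer-term goal.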
=== 8) Must avoid patterns known to cause non-deterministic failures ===
* Must avoid pulling the tip of external repositories as part of the build, since landings there can cause non-obvious failures (the legacy exception being gaia). If an external repository is absolutely necessary, reference the desired changeset from a manifest in mozilla-central instead (as talos does).
* Must not rely on resources outside of the build network:
** These cause failures whenever the external site is unavailable, as well as increasing end-to-end times & adding noise to performance tests.
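For the manifest approach mentioned above, a check along the following lines could flag repositories that track a moving revision such as 'tip' instead of a fixed changeset. This is a hypothetical sketch: the JSON manifest layout, the repository names and the URLs are invented for illustration and do not describe the format talos actually uses.

```python
# Hypothetical check that every external repository in a manifest is pinned
# to a full Mercurial changeset ID rather than a moving name like "tip".
import json
import re

# Full Mercurial changeset IDs are 40 lowercase hexadecimal characters.
CHANGESET_RE = re.compile(r"[0-9a-f]{40}")


def unpinned_repos(manifest_text: str) -> list[str]:
    """Return the names of repositories whose 'rev' is not a full changeset ID."""
    manifest = json.loads(manifest_text)
    return [
        name
        for name, entry in manifest.get("repositories", {}).items()
        # A missing 'rev' counts as unpinned, just like "tip" or "default".
        if not CHANGESET_RE.fullmatch(str(entry.get("rev", "")))
    ]
```

Given a manifest where "pinned-tool" records a 40-character changeset and "floating-tool" records "tip", only "floating-tool" would be reported.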
=== 11) Easy for a dev to run locally ===
* Supported by mach (if appropriate).
* Ideally part of mozilla-central (legacy exceptions being Talos, gaia).
== Requesting changes in visibility ==
* Your platform/test-suite will still be run, just not shown in the default view. This model has worked well for many projects/build types (eg: jetpack, xulrunner, spidermonkey).
* To see it, append '&showall=1' to the URL ({{bug|748833}} will add a checkbox for this to the TBPL UI).
* To filter the jobs displayed, use the 'job name' field under the 'Filters' menu (it supports regular expressions).
* eg: to see both ASan & Valgrind jobs on mozilla-central (neither of which is shown by default), use: [https://tbpl.mozilla.org/?showall=1&jobname=(asan|valgrind) https://tbpl.mozilla.org/?showall=1&jobname=(asan|valgrind)]
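The same filter can be built and previewed in a few lines of Python. This sketch assumes the 'job name' filter behaves like a case-insensitive regular-expression search over job names; the helper names and the job names in the example are made up. urlencode percent-encodes the parentheses and pipe, which decode to the same query as the hand-written URL in the example above.

```python
# Hypothetical helpers for TBPL's showall/jobname URL parameters.
import re
from urllib.parse import urlencode

TBPL_URL = "https://tbpl.mozilla.org/"


def tbpl_filter_url(jobname_pattern: str, showall: bool = True) -> str:
    """Build a TBPL URL that filters jobs by name, optionally showing hidden ones."""
    params: dict[str, object] = {}
    if showall:
        params["showall"] = 1  # also list jobs hidden from the default view
    params["jobname"] = jobname_pattern
    # urlencode percent-encodes '(', '|' and ')' in the pattern.
    return TBPL_URL + "?" + urlencode(params)


def matching_jobs(jobname_pattern: str, job_names: list[str]) -> list[str]:
    """Return the job names the pattern would select (assumed case-insensitive)."""
    pattern = re.compile(jobname_pattern, re.IGNORECASE)
    return [name for name in job_names if pattern.search(name)]
```

For instance, tbpl_filter_url("(asan|valgrind)") yields https://tbpl.mozilla.org/?showall=1&jobname=%28asan%7Cvalgrind%29, and matching_jobs can be used to check a pattern against a list of job names before opening TBPL.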
== The future ==
* Planned improvements to our tooling will likely mean that some of these requirements can be relaxed in the future, as well as making it easier for maintainers of non-default-view job types to track their success/failure without having to monitor TBPL continuously.
* Planned features for the successor to TBPL ([[Auto-tools/Projects/TBPL2]]) include:
** Multiple dashboards/views for different use-cases/teams (giving us more flexibility than just "default view" or "&showall=1").
** Opt-in notifications (email, IRC, dashboard, ...?) of failures for desired job types (see proposal in {{bug|851061}}).
* [[Auto-tools/Projects/Bisect_in_the_cloud]] will allow sheriffs to more easily narrow regression ranges for job types that do not run on every push, making it more viable to accept them into certain views/dashboards.