Scribed by Thomas
State of WebCompat Report Review (Honza)
- What have we learned?
Honza: great improvements since the last version: now a top 10 instead of a top 5, top issues have better context (screenshot, explanation, links to KB entries and Bugzilla), new data sources for the report (interop202x), the Trends and Risks sections now have links, and an appendix explaining the scoring logic.
Honza: any challenges to note?
James: once the KB was accurate, things were easier. It's mostly questions about the workflow (duplicate data, which data is up to date, how fixed bugs will get resolved automatically, etc.). We should think about how we will normalize data next time to reduce friction.
Dennis: agree with James. Creating the top ten list was surprisingly straightforward this time, but maintaining the KB and such may be our biggest pain going forward. It's unclear if that will be problematic.
James: working on the appendix revealed some possible issues about our methodology, but it's unclear how severe they really are (mostly arbitrary-seeming assumptions). It feels mostly right.
Joe: the issue of "a website owner is not supporting Firefox" isn't really mentioned in the report. Might be worth commenting on it, like "we don't consider it a webcompat issue"?
James: we mentioned that in Trends, but perhaps we should clarify that the top ten is about platform issues, rather than including absolutely everything. We don't have a KB entry for "Firefox not supported" for instance, so we don't have clean data for it.
Dennis: we have a github label for unsupported, and 242 bugs with that label. We could rank it zero to track it, but indeed it's not actionable with platform changes, it's a devrel issue. It would absolutely be issue #1 if we did rank it, however.
Joe: how easy would it be to provide a global quantification of each issue, like an estimate of "one in a million page views has this error", and "on those sites we estimate that it's breaking the page or just cosmetic".
James: hard to say, since it might affect only some users on certain sites (inserting emoji, etc.). The score attempts to include this kind of impact estimate, but converting it into harder numbers will require a way to actually check sites (HTTP Archive, telemetry, etc.). A point estimate of the number of users affected may be plausible, but beyond that it's unclear for now.
Joe: two questions to consider here: "how bad is the problem" and "what should engineers work on next". Extrapolation might be useful as a result, especially beyond the web platform team.
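The kind of point estimate discussed above could, in principle, be sketched as a weighted sum over affected sites. A minimal illustration, assuming hypothetical per-site page-view shares and trigger rates (all numbers and site names are made up, not from any real dataset):

```python
# Hypothetical sketch: estimating what fraction of all page views an issue
# affects, given each affected site's share of total traffic and a guess at
# how often the issue actually triggers on that site. Illustrative only.
affected_sites = {
    # site: (share of all page views, fraction of visits hitting the bug)
    "example-video.test": (0.002, 0.5),   # breaks playback for half of visits
    "example-shop.test": (0.0005, 1.0),   # breaks checkout on every visit
}

def estimated_impact(sites):
    """Return the estimated fraction of all page views hitting the issue."""
    return sum(share * trigger for share, trigger in sites.values())

rate = estimated_impact(affected_sites)
print(f"~{rate:.2%} of page views affected")
```

Getting the traffic shares is the hard part James raises: it would need a source such as HTTP Archive or telemetry, and the error margins on the trigger rates would likely dominate the estimate.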
Tom: effort is planned to check http archive etc, but accuracy of the data is still likely to be unclear.
Joe: as long as we can estimate something like "this affects 3-4% of users daily". Error margins might be tricky?
Tom: concerned that we might mislead people, however, if we aren't clear about what we're actually assessing (they may think we're measuring more broadly than we are, or that we're more accurate than we are if we're only testing the top million sites, for instance).
James: is that really an issue? That is, do we "care" about breakage beyond the top million, or is it major enough to worry about? Or will users just consider Firefox broken regardless, and not care beyond that? I think it's easier to gather data for static issues than for user-interaction ones, and that's more of the key problem.
Joe: the key insight/improvement here would be to be able to assess what we don't know, to help answer those kinds of stakeholder questions.
James: we are working in that direction, but it's too early to tell.
Joe: it's important to note that this is meant as an ongoing effort, so getting into a good cadence with fixing issues in the report would be a good place to end up in.
James: sites that don't support Firefox is a relatively crisp metric. It's a clearer problem in the sense that it should be easier to judge how many users are affected?
Honza: originally the Risks section didn't include interop202x as a data source. What were the challenges in that section this time?
James: the section is mostly based on interop2022 right now (we don't have clear signals from 2023 yet). Once we have vendor positions on each proposal, we can include Risks based on those. What I added this time were areas where we were ahead, or where others are now ahead; for instance, where our implementation ended up not being as solid as others', even if we were first to implement. We should monitor those kinds of issues. One of our top-ten issues ended up being related to that, validating that the risks are real.
James: other sources of data (CanIUse.com, intents-to-ship, Chrome use counters, etc) can also be included in the assessment in future reports.
Honza: so mostly including more data sources would be good, and also make sure we're asking the right questions?
James: right, making sure that the data is actually useful (as we discussed regarding WPTs). So making sure we choose the best data sources is probably key. This section is more useful for missing features, compared to how likely devs are to actually use them.
Honza: coming up with a scoring methodology might not be feasible, then?
James: some risks are obvious (container queries), but the more subtle ones might still be missed. Also, maybe we already know about the obvious ones.
Dennis: the biggest real problem here is the stuff for which we don't have any signal yet. I think our key goal here should be to include more (good) data sources, to help find more potential candidates for the Risks section.
Honza: agreed; maybe we can also include Trends information at some point.
Honza: as for the Trends sections, we have some new items here (Firefox not supported, etc). But we don't have explanations for why, just that they're possible trends. What were the biggest challenges here?
Raul: incomplete user reports (no or missing steps-to-reproduce, no description, etc) was the most difficult challenge. But we were still able to reproduce the issues we documented in the section.
James: the first question we will probably get is how we normalized the data in these graphs. (Did we just see an increasing trend because we have ten times the reports compared to 2017, or did we normalize it properly?)
Dennis: thankfully I checked recently and it looks like that probably isn't an issue, but we should still make sure to normalize regardless.
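The normalization James and Dennis discuss could be as simple as dividing each trend's label counts by the total report volume per year, so growth in overall reporting doesn't masquerade as a rising trend. A sketch with made-up numbers (none of these figures come from the actual report data):

```python
# Hypothetical sketch: normalize a labeled subset of reports (e.g. "Firefox
# not supported") against total report volume per year, so a raw increase
# isn't mistaken for a real trend. All numbers are illustrative.
reports_per_year = {2017: 1200, 2020: 6500, 2023: 12000}   # total reports
unsupported_per_year = {2017: 30, 2020: 180, 2023: 420}    # labeled subset

def normalized_rates(totals, subset):
    """Return the subset as a fraction of all reports, per year."""
    return {year: subset[year] / totals[year] for year in totals}

rates = normalized_rates(reports_per_year, unsupported_per_year)
for year in sorted(rates):
    print(year, f"{rates[year]:.1%}")
```

With these sample figures the rate climbs from 2.5% to 3.5%, so the trend would survive normalization; with ten times as many raw reports but a flat rate, it wouldn't.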
Honza: are there any other data sources we could use for Trends?
Ksenia: site rank, i.e. how important a site is in a given country... anything giving us insight into the data we already have.
Honza: there is also our new appendix explaining how we score KB entries for the top ten. Any thoughts on that?
Dennis: I like it. I think it will improve in the future (we might be scoring mobile-only issues too low, for instance). It is a really good summary of what we do right now.
James: I can add a sub-section for the platform?
Dennis: we should discuss it properly first, but yes.
Honza: any general thoughts in closing?
Joe: I've only heard positive feedback so far.
James: we're counting reports, but not sites, which might be meaningful data (lots of users saying a site isn't supported, etc). That is, considering duplicate reports in some new way.
Last working week (Oana)