Compatibility/Meetings/2023-01-31

Minutes

  • Scribed by James
  • Facilitated by Tom

Using data to measure impact of known WebCompat issues (Honza)

Scoring logic (see the sketch after the factor definitions):

I = F × S

  • F - Frequency
  • S - Severity (SiteBroken, WorkflowBroken, FeatureBroken, SignificantVisual, MinorVisual, UnsupportedMessage)

F = r × p × q × f

  • r - ranking (site popularity, calculated from CrUX dataset)
  • p - platform factor (weighted by calculated platform usage, telemetry)
  • q - qualitative likelihood of a site user experiencing the issue (custom metrics)
  • f - reduction factor (fixed by existing intervention, measure impact w/o intervention)
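
A minimal sketch of the scoring above, written in TypeScript. The severity weights and the example factor values are illustrative placeholders, not agreed numbers:

  // Placeholder severity weights - for illustration only.
  type Severity =
    | "SiteBroken"
    | "WorkflowBroken"
    | "FeatureBroken"
    | "SignificantVisual"
    | "MinorVisual"
    | "UnsupportedMessage";

  const severityWeight: Record<Severity, number> = {
    SiteBroken: 100,
    WorkflowBroken: 50,
    FeatureBroken: 25,
    SignificantVisual: 10,
    UnsupportedMessage: 5,
    MinorVisual: 1,
  };

  interface FrequencyFactors {
    r: number; // ranking: site popularity from the CrUX dataset
    p: number; // platform factor: weighted by platform usage (telemetry)
    q: number; // qualitative likelihood of a user hitting the issue
    f: number; // reduction factor: 1 if no intervention, lower if one exists
  }

  // F = r × p × q × f
  function frequency({ r, p, q, f }: FrequencyFactors): number {
    return r * p * q * f;
  }

  // I = F × S
  function impact(factors: FrequencyFactors, severity: Severity): number {
    return frequency(factors) * severityWeight[severity];
  }

  // Example: a FeatureBroken issue on a popular site, partly mitigated by an intervention.
  const exampleScore = impact({ r: 0.9, p: 0.6, q: 0.3, f: 0.5 }, "FeatureBroken");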

Actionable steps:

  • Use WebPageTest to collect custom metrics (see the sketch after this list)
  • Start building scripts to detect some of the top known compat issues
  • Learn what Firefox Telemetry data would help improve the accuracy of the audit
  • Use collected (historical) data to see the impact we’ve made
  • Compare our historical data with user satisfaction metrics (user sentiment survey data)
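
As an illustration of the WebPageTest step above: a custom metric is a snippet of JavaScript that runs in the loaded page and returns a value to record. A hypothetical metric body (the metric itself is an assumption, not an agreed measurement) might look like this:

  // Sketch of a WebPageTest custom metric body. Assumption: the body of this
  // function is what would be pasted into WebPageTest's custom metrics field
  // under a [metric-name] header and executed in the loaded page.
  // Idea: does the page use <dialog>, and does the test browser support it?
  function dialogUsageMetric(): string {
    const dialogsOnPage = document.querySelectorAll("dialog").length;
    const dialogSupported =
      typeof HTMLDialogElement !== "undefined" &&
      "showModal" in document.createElement("dialog");
    // WebPageTest records whatever the metric script returns.
    return JSON.stringify({ dialogsOnPage, dialogSupported });
  }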

Report:

  • Write a report which (a) evaluates our options for measuring the impact of known issues and our ability to detect new issues, and (b) proposes differential analysis (A/B testing) to measure the impact of fixed core bugs and/or shipped interventions on user retention (e.g. measuring time spent on the page; see the sketch after this item)
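
A rough sketch of the differential analysis mentioned in (b), assuming time spent on the page as the retention proxy. The cohort shape is an assumption, and a real analysis would need a significance test rather than a plain difference of means:

  // Compare a retention proxy between users who received a fix/intervention
  // and a control group. Cohort names and the metric are assumptions.
  interface Cohort {
    name: string;
    timeOnPageSeconds: number[]; // one sample per session
  }

  function mean(samples: number[]): number {
    return samples.reduce((sum, x) => sum + x, 0) / samples.length;
  }

  // A positive delta would suggest the fix kept users on affected pages longer.
  function retentionDelta(treatment: Cohort, control: Cohort): number {
    return mean(treatment.timeOnPageSeconds) - mean(control.timeOnPageSeconds);
  }

  // Example with made-up numbers:
  const delta = retentionDelta(
    { name: "intervention-shipped", timeOnPageSeconds: [120, 95, 240] },
    { name: "control", timeOnPageSeconds: [110, 80, 200] }
  );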

Honza: We haven't written down how we want to improve our calculation of impact. We want to make the existing scoring logic more accurate using data. The qualitative likelihood factor is one of the places where we don't have good data. This is what the actionable steps are about. Need to figure out how to run WebPageTest locally. The other option is to work with the upstream team. The scripts would be designed to audit pages and identify known web compat problems. That would identify how widespread an issue is across the internet. Need a script per issue. Not every issue can be found statically. Could build scripts to detect new issues, but this could be harder. Could use telemetry (new or existing) together with the collected data.

Tom: I agree with that.

Dennis: For the ranking, the biggest point of uncertainty is the actual factors we're using. We guessed the actual numerical values, but we can't tell if e.g. a minor visual issue is really 10x less important than a broken feature. Those numbers are easy to change.

Honza: Some issues that are reported, e.g. mouse events on disabled input elements: we should take those and figure out if the page is likely to be running into that issue, e.g. by looking for event handlers on a disabled element. Need to check those things depending on what the user is doing. Sometimes that could be hard; harder than fixing the issue. Might be better to focus on a few issues. Might want to try to detect use of the unsupported dialog element.
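
A rough sketch of the kind of per-issue audit script described above, for the disabled-element example. It is only a heuristic: it sees inline on* attributes but not listeners added via addEventListener, which is part of why detection can be harder than fixing the issue:

  // Heuristic audit for the "mouse events on disabled form controls" pattern.
  // Only inline on* attributes are visible to a static check, so this under-reports.
  function findSuspiciousDisabledControls(doc: Document): Element[] {
    const disabled = doc.querySelectorAll(
      "button[disabled], input[disabled], select[disabled], textarea[disabled]"
    );
    const mouseAttrs = ["onclick", "onmousedown", "onmouseup", "onmouseover"];
    return Array.from(disabled).filter((el) =>
      mouseAttrs.some((attr) => el.hasAttribute(attr))
    );
  }

  // A page with matches is only *likely* affected; whether a real user flow
  // depends on those handlers still needs confirmation.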

Tom: That would be the validation value; once we've identified that something is likely to be a top-20 issue we can verify that. We also want to move onto the longer tail to understand how big their impact is. The other part of this is that WebPageTest will mostly let us test the front page without scrolling or user interaction, so there are limits to that kind of testing. Might want to come up with manual testing of the top 1000 sites, but that's hard to maintain. Might be better to measure it with scripts/telemetry inside Firefox. The difference from existing telemetry is that it would be a deeper dive: are they using things in a way that seems like it would cause a webcompat issue? There's work to investigate whether it's possible to do in a performant way. We can measure metrics on live user data. Don't know if we can tie it to specific URLs.

James: Makes sense. It's hard to do this in a very uniform way: we're not going to have a number for every issue to say how frequent it is. Sometimes it might be easy, like checking how often people want to access window.print, but for some of the other issues that depend on complex user-interaction flows, we have no idea. This proposal makes sense to validate our hypothesis, but we're not going to completely automate the ranking for every issue. For the issues we can measure, we should. We can apply the learnings from that to other issues we discover, and we'll get an understanding of what we can and can't measure. I think we should start small and build some examples to see what works and what doesn't. Particularly for telemetry, I know there are strong performance considerations for tracking property accesses. We shouldn't be too confident that we can do everything, as we need to talk to other teams to see what kind of performance regression they are happy to accept.

Tom: I'd start with collecting telemetry for things where we currently don't know the impact, but where we can measure something, maybe even for an experimental subset of users.

James: I think we need to write this into a document and get the DOM team to have an opinion. For some use counters, there have been concerns that adding them changes the optimization properties of JavaScript calls, etc. I think we should be careful not to over-promise without talking to people about how big the impact is, and how realistic the plans are.

Tom: We're going to start by making some scripts to get some experience. We should start with a list of the top 20 issues that we think require a script to measure. Then we should understand how hard it is to measure these issues. We might need things added in the gecko layer. Then we can consider the perf impact. Until we understand what we want to do, it's hard to estimate the impact.

Honza: I agree that we should start small, learn, and make some progress. The scripts we want to introduce: could we run them as an audit against a page to see if it contains possible compat issues?

Tom: Yes, we could make it run in something like Lighthouse. We could also use it for diagnosis, like the webcompat panel at the moment, e.g. a click on a disabled element might not work in Firefox ESR. The first round of analysis will give us working scripts. Can integrate into telemetry if we find there's value in doing so. Similar to the process for developing interventions.
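
A sketch of how individual checks could be bundled into a page-level audit that a Lighthouse-style harness or the webcompat panel could surface. The CompatCheck shape and the example checks are illustrative assumptions, not an agreed list:

  // Aggregate per-issue checks into one page audit result.
  interface CompatCheck {
    id: string; // e.g. the knowledge-base entry or bug this maps to
    description: string;
    run: (doc: Document) => boolean; // true = page looks likely to be affected
  }

  const compatChecks: CompatCheck[] = [
    {
      id: "disabled-element-events",
      description: "Inline mouse handlers on disabled form controls",
      run: (doc) =>
        Array.from(
          doc.querySelectorAll("button[disabled], input[disabled]")
        ).some((el) => el.hasAttribute("onclick")),
    },
    {
      id: "dialog-element",
      description: "Page uses <dialog>, which older ESR releases may not support",
      run: (doc) => doc.querySelectorAll("dialog").length > 0,
    },
  ];

  function auditPage(doc: Document) {
    return compatChecks.map((check) => ({
      id: check.id,
      description: check.description,
      likelyAffected: check.run(doc),
    }));
  }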

James: I have two concerns about turning it into developer-facing features. One is that a webcompat issue could be fixed relatively fast. The other is that if we create a list of features that we can't support, the question is why we don't just fix them.

Tom: Don't need to prioritise metrics that make us look bad. Need to decide if we care about ESR. Rule of thumb should be: can we make a suggestion that will generally increase webcompat? There are some layout features that won't work the same across browsers. Giving users workarounds is the easiest way to get them to care. Doesn't have to be Firefox specific.

James: In general it would be better to fix these known interop issues at the platform layer e.g. via the Interop project. In practice we don't have huge lists of areas that authors should just avoid but aren't Firefox bugs.

Honza: We have an OKR to make a metric for progress on webcompat. Could we turn data into a metric? The Performance team has Speedometer; can we have something similar? Calculating the global state of webcompat is hard, but maybe we can measure progress with fixing issues and compare with user satisfaction metrics? Might take a few quarters to notice a trend.

Tom: Yes. But it's complicated. We can gather WebPageTest and telemetry data. Need to track the data. Want to correlate with user sentiment data. Need to measure enough issues. Might be able to say "we saw fewer instances of this issue on top 1000 sites" and attribute it to Firefox changes. Telemetry might tell us about the longer tail.

Dennis: I think there's a problem tracking compatibility over time. We would want to see the number of issues going down over time, but that's in conflict with collecting better data which will increase the numbers we see. I'm not sure if this is a good metric. It tracks our level of understanding as well as the actual size of the problem.

Tom: There will be too much data to tell which is true. On an issue by issue basis we can see improvements.

Dennis: It's comparable to counting the number of webcompat.com reports. It could be that it's getting less compatible as we get more reports, but it could just be that we're getting better at collecting data. We can track this on a team level, but using this for higher level goals seems more problematic.

James: There are things we can track that don't necessarily have that problem. For example, you could say "last quarter, we fixed 100 scoring points of known webcompat issues, so we know that we did something to help the problem." That doesn't tell us about the total size of the problem, but it tells us about our ability to change something. It depends on what the OKR's priority is: if we want to know whether the platform team is working on the right issues, then we can answer that. If the goal is to track whether Firefox is getting more or less compatible with the web, that is harder to do, for the reasons Dennis described. The latter is less useful for the Platform org, but it might be more useful for the higher-ups. But if you look at this data and make decisions on it, you should be very much aware of the limitations of the data.

Honza: We somehow need to learn more about this. We really want something to measure ROI. For now user satisfaction is the best metric we have.

James: User satisfaction is an important metric, but we have exactly the same problem: we can't correlate anything we do directly to user satisfaction. It depends on manual surveys, and we have no direct link to anything specific we do, like fixing a platform bug. We can't really measure whether a specific webcompat bug fix changes anything, because we have no way of measuring the impact - and this problem is common across Firefox.

Honza: Joe wants a report on all the things that we're going to try and learn. PPM might allow us to measure the impact of interventions and bug fixes. Should be a summary of what we've done and what's possible.

James: The ability to measure "we fixed this thing, and this fix changed this measure" would be useful, if we can do it. But maybe we'll learn that it makes no difference. We have to see what we find. That kind of differential analysis seems promising; it doesn't give us a total metric, but it gives us a way to validate a hypothesis.

WebCompat KB NG (Honza)

  • Status, next steps?
  • Blockers?

Honza: Any blockers? Any updates?

Dennis: I'm blocked on creating a GCP project; when I try to start a database I get an error. So we need to get approval to set up the initial infrastructure. I hope to get this resolved quickly. In the meantime I'll start creating a schema we can use locally without requiring GCP so that we're unblocked. I will have the schema in a day or so. Will schedule a meeting to chat about actual engineering. Might be Thursday or so. Expect to write actual code this week.

jgraham: Maybe we should set up a regular meeting for this?

Dennis: I'll find a timeslot, maybe on a Thursday.

Honza: Let me know if you need help.

Mixed content - page not secured (SV)

We have concluded in the past that issues where parts of the page are not secure (an error shown in the lock icon of the URL bar) because of mixed content are just a warning, and from our point of view nothing is broken. However, we still come across users who are a bit afraid that the warning might cost them visitors on sites where payments are in place/required for different services. Is there something else we can do to reassure them that this is not the case, besides the explanation of how mixed content works?

Context: https://github.com/webcompat/web-bugs/issues/116968

Dennis: auto-upgrading non-HTTPS requests is enabled in Beta and Nightly these days (https://bugzilla.mozilla.org/show_bug.cgi?id=1672106), making our behavior compatible with Chrome.

Dennis: no timeline for enabling that in Release yet (https://bugzilla.mozilla.org/show_bug.cgi?id=1779757). Lots of issues need to be resolved first. There's also a CSP issue for mixed content; that's probably a blocker for enabling it on Release. We want to make sure we get this right so we don't create security issues. It's actively being worked on. I can talk to Freddy.

Dennis: I also helped the webcompat issue author identify the problematic assets. This is usually easy for developers to fix. Also it's only a warning. It looks scary because it's actually a real security issue, but it's not our fault. Feel free to point me at issues where it's a problem.
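
As an illustration of the "identify the problematic assets" step: one quick heuristic is to list subresources an https: page still references over plain http:. This sketch only catches assets referenced directly in the markup, not ones loaded from scripts or CSS:

  // Quick heuristic for mixed-content candidates on an https: page: list
  // elements whose src/href still points at plain http:.
  function listInsecureSubresources(doc: Document): string[] {
    const selector = [
      'img[src^="http:"]',
      'script[src^="http:"]',
      'iframe[src^="http:"]',
      'audio[src^="http:"]',
      'video[src^="http:"]',
      'link[href^="http:"]',
    ].join(", ");
    return Array.from(doc.querySelectorAll(selector)).map(
      (el) => el.getAttribute("src") ?? el.getAttribute("href") ?? ""
    );
  }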

Tom: Low hanging fruit is addressed, what's left is the annoying stuff.

Dennis: We now have a spec, we just need to make sure that we're actually following it. There shouldn't be much webcompat risk, unlike e.g. cookie changes.