Working Groups/PRESC

From MozillaWiki

Performance Robustness Evaluation for Statistical Classifiers (PRESC)

PRESC was launched in the pilot cohort of Mozilla’s Building Trustworthy AI working group. It is a toolkit for evaluating machine learning classification models. Its goal is to provide insights into model performance that extend beyond standard scalar accuracy-based measures and into areas that tend to be underexplored in practice, including:

  • Generalizability of the model to unseen data for which the training set may not be representative
  • Sensitivity to statistical error and methodological choices
  • Performance evaluation localized to meaningful subsets of the feature space
  • In-depth analysis of misclassifications and their distribution in the feature space
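
As a rough illustration of the third point (performance localized to subsets of the feature space), the sketch below computes accuracy separately within bins of a single feature. It does not use the PRESC API; the function name accuracy_by_bin, the toy ages feature, the labels, and the bin edges are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy_by_bin(feature_values, y_true, y_pred, bin_edges):
    """Return {(lo, hi): (accuracy, n_samples)} for each half-open bin [lo, hi).

    Hypothetical helper for illustration; not part of PRESC.
    Samples falling outside all bins are ignored, and empty bins are omitted.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [n_correct, n_total]
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))
    for x, truth, pred in zip(feature_values, y_true, y_pred):
        for lo, hi in bins:
            if lo <= x < hi:
                counts[(lo, hi)][0] += int(truth == pred)
                counts[(lo, hi)][1] += 1
                break
    return {b: (correct / total, total) for b, (correct, total) in counts.items()}


# Toy data, made up for this example: one numeric feature, true labels,
# and the predictions of some already-trained classifier.
ages = [22, 25, 31, 38, 41, 47, 52, 58, 63, 67]
y_true = [0, 0, 1, 1, 0, 1, 1, 0, 1, 1]
y_pred = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1]

# A single scalar summary over the whole test set.
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# The same predictions, evaluated per region of the feature.
local = accuracy_by_bin(ages, y_true, y_pred, bin_edges=[20, 40, 50, 70])

print(f"overall accuracy: {overall:.2f}")
for (lo, hi), (acc, n) in sorted(local.items()):
    print(f"age in [{lo}, {hi}): accuracy {acc:.2f} over {n} samples")
```

On this toy data the overall accuracy is 0.70, while accuracy in the 50 to 70 bin is only 0.25. A single scalar score hides exactly this kind of localized weakness.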

Watch the video here to learn more about the problem PRESC solves.

We believe that these evaluations are essential for developing confidence in the selection and tuning of machine learning models intended to address user needs, and are important prerequisites towards building trustworthy AI.

PRESC is intended to help ML engineers, data scientists, developers, academics and activists evaluate the performance of machine learning classification models while developing and updating them, specifically in areas that tend to be underexplored, such as generalizability and bias. Our current focus on misclassifications, robustness and stability will help facilitate the inclusion of bias and fairness analyses in the performance reports, so that these can be taken into account when crafting or choosing between models.

An example script demonstrating how to run a report is available here.

There are a number of notebooks and explorations in the examples/ directory, but they are not guaranteed to run or be up to date: the package has recently undergone major changes, and we have not yet finished updating them.

Some well-known datasets are provided in CSV format in the datasets/ directory for exploration purposes.

All contributors can be found here.