Elmo/Retention Policy

Elmo generates a host of data in various forms. This document describes how the amount of data stored will be limited, backed up, or lost.

Right now, there's no retention policy. We store everything. A proposal for a policy is at the end of this document.

Data types

The data generated by Elmo is currently stored in two forms: SQL and plain files.

SQL
    Most of the process data and statistics are stored in SQL, through Django apps.
Files
    Buildbot stores build logs in individual text files, next to status pickles.

SQL

Right now, there's no immediate need to create a retention policy for our databases.

TODO: Ask IT when that will become necessary.

Files

There are different file types, of differing importance. They are ordered here by how badly they break the buildbot install when lost:

master-ball

master-ball/l10n-master/changes.pck stores the changes. This file should be pruned every now and then, but the pruning needs to make sure that the change number is kept. Right now, Axel occasionally does:

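cd site/master-ball/l10n-master
# stop the master before touching its pickle
../scripts/buildbot stop .
python
# at the Python prompt
import pickle
cm = pickle.load(open('changes.pck', 'rb'))
len(cm.changes)    # how many changes are stored, for reference
del cm.changes[:]  # empty the list in place, the cm object itself is kept
cm.parent = None
cm.saveYourself()
^D
# in the shell again
../scripts/buildbot start .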

If this file is lost, the change numbers start from scratch and conflict with the DB. That's fatal, and probably needs manual tweaking of the python file, similar to the above scriptlet.

TODO: Find out which built-in buildbot knob might control this for us.

build storage space

The build data storage space holds the files generated by the builds. Right now, that's /mnt/space/builds/l10n-master.

For each builder on each master, there's one python pickle file called builder. That's the buildstatus pickle.

TODO: Find out what happens if this file is lost.

For each build, there's a python pickle file named with just the build number, for example 1234. The information in those files is replicated in the SQL storage. Buildbot uses the last of those files to determine the next build number on restart, though, so the latest file needs to be kept.
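
To see which build pickle must survive any pruning, the highest-numbered file in a builder directory can be looked up with a few lines of Python. This is only a sketch: the function name latest_build_number is hypothetical, and it assumes that build pickles are exactly the files whose names consist of digits only. It does not reproduce buildbot's own restart logic.

```python
import os
import re

# Assumption: build pickles are the files named with digits only, like "1234".
BUILD_PICKLE = re.compile(r"^[0-9]+$")


def latest_build_number(builder_dir):
    """Return the highest build number that has a pickle in builder_dir.

    Returns None if the directory holds no build pickles at all.
    """
    numbers = [
        int(name)
        for name in os.listdir(builder_dir)
        if BUILD_PICKLE.match(name)
    ]
    return max(numbers) if numbers else None
```

A pruning script could call this first and skip the file with that number.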

TODO: Find out how to resurrect if last file is lost.
TODO: Find out which built-in buildbot knob might control this for us.

In addition to the pickle file, there are several log files per step per build. They're named 1234-log-stepname-logname[.bz2], and are (potentially compressed) buildbot log files. They're basically a multiplexed set of streams, with headers, stdout, stderr, and, as an extension by elmo, json.
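
The naming scheme above can be taken apart with a regular expression, which is handy for any script that has to tell log files from pickles. The following is a sketch rather than code from elmo or buildbot: parse_log_name is a hypothetical helper, and it assumes that the log name (the last part) contains no hyphen, while step names may.

```python
import re

# Assumed pattern: <buildnumber>-log-<stepname>-<logname>[.bz2]
# The log name is assumed to be hyphen-free; the step name may contain hyphens.
LOG_NAME = re.compile(
    r"^(?P<build>[0-9]+)-log-(?P<step>.+)-(?P<log>[^-]+?)(?P<bz2>\.bz2)?$"
)


def parse_log_name(filename):
    """Split a log file name into its parts, or return None if it isn't one."""
    m = LOG_NAME.match(filename)
    if m is None:
        return None
    return {
        "build": int(m.group("build")),
        "step": m.group("step"),
        "log": m.group("log"),
        "compressed": m.group("bz2") is not None,
    }
```

Build pickles such as 1234 and the builder pickle don't match and yield None.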

The data for all but the compare step is hg checkout and revision state. That data ends up in the build properties, and thus in the SQL database, too. It is mostly interesting for debugging problems where results are inconsistent with expectations.

If these files are lost, not much information is lost. There's no functional bustage aside from 404 errors when trying to read the actual logs.

The compare-locales step logs are named like 1234-log-moz_inspectlocales[_dirs]-stdio[.bz2], and contain the actual output of compare-locales, such as which strings in which file are missing or obsolete, and which errors were caught. The summary numbers (counts of missing strings, errors, etc.) are reflected in the SQL database. The detailed results are used both by localizers, to find out what to do, and by drivers, to assess how badly a localization is lagging and how bad its errors are.

If these files are lost, the compare view can't show the detailed html tree.

Proposal

  • Set buildHorizon in buildbot's config to limit the build status pickles to a finite number of files per builder. Keep logHorizon = None to keep log files in general.
  • Manually prune log files with scripts/cron jobs
    • For compare-locales, keep
    • For other logs, keep a month's worth of data
    • When pruning the files, also remove the entries in the DB, so that steps appear in the html view as never having had any log files.
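
For the first bullet, the settings might look like the following fragment of a buildbot master.cfg. This is a sketch only: the value 500 is a made-up placeholder, not a number from this proposal, and the exact pruning behavior should be checked against the buildbot version in use.

```python
# Fragment of a buildbot master.cfg; buildbot reads the BuildmasterConfig dict.
c = BuildmasterConfig = {}

# Placeholder value, not taken from the proposal: limit the build status
# pickles kept on disk per builder.
c['buildHorizon'] = 500

# No log horizon, log files are pruned by external scripts instead.
c['logHorizon'] = None
```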

Looking at the DB, without checking the files, that might cut the number of files to some 15%.
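
The manual log pruning from the proposal could start from a script like the one below, run from cron. It is a minimal sketch under several assumptions: the helper names are hypothetical, "a month" is taken as 30 days, file age is judged by modification time, and compare-locales logs are recognized by the moz_inspectlocales step name prefix. It only touches files, so removing the matching DB entries would have to be added separately.

```python
import os
import re
import time

# Assumed log file naming: <buildnumber>-log-<stepname>-<logname>[.bz2]
LOG_FILE = re.compile(r"^[0-9]+-log-(?P<rest>.+?)(\.bz2)?$")
KEEP_PREFIX = "moz_inspectlocales"  # compare-locales step logs are kept
MAX_AGE = 30 * 24 * 60 * 60         # "a month", taken here as 30 days


def prunable_logs(builder_dir, now=None):
    """Yield paths of log files that are old and not compare-locales logs."""
    now = time.time() if now is None else now
    for name in sorted(os.listdir(builder_dir)):
        m = LOG_FILE.match(name)
        if m is None or m.group("rest").startswith(KEEP_PREFIX):
            continue  # a pickle or other file, or a compare-locales log
        path = os.path.join(builder_dir, name)
        if now - os.path.getmtime(path) > MAX_AGE:
            yield path


def prune(builder_dir, dry_run=True):
    """Return the prunable paths; delete them only if dry_run is False."""
    paths = list(prunable_logs(builder_dir))
    if not dry_run:
        for path in paths:
            os.remove(path)
    return paths
```

The dry run is the default so that the list of candidates can be compared with the DB estimate before anything is deleted.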