Fossology

From MozillaWiki

We are investigating the use of Fossology for a number of license-related tasks.

Codebase License Analysis

  1. Upload a copy of a codebase to the server
  2. Be able to configure some bits of the codebase as "not to be scanned" or "not relevant"
    • It's very useful to be able to do this in the app, and to apply the same exclusions to subsequent versions of the codebase
  3. Scan the codebase to assign a license to a subset of files based on their contents (the nomos agent)
  4. Scan it again to infer a license for some other subset based on the licenses of the files around them, allocating a probability of correctness
    • This needs to take into account the licenses of all the files in that file's directory, plus certain files in parent directories
  5. Scan it again to create a report on possible license issues (proprietary code, or incompatibilities, etc.) that need to be investigated manually
    • If necessary, it could be this scan which discounts some parts of the tree as irrelevant
  6. Resolve those issues either by annotation, or by fixing the source and uploading a new version
  7. When new versions are uploaded, keep and apply the decisions and annotations which were made for the old versions
  8. Carrying decisions over includes the option to keep them when a file's contents have changed but its path is the same
  9. Be able to review later what we did, as an audit trail
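
As a rough illustration of step 4, the following Python sketch guesses a license for an unlicensed file from the licenses of the other files in its directory. It is not Fossology code: the function name `infer_license`, the path-to-license mapping, and the use of a simple vote share as the "probability of correctness" are all assumptions made for this example, and it ignores the files in parent directories that a real implementation would also need to consult.

```python
from collections import Counter
from pathlib import PurePosixPath


def infer_license(path, known):
    """Guess a license for `path` from its directory siblings.

    `known` maps file paths to licenses already found by a content scan
    (hypothetical data structure, not a Fossology API).
    Returns (license, confidence), or (None, 0.0) if no sibling is licensed.
    """
    parent = PurePosixPath(path).parent
    sibling_licenses = [
        lic for p, lic in known.items()
        if p != path and PurePosixPath(p).parent == parent
    ]
    if not sibling_licenses:
        return None, 0.0
    # Take the most common sibling license; confidence is its vote share.
    license_name, votes = Counter(sibling_licenses).most_common(1)[0]
    return license_name, votes / len(sibling_licenses)


known = {
    "src/a.c": "MPL-2.0",
    "src/b.c": "MPL-2.0",
    "src/c.c": "BSD-3-Clause",
    "lib/x.c": "GPL-2.0",
}
guess = infer_license("src/d.c", known)
print(guess)
```

Here two of the three licensed siblings of `src/d.c` are MPL-2.0, so that license is proposed with a confidence of two thirds. A low confidence would be a natural trigger for the manual investigation in step 5.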

Possible issues:

  • If Fossology agents run in parallel, is that a problem?
  • The buckets mechanism seems to have a very simple interface, which doesn't allow DB access; will steps 3 and 4 have to be done by agents?
  • Fossology currently does very little with file paths, so if a file changes but the path stays the same, annotations and license tweaks are lost
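
To make the path-based requirement concrete, here is a minimal Python sketch of carrying decisions from one upload to the next, keyed by file path. The function `carry_over`, the shape of `old_decisions`, and the `keep_if_changed` flag are hypothetical names for this illustration and do not correspond to anything in Fossology.

```python
import hashlib


def content_hash(data):
    """Return a hex digest identifying the file contents."""
    return hashlib.sha256(data).hexdigest()


def carry_over(old_decisions, new_files, keep_if_changed=False):
    """Apply decisions from a previous upload to a new one, matched by path.

    old_decisions: path -> (content hash, decision text)   (assumed layout)
    new_files:     path -> file contents as bytes
    Returns (kept, needs_review): decisions that still apply, and paths whose
    contents changed and whose decisions were therefore not carried over.
    """
    kept = {}
    needs_review = []
    for path, data in new_files.items():
        if path not in old_decisions:
            continue  # new file, nothing to carry over
        old_hash, decision = old_decisions[path]
        if old_hash == content_hash(data) or keep_if_changed:
            kept[path] = decision
        else:
            needs_review.append(path)
    return kept, needs_review


old = {
    "a.c": (content_hash(b"x"), "BSD text already reproduced"),
    "b.c": (content_hash(b"y"), "not relevant"),
}
new = {"a.c": b"x", "b.c": b"changed", "c.c": b"brand new"}

strict = carry_over(old, new)
lenient = carry_over(old, new, keep_if_changed=True)
print(strict)
print(lenient)
```

With the default setting, the decision for `b.c` is flagged for review because its contents changed; with `keep_if_changed=True` it is kept because the path is the same.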

Things we don't need to do:

  • Scan the code against a large corpus to look for code which has been taken from elsewhere

Meeting License Requirements for "Include Text" Licenses

Some licenses, mainly BSD-like ones, require you to reproduce a copy of the license text with the distribution. One needs to be able to extract all such blocks of text from a codebase, de-dupe them (intelligently) and produce a file listing them all.

We already have a script to do this, but Fossology might well do an equally competent job with less hassle. Or, at least, it could provide a list of target files from which a license should be extracted.
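
For illustration only, the de-duplication step might look something like the Python sketch below. It is not the existing script: the helpers `normalize` and `dedupe_license_texts` are hypothetical, and the "intelligence" is limited to ignoring leading comment markers, whitespace and case. Texts that differ in any other way, such as the copyright holder line, are treated as distinct and kept separately.

```python
import re

# Leading comment markers such as "/*", "*", "//" or "#" (an assumed, partial list).
COMMENT_PREFIX = re.compile(r"^\s*(?:/\*+|\*+/|\*|//|#)\s?")


def normalize(text):
    """Strip leading comment markers, collapse whitespace, and lowercase."""
    lines = [COMMENT_PREFIX.sub("", line) for line in text.splitlines()]
    return " ".join(" ".join(lines).split()).lower()


def dedupe_license_texts(blocks):
    """Group license blocks that are identical after normalization.

    blocks: iterable of (path, license text) pairs.
    Returns a list of (first text seen, [paths using it]) in first-seen order.
    """
    seen = {}
    for path, text in blocks:
        key = normalize(text)
        if key not in seen:
            seen[key] = (text, [])
        seen[key][1].append(path)
    return list(seen.values())


blocks = [
    ("a.c", " * Redistribution and use\n * permitted."),
    ("b.py", "# Redistribution  and use\n# permitted."),
    ("c.c", "Permission is hereby granted"),
]
unique = dedupe_license_texts(blocks)
for text, paths in unique:
    print(paths)
```

The first two blocks differ only in comment style and spacing, so they collapse into one entry listing both files. The list of paths per entry could double as the "list of target files" mentioned above.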

Possible Issues with Fossology

  • Pages like this one suggest that direct DB manipulation is required in a concerningly large number of common scenarios
  • Does the DB access library that they provide for agents to use only work with C? Can we bind it to a higher-level language?