Fossology
From MozillaWiki
We are investigating using Fossology for a number of license-related tasks.
Codebase License Analysis
- Upload a copy of a codebase to the server
- Be able to configure some bits of the codebase as "not to be scanned" or "not relevant"
- It's very useful to be able to do this in the app, and to apply the same exclusions to subsequent versions of the codebase
- Scan the codebase to allocate a license to a subset of files based on their contents (Fossology's nomos agent)
- Scan it again to infer a license for some other subset based on the licenses of the files around them, allocating a probability of correctness
- This needs to take into account the licenses of all the files in that file's directory, plus certain files in parent directories
- Scan it again to create a report on possible license issues (proprietary code, or incompatibilities, etc.) that need to be investigated manually
- If necessary, it could be this scan which discounts some parts of the tree as irrelevant
- Resolve those issues either by annotation, or by fixing the source and uploading a new version
- When new versions are uploaded, keep and apply the decisions and annotations which were made for the old versions
- This includes the option to keep decisions if the file contents have changed (but the path is the same)
- We need to be able to review later what was done, as an audit trail
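The neighbour-based inference step above could be sketched roughly as follows. This is only an illustration of the idea, not anything Fossology provides: the function name, the `LICENSE_FILE_NAMES` set and the scoring (the winning license's share of the votes) are all assumptions made for the example.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumption: which file names in parent directories count as license hints.
LICENSE_FILE_NAMES = {"LICENSE", "LICENSE.txt", "COPYING"}


def infer_license(path, known):
    """Guess a license for `path` from the files around it.

    `known` maps file paths to the license detected from their contents.
    Returns (license, score), where score is the winning license's share
    of the votes, or (None, 0.0) if there is no evidence at all.
    """
    target = PurePosixPath(path)
    votes = Counter()
    for other, lic in known.items():
        other_path = PurePosixPath(other)
        if other_path == target:
            continue
        if other_path.parent == target.parent:
            # Every licensed file in the same directory gets a vote.
            votes[lic] += 1
        elif (other_path.name in LICENSE_FILE_NAMES
              and other_path.parent in target.parents):
            # License files in ancestor directories also get a vote.
            votes[lic] += 1
    if not votes:
        return None, 0.0
    lic, count = votes.most_common(1)[0]
    return lic, count / sum(votes.values())


known = {
    "src/lib/a.c": "MPL-2.0",
    "src/lib/b.c": "MPL-2.0",
    "src/lib/c.c": "BSD-3-Clause",
    "LICENSE": "MPL-2.0",
    "other/x.c": "GPL-2.0",
}
guess = infer_license("src/lib/util.c", known)
```

A real implementation would probably want to weight a top-level license file differently from a sibling source file; the equal weighting here is just the simplest choice.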
Possible issues:
- If Fossology agents run in parallel, is that a problem?
- The buckets mechanism seems to have a very simple interface, which doesn't allow DB access; will 3 and 4 have to be done by agents?
- Fossology currently does very little with file paths, so if a file changes but the path stays the same, annotations and license tweaks are lost
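To make the path issue concrete, here is a minimal sketch of the behaviour we would want: decisions keyed by path, with a content hash used only to flag files that have changed. The record layout and the `carry_over` function are hypothetical and do not reflect Fossology's actual data model.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a hex SHA-256 digest of the file contents."""
    return hashlib.sha256(data).hexdigest()


def carry_over(old_decisions, new_files, keep_if_changed=True):
    """Carry decisions from an old version over to a new upload.

    old_decisions: path -> {"hash": str, "note": str}  (hypothetical layout)
    new_files:     path -> bytes
    Returns path -> {"hash", "note", "changed"} for the new version.
    """
    carried = {}
    for path, data in new_files.items():
        old = old_decisions.get(path)
        if old is None:
            continue  # new file: nothing to carry over
        new_hash = content_hash(data)
        changed = new_hash != old["hash"]
        if changed and not keep_if_changed:
            continue  # caller chose to drop decisions for modified files
        carried[path] = {
            "hash": new_hash,
            "note": old["note"],
            "changed": changed,  # flag so a reviewer can re-check it
        }
    return carried


old = {
    "a.c": {"hash": content_hash(b"x"), "note": "BSD, OK"},
    "b.c": {"hash": content_hash(b"y"), "note": "checked manually"},
}
new = {"a.c": b"x", "b.c": b"y, edited", "c.c": b"z"}
kept = carry_over(old, new)
strict = carry_over(old, new, keep_if_changed=False)
```

Here `kept` retains the decision for `b.c` but marks it as changed, while `strict` drops it; `c.c` has no earlier decision in either case.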
Things we don't need to do:
- Scan the code against a large corpus to look for code which has been taken from elsewhere
Meeting License Requirement for "Include Text" Licenses
Some licenses, mainly BSD-like ones, require you to reproduce a copy of the license text with the distribution. One needs to be able to extract all such blocks of text from a codebase, de-dupe them (intelligently) and produce a file listing them all.
We already have a script to do this, but Fossology might well do an equally competent job with less hassle. Or, at least, it could provide a list of target files from which a license should be extracted.
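As an illustration of the de-duping step (this is not our existing script, and the names are made up for the example), one crude approach is to normalise each extracted block by stripping leading comment markers, collapsing whitespace and lowercasing, then group blocks whose normalised form is identical:

```python
import re

# Leading comment markers to strip from each line: /*, */, *, // and #.
_COMMENT_PREFIX = re.compile(r"^\s*(?:/\*+|\*+/|\*|//|#)\s?")


def normalise(text):
    """Reduce a license block to a comparison key."""
    lines = [_COMMENT_PREFIX.sub("", line) for line in text.splitlines()]
    return " ".join(" ".join(lines).split()).lower()


def dedupe(blocks):
    """Group (path, text) pairs whose normalised text is identical.

    Returns a list of (first_seen_text, [paths]) in first-seen order.
    """
    groups = {}
    for path, text in blocks:
        key = normalise(text)
        if key not in groups:
            groups[key] = (text, [])
        groups[key][1].append(path)
    return list(groups.values())


blocks = [
    ("a.c", "/* Copyright (c) Example\n * All rights reserved.\n */"),
    ("b.py", "# Copyright (c) Example\n# All rights reserved."),
    ("c.c", "// Some other notice"),
]
unique = dedupe(blocks)
```

This only catches differences in comment style, whitespace and case. "Intelligent" de-duping, for example of texts that differ only in the copyright holder or year, would need something fuzzier than exact matching.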
Possible Issues with Fossology
- Pages like this one suggest that direct DB manipulation is required in a concerningly large number of common scenarios
- Does the DB access library that they provide for agents to use only work with C? Can we bind it to a higher-level language?