User:David.humphrey/Tools

From MozillaWiki

DRAFT


Introduction

Today, the tools used by developers of large and complex software systems, such as Mozilla's Firefox web browser or the Linux operating system, are still based on technology created in the 1970s. Early work on software development tools paved the way for the advances in end-user software we use today, yet the software we now produce is far more refined and flexible than the tools used to create it. Software developers make do with what they have, choosing to focus on the construction of new software rather than revamping their existing tools. This has resulted in a technical debt that we are now being asked to repay, as our existing software needs to be updated for use on mobile devices, parallel desktop environments, and the web.

Mozilla has been experiencing this issue firsthand for many years. Its Firefox web browser is among the most popular in the world, with more than 330 million users worldwide. Recently, the web browser market has seen greater competition, as vendors such as Apple, Google, and Microsoft have worked to narrow the gap with Mozilla. Keeping pace with these large corporations has meant that Mozilla has had to rethink its development strategy. As Fred Brooks noted in his classic text on software engineering, The Mythical Man-Month, "adding manpower to a late software project makes it later." However, if those same people are instead tasked with creating new tools to solve these problems, the leverage of a few people is greatly increased.

Tools as Leverage

To evolve its more than 8 million lines of source code for modern computers and devices, Mozilla created a small team to explore the possibility of developing tools that automate the work of rewriting its code. Mozilla uses the C++ programming language, which is powerful and fast, but also incredibly complex. To automate the process of rewriting C++ code, it was first necessary to investigate methods for parsing it. Up to this point, no serious work had been done in academia or industry to properly solve this problem: most researchers worked with the C language and theorized that their work could be applied to C++ too.

After a number of years of experiments and research, Mozilla created a series of tools that could parse, analyze, and rewrite C++ source code using simple scripts. Today these tools are singular in their scope and power.

Dehydra

Combining the power of the GCC compiler and the expressiveness of the JavaScript language, Dehydra is a lightweight, scriptable static analysis tool capable of analyzing C++ code. Dehydra embeds Mozilla's SpiderMonkey JavaScript engine into GCC via a GCC plugin. This means that any source code that GCC can compile can be analyzed with Dehydra, using simple scripts. As the code is compiled, the Dehydra plugin makes the C++ AST nodes available in the form of JavaScript objects. This allows type declarations, static and member functions, variables, statements, etc. to be gathered and examined with JavaScript.
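To give a feel for what "AST nodes as JavaScript objects" means in practice, here is a minimal sketch. It does not use Dehydra's actual API: the object shape (`kind`, `name`, `members`, `isFunction`) and the helper `memberFunctionNames` are hypothetical stand-ins, chosen only to show how a script might gather the member functions of a class once the compiler has handed it a plain object.

```javascript
// Hypothetical, simplified stand-in for the object an analysis plugin might
// pass to a script for one C++ class. All field names here are assumptions
// made for illustration, not Dehydra's real object layout.
const sampleType = {
  kind: "class",
  name: "nsFoo",
  members: [
    { name: "nsFoo::Init", isFunction: true },
    { name: "nsFoo::~nsFoo", isFunction: true },
    { name: "nsFoo::mCount", isFunction: false },
  ],
};

// Collect the names of all member functions of a type object.
function memberFunctionNames(type) {
  return (type.members || [])
    .filter((member) => member.isFunction)
    .map((member) => member.name);
}

console.log(memberFunctionNames(sampleType));
// prints [ 'nsFoo::Init', 'nsFoo::~nsFoo' ]
```

Because the data arrives as ordinary objects and arrays, the script needs nothing beyond standard JavaScript to filter and report on it.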

Dehydra can be used as a semantic grep tool, able to look beyond macros and other C++ features that make textual analysis of C++ so difficult. Scripts can be written to extract semantic information or find bugs in source code, since Dehydra allows for much more error checking than the C++ compiler performs by itself.
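As an illustration of the kind of check a semantic grep makes possible, the sketch below flags classes that declare virtual methods but no virtual destructor. A textual search can miss such cases when declarations come from macros, whereas a compiler-backed tool sees the expanded result. The record shape (`members`, `isVirtual`, `isDestructor`) and the function names are assumptions for this example, not Dehydra's real interface.

```javascript
// Hypothetical type records, as a compiler-backed tool might report them
// after macro expansion. The shape is an assumption for illustration only.
const types = [
  { name: "Base", members: [
    { name: "~Base", isVirtual: false, isDestructor: true },
    { name: "Run", isVirtual: true, isDestructor: false },
  ] },
  { name: "Safe", members: [
    { name: "~Safe", isVirtual: true, isDestructor: true },
    { name: "Run", isVirtual: true, isDestructor: false },
  ] },
  { name: "Plain", members: [
    { name: "Get", isVirtual: false, isDestructor: false },
  ] },
];

// True when the class has at least one virtual method but no virtual destructor.
function lacksVirtualDestructor(type) {
  const hasVirtualMethod = type.members.some((m) => m.isVirtual && !m.isDestructor);
  const hasVirtualDtor = type.members.some((m) => m.isDestructor && m.isVirtual);
  return hasVirtualMethod && !hasVirtualDtor;
}

// Produce one warning string per offending class.
function checkTypes(list) {
  return list
    .filter(lacksVirtualDestructor)
    .map((t) => `${t.name}: virtual methods but no virtual destructor`);
}

console.log(checkTypes(types));
// prints [ 'Base: virtual methods but no virtual destructor' ]
```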

Treehydra

Treehydra is Dehydra’s heavy-duty companion. It works with GCC’s internal GIMPLE intermediate representation, which allows Treehydra analyses to be run after any optimization pass in GCC. Treehydra scripts also get access to GIMPLE control flow graphs (CFGs), and access to CFGs enables more advanced path-based analyses. In effect, anything GCC “knows” about a piece of C++ is available to JavaScript code through Treehydra.
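A path-based analysis of the kind mentioned above can be sketched with a toy CFG. The example asks whether any path from the block that acquires a lock to the function's exit skips the block that releases it. The graph layout (`succs`, `calls`), the block names, and `findPathAvoiding` are all hypothetical; real GIMPLE CFGs carry far more detail, and this is only a depth-first search under those simplifying assumptions.

```javascript
// Hypothetical CFG: each basic block lists its successors and the functions
// it calls. This layout is an assumption, not Treehydra's real data model.
const cfg = {
  entry: { succs: ["lock"], calls: [] },
  lock: { succs: ["work", "early"], calls: ["Lock"] },
  work: { succs: ["unlock"], calls: [] },
  early: { succs: ["exit"], calls: [] }, // returns without unlocking
  unlock: { succs: ["exit"], calls: ["Unlock"] },
  exit: { succs: [], calls: [] },
};

// Depth-first search for a path from `from` to `to` that never enters a
// block calling `callee`. Returns the path as a list of block ids, or null.
function findPathAvoiding(graph, from, to, callee) {
  const visited = new Set();
  function dfs(id, path) {
    if (visited.has(id)) return null;
    if (graph[id].calls.includes(callee)) return null; // path is "safe" here
    visited.add(id);
    const here = path.concat(id);
    if (id === to) return here;
    for (const next of graph[id].succs) {
      const found = dfs(next, here);
      if (found) return found;
    }
    return null;
  }
  return dfs(from, []);
}

console.log(findPathAvoiding(cfg, "lock", "exit", "Unlock"));
// prints [ 'lock', 'early', 'exit' ]
```

A non-null result is a witness path that a script could report as a possible missing unlock. Marking blocks as visited globally is sufficient here because the question is plain reachability, which does not depend on how a block was reached.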

Right now, Dehydra and Treehydra require a custom, patched version of GCC. However, in the future, starting with GCC 4.5, plugin support will obviate the need for a patched GCC. That’s very exciting, especially because it will mean that static analysis of C++ code will become available to orders of magnitude more programmers!

Pork

While Dehydra and Treehydra give access to C++ AST and control flow information, they don't provide sufficient data to allow automated rewrites. This is the purpose of a third tool, Pork. Pork is a C++ parsing and rewriting tool chain. The core of Pork is a C++ parser that provides exact character positions for the start and end of every AST node, as well as macro and preprocessing substitution information. This information allows C++ to be automatically rewritten in a precise way.
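To show why exact start and end positions matter, here is a small sketch of position-based rewriting. It is not Pork's implementation or output format: the edit records (`start`, `end`, `text`), the function `applyEdits`, and the sample substitution are assumptions for illustration. The idea is that once every node's character range is known, a rewrite reduces to splicing replacement text into the original file while leaving everything else, including whitespace and comments, untouched.

```javascript
// Apply a list of hypothetical edit records to a source string. Each edit
// replaces the half-open character range [start, end) with `text`.
function applyEdits(source, edits) {
  // Work from the end of the file backwards so earlier offsets stay valid.
  const ordered = [...edits].sort((a, b) => b.start - a.start);
  let result = source;
  let lastStart = source.length;
  for (const edit of ordered) {
    if (edit.start > edit.end || edit.end > lastStart) {
      throw new Error("overlapping or invalid edit");
    }
    result = result.slice(0, edit.start) + edit.text + result.slice(edit.end);
    lastStart = edit.start;
  }
  return result;
}

// Illustrative input: offsets are counted by hand for this one line.
const source = "PRBool ok = PR_TRUE;";
const edits = [
  { start: 0, end: 6, text: "bool" },   // the type name
  { start: 12, end: 19, text: "true" }, // the initializer
];

console.log(applyEdits(source, edits));
// prints bool ok = true;
```

Applying edits from the last offset to the first is a deliberate choice: each splice changes the length of the text after it, so going backwards means the offsets of the remaining edits never need to be recomputed.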