DXR Language-Independent Schema

From MozillaWiki

As DXR gains the ability to index more languages, we're going to need a non-C++-centric schema to hold the data. I don't think the existing schema is horrible, but I'm sure there are C++-isms hiding out in there. This page is a place to enumerate those isms and develop alternatives.

Strawman: jcranmer's documentation schema preview

I started developing a schema to add Doxygen-like documentation support to DXR. The original documentation schema focuses on how to organize documentation and doesn't consider the needs of other DXR features, so I've extended it slightly to meet those requirements.

The core of the schema is that everything is organized into "entities," which are anything that can be reasoned about in documentation (or in a cross-reference). As far as documentation is concerned, entities fall into four main categories: files (self-explanatory), namespaces (things that contain other entities but don't have a well-defined single location), aggregates (things that contain other entities and also have a meaningful definition), and leaves (things that don't contain other entities). Entities also have kind names to distinguish between different kinds of aggregates (e.g., C++ structs/classes/macros). All entities are resolved and named by fully-qualified names, which are unique... almost.
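A minimal relational sketch of the entity model above, assuming SQLite accessed through Python's standard sqlite3 module. The table and column names (entities, category, kind, qualname, parent_id) and the helpers open_db, add_entity, and lookup are hypothetical, not part of any existing DXR schema. The four documentation categories become a checked column, the language-specific kind is a free-form string, and qualname is indexed but deliberately not declared UNIQUE, because fully-qualified names are only almost unique.

```python
import sqlite3

# Hypothetical table layout for the entity model; not DXR's actual schema.
SCHEMA = """
CREATE TABLE entities (
    id        INTEGER PRIMARY KEY,
    category  TEXT NOT NULL
              CHECK (category IN ('file', 'namespace', 'aggregate', 'leaf')),
    kind      TEXT NOT NULL,                   -- language-specific, e.g. 'class'
    qualname  TEXT NOT NULL,                   -- fully-qualified name
    parent_id INTEGER REFERENCES entities(id)  -- containing entity, if any
);
-- Indexed for lookup, but not UNIQUE: names are only "almost" unique.
CREATE INDEX entities_qualname ON entities (qualname);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_entity(conn, category, kind, qualname, parent_id=None) -> int:
    """Insert one entity and return its row id."""
    cur = conn.execute(
        "INSERT INTO entities (category, kind, qualname, parent_id) "
        "VALUES (?, ?, ?, ?)",
        (category, kind, qualname, parent_id),
    )
    return cur.lastrowid


def lookup(conn, qualname):
    """Return every entity with this name; callers must cope with several."""
    return conn.execute(
        "SELECT id, category, kind FROM entities WHERE qualname = ? ORDER BY id",
        (qualname,),
    ).fetchall()


if __name__ == "__main__":
    db = open_db()
    ns = add_entity(db, "namespace", "namespace", "mozilla")
    cls = add_entity(db, "aggregate", "class", "mozilla::Foo", ns)
    # Two overloads sharing one qualified name: an assumed source of collisions.
    add_entity(db, "leaf", "function", "mozilla::Foo::bar", cls)
    add_entity(db, "leaf", "function", "mozilla::Foo::bar", cls)
    print(lookup(db, "mozilla::Foo::bar"))
```

Because lookup returns a list rather than a single row, anything built on this layout has to decide what to do when a name resolves to more than one entity; the sketch only makes that ambiguity visible, it does not resolve it.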

XXX: Explain how to merge entities

From the standpoint of DXR's more advanced features, the documentation breakdown is insufficient. A more informative breakdown classifies entities into:

  • Types -- entities that represent a declarable type of a variable, like typedefs, classes, structs, etc.
  • Macros -- entities that represent things that expand into code. C++ templates are a better example here than C macros. Macro parameters need not be specific values; they can be types or AST subtrees, for example. This means that the code expanded by a macro may do things like define several types, functions, etc.
  • Functions -- entities that can be called. Needed for the calltree support.
  • Variables -- entities that represent things that can take on values. Their types may or may not be statically known; even in dynamic languages, we would still like to infer types wherever they can be determined statically.
  • Constants -- variables whose values are constant and known at compile-time (C/C++ enums, a subset of C macros, etc.)
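To make this breakdown concrete, here is a small Python sketch in which each entity carries one of the five roles listed above alongside its language-specific kind name. The Role enum, the Entity fields, the call_tree_candidates helper, and the sample entity names are all hypothetical. The point is only that features such as the call tree can select on role rather than on C++-specific kinds, that a C++ class template lands under Macro, and that an unknown static type is represented explicitly rather than assumed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    """Feature-oriented classification (hypothetical names)."""
    TYPE = "type"            # declarable type of a variable
    MACRO = "macro"          # expands into code: C macros, C++ templates
    FUNCTION = "function"    # callable; feeds the call tree
    VARIABLE = "variable"    # takes on values; type may be unknown
    CONSTANT = "constant"    # value known at compile time


@dataclass(frozen=True)
class Entity:
    qualname: str
    kind: str                          # language-specific kind name
    role: Role
    static_type: Optional[str] = None  # None when the type is not statically known


def call_tree_candidates(entities):
    """Return the names of entities that can participate in the call tree."""
    return [e.qualname for e in entities if e.role is Role.FUNCTION]


entities = [
    Entity("mozilla::Vector", "class template", Role.MACRO),
    Entity("mozilla::Foo::bar", "method", Role.FUNCTION),
    Entity("Color::Red", "enumerator", Role.CONSTANT),
    Entity("gFrameCount", "global variable", Role.VARIABLE, "uint32_t"),
    Entity("config.timeout", "attribute", Role.VARIABLE),  # dynamic: type unknown
]
print(call_tree_candidates(entities))  # ['mozilla::Foo::bar']
```

Whether the role should be stored separately from the documentation category or derived from the kind name is an open design choice that this sketch does not settle.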

Other Storages

See also alternative DXR Storages. We need not confine ourselves to the relational model.