Changes

DXR Storages

435 bytes added, 06:19, 24 February 2014
Tentative Roadmap: Went looking for a trigram-from-regex extractor in Python. Crickets.
# Retool query machinery to run on ES and to be line-based. (If speed is awesome even with pathological regexes (unlikely), we can deploy here.)
# Build a routine to extract trigrams from regexes. (There is no apparent existing work in Python. We could require re2 and call through to its <code>Prefilter::Info::TakeMatch</code> etc., but writing our own looks preferable: it doesn't seem too hard or too CPU-intensive when you start from <code>sre_parse.parse()</code> in the stdlib; I'd have to do some work in any case to bridge Python to that C++ routine; and fewer build steps, git submodules, and build-time checkouts make for a lower contributor support load.) Add trigram indices for lines, and switch to a filtered query for regexes. Deploy.
# Get rid of the rest of the on-disk instance, embed necessary region and ref offsets and payloads into the ES index (out of band with the source code), and build pages at request time. Add caching if needed. Something like config.py might still hang around so we don't have to fetch trivial things like WWW_ROOT over a socket.
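
A rough sketch of the trigram extraction in step 2, starting from the stdlib regex parser as suggested above. This is not DXR code: the function name <code>required_trigrams</code> is hypothetical, and the approach is deliberately conservative. It only looks at runs of literal characters at the top level of the pattern and gives up (returns no trigrams) on case-insensitive patterns, so it finds fewer trigrams than re2's prefilter would. An empty result means "no filter possible; scan everything."

```python
import re

try:
    # Python 3.11+ keeps the parser here; the old module name is deprecated.
    from re import _parser as sre_parse
except ImportError:
    import sre_parse


def required_trigrams(pattern):
    """Return trigrams that every line matching `pattern` must contain.

    Conservative sketch: only top-level runs of literal characters are
    considered. Groups, alternations, character classes, repeats, and
    anchors simply end the current run.
    """
    parsed = sre_parse.parse(pattern)

    # Literals no longer match byte-for-byte under IGNORECASE, so bail out.
    state = getattr(parsed, "state", None)
    if state is not None and state.flags & re.IGNORECASE:
        return set()

    trigrams = set()
    run = []

    def flush():
        # Emit every 3-character window of the literal run collected so far.
        text = "".join(run)
        for i in range(len(text) - 2):
            trigrams.add(text[i:i + 3])
        del run[:]

    for op, arg in parsed:
        if op == sre_parse.LITERAL:
            run.append(chr(arg))  # arg is the character's code point
        else:
            flush()
    flush()
    return trigrams


if __name__ == "__main__":
    print(sorted(required_trigrams(r"foo\d+barbaz")))
```

All returned trigrams would be ANDed together in the index filter, and the real regex would then run only over the lines that survive. A fuller version would descend into groups and turn alternations into OR clauses.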