Changes

DXR Storages

720 bytes added, 21:25, 6 March 2014
Tentative Roadmap: Dump my latest ES schema design in.
# Retool the query machinery to run on ES and to be line-based. (If speed is acceptable even with pathological regexes, which is unlikely, we can deploy at this point.)
#* Here’s what I want. We type a bunch of words into a search field. As we type, it suggests completions that form identifier names. A search then looks for identifiers (which would now tend to be complete), content substrings, and path segments (or substrings, or sequences of segments?). We AND them together. OR support may come later.
#* We could index the pathname into each line, denormalizing, and always search on lines. That would make the terms easy to AND together. We wouldn’t even need to mget the files afterward, as we would with parent-child relationships, since every line contains the full path. (A line wouldn’t carry the icon or encoding, but that probably doesn’t matter as long as highlighting works, which it should without parent-child.)
# Build a routine to extract trigrams from regexes. (No existing work in Python is apparent. We could require re2 and call through to its <code>Prefilter::Info::TakeMatch</code> etc., but the extraction doesn't look too hard to implement or too CPU-intensive when starting from <code>sre_parse.parse()</code> in the stdlib; I'd have to do some work in any case to bridge Python to that C++ routine; and fewer build steps, git submodules, and build-time checkouts make for a lower contributor support load.) Add trigram indices for lines and switch to a filtered query for regexes. Deploy.
# Get rid of the rest of the on-disk instance, embed necessary region and ref offsets and payloads into the ES index (out of band with the source code), and build pages at request time. Add caching if needed. Something like config.py might still hang around so we don't have to fetch trivial things like WWW_ROOT over a socket.
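
As a rough illustration of the trigram-extraction step, here is a minimal sketch that starts from the stdlib regex parser, as suggested above. It is not the planned implementation: it only collects runs of top-level literal characters and treats every other construct (repeats, groups, alternation, character classes, anchors) as a break, so it is conservative and returns far fewer trigrams than re2's prefilter would. It also assumes case-sensitive matching and does not handle <code>(?i)</code>. The names <code>required_trigrams</code> and <code>trigram_query</code> and the <code>trigrams</code> field are hypothetical, and the query is written as a plain ES <code>bool</code>/<code>must</code> query of <code>term</code> clauses rather than the filtered query the roadmap mentions.

```python
try:
    # Python 3.11+ moved the parser under the re package.
    from re import _parser as sre_parse, _constants as sre_constants
except ImportError:
    import sre_parse
    import sre_constants


def required_trigrams(pattern):
    """Return a set of trigrams that every match of ``pattern`` must contain.

    Conservative sketch: only runs of top-level literals are considered.
    Assumes case-sensitive matching (an inline (?i) flag is NOT handled).
    """
    runs, current = [], []
    for op, av in sre_parse.parse(pattern):
        if op == sre_constants.LITERAL:
            current.append(chr(av))  # av is the literal's code point
        else:
            # Any non-literal node ends the current run of literals.
            runs.append("".join(current))
            current = []
    runs.append("".join(current))

    trigrams = set()
    for run in runs:
        for i in range(len(run) - 2):
            trigrams.add(run[i:i + 3])
    return trigrams


def trigram_query(pattern, field="trigrams"):
    """Build an ANDed term query over a hypothetical trigram field."""
    clauses = [{"term": {field: t}} for t in sorted(required_trigrams(pattern))]
    return {"bool": {"must": clauses}}
```

For example, <code>required_trigrams(r"foo.*quux")</code> yields the trigrams of <code>foo</code> and <code>quux</code>, while a pattern such as <code>foo|bar</code> yields an empty set, meaning no prefiltering is possible in this sketch and every line would still have to be checked against the full regex.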