Auto-tools/Projects/OrangeFactor

Meeting: September 22, 2010

Parsing:

  • logparser lives here: http://hg.mozilla.org/automation/logparser
    • THIS DOESN'T WORK! It's a straight port from topfails (which is to say it runs, but it's inadequate and there are open bugs against it)
  • should be written to Jython standards so it can run on Hadoop (see the sketch after this list)
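
To make "Jython standards for Hadoop" concrete: roughly, pure-Python stdlib code with no C extensions, reading raw log lines on stdin and writing tab-separated key/value pairs on stdout so Hadoop streaming can drive it. A minimal sketch of that shape only; the failure pattern below is illustrative, not the real logparser rules (those live in the repo above):

  import re
  import sys

  # Illustrative pattern; the real parsing rules live in logparser.
  FAILURE_RE = re.compile(r"TEST-UNEXPECTED-(FAIL|ERROR)\s*\|\s*([^|]+)")

  def main(stdin=sys.stdin, stdout=sys.stdout):
      # Hadoop-streaming mapper contract: raw log lines in on stdin,
      # tab-separated key/value pairs out on stdout.
      for line in stdin:
          match = FAILURE_RE.search(line)
          if match:
              status, test = match.group(1), match.group(2).strip()
              stdout.write("%s\t%s\n" % (test, status))

  if __name__ == "__main__":
      main()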

Storage:

  • files in the filesystem mirror the layout of the FTP site
  • pipeline: (raw) log -> parser -> Flume -> HDFS
  • block size: 128 MB
    • does this make looking through files slow? (see the back-of-envelope note after this list)
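
On the block-size question: HDFS stores only the bytes actually written, so a small log does not consume a full 128 MB on disk; the realistic cost of many small log files is namenode metadata and per-file open overhead. A back-of-envelope sketch, using the common ~150-bytes-per-namenode-object rule of thumb (an assumption, not a measured figure):

  BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, as proposed
  BYTES_PER_OBJECT = 150           # namenode rule of thumb (assumed)

  def namenode_ram(num_files, avg_file_bytes):
      # each file costs one inode object plus one object per block
      blocks = max(1, -(-avg_file_bytes // BLOCK_SIZE))  # ceiling division
      return num_files * (1 + blocks) * BYTES_PER_OBJECT

  # e.g. one million ~5 MB logs => ~300 MB of namenode RAM:
  print(namenode_ram(1000000, 5 * 1024 * 1024))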

What do we want?

  • we have a (proposed) schema
  • we have a (proposed) REST interface
  • (we should put these on a wiki page and move towards finalization; a purely hypothetical example follows)
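
Neither the proposed schema nor the proposed REST interface is reproduced in these notes, so the following is purely hypothetical, just to anchor the discussion; every field name and URL here is made up:

  # Hypothetical only -- the real proposal should go on the wiki page.
  example_failure = {
      "date":      "2010-09-22",
      "platform":  "linux",                        # made up
      "testsuite": "mochitest-plain",              # made up
      "test":      "dom/tests/test_example.html",  # made up
      "status":    "TEST-UNEXPECTED-FAIL",
      "log":       "http://example.com/full.log",  # made up
  }

  # An illustrative (not the proposed) REST query over such records:
  #   GET /failures?test=dom/tests/test_example.html&from=2010-09-01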

Process:

  • we supply a Python script (e.g. logparser)
  • it is invoked on every log file
  • its output == what we want (driver sketch below)
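
A hypothetical driver for that process: walk the mirrored log tree and invoke the parser once per log file, collecting its stdout. The root path and invocation below are assumptions for illustration:

  import os
  import subprocess

  LOG_ROOT = "/data/logs"              # mirror of the ftp site (assumed path)
  PARSER = ["python", "logparser.py"]  # assumed invocation

  def parse_all(root=LOG_ROOT):
      for dirpath, _dirnames, filenames in os.walk(root):
          for name in filenames:
              if not name.endswith(".log"):
                  continue
              path = os.path.join(dirpath, name)
              # one parser invocation per log file, as described above
              out = subprocess.check_output(PARSER + [path])
              yield path, out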