Each worker will need its own mdsw to communicate with as well. Rather than requesting further changes to mdsw to make it operate in a worker pool fashion, it would probably be easiest to have each PyProc worker start a dedicated mdsw that it will communicate with.
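A minimal sketch of this arrangement, assuming mdsw can run as a long-lived child process that accepts one dump path per line on stdin and answers with one result line on stdout; the command name and this stdin/stdout protocol are illustrative assumptions, not the actual Socorro interface.

<pre>
import subprocess


class PyProcWorker:
    """A worker that owns exactly one dedicated mdsw child process."""

    def __init__(self, worker_id, mdsw_command=("mdsw",)):
        self.worker_id = worker_id
        # Each worker launches its own mdsw at startup, so workers never
        # contend for a shared mdsw instance (command name is assumed).
        self.mdsw = subprocess.Popen(
            list(mdsw_command),
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def process_dump(self, dump_path):
        # Hand the raw dump path to this worker's private mdsw and block
        # until it returns a processed result line.
        self.mdsw.stdin.write(dump_path + "\n")
        self.mdsw.stdin.flush()
        return self.mdsw.stdout.readline().rstrip("\n")

    def shutdown(self):
        # Closing stdin signals mdsw to exit; wait so the child is reaped.
        self.mdsw.stdin.close()
        self.mdsw.wait()
</pre>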
=== [[Image:Hadoop-Hbase.png|Hadoop-HBase]] ===

=== Hadoop Flowchart ===
# Each Hadoop job reads a list of ooid's, splits it, and passes the ooid's to the mapper.
# The mapper opens a socket connection and sends a request to PyProc to process the raw dumps (a sketch of this step follows the list).
# The processed raw dumps are then collected by the mapper and sent to the reducer.
# The reducer inserts the raw dump and certain processed columns into HBase.
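A minimal Hadoop Streaming-style sketch of the mapper step, assuming each PyProc worker listens on a TCP socket and answers a one-line request with a one-line processed result; the host, port, and wire format ("PROCESS &lt;ooid&gt;") are assumptions for illustration.

<pre>
import socket
import sys

PYPROC_HOST = "localhost"   # assumed address of a PyProc worker
PYPROC_PORT = 9000          # assumed port


def process_ooid(ooid):
    # Open a socket to PyProc, request processing of the raw dump for
    # this ooid, and read back the processed result line.
    with socket.create_connection((PYPROC_HOST, PYPROC_PORT)) as conn:
        conn.sendall(("PROCESS %s\n" % ooid).encode("utf-8"))
        reply = conn.makefile("r", encoding="utf-8").readline()
    return reply.rstrip("\n")


def main():
    # Hadoop Streaming feeds the mapper one input record (an ooid) per
    # line on stdin; we emit "ooid<TAB>processed_dump" pairs for the
    # reducer, which writes the rows into HBase.
    for line in sys.stdin:
        ooid = line.strip()
        if not ooid:
            continue
        print("%s\t%s" % (ooid, process_ooid(ooid)))


if __name__ == "__main__":
    main()
</pre>

The reducer side, not shown here, would take each emitted pair and write the raw dump plus the selected processed columns into HBase through whatever HBase client the cluster exposes (for example the Thrift gateway).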
===Open questions===