Socorro:Hadoop: Difference between revisions

Each worker will also need its own mdsw to communicate with. Rather than requesting further changes to mdsw so that it can operate as a worker pool, it would probably be easiest to have each PyProc worker start a dedicated mdsw of its own.
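
The one-helper-per-worker idea can be sketched as below. This is only an illustration under stated assumptions: the page does not say how a worker talks to mdsw, so the line-based exchange over stdin/stdout pipes, the `DedicatedHelper` class, and `STAND_IN_COMMAND` (a tiny echo process used in place of the real mdsw command line) are all hypothetical.

```python
import subprocess
import sys


class DedicatedHelper:
    """Owns one long-lived child process on behalf of a single worker."""

    def __init__(self, command):
        # One child per worker: nothing is shared, so no pooling logic is needed.
        self.proc = subprocess.Popen(
            command,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes
        )

    def request(self, line):
        """Send one line to the child and return its one-line reply."""
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        """Signal end of input and wait for the child to exit."""
        self.proc.stdin.close()
        self.proc.wait()


# Hypothetical stand-in for the real mdsw command line (not shown on this page):
# a child that echoes each input line back in upper case.
STAND_IN_COMMAND = [
    sys.executable, "-u", "-c",
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n",
]
```

A worker would create one `DedicatedHelper` when it starts, call `request()` once per dump, and call `close()` on shutdown, so the helper's lifetime matches the worker's.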


=== [[Image:Hadoop-Hbase.png|Hadoop-HBase]] ===


=== Hadoop Flowchart ===

Each Hadoop job reads a list of ooids, splits it, and passes the ooids to the mappers. Each mapper opens a socket connection and sends a request to PyProc to process the raw dumps. The mapper then collects the raw dumps and sends them to the reducer. The reducer inserts the raw dump and certain processed columns into HBase.
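
The flow above can be simulated in a single process, as a rough sketch rather than the actual job. Everything specific here is an assumption made for illustration: `FakePyProcHandler` stands in for PyProc, the one-line-per-request wire format is invented, and a plain dict stands in for the HBase table.

```python
import socket
import socketserver
import threading


class FakePyProcHandler(socketserver.StreamRequestHandler):
    """Hypothetical stand-in for PyProc: one ooid in, one result line out."""

    def handle(self):
        ooid = self.rfile.readline().decode().strip()
        self.wfile.write(f"processed:{ooid}\n".encode())


def split_ooids(ooids, n_splits):
    """Divide the ooid list into n_splits input splits."""
    return [ooids[i::n_splits] for i in range(n_splits)]


def mapper(ooids, address):
    """For each ooid, query the processor over a socket and yield (ooid, reply)."""
    for ooid in ooids:
        with socket.create_connection(address) as conn:
            conn.sendall(f"{ooid}\n".encode())
            with conn.makefile("r") as reader:
                reply = reader.readline().strip()
        yield ooid, reply


def reducer(pairs, table):
    """Insert each mapper result into the table (a dict standing in for HBase)."""
    for ooid, result in pairs:
        table[ooid] = {"processed": result}


def run_job(ooids, n_splits=2):
    """Run the split, map, and reduce steps against a local fake processor."""
    # Port 0 lets the OS pick a free port for the fake processor.
    with socketserver.TCPServer(("127.0.0.1", 0), FakePyProcHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        table = {}
        for split in split_ooids(ooids, n_splits):
            reducer(mapper(split, server.server_address), table)
        server.shutdown()
    return table
```

Calling `run_job(["a1", "b2", "c3"])` walks each ooid through the split, map, and reduce steps and returns the filled table. In a real Hadoop job the splits would run on separate mappers in parallel instead of in a loop.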


===Open questions===