Socorro:Hadoop: Difference between revisions

51 bytes added , 10 May 2010

@@ Line 7: / Line 7: @@
 Each worker will need its own mdsw to communicate with as well.  Rather than requesting further changes to mdsw to make it operate in a worker pool fashion, it would probably be easiest to have each PyProc worker start a dedicated mdsw that it will communicate with.
-===Hadoop Flowchart===
+=== [[Image:Hadoop-Hbase.png|Hadoop-HBase]] ===
-Each hadoop job reads list of ooid's, splits and passes the ooid's to mapper
+=== Hadoop Flowchart ===
-Mapper invokes a socket connection and sends a request to pyproc to process raw-dumps
-raw-dumps are then collected by mapper and sent to the reducer.
+Each Hadoop job reads a list of ooids, splits it, and passes the ooids to a mapper. The mapper opens a socket connection and sends a request to PyProc to process the raw dumps. The mapper then collects the raw dumps and sends them to the reducer. The reducer inserts the raw dump and certain processed columns into HBase.
-reducer inserts the raw dump and certain processed columns in Hbase
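The flow described in the flowchart text can be sketched in plain Python. This is only an illustrative sketch, not the actual Socorro code: the function names (`split_ooids`, `mapper`, `reducer`, `fake_pyproc`, `run_job`), the result keys `raw_dump` and `processed`, and the use of a dict in place of an HBase table are all assumptions made for the example. In a real job the `request_pyproc` callable would presumably wrap the socket request to a PyProc worker.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Result = Dict[str, object]


def split_ooids(ooids: List[str], n_splits: int) -> List[List[str]]:
    """Divide the job's ooid list into n_splits chunks, one per mapper."""
    return [ooids[i::n_splits] for i in range(n_splits)]


def mapper(ooids: Iterable[str],
           request_pyproc: Callable[[str], Result]) -> Iterator[Tuple[str, Result]]:
    """Ask the processor for each ooid and emit (ooid, result) pairs."""
    for ooid in ooids:
        # Hypothetical stand-in for the socket round trip to PyProc.
        yield ooid, request_pyproc(ooid)


def reducer(pairs: Iterable[Tuple[str, Result]],
            table: Dict[str, Result]) -> None:
    """Store the raw dump and selected processed columns, keyed by ooid."""
    for ooid, result in pairs:
        table[ooid] = {
            "raw_dump": result["raw_dump"],
            "processed": result["processed"],
        }


def fake_pyproc(ooid: str) -> Result:
    """Hypothetical processor used only to make the sketch runnable."""
    return {
        "raw_dump": b"dump-" + ooid.encode(),
        "processed": {"signature": "sig-" + ooid},
    }


def run_job(ooids: List[str], n_splits: int = 2) -> Dict[str, Result]:
    """Run split -> map -> reduce in-process; the dict stands in for HBase."""
    table: Dict[str, Result] = {}
    for chunk in split_ooids(ooids, n_splits):
        reducer(mapper(chunk, fake_pyproc), table)
    return table
```

Calling `run_job(["a", "b", "c"])` fills the stand-in table with one row per ooid. Passing the processor in as a callable keeps the mapper independent of how PyProc is actually reached.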
 ===Open questions===