Identity/AttachedServices/StorageServiceArchitecture

The proxy process is responsible for monitoring the health of these machines and sounding the alarm if something goes wrong. If the active master appears to be down, the proxy will transparently promote the hot standby and start sending queries to it. When the downed master comes back up, it is demoted to being the new standby.
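
Concretely, the promotion logic in each proxy might look something like the sketch below. The health check, thresholds, port number and field names are placeholder assumptions, not a decided implementation:

 # Sketch of the proxy's failover logic.  The health check, thresholds and
 # port number (assuming a MySQL-style DB port) are illustrative only.
 import socket
 import time
 
 HEALTH_CHECK_INTERVAL = 5        # seconds between pings (assumed value)
 FAILURES_BEFORE_FAILOVER = 3     # consecutive failures before promoting
 
 def is_alive(host, port=3306, timeout=1.0):
     """Crude health check: can we open a TCP connection to the DB port?"""
     try:
         with socket.create_connection((host, port), timeout=timeout):
             return True
     except OSError:
         return False
 
 def monitor_shard(shard):
     """Ping the active master; promote the hot standby if it stays down."""
     failures = 0
     while True:
         if is_alive(shard["master"]):
             failures = 0
         else:
             failures += 1
             if failures >= FAILURES_BEFORE_FAILOVER:
                 # Promote the standby.  When the old master comes back it
                 # rejoins as the new standby rather than reclaiming its role.
                 shard["master"], shard["standby"] = shard["standby"], shard["master"]
                 failures = 0
         time.sleep(HEALTH_CHECK_INTERVAL)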
'''TODO:''' Just one standby? Two? The principle should be the same regardless of how many we have. Star Topology FTW.
'''TODO:''' We could use the standby as a read slave, but I don't see the point. In a failure scenario the master needs to be able to handle the entire read load on its own, so it might as well do that all the time.
Note that this shard-state metadata will be very small and updated very infrequently, which should make it very friendly to a local ZooKeeper installation.
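
For a sense of scale, the whole record for a shard could be a few tens of bytes of JSON under a well-known znode path. The paths, field names, hostnames and the use of the kazoo client below are illustrative assumptions, not a decided layout:

 # Sketch of per-shard state in the local ZooKeeper installation.
 # Paths, field names and hostnames are illustrative only.
 import json
 from kazoo.client import KazooClient
 
 zk = KazooClient(hosts="127.0.0.1:2181")
 zk.start()
 
 # One tiny JSON blob per shard: who is the master, who is the hot standby.
 state = {"master": "db1.shard42.example.com",
          "standby": "db2.shard42.example.com"}
 path = "/storage/shards/42"
 
 zk.ensure_path("/storage/shards")
 if zk.exists(path):
     zk.set(path, json.dumps(state).encode("utf-8"))
 else:
     zk.create(path, json.dumps(state).encode("utf-8"))
 
 # A shard proxy just reads (and watches) this record to decide where to
 # route queries for shard #42.
 data, _stat = zk.get(path)
 print(json.loads(data.decode("utf-8"))["master"])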
 
== Inter-DC Redundancy ==
 +----------------------------------------------------------------------------------+
 |  +--------------+    +------------------+                                        |
 |  | Web App Tier |    | Shard Proxy Tier |      +---------------------+           |
 |  |              |    |                  |  +-->| Master for Shard #N |------+    |
 |  | +---------+  |    | +-------------+  |  |   +----------+----------+      |    |
 |  | | Web App |  |--->| | Shard Proxy |  |--+              | (replication)   |    |
 |  | +---------+  |    | +-------------+  |  |   +----------V---------------+ |    |
 |  | +---------+  |    | +-------------+  |  +-->| Hot Standby for Shard #N | |    |
 |  | | Web App |  |    | | Shard Proxy |  |      +--------------------------+ |    |
 |  | +---------+  |    | +-------------+  |                                   |    |
 |  +--------------+    +------------------+                                   |    |
 +-----------------------------------------------------------------------------+----+
                                                                               |
                                                                               | (very slow replication)
                                                                               V
 +----------------------------------------------------------------------------------+
 | US-West Data Center                                                               |
 |  +--------------+    +------------------+    +----------------------+             |
 |  | Web App Tier |    | Shard Proxy Tier |    | Standby for Shard #N |             |
 |  +--------------+    +------------------+    +----------------------+             |
 +----------------------------------------------------------------------------------+
 
 
For this scheme to work, every Shard Proxy process in every DC needs to agree on where the master is for every shard. That's not trivial. But it should be pretty doable with some replication between the respective ZooKeeper masters in each DC. We would probably not try to automate this failover, so that Ops can ensure consistent state before anything tries to send writes to a new location.
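
A minimal sketch of what that Ops-driven failover step might look like, assuming the hypothetical shard-record layout sketched earlier: the failover itself is one explicit write to the shard's record, and the proxies in each DC pick the change up through a watch. Helper names and paths are placeholders:

 # Hypothetical ops tool: repoint shard #N's master, run by a human only
 # after confirming that replication to the new location has caught up.
 import json
 from kazoo.client import KazooClient
 
 def move_master(zk_hosts, shard_id, new_master, new_standby):
     zk = KazooClient(hosts=zk_hosts)
     zk.start()
     state = {"master": new_master, "standby": new_standby}
     zk.set("/storage/shards/%d" % shard_id,
            json.dumps(state).encode("utf-8"))
     zk.stop()
 
 # In each shard proxy, a watch keeps the in-memory routing table current,
 # so the change propagates without restarting anything.
 def watch_shard(zk, shard_id, routing_table):
     @zk.DataWatch("/storage/shards/%d" % shard_id)
     def _update(data, stat):
         if data is not None:
             routing_table[shard_id] = json.loads(data.decode("utf-8"))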
 
 
'''TODO:''' How many DCs? The principle should be the same regardless of how many we have. Nested Star Topology FTW.