Identity/AttachedServices/StorageServiceArchitecture

The proxy process is responsible for monitoring the health of these machines and sounding the alarm if something goes wrong. If the active master appears to be down, the proxy will transparently promote the hot standby and start sending queries to it. When the downed master comes back up, it is demoted to being the new standby.
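
Concretely, the promotion logic in each proxy might look something like the sketch below. The health check, thresholds, port number and field names are placeholder assumptions, not a decided implementation:

 # Sketch of the proxy's failover logic.  The health check, thresholds and
 # port number (assuming a MySQL-style DB port) are illustrative only.
 import socket
 import time
 
 HEALTH_CHECK_INTERVAL = 5        # seconds between pings (assumed value)
 FAILURES_BEFORE_FAILOVER = 3     # consecutive failures before promoting
 
 def is_alive(host, port=3306, timeout=1.0):
     """Crude health check: can we open a TCP connection to the DB port?"""
     try:
         with socket.create_connection((host, port), timeout=timeout):
             return True
     except OSError:
         return False
 
 def monitor_shard(shard):
     """Ping the active master; promote the hot standby if it stays down."""
     failures = 0
     while True:
         if is_alive(shard["master"]):
             failures = 0
         else:
             failures += 1
             if failures >= FAILURES_BEFORE_FAILOVER:
                 # Promote the standby.  When the old master comes back it
                 # rejoins as the new standby rather than reclaiming its role.
                 shard["master"], shard["standby"] = shard["standby"], shard["master"]
                 failures = 0
         time.sleep(HEALTH_CHECK_INTERVAL)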
'''TODO:''' Just one standby? Two? The principle should be the same regardless of how many we have. Star Topology FTW.
'''TODO:''' We could use the standby as a read slave, but I don't see the point. In a failure scenario the master needs to be able to handle the entire read load on its own, so it might as well do that all the time.
Note that this shard-state metadata will be very small and updated very infrequently, which should make it very friendly to a local ZooKeeper installation.
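
For a sense of scale, the whole record for a shard could be a few tens of bytes of JSON under a well-known znode path. The paths, field names, hostnames and the use of the kazoo client below are illustrative assumptions, not a decided layout:

 # Sketch of per-shard state in the local ZooKeeper installation.
 # Paths, field names and hostnames are illustrative only.
 import json
 from kazoo.client import KazooClient
 
 zk = KazooClient(hosts="127.0.0.1:2181")
 zk.start()
 
 # One tiny JSON blob per shard: who is the master, who is the hot standby.
 state = {"master": "db1.shard42.example.com",
          "standby": "db2.shard42.example.com"}
 path = "/storage/shards/42"
 
 zk.ensure_path("/storage/shards")
 if zk.exists(path):
     zk.set(path, json.dumps(state).encode("utf-8"))
 else:
     zk.create(path, json.dumps(state).encode("utf-8"))
 
 # A shard proxy just reads (and watches) this record to decide where to
 # route queries for shard #42.
 data, _stat = zk.get(path)
 print(json.loads(data.decode("utf-8"))["master"])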
 
== Inter-DC Redundancy ==
 +----------------------------------------------------------------------------------+
 |  +--------------+    +------------------+                                        |
 |  | Web App Tier |    | Shard Proxy Tier |      +---------------------+           |
 |  |              |    |                  |  +-->| Master for Shard #N |------+    |
 |  | +---------+  |    | +-------------+  |  |   +----------+----------+      |    |
 |  | | Web App |  |--->| | Shard Proxy |  |--+              | (replication)   |    |
 |  | +---------+  |    | +-------------+  |  |   +----------V---------------+ |    |
 |  | +---------+  |    | +-------------+  |  +-->| Hot Standby for Shard #N | |    |
 |  | | Web App |  |    | | Shard Proxy |  |      +--------------------------+ |    |
 |  | +---------+  |    | +-------------+  |                                   |    |
 |  +--------------+    +------------------+                                   |    |
 +-----------------------------------------------------------------------------+----+
                                                                               |
                                                                               | (very slow replication)
                                                                               V
 +----------------------------------------------------------------------------------+
 | US-West Data Center                                                               |
 |  +--------------+    +------------------+    +----------------------+             |
 |  | Web App Tier |    | Shard Proxy Tier |    | Standby for Shard #N |             |
 |  +--------------+    +------------------+    +----------------------+             |
 +----------------------------------------------------------------------------------+
 
 
For this scheme to work, every Shard Proxy process in every DC needs to agree on where the master is for every shard. That's not trivial. But it should be pretty doable with some replication between the respective ZooKeeper masters in each DC. We would probably not try to automate this failover, so that Ops can ensure consistent state before anything tries to send writes to a new location.
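
A minimal sketch of what that Ops-driven failover step might look like, assuming the hypothetical shard-record layout sketched earlier: the failover itself is one explicit write to the shard's record, and the proxies in each DC pick the change up through a watch. Helper names and paths are placeholders:

 # Hypothetical ops tool: repoint shard #N's master, run by a human only
 # after confirming that replication to the new location has caught up.
 import json
 from kazoo.client import KazooClient
 
 def move_master(zk_hosts, shard_id, new_master, new_standby):
     zk = KazooClient(hosts=zk_hosts)
     zk.start()
     state = {"master": new_master, "standby": new_standby}
     zk.set("/storage/shards/%d" % shard_id,
            json.dumps(state).encode("utf-8"))
     zk.stop()
 
 # In each shard proxy, a watch keeps the in-memory routing table current,
 # so the change propagates without restarting anything.
 def watch_shard(zk, shard_id, routing_table):
     @zk.DataWatch("/storage/shards/%d" % shard_id)
     def _update(data, stat):
         if data is not None:
             routing_table[shard_id] = json.loads(data.decode("utf-8"))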
 
 
'''TODO:''' How many DCs? The principle should be the same regardless of how many we have. Nested Star Topology FTW.