* Receive, parse and validate each incoming query
* Extract the fixed target userid, erroring out if the query does not have one.
* Look up the shard number and corresponding database for that userid.
* Forward the query to the appropriate database host, and proxy back the results.
* Application code is greatly simplified.
* Same code paths can be used for testing with or third-party deploy against a single-database setupmachine.
* Reduces total number of connections and connection-related overheads.
* Proxy can do centralized health monitoring of the individual servers.
== Hot Standby Databases ==
To guard against the loss of any individual database server, each shard will also have a hot standby database, living in the same DC and configured for synchronous (semi-synchronous?) replication. The proxy monitors the health of the standby database, but does not forward it any queries. Its only job is to serve as a backup for the active master:
+---------------------+
+----->| Master for Shard #1 |
| +----------+----------+
| | (replication)
+---------+ +----------------+ | +----------V---------------+
| Web App |------->| Sharding Proxy |---+----->| Hot Standby for Shard #1 |
+---------+ +----------------+ | +--------------------------+
|
| +---------------------+
+----->| Master for Shard #2 |
| +----------+----------+
| | (replication)
| +----------V---------------+
+----->| Hot Standby for Shard #2 |
+--------------------------+
The proxy process is responsible for monitoring the health of these machines and sounding the alarm if something goes wrong. If the active master appears to be down, the proxy will transparently promote the hot standby and start sending queries to it. When the downed master comes back up, it is demoted to being the new standby.
'''TODO:''' We could use the standby as a read slave, but I don't see the point. In a failure scenario the master needs to be able to handle the entire read load on its own, so it might as well do that all the time.
| +---------+ | | +-------------+ |
+--------------+ +------------------+
Note that we're '''not''' doing this to the MySQL servers. There's too many of them and we want too fine-grained a level of control over their failover behaviour.
==