Merging the mozilla.org codebase into mozilla.com is very tricky. We are simply hacking the two codebases to run under one domain with some Apache and PHP magic. There are several ways to do this, but we must preserve the integrity and performance of every page.
To justify our Apache+PHP solution, here is some raw data showing why this merge should succeed.
We want to write some magic to load mozilla.org pages as top-level URLs on the mozilla.com site. We can do this by generating all the top-level mozilla.org folders (about 25) and rewriting each of them to a special PHP page that loads the mozilla.org codebase instead.
However, we also need to merge mozilla.org's .htaccess file, which contains about 1500 redirects. This is where it gets tricky: big .htaccess files are messy and, more importantly, incur a performance hit on every single request.
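The routing decision itself is simple. Here is a minimal Python sketch of it, assuming the request path is checked against a fixed set of top-level folder names. In the real setup Apache rewrite rules and a PHP loader page do this work, so treat this only as an illustration of the logic. `ORG_FOLDERS` and `route` are hypothetical names, and only `community` and `thunderbird` are taken from this page.

```python
# Hypothetical subset of the ~25 top-level mozilla.org folders;
# only "community" and "thunderbird" come from this page.
ORG_FOLDERS = {"community", "thunderbird"}


def route(path):
    """Decide which codebase should serve a request path.

    Returns "org" when the first path segment is a known mozilla.org
    folder (the special PHP loader page would handle it), otherwise
    "com" (the normal mozilla.com codebase).
    """
    # "/community/events/" -> ["community", "events"]
    segments = [s for s in path.split("/") if s]
    if segments and segments[0] in ORG_FOLDERS:
        return "org"
    return "com"
```

For example, `route("/community/")` gives `"org"` while `route("/en-US/firefox/new/")` gives `"com"`.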
Our quest is to figure out how to get all of this working under one domain and maintain integrity and performance on all pages.
How to read these graphs
These graphs were generated using the Apache benchmark tool `ab`. They show the percentage of requests that were served within a given response time, in milliseconds. This is a decent indication of a site's performance.
Unless otherwise stated, these were run with a concurrency of 10 for 5000 requests, which is enough to get a realistic picture. That means `ab` ran 10 connections at the same time (opening a new one whenever an old one finished) until 5000 requests had completed.
It's *very* difficult to benchmark a site accurately, but I ran these several times and found that my setup produces somewhat accurate, predictable results. Don't read too much into them; they should just give a general picture.
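To make the graphs concrete, here is a small Python sketch that turns raw response times into the same kind of table `ab` prints ("Percentage of the requests served within a certain time (ms)"), using the percentage points `ab` reports. It uses the nearest-rank method, which is our assumption rather than a copy of `ab`'s source; `percentile_table` is a hypothetical helper.

```python
# Percentage points matching the rows of ab's
# "Percentage of the requests served within a certain time (ms)" table.
PERCENTS = (50, 66, 75, 80, 90, 95, 98, 99, 100)


def percentile_table(times_ms):
    """Map each percentage p to the time (ms) within which p% of requests finished.

    Nearest-rank method: sort the samples and take the one at 1-based rank
    ceil(p * n / 100), computed with integer arithmetic to avoid float error.
    """
    if not times_ms:
        raise ValueError("need at least one response time")
    ordered = sorted(times_ms)
    n = len(ordered)
    table = {}
    for p in PERCENTS:
        rank = max(1, (p * n + 99) // 100)  # integer ceil(p * n / 100)
        table[p] = ordered[rank - 1]
    return table
```

Reading a graph then amounts to reading this table: the entry for 90 is the time within which 90% of the requests were answered.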
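The run described above corresponds to `ab -n 5000 -c 10` (`-n` is the total number of requests, `-c` the concurrency). The Python sketch below models that scheduling with a fixed pool of workers, each starting its next request as soon as its previous one finishes. `run_load` and `send_request` are hypothetical names, and the request function is injected so the sketch assumes nothing about an HTTP client.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(send_request, total=5000, concurrency=10):
    """Call send_request() `total` times with at most `concurrency` calls in flight.

    A fixed pool of `concurrency` worker threads pulls queued calls, so a new
    request starts whenever a previous one finishes. Returns the per-request
    latencies in milliseconds, in submission order.
    """
    def timed_call():
        start = time.perf_counter()
        send_request()
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call) for _ in range(total)]
        return [f.result() for f in futures]
```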
Terms used in the keys on these graphs:
- RewriteMap+PHP - RewriteMap redirects and PHP magic to serve mozilla.org pages as top-level URLs
  - RewriteMap is an Apache directive to load an external rewrite map file
  - This is expected to be the fastest (and cleanest) solution
  - Should not affect the current mozilla.com site at all
  - Should load mozilla.org pages fast (negligible performance hit)
- Apache Redirect+PHP - Raw .htaccess redirects (copied from mozilla.org's .htaccess) and PHP magic for mozilla.org pages
  - Expected to have a decent performance hit, since huge .htaccess files are bad
- PHP - PHP magic for both the .htaccess redirects and mozilla.org pages
  - Expected to work OK, but the implementation must be hacky
- Pure - Normal mozilla.com or mozilla.org site as it stands now
Spoiler: RewriteMap+PHP is the clear winner for merging the mozilla.org site into mozilla.com. It's a little hacky, but it works. In fact, mozilla.org pages respond 50ms faster, which might not sound like much, but 1/20th of a second is a lot. mozilla.com will be unaffected by the merge; nothing in this solution incurs a performance hit.
See bug 670775 for more details on the solution/hack.
mozilla.org had about 1500 rewrites/redirects in its .htaccess file. We need to port them over somehow, but with that many redirects performance is a major concern: Apache processes the entire .htaccess file on every request, so every request would take a decent performance hit if we don't do this right.
There are 4 different ways to port the redirects:
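A rough model of why the size of the redirect list matters: when the rules are a flat list, a request that matches none of them is compared against every rule, while a keyed map (a dbm RewriteMap, for instance, is a hash file) answers in one lookup. The Python sketch below counts comparisons for both. It is a simplification, since real Redirect/RewriteRule lines are prefix or regex matches that cost more than an equality test; `linear_lookup` and `map_lookup` are hypothetical names.

```python
def linear_lookup(rules, path):
    """Scan (source, target) rules in order, like a long run of Redirect lines.

    Returns (target or None, number of comparisons made).
    """
    comparisons = 0
    for source, target in rules:
        comparisons += 1
        if source == path:
            return target, comparisons
    return None, comparisons


def map_lookup(table, path):
    """One keyed lookup in a dict, standing in for a map-style lookup.

    Returns (target or None, 1), counting the lookup as a single step.
    """
    return table.get(path), 1
```

With 1500 rules, a path that matches nothing costs 1500 comparisons in the flat list versus a single lookup in the map.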
- RewriteMap(txt) - Use Apache's RewriteMap directive which loads optimized redirects from a text file
- RewriteMap(dbm) - Use Apache's RewriteMap directive which loads optimized redirects from a database file
- Apache Redirects - Simply copy over the 1500 Redirect/Rewrite lines and use PHP to load in .org pages
- PHP - Implement the redirects in PHP, where the real .org pages are handled
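For the RewriteMap(txt) option, the map file is plain text with one key and one value per line, separated by whitespace, with `#` marking comment lines. A minimal Python parser for that format, handy for sanity-checking a ported map before handing it to Apache, might look like this. `parse_txt_map` is a hypothetical helper, and keeping the first occurrence of a duplicated key is our assumption, not documented Apache behavior.

```python
def parse_txt_map(text):
    """Parse RewriteMap txt-style content into a dict of key -> value.

    Assumed format: "key value" separated by whitespace, one pair per line.
    Blank lines and lines starting with '#' are skipped, lines without a
    value are ignored, and the first occurrence of a duplicated key wins.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed: key with no value
        mapping.setdefault(fields[0], fields[1])
    return mapping
```

Counting the parsed entries is a cheap way to confirm that none of the ported redirects were silently dropped.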
Here are the results tested with a concurrency of 10 for 15000 requests:
Zooming in by testing only 5000 requests and increasing the concurrency to 30:
Conclusion: RewriteMap is clearly the way to go. Note that "Apache Redirect+PHP" is the only option that must be avoided outright, since every request would suffer from the ballooning .htaccess file. The other methods can be applied selectively, only to requests for pages that don't exist.
We've actually improved the performance of mozilla.org pages a lot by removing all the .htaccess redirects, as another graph will show.
Lastly, we can compile the RewriteMap text file into a database file if needed, since the database performs better under high load. However, the gains are insignificant, so that isn't worth worrying about right now.
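Applying redirects "only to pages that don't exist" means the redirect table is consulted only after the normal lookup fails. In mod_rewrite this is usually expressed by putting `RewriteCond %{REQUEST_FILENAME} !-f` (and `!-d` for directories) in front of the rule. The Python sketch below shows the same ordering; `resolve`, `page_exists` and the return tuples are hypothetical, and the existence check is injected instead of touching a real document root.

```python
def resolve(path, page_exists, redirects):
    """Decide how to answer a request path.

    Returns ("serve", path) for an existing page, ("redirect", target) when
    the path is in the redirect table, or ("404", None) otherwise. Existing
    pages never touch the redirect table, so its cost is paid only by
    requests that would otherwise be "not found".
    """
    if page_exists(path):
        return ("serve", path)
    target = redirects.get(path)
    if target is not None:
        return ("redirect", target)
    return ("404", None)
```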
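Apache provides the `httxt2dbm` utility for compiling a txt map into a dbm map. The Python sketch below only illustrates the idea: it writes the pairs into an on-disk hash database with the standard library's `dbm` module and reads one back. The format `dbm` picks depends on the platform and is not guaranteed to be one Apache can read, so this is not a substitute for `httxt2dbm`; `compile_map` and `lookup` are hypothetical names.

```python
import dbm
import os
import tempfile


def compile_map(mapping, path):
    """Write key -> value pairs into a new dbm database at `path`.

    Flag "n" always creates a fresh, empty database, replacing any old one.
    """
    with dbm.open(path, "n") as db:
        for key, value in mapping.items():
            # dbm stores bytes; encode explicitly rather than rely on defaults
            db[key.encode("utf-8")] = value.encode("utf-8")


def lookup(path, key):
    """Return the value mapped to `key` as str, or None if it is absent."""
    with dbm.open(path, "r") as db:
        try:
            return db[key.encode("utf-8")].decode("utf-8")
        except KeyError:
            return None


if __name__ == "__main__":
    # Hypothetical redirect pair, for demonstration only.
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "redirects")
        compile_map({"/old-page/": "/new-page/"}, db_path)
        print(lookup(db_path, "/old-page/"))
        print(lookup(db_path, "/missing/"))
```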
mozilla.com PHP page
Let's benchmark a normal mozilla.com PHP page: /en-US/firefox/new/.
Conclusion: As expected, only the Apache Redirect solution (copying all .htaccess redirects) has a performance hit. With any of the other solutions, current mozilla.com pages see a negligible performance hit, if any.
mozilla.org PHP page
Now let's benchmark a mozilla.org PHP page: /community/.
Conclusion: We've actually made mozilla.org pages 50ms faster! The RewriteMap+PHP solution is the obvious winner. The improvement comes from converting the 1500 redirects into a more optimized RewriteMap database file.
Notice the spike in the 90% range, however. Honestly, I'm not sure how to explain it, but I suspect it's an anomaly of my testing environment. There's absolutely no reason why that should happen (5% of the requests take a lot longer to respond). If I add all the .htaccess rewrites from mozilla.org on top of the RewriteMap database file, it doesn't spike like that. We can look into the spike later and see whether it exists on our staging/live servers. These tests also send requests at a much higher rate than mozilla.org ever sees.
thunderbird PHP page
We had to hack in Thunderbird as well (it was just merged into the current mozilla.org, and I moved the hack up into the mozilla.com codebase). Let's see what happens with the Thunderbird URL /thunderbird/.
Conclusion: Thunderbird also benefits from our htaccess optimizations, as seen in the RewriteMap+PHP solution. It gets a nice 25ms shaved off each request!
Clearly, RewriteMap+PHP is again the winner. This is fortunate, as it also happens to be the cleanest hack of them all.