NeckoNet

From MozillaWiki

WARNING WARNING WARNING - READ FIRST

NeckoNet is no longer. The rest of the information on this page exists for historical reasons only.

What's all this, then?

NeckoNet is a tool that lets the networking team test the performance of the networking stack under various network conditions. It consists of a Linux machine running a web server (Apache), Talos, a modified netem kernel module, some glue, and a lot of hope. You can tell this machine what branch you want to build and test, what pages you want to test with, and what you want the network conditions to look like during your test (bandwidth, packet loss, and latency for now), and it will magically (or not) do everything for you and dump some timing numbers in an output directory. This way you can, for example, run 2 branches under the same conditions on the same set of pages and find out which is best (or least bad, depending on your outlook on life).
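NeckoNet drove its modified netem module through its own glue, which isn't described on this page. As a rough illustration of what "network conditions" means at the kernel level, here is a minimal Python sketch that turns latency, packet loss, and a bandwidth cap into arguments for the stock `tc qdisc ... netem` command. `NetworkCondition` and `netem_args` are hypothetical names, and the `rate` option assumes a newer kernel and iproute2 than the 2.6.38 kernels listed under Prerequisites (NeckoNet's own bandwidth control may well have worked differently). The function only builds the argument list; applying it needs root.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NetworkCondition:
    """Hypothetical description of one test configuration's network conditions."""
    latency_ms: int = 0               # added delay per packet, in milliseconds
    loss_pct: float = 0.0             # random packet loss, in percent
    rate_kbit: Optional[int] = None   # bandwidth cap in kbit/s; None means uncapped


def netem_args(dev: str, cond: NetworkCondition) -> List[str]:
    """Build (but do not run) a `tc qdisc replace ... netem` command for `dev`."""
    args = ["tc", "qdisc", "replace", "dev", dev, "root", "netem"]
    if cond.latency_ms:
        args += ["delay", "%dms" % cond.latency_ms]
    if cond.loss_pct:
        args += ["loss", "%g%%" % cond.loss_pct]
    if cond.rate_kbit is not None:
        args += ["rate", "%dkbit" % cond.rate_kbit]
    return args


if __name__ == "__main__":
    # 100 ms of delay, 1.5% loss, and a 1000 kbit/s cap on the loopback interface
    print(" ".join(netem_args("lo", NetworkCondition(100, 1.5, 1000))))
```

On the test machine you would hand that list to `subprocess.check_call` as root and remove it afterwards with `tc qdisc del dev lo root`. On loopback every packet, request and response alike, goes through that one queue, so a 100 ms delay adds roughly 200 ms to each round trip.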

Example usage

Let's say you're working on a large networking feature (such as HTTP pipelining) and you want to test whether your changes have had any performance impact. First, create a page set to test against (see Creating Things). Then, create some test configurations (for this example, one low-latency and one high-latency). Next, set up 2 source repos: one pointing at, for example, mozilla-central, and one pointing at your project branch. Finally, create some test suites that bring all of this together. You might end up with something like this:

  1. Low latency, mozilla-central
  2. High latency, mozilla-central
  3. Low latency, project branch
  4. High latency, project branch

If you want to test on multiple platforms, you would have more test suites (one set of all of the above for each platform you're testing on).
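The suite list above is every repository crossed with every configuration, repeated per platform. A small Python sketch of that bookkeeping (the `Suite` tuple and `suite_matrix` function are illustrative only; NeckoNet creates suites through its web UI, not through any API like this):

```python
from itertools import product
from typing import List, NamedTuple, Sequence


class Suite(NamedTuple):
    config: str
    repo: str
    platform: str


def suite_matrix(configs: Sequence[str], repos: Sequence[str],
                 platforms: Sequence[str]) -> List[Suite]:
    """One suite per (platform, repo, config) combination, configs varying fastest."""
    return [Suite(config, repo, platform)
            for platform, repo, config in product(platforms, repos, configs)]


if __name__ == "__main__":
    suites = suite_matrix(["Low latency", "High latency"],
                          ["mozilla-central", "project branch"],
                          ["linux"])
    for number, suite in enumerate(suites, 1):
        print("%d. %s, %s" % (number, suite.config, suite.repo))
```

Adding Windows and OS X to `platforms` turns the four suites into twelve, and since each suite does its own fresh checkout and build, the wait grows accordingly.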

Now, you can enqueue your tests and wait. A long time. For results. Seriously, I made this kind of dumb, so even on the same branch, it will download a fresh copy of the source and build from scratch for each test suite. Hopefully you have some other stuff to do while you're waiting for this stuff to run.

Once you have your results (see Running for where they will end up), you can compare the timing information and see if you had a good, bad, or negligible impact on performance. Hooray! Also, if something went wrong during the process, you will get some sort of error message (unless you're incredibly unlucky... that's only happened to me once though), but you should still have at least the output of the build and test run process so maybe that will point you in the right direction.
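How you compare the numbers is up to you. One reasonable approach, sketched below in Python, is to take the median load time per page (medians are less sensitive to an occasional outlier run than means) and report the percent change from the baseline. This assumes you have already pulled the per-page timings out of the result zips into a `{page: [milliseconds, ...]}` dictionary; that format is an assumption rather than what Talos actually writes, and `compare_runs` is a hypothetical helper.

```python
from statistics import median
from typing import Dict, List


def compare_runs(baseline: Dict[str, List[float]],
                 candidate: Dict[str, List[float]]) -> Dict[str, float]:
    """Percent change in median load time for each page present in both runs.

    Negative values mean the candidate loaded the page faster.
    """
    changes = {}
    for page in sorted(baseline.keys() & candidate.keys()):
        if not baseline[page] or not candidate[page]:
            continue  # no timings for this page in one of the runs
        base = median(baseline[page])
        if base == 0:
            continue  # avoid dividing by zero on a degenerate baseline
        changes[page] = 100.0 * (median(candidate[page]) - base) / base
    return changes


if __name__ == "__main__":
    base = {"example.com": [100, 110, 90], "mozilla.org": [200, 210, 190]}
    cand = {"example.com": [80, 85, 75], "mozilla.org": [205, 200, 215]}
    for page, pct in compare_runs(base, cand).items():
        print("%-12s %+.1f%%" % (page, pct))
```

With only a handful of runs per page, a change of a few percent either way can easily be noise, so treat small differences with suspicion.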

Of course, it bears mentioning that you PROBABLY don't want to use mozilla-central as your baseline. The chances that it has changed in other ways relative to your project branch are pretty good, and since Talos only measures total page load time (including resource fetch, DOM construction, rendering, etc.), those unrelated changes can show up in your numbers and be mistaken for the effect of your own changes.

Setup

Virtual Machine Installation

Now you can get NeckoNet with a lot less effort on your part: just download http://people.mozilla.org/~hurley/neckonet-vm.zip (made for VMware), unzip it, and you've got a VM with a fully installed NeckoNet environment (complete with a small test suite). You can then skip the rest of this page down to Running or, if you want to create more tests, down to Creating Things.

If you want to run tests on Windows or OS X, you'll still need to install NeckoNet manually on those platforms.

For now, the VM has no home, due to low disk space issues on p.m.o. Email hurley if you need the VM, and we'll figure out how to get it to you.

Prerequisites

For the Master

  • Linux (tested with Fedora 15/2.6.38 kernel and Ubuntu 11.04/2.6.38 kernel, should work with other recent distros)
  • Firefox development environment
  • Bison
  • Flex
  • Yacc
  • Kernel development package(s) for whatever distro you're using
  • db4 development package(s) for whatever distro you're using
  • openssl development package(s) for whatever distro you're using
  • Python from the 2.7 series
  • sqlite 3
  • sqlalchemy (www.sqlalchemy.org)
  • flask (flask.pocoo.org)
  • jinja2 (jinja2.pocoo.org)
  • werkzeug (werkzeug.pocoo.org)
  • wtforms (wtforms.simplecodes.com)
  • Everything necessary for StandaloneTalos (but you don't need to download talos itself, that's included in our tarball)

For the test runners (aka Slaves)

  • Linux, Mac OS X, or Windows
  • Firefox development environment
  • Python from the 2.7 series
  • Everything necessary for StandaloneTalos (again, talos is included in our tarball, don't bother downloading it yourself)

Most of the python modules should be available through your package manager, but the URLs for them are above just in case.
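Before running the installer on the master, it can save time to check that the Python modules above actually import. A minimal sketch, written to run under both Python 2.7 and 3; the module list is taken from the prerequisites and `missing_modules` is a hypothetical helper, not part of NeckoNet:

```python
import sys

# Import names for the Python prerequisites listed for the master.
REQUIRED_MODULES = ["sqlite3", "sqlalchemy", "flask", "jinja2", "werkzeug", "wtforms"]


def missing_modules(names):
    """Return the names from `names` that fail to import, in order."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if sys.version_info[:2] != (2, 7):
        print("warning: NeckoNet expects Python 2.7, this is %d.%d" % sys.version_info[:2])
    absent = missing_modules(REQUIRED_MODULES)
    if absent:
        print("missing: " + ", ".join(absent))
    else:
        print("all Python prerequisites import cleanly")
```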

Installation

  1. Download http://people.mozilla.org/~hurley/neckonet.tar.gz
  2. tar xzf neckonet.tar.gz
  3. cd neckonet
  4. Run "python install_<system>.py" to build & install everything (it will ask for your sudo password on Linux and OS X)
    1. If you are just installing a spare Linux slave machine, you can use install_mac.py to reduce the amount of crap built
  5. If you're installing to run the master processes on linux, run "python necko.py dbsync" to upgrade/create the database
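The choice of installer boils down to a small decision table. Here it is as a Python sketch; only `install_mac.py`, the Linux-slave shortcut, and the `dbsync` step come from the steps above. The `linux` and `win` tokens are guesses at what `<system>` expands to, so check the tarball for the real script names. `install_commands` is a hypothetical helper.

```python
import platform
from typing import List

# platform.system() value -> assumed <system> token; only "mac" is confirmed above.
SYSTEM_TOKENS = {"Linux": "linux", "Darwin": "mac", "Windows": "win"}


def install_commands(system: str, role: str) -> List[List[str]]:
    """Commands to run from the unpacked neckonet directory.

    `role` is "master" or "slave"; the master processes only run on Linux.
    """
    if role == "master" and system != "Linux":
        raise ValueError("the NeckoNet master runs on Linux only")
    if system == "Linux" and role == "slave":
        script = "install_mac.py"  # spare Linux slaves can skip the extra builds
    else:
        script = "install_%s.py" % SYSTEM_TOKENS[system]
    commands = [["python", script]]
    if role == "master":
        commands.append(["python", "necko.py", "dbsync"])  # create/upgrade the database
    return commands


if __name__ == "__main__":
    current = platform.system()
    if current in SYSTEM_TOKENS:
        for cmd in install_commands(current, "slave"):
            print(" ".join(cmd))
```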

Creating Things

  • To get a simple test configuration, run "python necko.py basic" on the master (this is already done for you in the VM version)
  • Otherwise, create a page set, config, and repository of your own to create a test suite.
  • To create a page set:
    1. Create a manifest of pages you want to have in your archive (one URL per line, pretty simple file)
    2. Create a talos.config pointing at that manifest (see talos/necko.config for the one I use, but change the paths as appropriate for your machine)
    3. Run web-page-replay (in the wpr directory) to record: "python replay.py --record <path to archive file>"
    4. While web-page-replay is running, run talos: "python run_tests.py <path to talos.config>"
    5. Once talos is done, ^C web-page-replay
  • Now you can stick that in the database, and create your config, repo, and test suite:
    1. Run "python necko.py web"
    2. Go to localhost:8000 on the machine you're running the web UI on
    3. Upload your pageset under "Manage Pagesets"
    4. Create your config under "Manage Configurations"
    5. Create your repo under "Manage Source Repositories"
    6. Create your test suite out of the pageset, configuration, and source repo you just created
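Steps 3 to 5 of the page-set recipe (start web-page-replay recording, run Talos against it, then ^C the recorder) are easy to script, along with writing the one-URL-per-line manifest from step 1. The Python 3 sketch below does both. `write_manifest` and `record_pageset` are hypothetical helpers, not part of NeckoNet. It assumes a POSIX master (sending SIGINT is how ^C is emulated) and uses a fixed startup delay rather than actually checking that replay.py is ready to serve.

```python
import signal
import subprocess
import time
from typing import Iterable, List, Optional


def write_manifest(urls: Iterable[str], path: str) -> int:
    """Write one URL per line, skipping blanks; return how many were written."""
    written = 0
    with open(path, "w") as manifest:
        for url in urls:
            url = url.strip()
            if url:
                manifest.write(url + "\n")
                written += 1
    return written


def record_pageset(replay_cmd: List[str], talos_cmd: List[str],
                   replay_cwd: Optional[str] = None, talos_cwd: Optional[str] = None,
                   startup_delay: float = 5.0, stop_timeout: float = 30.0) -> int:
    """Run Talos while web-page-replay records, then stop the recorder.

    Returns Talos's exit status.
    """
    replay = subprocess.Popen(replay_cmd, cwd=replay_cwd)
    try:
        time.sleep(startup_delay)  # crude: give replay.py time to start listening
        talos_status = subprocess.call(talos_cmd, cwd=talos_cwd)
    finally:
        replay.send_signal(signal.SIGINT)  # the scripted equivalent of ^C
        try:
            replay.wait(timeout=stop_timeout)
        except subprocess.TimeoutExpired:
            replay.kill()  # recorder ignored ^C; force it down
            replay.wait()
    return talos_status
```

For example, `record_pageset(["python", "replay.py", "--record", "pages.wpr"], ["python", "run_tests.py", "my.config"], replay_cwd="wpr", talos_cwd="talos")` mirrors steps 3 and 4; `pages.wpr` and `my.config` are placeholder names.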

Running

On all machines you will be using for tests, open a terminal and run "python necko.py run". Now you have some choices. You can either enqueue a test from the web UI (which runs on port 8000 on the master Linux machine), or you can run a test from the command line on the master Linux machine using "python necko.py runtest <testid>" (you can get a list of the tests using "python necko.py listtests").
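If you have several suites to get through (say the four from the example above), a tiny wrapper around the documented `runtest` command saves some typing. This Python sketch assumes that `runtest` blocks until its run finishes; if it returns as soon as the test is queued, running the ids one after another buys you nothing. Running them sequentially also matches the one-machine-at-a-time limitation mentioned under History. `run_tests` and the `run_all.py` script name are hypothetical, and the ids come from `python necko.py listtests`.

```python
import subprocess
import sys
from typing import Iterable, List


def run_tests(test_ids: Iterable[int], necko: str = "necko.py",
              python: str = "python", dry_run: bool = False) -> List[List[str]]:
    """Run `python necko.py runtest <testid>` for each id, stopping at the first failure."""
    commands = [[python, necko, "runtest", str(test_id)] for test_id in test_ids]
    if not dry_run:
        for command in commands:
            subprocess.check_call(command)  # raises CalledProcessError on failure
    return commands


if __name__ == "__main__":
    # e.g. python run_all.py 1 2 3 4   (hypothetical script name)
    run_tests(int(arg) for arg in sys.argv[1:])
```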

This will take some time (especially if you're doing this all in VMs on the same machine). The timeouts in necko.py work just fine for running everything virtualized on my MBP, but if you need to increase any timeouts, look near the top of necko.py: there's a big block comment explaining everything you'll need to know and the variables you'll need to change.

Once everything's complete, look in the "output" directory on your master machine. There will be one or more timestamped directories (one per test you ran), each containing a zip file with results (the zip files are named <ip>.zip, where <ip> is the IP address of the slave machine). If, for some reason, downloading the results to your master machine fails, the results will be in the "testout" directory on the slave (or testout<timestamp> on Windows, because of some bugs with the build system there).
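To pick up the newest results without digging through the directory by hand, something like the following Python sketch works. It assumes the layout described above (`output/<timestamped directory>/<ip>.zip`) and uses directory modification time to find the newest run, since the exact timestamp format isn't documented here. `latest_results` is a hypothetical helper.

```python
import os
import zipfile
from typing import Dict


def latest_results(output_dir: str = "output") -> Dict[str, str]:
    """Map slave IP address -> result zip path for the most recent run in output_dir."""
    runs = [os.path.join(output_dir, name) for name in os.listdir(output_dir)]
    runs = [path for path in runs if os.path.isdir(path)]
    if not runs:
        return {}
    newest = max(runs, key=os.path.getmtime)
    return {name[:-len(".zip")]: os.path.join(newest, name)
            for name in sorted(os.listdir(newest)) if name.endswith(".zip")}


if __name__ == "__main__":
    if os.path.isdir("output"):
        for ip, path in latest_results().items():
            with zipfile.ZipFile(path) as archive:
                print(ip, "->", len(archive.namelist()), "files in", path)
```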

When you're done, you can ^C things to shut down necko.

Known Issues

  • On Windows, you may have to go into the Task Manager and manually kill a leftover python.exe process. I haven't been able to figure out why this happens occasionally (not all the time), but when it does, you won't be able to run tests again until you kill that process.
  • On OS X, when tests are done running, you may no longer have an /etc/resolv.conf. Just renew your DHCP lease to get it back. Again, no clue so far as to why this happens occasionally, but it makes it kind of hard to resolve hostnames when it does.

History

Version 5 (8/23/11)

  • Support for running tests on Windows and OS X in addition to linux
  • Much more automation in running tests
  • Backend support for running the same test on multiple machines in parallel, but for some reason that didn't quite work in my tests, so you're limited to running on one machine at a time for now

Version 4 (7/27/11)

  • Lots of backend changes
  • Fixed a bug where pages originally served with 'Transfer-Encoding: chunked' caused some tests to fail, because we didn't handle that encoding
  • Lots of miscellaneous bug fixes
  • First release on a VM

Version 3.1 (7/25/11)

  • Fixed some bugs in the installer
  • Updated the prerequisites to list openssl devel (which is needed for apache)

Version 3 (7/22/11)

  • NOTE THAT THE PREREQUISITES HAVE CHANGED SINCE VERSION 2
  • Bandwidth control
  • Updated web UI
  • CLI controls for most everything (good since the background version isn't 100% yet, hence the crazy 'running' directions)
  • DB upgrader
  • Now uses apache for serving content to support pipelining

Version 2 (6/2/11)

  • Deterministic first-SYN packet loss
  • Kmod and modified tc to support above

Version 1 (5/10/11)

  • Basic packet loss and delay
  • Linux-only, localhost-only
  • At least it has a web UI?

Future

  • Use selenium + a custom tp xpi instead of the current one to have more realistic browser interactions (instead of "load the front page of website x"). This would hopefully use nsITimedChannel to get timing information that is more specific to necko, instead of just "this is how long the whole page load took"
  • Re-use builds to save some time
  • Maybe fake up slow resources by adding some kind of delay on some of them