ReleaseEngineering/How To/Restart BuildAPI
First, figure out if it's buildapi or self-serve that's having issues.
If you're getting jobs timing out, that's probably self-serve, so try restarting the agents.
If you're getting HTTP errors or timeouts, that's probably buildapi and you should talk to webops.
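The triage above can be sketched as a small shell helper. This is only an illustration: the function name triage_http_code and the BUILDAPI_URL variable are hypothetical, and treating 4xx responses as "unclear" is an assumption rather than documented practice.

```shell
# Map an HTTP status code (as printed by curl's %{http_code}) to a likely
# next step. curl prints "000" when it received no HTTP response at all,
# for example on a timeout or a refused connection.
triage_http_code() {
    case "$1" in
        000|5??) echo "buildapi suspect: talk to webops or restart the web app" ;;
        2??|3??) echo "buildapi responding: check the selfserve agents" ;;
        *)       echo "unclear: inspect the response manually" ;;
    esac
}

# Example usage (BUILDAPI_URL is a placeholder, not a documented endpoint):
#   code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 30 "$BUILDAPI_URL")
#   triage_http_code "$code"
```

A healthy HTTP response does not prove self-serve is working; it only suggests that the agents are the next place to look.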
Restarting the buildapi web app
Webops know how to do this, so you can ask them. However, since releng has root privileges on relengwebadm, you can also do it yourself.
Some background: 'relengwebadm' hosts a number of releng apps, which you can see under '/data/releng/www/*'. A shared deploy/restart script lives at '/data/releng/deploy'. Run on its own, the script updates each app in www/. Passed an individual app (e.g. buildapi), it deploys and restarts that app at the latest known version. Depending on the app, the version may be stored in puppetagain or somewhere else. For buildapi, it is in the puppetagain repo.
first:
$ ssh $LDAP_SHORTNAME@relengwebadm.private.scl3.mozilla.com
$ sudo su -
then decide:
If you just want to restart buildapi (without updating to a newer version), run the following, which syncs the code out to the web nodes:
# /data/releng/deploy buildapi
You may also need to run 'apachectl graceful' on each of the web heads.
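If a graceful restart is needed on every web head, a loop along these lines may save some typing. It is a sketch under assumptions: the function name graceful_all and the host names are hypothetical (this page does not list the web heads), and it assumes you can ssh from your current shell to each head with enough privileges to run apachectl.

```shell
# Run "apachectl graceful" on each host through a runner command.
# Usage: graceful_all RUNNER host...
# Pass "echo" as RUNNER for a dry run that only prints what would be run.
graceful_all() {
    runner="$1"
    shift
    for host in "$@"; do
        "$runner" "$host" apachectl graceful
    done
}

# Dry run (host names are placeholders):
#   graceful_all echo webhead1.example webhead2.example
# Real run:
#   graceful_all ssh webhead1.example webhead2.example
```

Doing the dry run first makes it easy to confirm the host list before touching Apache.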
or if you want to update buildapi against a new version and deploy:
Restarting the agent
selfserve-agent instances run on multiple masters under supervisor, which should restart the agent automatically if it fails. In some cases (multiple fast failures), supervisor disables the service, and it then has to be restarted by hand.
- Search for "include selfserve_agent" in hg.mozilla.org/build/puppet/file/default/manifests/moco-nodes.pp to figure out what masters should be checked.
- Search for errors in /builds/selfserve-agent/agent.log
- Restart the service as root:
supervisorctl restart selfserve-agent
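The checks above might be combined on each master roughly as follows. This is a sketch, not an established procedure: the helper name recent_errors is hypothetical, and the 'error|traceback' pattern is a guess at what failures look like in the agent log.

```shell
AGENT_LOG=/builds/selfserve-agent/agent.log

# Print the last N lines of a log that mention an error or a traceback
# (case-insensitive). N defaults to 20.
recent_errors() {
    log="$1"
    count="${2:-20}"
    grep -i -E 'error|traceback' "$log" | tail -n "$count"
}

# On a master, as root:
#   supervisorctl status selfserve-agent    # FATAL usually means supervisor gave up after repeated quick failures
#   recent_errors "$AGENT_LOG"
#   supervisorctl restart selfserve-agent
```

Looking at the log before restarting helps tell a one-off crash from a problem that will simply recur.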
Memcached
Buildapi also depends on a memcached server run by IT. This is worth investigating if reporter.py jobs are hanging.
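One way to check whether memcached is answering is to send it the text-protocol "stats" command and read a field back. The sketch below assumes a netcat binary is available and that the server listens on memcached's default port 11211. MEMCACHED_HOST is a placeholder (the actual host is not given on this page), and stat_value is a hypothetical helper name.

```shell
# Extract one field from memcached "stats" output read on stdin.
# Stats lines look like "STAT <name> <value>" and end in CRLF.
stat_value() {
    tr -d '\r' | awk -v key="$1" '$1 == "STAT" && $2 == key { print $3 }'
}

# Example usage (MEMCACHED_HOST is a placeholder):
#   printf 'stats\r\nquit\r\n' | nc "$MEMCACHED_HOST" 11211 | stat_value curr_connections
```

If the command hangs or prints nothing, that is a hint that memcached itself may be the problem and IT should be contacted.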