Latest revision as of 11:27, 22 June 2017

First, figure out if it's buildapi or self-serve that's having issues. If you're getting jobs timing out, that's probably self-serve, so try restarting the agents. If you're getting HTTP errors or timeouts, that's probably buildapi and you should talk to webops.

If you're getting nagios messages similar to:

 releng.webapp.scl3.mozilla.com:Rabbit Unread Messages is WARNING: RABBITMQ_OVERVIEW WARNING - messages WARNING (529) messages_unacknowledged WARNING (528), messages_ready OK (1)

then you need to restart buildapi.
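
When triaging from a pasted alert, the unacknowledged-message count can be pulled out with a short shell helper. This is only a sketch: the function name `unacked_count` is made up here, and it assumes the alert text keeps the `messages_unacknowledged <STATE> (<N>)` shape shown in the nagios message above.

```shell
# Hypothetical helper: print the messages_unacknowledged count from a
# nagios alert line read on stdin. Assumes the alert format shown above.
unacked_count() {
    sed -n 's/.*messages_unacknowledged [A-Z]* (\([0-9][0-9]*\)).*/\1/p'
}

# Example: feed the alert text on stdin.
# echo "$alert" | unacked_count
```

If the line carries no such counter, the helper prints nothing, so an empty result means the alert was of a different kind.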

Restarting the buildapi web app

You can ask webops, as they know how to do it. However, since we have root privileges on relengwebadm, you can also do it yourself.

Some background: 'relengwebadm' hosts a number of releng apps. You can see them under '/data/releng/www/*'. There is a shared deploy/restart script at '/data/releng/deploy'. You can run this script on its own, in which case it updates each app in www/, or you can pass in an individual app (e.g. buildapi), in which case it deploys and restarts that app against the latest known version. Depending on the app, the version may be stored in puppetagain or somewhere else. For buildapi, it is in the puppetagain repo. Note that the deploy script will not restart the application unless it believes there has been a change.
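
A minimal sketch of how the shared script might be driven for a single app. The function names `list_apps` and `deploy_app` are hypothetical, and passing the app name as the first positional argument is an assumption based on the description above, not something confirmed from the script itself.

```shell
# Paths as given above; override them when experimenting elsewhere.
WWW_DIR=${WWW_DIR:-/data/releng/www}
DEPLOY=${DEPLOY:-/data/releng/deploy}

# Hypothetical helper: list the apps hosted under www/.
list_apps() {
    for d in "$WWW_DIR"/*/; do
        if [ -d "$d" ]; then
            basename "$d"
        fi
    done
}

# Hypothetical helper: deploy one app, refusing names that are not under www/.
# ASSUMPTION: the deploy script takes the app name as its first argument.
deploy_app() {
    app=$1
    if [ ! -d "$WWW_DIR/$app" ]; then
        echo "unknown app: $app" >&2
        return 1
    fi
    "$DEPLOY" "$app"
}

# Usage (as root on relengwebadm):
# list_apps
# deploy_app buildapi
```

Checking the app name first avoids handing a typo to the deploy script. Remember that the script only restarts an app when it detects a change.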

First, log in and become root:

$ ssh $LDAP_SHORTNAME@relengwebadm.private.scl3.mozilla.com
$ sudo su -

Then decide what you need.

If you just want to restart buildapi (without updating or moving to a newer version), the following will sync the code out to the web nodes and cause a restart (since ``production.ini`` is modified):

# cd /data/releng/src/buildapi
# echo "# <user> restarting bug XXXX" >> production.ini
# ./update $(./virtualenv/bin/pip freeze | grep buildapi | sed 's,[^0-9]\+,,')
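
The `$(...)` argument in the last command works out the currently installed buildapi version, so that `./update` redeploys the same version rather than a new one. A minimal sketch of what that pipeline does, using made-up `pip freeze` output (the package versions below are illustrative, not the real ones) and a POSIX spelling of the same `sed` expression:

```shell
# Same idea as the pipeline above: keep the buildapi line from `pip freeze`
# output, then strip the first run of non-digit characters ("buildapi==").
# [^0-9][^0-9]* is the portable equivalent of the GNU-only [^0-9]\+.
installed_version() {
    grep buildapi | sed 's,[^0-9][^0-9]*,,'
}

# Illustrative input only; real output comes from ./virtualenv/bin/pip freeze.
printf 'SQLAlchemy==0.6.4\nbuildapi==0.3.20\n' | installed_version
# prints: 0.3.20
```

If more than one installed package has "buildapi" in its name, the pipeline yields several lines, so it is worth eyeballing the value before passing it to `./update`.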

You'll know that buildapi has been restarted if you see the following output reported for all web hosts:

 flower-relengapi: started

If you don't see the `flower-relengapi: started` message, you may also need to run 'apachectl graceful' on each of the web heads. (Please file a bug if you do; that situation "shouldn't happen".)

Or, if you want to update buildapi to a new version and deploy, follow:

https://wiki.mozilla.org/ReleaseEngineering/BuildAPI#Updating_code

Restarting the agent

selfserve-agent instances run on multiple masters under supervisor, so an instance should be restarted automatically if it fails. In some cases (multiple fast failures), supervisor disables the service.

If you need to review log files:
- Determine which masters to check by searching for "include selfserve_agent" in hg.mozilla.org/build/puppet/file/default/manifests/moco-nodes.pp.
- Search for errors in /builds/selfserve-agent/agent.log.

To stop, start, restart, or check status on the servers, use the ansible scripts with the ``selfserve-inventory.sh`` script, e.g.:

  ansible-playbook -i selfserve-inventory.sh -e desired_state=restarted supervisord-action.yml
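
For the log review steps earlier in this section, the two searches can be sketched as shell helpers. Both function names are hypothetical. The first assumes node blocks in the manifest look like `node "hostname" { ... }`, and the second assumes failures show up in the log as lines containing "error" or "traceback"; adjust either if the real files differ.

```shell
# Hypothetical helper: print node names whose definition includes
# selfserve_agent. ASSUMPTION: node blocks look like
#   node "hostname" { ... include selfserve_agent ... }
selfserve_hosts() {
    awk -F'"' '
        /^node /                  { node = $2 }
        /include selfserve_agent/ { if (node != "") print node }
    ' "$1"
}

# Hypothetical helper: show the last N (default 20) log lines that mention
# an error or a traceback, with their line numbers.
recent_agent_errors() {
    grep -E -i -n 'error|traceback' "$1" | tail -n "${2:-20}"
}

# Usage, on a local copy of the manifest and on a master respectively:
# selfserve_hosts moco-nodes.pp
# recent_agent_errors /builds/selfserve-agent/agent.log
```

Note that a commented-out include line in the manifest would still match, so treat the host list as a starting point rather than a definitive answer.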

Memcached

Buildapi also depends on a memcached server run by IT. This is worth investigating if reporter.py jobs are hanging.

ReleaseEngineering/How To/Restart BuildAPI: Difference between revisions
