ReleaseEngineering/How To/Restart BuildAPI: Difference between revisions

(corrected command for restarting BuildAPi)
 
(7 intermediate revisions by 5 users not shown)
Line 1: Line 1:
{{Release Engineering How To|Restart BuildAPI}}
{{Release Engineering How To|Restart BuildAPI}}
First, figure out if it's buildapi or self-serve that's having issues.
If you're getting [https://secure.pub.build.mozilla.org/buildapi/self-serve/jobs jobs] timing out, that's probably self-serve, so try restarting the agents.
If you're getting HTTP errors or timeouts, that's probably buildapi and you should talk to webops.
If you're getting nagios messages similar to:
  releng.webapp.scl3.mozilla.com:Rabbit Unread Messages is WARNING: RABBITMQ_OVERVIEW WARNING - messages WARNING (529) messages_unacknowledged WARNING (528), messages_ready OK (1)
You need to restart buildapi.


= Restarting the buildapi web app =
= Restarting the buildapi web app =
As root@buildapi01.build.mozilla.org
You can ask webops as they know how to do it. However, since we have root privs on relengwebadm, this can be done self serve.
service buildapi restart
 
tail -n 300 -f ~buildapi/buildapi.log
some background: 'relengwebadm' hosts a number of releng apps. you can see them under '/data/releng/www/*'. There is a shared deploy/restart script here '/data/releng/deploy'. You can run this script on its own and it will update each app in www/ or you can pass in an individual app (e.g. buildapi), and it will deploy/restart against the latest known version. Depending on the app, the version may be stored in puppetagain or somewhere else. For buildapi, it's in puppetagain repo. Note that the deploy script will not restart the application unless it believes there has been a change.
 
first:
<pre>
$ ssh $LDAP_SHORTNAME@relengwebadm.private.scl3.mozilla.com
$ sudo su -
</pre>
 
then decide:
 
if you want to just restart buildapi (you don't want to update or use a newer version). The following will sync the code out to the web nodes and cause a restart (since ``production.ini`` is modified)
<pre>
# cd /data/releng/src/buildapi
# echo "# <user> restarting bug XXXX" >> production.ini
# ./update $(./virtualenv/bin/pip freeze | grep buildapi | sed 's,[^0-9]\+,,')
</pre>
You'll know that buildapi has been restarted if you see the following output reported for all web hosts:
  flower-relengapi: started
 
If you don't see `flower-relengapi: started` message, you may also need to 'apachectl graceful' on each of the web heads. (Please file a bug if you do, that situation "shouldn't happen".)


Use the tail to verify that buildapi restarted cleanly.
or if you want to update buildapi against a new version and deploy:
* https://wiki.mozilla.org/ReleaseEngineering/BuildAPI#Updating_code


= Restarting the agent =
= Restarting the agent =
selfserve-agent instances are run on multiple masters under supervisor, so it should be restarted in case of failure. In some cases (multiple fast failures) supervisor disables the service.
selfserve-agent instances are run on multiple masters under supervisor, so it should be restarted in case of failure. In some cases (multiple fast failures) supervisor disables the service.
* Search for "include selfserve_agent" in hg.mozilla.org/build/puppet/file/default/manifests/moco-nodes.pp to figure out what masters should be checked.  
* If you need to review log files:
* Search for errors in <tt>/builds/selfserve-agent/agent.log</tt>
** determine the hosts by searching for "include selfserve_agent" in hg.mozilla.org/build/puppet/file/default/manifests/moco-nodes.pp to figure out what masters should be checked.  
* start the service as root:
** Search for errors in <tt>/builds/selfserve-agent/agent.log</tt>
supervisorctl restart selfserve-agent
* To stop/start/restart or check status on the servers, use the [https://github.com/mozilla-releng/build-ansible ansible scripts] with the ``selfserve-inventory.sh`` script. e.g.:
<pre>
  ansible-playbook -i selfserve-inventory.sh -e desired_state=restarted supervisord-action.yml
</pre>
 


= Redis =
= Memcached =
Buildapi also depends on Redis, see [[ReleaseEngineering/How_To/Restart_Redis]].
Buildapi also depends on a memcached server run by IT. This is worth investigating if reporter.py jobs are hanging.

Latest revision as of 11:27, 22 June 2017


First, figure out if it's buildapi or self-serve that's having issues. If you're getting jobs timing out, that's probably self-serve, so try restarting the agents. If you're getting HTTP errors or timeouts, that's probably buildapi and you should talk to webops.

If you're getting nagios messages similar to:

 releng.webapp.scl3.mozilla.com:Rabbit Unread Messages is WARNING: RABBITMQ_OVERVIEW WARNING - messages WARNING (529) messages_unacknowledged WARNING (528), messages_ready OK (1)

You need to restart buildapi.
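
To script a check against an alert like the one above, the unacknowledged-message count can be pulled out of the line. This is a minimal sketch, assuming the alert text keeps the format shown; `unacked_count` is a hypothetical helper name and is not part of nagios or buildapi.

```shell
#!/bin/sh
# Hypothetical helper: print the messages_unacknowledged count from a
# RABBITMQ_OVERVIEW alert line (format assumed to match the example above).
unacked_count() {
    printf '%s\n' "$1" |
        sed -n 's/.*messages_unacknowledged [A-Z]* (\([0-9]*\)).*/\1/p'
}

alert='releng.webapp.scl3.mozilla.com:Rabbit Unread Messages is WARNING: RABBITMQ_OVERVIEW WARNING - messages WARNING (529) messages_unacknowledged WARNING (528), messages_ready OK (1)'
unacked_count "$alert"
```

The function prints nothing when the line does not contain a messages_unacknowledged field, so an empty result is best treated as "unknown" rather than zero.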

Restarting the buildapi web app

Webops know how to do this, so you can ask them. However, since we have root privileges on relengwebadm, you can also do it yourself.

Some background: 'relengwebadm' hosts a number of releng apps, which you can see under '/data/releng/www/*'. There is a shared deploy/restart script at '/data/releng/deploy'. Run on its own, this script updates each app in www/; passed an individual app (e.g. buildapi), it deploys/restarts that app against the latest known version. Depending on the app, the version may be stored in puppetagain or somewhere else. For buildapi, it's in the puppetagain repo. Note that the deploy script will not restart the application unless it believes there has been a change.

first:

$ ssh $LDAP_SHORTNAME@relengwebadm.private.scl3.mozilla.com
$ sudo su -

then decide:

If you just want to restart buildapi (without updating or using a newer version), the following will sync the code out to the web nodes and cause a restart (since ``production.ini`` is modified):

# cd /data/releng/src/buildapi
# echo "# <user> restarting bug XXXX" >> production.ini
# ./update $(./virtualenv/bin/pip freeze | grep buildapi | sed 's,[^0-9]\+,,')

You'll know that buildapi has been restarted if you see the following output reported for all web hosts:

 flower-relengapi: started

If you don't see the ``flower-relengapi: started`` message, you may also need to run 'apachectl graceful' on each of the web heads. (Please file a bug if you do; that situation "shouldn't happen".)
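
The argument passed to ./update in the commands above is the currently installed buildapi version. The sed expression deletes the first run of non-digit characters from the matching pip freeze line, which leaves only the version. Here is a sketch of that step in isolation, using a made-up version number. `strip_to_version` is a hypothetical name, and `[^0-9][^0-9]*` is the portable spelling of the original `[^0-9]\+`.

```shell
#!/bin/sh
# Hypothetical helper: strip the leading non-digit run (package name and "==")
# from a pip freeze line, leaving the version string.
strip_to_version() {
    printf '%s\n' "$1" | sed 's,[^0-9][^0-9]*,,'
}

# The version number here is illustrative only.
strip_to_version 'buildapi==0.3.25'
```

Only the first non-digit run is removed, so the dots inside the version are preserved. The approach relies on the package name containing no digits.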

Or, if you want to update buildapi to a new version and deploy, see https://wiki.mozilla.org/ReleaseEngineering/BuildAPI#Updating_code

Restarting the agent

selfserve-agent instances run on multiple masters under supervisor, so they should be restarted in case of failure. In some cases (multiple fast failures) supervisor disables the service.

  • If you need to review log files:
    • Determine which masters to check by searching for "include selfserve_agent" in hg.mozilla.org/build/puppet/file/default/manifests/moco-nodes.pp
    • Search for errors in /builds/selfserve-agent/agent.log
  • To stop/start/restart or check status on the servers, use the ansible scripts with the ``selfserve-inventory.sh`` script. e.g.:
  ansible-playbook -i selfserve-inventory.sh -e desired_state=restarted supervisord-action.yml
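
The host lookup described in the list above can be run against a local checkout of the puppet repository. This is a minimal sketch, assuming each node block in moco-nodes.pp starts with a line beginning with `node`; `selfserve_hosts` and the sample manifest contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the "node ..." line of every node block that
# includes selfserve_agent. Assumes node blocks open with a line starting
# with "node" and that the include appears inside that block.
selfserve_hosts() {
    awk '
        /^[ \t]*node[ \t]/ { node = $0 }
        /include[ \t]+selfserve_agent/ { print node }
    ' "$1"
}

# Illustrative manifest; the host names are made up.
sample=$(mktemp)
cat > "$sample" <<'EOF'
node "master-a.example.com" {
    include selfserve_agent
}
node "master-b.example.com" {
    include something_else
}
EOF
selfserve_hosts "$sample"
rm -f "$sample"
```

In real use, pass the path to moco-nodes.pp in your checkout instead of the sample file.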


Memcached

Buildapi also depends on a memcached server run by IT. This is worth investigating if reporter.py jobs are hanging.