ReleaseEngineering/Funsize/Troubleshooting
Deployment and Troubleshooting
Senbonzakura/Funsize is still a fairly new application and probably has a lot of kinks in it.
If you're changing things around or deploying it somewhere, sooner or later you will likely run into errors that leave the application stuck in an unusable state. This document describes how to get out of that state.
Important Notes
If you're dealing with this application when it's deployed via docker (locally or on Elastic Beanstalk), you will probably only have shell access to the host, not to the container itself. This means you cannot SSH into the container.
To make it easier to extract the cache, database and logs, a folder from the host is mounted within the container. Please take a look at the dockerrun.aws.json file for the exact locations. Typically the relevant host folder is /var/funsize.
Normally you will not be able to "stop" individual services in the container. Thus the best option when dealing with containers is to shut them down and/or destroy them.
Nuclear Option
TL;DR
    # Stop everything
    killall -9 python python2.7  # Kill Flask and celery
    kill -9 $(ps aux | grep rabbitmq | grep -v "grep" | awk '{print $2}')  # Kill rabbitmq-server
    rm -rf <cache location>/*  # Clean up cache
    mysql -u root -e "Truncate partial;"  # For MySQL
    rm <database file>.db  # For SQLite
    # Make sure virtualenv is active
    <root of repo>/startup.sh
Please look at the "Things to keep in Mind" section below to help with debugging.
It's still suggested that you read through at least the rest of this section before copy-pasting the commands above, unless of course you're aware of the pitfalls/consequences.
Full
If you don't want to spend time figuring out which bits need to be cleaned out and only want to get the application back in a working state ASAP, do the following:
- Stop everything that's running. This means:
  - Stop flask (should be running as api.py)
  - Stop celery
  - Stop the rabbitmq-server
A good way to do this is:
    killall -9 python python2.7  # This should get rid of python and celery
    # Don't worry if one of python2.7 or python is not found by killall, just confirm no python is running.
    # Confirm with "ps aux | grep celery" and "ps aux | grep api.py"
    # Killing rabbitmq is a little trickier.
    # The things you need to kill are "beam.smp" and epmd.
    kill -9 $(ps aux | grep rabbitmq | grep -v "grep" | awk '{print $2}')  # Should ideally kill both.
    # Confirm with "ps aux | grep rabbitmq"
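The confirmation steps in the comments above can be wrapped in a small helper. This is only a sketch and not part of the repository: `pids_matching` and `check_stopped` are hypothetical names, and the patterns (celery, api.py, rabbitmq) are simply the ones used in the commands above. `pids_matching` reads `ps aux` style output on stdin, so it applies the same grep/awk pipeline as the kill command.

```shell
# Print the PID column (field 2 of "ps aux" output) for every line on stdin
# that mentions the given pattern, skipping grep processes themselves.
pids_matching() {
    grep -- "$1" | grep -v "grep" | awk '{print $2}'
}

# Report, for each service named in the steps above, whether it is still alive.
check_stopped() {
    snapshot=$(ps aux)
    for name in celery api.py rabbitmq; do
        pids=$(printf '%s\n' "$snapshot" | pids_matching "$name" | tr '\n' ' ')
        if [ -n "$pids" ]; then
            echo "$name still running: $pids"
        else
            echo "$name stopped"
        fi
    done
}
```

Run `check_stopped` after the kill commands; any "still running" line lists the PIDs that need another look.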
- Clean out the Cache
You can simply do:
    # The location of the cache is specified in "default.ini" and "worker.ini" under senbonzakura/configs/ in the app dir.
    # The application dir in docker is /app/ by default; on your local machine it's wherever you cloned the repo.
    # On docker the default cache is /perma/cache
    rm -rf <cache location>/*  # Note: not rm -rf <cache location>, we need that folder to exist
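If the cache location lives in a shell variable, an empty or mistyped value turns the command above into something much more destructive. The following is a minimal sketch of a guarded version; `clean_cache` is a hypothetical helper, not something shipped with the application. Note one difference from the plain `rm -rf <cache location>/*`: this version also removes hidden files inside the cache directory.

```shell
# Empty a cache directory while keeping the directory itself in place.
clean_cache() {
    cache_dir=$1
    # Refuse to run on an empty or non-existent path.
    if [ -z "$cache_dir" ] || [ ! -d "$cache_dir" ]; then
        echo "clean_cache: '$cache_dir' is not a directory" >&2
        return 1
    fi
    # -mindepth 1 excludes the top-level directory, so it survives;
    # -delete removes everything below it, contents before parents.
    find "$cache_dir" -mindepth 1 -delete
}
```

For example, `clean_cache /perma/cache` on docker, assuming the default cache location has not been changed in the .ini files.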
- Clean out the Database
You simply need to delete/empty the table that contains the data.
For MySQL: mysql -u root -e "Truncate partial;"
For SQLite: rm <database file>.db
- Restart everything
You can either start everything manually, or use one of the existing scripts to start things up.
You need to have the virtualenv which contains the repository activated before anything else.
If you're inside docker, just run the ./docker_init.sh script. If you're on your own machine, run ./startup.sh. Both of these are at the top level of the repository.
If you want to restart things manually, you can use multiple tabs/panes/terminals or use &. Essentially, you need to run the following 3 commands.
The following instructions assume you're in the root of the repository.
```
rabbitmq-server # add -detached if you want to daemonize instead
celery worker -A senbonzakura.backend.tasks -l INFO # Use -f <log file location> for logging to file, --detach to run as a daemon
python senbonzakura/frontend/api.py
```
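For the "use &" route, the three commands can be started from a single terminal. The sketch below is an illustration only: `launch` and `start_all` are hypothetical names, and the `DRY_RUN` and `BROKER_WAIT` variables, as well as the pause that gives the broker time to come up before celery starts, are assumptions rather than documented requirements. It assumes the virtualenv is active and that you are in the root of the repository.

```shell
# Run a command in the background, or only print it when DRY_RUN=1.
launch() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would start: $*"
    else
        "$@" &
    fi
}

# Start the broker, the worker and the API, in that order.
start_all() {
    launch rabbitmq-server
    # Assumed grace period for the broker; skipped in dry-run mode.
    if [ "${DRY_RUN:-0}" != "1" ]; then
        sleep "${BROKER_WAIT:-5}"
    fi
    launch celery worker -A senbonzakura.backend.tasks -l INFO
    launch python senbonzakura/frontend/api.py
}
```

Setting `DRY_RUN=1` before calling `start_all` prints the three commands without running anything, which is a cheap way to check the order first.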
Other Methods
Crucial Bits
Essentially the only things that keep any sort of state are:
1. The Database
2. The Cache
Database
The database maintains state of the partial requests, so if the database gets corrupted, or if a partial generation aborts, then the database will prevent you from re-triggering the request.
The best non-nuclear way to clean this up is to stop the running services, so that no new entries are added while you're editing the database, and then clean up the database.
Next, delete all entries in the database that have the status field set to a non-zero value. You should be able to do this like so:
    delete from partial where status!=0;
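It can be worth looking at the affected rows before removing them. The helper below is a rough sketch: `stale_partials_sql` is a hypothetical name, the table `partial` and column `status` are taken from the statement above, and the database name in the usage comments is a placeholder you need to fill in yourself.

```shell
# Emit the SQL for inspecting or deleting entries with a non-zero status.
stale_partials_sql() {
    case "$1" in
        show)   echo "SELECT * FROM partial WHERE status != 0;" ;;
        delete) echo "DELETE FROM partial WHERE status != 0;" ;;
        *)      echo "usage: stale_partials_sql show|delete" >&2; return 1 ;;
    esac
}

# Usage (both clients read SQL from standard input):
#   stale_partials_sql show   | mysql -u root <database name>
#   stale_partials_sql delete | mysql -u root <database name>
#   stale_partials_sql show   | sqlite3 <database file>.db
```

Running the `show` variant first gives you a record of what is about to be deleted, which is useful if someone wants to debug the failure later.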
Cache
The cache implicitly maintains some state of the application: if a partial exists in the cache, it means that the partial generation request was completed. Sometimes, for whatever reason, there might be a mismatch between the state tracked in the database and the one in the cache (especially after a database modification/cleanup/purge and so on).
The best way to resolve this is to go into the cache and remove the offending partial if you know which one it is. If you don't know or don't want to know which partial is causing the problem, you can simply delete the partial sub-directory in the cache directory.
Nuking the entire cache directory also works (see Nuclear option above), but the cache directory has cached complete MARs that have been downloaded over the course of time and it's probably a good idea to keep them around unless you have reason to do otherwise.
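Removing only the partial sub-directory, while leaving the cached complete MARs alone, could look like the sketch below. `remove_partials` is a hypothetical helper, and the sub-directory name is passed in as an argument because its exact name is not stated here; check your cache directory before running it.

```shell
# Delete one sub-directory of the cache and leave everything else untouched.
remove_partials() {
    cache_dir=$1
    subdir=$2   # name of the partial sub-directory; look it up, do not guess
    target="$cache_dir/$subdir"
    if [ -z "$cache_dir" ] || [ -z "$subdir" ] || [ ! -d "$target" ]; then
        echo "remove_partials: no such directory: $target" >&2
        return 1
    fi
    rm -rf -- "$target"
}
```

Call it as `remove_partials <cache location> <partial sub-directory>` with the services stopped, so nothing writes into the cache while it is being cleaned.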
Things to keep in Mind
If you're working with a deployed version of the application and do not want to debug it, but would still like someone else to be able to do so later, you can do some of the following steps to help:
- Shut down everything
- Back up the database in its current state,
  i.e. get a dump of the database and the current configuration file being used, e.g. my.cnf for MySQL
- Make a copy of the cache in its current state.
  Find the location of the cache in the .ini files; it should be /perma/cache by default.
- Save a copy of the flask and celery logs as they are.
  The location of the log files is also mentioned in the .ini files (worker.ini and default.ini under senbonzakura/configs).
  Stored in /var/log/celerylog.log or /usr/local/var/log/celerylog.log by default.
- ... Anything else?
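The cache and log steps above can be collected into one snapshot directory. This is a sketch under assumptions: `snapshot_state` is a hypothetical helper, the destination directory is expected not to exist yet, and the database dump is left as a comment because the database name is not given here.

```shell
# Copy the cache directory and any number of log/config files into "dest".
# Arguments: <dest dir> <cache dir> [file ...]
snapshot_state() {
    dest=$1
    cache_dir=$2
    shift 2
    mkdir -p "$dest" || return 1
    # Recursive copy; the original cache is left untouched.
    cp -R "$cache_dir" "$dest/cache" || return 1
    for f in "$@"; do
        if [ -f "$f" ]; then
            cp -p "$f" "$dest/"
        else
            # A missing log is reported but does not abort the snapshot.
            echo "skipping missing file: $f" >&2
        fi
    done
}

# For MySQL, a dump can be added next to the copies, for example:
#   mysqldump -u root <database name> > <dest dir>/partial_dump.sql
```

For example, `snapshot_state /var/funsize/snapshot /perma/cache /var/log/celerylog.log`, assuming the default locations mentioned above are still in use.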