== ELB Logs ==

The production instance of Balrog publishes logs to two different S3 buckets:

* The nginx access logs (that contain all of the update requests we receive) are published to '''balrog-us-west-2-elb-logs'''. These logs are very large, and you're unlikely to be able to download them for local querying. The best way to work with them is through Athena (see the example below the list).
* The rest of the logs are published to '''net-mozaws-prod-us-west-2-logging-balrog''', in the "firehose/s3" directory. Within that there are subdirectories for different parts of Balrog:
** balrog.admin.syslog.admin contains the admin wsgi app output.
** balrog.admin.nginx.{access,error} contain the admin access & error logs from nginx. The access logs are generally a subset of the wsgi app output (which logs requests with a bit of extra detail).
** balrog.admin.syslog.agent contains the agent app output.
** balrog.admin.syslog.cron contains cronjob output (eg: the history cleanup and the production database dump).
** balrog.web.syslog.web contains the public wsgi app output. Note that this app does _not_ log requests, so this is largely warning/exception output. If you care about requests to the public app, use the nginx access logs (see above).
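As a rough sketch of the Athena approach: assuming a table has already been defined over the access logs (the database, table, and column names below are placeholders, not the real definitions), a query can be started from the AWS CLI and the results fetched once it completes:

 # Kick off an Athena query over the access logs. "balrog", "balrog_elb_logs", and
 # "request_url" are placeholder names; use whatever the real table actually defines.
 aws athena start-query-execution --region us-west-2 \
   --query-string "SELECT request_url, count(*) AS hits FROM balrog_elb_logs GROUP BY request_url ORDER BY hits DESC LIMIT 20" \
   --query-execution-context Database=balrog \
   --result-configuration OutputLocation=s3://<your-athena-results-bucket>/
 # Fetch the results using the QueryExecutionId printed by the previous command.
 aws athena get-query-results --region us-west-2 --query-execution-id <QueryExecutionId>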
=== Public app ===

'''NOTE:''' These instructions were written before [https://aws.amazon.com/blogs/aws/amazon-athena-interactive-sql-queries-for-data-in-amazon-s3/ Amazon Athena] existed. The next time we need to do this kind of analysis, it's probably worth giving Athena a try. Other techniques may be better too - the instructions below are just something we've done in the past.

The ELB logs for the public-facing application are replicated to the '''balrog-us-west-2-elb-logs''' S3 bucket, located in us-west-2. Logs are rotated very quickly, and we end up with tens of thousands of separate files each day. Because of this, and because S3 has a lot of per-file overhead, it can be tricky to do analysis on them. You're unlikely to be able to download the logs locally in any reasonable amount of time (ie, less than a day), but mounting them on an EC2 instance in us-west-2 should provide you with reasonably quick access. Here's an example:
* Launch an EC2 instance (you probably want a compute-optimized one, with at least 100GB of storage).
* Generate an access token for your CloudOps AWS account. If you don't have a CloudOps AWS account, talk to Ben Hearsum or Bensong Wong. Put the token in a plaintext file somewhere on the instance.
** If you've chosen local storage, you'll probably need to format and mount the volume.
* Install s3fs by following the instructions on https://github.com/s3fs-fuse/s3fs-fuse.
* Mount the bucket on your instance, eg:
 s3fs balrog-us-west-2-elb-logs /media/bucket -o passwd_file=pw.txt
* Do some broad grepping directly on the S3 logs, and store the results in a local file. This should speed up subsequent queries. Eg:
 grep '/Firefox/.*WINNT.*/release/' /media/bucket/AWSLogs/361527076523/elasticloadbalancing/us-west-2/2016/09/17/* | gzip > /media/ephemeral0/sept-17-winnt-release.txt.gz
* Do additional queries on the new logfile (see the example below).
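For example, to get a feel for the narrowed-down data (the version string below is just an illustration - substitute whatever you're actually investigating):

 # Count the total number of pre-filtered requests, then the ones matching a specific pattern.
 zcat /media/ephemeral0/sept-17-winnt-release.txt.gz | wc -l
 zgrep -c 'Firefox/49.0' /media/ephemeral0/sept-17-winnt-release.txt.gz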
=== Admin app ===
Nginx logs for the admin app are available (on a ~1 day time delay) in the [https://console.aws.amazon.com/s3/buckets/net-mozaws-prod-us-west-2-logging-balrog/balrog.admin.nginx.access/?region=us-west-2&tab=overview "net-mozaws-prod-us-west-2-logging-balrog" S3 bucket]. These logs are small enough that downloading and querying them locally is generally the most efficient thing to do.
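A rough sketch of that workflow, assuming the AWS CLI has credentials that can read the bucket. The prefix matches the console link above; the grep pattern is only an example, and the exact log format (and whether the objects are compressed) may differ:

 # Sync the admin nginx access logs locally, then search through them.
 aws s3 sync s3://net-mozaws-prod-us-west-2-logging-balrog/balrog.admin.nginx.access/ balrog-admin-logs/
 grep -r 'PUT /api/' balrog-admin-logs/
 # If the objects turn out to be gzip-compressed, use zgrep on the individual files instead of grep -r.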
== Backups ==