Community Ops/PaaS

From MozillaWiki

Community Ops - PaaS

Purpose

Community websites are currently hosted on a variety of infrastructure with no clear owner or maintainer, which has led to downtime, security, and budgeting problems. The goal of this PaaS is to provide production-quality infrastructure that the community and internal teams can use to host their services.

The apps that will run on the Community Ops PaaS are:

  • Mozilla Discourse
  • Mozilla community sites
  • Participation infrastructure sites

Quick introduction

  • Our main focus is an easy way to deploy Docker containers as apps.
  • We abstract away how websites are deployed by using the following services:
    • Mesos, Marathon, Consul, HAProxy
  • Consul acts as our service discovery system
    • We register tasks
    • We register checks to make sure that tasks are up and running
    • We can discover task endpoints (e.g. website URL, DB host) using DNS
  • We also use Consul as a distributed key-value store
  • Mesos pools multiple servers into a single resource
  • Marathon is a Mesos framework that makes it easy to deploy Docker containers
  • We have two Mesos/Marathon clusters: staging and production
  • Marathon is responsible for keeping our long-running apps up and running
  • When an app is deployed, `mesos-consul` registers its tasks in Consul
  • Using Consul, we determine which Marathon apps are running
  • Using consul-template, we iterate over the running apps and interpolate their details (such as hostnames and task endpoints) into a configuration template
  • With that information we configure HAProxy to load-balance app requests across our app containers
  • We provide a default hostname for every Marathon app, based on its name
    • e.g.
      • app name "foo" is exposed as "foo.production.mozilla.community"
  • To set a custom FQDN for an app, we use Consul's K/V store
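
The hostname rules above (a default name derived from the app name, overridable through Consul's K/V store) can be sketched in Python. This is a minimal illustration, not the PaaS's actual code: the K/V key layout `apps/<environment>/<app>/fqdn`, the helper names, and the Consul agent address are hypothetical. Only the endpoint `GET /v1/kv/<key>` is Consul's own API; it returns a JSON list whose values are base64-encoded, and a 404 for missing keys.

```python
import base64
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

# Domain suffix taken from the "foo.production.mozilla.community" example.
BASE_DOMAIN = "mozilla.community"


def default_hostname(app_name: str, environment: str) -> str:
    """Default hostname: <app>.<environment>.mozilla.community."""
    # Marathon app IDs usually start with "/" (e.g. "/foo"); drop it.
    return f"{app_name.strip('/')}.{environment}.{BASE_DOMAIN}"


def decode_kv_response(body: bytes) -> Optional[str]:
    """Decode a Consul GET /v1/kv/<key> response (values are base64-encoded)."""
    entries = json.loads(body)
    if not entries or entries[0].get("Value") is None:
        return None
    return base64.b64decode(entries[0]["Value"]).decode("utf-8")


def consul_kv_get(key: str, consul_url: str = "http://localhost:8500") -> Optional[str]:
    """Read one key from Consul's K/V store; None if the key does not exist."""
    try:
        with urllib.request.urlopen(f"{consul_url}/v1/kv/{key}") as resp:
            return decode_kv_response(resp.read())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # Consul returns 404 for missing keys
            return None
        raise


def resolve_hostname(app_name: str, environment: str,
                     kv_get: Callable[[str], Optional[str]] = consul_kv_get) -> str:
    """Use a custom FQDN from K/V if one is set, else the default hostname."""
    # Hypothetical key layout: apps/<environment>/<app>/fqdn
    key = f"apps/{environment}/{app_name.strip('/')}/fqdn"
    custom = (kv_get(key) or "").strip()
    return custom or default_hostname(app_name, environment)
```

For example, `resolve_hostname("foo", "production", kv_get=lambda key: None)` gives `foo.production.mozilla.community`, while a K/V value such as `forum.example.org` (a placeholder) takes precedence. Passing `kv_get` in as a parameter keeps the lookup testable without a running Consul agent.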

AWS Architecture

Diagram: AWS Cluster
Diagram: Public Jenkins security groups model
  • Mesos master nodes
    • Mesos master
    • Marathon
    • Zookeeper
    • HAProxy
  • Mesos slave nodes
    • Mesos slave
    • Docker
  • Shared RDS databases
    • PostgreSQL
    • MySQL
  • Consul shared nodes
    • Consul for both production and staging
  • Bastion node
    • OpenVPN server
  • Admin node
    • Internal Jenkins instance for infra changes
  • Jenkins node
    • Public Jenkins instance for building and deploying sites

Software stack

Apache Mesos

Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications or frameworks.

Apache ZooKeeper

Apache ZooKeeper provides a distributed configuration service, synchronization service, and naming registry for large distributed systems.

Marathon

Marathon is a production-grade container orchestration platform for Apache Mesos.
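
As an illustration of deploying a Docker container through Marathon, the sketch below builds an app definition and posts it to Marathon's REST endpoint `/v2/apps`. The image name, resource sizes, and health-check settings are placeholder assumptions. The layout with `portMappings` inside `container.docker` and `"network": "BRIDGE"` follows older Marathon releases; newer versions move port mappings up to `container` and use a `networks` list.

```python
import json
import urllib.request


def marathon_app(app_id: str, image: str, instances: int = 2,
                 cpus: float = 0.25, mem: int = 256, container_port: int = 8000) -> dict:
    """Build a Marathon app definition for a Docker image (sizes are placeholders)."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": cpus,
        "mem": mem,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 lets Marathon pick a free port on the Mesos agent.
                "portMappings": [{"containerPort": container_port, "hostPort": 0, "protocol": "tcp"}],
            },
        },
        # Marathon kills and restarts tasks whose HTTP health check keeps failing.
        "healthChecks": [{"protocol": "HTTP", "path": "/", "portIndex": 0,
                          "gracePeriodSeconds": 60, "intervalSeconds": 20,
                          "maxConsecutiveFailures": 3}],
    }


def deploy(app: dict, marathon_url: str) -> int:
    """POST the definition to Marathon's /v2/apps endpoint; returns the HTTP status."""
    req = urllib.request.Request(
        f"{marathon_url}/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A call such as `deploy(marathon_app("/foo", "example/foo:latest"), "http://marathon.example:8080")` (hypothetical image and host) would create an app named "foo"; Marathon normally answers a successful create with 201.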

HAProxy

HAProxy is a high-availability load balancer and proxy for TCP and HTTP applications.
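
In this PaaS, consul-template renders the HAProxy configuration from the apps registered in Consul. The Python function below only imitates that rendering step so the shape of the generated configuration is visible; the real pipeline uses a consul-template template, and the input structure, the `http-in` frontend name, and port 80 are assumptions. Requests are routed on the HTTP `Host` header, with one backend per app.

```python
from typing import Dict, List, Tuple

# Hypothetical input: app name -> (hostname, [(address, port), ...]) for healthy
# tasks, i.e. the data consul-template would read from Consul.
Apps = Dict[str, Tuple[str, List[Tuple[str, int]]]]


def render_haproxy(apps: Apps) -> str:
    """Render one Host-header-routing frontend plus one backend per app."""
    lines = ["frontend http-in", "    bind *:80", "    mode http"]
    for name in sorted(apps):
        hostname, _ = apps[name]
        lines.append(f"    acl host_{name} hdr(host) -i {hostname}")
        lines.append(f"    use_backend {name} if host_{name}")
    for name in sorted(apps):
        _, tasks = apps[name]
        lines += ["", f"backend {name}", "    mode http", "    balance roundrobin"]
        for i, (address, port) in enumerate(tasks):
            # "check" makes HAProxy health-check each container as well.
            lines.append(f"    server {name}-{i} {address}:{port} check")
    return "\n".join(lines) + "\n"
```

Sorting the app names keeps the output deterministic, so rendering an unchanged set of apps produces an identical file.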

Consul

Consul is a distributed service discovery tool with health checking and K/V storage. Initially we will use Consul only for health checking, but we will expand its use to replace Bamboo for service discovery.
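
In this setup `mesos-consul` registers Marathon tasks automatically, so the following is only a sketch of what a registration with an HTTP health check looks like through the Consul agent's HTTP API (`PUT /v1/agent/service/register`), and how healthy instances can be read back from `GET /v1/health/service/<name>?passing`. The service ID scheme, check interval, and timeout are assumptions.

```python
import json
import urllib.request
from typing import List, Tuple


def service_registration(name: str, address: str, port: int, check_path: str = "/") -> dict:
    """Payload for the Consul agent endpoint PUT /v1/agent/service/register."""
    return {
        "ID": f"{name}-{address}-{port}",  # assumed ID scheme; must be unique per agent
        "Name": name,
        "Address": address,
        "Port": port,
        "Check": {
            # Consul polls this URL: 2xx passes, 429 warns, anything else is critical.
            "HTTP": f"http://{address}:{port}{check_path}",
            "Interval": "10s",
            "Timeout": "2s",
        },
    }


def register(payload: dict, consul_url: str = "http://localhost:8500") -> None:
    """Send the registration to the local Consul agent."""
    req = urllib.request.Request(
        f"{consul_url}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req):
        pass


def healthy_endpoints(body: bytes) -> List[Tuple[str, int]]:
    """Parse GET /v1/health/service/<name>?passing into (address, port) pairs."""
    # An empty Service.Address means the service uses its node's address.
    return [(e["Service"]["Address"] or e["Node"]["Address"], e["Service"]["Port"])
            for e in json.loads(body)]
```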

Vault

Vault is a distributed tool for storing secrets. We will use it to store any credentials our infrastructure requires, so that Ansible or Terraform can access them.
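
A minimal sketch of how a script could read a credential from Vault over its HTTP API, assuming a KV version 1 secrets engine mounted at `secret/` and token authentication via the `X-Vault-Token` header. The secret path (for example `rds/discourse`) and the helper names are hypothetical. With a KV version 2 mount the path becomes `secret/data/<path>` and the values are nested one level deeper, under `data.data`.

```python
import json
import urllib.request
from typing import Dict


def parse_kv1_secret(body: bytes) -> Dict[str, str]:
    """Return the key/value pairs from a Vault KV v1 read response."""
    return json.loads(body)["data"]


def read_secret(path: str, token: str,
                vault_addr: str = "https://127.0.0.1:8200") -> Dict[str, str]:
    """Read secret/<path> (assumed KV v1 mount) through Vault's HTTP API."""
    req = urllib.request.Request(
        f"{vault_addr}/v1/secret/{path}",
        headers={"X-Vault-Token": token},  # token-based authentication
    )
    with urllib.request.urlopen(req) as resp:
        return parse_kv1_secret(resp.read())
```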

Configuration Management