ReleaseEngineering/Meeting Notes/2010-02-08 - Frustrations Deploying Services

What's bothering us right now

  • blocked on making things like self serve public because we first want a way to deploy things
  • DNS and hosting fixes - no one is taking them on
  • have things to deploy and can't
  • overall there's a feeling that "things aren't the way we want them to be"
  • no one has time to work on things - we all feel this
  • It's hard for new hires to learn what's already in place
  • What the systems are, how they work, and how they interrelate is hard to track down
  • Someone who's new to this doesn't have a central picture from which to predict conflicts with new work

What would help right now?

  • sound description of each system (app store is beginning of that)
  • tips on how to start the discussion on where to get new work up and running (what db, host, etc)
  • process for how we create new systems - and determine the 'ilities': scalability, maintainability, reliability, etc
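
One possible shape for such system descriptions, sketched in Python. This is only an illustration under assumptions: the SystemDoc fields, the affected_by helper, and the tool_a/tool_b/shared_db entries are all hypothetical and do not describe any real system or agreed template. It shows how recording each system's host, db and dependencies in one place would let someone new predict which systems a change might conflict with.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SystemDoc:
    """One catalogue entry. Every field name here is a hypothetical choice."""
    name: str
    description: str
    host: str
    database: Optional[str] = None
    depends_on: list[str] = field(default_factory=list)


def affected_by(catalogue: dict[str, SystemDoc], changed: str) -> set[str]:
    """Return every system that directly or indirectly depends on `changed`."""
    affected: set[str] = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for doc in catalogue.values():
            if current in doc.depends_on and doc.name not in affected:
                affected.add(doc.name)
                frontier.append(doc.name)
    # A dependency cycle could lead back to the starting system; leave it out.
    affected.discard(changed)
    return affected


# Placeholder entries, invented purely for illustration.
CATALOGUE = {
    doc.name: doc
    for doc in [
        SystemDoc("shared_db", "example database", host="db.example"),
        SystemDoc("tool_a", "example tool", host="a.example",
                  database="shared_db", depends_on=["shared_db"]),
        SystemDoc("tool_b", "example tool built on tool_a", host="b.example",
                  depends_on=["tool_a"]),
        SystemDoc("tool_c", "standalone example tool", host="c.example"),
    ]
}

if __name__ == "__main__":
    print(sorted(affected_by(CATALOGUE, "shared_db")))
```

With these placeholder entries, asking what is affected by a change to shared_db returns tool_a and tool_b, while tool_c is untouched. The same questions (what db, what host, what does it depend on) could equally be kept as a wiki table; the point is only that the answers live in one place.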

Suggestions/Questions/Comments

  • would it help to be writing down the questions to create the template for what kind of docs/info we need to gather?
  • how much of this is inherent in the system? complicated/moving parts/imperfect software & hardware
  • a patch assumed to be one line can turn into a week stuck on something, so the 'how long will this take' estimates can be off
  • this is endemic to something we've all been picking at
  • we can no longer be a loose collection of engineers working on their thing
  • because of size of team and scope of problems, we are hitting the communication issue of a larger team
  • a single person can no longer just do everything alone
  • we need specializations (eg: Dustin is already specializing away from master side of buildbot)
  • there's a lot of stuff that someone like catlee, for example, knows that is not shared knowledge: how to access db, use talos monitoring
  • need technical measures that test/ensure fault tolerance and isolation - e.g., Netflix Chaos Monkey
  • someone might send email when a system first comes out, but what kind of things do we need to document and make sure are easy to find over time?
  • still missing something to get things done
  • example for right now: self serve - is deploying it in its current state a mistake?
  • we haven't got the time or a known method for auditing a new tool or system to know whether it is ready for production

Action items

  • looking into scalability of self serve (hosting)
  • everyone contribute to docs about the systems we use and how they inter-operate with other tools/systems - ReleaseEngineering/Applications