User:Ctalbert/OrodruinComms

From MozillaWiki

Currently (11/30/2009) the following behavior is hard-coded:

  • Devices are supposed to be somehow represented by the client.py subclasses.
  • The client.py subclasses report OS information for the machine where the test-agent is running, but they have no way to query the details of the device-agent.
  • The OS information reported by the client.py code is then used to determine which job types correspond to that OS and build.
  • getjob (in __init__.py) returns the proper set of job URIs that the agent can then use to download what it needs.

Problems

  • When we add a device, we have to somehow tell the proper test-agent that the new device exists. We will have many different test-agents, and it will be hard to match a test-agent to each device before we start the job. Configuration would be much easier if the devices were all kept in a central location.
  • The server keeps track of the test-agents (things that have subclassed client.py) and knows who they are
  • How does a change start?
  • How can this be used in the future to help developer X with his device that he wants to run tests on?

The way I'm seeing it

Bringing Up A New Device

  • Ensure wireless network is up
  • Disable device security (winmo)
  • Copy the SUTAgent/DeviceAgent executable onto the device
  • Copy an ini file to the device that contains the location of the test server and information like "build group"
  • The first thing the SUTAgent/DeviceAgent does is ping the server with all its device information to register it - specifically a GET with all the key/values for the device on the URI string
    • If it cannot connect to the server (or if there is no ini file), the SUTAgent/DeviceAgent starts and goes into listening mode (which is what it does today when you start it).
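
The registration step above could be sketched roughly as follows. This is only an illustration: the ini section name `Registration`, the keys `server` and `build_group`, and the `/register` path are all assumptions made for the example, not names taken from the actual SUTAgent/DeviceAgent. The function only builds the GET URL; returning None stands for the "no ini file, just go to listening mode" case.

```python
import configparser
from urllib.parse import urlencode


def build_registration_url(ini_text, device_info):
    """Return the registration GET URL, or None when the ini is missing/unusable.

    The section name, key names, and /register path are hypothetical.
    """
    parser = configparser.ConfigParser()
    try:
        parser.read_string(ini_text)
        section = parser["Registration"]
        server = section["server"]
    except (configparser.Error, KeyError):
        # No usable ini: the agent would skip registration and just listen.
        return None
    params = dict(device_info)
    params["build_group"] = section.get("build_group", "default")
    # All device key/values go on the URI string, as described above.
    return "http://%s/register?%s" % (server, urlencode(sorted(params.items())))
```

A caller would issue a GET against the returned URL and fall back to listening mode if the URL is None or the request fails.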

Server

  • Maintains a listing of all devices.
  • Pulls build jobs on a regular basis (when it detects the notification from buildbot)
  • Awaits a testAgent/subclass_of_client.py to connect and ask for a job
  • When it gets a request we have to "figure out what to send it" (see below).

TestAgent/aka thing that subclasses client.py

  • Asks for a job to run. The job comes down as a set of URIs and a device to run that job on.
  • In our original understanding, once the testAgent got a job and a device, that device would remain with this testAgent, and the testAgent would then query for jobs on the device's behalf.
    • But I'm starting to wonder about that. Why not have the testAgent simply run the job for that device, output the results to the server, and then return the device to the pool?
    • That would be simpler (a single code path) and, as long as all our device-agents expose the same interface, it should scale better: fewer test-agents are needed because they are not tied to a device type.

Proposal

Server

State 0

  • Database connection initialized
  • Views initialized
  • Moves to State 1

State 1

  • Waiting for client and/or devices
  • Periodically registers new builds, remains in state 1 during this process
  • Periodically gets heartbeat information from the testAgents running on devices and maintains the status of each device based on that heartbeat, i.e. we know that the device is busy if the testAgent is still using it. Keeps the heartbeat between the testAgent and the deviceAgent.
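
One possible way to track device status from heartbeats is sketched below. The class name `HeartbeatTracker` and the 180-second timeout are assumptions made for illustration; the notes here do not say how long the server should wait before treating a device as free again.

```python
import time


class HeartbeatTracker:
    """Hypothetical helper: a device counts as busy while heartbeats keep arriving."""

    def __init__(self, timeout_seconds=180, clock=time.monotonic):
        self._timeout = timeout_seconds  # assumed value, not from the notes
        self._clock = clock              # injectable so it can be tested
        self._last_seen = {}             # device id -> time of last heartbeat

    def record(self, device_id):
        """Note that a heartbeat for this device was just reported."""
        self._last_seen[device_id] = self._clock()

    def is_busy(self, device_id):
        """True if a heartbeat was seen within the timeout window."""
        last = self._last_seen.get(device_id)
        return last is not None and (self._clock() - last) <= self._timeout
```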

on Device Registration

  • Adds device to database, including proper device pool.
  • Returns to State 1

on GetJob

  • returns JSON with URIs of builds and tests and a device to test them on

on Post Results

  • Redirects results to the result servers (email to tinderbox, JSON call to brasstacks).
  • Releases the device from the job and adds it to the "open" queue.
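
The three server events above (device registration, GetJob, and Post Results) could be modelled with a small in-memory sketch like the one below. Everything here is an assumption for illustration: the class name `JobServer`, the JSON field names `device` and `uris`, and the use of queues instead of the database. Device pools and forwarding to the result servers are left out.

```python
import json
from collections import deque


class JobServer:
    """Hypothetical in-memory stand-in for the server's bookkeeping."""

    def __init__(self):
        self.open_devices = deque()    # devices waiting for work
        self.busy_devices = set()      # devices currently running a job
        self.pending_builds = deque()  # builds registered but not yet tested

    def register_device(self, device_id):
        self.open_devices.append(device_id)

    def register_build(self, build_uri, tests_uri):
        self.pending_builds.append({"build": build_uri, "tests": tests_uri})

    def get_job(self):
        """Return a JSON job description, or None if nothing can be scheduled."""
        if not self.open_devices or not self.pending_builds:
            return None
        device_id = self.open_devices.popleft()
        uris = self.pending_builds.popleft()
        self.busy_devices.add(device_id)
        return json.dumps({"device": device_id, "uris": uris}, sort_keys=True)

    def post_results(self, device_id, results):
        """Release the device back to the open queue once results arrive."""
        if device_id not in self.busy_devices:
            raise KeyError("device %r is not running a job" % device_id)
        self.busy_devices.remove(device_id)
        self.open_devices.append(device_id)
        return results  # a real server would forward these to the result servers
```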

Test Agent

TODO: Hard code the number of device workers we want the test agent to handle. At the moment, that's one. We'll have to test and see how many workers the test-agent can truly service and build that out.

  • Comes up, pings server for a job
  • Gets a job, unzips necessary files for running job
  • Spawns test-agent worker to run remote libraries for the tests
  • Repeats above three steps for the number of workers we want to create.
  • Each test-agent worker will:
    • Set up communication with the device on the command channel
    • Set up communication with the device on the data channel (this is where it gets the heartbeat, stdout, error information, etc.)
    • Run tests using the remote testing libs
    • Obtain all results from the device
    • Close the device connections
    • Report results to the server
    • End its thread
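
A minimal sketch of the test-agent loop described above, using Python threads. The callables `get_job`, `run_tests`, and `report_results` are hypothetical placeholders: `run_tests` stands in for setting up the command and data channels, running the remote libraries, and closing the connections, none of which is shown here.

```python
import threading

NUM_WORKERS = 1  # hard-coded for now, per the TODO above


def run_worker(job, run_tests, report_results):
    """Body of one test-agent worker thread."""
    # run_tests is a placeholder for: open channels, run tests, collect
    # results, close the device connections.
    results = run_tests(job)
    report_results(job["device"], results)


def run_test_agent(get_job, run_tests, report_results, num_workers=NUM_WORKERS):
    """Fetch up to num_workers jobs and run each in its own thread."""
    threads = []
    for _ in range(num_workers):
        job = get_job()
        if job is None:
            break  # the server had nothing to hand out
        # Unzipping the files needed for the job would happen here.
        thread = threading.Thread(target=run_worker,
                                  args=(job, run_tests, report_results))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return len(threads)
```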

Device Agent

  • on startup, looks for ini file
    • if found:
      • uses information to register with server - ip, os, pool information etc
      • goes to state 1
    • if not:
      • goes to state 1

state 1

  • wait for commands from test-agent
  • send heartbeat once a minute on data channel if you have a listener registered on the channel
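
The heartbeat rule above (once a minute, and only when a listener is registered on the data channel) might look like this in outline. The class name `DataChannel` and the `tick` method are made up for the example; a real device-agent would drive this from its own timer or socket loop.

```python
class DataChannel:
    """Hypothetical model of the device-agent's data channel."""

    def __init__(self, heartbeat_interval=60):
        self.interval = heartbeat_interval  # seconds; "once a minute"
        self.listeners = []                 # callables that receive messages
        self.last_sent = None               # time of the last heartbeat sent

    def tick(self, now):
        """Send a heartbeat if one is due; return True if it was sent."""
        if not self.listeners:
            return False  # nobody is listening, so stay quiet
        if self.last_sent is not None and now - self.last_sent < self.interval:
            return False  # not due yet
        for listener in self.listeners:
            listener("heartbeat")
        self.last_sent = now
        return True
```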