Security/Reviews/TelemetryServer


Item Reviewed

Telemetry Server
Target

ID      Summary                                  Priority  Status
911190  Security Review of new Telemetry server  --        RESOLVED

1 Total; 0 Open (0%); 1 Resolved (100%); 0 Verified (0%);


Telemetry Reboot (https://wiki.mozilla.org/Telemetry/Reboot): backend architecture change that is compatible with current Telemetry clients. See also https://etherpad.mozilla.org/telemetry-reboot

https://github.com/mreid-moz/telemetry-server/blob/master/server/server.js

Introduce the Feature

Goal of Feature, what is trying to be achieved (problem solved, use cases, etc)

Server (https://github.com/mreid-moz/telemetry-server/) requirements:

    Ability to process 10x the incoming packet rate of the metrics telemetry infrastructure on a single AWS instance: 2400 req/s with 30K HTTP POST packets. Fall
    Server should be bandwidth-limited, not CPU-limited.
    Server should make data available for map/reduce immediately. Fallback goal: 5 min latency. In Q4 we'd like to use something like Heka to let dashboards use live data (0 min lag).
    Graphite reporting: valid packet rates for each channel, stats on packet sizes, etc.
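A rough sizing of the throughput requirement above, as a minimal sketch. It assumes "30K" means 30 KiB of body per POST and ignores HTTP header and TLS overhead; neither assumption comes from the review itself.

```javascript
// Back-of-the-envelope sizing for the throughput requirement.
const REQUESTS_PER_SECOND = 2400;    // from the requirement above
const BYTES_PER_REQUEST = 30 * 1024; // ASSUMPTION: "30K" read as 30 KiB per body

// Converts a request rate and body size into sustained inbound bandwidth.
function requiredBandwidth(reqPerSec, bytesPerReq) {
  const bytesPerSecond = reqPerSec * bytesPerReq;
  return {
    bytesPerSecond,
    mebibytesPerSecond: bytesPerSecond / (1024 * 1024),
    megabitsPerSecond: (bytesPerSecond * 8) / 1e6,
  };
}

const estimate = requiredBandwidth(REQUESTS_PER_SECOND, BYTES_PER_REQUEST);
console.log(estimate);
```

Under these assumptions the target works out to roughly 70 MiB/s, or about 590 Mbit/s of payload, which is why the network link rather than the CPU is expected to be the limit.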

AWS Configuration Details

  • Using Services AWS instances
  • Data to be stored for long term analysis in AWS
  • No changes to telemetry data that is being collected, or to Firefox

What solutions/approaches were considered other than the proposed solution?

  • Current approach is Bagheera, which is a Java-based server with several dependencies: Kafka, ZooKeeper, Hadoop
  • New HTTP server written in Python
  • New HTTP server written in Java
  • Several options xstevens tried when developing Bagheera (Go, Jetty, etc.)

Why was this solution chosen?

  • The node.js based server is simple, fast, easy to hack on, easy to deploy, and has minimal dependencies.
  • Bagheera is complex to deploy and configure, and we wanted to move away from Java
  • Python server was too slow
  • New Java server (using Dropwizard) remains on the "TODO list" in case node.js becomes a bottleneck
  • Xavier tried and discarded several approaches during development of Bagheera; the specific reasons are not known.

Any security threats already considered in the design and why?

  • This server only accepts telemetry data, and a single-purpose design keeps things very simple.
  • Primary threat is submission of malicious data. The HTTP server does not inspect the incoming data, so it should not be possible to harm the server itself with carefully crafted payloads. However, we will end up storing potentially bogus data; bad data will be detected and discarded when the data is processed and validated.
  • DoS attack: someone could potentially flood the server with requests, causing us to drop valid requests from Firefox.
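The "store first, validate later" approach described above could look something like the following sketch. The function name `partitionSubmissions`, the assumption that payloads are JSON, and the minimal shape check are all illustrative and are not taken from the telemetry-server code.

```javascript
// Hypothetical processing-stage filter: the HTTP server stored these raw
// bodies without inspecting them, so validation happens here instead.
function partitionSubmissions(rawBodies) {
  const valid = [];
  const discarded = [];
  for (const raw of rawBodies) {
    let parsed;
    try {
      // ASSUMPTION: submissions are JSON documents.
      parsed = JSON.parse(raw);
    } catch (err) {
      discarded.push({ raw, reason: "not valid JSON" });
      continue;
    }
    // ASSUMPTION: a usable submission is a JSON object, not an array or scalar.
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
      discarded.push({ raw, reason: "not a JSON object" });
      continue;
    }
    valid.push(parsed);
  }
  return { valid, discarded };
}

const result = partitionSubmissions(['{"ver": 1}', "not json", "[1, 2]"]);
console.log(result.valid.length, result.discarded.length); // prints "1 2"
```

Keeping this check out of the request path keeps the HTTP server simple, at the cost of storing bogus records until the processing step runs.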

Threat Brainstorming

  • Sending incorrect "content-length" header?
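One way to reason about the incorrect "content-length" case is an application-level check of the declared length against the bytes actually received. This is only a sketch: `checkContentLength` and the 1 MiB cap are hypothetical, and the HTTP library in use may already reject some of these cases before application code runs.

```javascript
// ASSUMPTION: an upper bound on accepted body size; the value is illustrative.
const MAX_BODY_BYTES = 1024 * 1024;

// Hypothetical helper: compares the declared Content-Length header (a string,
// or undefined when absent) with the number of body bytes actually received.
function checkContentLength(declaredHeader, receivedBytes, maxBytes = MAX_BODY_BYTES) {
  if (declaredHeader === undefined) {
    return { ok: false, status: 411, reason: "missing content-length" };
  }
  // Accept only a plain non-negative decimal integer.
  if (!/^\d+$/.test(declaredHeader)) {
    return { ok: false, status: 400, reason: "malformed content-length" };
  }
  const declared = Number(declaredHeader);
  if (declared > maxBytes || receivedBytes > maxBytes) {
    return { ok: false, status: 413, reason: "body too large" };
  }
  if (declared !== receivedBytes) {
    return { ok: false, status: 400, reason: "content-length mismatch" };
  }
  return { ok: true, status: 200, reason: "ok" };
}

console.log(checkContentLength("10", 7).reason); // prints "content-length mismatch"
```

Rejecting oversized declarations before reading the body also bounds how much memory a single request can tie up.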

Action Items

Action Item Status None

Attendees

  • Curtis Koenig
  • David Chan
  • Yvan Boily
  • Michal Purzynski
  • mreid
  • st3fan
  • tinfoil

mpurzynski@mozilla.com