User:Grobinson/Notes Stanford Computer Security 2013


Tanvi and I went to the 2013 Stanford Computer Forum Security Workshop on Monday 4/15. The schedule and some of the slides are available here: http://forum.stanford.edu/events/2013security.php

The two talks most relevant to the security engineering team were Dan Boneh's talk about poor implementations of HTTPS (he focused on non-browser clients, since in his view "browsers do a pretty good job") and a talk about designing automated screening systems to detect malware in mobile app stores. Tanvi and I talked to Jason, who gave the latter talk, afterwards. We can't use his work directly because it is based solely on Java/Android, and he thinks JavaScript's more dynamic nature would make designing a similar static analysis tool for Firefox OS more challenging.

My notes are below.

Dan Boneh - HTTPS Woes

ACM CCS '12, Stanford and UT Austin

It is very difficult to use HTTPS correctly (you're probably doing it wrong).

HTTPS/SSL/TLS

Basic SSL key exchange uses RSA: the browser picks a key and sends it to the server encrypted with the server's public key (delivered with its certificate). Moving to elliptic curve Diffie-Hellman (ECDHE) gives forward secrecy almost for free (~10% performance penalty).
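
A minimal sketch of that RSA key exchange, assuming the pyca/cryptography package (my illustration, not from the talk). The final comment is the point: a stolen server key retroactively breaks recorded sessions, which is exactly what ECDHE's ephemeral keys prevent.

  # Toy TLS-style RSA key exchange (illustration only, not real TLS).
  import os
  from cryptography.hazmat.primitives import hashes
  from cryptography.hazmat.primitives.asymmetric import rsa, padding
  from cryptography.hazmat.primitives.kdf.hkdf import HKDF

  # Server's long-term key pair (the public half ships in its certificate).
  server_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
  oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                      algorithm=hashes.SHA256(), label=None)

  # Client: pick a random premaster secret, encrypt it to the server.
  premaster = os.urandom(48)
  ciphertext = server_key.public_key().encrypt(premaster, oaep)

  # Server: recover the premaster secret with its private key.
  recovered = server_key.decrypt(ciphertext, oaep)

  # Both sides derive the same session key from the shared secret.
  def session_key(secret):
      return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                  info=b"toy session key").derive(secret)

  assert session_key(premaster) == session_key(recovered)
  # No forward secrecy: anyone who recorded `ciphertext` and later steals
  # server_key can recompute the session key. ECDHE's per-connection
  # ephemeral keys remove this single point of failure.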

Man-in-the-middle attacks: nation states, corporations, and caching businesses like Cloudflare mount this attack //all the time// on the web. Turktrust is a recent example.

The lesson: with weak client-side certificate checking, you get nothing - easy MITM on client. Browsers do a pretty good job... but what about non-browser TLS clients?

  1. Payment gateway SDK
  2. Mobile Ads
  3. Web service middleware (often based on Java e.g. Alfresco)
  4. Cloud client API

How do developers do it?

  • Infrequent: use a TLS library directly
  • More common: use a library for a protocol over TLS (cURL, Python, PHP, etc.)

Problem: atrocious TLS APIs; the end result is that developers frequently set incorrect options for the TLS layer

  • leads to improper server-side cert validation
  • enables simple man in the middle attacks
  1. cURL's API is almost deceptive, encouraging developers to do the wrong thing (true evaluates to 1, but the secure value for CURLOPT_SSL_VERIFYHOST is 2)
  2. gnutls returns a positive number (commonly construed as a "success" state) for self-signed certs
  3. Python modules do not attempt to validate the server certificate. Python 3 adds the option, but it is not the default.
  4. Cert validation disabled for testing, with the disabling code left in the shipped product
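
To make the Python item concrete, a hedged sketch using the standard library's ssl module (exact APIs vary across Python versions; the insecure pattern was the default behavior criticized in the talk):

  # The pitfall: the old-style API completes a TLS handshake without
  # checking the server certificate at all.
  import socket, ssl

  host = "example.com"   # hypothetical server

  # INSECURE - cert_reqs defaults to ssl.CERT_NONE, so any MITM cert works.
  s = ssl.wrap_socket(socket.create_connection((host, 443)))

  # CORRECT - demand a CA-validated cert AND check that it names this host.
  ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
  ctx.verify_mode = ssl.CERT_REQUIRED    # validate the chain
  ctx.check_hostname = True              # ...and the peer name
  ctx.load_default_certs()
  s2 = ctx.wrap_socket(socket.create_connection((host, 443)),
                       server_hostname=host)   # raises on a bad cert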

What to do?

  1. Application developers
    1. must do cert. fuzzing to see how the app behaves when presented with invalid certs.
      1. TLSPretense from iSEC Partners
    2. If you only ever connect to a //single server//, embed the public key in the client (see the pinning sketch after this list). Certificates are designed for browsers, which have to connect to diverse, unknown servers.
  2. TLS Library Developers
    1. Provide consistent certificate error reports
    2. Provide methods that do full cert. checking including peer name
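
A minimal pinning sketch in Python for the single-server case. The host name and digest are placeholders, and this pins the whole certificate rather than just the public key, which is a simplification:

  # Pin the server's certificate instead of trusting the CA system.
  import hashlib, socket, ssl

  HOST = "api.example.com"                     # hypothetical single server
  PINNED_SHA256 = "replace-with-real-digest"   # sha256 of the cert (DER)

  ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
  ctx.verify_mode = ssl.CERT_NONE   # we trust the pin, not the CA system
  sock = ctx.wrap_socket(socket.create_connection((HOST, 443)),
                         server_hostname=HOST)
  der = sock.getpeercert(binary_form=True)     # raw DER certificate bytes
  if hashlib.sha256(der).hexdigest() != PINNED_SHA256:
      sock.close()
      raise ssl.SSLError("certificate does not match pinned value")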

Anti-censorship

Obama quote (May 19, 2011): "We will support open access to the Internet, and the right of journalists to be heard - whether it's a big news organization or a lone blogger" (beginning of Arab Spring)

Problem: a client in a //filtered region// trying to connect to a service in an //open region//. The solution is Tor. When the filter blocks Tor, Tor adds secret proxies (bridges). Filters try to find the secret proxies and block them as well.

More recently: detection by active probing (amazing). On every SSL connection out of the filtered region, the censor records the destination, waits, then connects to the same server to see if it speaks the Tor protocol. If it does, the server is blocked.

Strengthen proxy-based circumvention (flash proxies). Stopped here.

MOOCs

Trends in MOOCs (and a little bit about other topics).

High dropout rate, especially high failure to complete rate. Big question is why? Lots of cool research and interesting trends from the enormous amount of data they have.

Basic security requirements.

Sample challenges

  • User annotation of learning material
    • traditional XSS, CSRF, forgery, etc.
  • Reputation in group projects, peer evaluation
    • integrity of the reputation mechanism and robustness against self-maximizing malicious behavior

Smartphone Fingerprinting

Abundance of sensor data (motion, sound, light); device identification hurdles:

  • Apple retiring UDIDs in 2 weeks
  • web code access to IDs is limited

Goal: enable low-privilege code to fingerprint

  • server use: check device license
  • general privacy implications

Identify devices using sensor data

  • exploit sensor defects
  • magnetometer (compass)
    • Use in previous work for //authentication// (different problem).
    • Not fruitful - too much interference, sensor has memory.
  • Audio System
    • profile speakerphone+mic as one system and measure frequency response
    • want to fingerprint device, not environment
    • more measurements => better fingerprinting
    • use maximum likelihood estimator
  • Accelerometer
    • k-means clustering (80% accuracy)
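
A hedged numpy sketch of the maximum-likelihood matching step, with synthetic data (the real work profiles actual speaker+microphone frequency responses): under i.i.d. Gaussian measurement noise, the MLE reduces to picking the nearest enrolled profile.

  # Synthetic illustration of ML device matching on frequency responses.
  import numpy as np

  rng = np.random.default_rng(0)
  n_devices, n_freqs = 5, 64

  # Enrollment: average several noisy measurements per device.
  true_profiles = rng.normal(0.0, 1.0, size=(n_devices, n_freqs))
  means = np.stack([(true_profiles[d] +
                     rng.normal(0, 0.1, (10, n_freqs))).mean(axis=0)
                    for d in range(n_devices)])
  noise_var = 0.1 ** 2

  # A fresh measurement, actually from device 3.
  query = true_profiles[3] + rng.normal(0, 0.1, n_freqs)

  # With iid Gaussian noise the log-likelihood of device d is
  #   -||query - means[d]||^2 / (2 * noise_var) + const,
  # so the MLE is just the nearest profile in Euclidean distance.
  log_lik = -((query - means) ** 2).sum(axis=1) / (2 * noise_var)
  print("best match: device", int(np.argmax(log_lik)))   # -> device 3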

sensor-based fingerprinting

  • completely feasible
  • even from remote javascript
  • need to revisit some privacy assumptions?

Building a next-generation App Store

App Stores: Reality

  • A permission changed in the latest update. Different reactions - uninstall, finish playing first and then uninstall... The question is why? What is the context?

An admissions system (e.g. Google Bouncer, Apple's undisclosed system) filters apps, accepting the "good" and rejecting the "bad".

STAMP admission system.

  • Static techniques (don't execute code)
    • More behavioral information, but fewer details. More abstract, may be hard to interpret.
  • Dynamic (do)
    • Fewer behaviors (difficult to get coverage - click every place where the bird can go). Easier to analyze.

STAMP as a service.

Static vs. Dynamic (advantages of static)

  • Can be faster
  • More coverage
  • Fewer configuration issues

Source-to-sink flows

  • Source: sensitive data
  • Sink: Internet, SMS, Disk, etc.
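
A toy illustration of the idea (my sketch, not STAMP's algorithm): treat flows as graph edges and compute which sinks are reachable from sensitive sources by fixed-point taint propagation.

  # Hypothetical flow edges extracted from an app: (from_value, to_value).
  edges = [
      ("TelephonyManager.getDeviceId", "imei"),
      ("imei", "payload"),
      ("payload", "HttpClient.execute"),
      ("Location.getLatitude", "lat"),
      ("lat", "log_file"),
  ]
  sources = {"TelephonyManager.getDeviceId", "Location.getLatitude"}
  sinks = {"HttpClient.execute", "SmsManager.sendTextMessage"}

  # Fixed-point propagation: anything reachable from a source is tainted.
  tainted = set(sources)
  changed = True
  while changed:
      changed = False
      for src, dst in edges:
          if src in tainted and dst not in tainted:
              tainted.add(dst)
              changed = True

  for sink in sinks & tainted:
      print("source-to-sink flow reaches:", sink)   # -> HttpClient.execute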

Data flow analysis in action:

  • Malware/greyware analysis
    • Popular for malware to steal device identifiers, etc.
  • API Misuse and Data Theft Detection
    • Data theft from Facebook's API for example

Challenges

  • Do you analyze just the application, or the system as well? Android is 3.4M+ lines of complex code
  • Scalability
    • Design choice: whole system analysis impractical
  • Soundness
    • Avoid missing flows
  • Precision
    • Minimize false positives (don't upset developers)

Building models

  • Follow the permissions
    • 20 permissions for sensitive sources
    • 4 permissions for sensitive sinks
  • Data we track (Sources)
    • 30+ sources of sensitive data
  • (Sinks)
    • 10+ types of exit points

Example: Facebook Connect Sync

  • App description unclear, 404 link to privacy policy
  • Android permissions page is not enough

Conclusion

  • Exploring space of admission systems
  • Fast, practical static data flow analysis
  • Scalable due to avoiding full system analysis
  • Dynamic analysis (not discussed) collects concrete values
    • Users test-drive apps in the browser; data is collected from the apps on the backend
  • Warning system identifies violated assumptions

STAMP is being commercialized outside of Stanford (test pilots at other corporations).

TaintDroid is another dynamic analysis tool.

CFAA: Are we Criminals?

Computer Fraud and Abuse Act of 1984

"access a computer" + "without authorization" = federal crime (could be prosecuted), federal civil remedy (somebody could sue you)

The federal appellate court covering the western third of the US (the Ninth Circuit; Chief Judge Kozinski) lists a lot of ridiculous examples.

  • violating MySpace TOS = conviction (before the judge set aside the jury's verdict)

Easy cases (violating TOS is not a crime):

  • Nosal
  • WEC
  • Aaron's Law

Hard cases: technical circumvention. Problem: security research often involves technical circumvention.

Are we criminals? Probably. But for the grace of prosecutors and civil litigants...

RSA with Insuff. Entropy

n = pq. What happens when p and q are generated with faulty random number generators?

Heninger et al. 2012

  • 5.57% of TLS hosts shared private keys with another host
  • 0.5% of moduli were factorable via Euclid's algorithm (shared primes)

Why? Not enough entropy.
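
A small sketch of why a shared prime is fatal, using toy primes: if two hosts' moduli share a prime factor, Euclid's gcd recovers it instantly and both keys factor.

  # If bad RNGs make two RSA moduli share a prime, gcd factors both keys.
  # Toy-sized primes for illustration only.
  from math import gcd

  p, q1, q2 = 1000003, 1000033, 1000037   # p is the repeated prime
  n1, n2 = p * q1, p * q2                 # two "independent" public moduli

  shared = gcd(n1, n2)                    # Euclid: cheap even for 1024-bit n
  assert shared == p
  print("factored:", n1 // shared, "and", n2 // shared)   # q1 and q2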

Kleptography (Young and Yung 1996): a malicious black-box RSA key generator.

Goals

  • Efficient way for a host to obtain randomness from a trusted source with high entropy
  • A way for the host to prove that the generated modulus n was generated using the given randomness

Building blocks: 1. Pedersen commitment ... (got distracted by Bugzilla)
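
Since the notes cut off here, a quick sketch of the primitive itself: a Pedersen commitment C = g^m * h^r mod p is hiding (r masks m) and binding (reopening to a different m would require the discrete log of h base g). Toy parameters below, my illustration only.

  # Toy Pedersen commitment; real deployments use large groups and an h
  # whose discrete log base g is provably unknown.
  import secrets

  p = 2879                 # safe prime: p = 2q + 1 with q = 1439 prime
  q = 1439
  g, h = 4, 9              # squares mod p, so both have order q

  def commit(m, r):
      """Commit to message m with blinding factor r (both taken mod q)."""
      return (pow(g, m % q, p) * pow(h, r % q, p)) % p

  def verify(c, m, r):
      return c == commit(m, r)

  m = 42
  r = secrets.randbelow(q)        # fresh randomness hides m (hiding)
  c = commit(m, r)
  assert verify(c, m, r)          # opening checks out
  assert not verify(c, m + 1, r)  # binding: can't reopen to a different m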

Searching on Encrypted Data without Revealing the Predicate

  1. Payment gateway

  1. Simulation paradigm
  2. Identity-based encryption

Data Mining on GBytes of Encrypted Data

Sites like Facebook, Google, Amazon, and Netflix use data mining to give users relevant results - a privacy concern! The new system learns only the model; the data stays encrypted.

Variety of data mining algorithms. Classic/basic: linear regression. That turned out to be easy, so they are now working on matrix factorization.

Challenges: 1. make the algorithm privacy-preserving, 2. make it efficient.

Contribution:

  • Design of a practical system for privacy preserving linear regression
  • Implementation
  • Experiments on real datasets

Comparison to the state of the art:

  • Hall et al. '11
  • Graepel

Computations on encrypted data:

  • 2009: FHE, Gentry (too slow for these problems)
  • 1979: Shamir secret sharing; 1988: BGW (huge communication overhead)
  • 1982: Yao's garbled circuits
  • New approach: combine Yao's garbled circuits with homomorphic encryption (a lightweight version of Gentry's FHE)

Yao's Garbled Circuits

The circuit is turned into a garbled circuit for the evaluator. Nothing is leaked about x other than C(x): the evaluator learns the model, but not the input. Problems: this does not scale with the number of users, and users need to be online.
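
To make the garbling step concrete, a toy single-AND-gate garbler (my sketch; real protocols add oblivious transfer for input labels, point-and-permute, and more):

  # Each wire value gets a random label; the garbled table encrypts the
  # output label under each pair of input labels. The evaluator holds one
  # label per input wire, can decrypt exactly one row, and so learns C(x)
  # and nothing else.
  import hashlib, os, secrets

  def H(*parts):
      return hashlib.sha256(b"".join(parts)).digest()

  TAG = b"\x00" * 16   # redundancy so the evaluator can spot the right row

  def garble_and():
      # One random 16-byte label per (wire, bit) pair; wires a, b, output o.
      lab = {(w, b): os.urandom(16) for w in "abo" for b in (0, 1)}
      table = []
      for x in (0, 1):
          for y in (0, 1):
              key = H(lab["a", x], lab["b", y])
              plain = lab["o", x & y] + TAG
              table.append(bytes(k ^ m for k, m in zip(key, plain)))
      secrets.SystemRandom().shuffle(table)   # hide which row is which
      return lab, table

  def evaluate(table, la, lb):
      key = H(la, lb)
      for ct in table:
          plain = bytes(k ^ c for k, c in zip(key, ct))
          if plain.endswith(TAG):
              return plain[:16]      # the output-wire label
      raise ValueError("no row decrypted")

  lab, table = garble_and()
  out = evaluate(table, lab["a", 1], lab["b", 1])
  assert out == lab["o", 1]          # evaluator learned AND(1,1)'s label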