Python Static Code Analysis

From MozillaWiki

Goals

All content is subject to change. This is just a first iteration from me and needs input from Jurriaan.

High Level Goals

This project should be seen as the start of a much bigger community project, spanning the Python and security communities, whose primary goal is to improve the security of Python (web) applications. This project will be the foundation on which other people can build code analysis tools.

We would like people to contribute new code analyzers, which we would then include in a distribution that we maintain.

This project needs to be designed as a component that can be embedded in bigger projects. In its most basic form, the tool would probably just be run from the command line on a specific set of Python source files, but it should also be possible to embed it in a larger project that, for example, automates scans.

The output of the project should be in a simple, structured, machine-readable format so that other tools can take the results of a scan and work with them. We will need to define a common format.
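
As a starting point for defining that format, a finding could be modelled as a small record and serialized to JSON. This is only a sketch: the field names (scanner, path, line, severity, message), the top-level version key and the to_report function are hypothetical placeholders, not an agreed format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Finding:
    """One scan result. Hypothetical field names; the common format is still undecided."""

    scanner: str   # name of the analyzer that produced the finding
    path: str      # source file that was scanned
    line: int      # 1-based line number of the offending code
    severity: str  # e.g. "low", "medium", "high"
    message: str   # human-readable description


def to_report(findings):
    """Serialize findings into a JSON document that other tools can consume."""
    return json.dumps(
        {"version": 1, "findings": [asdict(f) for f in findings]},
        indent=2,
        sort_keys=True,
    )


report = to_report([
    Finding("dangerous-call", "app/views.py", 42, "high", "call to os.system()"),
])
print(report)
```

A flat list of records like this is easy to filter, merge across runs, or convert to XML if that turns out to be the preferred format.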

Practical Goals

Finding actual vulnerabilities :-)

For a list of interesting, common Python issues, see Common Python Code Vulnerabilities.

Low Hanging Fruit

  • Detect SQL, HTML and JavaScript snippets that are embedded in source code as strings. They can be ranked higher if they are used in string concatenation operations or contain string formatting markers. Some basic machine learning could perhaps be used to detect the language of a snippet.
  • Detect third-party API keys accidentally included in source code
  • Detect references to internal hostnames or staging environments
  • Detect calls to Python libraries and functions that are considered dangerous in web applications, such as subprocess.Popen() or os.system()
  • Detect file system access.
  • Detect HTTP calls to internal or external web services, and make sure the URLs used for those calls are properly escaped (string concatenation or manual URL building versus, for example, requests.put(url, params=params)).
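
Several of these checks can be prototyped with the standard library ast module. The sketch below flags calls that appear on a small deny-list and SQL-looking string literals that are combined with + or %, or formatted with str.format() or f-strings. The deny-list, the regular expression and the names DANGEROUS_CALLS, LowHangingFruitVisitor and scan_source are hypothetical illustrations, and the SQL test is a crude keyword heuristic rather than the language detection suggested above.

```python
import ast
import re

# Hypothetical deny-list; a real distribution would make this configurable.
DANGEROUS_CALLS = {"os.system", "os.popen", "subprocess.Popen", "eval", "exec"}

# Crude heuristic: a string literal that starts with a common SQL verb.
SQL_PATTERN = re.compile(r"^\s*(SELECT|INSERT|UPDATE|DELETE)\b", re.IGNORECASE)


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def looks_like_sql(node):
    """True if the node is a string literal that looks like SQL."""
    return (
        isinstance(node, ast.Constant)
        and isinstance(node.value, str)
        and SQL_PATTERN.match(node.value) is not None
    )


class LowHangingFruitVisitor(ast.NodeVisitor):
    def __init__(self):
        self.findings = []  # (line, message) tuples

    def visit_Call(self, node):
        name = dotted_name(node.func)
        if name in DANGEROUS_CALLS:
            self.findings.append((node.lineno, f"call to {name}()"))
        # "SELECT ... {}".format(value)
        if (isinstance(node.func, ast.Attribute) and node.func.attr == "format"
                and looks_like_sql(node.func.value)):
            self.findings.append((node.lineno, "SQL built with str.format()"))
        self.generic_visit(node)

    def visit_BinOp(self, node):
        # "SELECT ..." + value  or  "SELECT ... %s" % value
        if isinstance(node.op, (ast.Add, ast.Mod)) and looks_like_sql(node.left):
            self.findings.append((node.lineno, "SQL built with + or % formatting"))
        self.generic_visit(node)

    def visit_JoinedStr(self, node):
        # f"SELECT ... {value}": SQL-looking literal part plus a placeholder
        has_placeholder = any(isinstance(v, ast.FormattedValue) for v in node.values)
        if node.values and looks_like_sql(node.values[0]) and has_placeholder:
            self.findings.append((node.lineno, "SQL built with an f-string"))
        self.generic_visit(node)


def scan_source(source, filename="<string>"):
    """Parse source code and return the sorted (line, message) findings."""
    visitor = LowHangingFruitVisitor()
    visitor.visit(ast.parse(source, filename=filename))
    return sorted(visitor.findings)


sample = '''
import os, subprocess
os.system("ls " + path)
subprocess.Popen(cmd, shell=True)
query = "SELECT * FROM users WHERE name = '%s'" % name
safe = "SELECT * FROM users WHERE name = ?"
'''

for line, message in scan_source(sample):
    print(line, message)
```

Because dotted_name only matches the literal spelling os.system, code that does from os import system and then calls system() would be missed; a real scanner would also need to resolve imports.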

More Complicated

  • Build a database of common security vulnerability patterns for popular Python web application frameworks in a generic way so that a source code analyzer can find these patterns. This would require some language or DSL to describe those patterns.
  • Analyze source code to find out what a specific web method takes as input (request parameters, path parameters, headers). This could be used as input to a fuzzer, or it could be combined with a web scan to find hidden parameters (for example, a hidden 'debug' request parameter that the developer left in).
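
For the input-discovery item, a first approximation could simply collect the keys a view function reads from the request object. This sketch assumes Flask-style code, where query parameters, form fields, headers and cookies are read through request.args, request.form, request.headers and request.cookies. The INPUT_SOURCES mapping and the extract_inputs function are hypothetical, only string-literal keys on a name literally called request are recognized, and the subscript handling relies on the Python 3.9+ AST layout.

```python
import ast

# Flask-style request attributes and the kind of input they carry (an assumption;
# other frameworks expose request data differently).
INPUT_SOURCES = {"args": "query", "form": "body", "headers": "header", "cookies": "cookie"}


def input_kind(node):
    """Return the input kind if node is request.<attr>, otherwise None."""
    if (isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "request"
            and node.attr in INPUT_SOURCES):
        return INPUT_SOURCES[node.attr]
    return None


def string_literal(node):
    """Return the value of a string literal node, otherwise None."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    return None


def extract_inputs(source):
    """Map each function name to the set of (kind, key) inputs it reads."""
    result = {}
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        inputs = set()
        for node in ast.walk(func):
            if (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
                    and node.func.attr == "get" and node.args):
                # request.args.get("debug")
                kind, key = input_kind(node.func.value), string_literal(node.args[0])
            elif isinstance(node, ast.Subscript):
                # request.args["page"] (Python 3.9+ stores the key directly in .slice)
                kind, key = input_kind(node.value), string_literal(node.slice)
            else:
                continue
            if kind and key:
                inputs.add((kind, key))
        result[func.name] = inputs
    return result


sample = '''
from flask import request

def search():
    term = request.args.get("q")
    page = request.args["page"]
    if request.args.get("debug"):
        return dump_internal_state()
    token = request.headers.get("X-Api-Token")
    return run_search(term, page)
'''

print(extract_inputs(sample))
```

A fuzzer could take the resulting (kind, key) pairs as its parameter list, and a key such as "debug" that never appears in the application's HTML forms would be a candidate hidden parameter.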

Advanced

  • Extend the SQL, HTML and JavaScript string detection to find out if those strings will include unescaped data taken from request parameters, request body, path parameters or headers.
  • Do code analysis to find logic problems that could lead to vulnerabilities or exploitable code.
  • Map the OWASP Top Ten to code analysis
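
The first Advanced item is essentially taint tracking. A deliberately small, intra-function version could mark variables assigned from request data as tainted, propagate that through later assignments, and report calls to .execute() whose first argument (the SQL text) depends on tainted data, while leaving parameterized queries such as cursor.execute(sql, (name,)) alone. The taint sources (Flask-style request attributes), the sink (a DB-API style execute method) and the names SOURCE_ATTRS and find_sql_taint are assumptions for illustration.

```python
import ast

# Flask-style request attributes treated as taint sources (an assumption).
SOURCE_ATTRS = {"args", "form", "headers", "cookies", "data"}


def reads_request(expr):
    """True if the expression reads request.<attr> for a source attribute."""
    return any(
        isinstance(n, ast.Attribute)
        and isinstance(n.value, ast.Name)
        and n.value.id == "request"
        and n.attr in SOURCE_ATTRS
        for n in ast.walk(expr)
    )


def is_tainted(expr, tainted):
    """True if the expression reads request data or an already tainted variable."""
    return reads_request(expr) or any(
        isinstance(n, ast.Name) and n.id in tainted for n in ast.walk(expr)
    )


def find_sql_taint(source):
    """Return (function, line) pairs where request data reaches the SQL text of .execute()."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        tainted = set()
        # Handle assignments and calls in source order so taint flows top to bottom.
        nodes = [n for n in ast.walk(func) if isinstance(n, (ast.Assign, ast.Call))]
        nodes.sort(key=lambda n: (n.lineno, n.col_offset))
        for node in nodes:
            if isinstance(node, ast.Assign):
                if is_tainted(node.value, tainted):
                    for target in node.targets:
                        if isinstance(target, ast.Name):
                            tainted.add(target.id)
            elif (isinstance(node.func, ast.Attribute) and node.func.attr == "execute"
                  and node.args and is_tainted(node.args[0], tainted)):
                # Only the SQL text (first argument) matters; bound parameters are fine.
                findings.append((func.name, node.lineno))
    return findings


sample = '''
def lookup(cursor):
    name = request.args.get("name")
    sql = "SELECT * FROM users WHERE name = '" + name + "'"
    cursor.execute(sql)
    cursor.execute("SELECT * FROM users WHERE name = ?", (name,))
    cursor.execute(f"DELETE FROM users WHERE name = '{name}'")
'''

print(find_sql_taint(sample))
```

This ignores data flow across function calls, loops that assign after use, sanitizing functions, and augmented or annotated assignments, and it never removes taint once set, so it should be read as an illustration of the idea rather than a usable analyzer.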

This section needs more advanced items. Or is the More Complicated list already advanced enough?

Deliverables

How about making the main deliverable a command line tool that works something like this:

python-scanner [-o report-file] [source files or directories]

The report will be an easy-to-parse JSON or XML file that contains the scan results for the specified source files or directories.

The command line tool should be a simple wrapper around a Python code analysis module.
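
A minimal sketch of such a wrapper, assuming argparse for option parsing and JSON for the report. The scan_file function is a hypothetical stand-in for the real analysis module and only reports eval() calls; iter_python_files and the exit-code convention (1 when anything was found) are illustrative choices, not part of the proposal.

```python
import argparse
import ast
import json
from pathlib import Path


def iter_python_files(paths):
    """Yield .py files from the given files and directories."""
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            yield from sorted(path.rglob("*.py"))
        elif path.suffix == ".py":
            yield path


def scan_file(path):
    """Stand-in for the analysis module: here it only reports eval() calls."""
    tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
    return [
        {"path": str(path), "line": node.lineno, "message": "call to eval()"}
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    ]


def main(argv=None):
    """Parse arguments, scan every file and write the JSON report."""
    parser = argparse.ArgumentParser(prog="python-scanner")
    parser.add_argument("-o", "--output", help="write the report here instead of stdout")
    # At least one path is required in this sketch.
    parser.add_argument("paths", nargs="+", help="source files or directories")
    args = parser.parse_args(argv)

    findings = []
    for path in iter_python_files(args.paths):
        findings.extend(scan_file(path))
    report = json.dumps({"findings": findings}, indent=2)

    if args.output:
        Path(args.output).write_text(report, encoding="utf-8")
    else:
        print(report)
    return 1 if findings else 0
```

Because main() takes an optional argument list and returns an exit code instead of exiting, it can also be called from a larger project that automates scans; a packaging entry point (for example a [project.scripts] entry in pyproject.toml) could expose it as the python-scanner command.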

The individual scanners should be implemented as separate modules or classes that can be unit tested individually.
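
One possible shape for this, sketched with a hypothetical Scanner base class: the module is parsed once and every scanner receives the same AST, so each scanner can be tested in isolation with unittest by feeding it a small source snippet. ShellTrueScanner, which flags shell=True keyword arguments as in subprocess.run(cmd, shell=True), and run_scanners are illustrative examples rather than a proposed rule set or API.

```python
import ast
import unittest


class Scanner:
    """Hypothetical base class: each analyzer inspects one parsed module."""

    name = "base"

    def scan(self, tree):
        """Return a list of (line, message) findings for the given AST."""
        raise NotImplementedError


class ShellTrueScanner(Scanner):
    """Flags calls that pass shell=True, e.g. subprocess.Popen(cmd, shell=True)."""

    name = "shell-true"

    def scan(self, tree):
        findings = []
        for node in ast.walk(tree):
            if isinstance(node, ast.Call):
                for kw in node.keywords:
                    if (kw.arg == "shell" and isinstance(kw.value, ast.Constant)
                            and kw.value.value is True):
                        findings.append((node.lineno, "call with shell=True"))
        return findings


def run_scanners(source, scanners):
    """Parse once and let every registered scanner inspect the same tree."""
    tree = ast.parse(source)
    return {s.name: s.scan(tree) for s in scanners}


class ShellTrueScannerTest(unittest.TestCase):
    def test_flags_shell_true(self):
        tree = ast.parse("subprocess.run(cmd, shell=True)")
        self.assertEqual(ShellTrueScanner().scan(tree), [(1, "call with shell=True")])

    def test_ignores_argument_list(self):
        tree = ast.parse("subprocess.run(['ls', path])")
        self.assertEqual(ShellTrueScanner().scan(tree), [])
```

Adding a new analyzer then means writing one subclass plus its own test case, which fits the idea of accepting contributed scanners into a maintained distribution.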

References

Similar Open Source Projects

Articles & Presentations