Introduction to Static Code Analysis
By Mathew Payne - @GeekMasher
# Whoami
Mathew Payne - @GeekMasher
Senior Field Security Specialist - GitHub
Abertay University Alumni
Focus on:
Static Code Analysis
Code Review
DevOps / DevSecOps
Today's Talk
Introduction to Static Code Analysis
Deeper dive into how Static Code Analysis works
Examples of how this is done
Summary and Thoughts on using Static Code Analysis
What is Static Code Analysis?
OWASP Definition:
Static Code Analysis (also known as Source Code Analysis) is usually performed as part of a Code Review (also known as white-box testing) and is carried out at the Implementation phase of a Security Development Lifecycle (SDL).
Static Code Analysis commonly refers to the running of Static Code Analysis tools that attempt to highlight possible vulnerabilities within 'static' (non-running) source code by using techniques such as Taint Analysis and Data Flow Analysis.
What is Static Code Analysis?
An automated tool to analyse source code
Discover known security issues
Discover repetitive security issues
Looks at the code without running the code
Models
Parse the code
Syntax trees
Create models of that code
Flow Graphs
Use the models to look for things we are interested in
Security Patterns
In Static Analysis these are called Rules or Queries
Results Produced
Security Issues
SQL Injection, Cross Site Scripting, ...
Best Practices
Code Quality and Code Smells
Long Functions, Duplicated code, ...
Positive Results
Using appropriate hashing algorithms, automatic encoding, ...
Warning: Here be Dragons
Before we begin: Glossary
Confusing I know
Static Code Analysis Parsing
Well, some of these terms might seem familiar...
Compiler and Interpreter Pipelines
*Overly simplified and different languages might look different
Abstract Syntax Tree (AST)
Example - Abstract Syntax Tree
v = 1 + 1
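As a rough illustration, Python's built-in `ast` module can show the tree for this statement. This is a minimal sketch using the standard library parser, not the parser of any particular static analysis tool.

```python
import ast

# Parse the one-line program from the slide into an abstract syntax tree.
tree = ast.parse("v = 1 + 1")
assign = tree.body[0]

# The assignment node holds the target name and a binary operation.
print(type(assign).__name__)           # Assign
print(assign.targets[0].id)            # v
print(type(assign.value).__name__)     # BinOp
print(type(assign.value.op).__name__)  # Add

# Full textual dump of the tree (exact layout varies by Python version).
print(ast.dump(tree))
```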
Example - Abstract Syntax Tree
def test(var1: str):
    print("Var :: " + var1)

test("Hello")
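Once the code is a tree, a tool can walk it and pick out the nodes it cares about. The sketch below assumes Python's standard `ast` module and simply lists the function definitions and the directly named calls in the example; real tools build much richer models on top of this.

```python
import ast

SOURCE = '''def test(var1: str):
    print("Var :: " + var1)

test("Hello")
'''

tree = ast.parse(SOURCE)

# Names of every function defined in the source.
functions = [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]

# Names of every call whose callee is a plain name, e.g. print(...) or test(...).
calls = sorted(
    n.func.id
    for n in ast.walk(tree)
    if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
)

print(functions)  # ['test']
print(calls)      # ['print', 'test']
```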
Example - Abstract Syntax Tree (web app)
from flask import Flask, render_template

app = Flask(__name__)

@app.route("/")
def index():
    return render_template("index.html")

if __name__ == "__main__":
    app.run('0.0.0.0', 5000)
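For a web app, the AST already exposes useful facts such as which functions are route handlers. The following is a minimal sketch, assuming Python's standard `ast` module; the helper name `find_routes` is made up for this example. It only parses the source text, so Flask does not need to be installed.

```python
import ast

SOURCE = '''from flask import Flask, render_template

app = Flask(__name__)

@app.route("/")
def index():
    return render_template("index.html")

if __name__ == "__main__":
    app.run('0.0.0.0', 5000)
'''

def find_routes(source):
    """Return (path, handler_name) pairs for decorators shaped like @x.route("...")."""
    routes = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        for dec in node.decorator_list:
            if (
                isinstance(dec, ast.Call)
                and isinstance(dec.func, ast.Attribute)
                and dec.func.attr == "route"
                and dec.args
                and isinstance(dec.args[0], ast.Constant)
            ):
                routes.append((dec.args[0].value, node.name))
    return routes

print(find_routes(SOURCE))  # [('/', 'index')]
```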
Control Flow Graph (CFG)
Example - Control Flow Graph
x = 1
while x > 0:
    if x > 1:
        continue
    x = x - 1
y = x
Blue = True, Red = False
By Rahul Gopinath and The Fuzzing Book
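The graph for this snippet can be written down by hand as an adjacency list. This is a hedged sketch: the `CFG` dictionary and the `reachable` helper are illustrative names, and a real tool would derive the edges from the AST rather than hard-code them.

```python
# Hand-written CFG for the six-line example, keyed by source line number.
# Each edge is (successor_line, label); True/False labels mark branch outcomes.
CFG = {
    1: [(2, "next")],                # x = 1
    2: [(3, "True"), (6, "False")],  # while x > 0
    3: [(4, "True"), (5, "False")],  # if x > 1
    4: [(2, "continue")],            # continue jumps back to the loop test
    5: [(2, "loop")],                # x = x - 1, then back to the loop test
    6: [],                           # y = x
}

def reachable(cfg, start):
    """Return every node that can be reached from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(successor for successor, _ in cfg[node])
    return seen

print(sorted(reachable(CFG, 1)))  # [1, 2, 3, 4, 5, 6]
```

Questions such as "is there dead code?" or "can execution get from this line to that one?" become graph queries on this structure.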
Showcase - Radare2 CFG
Image(s): Radare2 CFG by @hexploitable
Data Flow Graph (DFG)
Example - Simple Application + DFG
my_var = input ()
format_str = "Input: " + my_var
print (format_str)
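A data flow graph for this example links each definition of a variable to the lines that use it. The sketch below assumes straight-line code only (no branches, loops, or functions) and uses Python's standard `ast` module; `data_flow_edges` is an illustrative helper, not part of any tool.

```python
import ast

SOURCE = '''my_var = input()
format_str = "Input: " + my_var
print(format_str)
'''

def data_flow_edges(source):
    """Return (variable, defined_on_line, used_on_line) edges for straight-line code."""
    edges = []
    last_def = {}  # variable name -> line of its most recent assignment
    for stmt in ast.parse(source).body:
        if not isinstance(stmt, (ast.Assign, ast.Expr)):
            continue
        # Every variable read on the right-hand side is a use of its last definition.
        for node in ast.walk(stmt.value):
            if isinstance(node, ast.Name) and node.id in last_def:
                edges.append((node.id, last_def[node.id], stmt.lineno))
        # Record the new definitions after the uses have been collected.
        if isinstance(stmt, ast.Assign):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    last_def[target.id] = stmt.lineno
    return edges

print(data_flow_edges(SOURCE))
# [('my_var', 1, 2), ('format_str', 2, 3)]
```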
Taint Analysis
Sources (user controlled inputs)
Sinks (dangerous methods / assignments)
Sanitizers (secures the user data)
Passthroughs (functions that track tainted data)
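The four concepts above fit together in a very small taint tracker. This is a toy sketch under stated assumptions: it handles straight-line code only, treats `input` as the only source, `print` as the only sink, and a hypothetical `escape` function as the only sanitizer. Any other call is treated as a passthrough that keeps its arguments' taint.

```python
import ast

SOURCES = {"input"}      # assumption: calls that return user-controlled data
SINKS = {"print"}        # assumption: calls treated as dangerous in this demo
SANITIZERS = {"escape"}  # assumption: calls that make the data safe

def call_name(node):
    """Return the callee name for calls like f(...), else None."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return None

def is_tainted(expr, tainted):
    name = call_name(expr)
    if name in SANITIZERS:
        return False  # sanitizer output is considered clean
    if name in SOURCES:
        return True   # source output is always tainted
    if isinstance(expr, ast.Name):
        return expr.id in tainted
    # Passthrough: anything built from a tainted part is tainted.
    return any(is_tainted(child, tainted) for child in ast.iter_child_nodes(expr))

def find_taint_flows(source):
    """Return the line numbers where tainted data reaches a sink."""
    tainted, findings = set(), []
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign):
            hot = is_tainted(stmt.value, tainted)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if hot:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
        elif isinstance(stmt, ast.Expr) and call_name(stmt.value) in SINKS:
            if any(is_tainted(arg, tainted) for arg in stmt.value.args):
                findings.append(stmt.lineno)
    return findings

VULNERABLE = 'my_var = input()\nformat_str = "Input: " + my_var\nprint(format_str)\n'
SANITIZED = 'my_var = escape(input())\nprint(my_var)\n'

print(find_taint_flows(VULNERABLE))  # [3]
print(find_taint_flows(SANITIZED))   # []
```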
Patterns - Rules & Queries
Define a Security Pattern that you want to detect
SQL Injection, Cross Site Scripting, etc.
All static code analysis tools have rules and/or queries
Hardcoded or Customisable
Open or Closed source
Configuration Rules or Dynamic Queries
False Positives & False Negatives
Configuration Rules or Dynamic Queries
Configuration Rules (yaml, json, data structure...)
Simpler to write
Complex flows can be very hard to declare
Dynamic Queries (programming-like language)
Harder to learn and write
Complex flows are easier
Just use Regex!?
Jamie Zawinski (early Netscape engineer):
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
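To make the point concrete, here is a small sketch of how a text-only pattern goes wrong in both directions. The pattern and the two code samples are made up for this illustration.

```python
import re

# Naive text pattern for "Flask debug mode is on".
PATTERN = re.compile(r"debug\s*=\s*True")

# False positive: the call is commented out, but the text still matches.
commented = '# app.run("0.0.0.0", 80, debug=True)'

# False negative: debug is enabled through a dict, so the text never matches.
indirect = 'opts = {"debug": True}\napp.run("0.0.0.0", 80, **opts)'

print(bool(PATTERN.search(commented)))  # True
print(bool(PATTERN.search(indirect)))   # False
```

A regex sees characters, not syntax or data flow, which is why the models described earlier are needed.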
Example - Detecting Simple Configuration Problems
from flask import Flask, render_template

app = Flask("MyApp")

@app.route("/")
def index():
    return render_template("index.html")

if __name__ == "__main__":
    app.run("0.0.0.0", 80, debug=True)
What issue do you see here?
Configuration Rules - Basic Configuration Queries
name: "Debugging Enabled"
sources:
  types: bool
  value: True
sinks:
  flask:
    - 'flask.Flask(){}.run([2]|debug)'
Mockup example rules language
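What a rule like this asks for can be approximated directly on a Python AST. The sketch below is an assumption-laden stand-in for the mockup rule, not how any specific tool implements it: it flags any `.run(...)` call whose `debug` keyword, or third positional argument, is the literal `True`. It does not check that the receiver is really a Flask app.

```python
import ast

def find_debug_enabled(source):
    """Return line numbers of x.run(...) calls with a literal debug=True."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "run"
        ):
            continue
        # debug can be passed by keyword or as the third positional argument.
        candidates = [kw.value for kw in node.keywords if kw.arg == "debug"]
        if len(node.args) > 2:
            candidates.append(node.args[2])
        if any(isinstance(c, ast.Constant) and c.value is True for c in candidates):
            findings.append(node.lineno)
    return findings

print(find_debug_enabled('app.run("0.0.0.0", 80, debug=True)'))  # [1]
print(find_debug_enabled('app.run("0.0.0.0", 80)'))              # []
```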
Dynamic Queries - Language for Querying
Use the AST, CFG, and DFG to get results
from sca import ast, cfg, dfg, results

sources = ast.getType("bool").getValue("True")
flask = ast.findImport("flask").getExpr("Flask")
sinks = flask.getCall("run").getParameters("debug") or flask.getCall("run").getParameters(2)
results = dfg.taint(sources, sinks)
Mockup example query language
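The value of the data flow step shows up when the flag is not written inline. The following sketch assumes Python's standard `ast` module and tracks only module-level variables assigned a literal; `find_debug_flows` is an illustrative name, and the `sca` package in the mockup is not a real library.

```python
import ast

def find_debug_flows(source):
    """Flag x.run(debug=NAME) where NAME was last assigned the literal True."""
    true_vars, findings = set(), []
    for stmt in ast.parse(source).body:
        # Track which module-level names currently hold the literal True.
        if isinstance(stmt, ast.Assign):
            is_true = isinstance(stmt.value, ast.Constant) and stmt.value.value is True
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if is_true:
                        true_vars.add(target.id)
                    else:
                        true_vars.discard(target.id)
        # Look for run(...) calls anywhere inside this statement.
        for node in ast.walk(stmt):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "run"
            ):
                for kw in node.keywords:
                    if (
                        kw.arg == "debug"
                        and isinstance(kw.value, ast.Name)
                        and kw.value.id in true_vars
                    ):
                        findings.append(node.lineno)
    return findings

INDIRECT = 'DEBUG = True\napp.run("0.0.0.0", 80, debug=DEBUG)\n'
print(find_debug_flows(INDIRECT))  # [2]
```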
Example - Simple Taint Flow
from flask import Flask, request, render_template, make_response

@app.route("/search")
def search():
    query = request.args.get("s")
    results = lookup(query)
    if len(results) > 0:
        return render_template("search.html", results=results)
    else:
        return make_response("No results found for: " + query, 404)
What issue do you see here?
Configuration Rules - Data Flow Queries
name: "Cross Site Scripting"
sources:
  flask:
    - "flask.request.args[]"
    - "flask.request.args.get()"
sinks:
  flask:
    - "flask.make_response([0])"
    - "flask.Response([0]){}"
    - "flask.render_template_string([0])"
    - "flask.abort([2])"
Mockup example rule language
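A data-driven rule like this can be sketched as a dictionary plus a small engine that interprets it. Everything here is an assumption made for illustration: the `RULE` layout, the helper names, and the simplification that matching is done on the call names as written in the source (`request.args.get`, `make_response`) rather than on resolved imports. Only the first argument of a sink is checked, mirroring the `[0]` in the mockup.

```python
import ast

RULE = {
    "name": "Cross Site Scripting",
    "sources": {"request.args.get"},
    "sinks": {"make_response", "render_template_string"},
}

SOURCE = '''
@app.route("/search")
def search():
    query = request.args.get("s")
    results = lookup(query)
    if len(results) > 0:
        return render_template("search.html", results=results)
    else:
        return make_response("No results found for: " + query, 404)
'''

def dotted_name(node):
    """Turn a.b.c into the string "a.b.c"; return None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None

def mentions_taint(expr, tainted, rule):
    for node in ast.walk(expr):
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
        if isinstance(node, ast.Call) and dotted_name(node.func) in rule["sources"]:
            return True
    return False

def run_rule(source, rule):
    """Return (rule_name, line) for each sink call fed by a source, per function."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        tainted = set()
        # Single pass over assignments: a variable built from taint is tainted.
        for node in ast.walk(func):
            if isinstance(node, ast.Assign) and mentions_taint(node.value, tainted, rule):
                tainted.update(t.id for t in node.targets if isinstance(t, ast.Name))
        for node in ast.walk(func):
            if (
                isinstance(node, ast.Call)
                and dotted_name(node.func) in rule["sinks"]
                and node.args
                and mentions_taint(node.args[0], tainted, rule)
            ):
                findings.append((rule["name"], node.lineno))
    return findings

for name, line in run_rule(SOURCE, RULE):
    print(name, "at line", line, "->", SOURCE.splitlines()[line - 1].strip())
```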
Dynamic Queries - Language for Querying
Use the AST, CFG, and DFG to get results
from sca import ast, cfg, dfg, results
flask = ast.findImport("flask")
sources = flask.getMember("request").getMember("args").getUses()
routes = flask.findDecorator("route").getCall()
sinks = ast.getType("str").getExpr() & routes.getReturns()
results = dfg.taint(sources, sinks)
Mockup example query language
Modeling
Models != Modeling
Researching a framework, library, or module
Creating reusable models for the Static Analyser
"User Inputs": flask.request.args[], etc.
"XSS Sinks": flask.make_response([0]), etc.
Example - Modeling
from sca import dfg, results
from sca.flask import flask_sources, flask_sinks_xss, flask_sanitizers_xss
from sca.web import web_sources, web_sinks_xss, sanitizers_xss
results = dfg.taint(web_sources, web_sinks_xss, sanitizers_xss)
Mockup example query language
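One way to picture reusable models is as small, mergeable bundles of sources, sinks, and sanitizers. This is a hedged sketch: the `Model` class is invented for this example, the pattern strings follow the mockup rule language, and `myframework` is a hypothetical second framework.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    """A reusable description of one framework for one vulnerability class."""
    sources: frozenset = frozenset()
    sinks: frozenset = frozenset()
    sanitizers: frozenset = frozenset()

    def merge(self, other):
        # Combining models is just a union of each category.
        return Model(
            self.sources | other.sources,
            self.sinks | other.sinks,
            self.sanitizers | other.sanitizers,
        )

FLASK_XSS = Model(
    sources=frozenset({"flask.request.args[]", "flask.request.args.get()"}),
    sinks=frozenset({"flask.make_response([0])", "flask.render_template_string([0])"}),
    sanitizers=frozenset({"markupsafe.escape()"}),
)

# Hypothetical framework, present only to show how models compose.
OTHER_XSS = Model(
    sources=frozenset({"myframework.request.param()"}),
    sinks=frozenset({"myframework.respond([0])"}),
)

# A generic "web" model is the union of every framework model researched so far.
WEB_XSS = FLASK_XSS.merge(OTHER_XSS)
print(len(WEB_XSS.sources), len(WEB_XSS.sinks), len(WEB_XSS.sanitizers))  # 3 3 1
```

A query then only needs the merged model, which is the role `sca.web` plays in the mockup above.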
Sanitizers
Functions or checks that cause the input to be used securely
Escaping or Encoding before the sink
Context is extremely important
Inline, Direct, and Indirect are... extremely complicated!
Example - XSS but using Sanitizer
from flask import Flask, request, render_template, escape

@app.route("/search")
def search():
    query = request.args.get("s")
    results = lookup(query)
    if len(results) > 0:
        return render_template("search.html", results=results)
    else:
        return "No results found for: " + escape(query)
Rules don't work now... False Positives here we come
Example - Using another Sanitizer...
from flask import Flask, request, render_template
from pymysql.converters import escape_string

@app.route("/search")
def search():
    query = request.args.get("s")
    results = lookup(query)
    if len(results) > 0:
        return render_template("search.html", results=results)
    else:
        return "No results found for: " + escape_string(query)
Are we secure now?
Context is so important!
But it's so hard for tools to know without us telling them
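A quick experiment shows why the escaping context matters. In this sketch `html.escape` is the real standard library function, while `sql_style_escape` is a simplified, hypothetical stand-in for a SQL string escaper and does not reproduce `pymysql`'s exact behaviour.

```python
import html

def sql_style_escape(value):
    """Hypothetical SQL-style escaper: backslash-escapes backslashes and quotes."""
    return value.replace("\\", "\\\\").replace("'", "\\'")

payload = "<script>alert(1)</script>"

# An HTML escaper neutralises the markup.
print(html.escape(payload))       # &lt;script&gt;alert(1)&lt;/script&gt;

# A SQL-style escaper leaves the markup untouched, so XSS is still possible.
print(sql_style_escape(payload))  # <script>alert(1)</script>
```

A tool that models "any function called escape-something" as a sanitizer for every sink would miss this issue, which is a false negative.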
Example - Hashing
import hashlib
def hashData(data: str) -> str:
    hashobj = hashlib.sha1(data.encode())
    digest = hashobj.hexdigest()
    return digest
Is this insecure?
Answer: It Depends on context
Any form of Cryptography (signing, integrity, etc.), password storage
What about Git? OpenSSL? Simple file hashing signatures?
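A typical rule for this is purely syntactic, which is exactly why it cannot answer the context question. The sketch below assumes Python's standard `ast` module; the `WEAK_HASHES` set and `find_weak_hashes` helper are illustrative choices, not a recommendation of what to flag.

```python
import ast

WEAK_HASHES = {"md5", "sha1"}  # assumption: what this demo rule treats as weak

SOURCE = '''import hashlib

def hashData(data: str) -> str:
    hashobj = hashlib.sha1(data.encode())
    digest = hashobj.hexdigest()
    return digest
'''

def find_weak_hashes(source):
    """Return (algorithm, line) for each hashlib.<weak>() call in the source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "hashlib"
            and node.func.attr in WEAK_HASHES
        ):
            findings.append((node.func.attr, node.lineno))
    return findings

print(find_weak_hashes(SOURCE))  # [('sha1', 4)]
```

The rule reports the call either way; whether the digest protects a password or just names a cache file is invisible to it.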
Example - Hashing attempt 2
import hashlib
def hashData(data: str) -> str:
    hashobj = hashlib.sha256(data.encode())
    digest = hashobj.hexdigest()
    return digest
Is this insecure?
Answer: It Depends on context
Password hashing!?
@app.route("/signin")
def signin():
    password = request.form.get("password")
    digest = hashData(password)
Use PBKDF2 (NIST and FIPS-140 compliant) or a hashing scheme with a high work factor
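As a minimal sketch of the PBKDF2 option, Python's standard library exposes `hashlib.pbkdf2_hmac`. The iteration count, salt length, and helper names below are assumptions for illustration; check current guidance (for example the OWASP Password Storage Cheat Sheet) before choosing real parameters.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumption: tune to current guidance and your hardware

def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a per-password salt."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password, salt, expected, iterations=ITERATIONS):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("correct horse", iterations=1_000)
print(verify_password("correct horse", salt, stored, iterations=1_000))  # True
print(verify_password("wrong guess", salt, stored, iterations=1_000))    # False
```

The demo calls use a low iteration count only so the example runs quickly.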
Sanitizers - Inline, Direct, and Indirect
output = escape(input())

if secure(output):
    output = input()
else:
    output = "Error, insecure value passed in"

if not secure(output):
    output = "Error, insecure value passed in"
    return output
output = input()
Passthroughs
Tracking data sent to and from libraries
Different Static Analysis tools treat this differently
Tainted? Sanitized?
Taint entire objects / classes?
Example - Passthroughs
import os
name = input("Input: ")
result = os.path.join("data", name)
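Running the library call on hostile values shows why a path join should usually be modelled as a passthrough that keeps the taint. The sketch uses `posixpath` (the POSIX flavour of `os.path`) so the results are the same on every platform; the input strings are made-up attacker values.

```python
import posixpath

# A relative traversal survives the join and escapes "data" once normalised.
joined = posixpath.join("data", "../../etc/passwd")
print(joined)                      # data/../../etc/passwd
print(posixpath.normpath(joined))  # ../etc/passwd

# An absolute path discards the "data" prefix entirely.
print(posixpath.join("data", "/etc/passwd"))  # /etc/passwd
```

The join neither validates nor sanitizes its arguments, so the result is still attacker controlled.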
Results - Static Analysis Final Step
Security Issues
SQL Injection, Cross Site Scripting, ...
Best Practices
Code Quality and Code Smells
Long Functions, Duplicated code, ...
Positive Results
Using appropriate hashing algorithms, automatic encoding, ...
Congratulations:
Here not be Dragons, Here be Security
The Pros
Fast to run
Easy to Implement
Developer friendly versus other security solutions
Adding to SDLC process
The Cons
Poorly written tools leading to:
False Positives (not valid security issues)
False Negatives (un-discovered true findings)
Generally not aware of context
Need to know all your sources, sinks, and Sanitizers
False Negatives
Every framework, library, and module
**Title:**
> Introduction to Static Code Analysis
**Description:**
> This talk will give an introduction into what static code analysis is, go into a deeper dive into how it's done today, and finally discuss the impact & complications around using static analysis.
**Slides:**
https://presentations.geekmasher.dev/2021-09-Defcon44131
Source: https://owasp.org/www-community/controls/Static_Code_Analysis
Sources: https://github.com/OWASP/ASVS
This is not a full list but a generalist list that I have
- AST: Tree representation of the parsed code
- CFG: Directed graph of the control flows in the application
- DFG: Directed graph of the data flows in an application
- TA:
You can build a static code analysis tool at any of these locations
**Resources:**
- https://www.tutorialspoint.com/compiler_design/compiler_design_syntax_analysis.htm
Source: https://en.wikipedia.org/wiki/Control-flow_graph
(a) an if-then-else
(b) a while loop
(c) a natural loop with two exits, e.g. while with an if...break in the middle; non-structured but reducible
(d) an irreducible CFG: a loop with two entry points, e.g. goto into a while or for loop
Sources:
- https://www.fuzzingbook.org/html/ControlFlow.html
- https://rahul.gopinath.org/post/2019/12/08/python-controlflow/
- https://codeql.github.com/docs/writing-codeql-queries/about-data-flow-analysis/
- https://www.sciencedirect.com/topics/computer-science/data-flow-graph
Sources:
- http://regex.info/blog/2006-09-15/247
- https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/
Simple debugging is enabled
Generally runs in a sandbox
Sources / Influence:
- https://github.com/returntocorp/semgrep-rules/blob/develop/python/flask/security/dangerous-template-string.yaml
Sources:
- https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html
Sources:
- https://docs3.sonarqube.org/latest/analysis/security_configuration/
Sources:
- [How does JavaScript and JavaScript engine work in the browser and node?](https://medium.com/jspoint/how-javascript-works-in-browser-and-node-ab7d0d09ac2f)
- [Firing up the Ignition interpreter](https://v8.dev/blog/ignition-interpreter)
- [Carnegie Mellon University - Taint Analysis](https://www.cs.cmu.edu/~ckaestne/15313/2018/20181023-taint-analysis.pdf)
- [Northwestern - Static Analysis](https://users.cs.northwestern.edu/~ychen/classes/cs450-f16/lectures/10.10_Static%20Analysis.pdf)
- https://labs.f-secure.com/assets/BlogFiles/mwri-Static-Analysis-for-Code-and-Infrastructure-final-DevSecCon2016-2016-24-10.pdf