Introduction to Static Code Analysis 
By Mathew Payne - @GeekMasher 
 
Whoami 
Mathew Payne - @GeekMasher 
Senior Field Security Specialist - GitHub
Abertay University Alumni
Focus on: 
 
 
Today's Talk 
Introduction to Static Code Analysis 
Deeper dive into how Static Code Analysis works 
Examples of how this is done 
Summary and Thoughts on using Static Code Analysis
 
 
 
What is Static Code Analysis? 
OWASP Definition: 
Static Code Analysis (also known as Source Code Analysis) is usually performed as part of a Code Review (also known as white-box testing) and is carried out at the Implementation phase of a Security Development Lifecycle (SDL).
Static Code Analysis commonly refers to the running of Static Code Analysis tools that attempt to highlight possible vulnerabilities within 'static' (non-running) source code by using techniques such as Taint Analysis and Data Flow Analysis.
 
 
What is Static Code Analysis? 
 
 
 
 
Models 
 
 
 
 
Security Patterns 
In Static Analysis these are called Rules or Queries 
 
Results Produced 
SQL Injection, Cross Site Scripting, ... 
 
Long Functions, Duplicated code, ... 
 
Using appropriate hashing algorithms, automatic encoding, ... 
 
 
 
 
Before we begin: Glossary 
Confusing I know  
 
 
 
 
 
Static Code Analysis Parsing 
Well, some of these terms might seem familiar... 
 
Compiler and Interpreter Pipelines 
*Overly simplified and different languages might look different 
 
 
 
Example - Abstract Syntax Tree 
v = 1 + 1
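To make the tree concrete, here is a minimal sketch that parses the same statement with Python's standard `ast` module. The `describe` helper is a hypothetical name introduced for illustration and only handles the four node types this statement produces.

```python
import ast

# Parse the one-line program from the slide into an abstract syntax tree.
tree = ast.parse("v = 1 + 1")

# The module body holds a single assignment statement.
assign = tree.body[0]


def describe(node: ast.AST) -> str:
    """Summarise a small subset of node types as nested text (illustrative helper)."""
    if isinstance(node, ast.Assign):
        return f"Assign({describe(node.targets[0])} = {describe(node.value)})"
    if isinstance(node, ast.BinOp):
        op = type(node.op).__name__
        return f"BinOp({describe(node.left)} {op} {describe(node.right)})"
    if isinstance(node, ast.Name):
        return f"Name({node.id})"
    if isinstance(node, ast.Constant):
        return f"Constant({node.value!r})"
    # Anything else is reported by its class name only.
    return type(node).__name__


summary = describe(assign)
print(summary)
```

The nesting of the printed summary mirrors the tree: an assignment whose value is a binary operation over two constants.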
 
Example - Abstract Syntax Tree 
def test(var1: str):
    print("Var :: " + var1)
test("Hello")
 
Example - Abstract Syntax Tree (web app) 
from flask import Flask, render_template
app = Flask(__name__)
@app.route("/")
def index():
    return render_template("index.html")
if __name__ == "__main__":
    app.run('0.0.0.0', 5000)
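A rough sketch of what a tool can already learn from this AST alone: which functions are exposed as routes. It uses only the standard `ast` module, so Flask does not need to be installed. `find_routes` is a hypothetical helper, and matching any decorator call whose attribute is named `route` is a simplifying assumption.

```python
import ast

SOURCE = '''
from flask import Flask, render_template
app = Flask(__name__)

@app.route("/")
def index():
    return render_template("index.html")
'''


def find_routes(source: str) -> dict[str, str]:
    """Map function names to the literal path given in a `.route(...)` decorator."""
    routes = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        for deco in node.decorator_list:
            # Assumption: `<anything>.route("<literal>")` is a web route.
            if (
                isinstance(deco, ast.Call)
                and isinstance(deco.func, ast.Attribute)
                and deco.func.attr == "route"
                and deco.args
                and isinstance(deco.args[0], ast.Constant)
            ):
                routes[node.name] = deco.args[0].value
    return routes


routes = find_routes(SOURCE)
print(routes)
```

A real analyser would also resolve what `app` is before trusting the match; this sketch skips that step.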
 
 
Example - Control Flow Graph 
x = 1
while x > 0:
    if x > 1:
        continue
    x = x - 1
y = x
Blue = True, Red = False 
By Rahul Gopinath and The Fuzzing Book 
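The graph for this snippet is small enough to write out by hand. The sketch below encodes it as an adjacency map (an assumed representation, not the one used by The Fuzzing Book) and then answers two typical CFG questions: which nodes are reachable, and which edges close a loop.

```python
from collections import deque

# Hand-written CFG for the snippet: node -> list of (successor, branch label).
# Labels follow the slide: True = blue edge, False = red edge, None = unconditional.
CFG = {
    "x = 1": [("while x > 0", None)],
    "while x > 0": [("if x > 1", True), ("y = x", False)],
    "if x > 1": [("continue", True), ("x = x - 1", False)],
    "continue": [("while x > 0", None)],
    "x = x - 1": [("while x > 0", None)],
    "y = x": [],
}


def reachable(cfg, entry):
    """Breadth-first search over successors, starting at the entry node."""
    seen, queue = {entry}, deque([entry])
    while queue:
        node = queue.popleft()
        for succ, _label in cfg[node]:
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def back_edges(cfg, entry):
    """Edges that jump to a node already on the current DFS path (loops)."""
    found, on_path, done = [], [], set()

    def visit(node):
        on_path.append(node)
        for succ, _label in cfg[node]:
            if succ in on_path:
                found.append((node, succ))
            elif succ not in done:
                visit(succ)
        on_path.pop()
        done.add(node)

    visit(entry)
    return found


loops = back_edges(CFG, "x = 1")
print(loops)
```

Both back edges return to the `while` header, which is how the loop shows up in the graph.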
 
Showcase - Radare2 CFG 
Image(s): Radare2 CFG by @hexploitable 
 
 
Example - Simple Application + DFG 
my_var = input()
format_str = "Input: " + my_var
print(format_str)
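The data flow edges for this program can be recovered with a few lines of `ast` code. This is a minimal sketch that only handles straight-line code with simple assignments; `def_use_edges` is a hypothetical helper name, and branches, loops, and function calls across files are out of scope.

```python
import ast

SOURCE = '''
my_var = input()
format_str = "Input: " + my_var
print(format_str)
'''


def def_use_edges(source: str) -> list[tuple[str, str]]:
    """Return (defined variable, statement or variable that reads it) pairs."""
    edges, defined = [], set()
    for stmt in ast.parse(source).body:
        is_assign = isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name)
        # An assignment is identified by its target, anything else by its source text.
        user = stmt.targets[0].id if is_assign else ast.unparse(stmt)
        for node in ast.walk(stmt):
            # Only names that are read and were assigned earlier form an edge.
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load) and node.id in defined:
                edges.append((node.id, user))
        if is_assign:
            defined.add(stmt.targets[0].id)
    return edges


edges = def_use_edges(SOURCE)
print(edges)
```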
 
 
Sources (user controlled inputs) 
Sinks (dangerous methods / assignments) 
Sanitizers (secures the user data) 
Passthroughs (functions that track tainted data) 
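The four roles can be sketched as a toy taint tracker. Everything here is an assumption for illustration: `input` is the only source, `print` the only sink, a function named `escape` the only sanitizer, and every other expression is treated as a passthrough. It handles straight-line code only.

```python
import ast

SOURCES = {"input"}      # calls that return user-controlled data
SINKS = {"print"}        # calls that must not receive tainted data
SANITIZERS = {"escape"}  # calls whose result is treated as clean (assumed name)


def call_name(node):
    """Name of a plain function call such as `input()`, else None."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return None


def is_tainted(expr, tainted):
    name = call_name(expr)
    if name in SANITIZERS:
        return False
    if name in SOURCES:
        return True
    if isinstance(expr, ast.Name):
        return expr.id in tainted
    # Passthrough: any other expression is tainted if one of its parts is.
    return any(is_tainted(child, tainted) for child in ast.iter_child_nodes(expr))


def find_flows(source):
    """Return line numbers where tainted data reaches a sink."""
    tainted, findings = set(), []
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            target = stmt.targets[0].id
            if is_tainted(stmt.value, tainted):
                tainted.add(target)
            else:
                tainted.discard(target)
        for node in ast.walk(stmt):
            if call_name(node) in SINKS and any(is_tainted(a, tainted) for a in node.args):
                findings.append(stmt.lineno)
    return findings


VULNERABLE = 'my_var = input()\nformat_str = "Input: " + my_var\nprint(format_str)\n'
SANITIZED = 'my_var = escape(input())\nprint("Input: " + my_var)\n'
print(find_flows(VULNERABLE), find_flows(SANITIZED))
```

The string concatenation is never listed anywhere, yet taint still flows through it; that is the passthrough behaviour.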
 
 
Define a Security Pattern that you want to detect
 
 
All static code analysis tools have rules and/or queries
Hardcoded or Customisable 
Open or Closed source 
 
 
Configuration Rules or Dynamic Queries
 
 
 
 
Configuration Rules or Dynamic Queries 
Simpler to write 
Complex flows can be very hard to declare 
 
 
Harder to learn and write 
Complex flows are easier 
 
 
 
 
Just use Regex!? 
Jamie Zawinski (early Netscape engineer): 
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
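A small demonstration of where a regex falls short, using a hypothetical pattern for a debug flag passed to a `run(...)` call. The sample lines are made up for illustration.

```python
import re

# Hypothetical pattern: look for the text "debug=True" anywhere in a line.
DEBUG_PATTERN = re.compile(r"debug\s*=\s*True")

samples = {
    "keyword": 'app.run("0.0.0.0", 80, debug=True)',
    "positional": 'app.run("0.0.0.0", 80, True)',
    "comment": "# never ship with debug=True",
    "variable": 'app.run("0.0.0.0", 80, debug=DEBUG)',
}

# True means the regex flagged the line.
matches = {name: bool(DEBUG_PATTERN.search(line)) for name, line in samples.items()}
print(matches)
```

The positional call is missed (a false negative), the comment is flagged (a false positive), and the variable case cannot be decided without knowing what `DEBUG` holds, which is a data flow question.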
 
 
Example - Detecting Simple Configuration Problems 
from flask import Flask, render_template
app = Flask("MyApp")
@app.route("/")
def index():
    return render_template("index.html")
if __name__ == "__main__":
    app.run("0.0.0.0", 80, debug=True)
What issue do you see here? 
 
name: "Debugging Enabled"
sources:
  types: bool
  value: True
sinks:
  flask:
    - 'flask.Flask(){}.run([2]|debug)'
Mock-up example rules language 
 
Use the AST, CFG, and DFG to get results 
from sca import ast, cfg, dfg, results
sources = ast.getType("bool").getValue("True")
flask = ast.findImport("flask").getExpr("Flask")
sinks = flask.getCall("run").getParameters("debug") or flask.getCall("run").getParameters(2)
results = dfg.taint(sources, sinks)
Mock-up example query language 
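For comparison with the mock-up, here is a runnable sketch of the same check written against Python's standard `ast` module. It flags a literal `True` passed as the `debug` keyword or as the third positional argument of any `.run(...)` call. Not verifying that the receiver is a Flask application is a deliberate simplification, and `debug_enabled_calls` is a hypothetical name.

```python
import ast

SOURCE = '''from flask import Flask
app = Flask("MyApp")
# debug=True inside a comment is not part of the AST
app.run("0.0.0.0", 80, debug=True)
app.run("0.0.0.0", 80, True)
app.run("0.0.0.0", 80)
'''


def debug_enabled_calls(source: str) -> list[int]:
    """Line numbers of `.run(...)` calls that receive a literal True for debug."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        is_run_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "run"
        )
        if not is_run_call:
            continue
        # Sink positions from the rule: keyword `debug` or positional index 2.
        candidates = [kw.value for kw in node.keywords if kw.arg == "debug"]
        if len(node.args) > 2:
            candidates.append(node.args[2])
        if any(isinstance(c, ast.Constant) and c.value is True for c in candidates):
            hits.append(node.lineno)
    return sorted(hits)


hits = debug_enabled_calls(SOURCE)
print(hits)
```

A value that arrives through a variable, such as `debug=DEBUG`, would still need the data flow graph to resolve.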
 
Example - Simple Taint Flow 
from flask import Flask, request, render_template, make_response
app = Flask(__name__)
@app.route("/search")
def search():
    query = request.args.get("s")
    results = lookup(query)
    if len(results) > 0:
        return render_template("search.html", results=results)
    else:
        return make_response("No results found for: " + query, 404)
What issue do you see here? 
 
name: "Cross Site Scripting"
sources:
  flask:
    - "flask.request.args[]"
    - "flask.request.args.get()"
sinks:
  flask:
    - "flask.make_response([0])"
    - "flask.Response([0]){}"
    - "flask.render_template_string([0])"
    - "flask.abort([2])"
Mock-up example rule language 
 
Use the AST, CFG, and DFG to get results 
from sca import ast, cfg, dfg, results
flask = ast.findImport("flask")
sources = flask.getMember("request").getMember("args").getUses()
routes = flask.findDecorator("route").getCall()
sinks = ast.getType("str").getExpr() & routes.getReturns()
results = dfg.taint(sources, sinks)
Mock-up example query language 
 
Modeling 
Researching a framework, library, or module
 
Creating reusable models for the Static Analyser
"User Inputs"
flask.request.args[], etc. 
 
"XSS Sinks"
flask.make_response([0]), etc. 
 
 
 
 
 
Example - Modeling 
from sca import dfg, results
from sca.flask import flask_sources, flask_sinks_xss, flask_sanitizers_xss
from sca.web import web_sources, web_sinks_xss, sanitizers_xss
results = dfg.taint(web_sources, web_sinks_xss, sanitizers_xss)
Mock-up example query language 
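One way to picture a reusable model is as plain data that can be merged across frameworks. The sketch below is an assumption about how such models could be stored, not how any particular tool does it. `Model`, `dotted_name`, and `classify_calls` are hypothetical names, and matching on the dotted call name without resolving imports is a simplification.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """A reusable description of one framework, library, or module."""
    sources: frozenset = frozenset()
    sinks: frozenset = frozenset()
    sanitizers: frozenset = frozenset()

    def merge(self, other: "Model") -> "Model":
        return Model(
            self.sources | other.sources,
            self.sinks | other.sinks,
            self.sanitizers | other.sanitizers,
        )


# Illustrative entries only; real models are far larger.
FLASK_XSS = Model(
    sources=frozenset({"request.args.get"}),
    sinks=frozenset({"make_response", "render_template_string"}),
    sanitizers=frozenset({"escape"}),
)
STDLIB = Model(sources=frozenset({"input"}), sanitizers=frozenset({"html.escape"}))
WEB_XSS = FLASK_XSS.merge(STDLIB)


def dotted_name(node):
    """Turn `request.args.get` style call targets into a dotted string."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def classify_calls(source: str, model: Model) -> dict:
    """Report which modelled sources, sinks, and sanitizers appear in the code."""
    found = {"sources": set(), "sinks": set(), "sanitizers": set()}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            for kind in found:
                if name in getattr(model, kind):
                    found[kind].add(name)
    return found


SNIPPET = '''
def search():
    query = request.args.get("s")
    return make_response("No results found for: " + query, 404)
'''
found = classify_calls(SNIPPET, WEB_XSS)
print(found)
```

This only locates the modelled points; deciding whether the source actually reaches the sink is still the job of the taint analysis.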
 
Functions or checks that make the input safe to use
Escaping or Encoding before the sink 
 
 
Context is extremely important
 
Inline, Direct, and Indirect are... extremely complicated!
 
 
 
Example - XSS but using Sanitizer 
from flask import Flask, request, render_template
from markupsafe import escape  # flask.escape was removed in Flask 3.0
app = Flask(__name__)
@app.route("/search")
def search():
    query = request.args.get("s")
    results = lookup(query)
    if len(results) > 0:
        return render_template("search.html", results=results)
    else:
        return "No results found for: " + escape(query)
Rules don't work now... False Positives here we come 
 
Example - Using another Sanitizer... 
from flask import Flask, request, render_template
from pymysql.converters import escape_string
app = Flask(__name__)
@app.route("/search")
def search():
    query = request.args.get("s")
    results = lookup(query)
    if len(results) > 0:
        return render_template("search.html", results=results)
    else:
        return "No results found for: " + escape_string(query)
Are we secure now? 
 
Context is so important! 
But it's so hard for tools to know without us telling them 
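A quick way to see why the sanitizer has to match the sink. `html.escape` is from the standard library; `sql_escape` is a hand-written stand-in for a SQL string escaper, used here only to avoid a third-party dependency, and is not a real library function.

```python
import html


def sql_escape(value: str) -> str:
    """Hypothetical SQL-style escaper: only backslashes and single quotes change."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


payload = "<script>alert(1)</script>"

html_safe = html.escape(payload)  # right sanitizer for an HTML response
sql_safe = sql_escape(payload)    # wrong sanitizer for an HTML response

print(html_safe)
print(sql_safe)
```

The SQL-style escaper returns the payload untouched because it contains nothing that matters in a SQL string, so the page would still be vulnerable to XSS. A tool that treats every "escape" function as a sanitizer for every sink would report this code as safe.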
 
Example - Hashing 
import hashlib
def hashData(data: str) -> str:
    hashobj = hashlib.sha1(data.encode())
    digest = hashobj.hexdigest()
    return digest
Is this insecure? 
 
Answer: It Depends on context 
Any form of Cryptography (signing, integrity, etc), password storage 
What about Git? OpenSSL? Simple file hashing signatures? 
 
Example - Hashing attempt 2 
import hashlib
def hashData(data: str) -> str:
    hashobj = hashlib.sha256(data.encode())
    digest = hashobj.hexdigest()
    return digest
Is this insecure? 
 
Answer: It Depends on context 
Password hashing!? 
@app.route("/signin")
def signin():
    password = request.form.get("password")
    digest = hashData(password)
Use PBKDF2 (NIST and FIPS-140 compliant) or a hashing scheme with a high work factor 
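A minimal sketch of the PBKDF2 option using only the standard library. The function names, the 16-byte salt, and the default iteration count are assumptions for illustration; check current guidance (for example the OWASP Password Storage Cheat Sheet) before choosing real parameters.

```python
import hashlib
import hmac
import os

# Assumption: tune the work factor to current guidance and your hardware.
ITERATIONS = 600_000


def hash_password(password: str, *, iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both alongside the iteration count."""
    salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest


def verify_password(
    password: str, salt: bytes, expected: bytes, *, iterations: int = ITERATIONS
) -> bool:
    """Recompute the digest and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

Unlike a single SHA-256 call, the per-password salt and the iteration count make each guess expensive for an attacker, which is the context a static analyser needs to tell the two uses of hashing apart.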
 
Sanitizers - Inline, Direct, and Indirect 
# Inline
output = escape(input())

# Direct
if secure(output):
    output = input()
else:
    output = "Error, insecure value passed in"

# Indirect
if not secure(output):
    output = "Error, insecure value passed in"
    return output
output = input()
 
Tracking data sent to and from libraries
 
Different Static Analysis tools treat this differently
 
Tainted? Sanitized?
 
Taint entire objects / classes?
 
 
 
Example - Passthroughs 
import os
name = input("Input: ")
result = os.path.join("data", name)
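A short sketch of why the join has to be modelled as a passthrough and not as a sanitizer. It uses `posixpath` so the behaviour is the same on every platform; `join_under_data` is a hypothetical wrapper around the call from the slide.

```python
import posixpath


def join_under_data(name: str) -> str:
    """Same call as the slide, with POSIX path rules on any platform."""
    return posixpath.join("data", name)


benign = join_under_data("report.txt")
absolute = join_under_data("/etc/passwd")
traversal = posixpath.normpath(join_under_data("../secret.txt"))

print(benign)     # data/report.txt
print(absolute)   # /etc/passwd
print(traversal)  # secret.txt
```

An absolute input discards the `data` prefix entirely and `..` walks back out of it, so the result is exactly as tainted as `name` was.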
 
SQL Injection, Cross Site Scripting, ... 
 
Long Functions, Duplicated code, ... 
 
Using appropriate hashing algorithms, automatic encoding, ... 
 
 
 
Here not be Dragons  
 
 
 
 
Poorly written tools leading to:
False Positives (not valid security issues) 
False Negatives (un-discovered true findings) 
 
 
Generally not aware of context 
False Negatives 
Every framework, library, and module 
 
 
 
 
 **Title:**
> Introduction to Static Code Analysis
**Description:**
> This talk will give an introduction into what static code analysis is, go into a deeper dive into how it's done today, and finally discuss the impact & complications around using static analysis.
**Slides:**
https://presentations.geekmasher.dev/2021-09-Defcon44131
Source: https://owasp.org/www-community/controls/Static_Code_Analysis
TODO: Is there a better name for this I can use?
Sources: https://github.com/OWASP/ASVS
TODO: Dragon image + source
This is not a full list, just a general one that I use
- AST: Tree representation of the parsed code
- CFG: Directed graph of the control flows in the application
- DFG: Directed graph of the data flows in an application
- TA:
You can build a static code analysis tool at any of these locations
**Resources:**
- https://www.tutorialspoint.com/compiler_design/compiler_design_syntax_analysis.htm
TODO: Fix image
Source: https://en.wikipedia.org/wiki/Control-flow_graph
(a) an if-then-else
(b) a while loop
(c) a natural loop with two exits, e.g. while with an if...break in the middle; non-structured but reducible
(d) an irreducible CFG: a loop with two entry points, e.g. goto into a while or for loop
Sources:
- https://www.fuzzingbook.org/html/ControlFlow.html
- https://rahul.gopinath.org/post/2019/12/08/python-controlflow/

- https://codeql.github.com/docs/writing-codeql-queries/about-data-flow-analysis/
- https://www.sciencedirect.com/topics/computer-science/data-flow-graph
Sources:
- http://regex.info/blog/2006-09-15/247
- https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/
Simple debugging is enabled
Generally runs in a sandbox
Sources / Influence:
- https://github.com/returntocorp/semgrep-rules/blob/develop/python/flask/security/dangerous-template-string.yaml
Sources:
- https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html
Sources:
- https://docs3.sonarqube.org/latest/analysis/security_configuration/
Sources:
- [How does JavaScript and JavaScript engine work in the browser and node?](https://medium.com/jspoint/how-javascript-works-in-browser-and-node-ab7d0d09ac2f)
- [Firing up the Ignition interpreter](https://v8.dev/blog/ignition-interpreter)
- [Carnegie Mellon University - Taint Analysis](https://www.cs.cmu.edu/~ckaestne/15313/2018/20181023-taint-analysis.pdf)
- [Northwestern - Static Analysis](https://users.cs.northwestern.edu/~ychen/classes/cs450-f16/lectures/10.10_Static%20Analysis.pdf)
- https://labs.f-secure.com/assets/BlogFiles/mwri-Static-Analysis-for-Code-and-Infrastructure-final-DevSecCon2016-2016-24-10.pdf