Introduction to CodeQL

By Mathew Payne - @GeekMasher

v1.0

# Whoami

Mathew Payne - @GeekMasher

Senior Field Security Specialist - GitHub

Abertay University Alumni

Focus on:

  • ๐Ÿ’ป Static Code Analysis
  • ๐Ÿ‘€ Code Review
  • ๐Ÿค DevOps / DevSecOps

Today's Talk

  • What is Static Code Analysis and CodeQL?
  • Running CodeQL to do research
  • Introduction to Query writing

Questions during presentation are welcome!

๐Ÿ” Static Code Analysis Basics

Poll Time

๐Ÿ” What is Static Code Analysis?

  • ๐Ÿ“ An automated tool to analyse source code
    • "Asking and answering questions about the code"
  • ๐Ÿ”’ Static Application Security Testing (SAST)
  • #๏ธโƒฃ Looks at the code without running the code
  • ๐Ÿ” Discover security vulnerabilities
  • ๐Ÿ”— Perform Taint Tracking Analysis

Why Static Code Analysis?

โšก Taint Tracking Analysis

Track data through an application

  • ๐Ÿ” Sources (untrustworthy application inputs)
  • โ‰๏ธ Sinks (methods / assignments of interest)
  • ๐Ÿ”’ Sanitizers (secures the user data)
  • ๐Ÿ”’ Guards (conditional checks)
  • ๐Ÿ‘ป Passthroughs / Taintsteps (understand what an expression does)

โšก Example - Code Review

from flask import Flask, request, render_template
# ...
@app.route("/search")
def search():
    search = request.args.get("search")
    results = lookup(search)
    return render_template(
        "search.html", results=results
    )

โšก Example - Code Review

from flask import Flask, request, render_template
# ...
@app.route("/search")
def search():
    search = request.args.get("search")  # <- source, request parameter
    results = lookup(search)             # <- sink?
    return render_template(
        "search.html", results=results   # <- sink?
    )

โšก Example - Code Review

from flask import Flask, request, render_template

def lookup(data):
    cursor = conn.cursor()
    query = f"SELECT * FROM metadata WHERE name='{data}' OR data='{data}'"
    
    cursor.execute(query)
    return cursor.fetchall()

@app.route("/search")
def search():
    search = request.args.get("search")
    results = lookup(search)
    # ...

โšก Example - Taint Tracking Analysis

from flask import Flask, request, render_template

def lookup(data):   # <- 3. function definition
    cursor = conn.cursor()
    query = f"SELECT * FROM metadata WHERE name='{data}' OR data='{data}'"
    # ^ 4. string format, tainting query
    cursor.execute(query)    # <- 5. SINK!
    return cursor.fetchall()

@app.route("/search")
def search():
    search = request.args.get("search")  # <- 1. source, request parameter
    results = lookup(search)             # <- 2. function call
    # ...

โšก Example - Code Review (2nd sink)

from flask import Flask, request, render_template
# ...
@app.route("/search")
def search():
    search = request.args.get("search")
    results = lookup(search)
    return render_template(
        "search.html", results=results   # <- sink?
    )

โšก Researching Framework/Library

HTML Escaping / Jinja Templates

When returning HTML (the default response type in Flask), any user-provided values rendered in the output must be escaped to protect from injection attacks. HTML templates rendered with Jinja, introduced later, will do this automatically.

๐Ÿ” CodeQL ๐Ÿ”’

Disclaimer Time ๐Ÿ˜…

What is CodeQL?

  • ๐Ÿ”ง Static Code Analysis Engine
  • ๐Ÿ“š Extracts source code into data
  • โคต๏ธ Stores data in a Database
  • ๐Ÿ” Queries run on the Database
  • ๐Ÿ”ฃ Domain-specific language called "QL"

CodeQL Pipeline

Code -> Database -> Queries -> Results

Getting Started

# vscode starter
git clone --depth=1 https://github.com/github/vscode-codeql-starter

# create database (codeql cli)
codeql database create --language python ./python-DC44131-workshop

note: a little different for compiled languages

๐Ÿ” CodeQL Query Basics

/** 
 * @name SQL Injection
 * @kind problem
 * ...
 */
import python
import DataFlow::PathGraph

// Predicates and Classes
class Sources extends DataFlow::Node { /* class  */ }

// Query Output / Results
from Call call
select call, "Calls in the code"

Let's Answer Some Questions...

Question 1

How do we find the Source?

Python Code

from flask import request

request.args.get("search")
# ^ Source!

CodeQL Query

import python
import semmle.python.Concepts
import semmle.python.ApiGraphs

/*
 * How do we find the source?
 */

from DataFlow::Node request, Attribute attr
where
  request = API::moduleImport("flask").getMember("request").getAValueReachableFromSource() and
  attr.getObject() = request.asExpr()
select attr, "Source"

Question 2

What is the Sink?

Python Code

import psycopg2

conn = psycopg2.connect("dbname=workshop user=postgres")

cursor = conn.cursor()
cursor.execute(query)
     # ^ Sink: execute(query)
import python
import semmle.python.Concepts
import semmle.python.ApiGraphs

/*
 * What is the sink?
 */

from CallNode call, DataFlow::Node sink
where
  // Find all functions called "execute"
  call.getFunction().(AttrNode).getName() in ["execute"] and
  // The first argument is what we are interested in
  sink.asCfgNode() = call.getArg(0)
select sink, "Sink"

Note: being lazy and looking for execute(...)

Question 3

Can we find a path from Source to Sink?

CodeQL Full - Complete

class SqlInjectionConfig extends TaintTracking::Configuration {
  SqlInjectionConfig() { this = "SqlInjectionConfig" }

  override predicate isSource(DataFlow::Node source) { source instanceof Sources }

  override predicate isSink(DataFlow::Node sink) { sink instanceof Sinks }
}

from SqlInjectionConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "This SQL query depends on $@.", source.getNode(),
  "a user-provided value"

CodeQL Queries and Databases

  • ๐Ÿ™ Pre-computed Databases
    • Databases for most Open Source applications
  • ๐Ÿฑ Using built-in queries
    • There are hundreds of queries per-language
    • Fully extendable and customisable

Closing Words

  • ๐Ÿ” Powerful tool to find security vulnerabilities
  • ๐Ÿ’ช Does the heavy lifting for us
  • ๐Ÿ˜ฐ SDL is complex to learn, but once you do ๐Ÿ‘€!
  • ๐Ÿ˜ฒ Find 0-day using CodeQL

Questions?

Happy Bug Hunting ๐Ÿ’ฐ!

- Data-flow, Control-flow, and SSA Graph's Sources: - https://owasp.org/www-community/controls/Static_Code_Analysis

Resources

CodeQL

Memes