05 December, 2023

Static Code Analysis with JuliaHub: Scanning with Semgrep

Introduction:

Static code analysis is a vital technique in modern software development: it lets us find errors, inefficiencies, and security risks without running the program. Semgrep, an open-source static analysis tool, has been praised for its versatility, its support for a wide range of languages, and its user-friendly design. We're excited to report that Semgrep now offers experimental support for Julia, a high-level, high-performance programming language designed for technical computing!

Semgrep's goal has always been to make a substantial impact on software security, regardless of the language used, and it now supports a wide variety of programming languages, including Julia. Julia support in Semgrep was made possible by the major efforts of Avik Sengupta of JuliaHub and Sergio Vargas.

JuliaHub has also recently added Semgrep support! Although this work is still at an early stage, the team has already developed close to 100 rules for scanning your Julia code. How does this work?

Rules for Julia Code Analysis with Semgrep:

Let's illustrate how to detect hard-coded API keys in Julia code using Semgrep. Suppose you have the following Julia code in a file named file.jl:

Julia code


function get_api_key()
    # This is not good!
    return "ABCD-1234-EFGH-5678"
end

The get_api_key() function embeds a hard-coded API key. To detect this pattern, we can write a Semgrep rule in a file named rule.yaml:

Yaml code


rules:
- id: hard-coded-api-key
  pattern: 'return "=~/[A-Z0-9-]{19}/"'
  message: Hard-coded API key detected
  languages: [julia]
  severity: ERROR

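The "=~/.../" form inside the quoted string tells Semgrep to match string literals against a regular expression instead of comparing them character for character. To get a feel for what this particular expression accepts, you can try it with Julia's built-in regular expressions. This is only a sanity check: it assumes Semgrep's regex engine treats this simple character-class pattern the same way Julia's PCRE-based Regex does, and the sample strings are made up for illustration.

```julia
# The same regular expression used in the hard-coded-api-key rule.
const API_KEY_RE = r"[A-Z0-9-]{19}"

# Hypothetical sample strings, used only to illustrate what the pattern accepts.
samples = [
    "ABCD-1234-EFGH-5678",       # the key from file.jl: exactly 19 matching characters
    "abcd-1234-efgh-5678",       # lowercase letters fall outside [A-Z0-9-]
    "ABCD-1234",                 # too short to contain 19 matching characters
    "ABCD-1234-EFGH-5678-IJKL",  # longer, but still contains 19 consecutive matching characters
]

for s in samples
    # occursin searches anywhere in the string, it does not require a full match
    println(rpad(repr(s), 30), occursin(API_KEY_RE, s))
end
```

Note that any string with 19 or more consecutive uppercase letters, digits, or hyphens will match, so a rule like this trades some false positives for simplicity.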
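In this rule, the pattern matches any return statement whose value is a string literal containing 19 consecutive uppercase letters, digits, or hyphens, which is the shape of the example key above (a simple stand-in for a real API key format). We can then run Semgrep on the file with this bash command: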

bash command


$ semgrep --config rule.yaml file.jl

If Semgrep finds a match, it reports an ERROR-severity finding with the rule's message, "Hard-coded API key detected", and points to the file and line number of the offending code. You can then easily locate and correct the issue, potentially sparing your code a significant weakness.
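One common way to fix the flagged function is to keep the key out of the source code entirely and read it at run time, for example from an environment variable. The sketch below assumes a variable named API_KEY and adds a `var` keyword argument; both are hypothetical choices made for illustration, and in practice the key might come from a secrets manager or a configuration file kept outside version control.

```julia
# Read the API key at run time instead of hard-coding it in the source.
# `API_KEY` is a hypothetical variable name used for illustration.
function get_api_key(; var::AbstractString = "API_KEY")
    key = get(ENV, var, "")
    isempty(key) && error("environment variable $var is not set or is empty")
    return key
end

# Usage: normally the variable is set outside the program (in your shell,
# CI secrets, etc.). Here withenv sets it temporarily for the demo only.
withenv("API_KEY" => "example-value") do
    println(get_api_key())
end
```

Because the function no longer returns a string literal, the hard-coded-api-key rule has nothing to match.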

The example above only hints at the kinds of analysis Semgrep can do. With Semgrep's new Julia support, you can easily write rules that detect a broad range of potential issues, helping to improve the overall quality and robustness of your code. Enterprise customers of JuliaHub also get access to the full set of close to 100 rules, which follow Julia's Coding Guidelines.

Protecting sensitive data in Julia code is crucial. The 'mbedtls-secrets-log' rule spots risky secret-key logging practices: if your Julia code calls 'SSLConfig' or 'MbedTLS.SSLConfig' with the 'log_secrets' keyword argument, the rule flags it as an ERROR. Keep your production environment secure by steering clear of secret-key logging. The rule targets Julia, is mapped to CWE-532 ('Insertion of Sensitive Information into Log File'), and is a must for anyone serious about Julia code security.

Yaml code


rules:
  - id: mbedtls-secrets-log
    patterns:
      - pattern-either:
        - pattern: SSLConfig(..., log_secrets=...)
        - pattern: MbedTLS.SSLConfig(..., log_secrets=...)
    message: Do not use secret key logging in production
    languages:
      - julia
    severity: ERROR
    metadata:
      cwe:
        - "CWE-532: (Insertion of Sensitive Information into Log File)"
      category: security
      technology:
        - julia

Another Semgrep rule, 'index-by-threadid', is designed to make multithreaded Julia code safer. In simple terms, it watches for a common troublemaker: using threadid() to index arrays or other data structures inside loops parallelized with @threads. The rule only looks inside @threads loops that do not use the :static schedule, and it recognizes both threadid() and Threads.threadid(), whether the call is used directly as an index or first stored in a variable that is later used as one. When it finds such a pattern, it reports a WARNING, because without static scheduling a task can be suspended and resumed on a different thread, or share a thread with other tasks, so two tasks may update the same slot at once. The result is a race condition: a nasty bug that can silently corrupt your program's results.

To make it even more useful, the rule carries extra metadata. Its confidence level is set to LOW, which signals that matches may include false positives and deserve a human review rather than being treated as certain bugs. It also points you to a JuliaLang blog post that dives deeper into the dangers of using threadid().

Yaml code


rules:
  - id: index-by-threadid
    patterns:
      - pattern-inside: |
          @threads ... for $X = ...
            ...
          end
      - pattern-not-inside: |
          @threads :static for $X = ...
            ...
          end
      - pattern-either:
          - pattern: $S[Threads.threadid()]
          - pattern: $S[threadid()]
          - pattern: |
              $TID = Threads.threadid()
              ...
              $Y = $S[$TID]
          - pattern: |
              $TID = threadid()
              ...
              $Y = $S[$TID]
    message: Indexing by `threadid()` may cause race conditions and should be avoided.
    metadata:
      confidence: LOW
      references:
        - https://www.julialang.org/blog/2023/07/PSA-dont-use-threadid/
      license: LGPL
    languages:
      - julia
    severity: WARNING
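To make the warning concrete, the sketch below contrasts the kind of loop this rule is meant to flag with a task-based alternative in the spirit of the JuliaLang post referenced in the rule's metadata: split the input into chunks, give each chunk its own task, and combine the per-task results. The function names are hypothetical, and whether Semgrep's Julia support matches the flagged line exactly as written here (for example, with the `for i in ...` loop form) is an assumption.

```julia
using Base.Threads: @threads, @spawn, threadid, nthreads

# Anti-pattern that index-by-threadid is meant to catch: a per-thread buffer
# indexed by threadid() inside a (non-:static) @threads loop. Tasks may move
# between threads or interleave on one thread, so the += updates can race.
# Defined here only for comparison; it is not called.
function sum_by_threadid(a)
    partial = zeros(eltype(a), nthreads())
    @threads for i in eachindex(a)
        partial[threadid()] += a[i]
    end
    return sum(partial)
end

# Safer alternative: each task owns one chunk and returns its own partial sum,
# so no two tasks ever write to shared state.
function sum_by_tasks(a)
    chunk_size = max(1, cld(length(a), nthreads()))
    tasks = map(Iterators.partition(a, chunk_size)) do chunk
        @spawn sum(chunk)
    end
    # init keeps the result well-defined when `a` is empty
    return sum(fetch.(tasks); init = zero(eltype(a)))
end

println(sum_by_tasks(1:1_000_000))  # 500000500000
```

Because each task returns a value instead of writing into a shared buffer, the result no longer depends on which thread a task happens to run on.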

Enjoy exploring this powerful tool and happy coding with Julia and Semgrep!

To learn more about JuliaHub, visit our Overview page and get started for free.