Case Study: Dynamic Machine Learning with Zebrium | JuliaHub

Written by Zebrium | Jan 2, 2023 9:53:54 PM

Zebrium is a software company that has built the industry’s first Dev/Test Forensics platform. The Zebrium platform automatically detects and triages failures that occur during the Dev/Test cycle and speeds up their resolution. This can shorten CI/CD cycles, improve quality, and reduce strain on developers and testers. The underlying technology uses unsupervised machine learning to structure logs and metrics so users can see what is normal and what is not.

In Dev/Test, knowing whether a test passes or fails is not enough. Zebrium highlights and categorizes symptoms, regressions and known issues, and groups failures that share the same root cause. Zebrium even uncovers issues missed by automated testing.

To speed up the resolution of problems, Zebrium uses machine learning to turn logs and metrics into a heatmap of the most likely problem areas together with a UI that’s optimized for troubleshooting. Once a problem has been solved, users can create a signature (such as a multi-event sequence) with a few clicks that automatically detects the issue if it happens again.

This system also perfectly structures logs and metrics. Instead of dealing with regexes and parsing scripts to get at log information, Zebrium ML schematizes log data by event type, then captures each parameter into its own typed column so users can query with ease. The schema is self-maintained across versions to ensure everything just works.
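
To make the idea of event types and typed columns concrete, here is a minimal Python sketch. It is not Zebrium's algorithm, which relies on unsupervised machine learning; a simple hand-written rule (any token that parses as a number is treated as a parameter) stands in for it. The function names `parse_token` and `structure_logs` and the sample log lines are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def parse_token(token):
    """Return (placeholder, typed_value) for a numeric token, else (token, None).

    Assumption for this toy: only ints and floats count as parameters.
    """
    for caster, placeholder in ((int, "<int>"), (float, "<float>")):
        try:
            return placeholder, caster(token)
        except ValueError:
            continue
    return token, None


def structure_logs(lines):
    """Group log lines by event type and collect their parameters as typed rows."""
    tables = defaultdict(list)
    for line in lines:
        template, params = [], []
        for token in line.split():
            text, value = parse_token(token)
            template.append(text)
            if value is not None:
                params.append(value)
        # The template with placeholders acts as the event type (table name);
        # each row holds that line's parameters, already cast to int or float.
        tables[" ".join(template)].append(tuple(params))
    return dict(tables)


# Hypothetical log lines, invented for this example.
logs = [
    "disk 3 usage 91.5 percent",
    "disk 7 usage 12.0 percent",
    "retry 2 after timeout",
]
tables = structure_logs(logs)
for event_type, rows in tables.items():
    print(event_type, rows)
```

The two disk lines collapse into one event type, `disk <int> usage <float> percent`, with rows `(3, 91.5)` and `(7, 12.0)`, so a query can filter on the usage column directly instead of applying a regex to raw text.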

Zebrium customers are mostly technology vendors: companies that build hardware, software or SaaS. The Zebrium solution is deployed as a SaaS application, with the goal that customers see value within the first hour of use.

Zebrium’s CTO, Larry Lancaster, founded Zebrium with the idea that machine learning could be used to perfectly structure unstructured machine data. Because this task had never been attempted before, he began an ambitious exploration phase, taught himself machine learning and built a number of prototypes. Given the complexity of the core algorithm, the volume of data to be processed and the heavy compute requirements, Julia was the ideal choice.

Larry Lancaster, Zebrium’s Founder and CTO, explains:

Before Julia, I had coded significant projects in C, Perl, Java, R, and SQL. Given speed requirements, C++ along with some existing libraries was probably the nearest choice in terms of me coming up to speed. However, what I really wanted was C-like performance and Perl dev speed. I wanted to rapidly prototype a number of approaches and see what worked. Julia offered this combination of dynamic language with high performance, and a REPL. So I tried out Julia.

One big wow for me was how easy it was to speed up vectorizable loops just by adding @simd. Another was how easy it is to apply GPU operations in loops just by using an ArrayFire array or GPUArray. So effortless! I made a lot of use of these sorts of extensions.

I developed a library to structure software log data inline for ingest into a relational database. We ended up writing our application in Julia from the ground up. It forms the foundation of our Dev/Test Forensics platform.

The library was powerful and fast enough to attract some incredibly talented computer and data scientists with tons of software experience. One data scientist who joined us has fallen in love with working in the REPL, using TensorFlow and various other tools.

What’s more, the Julia base library has proven incredibly stable for such a young language, which allows us to focus on constantly improving and developing our product.

Julia is critical to the success of our product, our company and our customers. And because Julia allows us to create and deliver a high-quality, high-performance product that is unique in the marketplace, we attracted a seed round from Accel and GGV.