Engineering safety-critical systems requires robustness and reliability, yet machine learning is often regarded as the antithesis of those goals. There are many reasons to incorporate machine learning into modern modeling and simulation, for example to improve model fidelity and to generate surrogates for faster evaluation and real-time decision-making. But if we cannot guarantee the reliability of the results, these techniques will never enter the live-updating environment that modern engineering is seeking to enable. At a time when new cars allow over-the-air updates to critical features such as self-driving and automated lane assist, we must ask: what do we demand of our infrastructure and testing tools so that a system can make these kinds of improvements autonomously while guaranteeing safety?
In the Dyad (formerly JuliaSim) platform, we recognize that modern machine learning workflows demand the same rigor and discipline as traditional software development. Production systems have strict requirements: they must be reliable and robust in any scenario they encounter. For machine learning, this translates into strict guarantees about how a model behaves when faced with different problems, validation against unseen data, and performance tracking over the life of the model. The ability to create robust, scalable, and reproducible workflows without losing a competitive edge is therefore critical for innovation.
This blog post details how the ML-oriented features built by the Dyad AI team, in particular the LabTracker that underpins our surrogate and automated model discovery training mechanisms, are designed to provide automated yet robust experimentation with training inside CI/CD pipelines.
To guarantee that our machine learning achieves the accuracy required for safe operation in real-world environments, Dyad (formerly JuliaSim) integrates its ML training pipelines into cloud-based CI/CD. The reasoning is that this removes the “it ran on my computer” factor that plagues many training scenarios. Instead of trying to remember what was tried and giving vague instructions for reproducing it on another machine, a CI/CD pipeline tracks the entire history of what was attempted and how well each trial went, and stores the artifacts for later comparison. The system can therefore completely track and monitor the history of all surrogates generated for a given model, so that in the future one can answer questions such as “was this surrogate trained to be accurate when the pressure within the pipe is low?” by inspecting the training history associated with the model and downloading the saved artifact to re-test in new scenarios. This replaces real-world guess-and-check with definite answers about the scenarios the models were trained on.
Importantly, the CI/CD-based approach makes it possible to guarantee reproducibility of every training run, full traceability of the data and artifacts behind each model, and validated accuracy tracked over the life of the model.
LabTracker: Dyad’s Automated Machine Learning for Modeling and Simulation
Rather than relying on a fickle “graduate student descent” process for testing models, hyperparameters, and potential applications, the Dyad CI/CD pipeline, which we call LabTracker, is a fully automated approach to testing ML designs in a given application area. This backend to the Dyad system is designed so that models in Dyad, such as acausal models built in the Dyad Modeling Language or models imported as FMUs, can have their machine learning elements explored in a way that is reproducible and covers the entire space of architectural ideas.
What are the elements of the LabTracker CI/CD ML Pipeline?
1. Data Generation

Data quality determines model success. We have found that the standard technique of “sample the data upfront” is insufficient for real-world industrial applications, as detailed in prior writings (https://osf.io/preprints/osf/p95zn_v1).
Fig: Generated data exploring the relevant phase space for a CSTR in an open-loop, MPC-controlled simulation
As such, the process of creating the right data for training surrogates and other applications is not something to take lightly. It is an influential step that should be revisited as you learn more about where the surrogate or other ML model performs well and where it does not. For this reason, data generation needs to be an automated process that can be restarted at any time in the future.
The LabTracker system uses a mixture of techniques, such as quasi-Monte Carlo sampling (i.e., sampling that covers the space more evenly than plain random sampling) along with patented active-learning-based techniques, to perform a smart resampling process that concentrates the data in the areas where the machine learning needs it most. Additionally, LabTracker automates normalizing or standardizing the data to ensure models work within expected bounds.
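To make the sampling idea concrete, here is a minimal sketch using the open-source QuasiMonteCarlo.jl package. The parameter names, bounds, and sample count are hypothetical, and the active-learning resampling and LabTracker's actual implementation are not shown.

```julia
using QuasiMonteCarlo, Statistics

lb = [1.0, 300.0]          # hypothetical lower bounds (e.g. flow rate, temperature)
ub = [10.0, 400.0]         # hypothetical upper bounds
n  = 256                   # number of design points

# Quasi-Monte Carlo (Sobol) sampling covers the box more evenly than
# pseudo-random sampling for the same budget.
X = QuasiMonteCarlo.sample(n, lb, ub, SobolSample())   # 2 × n matrix

# Standardize each parameter to zero mean and unit variance so downstream
# models operate in a well-conditioned numerical range.
μ  = mean(X, dims = 2)
σ  = std(X, dims = 2)
Xn = (X .- μ) ./ σ
```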
The data generation process then includes automated testing to confirm that the sampling has produced a dataset that is likely to yield accurate machine learning. These validations compute metrics that characterize the generated data, such as how well it covers the regions of the parameter space that matter for the application.
Dyad’s architecture then comes with heuristics and best practices around these metrics, which are used to identify where in the parameter space the data generation is likely to be insufficient for the desired results, helping the user understand the causes of errors in their final machine learning models.
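As a toy illustration of the kind of check such validations might perform, the sketch below bins each input dimension and reports the fraction of bins that received at least one sample. The bin count and the 0.9 threshold are arbitrary choices for illustration, not Dyad's actual heuristics.

```julia
function marginal_coverage(X::AbstractMatrix; nbins::Int = 20)
    d, n = size(X)
    coverage = zeros(d)
    for i in 1:d
        lo, hi = extrema(X[i, :])
        edges = range(lo, hi; length = nbins + 1)
        occupied = falses(nbins)
        for x in X[i, :]
            # clamp so the maximum value falls into the last bin
            b = clamp(searchsortedlast(edges, x), 1, nbins)
            occupied[b] = true
        end
        coverage[i] = count(occupied) / nbins
    end
    return coverage   # one value per input dimension; 1.0 means every bin was hit
end

X = rand(2, 256)                                   # stand-in for a normalized sample set
sparse_dims = findall(<(0.9), marginal_coverage(X))  # dimensions whose sampling looks sparse
```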
2. Model Design and Training
Choosing the right architecture and fine-tuning hyperparameters is crucial. Julia’s efficient handling of scientific computations makes it ideal for experimenting with complex architectures.
Automated tests ensure that each candidate design meets its accuracy targets before it is promoted to large-scale training.
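As an illustration of what such an acceptance test could look like, the sketch below uses Julia's Test standard library. The `reference` and `surrogate` functions and the tolerances are placeholders, not part of the actual pipeline.

```julia
using Test, Statistics

reference(x) = sin(x) + 0.1x          # stand-in for the full simulation
surrogate(x) = sin(x) + 0.1x + 1e-4   # stand-in for a trained surrogate

xs = range(0.0, 10.0; length = 200)   # held-out validation inputs

@testset "surrogate acceptance" begin
    err = abs.(surrogate.(xs) .- reference.(xs))
    @test maximum(err) < 1e-2         # worst-case error bound
    @test mean(err)    < 1e-3         # average error bound
end
```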
Different ML models scale differently for different problem types, and careful consideration goes into designing models that efficiently capture the various properties of the training data. In addition, ML models need to operate in specific active regions determined by the model architecture, activation functions, the data distribution, and so on. Dyad’s DigitalEcho automatically processes the data to keep the model in this “goldilocks zone,” giving it the best chance at convergence.
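A minimal illustration of the active-region point, assuming a tiny Flux network and hypothetical data: tanh saturates outside roughly [-3, 3], so raw physical units (temperatures in the hundreds of kelvin here) would leave the first layer saturated, while standardized inputs keep it trainable. This is only a sketch of the principle, not DigitalEcho's preprocessing.

```julia
using Flux, Statistics

T = 300f0 .+ 100f0 .* rand(Float32, 1, 512)   # raw inputs, roughly 300 to 400 K
μ, σ = mean(T), std(T)
Tn = (T .- μ) ./ σ                            # standardized inputs, approximately N(0, 1)

model = Chain(Dense(1 => 16, tanh), Dense(16 => 1))

# First-layer outputs, saturated vs. active:
sat    = model[1](T)    # nearly all ±1 (vanishing gradients)
active = model[1](Tn)   # spread across (-1, 1), still trainable
```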
Scientific ML aims to learn the global governing behaviours in a dataset rather than reproducing the training set exactly through point-to-point fitting. These behaviours therefore apply across the entire domain of interest, even to unseen data. This distinction is often referred to as “physics-informed” training, and DigitalEcho applies this philosophy to every stage of learning a surrogate.
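A schematic of the physics-informed idea, not DigitalEcho's actual training scheme: the loss combines a data-fitting term with a penalty on the residual of a governing equation, so the model is rewarded for respecting the physics across the domain, not just at training points. The ODE (du/dt = -k·u), the weight λ, and the finite-difference residual below are illustrative choices.

```julia
using Flux

k = 1.5f0
model = Chain(Dense(1 => 32, tanh), Dense(32 => 1))   # u_θ(t)

# Finite-difference residual of du/dt + k*u = 0 at collocation points.
function physics_residual(m, t; ε = 1f-3)
    u  = m(t)
    du = (m(t .+ ε) .- m(t .- ε)) ./ (2ε)
    return du .+ k .* u
end

function loss(m, t_data, u_data, t_coll; λ = 0.1f0)
    data_term    = Flux.mse(m(t_data), u_data)                       # fit observations
    physics_term = sum(abs2, physics_residual(m, t_coll)) / length(t_coll)
    return data_term + λ * physics_term
end
```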
The LabTracker pipeline provides functionality analogous to Python libraries such as Optuna or Hyperopt combined with Horovod, allowing users to seamlessly define grids of hyperparameters and efficiently run these experiments in the cloud with effectively unlimited resources. It can intelligently allocate hardware to minimise idle time whenever experiments are waiting in the queue.
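A sketch of how a hyperparameter grid can be expressed in plain Julia; the field names and values are hypothetical and the actual LabTracker configuration format may differ.

```julia
grid = Iterators.product(
    [1e-3, 1e-4],          # learning rates
    [2, 4],                # hidden layers
    [64, 128, 256],        # hidden width
)

# 12 experiments, each of which the pipeline can dispatch to its own worker.
experiments = vec([(lr = lr, depth = d, width = w) for (lr, d, w) in grid])
```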
Additionally, it logs all results and artifacts in an MLOps database such as MLflow or Weights & Biases, giving users visibility into progress and the ability to make changes on the fly if necessary. All experiments are logged with git, so it is trivial to reproduce, compare, and extend any experiment that has been run before. As previously discussed, the DigitalEcho pipeline also automates the custom training schemes needed for physics-informed learning, all in a single API call; in Python, libraries such as PyTorch Ignite or Lightning would be needed to write custom training loops, along with a non-trivial amount of boilerplate. Being built on top of Julia also means the entire training pipeline benefits from the JIT compiler, so code optimization applies everywhere, not only in isolated subsets.
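The sketch below is an illustrative stand-in for the kind of record kept per run; in practice this information is pushed to MLflow or Weights & Biases, whose client APIs are not reproduced here. The `log_run` helper and its fields are hypothetical, and the git call assumes the code lives in a repository.

```julia
using Dates

function log_run(logfile, params::NamedTuple, metrics::NamedTuple)
    commit = strip(read(`git rev-parse HEAD`, String))   # tie the run to the code state
    open(logfile, "a") do io
        println(io, (; timestamp = now(), commit, params..., metrics...))
    end
end

log_run("runs.log",
        (lr = 1e-4, depth = 4, width = 128),
        (train_loss = 1.1e-3, val_loss = 3.2e-3))
```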
In other ecosystems, the model design alone requires several libraries; in Python, for example, modeling libraries range from PyTorch to neuraloperator for access to specialized architectures. While LabTracker can handle arbitrary architectures and training schemes by design, DigitalEcho automates the entire model design and training step for the user, so there is no need to learn and juggle numerous packages, each with its own APIs and limitations. DigitalEcho brings all of this under a single API call, and it can consume a JSON configuration file, meaning users need not know Julia, or indeed ML, to benefit from surrogates.
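A purely illustrative example of the "single call driven by a config file" pattern follows; the keys in the JSON and the `train_surrogate` entry point are hypothetical and do not reflect DigitalEcho's actual schema or API.

```julia
using JSON3

config_str = """
{
  "model": "CSTR",
  "inputs": ["flow_rate", "coolant_temperature"],
  "outputs": ["reactor_temperature", "concentration"],
  "n_samples": 256,
  "training": { "max_epochs": 500, "tolerance": 1e-3 }
}
"""

config = JSON3.read(config_str)

# Hypothetical entry point: everything after this line (data generation,
# architecture search, training, validation) would be automated.
# surrogate = train_surrogate(config)
```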
Model design and training is carried out automatically through cloud-based parallel experimentation, which allows a wide range of hyperparameters and architectures to be tested simultaneously using the effectively unlimited resources of cloud systems. In particular, parallel experimentation can target existing resource groups or call for additional hardware, queuing experiments so that the number running at once scales optimally.
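As a simplified, local analogue of dispatching the experiment grid in parallel, the sketch below uses Julia's Distributed standard library; the real pipeline targets cloud resource groups rather than local worker processes, and `run_experiment` is a hypothetical placeholder.

```julia
using Distributed
addprocs(4)                                    # stand-in for provisioned hardware

@everywhere function run_experiment(exp)
    # ... train a model with exp.lr, exp.depth, exp.width and return its metrics
    return (params = exp, val_loss = rand())   # placeholder result
end

experiments = vec([(lr = lr, depth = d, width = w)
                   for (lr, d, w) in Iterators.product([1e-3, 1e-4], [2, 4], [64, 128])])

results = pmap(run_experiment, experiments)    # queued across available workers
```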
Each run is independently tracked, and its results, logs, and artifacts are stored for easy reproducibility. All of this happens in real time, allowing one to check on experiments and compare runs at any time from a single dashboard. This approach enables a flexible yet structured Design of Experiments (DOE) stage, ensuring robustness, flexibility, and fast iteration cycles.
3. Experiment Tracking and Analysis

Managing experiment outputs and understanding hyperparameter effects requires robust tools like MLflow. By centralizing logs and training/validation metrics, the user can easily compare runs and identify impactful hyperparameters with the click of a button. Built-in visualizations such as the parallel coordinate plot also help identify trends in model performance and locate parts of the parameter space where the model can be improved, creating a feedback loop at the DOE stage as well as providing validation for the active learning process. LabTracker’s transparent design also enables these runs to be reproduced.
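For a flavour of the kind of post-hoc comparison the dashboard automates, the small sketch below ranks logged runs by validation loss; the `results` records here are made-up placeholders, not real experiment output.

```julia
results = [(lr = 1e-3, width = 64,  val_loss = 8.1e-3),
           (lr = 1e-4, width = 128, val_loss = 2.4e-3),
           (lr = 1e-4, width = 256, val_loss = 2.6e-3)]

best = sort(results; by = r -> r.val_loss)
foreach(println, first(best, 2))   # the two strongest configurations
```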