 BLOG / NEWS
Inference Convergence Algorithm in Julia  Revisited
By Jameson Nash  May 15, 2017
Adventures in Type Inference Convergence: 2017 edition
In my last post on type inference convergence, I described a correct inference algorithm. However, while this was a tremendous improvement over being wrong, I was unhappy with it. Indeed, I wrote that "a full discussion of these heuristics will have to wait for a future blog post". But what I didn't say is that I had already written many notes for that post, and had reached a point where I understood that the current algorithm wouldn't actually permit reliable, welltuned heuristics.
Plus, the algorithm required lots of global state (to manage the workqueue), which required tricky locks and coordination to ensure it stayed consistent.
Convergence Algorithm 2.0
The improved convergence algorithm in PR #21677 maintains all of the correctness guarantees of the existing algorithm, but uses a completely revised cycledetection algorithm that provides much stronger guarantees about the order in which work will be done.
By comparison to be current algorithm, the revised one has half as many states. They are:

processing: any leaf function of the callstack (usually there's just one at a time)

on the callstack: the tree DAG (directedacyclicgraph) of functions with edges representing inferred potential calls

in a cycle: groups of functions nodes discovered not to form a DAG (represented instead as an unsorted set)

finished: done
The outline of the algorithm that manages these states is that the callstack is always maintained as a simple tree, only permitting the existence of nodes with forward edges. During the course of running inference, if adding an edge would cause a cycle in this graph, the algorithm instead replaces all nodes that are participating in the cycle with a single node that represents that cyclic set.
Conceptually, this set is similar to either the "fixedpoint" set in the current algorithm, or the "active lines" set. It differs from the "fixedpoint" set because it can also guarantee that only nodes in the cycle will be in the set. The "active lines" set only allows describing the active set within a single method – the new representation expands that to iterate convergence of an entire set of methods.
Because the algorithm maintains a acyclic callstack, it becomes now possible to depend on the inspection of an edge in inference to result in exactly one of two results: an inferred return type, or a cycle detection that replaces the current node with a convergence set. This greatly increases the options available for integrating new inference heuristics!
There's one additional nice benefit to this new representation: improved inlining heuristics! It may not be obvious how improved cycledetection during inference convergence would impact the optimization afterwards, but one of the unintuitive products of inference is the inlining order. Deciding on the profitability of inlining is dependent upon knowing the precise structure (ordering) of the callstack and any cycles in it. Since the new inference structure provides precise information on which functions potentially participate in cycles, the inliner order can take advantage of that information to decide which functions to inline, and where to terminate the inlining due to recursion.
Undecidability Heuristics
Without heuristics imposed on top of the existing inference algorithm, it would already be Turingcomplete^{[1]}, with just the current support for recursively inspecting dispatch over computed types.
This means that (without some additional constraints added) inference might attempt to compute anything (including the halting problem, busy beaver, infinite recursion). In some cases, code is written that explicitly tries to make use of this using "lispy style" function application. However, code written in that manner often quickly runs into limits that are inherently required by the compiler to avoid other very similar code patterns where the inferred recursion is either unintentional (it won't affect runtime performance / behavior) or unreachable (solving for reachability is commonly referred to as the halting problem).
It is suboptimal for this inference system to be undecidable (or even slow), since its sole purpose is to find optimizations to make your program run faster. And inference is going to be painfully slow when it is being used as an interpreter (rather than just running the program unspecialized). So, there needs to be some mechanism or heuristics for preventing this from happening.
It's also important to realize that these heuristics need to be tuned precisely to the capabilities of inference.
Turingcompleteness requires several features:

loops

conditionals

arbitrarily large (but finite) memory
Removing any one of these would be sufficient to ensure the system is computable. But loops (recursion), conditionals (dispatch), and memory (applytype) are all essential to getting useful results from inference and thus it is generally not desirable to simply remove them. Additionally, even if it was possible now, as inference becomes more capable (e.g. interprocedural constant propagation, effectfree computation, speculative evaluation of constant arithmetic), new forms of loops and memory would appear that would also need to be addressed.
Current limits
Since Julia's inference algorithm currently only operates on Julia's Type's, (and does not support inference of constant expressions like 1 + 1
), we only need to consider how these can be used to express an arbitrarily large memory. Imply both that it can form arbitrarily large constants, and do arithmetic (conditional loops) on them.
There are a finite number of type names. But there are two primary mechanisms by which a type can be used to express an arbitrary value. For any type, the parameters can be arbitrarily nested in depth. Additionally, there are tuples and unions, which (in addition to depth), can have arbitrary length. We can express this complexity as:
depth(type) = 1 + maximum(depth, type.parameters) length(type) = length(type.parameters)
It is possible, using either property (or a combination of both), to represent any number.
The current heuristics in inference prohibit recursion which increases either depth or length. without this ability, the program cannot express any number larger than the input, forcing it to (eventually) terminate. Additionally, there are various (semiarbitrary) limits places on the lengths and depths of the types that are flowing through the system.
Independence of the cached result
For predictability, it is preferable that the heuristic limits imposed should only be local properties, computable from the function itself or its forward edges (the functions that it calls). This remains an unsolved problem, since the heuristics are based on examining the callstack (following backedges). This means that the order in which functions enter inference can affect how precisely they are permitted to compute their results.
(A diagram showing how the contextsensativity of the specialization type signature can create repeating patterns instead of simple recursion)
To illustrate what I mean by wanting to only use “local properties”, consider if fuchsia
entered inference first, then inference will see the widening recursion at fuchsia´
. And green´
will not be encountered. Thus, the result of green
will be different than if fuchsia
was not being inferred (recursion would have been detected at green´
, changing the results assigned and cached for both – including fuchsia
when it is later inferred, and uses the cached return type for green
).
An optimal limit would notice the recursion at fuchsia´
, but change the topmost function (fuchsia
) to use an approximation of green
while canceling typeinference on the original attempted edges. This would preserve the characteristic that the cached result of typeinferring green
shouldn't depend on having been called from fuschia
.
Looking forward
A downside of the current arbitrary limits is that they penalize valid code which uses complex types, even if they are not changing under recursion.
Another downside is that it only takes into account the possibility of expressing memory via the typesystem "length" and "depth" definitions above.
These challenges remain, and will need to be addressed in a future update to this blog (and to the implementation!).
Looking for Model Predictive Control (MPC)?
Learn about JuliaSim Control in our webinar on MPC, trimming, and linearization in JuliaSim.
Watch NowRECENT POSTS

Quantitative Systems Pharmacology: Delivering Results 37x Faster
Elisabeth Roesch • Sep 27, 2023

Parallel Computing and MultiThreading Using JuliaHub, Pluto Notebook Competition Winners and More
JuliaHub • Sep 20, 2023

CUDA.jl 5.0  Integrated Profiles and Task Synchronization Changes
Tim Besard • Sep 19, 2023