Semantic Subtyping in Luau - Roblox Blog

Luau is the first programming language to put the power of semantic subtyping in the hands of millions of creators.

Minimizing false positives

One of the issues with type error reporting in tools like the Script Analysis widget in Roblox Studio is false positives. These are warnings that are artifacts of the analysis, and don’t correspond to errors which can occur at runtime. For example, the program

local x = CFrame.new()
local y
if (math.random()) then
    y = CFrame.new()
else
    y = Vector3.new()
end
local z = x * y

reports a type error which cannot happen at runtime, since CFrame supports multiplication by both Vector3 and CFrame. (Its type is ((CFrame, CFrame) -> CFrame) & ((CFrame, Vector3) -> Vector3).)

False positives are especially poor for onboarding new users. If a type-curious creator switches on typechecking and is immediately faced with a wall of spurious red squiggles, there’s a strong incentive to immediately switch it off again.

Inaccuracies in type errors are inevitable, since it’s impossible to decide ahead of time whether a runtime error will be triggered. Type system designers have to choose whether to live with false positives or false negatives. In Luau this is determined by the mode: strict mode errs on the side of false positives, and nonstrict mode errs on the side of false negatives.

While inaccuracies are inevitable, we try to remove them whenever possible, since they result in spurious errors, and imprecision in type-driven tooling like autocomplete or API documentation.

Subtyping as a source of false positives

One of the sources of false positives in Luau (and many other similar languages like TypeScript or Flow) is subtyping. Subtyping is used whenever a variable is initialized or assigned to, and whenever a function is called: the type system checks that the type of the expression is a subtype of the type of the variable. For example, if we add types to the above program

local x : CFrame = CFrame.new()
local y : Vector3 | CFrame
if (math.random()) then
    y = CFrame.new()
else
    y = Vector3.new()
end
local z : Vector3 | CFrame = x * y

then the type system checks that the type of CFrame multiplication is a subtype of (CFrame, Vector3 | CFrame) -> (Vector3 | CFrame).

Subtyping is a very useful feature, and it supports rich type constructs like type union (T | U) and intersection (T & U). For example, number? is implemented as a union type (number | nil), inhabited by values that are either numbers or nil.

Unfortunately, the interaction of subtyping with intersection and union types can have odd results. A simple (but rather artificial) case in older Luau was:

local x : (number?) & (string?) = nil
local y : nil = nil
y = x -- Type '(number?) & (string?)' could not be converted into 'nil'
x = y

This error is caused by a failure of subtyping: the old subtyping algorithm reports that (number?) & (string?) is not a subtype of nil. This is a false positive, since number & string is uninhabited, so the only possible inhabitant of (number?) & (string?) is nil.

This is an artificial example, but creators have raised real issues caused by the same problem, for example https://devforum.roblox.com/t/luau-recap-july-2021/1382101/5. Currently, these issues mostly affect creators making use of sophisticated type system features, but as we make type inference more accurate, union and intersection types will become more common, even in code with no type annotations.

This class of false positives no longer occurs in Luau, as we’ve moved from our old approach of syntactic subtyping to an alternative called semantic subtyping.

Syntactic subtyping

AKA “what we did before.”

Syntactic subtyping is a syntax-directed recursive algorithm. The interesting cases to deal with intersection and union types are:

Reflexivity: T is a subtype of T
Intersection L: (T₁ & … & Tⱼ) is a subtype of U whenever some of the Tᵢ are subtypes of U
Union L: (T₁ | … | Tⱼ) is a subtype of U whenever all of the Tᵢ are subtypes of U
Intersection R: T is a subtype of (U₁ & … & Uⱼ) whenever T is a subtype of all of the Uᵢ
Union R: T is a subtype of (U₁ | … | Uⱼ) whenever T is a subtype of some of the Uᵢ.

For instance:

By Reflexivity: nil is a subtype of nil
so by Union R: nil is a subtype of number?
and: nil is a subtype of string?
so by Intersection R: nil is a subtype of (number?) & (string?).

Yay! Unfortunately, using these rules:

number isn’t a subtype of nil
so by Union L: (number?) isn’t a subtype of nil
and: string isn’t a subtype of nil
so by Union L: (string?) isn’t a subtype of nil
so by Intersection L: (number?) & (string?) isn’t a subtype of nil.

This is typical of syntactic subtyping: when it returns a “yes” result, it is correct, but when it returns a “no” result, it might be wrong. The algorithm is a conservative approximation, and since a “no” result can lead to type errors, this is a source of false positives.
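The five rules can be read almost directly as a recursive checker. The following is a minimal Python sketch, not Luau’s implementation: the classes Prim, Union and Inter and the function syntactic_subtype are hypothetical names introduced for illustration, and only primitive, union and intersection types are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prim:
    name: str


@dataclass(frozen=True)
class Union:
    parts: tuple


@dataclass(frozen=True)
class Inter:
    parts: tuple


def syntactic_subtype(t, u):
    """Syntax-directed check using only the five rules listed in the text."""
    # Reflexivity: T is a subtype of T.
    if t == u:
        return True
    # Union L: every member of the left-hand union must be a subtype of u.
    if isinstance(t, Union):
        return all(syntactic_subtype(p, u) for p in t.parts)
    # Intersection R: t must be a subtype of every member of the right-hand intersection.
    if isinstance(u, Inter):
        return all(syntactic_subtype(t, p) for p in u.parts)
    # Intersection L: some member of the left-hand intersection is a subtype of u.
    if isinstance(t, Inter) and any(syntactic_subtype(p, u) for p in t.parts):
        return True
    # Union R: t is a subtype of some member of the right-hand union.
    if isinstance(u, Union) and any(syntactic_subtype(t, p) for p in u.parts):
        return True
    return False


NUMBER, STRING, NIL = Prim("number"), Prim("string"), Prim("nil")


def optional(t):
    # T? is shorthand for T | nil.
    return Union((t, NIL))


BOTH = Inter((optional(NUMBER), optional(STRING)))

print(syntactic_subtype(NIL, BOTH))  # the "yes" direction from the text
print(syntactic_subtype(BOTH, NIL))  # the "no" direction: a false positive
```

The first call succeeds through Intersection R and Union R, and the second fails through Intersection L and Union L, mirroring the two derivations in the text.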

Semantic subtyping

AKA “what we do now.”

Rather than thinking of subtyping as being syntax-directed, we first consider its semantics, and later return to how the semantics is implemented. For this, we adopt semantic subtyping:

The semantics of a type is a set of values.
Intersection types are thought of as intersections of sets.
Union types are thought of as unions of sets.
Subtyping is thought of as set inclusion.

For instance:

Type                     Semantics
number                   { 1, 2, 3, … }
string                   { "foo", "bar", … }
nil                      { nil }
number?                  { nil, 1, 2, 3, … }
string?                  { nil, "foo", "bar", … }
(number?) & (string?)    { nil, 1, 2, 3, … } ∩ { nil, "foo", "bar", … } = { nil }

and since subtypes are interpreted as set inclusions:

Subtype                  Supertype                Because
nil                      number?                  { nil } ⊆ { nil, 1, 2, 3, … }
nil                      string?                  { nil } ⊆ { nil, "foo", "bar", … }
nil                      (number?) & (string?)    { nil } ⊆ { nil }
(number?) & (string?)    nil                      { nil } ⊆ { nil }

So according to semantic subtyping, (number?) & (string?) is equivalent to nil, but syntactic subtyping only supports one direction.
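The set-based reading can be tried out directly with Python sets. This is only an illustration under a loud assumption: the real semantics are infinite sets of values, whereas the sketch uses a small finite sample (three numbers, two strings, and None standing in for nil). All names here are hypothetical.

```python
# Finite stand-ins for the (infinite) sets of values described in the text.
NUMBER = frozenset({1, 2, 3})
STRING = frozenset({"foo", "bar"})
NIL = frozenset({None})  # None plays the role of Luau's nil


def union(a, b):
    # Union types are unions of sets.
    return a | b


def intersection(a, b):
    # Intersection types are intersections of sets.
    return a & b


def optional(t):
    # T? is T | nil.
    return union(t, NIL)


def is_subtype(a, b):
    # Subtyping is set inclusion.
    return a <= b


both = intersection(optional(NUMBER), optional(STRING))

print(both == NIL)            # the intersection contains only nil
print(is_subtype(NIL, both))  # nil is a subtype of (number?) & (string?)
print(is_subtype(both, NIL))  # and the other direction holds as well
```

Because both inclusions hold, the two types denote the same set, which is the equivalence that the syntactic rules fail to establish.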

This is all fine and good, but if we want to use semantic subtyping in tools, we need an algorithm, and it turns out checking semantic subtyping is non-trivial.

Semantic subtyping is hard

NP-hard to be precise.

We can reduce graph coloring to semantic subtyping by coding up a graph as a Luau type such that checking subtyping on types has the same result as checking for the impossibility of coloring the graph.

For example, coloring a three-node, two-color graph can be done using types:

type Red = "red"
type Blue = "blue"
type Color = Red | Blue
type Coloring = (Color) -> (Color) -> (Color) -> boolean
type Uncolorable = (Color) -> (Color) -> (Color) -> false

Then a graph can be encoded as an overloaded function type that sits between the subtype Uncolorable and the supertype Coloring: an overloaded function which returns false when a constraint is violated. Each overload encodes one constraint. For example, a line has constraints saying that adjacent nodes cannot have the same color:

type Line = Coloring
    & ((Red) -> (Red) -> (Color) -> false)
    & ((Blue) -> (Blue) -> (Color) -> false)
    & ((Color) -> (Red) -> (Red) -> false)
    & ((Color) -> (Blue) -> (Blue) -> false)

A triangle is similar, but the end points also cannot have the same color:

type Triangle = Line
    & ((Red) -> (Color) -> (Red) -> false)
    & ((Blue) -> (Color) -> (Blue) -> false)

Now, Triangle is a subtype of Uncolorable, but Line is not, since the line can be 2-colored. This can be generalized to any finite graph with any finite number of colors, and so subtype checking is NP-hard.
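To see why the encoding works, it helps to brute-force the same question outside the type system. The Python sketch below is an informal model, not Luau’s subtype checker: each false-returning overload is written as a triple of allowed-color sets, and a graph type is treated as a subtype of Uncolorable exactly when every assignment of colors matches at least one of those overloads. All names are hypothetical.

```python
from itertools import product

RED, BLUE = "red", "blue"
COLOR = frozenset({RED, BLUE})
R, B = frozenset({RED}), frozenset({BLUE})

# One triple per false-returning overload, e.g. (Red) -> (Red) -> (Color) -> false.
LINE = [
    (R, R, COLOR),
    (B, B, COLOR),
    (COLOR, R, R),
    (COLOR, B, B),
]
TRIANGLE = LINE + [
    (R, COLOR, R),
    (B, COLOR, B),
]


def violates(coloring, overload):
    # The overload applies (and so forces the result to be false) when every
    # node's color lies in the corresponding argument type.
    return all(c in allowed for c, allowed in zip(coloring, overload))


def valid_colorings(constraints):
    # Colorings that no false-returning overload applies to.
    return [
        coloring
        for coloring in product(sorted(COLOR), repeat=3)
        if not any(violates(coloring, o) for o in constraints)
    ]


def is_uncolorable(constraints):
    # Models "is a subtype of Uncolorable": every input is forced to false.
    return not valid_colorings(constraints)


print(valid_colorings(LINE))     # the two alternating colorings of a line
print(is_uncolorable(TRIANGLE))  # True: a triangle cannot be 2-colored
```

The brute-force loop visits every coloring, which is exponential in the number of nodes; that is the cost the reduction transfers to subtype checking.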

We deal with this in two ways:

we cache types to reduce memory footprint, and
give up with a “Code Too Complex” error if the cache of types gets too large.

Hopefully this doesn’t come up in practice much. There is good evidence that issues like this don’t arise in practice from experience with type systems like that of Standard ML, which is EXPTIME-complete, but in practice you have to go out of your way to code up Turing Machine tapes as types.

Type normalization

The algorithm used to decide semantic subtyping is type normalization. Rather than being directed by syntax, we first rewrite types to be normalized, then check subtyping on normalized types.

A normalized type is a union of:

a normalized nil type (either never or nil)
a normalized number type (either never or number)
a normalized boolean type (either never or true or false or boolean)
a normalized function type (either never or an intersection of function types) etc.

Once types are normalized, it is straightforward to check semantic subtyping.

Every type can be normalized (sigh, with some technical restrictions around generic type packs). The important steps are:

removing intersections of mismatched primitives, e.g. number & boolean is replaced by never, and
removing unions of functions, e.g. ((number?) -> number) | ((string?) -> string) is replaced by (nil) -> (number | string).

For example, normalizing (number?) & (string?) removes number & string, so all that is left is nil.
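The primitive part of normalization can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Luau’s normalizer: a normalized type is represented as a set of atoms (nil, number, string, true, false), never is the empty set, types are written as nested ("|", left, right) or ("&", left, right) tuples, and function types are left out entirely. All names are hypothetical.

```python
# Normal forms for the primitive types; "never" is the empty set of atoms.
ATOMS = {
    "never": frozenset(),
    "nil": frozenset({"nil"}),
    "number": frozenset({"number"}),
    "string": frozenset({"string"}),
    "true": frozenset({"true"}),
    "false": frozenset({"false"}),
    "boolean": frozenset({"true", "false"}),
}


def normalize(ty):
    """Rewrite a type expression to its set of atoms."""
    if isinstance(ty, str):
        return ATOMS[ty]
    op, left, right = ty
    lhs, rhs = normalize(left), normalize(right)
    if op == "|":
        return lhs | rhs
    if op == "&":
        # Mismatched primitives share no atoms, so they vanish here.
        return lhs & rhs
    raise ValueError(f"unknown type operator: {op!r}")


def optional(ty):
    # T? is T | nil.
    return ("|", ty, "nil")


def is_subtype(t, u):
    # On normalized types, subtyping is a plain subset check.
    return normalize(t) <= normalize(u)


both = ("&", optional("number"), optional("string"))

print(normalize(("&", "number", "boolean")) == ATOMS["never"])
print(normalize(both) == ATOMS["nil"])
print(is_subtype(both, "nil") and is_subtype("nil", both))
```

In this model number & string contributes no atoms, so the normal form of (number?) & (string?) is just nil, and the subtype check succeeds in both directions.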

Our first attempt at implementing type normalization applied it liberally, but this resulted in dreadful performance (complex code went from typechecking in less than a minute to running overnight). The reason for this is annoyingly simple: there is an optimization in Luau’s subtyping algorithm to handle reflexivity (T is a subtype of T) that performs a cheap pointer equality check. Type normalization can convert pointer-identical types into semantically-equivalent (but not pointer-identical) types, which significantly degrades performance.

Because of these performance issues, we still use syntactic subtyping as our first check for subtyping, and only perform type normalization if the syntactic algorithm fails. This is sound, because syntactic subtyping is a conservative approximation to semantic subtyping.
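The resulting control flow is a fast path followed by a fallback. The Python sketch below shows only that shape, under loud assumptions: types are modeled as frozensets of atoms, the stand-in syntactic check is plain structural equality (which can say “no” wrongly but never “yes” wrongly), and the stand-in semantic check is a subset test. None of these names come from Luau.

```python
semantic_calls = []  # records when the expensive path is taken


def syntactic(t, u):
    # Stand-in conservative check: only ever answers "yes" when it is safe to.
    return t == u


def semantic(t, u):
    # Stand-in for normalization followed by a check on normal forms.
    semantic_calls.append((t, u))
    return t <= u


def is_subtype(t, u):
    # Cheap pointer-equality test for reflexivity.
    if t is u:
        return True
    # Syntactic "yes" answers are trusted as-is.
    if syntactic(t, u):
        return True
    # Only a syntactic "no" pays for the semantic check.
    return semantic(t, u)


NIL = frozenset({"nil"})
NUMBER = frozenset({"number"})
OPTIONAL_NUMBER = NIL | NUMBER

print(is_subtype(NIL, NIL), len(semantic_calls))              # fast path only
print(is_subtype(NIL, OPTIONAL_NUMBER), len(semantic_calls))  # falls back once
print(is_subtype(NUMBER, NIL), len(semantic_calls))           # falls back, still "no"
```

The design choice is that a syntactic “yes” never needs a second opinion, so the cost of normalization is only paid on the comparatively rare syntactic “no”.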

Pragmatic semantic subtyping

Off-the-shelf semantic subtyping is slightly different from what is implemented in Luau, because it requires models to be set-theoretic, which requires that inhabitants of function types “act like functions.” There are two reasons why we drop this requirement.

Firstly, we normalize function types to an intersection of functions, for example a horrible mess of unions and intersections of functions:

((number?) -> number?) | (((number) -> number) & ((string?) -> string?))

normalizes to an overloaded function:

((number) -> number?) & ((nil) -> (number | string)?)

Set-theoretic semantic subtyping does not support this normalization, and instead normalizes functions to disjunctive normal form (unions of intersections of functions). We do not do this for ergonomic reasons: overloaded functions are idiomatic in Luau, but DNF is not, and we do not want to present users with such non-idiomatic types.

Our normalization relies on rewriting away unions of function types:

((A) -> B) | ((C) -> D) → (A & C) -> (B | D)

This normalization is sound in our model, but not in set-theoretic models.
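The rewrite rule can be checked against both function examples in the text. In this Python sketch a function type is a hypothetical pair (argument atoms, result atoms), and an overloaded function is a list of such pairs. One step is an assumption on my part rather than something the text spells out: a union of two overloaded functions is first distributed into the intersection of all pairwise unions, and the rewrite rule is then applied to each pair.

```python
NIL = frozenset({"nil"})
NUMBER = frozenset({"number"})
STRING = frozenset({"string"})


def union_of_functions(f, g):
    # ((A) -> B) | ((C) -> D)  becomes  (A & C) -> (B | D)
    (a, b), (c, d) = f, g
    return (a & c, b | d)


def union_of_overloads(fs, gs):
    # Assumed distribution step: (F1 & ... & Fm) | (G1 & ... & Gn) is treated
    # as the intersection of every pairwise union Fi | Gj.
    return [union_of_functions(f, g) for f in fs for g in gs]


# ((number?) -> number) | ((string?) -> string)
simple = union_of_functions((NUMBER | NIL, NUMBER), (STRING | NIL, STRING))
print(simple == (NIL, NUMBER | STRING))  # (nil) -> (number | string)

# ((number?) -> number?) | (((number) -> number) & ((string?) -> string?))
left = [(NUMBER | NIL, NUMBER | NIL)]
right = [(NUMBER, NUMBER), (STRING | NIL, STRING | NIL)]
for args, result in union_of_overloads(left, right):
    print(sorted(args), "->", sorted(result))
```

Under these assumptions the two resulting overloads are (number) -> number? and (nil) -> (number | string)?, which agrees with the normalized type given in the text.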

Secondly, in Luau, the type of a function application f(x) is B if f has type (A) -> B and x has type A. Unexpectedly, this is not always true in set-theoretic models, due to uninhabited types. In set-theoretic models, if x has type never then f(x) has type never. We do not want to burden users with the idea that function application has a special corner case, especially since that corner case can only arise in dead code.

In set-theoretic models, (never) -> A is a subtype of (never) -> B, no matter what A and B are. This is not true in Luau.

For these two reasons (which are largely about ergonomics rather than anything technical) we drop the set-theoretic requirement, and use pragmatic semantic subtyping.

Negation types

The other difference between Luau’s type system and off-the-shelf semantic subtyping is that Luau does not support all negated types.

The common case for wanting negated types is in typechecking conditionals:

-- initially x has type T
if (type(x) == "string") then
    -- in this branch x has type T & string
else
    -- in this branch x has type T & ~string
end

This uses a negated type ~string inhabited by values that are not strings.

In Luau, we only allow this kind of typing refinement on test types like string, function, Part and so on, and not on structural types like (A) -> B, which avoids the common case of general negated types.
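In a set-of-atoms model, refining by a test type needs only intersection and set difference, which is one way to see why no general negation is required. The Python sketch below is illustrative only: the TEST_TYPES set and the refine function are hypothetical, and class types such as Part are left out.

```python
# Hypothetical set of types that a type(x) == "..." test can refine on.
TEST_TYPES = frozenset({"nil", "boolean", "number", "string", "function"})


def refine(t, tested):
    """Split a set of atoms t into the 'then' and 'else' branch types."""
    if tested not in TEST_TYPES:
        # Structural types such as (A) -> B are deliberately not supported.
        raise ValueError(f"cannot refine on {tested!r}")
    positive = frozenset({tested})
    then_type = t & positive  # T & string
    else_type = t - positive  # T & ~string, computed as a set difference
    return then_type, else_type


# T = (number | string)?
T = frozenset({"number", "string", "nil"})
then_type, else_type = refine(T, "string")
print(sorted(then_type))  # ['string']
print(sorted(else_type))  # ['nil', 'number']
```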

Prototyping and verification

During the design of Luau’s semantic subtyping algorithm, there were changes made (for example initially we thought we were going to be able to use set-theoretic subtyping). During this time of rapid change, it was important to be able to iterate quickly, so we initially implemented a prototype rather than jumping straight to a production implementation.

Validating the prototype was important, since subtyping algorithms can have unexpected corner cases. For this reason, we adopted Agda as the prototyping language. As well as supporting unit testing, Agda supports mechanized verification, so we are confident in the design.

The prototype does not implement all of Luau, just the functional subset, but this was enough to discover subtle feature interactions that would probably have surfaced as difficult-to-fix bugs in production.

Prototyping is not perfect: for example, the main issues that we hit in production were about performance and the C++ standard library, which are never going to be caught by a prototype. But the production implementation was otherwise fairly straightforward (or at least as straightforward as a 3kLOC change can be).

Next steps

Semantic subtyping has removed one source of false positives, but we still have others to track down:

Overloaded function applications and operators
Property access on expressions of complex type
Read-only properties of tables
Variables that change type over time (aka typestates)

The quest to remove spurious red squiggles continues!

Acknowledgments

Thanks to Giuseppe Castagna and Ben Greenman for helpful comments on drafts of this post.

Alan coordinates the design and implementation of the Luau type system, which helps drive many of the features of development in Roblox Studio. Dr. Jeffrey has over 30 years of experience with research in programming languages, has been an active member of numerous open-source software projects, and holds a DPhil from the University of Oxford, England.
