“[...] This has led to innumerable errors, vulnerabilities, and system crashes.” ~ Tony Hoare 1
The core problem addressed here is Null Safety and how to properly initialize objects to ensure type invariants (specifically non-nullness) hold true.
Simple Non-Null Types
The goal is to prevent null-dereferencing errors statically. In practice, most variables are non-null after initialization.
Non-Null Types vs. Possibly-Null Types
- Non-null type T!: Consists of references to T-objects. It cannot hold null.
- Possibly-null type T?: Consists of references to T-objects plus null. This corresponds to the standard type T in languages like Java, and is the default in most languages.
Type Safety and Invariant
- Invariant: If the static type of an expression e is a non-null type, then e’s value at runtime must be different from null.
- Enforcement: We require non-null types for the receiver of each field access, array access, method call, and throw statement. A null receiver plays the same role here as the “message not understood” error discussed in Typing and Subtyping.
Subtyping and Casts
The values of T! are a proper subset of T?.
- Subtyping:
    - S! <: T! (if S extends T)
    - S? <: T? (if S extends T)
    - T! <: T? (non-null is a subtype of possibly-null)
- Downcasts: T? is not a subtype of T!, so casting from T? (possibly-null) to T! (non-null) requires a runtime check.
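Java has no T! / T? syntax, so the following is only a sketch of the two directions under that limitation. The class NullCasts and its methods are hypothetical names; Objects.requireNonNull stands in for the checked downcast from T? to T!.

```java
import java.util.Objects;

class NullCasts {
    // Upcast T! -> T?: always safe, so no check is needed.
    static String widen(String nonNull) {
        return nonNull;
    }

    // Downcast T? -> T!: needs a runtime check, which fails for null.
    static String narrow(String possiblyNull) {
        return Objects.requireNonNull(possiblyNull, "value was null");
    }

    public static void main(String[] args) {
        System.out.println(narrow(widen("hello")).length());
        try {
            narrow(null);
        } catch (NullPointerException e) {
            System.out.println("downcast failed: " + e.getMessage());
        }
    }
}
```

The check moves the failure to the point of the cast, instead of some later dereference.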
graph TD
A[object?]
B[T?]
C[S?]
D(object!)
E(T!)
F(S!)
B --> A
C --> B
D -.-> A
E -.-> B
E --> D
F -.-> C
F --> E
style D fill:#f0f8ff,color:#007bff,stroke:#007bff,stroke-width:2px
style E fill:#f0f8ff,color:#007bff,stroke:#007bff,stroke-width:2px
style F fill:#f0f8ff,color:#007bff,stroke:#007bff,stroke-width:2px
Type Rules
Some expressions in a language need to be non-null:
- Receiver of field access
- Receiver of array access
- Receiver of method call
- Expression of a throw statement

If one of these evaluates to null, the program fails with a null-pointer exception at runtime. We would like a compile-time error instead, so that these cases are statically safe.
Safe Handling Null
Control Flow Analysis (Dataflow Analysis)
How do we safely use a possibly-null type? We can check against null.
- Definite Assignment: Java/C# use dataflow analysis to ensure local variables are assigned before use.
- Null Checks: If we check if (n != null), the compiler can treat n as non-null inside the block.
- Limitations: Dataflow analysis works well for local variables but not for heap locations (fields) because of aliasing and side effects (e.g., a method call foo(this) might set a field to null behind the scenes). Concurrency is a further problem, since another thread may change a field between the check and the use.

I personally saw this kind of null-check analysis for the first time in TypeScript during my internship at Cubbit.
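The limitation for fields can be reproduced in plain Java. The sketch below uses hypothetical classes Labeled and FlowDemo: a null check on a field is invalidated by a call in between, while a copy in a local variable stays valid.

```java
class Labeled {
    String label = "root";   // conceptually String? (possibly-null)

    void clear() {
        label = null;
    }
}

class FlowDemo {
    // Side effect hidden behind a method call.
    static void touch(Labeled n) {
        n.clear();
    }

    // The check on the field does not survive the call to touch().
    static int lengthViaField(Labeled n) {
        if (n.label != null) {
            touch(n);
            // Without this second check, n.label.length() would throw.
            return n.label == null ? -1 : n.label.length();
        }
        return 0;
    }

    // Copying the field into a local makes the check reliable:
    // no other code can reassign the local.
    static int lengthViaLocal(Labeled n) {
        String label = n.label;
        if (label != null) {
            touch(n);
            return label.length();
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(lengthViaField(new Labeled()));
        System.out.println(lengthViaLocal(new Labeled()));
    }
}
```

This is why checkers that narrow types after a null check usually restrict the narrowing to local variables.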
Object Initialization
The main challenge is: How do we construct an object of a non-null type?
When we create an object (new T()), fields start as null (or default values). We must ensure all non-null fields are assigned non-null values before the constructor terminates.
The purpose of initialization is to establish invariants. We want to know from which point on it is possible to rely on them.
The Escaping Problem
A naive definite assignment check for fields isn’t enough. If the this reference “escapes” the constructor (is passed to another method, stored in a global field, etc.) before the object is fully initialized, other code might see null fields that are supposed to be T!.
Escaping Scenarios:
- Method Calls: Calling a dynamically bound method on this inside the constructor. A subclass might override this method and access a field that hasn’t been initialized yet.
- Callbacks: Registering this as a listener (e.g., Observer pattern) inside the constructor. The Subject might call back immediately.
- Concurrent Access: Publishing this to a static field where another thread picks it up, but the object has not been initialized yet.
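The first scenario is easy to reproduce in Java. In this sketch (Base and Derived are hypothetical classes), the superclass constructor calls an overridden method, which reads a subclass field before the subclass constructor body has assigned it.

```java
class Base {
    final String observed;

    Base() {
        // Dynamically bound call: 'this' escapes to the subclass override
        // before the subclass constructor body has run.
        observed = describe();
    }

    String describe() {
        return "base";
    }
}

class Derived extends Base {
    private final String name;   // conceptually String!

    Derived(String name) {
        super();
        this.name = name;
    }

    @Override
    String describe() {
        return "name=" + name;
    }

    public static void main(String[] args) {
        Derived d = new Derived("x");
        System.out.println(d.observed);    // what the constructor call saw
        System.out.println(d.describe());  // what callers see afterwards
    }
}
```

Even though name is final and assigned in every constructor, the override observes it as null during construction.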
Initialization Phases
To solve this, we track the state of the object during construction using a type system.
- Free (Under Construction): The object is being created. Fields may be null.
- Committed (Initialized): The object is fully initialized. All non-null invariants hold.
Construction Types
For every class T, we distinguish three states for references:
- T! / T? (Committed): Standard types. Construction is complete, so reading a non-null field through a committed reference is guaranteed to yield a non-null value. For the other two cases this is not guaranteed.
- free T! / free T? (Free): References to objects under construction (like this inside a constructor).
- unc T! / unc T? (Unclassified): We don’t know if it’s free or committed (common supertype).
- There are no casts from unclassified to free or committed types.
Initialization Requirements
- Requirement 1: Local Initialization: An object is locally initialized if its non-null fields have non-null values. Committed types must be locally initialized.
- Requirement 2: Transitive Initialization: If an object is committed, everything it reaches (references) must also be committed (transitively initialized).
Handling Cyclic Structures
Cyclic structures (e.g., a Node pointing to another Node) pose a problem. We cannot have both objects fully “committed” before they reference each other.
- Solution: We allow free references to be assigned to fields of an object under construction.
- The constructor parameters can be declared free to allow passing partially initialized objects (like this).
    - The objects become committed together, only once the last constructor still running has finished, so that Requirement 2 (transitive initialization) holds.
class List {
    List! next; // cyclic list

    List(int n) {
        if (n == 1) {
            next = this;
        } else {
            next = new List(this, n);
        }
    }

    List(free List! last, int n) {
        if (n == 2) {
            next = last;
        } else {
            next = new List(last, n - 1);
        }
    }
}
Type Rules for Initialization
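- Field Write: e1.f = e2. If e1 is free (under construction), we can assign committed values to its fields (and free ones, as needed for the cyclic structures above). If e1 is committed, we cannot assign a free or uninitialized value to it (this preserves transitive initialization).
- Field Read: e.f. Following the construction types above, the result is guaranteed to respect the declared non-null type of f only if e is committed. If e is free or unclassified, the value read may still be null.
- Method Calls: Methods must declare if they accept free receivers. You cannot call a standard method on a free object unless that method is marked free (meaning it knows how to handle partially initialized objects).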
Subtyping for Initialization
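Unclassified types are the common supertype of free and committed types. Free and committed types are not subtypes of each other.
- T! <: unc T! and free T! <: unc T!
- T? <: unc T? and free T? <: unc T?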
Lazy Initialization
Lazy initialization delays the initialization of a field until the field is first accessed, typically to reduce the startup time of an application.
- Since the field starts as null, it must be declared T? (possibly-null) internally.
- The getter method checks for null, initializes if necessary, and returns the value as T! (non-null).
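The two points above can be sketched in Java as follows. The class Report and its members are hypothetical, and the sketch is deliberately single-threaded: it does not address concurrent access.

```java
class Report {
    private String cached;   // conceptually String?: starts as null
    int computations = 0;    // only here to show that compute() runs once

    // Conceptually returns String!: never null.
    String text() {
        String local = cached;      // read the field once into a local
        if (local == null) {
            local = compute();      // initialize on first access
            cached = local;
        }
        return local;               // the local is known to be non-null here
    }

    private String compute() {
        computations++;
        return "expensive result";
    }

    public static void main(String[] args) {
        Report r = new Report();
        System.out.println(r.text());
        System.out.println(r.text());
        System.out.println("computed " + r.computations + " time(s)");
    }
}
```

Returning the local rather than re-reading the field keeps the null check and the returned value tied together.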
Non-Null Arrays
Arrays are difficult because they don’t have constructors in the traditional sense; they are initialized to default values (null).
- Problem: String![] s = new String![5] creates an array of nulls, violating the type String!.
- Solutions:
    - Array initializers: s = { "a", "b" }.
    - Runtime checks/Assertion methods (e.g., Spec# NonNullType.AssertInitialized(s)). A static checker generally cannot verify that a loop initializes every element (it might initialize only part of the array), so the check is deferred to runtime.
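An array type really involves two references, the array itself and its elements, so each can independently be non-null or possibly-null:
Person! [ ] ! a;   // non-null array of non-null elements
Person? [ ] ! b;   // non-null array of possibly-null elements
Person! [ ] ? c;   // possibly-null array of non-null elements
Person? [ ] ? d;   // possibly-null array of possibly-null elements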
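The array problem and the two workarounds can be illustrated in Java. This is a sketch only: allInitialized is a hypothetical helper that plays the role of a runtime assertion method, not the Spec# API.

```java
import java.util.Arrays;
import java.util.Objects;

class ArrayInit {
    // Hypothetical runtime check: true if no element is null.
    static boolean allInitialized(Object[] a) {
        return Arrays.stream(a).allMatch(Objects::nonNull);
    }

    // Loop-based initialization: a static checker would have to prove
    // that the loop covers every index.
    static String[] filled(int n) {
        String[] s = new String[n];      // all elements start as null
        for (int i = 0; i < n; i++) {
            s[i] = "item" + i;
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(allInitialized(new String[5]));
        System.out.println(allInitialized(new String[] { "a", "b" }));
        System.out.println(allInitialized(filled(3)));
    }
}
```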
Static Initializers
- Static initializers are executed once, immediately before the first of the following occurs:
    - The class is instantiated.
    - A static field is accessed.
    - A static method is called.
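A small Java sketch makes the trigger visible. The classes InitLog, Config, and StaticInitDemo are hypothetical; the log records when the static initializer of Config actually runs relative to the first field access.

```java
import java.util.ArrayList;
import java.util.List;

class InitLog {
    static final List<String> events = new ArrayList<>();
}

class Config {
    static int value;   // non-final, so reading it is a real field access

    static {
        InitLog.events.add("Config initialized");
        value = 42;
    }
}

class StaticInitDemo {
    static List<String> run() {
        InitLog.events.add("before access");
        int v = Config.value;   // first use of Config triggers its initializer
        InitLog.events.add("after access: " + v);
        return InitLog.events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The initializer entry appears between the two access entries, which shows that it runs on first use rather than at program start.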
Initialization of Global Data
Global data (Singletons, Factories, Flyweights) must be initialized before access.
Design Goals
- Effectiveness: Ensure initialization before first access.
- Clarity: Clean semantics.
- Laziness: Initialize only when needed to save startup time.
Global Vars and Init-Methods
This approach uses global variables to store references to global data, but relies on explicit calls to initialization methods to set them up. This is often the most basic way to handle globals in languages that support them.
- Mechanism: Explicit init() calls that must be invoked, usually from a main function.
- Pros: It is simple to implement.
- Cons:
    - Manual Ordering: The programmer must manually code the order of initialization to satisfy dependencies, which is error-prone.
    - Encapsulation: Main methods often need to know internal module dependencies to call inits in the right order, breaking information hiding.
    - No Laziness: It generally requires upfront initialization unless manually coded otherwise.
// Global variable declaration
global Factory theFactory;

void init( ) {
    theFactory = new Factory( );
}

class Factory {
    HashMap flyweights;
    Flyweight create( Data d ) { ... }
}

// Client usage
Flyweight f = theFactory.create( ... );
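The manual-ordering problem can be sketched in Java, using static fields in place of global variables. Settings, Naming, and InitOrder are hypothetical names; Naming.init() depends on Settings.init() having run first.

```java
class Settings {
    static String prefix;   // stands in for a global variable

    static void init() {
        prefix = "app";
    }
}

class Naming {
    static String defaultName;

    // Depends on Settings already being initialized.
    static void init() {
        defaultName = Settings.prefix.toUpperCase();
    }
}

class InitOrder {
    // Calling the init methods in the wrong order fails at runtime.
    static boolean wrongOrderFails() {
        Settings.prefix = null;   // reset to the uninitialized state
        try {
            Naming.init();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // The caller has to know the dependency to get the order right.
    static String rightOrder() {
        Settings.init();
        Naming.init();
        return Naming.defaultName;
    }

    public static void main(String[] args) {
        System.out.println(wrongOrderFails());
        System.out.println(rightOrder());
    }
}
```

Nothing in the signatures of the two init methods documents the required order, which is the encapsulation issue listed above.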
Static Fields and Initializers (Java/C#)
Java and C# use static fields to store global data and static initializer blocks to initialize them. These blocks run automatically immediately before the class is first used (e.g., creation of an instance, static method call, or static field access).
- Mechanism: static { ... } blocks executed by the runtime system.
- Pros:
    - Automatic & Lazy: Initialization happens just in time when the class is needed, reducing startup time.
    - System-managed: The system handles the triggering of initialization.
- Cons:
    - Mutual Dependencies: If class A’s static initializer triggers class B, and class B triggers class A, the cycle can lead to crashes or NullPointerExceptions because initialization is considered “in progress” and won’t restart, leaving fields uninitialized.
    - Side Effects: Static initializers can have arbitrary side effects (like modifying other static fields), making it hard to reason about the program state since execution order depends on which class is accessed first.
class Factory {
    static Factory theFactory;
    HashMap flyweights;

    // Static initializer block
    static {
        theFactory = new Factory( );
    }

    Flyweight create( Data d ) { ... }
}

// Initialization triggered here automatically
Factory o = Factory.theFactory;
Flyweight f = o.create( ... );
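The mutual-dependency problem listed among the cons can be observed with two small classes. Alpha, Beta, and CycleDemo are hypothetical names, and the result described assumes Alpha is the first of the two classes to be used.

```java
class Alpha {
    // Reading Beta.name triggers Beta's initialization.
    static String name = "Alpha(" + Beta.name + ")";
}

class Beta {
    // Alpha's initialization is already in progress when this runs,
    // so it is not restarted and Alpha.name is still null here.
    static String name = "Beta(" + Alpha.name + ")";
}

class CycleDemo {
    static String observe() {
        String a = Alpha.name;   // first use of Alpha starts the cycle
        String b = Beta.name;
        return a + " / " + b;
    }

    public static void main(String[] args) {
        System.out.println(observe());
    }
}
```

Beta ends up with the text "Beta(null)" baked into its field. If Beta were used first, the roles would be reversed, which illustrates the dependence on access order.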
Scala Objects
Scala provides direct language support for the Singleton pattern using the object keyword. This defines a class and a single instance of that class simultaneously.
- Mechanism: object SingletonName { ... }.
- Pros: Syntactic sugar and language-level support for singletons.
- Cons: Under the hood, this often translates to Java static fields and initializers. Therefore, it inherits all the pros and cons of the static field approach, including issues with mutual dependencies and side effects.
object Factory {
    val flyweights: HashMap[ ... ]

    def create( d: Data ): Flyweight = {
        // ... implementation ...
    }
}
Eiffel Once Methods
Eiffel uses once methods (routines). The body of a once method is executed only the first time it is called. The result is cached and returned for all subsequent calls.
- Mechanism: The once keyword applied to a feature (method).
- Pros:
    - Laziness: The initialization code runs only when the data is actually requested.
    - Caching: Provides a consistent global access point.
- Cons:
    - Recursion/Mutual Dependencies: If a once method recursively calls itself (directly or via another object) during its first execution, it returns the current (partial) result rather than waiting or crashing. This often leads to meaningless values like 0 or null being used.
    - Parameter Ignoring: Arguments are used only for the first call; subsequent calls ignore arguments, which can be confusing.
class FlyweightMgr
feature
    theFactory: Factory
            -- "once" ensures this runs only once
        once
            create Result
        end
end

-- Usage
o := manager.theFactory
f := o.createFlyweight( ... )
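Both drawbacks of once methods can be emulated in Java. The sketch below is not Eiffel semantics in full, only an approximation under the description above: OnceDemo, limit, and compute are hypothetical names, and the "already started" flag is set before the body runs, as a once routine would behave.

```java
class OnceDemo {
    private static boolean started = false;
    private static int cached;   // default 0 while the body is still running

    // Emulates a once function with one argument.
    static int limit(int requested) {
        if (!started) {
            started = true;                // marked before the body runs
            cached = compute(requested);
        }
        return cached;                     // later arguments are ignored
    }

    private static int compute(int requested) {
        // Recursive call during the first execution: it does not re-enter
        // the body, it just returns the partial result (0).
        int seenDuringInit = limit(999);
        return requested + seenDuringInit;
    }

    public static void main(String[] args) {
        System.out.println(limit(10));
        System.out.println(limit(20));
    }
}
```

The second call returns the value cached by the first one, regardless of its argument, and the recursive call silently works with the default value.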