Code in modern days is mobile in the sense that it gets downloaded and executed on the fly very often. In JVM they are usually shipped with:
- Platform independent code
- Organized into class files.
Class Loaders
The class loader is the part of the JVM that takes the bytecode and interprets all the class files to make the program runnable.
Security Considerations
Requirements for Security
- Mobile code cannot be trusted
- Code may not be type safe
- Code may destroy or modify data
- Code may expose personal information
- Code may crash the underlying VM
- Code may purposefully degrade performance (denial ofservice)
This is somewhat similar to the trust relations we explored in Notions of Security. Anyways, since the code is not trusted, we need a way to prove some of its properties.
Some properties that are easy to check are: ▪ Proper execution requires that
- Each instruction is type correct
- Only initialized variables are read
- No stack over- or underflow occurs
This can be done by verification of the types and runtime checking.
Sandboxed Execution Environment
The program only accesses the underlying OS through some API that have access control implemented. In this way we have a limitation of the damage that the system can have.
- Sandbox- Applets get access tosystem resources onlythrough an API- Access control can beimplemented- Security relies on- Type safety- Code does not by-passsandbox
Summary of the JVM
The Java Virtual Machine is stack-based similar to the original assembly programs you have studied at starting of your bachelor.
- 
Most operations popoperands from a stack and push a result. 
- 
But here, we also have Registers that store method parametersand local variables 
- 
Instructions are typed 
- 
Load and store instructions access registers 
- 
Control is handled by intra-method jumps(goto, conditional branches) 
Type Verification
We can use type inference or type checking, at the end we still need some runtime checking for covariant or contravariant types.
Why Use Type-Inference?
We can simulate the execution of the code, just using the types instead of values (for some sort of abstract resolution). Each operation has some transition rules. We can follow these transition rules and check if there will be still a transition at the end or not, in this way if there is no transition, I already know that there is an error.
Types of the Engine
In the JVM inference engine we have the first layer of primitive types, then composite (Objects and arrays) and null for null reference.
graph BT
  subgraph "Type Hierarchy"
    int --> T
    Object --> T
    float --> T
    intArr["int[]"] --> Object
    floatArr["float[]"] --> Object
    C --> Object
    objArr["Object[]"] --> Object
    cArr["C[]"] --> objArr
    null --> intArr
    null --> floatArr
    null --> C
    null --> cArr
  end
Where T is the top type for uninitialized registers and C be used for Class, Collection, or a generic Type
Some Rules
MS stands for maximum stack size. MR is a maximum number of local register variable And dot stands for concatenation.
$iconst\ n:( S,R ) \rightarrow ( int.S,R ), \text{ if } \lvert S \rvert < M_S$
$iload\ n:( S,R ) \rightarrow ( int.S,R ),\text{ if } 0 \leq n < M_R \land R( n ) = int \land \lvert S \rvert < M_S$
$astore\ n:( t.S,R ) \rightarrow ( S,R\{ n \leftarrow t \} ),\text{ if } 0 \leq n < M_R \land t <: Object$, we use object so that we have a reference (ints or floats would fail).
$invokevirtual\ C.m.\sigma:( t'_n...t'_1.t'.S,R ) \rightarrow ( r.S,R ), \text{ if }\sigma = r ( t_1,...,t_n ) \land t' <: C \land t'_i <: t_i$. C stands for class name, m for method name and $\sigma$ the method signature. The stack needs to contain the parameters in reverse order, the end stack needs the result. The parameters of the function need to be subtypes of the signture, while the receiver subtype of the class $C$.
 
Example from the Course Slides
Smallest Common Supertype
Jumps in the control flow can lead to joins. When you have a top object in the stack with different types, the smallest common supertype is used now. If we have multiple subtyiping, the result would not be well-defined, for this reason we:
- Ignore interfaces (they are all objects)
- Then in Java you can only subclass once, and we have resolved this problem.
But we need runtime checks for invokeinterface
Type Inference Algorithm
$in( 0 ) := ( [ \ ] , [ P_0,...,P_n,T,...,T ] )$ $worklist := \{ i \mid \text{instr}_i \text{ is an instruction of the method} \}$ $\mathbf{while}\ worklist \neq \emptyset\ \mathbf{do}$ $\quad i := \min( worklist )$ $\quad \text{remove } i \text{ from } worklist$ $\quad out( i ) := apply\_rule( \text{instr}_i, in( i ) )$ $\quad \mathbf{foreach}\ q \text{ in } successors( i )\ \mathbf{do}$ $\quad \quad in( q ) := pointwise\_scs( in( q ), out( i ) )$ $\quad \quad \mathbf{if}\ in( q ) \text{ has changed}\ \mathbf{then}\ worklist := worklist \cup \{ q \}$ $\quad \mathbf{end}$ $\mathbf{end}$
Remember q is the following instruction (could be more than one).
There is very little information that is needed to make this algorithm work, but it may be slow.
Pointwise SCS
$pointwise\_scs\left( \left( [ s_1.....s_k ] , [ t_0,...,t_n ] \right) ,\left( [ s'_1.....s'_k ] , [ t'_0,...,t'_n ] \right) \right) =$ $\quad\left( \left[ scs( s_1,s'_1 ).....scs( s_k,s'_k ) \right] , \left[ scs( t_0,t'_0 ),...,scs( t_n,t'_n ) \right] \right)$
pointwise_scs is undefined for stacks of different heights Bytecode verification results in an error if called with this.
Type Checking Algorithm
We can avoid SCS and fixed point computation by marking what is the expected type at the beginning and at the end of some specific code block. This means we need to extend the class code with more type information.
$\mathbf{foreach}\ \text{basic block of a method body}\ \mathbf{do}$ $\quad \mathbf{foreach}\ \{ i \mid \text{instr}_i \text{ is an instruction of basic block} \}\ \mathbf{do}$ $\quad \quad \mathbf{if}\ types( i ) \text{ is declared}\ \mathbf{then}\ in := types( i )$ $\quad \quad in := apply\_rule( \text{instr}_i, in )$ $\quad \quad \mathbf{foreach}\ q \text{ in } successors( i )\ \mathbf{do}$ $\quad \quad \quad \mathbf{if}\ types( q ) \text{ is declared}\ \mathbf{then}$ $\quad \quad \quad \quad \text{check that } in \text{ is assignable to } types( q )$ $\quad \quad \quad \mathbf{end}$ $\quad \quad \mathbf{end}$ $\quad \mathbf{end}$ $\mathbf{end}$
 
Example of a type checking run, slide from the Course Slides
Type Inference for Source Programs
We can also apply type-inference at the start, at the source, not only at the bytecode level. If you have experience in programming, this is the standard version that you probably might know. Type inference can be done at this level to levy some typing burdens to the programmer. You can also use type annotations as in python to help the inference.
- Inference of method signatures generally requires knowledge of all implementations
- Inference of field types generally requires knowledge of all accesses to the field
- Inference of these types is non-modular
- Or based on speculation