Verification of B+-Trees with Interactive Theorem Proving and Shape Analysis
This page describes proofs for the correctness of the insert and delete algorithms on B+Trees.
The proofs have three parts:
- The toplevel proof of the main recursion using well-founded induction in KIV.
- The proofs of all subroutines were done using the shape analysis tool TVLA.
- Constraints used in TVLA were verified based on algebraic specifications of the data types using KIV.
The KIV proofs are organized in two projects. A generic library that specifies untyped heaps, explained
next. Types, predicates and algorithms specific for B+-Trees are in the second project. Finally, the TVLA proofs are explained.
KIV-Projects always link to an overview which gives the dependency graph of specifications.
Clicking on one specification gives its text, and links to a theorem base summary (which lists names of theorems)
and a theorem base listing (which shows the full sequent for each theorem). From there individual
theorems and the full details of each proof are accessible.
The Heap-Library
The Heap library project
specifies generic heaps with predicative types. It defines various predicates (like reachability, tree shape etc.)
fo these and proves elementary facts about them. The project contains
- Elementary specifications of
generic references r
(type ref has a =null predicate which specifies "is a null pointer") and
generic objects d (type data).
- A generic heap is a partial function that maps finitely many non-null references to objects. Only mappings that obey
a predicative type discipline can be constructed.
The data type of
generic heaps is specified as a non-free algebraic data type with two constructors: ∅ creates an empty heap.
H[k,d] adds or overwrites key k to point to object d, if the type of key k is compatible with the type of object d
(specified as
predicate comp).
k ∈ H tests whether key k is allocated, i.e. mapped to some object. In this case H[r] returns
the object
to which k is mapped. comp(r,H[r]) is always true for allocated references. # H gives the number of allocated keys.
- Objects are assumed to have some fields which store references. The available fields are assumed to be from a
specification of selector names. Selectors are classified into horizontal (hsel), vertical (vsel) and other
selectors. This classification is used in the definition of height for tree shaped pointer structures below.
- The set of selector names is assumed to be finite: a list of
all selectors must exist.
- Again a predicative type system is used on selectors and updates:
comp(d,sn) and comp(d,sn,r)
check, whether selector name sn is applicable on object d, and whether the field type of field sn of object d is compatible
with the type of reference r to allow an update.
get(d,sn) returns
the result of field access in this case, and
set(d,sn,r)
overwrites field sn of object d with reference r.
- Generic heaps allow the definitions of a number of properties. These are used to verify consistency rules used by TVLA:
- Predicate ok
specifies, that a heap is consistent, i.e. that it contains no dangling pointers.
- A subset relation on heaps and a domain function that returns the set of allocated nodes can be defined.
- paths through a pointer
structure and reachability can be defined.
- Predicate root
specifies that all allocated objects are reachable from a root reference.
- Predicate tree specifies tree-shaped structures. For those
height
can be defined as well as
size (number of nodes).
Note that only compatible selectors must be followed to compute height and size.
- The height of a tree can be used to define
balanced trees,
where all leaves are at the same depth.
- Finally,
unchanged(H0,H1,r)
ensures that modifications done to get from heap H0 to heap H1 affect nodes only, that were reachable from a node r
(for tree-shaped structures: which affected the subtree below r only).
- The library finally specifies
untyped heaps,
which have a single null pointer, no selectors on objects, and no type constraints. These are an
the instance ,
of generic heaps, where all compatibility predicates are true or just disallow allocation of the null pointer.
The two parameters ref and data of untyped heaps can be actualized to define instances of heaps
which contain one individual type of references and objects.
The B+Tree Project
Project B+Tree
specifies the data types required for B+Trees, the algorithms. it contains the proofs of the main
theorems as well as proofs for TVLA constraints that are specific to B+Trees. The project is organized
as follows:
- Data Type node
specifies the type of nodes of a B+Tree. A node can be either a branch or a leaf. A branch stores a sequence
branch entries,
a leaf stores a sequence of
leaf entries.
All three data types are specified as free data types.
- Each of these objects contain typed references to other objects. A node reference (of type nref) points to a node,
a branch resp. leaf entry reference (types beref and neref) point to entries. All three reference types
are given (together with constants for their null pointers)
here.
- nodeheap,
bentryheap,
lentryheap
are three instances of the
heap specification
from the library. Each defines a separate heap for nodes, leaf and branch entries.
- To adapt the generic definitions of reachability, paths etc. that the untyped setting of TVLA uses to a typed setting, we also
define an instance of
generic heaps with paths etc. from the library. This instance needs a
the sum type of all references ,
the sum type of all objects ,
an enumeration type of the two selector names next and down ,
a specification of a generic get function with a compatibility function (comp), that checks, when get is applicable.
Finally, the three typed heaps are
collected in a tuple,
and
accessor functions are defined.
With this tuple, we get an
instance of the
generic definitions of the library.
- Based on these specifications the main algorithms for
insert_top#, that adds a key to a B+Tree and
delete_top#, that removes a key from a B+Tree
can be defined. The algorithms are recursive, and based on various subroutines for splitting or merging leafs and branches,
and moving a node to the left or right sibling node (for delete).
These subroutines are verified with TVLA.
In KIV, their contracts are specified.
- All contracts are based on
definitions of predicates
that specify, when a pointer structure is a well-formed B+Tree (predicate btree). Auxiliary predicates are partly
inherited from the generic library (predicate tree that specifies tree shape), partly defined here: sorted (based on sorted1, and ok)
specifies the sorting constraints, balanced specifies the constraint that all leaves must be at the same depth, and nzsels
specifies the requirement, that all branches and leaves must have at least one entry
(a weaker constraint than that each node must have between n and 2n entries). okpath(H,r,k) specifies that the key k should be inserted or deleted from the subtree below reference r.
- The B+Tree properties are extended to cover node sizes as well, given by the function nsize and the function nset, that computes horizontally reachable nodes. The rank of the B+Tree is given by a constant N with 2 ≤ N.
- Over the specifications of insert and delete, the two main inductive theorems proved in KIV are
for the mutually recursive routines insert# and insert_be# and
for the mutually recursive routines delete# and delete_be# . Based on these and the definition of an abstraction function,
the two main refinement theorems for insert_top# and delete_top# are proved in
the toplevel specification .
- The project also contains various proofs of constraints that were used in TVLA. Proofs that are independent of the instantiation with B+trees can be found in the specifications named tvla_* and are partitioned into
consistency rules that follow from invariants,
tvla-updates that proves update formulas,
tvla-guards that proves preservation of invariants and
pre-/post that proves equivalence of global and local characterizations of invariants.
For example:
- The consistency rules for
predicates eqh, olh
which serve as the definition for balancedness (balance-sa) for shape analysis
are proved here.
- Here
is the proof (in the generic setting), that checks the guard for preserving tree shape, when some field in an object is updated. The predicate
uforest specifies unshared forests using the TVLA encoding.
- Here and
here
are the proofs (in the generic setting) that check that the precondition regarding tree shape for a subroutine established in KIV
indeed implies the global invariant specified by consistency rules in TVLA and vice versa.
- Here is similar proof for the equivalence of the different characterizations of sorting.
- The proofs that are specific for B+trees are in the following specifications
- btree-consistency
- btree-update , that proves preservation of the ≤next predicate, called ok in the kiv specifications and facts about changes to the size of a node
- btree-guards , that proves preservation of sortedness under the respective guards
TVLA-Proofs for subroutines
The TVLA input files consist of
- The specification of instrumentation predicates and consistency rules for B+Tree invariants in the files
{tree,balance,sorting,size,create}.spec.tvp
- The implementation of subroutines is in the files ending with
.imp3
.
They are written in a simple imperative language, the translator is available here .
The generated .tvp
files are included in the archive. The translator requires python 2.x and depends on the zapps parser module - installation instructions are included.
- Proofs for the different properties are executed independently. There are folders tree,balance,sorting,size which contain ready-to-run TVLA input files and a script
run.sh
that runs all proofs in that folder. Additionally, the folder "create" contains the proof for root_create
.
- If you wish to generate the TVLA inputs, you can run
make SPEC={tree,balance,sorting,size,create} [prog.ps]
, where SPEC is the property to prove and the optional [prog.ps] specifies the subroutine to check. Please run make clean
whenever you switch between different values of SPEC. The pregenerated files in the subfolders are not overwritten by these commands.
- Note:
TVLA generates the resulting postscript documents by itself per default. There is, however, a tool
bin/fmtdot.py
that rewrites the dot files to omit some of the edges and node labels and colors the output. It is called automatically from the Makefile (not by run.sh
), so you might want to remove the following lines from your TVLA activation script.
if [ -e $1.dt ]
then
echo "Converting output to PostScript..."
dot -Tps -o$1.ps < $1.dt
fi
Also, increase the maximum memory of the Java virtual machine in this script, e.g. -mx2000m
.
- The proofs are successful if no messages are generated. You can check this by
grep -i message *.dt
after running the analysis or by inspecting the generated postscript documents.
-
actions0.tvp
defines statement semantics, actions.tvp
defines further actions.
- File
$SPEC.spec.tvp
specifies the predicates and consistency rules.
- File
$SPEC.include.tvp
defines property-specific statement semantics (e.g., update formulas and guards).
- File
$SPEC.tvs
specifies the initial structure. It is refined in code from $SPEC.imp3 which contains initialization and contracts for the respective property
- The current version of TVLA is available here . We used version 3 for the verification.
Used basic libraries:
Here you can find all our publications.
Back to the overview of all KIV projects.
For general informations on KIV and a download site for the whole system refer to this page.
[Imprint]
[Data Privacy Statement]
[E-Mail]