Flashix: Formal Models and Proofs

The Flashix project aims at the construction and verification of a state-of-the-art file system for flash memory. This page provides an overview and details of the formal development. We describe the formal models, specifications and implementation layers. The development is done using the KIV Verification System.

This work is part of the project Verifcation of Flash File Systems (orig. Verifikation von Flash-Dateisystemen, RE828/13-1, DFG/German Research Council).

News

License and Disclaimer

The material on this web-site represents (excerpts of) ongoing work. It is to be understood as documentation of the project. Thereby, we do not provide any support and warranties whatsoever. You can use the material or excerpts of it for personal information and experimentation; and communication with others in unmodified form as long as attribution to the authors (current researchers), institute, university, and DFG project identification is included.

Note that we do not provide an actual running code at the moment. If you use this material, be sure to understand the assumptions and conditions under which the verification has been conducted (see below). The word "guarantee" below must always be understood relative to the assumptions.

Please contact the team if you want to contribute. We're looking for cooperations. If you think that your research may interest us, is related, or neglected in our citations, please feel free to contact us as well.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Resources

We provide short-cuts to the complete material available here. For a description see below.

The project is rather complete. Currently, the only missing feature that remains to be verified is efficient indexing with B+ trees; and the C-code generator is currently in preparation.

Archive

Previous versions of some of the models and proofs that have been published alongside some of the papers are available as well:

Overview, Assumptions, Guarantees

As a basis for the concepts necessary we use the flash file system UBIFS, that was integrated into the official Linux kernel in 2008.

We follow a correctness-by-construction approach, namely by incremental refinement from an abstract description of the POSIX file system interface down to executable code. Conceptually, the main refinement steps involve mapping paths as found in the POSIX layer to Inodes (similar to VFS in Linux), introduction of index data structures for persistent/cached file system objects towards the actual erase blocks, using an indirection from logical blocks to support wear-leveling (similar to the UBI layer in UBIFS).

Furthermore, we assume that that the KIV theorem prover, the code generator and the compiler are correct, and that the hardware/driver implements our specification (or at least that we do not exploit incorrect behavior). Note that we consider it an orthogonal aspect to verify the generated code or the generator itself (and we have not done this). The challenges addressed in this project lie in the algorithms, in particular the ones for recovery after errors and power-cuts.

Architecture and Layers

Figure 2 shows an overview over the components in the file system. Click on these for a short (informal) description, and a link to the respective specification and implementation. Note that not all formal models are depicted individually in the picture.

[x]

Click on the layers in the diagram on the left to get a short description of the respective formal model.

Operations op in the signature are written as

op (in1, ..., inn; inout1, ..., inoutm)

with input parameters in1, ..., inn and input/output (reference) parameters inout1, ..., inoutm separated by a semicolon ;.

The state of each layer is a vector of (typed) variables

x: T

The POSIX specification defines an algebraic directory tree with an indirection for hard-links to files at the leaves. Directories and files are accessed by paths.

Signature:

create   (path, metadata)
mkdir    (path, metadata)
link     (path, path')
unlink   (path)
rmdir    (path)
open     (path; fd)
close    (fd)
read     (fd, len; buf)
write    (fd, buf; len)
truncate (path, size)
readmeta (path; md)
setmeta  (path, md)
rename   (path, path')
readdir  (path; names)

State:

tree: directory-tree
fs:   fid → file
of:   fd  → handle

Formal Specification

The Virtual Filesystem Switch realizes aspects common to all file systems, in particular path lookup, access permission checks, mapping of file content to pages, open file handles and orphaned files. The VFS implements the POSIX interface and thereby has the same set of operations. It depends on a file system implementation (precisely, on an abstract description AFS of file system behavior) to provide lower-level operations.

VFS State: of: fd → handle
AFS State: dirs: ino → dir, files: ino → file

VFS Model
AFS Specification

An abstract model of power-cut safety is given by LOGFS. It maintains an in-RAM state and a possibly outdated flash state. The difference is kept in a log.
RAM: rdirs: ino → dir, rfiles: ino → file,
Flash: fdirs: ino → dir, ffiles: ino → file,
log: list of logentry

LOGFS Specification

The core of the file system introduces flash specific concepts namely out-of-place updates and indexing. The provided operations correspond almost 1:1 to the POSIX operations, except that they are defined locally (paths are already traversed).

Signature:

create (ino; dentry)
...

The state consists of a current RAM Index ri, an outdated flash index fi, an unodered store of file system objects (called nodes) fs and a log. Furthermore, so-called orphans (files that are still in use but unreachable from the directory tree) need to be recorded.

ri, fi:  key → address
fs:      address → node
log:     list of address
ro, fo:  set of key

This state is a simple, abstract view of the core concepts. The actual data structures implementing these variables are given in the three subcomponents orphans, index, and journal.

Formal Model

The RAM index stores a (partial) mapping from keys to addresses of file system objects. It is implemented as a B+ tree ultimtately (which means that special care has to be taken to prevent allocation failures at inconvenient times). The index provides some advanced functions that iterate over entries of directories resp. pages of a file.

Signature:

newino   (; key)
checkkey (key; exists)
lookup   (key; adr)
insert   (key, adr)
delete   (key)
truncate (key, n)
entries  (key; keys)

commit_fi
load_fi

The Flash index is an old copy of that information, created during the last commit.

Formal Specification

Purpose of the Journal/LOG is to write groups of file system objects to flash memory in a sequential fashion, and to deal with the fact that flash is structured into erase blocks. Purpose of the journal is twofold

To guarantee that groups of objects (nodes) can be written atomically wrt. power-cuts.
Provide garbage collection of erase blocks (possibly by moving live data around). For this reason, it is closely linked to the index.

Signature

get(adr; node)
add(node1, ..., noden; adr1, ..., adrn)

Formal Specification
Implementation
Abstraction Relation
Refinement Proofs (incl. Crash-Safety)

The orphan component stores a set of inode numbers that have been unlinked completely from the file system tree and must be deleted after the last file handle is closed. This information is updated incrementally and written to flash during commit.

Signature:

insert    (key)
delete    (key)
contains  (key; res)
...

Formal Model

The persistence layer is the interface that changes to a block and byte-based representation. It provides the means to access all file system data, such as group and B+ tree nodes, flash orphans, the log, the superblock, block properties (used/free space) and so on. The layer defines the layout of the various regions on flash. It also implements the commit operation. It relies on the write buffer to cache partial written flash pages (these are currently flushed after each top-level operation by writing some padding data).

Signature (simplified excerpt):

read_node   (block, offset; node)
add_node    (block, node)
flush       (block)
allocate    (; block)

read_log    (block, log)

write_superblock

...

Formal Specification
Implementation
Abstraction Relation
Refinement Proofs & Crash-Safety Proof

The write buffer implements write buffer functionality on top of the EBM interface. Since flash memory can generally be only written at page granularity, the write buffer caches partially written pages. Each currently accessed block therefore has an attached object that contains this page. It provides an interface that is very similar to the hardware (MTD) and to the erase block management (EBM), however, erasing is always done asynchronously via the unmap operation, or, in case the logical block is immediately reused, by remap. The change operation atomically changes the contents of a logical block, this is used by the persistence layer to commit the super-block to flash. There is no flush operation - writing up to a page boundary is sufficient.

Signature (simplified excerpt):

create_buf  (block, offset)
destroy_buf (block)
read        (block, offset, len; buf)
write       (block, offset, len, buf)
change      (block, len, buf)
remap       (block)
unmap       (block)

Project Overview:

The Erase Block Management layer defines the interface through with flash memory is accessed by the upper layers. Blocks are addressed logically and can be written (append only), read (randomly), and erased synchronously or asynchronously (as a whole).

The implementation links to the interface to the Flash driver. It implements the mapping from logical to physical erase blocks (which is updated transparently to the upper layers). Its main purpose is wear-leveling, i.e., to distribute erases evenly between all physical blocks of the flash medium. It furthermore implements simple volume management (omitted in the signature).

Signature:

read     (block, offset, len; buf)
write    (block, offset, len, buf)
erase    (block)
map      (block)
unmap    (block)
change   (block, len, buf)

Formal Specification
Implementation

The IO layer is a simple interface that is responsible to encode/decode all information that the EBM implementation stores on flash other than user-data, such as block headers, which contain an inverse mapping to logical blocks.

Formal Models (spec & impl)

The MTD model specifies the Linux memory technology device model. It represents the lowest level in our refinement chain and encodes all assumptions about the hardware and the driver implementation.

Signature:

read     (block, offset, len; buf)
write    (block, offset, len, buf)
erase    (block)
mark_bad (block)
is_bad   (block; flag)
get_block_count (; n)

State:

pebs: array of (array of byte, fill, bad)

Formal Specification

Flashix - Development of a fully verified file system for flash memory - Models and Proofs

Table of Contents

News

License and Disclaimer

Resources

Archive

Overview, Assumptions, Guarantees

Architecture and Layers

SibylFS Test Run