(Functional) Safety

In recent weeks I had the chance to attend several automotive conferences on functional safety.

Functional safety is defined as freedom from unreasonable risk resulting from malfunctioning of electrical and electronic systems. The important word here is: malfunctioning.

Many of the conference presentations, however, were about highly automated and autonomous driving, and about how to make sure that even a correctly working system is safe enough.

There is even a term for this: safety of the intended functionality (SOTIF). There will be an ISO/PAS 21448 to give people in the automotive world a systematic approach to it.

Interestingly, both topics are discussed by almost the same people. This worries me, because some of them do not seem to be aware of the difference…

Libraries are Evil

We all know that reading books can be dangerous, but this is not why software libraries are evil 😉

There is a different reason for this.

Let’s assume we have a module we would like to turn into a library:

module libx;

struct X {
    version(B) {
        int y = 42;
    }
    int z = 3;
}

void doX(ref X x) {
    version(B) {
        x.y = 13;
    }
    x.z = 5;
}

The module libx provides a structure X and a function doX. However, both behave differently depending on the version they are compiled with (you can think of B as, e.g., a debug version).

Now to the program using the library:

int main() {
    import libx;
    import std.stdio;

    X myX;
    version (B) {
        writeln("y ", myX.y);
    }
    writeln("z ", myX.z);
    doX(myX);
    version (B) {
        writeln("y ", myX.y);
    }
    writeln("z ", myX.z);
    return 0;
}

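Even though our program is aware of the different versions, building the library and the program with different version settings leads to errors that are really hard to debug: the two sides disagree on the size and layout of X, yet the linker does not notice, because the symbol name of doX is the same in both builds.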
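To make the mismatch concrete, here is a minimal sketch that models the two possible layouts of X as two separate structs. XWithB and XWithoutB are hypothetical names for illustration, not part of libx. If libx is compiled with version B (e.g. `dmd -version=B`) but the program is not, doX writes 13 to offset 0, which the caller sees as its z, and writes 5 to offset 4, which lies beyond the end of the caller's four-byte myX.

```d
import std.stdio : writeln;

// Hypothetical stand-ins for the two layouts libx.X can have.
struct XWithB    { int y = 42; int z = 3; } // X as seen with -version=B
struct XWithoutB { int z = 3; }             // X as seen without it

void main() {
    // Library and program disagree on the size of X ...
    static assert(XWithB.sizeof == 8);
    static assert(XWithoutB.sizeof == 4);
    // ... and on where z lives inside it.
    static assert(XWithB.z.offsetof == 4);
    static assert(XWithoutB.z.offsetof == 0);
    writeln("offset of z: ", XWithB.z.offsetof, " vs ", XWithoutB.z.offsetof);
}
```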

You could argue that the problem is the multi-version library. However, the issue is even worse with compiler options, because they do not show up in the source code at all. Just think of options that define the memory layout of structures, e.g. alignment and padding of members.
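D states alignment in the source with the align attribute, which makes the effect easy to show. A compiler option that changes the default packing (for example /Zp in MSVC or -fpack-struct in GCC) shifts member offsets in the same way, just invisibly. A minimal sketch with illustrative struct names:

```d
import std.stdio : writeln;

// Same members, different alignment rules (illustrative names).
struct Natural { ubyte a; int b; }            // default: b aligned to 4 bytes
struct Packed  { align(1): ubyte a; int b; }  // align(1): no padding before b

void main() {
    static assert(Natural.b.offsetof == 4); // three padding bytes after a
    static assert(Packed.b.offsetof == 1);  // b directly follows a
    writeln("offset of b: ", Natural.b.offsetof, " vs ", Packed.b.offsetof);
}
```

A library built with one packing and a program built with the other would read b from different addresses, exactly like the version mismatch above.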

I want you to be able to sleep at night, so don’t even start thinking about dynamic link libraries 😉

Summary: If you want a program to work, do not use pre-compiled libraries. Always compile everything with the same compiler and the same options in one run, and statically link all objects together.

Goal Structuring Notation

When developing safety-related software or systems, you need two things: arguments and evidence. The Goal Structuring Notation (GSN) is a widely accepted way to bring these two things together.

There also used to be www.goalstructuringnotation.info, which provided a complete specification of GSN, but the site currently seems to be broken.

Recently, I had to provide arguments and evidence for a very special use case to a customer. So I wrote a little script to easily create and modify my argumentation.

This is an example result:

I put my little script on GitHub. It takes the arguments in a YAML file as input and outputs a DOT file that can be rendered using Graphviz.
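A minimal sketch of the same idea, not the actual script: it turns a small, hard-coded goal structure into DOT. The Node/Link types, the element texts and the shape mapping are my own illustrative assumptions (goals as rectangles, strategies as parallelograms, solutions as circles, context as rounded rectangles; SupportedBy with a filled arrowhead, InContextOf with a hollow one). Reading YAML is left out because the D standard library has no YAML parser.

```d
import std.array : appender;
import std.format : formattedWrite;
import std.stdio : write;

// Illustrative subset of the GSN element types.
enum Kind { goal, strategy, solution, context }

struct Node {
    string id;   // DOT node identifier, e.g. "G1"
    Kind kind;
    string text; // label shown in the diagram (not escaped here)
}

struct Link {
    string from;
    string to;
    bool inContextOf; // false: SupportedBy, true: InContextOf
}

// Maps each GSN element type to a Graphviz shape.
string shapeOf(Kind k) {
    final switch (k) {
        case Kind.goal:     return "shape=box";
        case Kind.strategy: return "shape=parallelogram";
        case Kind.solution: return "shape=circle";
        case Kind.context:  return "shape=box, style=rounded";
    }
}

// Builds a DOT digraph: one statement per element, one edge per link.
string toDot(const Node[] nodes, const Link[] links) {
    auto dot = appender!string();
    dot.put("digraph GSN {\n");
    foreach (n; nodes)
        dot.formattedWrite("  %s [%s, label=\"%s\"];\n", n.id, shapeOf(n.kind), n.text);
    foreach (l; links)
        // SupportedBy: filled arrowhead; InContextOf: hollow arrowhead.
        dot.formattedWrite("  %s -> %s [arrowhead=%s];\n",
                           l.from, l.to, l.inContextOf ? "onormal" : "normal");
    dot.put("}\n");
    return dot.data;
}

void main() {
    auto nodes = [
        Node("G1", Kind.goal, "System is acceptably safe"),
        Node("C1", Kind.context, "Operating context"),
        Node("S1", Kind.strategy, "Argument over all hazards"),
        Node("Sn1", Kind.solution, "Test report"),
    ];
    auto links = [
        Link("G1", "C1", true),
        Link("G1", "S1", false),
        Link("S1", "Sn1", false),
    ];
    write(toDot(nodes, links));
}
```

Redirect the output to a file and render it with, e.g., `dot -Tpng gsn.dot -o gsn.png`. Labels containing quotes would need escaping, which this sketch does not do.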

Why do things go wrong?

In discussions about software safety, you often end up arguing about something that is really a question of the fault model of software.

In this post I would like to try to sketch such a fault model.

The most obvious class is logic faults: the software does not functionally do what it is intended to do. We could also call this class algorithmic faults. Examples are division by zero, use of uninitialized variables, and faults in, e.g., state machines.
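Division by zero is a logic fault that is easy to prevent once the corner case is handled explicitly. A minimal sketch, where averageOf is a hypothetical helper that returns an empty result instead of dividing by zero:

```d
import std.stdio : writeln;
import std.typecons : Nullable, nullable;

// Hypothetical helper: average value of `count` samples whose sum is `sum`.
// Without the guard, count == 0 would be an integer division by zero.
Nullable!int averageOf(int sum, int count) {
    if (count == 0)
        return Nullable!int.init; // "no value" instead of a crash
    return nullable(sum / count);
}

void main() {
    writeln(averageOf(10, 2).get);    // 5
    writeln(averageOf(10, 0).isNull); // true
}
```

Returning Nullable forces the caller to decide what "no average" means, rather than letting the fault surface somewhere else.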

Next come the very bad ones: memory management faults. These include wild pointers (pointing to something invalid), dangling pointers (used after the object they pointed to is no longer valid), buffer overflows, and misunderstandings about memory ownership.
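Buffer overflows show how much the language matters. In C, writing one element past the end of an array silently overwrites whatever comes next in memory. D slices carry their length, so in @safe code the same mistake raises a RangeError instead, unless bounds checking is explicitly switched off. A minimal sketch, with store as a hypothetical helper:

```d
import core.exception : RangeError;
import std.exception : assertThrown;
import std.stdio : writeln;

// Hypothetical helper. In @safe code, every slice index is checked
// against the slice's length before the write happens.
@safe void store(int[] buf, size_t i, int value) {
    buf[i] = value;
}

void main() {
    auto buf = new int[4];
    store(buf, 3, 7);                          // last valid index
    assertThrown!RangeError(store(buf, 4, 7)); // one past the end: detected, no overwrite
    writeln(buf);                              // [0, 0, 0, 7]
}
```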

Recently, we also stumbled over a non-obvious issue caused by integer promotion. This represents another class of potential faults. It may be specific to C, but I guess even a more type-safe language can have the same problems, e.g. when casting types.
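D inherits C's integer promotion rules for arithmetic, so the effect can be shown in D as well. In this sketch (average and averageNarrowed are hypothetical helpers), the only difference is where the explicit cast sits, yet the results differ:

```d
import std.stdio : writeln;

// Hypothetical helper: a + b is evaluated as int (integer promotion),
// so 200 + 100 == 300 does not wrap before the division.
ubyte average(ubyte a, ubyte b) {
    return cast(ubyte)((a + b) / 2);
}

// Same computation, but the sum is narrowed to ubyte first:
// 300 does not fit into 8 bits and wraps around to 44.
ubyte averageNarrowed(ubyte a, ubyte b) {
    ubyte sum = cast(ubyte)(a + b);
    return cast(ubyte)(sum / 2);
}

void main() {
    ubyte a = 200, b = 100;
    static assert(is(typeof(a + b) == int)); // both operands promoted to int
    writeln(average(a, b));         // 150
    writeln(averageNarrowed(a, b)); // 22
}
```

D already rejects `ubyte sum = a + b;` without a cast. The bug only appears once someone adds the cast to silence the compiler, which is exactly the casting problem mentioned above.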

Introducing parallel processing (e.g. multi-threaded programming, use of multiple cores, or even interrupt handling) creates two new classes of potential software faults: data consistency problems and locking issues (livelock, deadlock).
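The classic data consistency example is several threads incrementing one counter. A plain increment is a read-modify-write sequence, so two threads can read the same old value and one update gets lost. A minimal sketch using core.atomic, where work and runTwoThreads are hypothetical names:

```d
import core.atomic : atomicLoad, atomicOp, atomicStore;
import core.thread : Thread;
import std.stdio : writeln;

shared int counter; // one counter, updated by several threads

// atomicOp makes each read-modify-write indivisible,
// so no increment can be lost between threads.
void work() {
    foreach (i; 0 .. 100_000)
        atomicOp!"+="(counter, 1);
}

// Runs work() in two threads and returns the final counter value.
int runTwoThreads() {
    atomicStore(counter, 0);
    auto t1 = new Thread(&work);
    auto t2 = new Thread(&work);
    t1.start();
    t2.start();
    t1.join();
    t2.join();
    return atomicLoad(counter);
}

void main() {
    writeln(runTwoThreads()); // 200000
}
```

Atomics only help for single variables. As soon as several values must stay consistent with each other, locks come in, and with them the locking issues mentioned above.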

Two kinds of faults should not be considered in discussions on software faults: reliance on implementation-specific behavior (which can be prevented by static code analysis or, better, by simply not doing it!) and hardware faults (e.g. single event upsets). Software can detect such hardware faults, but they are not caused by software.

Maybe I will add a page here at some point to describe these faults in detail and to think more about completeness.