This case study has been proposed in the the Combest project by EADS Innovation Works, Germany. It concerns a distributed heterogeneous communication system (HCS) providing an all electronic communication infrastructure, typically for cabin communication in airplanes or for building automation. HCS contains various devices such as sensors (video camera, smoke detectors, temperature, pressure, etc.) and actuators (loudspeakers, light switches, temperature control, signs, etc.) connected through a wired Ethernet network to a central server. The server runs a set of services to monitor the sensors and to control the actuators. The devices are connected to the server using network access controllers (NACs).
The architecture and functionality delivered by HCS are highly heterogeneous. The system includes different hardware components, which run different protocols and software services ensuring functions with different characteristics and degree of criticality e.g, audio streaming, clock synchronization, sensor monitoring, video surveillance, etc. Moreover, HCS has to guarantee stringent requirements, such as reliable data transmission, fault tolerance, timing and synchronization constraints. For example, the latency for delivering alarm signals from sensors, or for playing audio announcements should be smaller than certain predefined thresholds. Or, the accuracy of clock synchronization between different devices, should be guaranteed under the given physical implementation of the system.
Complete details of the case study can be found on the Combest site. We have developed a structural model of HCS using BIP. At top level, the structure of the model follows the natural decomposition into physical elements e.g., server, network access controllers and devices are the top-level components. Moreover, these components are connected and interact according to the wired network connections defined in the original system. Then, one level down, every (physical) component has a functional decomposition. Inner sub-components provide features for network operation (e.g., packet delivery, filtering, routing, scheduling, ...), protocols (e.g., clock synchronization) or services (e.g., audio/video streaming, event handling, etc.)
The overall complexity of this case study is extremely high. A model for a relevant functional subsystem required approximately 300 atomic components and 1900 connectors in BIP. Almost all atomic components have timed behaviour. They totalize approximately 250 clocks variables to express all timing constraints. Moreover, the use of large domain data (e.g., packet numbers) and complex data structures (e.g., FIFO queues of packets) made the state space of the model extremely huge. One single state needs approximately 400 bytes to be represented. Furthermore, the state space has a heterogeneous structure which prevents its compact representation using symbolic techniques based on BDDs.
We have been interested to verify the clock synchronization protocol i.e., the protocol used to synchronize the clocks of all devices within the system. The challenge is to guarantee that the protocol maintains the difference between a master clock (running on the server) and all the slave clocks (running on devices) under some bound. A first major difficulty is network communication which makes all applications interfering and therefore requires exploration of the whole model. A second difficulty comes from the time granularity i.e., one microsecond, needed to perform faithful observations. These two factors significantly restrict brute-force simulation approaches : 1 second system lifetime needs approximately 10 minutes simulation time with microsecond precision on the BIP model.
To overcome these difficulties, we proposed in [1] a new verification technique which combines random simulation and statistical model checking. We have been able to derive exact bounds on clock synchronization for all devices in the system. We also computed probabilities of clock synchronization for smaller values of the bound. Being able to provide such information is of clear importance, especially when the exact bounds are too high with respect to user’s requirements. In particular, we have shown that the bounds strongly depend on the position of the device in the network. We also estimated the average and worst proportion of failures per simulation for smaller bounds i.e., how often the clock synchronization exceeds the given bound on some arbitrary run.