Increasing the predictability of interrupt handlers through manual cache-control

Authors: Hendrik Borghorst, Olaf Spinczyk

Real-time operating systems have been around for some time, but they are not tailored for use on modern multi-core processors, whose timing behavior is unpredictable because of timing differences between the processor and the DRAM controller. Operating-system-based cache management is one way to reduce this unpredictability by controlling which code and data reside in the cache. This eliminates the variation in execution times caused by accesses to main memory.
The main idea of OS-based cache management is that sequential prefetching of data from DRAM can be done within predictable time bounds. The prefetched data is loaded into the last-level cache (LLC), where access times are quite stable according to our measurements.
To make use of this concept, it is essential that the operating system is structured into small components, which we call operating system components (OSCs). Each component should fit into one or more cache ways of the LLC. When an OSC is needed, for example to serve a system call, it is prefetched from main memory into the LLC. This method replaces unpredictable random memory accesses with predictable cache hits for that component.
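The sequential prefetch of an OSC can be illustrated with a minimal C sketch. The descriptor `struct osc`, the helper names, the 32-byte line size, and the 64 KiB way size are assumptions chosen for illustration and are not taken from the implementation described here. The sketch also assumes the OSC starts on a cache-line boundary, and it uses plain loads where a real kernel would more likely rely on the platform's prefetch or cache lockdown mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache geometry, for illustration only. */
#define CACHE_LINE_SIZE 32u            /* bytes per cache line (assumption) */
#define LLC_WAY_SIZE    (64u * 1024u)  /* bytes per LLC way (assumption) */

/* Hypothetical descriptor of one operating system component (OSC). */
struct osc {
    const uint8_t *base; /* start of the OSC's code and data in DRAM */
    size_t size;         /* size in bytes */
};

/* Number of LLC ways the OSC occupies, rounded up. */
static size_t osc_ways_needed(const struct osc *c)
{
    return (c->size + LLC_WAY_SIZE - 1) / LLC_WAY_SIZE;
}

/*
 * Walk the OSC in ascending address order and load one byte per cache
 * line, so that DRAM is accessed sequentially rather than randomly.
 * Returns the number of cache lines touched.
 */
static size_t osc_prefetch(const struct osc *c)
{
    assert(c != NULL && c->base != NULL);
    volatile const uint8_t *p = c->base;
    size_t lines = 0;

    for (size_t off = 0; off < c->size; off += CACHE_LINE_SIZE) {
        (void)p[off]; /* volatile load, not optimized away */
        lines++;
    }
    return lines;
}
```

With these assumed sizes, a 100-byte OSC occupies one way and is loaded with four line-sized accesses.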
Interrupt handlers, however, need special treatment because it is not predictable when interrupts arrive. A minimal first-stage interrupt handler must therefore be guaranteed to reside in the cache so that its memory access times are always predictable. This handler checks which interrupt source triggered the interrupt and prefetches the corresponding second-stage interrupt handler from main memory into the cache.
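The two-stage scheme can be sketched as follows. This is a simplified host-side model, not the actual implementation: the table `irq_table`, the functions `first_stage_irq` and `prefetch_osc`, the example handler `timer_handler`, and the number of interrupt sources are all hypothetical, and the prefetch is reduced to a flag and a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_IRQS 4 /* hypothetical number of interrupt sources */

typedef void (*irq_handler_fn)(void);

/* Hypothetical descriptor of a second-stage handler packaged as an OSC. */
struct second_stage {
    irq_handler_fn handler; /* entry point, NULL if the source is unused */
    bool resident;          /* true once the OSC has been loaded */
    unsigned prefetches;    /* how often the OSC had to be prefetched */
};

static struct second_stage irq_table[NUM_IRQS];

/* Example second-stage handler used only for illustration. */
static unsigned timer_ticks;
static void timer_handler(void) { timer_ticks++; }

/* Stand-in for the sequential prefetch of the handler's OSC. */
static void prefetch_osc(struct second_stage *s)
{
    assert(s != NULL);
    s->resident = true;
    s->prefetches++;
}

/*
 * First-stage handler, assumed to be permanently resident in the cache.
 * It identifies the interrupt source, prefetches the second stage if it
 * is not resident yet, and then dispatches to it.
 * Returns false if no handler is registered for the source.
 */
static bool first_stage_irq(unsigned irq)
{
    if (irq >= NUM_IRQS || irq_table[irq].handler == NULL)
        return false;

    struct second_stage *s = &irq_table[irq];
    if (!s->resident)
        prefetch_osc(s);
    s->handler();
    return true;
}
```

After registering `timer_handler` for one source, two consecutive interrupts from that source trigger a single prefetch. The cost of that first prefetch is the overhead this scheme trades for predictable access times.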
An evaluation of this interrupt handling method showed that it can significantly reduce the timing variations of interrupt handlers executed on our target platform, which is based on an ARM Cortex-A9. However, it incurs an overhead because second-stage interrupt handlers have to be prefetched first. For evaluation purposes, a simple LRU-based replacement method was used. Whether a more sophisticated replacement policy for the needed OSCs could reduce this overhead should be evaluated further.
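One possible shape of an LRU-based replacement of OSCs is sketched below. It is a generic model and not necessarily the method used in the evaluation: the names `ways`, `osc_acquire`, and `prefetch_count`, the three managed ways, and the logical timestamps are assumptions. The way holding the first-stage interrupt handler is assumed to be excluded from replacement and is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_WAYS 3   /* hypothetical number of LLC ways managed by the OS */
#define NO_OSC  (-1) /* marks an empty way */

struct way_state {
    int osc_id;             /* OSC held in this way, or NO_OSC */
    unsigned long last_use; /* logical time of the most recent use */
};

static struct way_state ways[NUM_WAYS] = {
    { NO_OSC, 0 }, { NO_OSC, 0 }, { NO_OSC, 0 }
};
static unsigned long now;            /* logical clock, one tick per request */
static unsigned long prefetch_count; /* number of OSC loads so far */

/*
 * Make the given OSC resident. Returns true if it was already in a way
 * (hit) and false if it had to be prefetched into the least recently
 * used way. Empty ways carry timestamp 0 and are therefore filled first.
 */
static bool osc_acquire(int osc_id)
{
    assert(osc_id >= 0);
    size_t victim = 0;

    now++;
    for (size_t i = 0; i < NUM_WAYS; i++) {
        if (ways[i].osc_id == osc_id) {
            ways[i].last_use = now; /* hit: refresh the timestamp */
            return true;
        }
        if (ways[i].last_use < ways[victim].last_use)
            victim = i;
    }

    /* Miss: replace the least recently used way and count the prefetch. */
    ways[victim].osc_id = osc_id;
    ways[victim].last_use = now;
    prefetch_count++;
    return false;
}
```

In this model, `prefetch_count` is the quantity a more sophisticated policy would try to lower, since every miss corresponds to one prefetch of a second-stage handler or other OSC.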