Performance Evaluation of the Real-time Capabilities of Linux on Modern System on a Chips

Authors: David Vincze, Tamas Kovacshazy
 
The free, open-source Linux operating system, licensed under the GNU General Public License, is gaining wide-scale acceptance in embedded systems built on high-end systems on chip (SoCs), such as TI's Sitara SoCs, Freescale's i.MX family, or Intel Edison, on both the ARM and the x86 architectures. Linux itself offers numerous approaches to writing embedded software, ranging from shell scripts and scripting languages like Python or Node.js to C programming in user or kernel space. In addition, the Linux scheduler provides scheduling classes and priorities in both user and kernel space, including "real-time" scheduling classes like FIFO, RR, or DEADLINE (in newer kernels).
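As an illustration of how a user-space program opts into one of these "real-time" classes, the C sketch below moves the calling thread into SCHED_FIFO with the POSIX `sched_setscheduler()` call. It is a minimal example under our own assumptions, not the benchmark code of this work, and the helper name `make_fifo` is hypothetical. On Linux the call normally requires root or CAP_SYS_NICE (or a sufficient RLIMIT_RTPRIO); otherwise it fails with EPERM.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: move the calling thread into the SCHED_FIFO
 * real-time class at the given static priority.
 * Returns 0 on success, otherwise an errno value, e.g.
 *   EINVAL if the priority is outside the range the kernel reports,
 *   EPERM  if the process is neither privileged (root, CAP_SYS_NICE)
 *          nor permitted by RLIMIT_RTPRIO. */
int make_fifo(int priority)
{
    int min = sched_get_priority_min(SCHED_FIFO);
    int max = sched_get_priority_max(SCHED_FIFO);

    if (min == -1 || max == -1)
        return errno;
    if (priority < min || priority > max)
        return EINVAL;

    struct sched_param sp = { .sched_priority = priority };

    /* pid 0 selects the calling thread; SCHED_RR is set the same way. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
        return errno;
    return 0;
}
```

A typical pattern is to call something like `make_fifo(50)` once, before entering the time-critical loop. SCHED_DEADLINE is configured differently, through `sched_setattr(2)` with runtime, deadline and period parameters instead of a static priority.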
Further complicating the picture, these modern SoCs contain dedicated on-chip micro-controllers of limited capability for real-time processing, which free programmers from dealing with the non-deterministic behavior of the operating system and of high-end hardware (caches, speculative execution, queues, etc.). For example, the TI Sitara am335x SoCs have two Programmable Real-time Units (PRUs) on board, some SoCs of the i.MX line include an ARM Cortex-M4 micro-controller, and Edison has an x86-based Quark core for such purposes. In other words, these SoCs can be considered heterogeneous systems: the main processor runs Linux, the micro-controller runs dedicated tasks, and the two communicate through dedicated hardware interfaces.
In the end, software architects and programmers face a large number of options when deciding how to distribute and schedule functionality on Linux, i.e., where and how to run particular software components, especially those with real-time requirements. Moreover, there is very limited know-how on how the various programming models, scheduling classes and priorities, user- or kernel-space programming, and heterogeneous architectures influence performance, especially on SoC platforms. To gain some insight into this question, it is reasonable to implement a simple real-time benchmark with each applicable solution and compare their performance. For the performance evaluation we used the readily available TI Sitara am3359 SoC on a BeagleBone White single-board computer; this SoC has two PRUs as part of its Programmable Real-time Unit and Industrial Communication Subsystem (PRU-ICSS).
The benchmark must be simple enough to be implemented with every realistic approach, and its output must be easy to measure with good precision. We selected simple digital waveform generation as the benchmark, i.e., the task is to generate a square wave with a 50% duty cycle on an I/O pin of the SoC at predefined frequencies. This benchmark is also relevant from the real-time point of view, as it directly exposes the timing performance of each solution.
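The timing requirement of the benchmark follows directly from the frequency: with a 50% duty cycle the pin changes level every half period, T/2 = 1/(2f), e.g. every 500 µs at 1 kHz. The C sketch below shows one common way to do this from user space, sleeping to absolute deadlines with `clock_nanosleep()` on CLOCK_MONOTONIC so that a late wake-up appears as jitter on one edge instead of accumulating as drift. It is an illustration under our own assumptions, not the authors' benchmark code; `half_period_ns`, `square_wave` and the dry-run pin callback are hypothetical names, and a real implementation would replace the callback with one of the I/O access methods evaluated in this work.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Half period of a 50% duty cycle square wave in nanoseconds,
 * T/2 = 1 / (2 f), truncated to whole nanoseconds. freq_hz must be > 0. */
long half_period_ns(long freq_hz)
{
    return NSEC_PER_SEC / (2 * freq_hz);
}

/* Advance an absolute time by ns nanoseconds, normalizing tv_nsec. */
static void ts_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Hypothetical dry-run stand-in for a real pin write (sysfs, register
 * access, ...): it only records the last level and counts the edges. */
int pin_level = 0;
int pin_toggles = 0;

void dry_run_pin(int level)
{
    pin_level = level;
    pin_toggles++;
}

/* Produce `edges` level changes at freq_hz. Each wake-up targets an
 * absolute deadline (TIMER_ABSTIME), so a late wake-up causes jitter on
 * that edge but does not shift the following edges.
 * Returns the worst wake-up lateness observed, in nanoseconds. */
long square_wave(void (*set_pin)(int), long freq_hz, int edges)
{
    const long half = half_period_ns(freq_hz);
    long worst = 0;
    int level = 0;
    struct timespec next, now;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < edges; i++) {
        ts_add_ns(&next, half);
        /* clock_nanosleep returns the error number; restart on EINTR. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
        clock_gettime(CLOCK_MONOTONIC, &now);
        long late = (long)(now.tv_sec - next.tv_sec) * NSEC_PER_SEC
                    + (now.tv_nsec - next.tv_nsec);
        if (late > worst)
            worst = late;

        level = !level;
        set_pin(level);
    }
    return worst;
}
```

For example, `square_wave(dry_run_pin, 1000, 20)` produces 20 edges (10 full periods) at 1 kHz and returns the worst wake-up lateness. This is only a rough software-side estimate; the oscilloscope measurement remains authoritative, as it also captures delays that occur after the pin write is issued.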
As most SoC platforms offer very deterministic hardware-level I/O port access, unlike the PC platform, where the PCI or PCIe bus introduces jitter in the microsecond range, the precision of the measurement is determined primarily by the measurement instrument. We use a Rohde & Schwarz R&S®RTO1014 oscilloscope with time-measurement software options. We evaluate and present the real-time performance of the following solutions:
 - Bash script using the standard I/O subsystem,
 - JavaScript using the standard I/O subsystem,
 - User space C using the standard I/O subsystem,
 - User space C using low level register access,
 - User space C using low level register access and real-time scheduling class,
 - Kernel space C,
 - Kernel space C and real-time scheduling class,
 - PRU based programming.
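To make the difference between the two user-space C variants concrete, the sketch below contrasts a pin write through the standard I/O subsystem (the legacy sysfs GPIO interface, `/sys/class/gpio/gpioN/value`) with direct register access through an `mmap()` of `/dev/mem`. The register offsets are those of the AM335x GPIO module as documented in TI's AM335x Technical Reference Manual; the function names are hypothetical. We assume the pin is already muxed and configured, the GPIO bank is clocked (e.g. by the kernel driver), and the program runs as root. It is an illustrative sketch, not the code measured in this work, and it does not cover the kernel-space or PRU variants.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Standard I/O subsystem: legacy sysfs GPIO interface.
 * path is e.g. "/sys/class/gpio/gpio60/value"; the pin is assumed to be
 * exported and set to "out" already. Each edge costs open, write and
 * close system calls. Returns 0 on success, -1 on error. */
int sysfs_gpio_write(const char *path, int level)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, level ? "1" : "0", 1);
    close(fd);
    return n == 1 ? 0 : -1;
}

/* Low-level register access: AM335x GPIO module (offsets from the
 * AM335x Technical Reference Manual). */
#define AM335X_GPIO1_BASE 0x4804C000u
#define GPIO_MAP_SIZE     0x1000u
#define GPIO_OE           0x134u  /* output enable, bit = 0 means output */
#define GPIO_CLEARDATAOUT 0x190u  /* write 1 to a bit to drive that pin low */
#define GPIO_SETDATAOUT   0x194u  /* write 1 to a bit to drive that pin high */

/* Map one GPIO module via /dev/mem (needs root). NULL on error. */
volatile uint32_t *gpio_map(off_t base)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, GPIO_MAP_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, base);
    close(fd); /* the mapping remains valid after close() */
    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Configure one pin of the mapped bank as an output. */
void gpio_make_output(volatile uint32_t *gpio, unsigned bit)
{
    gpio[GPIO_OE / 4] &= ~(1u << bit);
}

/* SET/CLEARDATAOUT change only the bits written as 1, so no
 * read-modify-write of the data register is needed. */
void gpio_write(volatile uint32_t *gpio, unsigned bit, int level)
{
    if (level)
        gpio[GPIO_SETDATAOUT / 4] = 1u << bit;
    else
        gpio[GPIO_CLEARDATAOUT / 4] = 1u << bit;
}
```

For example, Linux GPIO number 60 is bit 28 of GPIO1 (60 = 1 × 32 + 28), so after `volatile uint32_t *g = gpio_map(AM335X_GPIO1_BASE);` the calls `gpio_make_output(g, 28)` and `gpio_write(g, 28, 1)` drive the pin high with a single store, whereas the sysfs path issues an open/write/close sequence of system calls for every edge.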
In the final paper and presentation we give a detailed description of the evaluation environment, the measurement results, and an analysis of the capabilities of Linux and the TI Sitara am335x SoC platform. Naturally, our results are specific to the platform used and may change even on the same SoC, for example with a different Linux kernel or simply with differently written test software; however, we think that the results clearly show the general performance and operational envelope of these programming models.