[Xenomai] Command line freeze during xeno-regression-test on omap4460
gilles.chanteperdrix at xenomai.org
Fri Apr 4 12:44:47 CEST 2014
On 04/04/2014 12:27 PM, Andreas Glatz wrote:
> Hi Gilles,
> I'm finally back to my original problem below:
> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:
>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe
>>> patch and
>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my
>>> Pandaboard ES
>>> (omap4460). The simple regression test, which only calls dd during
>>> switchtest, works fine. However the regression test with the linux
>>> project (ltp-full-20130904) scripts causes some sort of system lock
>>> After that I can only ctrl-c xeno-regression-test (i.e.
>>> switchtest), which,
>>> however, doesn't help to regain console access (neither over
>>> ethernet nor
>>> Here's what I did:
>>> -- Building --
>>> As recommended in the Xenomai 2.6 readme I followed the instructions
>>> in 
>>> to produce a kernel and filesystem. To get a xenomai kernel I had
>>> to do
>>> three things differently:
>>> *) I used: git checkout origin/v3.8.x -b tmp
>>> *) I applied ipipe-core-3.8.13-arm-3.patch from the xenomai-2.6 git
>>> tree as
>>> described in the Xenomai 2.6 readme
>>> *) I disabled KGDB and TIDSPBRIDGE since those produced compile
>>> errors (see
>>> config )
>>> After a while I obtained the following messages from dmesg  and
>>> from the
>>> command prompt:
>>> root at arm:~# cat /proc/version
>>> Linux version 3.8.13-x3.6 (aglatz at linuxvbox) (gcc version 4.7.3
>>> (prerelease) (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 -
>>> Linaro GCC
>>> 2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014
>>> -- Testing Linux --
>>> To see if everything works I downloaded and cross-compiled
>>> ltp-full-20130904  with the same toolchain and flags (-
>>> -mfpu=vfp3) as the xenomai libs and runtime. I started ltp with "./
>>> -p -l dohell-2014-01-06-1.log -S xenomai.skiplist" and after a
>>> while it
>>> finished with a few failed tests . The console access, however,
>>> -- Testing Xenomai --
>>> First I could successfully run the simple xenomai regression test:
>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /tmp
>>> 100" -t
>>> 2 which produced the output in  and the following additional
>>> with dmesg:
>>> [ 476.215057] Xenomai: RTDM: closing file descriptor 1.
>>> [ 477.434936] Xenomai: Posix: destroying semaphore f0069c00.
>>> [ 477.440887] Xenomai: Posix: destroying mutex f0069a00.
>>> [ 477.475372] xnheap: destroying shared heap 'rt_heap: heap' with
>>> bytes still in use.
>>> [ 479.008453] Xenomai: Switching rt_task to secondary mode after
>>> #0 from user-space at 0x9620 (pid 2145)
>>> [ 480.574462] Xenomai: watchdog triggered -- signaling runaway
>>> [ 480.582061] [sched_delayed] sched: RT throttling activated
>>> [ 557.336425] Xenomai: Posix: closing message queue descriptor 3.
>>> and "cat /proc/xenomai/*" produced .
>>> When I started the realistic xenomai regression test: xeno-
>>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2
>>> seemed fine at first - I could logon and start top to inspect the
>>> processes. However, the command line (over serial and ethernet)
>>> consistently freezes after a while (at different ltp tests though).
>>> First I
>>> thought it's the massive system load which doesn't leave CPU for the
>>> console... however ctrl-c of xeno-regression-test does not help to
>>> console access...
>> That is because killing xeno-regression-test does not kill all the
>> script children. So, basically, the load tasks are still running.
>> Also, what filesystem is /tmp? dohell is using dd to alternately
>> write to /tmp, then erase the file. If /tmp is some flash, it will
>> become slow after a while. If it is a tmpfs, it will eat RAM.
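
The question about /tmp above can be answered from the mount table. The following is only a minimal sketch, not part of the Xenomai testsuite: the function name fstype_of is hypothetical, and it assumes a Linux-style /proc/mounts (device, mount point, type in the first three fields) and a POSIX awk.

```shell
#!/bin/sh
# fstype_of PATH [MOUNTS_FILE]   (hypothetical helper, not a Xenomai tool)
# Print the filesystem type backing PATH by picking the longest mount
# point in MOUNTS_FILE (default: /proc/mounts) that is a prefix of PATH.
fstype_of() {
    path=$1
    mounts=${2:-/proc/mounts}
    awk -v p="$path" '
        {
            mp = $2
            # A mount point matches if it is "/", if PATH equals it, or
            # if PATH continues with a "/" right after it.
            if (mp == "/" || p == mp || index(p, mp "/") == 1) {
                # ">=" lets a later mount on the same point win.
                if (length(mp) >= best) { best = length(mp); type = $3 }
            }
        }
        END { if (type != "") print type }
    ' "$mounts"
}
```

Calling fstype_of /tmp on the target prints the type of the filesystem that dohell's dd writes to, for example a flash-backed type or tmpfs.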
> The described problem is _very_ reproducible on my PandaBoard ES
> (omap4460), where I boot from an SD card partition and the rootfs is
> also on the SD card partition. I tried it with several kernel versions
> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and xenomai from
> the git repos. Every time I start the regression test (see command
> above) the following happens: Everything works fine until the switch/
> latency tests start. Then I see that there is heavy access to the SD
> card, which is expected, as the status LED 2 is blinking. After ~5mins
> this status LED is constantly on. That's when I know that everything
> is over. On the console I can only execute commands that are already
> in RAM, such as the bash things like ps, mount, ... However, if I try
> a simple 'touch new' it blocks forever and I know that it blocks in
> the syscall where the file should be created, because I looked at it
> with strace. I tried several things: I turned off CONFIG_PM (which was
> on by default), turned on the MMC debugging, put extra printk's in the
> omap_hsmmc.c ISR. However, everything seems to work on this level: DMA
> requests are started and do finish, the ISR is called regularly (because
> at first I thought that Xenomai would starve it).
> Have you ever run Xenomai on this _specific_ board (since everything
> is running smoothly on the omap5 board)?
> Any more ideas how to debug it?
> Currently, I'm compiling the ipipe trace in the hope that it would tell me
> something useful...
> Oh yes, the best bit is that the regression test works perfectly fine
> if I boot from an external USB HD _AND_ unmount (!) all MMC partitions.
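
A task that blocks forever inside a file-creating syscall, like the 'touch new' described above, normally shows up in uninterruptible sleep (state D). As a rough sketch only, assuming a procps-style ps that accepts -eo pid=,stat=,comm= (the function names filter_blocked and list_blocked are hypothetical), such tasks could be listed like this:

```shell
#!/bin/sh
# filter_blocked: read "PID STAT COMMAND" lines on stdin and print those
# whose state starts with D (uninterruptible sleep, usually I/O wait).
filter_blocked() {
    awk '$2 ~ /^D/ { print }'
}

# list_blocked: apply the filter to the live process table.
# Assumes a procps-style ps; the "=" suffixes suppress the header line.
list_blocked() {
    ps -eo pid=,stat=,comm= | filter_blocked
}
```

Running list_blocked from a shell that is already in RAM, once the LED stays on, would show which tasks are stuck; it does not say why they are stuck.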
So, the MMC driver has a problem. Have you tried:
- running the exact same kernel configuration only with CONFIG_XENOMAI
disabled (and stress with dohell)
- then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.
Also, do you have this patch in the tree you tried?
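
When comparing the configurations listed above, it helps to confirm what each build really has set. A minimal sketch, assuming the usual kernel .config syntax ("CONFIG_FOO=y", or "# CONFIG_FOO is not set" when disabled); the function name config_state and the file path in the usage note are hypothetical:

```shell
#!/bin/sh
# config_state OPTION CONFIG_FILE   (hypothetical helper)
# Print the value of OPTION in a kernel .config file (y, m, or a literal
# value), or "n" if the option is unset or absent.
config_state() {
    opt=$1
    cfg=$2
    if grep -q "^${opt}=" "$cfg"; then
        # Strip "OPTION=" and print what remains; first match only.
        sed -n "s/^${opt}=//p" "$cfg" | head -n 1
    else
        echo n
    fi
}
```

For example, config_state CONFIG_XENOMAI .config and config_state CONFIG_IPIPE .config, run in each build directory, show which of the two test kernels a given tree corresponds to.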