[Xenomai] Command line freeze during xeno-regression-test on omap4460

Gilles Chanteperdrix gilles.chanteperdrix at xenomai.org
Sun Apr 6 16:44:21 CEST 2014


On 04/06/2014 01:21 PM, Andreas Glatz wrote:
> 
> On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote:
> 
>> On 04/04/2014 12:27 PM, Andreas Glatz wrote:
>>> Hi Gilles,
>>>
>>> I'm finally back to my original problem below:
>>>
>>> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:
>>>
>>>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>>>> Hi,
>>>>>
>>>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe
>>>>> patch and
>>>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my
>>>>> Pandaboard ES
>>>>> (omap4460). The simple regression test, which only calls dd during
>>>>> the
>>>>> switchtest, works fine. However the regression test with the linux
>>>>> test
>>>>> project (ltp-full-20130904) scripts causes some sort of system lock
>>>>> up.
>>>>> After that I only can ctrl-c xeno-regression-test (i.e.
>>>>> switchtest), which,
>>>>> however, doesn't help to regain console access (neither over
>>>>> ethernet nor
>>>>> serial).
>>>>>
>>>>> Here's what I did:
>>>>>
>>>>> -- Building --
>>>>> As recommended in the Xenomai 2.6 readme I followed the instructions
>>>>> in [1]
>>>>> to produce a kernel and filesystem. To get a xenomai kernel I had
>>>>> to do
>>>>> three things differently:
>>>>>
>>>>> *) I used: git checkout origin/v3.8.x -b tmp
>>>>> *) I applied ipipe-core-3.8.13-arm-3.patch from the xenomai-2.6 git
>>>>> tree as
>>>>> described in the Xenomai 2.6 readme
>>>>> *) I disabled KGDB and TIDSPBRIDGE since those produced compile
>>>>> errors (see
>>>>> config [2])
>>>>>
>>>>> After a while I obtained the following messages from dmesg [3] and
>>>>> from the
>>>>> command prompt:
>>>>>
>>>>> root at arm:~# cat /proc/version
>>>>> Linux version 3.8.13-x3.6 (aglatz at linuxvbox) (gcc version 4.7.3
>>>>> 20130328
>>>>> (prerelease) (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 -
>>>>> Linaro GCC
>>>>> 2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014
>>>>>
>>>>> -- Testing Linux --
>>>>> To see if everything works I downloaded and cross-compiled
>>>>> ltp-full-20130904 [4] with the same toolchain and flags
>>>>> (-march=armv7-a -mfpu=vfp3) as the xenomai libs and runtime. I
>>>>> started ltp with
>>>>> "./runltp -p -l dohell-2014-01-06-1.log -S xenomai.skiplist"
>>>>> and after a while it finished with a few failed tests [5]. The
>>>>> console access, however, worked fine.
>>>>>
>>>>> -- Testing Xenomai --
>>>>> First I could successfully run the simple xenomai regression test:
>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /tmp 100" -t 2
>>>>> which produced the output in [6] and the following additional
>>>>> messages with dmesg:
>>>>>
>>>>> [  476.215057] Xenomai: RTDM: closing file descriptor 1.
>>>>> [  477.434936] Xenomai: Posix: destroying semaphore f0069c00.
>>>>> [  477.440887] Xenomai: Posix: destroying mutex f0069a00.
>>>>> [  477.475372] xnheap: destroying shared heap 'rt_heap: heap' with
>>>>> 16384
>>>>> bytes still in use.
>>>>> [  479.008453] Xenomai: Switching rt_task to secondary mode after
>>>>> exception
>>>>> #0 from user-space at 0x9620 (pid 2145)
>>>>> [  480.574462] Xenomai: watchdog triggered -- signaling runaway
>>>>> thread
>>>>> 'rt_task'
>>>>> [  480.582061] [sched_delayed] sched: RT throttling activated
>>>>> [  557.336425] Xenomai: Posix: closing message queue descriptor 3.
>>>>>
>>>>> and  "cat /proc/xenomai/*" produced [7].
>>>>>
>>>>> When I started the realistic xenomai regression test:
>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2
>>>>> everything
>>>>> seemed fine at first - I could logon and start top to inspect the
>>>>> running
>>>>> processes. However, the command line (over serial and ethernet)
>>>>> consistently freezes after a while (at different ltp tests though).
>>>>> First I
>>>>> thought it's the massive system load which doesn't leave CPU for  
>>>>> the
>>>>> console... however ctrl-c of xeno-regression-test does not help to
>>>>> regain
>>>>> console access...
>>>>
>>>> That is because killing xeno-regression-test does not kill all the
>>>> script children. So, basically, the load tasks are still running.
>>>> Also, what filesystem is /tmp? dohell is using dd to alternately
>>>> write to /tmp, then erase the file. If /tmp is some flash, it will
>>>> become slow after a while. If it is a tmpfs, it will eat RAM.
>>>>
>>>>
>>>
>>> The described problem is _very_ reproducible on my PandaBoard ES
>>> (omap4460), where I boot from an SD card partition and the rootfs is
>>> also on the SD card partition. I tried it with several kernel  
>>> versions
>>> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and xenomai from
>>> the git repos. Every time I start the regression test (see command
>>> above) the following happens: Everything works fine until the
>>> switch/latency tests start. Then I see that there is heavy access to the SD
>>> card, which is expected, as the status LED 2 is blinking. After  
>>> ~5mins
>>> this status LED is constantly on. That's when I know that everything
>>> is over. On the console I can only execute commands that are already
>>> in RAM, such as the bash things like ps, mount, ... However, if I try
>>> a simple 'touch new' it blocks forever and I know that it blocks in
>>> the syscall where the file should be created, because I looked at it
>>> with strace. I tried several things: I turned off CONFIG_PM (which  
>>> was
>>> on by default), turned on the MMC debugging, put extra printk's in the
>>> omap_hsmmc.c ISR. However, everything seems to work on this level:  
>>> DMA
>>> requests are started and do finish, the ISR is called regularly
>>> (because at first I thought that Xenomai would starve it).
>>>
>>> Have you ever run Xenomai on this _specific_ board (since  
>>> everything
>>> is running smoothly on the omap5 board)?
>>> Any more ideas how to debug it?
>>>
>>> Currently, I'm compiling the ipipe trace in hope that it would tell  
>>> me
>>> something useful...
>>>
>>> Oh yes, the best bit is that the regression test works perfectly fine
>>> if I boot from an external USB HD _AND_ unmount (!) all MMC  
>>> partitions.
>>
>> So, the MMC driver has a problem. Have you tried:
>> - running the exact same kernel configuration only with CONFIG_XENOMAI
>> disabled (and stress with dohell)
>> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.
>>
>> Also, do you have this patch in the tree you tried?
>> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88
>>
> 
> First I mounted tmpfs on /tmp so I don't wear out the SD card too much:
> mount -t tmpfs -osize=192M tmpfs /tmp
> 
> Then I used the following line to start the test (substitute MYTEST  
> below with the following line):
> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
> 
> Note: I always monitored the test over wifi with 'top' so I also had  
> some network load...
> 
> I got the following results with the 3.10.34 kernel, which includes  
> everything up to the current ipipe-3.10 tag (it also included the  
> patch you mentioned):
> 
> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see  
> description above); OK if booted from ext USB HD _AND_ no mmc  
> partitions mounted
> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2  
> constantly on as described above)
> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp test  
> log)
> 
> Anything else I should try?

Is the LTP test that is running when the failure happens always the same?
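
One way to check this, as a rough sketch only: if the LTP output is captured to a file that is not on the MMC (for instance on the tmpfs mounted on /tmp), the last test that started before the freeze can be read back from it. The helper name last_started_tag and the capture file path are hypothetical, and the sketch assumes the captured output contains lines of the form "tag=<name> stime=<seconds>" for each test that starts; adjust the pattern if your output looks different.

```shell
# Print the tag of the last test that started, read from captured LTP output.
# Assumption: each started test leaves a line beginning with "tag=<name>".
# Both the function name and the file layout are illustrative, not part of LTP.
last_started_tag() {
    awk '/^tag=/ { split($1, kv, "="); tag = kv[2] }
         END     { if (tag != "") print tag }' "$1"
}
```

Running something like "last_started_tag /tmp/ltp-output.log" after each freeze and comparing the results across a few runs would show whether the hang is tied to one particular test.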

-- 
                                                                Gilles.



