[Xenomai] Command line freeze during xeno-regression-test on omap4460

Gilles Chanteperdrix gilles.chanteperdrix at xenomai.org
Sun Apr 6 17:28:50 CEST 2014


On 04/06/2014 05:22 PM, Andreas Glatz wrote:
> 
> On 6 Apr 2014, at 15:44, Gilles Chanteperdrix wrote:
> 
>> On 04/06/2014 01:21 PM, Andreas Glatz wrote:
>>>
>>> On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote:
>>>
>>>> On 04/04/2014 12:27 PM, Andreas Glatz wrote:
>>>>> Hi Gilles,
>>>>>
>>>>> I'm finally back to my original problem below:
>>>>>
>>>>> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:
>>>>>
>>>>>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I managed to produce a kernel (v3.8.13) with the xenomai 2.6.3 ipipe
>>>>>>> patch and a rootfs (debian wheezy) with the xenomai 2.6.3 libraries
>>>>>>> for my Pandaboard ES (omap4460). The simple regression test, which
>>>>>>> only calls dd during the switchtest, works fine. However, the
>>>>>>> regression test with the linux test project (ltp-full-20130904)
>>>>>>> scripts causes some sort of system lock-up. After that I can only
>>>>>>> ctrl-c xeno-regression-test (i.e. switchtest), which, however,
>>>>>>> doesn't help to regain console access (neither over ethernet nor
>>>>>>> serial).
>>>>>>>
>>>>>>> Here's what I did:
>>>>>>>
>>>>>>> -- Building --
>>>>>>> As recommended in the Xenomai 2.6 readme, I followed the instructions
>>>>>>> in [1] to produce a kernel and filesystem. To get a xenomai kernel I
>>>>>>> had to do three things differently:
>>>>>>>
>>>>>>> *) I used: git checkout origin/v3.8.x -b tmp
>>>>>>> *) I applied ipipe-core-3.8.13-arm-3.patch from the xenomai-2.6 git
>>>>>>>    tree as described in the Xenomai 2.6 readme
>>>>>>> *) I disabled KGDB and TIDSPBRIDGE since those produced compile
>>>>>>>    errors (see config [2])
>>>>>>>
>>>>>>> After a while I obtained the following messages from dmesg [3] and
>>>>>>> from the command prompt:
>>>>>>>
>>>>>>> root@arm:~# cat /proc/version
>>>>>>> Linux version 3.8.13-x3.6 (aglatz@linuxvbox) (gcc version 4.7.3 20130328 (prerelease) (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014
>>>>>>>
>>>>>>> -- Testing Linux --
>>>>>>> To see if everything works I downloaded and cross-compiled
>>>>>>> ltp-full-20130904 [4] with the same toolchain and flags
>>>>>>> (-march=armv7-a -mfpu=vfp3) as the xenomai libs and runtime. I
>>>>>>> started ltp with
>>>>>>> "./runltp -p -l dohell-2014-01-06-1.log -S xenomai.skiplist"
>>>>>>> and after a while it finished with a few failed tests [5]. The
>>>>>>> console access, however, worked fine.
>>>>>>>
>>>>>>> -- Testing Xenomai --
>>>>>>> First I could successfully run the simple xenomai regression test:
>>>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /tmp 100" -t 2
>>>>>>> which produced the output in [6] and the following additional
>>>>>>> messages with dmesg:
>>>>>>>
>>>>>>> [  476.215057] Xenomai: RTDM: closing file descriptor 1.
>>>>>>> [  477.434936] Xenomai: Posix: destroying semaphore f0069c00.
>>>>>>> [  477.440887] Xenomai: Posix: destroying mutex f0069a00.
>>>>>>> [  477.475372] xnheap: destroying shared heap 'rt_heap: heap' with 16384 bytes still in use.
>>>>>>> [  479.008453] Xenomai: Switching rt_task to secondary mode after exception #0 from user-space at 0x9620 (pid 2145)
>>>>>>> [  480.574462] Xenomai: watchdog triggered -- signaling runaway thread 'rt_task'
>>>>>>> [  480.582061] [sched_delayed] sched: RT throttling activated
>>>>>>> [  557.336425] Xenomai: Posix: closing message queue descriptor 3.
>>>>>>>
>>>>>>> and  "cat /proc/xenomai/*" produced [7].
>>>>>>>
>>>>>>> When I started the realistic xenomai regression test:
>>>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2
>>>>>>> everything seemed fine at first - I could log on and start top to
>>>>>>> inspect the running processes. However, the command line (over
>>>>>>> serial and ethernet) consistently freezes after a while (at
>>>>>>> different ltp tests though). First I thought it's the massive
>>>>>>> system load which doesn't leave CPU for the console... however
>>>>>>> ctrl-c of xeno-regression-test does not help to regain console
>>>>>>> access...
>>>>>>
>>>>>> That is because killing xeno-regression-test does not kill all the
>>>>>> script's children. So, basically, the load tasks are still running.
>>>>>> Also, what filesystem is /tmp? dohell is using dd to alternately
>>>>>> write to /tmp, then erase the file. If /tmp is some flash, it will
>>>>>> become slow after a while. If it is a tmpfs, it will eat RAM.
>>>>>>
>>>>>>
>>>>>
>>>>> The described problem is _very_ reproducible on my PandaBoard ES
>>>>> (omap4460), where I boot from an SD card partition and the rootfs is
>>>>> also on the SD card partition. I tried it with several kernel
>>>>> versions (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and
>>>>> xenomai from the git repos. Every time I start the regression test
>>>>> (see command above) the following happens: Everything works fine
>>>>> until the switch/latency tests start. Then I see that there is heavy
>>>>> access to the SD card, which is expected, as the status LED 2 is
>>>>> blinking. After ~5 mins this status LED is constantly on. That's when
>>>>> I know that everything is over. On the console I can only execute
>>>>> commands that are already in RAM, such as the bash things like ps,
>>>>> mount, ... However, if I try a simple 'touch new' it blocks forever,
>>>>> and I know that it blocks in the syscall where the file should be
>>>>> created, because I looked at it with strace. I tried several things:
>>>>> I turned off CONFIG_PM (which was on by default), turned on the MMC
>>>>> debugging, and put extra printk's in the omap_hsmmc.c ISR. However,
>>>>> everything seems to work on this level: DMA requests are started and
>>>>> do finish, and the ISR is called regularly (because at first I
>>>>> thought that Xenomai would starve it).
>>>>>
>>>>> Have you ever run Xenomai on this _specific_ board (since
>>>>> everything is running smoothly on the omap5 board)?
>>>>> Any more ideas how to debug it?
>>>>>
>>>>> Currently, I'm compiling the ipipe trace in the hope that it will
>>>>> tell me something useful...
>>>>>
>>>>> Oh yes, the best bit is that the regression test works perfectly fine
>>>>> if I boot from an external USB HD _AND_ unmount (!) all MMC
>>>>> partitions.
>>>>
>>>> So, the MMC driver has a problem. Have you tried:
>>>> - running the exact same kernel configuration only with CONFIG_XENOMAI
>>>>   disabled (and stress with dohell)
>>>> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.
>>>>
>>>> Also, do you have this patch in the tree you tried?
>>>> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88
>>>>
>>>
>>> First I mounted tmpfs on /tmp so I don't wear out the SD card too much:
>>> mount -t tmpfs -osize=192M tmpfs /tmp
>>>
>>> Then I used the following line to start the test (substitute MYTEST
>>> below with the following line):
>>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>>>
>>> Note: I always monitored the test over wifi with 'top' so I also had
>>> some network load...
>>>
>>> I got the following results with the 3.10.34 kernel, which includes
>>> everything up to the current ipipe-3.10 tag (it also included the
>>> patch you mentioned):
>>>
>>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see
>>> description above); OK if booted from ext USB HD _AND_ no mmc
>>> partitions mounted
>>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2
>>> constantly on as described above)
>>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp test
>>> log)
>>>
>>> Anything else I should try?
>>
>> Is the current LTP test when the failure happens always the same?
>>
>>
> 
> I went through all the logfiles on my pandaboard and identified the
> last tests that ltp logged before the error occurred (I'm assuming
> that ltp writes to the file in /opt/ltp/results after completing the
> test, since there is the PASS/FAIL note as well, which logically should
> only be available after completing the test):
> 
> test                 count
> ==========================
> rt_sigqueueinfo01        1
> clock_nanosleep01       10
> munmap02                 1
> semget06                 1
> epoll_create1_01         5
> splice01                 1
> clock_getres01           1
> rename13                 1
> BindMounts               1
> utimes01                 1
> 
> So it seems that the test after 'clock_nanosleep01', which is 'clone01'
> according to the LTP log file I sent you, is the prime hotspot of
> failure, followed by 'epoll01', which comes after 'epoll_create1_01'.
> 
> I'm using the standard LTP version 'ltp-full-20130904', which I
> downloaded and compiled on the target with gcc 4.6.3 (default debian
> wheezy).

Ok. I am not sure it is meaningful. Anyway, the only difference between
CONFIG_XENOMAI + CONFIG_IPIPE and CONFIG_IPIPE alone, provided that you
are not running any program using Xenomai, is the host tick emulation.

So, could you please try to turn off
CONFIG_NO_HZ_IDLE
CONFIG_NO_HZ
CONFIG_HIGH_RES_TIMERS

And see if it works better?
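
To confirm which of these three options a given kernel was actually
built with, something like the following sketch could be used. It
assumes a plain-text kernel configuration file (for instance the .config
in the build tree); check_tick_options is a hypothetical helper name
chosen for illustration, not something shipped with Xenomai or the kernel.

```shell
#!/bin/sh
# Hypothetical helper: report the state of the three tick-related
# options in a plain-text kernel configuration file.
# Usage: check_tick_options /path/to/.config
check_tick_options() {
    cfg="$1"
    for opt in CONFIG_NO_HZ_IDLE CONFIG_NO_HZ CONFIG_HIGH_RES_TIMERS; do
        # An enabled option appears as "CONFIG_FOO=y"; a disabled one
        # is either absent or listed as "# CONFIG_FOO is not set".
        if grep -q "^${opt}=y\$" "$cfg"; then
            echo "$opt: enabled"
        else
            echo "$opt: disabled"
        fi
    done
}
```

For example, run "check_tick_options .config" in the kernel build
directory. If the running kernel was built with CONFIG_IKCONFIG_PROC,
"zcat /proc/config.gz > running.config" gives a file to check instead.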

-- 
                                                                Gilles.



