[Xenomai] Kernel freezes in __ipipe_sync_stage

Philippe Gerum rpm at xenomai.org
Wed Jun 25 10:39:31 CEST 2014


On 06/25/2014 09:50 AM, Marco Tessore wrote:
> Il 24/06/2014 19:10, Philippe Gerum ha scritto:
>> On 06/24/2014 06:41 PM, Marco Tessore wrote:
>>> Hi,
>>>
>>> Il 20/06/2014 13:52, Gilles Chanteperdrix ha scritto:
>>>> On 06/20/2014 11:11 AM, Marco Tessore wrote:
>>>>> The kernel is version 2.6.31 for ARM architecture - specifically a
>>>> Do you have the same problem with a recent I-pipe patch, like one for
>>>> a 3.8 or 3.10 kernel?
>>>>
>>>
>>> I managed to do some tests on a 3.10 kernel, but on another board with
>>> an imx28 CPU; that kernel freezes too,
>>> but I haven't debugged it with the JTAG debugger yet.
>>>
>>> I do, however, have some information on the original problem, which is
>>> the one that worries me more:
>>>
>>> In summary:
>>> I have a board based on imx25, with kernel 2.6.31, Xenomai 2.5.6 and
>>> ipipe patch 1.16-02.
>>>
>>> Rarely, but often enough to be a problem, the kernel freezes at boot.
>>> Thanks to a JTAG debugger I'm able to observe the kernel in the
>>> following situation:
>>> I'm in an infinite loop with the following stack trace:
>>> __ipipe_set_irqpending
>>> xnintr_host_tick (__ipipe_propagate_irq)
>>> xnintr_clock_handler
>>> __ipipe_sync_stage    <- (1)
>>> ipipe_suspend_domain
>>> __ipipe_walk_pipeline
>>> __ipipe_restore_pipeline_head
>>> xnarch_next_tick_shot
>>> clockevents_program_event
>>> tick_dev_program_event
>>> hrtimer_interrupt
>>> mxc_interrupt
>>> handle_IRQ_event
>>> handle_level_irq
>>> asm_do_IRQ
>>> __ipipe_sync_stage <- (2)
>>> ipipe_suspend_domain
>>> __ipipe_walk_pipeline
>>> __ipipe_restore_pipeline_head
>>> xnpod_enable_timesource
>>> xnpod_init
>>> __native_skin_init
>>> ...
>>> ...
>>>
>>> Specifically, the first call to __ipipe_sync_stage, the one marked
>>> (2), is working on a stage I cannot identify; let's call it stage S1
>>> for convenience. I think it is the Linux secondary domain, but I'm not
>>> sure. That call invokes the interrupt handler of the system timer.
>>> Further up the stack trace there is a nested call to __ipipe_sync_stage,
>>> marked (1), which works on another stage, call it S2 for convenience.
>>> This nested call in turn invokes a handler for the timer irq, which at
>>> some point calls __ipipe_propagate_irq, and that raises the pending
>>> flags for stage S1 again, so the outer call to __ipipe_sync_stage (2)
>>> never gets out of its while loop.
>>>
>>> I should add that I do not see any hardware interrupt for the timer in
>>> the function __ipipe_grab_IRQ.
>>> I have no idea how the cycle is triggered, but once the kernel is locked
>>> up, it sits in the purely software infinite loop described above.
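
For reference, here is a minimal user-space model of the livelock described
above. It is only a sketch of the drain-loop behaviour, not the actual
I-pipe code; the irq number and the iteration cap are made up for the demo:

/* Minimal user-space model (not the actual I-pipe code) of the
 * __ipipe_sync_stage drain loop: the stage's pending mask is scanned,
 * each pending bit is cleared and its handler run. If a handler
 * (directly or through a nested sync on another stage) re-pends the
 * same irq for the same stage, the loop never drains. */
#include <stdio.h>

#define TIMER_IRQ      26         /* made up for the demo */
#define DEMO_MAX_SPINS 5          /* cap for the demo; the real loop has none */

static unsigned long pending;     /* per-stage pending-irq bitmask */

static void set_irqpending(int irq)
{
	pending |= 1UL << irq;    /* roughly what __ipipe_set_irqpending does */
}

/* Timer handler that ends up propagating the irq back to this stage,
 * as xnintr_host_tick() -> __ipipe_propagate_irq() does in the trace. */
static void timer_handler(int irq)
{
	set_irqpending(irq);
}

static void sync_stage(void)
{
	int spins = 0;

	while (pending) {                 /* drain loop */
		int irq = __builtin_ctzl(pending);

		pending &= ~(1UL << irq); /* ack the pending bit ... */
		timer_handler(irq);       /* ... but the handler re-pends it */

		if (++spins >= DEMO_MAX_SPINS) {
			printf("still pending after %d iterations: livelock\n",
			       spins);
			return;
		}
	}
}

int main(void)
{
	set_irqpending(TIMER_IRQ);
	sync_stage();
	return 0;
}

The real __ipipe_sync_stage has no such cap, which is why the re-pended
timer irq keeps it spinning forever.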
>>>
>>>
>>> In the hope that you can help me understand what is going on,
>>> I would like to attempt a patch along these lines:
>>> - Record, for each nesting level of __ipipe_sync_stage, the irq number
>>> currently being handled and on behalf of which stage.
>>> - Patch __ipipe_set_irqpending so that it does not set the flags for
>>> the pair (irq, stage) if that pair is already present at some level of
>>> the current stack trace; that is,
>>> - if __ipipe_sync_stage is executing the handler for a given irq on a
>>> stage, having already cleared the corresponding flags in irqpend_himask
>>> and irqpend_lomask, it does not expect the handler to raise the same
>>> flag for the same stage again.
>>>
>>> What do you think about this?
>>>
>>> Thank you very much for any kind of advice you could give me
>>>
>>
>> You mentioned random lockups during boot. Does your board ever lock up
>> when passing xeno_hal.disable=1 on the kernel command line?
>>
> Yes, I mentioned random lockups, but the kernel always enters the
> infinite loop described above.
> Following your suggestion I tried to pass the parameter xeno_hal.disable=1,
> but the kernel said
> "Unknown boot option `xeno_hal.disable=1': ignoring"
>

This is because you are running an outdated Xenomai 2.5.x release. A 
workaround is to build all the Xenomai skins as kernel modules (native, 
posix, vxworks, etc.), refraining from loading them during the boot 
process.
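
For instance, with the skin options of a Xenomai 2.x kernel tree (the
exact CONFIG_XENO_* symbol names may differ in your 2.5.x tree), the
kernel .config would carry something like:

CONFIG_XENO_SKIN_NATIVE=m
CONFIG_XENO_SKIN_POSIX=m
CONFIG_XENO_SKIN_VXWORKS=m
CONFIG_XENO_SKIN_RTDM=m

Then load the skins by hand only once the board has fully booted
(e.g. modprobe xeno_native), so that the nucleus does not take over the
hardware timer during the boot sequence.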

> What is this option supposed to do, anyway? If it disables the HAL,
> doesn't that inhibit Xenomai's real-time services?
>

This is exactly what we want. When the real-time services commence, 
control of the hardware timer is handed over to Xenomai, which enables 
pipelining of the clock source events to the co-kernel. We need to know 
whether this path is involved.

> What about the patch described above that I would like to apply? That is,
> don't let the interrupt handlers called from __ipipe_sync_stage raise a
> pair (stage, irq) that is already being handled in the current stack?
>

This won't work; it breaks an aspect of the pipeline core logic. This 
would be papering over the issue, not fixing it, opening a can of worms 
down the road. We are not chasing a bug in the core logic at this point; 
we are more likely chasing a bug in the SoC-specific code which binds 
the hw timer to the pipeline.

The first step is to determine whether the system experiences an IRQ 
storm of some sort from the timer chip, and if so, why. By focusing on 
the IRQ replay loop, which basically resyncs the current interrupt state 
with the past events logged, you may be looking at rays from an ancient 
sun.
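
As a rough illustration, a throwaway counter in the i.MX timer handler
would make such a storm visible. This is not existing driver code; the
names and placement are only a suggestion:

#include <linux/jiffies.h>
#include <linux/kernel.h>

/* Throwaway instrumentation: count the timer interrupts actually
 * delivered and log the rate once per second, so a storm from the
 * i.MX timer block shows up on the console. */
static unsigned long mxc_irq_count;
static unsigned long mxc_irq_stamp;

static inline void mxc_irq_storm_check(void)
{
	if (!mxc_irq_stamp)
		mxc_irq_stamp = jiffies;

	mxc_irq_count++;

	if (time_after(jiffies, mxc_irq_stamp + HZ)) {
		printk(KERN_INFO "mxc timer: %lu irqs over the last second\n",
		       mxc_irq_count);
		mxc_irq_count = 0;
		mxc_irq_stamp = jiffies;
	}
}

/* Call mxc_irq_storm_check() at the top of mxc_interrupt(). */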

> Thank you
> Marco Tessore
>
>


-- 
Philippe.



