[Xenomai] "RT throttling" issue

JK.Behnke at web.de JK.Behnke at web.de
Fri Dec 4 17:24:28 CET 2015


Hello Gilles,

sorry for bothering you again.

> >
> > Using the I-pipe tracer, you may be able to understand what happens.
> Should I patch my Linux kernel so that xntrace_user_freeze is called,
> when Linux scheduler sends "RT throttling activated" message?

I now have been able to capture an I-pipe frozen file, when the
"RT throttling" condition occurs.
I have done this by adding the line
  ipipe_trace_freeze(1);
right before
  printk_sched("sched: RT Throttling activated\n");
inside "kernel/sched/rt.c".
I have set back_trace_points to 10000 to get about 1 second of
backtrace history.

Here is a fragment of my huge frozen file which contains
the last 1.6 seconds before the freeze.

//================== frozen file start =======================
 +----- Hard IRQs ('|': locked)
 |+-- Xenomai
 ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
 |||              +---------- Delay flag ('+': > 1 us, '!': > 10 us)
 |||              |       +- NMI noise ('N')
 |||              |       |
      Type      User Val.   Time    Delay  Function (Parent)
:|  # [ 4936] -<?>-    0 -1613869      0.852  xnpod_resume_thread+0x4e (xnthread_timeout_handler+0x1e)
:|  # event   tick at -1611814-1613868+   3.538  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
:|  # [ 5066] prc6hmi -1 -1613864+   1.964  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
:|  # [ 4936] -<?>-    0 -1613862+   3.127  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
:|  # [ 4936] -<?>-    0 -1613859+   1.192  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
:|  # [ 5066] prc6hmi -1 -1613858!  27.286  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
:|  *+[ 4936] -<?>-    0 -1613831+   1.824  xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
:|  # [  456] gatekee -1 -1613829+   4.200  __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
:|  # [ 4936] -<?>-    0 -1613825      0.952  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
:|  # [  456] gatekee -1 -1613824!  13.001  __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
:|  *+[ 4936] -<?>-    0 -1613811      0.882  xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
:|  # [  456] gatekee -1 -1613810+   2.506  __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
:|  # [ 4936] -<?>-    0 -1613807      0.952  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
:|  # [  456] gatekee -1 -1613806!  11.588  __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
:|  *+[ 4936] -<?>-    0 -1613795      0.721  xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
:|  # [  456] gatekee -1 -1613794+   6.395  __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
:|  # [ 4936] -<?>-    0 -1613788      0.922  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
:|  # [  456] gatekee -1 -1613787!  10.124  __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
:|  *+[ 4936] -<?>-    0 -1613777      0.711  xnpod_resume_thread+0x4e (gatekeeper_thread+0xf1)
:|  # [  456] gatekee -1 -1613776+   2.686  __xnpod_schedule+0x77 (xnpod_schedule_handler+0x29)
:|  # [ 4936] -<?>-    0 -1613773      0.892  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
:|  # [  456] gatekee -1 -1613772! 1977.417  __xnpod_schedule+0x45d (xnpod_schedule_handler+0x29)
:|  # [ 4956] -<?>-   50 -1611795+   1.794  xnpod_resume_thread+0x4e (xnthread_timeout_handler+0x1e)
...
...
...
:|  # event   tick at -5129 -5382+   4.771  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
:|  # [ 4961] -<?>-   -1 -5377+   3.448  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
:|  # [ 4936] -<?>-    0 -5374+   7.027  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
:|  # [ 4936] -<?>-    0 -5367+   1.172  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x171)
:|  # [ 4961] -<?>-   -1 -5366! 253.408  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
:|  # [ 4957] -<?>-   50 -5112+   1.884  xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
:|  # event   tick at -4525 -5110+   3.548  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
:|  # [ 4961] -<?>-   -1 -5107+   2.476  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
:|  # [ 4957] -<?>-   50 -5104!  59.785  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
:|  # [ 4957] -<?>-   50 -5045+   1.794  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
:|  # [ 4961] -<?>-   -1 -5043! 538.795  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
:|  # [ 5012] -<?>-   99 -4504+   2.195  xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
:|  # event   tick at -129  -4502+   4.180  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
:|  # [ 4961] -<?>-   -1 -4498+   2.596  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
:|  # [ 5012] -<?>-   99 -4495! 677.824  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
:|  # [ 5012] -<?>-   99 -3817+   5.844  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
:|  # [ 4961] -<?>-   -1 -3811! 3709.015  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
:|  # [ 4957] -<?>-   50  -102+   3.147  xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
:|  # event   tick at 474     -99+   4.210  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
:|  # [ 4961] -<?>-   -1   -95+   3.388  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
:|  # [ 4957] -<?>-   50   -92!  55.645  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
:|  # [ 4957] -<?>-   50   -36+   2.065  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
:|  # [ 4961] -<?>-   -1   -34!  34.474  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
<    #freeze  0x00000001     0    489.264  update_curr_rt+0x10f (task_tick_rt+0x15)
 |  # [ 5012] -<?>-   99   489      1.724  xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
 |  # event   tick at 4870    490      3.147  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
 |  # [    0] -<?>-   -1   494      2.866  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
 |  # [ 5012] -<?>-   99   497    662.948  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
 |  # [ 5012] -<?>-   99  1159      3.608  __xnpod_schedule+0x77 (xnpod_suspend_thread+0x197)
 |  # [    0] -<?>-   -1  1163    3720.243  __xnpod_schedule+0x45d (xnintr_clock_handler+0x105)
 |  # [ 4957] -<?>-   50  4883      2.085  xnpod_resume_thread+0x4e (xnthread_periodic_handler+0x28)
 |  # event   tick at 5474   4885      3.368  xntimer_next_local_shot+0xb4 (xntimer_tick_aperiodic+0x1a0)
 |  # [    0] -<?>-   -1  4889      1.233  __xnpod_schedule+0x77 (xnintr_clock_handler+0x105)
 |  # [ 4957] -<?>-   50  4890      0.000  __xnpod_schedule+0x45d (xnpod_suspend_thread+0x197)
//================== frozen file end =======================

There is no process utilizing the CPU  for > 950000 us, which
leads me to the conclusion, that the problem is not caused by
an infinite loop.
However I observe delay values of up to 3900 us multiple times which
makes me think that there is a high CPU utilization which in the
worst case exceeds the limit of 950000 us per 1 second interval.
Probably I would have to sum up all delay values of processes in 
secondary mode within 1 second time interval and see if it exceeds
950000 us.

Now I have the following questions
1. What is meant by "domain stalled", "current" and "current+stalled"?
2. What is that process having ID "[   0]"
3. How can I tell processes running in secondary mode from others?
   (just by looking at the priority value?)

Any hint on interpreting my frozen file is appreciated.

Thanks
Jochen



More information about the Xenomai mailing list