FW: Xenomai with isolcpus and workqueue task

Alexander Frolov frolov at nicevt.ru
Mon Jul 13 12:26:46 CEST 2020


-----Original Message-----
>> From: Lange Norbert
>> Sent: Montag, 13. Juli 2020 10:34
>> To: Alexander Frolov <frolov at nicevt.ru>
>> Subject: RE: Xenomai with isolcpus and workqueue task
>>
>>
>>
>>> -----Original Message-----
>>> From: Xenomai <xenomai-bounces at xenomai.org> On Behalf Of Alexander
>>> Frolov via Xenomai
>>> Sent: Samstag, 11. Juli 2020 16:26
>>> To: xenomai at xenomai.org
>>> Subject: Xenomai with isolcpus and workqueue task
>>>
>>> Hi all!
>>>
>>> I am using Xenomai 3.1 with the 4.19.124 I-pipe patch on an SMP
>>> motherboard. For my RT task I allocate a few CPU cores with the
>>> isolcpus option. However, large latency spikes are observed, caused by
>>> igb watchdog activity (I am using the common igb driver, not rt_igb).
>>>
>>> Looking into the igb sources, I found that a workqueue is used for some
>>> of its tasks (AFAIU, for link status monitoring):
>>>
>>> From igb_main.c:
>>> ...
>>>     INIT_WORK(&adapter->reset_task, igb_reset_task);
>>>     INIT_WORK(&adapter->watchdog_task, igb_watchdog_task);
>>> ...
>>>
>>> The Linux kernel runs these igb activities on the isolated CPUs,
>>> disregarding the isolcpus option and ruining the real-time behavior of
>>> the system.
>> isolcpus does not mean the CPUs aren't used; it means they are excluded
>> from the normal CPU scheduler's load balancing. No process will
>> automatically be moved from/to isolated CPUs, but you still need to make
>> sure to free them of any tasks.
>> IRQ handlers can still run anywhere, and processes can still explicitly
>> set their affinity to those CPUs.
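>>
>> For example, checking and moving a stray task by hand could look roughly
>> like this (a sketch; <pid> is a placeholder):
>>
>>     taskset -pc <pid>        # show which CPUs the task may run on
>>     taskset -pc 0 <pid>      # pin it back to CPU0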
>>
>>> So the question is: is it correct to use the normal igb driver with
>>> Xenomai at all, or is it not recommended? What can be done to prevent
>>> the Linux scheduler from placing those tasks on the isolated cores?
>> I use the normal igb and rt_igb concurrently; I doubt it is recommended,
>> but it is possible ;)
>>
>> You should add irqaffinity=0 to the kernel command line (CPU0 is
>> apparently always used for IRQs), then check 'cat /proc/irq/*/smp_affinity'.
>> This keeps the other CPUs free from Linux IRQs.
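>>
>> For example (CPUs 2-5 standing in for the isolated cores and <nr> for an
>> IRQ number; adjust to your layout):
>>
>>     # kernel command line
>>     isolcpus=2-5 irqaffinity=0
>>
>>     # verify where Linux may deliver each IRQ (hexadecimal CPU masks)
>>     cat /proc/irq/*/smp_affinity
>>
>>     # push a stray IRQ back to CPU0
>>     echo 1 > /proc/irq/<nr>/smp_affinity
>>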
>> You can use some measures to bind Linux tasks to CPU0 as well. One of:
>>
>> -   isolcpus (sets the default affinity mask as well)
>> -   set affinity early (e.g. in the ramdisk)
>> -   use cgroups (cset-shield)
>>
>> Only cgroups actually prevent processes from ignoring your defaults and
>> using other CPUs. I did not get around to playing with this and just use
>> isolcpus.
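>>
>> With cset-shield, the cgroup route could look roughly like this (a
>> sketch; CPUs 2-5 and ./my_rt_app are placeholders):
>>
>>     # move all movable tasks (including kernel threads) off CPUs 2-5
>>     cset shield --cpu 2-5 --kthread on
>>     # run the application inside the shielded set
>>     cset shield --exec -- ./my_rt_app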
>>
>> But the most important part is not to run RT on cores dealing with Linux
>> interrupts; some handlers/drivers don't expect being preempted. I had the
>> MMC driver bail out because of a timeout.
>>
>> I haven't managed to move the rtnet stack and rtnet-rpc off CPU0, and the
>> rt_igb IRQs will use all CPUs.
>>
>> Norbert
Thank you! Using the IRQ affinity feature to move handlers to specific
cores is very practical, but in this case we experience a problem with
another artifact of igb.

Just as an example of the influence of igb activity (igb_watchdog_task)
on CPU4 (which is one of the isolated cores):

# cat /proc/ipipe/trace/frozen | grep '\!'
...
:  +func               -1231!  52.379  igb_rd32+0x0 [igb] (igb_update_stats+0x520 [igb])
:  +func               -1145!  45.864  igb_rd32+0x0 [igb] (igb_update_stats+0x536 [igb])
:  +func               -1099!  51.917  igb_rd32+0x0 [igb] (igb_update_stats+0x75a [igb])
:  +func               -1047!  51.517  igb_rd32+0x0 [igb] (igb_update_stats+0x782 [igb])
:  +func                -996!  51.988  igb_rd32+0x0 [igb] (igb_update_stats+0x54e [igb])
:  +func                -944!  51.436  igb_rd32+0x0 [igb] (igb_update_stats+0x564 [igb])
:  +func                -893!  52.569  igb_rd32+0x0 [igb] (igb_update_stats+0x57a [igb])
:  +func                -840!  52.529  igb_rd32+0x0 [igb] (igb_update_stats+0x590 [igb])
:  +func                -787!  52.018  igb_rd32+0x0 [igb] (igb_update_stats+0x5a6 [igb])
:  +func                -735!  52.058  igb_rd32+0x0 [igb] (igb_update_stats+0x5bc [igb])
:  +func                -683!  51.497  igb_rd32+0x0 [igb] (igb_update_stats+0x5d2 [igb])
:  +func                -632!  51.436  igb_rd32+0x0 [igb] (igb_update_stats+0x5e8 [igb])
:  +func                -580!  51.416  igb_rd32+0x0 [igb] (igb_update_stats+0x5fe [igb])
:  +func                -529!  52.038  igb_rd32+0x0 [igb] (igb_update_stats+0x614 [igb])
:  +func                -477!  52.058  igb_rd32+0x0 [igb] (igb_update_stats+0x62a [igb])
:  +func                -425!  51.436  igb_rd32+0x0 [igb] (igb_update_stats+0x6f4 [igb])
:  +func                -373!  51.416  igb_rd32+0x0 [igb] (igb_update_stats+0x70a [igb])
:  +func                -322!  51.517  igb_rd32+0x0 [igb] (igb_update_stats+0x720 [igb])
:  +func                -271!  51.847  igb_rd32+0x0 [igb] (igb_update_stats+0x736 [igb])
:  +func                -189!  47.247  igb_rd32+0x0 [igb] (igb_ptp_rx_hang+0x1e [igb])
:  +func                -103!  72.735  igb_rd32+0x0 [igb] (igb_watchdog_task+0x66a [igb])


On the other hand, PREEMPT_RT kernels do not have this issue, probably
because of their modified workqueue subsystem.
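
As far as I can tell, mainline does expose an affinity mask for unbound
workqueues, but the igb watchdog work appears to be queued via
schedule_work() on the per-CPU system workqueue, so I am not sure this
knob alone would help (mask 0x3, i.e. CPUs 0-1, is just an example):

    # restrict unbound workqueue workers to CPUs 0-1
    echo 3 > /sys/devices/virtual/workqueue/cpumask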

Any ideas how to keep this work away from the time-critical cores?

Thank you,
    Alex Frolov




