rt_pipe_write memory allocation bug - xenomai 3.x

Stéphane Ancelot sancelot at numalliance.com
Thu Jul 30 16:09:59 CEST 2020


On 30/07/2020 at 15:25, Jan Kiszka wrote:
> On 30.07.20 11:46, Stéphane Ancelot wrote:
>>
>> On 30/07/2020 at 10:47, Jan Kiszka wrote:
>>> On 30.07.20 10:43, Stéphane Ancelot wrote:
>>>>
>>>> On 30/07/2020 at 00:08, Jan Kiszka wrote:
>>>>> On 28.07.20 15:28, Stéphane Ancelot wrote:
>>>>>>
>>>>>> On 27/07/2020 at 15:17, Jan Kiszka wrote:
>>>>>>> On 27.07.20 14:44, Stéphane Ancelot via Xenomai wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We are using a pipe created with poolsize = 0, meaning all
>>>>>>>> message allocations for this pipe are performed on the Cobalt
>>>>>>>> core heap.
>>>>>>>>
>>>>>>>> Unfortunately, when calling rt_pipe_write() while no user task
>>>>>>>> is consuming the pipe, we found that after many rt_pipe_write()
>>>>>>>> cycles (at least 700000 in our process) the Cobalt heap and the
>>>>>>>> system heap seem to become corrupted.
>>>>>>>>
>>>>>>>> This leads to system issues such as unexpected task crashes.
>>>>>>>>
>>>>>>>
>>>>>>> "3.x" implies both 3.1 and 3.0 are affected?
>>>>>>>
>>>>>>> Do you see a constantly growing use of system heap (leak)? If 
>>>>>>> that is not the case, we might have some wrap-around issue 
>>>>>>> somewhere.
>>>>>>>
>>>>>> The version we are using is based on commit b3e18b6d of the
>>>>>> master branch.
>>>>>>
>>>>>> We don't see system memory increasing (using top).
>>>>>>
>>>>>> Comparing it to the latest releases, we have not found any
>>>>>> significant differences in the xddp code.
>>>>>>
>>>>>> Testing with another release, application or kernel build would
>>>>>> not guarantee that the issue has been solved, since the memory
>>>>>> layout needed to reproduce it changes.
>>>>>>
>>>>>> For certification reasons, we can't validate the latest source
>>>>>> code; we can only cherry-pick a localised hotfix into the
>>>>>> Xenomai code.
>>>>>>
>>>>>>
>>>>>>> Reproduction case would be nice.
>>>>>>>
>>>>>> It is not easy. The initial problem was reported by one of our
>>>>>> users, and we spent a lot of time before we managed to reproduce
>>>>>> it in our context.
>>>>>>
>>>>>> Some graphical user tasks were locking up or crashing after some
>>>>>> days of use in production.
>>>>>>
>>>>>> At first, we went in the wrong direction while trying to identify
>>>>>> where it came from.
>>>>>>
>>>>>> In our system, we had to go back through the code commits one by
>>>>>> one in order to isolate the problem, and to understand that it
>>>>>> became visible after almost 700000 rt_pipe_write calls in our
>>>>>> case.
>>>>>>
>>>>>>
>>>>>> As a unit test, we can provide the enclosed snippet. It is the
>>>>>> extracted code that causes the problem.
>>>>>>
>>>>>
>>>>> Under which condition does that test_pipe.cpp cause the issue? 
>>>>> I've given it a quick try, and as it's late, I disabled the delay 
>>>>> in the loop. That so far did not trigger an issue. Is the delay 
>>>>> important?
>>>>>
>>>> The delay is not important; what matters is the number of
>>>> rt_pipe_write() calls that are not consumed.
>>>>
>>>> It is not easy to identify the memory leak in the heap.
>>>>
>>>> Alternatively, use a system with low memory.
>>>>
>>>> I have not tried it, but I suppose that once system memory fills
>>>> up, at some point it will crash by overwriting important system
>>>> data.
>>>>
>>>
>>> So, letting that test run long enough on your system will crash it?
>>
>>
>> I will have to try it.
>>
>
> I'm pretty sure we are missing at least one more variable to make this 
> trigger.

Since it overwrites some Xenomai heap data, maybe another task using
Xenomai memory is needed.

In my case this is a GUI task linked to a shared Xenomai memory area,
and when it locks up, it is performing a specific action (listing a
directory to open a file).
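
For reference, here is a minimal sketch of the write pattern discussed
in this thread: a pipe created with poolsize = 0 and written to
repeatedly while nothing consumes it. This is not the test_pipe.cpp
mentioned above. The pipe name "flood", the helper flood_pipe(), the
64-byte message buffer and the stand-ins used when building without
Xenomai are illustrative assumptions. It also assumes the Xenomai build
defines __COBALT__, as the Cobalt flags from xeno-config normally do.

```c
/* Sketch only: NOT the test_pipe.cpp from this thread. */
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#ifdef __COBALT__
#include <alchemy/pipe.h>
#else
/* Hypothetical host-side stand-ins so the sketch compiles without
 * Xenomai. They allocate nothing and only mimic return conventions. */
typedef struct { int minor; } RT_PIPE;
#define P_MINOR_AUTO (-1)
#define P_NORMAL 0
static int rt_pipe_create(RT_PIPE *p, const char *name, int minor,
                          size_t poolsize)
{
    (void)name; (void)minor; (void)poolsize;
    p->minor = 0;
    return 0;
}
static ssize_t rt_pipe_write(RT_PIPE *p, const void *buf, size_t size,
                             int mode)
{
    (void)p; (void)buf; (void)mode;
    return (ssize_t)size;
}
static int rt_pipe_delete(RT_PIPE *p)
{
    (void)p;
    return 0;
}
#endif

/* Write `count` messages of `msg_size` bytes (capped at 64) to a pipe
 * that nobody reads. Returns the number of successful writes, or the
 * first negative error code reported by the pipe API. */
long flood_pipe(unsigned long count, size_t msg_size)
{
    RT_PIPE pipe;
    char msg[64];
    unsigned long done;
    int ret;

    if (msg_size > sizeof(msg))
        msg_size = sizeof(msg);
    memset(msg, 0xA5, sizeof(msg));

    /* poolsize = 0: message buffers come from the Cobalt core heap. */
    ret = rt_pipe_create(&pipe, "flood", P_MINOR_AUTO, 0);
    if (ret < 0)
        return ret;

    for (done = 0; done < count; done++) {
        ssize_t n = rt_pipe_write(&pipe, msg, msg_size, P_NORMAL);
        if (n < 0) {
            /* Stop at the first failure, e.g. no buffer space left. */
            rt_pipe_delete(&pipe);
            return (long)n;
        }
    }

    rt_pipe_delete(&pipe);
    return (long)done;
}
```

Built against Xenomai (for example with the flags reported by
xeno-config --skin=alchemy), calling flood_pipe(700000, 16) while
nothing reads the other end of the pipe would mirror the write count
mentioned above. Whether that alone is enough to trigger the corruption
is exactly the open question here.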

>
> Jan
>

