rt_pipe_write memory allocation bug - xenomai 3.x

Jan Kiszka jan.kiszka at siemens.com
Thu Jul 30 15:25:05 CEST 2020


On 30.07.20 11:46, Stéphane Ancelot wrote:
> 
> On 30/07/2020 at 10:47, Jan Kiszka wrote:
>> On 30.07.20 10:43, Stéphane Ancelot wrote:
>>>
>>> On 30/07/2020 at 00:08, Jan Kiszka wrote:
>>>> On 28.07.20 15:28, Stéphane Ancelot wrote:
>>>>>
>>>>> On 27/07/2020 at 15:17, Jan Kiszka wrote:
>>>>>> On 27.07.20 14:44, Stéphane Ancelot via Xenomai wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We are using a pipe created with poolsize = 0, meaning all message
>>>>>>> allocations for this pipe are performed on the Cobalt core heap.
>>>>>>>
>>>>>>> Unfortunately, when calling rt_pipe_write() while no user task is
>>>>>>> consuming the pipe, we found that after many rt_pipe_write()
>>>>>>> cycles (at least 700000 in our process) the Cobalt heap
>>>>>>> and the system heap seem to be corrupted.
>>>>>>>
>>>>>>> This leads to system issues such as unexpected task crashes.
>>>>>>>
>>>>>>
>>>>>> "3.x" implies both 3.1 and 3.0 are affected?
>>>>>>
>>>>>> Do you see a constantly growing use of system heap (leak)? If that 
>>>>>> is not the case, we might have some wrap-around issue somewhere.
>>>>>>
>>>>> The version we are using is based on revision b3e18b6d of the master
>>>>> branch.
>>>>>
>>>>> We don't see system memory usage increasing (using top).
>>>>>
>>>>> Comparing it to the latest releases, we have not found any big
>>>>> differences in the XDDP code.
>>>>>
>>>>> Testing with other releases, applications, or kernel builds does not
>>>>> guarantee that we can tell whether it has been fixed, since the memory
>>>>> layout needed to reproduce it changes.
>>>>>
>>>>> For certification reasons, we can't validate the latest source
>>>>> code; we can only cherry-pick a localised hotfix into the Xenomai code.
>>>>>
>>>>>
>>>>>> Reproduction case would be nice.
>>>>>>
>>>>> It is not easy. The initial problem was reported by one of our
>>>>> users, and we spent a lot of time managing to reproduce it in our
>>>>> context.
>>>>>
>>>>> Some graphics user tasks were locking up or crashing after some days
>>>>> of use in production.
>>>>>
>>>>> At first, we went in the wrong direction trying to identify
>>>>> where it came from.
>>>>>
>>>>> In our system, we had to test each code commit going back in order to
>>>>> isolate the problem, and to understand that it became visible after
>>>>> almost 700000 rt_pipe_write() calls in our case.
>>>>>
>>>>>
>>>>> As a unit test, we can provide the enclosed snippet. It is the
>>>>> extracted code that causes the problem.
>>>>>
>>>>
>>>> Under which condition does that test_pipe.cpp cause the issue? I've 
>>>> given it a quick try, and as it's late, I disabled the delay in the 
>>>> loop. That so far did not trigger an issue. Is the delay important?
>>>>
>>> The delay is not important; what matters is the number of
>>> rt_pipe_write() calls that are not consumed.
>>>
>>> It is not easy to identify the memory leak in the heap.
>>>
>>> Otherwise, use a system with low memory.
>>>
>>> I have not tried it, but I suppose that once system memory fills up, at
>>> some point it will crash by overwriting important system data.
>>>
>>
>> So, letting that test run long enough on your system will crash it?
> 
> 
> I will have to try it.
> 

I'm pretty sure we are missing at least one more variable to make this 
trigger.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
