rt_pipe_write memory allocation bug - xenomai 3.x

Jan Kiszka jan.kiszka at siemens.com
Thu Jul 30 10:47:21 CEST 2020


On 30.07.20 10:43, Stéphane Ancelot wrote:
> 
> On 30/07/2020 at 00:08, Jan Kiszka wrote:
>> On 28.07.20 15:28, Stéphane Ancelot wrote:
>>>
>>> On 27/07/2020 at 15:17, Jan Kiszka wrote:
>>>> On 27.07.20 14:44, Stéphane Ancelot via Xenomai wrote:
>>>>> Hi,
>>>>>
>>>>> We are using a pipe created with poolsize = 0, meaning all message 
>>>>> allocations for this pipe are performed on the Cobalt core heap.
>>>>>
>>>>> Unfortunately, when calling rt_pipe_write() while no user task is 
>>>>> consuming the data, we discovered after many rt_pipe_write() 
>>>>> cycles (at least 700000 in our process) that the Cobalt heap and 
>>>>> system heap seem to be corrupted.
>>>>>
>>>>> This leads to system issues such as unexpected task crashes.
>>>>>
>>>>
>>>> "3.x" implies both 3.1 and 3.0 are affected?
>>>>
>>>> Do you see a constantly growing use of system heap (leak)? If that 
>>>> is not the case, we might have some wrap-around issue somewhere.
>>>>
>>> The version we are using is based on release b3e18b6d of the master 
>>> branch.
>>>
>>> We don't see system memory increasing (using top).
>>>
>>> Comparing it to the latest releases, we have not found any big 
>>> differences in the xddp code.
>>>
>>> Testing with other releases, applications, or kernel builds does not 
>>> guarantee that we can tell whether it has been solved, since the 
>>> memory mapping needed to reproduce it changes.
>>>
>>> For certification reasons, we can't validate the latest source code; 
>>> we can only cherry-pick a localized hotfix into the Xenomai code.
>>>
>>>
>>>> Reproduction case would be nice.
>>>>
>>> It is not easy. The initial problem was reported by one of our users, 
>>> and we spent a lot of time before we managed to reproduce it in our context.
>>>
>>> Some graphical user tasks were locking up or crashing after some days 
>>> of use and production.
>>>
>>> At first, we went in the wrong direction when trying to identify 
>>> where it came from.
>>>
>>> In our system, we had to go back and test each code commit in order to 
>>> isolate the problem, and to understand that it became visible after almost 
>>> 700000 rt_pipe_write calls in our case.
>>>
>>>
>>> As a unit test, we can provide the enclosed snippet. That is the 
>>> extracted code that causes the problem.
>>>
>>
>> Under which condition does that test_pipe.cpp cause the issue? I've 
>> given it a quick try, and as it's late, I disabled the delay in the 
>> loop. That so far did not trigger an issue. Is the delay important?
>>
> The delay is not important; what matters is the number of 
> rt_pipe_write() calls that are not consumed.
> 
> It is not easy to identify the memory leak in the heap.
> 
> Alternatively, use a system with low memory.
> 
> I have not tried it, but I suppose that once system memory fills up, at 
> some point it will crash by overwriting important system data.
> 

So, letting that test run long enough on your system will crash it?

Could you share your kernel .config? I surely have different tunings here.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


