[RFC][PATCH 4.19] x86/ipipe: Protect TLB flushing against context switch by head domain

Jan Kiszka jan.kiszka at siemens.com
Thu Mar 12 17:12:23 CET 2020


On 12.03.20 16:59, Philippe Gerum wrote:
> On 3/12/20 2:48 PM, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka at siemens.com>
>>
>> A Xenomai application is very rarely triggering
>>
>> WARNING: CPU: 0 PID: 1997 at arch/x86/mm/tlb.c:560 [...]
>> (local_tlb_gen > mm_tlb_gen)
>>
>> This could be triggered by loaded_mm and loaded_mm_asid becoming out of
>> sync when flush_tlb_func_common is interrupted by the head domain to
>> switch a real-time task right between the retrieval of both values, or
>> maybe even after that but before writing mm_tlb_gen back to
>> cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen.
>>
>> Avoid that case by making the retrieval atomic while keeping the TLB
>> flush interruptible. Now, there could still be an interrupt during the
>> flush. To avoid writing back to the wrong context, we first atomically
>> check after the flush if nothing changed and only write if that is the
>> case. That may mean another TLB flush is triggered needlessly, but
>> that's rare and acceptable.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka at siemens.com>
>> ---
>>
>> Due to the rare nature of this issue, we are not yet confident that we
>> have truly fixed it this way.
>>
>> Philippe, I'm seeing a similar attempt in dovetail but it appears to
>> me it's missing some cases.
> 
> Not "some cases", but the last one in your patch specifically if I read
> it correctly, which I assumed was not applicable, at least not the way I
> read your change, when I worked on this a year ago. This explains why
> that particular change is not present in the commit (3aa2fc2fb4c) you
> seem to have cherry picked from dovetail for the 5.x kernel series. This
> said, these are tricky issues, so as you hinted in your commit log,
> there is likely room for improvement in any case, and I may have
> overlooked things.
> 
>> Too bad that development was forking here
>> and information isn't flowing smoothly yet.
> 
> You just demonstrated that the information is there, and that anyone can
> access it freely by looking at the EVL development tree. I'm sorry to

It's there, but extracting it now requires polling. I suspect I will
find more interesting changes once I have reviewed the dovetail queue
completely (I already found the reverse: KVM was broken in dovetail due
to incomplete forward porting; I will fix that when I get to that code).

> hear that forking my own code for the most part in order to find a
> better approach for others to benefit from in the long run can be a
> problem. I did not find any other way to go back to the drawing board as
> required by the technical goals I'm pursuing with EVL, which differ from
> Xenomai's.

I've seen this with other spin-offs/rewrites/etc. of the ipipe-like
kernel queue a couple of times: Even if colors and edges look
different, the core concept remains the same. Thus you also share the
conceptual problems, and often the solutions too. Doing this multiple
times is just wasted time. That's why we really need to get Xenomai
based on dovetail for upcoming kernels so that test results and fixes
flow in both directions automatically again.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
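
For readers following the quoted commit message, the snapshot / flush /
re-check scheme it describes can be sketched in user-space C roughly as
below. This is not the patch itself and not the code in
arch/x86/mm/tlb.c: the struct layout, NR_ASIDS, hard_irq_save() /
hard_irq_restore(), struct ctx_switch and flush_tlb_sketch() are all
hypothetical stand-ins, and the head-domain preemption is simulated by
applying an optional context switch in the middle of the "flush".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_ASIDS 6 /* hypothetical number of ASID slots */

struct mm { uint64_t tlb_gen; };        /* stand-in for the mm's generation */

struct cpu_tlbstate_sim {               /* stand-in for cpu_tlbstate */
	struct mm *loaded_mm;
	unsigned int loaded_mm_asid;
	uint64_t ctx_tlb_gen[NR_ASIDS]; /* models ctxs[asid].tlb_gen */
	unsigned int flushes;
};

/* Simulated head-domain switch happening while the flush is running. */
struct ctx_switch { struct mm *next_mm; unsigned int next_asid; };

enum flush_result { FLUSH_NOT_NEEDED, FLUSH_RECORDED, FLUSH_DISCARDED };

/* Stand-ins for hard interrupt masking; no-ops in this user-space model. */
static unsigned long hard_irq_save(void) { return 0; }
static void hard_irq_restore(unsigned long flags) { (void)flags; }

static enum flush_result flush_tlb_sketch(struct cpu_tlbstate_sim *st,
					  const struct ctx_switch *preempt)
{
	unsigned long flags;
	struct mm *mm;
	unsigned int asid;
	uint64_t mm_gen, local_gen;
	bool unchanged;

	/* Step 1: read loaded_mm and loaded_mm_asid as one atomic snapshot. */
	flags = hard_irq_save();
	mm = st->loaded_mm;
	asid = st->loaded_mm_asid;
	assert(asid < NR_ASIDS);
	mm_gen = mm->tlb_gen;
	local_gen = st->ctx_tlb_gen[asid];
	hard_irq_restore(flags);

	if (local_gen >= mm_gen)
		return FLUSH_NOT_NEEDED;

	/* Step 2: the flush itself stays interruptible. */
	st->flushes++;
	if (preempt) {                  /* simulated head-domain switch */
		st->loaded_mm = preempt->next_mm;
		st->loaded_mm_asid = preempt->next_asid;
	}

	/* Step 3: write the generation back only if nothing changed. */
	flags = hard_irq_save();
	unchanged = st->loaded_mm == mm && st->loaded_mm_asid == asid;
	if (unchanged)
		st->ctx_tlb_gen[asid] = mm_gen;
	hard_irq_restore(flags);

	return unchanged ? FLUSH_RECORDED : FLUSH_DISCARDED;
}
```

In the FLUSH_DISCARDED case the old context keeps its stale generation,
so a later pass flushes it again. That is the "another TLB flush is
triggered needlessly" trade-off mentioned in the commit message.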


