Cobalt Preemption of kernel update_fast_timekeeper can cause deadlocks

Philippe Gerum rpm at xenomai.org
Wed Dec 19 14:47:15 CET 2018


On 12/19/18 1:44 PM, Jan Kiszka wrote:
> On 19.12.18 13:09, Philippe Gerum via Xenomai wrote:
>> On 12/19/18 11:20 AM, Lange Norbert via Xenomai wrote:
>>> There is a deadlock issue that haunted me for several weeks,
>>> it is caused by the kernels update of the user-visible
>>> timekeeping structures used by the VDSO clock_gettime functions.
>>>
>>> The kernel regularly updates a Timestamp structure, which is
>>> accessible in user-mode,
>>> it does so by something akin to a rw-lock in update_fast_timekeeper.
>>>
>>> If cobalt preempts the core during holding the lock, any thread
>>> trying to read the time
>>> will continue to spin. (This alone is an issue IMHO).
>>> If the cobalt thread itself now call the vDSO function as reader,
>>> it will spin on the lock and block the lock from getting released.
>>>
>>>
>>> Either the update_fast_timekeeper funtion should not be preemptible
>>> by cobalt,
>>> or the spin-lock on reading could fallback to the syscall after a
>>> certain amount of retries.
>>>
>>> The later is probably easier to implement, but then could randomly
>>> demote cobalt threads.
>>> (on the other hand, this would be always a demotion on platforms
>>> without the vdso function)
>>>
>>
>> update_vsyscall() is locking the write-side. If the analysis is
>> correct, this patch may help at the expense of a some cycles more
>> spent uninterruptible:
>>
>> diff --git a/arch/x86/entry/vsyscall/vsyscall_gtod.c
>> b/arch/x86/entry/vsyscall/vsyscall_gtod.c
>> index 9fb89b6e88c3..e9baa57e8385 100644
>> --- a/arch/x86/entry/vsyscall/vsyscall_gtod.c
>> +++ b/arch/x86/entry/vsyscall/vsyscall_gtod.c
>> @@ -32,11 +32,14 @@ void update_vsyscall(struct timekeeper *tk)
>>   {
>>       int vclock_mode = tk->tkr_mono.clock->archdata.vclock_mode;
>>       struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;
>> +    unsigned long flags;
>>         /* Mark the new vclock used. */
>>       BUILD_BUG_ON(VCLOCK_MAX >= 32);
>>       WRITE_ONCE(vclocks_used, READ_ONCE(vclocks_used) | (1 <<
>> vclock_mode));
>>   +    flags = hard_cond_local_irq_save();
>> +
>>       gtod_write_begin(vdata);
>>         /* copy vsyscall data */
>> @@ -77,6 +80,8 @@ void update_vsyscall(struct timekeeper *tk)
>>         gtod_write_end(vdata);
>>   +    hard_cond_local_irq_restore(flags);
>> +
>>       if (tk->tkr_mono.clock == &clocksource_tsc)
>>           ipipe_update_hostrt(tk);
>>   }
>>
> 
> This should rather be an application bug: An RT (Xenomai) thread is
> apparently using Linux gettimeofday & Co. (glibc) from RT context.

Yes, it definitely is. But the issue is with coping with legacy.
CLOCK_HOST_REALTIME is indeed the best way around this.

 That
> was never supported, we rather have RT services for that
> (CLOCK_HOST_REALTIME).
> 
> We may think about detecting such cases better, though. Norbert, are you
> using native/alchemy APIs?
> 


-- 
Philippe.



More information about the Xenomai mailing list