Cobalt Preemption of kernel update_fast_timekeeper can cause deadlocks

Lange Norbert norbert.lange at andritz.com
Wed Dec 19 14:08:58 CET 2018



> -----Original Message-----
> From: Jan Kiszka <jan.kiszka at siemens.com>
> Sent: Mittwoch, 19. Dezember 2018 13:45
> To: Philippe Gerum <rpm at xenomai.org>; Lange Norbert
> <norbert.lange at andritz.com>; Xenomai (xenomai at xenomai.org)
> <xenomai at xenomai.org>
> Subject: Re: Cobalt Preemption of kernel update_fast_timekeeper can cause
> deadlocks
>
> On 19.12.18 13:09, Philippe Gerum via Xenomai wrote:
> > On 12/19/18 11:20 AM, Lange Norbert via Xenomai wrote:
> >> There is a deadlock issue that haunted me for several weeks, it is
> >> caused by the kernels update of the user-visible timekeeping
> >> structures used by the VDSO clock_gettime functions.
> >>
> >> The kernel regularly updates a Timestamp structure, which is
> >> accessible in user-mode, it does so by something akin to a rw-lock in
> >> update_fast_timekeeper.
> >>
> >> If cobalt preempts the core during holding the lock, any thread
> >> trying to read the time will continue to spin. (This alone is an issue IMHO).
> >> If the cobalt thread itself now call the vDSO function as reader, it
> >> will spin on the lock and block the lock from getting released.
> >>
> >>
> >> Either the update_fast_timekeeper function should not be preemptible
> >> by cobalt, or the spin-lock on reading could fall back to the syscall
> >> after a certain amount of retries.
> >>
> >> The latter is probably easier to implement, but then could randomly
> >> demote cobalt threads.
> >> (on the other hand, this would be always a demotion on platforms
> >> without the vdso function)
> >>
> >
> > update_vsyscall() is locking the write-side. If the analysis is correct,
> > this patch may help at the expense of a few more cycles spent
> > uninterruptible:
> >
> > diff --git a/arch/x86/entry/vsyscall/vsyscall_gtod.c b/arch/x86/entry/vsyscall/vsyscall_gtod.c
> > index 9fb89b6e88c3..e9baa57e8385 100644
> > --- a/arch/x86/entry/vsyscall/vsyscall_gtod.c
> > +++ b/arch/x86/entry/vsyscall/vsyscall_gtod.c
> > @@ -32,11 +32,14 @@ void update_vsyscall(struct timekeeper *tk)
> >   {
> >       int vclock_mode = tk->tkr_mono.clock->archdata.vclock_mode;
> >       struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;
> > +     unsigned long flags;
> >
> >       /* Mark the new vclock used. */
> >       BUILD_BUG_ON(VCLOCK_MAX >= 32);
> >       WRITE_ONCE(vclocks_used, READ_ONCE(vclocks_used) | (1 << vclock_mode));
> >
> > +     flags = hard_cond_local_irq_save();
> > +
> >       gtod_write_begin(vdata);
> >
> >       /* copy vsyscall data */
> > @@ -77,6 +80,8 @@ void update_vsyscall(struct timekeeper *tk)
> >
> >       gtod_write_end(vdata);
> >
> > +     hard_cond_local_irq_restore(flags);
> > +
> >       if (tk->tkr_mono.clock == &clocksource_tsc)
> >               ipipe_update_hostrt(tk);
> >   }
> >
>
> This should rather be an application bug: An RT (Xenomai) thread is
> apparently using Linux gettimeofday & Co. (glibc) from RT context. That was
> never supported, we rather have RT services for that
> (CLOCK_HOST_REALTIME).

Any RT thread can preempt the kernel while it holds the write lock. True, a deadlock only occurs if
*that* thread then tries to take the read side.
However, any number of Linux threads will be kept spinning in the reader "spin-lock" for as long as the core stays in Cobalt mode, and that will always happen
(I assume update_vsyscall is called regularly).
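
To make the scenario concrete, here is a minimal user-space sketch of the sequence-counter pattern and of the bounded-retry reader suggested in the quoted text above. It is not the kernel code: struct gtod_sim and all sim_* names are hypothetical stand-ins, and the retry limit is an arbitrary parameter. It only shows that a reader which gives up after a fixed number of attempts can report failure (so the caller could fall back to the syscall) instead of spinning forever while the writer is preempted mid-update.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the time data shared with user space. */
struct gtod_sim {
    atomic_uint seq;            /* odd while an update is in progress */
    _Atomic int64_t sec;
    _Atomic int64_t nsec;
};

void sim_init(struct gtod_sim *d)
{
    atomic_init(&d->seq, 0);
    atomic_init(&d->sec, 0);
    atomic_init(&d->nsec, 0);
}

/* Writer side: the counter is odd between begin and end. */
void sim_write_begin(struct gtod_sim *d)
{
    atomic_fetch_add(&d->seq, 1);
}

void sim_write_end(struct gtod_sim *d)
{
    atomic_fetch_add(&d->seq, 1);
}

void sim_update(struct gtod_sim *d, int64_t sec, int64_t nsec)
{
    sim_write_begin(d);
    atomic_store(&d->sec, sec);
    atomic_store(&d->nsec, nsec);
    sim_write_end(d);
}

/*
 * Reader side with a retry budget. Returns true and fills *sec and *nsec
 * if a consistent snapshot was obtained, false if the budget ran out.
 * On false, a caller could fall back to the slow path (the syscall).
 */
bool sim_read_bounded(struct gtod_sim *d, unsigned max_retries,
                      int64_t *sec, int64_t *nsec)
{
    for (unsigned i = 0; i <= max_retries; i++) {
        unsigned start = atomic_load(&d->seq);

        if (start & 1u)
            continue;           /* update in progress, try again */

        int64_t s = atomic_load(&d->sec);
        int64_t n = atomic_load(&d->nsec);

        if (atomic_load(&d->seq) == start) {
            *sec = s;
            *nsec = n;
            return true;
        }
    }
    return false;
}
```

If sim_write_begin() has run and the writer never reaches sim_write_end() because the reader preempted it on the same core, an unbounded version of this loop would never terminate. The bounded version returns false, at the cost mentioned in the quoted text: the fallback would demote the calling thread.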

>
> We may think about detecting such cases better, though. Norbert, are you
> using native/alchemy APIs?

Native.
Btw., I thought mixing APIs was explicitly supported and one of the main features of Xenomai (for example reading state synchronized with RT mutexes and logging it to the filesystem); the existence of priority inheritance strongly implies that as well. Otherwise, it is rather hard to guess which thread runs in the Cobalt context, particularly if it might be just a temporary priority boost.
Compiling your own code with the "cobalt wrappers" is no issue; the deadlock above was caused
indirectly by another "linux" DSO (libstdc++).

Norbert