race in timerobj

Ronny Meeus ronny.meeus at gmail.com
Mon Jan 25 16:10:37 CET 2021


Jan,

This is my last attempt since I do not want to waste more time on this issue.
Feel free to change whatever you want to the patch.
I put my gmail in the correct mode but it keeps on wrapping
the text and git email also doesn't work (we do not use git overhere)

Best regards,
Ronny

commit 352b0355f617bfbb38f9bd7b1f7d74967f44c857
Author: Ronny Meeus <ronny.meeus at gmail.com>
Date:   Mon Jan 25 15:11:52 2021 +0100

    copperplate/timerobj: fix corrupted timer list.

    We observe an issue that the timer-list gets corrupted resulting in an
    endless loop executed by the timer-server thread.

    During the processing of the timeout list, a pointer to the next timer
    to be handled is kept in the tmp stack variable.
    Just before calling the timer handler of the current timer the lock on
    the timer list is released giving other threads to change the list.
    If the timer currently referenced by tmp is deleted, we end up with an
    invalid node (next pointer pointing to itself) and this will result in
    an endless loop of the timer server.

    Test code is not available but I have seen this issue in our real
    production code and after applying this path, the issue is solved.

    The patch basically changes the timer server logic to always start
    from the beginning of the list since when a timer is processed, it is
    either removed (one-shot) or reinserted in a different location in the
    list.
    The processing of the list will stop anyhow if all timers that need
    to expire up to "now" are handled.

    Signed-off-by: Ronny Meeus <ronny.meeus at gmail.com>

diff --git a/lib/copperplate/timerobj.c b/lib/copperplate/timerobj.c
index 08cc0d3..96ad2e5 100644
--- a/lib/copperplate/timerobj.c
+++ b/lib/copperplate/timerobj.c
@@ -100,7 +100,7 @@ static void *timerobj_server(void *arg)
 {
        void (*handler)(struct timerobj *tmobj);
        struct timespec now, value, interval;
-       struct timerobj *tmobj, *tmp;
+       struct timerobj *tmobj;
        sigset_t set;
        int sig, ret;

@@ -119,7 +119,9 @@ static void *timerobj_server(void *arg)

                __RT(clock_gettime(CLOCK_COPPERPLATE, &now));

-               pvlist_for_each_entry_safe(tmobj, tmp, &svtimers, next) {
+               while (!pvlist_empty(&svtimers)) {
+                       tmobj = pvlist_first_entry(&svtimers,
typeof(*tmobj), next);
+
                        value = tmobj->itspec.it_value;
                        interval = tmobj->itspec.it_interval;
                        handler = tmobj->handler;


Op ma 25 jan. 2021 om 15:36 schreef Jan Kiszka <jan.kiszka at siemens.com>:
>
> On 25.01.21 15:27, Ronny Meeus wrote:
> > commit b38a39bcd758872201e71c07a24d8b5e7f26c3ac
> > Author: Ronny Meeus <ronny.meeus at gmail.com>
> > Date:   Mon Jan 25 15:11:52 2021 +0100
> >
> >     We observe an issue that the timer-list gets corrupted resulting in an
> >     endless loop executed by the timer-server thread.
> >
> >     During the processing of the timeout list, a pointer to the next timer
> >     to be handled is kept in the tmp stack variable.
> >     Just before calling the timer handler of the current timer the lock on
> >     the timer list is released giving other threads the opportunity to
> > change the list.
> >     If the timer currently referenced by tmp is deleted, we and up with an
> >     invalid node (next pointer pointing to itself) and this will result in
> >     an endless loop of the timer server.
> >
> >     Test code is not available but I have seen this issue in our real
> >     production code and after applying this path, the issue is solved.
> >
> >     The patch basically changes the timer server logic to always start
> >     from the beginning of the list since when a timer is processed, it is
> >     either removed (one-shot) or reinserted in a different location in the
> >     list.
> >     The processing of the list will stop anyhow if all timers that need
> >     to expire up to "now" are handled.
> >
>
> DCO (https://developercertificate.org) via signed-off-by missing.
>
> > diff --git a/lib/copperplate/timerobj.c b/lib/copperplate/timerobj.c
> > index 08cc0d3..b6c6abb 100644
> > --- a/lib/copperplate/timerobj.c
> > +++ b/lib/copperplate/timerobj.c
> > @@ -100,7 +100,7 @@ static void *timerobj_server(void *arg)
> >  {
> >         void (*handler)(struct timerobj *tmobj);
> >         struct timespec now, value, interval;
> > -       struct timerobj *tmobj, *tmp;
> > +       struct timerobj *tmobj;
> >         sigset_t set;
> >         int sig, ret;
> >
> > @@ -119,7 +119,12 @@ static void *timerobj_server(void *arg)
> >
> >                 __RT(clock_gettime(CLOCK_COPPERPLATE, &now));
> >
> > -               pvlist_for_each_entry_safe(tmobj, tmp, &svtimers, next) {
> > +               for (;;) {
> > +                       if (pvlist_empty(&svtimers)) {
>
> How about "while (!pvlist_empty(&svtimers)) ..."?
>
> > +                               break;
> > +                       }
> > +                       tmobj = pvlist_first_entry(&svtimers,
> > typeof(*tmobj), next);
> > +
>
> And then your mail client rewraps the email. If possible, disable this,
> otherwise use git send-email.
>
> >                         value = tmobj->itspec.it_value;
> >                         interval = tmobj->itspec.it_interval;
> >                         handler = tmobj->handler;
> >
> >
>
> Thanks,
> Jan
>
> --
> Siemens AG, T RDA IOT
> Corporate Competence Center Embedded Linux



More information about the Xenomai mailing list