race in timerobj

Jan Kiszka jan.kiszka at siemens.com
Mon Jan 25 16:26:49 CET 2021


On 25.01.21 16:10, Ronny Meeus wrote:
> Jan,
> 
> This is my last attempt since I do not want to waste more time on this issue.
> Feel free to change whatever you want to the patch.
> I put my gmail in the correct mode but it keeps on wrapping
> the text and git email also doesn't work (we do not use git overhere)

That's why there is git send-email, to by-pass unsuitable interfaces
(including web browsers).

Thanks anyway, I've hand-edited the patch to make it apply.

Philippe, any feedback from you on the approach?

Jan

> 
> Best regards,
> Ronny
> 
> commit 352b0355f617bfbb38f9bd7b1f7d74967f44c857
> Author: Ronny Meeus <ronny.meeus at gmail.com>
> Date:   Mon Jan 25 15:11:52 2021 +0100
> 
>     copperplate/timerobj: fix corrupted timer list.
> 
>     We observe an issue that the timer-list gets corrupted resulting in an
>     endless loop executed by the timer-server thread.
> 
>     During the processing of the timeout list, a pointer to the next timer
>     to be handled is kept in the tmp stack variable.
>     Just before calling the timer handler of the current timer the lock on
>     the timer list is released giving other threads to change the list.
>     If the timer currently referenced by tmp is deleted, we end up with an
>     invalid node (next pointer pointing to itself) and this will result in
>     an endless loop of the timer server.
> 
>     Test code is not available but I have seen this issue in our real
>     production code and after applying this path, the issue is solved.
> 
>     The patch basically changes the timer server logic to always start
>     from the beginning of the list since when a timer is processed, it is
>     either removed (one-shot) or reinserted in a different location in the
>     list.
>     The processing of the list will stop anyhow if all timers that need
>     to expire up to "now" are handled.
> 
>     Signed-off-by: Ronny Meeus <ronny.meeus at gmail.com>
> 
> diff --git a/lib/copperplate/timerobj.c b/lib/copperplate/timerobj.c
> index 08cc0d3..96ad2e5 100644
> --- a/lib/copperplate/timerobj.c
> +++ b/lib/copperplate/timerobj.c
> @@ -100,7 +100,7 @@ static void *timerobj_server(void *arg)
>  {
>         void (*handler)(struct timerobj *tmobj);
>         struct timespec now, value, interval;
> -       struct timerobj *tmobj, *tmp;
> +       struct timerobj *tmobj;
>         sigset_t set;
>         int sig, ret;
> 
> @@ -119,7 +119,9 @@ static void *timerobj_server(void *arg)
> 
>                 __RT(clock_gettime(CLOCK_COPPERPLATE, &now));
> 
> -               pvlist_for_each_entry_safe(tmobj, tmp, &svtimers, next) {
> +               while (!pvlist_empty(&svtimers)) {
> +                       tmobj = pvlist_first_entry(&svtimers,
> typeof(*tmobj), next);
> +
>                         value = tmobj->itspec.it_value;
>                         interval = tmobj->itspec.it_interval;
>                         handler = tmobj->handler;
> 
> 
> Op ma 25 jan. 2021 om 15:36 schreef Jan Kiszka <jan.kiszka at siemens.com>:
>>
>> On 25.01.21 15:27, Ronny Meeus wrote:
>>> commit b38a39bcd758872201e71c07a24d8b5e7f26c3ac
>>> Author: Ronny Meeus <ronny.meeus at gmail.com>
>>> Date:   Mon Jan 25 15:11:52 2021 +0100
>>>
>>>     We observe an issue that the timer-list gets corrupted resulting in an
>>>     endless loop executed by the timer-server thread.
>>>
>>>     During the processing of the timeout list, a pointer to the next timer
>>>     to be handled is kept in the tmp stack variable.
>>>     Just before calling the timer handler of the current timer the lock on
>>>     the timer list is released giving other threads the opportunity to
>>> change the list.
>>>     If the timer currently referenced by tmp is deleted, we and up with an
>>>     invalid node (next pointer pointing to itself) and this will result in
>>>     an endless loop of the timer server.
>>>
>>>     Test code is not available but I have seen this issue in our real
>>>     production code and after applying this path, the issue is solved.
>>>
>>>     The patch basically changes the timer server logic to always start
>>>     from the beginning of the list since when a timer is processed, it is
>>>     either removed (one-shot) or reinserted in a different location in the
>>>     list.
>>>     The processing of the list will stop anyhow if all timers that need
>>>     to expire up to "now" are handled.
>>>
>>
>> DCO (https://developercertificate.org) via signed-off-by missing.
>>
>>> diff --git a/lib/copperplate/timerobj.c b/lib/copperplate/timerobj.c
>>> index 08cc0d3..b6c6abb 100644
>>> --- a/lib/copperplate/timerobj.c
>>> +++ b/lib/copperplate/timerobj.c
>>> @@ -100,7 +100,7 @@ static void *timerobj_server(void *arg)
>>>  {
>>>         void (*handler)(struct timerobj *tmobj);
>>>         struct timespec now, value, interval;
>>> -       struct timerobj *tmobj, *tmp;
>>> +       struct timerobj *tmobj;
>>>         sigset_t set;
>>>         int sig, ret;
>>>
>>> @@ -119,7 +119,12 @@ static void *timerobj_server(void *arg)
>>>
>>>                 __RT(clock_gettime(CLOCK_COPPERPLATE, &now));
>>>
>>> -               pvlist_for_each_entry_safe(tmobj, tmp, &svtimers, next) {
>>> +               for (;;) {
>>> +                       if (pvlist_empty(&svtimers)) {
>>
>> How about "while (!pvlist_empty(&svtimers)) ..."?
>>
>>> +                               break;
>>> +                       }
>>> +                       tmobj = pvlist_first_entry(&svtimers,
>>> typeof(*tmobj), next);
>>> +
>>
>> And then your mail client rewraps the email. If possible, disable this,
>> otherwise use git send-email.
>>
>>>                         value = tmobj->itspec.it_value;
>>>                         interval = tmobj->itspec.it_interval;
>>>                         handler = tmobj->handler;
>>>
>>>
>>
>> Thanks,
>> Jan
>>
>> --
>> Siemens AG, T RDA IOT
>> Corporate Competence Center Embedded Linux



More information about the Xenomai mailing list