[Xenomai] Race condition between threadobj_unlock and threadobj_free in t_resume

Matthias Schneider ma30002000 at yahoo.de
Sat Apr 12 15:52:56 CEST 2014





----- Original Message -----
> From: Philippe Gerum <rpm at xenomai.org>
> To: Matthias Schneider <ma30002000 at yahoo.de>; "xenomai at xenomai.org" <xenomai at xenomai.org>
> Cc: 
> Sent: Friday, April 11, 2014 4:01 PM
> Subject: Re: [Xenomai] Race condition between threadobj_unlock and threadobj_free in t_resume
> 
> On 04/06/2014 01:24 PM, Matthias Schneider wrote:
> 
>>  The following minimal program
>> 
>>  http://pastebin.com/JdnnXwsF
>> 
>>  seems to lead to a race condition between threadobj_unlock
>>  and threadobj_free as indicated by valgrind (xenomai-forge
>>  and mercury):
>> 
>>  ==9573== Invalid read of size 4
>>  ==9573==    at 0x403F214: threadobj_unlock (threadobj.h:407)
>>  ==9573==    by 0x403FAD2: put_psos_task (task.c:151)
>>  ==9573==    by 0x4040E9F: t_resume (task.c:419)
>>  ==9573==    by 0x80486F9: main (task-2.c:20)
>>  ==9573==  Address 0x42a20c4 is 172 bytes inside a block of size 664 
> free'd
>>  ==9573==    at 0x4029C88: free (vg_replace_malloc.c:446)
>>  ==9573==    by 0x40641B6: pvfree (heapobj.h:156)
>>  ==9573==    by 0x406425A: xnfree (heapobj.h:421)
>>  ==9573==    by 0x406460C: threadobj_free (threadobj.h:285)
>>  ==9573==    by 0x4068C6B: finalize_thread (threadobj.c:1239)
>>  ==9573==    by 0x40B5AE5: __nptl_deallocate_tsd (pthread_create.c:157)
>>  ==9573==    by 0x40B5CFE: start_thread (pthread_create.c:318)
>>  ==9573==    by 0x41C2C3D: clone (clone.S:131)
>>  ==9573==
>> 
>>  I am not sure on how to fix this, any ideas? I do admit
>>  that the use case is neither typical nor very useful on its own,
>>  but it is used in my test environment.
>> 
>>  It seems that when resuming a thread that already is in
>>  its finalizer may lead to that thread's heap already being freed
>>  when it is about to being unlocked by the resume function.
> 
> I could not reproduce this issue with your test code (it's likely too 
> timing-dependent), but looking at the backtrace above a bit closer, this 
> patch should help:
> 
> diff --git a/include/boilerplate/lock.h b/include/boilerplate/lock.h
> index dce1ff0..4819b34 100644
> --- a/include/boilerplate/lock.h
> +++ b/include/boilerplate/lock.h
> @@ -177,9 +177,9 @@ int __check_cancel_type(const char *locktype);
> 
>   #define __do_unlock_safe(__lock, __state)                \
>       ({                                \
> -        int __ret;                        \
> +        int __ret, __restored_state = __state;            \
>           __ret = -__RT(pthread_mutex_unlock(__lock));        \
> -        pthread_setcancelstate(__state, NULL);            \
> +        pthread_setcancelstate(__restored_state, NULL);        \
>           __ret;                            \
>       })
> 
> -- 
> Philippe.
> 

With the patch I was no longer able to detect the issue (it occured
pretty reliably after 5-10 minutes and with the patch I had a successful
run over night.

Thanks a lot for the quick fix,

Matthias




More information about the Xenomai mailing list