* [Xenomai] Race condition between threadobj_unlock and threadobj_free in t_resume
@ 2014-04-06 11:24 Matthias Schneider
  2014-04-07  9:49 ` Philippe Gerum
  2014-04-11 14:01 ` Philippe Gerum
  0 siblings, 2 replies; 4+ messages in thread
From: Matthias Schneider @ 2014-04-06 11:24 UTC (permalink / raw)
  To: xenomai@xenomai.org

The following minimal program 

http://pastebin.com/JdnnXwsF 

seems to lead to a race condition between threadobj_unlock 
and threadobj_free as indicated by valgrind (xenomai-forge 
and mercury): 

==9573== Invalid read of size 4
==9573==    at 0x403F214: threadobj_unlock (threadobj.h:407)
==9573==    by 0x403FAD2: put_psos_task (task.c:151)
==9573==    by 0x4040E9F: t_resume (task.c:419)
==9573==    by 0x80486F9: main (task-2.c:20)
==9573==  Address 0x42a20c4 is 172 bytes inside a block of size 664 free'd
==9573==    at 0x4029C88: free (vg_replace_malloc.c:446)
==9573==    by 0x40641B6: pvfree (heapobj.h:156)
==9573==    by 0x406425A: xnfree (heapobj.h:421)
==9573==    by 0x406460C: threadobj_free (threadobj.h:285)
==9573==    by 0x4068C6B: finalize_thread (threadobj.c:1239)
==9573==    by 0x40B5AE5: __nptl_deallocate_tsd (pthread_create.c:157)
==9573==    by 0x40B5CFE: start_thread (pthread_create.c:318)
==9573==    by 0x41C2C3D: clone (clone.S:131)
==9573== 

I am not sure how to fix this; any ideas? I admit that the
use case is neither typical nor very useful on its own,
but it is used in my test environment.

It seems that resuming a thread that is already in its
finalizer may lead to that thread's heap block being freed
just as the resume function is about to unlock it.

Thanks in advance for your help,
regards,
Matthias 



* Re: [Xenomai] Race condition between threadobj_unlock and threadobj_free in t_resume
  2014-04-06 11:24 [Xenomai] Race condition between threadobj_unlock and threadobj_free in t_resume Matthias Schneider
@ 2014-04-07  9:49 ` Philippe Gerum
  2014-04-11 14:01 ` Philippe Gerum
  1 sibling, 0 replies; 4+ messages in thread
From: Philippe Gerum @ 2014-04-07  9:49 UTC (permalink / raw)
  To: Matthias Schneider, xenomai@xenomai.org

On 04/06/2014 01:24 PM, Matthias Schneider wrote:
> It seems that resuming a thread that is already in its
> finalizer may lead to that thread's heap block being freed
> just as the resume function is about to unlock it.
>

That is because the finalizer should serialize using this lock but
does not, which is bad. I need to check whether we can make this a
general rule for all emulators/APIs. OK, I'll have a look at the
pending issues ASAP. Thanks for the heads-up.

-- 
Philippe.
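
The serialization rule discussed in this message can be sketched
roughly as follows. This is only an illustration under assumptions:
struct demo_task, demo_finalizer() and demo_finalizer_waits() are
hypothetical names, not Xenomai code, and the real finalize_thread()
does considerably more. The sketch shows a finalizer that takes the
object lock before tearing anything down, so it cannot get ahead of a
caller that still holds that lock.

```c
#include <pthread.h>

/* Hypothetical object standing in for a thread control block. */
struct demo_task {
	pthread_mutex_t lock;
	int finalized;		/* written by the finalizer, under lock only */
};

/* Finalizer that serializes on the object lock before tearing down. */
static void *demo_finalizer(void *arg)
{
	struct demo_task *t = arg;

	pthread_mutex_lock(&t->lock);	/* waits for any in-flight lock holder */
	t->finalized = 1;		/* teardown work would go here */
	pthread_mutex_unlock(&t->lock);
	return NULL;
}

/*
 * Starts the finalizer while the caller holds the lock.
 * Returns the value of 'finalized' seen while the lock was still held
 * (or -1 if the thread could not be created), and stores the value
 * seen after the finalizer has completed in *after.
 */
int demo_finalizer_waits(int *after)
{
	struct demo_task t = { .finalized = 0 };
	pthread_t tid;
	int seen;

	pthread_mutex_init(&t.lock, NULL);
	pthread_mutex_lock(&t.lock);

	if (pthread_create(&tid, NULL, demo_finalizer, &t) != 0) {
		pthread_mutex_unlock(&t.lock);
		pthread_mutex_destroy(&t.lock);
		return -1;
	}

	seen = t.finalized;	/* the finalizer is still blocked on the lock */
	pthread_mutex_unlock(&t.lock);

	pthread_join(tid, NULL);
	*after = t.finalized;
	pthread_mutex_destroy(&t.lock);
	return seen;
}
```

Serialization of this kind only covers accesses made while the lock is
held; anything the unlocking side reads from the object after releasing
the lock is no longer protected.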



* Re: [Xenomai] Race condition between threadobj_unlock and threadobj_free in t_resume
  2014-04-06 11:24 [Xenomai] Race condition between threadobj_unlock and threadobj_free in t_resume Matthias Schneider
  2014-04-07  9:49 ` Philippe Gerum
@ 2014-04-11 14:01 ` Philippe Gerum
  2014-04-12 13:52   ` Matthias Schneider
  1 sibling, 1 reply; 4+ messages in thread
From: Philippe Gerum @ 2014-04-11 14:01 UTC (permalink / raw)
  To: Matthias Schneider, xenomai@xenomai.org

On 04/06/2014 01:24 PM, Matthias Schneider wrote:
> It seems that resuming a thread that is already in its
> finalizer may lead to that thread's heap block being freed
> just as the resume function is about to unlock it.

I could not reproduce this issue with your test code (it's likely too
timing-dependent), but looking at the backtrace above a bit more
closely, this patch should help:

diff --git a/include/boilerplate/lock.h b/include/boilerplate/lock.h
index dce1ff0..4819b34 100644
--- a/include/boilerplate/lock.h
+++ b/include/boilerplate/lock.h
@@ -177,9 +177,9 @@ int __check_cancel_type(const char *locktype);

  #define __do_unlock_safe(__lock, __state)				\
  	({								\
-		int __ret;						\
+		int __ret, __restored_state = __state;			\
  		__ret = -__RT(pthread_mutex_unlock(__lock));		\
-		pthread_setcancelstate(__state, NULL);			\
+		pthread_setcancelstate(__restored_state, NULL);		\
  		__ret;							\
  	})

-- 
Philippe.
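
One way to read the patch above, offered as a sketch rather than as a
description of Xenomai internals: __state is a macro parameter, so the
expression the caller passes is evaluated wherever it appears in the
expansion. In the pre-patch shape that is after the mutex has been
released; if that expression reads a field of the object the lock
protects (an assumption here), the object may already be gone. The
post-patch shape copies the value into a local before unlocking. All
names below (struct demo_obj, DEMO_UNLOCK_PRE, DEMO_UNLOCK_POST and the
helper functions) are hypothetical, and the "locked" bookkeeping exists
only to make the evaluation order observable.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical lock-protected object; not Xenomai's struct threadobj. */
struct demo_obj {
	pthread_mutex_t lock;
	int cancel_state;
	int locked;		/* demo bookkeeping: 1 while the lock is held */
	int read_while_locked;	/* recorded by demo_read_state() */
};

/* Reads the saved cancel state and records whether the lock was held. */
static int demo_read_state(struct demo_obj *o)
{
	o->read_while_locked = o->locked;
	return o->cancel_state;
}

static int demo_unlock(struct demo_obj *o)
{
	o->locked = 0;
	return pthread_mutex_unlock(&o->lock);
}

/* Pre-patch shape: state_expr is evaluated after the unlock. */
#define DEMO_UNLOCK_PRE(obj, state_expr, ret)			\
	do {							\
		(ret) = -demo_unlock(obj);			\
		pthread_setcancelstate((state_expr), NULL);	\
	} while (0)

/* Post-patch shape: state_expr is copied before the unlock. */
#define DEMO_UNLOCK_POST(obj, state_expr, ret)			\
	do {							\
		int restored_state_ = (state_expr);		\
		(ret) = -demo_unlock(obj);			\
		pthread_setcancelstate(restored_state_, NULL);	\
	} while (0)

static void demo_init_locked(struct demo_obj *o)
{
	pthread_mutex_init(&o->lock, NULL);
	pthread_mutex_lock(&o->lock);
	o->locked = 1;
	o->cancel_state = PTHREAD_CANCEL_ENABLE;
	o->read_while_locked = -1;
}

/* Returns 1 if the state was read under the lock, 0 otherwise. */
int demo_pre_patch_reads_under_lock(void)
{
	struct demo_obj o;
	int ret;

	demo_init_locked(&o);
	DEMO_UNLOCK_PRE(&o, demo_read_state(&o), ret);
	assert(ret == 0);
	pthread_mutex_destroy(&o.lock);
	return o.read_while_locked;
}

int demo_post_patch_reads_under_lock(void)
{
	struct demo_obj o;
	int ret;

	demo_init_locked(&o);
	DEMO_UNLOCK_POST(&o, demo_read_state(&o), ret);
	assert(ret == 0);
	pthread_mutex_destroy(&o.lock);
	return o.read_while_locked;
}
```

In this single-threaded sketch the late read is harmless; the point is
only that the pre-patch form touches the object after the unlock, which
is the window a concurrent finalizer could use to free it.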



* Re: [Xenomai] Race condition between threadobj_unlock and threadobj_free in t_resume
  2014-04-11 14:01 ` Philippe Gerum
@ 2014-04-12 13:52   ` Matthias Schneider
  0 siblings, 0 replies; 4+ messages in thread
From: Matthias Schneider @ 2014-04-12 13:52 UTC (permalink / raw)
  To: Philippe Gerum, xenomai@xenomai.org






With the patch I was no longer able to detect the issue (it occurred
pretty reliably after 5-10 minutes, and with the patch I had a
successful run overnight).

Thanks a lot for the quick fix,

Matthias

