[Xenomai] Corruption after phtread_mutex


* [Xenomai] Corruption after phtread_mutex_destroy
@ 2015-04-23 18:42 Meier, Hans
  2015-04-23 19:03 ` Gilles Chanteperdrix
  2015-04-24  6:53 ` Jan Kiszka
  0 siblings, 2 replies; 14+ messages in thread
From: Meier, Hans @ 2015-04-23 18:42 UTC
  To: xenomai@xenomai.org

Hi everybody,

First of all, thanks a lot for your excellent work. We have been using 
Xenomai for about 8 years now in quite a complex application together 
with ACE, based on the POSIX skin, and most of the time it just works fine.

But now we have a situation we think needs to be reported. 

Consider the following:
We have a thread H with high priority, a thread L with low priority and 
a mutex M (recursive, prio-inherit). L locks M and then H tries to lock M, 
L gets boosted until it unlocks M, then H succeeds in locking M, then H 
unlocks M. If H then immediately destroys and frees M, we get a corruption 
where M's pthread_mutex_t was stored (a byte gets decremented), as soon as 
L gets scheduled again.

According to man page PTHREAD_MUTEX_DESTROY(3P), section "Destroying 
Mutexes" - close to the end - "Implementations are required to allow an 
object to be destroyed and freed ... immediately after the object is 
unlocked". So that is what we do here, we destroy and free the mutex 
immediately after it is unlocked. Certainly in a simple scenario we could 
easily work around this problem by destroying and freeing M later, but 
what if this code is buried deep inside a framework lib (here ACE)? 
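
The POSIX rationale behind that requirement is usually illustrated with a reference-counted object that embeds its own mutex: whoever drops the last reference unlocks, destroys and frees in one go. A minimal sketch of that pattern follows. The names `shared_obj`, `obj_create` and `obj_release` are hypothetical, chosen for illustration only, and are not taken from ACE or Xenomai.

```cpp
#include <cassert>
#include <cstdlib>
#include <pthread.h>

// Hypothetical reference-counted object; names are illustrative only.
struct shared_obj {
    pthread_mutex_t lock;
    int refcnt;
};

// Allocate an object holding `refs` references; returns NULL on failure.
shared_obj* obj_create(int refs)
{
    shared_obj* obj = static_cast<shared_obj*>(std::malloc(sizeof(shared_obj)));
    if (obj == NULL)
        return NULL;
    if (pthread_mutex_init(&obj->lock, NULL) != 0) {
        std::free(obj);
        return NULL;
    }
    obj->refcnt = refs;
    return obj;
}

// Drop one reference. Returns true if this call destroyed and freed the object.
bool obj_release(shared_obj* obj)
{
    pthread_mutex_lock(&obj->lock);
    assert(obj->refcnt > 0);
    bool last = (--obj->refcnt == 0);
    pthread_mutex_unlock(&obj->lock);
    if (last) {
        // No reference remains, so no other thread can lock the mutex again.
        // Destroy and free immediately after the unlock.
        pthread_mutex_destroy(&obj->lock);
        std::free(obj);
    }
    return last;
}
```

With this pattern the return value of pthread_mutex_destroy is typically ignored, so a failing destroy followed by free() would go unnoticed, which is the situation described above.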

We upgraded to Xenomai 2.6.4 some months ago coming from 2.4.9.1 and as 
far as I remember all of the really weird crashes of our application 
happened after upgrading. So I guess it has something to do with the futex 
implementation. A heavily simplified code example showing the problem can 
be found below. There the pthread_mutex_t gets implicitly freed as it 
resides on the stack. This results in a nice, delayed stack corruption, 
just like it happens in our application. Our environment: x86 32-bit on a 
P4 dual core, Linux 3.10.32, config attached.

So could you please have a look into that? A note on timing: I normally 
work on this project 2 days a week, so on Monday and Thursday next week I 
will probably be here for emailing if you have further questions.

Thanks in advance,
Best regards

Hans 

---

#include <stdlib.h>
#include <stdio.h>
#include <string.h>   // memset
#include <time.h>     // nanosleep, timespec
#include <pthread.h>
#include <errno.h>
#include <sys/mman.h>

pthread_mutex_t* g_mutex = NULL;

void* worker(void*)
{
    // we only get here while the main thread that has a higher prio
    // sleeps, thus the mutex is ready to use now
    pthread_mutex_lock(g_mutex);
    timespec t; t.tv_sec = 2; t.tv_nsec = 0;
    nanosleep(&t,&t);
    pthread_mutex_unlock(g_mutex);
    return NULL;
}

void do_mutex_stuff()
{
    pthread_mutexattr_t mutex_attr;
    pthread_mutexattr_init(&mutex_attr);
    pthread_mutexattr_setprotocol(&mutex_attr, PTHREAD_PRIO_INHERIT);
    pthread_mutexattr_settype(&mutex_attr, PTHREAD_MUTEX_RECURSIVE);

    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, &mutex_attr);
    g_mutex = &mutex;
    //allow the low prio thread now to run.
    //sleep must be long enough that the worker can lock the mutex
    //and short enough that it doesn't reach unlock.
    timespec t; t.tv_sec = 1; t.tv_nsec =0;
    nanosleep(&t,&t);
    //this call will block until worker 
    //(which is sleeping longer than we did) unlocks
    pthread_mutex_lock(&mutex);
    pthread_mutex_unlock(&mutex);
    //remark: pthread_mutex_destroy returns EINVAL here!
    /*int rv = */pthread_mutex_destroy(&mutex);

    pthread_mutexattr_destroy(&mutex_attr);
}

void check_for_corruption()
{
    //set up a corruption target
    unsigned char Buffer[256];
    memset(Buffer, 0xaa, sizeof(Buffer));

    //let worker continue after unlock and corrupt whatever it likes
    timespec t; t.tv_sec = 1; t.tv_nsec = 0;
    nanosleep(&t,&t);

    //check what has happened with our corruption target
    for(size_t i = 0; i < sizeof(Buffer); i++)
    {
        if(Buffer[i] != 0xaa)
        {
            printf("Corruption at buffer offset %zu: contents: 0x%02x "
                   "instead of 0xaa\n",
                   i, (unsigned)Buffer[i]);
            printf("The corruption occurred at %p, pthread_mutex_t "
                   "formerly covered [%p..%p]\n",
                   (void*)&Buffer[i], (void*)g_mutex,
                   (void*)(((unsigned char*)g_mutex)+sizeof(pthread_mutex_t)-1));
            printf("Thus the corruption occurred at offset %td within "
                   "pthread_mutex_t.\n",
                   &Buffer[i] - ((unsigned char*)g_mutex));
        }
    }
}

int main (int, char**)
{
    mlockall(MCL_CURRENT | MCL_FUTURE);

    // make self be the high prio thread
    struct sched_param sched_param_main;
    sched_param_main.sched_priority = 20;
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &sched_param_main);

    // create low prio worker thread
    pthread_attr_t thread_attr;
    pthread_attr_init(&thread_attr);
    pthread_attr_setschedpolicy(&thread_attr, SCHED_FIFO);
    struct sched_param sched_param_worker;
    sched_param_worker.sched_priority = 10;
    pthread_attr_setschedparam(&thread_attr, &sched_param_worker);
    pthread_attr_setinheritsched(&thread_attr, PTHREAD_EXPLICIT_SCHED);
    pthread_t worker_thread;
    pthread_create(&worker_thread, &thread_attr, worker, NULL);

    // create mutex, let worker run, lock, unlock and destroy mutex
    do_mutex_stuff();

    // wait for a stack corruption to occur ...
    check_for_corruption();

    // cleanup
    void* threadExitPtr;
    pthread_join(worker_thread, &threadExitPtr);
    pthread_attr_destroy(&thread_attr);

    return 0;
}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.gz
Type: application/x-gzip
Size: 16689 bytes
Desc: config.gz
URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20150423/6fe8d3a6/attachment.bin>


* Re: [Xenomai] Corruption after phtread_mutex_destroy
  2015-04-23 18:42 [Xenomai] Corruption after phtread_mutex_destroy Meier, Hans
@ 2015-04-23 19:03 ` Gilles Chanteperdrix
  2015-04-23 19:27   ` Gilles Chanteperdrix
  2015-04-24  6:53 ` Jan Kiszka
  1 sibling, 1 reply; 14+ messages in thread
From: Gilles Chanteperdrix @ 2015-04-23 19:03 UTC
  To: Meier, Hans; +Cc: xenomai@xenomai.org

On Thu, Apr 23, 2015 at 06:42:43PM +0000, Meier, Hans wrote:
> [...]
> 
> void do_mutex_stuff()
> {
>     pthread_mutexattr_t mutex_attr;
>     pthread_mutexattr_init(&mutex_attr);
>     pthread_mutexattr_setprotocol(&mutex_attr, PTHREAD_PRIO_INHERIT);
>     pthread_mutexattr_settype(&mutex_attr, PTHREAD_MUTEX_RECURSIVE);
> 
>     pthread_mutex_t mutex;
>     pthread_mutex_init(&mutex, &mutex_attr);
>     g_mutex = &mutex;
>     //allow the low prio thread now to run.
>     //sleep must be long enough that the worker can lock the mutex
>     //and short enough that it doesn't reach unlock.
>     timespec t; t.tv_sec = 1; t.tv_nsec =0;
>     nanosleep(&t,&t);
>     //this call will block until worker 
>     //(which is sleeping longer than we did) unlocks
>     pthread_mutex_lock(&mutex);
>     pthread_mutex_unlock(&mutex);
>     //remark: pthread_mutex_destroy returns EINVAL here!
>     /*int rv = */pthread_mutex_destroy(&mutex);
> 
>     pthread_mutexattr_destroy(&mutex_attr);
> }

Are you sure pthread_mutex_destroy returns EINVAL and not EBUSY?
Anyway, freeing the mutex while it has not been destroyed is
invalid. I do not claim to know whether your problem is real, but
that precise example code is invalid.

-- 
					    Gilles.



* Re: [Xenomai] Corruption after phtread_mutex_destroy
  2015-04-23 19:03 ` Gilles Chanteperdrix
@ 2015-04-23 19:27   ` Gilles Chanteperdrix
  0 siblings, 0 replies; 14+ messages in thread
From: Gilles Chanteperdrix @ 2015-04-23 19:27 UTC
  To: Meier, Hans; +Cc: xenomai@xenomai.org

On Thu, Apr 23, 2015 at 09:03:48PM +0200, Gilles Chanteperdrix wrote:
> On Thu, Apr 23, 2015 at 06:42:43PM +0000, Meier, Hans wrote:
> > [...]
> > 
> > void do_mutex_stuff()
> > {
> >     pthread_mutexattr_t mutex_attr;
> >     pthread_mutexattr_init(&mutex_attr);
> >     pthread_mutexattr_setprotocol(&mutex_attr, PTHREAD_PRIO_INHERIT);
> >     pthread_mutexattr_settype(&mutex_attr, PTHREAD_MUTEX_RECURSIVE);
> > 
> >     pthread_mutex_t mutex;
> >     pthread_mutex_init(&mutex, &mutex_attr);
> >     g_mutex = &mutex;
> >     //allow the low prio thread now to run.
> >     //sleep must be long enough that the worker can lock the mutex
> >     //and short enough that it doesn't reach unlock.
> >     timespec t; t.tv_sec = 1; t.tv_nsec =0;
> >     nanosleep(&t,&t);
> >     //this call will block until worker 
> >     //(which is sleeping longer than we did) unlocks
> >     pthread_mutex_lock(&mutex);
> >     pthread_mutex_unlock(&mutex);
> >     //remark: pthread_mutex_destroy returns EINVAL here!
> >     /*int rv = */pthread_mutex_destroy(&mutex);
> > 
> >     pthread_mutexattr_destroy(&mutex_attr);
> > }
> 
> Are you sure pthread_mutex_destroy returns EINVAL and not EBUSY ?
> Anyway, freeing the mutex while it has not been destroyed is
> invalid. I do not pretend to know whether your problem is real, but
> that precise example code is invalid.

Indeed, it is EINVAL. The problem is that the mutex is considered in
use as long as the worker thread has not exited pthread_mutex_unlock.

-- 
					    Gilles.



* Re: [Xenomai] Corruption after phtread_mutex_destroy
  2015-04-23 18:42 [Xenomai] Corruption after phtread_mutex_destroy Meier, Hans
  2015-04-23 19:03 ` Gilles Chanteperdrix
@ 2015-04-24  6:53 ` Jan Kiszka
  2015-04-24 15:18   ` Gilles Chanteperdrix
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Kiszka @ 2015-04-24  6:53 UTC
  To: Meier, Hans, xenomai@xenomai.org

On 2015-04-23 at 20:42, Meier, Hans wrote:
> [...]
> 
> According to man page PTHREAD_MUTEX_DESTROY(3P), section "Destroying 
> Mutexes" - close to the end - "Implementations are required to allow an 
> object to be destroyed and freed ... immediately after the object is 
> unlocked". So that is what we do here, we destroy and free the mutex 
> immediately after it is unlocked. Certainly in a simple scenario we could 
> easily work around this problem by destroying and freeing M later, but 
> what if this code is buried deep inside a framework lib (here ACE)? 

I am not saying that ACE is to blame in this case (not until the issue
is fully understood), but that layer is in general a rather problematic
piece of code (to put it politely).

In most projects I have seen so far, it was of limited or even no real
use anymore. And it caused quite a few headaches due to lots of bugs,
specifically regarding concurrency, even in its "matured" parts.

So, if possible, get rid of it, at least in the long run.

Jan


* Re: [Xenomai] Corruption after phtread_mutex_destroy
  2015-04-24  6:53 ` Jan Kiszka
@ 2015-04-24 15:18   ` Gilles Chanteperdrix
  2015-04-24 15:23     ` Jan Kiszka
  0 siblings, 1 reply; 14+ messages in thread
From: Gilles Chanteperdrix @ 2015-04-24 15:18 UTC
  To: Jan Kiszka; +Cc: xenomai@xenomai.org

On Fri, Apr 24, 2015 at 08:53:30AM +0200, Jan Kiszka wrote:
> On 2015-04-23 at 20:42, Meier, Hans wrote:
> > [...]
> > 
> > According to man page PTHREAD_MUTEX_DESTROY(3P), section "Destroying 
> > Mutexes" - close to the end - "Implementations are required to allow an 
> > object to be destroyed and freed ... immediately after the object is 
> > unlocked". So that is what we do here, we destroy and free the mutex 
> > immediately after it is unlocked. Certainly in a simple scenario we could 
> > easily work around this problem by destroying and freeing M later, but 
> > what if this code is buried deep inside a framework lib (here ACE)? 
> 
> Not saying that ACE is to be blamed in this case (until the issue is
> fully understood),

The issue is fully understood: the mutex is considered in use until
every thread has exited all the mutex services. In the example
posted, the "worker" thread is still inside the pthread_mutex_unlock
service while the other thread is trying to call
pthread_mutex_destroy, which causes pthread_mutex_destroy to return
EINVAL.
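
As an illustration of what a caller could do with that return code, here is a minimal sketch that checks the result of pthread_mutex_destroy and retries a bounded number of times before giving up. The helper name `destroy_mutex_with_retry` and the 1 ms pause are hypothetical. The sketch also assumes that a failed destroy leaves the mutex intact so that a later attempt can succeed, which nothing in this thread confirms.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <pthread.h>
#include <time.h>

// Hypothetical helper, not a Xenomai or POSIX API: retry
// pthread_mutex_destroy() while it reports the mutex as still in use.
// Returns 0 on success, otherwise the last error code seen.
int destroy_mutex_with_retry(pthread_mutex_t* m, int max_attempts)
{
    assert(m != NULL);
    int rv = EINVAL;  // reported if no attempt is made at all
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        rv = pthread_mutex_destroy(m);
        if (rv != EINVAL && rv != EBUSY)
            break;  // success, or an error that retrying will not fix
        // Sleep instead of yielding: under SCHED_FIFO a yield would not
        // let a lower-priority thread finish its pthread_mutex_unlock.
        struct timespec delay = {0, 1000000};  // 1 ms, arbitrary choice
        nanosleep(&delay, NULL);
    }
    return rv;
}
```

The storage holding the mutex should only be released once this returns 0. If it still fails after the last attempt, the memory has to stay alive.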

Xenomai 3 does not have this issue.

-- 
					    Gilles.



* Re: [Xenomai] Corruption after phtread_mutex_destroy
  2015-04-24 15:18   ` Gilles Chanteperdrix
@ 2015-04-24 15:23     ` Jan Kiszka
  2015-04-24 15:26       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kiszka @ 2015-04-24 15:23 UTC
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

On 2015-04-24 17:18, Gilles Chanteperdrix wrote:
> On Fri, Apr 24, 2015 at 08:53:30AM +0200, Jan Kiszka wrote:
>> On 2015-04-23 at 20:42, Meier, Hans wrote:
>>> [...]
>>
>> Not saying that ACE is to be blamed in this case (until the issue is
>> fully understood),
> 
> The issue is fully understood: the mutex is considered in use until
> every thread has exited all the mutex services. In the example
> posted, the "worker" thread is still inside the pthread_mutex_unlock
> service while the other thread is trying to call
> pthread_mutex_destroy, which causes pthread_mutex_destroy to return
> EINVAL.

If the reported pattern is actually equivalent to that of the
application (which wasn't clear to me so far), then it is indeed understood.

> 
> Xenomai 3 does not have this issue.
> 

You mean returning EINVAL instead of EBUSY?

Jan


* Re: [Xenomai] Corruption after phtread_mutex_destroy
  2015-04-24 15:23     ` Jan Kiszka
@ 2015-04-24 15:26       ` Gilles Chanteperdrix
  2015-04-24 15:31         ` Jan Kiszka
  0 siblings, 1 reply; 14+ messages in thread
From: Gilles Chanteperdrix @ 2015-04-24 15:26 UTC
  To: Jan Kiszka; +Cc: xenomai@xenomai.org

On Fri, Apr 24, 2015 at 05:23:55PM +0200, Jan Kiszka wrote:
> On 2015-04-24 17:18, Gilles Chanteperdrix wrote:
> > [...]
> > 
> > Xenomai 3 does not have this issue.
> > 
> 
> You mean returning EINVAL instead of EBUSY?

I mean that pthread_mutex_destroy would not return an error and
would destroy the mutex, as it should.

-- 
					    Gilles.

* Re: [Xenomai] Corruption after phtread_mutex_destroy
  2015-04-24 15:26       ` Gilles Chanteperdrix
@ 2015-04-24 15:31         ` Jan Kiszka
  2015-04-24 15:35           ` Gilles Chanteperdrix
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kiszka @ 2015-04-24 15:31 UTC
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

On 2015-04-24 17:26, Gilles Chanteperdrix wrote:
> On Fri, Apr 24, 2015 at 05:23:55PM +0200, Jan Kiszka wrote:
>> On 2015-04-24 17:18, Gilles Chanteperdrix wrote:
>>> [...]
>>>
>>> Xenomai 3 does not have this issue.
>>>
>>
>> You mean returning EINVAL instead of EBUSY?
> 
> I mean that pthread_mutex_destroy would not return an error and
> destroy the mutex as it should.

The spec recommends returning EBUSY when destruction of a locked
mutex is requested. That seems to have changed from the 2004 edition of
the spec, where it was more prominently listed as "may fail if...".
Returning EINVAL is wrong, though.

Jan



* Re: [Xenomai] Corruption after phtread_mutex_destroy
  2015-04-24 15:31         ` Jan Kiszka
@ 2015-04-24 15:35           ` Gilles Chanteperdrix
  2015-04-24 16:02             ` Jan Kiszka
  0 siblings, 1 reply; 14+ messages in thread
From: Gilles Chanteperdrix @ 2015-04-24 15:35 UTC
  To: Jan Kiszka; +Cc: xenomai@xenomai.org

On Fri, Apr 24, 2015 at 05:31:49PM +0200, Jan Kiszka wrote:
> On 2015-04-24 17:26, Gilles Chanteperdrix wrote:
> > On Fri, Apr 24, 2015 at 05:23:55PM +0200, Jan Kiszka wrote:
> >> On 2015-04-24 17:18, Gilles Chanteperdrix wrote:
> >>> [...]
> >>>
> >>> Xenomai 3 does not have this issue.
> >>>
> >>
> >> You mean returning EINVAL instead of EBUSY?
> > 
> > I mean that pthread_mutex_destroy would not return an error and
> > destroy the mutex as it should.
> 
> The spec recommends returning EBUSY when destruction of a locked
> mutex is requested.

The thing is, as the example demonstrates, pthread_mutex_destroy
fails while the mutex is unlocked. We are being overzealous here by
considering that the mutex is busy because a thread is still in
pthread_mutex_unlock.

-- 
					    Gilles.

* Re: [Xenomai] Corruption after phtread_mutex_destroy
  2015-04-24 15:35           ` Gilles Chanteperdrix
@ 2015-04-24 16:02             ` Jan Kiszka
  2015-04-27  9:39               ` Meier, Hans
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kiszka @ 2015-04-24 16:02 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

On 2015-04-24 17:35, Gilles Chanteperdrix wrote:
> The thing is, as the example demonstrates, pthread_mutex_destroy
> fails while the mutex is unlocked. We are being overzealous here by
> considering that the mutex is busy because a thread is still in
> pthread_mutex_unlock.

OK, then it makes sense.

Jan



* Re: [Xenomai] Corruption after phtread_mutex_destroy
  2015-04-24 16:02             ` Jan Kiszka
@ 2015-04-27  9:39               ` Meier, Hans
  2015-04-27 11:47                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 14+ messages in thread
From: Meier, Hans @ 2015-04-27  9:39 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai@xenomai.org

On 2015-04-24 18:03, Jan Kiszka wrote:
> On 2015-04-24 17:35, Gilles Chanteperdrix wrote:
>> The thing is, as the example demonstrates, pthread_mutex_destroy
>> fails while the mutex is unlocked. We are being overzealous here by
>> considering that the mutex is busy because a thread is still in
>> pthread_mutex_unlock.
> 
> OK, then it makes sense.

It is not only that pthread_mutex_destroy should, in this case, destroy the
mutex and return without error. Beyond that, the pthread_mutex_t structure
must no longer be modified when the low-priority thread later leaves the
pthread_mutex_unlock service. I guess this does not depend on what
pthread_mutex_destroy does internally or what it returns.
Sorry that I didn't point that out more clearly in my initial mail.
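
For reference, a mutex of the kind discussed in this thread (recursive, prio-inherit) could be set up roughly as follows. This is only a sketch using standard POSIX attribute calls; the helper name init_recursive_pi_mutex is made up for illustration and is not taken from the test program or from ACE.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Hypothetical helper: initialise *m as a recursive, priority-inheriting
 * mutex. Returns 0 on success, or the error code of the first failing call. */
int init_recursive_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);
    if (err != 0)
        return err;

    err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (err == 0)
        err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (err == 0)
        err = pthread_mutex_init(m, &attr);

    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return err;
}
```

The sequence in question is then lock, unlock, pthread_mutex_destroy on such a mutex, with the storage released right afterwards.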

Hans




* Re: [Xenomai] Corruption after phtread_mutex_destroy
  2015-04-27  9:39               ` Meier, Hans
@ 2015-04-27 11:47                 ` Gilles Chanteperdrix
  2015-04-27 13:11                   ` Meier, Hans
  0 siblings, 1 reply; 14+ messages in thread
From: Gilles Chanteperdrix @ 2015-04-27 11:47 UTC (permalink / raw)
  To: Meier, Hans; +Cc: Jan Kiszka, xenomai@xenomai.org

On Mon, Apr 27, 2015 at 09:39:29AM +0000, Meier, Hans wrote:
> It is not only that pthread_mutex_destroy should, in this case, destroy the
> mutex and return without error. Beyond that, the pthread_mutex_t structure
> must no longer be modified when the low-priority thread later leaves the
> pthread_mutex_unlock service. I guess this does not depend on what
> pthread_mutex_destroy does internally or what it returns.
> Sorry that I didn't point that out more clearly in my initial mail.

Again: the problem you have is that when you call
pthread_mutex_destroy, the worker thread has not left
pthread_mutex_unlock yet. So, the manipulation that occurs, occurs
in pthread_mutex_unlock. The Xenomai API itself does not use the
mutex after it has been freed; you cause that by freeing the mutex
even though pthread_mutex_destroy told you that the mutex was busy.

Anyway, this has been fixed in Xenomai 3; it will not get fixed in
Xenomai 2.6.

-- 
					    Gilles.



* Re: [Xenomai] Corruption after phtread_mutex_destroy
  2015-04-27 11:47                 ` Gilles Chanteperdrix
@ 2015-04-27 13:11                   ` Meier, Hans
  2015-04-27 13:20                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 14+ messages in thread
From: Meier, Hans @ 2015-04-27 13:11 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

On 2015-04-27 13:47, Gilles Chanteperdrix wrote:
> Again: the problem you have is that when you call
> pthread_mutex_destroy, the worker thread has not left
> pthread_mutex_unlock yet. So, the manipulation that occurs, occurs
> in pthread_mutex_unlock. The Xenomai API itself does not use the
> mutex after it has been freed; you cause that by freeing the mutex
> even though pthread_mutex_destroy told you that the mutex was busy.
> 
> Anyway, this has been fixed in Xenomai 3; it will not get fixed in
> Xenomai 2.6.
> 
> -- 
> 					    Gilles.

Yes, I understand that you advise me not to free the mutex if 
pthread_mutex_destroy returns EINVAL because another thread has not yet 
left the unlock service (although it has unlocked the mutex and lost its 
boosted priority). 

But how can I wait in a high prio thread for a low prio thread to exit 
pthread_mutex_unlock without violating the priority scheme of the 
application? pthread_mutex_lock followed by pthread_mutex_unlock doesn't 
help because it pushes the low prio thread (prio-inherit) out of the 
mutex, but not out of pthread_mutex_unlock. The only thing I can do is to 
loop over pthread_mutex_destroy and sleep (!) until pthread_mutex_destroy 
returns without error. 
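
To make that workaround concrete, here is a minimal sketch of such a retry loop. The helper name destroy_mutex_retry, the retry limit and the delay are all assumptions chosen for illustration; as said above, sleeping in a high prio thread is exactly the part I am unhappy with.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: retry pthread_mutex_destroy() until it succeeds or
 * max_tries attempts have failed, sleeping delay_ns nanoseconds between
 * attempts (delay_ns must be in the range 0..999999999).
 * Returns 0 on success, otherwise the last error reported by
 * pthread_mutex_destroy(), or EINVAL if max_tries is not positive. */
int destroy_mutex_retry(pthread_mutex_t *m, int max_tries, long delay_ns)
{
    struct timespec delay = { .tv_sec = 0, .tv_nsec = delay_ns };
    int err = EINVAL;

    for (int i = 0; i < max_tries; i++) {
        err = pthread_mutex_destroy(m);
        if (err == 0)
            break;                 /* destroyed: storage may be released */
        nanosleep(&delay, NULL);   /* give the other thread time to leave unlock */
    }
    return err;
}
```

The caller would release the mutex storage only when this returns 0.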

Or is your suggestion to switch to Xenomai 3.0 RC3 now?

Hans



* Re: [Xenomai] Corruption after phtread_mutex_destroy
  2015-04-27 13:11                   ` Meier, Hans
@ 2015-04-27 13:20                     ` Gilles Chanteperdrix
  0 siblings, 0 replies; 14+ messages in thread
From: Gilles Chanteperdrix @ 2015-04-27 13:20 UTC (permalink / raw)
  To: Meier, Hans; +Cc: xenomai@xenomai.org

On Mon, Apr 27, 2015 at 01:11:14PM +0000, Meier, Hans wrote:
> Yes, I understand that you advise me not to free the mutex if 
> pthread_mutex_destroy returns EINVAL because another thread has not yet 
> left the unlock service (although it has unlocked the mutex and lost its 
> boosted priority). 

Not only that, but it is wrong to free the mutex if
pthread_mutex_destroy does not return 0, whatever the reason, it is
as simple as that.
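
In code, the rule could look like the following sketch. It assumes a heap-allocated mutex, and the helper name destroy_and_free_mutex is hypothetical, not part of any Xenomai or ACE API.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical helper: destroy a heap-allocated mutex and release its
 * storage only if pthread_mutex_destroy() reports success. */
int destroy_and_free_mutex(pthread_mutex_t *m)
{
    int err = pthread_mutex_destroy(m);
    if (err != 0)
        return err;   /* e.g. EBUSY or EINVAL: the storage must stay valid */

    free(m);
    return 0;
}
```

On a non-zero return the caller still owns the memory and has to try again later. For a mutex living on the stack, the equivalent is not to leave the enclosing scope until the destroy has succeeded.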

> 
> But how can I wait in a high prio thread for a low prio thread to exit 
> pthread_mutex_unlock without violating the priority scheme of the 
> application? pthread_mutex_lock followed by pthread_mutex_unlock doesn't 
> help because it pushes the low prio thread (prio-inherit) out of the 
> mutex, but not out of pthread_mutex_unlock. The only thing I can do is to 
> loop over pthread_mutex_destroy and sleep (!) until pthread_mutex_destroy 
> returns without error. 
> 
> Or is your suggestion to switch to Xenomai 3.0 RC3 now?

I am not suggesting anything. I am explaining how it works. But why
not destroy the mutex from a low priority thread? Destroying the
mutex is a heavy operation; maybe doing that from a high priority
thread is not such a great idea anyway.
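
One possible shape for this, as a rough sketch only: the high priority thread hands the mutex over to a list, and a low priority thread destroys and frees the entries later. The names retire_mutex, reap_retired and MAX_RETIRED are invented for the example, the mutexes are assumed to be heap-allocated, and the list is protected by an ordinary statically initialised mutex, which may need adapting to the skin in use.

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_RETIRED 64   /* arbitrary capacity chosen for the sketch */

static pthread_mutex_t *retired[MAX_RETIRED];
static size_t n_retired;
static pthread_mutex_t retired_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called from the high priority thread: hand the mutex over instead of
 * destroying it. Returns 0, or -1 if the list is full. */
int retire_mutex(pthread_mutex_t *m)
{
    int ret = -1;

    pthread_mutex_lock(&retired_lock);
    if (n_retired < MAX_RETIRED) {
        retired[n_retired++] = m;
        ret = 0;
    }
    pthread_mutex_unlock(&retired_lock);
    return ret;
}

/* Called from a low priority thread: destroy and free every retired mutex
 * that can be destroyed, and keep the ones whose destroy failed for a
 * later pass. Returns the number of mutexes freed. */
size_t reap_retired(void)
{
    size_t freed = 0, kept = 0;

    pthread_mutex_lock(&retired_lock);
    for (size_t i = 0; i < n_retired; i++) {
        if (pthread_mutex_destroy(retired[i]) == 0) {
            free(retired[i]);
            freed++;
        } else {
            retired[kept++] = retired[i];
        }
    }
    n_retired = kept;
    pthread_mutex_unlock(&retired_lock);
    return freed;
}
```

The low priority thread would call reap_retired() periodically. By the time it gets to run, the thread that was still inside pthread_mutex_unlock has normally had a chance to leave it.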

-- 
					    Gilles.



Thread overview: 14+ messages
2015-04-23 18:42 [Xenomai] Corruption after phtread_mutex_destroy Meier, Hans
2015-04-23 19:03 ` Gilles Chanteperdrix
2015-04-23 19:27   ` Gilles Chanteperdrix
2015-04-24  6:53 ` Jan Kiszka
2015-04-24 15:18   ` Gilles Chanteperdrix
2015-04-24 15:23     ` Jan Kiszka
2015-04-24 15:26       ` Gilles Chanteperdrix
2015-04-24 15:31         ` Jan Kiszka
2015-04-24 15:35           ` Gilles Chanteperdrix
2015-04-24 16:02             ` Jan Kiszka
2015-04-27  9:39               ` Meier, Hans
2015-04-27 11:47                 ` Gilles Chanteperdrix
2015-04-27 13:11                   ` Meier, Hans
2015-04-27 13:20                     ` Gilles Chanteperdrix
