All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions
       [not found] <E1O78nO-0008PE-EK@xenomai.org>
@ 2010-05-01 13:57 ` Gilles Chanteperdrix
  2010-05-01 14:06   ` Jan Kiszka
  0 siblings, 1 reply; 9+ messages in thread
From: Gilles Chanteperdrix @ 2010-05-01 13:57 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

GIT version control wrote:
> Module: xenomai-jki
> Branch: for-upstream
> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
> URL:    http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
> 
> Author: Jan Kiszka <jan.kiszka@domain.hid>
> Date:   Wed Apr 28 15:08:11 2010 +0200
> 
> native: Improve fault tolerance /wrt multiple task deletions
> 
> As we may pass the pthread handle of an RT_TASK directly to glibc, we
> may trigger a SIGSEGV if the underlying thread was already terminated.
> Try to catch this application mistakes by clearing the handle at least
> in that task descriptor which successfully ran rt_task_delete or
> rt_task_join.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid>

Ok. I have tested this patch (though I could not find whether it was
discussed on the mailing list). And in fact, it looks to me like it
turns an application error into a silently working application. So, I
would propose:
- to use 0x8 instead of 0, to cause a segfault if an invalid handle is
used, without any risk of side effect (such as killing or joining the
wrong task if ever a pthread_t is reused)
- if Xenomai is compiled with --enable-debug, return -EINVAL, or use an
assert if such a value is encountered.


-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions
  2010-05-01 13:57 ` [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions Gilles Chanteperdrix
@ 2010-05-01 14:06   ` Jan Kiszka
  2010-05-01 14:11     ` Gilles Chanteperdrix
  2010-05-01 14:59     ` Gilles Chanteperdrix
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Kiszka @ 2010-05-01 14:06 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 1321 bytes --]

Gilles Chanteperdrix wrote:
> GIT version control wrote:
>> Module: xenomai-jki
>> Branch: for-upstream
>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>> URL:    http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>>
>> Author: Jan Kiszka <jan.kiszka@domain.hid>
>> Date:   Wed Apr 28 15:08:11 2010 +0200
>>
>> native: Improve fault tolerance /wrt multiple task deletions
>>
>> As we may pass the pthread handle of an RT_TASK directly to glibc, we
>> may trigger a SIGSEGV if the underlying thread was already terminated.
>> Try to catch this application mistakes by clearing the handle at least
>> in that task descriptor which successfully ran rt_task_delete or
>> rt_task_join.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid>
> 
> Ok. I have tested this patch (though I could not find whether it was
> discussed on the mailing list). And in fact, it looks to me like it
> turns an application error into a silently working application.

Then there is probably something broken: rt_task_delete is supposed to
return -EIDRM of the passed handle no longer exists. That's at least
what the doc says. The point of this patch is to turn an application
crash into a proper error return value (and that not only for
--enable-debug).

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions
  2010-05-01 14:06   ` Jan Kiszka
@ 2010-05-01 14:11     ` Gilles Chanteperdrix
  2010-05-01 14:18       ` Jan Kiszka
  2010-05-01 14:59     ` Gilles Chanteperdrix
  1 sibling, 1 reply; 9+ messages in thread
From: Gilles Chanteperdrix @ 2010-05-01 14:11 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> GIT version control wrote:
>>> Module: xenomai-jki
>>> Branch: for-upstream
>>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>>> URL:    http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>>>
>>> Author: Jan Kiszka <jan.kiszka@domain.hid>
>>> Date:   Wed Apr 28 15:08:11 2010 +0200
>>>
>>> native: Improve fault tolerance /wrt multiple task deletions
>>>
>>> As we may pass the pthread handle of an RT_TASK directly to glibc, we
>>> may trigger a SIGSEGV if the underlying thread was already terminated.
>>> Try to catch this application mistakes by clearing the handle at least
>>> in that task descriptor which successfully ran rt_task_delete or
>>> rt_task_join.
>>>
>>> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid>
>> Ok. I have tested this patch (though I could not find whether it was
>> discussed on the mailing list). And in fact, it looks to me like it
>> turns an application error into a silently working application.
> 
> Then there is probably something broken: rt_task_delete is supposed to
> return -EIDRM of the passed handle no longer exists. That's at least
> what the doc says. The point of this patch is to turn an application
> crash into a proper error return value (and that not only for
> --enable-debug).

Here is the test I used:
#include <stdio.h>
#include <native/task.h>
#include <sys/mman.h>

void task_main(void* arg)
{
        rt_task_sleep(1000000000);
}

int main(void)
{
     RT_TASK task;

     mlockall(MCL_CURRENT|MCL_FUTURE);
     rt_task_create(&task, "task", 128*1024, 99, T_FPU|T_JOINABLE);
     rt_task_start(&task, task_main, NULL);
     fprintf(stderr, "join: %d\n", rt_task_join(&task));
     fprintf(stderr, "delete: %d\n", rt_task_delete(&task));
}

it prints:
join: 0
delete: 0

This said, I like the segfault, because people probably never check the
return value of rt_task_join/rt_task_delete (which is, I guess, the
reason why phtread_cancel and pthread_join segfault themselves, because
the posix spec allows to return ESRCH in that case).

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions
  2010-05-01 14:11     ` Gilles Chanteperdrix
@ 2010-05-01 14:18       ` Jan Kiszka
  2010-05-01 14:26         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kiszka @ 2010-05-01 14:18 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 2648 bytes --]

Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> GIT version control wrote:
>>>> Module: xenomai-jki
>>>> Branch: for-upstream
>>>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>>>> URL:    http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>>>>
>>>> Author: Jan Kiszka <jan.kiszka@domain.hid>
>>>> Date:   Wed Apr 28 15:08:11 2010 +0200
>>>>
>>>> native: Improve fault tolerance /wrt multiple task deletions
>>>>
>>>> As we may pass the pthread handle of an RT_TASK directly to glibc, we
>>>> may trigger a SIGSEGV if the underlying thread was already terminated.
>>>> Try to catch this application mistakes by clearing the handle at least
>>>> in that task descriptor which successfully ran rt_task_delete or
>>>> rt_task_join.
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid>
>>> Ok. I have tested this patch (though I could not find whether it was
>>> discussed on the mailing list). And in fact, it looks to me like it
>>> turns an application error into a silently working application.
>> Then there is probably something broken: rt_task_delete is supposed to
>> return -EIDRM of the passed handle no longer exists. That's at least
>> what the doc says. The point of this patch is to turn an application
>> crash into a proper error return value (and that not only for
>> --enable-debug).
> 
> Here is the test I used:
> #include <stdio.h>
> #include <native/task.h>
> #include <sys/mman.h>
> 
> void task_main(void* arg)
> {
>         rt_task_sleep(1000000000);
> }
> 
> int main(void)
> {
>      RT_TASK task;
> 
>      mlockall(MCL_CURRENT|MCL_FUTURE);
>      rt_task_create(&task, "task", 128*1024, 99, T_FPU|T_JOINABLE);
>      rt_task_start(&task, task_main, NULL);
>      fprintf(stderr, "join: %d\n", rt_task_join(&task));
>      fprintf(stderr, "delete: %d\n", rt_task_delete(&task));
> }
> 
> it prints:
> join: 0
> delete: 0
> 
> This said, I like the segfault, because people probably never check the
> return value of rt_task_join/rt_task_delete (which is, I guess, the
> reason why phtread_cancel and pthread_join segfault themselves, because
> the posix spec allows to return ESRCH in that case).

Well, that's kind of hard to sell to people writing software based on
documentation. We documented -EIDRM, so we should make it work like
this. Will look into this later.

The fact that pthread_cancel /may/ crash on invalid thread handles is an
implementation detail: The handle is a pointer to an object that /may/
no longer exist in memory when dereferenced.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions
  2010-05-01 14:18       ` Jan Kiszka
@ 2010-05-01 14:26         ` Gilles Chanteperdrix
  2010-05-01 14:29           ` Jan Kiszka
  0 siblings, 1 reply; 9+ messages in thread
From: Gilles Chanteperdrix @ 2010-05-01 14:26 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
>>>> GIT version control wrote:
>>>>> Module: xenomai-jki
>>>>> Branch: for-upstream
>>>>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>>>>> URL:    http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>>>>>
>>>>> Author: Jan Kiszka <jan.kiszka@domain.hid>
>>>>> Date:   Wed Apr 28 15:08:11 2010 +0200
>>>>>
>>>>> native: Improve fault tolerance /wrt multiple task deletions
>>>>>
>>>>> As we may pass the pthread handle of an RT_TASK directly to glibc, we
>>>>> may trigger a SIGSEGV if the underlying thread was already terminated.
>>>>> Try to catch this application mistakes by clearing the handle at least
>>>>> in that task descriptor which successfully ran rt_task_delete or
>>>>> rt_task_join.
>>>>>
>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid>
>>>> Ok. I have tested this patch (though I could not find whether it was
>>>> discussed on the mailing list). And in fact, it looks to me like it
>>>> turns an application error into a silently working application.
>>> Then there is probably something broken: rt_task_delete is supposed to
>>> return -EIDRM of the passed handle no longer exists. That's at least
>>> what the doc says. The point of this patch is to turn an application
>>> crash into a proper error return value (and that not only for
>>> --enable-debug).
>> Here is the test I used:
>> #include <stdio.h>
>> #include <native/task.h>
>> #include <sys/mman.h>
>>
>> void task_main(void* arg)
>> {
>>         rt_task_sleep(1000000000);
>> }
>>
>> int main(void)
>> {
>>      RT_TASK task;
>>
>>      mlockall(MCL_CURRENT|MCL_FUTURE);
>>      rt_task_create(&task, "task", 128*1024, 99, T_FPU|T_JOINABLE);
>>      rt_task_start(&task, task_main, NULL);
>>      fprintf(stderr, "join: %d\n", rt_task_join(&task));
>>      fprintf(stderr, "delete: %d\n", rt_task_delete(&task));
>> }
>>
>> it prints:
>> join: 0
>> delete: 0
>>
>> This said, I like the segfault, because people probably never check the
>> return value of rt_task_join/rt_task_delete (which is, I guess, the
>> reason why phtread_cancel and pthread_join segfault themselves, because
>> the posix spec allows to return ESRCH in that case).
> 
> Well, that's kind of hard to sell to people writing software based on
> documentation. We documented -EIDRM, so we should make it work like
> this. Will look into this later.
> 
> The fact that pthread_cancel /may/ crash on invalid thread handles is an
> implementation detail: The handle is a pointer to an object that /may/
> no longer exist in memory when dereferenced.

Ok, another point is that when cancelling a joinable thread,
pthread_join should still be called after pthread_cancel. So,
rt_task_join should still be able to call pthread_join after a call to
rt_task_delete.

Running pthread_join inside rt_task_delete is not an option, as there
would be a risk for deadlocks.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions
  2010-05-01 14:26         ` Gilles Chanteperdrix
@ 2010-05-01 14:29           ` Jan Kiszka
  2010-05-01 14:32             ` Jan Kiszka
  2010-05-01 14:33             ` Gilles Chanteperdrix
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Kiszka @ 2010-05-01 14:29 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 3249 bytes --]

Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> Gilles Chanteperdrix wrote:
>>>>> GIT version control wrote:
>>>>>> Module: xenomai-jki
>>>>>> Branch: for-upstream
>>>>>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>>>>>> URL:    http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>>>>>>
>>>>>> Author: Jan Kiszka <jan.kiszka@domain.hid>
>>>>>> Date:   Wed Apr 28 15:08:11 2010 +0200
>>>>>>
>>>>>> native: Improve fault tolerance /wrt multiple task deletions
>>>>>>
>>>>>> As we may pass the pthread handle of an RT_TASK directly to glibc, we
>>>>>> may trigger a SIGSEGV if the underlying thread was already terminated.
>>>>>> Try to catch this application mistakes by clearing the handle at least
>>>>>> in that task descriptor which successfully ran rt_task_delete or
>>>>>> rt_task_join.
>>>>>>
>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid>
>>>>> Ok. I have tested this patch (though I could not find whether it was
>>>>> discussed on the mailing list). And in fact, it looks to me like it
>>>>> turns an application error into a silently working application.
>>>> Then there is probably something broken: rt_task_delete is supposed to
>>>> return -EIDRM of the passed handle no longer exists. That's at least
>>>> what the doc says. The point of this patch is to turn an application
>>>> crash into a proper error return value (and that not only for
>>>> --enable-debug).
>>> Here is the test I used:
>>> #include <stdio.h>
>>> #include <native/task.h>
>>> #include <sys/mman.h>
>>>
>>> void task_main(void* arg)
>>> {
>>>         rt_task_sleep(1000000000);
>>> }
>>>
>>> int main(void)
>>> {
>>>      RT_TASK task;
>>>
>>>      mlockall(MCL_CURRENT|MCL_FUTURE);
>>>      rt_task_create(&task, "task", 128*1024, 99, T_FPU|T_JOINABLE);
>>>      rt_task_start(&task, task_main, NULL);
>>>      fprintf(stderr, "join: %d\n", rt_task_join(&task));
>>>      fprintf(stderr, "delete: %d\n", rt_task_delete(&task));
>>> }
>>>
>>> it prints:
>>> join: 0
>>> delete: 0
>>>
>>> This said, I like the segfault, because people probably never check the
>>> return value of rt_task_join/rt_task_delete (which is, I guess, the
>>> reason why phtread_cancel and pthread_join segfault themselves, because
>>> the posix spec allows to return ESRCH in that case).
>> Well, that's kind of hard to sell to people writing software based on
>> documentation. We documented -EIDRM, so we should make it work like
>> this. Will look into this later.
>>
>> The fact that pthread_cancel /may/ crash on invalid thread handles is an
>> implementation detail: The handle is a pointer to an object that /may/
>> no longer exist in memory when dereferenced.
> 
> Ok, another point is that when cancelling a joinable thread,
> pthread_join should still be called after pthread_cancel. So,
> rt_task_join should still be able to call pthread_join after a call to
> rt_task_delete.
> 
> Running pthread_join inside rt_task_delete is not an option, as there
> would be a risk for deadlocks.
> 

So we should always pthread_detach before canceling a thread in
rt_task_delete?

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions
  2010-05-01 14:29           ` Jan Kiszka
@ 2010-05-01 14:32             ` Jan Kiszka
  2010-05-01 14:33             ` Gilles Chanteperdrix
  1 sibling, 0 replies; 9+ messages in thread
From: Jan Kiszka @ 2010-05-01 14:32 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 3448 bytes --]

Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Gilles Chanteperdrix wrote:
>>>>>> GIT version control wrote:
>>>>>>> Module: xenomai-jki
>>>>>>> Branch: for-upstream
>>>>>>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>>>>>>> URL:    http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>>>>>>>
>>>>>>> Author: Jan Kiszka <jan.kiszka@domain.hid>
>>>>>>> Date:   Wed Apr 28 15:08:11 2010 +0200
>>>>>>>
>>>>>>> native: Improve fault tolerance /wrt multiple task deletions
>>>>>>>
>>>>>>> As we may pass the pthread handle of an RT_TASK directly to glibc, we
>>>>>>> may trigger a SIGSEGV if the underlying thread was already terminated.
>>>>>>> Try to catch this application mistakes by clearing the handle at least
>>>>>>> in that task descriptor which successfully ran rt_task_delete or
>>>>>>> rt_task_join.
>>>>>>>
>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid>
>>>>>> Ok. I have tested this patch (though I could not find whether it was
>>>>>> discussed on the mailing list). And in fact, it looks to me like it
>>>>>> turns an application error into a silently working application.
>>>>> Then there is probably something broken: rt_task_delete is supposed to
>>>>> return -EIDRM of the passed handle no longer exists. That's at least
>>>>> what the doc says. The point of this patch is to turn an application
>>>>> crash into a proper error return value (and that not only for
>>>>> --enable-debug).
>>>> Here is the test I used:
>>>> #include <stdio.h>
>>>> #include <native/task.h>
>>>> #include <sys/mman.h>
>>>>
>>>> void task_main(void* arg)
>>>> {
>>>>         rt_task_sleep(1000000000);
>>>> }
>>>>
>>>> int main(void)
>>>> {
>>>>      RT_TASK task;
>>>>
>>>>      mlockall(MCL_CURRENT|MCL_FUTURE);
>>>>      rt_task_create(&task, "task", 128*1024, 99, T_FPU|T_JOINABLE);
>>>>      rt_task_start(&task, task_main, NULL);
>>>>      fprintf(stderr, "join: %d\n", rt_task_join(&task));
>>>>      fprintf(stderr, "delete: %d\n", rt_task_delete(&task));
>>>> }
>>>>
>>>> it prints:
>>>> join: 0
>>>> delete: 0
>>>>
>>>> This said, I like the segfault, because people probably never check the
>>>> return value of rt_task_join/rt_task_delete (which is, I guess, the
>>>> reason why phtread_cancel and pthread_join segfault themselves, because
>>>> the posix spec allows to return ESRCH in that case).
>>> Well, that's kind of hard to sell to people writing software based on
>>> documentation. We documented -EIDRM, so we should make it work like
>>> this. Will look into this later.
>>>
>>> The fact that pthread_cancel /may/ crash on invalid thread handles is an
>>> implementation detail: The handle is a pointer to an object that /may/
>>> no longer exist in memory when dereferenced.
>> Ok, another point is that when cancelling a joinable thread,
>> pthread_join should still be called after pthread_cancel. So,
>> rt_task_join should still be able to call pthread_join after a call to
>> rt_task_delete.
>>
>> Running pthread_join inside rt_task_delete is not an option, as there
>> would be a risk for deadlocks.
>>
> 
> So we should always pthread_detach before canceling a thread in
> rt_task_delete?

Err, after, but only if it was joinable. Think we need to track
T_JOINABLE in user space as well.

Jan



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions
  2010-05-01 14:29           ` Jan Kiszka
  2010-05-01 14:32             ` Jan Kiszka
@ 2010-05-01 14:33             ` Gilles Chanteperdrix
  1 sibling, 0 replies; 9+ messages in thread
From: Gilles Chanteperdrix @ 2010-05-01 14:33 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Gilles Chanteperdrix wrote:
>>>>>> GIT version control wrote:
>>>>>>> Module: xenomai-jki
>>>>>>> Branch: for-upstream
>>>>>>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>>>>>>> URL:    http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>>>>>>>
>>>>>>> Author: Jan Kiszka <jan.kiszka@domain.hid>
>>>>>>> Date:   Wed Apr 28 15:08:11 2010 +0200
>>>>>>>
>>>>>>> native: Improve fault tolerance /wrt multiple task deletions
>>>>>>>
>>>>>>> As we may pass the pthread handle of an RT_TASK directly to glibc, we
>>>>>>> may trigger a SIGSEGV if the underlying thread was already terminated.
>>>>>>> Try to catch this application mistakes by clearing the handle at least
>>>>>>> in that task descriptor which successfully ran rt_task_delete or
>>>>>>> rt_task_join.
>>>>>>>
>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid>
>>>>>> Ok. I have tested this patch (though I could not find whether it was
>>>>>> discussed on the mailing list). And in fact, it looks to me like it
>>>>>> turns an application error into a silently working application.
>>>>> Then there is probably something broken: rt_task_delete is supposed to
>>>>> return -EIDRM of the passed handle no longer exists. That's at least
>>>>> what the doc says. The point of this patch is to turn an application
>>>>> crash into a proper error return value (and that not only for
>>>>> --enable-debug).
>>>> Here is the test I used:
>>>> #include <stdio.h>
>>>> #include <native/task.h>
>>>> #include <sys/mman.h>
>>>>
>>>> void task_main(void* arg)
>>>> {
>>>>         rt_task_sleep(1000000000);
>>>> }
>>>>
>>>> int main(void)
>>>> {
>>>>      RT_TASK task;
>>>>
>>>>      mlockall(MCL_CURRENT|MCL_FUTURE);
>>>>      rt_task_create(&task, "task", 128*1024, 99, T_FPU|T_JOINABLE);
>>>>      rt_task_start(&task, task_main, NULL);
>>>>      fprintf(stderr, "join: %d\n", rt_task_join(&task));
>>>>      fprintf(stderr, "delete: %d\n", rt_task_delete(&task));
>>>> }
>>>>
>>>> it prints:
>>>> join: 0
>>>> delete: 0
>>>>
>>>> This said, I like the segfault, because people probably never check the
>>>> return value of rt_task_join/rt_task_delete (which is, I guess, the
>>>> reason why phtread_cancel and pthread_join segfault themselves, because
>>>> the posix spec allows to return ESRCH in that case).
>>> Well, that's kind of hard to sell to people writing software based on
>>> documentation. We documented -EIDRM, so we should make it work like
>>> this. Will look into this later.
>>>
>>> The fact that pthread_cancel /may/ crash on invalid thread handles is an
>>> implementation detail: The handle is a pointer to an object that /may/
>>> no longer exist in memory when dereferenced.
>> Ok, another point is that when cancelling a joinable thread,
>> pthread_join should still be called after pthread_cancel. So,
>> rt_task_join should still be able to call pthread_join after a call to
>> rt_task_delete.
>>
>> Running pthread_join inside rt_task_delete is not an option, as there
>> would be a risk for deadlocks.
>>
> 
> So we should always pthread_detach before canceling a thread in
> rt_task_delete?

>From my point of view, we should not make the native api deviate too
much from posix, so, if the thread was created joinable, it should
require rt_task_join to be run after rt_task_delete. IOW, set opaque2 to
0 only if the thread was joinable.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions
  2010-05-01 14:06   ` Jan Kiszka
  2010-05-01 14:11     ` Gilles Chanteperdrix
@ 2010-05-01 14:59     ` Gilles Chanteperdrix
  1 sibling, 0 replies; 9+ messages in thread
From: Gilles Chanteperdrix @ 2010-05-01 14:59 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> GIT version control wrote:
>>> Module: xenomai-jki
>>> Branch: for-upstream
>>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>>> URL:    http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
>>>
>>> Author: Jan Kiszka <jan.kiszka@domain.hid>
>>> Date:   Wed Apr 28 15:08:11 2010 +0200
>>>
>>> native: Improve fault tolerance /wrt multiple task deletions
>>>
>>> As we may pass the pthread handle of an RT_TASK directly to glibc, we
>>> may trigger a SIGSEGV if the underlying thread was already terminated.
>>> Try to catch this application mistakes by clearing the handle at least
>>> in that task descriptor which successfully ran rt_task_delete or
>>> rt_task_join.
>>>
>>> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid>
>> Ok. I have tested this patch (though I could not find whether it was
>> discussed on the mailing list). And in fact, it looks to me like it
>> turns an application error into a silently working application.
> 
> Then there is probably something broken: rt_task_delete is supposed to
> return -EIDRM of the passed handle no longer exists. That's at least
> what the doc says. The point of this patch is to turn an application
> crash into a proper error return value (and that not only for
> --enable-debug).

After some investigation, it turns out that opaque2 == 0 is already
useful for another case: when rt_task_delete'ing a thread belonging to
another process than the current process. So, by attaching two meanings
to this value, we introduce a regression.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-05-01 14:59 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <E1O78nO-0008PE-EK@xenomai.org>
2010-05-01 13:57 ` [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions Gilles Chanteperdrix
2010-05-01 14:06   ` Jan Kiszka
2010-05-01 14:11     ` Gilles Chanteperdrix
2010-05-01 14:18       ` Jan Kiszka
2010-05-01 14:26         ` Gilles Chanteperdrix
2010-05-01 14:29           ` Jan Kiszka
2010-05-01 14:32             ` Jan Kiszka
2010-05-01 14:33             ` Gilles Chanteperdrix
2010-05-01 14:59     ` Gilles Chanteperdrix

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.