* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions [not found] <E1O78nO-0008PE-EK@xenomai.org> @ 2010-05-01 13:57 ` Gilles Chanteperdrix 2010-05-01 14:06 ` Jan Kiszka 0 siblings, 1 reply; 9+ messages in thread From: Gilles Chanteperdrix @ 2010-05-01 13:57 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai-core GIT version control wrote: > Module: xenomai-jki > Branch: for-upstream > Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 > URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 > > Author: Jan Kiszka <jan.kiszka@domain.hid> > Date: Wed Apr 28 15:08:11 2010 +0200 > > native: Improve fault tolerance /wrt multiple task deletions > > As we may pass the pthread handle of an RT_TASK directly to glibc, we > may trigger a SIGSEGV if the underlying thread was already terminated. > Try to catch this application mistakes by clearing the handle at least > in that task descriptor which successfully ran rt_task_delete or > rt_task_join. > > Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid> Ok. I have tested this patch (though I could not find whether it was discussed on the mailing list). And in fact, it looks to me like it turns an application error into a silently working application. So, I would propose: - to use 0x8 instead of 0, to cause a segfault if an invalid handle is used, without any risk of side effect (such as killing or joining the wrong task if ever a pthread_t is reused) - if Xenomai is compiled with --enable-debug, return -EINVAL, or use an assert if such a value is encountered. -- Gilles. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions 2010-05-01 13:57 ` [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions Gilles Chanteperdrix @ 2010-05-01 14:06 ` Jan Kiszka 2010-05-01 14:11 ` Gilles Chanteperdrix 2010-05-01 14:59 ` Gilles Chanteperdrix 0 siblings, 2 replies; 9+ messages in thread From: Jan Kiszka @ 2010-05-01 14:06 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai-core [-- Attachment #1: Type: text/plain, Size: 1321 bytes --] Gilles Chanteperdrix wrote: > GIT version control wrote: >> Module: xenomai-jki >> Branch: for-upstream >> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >> URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >> >> Author: Jan Kiszka <jan.kiszka@domain.hid> >> Date: Wed Apr 28 15:08:11 2010 +0200 >> >> native: Improve fault tolerance /wrt multiple task deletions >> >> As we may pass the pthread handle of an RT_TASK directly to glibc, we >> may trigger a SIGSEGV if the underlying thread was already terminated. >> Try to catch this application mistakes by clearing the handle at least >> in that task descriptor which successfully ran rt_task_delete or >> rt_task_join. >> >> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid> > > Ok. I have tested this patch (though I could not find whether it was > discussed on the mailing list). And in fact, it looks to me like it > turns an application error into a silently working application. Then there is probably something broken: rt_task_delete is supposed to return -EIDRM of the passed handle no longer exists. That's at least what the doc says. The point of this patch is to turn an application crash into a proper error return value (and that not only for --enable-debug). Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 257 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions 2010-05-01 14:06 ` Jan Kiszka @ 2010-05-01 14:11 ` Gilles Chanteperdrix 2010-05-01 14:18 ` Jan Kiszka 2010-05-01 14:59 ` Gilles Chanteperdrix 1 sibling, 1 reply; 9+ messages in thread From: Gilles Chanteperdrix @ 2010-05-01 14:11 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai-core Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> GIT version control wrote: >>> Module: xenomai-jki >>> Branch: for-upstream >>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>> URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>> >>> Author: Jan Kiszka <jan.kiszka@domain.hid> >>> Date: Wed Apr 28 15:08:11 2010 +0200 >>> >>> native: Improve fault tolerance /wrt multiple task deletions >>> >>> As we may pass the pthread handle of an RT_TASK directly to glibc, we >>> may trigger a SIGSEGV if the underlying thread was already terminated. >>> Try to catch this application mistakes by clearing the handle at least >>> in that task descriptor which successfully ran rt_task_delete or >>> rt_task_join. >>> >>> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid> >> Ok. I have tested this patch (though I could not find whether it was >> discussed on the mailing list). And in fact, it looks to me like it >> turns an application error into a silently working application. > > Then there is probably something broken: rt_task_delete is supposed to > return -EIDRM of the passed handle no longer exists. That's at least > what the doc says. The point of this patch is to turn an application > crash into a proper error return value (and that not only for > --enable-debug). Here is the test I used: #include <stdio.h> #include <native/task.h> #include <sys/mman.h> void task_main(void* arg) { rt_task_sleep(1000000000); } int main(void) { RT_TASK task; mlockall(MCL_CURRENT|MCL_FUTURE); rt_task_create(&task, "task", 128*1024, 99, T_FPU|T_JOINABLE); rt_task_start(&task, task_main, NULL); fprintf(stderr, "join: %d\n", rt_task_join(&task)); fprintf(stderr, "delete: %d\n", rt_task_delete(&task)); } it prints: join: 0 delete: 0 This said, I like the segfault, because people probably never check the return value of rt_task_join/rt_task_delete (which is, I guess, the reason why phtread_cancel and pthread_join segfault themselves, because the posix spec allows to return ESRCH in that case). -- Gilles. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions 2010-05-01 14:11 ` Gilles Chanteperdrix @ 2010-05-01 14:18 ` Jan Kiszka 2010-05-01 14:26 ` Gilles Chanteperdrix 0 siblings, 1 reply; 9+ messages in thread From: Jan Kiszka @ 2010-05-01 14:18 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai-core [-- Attachment #1: Type: text/plain, Size: 2648 bytes --] Gilles Chanteperdrix wrote: > Jan Kiszka wrote: >> Gilles Chanteperdrix wrote: >>> GIT version control wrote: >>>> Module: xenomai-jki >>>> Branch: for-upstream >>>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>>> URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>>> >>>> Author: Jan Kiszka <jan.kiszka@domain.hid> >>>> Date: Wed Apr 28 15:08:11 2010 +0200 >>>> >>>> native: Improve fault tolerance /wrt multiple task deletions >>>> >>>> As we may pass the pthread handle of an RT_TASK directly to glibc, we >>>> may trigger a SIGSEGV if the underlying thread was already terminated. >>>> Try to catch this application mistakes by clearing the handle at least >>>> in that task descriptor which successfully ran rt_task_delete or >>>> rt_task_join. >>>> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid> >>> Ok. I have tested this patch (though I could not find whether it was >>> discussed on the mailing list). And in fact, it looks to me like it >>> turns an application error into a silently working application. >> Then there is probably something broken: rt_task_delete is supposed to >> return -EIDRM of the passed handle no longer exists. That's at least >> what the doc says. The point of this patch is to turn an application >> crash into a proper error return value (and that not only for >> --enable-debug). > > Here is the test I used: > #include <stdio.h> > #include <native/task.h> > #include <sys/mman.h> > > void task_main(void* arg) > { > rt_task_sleep(1000000000); > } > > int main(void) > { > RT_TASK task; > > mlockall(MCL_CURRENT|MCL_FUTURE); > rt_task_create(&task, "task", 128*1024, 99, T_FPU|T_JOINABLE); > rt_task_start(&task, task_main, NULL); > fprintf(stderr, "join: %d\n", rt_task_join(&task)); > fprintf(stderr, "delete: %d\n", rt_task_delete(&task)); > } > > it prints: > join: 0 > delete: 0 > > This said, I like the segfault, because people probably never check the > return value of rt_task_join/rt_task_delete (which is, I guess, the > reason why phtread_cancel and pthread_join segfault themselves, because > the posix spec allows to return ESRCH in that case). Well, that's kind of hard to sell to people writing software based on documentation. We documented -EIDRM, so we should make it work like this. Will look into this later. The fact that pthread_cancel /may/ crash on invalid thread handles is an implementation detail: The handle is a pointer to an object that /may/ no longer exist in memory when dereferenced. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 257 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions 2010-05-01 14:18 ` Jan Kiszka @ 2010-05-01 14:26 ` Gilles Chanteperdrix 2010-05-01 14:29 ` Jan Kiszka 0 siblings, 1 reply; 9+ messages in thread From: Gilles Chanteperdrix @ 2010-05-01 14:26 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai-core Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> Jan Kiszka wrote: >>> Gilles Chanteperdrix wrote: >>>> GIT version control wrote: >>>>> Module: xenomai-jki >>>>> Branch: for-upstream >>>>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>>>> URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>>>> >>>>> Author: Jan Kiszka <jan.kiszka@domain.hid> >>>>> Date: Wed Apr 28 15:08:11 2010 +0200 >>>>> >>>>> native: Improve fault tolerance /wrt multiple task deletions >>>>> >>>>> As we may pass the pthread handle of an RT_TASK directly to glibc, we >>>>> may trigger a SIGSEGV if the underlying thread was already terminated. >>>>> Try to catch this application mistakes by clearing the handle at least >>>>> in that task descriptor which successfully ran rt_task_delete or >>>>> rt_task_join. >>>>> >>>>> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid> >>>> Ok. I have tested this patch (though I could not find whether it was >>>> discussed on the mailing list). And in fact, it looks to me like it >>>> turns an application error into a silently working application. >>> Then there is probably something broken: rt_task_delete is supposed to >>> return -EIDRM of the passed handle no longer exists. That's at least >>> what the doc says. The point of this patch is to turn an application >>> crash into a proper error return value (and that not only for >>> --enable-debug). >> Here is the test I used: >> #include <stdio.h> >> #include <native/task.h> >> #include <sys/mman.h> >> >> void task_main(void* arg) >> { >> rt_task_sleep(1000000000); >> } >> >> int main(void) >> { >> RT_TASK task; >> >> mlockall(MCL_CURRENT|MCL_FUTURE); >> rt_task_create(&task, "task", 128*1024, 99, T_FPU|T_JOINABLE); >> rt_task_start(&task, task_main, NULL); >> fprintf(stderr, "join: %d\n", rt_task_join(&task)); >> fprintf(stderr, "delete: %d\n", rt_task_delete(&task)); >> } >> >> it prints: >> join: 0 >> delete: 0 >> >> This said, I like the segfault, because people probably never check the >> return value of rt_task_join/rt_task_delete (which is, I guess, the >> reason why phtread_cancel and pthread_join segfault themselves, because >> the posix spec allows to return ESRCH in that case). > > Well, that's kind of hard to sell to people writing software based on > documentation. We documented -EIDRM, so we should make it work like > this. Will look into this later. > > The fact that pthread_cancel /may/ crash on invalid thread handles is an > implementation detail: The handle is a pointer to an object that /may/ > no longer exist in memory when dereferenced. Ok, another point is that when cancelling a joinable thread, pthread_join should still be called after pthread_cancel. So, rt_task_join should still be able to call pthread_join after a call to rt_task_delete. Running pthread_join inside rt_task_delete is not an option, as there would be a risk for deadlocks. -- Gilles. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions 2010-05-01 14:26 ` Gilles Chanteperdrix @ 2010-05-01 14:29 ` Jan Kiszka 2010-05-01 14:32 ` Jan Kiszka 2010-05-01 14:33 ` Gilles Chanteperdrix 0 siblings, 2 replies; 9+ messages in thread From: Jan Kiszka @ 2010-05-01 14:29 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai-core [-- Attachment #1: Type: text/plain, Size: 3249 bytes --] Gilles Chanteperdrix wrote: > Jan Kiszka wrote: >> Gilles Chanteperdrix wrote: >>> Jan Kiszka wrote: >>>> Gilles Chanteperdrix wrote: >>>>> GIT version control wrote: >>>>>> Module: xenomai-jki >>>>>> Branch: for-upstream >>>>>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>>>>> URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>>>>> >>>>>> Author: Jan Kiszka <jan.kiszka@domain.hid> >>>>>> Date: Wed Apr 28 15:08:11 2010 +0200 >>>>>> >>>>>> native: Improve fault tolerance /wrt multiple task deletions >>>>>> >>>>>> As we may pass the pthread handle of an RT_TASK directly to glibc, we >>>>>> may trigger a SIGSEGV if the underlying thread was already terminated. >>>>>> Try to catch this application mistakes by clearing the handle at least >>>>>> in that task descriptor which successfully ran rt_task_delete or >>>>>> rt_task_join. >>>>>> >>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid> >>>>> Ok. I have tested this patch (though I could not find whether it was >>>>> discussed on the mailing list). And in fact, it looks to me like it >>>>> turns an application error into a silently working application. >>>> Then there is probably something broken: rt_task_delete is supposed to >>>> return -EIDRM of the passed handle no longer exists. That's at least >>>> what the doc says. The point of this patch is to turn an application >>>> crash into a proper error return value (and that not only for >>>> --enable-debug). >>> Here is the test I used: >>> #include <stdio.h> >>> #include <native/task.h> >>> #include <sys/mman.h> >>> >>> void task_main(void* arg) >>> { >>> rt_task_sleep(1000000000); >>> } >>> >>> int main(void) >>> { >>> RT_TASK task; >>> >>> mlockall(MCL_CURRENT|MCL_FUTURE); >>> rt_task_create(&task, "task", 128*1024, 99, T_FPU|T_JOINABLE); >>> rt_task_start(&task, task_main, NULL); >>> fprintf(stderr, "join: %d\n", rt_task_join(&task)); >>> fprintf(stderr, "delete: %d\n", rt_task_delete(&task)); >>> } >>> >>> it prints: >>> join: 0 >>> delete: 0 >>> >>> This said, I like the segfault, because people probably never check the >>> return value of rt_task_join/rt_task_delete (which is, I guess, the >>> reason why phtread_cancel and pthread_join segfault themselves, because >>> the posix spec allows to return ESRCH in that case). >> Well, that's kind of hard to sell to people writing software based on >> documentation. We documented -EIDRM, so we should make it work like >> this. Will look into this later. >> >> The fact that pthread_cancel /may/ crash on invalid thread handles is an >> implementation detail: The handle is a pointer to an object that /may/ >> no longer exist in memory when dereferenced. > > Ok, another point is that when cancelling a joinable thread, > pthread_join should still be called after pthread_cancel. So, > rt_task_join should still be able to call pthread_join after a call to > rt_task_delete. > > Running pthread_join inside rt_task_delete is not an option, as there > would be a risk for deadlocks. > So we should always pthread_detach before canceling a thread in rt_task_delete? Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 257 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions 2010-05-01 14:29 ` Jan Kiszka @ 2010-05-01 14:32 ` Jan Kiszka 2010-05-01 14:33 ` Gilles Chanteperdrix 1 sibling, 0 replies; 9+ messages in thread From: Jan Kiszka @ 2010-05-01 14:32 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai-core [-- Attachment #1: Type: text/plain, Size: 3448 bytes --] Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> Jan Kiszka wrote: >>> Gilles Chanteperdrix wrote: >>>> Jan Kiszka wrote: >>>>> Gilles Chanteperdrix wrote: >>>>>> GIT version control wrote: >>>>>>> Module: xenomai-jki >>>>>>> Branch: for-upstream >>>>>>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>>>>>> URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>>>>>> >>>>>>> Author: Jan Kiszka <jan.kiszka@domain.hid> >>>>>>> Date: Wed Apr 28 15:08:11 2010 +0200 >>>>>>> >>>>>>> native: Improve fault tolerance /wrt multiple task deletions >>>>>>> >>>>>>> As we may pass the pthread handle of an RT_TASK directly to glibc, we >>>>>>> may trigger a SIGSEGV if the underlying thread was already terminated. >>>>>>> Try to catch this application mistakes by clearing the handle at least >>>>>>> in that task descriptor which successfully ran rt_task_delete or >>>>>>> rt_task_join. >>>>>>> >>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid> >>>>>> Ok. I have tested this patch (though I could not find whether it was >>>>>> discussed on the mailing list). And in fact, it looks to me like it >>>>>> turns an application error into a silently working application. >>>>> Then there is probably something broken: rt_task_delete is supposed to >>>>> return -EIDRM of the passed handle no longer exists. That's at least >>>>> what the doc says. The point of this patch is to turn an application >>>>> crash into a proper error return value (and that not only for >>>>> --enable-debug). >>>> Here is the test I used: >>>> #include <stdio.h> >>>> #include <native/task.h> >>>> #include <sys/mman.h> >>>> >>>> void task_main(void* arg) >>>> { >>>> rt_task_sleep(1000000000); >>>> } >>>> >>>> int main(void) >>>> { >>>> RT_TASK task; >>>> >>>> mlockall(MCL_CURRENT|MCL_FUTURE); >>>> rt_task_create(&task, "task", 128*1024, 99, T_FPU|T_JOINABLE); >>>> rt_task_start(&task, task_main, NULL); >>>> fprintf(stderr, "join: %d\n", rt_task_join(&task)); >>>> fprintf(stderr, "delete: %d\n", rt_task_delete(&task)); >>>> } >>>> >>>> it prints: >>>> join: 0 >>>> delete: 0 >>>> >>>> This said, I like the segfault, because people probably never check the >>>> return value of rt_task_join/rt_task_delete (which is, I guess, the >>>> reason why phtread_cancel and pthread_join segfault themselves, because >>>> the posix spec allows to return ESRCH in that case). >>> Well, that's kind of hard to sell to people writing software based on >>> documentation. We documented -EIDRM, so we should make it work like >>> this. Will look into this later. >>> >>> The fact that pthread_cancel /may/ crash on invalid thread handles is an >>> implementation detail: The handle is a pointer to an object that /may/ >>> no longer exist in memory when dereferenced. >> Ok, another point is that when cancelling a joinable thread, >> pthread_join should still be called after pthread_cancel. So, >> rt_task_join should still be able to call pthread_join after a call to >> rt_task_delete. >> >> Running pthread_join inside rt_task_delete is not an option, as there >> would be a risk for deadlocks. >> > > So we should always pthread_detach before canceling a thread in > rt_task_delete? Err, after, but only if it was joinable. Think we need to track T_JOINABLE in user space as well. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 257 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions 2010-05-01 14:29 ` Jan Kiszka 2010-05-01 14:32 ` Jan Kiszka @ 2010-05-01 14:33 ` Gilles Chanteperdrix 1 sibling, 0 replies; 9+ messages in thread From: Gilles Chanteperdrix @ 2010-05-01 14:33 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai-core Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> Jan Kiszka wrote: >>> Gilles Chanteperdrix wrote: >>>> Jan Kiszka wrote: >>>>> Gilles Chanteperdrix wrote: >>>>>> GIT version control wrote: >>>>>>> Module: xenomai-jki >>>>>>> Branch: for-upstream >>>>>>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>>>>>> URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>>>>>> >>>>>>> Author: Jan Kiszka <jan.kiszka@domain.hid> >>>>>>> Date: Wed Apr 28 15:08:11 2010 +0200 >>>>>>> >>>>>>> native: Improve fault tolerance /wrt multiple task deletions >>>>>>> >>>>>>> As we may pass the pthread handle of an RT_TASK directly to glibc, we >>>>>>> may trigger a SIGSEGV if the underlying thread was already terminated. >>>>>>> Try to catch this application mistakes by clearing the handle at least >>>>>>> in that task descriptor which successfully ran rt_task_delete or >>>>>>> rt_task_join. >>>>>>> >>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid> >>>>>> Ok. I have tested this patch (though I could not find whether it was >>>>>> discussed on the mailing list). And in fact, it looks to me like it >>>>>> turns an application error into a silently working application. >>>>> Then there is probably something broken: rt_task_delete is supposed to >>>>> return -EIDRM of the passed handle no longer exists. That's at least >>>>> what the doc says. The point of this patch is to turn an application >>>>> crash into a proper error return value (and that not only for >>>>> --enable-debug). >>>> Here is the test I used: >>>> #include <stdio.h> >>>> #include <native/task.h> >>>> #include <sys/mman.h> >>>> >>>> void task_main(void* arg) >>>> { >>>> rt_task_sleep(1000000000); >>>> } >>>> >>>> int main(void) >>>> { >>>> RT_TASK task; >>>> >>>> mlockall(MCL_CURRENT|MCL_FUTURE); >>>> rt_task_create(&task, "task", 128*1024, 99, T_FPU|T_JOINABLE); >>>> rt_task_start(&task, task_main, NULL); >>>> fprintf(stderr, "join: %d\n", rt_task_join(&task)); >>>> fprintf(stderr, "delete: %d\n", rt_task_delete(&task)); >>>> } >>>> >>>> it prints: >>>> join: 0 >>>> delete: 0 >>>> >>>> This said, I like the segfault, because people probably never check the >>>> return value of rt_task_join/rt_task_delete (which is, I guess, the >>>> reason why phtread_cancel and pthread_join segfault themselves, because >>>> the posix spec allows to return ESRCH in that case). >>> Well, that's kind of hard to sell to people writing software based on >>> documentation. We documented -EIDRM, so we should make it work like >>> this. Will look into this later. >>> >>> The fact that pthread_cancel /may/ crash on invalid thread handles is an >>> implementation detail: The handle is a pointer to an object that /may/ >>> no longer exist in memory when dereferenced. >> Ok, another point is that when cancelling a joinable thread, >> pthread_join should still be called after pthread_cancel. So, >> rt_task_join should still be able to call pthread_join after a call to >> rt_task_delete. >> >> Running pthread_join inside rt_task_delete is not an option, as there >> would be a risk for deadlocks. >> > > So we should always pthread_detach before canceling a thread in > rt_task_delete? >From my point of view, we should not make the native api deviate too much from posix, so, if the thread was created joinable, it should require rt_task_join to be run after rt_task_delete. IOW, set opaque2 to 0 only if the thread was joinable. -- Gilles. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions 2010-05-01 14:06 ` Jan Kiszka 2010-05-01 14:11 ` Gilles Chanteperdrix @ 2010-05-01 14:59 ` Gilles Chanteperdrix 1 sibling, 0 replies; 9+ messages in thread From: Gilles Chanteperdrix @ 2010-05-01 14:59 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai-core Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> GIT version control wrote: >>> Module: xenomai-jki >>> Branch: for-upstream >>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>> URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>> >>> Author: Jan Kiszka <jan.kiszka@domain.hid> >>> Date: Wed Apr 28 15:08:11 2010 +0200 >>> >>> native: Improve fault tolerance /wrt multiple task deletions >>> >>> As we may pass the pthread handle of an RT_TASK directly to glibc, we >>> may trigger a SIGSEGV if the underlying thread was already terminated. >>> Try to catch this application mistakes by clearing the handle at least >>> in that task descriptor which successfully ran rt_task_delete or >>> rt_task_join. >>> >>> Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid> >> Ok. I have tested this patch (though I could not find whether it was >> discussed on the mailing list). And in fact, it looks to me like it >> turns an application error into a silently working application. > > Then there is probably something broken: rt_task_delete is supposed to > return -EIDRM of the passed handle no longer exists. That's at least > what the doc says. The point of this patch is to turn an application > crash into a proper error return value (and that not only for > --enable-debug). After some investigation, it turns out that opaque2 == 0 is already useful for another case: when rt_task_delete'ing a thread belonging to another process than the current process. So, by attaching two meanings to this value, we introduce a regression. -- Gilles. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-05-01 14:59 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <E1O78nO-0008PE-EK@xenomai.org>
2010-05-01 13:57 ` [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions Gilles Chanteperdrix
2010-05-01 14:06 ` Jan Kiszka
2010-05-01 14:11 ` Gilles Chanteperdrix
2010-05-01 14:18 ` Jan Kiszka
2010-05-01 14:26 ` Gilles Chanteperdrix
2010-05-01 14:29 ` Jan Kiszka
2010-05-01 14:32 ` Jan Kiszka
2010-05-01 14:33 ` Gilles Chanteperdrix
2010-05-01 14:59 ` Gilles Chanteperdrix
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.