From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <4BDC3A09.2060106@domain.hid> Date: Sat, 01 May 2010 16:26:17 +0200 From: Gilles Chanteperdrix MIME-Version: 1.0 References: <4BDC332E.2010804@domain.hid> <4BDC357A.30803@domain.hid> <4BDC3686.9050004@domain.hid> <4BDC3829.9070306@domain.hid> In-Reply-To: <4BDC3829.9070306@domain.hid> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions List-Id: Xenomai life and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Jan Kiszka Cc: xenomai-core Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> Jan Kiszka wrote: >>> Gilles Chanteperdrix wrote: >>>> GIT version control wrote: >>>>> Module: xenomai-jki >>>>> Branch: for-upstream >>>>> Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>>>> URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 >>>>> >>>>> Author: Jan Kiszka >>>>> Date: Wed Apr 28 15:08:11 2010 +0200 >>>>> >>>>> native: Improve fault tolerance /wrt multiple task deletions >>>>> >>>>> As we may pass the pthread handle of an RT_TASK directly to glibc, we >>>>> may trigger a SIGSEGV if the underlying thread was already terminated. >>>>> Try to catch this application mistakes by clearing the handle at least >>>>> in that task descriptor which successfully ran rt_task_delete or >>>>> rt_task_join. >>>>> >>>>> Signed-off-by: Jan Kiszka >>>> Ok. I have tested this patch (though I could not find whether it was >>>> discussed on the mailing list). And in fact, it looks to me like it >>>> turns an application error into a silently working application. >>> Then there is probably something broken: rt_task_delete is supposed to >>> return -EIDRM of the passed handle no longer exists. That's at least >>> what the doc says. The point of this patch is to turn an application >>> crash into a proper error return value (and that not only for >>> --enable-debug). >> Here is the test I used: >> #include >> #include >> #include >> >> void task_main(void* arg) >> { >> rt_task_sleep(1000000000); >> } >> >> int main(void) >> { >> RT_TASK task; >> >> mlockall(MCL_CURRENT|MCL_FUTURE); >> rt_task_create(&task, "task", 128*1024, 99, T_FPU|T_JOINABLE); >> rt_task_start(&task, task_main, NULL); >> fprintf(stderr, "join: %d\n", rt_task_join(&task)); >> fprintf(stderr, "delete: %d\n", rt_task_delete(&task)); >> } >> >> it prints: >> join: 0 >> delete: 0 >> >> This said, I like the segfault, because people probably never check the >> return value of rt_task_join/rt_task_delete (which is, I guess, the >> reason why phtread_cancel and pthread_join segfault themselves, because >> the posix spec allows to return ESRCH in that case). > > Well, that's kind of hard to sell to people writing software based on > documentation. We documented -EIDRM, so we should make it work like > this. Will look into this later. > > The fact that pthread_cancel /may/ crash on invalid thread handles is an > implementation detail: The handle is a pointer to an object that /may/ > no longer exist in memory when dereferenced. Ok, another point is that when cancelling a joinable thread, pthread_join should still be called after pthread_cancel. So, rt_task_join should still be able to call pthread_join after a call to rt_task_delete. Running pthread_join inside rt_task_delete is not an option, as there would be a risk for deadlocks. -- Gilles.