From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <4B1850B6.4010906@domain.hid> Date: Fri, 04 Dec 2009 00:58:46 +0100 From: Gilles Chanteperdrix MIME-Version: 1.0 References: In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai-help] Problem with pthread_cond_wait List-Id: Help regarding installation and common use of Xenomai List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Soboljew, Patrick" Cc: xenomai@xenomai.org Soboljew, Patrick wrote: > Hello all, > > I have a strange problem concerning the posix skin of xenomai (Ver. > 2.4.9.1). Whenever two or more threads call 'pthread_cond_wait' and I > want to interrupt the program with SIGINT (CTRL-c) only the main thread > and the first thread that called 'pthread_cond_wait' get the signal. The > remaining threads are not interrupted so I have created some zombies > here. I discovered this problem when I tried to debug some code with the > ACE/TAO Framework which calls these functions in a similar way. The > debugger also has problems to interrupt these threads. > > The small code example illustrates what I did. > > Has anyone an idea what exactly causes this problem? The problem is the way pthread_cond_wait handles interruption by signals: the thread tries to re-acquire the mutex before returning to user-space, so as to make the system call restartable, so may be suspended if the mutex is not free (which happens for the second thread in your test program), so that it does not return to user-space, the signal remains pending and unhandled, and you get the disturbing behaviour when hitting ctrl-c or when running inside gdb. We can fix that by making the syscall non restartable, and let the user-space handle the mutex re-locking (which it fortunately already does). Note that restarting automatically pthread_cond_wait is not even correct, since we could miss a pthread_cond_signal if it was sent between the time when the thread was unblocked from the cond wait, and the time it starts waiting again. So, in this case, it is better to return to the caller, you get a spurious wake-up, which means that the caller must run pthread_cond_wait in a loop for the program to run correctly. Anyway, here is a quick fix, could you try it? Notes for Philippe: the quick fix does not break the ABI, but also changes the behaviour of non restartable syscalls, they forcibly return -EINTR. This may look like a disrupting change for the 2.4 branch, but there is actually currently only one non-restartable syscall in Xenomai 2.4: pthread_mutex_unlock, and it is ready to handle -EINTR. However, at this chance, we should mark a few more syscalls as non restartable, notably nanosleep and select, because they use relative timeouts. I think a lot of syscalls in the native skin are using relative timeouts too, and should be marked as non-restartable, but this implies documenting the return value -EINTR. diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c index 0b02f70..b361480 100644 --- a/ksrc/nucleus/shadow.c +++ b/ksrc/nucleus/shadow.c @@ -823,7 +823,7 @@ static inline void request_syscall_restart(xnthread_t *thread, if (__xn_interrupted_p(regs)) { __xn_error_return(regs, (sysflags & __xn_exec_norestart) ? - -ERESTARTNOHAND : -ERESTARTSYS); + -EINTR : -ERESTARTSYS); notify = !xnthread_test_state(thread, XNDEBUG); } diff --git a/ksrc/skins/posix/syscall.c b/ksrc/skins/posix/syscall.c index d936f21..4c16589 100644 --- a/ksrc/skins/posix/syscall.c +++ b/ksrc/skins/posix/syscall.c @@ -1518,6 +1518,7 @@ static int __pthread_cond_wait_prologue(struct task_struct *curr, timed, XN_INFINITE); +#if 0 if (err == EINTR) { do { xnthread_clear_info(cur, XNKICKED); @@ -1528,7 +1529,7 @@ static int __pthread_cond_wait_prologue(struct task_struct *curr, xnthread_set_info(cur, XNKICKED); err = EINTR; } - +#endif if (!err || err == EINTR || err == ETIMEDOUT) __xn_copy_to_user(curr, (void __user *) __xn_reg_arg3(regs), @@ -2834,7 +2835,7 @@ static xnsysent_t __systab[] = { [__pse51_cond_init] = {&__pthread_cond_init, __xn_exec_any}, [__pse51_cond_destroy] = {&__pthread_cond_destroy, __xn_exec_any}, [__pse51_cond_wait_prologue] = - {&__pthread_cond_wait_prologue, __xn_exec_primary}, + {&__pthread_cond_wait_prologue, __xn_exec_primary | __xn_exec_norestart}, [__pse51_cond_wait_epilogue] = {&__pthread_cond_wait_epilogue, __xn_exec_primary}, [__pse51_cond_signal] = {&__pthread_cond_signal, __xn_exec_any}, -- Gilles.