All of lore.kernel.org
 help / color / mirror / Atom feed
* [Xenomai-help] Problem with pthread_cond_wait
@ 2009-12-03 12:07 Soboljew, Patrick
  2009-12-03 13:15 ` Gilles Chanteperdrix
  2009-12-03 23:58 ` Gilles Chanteperdrix
  0 siblings, 2 replies; 8+ messages in thread
From: Soboljew, Patrick @ 2009-12-03 12:07 UTC (permalink / raw)
  To: xenomai

Hello all,

I have a strange problem concerning the posix skin of xenomai (Ver.
2.4.9.1). Whenever two or more threads call 'pthread_cond_wait' and I
want to interrupt the program with SIGINT (CTRL-c) only the main thread
and the first thread that called 'pthread_cond_wait' get the signal. The
remaining threads are not interrupted so I have created some zombies
here. I discovered this problem when I tried to debug some code with the
ACE/TAO Framework which calls these functions in a similar way. The
debugger also has problems to interrupt these threads.

The small code example illustrates what I did.

Has anyone an idea what exactly causes this problem?

Thanks for any help
Patrick Soboljew

======================================== Code Begin
======================================== 
#include <pthread.h>
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


const int NUM_THREADS=2;
pthread_mutex_t Mutex;
pthread_cond_t  Condition;


void SetupRtThread()
{
#ifdef __XENO__
    int RetVal = 0;
    struct sched_param SParam;
    SParam.sched_priority = 1;
    int Policy = SCHED_FIFO;
    if ((RetVal = pthread_setschedparam(pthread_self(), Policy,
&SParam)) != 0)
    {
        printf("pthread_setschedparam: %s", strerror(RetVal));
        exit(RetVal);
    }
#endif
}


void *ThreadFunc(void *pData)
{
    SetupRtThread();
    int MyId = (int)pData;

    printf("Starting thread %d\n", MyId);

    pthread_mutex_lock(&Mutex);
    pthread_cond_wait(&Condition, &Mutex);
    pthread_mutex_unlock(&Mutex);

    pthread_exit(NULL);
}


int main (int /*argc*/, char** /*argv[]*/)
{
    mlockall(MCL_CURRENT | MCL_FUTURE);
    SetupRtThread();


    pthread_t threads[NUM_THREADS];


    // Initialize mutex and condition variable objects
    pthread_mutex_init(&Mutex, NULL);
    pthread_cond_init (&Condition, NULL);


    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    for (int i=0; i<NUM_THREADS; ++i)
    {
        pthread_create(&threads[i], &attr, ThreadFunc, (void *)i);
    }

    /* Wait for all threads to complete */
    for (int i=0; i<NUM_THREADS; i++)
    {
        pthread_join(threads[i], NULL);
    }
    printf ("Main(): Waited on %d  threads. Done.\n", NUM_THREADS);

    /* Clean up and exit */
    pthread_attr_destroy(&attr);
    pthread_mutex_destroy(&Mutex);
    pthread_cond_destroy(&Condition);
    pthread_exit(NULL);

}

======================================== Code End
============================================ 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Xenomai-help] Problem with pthread_cond_wait
  2009-12-03 12:07 [Xenomai-help] Problem with pthread_cond_wait Soboljew, Patrick
@ 2009-12-03 13:15 ` Gilles Chanteperdrix
  2009-12-03 23:58 ` Gilles Chanteperdrix
  1 sibling, 0 replies; 8+ messages in thread
From: Gilles Chanteperdrix @ 2009-12-03 13:15 UTC (permalink / raw)
  To: Soboljew, Patrick; +Cc: xenomai

Soboljew, Patrick wrote:
> Hello all,
> 
> I have a strange problem concerning the posix skin of xenomai (Ver.
> 2.4.9.1). Whenever two or more threads call 'pthread_cond_wait' and I
> want to interrupt the program with SIGINT (CTRL-c) only the main thread
> and the first thread that called 'pthread_cond_wait' get the signal. The
> remaining threads are not interrupted so I have created some zombies
> here. I discovered this problem when I tried to debug some code with the
> ACE/TAO Framework which calls these functions in a similar way. The
> debugger also has problems to interrupt these threads.
> 
> The small code example illustrates what I did.
> 
> Has anyone an idea what exactly causes this problem?

I believe the problem is that the other threads are blocked on the mutex
relocking by pthread_cond_wait. I believe this issue has just been
solved in 2.5 by the rework of pthread_cond_wait. I need to check with
your test program.

Getting the current situation to work implies handling the SIGINT (or
SIGTERM) by cancelling all threads. Then in each thread doing a
pthread_cond_wait registers a cleanup handler (pthread_cleanup_push)
that releases the mutex.

If I am right, I will check if it is possible to backport the
pthread_cond_wait rework without breaking the ABI.

-- 
                                          Gilles



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Xenomai-help] Problem with pthread_cond_wait
  2009-12-03 12:07 [Xenomai-help] Problem with pthread_cond_wait Soboljew, Patrick
  2009-12-03 13:15 ` Gilles Chanteperdrix
@ 2009-12-03 23:58 ` Gilles Chanteperdrix
  2009-12-04  9:46   ` Soboljew, Patrick
  2009-12-04  9:59   ` Philippe Gerum
  1 sibling, 2 replies; 8+ messages in thread
From: Gilles Chanteperdrix @ 2009-12-03 23:58 UTC (permalink / raw)
  To: Soboljew, Patrick; +Cc: xenomai

Soboljew, Patrick wrote:
> Hello all,
> 
> I have a strange problem concerning the posix skin of xenomai (Ver.
> 2.4.9.1). Whenever two or more threads call 'pthread_cond_wait' and I
> want to interrupt the program with SIGINT (CTRL-c) only the main thread
> and the first thread that called 'pthread_cond_wait' get the signal. The
> remaining threads are not interrupted so I have created some zombies
> here. I discovered this problem when I tried to debug some code with the
> ACE/TAO Framework which calls these functions in a similar way. The
> debugger also has problems to interrupt these threads.
> 
> The small code example illustrates what I did.
> 
> Has anyone an idea what exactly causes this problem?

The problem is the way pthread_cond_wait handles interruption by 
signals: the thread tries to re-acquire the mutex before returning to 
user-space, so as to make the system call restartable, so may be 
suspended if the mutex is not free (which happens for the second thread 
in your test program), so that it does not return to user-space, the 
signal remains pending and unhandled, and you get the disturbing 
behaviour when hitting ctrl-c or when running inside gdb.

We can fix that by making the syscall non restartable, and let the 
user-space handle the mutex re-locking (which it fortunately already 
does). 

Note that restarting automatically pthread_cond_wait is not even 
correct, since we could miss a pthread_cond_signal if it was sent 
between the time when the thread was unblocked from the cond wait, and 
the time it starts waiting again. So, in this case, it is better to 
return to the caller, you get a spurious wake-up, which means that the 
caller must run pthread_cond_wait in a loop for the program to run 
correctly.

Anyway, here is a quick fix, could you try it?

Notes for Philippe: the quick fix does not break the ABI, but also 
changes the behaviour of non restartable syscalls, they forcibly return 
-EINTR. This may look like a disrupting change for the 2.4 branch, but 
there is actually currently only one non-restartable syscall in Xenomai 
2.4: pthread_mutex_unlock, and it is ready to handle -EINTR. However, at
this chance, we should mark a few more syscalls as non restartable, 
notably nanosleep and select, because they use relative timeouts. I 
think a lot of syscalls in the native skin are using relative timeouts 
too, and should be marked as non-restartable, but this implies
documenting the return value -EINTR.

diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index 0b02f70..b361480 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -823,7 +823,7 @@ static inline void request_syscall_restart(xnthread_t *thread,
 		if (__xn_interrupted_p(regs)) {
 			__xn_error_return(regs,
 					  (sysflags & __xn_exec_norestart) ?
-					  -ERESTARTNOHAND : -ERESTARTSYS);
+					  -EINTR : -ERESTARTSYS);
 			notify = !xnthread_test_state(thread, XNDEBUG);
 		}
 
diff --git a/ksrc/skins/posix/syscall.c b/ksrc/skins/posix/syscall.c
index d936f21..4c16589 100644
--- a/ksrc/skins/posix/syscall.c
+++ b/ksrc/skins/posix/syscall.c
@@ -1518,6 +1518,7 @@ static int __pthread_cond_wait_prologue(struct task_struct *curr,
 						    timed,
 						    XN_INFINITE);
 
+#if 0
 	if (err == EINTR) {
 		do {
 			xnthread_clear_info(cur, XNKICKED);
@@ -1528,7 +1529,7 @@ static int __pthread_cond_wait_prologue(struct task_struct *curr,
 		xnthread_set_info(cur, XNKICKED);
 		err = EINTR;
 	}
-
+#endif
 
 	if (!err || err == EINTR || err == ETIMEDOUT)
 		__xn_copy_to_user(curr, (void __user *) __xn_reg_arg3(regs),
@@ -2834,7 +2835,7 @@ static xnsysent_t __systab[] = {
 	[__pse51_cond_init] = {&__pthread_cond_init, __xn_exec_any},
 	[__pse51_cond_destroy] = {&__pthread_cond_destroy, __xn_exec_any},
 	[__pse51_cond_wait_prologue] =
-	{&__pthread_cond_wait_prologue, __xn_exec_primary},
+	{&__pthread_cond_wait_prologue, __xn_exec_primary | __xn_exec_norestart},
 	[__pse51_cond_wait_epilogue] =
 	    {&__pthread_cond_wait_epilogue, __xn_exec_primary},
 	[__pse51_cond_signal] = {&__pthread_cond_signal, __xn_exec_any},

-- 
					    Gilles.


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [Xenomai-help] Problem with pthread_cond_wait
  2009-12-03 23:58 ` Gilles Chanteperdrix
@ 2009-12-04  9:46   ` Soboljew, Patrick
  2009-12-04  9:59   ` Philippe Gerum
  1 sibling, 0 replies; 8+ messages in thread
From: Soboljew, Patrick @ 2009-12-04  9:46 UTC (permalink / raw)
  To: xenomai

 
> Soboljew, Patrick wrote:
> > Hello all,
> > 
> > I have a strange problem concerning the posix skin of xenomai (Ver.
> > 2.4.9.1). Whenever two or more threads call 'pthread_cond_wait' and
I 
> > want to interrupt the program with SIGINT (CTRL-c) only the main 
> > thread and the first thread that called 'pthread_cond_wait' get the 
> > signal. The remaining threads are not interrupted so I have created 
> > some zombies here. I discovered this problem when I tried to debug 
> > some code with the ACE/TAO Framework which calls these 
> > functions in a similar way. The debugger also has problems to
interrupt 
> > these threads.
> > 
> > The small code example illustrates what I did.
> > 
> > Has anyone an idea what exactly causes this problem?
> 
> The problem is the way pthread_cond_wait handles interruption by
> signals: the thread tries to re-acquire the mutex before 
> returning to user-space, so as to make the system call 
> restartable, so may be suspended if the mutex is not free 
> (which happens for the second thread in your test program), 
> so that it does not return to user-space, the signal remains 
> pending and unhandled, and you get the disturbing behaviour 
> when hitting ctrl-c or when running inside gdb.
> 
> We can fix that by making the syscall non restartable, and 
> let the user-space handle the mutex re-locking (which it 
> fortunately already does). 
> 
> Note that restarting automatically pthread_cond_wait is not 
> even correct, since we could miss a pthread_cond_signal if it 
> was sent between the time when the thread was unblocked from 
> the cond wait, and the time it starts waiting again. So, in 
> this case, it is better to return to the caller, you get a 
> spurious wake-up, which means that the caller must run 
> pthread_cond_wait in a loop for the program to run correctly.
> 
> Anyway, here is a quick fix, could you try it?
> 
> Notes for Philippe: the quick fix does not break the ABI, but 
> also changes the behaviour of non restartable syscalls, they 
> forcibly return -EINTR. This may look like a disrupting 
> change for the 2.4 branch, but there is actually currently 
> only one non-restartable syscall in Xenomai
> 2.4: pthread_mutex_unlock, and it is ready to handle -EINTR. 
> However, at this chance, we should mark a few more syscalls 
> as non restartable, notably nanosleep and select, because 
> they use relative timeouts. I think a lot of syscalls in the 
> native skin are using relative timeouts too, and should be 
> marked as non-restartable, but this implies documenting the 
> return value -EINTR.
> 
> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c 
> index 0b02f70..b361480 100644
> --- a/ksrc/nucleus/shadow.c
> +++ b/ksrc/nucleus/shadow.c
> @@ -823,7 +823,7 @@ static inline void 
> request_syscall_restart(xnthread_t *thread,
>  		if (__xn_interrupted_p(regs)) {
>  			__xn_error_return(regs,
>  					  (sysflags & 
> __xn_exec_norestart) ?
> -					  -ERESTARTNOHAND : 
> -ERESTARTSYS);
> +					  -EINTR : -ERESTARTSYS);
>  			notify = !xnthread_test_state(thread, XNDEBUG);
>  		}
>  
> diff --git a/ksrc/skins/posix/syscall.c 
> b/ksrc/skins/posix/syscall.c index d936f21..4c16589 100644
> --- a/ksrc/skins/posix/syscall.c
> +++ b/ksrc/skins/posix/syscall.c
> @@ -1518,6 +1518,7 @@ static int 
> __pthread_cond_wait_prologue(struct task_struct *curr,
>  						    timed,
>  						    XN_INFINITE);
>  
> +#if 0
>  	if (err == EINTR) {
>  		do {
>  			xnthread_clear_info(cur, XNKICKED);
> @@ -1528,7 +1529,7 @@ static int 
> __pthread_cond_wait_prologue(struct task_struct *curr,
>  		xnthread_set_info(cur, XNKICKED);
>  		err = EINTR;
>  	}
> -
> +#endif
>  
>  	if (!err || err == EINTR || err == ETIMEDOUT)
>  		__xn_copy_to_user(curr, (void __user *) 
> __xn_reg_arg3(regs), @@ -2834,7 +2835,7 @@ static xnsysent_t 
> __systab[] = {
>  	[__pse51_cond_init] = {&__pthread_cond_init, __xn_exec_any},
>  	[__pse51_cond_destroy] = {&__pthread_cond_destroy, 
> __xn_exec_any},
>  	[__pse51_cond_wait_prologue] =
> -	{&__pthread_cond_wait_prologue, __xn_exec_primary},
> +	{&__pthread_cond_wait_prologue, __xn_exec_primary | 
> +__xn_exec_norestart},
>  	[__pse51_cond_wait_epilogue] =
>  	    {&__pthread_cond_wait_epilogue, __xn_exec_primary},
>  	[__pse51_cond_signal] = {&__pthread_cond_signal, __xn_exec_any},
> 
> -- 
> 					    Gilles.

Perfect, you made my day. Now it works as expected. Will this patch find
its way into xenomai 2.4.x or do we have to change to xenomai 2.5 in the
near future?



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Xenomai-help] Problem with pthread_cond_wait
  2009-12-03 23:58 ` Gilles Chanteperdrix
  2009-12-04  9:46   ` Soboljew, Patrick
@ 2009-12-04  9:59   ` Philippe Gerum
  2009-12-04 10:09     ` Gilles Chanteperdrix
  1 sibling, 1 reply; 8+ messages in thread
From: Philippe Gerum @ 2009-12-04  9:59 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Soboljew, Patrick, xenomai

On Fri, 2009-12-04 at 00:58 +0100, Gilles Chanteperdrix wrote:
> Soboljew, Patrick wrote:
> > Hello all,
> > 
> > I have a strange problem concerning the posix skin of xenomai (Ver.
> > 2.4.9.1). Whenever two or more threads call 'pthread_cond_wait' and I
> > want to interrupt the program with SIGINT (CTRL-c) only the main thread
> > and the first thread that called 'pthread_cond_wait' get the signal. The
> > remaining threads are not interrupted so I have created some zombies
> > here. I discovered this problem when I tried to debug some code with the
> > ACE/TAO Framework which calls these functions in a similar way. The
> > debugger also has problems to interrupt these threads.
> > 
> > The small code example illustrates what I did.
> > 
> > Has anyone an idea what exactly causes this problem?
> 
> The problem is the way pthread_cond_wait handles interruption by 
> signals: the thread tries to re-acquire the mutex before returning to 
> user-space, so as to make the system call restartable, so may be 
> suspended if the mutex is not free (which happens for the second thread 
> in your test program), so that it does not return to user-space, the 
> signal remains pending and unhandled, and you get the disturbing 
> behaviour when hitting ctrl-c or when running inside gdb.
> 
> We can fix that by making the syscall non restartable, and let the 
> user-space handle the mutex re-locking (which it fortunately already 
> does). 
> 
> Note that restarting automatically pthread_cond_wait is not even 
> correct, since we could miss a pthread_cond_signal if it was sent 
> between the time when the thread was unblocked from the cond wait, and 
> the time it starts waiting again. So, in this case, it is better to 
> return to the caller, you get a spurious wake-up, which means that the 
> caller must run pthread_cond_wait in a loop for the program to run 
> correctly.
> 
> Anyway, here is a quick fix, could you try it?
> 
> Notes for Philippe: the quick fix does not break the ABI, but also 
> changes the behaviour of non restartable syscalls, they forcibly return 
> -EINTR. This may look like a disrupting change for the 2.4 branch, but 
> there is actually currently only one non-restartable syscall in Xenomai 
> 2.4: pthread_mutex_unlock, and it is ready to handle -EINTR. However, at
> this chance, we should mark a few more syscalls as non restartable, 
> notably nanosleep and select, because they use relative timeouts. I 
> think a lot of syscalls in the native skin are using relative timeouts 
> too, and should be marked as non-restartable, but this implies
> documenting the return value -EINTR.

This seems the wrong approach, at the very least for the native skin.
-EINTR is already used and documented there, as a possible return value
for blocking syscalls which have been forcibly unblocked (i.e. via
rt_task_unblocked()).

Returning -EINTR upon signal interrupt as well would confuse the
application, i.e. what was the actual reason for that syscall to return?
As a corollary, a bunch of applications are currently not handling
-EINTR, precisely because rt_task_unblock() is not used in their
application; so making all timed syscalls non-restartable might break
them badly.

The best fix would rather to convert relative timeouts to their
XN_REALTIME form internally, the way it is done for a few syscalls in
2.5 already.

> 
> diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
> index 0b02f70..b361480 100644
> --- a/ksrc/nucleus/shadow.c
> +++ b/ksrc/nucleus/shadow.c
> @@ -823,7 +823,7 @@ static inline void request_syscall_restart(xnthread_t *thread,
>  		if (__xn_interrupted_p(regs)) {
>  			__xn_error_return(regs,
>  					  (sysflags & __xn_exec_norestart) ?
> -					  -ERESTARTNOHAND : -ERESTARTSYS);
> +					  -EINTR : -ERESTARTSYS);
>  			notify = !xnthread_test_state(thread, XNDEBUG);
>  		}
>  
> diff --git a/ksrc/skins/posix/syscall.c b/ksrc/skins/posix/syscall.c
> index d936f21..4c16589 100644
> --- a/ksrc/skins/posix/syscall.c
> +++ b/ksrc/skins/posix/syscall.c
> @@ -1518,6 +1518,7 @@ static int __pthread_cond_wait_prologue(struct task_struct *curr,
>  						    timed,
>  						    XN_INFINITE);
>  
> +#if 0
>  	if (err == EINTR) {
>  		do {
>  			xnthread_clear_info(cur, XNKICKED);
> @@ -1528,7 +1529,7 @@ static int __pthread_cond_wait_prologue(struct task_struct *curr,
>  		xnthread_set_info(cur, XNKICKED);
>  		err = EINTR;
>  	}
> -
> +#endif
>  
>  	if (!err || err == EINTR || err == ETIMEDOUT)
>  		__xn_copy_to_user(curr, (void __user *) __xn_reg_arg3(regs),
> @@ -2834,7 +2835,7 @@ static xnsysent_t __systab[] = {
>  	[__pse51_cond_init] = {&__pthread_cond_init, __xn_exec_any},
>  	[__pse51_cond_destroy] = {&__pthread_cond_destroy, __xn_exec_any},
>  	[__pse51_cond_wait_prologue] =
> -	{&__pthread_cond_wait_prologue, __xn_exec_primary},
> +	{&__pthread_cond_wait_prologue, __xn_exec_primary | __xn_exec_norestart},
>  	[__pse51_cond_wait_epilogue] =
>  	    {&__pthread_cond_wait_epilogue, __xn_exec_primary},
>  	[__pse51_cond_signal] = {&__pthread_cond_signal, __xn_exec_any},
> 


-- 
Philippe.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Xenomai-help] Problem with pthread_cond_wait
  2009-12-04  9:59   ` Philippe Gerum
@ 2009-12-04 10:09     ` Gilles Chanteperdrix
  2009-12-04 10:26       ` Gilles Chanteperdrix
  2009-12-06 17:22       ` Philippe Gerum
  0 siblings, 2 replies; 8+ messages in thread
From: Gilles Chanteperdrix @ 2009-12-04 10:09 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Soboljew, Patrick, xenomai

Philippe Gerum wrote:
> On Fri, 2009-12-04 at 00:58 +0100, Gilles Chanteperdrix wrote:
>> Soboljew, Patrick wrote:
>>> Hello all,
>>>
>>> I have a strange problem concerning the posix skin of xenomai (Ver.
>>> 2.4.9.1). Whenever two or more threads call 'pthread_cond_wait' and I
>>> want to interrupt the program with SIGINT (CTRL-c) only the main thread
>>> and the first thread that called 'pthread_cond_wait' get the signal. The
>>> remaining threads are not interrupted so I have created some zombies
>>> here. I discovered this problem when I tried to debug some code with the
>>> ACE/TAO Framework which calls these functions in a similar way. The
>>> debugger also has problems to interrupt these threads.
>>>
>>> The small code example illustrates what I did.
>>>
>>> Has anyone an idea what exactly causes this problem?
>> The problem is the way pthread_cond_wait handles interruption by 
>> signals: the thread tries to re-acquire the mutex before returning to 
>> user-space, so as to make the system call restartable, so may be 
>> suspended if the mutex is not free (which happens for the second thread 
>> in your test program), so that it does not return to user-space, the 
>> signal remains pending and unhandled, and you get the disturbing 
>> behaviour when hitting ctrl-c or when running inside gdb.
>>
>> We can fix that by making the syscall non restartable, and let the 
>> user-space handle the mutex re-locking (which it fortunately already 
>> does). 
>>
>> Note that restarting automatically pthread_cond_wait is not even 
>> correct, since we could miss a pthread_cond_signal if it was sent 
>> between the time when the thread was unblocked from the cond wait, and 
>> the time it starts waiting again. So, in this case, it is better to 
>> return to the caller, you get a spurious wake-up, which means that the 
>> caller must run pthread_cond_wait in a loop for the program to run 
>> correctly.
>>
>> Anyway, here is a quick fix, could you try it?
>>
>> Notes for Philippe: the quick fix does not break the ABI, but also 
>> changes the behaviour of non restartable syscalls, they forcibly return 
>> -EINTR. This may look like a disrupting change for the 2.4 branch, but 
>> there is actually currently only one non-restartable syscall in Xenomai 
>> 2.4: pthread_mutex_unlock, and it is ready to handle -EINTR. However, at
>> this chance, we should mark a few more syscalls as non restartable, 
>> notably nanosleep and select, because they use relative timeouts. I 
>> think a lot of syscalls in the native skin are using relative timeouts 
>> too, and should be marked as non-restartable, but this implies
>> documenting the return value -EINTR.
> 
> This seems the wrong approach, at the very least for the native skin.
> -EINTR is already used and documented there, as a possible return value
> for blocking syscalls which have been forcibly unblocked (i.e. via
> rt_task_unblocked()).
> 
> Returning -EINTR upon signal interrupt as well would confuse the
> application, i.e. what was the actual reason for that syscall to return?
> As a corollary, a bunch of applications are currently not handling
> -EINTR, precisely because rt_task_unblock() is not used in their
> application; so making all timed syscalls non-restartable might break
> them badly.
> 
> The best fix would rather to convert relative timeouts to their
> XN_REALTIME form internally, the way it is done for a few syscalls in
> 2.5 already.

Yeah, but that would be an ABI change.

In the mean time, I thought more about that: actually, syscalls with
relative timeouts are restartable if the timeout is passed by pointer
and the syscall updates the timeout upon interruption by a signal, the
way nanosleep does (so, in a sense, nanosleep is restartable, contrarily
to what I said). Even select could be restartable, but the specification
mandates that it returns EINTR upon interruption by a signal and is not
restarted automatically.

-- 
                                          Gilles



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Xenomai-help] Problem with pthread_cond_wait
  2009-12-04 10:09     ` Gilles Chanteperdrix
@ 2009-12-04 10:26       ` Gilles Chanteperdrix
  2009-12-06 17:22       ` Philippe Gerum
  1 sibling, 0 replies; 8+ messages in thread
From: Gilles Chanteperdrix @ 2009-12-04 10:26 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Soboljew, Patrick, xenomai

Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>> On Fri, 2009-12-04 at 00:58 +0100, Gilles Chanteperdrix wrote:
>>> Soboljew, Patrick wrote:
>>>> Hello all,
>>>>
>>>> I have a strange problem concerning the posix skin of xenomai (Ver.
>>>> 2.4.9.1). Whenever two or more threads call 'pthread_cond_wait' and I
>>>> want to interrupt the program with SIGINT (CTRL-c) only the main thread
>>>> and the first thread that called 'pthread_cond_wait' get the signal. The
>>>> remaining threads are not interrupted so I have created some zombies
>>>> here. I discovered this problem when I tried to debug some code with the
>>>> ACE/TAO Framework which calls these functions in a similar way. The
>>>> debugger also has problems to interrupt these threads.
>>>>
>>>> The small code example illustrates what I did.
>>>>
>>>> Has anyone an idea what exactly causes this problem?
>>> The problem is the way pthread_cond_wait handles interruption by 
>>> signals: the thread tries to re-acquire the mutex before returning to 
>>> user-space, so as to make the system call restartable, so may be 
>>> suspended if the mutex is not free (which happens for the second thread 
>>> in your test program), so that it does not return to user-space, the 
>>> signal remains pending and unhandled, and you get the disturbing 
>>> behaviour when hitting ctrl-c or when running inside gdb.
>>>
>>> We can fix that by making the syscall non restartable, and let the 
>>> user-space handle the mutex re-locking (which it fortunately already 
>>> does). 
>>>
>>> Note that restarting automatically pthread_cond_wait is not even 
>>> correct, since we could miss a pthread_cond_signal if it was sent 
>>> between the time when the thread was unblocked from the cond wait, and 
>>> the time it starts waiting again. So, in this case, it is better to 
>>> return to the caller, you get a spurious wake-up, which means that the 
>>> caller must run pthread_cond_wait in a loop for the program to run 
>>> correctly.
>>>
>>> Anyway, here is a quick fix, could you try it?
>>>
>>> Notes for Philippe: the quick fix does not break the ABI, but also 
>>> changes the behaviour of non restartable syscalls, they forcibly return 
>>> -EINTR. This may look like a disrupting change for the 2.4 branch, but 
>>> there is actually currently only one non-restartable syscall in Xenomai 
>>> 2.4: pthread_mutex_unlock, and it is ready to handle -EINTR. However, at
>>> this chance, we should mark a few more syscalls as non restartable, 
>>> notably nanosleep and select, because they use relative timeouts. I 
>>> think a lot of syscalls in the native skin are using relative timeouts 
>>> too, and should be marked as non-restartable, but this implies
>>> documenting the return value -EINTR.
>> This seems the wrong approach, at the very least for the native skin.
>> -EINTR is already used and documented there, as a possible return value
>> for blocking syscalls which have been forcibly unblocked (i.e. via
>> rt_task_unblocked()).
>>
>> Returning -EINTR upon signal interrupt as well would confuse the
>> application, i.e. what was the actual reason for that syscall to return?
>> As a corollary, a bunch of applications are currently not handling
>> -EINTR, precisely because rt_task_unblock() is not used in their
>> application; so making all timed syscalls non-restartable might break
>> them badly.
>>
>> The best fix would rather to convert relative timeouts to their
>> XN_REALTIME form internally, the way it is done for a few syscalls in
>> 2.5 already.
> 
> Yeah, but that would be an ABI change.
> 
> In the mean time, I thought more about that: actually, syscalls with
> relative timeouts are restartable if the timeout is passed by pointer
> and the syscall updates the timeout upon interruption by a signal, the
> way nanosleep does (so, in a sense, nanosleep is restartable, contrarily
> to what I said). Even select could be restartable, but the specification
> mandates that it returns EINTR upon interruption by a signal and is not
> restarted automatically.

No. Not even select. Whether select is restarted when interrupted by a
signal with the SA_RESTART flag is implementation defined.

-- 
                                          Gilles



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Xenomai-help] Problem with pthread_cond_wait
  2009-12-04 10:09     ` Gilles Chanteperdrix
  2009-12-04 10:26       ` Gilles Chanteperdrix
@ 2009-12-06 17:22       ` Philippe Gerum
  1 sibling, 0 replies; 8+ messages in thread
From: Philippe Gerum @ 2009-12-06 17:22 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Soboljew, Patrick, xenomai

On Fri, 2009-12-04 at 11:09 +0100, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
> > On Fri, 2009-12-04 at 00:58 +0100, Gilles Chanteperdrix wrote:
> >> Soboljew, Patrick wrote:
> >>> Hello all,
> >>>
> >>> I have a strange problem concerning the posix skin of xenomai (Ver.
> >>> 2.4.9.1). Whenever two or more threads call 'pthread_cond_wait' and I
> >>> want to interrupt the program with SIGINT (CTRL-c) only the main thread
> >>> and the first thread that called 'pthread_cond_wait' get the signal. The
> >>> remaining threads are not interrupted so I have created some zombies
> >>> here. I discovered this problem when I tried to debug some code with the
> >>> ACE/TAO Framework which calls these functions in a similar way. The
> >>> debugger also has problems to interrupt these threads.
> >>>
> >>> The small code example illustrates what I did.
> >>>
> >>> Has anyone an idea what exactly causes this problem?
> >> The problem is the way pthread_cond_wait handles interruption by 
> >> signals: the thread tries to re-acquire the mutex before returning to 
> >> user-space, so as to make the system call restartable, so may be 
> >> suspended if the mutex is not free (which happens for the second thread 
> >> in your test program), so that it does not return to user-space, the 
> >> signal remains pending and unhandled, and you get the disturbing 
> >> behaviour when hitting ctrl-c or when running inside gdb.
> >>
> >> We can fix that by making the syscall non restartable, and let the 
> >> user-space handle the mutex re-locking (which it fortunately already 
> >> does). 
> >>
> >> Note that restarting automatically pthread_cond_wait is not even 
> >> correct, since we could miss a pthread_cond_signal if it was sent 
> >> between the time when the thread was unblocked from the cond wait, and 
> >> the time it starts waiting again. So, in this case, it is better to 
> >> return to the caller, you get a spurious wake-up, which means that the 
> >> caller must run pthread_cond_wait in a loop for the program to run 
> >> correctly.
> >>
> >> Anyway, here is a quick fix, could you try it?
> >>
> >> Notes for Philippe: the quick fix does not break the ABI, but also 
> >> changes the behaviour of non restartable syscalls, they forcibly return 
> >> -EINTR. This may look like a disrupting change for the 2.4 branch, but 
> >> there is actually currently only one non-restartable syscall in Xenomai 
> >> 2.4: pthread_mutex_unlock, and it is ready to handle -EINTR. However, at
> >> this chance, we should mark a few more syscalls as non restartable, 
> >> notably nanosleep and select, because they use relative timeouts. I 
> >> think a lot of syscalls in the native skin are using relative timeouts 
> >> too, and should be marked as non-restartable, but this implies
> >> documenting the return value -EINTR.
> > 
> > This seems the wrong approach, at the very least for the native skin.
> > -EINTR is already used and documented there, as a possible return value
> > for blocking syscalls which have been forcibly unblocked (i.e. via
> > rt_task_unblocked()).
> > 
> > Returning -EINTR upon signal interrupt as well would confuse the
> > application, i.e. what was the actual reason for that syscall to return?
> > As a corollary, a bunch of applications are currently not handling
> > -EINTR, precisely because rt_task_unblock() is not used in their
> > application; so making all timed syscalls non-restartable might break
> > them badly.
> > 
> > The best fix would rather to convert relative timeouts to their
> > XN_REALTIME form internally, the way it is done for a few syscalls in
> > 2.5 already.
> 
> Yeah, but that would be an ABI change.

The native userland interface always passes timeout args to the kernel
by address, so we may have another option.

> 
> In the mean time, I thought more about that: actually, syscalls with
> relative timeouts are restartable if the timeout is passed by pointer
> and the syscall updates the timeout upon interruption by a signal, the
> way nanosleep does (so, in a sense, nanosleep is restartable, contrarily
> to what I said). Even select could be restartable, but the specification
> mandates that it returns EINTR upon interruption by a signal and is not
> restarted automatically.
> 


-- 
Philippe.




^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2009-12-06 17:22 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-03 12:07 [Xenomai-help] Problem with pthread_cond_wait Soboljew, Patrick
2009-12-03 13:15 ` Gilles Chanteperdrix
2009-12-03 23:58 ` Gilles Chanteperdrix
2009-12-04  9:46   ` Soboljew, Patrick
2009-12-04  9:59   ` Philippe Gerum
2009-12-04 10:09     ` Gilles Chanteperdrix
2009-12-04 10:26       ` Gilles Chanteperdrix
2009-12-06 17:22       ` Philippe Gerum

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.