From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
To: Philippe Gerum <rpm@xenomai.org>
Cc: "Soboljew, Patrick" <Patrick.Soboljew@domain.hid>, xenomai@xenomai.org
Subject: Re: [Xenomai-help] Problem with pthread_cond_wait
Date: Fri, 04 Dec 2009 11:26:12 +0100
Message-ID: <4B18E3C4.9030408@domain.hid>
In-Reply-To: <4B18DFD6.1030907@domain.hid>
Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>> On Fri, 2009-12-04 at 00:58 +0100, Gilles Chanteperdrix wrote:
>>> Soboljew, Patrick wrote:
>>>> Hello all,
>>>>
>>>> I have a strange problem concerning the posix skin of xenomai (Ver.
>>>> 2.4.9.1). Whenever two or more threads call 'pthread_cond_wait' and I
>>>> want to interrupt the program with SIGINT (CTRL-c) only the main thread
>>>> and the first thread that called 'pthread_cond_wait' get the signal. The
>>>> remaining threads are not interrupted so I have created some zombies
>>>> here. I discovered this problem when I tried to debug some code with the
>>>> ACE/TAO Framework which calls these functions in a similar way. The
>>>> debugger also has trouble interrupting these threads.
>>>>
>>>> The small code example illustrates what I did.
>>>>
>>>> Does anyone have an idea what exactly causes this problem?
>>> The problem is the way pthread_cond_wait handles interruption by
>>> signals. To make the system call restartable, the thread tries to
>>> re-acquire the mutex before returning to user-space, so it may be
>>> suspended if the mutex is not free (which happens for the second thread
>>> in your test program). In that case it does not return to user-space,
>>> the signal remains pending and unhandled, and you get the disturbing
>>> behaviour when hitting ctrl-c or when running inside gdb.
>>>
>>> We can fix that by making the syscall non-restartable, and letting
>>> user-space handle the mutex re-locking (which it fortunately already
>>> does).
>>>
>>> Note that automatically restarting pthread_cond_wait is not even
>>> correct, since we could miss a pthread_cond_signal if it was sent
>>> between the time when the thread was unblocked from the cond wait, and
>>> the time it starts waiting again. So, in this case, it is better to
>>> return to the caller: you get a spurious wake-up, which means that the
>>> caller must run pthread_cond_wait in a loop for the program to run
>>> correctly.
>>>
>>> Anyway, here is a quick fix, could you try it?
>>>
>>> Notes for Philippe: the quick fix does not break the ABI, but it also
>>> changes the behaviour of non-restartable syscalls: they forcibly return
>>> -EINTR. This may look like a disruptive change for the 2.4 branch, but
>>> there is actually currently only one non-restartable syscall in Xenomai
>>> 2.4: pthread_mutex_unlock, and it is ready to handle -EINTR. However,
>>> while we are at it, we should mark a few more syscalls as
>>> non-restartable, notably nanosleep and select, because they use
>>> relative timeouts. I think a lot of syscalls in the native skin are
>>> using relative timeouts too, and should be marked as non-restartable,
>>> but this implies documenting the return value -EINTR.
>> This seems the wrong approach, at the very least for the native skin.
>> -EINTR is already used and documented there, as a possible return value
>> for blocking syscalls which have been forcibly unblocked (i.e. via
>> rt_task_unblock()).
>>
>> Returning -EINTR upon signal interrupt as well would confuse the
>> application, i.e. what was the actual reason for that syscall to return?
>> As a corollary, a bunch of applications are currently not handling
>> -EINTR, precisely because rt_task_unblock() is not used in their
>> application; so making all timed syscalls non-restartable might break
>> them badly.
>>
>> The best fix would rather be to convert relative timeouts to their
>> XN_REALTIME form internally, the way it is done for a few syscalls in
>> 2.5 already.
>
> Yeah, but that would be an ABI change.
>
> In the meantime, I thought more about that: actually, syscalls with
> relative timeouts are restartable if the timeout is passed by pointer
> and the syscall updates the timeout upon interruption by a signal, the
> way nanosleep does (so, in a sense, nanosleep is restartable, contrary
> to what I said). Even select could be restartable, but the specification
> mandates that it returns EINTR upon interruption by a signal and is not
> restarted automatically.
No. Not even select. Whether select is restarted when interrupted by a
signal with the SA_RESTART flag is implementation-defined.
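
For reference, the "run pthread_cond_wait in a loop" point quoted above
can be sketched as follows. This is only a minimal illustration in plain
POSIX C, not code from Xenomai or from the test program; the names
struct event, event_init, event_wait, event_post and waiter are
hypothetical. It assumes the condition is represented by a flag protected
by the mutex, so that a return from pthread_cond_wait without a matching
signal (a spurious wake-up) simply leads to another wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-shot event; the names are illustrative only. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;   /* the predicate, protected by lock */
};

static void event_init(struct event *ev)
{
    int rc;

    rc = pthread_mutex_init(&ev->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&ev->cond, NULL);
    assert(rc == 0);
    (void)rc;
    ev->ready = false;
}

/*
 * Block until ev->ready is set. The predicate is re-checked after every
 * return from pthread_cond_wait, so a wake-up that was not caused by
 * event_post() just goes back to waiting.
 */
static void event_wait(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->ready)
        pthread_cond_wait(&ev->cond, &ev->lock);
    pthread_mutex_unlock(&ev->lock);
}

/* Set the predicate under the mutex, then wake every waiter. */
static void event_post(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->ready = true;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Thread entry point: wait for the event passed as argument. */
static void *waiter(void *arg)
{
    event_wait(arg);
    return NULL;
}
```

Because the flag is set under the mutex before the broadcast, a waiter
that starts after event_post() sees ready == true and never blocks, so
the loop also covers the case where the signal arrives before the wait.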
--
Gilles