From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4B18E3C4.9030408@domain.hid>
Date: Fri, 04 Dec 2009 11:26:12 +0100
From: Gilles Chanteperdrix
MIME-Version: 1.0
References: <4B1850B6.4010906@domain.hid> <1259920779.2174.77.camel@domain.hid> <4B18DFD6.1030907@domain.hid>
In-Reply-To: <4B18DFD6.1030907@domain.hid>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Xenomai-help] Problem with pthread_cond_wait
List-Id: Help regarding installation and common use of Xenomai
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Philippe Gerum
Cc: "Soboljew, Patrick" , xenomai@xenomai.org

Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>> On Fri, 2009-12-04 at 00:58 +0100, Gilles Chanteperdrix wrote:
>>> Soboljew, Patrick wrote:
>>>> Hello all,
>>>>
>>>> I have a strange problem concerning the posix skin of xenomai (Ver.
>>>> 2.4.9.1). Whenever two or more threads call 'pthread_cond_wait' and I
>>>> want to interrupt the program with SIGINT (CTRL-c) only the main thread
>>>> and the first thread that called 'pthread_cond_wait' get the signal. The
>>>> remaining threads are not interrupted so I have created some zombies
>>>> here. I discovered this problem when I tried to debug some code with the
>>>> ACE/TAO Framework which calls these functions in a similar way. The
>>>> debugger also has problems to interrupt these threads.
>>>>
>>>> The small code example illustrates what I did.
>>>>
>>>> Has anyone an idea what exactly causes this problem?
>>>
>>> The problem is the way pthread_cond_wait handles interruption by
>>> signals: the thread tries to re-acquire the mutex before returning to
>>> user-space, so as to make the system call restartable, so it may be
>>> suspended if the mutex is not free (which happens for the second thread
>>> in your test program). In that case it does not return to user-space,
>>> the signal remains pending and unhandled, and you get the disturbing
>>> behaviour when hitting ctrl-c or when running inside gdb.
>>>
>>> We can fix that by making the syscall non-restartable, and letting
>>> user-space handle the mutex re-locking (which it fortunately already
>>> does).
>>>
>>> Note that automatically restarting pthread_cond_wait is not even
>>> correct, since we could miss a pthread_cond_signal if it was sent
>>> between the time when the thread was unblocked from the cond wait, and
>>> the time it starts waiting again. So, in this case, it is better to
>>> return to the caller: you get a spurious wake-up, which means that the
>>> caller must run pthread_cond_wait in a loop for the program to run
>>> correctly.
>>>
>>> Anyway, here is a quick fix, could you try it?
>>>
>>> Notes for Philippe: the quick fix does not break the ABI, but also
>>> changes the behaviour of non-restartable syscalls: they forcibly return
>>> -EINTR. This may look like a disruptive change for the 2.4 branch, but
>>> there is actually currently only one non-restartable syscall in Xenomai
>>> 2.4: pthread_mutex_unlock, and it is ready to handle -EINTR. However, on
>>> this occasion, we should mark a few more syscalls as non-restartable,
>>> notably nanosleep and select, because they use relative timeouts. I
>>> think a lot of syscalls in the native skin are using relative timeouts
>>> too, and should be marked as non-restartable, but this implies
>>> documenting the return value -EINTR.
>> This seems the wrong approach, at the very least for the native skin.
>> -EINTR is already used and documented there, as a possible return value
>> for blocking syscalls which have been forcibly unblocked (i.e. via
>> rt_task_unblock()).
>>
>> Returning -EINTR upon signal interrupt as well would confuse the
>> application, i.e. what was the actual reason for that syscall to return?
>> As a corollary, a bunch of applications are currently not handling
>> -EINTR, precisely because rt_task_unblock() is not used in their
>> application; so making all timed syscalls non-restartable might break
>> them badly.
>>
>> The best fix would rather be to convert relative timeouts to their
>> XN_REALTIME form internally, the way it is done for a few syscalls in
>> 2.5 already.
>
> Yeah, but that would be an ABI change.
>
> In the meantime, I thought more about that: actually, syscalls with
> relative timeouts are restartable if the timeout is passed by pointer
> and the syscall updates the timeout upon interruption by a signal, the
> way nanosleep does (so, in a sense, nanosleep is restartable, contrary
> to what I said). Even select could be restartable, but the specification
> mandates that it returns EINTR upon interruption by a signal and is not
> restarted automatically.

No. Not even select. Whether select is restarted when interrupted by a
signal with the SA_RESTART flag is implementation-defined.

-- 
Gilles
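
For reference, here is a minimal sketch of the nanosleep case discussed
above, written against plain POSIX rather than the Xenomai skins. It
assumes only the standard behaviour: when a signal handler interrupts
nanosleep(), the call returns -1 with errno set to EINTR and stores the
unslept time in its second argument, so the caller can restart it by
hand. The names sleep_fully, on_alarm and demo_interrupted_sleep are
hypothetical helpers invented for this illustration; they are not part
of Xenomai or POSIX.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: sleep for the whole requested duration, restarting
 * nanosleep() by hand each time a signal handler interrupts it.
 * Returns 0 on success, -1 on any error other than EINTR.
 * On success, *interruptions (if non-NULL) receives the restart count. */
int sleep_fully(const struct timespec *total, int *interruptions)
{
    struct timespec req, rem;
    int count = 0;

    assert(total != NULL);
    req = *total;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;      /* continue with the time that is still left */
        count++;
    }
    if (interruptions)
        *interruptions = count;
    return 0;
}

/* Empty handler: its only job is to make nanosleep() return EINTR. */
static void on_alarm(int signo)
{
    (void)signo;
}

/* Hypothetical demo: arm a one-shot 20 ms SIGALRM, then sleep 100 ms
 * through it. Returns the number of restarts, or -1 on error. */
int demo_interrupted_sleep(void)
{
    struct sigaction sa;
    struct itimerval timer = {
        .it_interval = { .tv_sec = 0, .tv_usec = 0 },     /* one-shot */
        .it_value    = { .tv_sec = 0, .tv_usec = 20000 }, /* 20 ms */
    };
    struct timespec total = { .tv_sec = 0, .tv_nsec = 100000000L };
    int restarts = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;
    if (setitimer(ITIMER_REAL, &timer, NULL) == -1)
        return -1;
    if (sleep_fully(&total, &restarts) == -1)
        return -1;
    return restarts;
}
```

Calling demo_interrupted_sleep() should report at least one restart,
since the alarm fires while the sleep is still in progress. Note that
restarting with the remaining relative time lets small delays accumulate
across interruptions; an absolute deadline (for instance clock_nanosleep
with TIMER_ABSTIME) avoids that, which is the same idea as converting
relative timeouts to an absolute form internally.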