* Re: PATCH: SysV semaphore race vs SIGSTOP
@ 2005-02-05 8:37 Manfred Spraul
2005-02-06 7:42 ` Ove Kaaven
0 siblings, 1 reply; 3+ messages in thread
From: Manfred Spraul @ 2005-02-05 8:37 UTC (permalink / raw)
To: Ove Kaaven; +Cc: linux-kernel, Andrew Morton
[-- Attachment #1: Type: text/plain, Size: 1085 bytes --]
Hi Ove,
>As I mentioned in an earlier mail, there is a race when SIGSTOP-ing a
>process waiting for a SysV semaphore, where if a process holding a
>semaphore suspends another process waiting on the semaphore and then
>releases the semaphore,
>
Your patch looks correct (for 2.4 - 2.6 uses a different approach), but
I'm not certain that it's needed:
You assume that signals have an immediate effect.
Linux ignores that - it delays signal processing under some
circumstances. If a syscall can be completed without blocking, then the
syscall is handled, regardless of any pending signals. The signal is
handled at the syscall return, i.e. after completing the syscall.
That's not just in SysV semaphore - at least pipes are identical:
pipe_read first check if there is data. If there is some, then it
returns the data. Signals are only checked if there is no pending data.
I'm not sure if this is a bug. But if it's one, then far more than just
sysv sem must be updated.
What about other unices? I've attached a test app that tests pipes.
Could someone try it?
--
Manfred
[-- Attachment #2: test8.cpp --]
[-- Type: text/x-c++src, Size: 2024 bytes --]
/*
* Copyright (C) 1999, 2001,2005 by Manfred Spraul.
*
* Redistribution of this file is permitted under the terms of the GNU
* General Public License (GPL)
* $Header: /home/manfred/cvs-tree/pipetest/test8.cpp,v 1.1 2005/02/05 08:35:01 manfred Exp $
*/
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <limits.h>
#include <errno.h>
#define WRITE_LEN 128 /* less than PIPE_BUF */
int buffer[WRITE_LEN];
void dummy(int sig)
{
}
int main()
{
int pipes[2];
int ret;
ret = pipe(pipes);
if(ret != 0) {
printf("pipe creation failed, ret %d, errno %d.\n",
ret, errno);
exit(1);
}
ret = fork();
if(!ret) {
/* child: read from pipe */
printf("child: trying to read %d bytes.\n", WRITE_LEN);
ret = read(pipes[0], buffer, WRITE_LEN);
/* never executed! */
printf("child: read returned %d.\n", ret);
printf("SIGSTOP test result: SIGSTOP completely broken\n");
} else {
/* synchronize with timer - the following block must be atomic */
sleep(1);
/* begin atomic block */
ret = kill(ret, SIGSTOP);
if (ret != 0) {
printf("Sending SIGSTOP failed, aborting (errno=%d).\n", errno);
exit(1);
}
ret = write(pipes[1], buffer, WRITE_LEN);
if (ret != WRITE_LEN) {
printf("Writing to pipe buffer failed, aborting (errno=%d).\n", errno);
exit(1);
}
/* end of atomic block */
printf("parent: yielding\n");
sleep(1);
ret = fcntl(pipes[0], F_SETFL, O_NONBLOCK);
if (ret != 0) {
printf("fcntl(,,O_NONBLOCK) failed, aborting (errno=%d).\n", errno);
exit(1);
}
ret = read(pipes[0], buffer, WRITE_LEN);
printf("parent: read returned %d.\n", ret);
printf("\n\n");
printf("SIGSTOP test result:\n");
printf("Expected values:\n");
printf(" %d for OS with synchroneous SIGSTOP\n", WRITE_LEN);
printf(" -1 for OS with asynchroneous SIGSTOP (errno EAGAIN=%d)\n", EAGAIN);
printf("Got: %d (errno: %d)\n", ret, errno);
close(pipes[1]);
close(pipes[0]);
}
}
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: PATCH: SysV semaphore race vs SIGSTOP
2005-02-05 8:37 PATCH: SysV semaphore race vs SIGSTOP Manfred Spraul
@ 2005-02-06 7:42 ` Ove Kaaven
0 siblings, 0 replies; 3+ messages in thread
From: Ove Kaaven @ 2005-02-06 7:42 UTC (permalink / raw)
To: Manfred Spraul; +Cc: linux-kernel, Andrew Morton
lør, 05,.02.2005 kl. 09.37 +0100, skrev Manfred Spraul:
> Hi Ove,
>
> >As I mentioned in an earlier mail, there is a race when SIGSTOP-ing a
> >process waiting for a SysV semaphore, where if a process holding a
> >semaphore suspends another process waiting on the semaphore and then
> >releases the semaphore,
> >
> Your patch looks correct (for 2.4 - 2.6 uses a different approach), but
Actually I myself don't think the patch I sent is 100% correct. The most
glaring problem is of course that it only handles SIGSTOP and nothing
else, but I wasn't sure how to handle anything else. Also, I've since
found that the "continue" in there made it impossible to attach
debuggers or the like to a process blocked in semop. Finally, it would
occasionally continue to loop after the condition is satisfied, e.g.
when the semop is 0.
So, the if check in my patch is more correct as
+ if (is_stopping(current) && queue.status == 1)
+ /* Could either EINTR out or continue.
+ * I suppose EINTR is the robust choice. */
+ queue.status = -EINTR;
but that still doesn't handle all signals. I'm thinking that if the
SIGSTOP problem can be solved simply by returning EINTR (and let the
userspace code deal with that by retrying), then perhaps that can be
done for all signals.
Summarily, my "perfect" patch for 2.4 might simply look like this
(assuming signal_pending does the right thing, which I haven't
verified):
--- ipc/sem.c.original 2005-01-31 18:17:17.000000000 -0500
+++ ipc/sem.c 2005-02-06 01:52:01.000000000 -0500
@@ -961,6 +961,9 @@
error = -EIDRM;
goto out_free;
}
+ if (signal_pending(current) && queue.status == 1)
+ queue.status = -EINTR;
+
/*
* If queue.status == 1 we where woken up and
* have to retry else we simply return.
As for 2.6, yes, the 2.6 code does seem to be harder to make a simple
patch for. Then again, we're thinking of using futexes instead of
semaphores if a 2.6 kernel is detected, so if futexes don't have this
problem, I guess we can use them to work around this semaphore flaw.
> I'm not certain that it's needed:
> You assume that signals have an immediate effect.
I don't necessarily need it to have an "immediate" effect. Only that
pending signals are taken into consideration when the process returns
from certain blocking system calls dealing with atomic synchronization
primitives such as semaphores. I want to be able to think that
synchronization primitives are there to *protect* me against the effects
of implementation details such as delayed signal delivery, not that they
have their own synchronization issues. After all, the man page for
"semop" states that
* The calling process catches a signal: the value of semzcnt is
decremented and semop() fails, with errno set to EINTR.
and I want to be able to expect this to always actually happen. Because
semop is supposed to be an *atomic* operation, I don't expect it to
*both* catch a signal *and* succeed within the same semop system call
(knowing that the signal *must* have been sent by the time the semaphore
condition was fulfilled). See where I'm coming from?
> Linux ignores that - it delays signal processing under some
> circumstances. If a syscall can be completed without blocking, then the
> syscall is handled, regardless of any pending signals. The signal is
> handled at the syscall return, i.e. after completing the syscall.
> That's not just in SysV semaphore - at least pipes are identical:
Perhaps. But at least read() doesn't claim to be an atomic
synchronization primitive like semop() does. Though I suppose it's
occasionally used that way...
> pipe_read first check if there is data. If there is some, then it
> returns the data. Signals are only checked if there is no pending data.
> I'm not sure if this is a bug. But if it's one, then far more than just
> sysv sem must be updated.
I'd probably only concern myself with things that are supposed to be
atomic, really.
Anyway, thanks a lot for answering.
^ permalink raw reply [flat|nested] 3+ messages in thread
* PROBLEM: SysV semaphore race vs SIGSTOP
@ 2005-01-28 22:55 Ove Kaaven
2005-02-01 3:52 ` PATCH: " Ove Kaaven
0 siblings, 1 reply; 3+ messages in thread
From: Ove Kaaven @ 2005-01-28 22:55 UTC (permalink / raw)
To: linux-kernel
There seem to be a race when SIGSTOP-ing a process waiting for a SysV
semaphore. Even if it could not possibly have owned the semaphore when
the signal was sent (because the sender of the signal owned it at the
time), it still occasionally happens that it both stops execution *and*
acquires the semaphore, with a deadlocked application as the result.
This is a problem for some of the high-performance stuff I'm working on.
A sample test program exhibiting the problem is available at
http://www.ping.uio.no/~ovehk/sembug.c
For me, it will show "ACQUIRE FAILED!! DEADLOCK!!" almost every time I
run it. Occasionally it will run fine; if it does for you, just try
again a couple of times.
The kernel I currently use is:
Linux version 2.4.27-1-k7 (horms@tabatha.lab.ultramonkey.org) (gcc
version 3.3.5 (Debian 1:3.3.5-2)) #1 Wed Dec 1 20:12:01 JST 2004
and I run it on a uniprocessor system (AMD Athlon, 1.9GHz) with Debian
"sid" installed.
I'm not a kernel hacker, but from a quick peruse of the 2.4 code, it
didn't seem to me like the semaphore code in the kernel (ipc/sem.c) even
try to handle suspended threads (though I wouldn't know how to do so).
The 2.6 semaphore code looked almost the same to me, too, so it might be
a problem there as well.
Please Cc me on any questions or comments, since I am too wimpy to
subscribe yet.
^ permalink raw reply [flat|nested] 3+ messages in thread
* PATCH: SysV semaphore race vs SIGSTOP
2005-01-28 22:55 PROBLEM: " Ove Kaaven
@ 2005-02-01 3:52 ` Ove Kaaven
0 siblings, 0 replies; 3+ messages in thread
From: Ove Kaaven @ 2005-02-01 3:52 UTC (permalink / raw)
To: linux-kernel
As I mentioned in an earlier mail, there is a race when SIGSTOP-ing a
process waiting for a SysV semaphore, where if a process holding a
semaphore suspends another process waiting on the semaphore and then
releases the semaphore, the suspended process might still get the
semaphore, resulting in a deadlock. My sample test program is at
http://www.ping.uio.no/~ovehk/sembug.c
Now I've written a patch for this which seems to work for me. It is
against 2.4.27, but the semaphore code doesn't seem to change often. And
please Cc me on any questions or comments, since I'm not subscribed.
Here is the patch:
--- ipc/sem.c.original 2005-01-31 18:17:17.000000000 -0500
+++ ipc/sem.c 2005-01-31 18:17:34.000000000 -0500
@@ -307,6 +307,18 @@
return result;
}
+static int is_stopping(struct task_struct *t)
+{
+ if (t->state == TASK_STOPPED) {
+ /* shouldn't happen */
+ return 1;
+ }
+ if (sigismember(&t->pending.signal, SIGSTOP)) {
+ return 1;
+ }
+ return 0;
+}
+
/* Go through the pending queue for the indicated semaphore
* looking for tasks that can be completed.
*/
@@ -961,6 +973,12 @@
error = -EIDRM;
goto out_free;
}
+
+ if (is_stopping(current))
+ /* Could either EINTR out or continue.
+ * Currently I've chosen to continue */
+ continue;
+
/*
* If queue.status == 1 we where woken up and
* have to retry else we simply return.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2005-02-06 7:43 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-02-05 8:37 PATCH: SysV semaphore race vs SIGSTOP Manfred Spraul
2005-02-06 7:42 ` Ove Kaaven
-- strict thread matches above, loose matches on Subject: below --
2005-01-28 22:55 PROBLEM: " Ove Kaaven
2005-02-01 3:52 ` PATCH: " Ove Kaaven
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox