All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@tv-sign.ru>
To: Joe Korty <joe.korty@ccur.com>
Cc: Roland McGrath <roland@redhat.com>, Jiri Kosina <jkosina@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [BUG, TEST PATCH] stallout race between SIGCONT and SIGSTOP
Date: Wed, 24 Sep 2008 19:05:41 +0400	[thread overview]
Message-ID: <20080924150541.GA119@tv-sign.ru> (raw)
In-Reply-To: <20080923163530.GA656@tv-sign.ru>

On 09/23, Oleg Nesterov wrote:
>
> I will be completely offline tomorrow.

Cancelled...

> Better yet, to avoid a possible confusion, could you please send me
> the (modified) source code to re-produce the stall ?

Thanks Joe for the test-case and for your comments.

Joe says:
>
> So it looks like the test is in error, not the kernel.

and I am happy to agree.

I think sigaction/10-1.c should be fixed, please see the patch below.

In essence the code does:

	for (1 .. 10) {
		kill(pid, SIGSTOP);
		wait_until_we_receive_CLD_STOPPED(); // <---- HANGS
		kill(pid, SIGCONT);
	}

We can look at the code from the different angle, this loop does

		kill(pid, SIGCONT);
		kill(pid, SIGSTOP);
		wait_until_we_receive_CLD_STOPPED(); // <---- HANGS

and it hangs because CLD_CONTINUED (which is ignored by the SIGCHLD
handler) masks CLD_STOPPED.

And yes, the mentioned patch makes the difference. To clarify, it
does not defer the "result" of SIGCONT, it only defers the sending
of SIGCHLD with .si_code == CLD_CONTINUED.

So, what happens is:

	kill(SIGCONT) does its job, CLD_CONTINUED will be sent a
	bit later.

	kill(SIGSTOP) starts another group-stop.

	the child sends CLD_CONTINUED, and before the parent deques
	the signal, it stops and sends CLD_STOPPED.

Now, since SIGCHLD is a non-rt signal, the second SIGCHLD is lost,
and wait_until_we_receive_CLD_STOPPED() hangs.

I did the test patch to be sure:

	--- 26-rc2/kernel/signal.c~	2008-09-20 20:37:52.000000000 +0400
	+++ 26-rc2/kernel/signal.c	2008-09-24 18:43:34.000000000 +0400
	@@ -808,7 +808,7 @@ static int send_signal(int sig, struct s
		 * exactly one non-rt signal, so that we can get more
		 * detailed information about the cause of the signal.
		 */
	-	if (legacy_queue(pending, sig))
	+	if (sig != SIGCHLD && legacy_queue(pending, sig))
			return 0;
		/*
		 * fast-pathed signals for kernel-internal things like SIGSTOP

and now your test-case doesn't hang.

Oleg.

--- tmp/posixtestsuite/conformance/interfaces/sigaction/10-1.c~	2003-04-15 01:18:58.000000000 +0400
+++ tmp/posixtestsuite/conformance/interfaces/sigaction/10-1.c	2008-09-24 18:01:23.000000000 +0400
@@ -19,19 +19,41 @@
 #define NUMSTOPS 10
 
 int child_stopped = 0;
-int waiting = 1;
+int child_continued = 0;
+int notification;
 
-void handler(int signo, siginfo_t *info, void *context) 
+void handler(int signo, siginfo_t *info, void *context)
 {
-	if (info && info->si_code == CLD_STOPPED) {
+	if (!info)
+		return;
+
+	notification = info->si_code;
+
+	switch (notification) {
+	case CLD_STOPPED:
 		printf("Child has been stopped\n");
-		waiting = 0;
 		child_stopped++;
+		break;
+	case CLD_CONTINUED:
+		printf("Child has been continued\n");
+		child_continued++;
+		break;
 	}
 }
 
+void wait_for_notification(int val)
+{
+	struct timeval tv;
+
+	while (notification != val) {
+		tv.tv_sec = 1;
+		tv.tv_usec = 0;
+		if (!select(0, NULL, NULL, NULL, &tv))
+			break;
+	}
+}
 
-int main()
+int main(void)
 {
 	pid_t pid;
 	struct sigaction act;
@@ -40,7 +62,7 @@ int main()
 	act.sa_sigaction = handler;
 	act.sa_flags = SA_SIGINFO;
 	sigemptyset(&act.sa_mask);
-	sigaction(SIGCHLD,  &act, 0);     
+	sigaction(SIGCHLD,  &act, 0);
 
 	if ((pid = fork()) == 0) {
 		/* child */
@@ -54,37 +76,36 @@ int main()
 		return 0;
 	} else {
 		/* parent */
-		int s; 		
+		int s;
 		int i;
 
 		for (i = 0; i < NUMSTOPS; i++) {
-			waiting = 1;
-
 			printf("--> Sending SIGSTOP\n");
+			notification = 0;
 			kill(pid, SIGSTOP);
 
 			/*
 			  Don't let the kernel optimize away queued
 			  SIGSTOP/SIGCONT signals.
 			*/
-			while (waiting) {
-				tv.tv_sec = 1;
-				tv.tv_usec = 0;
-				if (!select(0, NULL, NULL, NULL, &tv))
-				  break;
-			}
-
+			wait_for_notification(CLD_STOPPED);
 
 			printf("--> Sending SIGCONT\n");
+			notification = 0;
 			kill(pid, SIGCONT);
+			/*
+			  SIGCHLD doesn't queue, make sure CLD_CONTINUED
+			  doesn't mask the next CLD_STOPPED
+			*/
+			wait_for_notification(CLD_CONTINUED);
 		}
-		
+
 		/* POSIX specifies default action to be abnormal termination */
 		kill(pid, SIGHUP);
 		waitpid(pid, &s, 0);
 	}
 
-	if (child_stopped == NUMSTOPS) {
+	if (child_stopped == NUMSTOPS && child_continued == NUMSTOPS) {
 		printf("Test PASSED\n");
 		return 0;
 	}
@@ -92,4 +113,3 @@ int main()
 	printf("Test FAILED\n");
 	return -1;
 }
-


  reply	other threads:[~2008-09-24 15:00 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-23 15:53 [BUG, TEST PATCH] stallout race between SIGCONT and SIGSTOP Joe Korty
2008-09-23 16:35 ` Oleg Nesterov
2008-09-24 15:05   ` Oleg Nesterov [this message]
2008-09-24 15:19     ` Joe Korty
2008-09-24 15:56       ` Oleg Nesterov
2008-09-24 17:11         ` Joe Korty
2008-09-24 17:34           ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080924150541.GA119@tv-sign.ru \
    --to=oleg@tv-sign.ru \
    --cc=akpm@linux-foundation.org \
    --cc=jkosina@suse.cz \
    --cc=joe.korty@ccur.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=roland@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.