public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Manfred Spraul <manfred@colorfullife.com>
To: Luca Tettamanti <kronos.it@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>,
	linux-kernel@vger.kernel.org, Julia Lawall <julia@diku.dk>,
	Andrew Morton <akpm@linux-foundation.org>,
	maciej.rutecki@gmail.com,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: 2.6.35-rc3: System unresponsive under load
Date: Wed, 30 Jun 2010 21:07:54 +0200	[thread overview]
Message-ID: <4C2B960A.7090503@colorfullife.com> (raw)
In-Reply-To: <AANLkTikSh0FZYPppDcF2YDoyDBkIZcAdTBj7iN_9Cta4@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 999 bytes --]

Hi Luca,

On 06/26/2010 06:47 PM, Luca Tettamanti wrote:
>
> Confirmed here: your test program freezes the system for a while under
> 2.6.35-rc3, while vanilla 2.6.34 copes fine.
> sysrq-t was responsive during the freeze, so I took a snapshot during
> it, file is attached.
>
>    
Ignore my test program:
If the master thread is interrupted in the right place, then there are 
400 runnable tasks in the runqueue.
It seems that the scheduler just processes these 400 tasks first instead 
of the keventd/ksoftirqd that is necessary for the keyboard handling.

Attached is a new idea, could you try it with your httpd test?

Perhaps the race is actually a race in the user space:
The exit path of semtimedop() does not contain an explicit memory barrier.
For the kernel, it does not matter: It merely reads one integer value.
If sysret is also no memory barrier, then user space might observe stale 
data.

Which cpu do you have? I was unable to show any misbehavior on a Phenom X4.

--
     Manfred

[-- Attachment #2: patch-IN_WAKEUP-v2 --]
[-- Type: text/plain, Size: 3096 bytes --]

From: Manfred Spraul <manfred@colorfullife.com>

The last change to improve the scalability moved the actual wake-up out of
the section that is protected by spin_lock(sma->sem_perm.lock).

This means that IN_WAKEUP can be in queue.status even when the spinlock is
acquired by the current task.  Thus the same loop that is performed when
queue.status is read without the spinlock acquired must be performed when
the spinlock is acquired.

In addition, user space may assume that semtimedop() is a memory
barrier(). Thus add a smp_mb() into the lockless return path - otherwise
the code would return after acquiring a semaphore without a memory
barrier.
Thanks to kamezawa.hiroyu@jp.fujitsu.com for noticing lack of the memory
barrier.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16255

[akpm@linux-foundation.org: clean up kerneldoc, checkpatch warning and whitespace]

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Luca Tettamanti <kronos.it@gmail.com>
Reported-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
---
diff --git a/ipc/sem.c b/ipc/sem.c
index 506c849..40a8f46 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1256,6 +1256,33 @@ out:
 	return un;
 }
 
+
+/**
+ * get_queue_result - Retrieve the result code from sem_queue
+ * @q: Pointer to queue structure
+ *
+ * Retrieve the return code from the pending queue. If IN_WAKEUP is found in
+ * q->status, then we must loop until the value is replaced with the final
+ * value: This may happen if a task is woken up by an unrelated event (e.g.
+ * signal) and in parallel the task is woken up by another task because it got
+ * the requested semaphores.
+ *
+ * The function can be called with or without holding the semaphore spinlock.
+ */
+static int get_queue_result(struct sem_queue *q)
+{
+	int error;
+
+	error = q->status;
+	while (unlikely(error == IN_WAKEUP)) {
+		cpu_relax();
+		error = q->status;
+	}
+
+	return error;
+}
+
+
 SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf __user *, tsops,
 		unsigned, nsops, const struct timespec __user *, timeout)
 {
@@ -1409,15 +1436,18 @@ SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf __user *, tsops,
 	else
 		schedule();
 
-	error = queue.status;
-	while(unlikely(error == IN_WAKEUP)) {
-		cpu_relax();
-		error = queue.status;
-	}
+	error = get_queue_result(&queue);
 
 	if (error != -EINTR) {
 		/* fast path: update_queue already obtained all requested
-		 * resources */
+		 * resources.
+		 * Perform a smp_mb(): User space could assume that semop()
+		 * is a memory barrier: Without the mb(), the cpu could
+		 * speculatively read in user space stale data that was
+		 * overwritten by the previous owner of the semaphore.
+		 */
+		smp_mb();
+
 		goto out_free;
 	}
 
@@ -1427,10 +1457,12 @@ SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf __user *, tsops,
 		goto out_free;
 	}
 
+	error = get_queue_result(&queue);
+
 	/*
 	 * If queue.status != -EINTR we are woken up by another process
 	 */
-	error = queue.status;
+
 	if (error != -EINTR) {
 		goto out_unlock_free;
 	}

  reply	other threads:[~2010-06-30 19:07 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-18 14:49 2.6.35-rc3 deadlocks on semaphore operations Christoph Lameter
2010-06-18 14:56 ` Christoph Lameter
2010-06-19 14:43   ` Manfred Spraul
2010-06-21 14:31     ` Christoph Lameter
2010-06-21 16:52       ` Manfred Spraul
2010-06-21 17:13         ` Christoph Lameter
2010-06-18 15:34 ` Julia Lawall
2010-06-19 11:06 ` Manfred Spraul
2010-06-21 17:05   ` Christoph Lameter
2010-06-21 17:16     ` Christoph Lameter
2010-06-21 18:05       ` 2.6.35-rc3 deadlocks on wakeup(?) (was on semaphore operations) Christoph Lameter
2010-06-20  6:55 ` 2.6.35-rc3 deadlocks on semaphore operations Maciej Rutecki
2010-06-21 20:01 ` Luca Tettamanti
2010-06-23 16:29   ` Manfred Spraul
2010-06-23 19:14     ` Luca Tettamanti
2010-06-24 19:22       ` Luca Tettamanti
2010-06-25 21:09         ` Manfred Spraul
2010-06-26 12:52           ` Luca Tettamanti
2010-06-26 15:47             ` 2.6.35-rc3: System unresponsive under load Manfred Spraul
2010-06-26 15:49               ` Manfred Spraul
2010-06-26 16:47               ` Luca Tettamanti
2010-06-30 19:07                 ` Manfred Spraul [this message]
2010-07-06 16:01                   ` Luca Tettamanti
2010-07-07 19:34                   ` Luca Tettamanti
2010-07-02 15:53             ` 2.6.35-rc3 deadlocks on semaphore operations Manfred Spraul
2010-06-23 20:27     ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C2B960A.7090503@colorfullife.com \
    --to=manfred@colorfullife.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux-foundation.org \
    --cc=julia@diku.dk \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kronos.it@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maciej.rutecki@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox