All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Darren Hart <darren@dvhart.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	fredrik.markstrom@windriver.com,
	Davidlohr Bueso <dave@stgolabs.net>,
	Manfred Spraul <manfred@colorfullife.com>
Subject: [PATCH 2/3 v2] futex: avoid double wake up in futex_wake() on -RT
Date: Fri, 10 Apr 2015 18:11:35 +0200	[thread overview]
Message-ID: <20150410161135.GF3057@linutronix.de> (raw)
In-Reply-To: <alpine.DEB.2.11.1504072131400.3845@nanos>

futex_wake() wakes the waiter while holding the hb->lock. The waiter
does not take the hb->lock and can leave the kernel. However the next
operation the same futex operation will point to the same hb->lock and
we will see a small dance around the lock including prio-boosting and
context switch:

low prio task FUTEX_WAKE on high prio
| ft-1489  [000] ....1..    81.167501: sys_enter: sys_futex (8049f60, 1, 1, 0, 0, 0)
| ft-1489  [000] dN..311    81.167504: sched_wakeup: pid=1490 prio=94
| ft-1489  [000] d...311    81.167520: sched_switch: prev_pid=1489 prev_prio=120 prev_state=R+ ==> next_pid=1490 next_prio=94
| ft-1490  [000] ....1..    81.167522: sys_exit: sys_futex = 0

prio task FUTEX_WAKE on low prio
| ft-1490  [000] ....1..    81.167528: sys_enter: sys_futex (8049f5c, 1, 1, 0, 0, 0)
| ft-1490  [000] ....1..    81.167530: sys_exit: sys_futex = 0

prio task waits FUTEX_WAIT, hb->lock still owned by low prio task
| ft-1490  [000] ....1..    81.167534: sys_enter: sys_futex (8049f60, 0, 1, 0, 0, 0)
| ft-1490  [000] d...411    81.167895: sched_pi_setprio: pid=1489 oldprio=120 newprio=94
| ft-1490  [000] d...311    81.167901: sched_switch: prev_pid=1490 prev_prio=94 prev_state=D ==> next_pid=1489 next_prio=94
| ft-1489  [000] d...411    81.167910: sched_wakeup: pid=1490 prio=94
| ft-1489  [000] d...311    81.167912: sched_pi_setprio: pid=1489 oldprio=94 newprio=120
| ft-1489  [000] d...311    81.167915: sched_switch: prev_pid=1489 prev_prio=120 prev_state=R+ ==> next_pid=1490 next_prio=94
| ft-1490  [000] d...3..    81.167922: sched_switch: prev_pid=1490 prev_prio=94 prev_state=S ==> next_pid=1489 next_prio=120
| ft-1489  [000] ....1..    81.167924: sys_exit: sys_futex = 1

This patch delays the wakeup of the process untill the hb->lock is
dropped to avoid boosting + context switch to obtain the lock.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2:
    - update patch description
    - move the comment to __wake_futex()
    - move the wakeup before the out_put_key label in futex_wake()

 kernel/futex.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index b38abe3573a8..658f4d05cd6f 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1092,12 +1092,12 @@ static void __unqueue_futex(struct futex_q *q)
  * The hash bucket lock must be held when this is called.
  * Afterwards, the futex_q must not be accessed.
  */
-static void wake_futex(struct futex_q *q)
+static struct task_struct *__wake_futex(struct futex_q *q)
 {
 	struct task_struct *p = q->task;
 
 	if (WARN(q->pi_state || q->rt_waiter, "refusing to wake PI futex\n"))
-		return;
+		return NULL;
 
 	/*
 	 * We set q->lock_ptr = NULL _before_ we wake up the task. If
@@ -1117,6 +1117,15 @@ static void wake_futex(struct futex_q *q)
 	 */
 	smp_wmb();
 	q->lock_ptr = NULL;
+	return p;
+}
+
+static void wake_futex(struct futex_q *q)
+{
+	struct task_struct *p = __wake_futex(q);
+
+	if (!p)
+		return;
 
 	wake_up_state(p, TASK_NORMAL);
 	put_task_struct(p);
@@ -1228,6 +1237,7 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 	struct futex_hash_bucket *hb;
 	struct futex_q *this, *next;
 	union futex_key key = FUTEX_KEY_INIT;
+	struct task_struct *waiter = NULL;
 	int ret;
 
 	if (!bitset)
@@ -1256,13 +1266,21 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 			if (!(this->bitset & bitset))
 				continue;
 
-			wake_futex(this);
+			if (nr_wake == 1)
+				waiter = __wake_futex(this);
+			else
+				wake_futex(this);
 			if (++ret >= nr_wake)
 				break;
 		}
 	}
 
 	spin_unlock(&hb->lock);
+	if (waiter) {
+		wake_up_state(waiter, TASK_NORMAL);
+		put_task_struct(waiter);
+	}
+
 out_put_key:
 	put_futex_key(&key);
 out:
-- 
2.1.4


  reply	other threads:[~2015-04-10 16:11 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-07 15:03 improve futex on -RT by avoiding the double wake-up Sebastian Andrzej Siewior
2015-04-07 15:03 ` [PATCH 1/3] futex: avoid double wake up in PI futex wait / wake on -RT Sebastian Andrzej Siewior
2015-04-07 18:41   ` Thomas Gleixner
2015-04-10 14:42     ` [PATCH 1/3 v2] " Sebastian Andrzej Siewior
2015-04-07 15:03 ` [PATCH 2/3] futex: avoid double wake up in futex_wake() " Sebastian Andrzej Siewior
2015-04-07 19:47   ` Thomas Gleixner
2015-04-10 16:11     ` Sebastian Andrzej Siewior [this message]
2015-04-13  3:02       ` [PATCH 2/3 v2] " Davidlohr Bueso
2015-04-16  5:09         ` Davidlohr Bueso
2015-04-16  9:19           ` Thomas Gleixner
2015-04-16 10:16             ` Peter Zijlstra
2015-04-16 10:49               ` Thomas Gleixner
2015-04-16 14:42               ` Davidlohr Bueso
2015-04-16 15:54                 ` Peter Zijlstra
2015-04-16 16:22                   ` Davidlohr Bueso
2015-04-07 15:03 ` [PATCH 3/3] ipc/mqueue: remove STATE_PENDING Sebastian Andrzej Siewior
2015-04-07 17:48   ` Manfred Spraul
2015-04-07 18:28     ` Thomas Gleixner
2015-04-10 14:37     ` [PATCH v2] " Sebastian Andrzej Siewior
2015-04-23 22:18       ` Thomas Gleixner
2015-04-28  3:24         ` Davidlohr Bueso
2015-04-28 12:37           ` Peter Zijlstra
2015-04-28 16:36             ` Davidlohr Bueso
2015-04-28 16:43               ` Peter Zijlstra
2015-04-28 16:59                 ` Davidlohr Bueso
2015-04-29 19:44                   ` Manfred Spraul
2015-04-30 18:46                     ` Davidlohr Bueso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150410161135.GF3057@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=darren@dvhart.com \
    --cc=dave@stgolabs.net \
    --cc=fredrik.markstrom@windriver.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=manfred@colorfullife.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.