All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] lockless wake-queues
@ 2015-04-19 19:17 Davidlohr Bueso
  2015-04-19 19:17 ` [PATCH 1/2] sched: " Davidlohr Bueso
  2015-04-19 19:17 ` [PATCH 2/2] futex: lockless wakeups Davidlohr Bueso
  0 siblings, 2 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2015-04-19 19:17 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner, Ingo Molnar
  Cc: Sebastian Andrzej Siewior, Linus Torvalds, Chris Mason,
	Steven Rostedt, fredrik.markstrom, linux-kernel, Davidlohr Bueso

Hello,

This series is aimed at addressing some of the futex hash bucket
lock hold times by introducing lockless wake-queues for futex_wake.

patch-1: introduces the lockless wake-queue machinery.
patch-2: makes use of patch 1 for futexes.

Details in the individual patches.

This was suggested sometime ago by peterz, but due to it potentially
causing spurious wakeups, was never given much consideration. However,
nowadays, so far, I am reliably booting a 45-core box with peterz's
patch to trigger spurious wakeups. While there are drivers out there
that do not play nice with schedule(), they can be fixed over time --
while this is a production problem for some customers). Furthermore,
after some auditing, there really aren't that many, it a lot of cases,
those functions that end up calling schedule are merely wrapped in a
loop, so just not clear at first sight.

Other code that could be migrated to lockless wake_q includes:
s390 pfault_interrupt, sysv msg (sem already does the lockless wakeups
without the ipc object lock), among others. However, futexes are the
only users afaik that solve a _real_ issue.

Applies on top of Linus' tree 4.0-64fb1d0e975e. Thanks!

  sched: lockless wake-queues
  futex: lockless wakeups

 include/linux/sched.h | 30 ++++++++++++++++++++++++++++++
 kernel/futex.c        | 33 +++++++++++++++++----------------
 kernel/sched/core.c   | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 96 insertions(+), 16 deletions(-)

-- 
2.1.4


^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] sched: lockless wake-queues
@ 2015-04-20 18:24 George Spelvin
  2015-04-20 20:08 ` Davidlohr Bueso
  0 siblings, 1 reply; 12+ messages in thread
From: George Spelvin @ 2015-04-20 18:24 UTC (permalink / raw)
  To: dave; +Cc: linux, linux-kernel, peterz

+struct wake_q_head {
+	struct wake_q_node *first;
+	struct wake_q_node *last;
+};
+
+#define WAKE_Q_TAIL ((struct wake_q_node *) 0x01)
+
+#define WAKE_Q(name)					\
+	struct wake_q_head name = { WAKE_Q_TAIL, WAKE_Q_TAIL }

Is there some reason you don't use the simpler singly-linked list
construction with the tail being a pointer to a pointer:

struct wake_q_head {
       struct wake_q_node *first, **lastp;
};

#define WAKE_Q(name)                                   \
       struct wake_q_head name = { WAKE_Q_TAIL, &name.first }


That removes a conditional from wake_q_add:

+/*
+ * Queue a task for later wake-up by wake_up_q().  If the task is already
+ * queued by someone else, leave it to them to deliver the wakeup.
+ *
+ * This property makes it impossible to guarantee the order of wakeups,
+ * but for efficiency we try to deliver wakeups in the order tasks
+ * are added.  If we didn't mind reversing the order, a LIFO stack
+ * would be simpler.
+ */
+void wake_q_add(struct wake_q_head *head, struct task_struct *task)
+{
+	struct wake_q_node *node = &task->wake_q;
+
+	/*
+	 * Atomically grab the task, if ->wake_q is !nil already it means
+	 * its already queued (either by us or someone else) and will get the
+	 * wakeup due to that.
+	 *
+	 * This cmpxchg() implies a full barrier, which pairs with the write
+	 * barrier implied by the wakeup in wake_up_list().
+	 */
+	if (cmpxchg(&node->next, NULL, WAKE_Q_TAIL))
+		return;
+
+	get_task_struct(task);
+
+	/*
+	 * The head is context local, there can be no concurrency.
+	 */
+	*head->lastp = node;
+	head->lastp = &node->next;
+}

It may also be worth commenting the fact that wake_up_q() leaves the
struct wake_q_head in a corrupt state, so don't try to do it again.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-04-21  1:39 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-04-19 19:17 [PATCH 0/2] lockless wake-queues Davidlohr Bueso
2015-04-19 19:17 ` [PATCH 1/2] sched: " Davidlohr Bueso
2015-04-20 14:42   ` Peter Zijlstra
2015-04-19 19:17 ` [PATCH 2/2] futex: lockless wakeups Davidlohr Bueso
2015-04-20  6:18   ` Ingo Molnar
2015-04-20 13:55     ` Davidlohr Bueso
2015-04-20 14:57       ` Peter Zijlstra
2015-04-20 17:31         ` Davidlohr Bueso
2015-04-20 16:04   ` Linus Torvalds
  -- strict thread matches above, loose matches on Subject: below --
2015-04-20 18:24 [PATCH 1/2] sched: lockless wake-queues George Spelvin
2015-04-20 20:08 ` Davidlohr Bueso
2015-04-21  1:39   ` George Spelvin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.