Re: Udpated sys_membarrier() speedup patch, FYI

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Avi Kivity <avi@scylladb.com>,
	maged michael <maged.michael@gmail.com>,
	Andrew Hunter <ahh@google.com>,
	gromer@google.com, linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: Udpated sys_membarrier() speedup patch, FYI
Date: Thu, 27 Jul 2017 20:58:38 +0000 (UTC)	[thread overview]
Message-ID: <1035920775.29109.1501189118277.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20170727203706.GO3730@linux.vnet.ibm.com>

----- On Jul 27, 2017, at 4:37 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote:

> On Thu, Jul 27, 2017 at 11:04:13PM +0300, Avi Kivity wrote:
>> On 07/27/2017 10:43 PM, Paul E. McKenney wrote:
>> >On Thu, Jul 27, 2017 at 10:20:14PM +0300, Avi Kivity wrote:
>> >>On 07/27/2017 09:12 PM, Paul E. McKenney wrote:
>> >>>Hello!
>> >>>
>> >>>Please see below for a prototype sys_membarrier() speedup patch.
>> >>>Please note that there is some controversy on this subject, so the final
>> >>>version will probably be quite a bit different than this prototype.
>> >>>
>> >>>But my main question is whether the throttling shown below is acceptable
>> >>>for your use cases, namely only one expedited sys_membarrier() permitted
>> >>>per scheduling-clock period (1 millisecond on many platforms), with any
>> >>>excess being silently converted to non-expedited form.  The reason for
>> >>>the throttling is concerns about DoS attacks based on user code with a
>> >>>tight loop invoking this system call.
>> >>>
>> >>>Thoughts?
>> >>Silent throttling would render it useless for me. -EAGAIN is a
>> >>little better, but I'd be forced to spin until either I get kicked
>> >>out of my loop, or it succeeds.
>> >>
>> >>IPIing only running threads of my process would be perfect. In fact
>> >>I might even be able to make use of "membarrier these threads
>> >>please" to reduce IPIs, when I change the topology from fully
>> >>connected to something more sparse, on larger machines.
>> >>
>> >>My previous implementations were a signal (but that's horrible on
>> >>large machines) and trylock + mprotect (but that doesn't work on
>> >>ARM).
>> >OK, how about the following patch, which IPIs only the running
>> >threads of the process doing the sys_membarrier()?
>> 
>> Works for me.
> 
> Thank you for testing!  I expect that Mathieu will have a v2 soon,
> hopefully CCing you guys.  (If not, I will forward it.)
> 

Will do!

> Mathieu, please note Avi's feedback below.

More below,

> 
>							Thanx, Paul
> 
>> >------------------------------------------------------------------------
>> >
>> >From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> >To: Peter Zijlstra <peterz@infradead.org>
>> >Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers
>> >  <mathieu.desnoyers@efficios.com>,
>> >  "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>, Boqun Feng
>> >  <boqun.feng@gmail.com>
>> >Subject: [RFC PATCH] membarrier: expedited private command
>> >Date: Thu, 27 Jul 2017 14:59:43 -0400
>> >Message-Id: <20170727185943.11570-1-mathieu.desnoyers@efficios.com>
>> >
>> >Implement MEMBARRIER_CMD_PRIVATE_EXPEDITED with IPIs using cpumask built
>> >from all runqueues for which current thread's mm is the same as our own.
>> >
>> >Scheduler-wise, it requires that we add a memory barrier after context
>> >switching between processes (which have different mm).
>> >
>> >It would be interesting to benchmark the overhead of this added barrier
>> >on the performance of context switching between processes. If the
>> >preexisting overhead of switching between mm is high enough, the
>> >overhead of adding this extra barrier may be insignificant.
>> >
>> >[ Compile-tested only! ]
>> >
>> >CC: Peter Zijlstra <peterz@infradead.org>
>> >CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>> >CC: Boqun Feng <boqun.feng@gmail.com>
>> >Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> >---
>> >  include/uapi/linux/membarrier.h |  8 +++--
>> >  kernel/membarrier.c             | 76 ++++++++++++++++++++++++++++++++++++++++-
>> >  kernel/sched/core.c             | 21 ++++++++++++
>> >  3 files changed, 102 insertions(+), 3 deletions(-)
>> >
>> >diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
>> >index e0b108bd2624..6a33c5852f6b 100644
>> >--- a/include/uapi/linux/membarrier.h
>> >+++ b/include/uapi/linux/membarrier.h
>> >@@ -40,14 +40,18 @@
>> >   *                          (non-running threads are de facto in such a
>> >   *                          state). This covers threads from all processes
>> >   *                          running on the system. This command returns 0.
>> >+ * TODO: documentation.
>> >   *
>> >   * Command to be passed to the membarrier system call. The commands need to
>> >   * be a single bit each, except for MEMBARRIER_CMD_QUERY which is assigned to
>> >   * the value 0.
>> >   */
>> >  enum membarrier_cmd {
>> >-	MEMBARRIER_CMD_QUERY = 0,
>> >-	MEMBARRIER_CMD_SHARED = (1 << 0),
>> >+	MEMBARRIER_CMD_QUERY			= 0,
>> >+	MEMBARRIER_CMD_SHARED			= (1 << 0),
>> >+	/* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */
>> >+	/* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */
>> >+	MEMBARRIER_CMD_PRIVATE_EXPEDITED	= (1 << 3),
>> >  };
>> >
>> >  #endif /* _UAPI_LINUX_MEMBARRIER_H */
>> >diff --git a/kernel/membarrier.c b/kernel/membarrier.c
>> >index 9f9284f37f8d..8c6c0f96f617 100644
>> >--- a/kernel/membarrier.c
>> >+++ b/kernel/membarrier.c
>> >@@ -19,10 +19,81 @@
>> >  #include <linux/tick.h>
>> >
>> >  /*
>> >+ * XXX For cpu_rq(). Should we rather move
>> >+ * membarrier_private_expedited() to sched/core.c or create
>> >+ * sched/membarrier.c ?
>> >+ */
>> >+#include "sched/sched.h"
>> >+
>> >+/*
>> >   * Bitmask made from a "or" of all commands within enum membarrier_cmd,
>> >   * except MEMBARRIER_CMD_QUERY.
>> >   */
>> >-#define MEMBARRIER_CMD_BITMASK	(MEMBARRIER_CMD_SHARED)
>> >+#define MEMBARRIER_CMD_BITMASK	\
>> >+	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED)
>> >+
>> 
>> >	rcu_read_unlock();
>> >+	}
>> >+}
>> >+
>> >+static void membarrier_private_expedited(void)
>> >+{
>> >+	int cpu, this_cpu;
>> >+	cpumask_var_t tmpmask;
>> >+
>> >+	if (num_online_cpus() == 1)
>> >+		return;
>> >+
>> >+	/*
>> >+	 * Matches memory barriers around rq->curr modification in
>> >+	 * scheduler.
>> >+	 */
>> >+	smp_mb();	/* system call entry is not a mb. */
>> >+
>> >+	if (!alloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
>> >+		/* Fallback for OOM. */
>> >+		membarrier_private_expedited_ipi_each();
>> >+		goto end;
>> >+	}
>> >+
>> >+	this_cpu = raw_smp_processor_id();
>> >+	for_each_online_cpu(cpu) {
>> >+		struct task_struct *p;
>> >+
>> >+		if (cpu == this_cpu)
>> >+			continue;
>> >+		rcu_read_lock();
>> >+		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>> >+		if (p && p->mm == current->mm)
>> >+			__cpumask_set_cpu(cpu, tmpmask);
>> 
>> This gets you some false positives, if the CPU idled then mm will
>> not have changed.
> 
> Good point!  The battery-powered embedded guys would probably prefer
> we not needlessly IPI idle CPUs.  We cannot rely on RCU's dyntick-idle
> state in nohz_full cases.  Not sure if is_idle_task() can be used
> safely, given things like play_idle().

Would changing the check in this loop to:

if (p && !is_idle_task(p) && p->mm == current->mm) {

work for you ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

next prev parent reply	other threads:[~2017-07-27 20:56 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-27 18:12 Udpated sys_membarrier() speedup patch, FYI Paul E. McKenney
2017-07-27 18:36 ` Andrew Hunter
2017-07-27 19:06   ` Paul E. McKenney
2017-07-28 17:37     ` Andrew Hunter
2017-07-28 18:14       ` Paul E. McKenney
2017-07-27 19:20 ` Avi Kivity
2017-07-27 19:43   ` Paul E. McKenney
2017-07-27 20:04     ` Avi Kivity
2017-07-27 20:37       ` Paul E. McKenney
2017-07-27 20:58         ` Mathieu Desnoyers [this message]
2017-07-27 21:02           ` Mathieu Desnoyers
2017-07-31  6:03             ` Avi Kivity
2017-07-31  8:37               ` Peter Zijlstra
2017-07-31  8:53                 ` Avi Kivity
2017-07-28 17:15     ` Andrew Hunter
2017-07-28 17:25       ` Mathieu Desnoyers
2017-07-28 17:31       ` Paul E. McKenney
2017-07-28 17:48         ` Mathieu Desnoyers
2017-07-31 18:00 ` Dave Watson
2017-07-31 18:27   ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1035920775.29109.1501189118277.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=ahh@google.com \
    --cc=avi@scylladb.com \
    --cc=gromer@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maged.michael@gmail.com \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.