From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	dipankar <dipankar@in.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	rostedt <rostedt@goodmis.org>,
	David Howells <dhowells@redhat.com>,
	Eric Dumazet <edumazet@google.com>, fweisbec <fweisbec@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>
Subject: Re: [PATCH tip/core/rcu 4/5] sys_membarrier: Add expedited option
Date: Tue, 25 Jul 2017 09:48:35 -0700
Message-ID: <20170725164835.GQ3730@linux.vnet.ibm.com>
In-Reply-To: <1889183220.24786.1500988868552.JavaMail.zimbra@efficios.com>

On Tue, Jul 25, 2017 at 01:21:08PM +0000, Mathieu Desnoyers wrote:
> ----- On Jul 24, 2017, at 5:58 PM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote:
> 
> > The sys_membarrier() system call has proven too slow for some use
> > cases, which has prompted users to instead rely on TLB shootdown.
> > Although TLB shootdown is much faster, it has the slight disadvantage
> > of not working at all on arm and arm64.  This commit therefore adds
> > an expedited option to the sys_membarrier() system call.
> 
> Is this now possible because the synchronize_sched_expedited()
> implementation does not need to send IPIs to all CPUs? I
> suspect that using Tree SRCU now solves this somehow, but can
> you tell us a bit more about why it is now OK to expose this
> to user-space?

I have gotten complaints from several users that sys_membarrier() is too
slow to be useful for them.  So they are hacking around this problem by
unmapping a region of memory, thus getting the IPIs and memory barriers
on all CPUs, but with additional mm overhead.  Plus this is non-portable,
and fragile with respect to reasonable optimizations, as was discussed
on LKML some time back:

	https://marc.info/?l=linux-kernel&m=142619683526482
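
For concreteness, the workaround described above might look roughly like
the following in user space.  This is only a sketch, not taken from any
user's actual code: the function name is hypothetical, and whether the
unmap really sends IPIs to every CPU running the process depends on the
architecture and kernel version, which is exactly the fragility at issue.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one anonymous page, touch it so that a real
 * translation exists, then unmap it.  On some architectures the unmap
 * causes TLB-shootdown IPIs, which callers (ab)use as a barrier.
 * Returns 0 on success, -1 on failure.  Nothing here guarantees that
 * any IPI is actually sent.
 */
static int tlb_shootdown_hack(void)
{
	long page = sysconf(_SC_PAGESIZE);
	char *p;

	if (page <= 0)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Fault the page in; an untouched mapping has nothing to flush. */
	*(volatile char *)p = 1;

	/* Tearing down the mapping is the extra mm overhead mentioned above. */
	return munmap(p, (size_t)page);
}
```

Each call pays for an mmap(), a page fault, and a munmap(), and still
provides no documented ordering guarantee.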

So we really need to make sys_membarrier() work for these users.
If we don't, we certainly will look quite silly criticizing their
use of TLB shootdown via unmapping, now won't we?

Now back in 2015, expedited grace periods were horribly slow, but
I have optimized them to the point that they should be no worse than
TLB shootdown IPIs.  Plus this approach is portable, and not subject
to death by optimization.
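
A user-space caller could probe for the expedited command and fall back
to the existing one, along the lines of the sketch below.  Assumptions:
the command values are copied from the patch quoted further down, the
expedited value is only a proposal (released kernels may give that bit a
different meaning), the helper names are hypothetical, and the system
call is invoked through syscall(2) using __NR_membarrier from the
installed kernel headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values as in the quoted patch; the expedited one is an assumption. */
enum {
	CMD_QUERY		= 0,
	CMD_SHARED		= (1 << 0),
	CMD_SHARED_EXPEDITED	= (2 << 0),
};

/* Thin wrapper around the raw system call. */
static int membarrier(int cmd, int flags)
{
	return (int)syscall(__NR_membarrier, cmd, flags);
}

/*
 * Given the bitmask returned by CMD_QUERY, prefer the expedited command
 * and fall back to the slow one.  Returns -1 if neither is supported.
 */
static int pick_barrier_cmd(int supported_mask)
{
	assert(supported_mask >= 0);
	if (supported_mask & CMD_SHARED_EXPEDITED)
		return CMD_SHARED_EXPEDITED;
	if (supported_mask & CMD_SHARED)
		return CMD_SHARED;
	return -1;
}

/*
 * Issue a barrier on all running threads.  Returns 0 on success and -1
 * if sys_membarrier() is unavailable, in which case the caller needs
 * some other fallback.
 */
static int system_wide_barrier(void)
{
	int mask = membarrier(CMD_QUERY, 0);
	int cmd;

	if (mask < 0)		/* e.g. ENOSYS on kernels without the syscall */
		return -1;
	cmd = pick_barrier_cmd(mask);
	if (cmd < 0)
		return -1;
	return membarrier(cmd, 0) == 0 ? 0 : -1;
}
```

Querying first keeps the same binary working on kernels that only
provide MEMBARRIER_CMD_SHARED.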

> The commit message here does not explain why it is OK real-time
> wise to expose this feature as a system call.

I figure that kernels providing that level of real-time response
will disable this, perhaps in a manner similar to that for NO_HZ_FULL.

Plus I intend to add your earlier IPI-all-threads-in-this-process
option, which will allow the people asking for this to do reasonable
testing.

Obviously, unless there are good test results and some level of user
enthusiasm, this patch goes nowhere.

Seem reasonable?

							Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> 
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> > include/uapi/linux/membarrier.h | 11 +++++++++++
> > kernel/membarrier.c             |  7 ++++++-
> > 2 files changed, 17 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
> > index e0b108bd2624..ba36d8a6be61 100644
> > --- a/include/uapi/linux/membarrier.h
> > +++ b/include/uapi/linux/membarrier.h
> > @@ -40,6 +40,16 @@
> >  *                          (non-running threads are de facto in such a
> >  *                          state). This covers threads from all processes
> >  *                          running on the system. This command returns 0.
> > + * @MEMBARRIER_CMD_SHARED_EXPEDITED:  Execute a memory barrier on all
> > + *			    running threads, but in an expedited fashion.
> > + *                          Upon return from system call, the caller thread
> > + *                          is ensured that all running threads have passed
> > + *                          through a state where all memory accesses to
> > + *                          user-space addresses match program order between
> > + *                          entry to and return from the system call
> > + *                          (non-running threads are de facto in such a
> > + *                          state). This covers threads from all processes
> > + *                          running on the system. This command returns 0.
> >  *
> >  * Command to be passed to the membarrier system call. The commands need to
> >  * be a single bit each, except for MEMBARRIER_CMD_QUERY which is assigned to
> > @@ -48,6 +58,7 @@
> > enum membarrier_cmd {
> > 	MEMBARRIER_CMD_QUERY = 0,
> > 	MEMBARRIER_CMD_SHARED = (1 << 0),
> > +	MEMBARRIER_CMD_SHARED_EXPEDITED = (2 << 0),
> > };
> > 
> > #endif /* _UAPI_LINUX_MEMBARRIER_H */
> > diff --git a/kernel/membarrier.c b/kernel/membarrier.c
> > index 9f9284f37f8d..b749c39bb219 100644
> > --- a/kernel/membarrier.c
> > +++ b/kernel/membarrier.c
> > @@ -22,7 +22,8 @@
> >  * Bitmask made from a "or" of all commands within enum membarrier_cmd,
> >  * except MEMBARRIER_CMD_QUERY.
> >  */
> > -#define MEMBARRIER_CMD_BITMASK	(MEMBARRIER_CMD_SHARED)
> > +#define MEMBARRIER_CMD_BITMASK	(MEMBARRIER_CMD_SHARED |		\
> > +				 MEMBARRIER_CMD_SHARED_EXPEDITED)
> > 
> > /**
> >  * sys_membarrier - issue memory barriers on a set of threads
> > @@ -64,6 +65,10 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
> > 		if (num_online_cpus() > 1)
> > 			synchronize_sched();
> > 		return 0;
> > +	case MEMBARRIER_CMD_SHARED_EXPEDITED:
> > +		if (num_online_cpus() > 1)
> > +			synchronize_sched_expedited();
> > +		return 0;
> > 	default:
> > 		return -EINVAL;
> > 	}
> > --
> > 2.5.2
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
> 