linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
	jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, rostedt@goodmis.org,
	dhowells@redhat.com, edumazet@google.com, fweisbec@gmail.com,
	oleg@redhat.com, will.deacon@arm.com
Subject: Re: [PATCH tip/core/rcu 4/5] sys_membarrier: Add expedited option
Date: Thu, 27 Jul 2017 07:32:05 -0700	[thread overview]
Message-ID: <20170727143205.GU3730@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170727134908.6jiv6rm4tetmrapi@hirez.programming.kicks-ass.net>

On Thu, Jul 27, 2017 at 03:49:08PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 27, 2017 at 06:08:16AM -0700, Paul E. McKenney wrote:
> 
> > > No. Its called wakeup latency :-) Your SCHED_OTHER task will not get to
> > > insta-run all the time. If there are other tasks already running, we'll
> > > not IPI unless it should preempt.
> > > 
> > > If its idle, nobody cares..
> > 
> > So it does IPI immediately sometimes.
> > 
> > > > Does this auto-throttling also apply if the user is running a CPU-bound
> > > > SCHED_BATCH or SCHED_IDLE task on each CPU, and periodically waking up
> > > > one of a large group of SCHED_OTHER tasks, where the SCHED_OTHER tasks
> > > > immediately sleep upon being awakened?
> > > 
> > > SCHED_BATCH is even more likely to suffer wakeup latency since it will
> > > never preempt anything.
> > 
> > Ahem.  In this scenario, SCHED_BATCH is already running on a the CPU in
> > question, and a SCHED_OTHER task is awakened from some other CPU.
> > 
> > Do we IPI in that case.
> 
> So I'm a bit confused as to where you're trying to go with this.
> 
> I'm saying that if there are other users of our CPU, we can't
> significantly disturb them with IPIs.
> 
> Yes, we'll sometimes IPI in order to do a preemption on wakeup. But we
> cannot always win that. There is no wakeup triggered starvation case.
> 
> If you create a thread per CPU and have them insta sleep after wakeup,
> and then keep prodding them to wakeup. They will disturb things less
> than if they were while(1); loops.
> 
> If the machine is otherwise idle, nobody cares.
> 
> If there are other tasks on the system, the IPI rate is limited to the
> wakeup latency of you tasks.
> 
> 
> And any of this is limited to the CPUs we're allowed to run in the first
> place.
> 
> 
> So yes, the occasional IPI happens, but if there's other tasks, we can't
> disturb them more than we could with while(1); tasks.
> 
> 
> OTOH, something like:
> 
> 	while(1)
> 		synchronize_sched_expedited();
> 
> as per your proposed patch, will spray IPIs to all CPUs and at high
> rates.

OK, I have updated my patch to do throttling.

							Thanx, Paul

------------------------------------------------------------------------

commit 4cd5253094b6d7f9501e21e13aa4e2e78e8a70cd
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Jul 18 13:53:32 2017 -0700

    sys_membarrier: Add expedited option
    
    The sys_membarrier() system call has proven too slow for some use cases,
    which has prompted users to instead rely on TLB shootdown.  Although TLB
    shootdown is much faster, it has the slight disadvantage of not working
    at all on arm and arm64 and also of being vulnerable to reasonable
    optimizations that might skip some IPIs.  However, the Linux kernel
    does not currrently provide a reasonable alternative, so it is hard to
    criticize these users from doing what works for them on a given piece
    of hardware at a given time.
    
    This commit therefore adds an expedited option to the sys_membarrier()
    system call, thus providing a faster mechanism that is portable and
    is not subject to death by optimization.  Note that if more than one
    MEMBARRIER_CMD_SHARED_EXPEDITED sys_membarrier() call happens within
    the same jiffy, all but the first will use synchronize_sched() instead
    of synchronize_sched_expedited().
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    [ paulmck: Fix code style issue pointed out by Boqun Feng. ]
    Tested-by: Avi Kivity <avi@scylladb.com>
    Cc: Maged Michael <maged.michael@gmail.com>
    Cc: Andrew Hunter <ahh@google.com>
    Cc: Geoffrey Romer <gromer@google.com>

diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
index e0b108bd2624..5720386d0904 100644
--- a/include/uapi/linux/membarrier.h
+++ b/include/uapi/linux/membarrier.h
@@ -40,6 +40,16 @@
  *                          (non-running threads are de facto in such a
  *                          state). This covers threads from all processes
  *                          running on the system. This command returns 0.
+ * @MEMBARRIER_CMD_SHARED_EXPEDITED:  Execute a memory barrier on all
+ *			    running threads, but in an expedited fashion.
+ *                          Upon return from system call, the caller thread
+ *                          is ensured that all running threads have passed
+ *                          through a state where all memory accesses to
+ *                          user-space addresses match program order between
+ *                          entry to and return from the system call
+ *                          (non-running threads are de facto in such a
+ *                          state). This covers threads from all processes
+ *                          running on the system. This command returns 0.
  *
  * Command to be passed to the membarrier system call. The commands need to
  * be a single bit each, except for MEMBARRIER_CMD_QUERY which is assigned to
@@ -48,6 +58,7 @@
 enum membarrier_cmd {
 	MEMBARRIER_CMD_QUERY = 0,
 	MEMBARRIER_CMD_SHARED = (1 << 0),
+	MEMBARRIER_CMD_SHARED_EXPEDITED = (1 << 1),
 };
 
 #endif /* _UAPI_LINUX_MEMBARRIER_H */
diff --git a/kernel/membarrier.c b/kernel/membarrier.c
index 9f9284f37f8d..587e3bbfae7e 100644
--- a/kernel/membarrier.c
+++ b/kernel/membarrier.c
@@ -22,7 +22,8 @@
  * Bitmask made from a "or" of all commands within enum membarrier_cmd,
  * except MEMBARRIER_CMD_QUERY.
  */
-#define MEMBARRIER_CMD_BITMASK	(MEMBARRIER_CMD_SHARED)
+#define MEMBARRIER_CMD_BITMASK	(MEMBARRIER_CMD_SHARED |		\
+				 MEMBARRIER_CMD_SHARED_EXPEDITED)
 
 /**
  * sys_membarrier - issue memory barriers on a set of threads
@@ -64,6 +65,20 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 		if (num_online_cpus() > 1)
 			synchronize_sched();
 		return 0;
+	case MEMBARRIER_CMD_SHARED_EXPEDITED:
+		if (num_online_cpus() > 1) {
+			static unsigned long lastexp;
+			unsigned long j;
+
+			j = jiffies;
+			if (READ_ONCE(lastexp) == j) {
+				synchronize_sched();
+				WRITE_ONCE(lastexp, j);
+			} else {
+				synchronize_sched_expedited();
+			}
+		}
+		return 0;
 	default:
 		return -EINVAL;
 	}

  reply	other threads:[~2017-07-27 14:32 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-24 21:57 [PATCH tip/core/rcu 0/5] Related non-RCU updates Paul E. McKenney
2017-07-24 21:58 ` [PATCH tip/core/rcu 1/5] module: Fix pr_fmt() bug for header use of printk Paul E. McKenney
2017-07-24 21:58 ` [PATCH tip/core/rcu 2/5] init_task: Remove redundant INIT_TASK_RCU_TREE_PREEMPT() macro Paul E. McKenney
2017-07-24 21:58 ` [PATCH tip/core/rcu 3/5] EXPERIMENTAL sched: Allow migrating kthreads into online but inactive CPUs Paul E. McKenney
2017-07-24 21:58 ` [PATCH tip/core/rcu 4/5] sys_membarrier: Add expedited option Paul E. McKenney
2017-07-25  4:27   ` Boqun Feng
2017-07-25 16:24     ` Paul E. McKenney
2017-07-25 13:21   ` Mathieu Desnoyers
2017-07-25 16:48     ` Paul E. McKenney
2017-07-25 16:33   ` Peter Zijlstra
2017-07-25 16:49     ` Paul E. McKenney
2017-07-25 16:59       ` Peter Zijlstra
2017-07-25 17:17         ` Paul E. McKenney
2017-07-25 18:53           ` Peter Zijlstra
2017-07-25 19:36             ` Paul E. McKenney
2017-07-25 20:24               ` Peter Zijlstra
2017-07-25 21:19                 ` Paul E. McKenney
2017-07-25 21:55                   ` Peter Zijlstra
2017-07-25 22:39                     ` Mathieu Desnoyers
2017-07-25 22:50                     ` Mathieu Desnoyers
2017-07-26  0:01                       ` Paul E. McKenney
2017-07-26  7:46                       ` Peter Zijlstra
2017-07-26 15:42                         ` Paul E. McKenney
2017-07-26 18:01                           ` Mathieu Desnoyers
2017-07-26 18:30                             ` Paul E. McKenney
2017-07-26 20:37                               ` Mathieu Desnoyers
2017-07-26 21:11                                 ` Paul E. McKenney
2017-07-27  1:45                                   ` Paul E. McKenney
2017-07-27 12:39                                     ` Mathieu Desnoyers
2017-07-27 14:44                                       ` Paul E. McKenney
2017-07-27 10:24                               ` Peter Zijlstra
2017-07-27 14:52                                 ` Paul E. McKenney
2017-07-27  8:53                             ` Peter Zijlstra
2017-07-27 10:09                               ` Peter Zijlstra
2017-07-27 10:22                               ` Will Deacon
2017-07-27 13:14                               ` Paul E. McKenney
2017-07-25 23:59                     ` Paul E. McKenney
2017-07-26  7:41                       ` Peter Zijlstra
2017-07-26 15:41                         ` Paul E. McKenney
2017-07-27  8:30                           ` Peter Zijlstra
2017-07-27 13:08                             ` Paul E. McKenney
2017-07-27 13:49                               ` Peter Zijlstra
2017-07-27 14:32                                 ` Paul E. McKenney [this message]
2017-07-27 14:36                                   ` Peter Zijlstra
2017-07-27 14:46                                     ` Paul E. McKenney
2017-07-27 13:55                               ` Boqun Feng
2017-07-27 14:16                                 ` Paul E. McKenney
2017-07-27 14:29                                   ` Boqun Feng
2017-07-27 14:36                                     ` Paul E. McKenney
2017-07-27 14:41                                       ` Will Deacon
2017-07-27 14:47                                       ` Boqun Feng
2017-07-27 14:55                                         ` Paul E. McKenney
2017-07-27 13:56                               ` Peter Zijlstra
2017-07-27 15:19                                 ` Peter Zijlstra
2017-07-26  9:36                   ` Will Deacon
2017-07-26 15:46                     ` Paul E. McKenney
2017-07-27 10:14               ` Peter Zijlstra
2017-07-27 12:56                 ` Paul E. McKenney
2017-07-27 13:37                   ` Peter Zijlstra
2017-07-27 14:33                     ` Paul E. McKenney
2017-07-24 21:58 ` [PATCH tip/core/rcu 5/5] EXP: sched/cputime: Fix using smp_processor_id() in preemptible Paul E. McKenney
2017-07-24 22:01   ` Wanpeng Li
2017-07-24 22:29     ` Paul E. McKenney
2017-07-31 22:51 ` [PATCH tip/core/rcu 0/5] Related non-RCU updates Paul E. McKenney
2017-07-31 22:53   ` [PATCH v2 tip/core/rcu 1/4] module: Fix pr_fmt() bug for header use of printk Paul E. McKenney
2017-07-31 22:53   ` [PATCH v2 tip/core/rcu 2/4] init_task: Remove redundant INIT_TASK_RCU_TREE_PREEMPT() macro Paul E. McKenney
2017-07-31 22:53   ` [PATCH v2 tip/core/rcu 3/4] sched: Allow migrating kthreads into online but inactive CPUs Paul E. McKenney
2017-07-31 22:53   ` [PATCH v2 tip/core/rcu 4/4] membarrier: Expedited private command Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170727143205.GU3730@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).