public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Christopher Lameter <cl@linux.com>
Cc: "Li, Aubrey" <aubrey.li@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andi Kleen <ak@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Aubrey Li <aubrey.li@intel.com>,
	len.brown@intel.com, rjw@rjwysocki.net,
	tim.c.chen@linux.intel.com, arjan@linux.intel.com,
	yang.zhang.wz@gmail.com, x86@kernel.org,
	linux-kernel@vger.kernel.org, daniel.lezcano@linaro.org
Subject: Re: [RFC PATCH v1 00/11] Create fast idle path for short idle periods
Date: Wed, 19 Jul 2017 09:54:25 -0700	[thread overview]
Message-ID: <20170719165425.GE3730@linux.vnet.ibm.com> (raw)
In-Reply-To: <alpine.DEB.2.20.1707191001380.18652@nuc-kabylake>

On Wed, Jul 19, 2017 at 10:03:22AM -0500, Christopher Lameter wrote:
> On Wed, 19 Jul 2017, Paul E. McKenney wrote:
> 
> > > Do we have any problem if we skip RCU idle enter/exit under a fast idle scenario?
> > > My understanding is, if tick is not stopped, then we don't need inform RCU in
> > > idle path, it can be informed in irq exit.
> >
> > Indeed, the problem arises when the tick is stopped.
> 
> Well is there a boundary when you would want the notification calls? I
> would think that even an idle period of a couple of seconds does not
> necessarily require a callback to rcu. Had some brokenness here where RCU
> calls did not occur for hours or so. At some point the system ran out of
> memory but thats far off.

Yeah, I should spell this out more completely, shouldn't I?  And get
it into the documentation if it isn't already there...  Where it is
currently at best hinted at.  :-/

1.	If a CPU is either idle or executing in usermode, and RCU believes
	it is non-idle, the scheduling-clock tick had better be running.
	Otherwise, you will get RCU CPU stall warnings.  Or at best,
	very long (11-second) grace periods, with a pointless IPI waking
	the CPU each time.

2.	If a CPU is in a portion of the kernel that executes RCU read-side
	critical sections, and RCU believes this CPU to be idle, you can get
	random memory corruption.  DON'T DO THIS!!!

	This is one reason to test with lockdep, which will complain
	about this sort of thing.

3.	If a CPU is in a portion of the kernel that is absolutely
	positively no-joking guaranteed to never execute any RCU read-side
	critical sections, and RCU believes this CPU to to be idle,
	no problem.  This sort of thing is used by some architectures
	for light-weight exception handlers, which can then avoid the
	overhead of rcu_irq_enter() and rcu_irq_exit().  Some go further
	and avoid the entireties of irq_enter() and irq_exit().

	Just make very sure you are running some of your tests with
	CONFIG_PROVE_RCU=y.

4.	If a CPU is executing in the kernel with the scheduling-clock
	interrupt disabled and RCU believes this CPU to be non-idle,
	and if the CPU goes idle (from an RCU perspective) every few
	jiffies, no problem.  It is usually OK for there to be the
	occasional gap between idle periods of up to a second or so.

	If the gap grows too long, you get RCU CPU stall warnings.

5.	If a CPU is either idle or executing in usermode, and RCU believes
	it to be idle, of course no problem.

6.	If a CPU is executing in the kernel, the kernel code
	path is passing through quiescent states at a reasonable
	frequency (preferably about once per few jiffies, but the
	occasional excursion to a second or so is usually OK) and the
	scheduling-clock interrupt is enabled, of course no problem.

	If the gap between a successive pair of quiescent states grows
	too long, you get RCU CPU stall warnings.

For purposes of comparison, the default time until RCU CPU stall warning
in mainline is 21 seconds.  A number of distros set this to 60 seconds.
Back in the 90s, the analogous timeout for DYNIX/ptx was 1.5 seconds.  :-/

Hence "a second or so" instead of your "a couple of seconds".  ;-)

Please see below for the corresponding patch to RCU's requirements.
Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit 693c9bfa43f92570dd362d8834440b418bbb994a
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Wed Jul 19 09:52:58 2017 -0700

    doc: Set down RCU's scheduling-clock-interrupt needs
    
    This commit documents the situations in which RCU needs the
    scheduling-clock interrupt to be enabled, along with the consequences
    of failing to meet RCU's needs in this area.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
index 95b30fa25d56..7980bee5607f 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.html
+++ b/Documentation/RCU/Design/Requirements/Requirements.html
@@ -2080,6 +2080,8 @@ Some of the relevant points of interest are as follows:
 <li>	<a href="#Scheduler and RCU">Scheduler and RCU</a>.
 <li>	<a href="#Tracing and RCU">Tracing and RCU</a>.
 <li>	<a href="#Energy Efficiency">Energy Efficiency</a>.
+<li>	<a href="#Scheduling-Clock Interrupts and RCU">
+	Scheduling-Clock Interrupts and RCU</a>.
 <li>	<a href="#Memory Efficiency">Memory Efficiency</a>.
 <li>	<a href="#Performance, Scalability, Response Time, and Reliability">
 	Performance, Scalability, Response Time, and Reliability</a>.
@@ -2532,6 +2534,113 @@ I learned of many of these requirements via angry phone calls:
 Flaming me on the Linux-kernel mailing list was apparently not
 sufficient to fully vent their ire at RCU's energy-efficiency bugs!
 
+<h3><a name="Scheduling-Clock Interrupts and RCU">
+Scheduling-Clock Interrupts and RCU</a></h3>
+
+<p>
+The kernel transitions between in-kernel non-idle execution, userspace
+execution, and the idle loop.
+Depending on kernel configuration, RCU handles these states differently:
+
+<table border=3>
+<tr><th><tt>HZ</tt> Kconfig</th>
+	<th>In-Kernel</th>
+		<th>Usermode</th>
+			<th>Idle</th></tr>
+<tr><th align="left"><tt>HZ_PERIODIC</tt></th>
+	<td>Can rely on scheduling-clock interrupt.</td>
+		<td>Can rely on scheduling-clock interrupt and its
+		    detection of interrupt from usermode.</td>
+			<td>Can rely on RCU's dyntick-idle detection.</td></tr>
+<tr><th align="left"><tt>NO_HZ_IDLE</tt></th>
+	<td>Can rely on scheduling-clock interrupt.</td>
+		<td>Can rely on scheduling-clock interrupt and its
+		    detection of interrupt from usermode.</td>
+			<td>Can rely on RCU's dyntick-idle detection.</td></tr>
+<tr><th align="left"><tt>NO_HZ_FULL</tt></th>
+	<td>Can only sometimes rely on scheduling-clock interrupt.
+	    In other cases, it is necessary to bound kernel execution
+	    times and/or use IPIs.</td>
+		<td>Can rely on RCU's dyntick-idle detection.</td>
+			<td>Can rely on RCU's dyntick-idle detection.</td></tr>
+</table>
+
+<table>
+<tr><th>&nbsp;</th></tr>
+<tr><th align="left">Quick Quiz:</th></tr>
+<tr><td>
+	Why can't <tt>NO_HZ_FULL</tt> in-kernel execution rely on the
+	scheduling-clock interrupt, just like <tt>HZ_PERIODIC</tt>
+	and <tt>NO_HZ_IDLE</tt> do?
+</td></tr>
+<tr><th align="left">Answer:</th></tr>
+<tr><td bgcolor="#ffffff"><font color="ffffff">
+	Because, as a performance optimization, <tt>NO_HZ_FULL</tt>
+	does not necessarily re-enable the scheduling-clock interrupt
+	on entry to each and every system call.
+</font></td></tr>
+<tr><td>&nbsp;</td></tr>
+</table>
+
+<p>
+However, RCU must be reliably informed as to whether any given
+CPU is currently in the idle loop, and, for <tt>NO_HZ_FULL</tt>,
+also whether that CPU is executing in usermode, as discussed
+<a href="#Energy Efficiency">earlier</a>.
+It also requires that the scheduling-clock interrupt be enabled when
+RCU needs it to be:
+
+<ol>
+<li>	If a CPU is either idle or executing in usermode, and RCU believes
+	it is non-idle, the scheduling-clock tick had better be running.
+	Otherwise, you will get RCU CPU stall warnings.  Or at best,
+	very long (11-second) grace periods, with a pointless IPI waking
+	the CPU from time to time.
+<li>	If a CPU is in a portion of the kernel that executes RCU read-side
+	critical sections, and RCU believes this CPU to be idle, you will get
+	random memory corruption.  <b>DON'T DO THIS!!!</b>
+
+	<br>This is one reason to test with lockdep, which will complain
+	about this sort of thing.
+<li>	If a CPU is in a portion of the kernel that is absolutely
+	positively no-joking guaranteed to never execute any RCU read-side
+	critical sections, and RCU believes this CPU to to be idle,
+	no problem.  This sort of thing is used by some architectures
+	for light-weight exception handlers, which can then avoid the
+	overhead of <tt>rcu_irq_enter()</tt> and <tt>rcu_irq_exit()</tt>
+	at exception entry and exit, respectively.
+	Some go further and avoid the entireties of <tt>irq_enter()</tt>
+	and <tt>irq_exit()</tt>.
+
+	<br>Just make very sure you are running some of your tests with
+	<tt>CONFIG_PROVE_RCU=y</tt>, just in case one of your code paths
+	was in fact joking about not doing RCU read-side critical sections.
+<li>	If a CPU is executing in the kernel with the scheduling-clock
+	interrupt disabled and RCU believes this CPU to be non-idle,
+	and if the CPU goes idle (from an RCU perspective) every few
+	jiffies, no problem.  It is usually OK for there to be the
+	occasional gap between idle periods of up to a second or so.
+
+	<br>If the gap grows too long, you get RCU CPU stall warnings.
+<li>	If a CPU is either idle or executing in usermode, and RCU believes
+	it to be idle, of course no problem.
+<li>	If a CPU is executing in the kernel, the kernel code
+	path is passing through quiescent states at a reasonable
+	frequency (preferably about once per few jiffies, but the
+	occasional excursion to a second or so is usually OK) and the
+	scheduling-clock interrupt is enabled, of course no problem.
+
+	<br>If the gap between a successive pair of quiescent states grows
+	too long, you get RCU CPU stall warnings.
+</ol>
+
+<p>
+But as long as RCU is properly informed of kernel state transitions between
+in-kernel execution, usermode execution, and idle, and as long as the
+scheduling-clock interrupt is enabled when RCU needs it to be, you
+can rest assured that the bugs you encounter will be in some other
+part of RCU or some other part of the kernel!
+
 <h3><a name="Memory Efficiency">Memory Efficiency</a></h3>
 
 <p>

  reply	other threads:[~2017-07-19 16:54 UTC|newest]

Thread overview: 114+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-10  1:38 [RFC PATCH v1 00/11] Create fast idle path for short idle periods Aubrey Li
2017-07-10  1:38 ` [RFC PATCH v1 01/11] sched/idle: create a fast " Aubrey Li
2017-07-10  1:38 ` [RFC PATCH v1 02/11] cpuidle: attach cpuidle governor statistics to the per-CPU device Aubrey Li
2017-07-10  1:38 ` [RFC PATCH v1 03/11] cpuidle: introduce cpuidle governor for idle prediction Aubrey Li
2017-07-12 12:16   ` Peter Zijlstra
2017-07-10  1:38 ` [RFC PATCH v1 04/11] sched/idle: make the fast idle path for short idle periods Aubrey Li
2017-07-11 12:58   ` Paul E. McKenney
2017-07-11 16:33     ` Frederic Weisbecker
2017-07-11 18:11       ` Paul E. McKenney
2017-07-12  3:19         ` Li, Aubrey
2017-07-12  5:03           ` Paul E. McKenney
2017-07-12  5:22             ` Li, Aubrey
2017-07-12 12:19       ` Peter Zijlstra
2017-07-10  1:38 ` [RFC PATCH v1 05/11] cpuidle: update idle statistics before cpuidle governor Aubrey Li
2017-07-10  1:38 ` [RFC PATCH v1 06/11] timers: keep sleep length updated as needed Aubrey Li
2017-07-10  1:38 ` [RFC PATCH v1 07/11] cpuidle: make idle residency update more generic Aubrey Li
2017-07-10  1:38 ` [RFC PATCH v1 08/11] cpuidle: menu: remove reduplicative implementation Aubrey Li
2017-07-10  1:38 ` [RFC PATCH v1 09/11] cpuidle: menu: feed cpuidle prediction to menu governor Aubrey Li
2017-07-10  1:38 ` [RFC PATCH v1 10/11] cpuidle: update cpuidle governor when needed Aubrey Li
2017-07-10  1:38 ` [RFC PATCH v1 11/11] sched/idle: Add a tuning knob to allow changing fast idle threshold Aubrey Li
2017-07-10  8:46 ` [RFC PATCH v1 00/11] Create fast idle path for short idle periods Peter Zijlstra
2017-07-10  9:29   ` Wanpeng Li
2017-07-10 13:59   ` Li, Aubrey
2017-07-10 14:46   ` Andi Kleen
2017-07-10 16:42     ` Peter Zijlstra
2017-07-10 17:27       ` Andi Kleen
2017-07-11  4:40         ` Li, Aubrey
2017-07-11  9:41           ` Peter Zijlstra
2017-07-11 16:09             ` Frederic Weisbecker
2017-07-11 16:34               ` Peter Zijlstra
2017-07-11 18:09                 ` Paul E. McKenney
2017-07-12 11:54                   ` Peter Zijlstra
2017-07-12 15:56                     ` Paul E. McKenney
2017-07-12 17:46                       ` Peter Zijlstra
2017-07-12 18:53                         ` Paul E. McKenney
2017-07-12 19:00                           ` Paul E. McKenney
2017-07-19 13:43                       ` Frederic Weisbecker
2017-07-19 14:51                         ` Paul E. McKenney
2017-07-12 12:22                   ` Peter Zijlstra
2017-07-12 15:54                     ` Paul E. McKenney
2017-07-12 17:17                       ` Peter Zijlstra
2017-07-12 17:57                         ` Peter Zijlstra
2017-07-12 18:50                           ` Paul E. McKenney
2017-07-12 18:46                         ` Paul E. McKenney
2017-07-13  8:34                           ` Peter Zijlstra
2017-07-12  4:15                 ` Li, Aubrey
2017-07-12  8:34                   ` Peter Zijlstra
2017-07-12 21:32                     ` Andi Kleen
2017-07-13  8:36                       ` Peter Zijlstra
2017-07-13 14:48                         ` Li, Aubrey
2017-07-13 14:53                           ` Peter Zijlstra
2017-07-13 15:13                             ` Li, Aubrey
2017-07-13 18:28                               ` Peter Zijlstra
2017-07-14  3:56                                 ` Li, Aubrey
2017-07-14 15:38                                   ` Peter Zijlstra
2017-07-14 15:52                                     ` Arjan van de Ven
2017-07-14 15:58                                       ` Peter Zijlstra
2017-07-14 16:03                                         ` Andi Kleen
2017-07-17  9:21                                           ` Peter Zijlstra
2017-07-17 13:41                                             ` Li, Aubrey
2017-07-14 15:53                                     ` Andi Kleen
2017-07-14 16:06                                       ` Peter Zijlstra
2017-07-14 16:26                                         ` Andi Kleen
2017-07-17 19:23                                           ` Peter Zijlstra
2017-07-17 19:27                                             ` Arjan van de Ven
2017-07-17 19:46                                               ` Thomas Gleixner
2017-07-17 19:51                                                 ` Arjan van de Ven
2017-07-17 19:59                                                   ` Thomas Gleixner
2017-07-17 19:48                                             ` Arjan van de Ven
2017-07-17 19:53                                               ` Thomas Gleixner
2017-07-17 19:55                                                 ` Arjan van de Ven
2017-07-18  3:23                                               ` Li, Aubrey
2017-07-18 18:55                                               ` Peter Zijlstra
2017-07-18  3:14                                             ` Li, Aubrey
2017-07-18  4:45                                               ` Andi Kleen
2017-07-18  6:43                                                 ` Thomas Gleixner
2017-07-18  6:56                                                   ` Li, Aubrey
2017-07-18 15:20                                                     ` Paul E. McKenney
2017-07-18 15:29                                                       ` Arjan van de Ven
2017-07-18 16:36                                                         ` Peter Zijlstra
2017-07-18 16:37                                                           ` Arjan van de Ven
2017-07-18 17:05                                                             ` Peter Zijlstra
2017-07-19  5:44                                                       ` Li, Aubrey
2017-07-19 14:48                                                         ` Paul E. McKenney
2017-07-19 15:03                                                           ` Christopher Lameter
2017-07-19 16:54                                                             ` Paul E. McKenney [this message]
2017-07-20  1:40                                                           ` Li, Aubrey
2017-07-20 12:50                                                             ` Paul E. McKenney
2017-07-20 13:45                                                               ` Arjan van de Ven
2017-07-20 14:19                                                               ` Peter Zijlstra
2017-07-20 16:02                                                                 ` Paul E. McKenney
2017-07-18 16:45                                                     ` Peter Zijlstra
2017-07-18  6:59                                                   ` Andi Kleen
2017-07-18  7:19                                                     ` Thomas Gleixner
2017-07-19  6:12                                                       ` Li, Aubrey
2017-07-19  7:55                                                         ` Thomas Gleixner
2017-07-20  1:56                                                           ` Li, Aubrey
2017-07-20  8:11                                                             ` Thomas Gleixner
2017-07-20 13:48                                                               ` Arjan van de Ven
2017-07-18  7:24                                                     ` Thomas Gleixner
2017-07-18 13:23                                               ` Peter Zijlstra
2017-07-19 13:43                                               ` Peter Zijlstra
2017-07-13 15:20                             ` Paul E. McKenney
2017-07-14  3:47                               ` Li, Aubrey
2017-07-14  4:05                                 ` Paul E. McKenney
2017-07-17 13:24                                   ` Li, Aubrey
2017-07-17 13:58                                     ` Peter Zijlstra
2017-07-17 14:02                                       ` Li, Aubrey
2017-07-12 12:14                   ` Peter Zijlstra
2017-07-11 17:58               ` Christoph Lameter
2017-07-12  2:07                 ` Li, Aubrey
2017-07-12  2:35               ` Li, Aubrey
2017-07-12 18:10               ` Peter Zijlstra
2017-07-11  9:02         ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170719165425.GE3730@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=ak@linux.intel.com \
    --cc=arjan@linux.intel.com \
    --cc=aubrey.li@intel.com \
    --cc=aubrey.li@linux.intel.com \
    --cc=cl@linux.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=fweisbec@gmail.com \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rjw@rjwysocki.net \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=x86@kernel.org \
    --cc=yang.zhang.wz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox