Re: Normal RCU grace period can be stalled for long because need-resched flags not set?

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	rcu <rcu@vger.kernel.org>
Subject: Re: Normal RCU grace period can be stalled for long because need-resched flags not set?
Date: Thu, 4 Jul 2019 20:08:30 -0400	[thread overview]
Message-ID: <20190705000830.GB12919@google.com> (raw)
In-Reply-To: <20190704221702.GL26519@linux.ibm.com>

Hi Paul,

On Thu, Jul 04, 2019 at 03:17:02PM -0700, Paul E. McKenney wrote:
> On Thu, Jul 04, 2019 at 02:50:55PM -0400, Joel Fernandes wrote:
> > On Thu, Jul 04, 2019 at 10:13:15AM -0700, Paul E. McKenney wrote:
> > > On Wed, Jul 03, 2019 at 11:24:54PM -0400, Joel Fernandes wrote:
> > > > On Wed, Jul 03, 2019 at 05:50:09PM -0700, Paul E. McKenney wrote:
> > > > > On Wed, Jul 03, 2019 at 08:32:13PM -0400, Joel Fernandes wrote:
> > > 
> > > [ . . . ]
> > > 
> > > > > > If I add an rcu_perf_wait_shutdown() to the end of the loop, the outliers go away.
> > > > > > 
> > > > > > Still can't explain that :)
> > > > > > 
> > > > > > 	do {
> > > > > > 		...
> > > > > > 		...
> > > > > > +               rcu_perf_wait_shutdown();
> > > > > >         } while (!torture_must_stop());
> > > > > 
> > > > > Might it be the cond_resched_tasks_rcu_qs() invoked from within
> > > > > rcu_perf_wait_shutdown()?  So I have to ask...  What happens if you
> > > > > use cond_resched_tasks_rcu_qs() at the end of that loop instead of
> > > > > rcu_perf_wait_shutdown()?
> > > > 
> > > > I don't think it is, if I call cond_resched_tasks_rcu_qs(), it still doesn't
> > > > help. Only calling rcu_perf_wait_shutdown() cures it.
> > > 
> > > My eyes seem to be working better today.
> > > 
> > > Here is rcu_perf_wait_shutdown():
> > > 
> > > 	static void rcu_perf_wait_shutdown(void)
> > > 	{
> > > 		cond_resched_tasks_rcu_qs();
> > > 		if (atomic_read(&n_rcu_perf_writer_finished) < nrealwriters)
> > > 			return;
> > > 		while (!torture_must_stop())
> > > 			schedule_timeout_uninterruptible(1);
> > > 	}
> > > 
> > > Take a close look at the "while" loop.  It is effectively ending your
> > > test prematurely and thus rendering the code no longer CPU-bound.  ;-)
> > 
> > That makes a lot of sense. I also found that I can drop
> > 'rcu_perf_wait_shutdown' in my preempt-disable loop as long as I don't do an
> > ftrace trace. I suspect the trace dump happening at the end is messing with
> > the last iteration of the writer loops. My preempt disable loop probably
> > disables preemption for a long time without rescheduling during this ftrace
> > dump.
> 
> Agreed, although rcutorture does have special handling for ftrace dumps,
> they still kick out the occasional RCU CPU stall warning and the like.
> 
> > Anyway, having the rcu_perf_wait_shutdown without doing the ftrace dump seems
> > to solve it.
> 
> Did you mean "leaving out both the rcu_perf_wait_shutdown() and the
> ftrace dump"?

Sorry, will be more clear next time. I meant, with rcu_perf_wait_shutdown()
after every preempt disable/enable in my loop, there are no outliers whether
ftrace dumping is enabled or not. This is pretty repeatable.

However, if I remove rcu_perf_wait_shutdown() it is still fine if don't do
ftrace dumping.

The fatal outlier situation arises when rcu_perf_wait_shutdown() is not
there in the loop, while ftrace dump is on.

However, if I also just do a set_preempt_need_resched() in my loop without a
rcu_perf_wait_shutdown(), then that also cures the outlier issue (while
ftrace dump is on).

> Another approach would be to set a global variable prior to the call
> to ftrace_dump() -- and there might already be such a variable -- and
> making a load of that variable be part of the termination condition for
> your loop.  With appropriate ordering added, of course.

Yes, I tried this but it did not work. I tried something like:
if (atomic_read(&n_rcu_perf_writer_finished) >= nrealwriters)
	return;
...in my busy_wait() code (my busy_wait() function loops waiting for time to
elapse). I was hoping that the busy_wait() loop would break out of the
preempt disable section if it detected that all writers were done.

This is also precisely the condition checked before the rcu_ftrace_dump()
function runs. So it is odd that didn't work. I'll keep digging.

> > So actually the point of all my testing was (other than learning) was to
> > compare how RCU pre-consolidated vs post-consolidated does. As predicted,
> > with post-consolidated RCU, the preempt-disable / enable does manage to slow
> > down the grace periods. This is not an issue per-se as you said that even
> > 100s of ms of grace period delay is within acceptable RCU latencies. The
> > results are as below:
> 
> And thank you for checking this out!

My pleasure! Thanks for the discussions.

> And the 1.2-second outliers were due to the ftrace dump?  If not, it would
> be good to understand what was happening in that case.

Sounds great, I will work on it. And thanks a lot for the below explanation, I
will spend some time to understand it and I appreciate that.

thanks,
 - Joel



> > I am happy to try out any other test scenarios as well if you would like me
> > to. I am open to any other suggestions you may have to improve the rcuperf
> > tests in this (deferred/consolidated RCU) or other regards.
> > 
> > I did have a request, could you help me understand why is the grace period
> > duration double that of my busy wait time? You mentioned this has something
> > to do with the thread not waking up before another GP is started. But I did
> > not follow this. Thanks a lot!!
> 
> Let's look at a normal grace period, and assume the ->gp_seq grace-period
> sequence counter is initially 0x1, so that a grace period is already
> in progress (keep in mind that the low-order two bits of ->gp_seq are
> the phase within the grace period and the rest of the bits are the
> grace-period number, so we are in phase 1 of the grace period following
> grace period #0).  This grace period was started via synchronize_rcu()
> by Task A.
> 
> Then we have the following sequence of events:
> 
> o	Task B does call_rcu(), but is too late to use the already-started
> 	grace period, so it needs another one.	The ->gp_seq_needed
> 	counter thus becomes 0x8.
> 
> o	The current grace period completes, so that the ->gp_seq
> 	counter becomes 0x4.  Callback invocation commences.
> 
> o	The grace-period kthread notices that ->gp_seq_needed is greater
> 	than ->gp_seq, so it starts a new grace period.  The ->gp_seq
> 	counter therefore becomes 0x5.
> 
> o	The callback for Task A's synchronize_rcu() is invoked, awakening
> 	Task A.  This happens almost immediately after the new grace
> 	period starts, but does definitely happen -after- that grace
> 	period starts.
> 
> o	Task A, being part of rcuperf, almost immediately does another
> 	synchronize_rcu().  So ->gp_seq_needed becomes 0xc.
> 
> If you play out the rest of this sequence, you should see how Task A
> waits for almost two full grace periods.
> 
> > Performance changes in consolidated vs regular
> > -------------------------------------------
> > I ran a thread on a reserved CPU doing preempt disable + busy wait + preempt enable
> > in a loop and measured the difference in rcuperf between conslidated and regular.
> > nreaders = nwriters = 10.
> > 
> > 		(preempt disable duration)
> > 		5ms	10ms	20ms	50ms
> > v4.19
> > median (usecs)	12000.3	12001	11000	12000
> > 
> > v5.1 (deferred)
> > median (usecs)	13000	19999   40000   100000
> > 
> > All of this is still within spec of RCU.
> > 
> > Note as discussed:
> > These results are independent of the value of jiffies_to_sched_qs. However,
> > in my preempt-disable + enable loop, if I don't do a
> > set_preempt_need_resched() in my loop, then I need to lower
> > jiffies_to_sched_qs to bring down the grace period durations. This is
> > understandable because the tick may not know sooner that it needs to resched
> > the preempt disable busy loop.
> 
> Good stuff!  Again, thank you for doing this!!!
> 
> 							Thanx, Paul

next prev parent reply	other threads:[~2019-07-05  0:08 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-03 15:25 Normal RCU grace period can be stalled for long because need-resched flags not set? Joel Fernandes
2019-07-03 15:30 ` Steven Rostedt
2019-07-03 16:41   ` Joel Fernandes
2019-07-03 16:43     ` Joel Fernandes
2019-07-03 17:39     ` Paul E. McKenney
2019-07-03 21:24       ` Joel Fernandes
2019-07-03 21:57         ` Paul E. McKenney
2019-07-03 22:24           ` Joel Fernandes
2019-07-03 23:01             ` Paul E. McKenney
2019-07-04  0:21               ` Joel Fernandes
2019-07-04  0:32                 ` Joel Fernandes
2019-07-04  0:50                   ` Paul E. McKenney
2019-07-04  3:24                     ` Joel Fernandes
2019-07-04 17:13                       ` Paul E. McKenney
2019-07-04 18:50                         ` Joel Fernandes
2019-07-04 22:17                           ` Paul E. McKenney
2019-07-05  0:08                             ` Joel Fernandes [this message]
2019-07-05  1:30                               ` Joel Fernandes
2019-07-05  1:57                                 ` Paul E. McKenney
2019-07-06 12:18                                   ` [attn: Steve] " Joel Fernandes
2019-07-06 18:05                                     ` Paul E. McKenney
2019-07-06 23:25                                       ` Steven Rostedt
2019-07-06 12:02                             ` Joel Fernandes
2019-07-06 18:21                               ` Paul E. McKenney
2019-07-06 23:03                                 ` Joel Fernandes
2019-07-07 11:19                                   ` Paul E. McKenney
2019-07-04  0:47                 ` Paul E. McKenney
2019-07-04 16:49                   ` Joel Fernandes
2019-07-04 17:08                     ` Paul E. McKenney
2019-07-03 16:10 ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190705000830.GB12919@google.com \
    --to=joel@joelfernandes.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=paulmck@linux.ibm.com \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.