From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Ross Green" <rgkernel@gmail.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"John Stultz" <john.stultz@linaro.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, "Ingo Molnar" <mingo@kernel.org>,
	"Lai Jiangshan" <jiangshanlai@gmail.com>,
	dipankar@in.ibm.com, "Andrew Morton" <akpm@linux-foundation.org>,
	josh@joshtriplett.org, rostedt <rostedt@goodmis.org>,
	"David Howells" <dhowells@redhat.com>,
	"Eric Dumazet" <edumazet@google.com>,
	dvhart@linux.intel.com,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Oleg Nesterov" <oleg@redhat.com>,
	"pranith kumar" <bobby.prani@gmail.com>
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Date: Thu, 18 Feb 2016 20:22:53 -0800
Message-ID: <20160219042253.GJ6719@linux.vnet.ibm.com>
In-Reply-To: <1568248905.2264.1455837260992.JavaMail.zimbra@efficios.com>

On Thu, Feb 18, 2016 at 11:14:21PM +0000, Mathieu Desnoyers wrote:
> ----- On Feb 18, 2016, at 6:51 AM, Ross Green rgkernel@gmail.com wrote:
> 
> > On Thu, Feb 18, 2016 at 10:19 AM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> >> On Wed, Feb 17, 2016 at 12:28:29PM -0800, Paul E. McKenney wrote:
> >>> On Wed, Feb 17, 2016 at 08:45:54PM +0100, Peter Zijlstra wrote:
> >>> > On Wed, Feb 17, 2016 at 11:28:17AM -0800, Paul E. McKenney wrote:
> >>> > > On Tue, Feb 16, 2016 at 09:45:49PM -0800, Paul E. McKenney wrote:
> >>> > > > On Tue, Feb 09, 2016 at 09:11:55PM +1100, Ross Green wrote:
> >>> > > > > Continued testing with the latest linux-4.5-rc3 release.
> >>> > > > >
> >>> > > > > Please find attached a copy of traces from dmesg:
> >>> > > > >
> >>> > > > > There is a lot more debug and trace data so hopefully this will shed
> >>> > > > > some light on what might be happening here.
> >>> > > > >
> >>> > > > > My testing remains the same: run a series of simple benchmarks, let them
> >>> > > > > run to completion, and then leave the system idling with just a few daemons
> >>> > > > > running.
> >>> > > > >
> >>> > > > > The self-detected stalls in this instance turned up after a day's run time.
> >>> > > > > There were NO heavy artificial computational loads on the machine.
> >>> > > >
> >>> > > > It does indeed look quiet on that dmesg for a good long time.
> >>> > > >
> >>> > > > The following insanely crude not-for-mainline hack -might- be producing
> >>> > > > good results in my testing.  It will take some time before I can claim
> >>> > > > statistically different results.  But please feel free to give it a go
> >>> > > > in the meantime.  (Thanks to Al Viro for pointing me in this direction.)
> >>> >
> >>> > Your case was special in that it was hotplug triggering it, right?
> >>>
> >>> Yes, it has thus far only shown up with CPU hotplug enabled.
> >>>
> >>> > I was auditing the hotplug paths involved when I fell ill two weeks ago,
> >>> > and have not really made any progress on that because of that :/
> >>>
> >>> I have always said that being sick is bad for one's health, but I didn't
> >>> realize that it could be bad for the kernel's health as well.  ;-)
> >>>
> >>> > I'll go have another look, I had a vague feeling for a race back then,
> >>> > let's see if I can still remember how...
> >>>
> >>> I believe that I can -finally- get an ftrace_dump() to happen within
> >>> 10-20 milliseconds of the problem, which just might be soon enough
> >>> after the problem to gather some useful information.  I am currently
> >>> testing this theory with "ftrace trace_event=sched_waking,sched_wakeup"
> >>> boot arguments on a two-hour run.
> >>
> >> And apparently another way to greatly reduce the probability of this
> >> bug occurring is to enable ftrace.  :-/
> >>
> >> Will try longer runs.
> >>
> >>                                                         Thanx, Paul
> >>
> >>> If this works out, what would be a useful set of trace events for me
> >>> to capture?
> >>>
> >>>                                                       Thanx, Paul
> >>
> > 
> > Well, I managed to catch this one on linux-4.5-rc4.
> > 
> > Took over 3 days and 7 hours to hit.
> > 
> > Same test as before, boot, run a series of simple benchmarks and then
> > let the machine just idle away.
> > 
> > As before, the stall is reported, AND everything keeps on running as if
> > nothing had happened.
> > 
> > I notice in the task dump that the swapper is running on both CPUs.
> > 
> > Does that make any sense?
> > There is around 3% of memory actually used.
> > 
> > Anyway, please find attached a copy of the dmesg output.
> > 
> > Hope this helps a few people fill in the missing pieces here.
> 
> What seems weird here is that all code paths in the loop
> perform a WRITE_ONCE(rsp->gp_activity, jiffies), which
> implies progress in each case:
> 
> - rcu_gp_init() does it,
> - both branches in the QS forcing loop do it, either
>   through rcu_gp_fqs(), or directly.
> 
> This means the thread is really stalled, and the backtrace
> shows those threads are stalled on the following wait:
> 
>                         ret = wait_event_interruptible_timeout(rsp->gp_wq,
>                                         rcu_gp_fqs_check_wake(rsp, &gf), j);
> 
> Since this is a *_timeout wait whose timeout is "j" jiffies,
> and "j" is itself bounded by HZ,
> we should really not stay there too long, even if we are
> not awakened by whatever is supposed to awaken us.

Completely agreed on this seeming weird.  ;-)
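
To make the argument above concrete, here is a minimal userspace sketch
of the two bounds involved: the wait timeout clamped to one second's
worth of ticks, and a wraparound-safe check for a stale activity stamp.
The names model_jiffies_t, MODEL_HZ, clamp_fqs_timeout(),
model_time_after(), and gp_kthread_starved() are made up for
illustration and are not the kernel's; the comparison only imitates the
style of the kernel's time_after().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's jiffies counter and HZ. */
typedef unsigned long model_jiffies_t;
#define MODEL_HZ 100UL

/* Bound the wait timeout "j" by one second's worth of ticks. */
model_jiffies_t clamp_fqs_timeout(model_jiffies_t j)
{
	return j > MODEL_HZ ? MODEL_HZ : j;
}

/* Wraparound-safe "a is later than b", in the style of time_after(). */
bool model_time_after(model_jiffies_t a, model_jiffies_t b)
{
	return (long)(b - a) < 0;
}

/*
 * True if the last activity stamp is more than "limit" ticks old at
 * time "now".  The signed comparison is only valid while the limit
 * stays below half of the counter's range.
 */
bool gp_kthread_starved(model_jiffies_t now, model_jiffies_t gp_activity,
			model_jiffies_t limit)
{
	assert(limit <= (model_jiffies_t)LONG_MAX);
	return model_time_after(now, gp_activity + limit);
}
```

With a stamp taken at tick 100 and a limit of 200 ticks, a check at
tick 301 reports starvation, while a check at tick 300 does not.  If
every loop path refreshes the stamp and the wait is bounded by the
clamped timeout, the stamp should never get that old.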

> So unless I'm missing something, it would look like
> schedule_timeout() is missing its timeout there.
> 
> Perhaps we only experience this missed timeout here
> because typically there is always a wakeup coming sooner
> or later on relatively busy systems. This one is idle
> for quite a while.
> 
> Thoughts?

I can also make this happen (infrequently) on a busy system with
rcutorture, but only with frequent CPU hotplugging.  Ross is making
it happen with pure idle.
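
For comparison, the behavior one would normally expect from a timed
wait that never receives a wakeup can be sketched in userspace with
pthread_cond_timedwait().  This is only an analogy for the kernel's
wait_event_interruptible_timeout(), not its implementation, and
timed_wait_no_wakeup() and elapsed_ms() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Milliseconds elapsed between two clock readings. */
long elapsed_ms(const struct timespec *start, const struct timespec *end)
{
	long long ns = (long long)(end->tv_sec - start->tv_sec) * 1000000000LL +
		       (end->tv_nsec - start->tv_nsec);

	return (long)(ns / 1000000LL);
}

/*
 * Wait up to timeout_ms for a condition that nobody ever signals.
 * Returns the final pthread_cond_timedwait() result and stores the
 * measured wait time in *waited_ms.
 */
int timed_wait_no_wakeup(long timeout_ms, long *waited_ms)
{
	pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
	struct timespec start, end, deadline;
	int ret = 0;

	clock_gettime(CLOCK_MONOTONIC, &start);

	/* Default condition variables take an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&lock);
	/* Absorb spurious wakeups; only the timeout (or an error) ends the wait. */
	while (ret == 0)
		ret = pthread_cond_timedwait(&cond, &lock, &deadline);
	pthread_mutex_unlock(&lock);

	clock_gettime(CLOCK_MONOTONIC, &end);
	*waited_ms = elapsed_ms(&start, &end);
	return ret;
}
```

Calling timed_wait_no_wakeup(50, &waited) on a healthy system should
return ETIMEDOUT after roughly the requested time, idle or not.  A
return that arrives far later than requested would be the userspace
analogue of the missed timeout being discussed here.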

I did manage to make this fail with ftrace running, but thus far
have not been able to get a trace that actually includes any
activity for the grace-period kthread.  Working on tightening
up the tests...

						Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> 
> > 
> > Regards,
> > 
> > Ross Green
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
