From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Tejun Heo <tj@kernel.org>
Cc: jiangshanlai@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: WARN_ON_ONCE() in process_one_work()?
Date: Tue, 27 Jun 2017 09:27:38 -0700
Message-ID: <20170627162738.GA16289@linux.vnet.ibm.com>
In-Reply-To: <20170623164142.GA14685@linux.vnet.ibm.com>

On Fri, Jun 23, 2017 at 09:41:42AM -0700, Paul E. McKenney wrote:
> On Wed, Jun 21, 2017 at 08:30:35AM -0700, Paul E. McKenney wrote:
> > On Tue, Jun 20, 2017 at 09:45:23AM -0700, Paul E. McKenney wrote:
> > > On Sun, Jun 18, 2017 at 06:40:00AM -0400, Tejun Heo wrote:
> > > > Hello,
> > > >
> > > > On Sat, Jun 17, 2017 at 10:31:05AM -0700, Paul E. McKenney wrote:
> > > > > On Sat, Jun 17, 2017 at 07:53:14AM -0400, Tejun Heo wrote:
> > > > > > Hello,
> > > > > >
> > > > > > On Fri, Jun 16, 2017 at 10:36:58AM -0700, Paul E. McKenney wrote:
> > > > > > > And no test failures from yesterday evening. So it looks like we get
> > > > > > > somewhere on the order of one failure per 138 hours of TREE07 rcutorture
> > > > > > > runtime with your printk() in the mix.
> > > > > > >
> > > > > > > Was the above output from your printk() output of any help?
> > > > > >
> > > > > > Yeah, if my suspicion is correct, it'd require new kworker creation
> > > > > > racing against CPU offline, which would explain why it's so difficult
> > > > > > to repro. Can you please see whether the following patch resolves the
> > > > > > issue?
> > > > >
> > > > > That could explain why only Steve Rostedt and I saw the issue. As far
> > > > > as I know, we are the only ones who regularly run CPU-hotplug stress
> > > > > tests. ;-)
> > > >
> > > > I was a bit confused. It has to be racing against either new kworker
> > > > being created on the wrong CPU or rescuer trying to migrate to the
> > > > CPU, and it looks like we're mostly seeing the rescuer condition, but,
> > > > yeah, this would only get triggered rarely. Another contributing
> > > > factor could be the vmstat work being put on a workqueue w/ rescuer
> > > > recently. It runs quite often, so probably has increased the chance
> > > > of hitting the right condition.
> > >
> > > Sounds like too much fun! ;-)
> > >
> > > But more constructively... If I understand correctly, it is now possible
> > > to take a CPU partially offline and put it back online again. This should
> > > allow much more intense testing of this sort of interaction.
> > >
> > > And no, I haven't yet tried this with RCU because I would probably need
> > > to do some mix of just-RCU online/offline and full-up online-offline.
> > > Plus RCU requires pretty much a full online/offline cycle to fully
> > > exercise it. :-/
> > >
> > > > > I have a weekend-long run going, but will give this a shot overnight on
> > > > > Monday, Pacific Time. Thank you for putting it together, looking forward
> > > > > to seeing what it does!
> > > >
> > > > Thanks a lot for the testing and patience. Sorry that it took so
> > > > long. I'm not completely sure the patch is correct. It might have to
> > > > be more specific about which type of migration or require further
> > > > synchronization around migration, but hopefully it'll at least be able
> > > > to show that this was the cause of the problem.
> > >
> > > And last night's tests had no failures. Which might actually mean
> > > something, will get more info when I run without your patch this
> > > evening. ;-)
> >
> > And it didn't fail without the patch, either. 45 hours of test vs.
> > 60 hours with the patch. This one is not going to be easy to prove
> > either way. I will try again this evening without the patch and see
> > what that gets us.
>
> And another 36 hours (total of 81 hours) without the patch, still no
> failure. Sigh.
>
> In the sense that the patch doesn't cause any new problem:
>
> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>
> But I clearly have nothing of statistical significance, so any confidence
> in the fix is coming from your reproducer.

And for whatever it is worth, I did finally get a reproduction without
the patch. The probability of occurrence is quite low with my test setup,
so please queue this patch. I will accumulate test time on it over the
months to come. :-/

Thanx, Paul
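
As a rough illustration of why the clean runs reported in this thread carry
little statistical weight, the sketch below estimates how likely a failure-free
run is even when the bug is still present. It rests on two assumptions that are
not from the thread itself: that failures arrive as a Poisson process, and that
the "one failure per 138 hours" figure (measured with the debugging printk()
applied) also holds for the later runs. The function and constant names are
made up for this example.

```python
import math

# Assumed failure rate: one failure per 138 hours of TREE07 rcutorture
# runtime, the estimate quoted earlier in the thread.
MEAN_HOURS_BETWEEN_FAILURES = 138.0


def p_no_failure(hours, mean_hours=MEAN_HOURS_BETWEEN_FAILURES):
    """Probability of zero failures in `hours` of testing, assuming
    failures arrive as a Poisson process with the given mean spacing."""
    return math.exp(-hours / mean_hours)


if __name__ == "__main__":
    # Test durations reported in the thread.
    for label, hours in [("without patch", 81.0), ("with patch", 60.0)]:
        p = p_no_failure(hours)
        print(f"{label}: {hours:.0f} h, P(no failure) = {p:.2f}")
```

Under these assumptions, an unpatched kernel would survive 81 hours without a
failure a bit more than half the time, and a 60-hour clean run would occur
roughly two times in three even if the patch did nothing. Neither result
separates the two cases.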