From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
linux-kernel@vger.kernel.org
Subject: Re: Warning in irq_work_queue_on()
Date: Sat, 5 Sep 2015 12:53:57 -0700 [thread overview]
Message-ID: <20150905195357.GP4029@linux.vnet.ibm.com> (raw)
In-Reply-To: <20150904151153.GB13708@lerouge>
On Fri, Sep 04, 2015 at 05:11:54PM +0200, Frederic Weisbecker wrote:
> On Thu, Sep 03, 2015 at 09:58:40AM +0200, Peter Zijlstra wrote:
> > On Thu, Sep 03, 2015 at 02:03:51AM +0200, Frederic Weisbecker wrote:
> > > On Thu, Sep 03, 2015 at 12:24:27AM +0200, Peter Zijlstra wrote:
> > > > On Wed, Sep 02, 2015 at 11:50:22PM +0200, Frederic Weisbecker wrote:
> > > > > > > [ 875.703227] [<ffffffff810c2d74>] tick_nohz_full_kick_cpu+0x44/0x50
> > > > >
> > > > > It happens in nohz full, but I'm not sure the guilty is nohz full.
> > > > >
> > > > > The problem here is that wake_up_nohz_cpu() selects a CPU that is offline.
> > > >
> > > > wake_up_nohz_cpu() doesn't do any such thing. Where does the selection
> > > > logic live?
> > >
> > > Err, got confused with get_nohz_timer_target(). But yeah wake_up_nohz_cpu() is
> > > called with a CPU that is chosen by mod_timer() -> get_nohz_timer_target().
> > >
> > > >
> > > > > But this shouldn't happen. Either it selects a CPU that is in the domain tree,
> > > > > and I suspect offline CPUs aren't supposed to be there, or it selects the current
> > > > > CPU. And if the CPU is offlined, it shouldn't be running some kthread...
> > > >
> > > > Do no assume things like that.. always check with the active mask.
> > >
> > > Hmm, so perhaps we need something like this (makes me realize that
> > > the is_housekeeping_cpu() passes the wrong argument, no issue in practice
> > > since nohz full aren't in the domain tree but I still need to fix that along).
> > >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 0902e4d..2c10a69 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -628,7 +628,7 @@ int get_nohz_timer_target(void)
> > >
> > > rcu_read_lock();
> > > for_each_domain(cpu, sd) {
> > > - for_each_cpu(i, sched_domain_span(sd)) {
> > > + for_each_cpu_and(i, sched_domain_span(sd), cpu_online_mask) {
> >
> > cpu_active_mask, we clear that when we start killing the cpu. online
> > only gets cleared once the cpu is actually dead.
>
> So, after our discussion in IRC, I checked how domains are rebuild on hotplug
> ops and it appears that partition_sched_domain() is called on CPU_DOWN_PREPARE
> only. The CPU shouldn't be on the domain tree after that.
>
> (Correct me if I'm wrong, I really am not an expert in the domain handling code.
> As you said that we can't guarantee that a CPU in the domain tree is in the cpu_online_mask,
> I'm likely wrong somewhere).
>
> This is then followed by synchronize_sched(). Which means that after that, the
> new version of the CPU domains (with the offlining CPU excluded) is visible
> everywhere while the CPU is still in cpu_online_mask.
>
> And finally stop machine runs and the CPU is cleared out of cpu_online_mask.
> So I'm probably missing something, otherwise we could find a CPU in the domain
> tree that is not in cpu_online_mask.
OK, I have to ask... Should I be trying Frederic's patch?
At the current failure rate, I will need to be running it for about
a year to give any reasonable conclusion. :-/
Thanx, Paul
prev parent reply other threads:[~2015-09-05 19:54 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-25 0:16 Warning in irq_work_queue_on() Paul E. McKenney
2015-09-02 19:44 ` Tejun Heo
2015-09-02 21:50 ` Frederic Weisbecker
2015-09-02 22:24 ` Peter Zijlstra
2015-09-03 0:03 ` Frederic Weisbecker
2015-09-03 7:58 ` Peter Zijlstra
2015-09-04 15:11 ` Frederic Weisbecker
2015-09-05 19:53 ` Paul E. McKenney [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150905195357.GP4029@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=fweisbec@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.