From: Frederic Weisbecker <frederic@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>, "Paul E. McKenney" <paulmck@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Michael Ellerman <mpe@ellerman.id.au>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
Borislav Petkov <bp@alien8.de>
Subject: Re: Endless soft-lockups for compiling workload since next-20200519
Date: Thu, 21 May 2020 02:40:36 +0200
Message-ID: <20200521004035.GA15455@lenoir>
In-Reply-To: <20200520125056.GC325280@hirez.programming.kicks-ass.net>
On Wed, May 20, 2020 at 02:50:56PM +0200, Peter Zijlstra wrote:
> On Tue, May 19, 2020 at 11:58:17PM -0400, Qian Cai wrote:
> > Just a head up. Repeatedly compiling kernels for a while would trigger
> > endless soft-lockups since next-20200519 on both x86_64 and powerpc.
> > .config are in,
>
> Could be 90b5363acd47 ("sched: Clean up scheduler_ipi()"), although I've
> not seen anything like that myself. Let me go have a look.
>
>
> In as far as the logs are readable (they're a wrapped mess, please don't
> do that!), they contain very little useful, as is typical with IPIs :/
>
> > [ 1167.993773][ C1] WARNING: CPU: 1 PID: 0 at kernel/smp.c:127
> > flush_smp_call_function_queue+0x1fa/0x2e0
So I've tried to think of a race that could produce that and here is
the only thing I could come up with. It's a bit complicated unfortunately:

CPU 0                                              CPU 1
-----                                              -----
tick {
    trigger_load_balance() {
        raise_softirq(SCHED_SOFTIRQ);
        //but nohz_flags(0) = 0
    }
                                                   kick_ilb() {
                                                       atomic_fetch_or(...., nohz_flags(0))
    softirq() {                                        #VMEXIT or anything that could stop a CPU for a while
        run_rebalance_domain() {
            nohz_idle_balance() {
                atomic_andnot(NOHZ_KICK_MASK, nohz_flag(0))
            }
        }
    }
}

// schedule
nohz_newidle_balance() {
    kick_ilb() { // pick current CPU
        atomic_fetch_or(...., nohz_flags(0))           #VMENTER
        smp_call_function_single_async() {             smp_call_function_single_async() {
            // verified csd->flags != CSD_LOCK             // verified csd->flags != CSD_LOCK
            csd->flags = CSD_LOCK                          csd->flags = CSD_LOCK
            //execute in place                             //queue and send IPI
            csd->flags = 0
            nohz_csd_func()
        }
    }
}

IPI {
    flush_smp_call_function_queue() {
        csd_unlock() {
            WARN_ON(csd->flags != CSD_LOCK) <---------!!!!!

The root cause here would be that trigger_load_balance() unconditionally raises
the softirq. And I have to confess I'm not clear why, since the softirq is
essentially a no-op when nohz_flags() is 0.
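
To make the interleaving above easier to check, here is a rough single-threaded
replay of it in plain C. This is only a model, not kernel code: struct model,
fetch_or(), model_csd_unlock(), replay() and the KICK_MASK / CSD_LOCK values
are all made up for illustration, and each "CPU" step is simply executed in the
order the diagram shows. The softirq_runs parameter is my own addition so the
same replay can be run with and without the softirq clearing the kick bits.

```c
#include <assert.h>
#include <stdbool.h>

#define KICK_MASK 0x3u  /* hypothetical stand-in for NOHZ_KICK_MASK */
#define CSD_LOCK  0x1u  /* hypothetical stand-in for the csd lock flag */

struct model {
    unsigned int nohz_flags0; /* models nohz_flags(0) */
    unsigned int csd_flags;   /* models csd->flags of CPU 0's nohz csd */
    int queued;               /* csd entries queued for CPU 0 */
    int warnings;             /* how often the unlock check fired */
};

/* Models atomic_fetch_or(): set bits, return the old value. The steps are
 * replayed one at a time, so no real atomics are needed here. */
static unsigned int fetch_or(unsigned int *v, unsigned int bits)
{
    unsigned int old = *v;

    *v |= bits;
    return old;
}

/* Models csd_unlock(): complain if the csd is not locked, then unlock. */
static void model_csd_unlock(struct model *m)
{
    if (m->csd_flags != CSD_LOCK)
        m->warnings++;
    m->csd_flags = 0;
}

/* Replays the diagram step by step and returns the number of warnings.
 * softirq_runs selects whether CPU 0's softirq clears the kick bits. */
int replay(bool softirq_runs)
{
    struct model m = {0};
    bool cpu0_kicks, cpu1_kicks, cpu0_saw_unlocked, cpu1_saw_unlocked;

    /* CPU 1: kick_ilb() sets the kick bits for CPU 0, then stalls. */
    cpu1_kicks = !(fetch_or(&m.nohz_flags0, KICK_MASK) & KICK_MASK);

    /* CPU 0: softirq -> nohz_idle_balance() clears the kick bits. */
    if (softirq_runs)
        m.nohz_flags0 &= ~KICK_MASK;

    /* CPU 0: kick_ilb() picks the current CPU. */
    cpu0_kicks = !(fetch_or(&m.nohz_flags0, KICK_MASK) & KICK_MASK);

    /* Both CPUs check csd->flags before either of them sets it. */
    cpu0_saw_unlocked = cpu0_kicks && m.csd_flags != CSD_LOCK;
    cpu1_saw_unlocked = cpu1_kicks && m.csd_flags != CSD_LOCK;

    if (cpu0_saw_unlocked)
        m.csd_flags = CSD_LOCK;
    if (cpu1_saw_unlocked) {
        m.csd_flags = CSD_LOCK;
        m.queued++;               /* queue and send IPI */
    }
    if (cpu0_saw_unlocked)
        model_csd_unlock(&m);     /* execute in place: csd->flags = 0 */

    /* CPU 0: the IPI flushes the queue and unlocks each entry. */
    for (; m.queued > 0; m.queued--)
        model_csd_unlock(&m);

    assert(m.queued == 0);        /* every queued entry was flushed */
    return m.warnings;
}
```

In this model replay(true) returns 1 (the unlock check fires once, in the IPI),
while replay(false) returns 0 because CPU 0's fetch_or() then still sees the
kick bits set by CPU 1 and never touches the csd.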
Thanks.