From: Valentin Schneider <valentin.schneider@arm.com>
To: Peter Zijlstra <peterz@infradead.org>,
Vincent Donnefort <vincent.donnefort@arm.com>
Cc: tglx@linutronix.de, mingo@kernel.org, bigeasy@linutronix.de,
swood@redhat.com, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
bristot@redhat.com, qais.yousef@arm.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback
Date: Wed, 21 Apr 2021 10:32:43 +0100 [thread overview]
Message-ID: <87eef4ugn8.mognet@arm.com> (raw)
In-Reply-To: <YH7r+AoQEReSvxBI@hirez.programming.kicks-ass.net>
On 20/04/21 16:58, Peter Zijlstra wrote:
> On Tue, Apr 20, 2021 at 04:39:04PM +0200, Peter Zijlstra wrote:
>> On Tue, Apr 20, 2021 at 04:20:56PM +0200, Peter Zijlstra wrote:
>> > On Tue, Apr 20, 2021 at 10:46:33AM +0100, Vincent Donnefort wrote:
>> >
>> > > Found the issue:
>> > >
>> > > $ cat hotplug/states:
>> > > 219: sched:active
>> > > 220: online
>> > >
>> > > CPU0:
>> > >
>> > > $ echo 219 > hotplug/fail
>> > > $ echo 0 > online
>> > >
>> > > => cpu_active = 1 cpu_dying = 1
>> > >
>> > > which means that later on, for another CPU hotunplug, in
>> > > __balance_push_cpu_stop(), the fallback rq for a kthread can select that
>> > > CPU0, but __migrate_task() would fail and we end-up in an infinite loop,
>> > > trying to migrate that task to CPU0.
>> > >
>> > > The problem is that for a failure in sched:active, as "online" has no callback,
>> > > there will be no call to cpuhp_invoke_callback(). Hence, the cpu_dying bit would
>> > > not be reset.
>> >
>> > Urgh! Good find.
>
>> I seem to have triggered the BUG() in select_fallback_rq() with your recipie.
>> Have cpu0 fail on sched:active, then offline all other CPUs.
>>
>> Now lemme add that patch.
>
> (which obviously didn't actually build) seems to fix it.
>
Moving the cpu_dying_mask update from cpuhp_invoke_callback() to
cpuhp_{set, reset}_state() means we lose an update in cpuhp_issue_call(),
but AFAICT that wasn't required (this doesn't actually change a CPU's
hotplug state, rather executes some newly installed/removed callbacks whose
state maps below the CPU's current hp state).
Actually staring at it some more, it might have caused bugs: if a
cpuhp_setup_state() fails, we can end up in cpuhp_rollback_install() which
will end up calling cpuhp_invoke_callback(bringup=false) and mess with the
dying mask.
FWIW:
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
next prev parent reply other threads:[~2021-04-21 9:32 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-10 14:52 [PATCH 0/3] sched: Fix remaining balance_push vs hotplug hole Peter Zijlstra
2021-03-10 14:52 ` [PATCH 1/3] cpumask: Make cpu_{online,possible,present,active}() inline Peter Zijlstra
2021-04-16 15:53 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2021-03-10 14:53 ` [PATCH 2/3] cpumask: Introduce DYING mask Peter Zijlstra
2021-03-21 19:30 ` Qais Yousef
2021-03-22 15:07 ` Steven Rostedt
2021-04-12 10:55 ` Peter Zijlstra
2021-04-12 11:16 ` Qais Yousef
2021-04-16 15:53 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2021-03-10 14:53 ` [PATCH 3/3] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback Peter Zijlstra
2021-03-11 15:13 ` Valentin Schneider
2021-03-11 16:42 ` Peter Zijlstra
2021-04-12 12:03 ` Peter Zijlstra
2021-04-12 17:22 ` Valentin Schneider
2021-04-13 6:51 ` Peter Zijlstra
2021-04-15 8:59 ` Peter Zijlstra
2021-04-15 14:32 ` Valentin Schneider
2021-04-15 15:29 ` Peter Zijlstra
2021-04-15 15:34 ` Valentin Schneider
2021-04-19 10:56 ` Vincent Donnefort
2021-04-20 9:46 ` Vincent Donnefort
2021-04-20 14:20 ` Peter Zijlstra
2021-04-20 14:39 ` Peter Zijlstra
2021-04-20 14:58 ` Peter Zijlstra
2021-04-20 16:53 ` Vincent Donnefort
2021-04-20 18:07 ` Peter Zijlstra
2021-04-21 9:32 ` Valentin Schneider [this message]
2021-04-22 7:36 ` [tip: sched/core] cpumask/hotplug: Fix cpu_dying() state tracking tip-bot2 for Peter Zijlstra
2021-04-16 15:53 ` [tip: sched/core] sched: Use cpu_dying() to fix balance_push vs hotplug-rollback tip-bot2 for Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87eef4ugn8.mognet@arm.com \
--to=valentin.schneider@arm.com \
--cc=bigeasy@linutronix.de \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=qais.yousef@arm.com \
--cc=rostedt@goodmis.org \
--cc=swood@redhat.com \
--cc=tglx@linutronix.de \
--cc=vincent.donnefort@arm.com \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.