All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: kexec@lists.infradead.org
Subject: [PATCHv3 1/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable
Date: Tue, 10 May 2022 10:28:11 +0200	[thread overview]
Message-ID: <87y1z9pzac.ffs@tglx> (raw)
In-Reply-To: <YnneUeRJ42SRM/M+@piliu.users.ipa.redhat.com>

On Tue, May 10 2022 at 11:38, Pingfan Liu wrote:
> On Mon, May 09, 2022 at 12:55:21PM +0200, Thomas Gleixner wrote:
>> On Mon, May 09 2022 at 12:13, Pingfan Liu wrote:
>> > The following code chunk repeats in both
>> > migrate_to_reboot_cpu() and smp_shutdown_nonboot_cpus():
>> > This is due to a breakage like the following:
>> 
>> I don't see what's broken here.
>> 
>
> No, no broken. Could it be better to replace 'breakage' with
> 'breakin'?

There is no break-in. There is a phase where CPU hotplug is reenabled,
which might be avoided.

>> > +/* primary_cpu keeps unchanged after migrate_to_reboot_cpu() */
>> 
>> This comment makes no sense.
>> 
>
> Since migrate_to_reboot_cpu() disables cpu hotplug, so the selected
> valid online cpu -- primary_cpu keeps unchange.

So what is that parameter for then? If migrate_to_reboot_cpu() ensured
that the current task is on the reboot CPU then this parameter is
useless, no?

>> >  void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
>> >  {
>> >  	unsigned int cpu;
>> >  	int error;
>> >  
>> > +	/*
>> > +	 * Block other cpu hotplug event, so primary_cpu is always online if
>> > +	 * it is not touched by us
>> > +	 */
>> >  	cpu_maps_update_begin();
>> > -
>> >  	/*
>> > -	 * Make certain the cpu I'm about to reboot on is online.
>> > -	 *
>> > -	 * This is inline to what migrate_to_reboot_cpu() already do.
>> > +	 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
>> > +	 * no further code needs to use CPU hotplug (which is true in
>> > +	 * the reboot case). However, the kexec path depends on using
>> > +	 * CPU hotplug again; so re-enable it here.
>> 
>> You want to reduce confusion, but in reality this is even more confusing
>> than before.
>> 
>
> This __cpu_hotplug_enable() can be considered to defer from kernel_kexec() to
> arch-dependent code chunk (here), which is a more proper point.
>
> Could it make things better by rephrasing the words as the following?
>    migrate_to_reboot_cpu() disables CPU hotplug to prevent the selected
>    reboot cpu from disappearing. But arches need cpu_down to hot remove
>    cpus except rebooting-cpu, so re-enabling cpu hotplug again.

Can you please use proper words. arches is not a word and it's closer to
the plural of arch, than to the word architecture. This is not twitter.

And no, the architectures do not need cpu_down() at all. This very
function smp_shutdown_nonboot_cpus() invokes cpu_down_maps_locked() to
shut down the non boot CPUs. That fails when cpu_hotplug_disabled != 0.

>> >  	 */
>> > -	if (!cpu_online(primary_cpu))
>> > -		primary_cpu = cpumask_first(cpu_online_mask);
>> > +	__cpu_hotplug_enable();
>> 
>> How is this decrement solving anything? At the end of this function, the
>> counter is incremented again. So what's the point of this exercise?
>> 
> This decrement enables the cpu hot-removing.  Since
> smp_shutdown_nonboot_cpus()->cpu_down_maps_locked(), if
> cpu_hotplug_disabled, it returns -EBUSY.

Correct, so why can't you spell that out in concise words in the first
place right at that comment which reenables hotplug?

>> What does that for arch/powerpc/kernel/kexec_machine64.c now?
>> 
>> Nothing, as far as I can tell. Which means you basically reverted
>> 011e4b02f1da ("powerpc, kexec: Fix "Processor X is stuck" issue during
>> kexec from ST mode") unless I'm completely confused.
>> 
>
> Oops. Forget about powerpc. Considering the cpu hotplug is an
> arch-dependent feature in machine_shutdown(), as x86 does not need it.

It's not a feature, it's a architecture specific requirement. x86 is
irrelevant here because this is a powerpc requirement.

>> This is tinkering at best. Can we please sit down and rethink this whole
>> machinery instead of applying random duct tape to it?
>> 
> I try to make code look consistent.

Emphasis on try. So far the attempt failed and resulted in a regression.

Thanks,

        tglx


WARNING: multiple messages have this Message-ID (diff)
From: Thomas Gleixner <tglx@linutronix.de>
To: Pingfan Liu <kernelfans@gmail.com>
Cc: linux-kernel@vger.kernel.org,
	Eric Biederman <ebiederm@xmission.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Vincent Donnefort <vincent.donnefort@arm.com>,
	Ingo Molnar <mingo@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	YueHaibing <yuehaibing@huawei.com>,
	Baokun Li <libaokun1@huawei.com>,
	Randy Dunlap <rdunlap@infradead.org>, Baoquan He <bhe@redhat.com>,
	kexec@lists.infradead.org
Subject: Re: [PATCHv3 1/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable
Date: Tue, 10 May 2022 10:28:11 +0200	[thread overview]
Message-ID: <87y1z9pzac.ffs@tglx> (raw)
In-Reply-To: <YnneUeRJ42SRM/M+@piliu.users.ipa.redhat.com>

On Tue, May 10 2022 at 11:38, Pingfan Liu wrote:
> On Mon, May 09, 2022 at 12:55:21PM +0200, Thomas Gleixner wrote:
>> On Mon, May 09 2022 at 12:13, Pingfan Liu wrote:
>> > The following code chunk repeats in both
>> > migrate_to_reboot_cpu() and smp_shutdown_nonboot_cpus():
>> > This is due to a breakage like the following:
>> 
>> I don't see what's broken here.
>> 
>
> No, no broken. Could it be better to replace 'breakage' with
> 'breakin'?

There is no break-in. There is a phase where CPU hotplug is reenabled,
which might be avoided.

>> > +/* primary_cpu keeps unchanged after migrate_to_reboot_cpu() */
>> 
>> This comment makes no sense.
>> 
>
> Since migrate_to_reboot_cpu() disables cpu hotplug, so the selected
> valid online cpu -- primary_cpu keeps unchange.

So what is that parameter for then? If migrate_to_reboot_cpu() ensured
that the current task is on the reboot CPU then this parameter is
useless, no?

>> >  void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
>> >  {
>> >  	unsigned int cpu;
>> >  	int error;
>> >  
>> > +	/*
>> > +	 * Block other cpu hotplug event, so primary_cpu is always online if
>> > +	 * it is not touched by us
>> > +	 */
>> >  	cpu_maps_update_begin();
>> > -
>> >  	/*
>> > -	 * Make certain the cpu I'm about to reboot on is online.
>> > -	 *
>> > -	 * This is inline to what migrate_to_reboot_cpu() already do.
>> > +	 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
>> > +	 * no further code needs to use CPU hotplug (which is true in
>> > +	 * the reboot case). However, the kexec path depends on using
>> > +	 * CPU hotplug again; so re-enable it here.
>> 
>> You want to reduce confusion, but in reality this is even more confusing
>> than before.
>> 
>
> This __cpu_hotplug_enable() can be considered to defer from kernel_kexec() to
> arch-dependent code chunk (here), which is a more proper point.
>
> Could it make things better by rephrasing the words as the following?
>    migrate_to_reboot_cpu() disables CPU hotplug to prevent the selected
>    reboot cpu from disappearing. But arches need cpu_down to hot remove
>    cpus except rebooting-cpu, so re-enabling cpu hotplug again.

Can you please use proper words. arches is not a word and it's closer to
the plural of arch, than to the word architecture. This is not twitter.

And no, the architectures do not need cpu_down() at all. This very
function smp_shutdown_nonboot_cpus() invokes cpu_down_maps_locked() to
shut down the non boot CPUs. That fails when cpu_hotplug_disabled != 0.

>> >  	 */
>> > -	if (!cpu_online(primary_cpu))
>> > -		primary_cpu = cpumask_first(cpu_online_mask);
>> > +	__cpu_hotplug_enable();
>> 
>> How is this decrement solving anything? At the end of this function, the
>> counter is incremented again. So what's the point of this exercise?
>> 
> This decrement enables the cpu hot-removing.  Since
> smp_shutdown_nonboot_cpus()->cpu_down_maps_locked(), if
> cpu_hotplug_disabled, it returns -EBUSY.

Correct, so why can't you spell that out in concise words in the first
place right at that comment which reenables hotplug?

>> What does that for arch/powerpc/kernel/kexec_machine64.c now?
>> 
>> Nothing, as far as I can tell. Which means you basically reverted
>> 011e4b02f1da ("powerpc, kexec: Fix "Processor X is stuck" issue during
>> kexec from ST mode") unless I'm completely confused.
>> 
>
> Oops. Forget about powerpc. Considering the cpu hotplug is an
> arch-dependent feature in machine_shutdown(), as x86 does not need it.

It's not a feature, it's a architecture specific requirement. x86 is
irrelevant here because this is a powerpc requirement.

>> This is tinkering at best. Can we please sit down and rethink this whole
>> machinery instead of applying random duct tape to it?
>> 
> I try to make code look consistent.

Emphasis on try. So far the attempt failed and resulted in a regression.

Thanks,

        tglx

  reply	other threads:[~2022-05-10  8:28 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-09  4:13 [PATCHv3 0/2] cpu/hotplug: Keep cpu hotplug disabled until the rebooting cpu is stable Pingfan Liu
2022-05-09  4:13 ` Pingfan Liu
2022-05-09  4:13 ` Pingfan Liu
2022-05-09  4:13 ` Pingfan Liu
2022-05-09  4:13 ` [PATCHv3 1/2] " Pingfan Liu
2022-05-09  4:13   ` Pingfan Liu
2022-05-09 10:55   ` Thomas Gleixner
2022-05-09 10:55     ` Thomas Gleixner
2022-05-09 10:57     ` Thomas Gleixner
2022-05-09 10:57       ` Thomas Gleixner
2022-05-10  3:38     ` Pingfan Liu
2022-05-10  3:38       ` Pingfan Liu
2022-05-10  8:28       ` Thomas Gleixner [this message]
2022-05-10  8:28         ` Thomas Gleixner
2022-05-11  9:09         ` Pingfan Liu
2022-05-11  9:09           ` Pingfan Liu
2022-05-09  4:13 ` [PATCHv3 2/2] arm/arm64/ia64: kexec: fix the primary cpu passed to smp_shutdown_nonboot_cpus() Pingfan Liu
2022-05-09  4:13   ` Pingfan Liu
2022-05-09  4:13   ` Pingfan Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y1z9pzac.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=kexec@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.