linux-mips.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Mark Brown <broonie@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	x86@kernel.org, David Woodhouse <dwmw2@infradead.org>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Brian Gerst <brgerst@gmail.com>,
	Arjan van de Veen <arjan@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Paul McKenney <paulmck@kernel.org>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Sean Christopherson <seanjc@google.com>,
	Oleksandr Natalenko <oleksandr@natalenko.name>,
	Paul Menzel <pmenzel@molgen.mpg.de>,
	"Guilherme G. Piccoli" <gpiccoli@igalia.com>,
	Piotr Gorski <lucjan.lucjanov@gmail.com>,
	Usama Arif <usama.arif@bytedance.com>,
	Juergen Gross <jgross@suse.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	xen-devel@lists.xenproject.org,
	Russell King <linux@armlinux.org.uk>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-arm-kernel@lists.infradead.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Guo Ren <guoren@kernel.org>,
	linux-csky@vger.kernel.org,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	linux-mips@vger.kernel.org,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	Helge Deller <deller@gmx.de>,
	linux-parisc@vger.kernel.org,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	linux-riscv@lists.infradead.org,
	Mark Rutland <mark.rutland@arm.com>,
	Sabin Rapan <sabrapan@amazon.com>,
	"Michael Kelley (LINUX)" <mikelley@microsoft.com>,
	Ross Philipson <ross.philipson@oracle.com>,
	David Woodhouse <dwmw@amazon.co.uk>
Subject: Re: [patch V4 33/37] cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
Date: Mon, 22 May 2023 23:04:17 +0200	[thread overview]
Message-ID: <87bkicw01a.ffs@tglx> (raw)
In-Reply-To: <4ca39e58-055f-432c-8124-7c747fa4e85b@sirena.org.uk>

On Mon, May 22 2023 at 20:45, Mark Brown wrote:
> On Fri, May 12, 2023 at 11:07:50PM +0200, Thomas Gleixner wrote:
>> From: Thomas Gleixner <tglx@linutronix.de>
>> 
>> There is often significant latency in the early stages of CPU bringup, and
>> time is wasted by waking each CPU (e.g. with SIPI/INIT/INIT on x86) and
>> then waiting for it to respond before moving on to the next.
>> 
>> Allow a platform to enable parallel setup which brings all to be onlined
>> CPUs up to the CPUHP_BP_KICK_AP state. While this state advancement on the
>> control CPU (BP) is single-threaded the important part is the last state
>> CPUHP_BP_KICK_AP which wakes the to be onlined CPUs up.
>
> We're seeing a regression on ThunderX2 systems with 256 CPUs with an
> arm64 defconfig running -next which I've bisected to this patch.  Before
> this commit we bring up 256 CPUs:
>
> [   29.137225] GICv3: CPU254: found redistributor 11e03 region 1:0x0000000441f60000
> [   29.137238] GICv3: CPU254: using allocated LPI pending table @0x00000008818e0000
> [   29.137305] CPU254: Booted secondary processor 0x0000011e03 [0x431f0af1]
> [   29.292421] Detected PIPT I-cache on CPU255
> [   29.292635] GICv3: CPU255: found redistributor 11f03 region 1:0x0000000441fe0000
> [   29.292648] GICv3: CPU255: using allocated LPI pending table @0x00000008818f0000
> [   29.292715] CPU255: Booted secondary processor 0x0000011f03 [0x431f0af1]
> [   29.292859] smp: Brought up 2 nodes, 256 CPUs
> [   29.292864] SMP: Total of 256 processors activated.
>
> but after we only bring up 255, missing the 256th:
>
> [   29.165888] GICv3: CPU254: found redistributor 11e03 region 1:0x0000000441f60000
> [   29.165901] GICv3: CPU254: using allocated LPI pending table @0x00000008818e0000
> [   29.165968] CPU254: Booted secondary processor 0x0000011e03 [0x431f0af1]
> [   29.166120] smp: Brought up 2 nodes, 255 CPUs
> [   29.166125] SMP: Total of 255 processors activated.
>
> I can't immediately see an issue with the patch itself, for systems
> without CONFIG_HOTPLUG_PARALLEL=y it should replace the loop over
> cpu_present_mask done by for_each_present_cpu() with an open coded one.
> I didn't check the rest of the series yet.
>
> The KernelCI bisection bot also isolated an issue on Odroid XU3 (a 32
> bit arm system) with the final CPU of the 8 on the system not coming up
> to the same patch:
>
>   https://groups.io/g/kernelci-results/message/42480?p=%2C%2C%2C20%2C0%2C0%2C0%3A%3Acreated%2C0%2Call-cpus%2C20%2C2%2C0%2C99054444
>
> Other boards I've checked (including some with multiple CPU clusters)
> seem to be bringing up all their CPUs so it doesn't seem to just be
> general breakage.

That does not make any sense at all and my tired brain does not help
either.

Can you please apply the below debug patch and provide the output?

Thanks,

        tglx
---
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 005f863a3d2b..90a9b2ae8391 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1767,13 +1767,20 @@ static void __init cpuhp_bringup_mask(const struct cpumask *mask, unsigned int n
 {
 	unsigned int cpu;
 
+	pr_info("Bringup max %u CPUs to %d\n", ncpus, target);
+
 	for_each_cpu(cpu, mask) {
 		struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+		int ret;
+
+		pr_info("Bringup CPU%u left %u\n", cpu, ncpus);
 
 		if (!--ncpus)
 			break;
 
-		if (cpu_up(cpu, target) && can_rollback_cpu(st)) {
+		ret = cpu_up(cpu, target);
+		pr_info("Bringup CPU%u %d\n", cpu, ret);
+		if (ret && can_rollback_cpu(st)) {
 			/*
 			 * If this failed then cpu_up() might have only
 			 * rolled back to CPUHP_BP_KICK_AP for the final

  reply	other threads:[~2023-05-22 21:04 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-12 21:06 [patch V4 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup Thomas Gleixner
2023-05-12 21:06 ` [patch V4 01/37] x86/smpboot: Cleanup topology_phys_to_logical_pkg()/die() Thomas Gleixner
2023-05-12 21:07 ` [patch V4 02/37] cpu/hotplug: Mark arch_disable_smp_support() and bringup_nonboot_cpus() __init Thomas Gleixner
2023-05-12 21:07 ` [patch V4 03/37] x86/smpboot: Avoid pointless delay calibration if TSC is synchronized Thomas Gleixner
2023-05-12 21:07 ` [patch V4 04/37] x86/smpboot: Rename start_cpu0() to soft_restart_cpu() Thomas Gleixner
2023-06-12 23:45   ` Philippe Mathieu-Daudé
2023-05-12 21:07 ` [patch V4 05/37] x86/topology: Remove CPU0 hotplug option Thomas Gleixner
2023-05-12 21:07 ` [patch V4 06/37] x86/smpboot: Remove the CPU0 hotplug kludge Thomas Gleixner
2023-05-12 21:07 ` [patch V4 07/37] x86/smpboot: Restrict soft_restart_cpu() to SEV Thomas Gleixner
2023-06-12 23:46   ` Philippe Mathieu-Daudé
2023-05-12 21:07 ` [patch V4 08/37] x86/smpboot: Remove unnecessary barrier() Thomas Gleixner
2023-05-12 21:07 ` [patch V4 09/37] x86/smpboot: Split up native_cpu_up() into separate phases and document them Thomas Gleixner
2023-05-12 21:07 ` [patch V4 10/37] x86/smpboot: Get rid of cpu_init_secondary() Thomas Gleixner
2023-06-12 23:49   ` Philippe Mathieu-Daudé
2023-05-12 21:07 ` [patch V4 11/37] x86/cpu/cacheinfo: Remove cpu_callout_mask dependency Thomas Gleixner
2023-05-12 21:07 ` [patch V4 12/37] x86/smpboot: Move synchronization masks to SMP boot code Thomas Gleixner
2023-05-12 21:07 ` [patch V4 13/37] x86/smpboot: Make TSC synchronization function call based Thomas Gleixner
2023-05-12 21:07 ` [patch V4 14/37] x86/smpboot: Remove cpu_callin_mask Thomas Gleixner
2023-05-12 21:07 ` [patch V4 15/37] cpu/hotplug: Rework sparse_irq locking in bringup_cpu() Thomas Gleixner
2023-05-12 21:07 ` [patch V4 16/37] x86/smpboot: Remove wait for cpu_online() Thomas Gleixner
2023-05-12 21:07 ` [patch V4 17/37] x86/xen/smp_pv: Remove wait for CPU online Thomas Gleixner
2023-05-12 21:07 ` [patch V4 18/37] x86/xen/hvm: Get rid of DEAD_FROZEN handling Thomas Gleixner
2023-05-12 21:07 ` [patch V4 19/37] cpu/hotplug: Add CPU state tracking and synchronization Thomas Gleixner
2023-05-12 21:07 ` [patch V4 20/37] x86/smpboot: Switch to hotplug core state synchronization Thomas Gleixner
2023-05-12 21:07 ` [patch V4 21/37] cpu/hotplug: Remove cpu_report_state() and related unused cruft Thomas Gleixner
2023-05-12 21:07 ` [patch V4 22/37] ARM: smp: Switch to hotplug core state synchronization Thomas Gleixner
2023-05-12 21:07 ` [patch V4 23/37] arm64: " Thomas Gleixner
2023-05-12 21:07 ` [patch V4 24/37] csky/smp: " Thomas Gleixner
2023-05-12 21:07 ` [patch V4 25/37] MIPS: SMP_CPS: " Thomas Gleixner
2023-05-12 21:07 ` [patch V4 26/37] parisc: " Thomas Gleixner
2023-05-12 21:07 ` [patch V4 27/37] riscv: " Thomas Gleixner
2023-05-12 21:07 ` [patch V4 28/37] cpu/hotplug: Remove unused state functions Thomas Gleixner
2023-05-12 21:07 ` [patch V4 29/37] cpu/hotplug: Reset task stack state in _cpu_up() Thomas Gleixner
2023-05-12 21:07 ` [patch V4 30/37] cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism Thomas Gleixner
2023-05-12 21:07 ` [patch V4 31/37] x86/smpboot: Enable split CPU startup Thomas Gleixner
2023-05-12 21:07 ` [patch V4 32/37] x86/apic: Provide cpu_primary_thread mask Thomas Gleixner
2023-05-12 21:07 ` [patch V4 33/37] cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE Thomas Gleixner
2023-05-22 19:45   ` Mark Brown
2023-05-22 21:04     ` Thomas Gleixner [this message]
2023-05-22 22:27       ` Mark Brown
2023-05-22 23:12         ` Thomas Gleixner
2023-05-23 10:19           ` Mark Brown
2023-05-12 21:07 ` [patch V4 34/37] x86/apic: Save the APIC virtual base address Thomas Gleixner
2023-05-12 21:07 ` [patch V4 35/37] x86/smpboot: Implement a bit spinlock to protect the realmode stack Thomas Gleixner
2023-05-12 21:07 ` [patch V4 36/37] x86/smpboot: Support parallel startup of secondary CPUs Thomas Gleixner
2023-05-19 16:28   ` Jeffrey Hugo
2023-05-19 16:57     ` Andrew Cooper
2023-05-19 17:44       ` Jeffrey Hugo
2023-05-12 21:07 ` [patch V4 37/37] x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it Thomas Gleixner
2023-05-15 12:00   ` Peter Zijlstra
2023-05-13 18:32 ` [patch V4 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup Oleksandr Natalenko
2023-05-13 21:00   ` Helge Deller
2023-05-14 21:48 ` Guilherme G. Piccoli
2023-05-22 10:57 ` [PATCH] x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils Andrew Cooper
2023-05-22 11:17   ` Russell King (Oracle)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bkicw01a.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=arjan@linux.intel.com \
    --cc=arnd@arndb.de \
    --cc=boris.ostrovsky@oracle.com \
    --cc=brgerst@gmail.com \
    --cc=broonie@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=deller@gmx.de \
    --cc=dwmw2@infradead.org \
    --cc=dwmw@amazon.co.uk \
    --cc=gpiccoli@igalia.com \
    --cc=guoren@kernel.org \
    --cc=jgross@suse.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-csky@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux@armlinux.org.uk \
    --cc=lucjan.lucjanov@gmail.com \
    --cc=mark.rutland@arm.com \
    --cc=mikelley@microsoft.com \
    --cc=oleksandr@natalenko.name \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=paulmck@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=pmenzel@molgen.mpg.de \
    --cc=ross.philipson@oracle.com \
    --cc=sabrapan@amazon.com \
    --cc=seanjc@google.com \
    --cc=thomas.lendacky@amd.com \
    --cc=tsbogend@alpha.franken.de \
    --cc=usama.arif@bytedance.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).