Linux-ARM-Kernel Archive on lore.kernel.org

* Re: [PATCH RFC 3/3] arm64: Add HOTPLUG_PARALLEL support for secondary CPUs
From: Jinjie Ruan @ 2026-06-22  9:16 UTC
  To: Will Deacon
  Cc: Michael Kelley, catalin.marinas@arm.com,
	tsbogend@alpha.franken.de, pjw@kernel.org, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, alex@ghiti.fr, tglx@kernel.org,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, peterz@infradead.org, kees@kernel.org,
	nathan@kernel.org, linusw@kernel.org, ojeda@kernel.org,
	david.kaplan@amd.com, lukas.bulwahn@redhat.com,
	ryan.roberts@arm.com, maz@kernel.org, timothy.hayes@arm.com,
	lpieralisi@kernel.org, thuth@redhat.com, oupton@kernel.org,
	yeoreum.yun@arm.com, miko.lenczewski@arm.com, broonie@kernel.org,
	kevin.brodsky@arm.com, james.clark@linaro.org, tabba@google.com,
	mrigendra.chaubey@gmail.com, arnd@arndb.de,
	anshuman.khandual@arm.com, x86@kernel.org,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-mips@vger.kernel.org,
	linux-riscv@lists.infradead.org
In-Reply-To: <ajPitENEHWa8lDfC@willie-the-truck>

On 6/18/2026 8:21 PM, Will Deacon wrote:
> Hi Jinjie,
> 
> On Mon, Jun 15, 2026 at 04:51:48PM +0800, Jinjie Ruan wrote:
>> On 6/12/2026 11:45 PM, Michael Kelley wrote:
>>> From: Jinjie Ruan <ruanjinjie@huawei.com> Sent: Thursday, June 11, 2026 6:38 AM
>>>>
>>>> Support for parallel secondary CPU bringup is already utilized by x86,
>>>> MIPS, and RISC-V. This patch brings this capability to the arm64
>>>> architecture.
>>>>
>>>> Rework the global `secondary_data` accessed during early boot into
>>>> a per-CPU array. This array maps logical CPU IDs to MPIDR_EL1 values,
>>>> enabling the early boot code in head.S to resolve each secondary CPU's
>>>> logical ID concurrently.
>>>>
>>>> To fully enable HOTPLUG_PARALLEL, this patch implements:
>>>> 1) An arm64-specific arch_cpuhp_kick_ap_alive() handler.
>>>> 2) Callbacks to cpuhp_ap_sync_alive() inside secondary_start_kernel().
>>>>
>>>> Successfully tested on QEMU ARM64 virt machine (KVM on, 128 vCPUs).
>>>>
>>>> | Test kernel            | Secondary CPUs boot time |
>>>> | ---------------------- | ------------------------ |
>>>> | Without this patch     |                  155.672 |
>>>> | cpuhp.parallel=0       |                   62.897 |
>>>> | cpuhp.parallel=1       |                  166.703 |
>>>
>>> The last two rows seem mixed up. I would expect parallel=0 to
>>> result in a longer boot time.
>>
>> Hi, Michael,
>>
>> The results are correct and not mixed up.
>>
>> Compared to the original non‑HOTPLUG_PARALLEL approach, the advantage of
>> cpuhp.parallel=0 lies in its use of cpu_relax() (`yield` on arm64) instead
>> of the wait_for_completion_timeout() mechanism (which may cause sleep
>> and context switching). This significantly reduces the overhead of VM
>> exits and context switches in a KVM guest, thereby cutting the secondary
>> CPU boot time by more than half.
> 
> I don't think that's a particularly compelling reason to enable this for
> arm64, in all honesty. The yield instruction typically doesn't do
> anything on actual arm64 silicon, so this probably means that you're
> introducing busy-loops which tend to be bad for power and scalability.
> 
> I implemented this a while ago [1] but didn't manage to see much in terms
> of performance improvement and so I didn't bother to send the patches out
> after talking about it at KVM forum [2]. However, as mentioned at the end
> of that talk, it _is_ still useful for confidential VMs using PSCI so
> let me dust off my old series and send it out to see what you think.

Hi Will,

Thanks for the insights! Your point about using PSCI v0.2's Context ID
to avoid an NR_CPUS-sized array for input parameters (like
secondary_data.task) is incredibly elegant.

However, if I understand your series correctly, it mainly prevents
concurrent use of secondary_data.task, but it does not seem to account
for races on secondary_data.status. When multiple secondary CPUs are
brought up simultaneously, they all write the same shared field, so one
CPU's status can overwrite another's before the boot CPU reads it:

update_cpu_boot_status()
  -> WRITE_ONCE(secondary_data.status.flags[val], 1)

arch_cpuhp_cleanup_kick_cpu()
  -> status = READ_ONCE(secondary_data.status)
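
The overwrite described above can be illustrated with a small user-space sketch. It assumes the status is a single word shared by all secondaries. report_boot_status(), the BOOT_STATUS_* values and the per-CPU array are hypothetical names, not the arm64 code. Each store is atomic on its own, but a later CPU's store simply replaces an earlier one, whereas giving each logical CPU its own slot preserves every result.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_NR_CPUS 8

/* Hypothetical status codes standing in for the real boot status values. */
enum { BOOT_STATUS_NONE = 0, BOOT_STATUS_OK = 1, BOOT_STATUS_FAILED = 2 };

/* One slot shared by every secondary, like a single global status field. */
static atomic_int shared_status;

/* One slot per logical CPU, so concurrent writers cannot clobber each other. */
static atomic_int percpu_status[SKETCH_NR_CPUS];

/* Called by booting secondary 'cpu' to report how far it got. */
static void report_boot_status(unsigned int cpu, int val)
{
	assert(cpu < SKETCH_NR_CPUS);
	atomic_store_explicit(&shared_status, val, memory_order_release);
	atomic_store_explicit(&percpu_status[cpu], val, memory_order_release);
}

/* Boot-CPU side: read the shared slot (last writer wins). */
static int read_shared_status(void)
{
	return atomic_load_explicit(&shared_status, memory_order_acquire);
}

/* Boot-CPU side: read the slot belonging to 'cpu' only. */
static int read_percpu_status(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	return atomic_load_explicit(&percpu_status[cpu], memory_order_acquire);
}
```

If CPU 1 reports BOOT_STATUS_FAILED and CPU 2 reports BOOT_STATUS_OK before the boot CPU checks, the shared slot reads BOOT_STATUS_OK when cleaning up after either CPU, while the per-CPU slot for CPU 1 still reads BOOT_STATUS_FAILED.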

Best regards,
Jinjie

> 
> It relies on PSCI v0.2, which means we don't need the NR_CPUS-sized array
> for secondary_data and I also have some support for error handling (it
> doesn't look like you handle __early_cpu_boot_status properly).
> 
> It looks like I could include your first patch, though!
> 
> Will
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=cpu-hotplug
> [2] https://www.youtube.com/watch?v=Q6kOshnnQuE
> 
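
The PSCI-based approach can be sketched in user space. PSCI CPU_ON takes a target MPIDR, an entry point and a context ID, and firmware starts the target at the entry point with the context ID in x0. Passing a pointer to the target's own boot arguments therefore removes the need for a shared table searched by MPIDR, such as the per-CPU array in the original patch. In the sketch below, the firmware call is simulated with a plain function call. fake_psci_cpu_on(), struct boot_args and the other names are hypothetical, and this is not the code in Will's series or the kernel's PSCI driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4
#define SKETCH_PSCI_SUCCESS 0

/* Hypothetical per-CPU boot arguments (a kernel would carry task, stack, ...). */
struct boot_args {
	uint64_t mpidr;
	int logical_id;
	int seen_id;	/* written by the "secondary" to show what it received */
};

/* Approach 1: an NR_CPUS-sized table each secondary searches by its MPIDR. */
static struct boot_args boot_table[SKETCH_NR_CPUS];

static struct boot_args *lookup_by_mpidr(uint64_t mpidr)
{
	for (size_t i = 0; i < SKETCH_NR_CPUS; i++)
		if (boot_table[i].mpidr == mpidr)
			return &boot_table[i];
	return NULL;
}

/* Approach 2: the caller hands the secondary a pointer to its own arguments. */
typedef void (*sketch_entry_fn)(uintptr_t context_id);

static void secondary_entry(uintptr_t context_id)
{
	/* On real hardware this value would arrive in x0. */
	struct boot_args *args = (struct boot_args *)context_id;

	assert(args != NULL);
	args->seen_id = args->logical_id;	/* no shared table, no search */
}

/* Simulated firmware: "power on" target_mpidr by calling entry(context_id). */
static int fake_psci_cpu_on(uint64_t target_mpidr, sketch_entry_fn entry,
			    uintptr_t context_id)
{
	(void)target_mpidr;
	entry(context_id);
	return SKETCH_PSCI_SUCCESS;
}
```

The table lookup needs storage for every possible CPU plus a search at boot time, whereas the context-ID path only needs the arguments for the CPU actually being started.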
