linux-arm-kernel.lists.infradead.org archive mirror
* [RFC 00/10] arm64/riscv: Introduce fast kexec reboot
@ 2022-08-22  2:15 Pingfan Liu
  2022-08-22  2:15 ` [RFC 02/10] cpu/hotplug: Compile smp_shutdown_nonboot_cpus() conditioned on CONFIG_SHUTDOWN_NONBOOT_CPUS Pingfan Liu
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, linux-ia64, linux-riscv, linux-kernel
  Cc: Pingfan Liu, Thomas Gleixner, Steven Price,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld,
	Frederic Weisbecker, Russell King, Catalin Marinas, Will Deacon,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Peter Zijlstra,
	Eric W. Biederman

On an SMP arm64 machine, it can take a long time to kexec-reboot a new
kernel, and the time scales linearly with the number of CPUs. On an
80-CPU machine it takes about 15 seconds, while with this series the
time drops dramatically to one second.

*** Current situation 'slow kexec reboot' ***

At present, some architectures rely on smp_shutdown_nonboot_cpus() to
implement "kexec -e". Since smp_shutdown_nonboot_cpus() tears down the
cpus serially, it is very slow.

Taking a closer look, the cpu_down() processing of a single cpu can be
divided approximately into two stages:
-1. from CPUHP_ONLINE to CPUHP_TEARDOWN_CPU
-2. from CPUHP_TEARDOWN_CPU to CPUHP_AP_IDLE_DEAD
    which is done by stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu))
    and runs on the cpu being torn down.

If these stages can run in parallel, then the reboot can be sped up.
That is the aim of this series.
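
For reference, the serial path boils down to the following loop in
smp_shutdown_nonboot_cpus() (kernel/cpu.c; quoted in abbreviated form
from the context of patch 03), which tears one cpu fully down before
starting on the next:

	for_each_online_cpu(cpu) {
		if (cpu == primary_cpu)
			continue;

		error = cpu_down_maps_locked(cpu, CPUHP_OFFLINE);
		if (error) {
			pr_err("Failed to offline CPU%d - error=%d",
				cpu, error);
			break;
		}
	}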

*** Contrast with other implementations ***

x86 and PowerPC have their own machine_shutdown(), which does not rely
on the cpu hot-removing mechanism. They just single out a few critical
components and tear them down in a per-cpu NMI handler during the kexec
reboot. But for some architectures, say arm64, it is not easy to define
these critical components due to the various chipmakers' implementations.

As a result, sticking to the cpu hot-removing mechanism is the simplest
way to re-implement the teardown in parallel.


*** Things worthy of consideration ***

1. The definition of a clean boundary between the first kernel and the new kernel
-1.1 firmware
     The firmware's internal state should be brought into a proper
state, so it can work for the new kernel. This is achieved by the
firmware's cpuhp_step teardown interface, if any.

-1.2 CPU internal state
     Whether the cache or PMU needs a clean shutdown before rebooting.

2. The dependency of each cpuhp_step
   The boundary of a clean cut involves only a few cpuhp_steps, but they
may propagate to other cpuhp_steps through dependencies. This series does
not bother to work out the dependencies; instead, it just iterates down
through each cpuhp_step. This strategy demands that each involved
cpuhp_step's teardown procedure support parallelism, which patch 04
verifies as sketched below.
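
Patch 04 performs exactly this downward walk to verify, before taking
the quick path, that every step with a teardown callback has opted in
(condensed from check_quick_reboot() in that patch):

	for (state = CPUHP_ONLINE; state >= CPUHP_AP_OFFLINE; state--) {
		step = cpuhp_get_step(state);
		if (step->teardown.single == NULL)
			continue;
		if (!step->support_kexec_parallel)
			/* one unsupported step disables the quick path */
			ret = false;
	}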


*** Solution ***

Ideally, if the interface _cpu_down() can be enhanced to support
parallelism, then the fast reboot can be achieved.

But revisiting the two stages of the current cpu_down() process, the
second stage, stop_machine_cpuslocked(), is a blockade. Packed inside
_cpu_down(), stop_machine_cpuslocked() only allows one cpu at a time to
execute the teardown.

So this patch breaks down the process of _cpu_down() and divides the
teardown into three steps:
1. Send each AP from CPUHP_ONLINE to CPUHP_TEARDOWN_CPU in parallel.
2. Sync on the BP, waiting for all APs to enter the CPUHP_TEARDOWN_CPU
   state.
3. Send each AP from CPUHP_TEARDOWN_CPU to CPUHP_AP_IDLE_DEAD by the
   interface of stop_machine_cpuslocked(), in parallel.

Finally, the exposed stop_machine_cpuslocked() can be used to support
parallelism.

Apparently, step 2 is introduced in order to satisfy the prerequisite
that stop_machine_cpuslocked() can start on each cpu at the same time.
Condensed, the whole quick path looks like the sketch below.
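
(Abbreviated from smp_shutdown_nonboot_cpus_quick_path() in patch 03;
comments and error handling trimmed.)

	cpus_write_lock();
	cpumask_copy(cpus, cpu_online_mask);
	cpumask_clear_cpu(primary_cpu, cpus);
	cpus_down_no_rollback(cpus);		/* steps 1 and 2 */
	takedown_cpus_no_rollback(cpus);	/* step 3 */
	cpus_write_unlock();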

The remaining issue is how to support parallelism in steps 1 and 3.
Fortunately, each subsystem has its own carefully designed lock
mechanism. In each cpuhp_step teardown interface, adapting to the
subsystem's locking rules will make things work, as patch 05 below
illustrates.
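
Patch 05 is the model for such an adaptation: dsu_pmu_cpu_teardown()
takes the dsu_pmu's pmu_lock around the active-cpu handover so that
concurrent teardowns cannot race (abbreviated from that patch):

	raw_spin_lock(&dsu_pmu->pmu_lock);
	if (!cpumask_test_and_clear_cpu(cpu, &dsu_pmu->active_cpu)) {
		raw_spin_unlock(&dsu_pmu->pmu_lock);
		return 0;
	}
	dst = dsu_pmu_get_online_cpu_any_but(dsu_pmu, cpu);
	/* if no active cpu is left in the DSU, the IRQ stays disabled (check elided) */
	dsu_pmu_set_active_cpu(dst, dsu_pmu);
	raw_spin_unlock(&dsu_pmu->pmu_lock);
	perf_pmu_migrate_context(&dsu_pmu->pmu, cpu, dst);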


*** No rollback if failure ***

During kexec reboot, the devices have already been shutdown, there is no
way for system to roll back to a workable state. So this series also
does not consider the rollback issue if a failure on cpu_down() happens,
it just adventures to move on.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Price <steven.price@arm.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
To: linux-arm-kernel@lists.infradead.org
To: linux-ia64@vger.kernel.org
To: linux-riscv@lists.infradead.org
To: linux-kernel@vger.kernel.org

Pingfan Liu (10):
  cpu/hotplug: Make __cpuhp_kick_ap() ready for async
  cpu/hotplug: Compile smp_shutdown_nonboot_cpus() conditioned on
    CONFIG_SHUTDOWN_NONBOOT_CPUS
  cpu/hotplug: Introduce fast kexec reboot
  cpu/hotplug: Check the capability of kexec quick reboot
  perf/arm-dsu: Make dsu_pmu_cpu_teardown() parallel
  rcu/hotplug: Make rcutree_dead_cpu() parallel
  lib/cpumask: Introduce cpumask_not_dying_but()
  cpuhp: Replace cpumask_any_but(cpu_online_mask, cpu)
  genirq/cpuhotplug: Ask migrate_one_irq() to migrate to a real online
    cpu
  arm64: smp: Make __cpu_disable() parallel

 arch/Kconfig                             |   4 +
 arch/arm/Kconfig                         |   1 +
 arch/arm/mach-imx/mmdc.c                 |   2 +-
 arch/arm/mm/cache-l2x0-pmu.c             |   2 +-
 arch/arm64/Kconfig                       |   1 +
 arch/arm64/kernel/smp.c                  |  31 +++-
 arch/ia64/Kconfig                        |   1 +
 arch/riscv/Kconfig                       |   1 +
 drivers/dma/idxd/perfmon.c               |   2 +-
 drivers/fpga/dfl-fme-perf.c              |   2 +-
 drivers/gpu/drm/i915/i915_pmu.c          |   2 +-
 drivers/perf/arm-cci.c                   |   2 +-
 drivers/perf/arm-ccn.c                   |   2 +-
 drivers/perf/arm-cmn.c                   |   4 +-
 drivers/perf/arm_dmc620_pmu.c            |   2 +-
 drivers/perf/arm_dsu_pmu.c               |  16 +-
 drivers/perf/arm_smmuv3_pmu.c            |   2 +-
 drivers/perf/fsl_imx8_ddr_perf.c         |   2 +-
 drivers/perf/hisilicon/hisi_uncore_pmu.c |   2 +-
 drivers/perf/marvell_cn10k_tad_pmu.c     |   2 +-
 drivers/perf/qcom_l2_pmu.c               |   2 +-
 drivers/perf/qcom_l3_pmu.c               |   2 +-
 drivers/perf/xgene_pmu.c                 |   2 +-
 drivers/soc/fsl/qbman/bman_portal.c      |   2 +-
 drivers/soc/fsl/qbman/qman_portal.c      |   2 +-
 include/linux/cpuhotplug.h               |   2 +
 include/linux/cpumask.h                  |   3 +
 kernel/cpu.c                             | 213 ++++++++++++++++++++---
 kernel/irq/cpuhotplug.c                  |   3 +-
 kernel/rcu/tree.c                        |   3 +-
 lib/cpumask.c                            |  18 ++
 31 files changed, 281 insertions(+), 54 deletions(-)

-- 
2.31.1



* [RFC 02/10] cpu/hotplug: Compile smp_shutdown_nonboot_cpus() conditioned on CONFIG_SHUTDOWN_NONBOOT_CPUS
  2022-08-22  2:15 [RFC 00/10] arm64/riscv: Introduce fast kexec reboot Pingfan Liu
@ 2022-08-22  2:15 ` Pingfan Liu
  2022-08-22  2:15 ` [RFC 03/10] cpu/hotplug: Introduce fast kexec reboot Pingfan Liu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, linux-ia64, linux-riscv, linux-kernel
  Cc: Pingfan Liu, Russell King, Catalin Marinas, Will Deacon,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Peter Zijlstra,
	Eric W. Biederman, Mark Rutland, Marco Elver, Masami Hiramatsu,
	Dan Li, Song Liu, Sami Tolvanen, Arnd Bergmann, Linus Walleij,
	Ard Biesheuvel, Tony Lindgren, Nick Hawkins, John Crispin,
	Geert Uytterhoeven, Andrew Morton, Bjorn Andersson,
	Anshuman Khandual, Thomas Gleixner, Steven Price

Only arm/arm64/ia64/riscv share smp_shutdown_nonboot_cpus(), so compile
this code only when the macro CONFIG_SHUTDOWN_NONBOOT_CPUS is set. Later
this macro will also guard the quick kexec reboot code.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Dan Li <ashimida@linux.alibaba.com>
Cc: Song Liu <song@kernel.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Nick Hawkins <nick.hawkins@hpe.com>
Cc: John Crispin <john@phrozen.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Price <steven.price@arm.com>
To: linux-arm-kernel@lists.infradead.org
To: linux-ia64@vger.kernel.org
To: linux-riscv@lists.infradead.org
To: linux-kernel@vger.kernel.org
---
 arch/Kconfig       | 4 ++++
 arch/arm/Kconfig   | 1 +
 arch/arm64/Kconfig | 1 +
 arch/ia64/Kconfig  | 1 +
 arch/riscv/Kconfig | 1 +
 kernel/cpu.c       | 3 +++
 6 files changed, 11 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index f330410da63a..be447537d0f6 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -14,6 +14,10 @@ menu "General architecture-dependent options"
 config CRASH_CORE
 	bool
 
+config SHUTDOWN_NONBOOT_CPUS
+	select KEXEC_CORE
+	bool
+
 config KEXEC_CORE
 	select CRASH_CORE
 	bool
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 87badeae3181..711cfdb4f9f4 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -129,6 +129,7 @@ config ARM
 	select PCI_SYSCALL if PCI
 	select PERF_USE_VMALLOC
 	select RTC_LIB
+	select SHUTDOWN_NONBOOT_CPUS
 	select SYS_SUPPORTS_APM_EMULATION
 	select THREAD_INFO_IN_TASK
 	select HAVE_ARCH_VMAP_STACK if MMU && ARM_HAS_GROUP_RELOCS
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 571cc234d0b3..8c481a0b1829 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -223,6 +223,7 @@ config ARM64
 	select PCI_SYSCALL if PCI
 	select POWER_RESET
 	select POWER_SUPPLY
+	select SHUTDOWN_NONBOOT_CPUS
 	select SPARSE_IRQ
 	select SWIOTLB
 	select SYSCTL_EXCEPTION_TRACE
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 26ac8ea15a9e..8a3ddea97d1b 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -52,6 +52,7 @@ config IA64
 	select ARCH_CLOCKSOURCE_DATA
 	select GENERIC_TIME_VSYSCALL
 	select LEGACY_TIMER_TICK
+	select SHUTDOWN_NONBOOT_CPUS
 	select SWIOTLB
 	select SYSCTL_ARCH_UNALIGN_NO_WARN
 	select HAVE_MOD_ARCH_SPECIFIC
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index ed66c31e4655..02606a48c5ea 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -120,6 +120,7 @@ config RISCV
 	select PCI_MSI if PCI
 	select RISCV_INTC
 	select RISCV_TIMER if RISCV_SBI
+	select SHUTDOWN_NONBOOT_CPUS
 	select SPARSE_IRQ
 	select SYSCTL_EXCEPTION_TRACE
 	select THREAD_INFO_IN_TASK
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 338e1d426c7e..2be6ba811a01 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1258,6 +1258,8 @@ int remove_cpu(unsigned int cpu)
 }
 EXPORT_SYMBOL_GPL(remove_cpu);
 
+#ifdef CONFIG_SHUTDOWN_NONBOOT_CPUS
+
 void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
 {
 	unsigned int cpu;
@@ -1299,6 +1301,7 @@ void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
 
 	cpu_maps_update_done();
 }
+#endif
 
 #else
 #define takedown_cpu		NULL
-- 
2.31.1



* [RFC 03/10] cpu/hotplug: Introduce fast kexec reboot
  2022-08-22  2:15 [RFC 00/10] arm64/riscv: Introduce fast kexec reboot Pingfan Liu
  2022-08-22  2:15 ` [RFC 02/10] cpu/hotplug: Compile smp_shutdown_nonboot_cpus() conditioned on CONFIG_SHUTDOWN_NONBOOT_CPUS Pingfan Liu
@ 2022-08-22  2:15 ` Pingfan Liu
  2022-08-22  2:15 ` [RFC 04/10] cpu/hotplug: Check the capability of kexec quick reboot Pingfan Liu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, linux-ia64, linux-riscv, linux-kernel
  Cc: Pingfan Liu, Thomas Gleixner, Steven Price,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld,
	Frederic Weisbecker, Russell King, Catalin Marinas, Will Deacon,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Peter Zijlstra,
	Eric W. Biederman

*** Current situation 'slow kexec reboot' ***

At present, some architectures rely on smp_shutdown_nonboot_cpus() to
implement "kexec -e". Since smp_shutdown_nonboot_cpus() tears down the
cpus serially, it is very slow.

Taking a closer look, the cpu_down() processing of a single cpu can be
divided approximately into two stages:
-1. from CPUHP_ONLINE to CPUHP_TEARDOWN_CPU
-2. from CPUHP_TEARDOWN_CPU to CPUHP_AP_IDLE_DEAD
    which is done by stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu))
    and runs on the cpu being torn down.

If these stages can run in parallel, then the reboot can be sped up.
That is the aim of this patch.

*** Contrast with other implementations ***

x86 and PowerPC have their own machine_shutdown(), which does not rely
on the cpu hot-removing mechanism. They just single out a few critical
components and tear them down in a per-cpu NMI handler during the kexec
reboot. But for some architectures, say arm64, it is not easy to define
these critical components due to the various chipmakers' implementations.

As a result, sticking to the cpu hot-removing mechanism is the simplest
way to re-implement the teardown in parallel. It also opens up an
opportunity to implement cpu_down() in parallel in the future (not done
by this series).

*** Things worthy of consideration ***

1. The definition of a clean boundary between the first kernel and the new kernel
-1.1 firmware
     The firmware's internal state should be brought into a proper
state. This is achieved by the firmware's cpuhp_step teardown interface,
if any.

-1.2 CPU internal state
     Whether the cache or PMU needs a clean shutdown before rebooting.

2. The dependency of each cpuhp_step
   The boundary of a clean cut involves only a few cpuhp_steps, but they
may propagate to other cpuhp_steps through dependencies. This series does
not bother to work out the dependencies; instead, it just iterates down
through each cpuhp_step. This strategy demands that each cpuhp_step's
teardown interface support parallelism.

*** Solution ***

Ideally, if the interface _cpu_down() can be enhanced to support
parallelism, then the fast reboot can be achieved.

But revisiting the two stages of the current cpu_down() process, the
second stage, stop_machine_cpuslocked(), is a blockade. Packed inside
_cpu_down(), stop_machine_cpuslocked() only allows one cpu at a time to
execute the teardown.

So this patch breaks down the process of _cpu_down() and divides the
teardown into three steps; the exposed stop_machine_cpuslocked() can
then be used to support parallelism.
1. Bring each AP from CPUHP_ONLINE to CPUHP_TEARDOWN_CPU in parallel.
2. Sync on the BP, waiting for all APs to enter the CPUHP_TEARDOWN_CPU
   state.
3. Bring each AP from CPUHP_TEARDOWN_CPU to CPUHP_AP_IDLE_DEAD by the
   interface of stop_machine_cpuslocked(), in parallel.

Apparently, step 2 is introduced in order to satisfy the condition under
which stop_machine_cpuslocked() can start on each cpu.

The remaining issue is how to support parallelism in steps 1 and 3.
Fortunately, each subsystem has its own carefully designed lock
mechanism. In each cpuhp_step teardown interface, adapting to the
subsystem's locking rules will make things work.

*** No rollback if failure ***

During a kexec reboot, the devices have already been shut down, so there
is no way for the system to roll back to a workable state. Hence this
series also does not consider the rollback issue.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Price <steven.price@arm.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
To: linux-arm-kernel@lists.infradead.org
To: linux-ia64@vger.kernel.org
To: linux-riscv@lists.infradead.org
To: linux-kernel@vger.kernel.org
---
 kernel/cpu.c | 139 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 129 insertions(+), 10 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 2be6ba811a01..94ab2727d6bb 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1260,10 +1260,125 @@ EXPORT_SYMBOL_GPL(remove_cpu);
 
 #ifdef CONFIG_SHUTDOWN_NONBOOT_CPUS
 
-void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
+/*
+ * Push all of the cpus to the state CPUHP_AP_ONLINE_IDLE.
+ * Since kexec-reboot has already shut down all devices, there is no way to
+ * roll back; the cpus' teardown likewise does no rollback and instead just
+ * emits a warning on failure.
+ */
+static void cpus_down_no_rollback(struct cpumask *cpus)
 {
+	struct cpuhp_cpu_state *st;
 	unsigned int cpu;
+
+	/* launch the AP work one by one, but do not wait for completion */
+	for_each_cpu(cpu, cpus) {
+		st = per_cpu_ptr(&cpuhp_state, cpu);
+		/*
+		 * If the current CPU state is in the range of the AP hotplug thread,
+		 * then we need to kick the thread.
+		 */
+		if (st->state > CPUHP_TEARDOWN_CPU) {
+			cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU);
+			/* Kick asynchronously so the APs tear down in parallel; there is no rollback */
+			cpuhp_kick_ap_work_async(cpu);
+		}
+	}
+
+	/* wait for all ap work completion */
+	for_each_cpu(cpu, cpus) {
+		st = per_cpu_ptr(&cpuhp_state, cpu);
+		wait_for_ap_thread(st, st->bringup);
+		if (st->result)
+			pr_warn("cpu %u refuses to offline due to %d\n", cpu, st->result);
+		else if (st->state > CPUHP_TEARDOWN_CPU)
+			pr_warn("cpu %u refuses to offline, state: %d\n", cpu, st->state);
+	}
+}
+
+static int __takedown_cpu_cleanup(unsigned int cpu)
+{
+	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+
+	/*
+	 * The teardown callback for CPUHP_AP_SCHED_STARTING will have removed
+	 * all runnable tasks from the CPU, there's only the idle task left now
+	 * that the migration thread is done doing the stop_machine thing.
+	 *
+	 * Wait for the stop thread to go away.
+	 */
+	wait_for_ap_thread(st, false);
+	BUG_ON(st->state != CPUHP_AP_IDLE_DEAD);
+
+	hotplug_cpu__broadcast_tick_pull(cpu);
+	/* This actually kills the CPU. */
+	__cpu_die(cpu);
+
+	tick_cleanup_dead_cpu(cpu);
+	rcutree_migrate_callbacks(cpu);
+	return 0;
+}
+
+/*
+ * The caller must ensure that all AP threads are done before calling this func.
+ */
+static void takedown_cpus_no_rollback(struct cpumask *cpus)
+{
+	struct cpuhp_cpu_state *st;
+	unsigned int cpu;
+
+	for_each_cpu(cpu, cpus) {
+		st = per_cpu_ptr(&cpuhp_state, cpu);
+		WARN_ON(st->state != CPUHP_TEARDOWN_CPU);
+		/* takedown_cpu() is not invoked, so set the state manually */
+		st->state = CPUHP_AP_ONLINE;
+		cpuhp_set_state(cpu, st, CPUHP_AP_OFFLINE);
+	}
+
+	irq_lock_sparse();
+	/* ask stopper kthreads to execute take_cpu_down() in parallel */
+	stop_machine_cpuslocked(take_cpu_down, NULL, cpus);
+
+	/* Finally wait for completion and clean up */
+	for_each_cpu(cpu, cpus)
+		__takedown_cpu_cleanup(cpu);
+	irq_unlock_sparse();
+}
+
+static bool check_quick_reboot(void)
+{
+	return false;
+}
+
+static struct cpumask kexec_ap_map;
+
+void smp_shutdown_nonboot_cpus_quick_path(unsigned int primary_cpu)
+{
+	struct cpumask *cpus = &kexec_ap_map;
+	/*
+	 * Prevent other subsystems from accessing __cpu_online_mask. Internally,
+	 * __cpu_disable() accesses the bitmap in parallel and needs its own local lock.
+	 */
+	cpus_write_lock();
+
+	cpumask_copy(cpus, cpu_online_mask);
+	cpumask_clear_cpu(primary_cpu, cpus);
+	cpus_down_no_rollback(cpus);
+	takedown_cpus_no_rollback(cpus);
+	/*
+	 * For some subsystems, there are still remaining callbacks for offline
+	 * cpus from CPUHP_BRINGUP_CPU to CPUHP_OFFLINE. But since none of them
+	 * interact with hardware or firmware, they have no effect on the new
+	 * kernel, so the cpuhp callbacks in that range are skipped.
+	 */
+
+	cpus_write_unlock();
+}
+
+void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
+{
 	int error;
+	unsigned int cpu;
 
 	cpu_maps_update_begin();
 
@@ -1275,15 +1390,19 @@ void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
 	if (!cpu_online(primary_cpu))
 		primary_cpu = cpumask_first(cpu_online_mask);
 
-	for_each_online_cpu(cpu) {
-		if (cpu == primary_cpu)
-			continue;
-
-		error = cpu_down_maps_locked(cpu, CPUHP_OFFLINE);
-		if (error) {
-			pr_err("Failed to offline CPU%d - error=%d",
-				cpu, error);
-			break;
+	if (check_quick_reboot()) {
+		smp_shutdown_nonboot_cpus_quick_path(primary_cpu);
+	} else {
+		for_each_online_cpu(cpu) {
+			if (cpu == primary_cpu)
+				continue;
+
+			error = cpu_down_maps_locked(cpu, CPUHP_OFFLINE);
+			if (error) {
+				pr_err("Failed to offline CPU%d - error=%d",
+					cpu, error);
+				break;
+			}
 		}
 	}
 
-- 
2.31.1



* [RFC 04/10] cpu/hotplug: Check the capability of kexec quick reboot
  2022-08-22  2:15 [RFC 00/10] arm64/riscv: Introduce fast kexec reboot Pingfan Liu
  2022-08-22  2:15 ` [RFC 02/10] cpu/hotplug: Compile smp_shutdown_nonboot_cpus() conditioned on CONFIG_SHUTDOWN_NONBOOT_CPUS Pingfan Liu
  2022-08-22  2:15 ` [RFC 03/10] cpu/hotplug: Introduce fast kexec reboot Pingfan Liu
@ 2022-08-22  2:15 ` Pingfan Liu
  2022-08-22  2:15 ` [RFC 05/10] perf/arm-dsu: Make dsu_pmu_cpu_teardown() parallel Pingfan Liu
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, linux-ia64, linux-riscv, linux-kernel
  Cc: Pingfan Liu, Thomas Gleixner, Steven Price,
	Kuppuswamy Sathyanarayanan, Jason A. Donenfeld,
	Frederic Weisbecker, Russell King, Catalin Marinas, Will Deacon,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Peter Zijlstra,
	Eric W. Biederman

The kexec quick reboot needs each involved cpuhp_step to run in
parallel.

There are many teardown cpuhp_steps, but not all of them are on the
arm/arm64/riscv kexec reboot path. So introduce a member
'support_kexec_parallel' in struct cpuhp_step to signal whether the
teardown supports running in parallel. If a cpuhp_step is used on the
kexec reboot path, it needs to support parallelism to enable the quick
reboot.

The function check_quick_reboot() checks all teardown cpuhp_steps and
reports any that do not support it; a driver opts in as shown in the
example below.
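
As a usage illustration, patch 05 below opts the DSU PMU's dynamically
allocated hotplug state in like this (the cpuhp_setup_state_multi() call
is paraphrased from the existing driver code; only the
cpuhp_set_step_parallel() line is added by the series):

	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRV_NAME,
				      NULL, dsu_pmu_cpu_teardown);
	if (ret < 0)
		return ret;
	dsu_pmu_cpuhp_state = ret;
	/* mark the teardown as safe to run on several cpus at once */
	cpuhp_set_step_parallel(ret);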

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Price <steven.price@arm.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
To: linux-arm-kernel@lists.infradead.org
To: linux-ia64@vger.kernel.org
To: linux-riscv@lists.infradead.org
To: linux-kernel@vger.kernel.org
---
 include/linux/cpuhotplug.h |  2 ++
 kernel/cpu.c               | 28 +++++++++++++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index f61447913db9..73093fc15aec 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -374,6 +374,8 @@ static inline int cpuhp_setup_state_multi(enum cpuhp_state state,
 				   (void *) teardown, true);
 }
 
+void cpuhp_set_step_parallel(enum cpuhp_state state);
+
 int __cpuhp_state_add_instance(enum cpuhp_state state, struct hlist_node *node,
 			       bool invoke);
 int __cpuhp_state_add_instance_cpuslocked(enum cpuhp_state state,
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 94ab2727d6bb..1261c3f3be51 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -137,6 +137,9 @@ struct cpuhp_step {
 	/* public: */
 	bool			cant_stop;
 	bool			multi_instance;
+#ifdef CONFIG_SHUTDOWN_NONBOOT_CPUS
+	bool			support_kexec_parallel;
+#endif
 };
 
 static DEFINE_MUTEX(cpuhp_state_mutex);
@@ -147,6 +150,14 @@ static struct cpuhp_step *cpuhp_get_step(enum cpuhp_state state)
 	return cpuhp_hp_states + state;
 }
 
+#ifdef CONFIG_SHUTDOWN_NONBOOT_CPUS
+void cpuhp_set_step_parallel(enum cpuhp_state state)
+{
+	cpuhp_hp_states[state].support_kexec_parallel = true;
+}
+EXPORT_SYMBOL(cpuhp_set_step_parallel);
+#endif
+
 static bool cpuhp_step_empty(bool bringup, struct cpuhp_step *step)
 {
 	return bringup ? !step->startup.single : !step->teardown.single;
@@ -1347,7 +1358,22 @@ static void takedown_cpus_no_rollback(struct cpumask *cpus)
 
 static bool check_quick_reboot(void)
 {
-	return false;
+	struct cpuhp_step *step;
+	enum cpuhp_state state;
+	bool ret = true;
+
+	for (state = CPUHP_ONLINE; state >= CPUHP_AP_OFFLINE; state--) {
+		step = cpuhp_get_step(state);
+		if (step->teardown.single == NULL)
+			continue;
+		if (step->support_kexec_parallel == false) {
+			pr_info("cpuhp state:%d, %s, does not support cpudown in parallel\n",
+					state, step->name);
+			ret = false;
+		}
+	}
+
+	return ret;
 }
 
 static struct cpumask kexec_ap_map;
-- 
2.31.1



* [RFC 05/10] perf/arm-dsu: Make dsu_pmu_cpu_teardown() parallel
  2022-08-22  2:15 [RFC 00/10] arm64/riscv: Introduce fast kexec reboot Pingfan Liu
                   ` (2 preceding siblings ...)
  2022-08-22  2:15 ` [RFC 04/10] cpu/hotplug: Check the capability of kexec quick reboot Pingfan Liu
@ 2022-08-22  2:15 ` Pingfan Liu
  2022-08-22  2:15 ` [RFC 08/10] cpuhp: Replace cpumask_any_but(cpu_online_mask, cpu) Pingfan Liu
  2022-08-22  2:15 ` [RFC 10/10] arm64: smp: Make __cpu_disable() parallel Pingfan Liu
  5 siblings, 0 replies; 7+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel; +Cc: Pingfan Liu, Will Deacon, Mark Rutland

In the kexec quick reboot case, dsu_pmu_cpu_teardown() can run on
several cpus concurrently, so a lock is needed to protect against
contention on a dsu_pmu.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
To: linux-arm-kernel@lists.infradead.org
To: linux-kernel@vger.kernel.org
---
 drivers/perf/arm_dsu_pmu.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index a36698a90d2f..aa9f4393ff0c 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -833,16 +833,23 @@ static int dsu_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
 	struct dsu_pmu *dsu_pmu = hlist_entry_safe(node, struct dsu_pmu,
 						   cpuhp_node);
 
-	if (!cpumask_test_and_clear_cpu(cpu, &dsu_pmu->active_cpu))
+	raw_spin_lock(&dsu_pmu->pmu_lock);
+	if (!cpumask_test_and_clear_cpu(cpu, &dsu_pmu->active_cpu)) {
+		raw_spin_unlock(&dsu_pmu->pmu_lock);
 		return 0;
+	}
 
 	dst = dsu_pmu_get_online_cpu_any_but(dsu_pmu, cpu);
 	/* If there are no active CPUs in the DSU, leave IRQ disabled */
-	if (dst >= nr_cpu_ids)
+	if (dst >= nr_cpu_ids) {
+		raw_spin_unlock(&dsu_pmu->pmu_lock);
 		return 0;
+	}
 
-	perf_pmu_migrate_context(&dsu_pmu->pmu, cpu, dst);
+	/* dst is not in the dying mask, so setting it active blocks parallel teardowns */
 	dsu_pmu_set_active_cpu(dst, dsu_pmu);
+	raw_spin_unlock(&dsu_pmu->pmu_lock);
+	perf_pmu_migrate_context(&dsu_pmu->pmu, cpu, dst);
 
 	return 0;
 }
@@ -858,6 +865,7 @@ static int __init dsu_pmu_init(void)
 	if (ret < 0)
 		return ret;
 	dsu_pmu_cpuhp_state = ret;
+	cpuhp_set_step_parallel(ret);
 	return platform_driver_register(&dsu_pmu_driver);
 }
 
-- 
2.31.1



* [RFC 08/10] cpuhp: Replace cpumask_any_but(cpu_online_mask, cpu)
  2022-08-22  2:15 [RFC 00/10] arm64/riscv: Introduce fast kexec reboot Pingfan Liu
                   ` (3 preceding siblings ...)
  2022-08-22  2:15 ` [RFC 05/10] perf/arm-dsu: Make dsu_pmu_cpu_teardown() parallel Pingfan Liu
@ 2022-08-22  2:15 ` Pingfan Liu
  2022-08-22  2:15 ` [RFC 10/10] arm64: smp: Make __cpu_disable() parallel Pingfan Liu
  5 siblings, 0 replies; 7+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, dmaengine, linux-fpga, intel-gfx, dri-devel,
	linux-arm-msm, linuxppc-dev, linux-kernel
  Cc: Pingfan Liu, Russell King, Shawn Guo, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, NXP Linux Team,
	Fenghua Yu, Dave Jiang, Vinod Koul, Wu Hao, Tom Rix,
	Moritz Fischer, Xu Yilun, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, David Airlie, Daniel Vetter,
	Will Deacon, Mark Rutland, Frank Li, Shaokun Zhang, Qi Liu,
	Andy Gross, Bjorn Andersson, Konrad Dybcio, Khuong Dinh, Li Yang,
	Yury Norov

In the kexec quick reboot path, the dying cpus are still set in
cpu_online_mask. During a cpu's teardown, a subsystem needs to migrate
its duties (IRQ affinity, perf context, and so on) to a cpu that is
really online.

This patch replaces cpumask_any_but(cpu_online_mask, cpu) in the
teardown procedures with cpumask_not_dying_but(cpu_online_mask, cpu);
a sketch of that helper follows.
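
The helper itself is introduced by patch 07, which is not quoted in this
excerpt. A minimal sketch of a plausible implementation, assuming it
simply skips cpus present in cpu_dying_mask (the name follows the patch
title, but the body here is illustrative, not the author's code):

	/* Return any cpu in *srcp other than @cpu that is not dying;
	 * returns >= nr_cpu_ids if no such cpu exists. */
	unsigned int cpumask_not_dying_but(const struct cpumask *srcp,
					   unsigned int cpu)
	{
		unsigned int i;

		cpumask_check(cpu);
		for_each_cpu(i, srcp)
			if (i != cpu && !cpumask_test_cpu(i, cpu_dying_mask))
				break;
		return i;
	}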

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: Wu Hao <hao.wu@intel.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Moritz Fischer <mdf@kernel.org>
Cc: Xu Yilun <yilun.xu@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Frank Li <Frank.li@nxp.com>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Qi Liu <liuqi115@huawei.com>
Cc: Andy Gross <agross@kernel.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Konrad Dybcio <konrad.dybcio@somainline.org>
Cc: Khuong Dinh <khuong@os.amperecomputing.com>
Cc: Li Yang <leoyang.li@nxp.com>
Cc: Yury Norov <yury.norov@gmail.com>
To: linux-arm-kernel@lists.infradead.org
To: dmaengine@vger.kernel.org
To: linux-fpga@vger.kernel.org
To: intel-gfx@lists.freedesktop.org
To: dri-devel@lists.freedesktop.org
To: linux-arm-msm@vger.kernel.org
To: linuxppc-dev@lists.ozlabs.org
To: linux-kernel@vger.kernel.org
---
 arch/arm/mach-imx/mmdc.c                 | 2 +-
 arch/arm/mm/cache-l2x0-pmu.c             | 2 +-
 drivers/dma/idxd/perfmon.c               | 2 +-
 drivers/fpga/dfl-fme-perf.c              | 2 +-
 drivers/gpu/drm/i915/i915_pmu.c          | 2 +-
 drivers/perf/arm-cci.c                   | 2 +-
 drivers/perf/arm-ccn.c                   | 2 +-
 drivers/perf/arm-cmn.c                   | 4 ++--
 drivers/perf/arm_dmc620_pmu.c            | 2 +-
 drivers/perf/arm_dsu_pmu.c               | 2 +-
 drivers/perf/arm_smmuv3_pmu.c            | 2 +-
 drivers/perf/fsl_imx8_ddr_perf.c         | 2 +-
 drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +-
 drivers/perf/marvell_cn10k_tad_pmu.c     | 2 +-
 drivers/perf/qcom_l2_pmu.c               | 2 +-
 drivers/perf/qcom_l3_pmu.c               | 2 +-
 drivers/perf/xgene_pmu.c                 | 2 +-
 drivers/soc/fsl/qbman/bman_portal.c      | 2 +-
 drivers/soc/fsl/qbman/qman_portal.c      | 2 +-
 19 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index af12668d0bf5..a109a7ea8613 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -220,7 +220,7 @@ static int mmdc_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (!cpumask_test_and_clear_cpu(cpu, &pmu_mmdc->cpu))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index 993fefdc167a..1b0037ef7fa5 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -428,7 +428,7 @@ static int l2x0_pmu_offline_cpu(unsigned int cpu)
 	if (!cpumask_test_and_clear_cpu(cpu, &pmu_cpu))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
index d73004f47cf4..f3f1ccb55f73 100644
--- a/drivers/dma/idxd/perfmon.c
+++ b/drivers/dma/idxd/perfmon.c
@@ -528,7 +528,7 @@ static int perf_event_cpu_offline(unsigned int cpu, struct hlist_node *node)
 	if (!cpumask_test_and_clear_cpu(cpu, &perfmon_dsa_cpu_mask))
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 
 	/* migrate events if there is a valid target */
 	if (target < nr_cpu_ids)
diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
index 587c82be12f7..57804f28357e 100644
--- a/drivers/fpga/dfl-fme-perf.c
+++ b/drivers/fpga/dfl-fme-perf.c
@@ -948,7 +948,7 @@ static int fme_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != priv->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 958b37123bf1..f866f9223492 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1068,7 +1068,7 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
 		return 0;
 
 	if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
-		target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
+		target = cpumask_not_dying_but(topology_sibling_cpumask(cpu), cpu);
 
 		/* Migrate events if there is a valid target */
 		if (target < nr_cpu_ids) {
diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 03b1309875ae..481da937fb9d 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1447,7 +1447,7 @@ static int cci_pmu_offline_cpu(unsigned int cpu)
 	if (!g_cci_pmu || cpu != g_cci_pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 728d13d8e98a..573d6906ec9b 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -1205,7 +1205,7 @@ static int arm_ccn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 
 	if (cpu != dt->cpu)
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 	perf_pmu_migrate_context(&dt->pmu, cpu, target);
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index 80d8309652a4..1847182a1ed3 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1787,9 +1787,9 @@ static int arm_cmn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_no
 	node = dev_to_node(cmn->dev);
 	if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) &&
 	    cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
-		target = cpumask_any(&mask);
+		target = cpumask_not_dying_but(&mask, cpu);
 	else
-		target = cpumask_any_but(cpu_online_mask, cpu);
+		target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target < nr_cpu_ids)
 		arm_cmn_migrate(cmn, target);
 	return 0;
diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
index 280a6ae3e27c..3a0a2bb92e12 100644
--- a/drivers/perf/arm_dmc620_pmu.c
+++ b/drivers/perf/arm_dmc620_pmu.c
@@ -611,7 +611,7 @@ static int dmc620_pmu_cpu_teardown(unsigned int cpu,
 	if (cpu != irq->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index aa9f4393ff0c..e19ce0406b02 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -236,7 +236,7 @@ static int dsu_pmu_get_online_cpu_any_but(struct dsu_pmu *dsu_pmu, int cpu)
 
 	cpumask_and(&online_supported,
 			 &dsu_pmu->associated_cpus, cpu_online_mask);
-	return cpumask_any_but(&online_supported, cpu);
+	return cpumask_not_dying_but(&online_supported, cpu);
 }
 
 static inline bool dsu_pmu_counter_valid(struct dsu_pmu *dsu_pmu, u32 idx)
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 00d4c45a8017..827315d31056 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -640,7 +640,7 @@ static int smmu_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != smmu_pmu->on_cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c
index 8e058e08fe81..4e0276fc1548 100644
--- a/drivers/perf/fsl_imx8_ddr_perf.c
+++ b/drivers/perf/fsl_imx8_ddr_perf.c
@@ -664,7 +664,7 @@ static int ddr_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index fbc8a93d5eac..8c39da8f4b3c 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -518,7 +518,7 @@ int hisi_uncore_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	/* Choose a new CPU to migrate ownership of the PMU to */
 	cpumask_and(&pmu_online_cpus, &hisi_pmu->associated_cpus,
 		    cpu_online_mask);
-	target = cpumask_any_but(&pmu_online_cpus, cpu);
+	target = cpumask_not_dying_but(&pmu_online_cpus, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c b/drivers/perf/marvell_cn10k_tad_pmu.c
index 69c3050a4348..268e3288893d 100644
--- a/drivers/perf/marvell_cn10k_tad_pmu.c
+++ b/drivers/perf/marvell_cn10k_tad_pmu.c
@@ -387,7 +387,7 @@ static int tad_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	if (cpu != pmu->cpu)
 		return 0;
 
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index 30234c261b05..8823d0bb6476 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -822,7 +822,7 @@ static int l2cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 	/* Any other CPU for this cluster which is still online */
 	cpumask_and(&cluster_online_cpus, &cluster->cluster_cpus,
 		    cpu_online_mask);
-	target = cpumask_any_but(&cluster_online_cpus, cpu);
+	target = cpumask_not_dying_but(&cluster_online_cpus, cpu);
 	if (target >= nr_cpu_ids) {
 		disable_irq(cluster->irq);
 		return 0;
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 1ff2ff6582bf..ba26b2fa0736 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -718,7 +718,7 @@ static int qcom_l3_cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *no
 
 	if (!cpumask_test_and_clear_cpu(cpu, &l3pmu->cpumask))
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 	perf_pmu_migrate_context(&l3pmu->pmu, cpu, target);
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 0c32dffc7ede..069eb0a0d3ba 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -1804,7 +1804,7 @@ static int xgene_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 
 	if (!cpumask_test_and_clear_cpu(cpu, &xgene_pmu->cpu))
 		return 0;
-	target = cpumask_any_but(cpu_online_mask, cpu);
+	target = cpumask_not_dying_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		return 0;
 
diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
index 4d7b9caee1c4..8ebcf87e7d06 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -67,7 +67,7 @@ static int bman_offline_cpu(unsigned int cpu)
 		return 0;
 
 	/* use any other online CPU */
-	cpu = cpumask_any_but(cpu_online_mask, cpu);
+	cpu = cpumask_not_dying_but(cpu_online_mask, cpu);
 	irq_set_affinity(pcfg->irq, cpumask_of(cpu));
 	return 0;
 }
diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index e23b60618c1a..3807a8285ced 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -148,7 +148,7 @@ static int qman_offline_cpu(unsigned int cpu)
 		pcfg = qman_get_qm_portal_config(p);
 		if (pcfg) {
 			/* select any other online CPU */
-			cpu = cpumask_any_but(cpu_online_mask, cpu);
+			cpu = cpumask_not_dying_but(cpu_online_mask, cpu);
 			irq_set_affinity(pcfg->irq, cpumask_of(cpu));
 			qman_portal_update_sdest(pcfg, cpu);
 		}
-- 
2.31.1



* [RFC 10/10] arm64: smp: Make __cpu_disable() parallel
  2022-08-22  2:15 [RFC 00/10] arm64/riscv: Introduce fast kexec reboot Pingfan Liu
                   ` (4 preceding siblings ...)
  2022-08-22  2:15 ` [RFC 08/10] cpuhp: Replace cpumask_any_but(cpu_online_mask, cpu) Pingfan Liu
@ 2022-08-22  2:15 ` Pingfan Liu
  5 siblings, 0 replies; 7+ messages in thread
From: Pingfan Liu @ 2022-08-22  2:15 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: Pingfan Liu, Catalin Marinas, Will Deacon, Viresh Kumar,
	Sudeep Holla, Phil Auld, Rob Herring, Ben Dooks

On a dying cpu, take_cpu_down() calls __cpu_disable(), which means that
if the teardown path runs in parallel, __cpu_disable() executes
concurrently on several cpus and may corrupt cpu_online_mask etc. unless
extra locking provides protection.

At present, the cpumask is protected by cpu_add_remove_lock, which sits
far above __cpu_disable(). In order to protect __cpu_disable() from
parallel execution on the kexec quick reboot path, introduce a local
lock cpumap_lock.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Ben Dooks <ben-linux@fluff.org>
To: linux-arm-kernel@lists.infradead.org
To: linux-kernel@vger.kernel.org
---
 arch/arm64/kernel/smp.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ffc5d76cf695..fee8879048b0 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -287,6 +287,28 @@ static int op_cpu_disable(unsigned int cpu)
 	return 0;
 }
 
+static DEFINE_SPINLOCK(cpumap_lock);
+
+static void __cpu_clear_maps(unsigned int cpu)
+{
+	/*
+	 * In the case of kexec reboot, the cpu_add_remove_lock mutex cannot provide the protection, so take a local lock
+	 */
+	if (kexec_in_progress)
+		spin_lock(&cpumap_lock);
+	remove_cpu_topology(cpu);
+	numa_remove_cpu(cpu);
+
+	/*
+	 * Take this CPU offline.  Once we clear this, we can't return,
+	 * and we must not schedule until we're ready to give up the cpu.
+	 */
+	set_cpu_online(cpu, false);
+	if (kexec_in_progress)
+		spin_unlock(&cpumap_lock);
+
+}
+
 /*
  * __cpu_disable runs on the processor to be shutdown.
  */
@@ -299,14 +321,7 @@ int __cpu_disable(void)
 	if (ret)
 		return ret;
 
-	remove_cpu_topology(cpu);
-	numa_remove_cpu(cpu);
-
-	/*
-	 * Take this CPU offline.  Once we clear this, we can't return,
-	 * and we must not schedule until we're ready to give up the cpu.
-	 */
-	set_cpu_online(cpu, false);
+	__cpu_clear_maps(cpu);
 	ipi_teardown(cpu);
 
 	/*
-- 
2.31.1

