* [PATCH v2 0/2] Fix hibernate on SMP spin-table systems
@ 2016-06-22  9:06 James Morse
From: James Morse @ 2016-06-22  9:06 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,

These two patches prevent hibernate on systems that use spin-tables and
have multiple CPUs. They also wire up 'cpus_stuck_in_kernel', which was
added for v4.7.

Prior to 44dbcc93ab67 ("arm64: Fix behavior of maxcpus=N"), we would
bring all the CPUs we would ever have up during boot. On a system with
spin tables and multiple CPUs the core hibernate code would prevent
hibernation because it can't disable secondary CPUs.

After 44dbcc93ab67, when we boot with 'maxcpus=1', we no longer bring
all the CPUs up, but we do move them into the secondary_holding_pen.
Resuming from hibernate will overwrite the secondary_holding_pen,
potentially releasing the secondary CPUs. If the kernel has been
loaded at a different physical address over hibernate and resume
the secondary_holding_pen may be at a different location after resume.
The core code can't help us with this, because these CPUs don't show
up in 'num_online_cpus()'.

These two patches fix the problem by detecting multiple 'possible cpus'
that we have no mechanism to take offline and preventing hibernate [2].
This only happens for spin-table systems with multiple CPUs.

Kexec needs the same checks, so the 'or spin-tables' logic[0] got added
to the helper function.
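
As an illustration only, the decision made by the helper can be modelled
in user-space C. This is a sketch, not the kernel code: the name
model_cpus_are_stuck() and its three parameters are assumptions standing
in for num_possible_cpus(), the presence of a cpu_die callback, and the
cpus_stuck_in_kernel counter.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the helper's logic.
 *   possible_cpus: stands in for num_possible_cpus()
 *   have_cpu_die:  stands in for "the boot method provides cpu_die"
 *   stuck_count:   stands in for the cpus_stuck_in_kernel counter
 */
bool model_cpus_are_stuck(unsigned int possible_cpus, bool have_cpu_die,
			  int stuck_count)
{
	/* More than one CPU and no way to take the others offline. */
	bool smp_spin_tables = possible_cpus > 1 && !have_cpu_die;

	return stuck_count != 0 || smp_spin_tables;
}

/* The cases discussed in this cover letter, written as checks. */
void model_examples(void)
{
	/* SMP spin-table system, no cpu_die: hibernate must be refused. */
	assert(model_cpus_are_stuck(4, false, 0));
	/* SMP system whose boot method provides cpu_die: allowed. */
	assert(!model_cpus_are_stuck(4, true, 0));
	/* Single-CPU spin-table system: allowed. */
	assert(!model_cpus_are_stuck(1, false, 0));
	/* A CPU parked after failing to come online: refused. */
	assert(model_cpus_are_stuck(4, true, 1));
}
```

Note that the model looks at possible CPUs rather than online CPUs, so
booting with maxcpus=1 does not change the answer.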

(This problem was spotted on another thread [1])

Changes since v1:
 * Fixed the comment in smp.h (less verbose, more precise).
 * Improved readability with a have_cpu_die() in smp.c

[v1] http://www.spinics.net/lists/arm-kernel/msg512142.html

[0] http://www.spinics.net/lists/arm-kernel/msg510097.html
[1] http://www.spinics.net/lists/arm-kernel/msg511880.html
[2] Failing to hibernate an SMP spin-tables system booted with maxcpus=1

---------------------%<---------------------
root@localhost:~# echo disk > /sys/power/state
[12248.197718] PM: Syncing filesystems ... done.
[12248.197727] Freezing user space processes ... (elapsed 0.001 seconds) done.
[12248.203197] PM: Preallocating image memory... done (allocated 50769 pages)
[12261.838699] PM: Allocated 203076 kbytes in 13.63 seconds (14.89 MB/s)
[12261.838760] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[12261.840732] Suspending console(s) (use no_console_suspend to debug)
[12261.842540] PM: freeze of devices complete after 1.732 msecs
[12261.843897] PM: late freeze of devices complete after 1.333 msecs
[12261.845191] PM: noirq freeze of devices complete after 1.272 msecs
[12261.845201] Disabling non-boot CPUs ...
[12261.845206] hibernate: Can't hibernate: no mechanism to offline secondary CPUs.
[12261.845206] PM: Error -16 creating hibernation image
[12261.846140] PM: noirq recover of devices complete after 0.908 msecs
[12261.847160] PM: early recover of devices complete after 0.940 msecs
[12262.765886] PM: recover of devices complete after 1.452 msecs
[12262.769191] Restarting tasks ... done.
-bash: echo: write error: Device or resource busy
root@localhost:~#
---------------------%<---------------------
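
For readers matching the log to the code, here is a rough user-space
sketch of how the refusal propagates. The function name
model_arch_suspend() and its parameter are hypothetical; only the
-EBUSY return value comes from the patch. On Linux EBUSY is 16, which is
the "Error -16" in the log and the "Device or resource busy" reported by
the shell.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the guard at the top of the arch suspend path.
 * 'cpus_stuck' stands in for the result of the new helper function.
 */
int model_arch_suspend(bool cpus_stuck)
{
	if (cpus_stuck)
		return -EBUSY;	/* refuse before any state is saved */

	return 0;		/* the real code would carry on and save state */
}

/* Checks tying the model back to the log above. */
void model_suspend_examples(void)
{
	assert(model_arch_suspend(true) == -EBUSY);
	assert(model_arch_suspend(false) == 0);
}
```

Returning early means nothing needs to be unwound in the arch code; the
"recover of devices" lines in the log come from the core PM code
handling the error.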
James Morse (2):
  arm64: smp: Add function to determine if cpus are stuck in the kernel
  arm64: hibernate: Don't hibernate on systems with stuck CPUs

 arch/arm64/include/asm/smp.h  | 12 ++++++++++++
 arch/arm64/kernel/hibernate.c |  6 ++++++
 arch/arm64/kernel/smp.c       | 18 ++++++++++++++++++
 3 files changed, 36 insertions(+)

-- 
2.8.0.rc3

* [PATCH v2 1/2] arm64: smp: Add function to determine if cpus are stuck in the kernel
@ 2016-06-22  9:06 ` James Morse
From: James Morse @ 2016-06-22  9:06 UTC (permalink / raw)
  To: linux-arm-kernel

kernel/smp.c has a fancy counter that keeps track of the number of CPUs
it marked as not-present and left in cpu_park_loop(). If there are any
CPUs spinning in here, features like kexec or hibernate may release them
by overwriting this memory.

This problem also occurs on machines using spin-tables to release
secondary cores.
After commit 44dbcc93ab67 ("arm64: Fix behavior of maxcpus=N")
we bring all known cpus into the secondary holding pen, meaning this
memory can't be re-used by kexec or hibernate.

Add a function cpus_are_stuck_in_kernel() to determine if either of these
cases has occurred.

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v1:
 * Fixed the comment in smp.h (less verbose, more precise).
 * Improved readability with a have_cpu_die() in smp.c

 arch/arm64/include/asm/smp.h | 12 ++++++++++++
 arch/arm64/kernel/smp.c      | 18 ++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 433e50405274..022644704a93 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -124,6 +124,18 @@ static inline void cpu_panic_kernel(void)
 	cpu_park_loop();
 }
 
+/*
+ * If a secondary CPU enters the kernel but fails to come online,
+ * (e.g. due to mismatched features), and cannot exit the kernel,
+ * we increment cpus_stuck_in_kernel and leave the CPU in a
+ * quiescent loop within the kernel text. The memory containing
+ * this loop must not be re-used for anything else as the 'stuck'
+ * core is executing it.
+ *
+ * This function is used to inhibit features like kexec and hibernate.
+ */
+bool cpus_are_stuck_in_kernel(void);
+
 #endif /* ifndef __ASSEMBLY__ */
 
 #endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 678e0842cb3b..62ff3c0622e2 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -909,3 +909,21 @@ int setup_profiling_timer(unsigned int multiplier)
 {
 	return -EINVAL;
 }
+
+static bool have_cpu_die(void)
+{
+#ifdef CONFIG_HOTPLUG_CPU
+	int any_cpu = raw_smp_processor_id();
+
+	if (cpu_ops[any_cpu]->cpu_die)
+		return true;
+#endif
+	return false;
+}
+
+bool cpus_are_stuck_in_kernel(void)
+{
+	bool smp_spin_tables = (num_possible_cpus() > 1 && !have_cpu_die());
+
+	return !!cpus_stuck_in_kernel || smp_spin_tables;
+}
-- 
2.8.0.rc3

* [PATCH v2 2/2] arm64: hibernate: Don't hibernate on systems with stuck CPUs
@ 2016-06-22  9:06 ` James Morse
From: James Morse @ 2016-06-22  9:06 UTC (permalink / raw)
  To: linux-arm-kernel

Hibernate relies on cpu hotplug to prevent secondary cores executing
the kernel text while it is being restored.

Add a call to cpus_are_stuck_in_kernel() to determine if there are
CPUs not counted by 'num_online_cpus()', and prevent hibernate in this
case.

Fixes: 82869ac57b5 ("arm64: kernel: Add support for hibernate/suspend-to-disk")
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/kernel/hibernate.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index f8df75d740f4..21ab5df9fa76 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -33,6 +33,7 @@
 #include <asm/pgtable.h>
 #include <asm/pgtable-hwdef.h>
 #include <asm/sections.h>
+#include <asm/smp.h>
 #include <asm/suspend.h>
 #include <asm/virt.h>
 
@@ -236,6 +237,11 @@ int swsusp_arch_suspend(void)
 	unsigned long flags;
 	struct sleep_stack_data state;
 
+	if (cpus_are_stuck_in_kernel()) {
+		pr_err("Can't hibernate: no mechanism to offline secondary CPUs.\n");
+		return -EBUSY;
+	}
+
 	local_dbg_save(flags);
 
 	if (__cpu_suspend_enter(&state)) {
-- 
2.8.0.rc3

* [PATCH v2 2/2] arm64: hibernate: Don't hibernate on systems with stuck CPUs
@ 2016-06-22 10:02   ` Mark Rutland
From: Mark Rutland @ 2016-06-22 10:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 22, 2016 at 10:06:13AM +0100, James Morse wrote:
> Hibernate relies on cpu hotplug to prevent secondary cores executing
> the kernel text while it is being restored.
> 
> Add a call to cpus_are_stuck_in_kernel() to determine if there are
> CPUs not counted by 'num_online_cpus()', and prevent hibernate in this
> case.
> 
> Fixes: 82869ac57b5 ("arm64: kernel: Add support for hibernate/suspend-to-disk")
> Signed-off-by: James Morse <james.morse@arm.com>

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  arch/arm64/kernel/hibernate.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
> index f8df75d740f4..21ab5df9fa76 100644
> --- a/arch/arm64/kernel/hibernate.c
> +++ b/arch/arm64/kernel/hibernate.c
> @@ -33,6 +33,7 @@
>  #include <asm/pgtable.h>
>  #include <asm/pgtable-hwdef.h>
>  #include <asm/sections.h>
> +#include <asm/smp.h>
>  #include <asm/suspend.h>
>  #include <asm/virt.h>
>  
> @@ -236,6 +237,11 @@ int swsusp_arch_suspend(void)
>  	unsigned long flags;
>  	struct sleep_stack_data state;
>  
> +	if (cpus_are_stuck_in_kernel()) {
> +		pr_err("Can't hibernate: no mechanism to offline secondary CPUs.\n");
> +		return -EBUSY;
> +	}
> +
>  	local_dbg_save(flags);
>  
>  	if (__cpu_suspend_enter(&state)) {
> -- 
> 2.8.0.rc3
> 
