From: Suzuki.Poulose@arm.com (Suzuki K Poulose)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 1/2] arm64: smp: Add function to determine if cpus are stuck in the kernel
Date: Fri, 17 Jun 2016 12:16:59 +0100
Message-ID: <5763DC2B.5030705@arm.com>
In-Reply-To: <20160617102713.GA14524@leverpostej>

On 17/06/16 11:27, Mark Rutland wrote:
> On Fri, Jun 17, 2016 at 10:34:56AM +0100, James Morse wrote:
>> kernel/smp.c has a fancy counter that keeps track of the number of CPUs
>> it marked as not-present and left in cpu_park_loop(). If there are any
>> CPUs spinning in here, features like kexec or hibernate may release them
>> by overwriting this memory.
>>
>> This problem also occurs on machines using spin-tables to release
>> secondary cores.
>> After commit 44dbcc93ab67 ("arm64: Fix behavior of maxcpus=N")
>> we bring all known cpus into the secondary holding pen, but may not bring
>> them up depending on 'maxcpus'. This memory can't be re-used by kexec
>> or hibernate.
>>
>> Add a function cpus_are_stuck_in_kernel() to determine if either of these
>> cases has occurred.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>   arch/arm64/include/asm/smp.h | 20 ++++++++++++++++++++
>>   arch/arm64/kernel/smp.c      | 13 +++++++++++++
>>   2 files changed, 33 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
>> index 433e50405274..4be755bcc07a 100644
>> --- a/arch/arm64/include/asm/smp.h
>> +++ b/arch/arm64/include/asm/smp.h
>> @@ -124,6 +124,26 @@ static inline void cpu_panic_kernel(void)
>>   	cpu_park_loop();
>>   }
>>
>> +/*
>> + * Kernel features such as hibernate and kexec depend on cpu hotplug to know
>> + * they can replace any kernel memory they are not using themselves.
>> + *
>> + * There are two corner cases:
>> + * If a secondary CPU fails to come online, (e.g. due to mismatched features),
>> + * it will try to call cpu_die(). If this fails, it increases the counter
>> + * cpus_stuck_in_kernel and sits in cpu_park_loop(). The memory containing
>> + * this function must not be re-used for anything else as the 'stuck' core
>> + * is executing it.
>
> It might also be stuck in __no_granule_support, if it never made it to C
> code. In that case, the CPU in charge of bringing up that new CPU will
> increment the counter in __cpu_up.

Just to clarify: *in all cases*, it is the CPU in charge of the bring-up
that updates cpus_stuck_in_kernel.
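
The point above can be illustrated with a small userspace model. This is only
a sketch, not the kernel's code: the enum, its values, and model_cpu_up() are
hypothetical names made up for illustration. The one thing it is meant to show
is that the failed secondary merely reports a status, while the CPU performing
the bring-up does the counting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status codes a failed secondary could report (illustrative). */
enum boot_status {
	BOOT_OK,
	BOOT_STUCK_IN_KERNEL,	/* secondary is parked inside kernel text */
	BOOT_KILLED,		/* secondary was successfully powered off */
};

/* Stand-in for the kernel's counter of the same name. */
static int cpus_stuck_in_kernel;

/*
 * Runs on the CPU doing the bring-up. The secondary never touches the
 * counter itself; it only reports a status that is interpreted here.
 */
static int model_cpu_up(enum boot_status reported)
{
	switch (reported) {
	case BOOT_OK:
		return 0;
	case BOOT_STUCK_IN_KERNEL:
		/* The secondary could not leave the kernel: record it. */
		cpus_stuck_in_kernel++;
		return -1;
	case BOOT_KILLED:
		/* Bring-up failed, but no kernel text is still being executed. */
		return -1;
	}
	assert(false && "unknown boot status");
	return -1;
}
```

In this model only BOOT_STUCK_IN_KERNEL changes the counter; a secondary that
was powered off cleanly fails the bring-up without pinning any memory.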

>
> There might be other reasons we do something like that in future, so it
> might be better to be a little less specific and say something like:
>
> 	If a secondary CPU enters the kernel but fails to come online,
> 	(e.g. due to mismatched features), and cannot exit the kernel,
> 	we increment cpus_stuck_in_kernel and leave the CPU in a
> 	quiescent loop within the kernel text. The memory containing
> 	this loop must not be re-used for anything else as the 'stuck'
> 	core is executing it.

Agree.

>> +bool cpus_are_stuck_in_kernel(void);
>> +
>>   #endif /* ifndef __ASSEMBLY__ */
>>
>>   #endif /* ifndef __ASM_SMP_H */
>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> index 678e0842cb3b..e197502f94fd 100644
>> --- a/arch/arm64/kernel/smp.c
>> +++ b/arch/arm64/kernel/smp.c
>> @@ -909,3 +909,16 @@ int setup_profiling_timer(unsigned int multiplier)
>>   {
>>   	return -EINVAL;
>>   }
>> +
>> +bool cpus_are_stuck_in_kernel(void)
>> +{
>> +	bool ret = !!cpus_stuck_in_kernel;
>> +#ifdef CONFIG_HOTPLUG_CPU
>> +	int any_cpu = raw_smp_processor_id();
>> +
>> +	if (num_possible_cpus() > 1 && !cpu_ops[any_cpu]->cpu_die)
>> +		ret = true;
>> +#endif

Minor nit: moving the cpu_die check into a static inline function with
an obvious name might make the code read better. Something like this, perhaps?

	return !!cpus_stuck_in_kernel || !have_cpu_die();
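
The nit above can be sketched as a standalone userspace model. This is a
sketch under stated assumptions rather than the final kernel code: struct
cpu_operations is reduced to a single callback, and nr_possible_cpus,
current_cpu_ops and the two example ops tables are hypothetical stand-ins for
num_possible_cpus() and cpu_ops[raw_smp_processor_id()]. The sketch keeps the
num_possible_cpus() > 1 condition from the patch and leaves out the
CONFIG_HOTPLUG_CPU guard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-in for the kernel's struct cpu_operations. */
struct cpu_operations {
	void (*cpu_die)(unsigned int cpu);
};

/* Example callback so one ops table can advertise cpu_die support. */
static void example_cpu_die(unsigned int cpu)
{
	(void)cpu;
}

/* Hypothetical ops tables: one with cpu_die, one without (spin-table like). */
static const struct cpu_operations ops_with_die = { .cpu_die = example_cpu_die };
static const struct cpu_operations ops_without_die = { .cpu_die = NULL };

/* Stand-ins for kernel state; the names are assumptions of this sketch. */
static int cpus_stuck_in_kernel;
static unsigned int nr_possible_cpus = 1;
static const struct cpu_operations *current_cpu_ops = &ops_with_die;

/* True if the current CPU's ops can take a CPU out of the kernel. */
static inline bool have_cpu_die(void)
{
	assert(current_cpu_ops != NULL);
	return current_cpu_ops->cpu_die != NULL;
}

bool cpus_are_stuck_in_kernel(void)
{
	/* Without cpu_die, secondaries on an SMP system can never be released. */
	bool smp_without_die = nr_possible_cpus > 1 && !have_cpu_die();

	return cpus_stuck_in_kernel || smp_without_die;
}
```

With the check hidden behind have_cpu_die(), the function body reads as the
two corner cases from the comment in asm/smp.h: a counted stuck CPU, or an SMP
system whose enable method has no cpu_die.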

Either way,

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Cheers
Suzuki

>> +
>> +	return ret;
>> +}
>> --
>> 2.8.0.rc3
>>
>
