public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: Linux crashes when trying to online secondary core
       [not found]   ` <8a021a90-e69e-f38b-c8df-ea8963f3973f@free.fr>
@ 2016-12-15 10:35     ` Mason
  2016-12-15 11:16       ` [tip:smp/urgent] clocksource/dummy_timer: Move hotplug callback after the real timers tip-bot for Thomas Gleixner
                         ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Mason @ 2016-12-15 10:35 UTC (permalink / raw)
  To: Linux ARM, LKML
  Cc: Thomas Gleixner, Mark Rutland, Anna-Maria Gleixner,
	Richard Cochran, Sebastian Andrzej Siewior, Daniel Lezcano,
	Peter Zijlstra, Ingo Molnar, Sebastian Frias, Thibaud Cornic,
	Robin Murphy

On 14/12/2016 18:47, Mason wrote:

> On 14/12/2016 18:08, Thomas Gleixner wrote:
> 
>> On Wed, 14 Dec 2016, Mason wrote:
>>
>>> I'm seeing Linux v4.9 crash (dereferencing NULL) when I try to online
>>> the secondary core, after putting it offline.
>>
>> Does the patch below fix the issue?
>>
>> Thanks,
>>
>> 	tglx
>> 	
>> 8<---------------
>>
>> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
>> index 22acee76cf4c..2594c287b078 100644
>> --- a/include/linux/cpuhotplug.h
>> +++ b/include/linux/cpuhotplug.h
>> @@ -101,7 +101,6 @@ enum cpuhp_state {
>>  	CPUHP_AP_ARM_L2X0_STARTING,
>>  	CPUHP_AP_ARM_ARCH_TIMER_STARTING,
>>  	CPUHP_AP_ARM_GLOBAL_TIMER_STARTING,
>> -	CPUHP_AP_DUMMY_TIMER_STARTING,
>>  	CPUHP_AP_JCORE_TIMER_STARTING,
>>  	CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING,
>>  	CPUHP_AP_ARM_TWD_STARTING,
>> @@ -111,6 +110,7 @@ enum cpuhp_state {
>>  	CPUHP_AP_MARCO_TIMER_STARTING,
>>  	CPUHP_AP_MIPS_GIC_TIMER_STARTING,
>>  	CPUHP_AP_ARC_TIMER_STARTING,
>> +	CPUHP_AP_DUMMY_TIMER_STARTING,
>>  	CPUHP_AP_KVM_STARTING,
>>  	CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
>>  	CPUHP_AP_KVM_ARM_VGIC_STARTING,
> 
> $ patch -p1 < tglx.patch 
> patching file include/linux/cpuhotplug.h
> Hunk #1 succeeded at 80 (offset -21 lines).
> Hunk #2 succeeded at 89 (offset -21 lines).
> 
> It does seem to fix the problem:
> 
> # echo 0 > /sys/devices/system/cpu/cpu1/online
> SMC called with a0=0x00[000001 a1=0x00000121 a2=0x00000005  a3 =0xc01189b4  0x00000121
> [1][flow/suspend3.c:39] CPU 1 die: jumping6 to. post-boot WFE
> 402826] CPU1: shutdown
> SMC called with a0=0x00000001 a1=0x00000122 a2=0x00000000 a3=0x00000000 0x00000122
> [0][flow/suspend.c:82] Killing core1
> armor+++ armor: core 1 booted, entering wfe...
> # echo 1 > /sys/devices/system/cpu/cpu1/online
> [  215.692700] tango_boot_secondary from __cpu_up
> SMC called with a0=0x80101500 a1=0x00000105 a2=0x00000000 a3=0x00000000 0x00000105
> [  215.704494] tango_set_aux_boot_addr=0
> SMC called with a0=0x00000001 a1=0x00000104 a2=0x00000000 a3=0x00000000 0x00000104
> [0][flow/smc_handler.c:127] waking up CPU1
> [  215.719308] tango_start_aux_core=0
> 
> 
> I reverted your patch, and the kernel blows up again.
> 
> So what's the problem, and how does your patch solve it?

Link to the original report:
https://marc.info/?l=linux-arm-kernel&m=148173152524746&w=2

Forgot to CC Robin Murphy, who had provided valuable input
in similar circumstances a few months back.

Also add LKML, since this doesn't appear to be ARM-specific.

Do I need to specify which device tree I was using?

Regards.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [tip:smp/urgent] clocksource/dummy_timer: Move hotplug callback after the real timers
  2016-12-15 10:35     ` Linux crashes when trying to online secondary core Mason
@ 2016-12-15 11:16       ` tip-bot for Thomas Gleixner
  2016-12-15 11:31       ` [tip:timers/urgent] tick/broadcast: Prevent NULL pointer dereference tip-bot for Thomas Gleixner
  2016-12-15 12:00       ` Linux crashes when trying to online secondary core Mark Rutland
  2 siblings, 0 replies; 4+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-12-15 11:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, slash.tmp, anna-maria, thibaud_cornic, mingo, linux-kernel,
	mark.rutland, robin.murphy, sf84, daniel.lezcano, bigeasy,
	rcochran, hpa

Commit-ID:  9bf11ecce5a2758e5a097c2f3a13d08552d0d6f9
Gitweb:     http://git.kernel.org/tip/9bf11ecce5a2758e5a097c2f3a13d08552d0d6f9
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 15 Dec 2016 12:01:05 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 15 Dec 2016 12:09:20 +0100

clocksource/dummy_timer: Move hotplug callback after the real timers

When the dummy timer callback is invoked before the real timer callbacks,
then it tries to install that timer for the starting CPU. If the platform
does not have a broadcast timer installed the installation fails with a
kernel crash. The crash happens due to a unconditional deference of the non
available broadcast device. This needs to be fixed in the timer core code.

But even when this is fixed in the core code then installing the dummy
timer before the real timers is a pointless exercise.

Move it to the end of the callback list.

Fixes: 00c1d17aab51 ("clocksource/dummy_timer: Convert to hotplug state machine")
Reported-and-tested-by: Mason <slash.tmp@free.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Richard Cochran <rcochran@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Sebastian Frias <sf84@laposte.net>
Cc: Thibaud Cornic <thibaud_cornic@sigmadesigns.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr
---
 include/linux/cpuhotplug.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 22acee7..2ab7bf5 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -101,7 +101,6 @@ enum cpuhp_state {
 	CPUHP_AP_ARM_L2X0_STARTING,
 	CPUHP_AP_ARM_ARCH_TIMER_STARTING,
 	CPUHP_AP_ARM_GLOBAL_TIMER_STARTING,
-	CPUHP_AP_DUMMY_TIMER_STARTING,
 	CPUHP_AP_JCORE_TIMER_STARTING,
 	CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING,
 	CPUHP_AP_ARM_TWD_STARTING,
@@ -115,6 +114,8 @@ enum cpuhp_state {
 	CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
 	CPUHP_AP_KVM_ARM_VGIC_STARTING,
 	CPUHP_AP_KVM_ARM_TIMER_STARTING,
+	/* Must be the last timer callback */
+	CPUHP_AP_DUMMY_TIMER_STARTING,
 	CPUHP_AP_ARM_XEN_STARTING,
 	CPUHP_AP_ARM_CORESIGHT_STARTING,
 	CPUHP_AP_ARM_CORESIGHT4_STARTING,

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [tip:timers/urgent] tick/broadcast: Prevent NULL pointer dereference
  2016-12-15 10:35     ` Linux crashes when trying to online secondary core Mason
  2016-12-15 11:16       ` [tip:smp/urgent] clocksource/dummy_timer: Move hotplug callback after the real timers tip-bot for Thomas Gleixner
@ 2016-12-15 11:31       ` tip-bot for Thomas Gleixner
  2016-12-15 12:00       ` Linux crashes when trying to online secondary core Mark Rutland
  2 siblings, 0 replies; 4+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-12-15 11:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: sf84, anna-maria, bigeasy, slash.tmp, thibaud_cornic,
	linux-kernel, mark.rutland, mingo, rcochran, hpa, robin.murphy,
	tglx, daniel.lezcano

Commit-ID:  c1a9eeb938b5433947e5ea22f89baff3182e7075
Gitweb:     http://git.kernel.org/tip/c1a9eeb938b5433947e5ea22f89baff3182e7075
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 15 Dec 2016 12:10:37 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 15 Dec 2016 12:25:13 +0100

tick/broadcast: Prevent NULL pointer dereference

When a disfunctional timer, e.g. dummy timer, is installed, the tick core
tries to setup the broadcast timer.

If no broadcast device is installed, the kernel crashes with a NULL pointer
dereference in tick_broadcast_setup_oneshot() because the function has no
sanity check.

Reported-by: Mason <slash.tmp@free.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Richard Cochran <rcochran@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Sebastian Frias <sf84@laposte.net>
Cc: Thibaud Cornic <thibaud_cornic@sigmadesigns.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr
---
 kernel/time/tick-broadcast.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f6aae79..d2a20e8 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -871,6 +871,9 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 {
 	int cpu = smp_processor_id();
 
+	if (!bc)
+		return;
+
 	/* Set it up only once ! */
 	if (bc->event_handler != tick_handle_oneshot_broadcast) {
 		int was_periodic = clockevent_state_periodic(bc);

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: Linux crashes when trying to online secondary core
  2016-12-15 10:35     ` Linux crashes when trying to online secondary core Mason
  2016-12-15 11:16       ` [tip:smp/urgent] clocksource/dummy_timer: Move hotplug callback after the real timers tip-bot for Thomas Gleixner
  2016-12-15 11:31       ` [tip:timers/urgent] tick/broadcast: Prevent NULL pointer dereference tip-bot for Thomas Gleixner
@ 2016-12-15 12:00       ` Mark Rutland
  2 siblings, 0 replies; 4+ messages in thread
From: Mark Rutland @ 2016-12-15 12:00 UTC (permalink / raw)
  To: Mason
  Cc: Linux ARM, LKML, Thomas Gleixner, Anna-Maria Gleixner,
	Richard Cochran, Sebastian Andrzej Siewior, Daniel Lezcano,
	Peter Zijlstra, Ingo Molnar, Sebastian Frias, Thibaud Cornic,
	Robin Murphy

On Thu, Dec 15, 2016 at 11:35:12AM +0100, Mason wrote:
> On 14/12/2016 18:47, Mason wrote:
> > On 14/12/2016 18:08, Thomas Gleixner wrote:
> >> Does the patch below fix the issue?

> >> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> >> index 22acee76cf4c..2594c287b078 100644
> >> --- a/include/linux/cpuhotplug.h
> >> +++ b/include/linux/cpuhotplug.h
> >> @@ -101,7 +101,6 @@ enum cpuhp_state {
> >>  	CPUHP_AP_ARM_L2X0_STARTING,
> >>  	CPUHP_AP_ARM_ARCH_TIMER_STARTING,
> >>  	CPUHP_AP_ARM_GLOBAL_TIMER_STARTING,
> >> -	CPUHP_AP_DUMMY_TIMER_STARTING,
> >>  	CPUHP_AP_JCORE_TIMER_STARTING,
> >>  	CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING,
> >>  	CPUHP_AP_ARM_TWD_STARTING,
> >> @@ -111,6 +110,7 @@ enum cpuhp_state {
> >>  	CPUHP_AP_MARCO_TIMER_STARTING,
> >>  	CPUHP_AP_MIPS_GIC_TIMER_STARTING,
> >>  	CPUHP_AP_ARC_TIMER_STARTING,
> >> +	CPUHP_AP_DUMMY_TIMER_STARTING,
> >>  	CPUHP_AP_KVM_STARTING,
> >>  	CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
> >>  	CPUHP_AP_KVM_ARM_VGIC_STARTING,

> > It does seem to fix the problem:

> > I reverted your patch, and the kernel blows up again.
> > 
> > So what's the problem, and how does your patch solve it?
> 
> Link to the original report:
> https://marc.info/?l=linux-arm-kernel&m=148173152524746&w=2
> 
> Forgot to CC Robin Murphy, who had provided valuable input
> in similar circumstances a few months back.
> 
> Also add LKML, since this doesn't appear to be ARM-specific.
> 
> Do I need to specify which device tree I was using?

This is already fixed in the linux-tip tree, with commit messages
describing the fix.

It's specific to a few clocksources, due to their hotplug callbacks
occuring later than the dummy timer. That triggers the bug fixed in:

https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=timers/urgent&id=c1a9eeb938b5433947e5ea22f89baff3182e7075

The relevant timers were fixed in:

https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=smp/urgent&id=9bf11ecce5a2758e5a097c2f3a13d08552d0d6f9

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-12-15 12:02 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <ef972981-3fb6-74a4-cd83-a6629d2dab2a@free.fr>
     [not found] ` <alpine.DEB.2.20.1612141807410.3556@nanos>
     [not found]   ` <8a021a90-e69e-f38b-c8df-ea8963f3973f@free.fr>
2016-12-15 10:35     ` Linux crashes when trying to online secondary core Mason
2016-12-15 11:16       ` [tip:smp/urgent] clocksource/dummy_timer: Move hotplug callback after the real timers tip-bot for Thomas Gleixner
2016-12-15 11:31       ` [tip:timers/urgent] tick/broadcast: Prevent NULL pointer dereference tip-bot for Thomas Gleixner
2016-12-15 12:00       ` Linux crashes when trying to online secondary core Mark Rutland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox