* Re: Linux crashes when trying to online secondary core
[not found] ` <8a021a90-e69e-f38b-c8df-ea8963f3973f@free.fr>
@ 2016-12-15 10:35 ` Mason
2016-12-15 11:16 ` [tip:smp/urgent] clocksource/dummy_timer: Move hotplug callback after the real timers tip-bot for Thomas Gleixner
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Mason @ 2016-12-15 10:35 UTC (permalink / raw)
To: Linux ARM, LKML
Cc: Thomas Gleixner, Mark Rutland, Anna-Maria Gleixner,
Richard Cochran, Sebastian Andrzej Siewior, Daniel Lezcano,
Peter Zijlstra, Ingo Molnar, Sebastian Frias, Thibaud Cornic,
Robin Murphy
On 14/12/2016 18:47, Mason wrote:
> On 14/12/2016 18:08, Thomas Gleixner wrote:
>
>> On Wed, 14 Dec 2016, Mason wrote:
>>
>>> I'm seeing Linux v4.9 crash (dereferencing NULL) when I try to online
>>> the secondary core, after putting it offline.
>>
>> Does the patch below fix the issue?
>>
>> Thanks,
>>
>> tglx
>>
>> 8<---------------
>>
>> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
>> index 22acee76cf4c..2594c287b078 100644
>> --- a/include/linux/cpuhotplug.h
>> +++ b/include/linux/cpuhotplug.h
>> @@ -101,7 +101,6 @@ enum cpuhp_state {
>> CPUHP_AP_ARM_L2X0_STARTING,
>> CPUHP_AP_ARM_ARCH_TIMER_STARTING,
>> CPUHP_AP_ARM_GLOBAL_TIMER_STARTING,
>> - CPUHP_AP_DUMMY_TIMER_STARTING,
>> CPUHP_AP_JCORE_TIMER_STARTING,
>> CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING,
>> CPUHP_AP_ARM_TWD_STARTING,
>> @@ -111,6 +110,7 @@ enum cpuhp_state {
>> CPUHP_AP_MARCO_TIMER_STARTING,
>> CPUHP_AP_MIPS_GIC_TIMER_STARTING,
>> CPUHP_AP_ARC_TIMER_STARTING,
>> + CPUHP_AP_DUMMY_TIMER_STARTING,
>> CPUHP_AP_KVM_STARTING,
>> CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
>> CPUHP_AP_KVM_ARM_VGIC_STARTING,
>
> $ patch -p1 < tglx.patch
> patching file include/linux/cpuhotplug.h
> Hunk #1 succeeded at 80 (offset -21 lines).
> Hunk #2 succeeded at 89 (offset -21 lines).
>
> It does seem to fix the problem:
>
> # echo 0 > /sys/devices/system/cpu/cpu1/online
> SMC called with a0=0x00000001 a1=0x00000121 a2=0x00000005 a3=0xc01189b4 0x00000121
> [1][flow/suspend.c:39] CPU 1 die: jumping to post-boot WFE
> [ 36.402826] CPU1: shutdown
> SMC called with a0=0x00000001 a1=0x00000122 a2=0x00000000 a3=0x00000000 0x00000122
> [0][flow/suspend.c:82] Killing core1
> armor+++ armor: core 1 booted, entering wfe...
> # echo 1 > /sys/devices/system/cpu/cpu1/online
> [ 215.692700] tango_boot_secondary from __cpu_up
> SMC called with a0=0x80101500 a1=0x00000105 a2=0x00000000 a3=0x00000000 0x00000105
> [ 215.704494] tango_set_aux_boot_addr=0
> SMC called with a0=0x00000001 a1=0x00000104 a2=0x00000000 a3=0x00000000 0x00000104
> [0][flow/smc_handler.c:127] waking up CPU1
> [ 215.719308] tango_start_aux_core=0
>
>
> I reverted your patch, and the kernel blows up again.
>
> So what's the problem, and how does your patch solve it?
Link to the original report:
https://marc.info/?l=linux-arm-kernel&m=148173152524746&w=2
Forgot to CC Robin Murphy, who had provided valuable input
in similar circumstances a few months back.
Also add LKML, since this doesn't appear to be ARM-specific.
Do I need to specify which device tree I was using?
Regards.
* [tip:smp/urgent] clocksource/dummy_timer: Move hotplug callback after the real timers
From: tip-bot for Thomas Gleixner @ 2016-12-15 11:16 UTC (permalink / raw)
To: linux-tip-commits
Cc: tglx, slash.tmp, anna-maria, thibaud_cornic, mingo, linux-kernel,
mark.rutland, robin.murphy, sf84, daniel.lezcano, bigeasy,
rcochran, hpa
Commit-ID: 9bf11ecce5a2758e5a097c2f3a13d08552d0d6f9
Gitweb: http://git.kernel.org/tip/9bf11ecce5a2758e5a097c2f3a13d08552d0d6f9
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 15 Dec 2016 12:01:05 +0100
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 15 Dec 2016 12:09:20 +0100
clocksource/dummy_timer: Move hotplug callback after the real timers
When the dummy timer callback is invoked before the real timer callbacks,
then it tries to install that timer for the starting CPU. If the platform
does not have a broadcast timer installed the installation fails with a
kernel crash. The crash happens due to an unconditional dereference of the
unavailable broadcast device. This needs to be fixed in the timer core code.
But even when this is fixed in the core code then installing the dummy
timer before the real timers is a pointless exercise.
Move it to the end of the callback list.
Fixes: 00c1d17aab51 ("clocksource/dummy_timer: Convert to hotplug state machine")
Reported-and-tested-by: Mason <slash.tmp@free.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Richard Cochran <rcochran@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Frias <sf84@laposte.net>
Cc: Thibaud Cornic <thibaud_cornic@sigmadesigns.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr
---
include/linux/cpuhotplug.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 22acee7..2ab7bf5 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -101,7 +101,6 @@ enum cpuhp_state {
CPUHP_AP_ARM_L2X0_STARTING,
CPUHP_AP_ARM_ARCH_TIMER_STARTING,
CPUHP_AP_ARM_GLOBAL_TIMER_STARTING,
- CPUHP_AP_DUMMY_TIMER_STARTING,
CPUHP_AP_JCORE_TIMER_STARTING,
CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING,
CPUHP_AP_ARM_TWD_STARTING,
@@ -115,6 +114,8 @@ enum cpuhp_state {
CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
CPUHP_AP_KVM_ARM_VGIC_STARTING,
CPUHP_AP_KVM_ARM_TIMER_STARTING,
+ /* Must be the last timer callback */
+ CPUHP_AP_DUMMY_TIMER_STARTING,
CPUHP_AP_ARM_XEN_STARTING,
CPUHP_AP_ARM_CORESIGHT_STARTING,
CPUHP_AP_ARM_CORESIGHT4_STARTING,
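
The ordering argument in the commit message above can be illustrated with a
small stand-alone C model. This is only a sketch and not kernel code: the
struct model_cpu type, the two callbacks and run_starting() are hypothetical
names made up for the illustration. The one thing taken from the kernel is
that the hotplug core invokes the STARTING callbacks in ascending enum order.
The model assumes that the dummy timer only gets installed, and therefore
only asks for a broadcast device, when no real timer has registered yet on
the starting CPU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU bookkeeping for the model; not a kernel structure. */
struct model_cpu {
	bool real_timer_ready;    /* a real per-CPU timer has registered */
	bool broadcast_requested; /* dummy timer was installed and needs broadcast */
};

typedef void (*starting_cb)(struct model_cpu *cpu);

/* Stands in for the STARTING callback of a real timer (e.g. the ARM TWD). */
void real_timer_starting(struct model_cpu *cpu)
{
	cpu->real_timer_ready = true;
}

/*
 * Stands in for the dummy timer STARTING callback. Assumption of the model:
 * the dummy timer is only picked when no real timer is present yet, and
 * being picked is what makes the tick core look for a broadcast device.
 */
void dummy_timer_starting(struct model_cpu *cpu)
{
	if (!cpu->real_timer_ready)
		cpu->broadcast_requested = true;
}

/* Invoke the callbacks in array order, mirroring ascending enum order. */
struct model_cpu run_starting(const starting_cb *cbs, size_t n)
{
	struct model_cpu cpu = { false, false };

	for (size_t i = 0; i < n; i++)
		cbs[i](&cpu);
	return cpu;
}

/* Before the patch: the dummy timer state precedes some real timers. */
const starting_cb old_order[] = { dummy_timer_starting, real_timer_starting };

/* After the patch: the dummy timer state is the last timer callback. */
const starting_cb new_order[] = { real_timer_starting, dummy_timer_starting };
```

With old_order the model ends up requesting a broadcast device, which is the
path that crashes on a platform without one. With new_order the real timer is
already in place when the dummy callback runs, so no request is made.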
* [tip:timers/urgent] tick/broadcast: Prevent NULL pointer dereference
From: tip-bot for Thomas Gleixner @ 2016-12-15 11:31 UTC (permalink / raw)
To: linux-tip-commits
Cc: sf84, anna-maria, bigeasy, slash.tmp, thibaud_cornic,
linux-kernel, mark.rutland, mingo, rcochran, hpa, robin.murphy,
tglx, daniel.lezcano
Commit-ID: c1a9eeb938b5433947e5ea22f89baff3182e7075
Gitweb: http://git.kernel.org/tip/c1a9eeb938b5433947e5ea22f89baff3182e7075
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 15 Dec 2016 12:10:37 +0100
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 15 Dec 2016 12:25:13 +0100
tick/broadcast: Prevent NULL pointer dereference
When a dysfunctional timer, e.g. dummy timer, is installed, the tick core
tries to set up the broadcast timer.
If no broadcast device is installed, the kernel crashes with a NULL pointer
dereference in tick_broadcast_setup_oneshot() because the function has no
sanity check.
Reported-by: Mason <slash.tmp@free.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Richard Cochran <rcochran@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Frias <sf84@laposte.net>
Cc: Thibaud Cornic <thibaud_cornic@sigmadesigns.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr
---
kernel/time/tick-broadcast.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f6aae79..d2a20e8 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -871,6 +871,9 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
{
int cpu = smp_processor_id();
+ if (!bc)
+ return;
+
/* Set it up only once ! */
if (bc->event_handler != tick_handle_oneshot_broadcast) {
int was_periodic = clockevent_state_periodic(bc);
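
The guard added by this patch is an ordinary early return on a missing
device. A minimal stand-alone sketch of the same pattern follows. It is not
the kernel function: struct model_clock_event_device,
model_broadcast_handler() and model_broadcast_setup_oneshot() are
hypothetical stand-ins, and unlike the real tick_broadcast_setup_oneshot(),
which returns void, the model returns a bool so that the effect of the check
is observable from the caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct clock_event_device. */
struct model_clock_event_device {
	void (*event_handler)(struct model_clock_event_device *dev);
};

/* Stand-in for tick_handle_oneshot_broadcast(); does nothing in the model. */
void model_broadcast_handler(struct model_clock_event_device *dev)
{
	(void)dev;
}

/*
 * Returns false when no broadcast device exists, true once the handler is
 * installed. Without the NULL check, the bc->event_handler access below
 * would dereference a NULL pointer when bc is NULL.
 */
bool model_broadcast_setup_oneshot(struct model_clock_event_device *bc)
{
	if (!bc)
		return false;

	/* Set it up only once, as in the original function. */
	if (bc->event_handler != model_broadcast_handler)
		bc->event_handler = model_broadcast_handler;
	return true;
}
```

Passing NULL models a platform that has no broadcast device installed, which
is the case hit in the report.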
* Re: Linux crashes when trying to online secondary core
From: Mark Rutland @ 2016-12-15 12:00 UTC (permalink / raw)
To: Mason
Cc: Linux ARM, LKML, Thomas Gleixner, Anna-Maria Gleixner,
Richard Cochran, Sebastian Andrzej Siewior, Daniel Lezcano,
Peter Zijlstra, Ingo Molnar, Sebastian Frias, Thibaud Cornic,
Robin Murphy
On Thu, Dec 15, 2016 at 11:35:12AM +0100, Mason wrote:
> On 14/12/2016 18:47, Mason wrote:
> > On 14/12/2016 18:08, Thomas Gleixner wrote:
> >> Does the patch below fix the issue?
> >> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> >> index 22acee76cf4c..2594c287b078 100644
> >> --- a/include/linux/cpuhotplug.h
> >> +++ b/include/linux/cpuhotplug.h
> >> @@ -101,7 +101,6 @@ enum cpuhp_state {
> >> CPUHP_AP_ARM_L2X0_STARTING,
> >> CPUHP_AP_ARM_ARCH_TIMER_STARTING,
> >> CPUHP_AP_ARM_GLOBAL_TIMER_STARTING,
> >> - CPUHP_AP_DUMMY_TIMER_STARTING,
> >> CPUHP_AP_JCORE_TIMER_STARTING,
> >> CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING,
> >> CPUHP_AP_ARM_TWD_STARTING,
> >> @@ -111,6 +110,7 @@ enum cpuhp_state {
> >> CPUHP_AP_MARCO_TIMER_STARTING,
> >> CPUHP_AP_MIPS_GIC_TIMER_STARTING,
> >> CPUHP_AP_ARC_TIMER_STARTING,
> >> + CPUHP_AP_DUMMY_TIMER_STARTING,
> >> CPUHP_AP_KVM_STARTING,
> >> CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
> >> CPUHP_AP_KVM_ARM_VGIC_STARTING,
> > It does seem to fix the problem:
> > I reverted your patch, and the kernel blows up again.
> >
> > So what's the problem, and how does your patch solve it?
>
> Link to the original report:
> https://marc.info/?l=linux-arm-kernel&m=148173152524746&w=2
>
> Forgot to CC Robin Murphy, who had provided valuable input
> in similar circumstances a few months back.
>
> Also add LKML, since this doesn't appear to be ARM-specific.
>
> Do I need to specify which device tree I was using?
This is already fixed in the linux-tip tree, with commit messages
describing the fix.
It's specific to a few clocksources, due to their hotplug callbacks
occurring later than the dummy timer. That triggers the bug fixed in:
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=timers/urgent&id=c1a9eeb938b5433947e5ea22f89baff3182e7075
The relevant timers were fixed in:
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=smp/urgent&id=9bf11ecce5a2758e5a097c2f3a13d08552d0d6f9
Thanks,
Mark.