* [RFC] Arm64 boot fail with numa enable in BIOS
2016-09-19 14:07 ` [RFC] Arm64 boot fail with numa enable in BIOS Mark Rutland
@ 2016-09-19 14:45 ` Will Deacon
2016-09-20 1:19 ` Leizhen (ThunderTown)
2016-09-19 17:41 ` James Morse
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: Will Deacon @ 2016-09-19 14:45 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Sep 19, 2016 at 03:07:19PM +0100, Mark Rutland wrote:
> [adding LAKML, arm64 maintainers]
I've also looped in Euler ThunderTown, since (a) he's at Huawei and is
assumedly testing this stuff and (b) he has a fairly big NUMA patch
series doing the rounds (some of which I've queued).
> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
> In future, please make sure to Cc LAKML along with relevant parties when
> sending arm64 patches/queries.
>
> For everyone newly Cc'd, the original message (with attachments) can be
> found at:
>
> http://lkml.kernel.org/r/7618d76d-bfa8-d8aa-59aa-06f9d90c1a98@huawei.com
>
> > When I enable NUMA in BIOS for arm64, it failed to boot on v4.8-rc4-162-g071e31e.
>
> That commit ID doesn't seem to be in mainline (I can't find it in my
> local tree). Which tree are you using? Do you have local patches
> applied?
That commit is in mainline:
http://git.kernel.org/linus/071e31e
It would be nice to know if the problem also exists on the arm64
for-next/core branch.
Will
> I take it that by "enable NUMA in BIOS", you mean exposing SRAT to the
> OS?
>
> > For the crash log, it seems caused by error number of cpumask.
> > Any ideas about it?
>
> Much earlier in your log, there was a (non-fatal) warning, as below. Do
> you see this without NUMA/SRAT enabled in your FW? I don't see how the
> SRAT should affect the secondaries we try to bring online.
>
> Given your MPIDRs have Aff2 bits set, I wonder if we've conflated a
> logical ID with a physical ID somewhere, and it just so happens that the
> NUMA code is more likely to poke something based on that.
>
> Can you modify the warning in cpumask.h to dump the bad CPU number? That
> would make it fairly clear if that's the case.
>
> Thanks,
> Mark.
>
> > [ 0.297337] Detected PIPT I-cache on CPU1
> > [ 0.297347] GICv3: CPU1: found redistributor 10001 region 1:0x000000004d140000
> > [ 0.297356] CPU1: Booted secondary processor [410fd082]
> > [ 0.297375] ------------[ cut here ]------------
> > [ 0.320390] WARNING: CPU: 1 PID: 0 at ./include/linux/cpumask.h:121 gic_raise_softirq+0x128/0x17c
> > [ 0.329356] Modules linked in:
> > [ 0.332434]
> > [ 0.333932] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc4-00163-g803ea3a #21
> > [ 0.341581] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
> > [ 0.347735] task: ffff8013e9dd0000 task.stack: ffff8013e9dcc000
> > [ 0.353714] PC is at gic_raise_softirq+0x128/0x17c
> > [ 0.358550] LR is at gic_raise_softirq+0xa0/0x17c
> > [ 0.363298] pc : [<ffff00000838c124>] lr : [<ffff00000838c09c>] pstate: 200001c5
> > [ 0.370770] sp : ffff8013e9dcfde0
> > [ 0.374112] x29: ffff8013e9dcfde0 x28: 0000000000000000
> > [ 0.379476] x27: 000000000083207c x26: ffff000008ca5d70
> > [ 0.384841] x25: 0000000100000001 x24: ffff000008d63ff3
> > [ 0.390205] x23: 0000000000000000 x22: ffff000008cb0000
> > [ 0.395569] x21: ffff00000884edb0 x20: 0000000000000001
> > [ 0.400933] x19: 0000000100000000 x18: 0000000000000000
> > [ 0.406298] x17: 0000000000000000 x16: 0000000003010066
> > [ 0.411661] x15: ffff000008ca8000 x14: 0000000000000013
> > [ 0.417025] x13: 0000000000000000 x12: 0000000000000013
> > [ 0.422389] x11: 0000000000000013 x10: 0000000002e92aa7
> > [ 0.427754] x9 : 0000000000000000 x8 : ffff8413eb6ca668
> > [ 0.433118] x7 : ffff8413eb6ca690 x6 : 0000000000000000
> > [ 0.438482] x5 : fffffffffffffffe x4 : 0000000000000000
> > [ 0.443845] x3 : 0000000000000040 x2 : 0000000000000041
> > [ 0.449209] x1 : 0000000000000000 x0 : 0000000000000001
> > [ 0.454573]
> > [ 0.456069] ---[ end trace b58e70f3295a8cd7 ]---
> > [ 0.460730] Call trace:
> > [ 0.463193] Exception stack(0xffff8013e9dcfc10 to 0xffff8013e9dcfd40)
> > [ 0.469699] fc00: 0000000100000000 0001000000000000
> > [ 0.477611] fc20: ffff8013e9dcfde0 ffff00000838c124 ffff000008d72228 ffff8013e9dcff70
> > [ 0.485524] fc40: ffff000008d72608 ffff000008ab02a4 0000000000000000 0000000000000000
> > [ 0.493436] fc60: 0000000000000000 3464313430303030 0000000000000000 0000000000000000
> > [ 0.501348] fc80: ffff8013e9dcfc90 ffff00000836e678 ffff8013e9dcfca0 ffff00000836e910
> > [ 0.509259] fca0: ffff8013e9dcfd30 ffff00000836ec10 0000000000000001 0000000000000000
> > [ 0.517171] fcc0: 0000000000000041 0000000000000040 0000000000000000 fffffffffffffffe
> > [ 0.525083] fce0: 0000000000000000 ffff8413eb6ca690 ffff8413eb6ca668 0000000000000000
> > [ 0.532995] fd00: 0000000002e92aa7 0000000000000013 0000000000000013 0000000000000000
> > [ 0.540907] fd20: 0000000000000013 ffff000008ca8000 0000000003010066 0000000000000000
> > [ 0.548819] [<ffff00000838c124>] gic_raise_softirq+0x128/0x17c
> > [ 0.554713] [<ffff00000808e1f4>] smp_send_reschedule+0x34/0x3c
> > [ 0.560605] [<ffff0000080ddf18>] resched_curr+0x40/0x5c
> > [ 0.565881] [<ffff0000080de650>] check_preempt_curr+0x58/0xa0
> > [ 0.571685] [<ffff0000080de6b0>] ttwu_do_wakeup+0x18/0x80
> > [ 0.577136] [<ffff0000080de790>] ttwu_do_activate+0x78/0x88
> > [ 0.582763] [<ffff0000080df5cc>] try_to_wake_up+0x1f8/0x300
> > [ 0.588390] [<ffff0000080df79c>] default_wake_function+0x10/0x18
> > [ 0.594458] [<ffff0000080f3210>] __wake_up_common+0x5c/0x9c
> > [ 0.600085] [<ffff0000080f3264>] __wake_up_locked+0x14/0x1c
> > [ 0.605712] [<ffff0000080f3e10>] complete+0x40/0x5c
> > [ 0.610635] [<ffff00000808dba8>] secondary_start_kernel+0x148/0x1a8
> > [ 0.616965] [<00000000000831a8>] 0x831a8
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* [RFC] Arm64 boot fail with numa enable in BIOS
2016-09-19 14:45 ` Will Deacon
@ 2016-09-20 1:19 ` Leizhen (ThunderTown)
0 siblings, 0 replies; 7+ messages in thread
From: Leizhen (ThunderTown) @ 2016-09-20 1:19 UTC (permalink / raw)
To: linux-arm-kernel
On 2016/9/19 22:45, Will Deacon wrote:
> On Mon, Sep 19, 2016 at 03:07:19PM +0100, Mark Rutland wrote:
>> [adding LAKML, arm64 maintainers]
>
> I've also looped in Euler ThunderTown, since (a) he's at Huawei and is
> assumedly testing this stuff and (b) he has a fairly big NUMA patch
> series doing the rounds (some of which I've queued).
In my patch series, only one patch fixes a crash, but it is related to device tree.
>
>> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
>> In future, please make sure to Cc LAKML along with relevant parties when
>> sending arm64 patches/queries.
>>
>> For everyone newly Cc'd, the original message (with attachments) can be
>> found at:
>>
>> http://lkml.kernel.org/r/7618d76d-bfa8-d8aa-59aa-06f9d90c1a98@huawei.com
>>
>>> When I enable NUMA in BIOS for arm64, it failed to boot on v4.8-rc4-162-g071e31e.
>>
>> That commit ID doesn't seem to be in mainline (I can't find it in my
>> local tree). Which tree are you using? Do you have local patches
>> applied?
>
> That commit is in mainline:
>
> http://git.kernel.org/linus/071e31e
>
> It would be nice to know if the problem also exists on the arm64
> for-next/core branch.
>
> Will
>
>
>> I take it that by "enable NUMA in BIOS", you mean exposing SRAT to the
>> OS?
>>
>>> For the crash log, it seems caused by error number of cpumask.
>>> Any ideas about it?
>>
>> Much earlier in your log, there was a (non-fatal) warning, as below. Do
>> you see this without NUMA/SRAT enabled in your FW? I don't see how the
>> SRAT should affect the secondaries we try to bring online.
>>
>> Given your MPIDRs have Aff2 bits set, I wonder if we've conflated a
>> logical ID with a physical ID somewhere, and it just so happens that the
>> NUMA code is more likely to poke something based on that.
>>
>> Can you modify the warning in cpumask.h to dump the bad CPU number? That
>> would make it fairly clear if that's the case.
>>
>> Thanks,
>> Mark.
>>
>>> [ 0.297337] Detected PIPT I-cache on CPU1
>>> [ 0.297347] GICv3: CPU1: found redistributor 10001 region 1:0x000000004d140000
>>> [ 0.297356] CPU1: Booted secondary processor [410fd082]
>>> [ 0.297375] ------------[ cut here ]------------
>>> [ 0.320390] WARNING: CPU: 1 PID: 0 at ./include/linux/cpumask.h:121 gic_raise_softirq+0x128/0x17c
>>> [ 0.329356] Modules linked in:
>>> [ 0.332434]
>>> [ 0.333932] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc4-00163-g803ea3a #21
>>> [ 0.341581] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
>>> [ 0.347735] task: ffff8013e9dd0000 task.stack: ffff8013e9dcc000
>>> [ 0.353714] PC is at gic_raise_softirq+0x128/0x17c
>>> [ 0.358550] LR is at gic_raise_softirq+0xa0/0x17c
>>> [ 0.363298] pc : [<ffff00000838c124>] lr : [<ffff00000838c09c>] pstate: 200001c5
>>> [ 0.370770] sp : ffff8013e9dcfde0
>>> [ 0.374112] x29: ffff8013e9dcfde0 x28: 0000000000000000
>>> [ 0.379476] x27: 000000000083207c x26: ffff000008ca5d70
>>> [ 0.384841] x25: 0000000100000001 x24: ffff000008d63ff3
>>> [ 0.390205] x23: 0000000000000000 x22: ffff000008cb0000
>>> [ 0.395569] x21: ffff00000884edb0 x20: 0000000000000001
>>> [ 0.400933] x19: 0000000100000000 x18: 0000000000000000
>>> [ 0.406298] x17: 0000000000000000 x16: 0000000003010066
>>> [ 0.411661] x15: ffff000008ca8000 x14: 0000000000000013
>>> [ 0.417025] x13: 0000000000000000 x12: 0000000000000013
>>> [ 0.422389] x11: 0000000000000013 x10: 0000000002e92aa7
>>> [ 0.427754] x9 : 0000000000000000 x8 : ffff8413eb6ca668
>>> [ 0.433118] x7 : ffff8413eb6ca690 x6 : 0000000000000000
>>> [ 0.438482] x5 : fffffffffffffffe x4 : 0000000000000000
>>> [ 0.443845] x3 : 0000000000000040 x2 : 0000000000000041
>>> [ 0.449209] x1 : 0000000000000000 x0 : 0000000000000001
>>> [ 0.454573]
>>> [ 0.456069] ---[ end trace b58e70f3295a8cd7 ]---
>>> [ 0.460730] Call trace:
>>> [ 0.463193] Exception stack(0xffff8013e9dcfc10 to 0xffff8013e9dcfd40)
>>> [ 0.469699] fc00: 0000000100000000 0001000000000000
>>> [ 0.477611] fc20: ffff8013e9dcfde0 ffff00000838c124 ffff000008d72228 ffff8013e9dcff70
>>> [ 0.485524] fc40: ffff000008d72608 ffff000008ab02a4 0000000000000000 0000000000000000
>>> [ 0.493436] fc60: 0000000000000000 3464313430303030 0000000000000000 0000000000000000
>>> [ 0.501348] fc80: ffff8013e9dcfc90 ffff00000836e678 ffff8013e9dcfca0 ffff00000836e910
>>> [ 0.509259] fca0: ffff8013e9dcfd30 ffff00000836ec10 0000000000000001 0000000000000000
>>> [ 0.517171] fcc0: 0000000000000041 0000000000000040 0000000000000000 fffffffffffffffe
>>> [ 0.525083] fce0: 0000000000000000 ffff8413eb6ca690 ffff8413eb6ca668 0000000000000000
>>> [ 0.532995] fd00: 0000000002e92aa7 0000000000000013 0000000000000013 0000000000000000
>>> [ 0.540907] fd20: 0000000000000013 ffff000008ca8000 0000000003010066 0000000000000000
>>> [ 0.548819] [<ffff00000838c124>] gic_raise_softirq+0x128/0x17c
>>> [ 0.554713] [<ffff00000808e1f4>] smp_send_reschedule+0x34/0x3c
>>> [ 0.560605] [<ffff0000080ddf18>] resched_curr+0x40/0x5c
>>> [ 0.565881] [<ffff0000080de650>] check_preempt_curr+0x58/0xa0
>>> [ 0.571685] [<ffff0000080de6b0>] ttwu_do_wakeup+0x18/0x80
>>> [ 0.577136] [<ffff0000080de790>] ttwu_do_activate+0x78/0x88
>>> [ 0.582763] [<ffff0000080df5cc>] try_to_wake_up+0x1f8/0x300
>>> [ 0.588390] [<ffff0000080df79c>] default_wake_function+0x10/0x18
>>> [ 0.594458] [<ffff0000080f3210>] __wake_up_common+0x5c/0x9c
>>> [ 0.600085] [<ffff0000080f3264>] __wake_up_locked+0x14/0x1c
>>> [ 0.605712] [<ffff0000080f3e10>] complete+0x40/0x5c
>>> [ 0.610635] [<ffff00000808dba8>] secondary_start_kernel+0x148/0x1a8
>>> [ 0.616965] [<00000000000831a8>] 0x831a8
>>
>
>
* [RFC] Arm64 boot fail with numa enable in BIOS
2016-09-19 14:07 ` [RFC] Arm64 boot fail with numa enable in BIOS Mark Rutland
2016-09-19 14:45 ` Will Deacon
@ 2016-09-19 17:41 ` James Morse
2016-09-20 2:51 ` Hanjun Guo
2016-09-20 3:29 ` Yisheng Xie
3 siblings, 0 replies; 7+ messages in thread
From: James Morse @ 2016-09-19 17:41 UTC (permalink / raw)
To: linux-arm-kernel
On 19/09/16 15:07, Mark Rutland wrote:
> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
>> For the crash log, it seems caused by error number of cpumask.
>> Any ideas about it?
> Much earlier in your log, there was a (non-fatal) warning, as below. Do
> you see this without NUMA/SRAT enabled in your FW?
>> [ 0.297337] Detected PIPT I-cache on CPU1
>> [ 0.297347] GICv3: CPU1: found redistributor 10001 region 1:0x000000004d140000
>> [ 0.297356] CPU1: Booted secondary processor [410fd082]
>> [ 0.297375] ------------[ cut here ]------------
>> [ 0.320390] WARNING: CPU: 1 PID: 0 at ./include/linux/cpumask.h:121 gic_raise_softirq+0x128/0x17c
>> [ 0.329356] Modules linked in:
>> [ 0.332434]
>> [ 0.333932] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc4-00163-g803ea3a #21
>> [ 0.341581] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
>> [ 0.347735] task: ffff8013e9dd0000 task.stack: ffff8013e9dcc000
>> [ 0.353714] PC is at gic_raise_softirq+0x128/0x17c
>> [ 0.358550] LR is at gic_raise_softirq+0xa0/0x17c
I've seen this first trace when built with DEBUG_PER_CPU_MAPS. My version of
this trace[0] was just noise due to gic_compute_target_list() and
gic_raise_softirq() sharing an iterator.
This patch silenced it for me:
https://lkml.org/lkml/2016/9/19/623
Yours may be a different problem with the same symptom.
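As an illustration of how a shared iterator can produce this kind of warning, here is a deliberately simplified user-space model in C. It is not the irq-gic-v3 driver code: NR_CPU_IDS, mask_next(), collect_targets(), send_to_mask() and the bad_index_warnings counter are all hypothetical names, and an 8-bit mask stands in for struct cpumask. The assumption is only that the inner helper writes its final position back into the outer loop's cursor, so the outer loop's next step starts from an index that is already past the end.

```c
#include <stdint.h>

#define NR_CPU_IDS 8u	/* hypothetical small CPU count for the model */

static unsigned int bad_index_warnings;	/* counts simulated warnings */

/*
 * Next set bit strictly after prev (prev == -1 starts the search), or
 * NR_CPU_IDS if there is none. Like a debug range check, it flags a
 * starting index that is already out of range.
 */
static unsigned int mask_next(int prev, uint8_t mask)
{
	unsigned int cpu;

	if (prev != -1 && (unsigned int)prev >= NR_CPU_IDS)
		bad_index_warnings++;
	for (cpu = (unsigned int)(prev + 1); cpu < NR_CPU_IDS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/*
 * Inner helper: gathers every remaining CPU in the mask and writes its
 * final position back through the cursor it shares with the caller.
 */
static uint8_t collect_targets(int *cursor, uint8_t mask)
{
	uint8_t targets = 0;
	unsigned int cpu = (unsigned int)*cursor;

	while (cpu < NR_CPU_IDS) {
		targets |= (uint8_t)(1u << cpu);
		cpu = mask_next((int)cpu, mask);
	}
	*cursor = (int)cpu;	/* may now equal NR_CPU_IDS */
	return targets;
}

/* Outer for_each-style loop that shares its cursor with the helper. */
static unsigned int send_to_mask(uint8_t mask)
{
	unsigned int batches = 0;
	int cpu;

	for (cpu = (int)mask_next(-1, mask); (unsigned int)cpu < NR_CPU_IDS;
	     cpu = (int)mask_next(cpu, mask)) {
		(void)collect_targets(&cpu, mask);
		batches++;
	}
	return batches;
}
```

Calling send_to_mask(0x06) in this model sends one batch and then bumps bad_index_warnings once, because the loop's increment step runs with the cursor already at NR_CPU_IDS. The result is still correct, which is why such a warning can be noise rather than a real fault.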
Thanks,
James
[0] gicv3 trace when built with DEBUG_PER_CPU_MAPS
[ 3.077738] GICv3: CPU1: found redistributor 1 region 0:0x000000002f120000
[ 3.077943] CPU1: Booted secondary processor [410fd0f0]
[ 3.078542] ------------[ cut here ]------------
[ 3.078746] WARNING: CPU: 1 PID: 0 at ../include/linux/cpumask.h:121 gic_raise_softirq+0x12c/0x170
[ 3.078812] Modules linked in:
[ 3.078869]
[ 3.078930] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc5+ #5188
[ 3.078994] Hardware name: Foundation-v8A (DT)
[ 3.079059] task: ffff80087a1a0080 task.stack: ffff80087a19c000
[ 3.079145] PC is at gic_raise_softirq+0x12c/0x170
[ 3.079226] LR is at gic_raise_softirq+0xa4/0x170
* [RFC] Arm64 boot fail with numa enable in BIOS
2016-09-19 14:07 ` [RFC] Arm64 boot fail with numa enable in BIOS Mark Rutland
2016-09-19 14:45 ` Will Deacon
2016-09-19 17:41 ` James Morse
@ 2016-09-20 2:51 ` Hanjun Guo
2016-09-20 3:29 ` Yisheng Xie
3 siblings, 0 replies; 7+ messages in thread
From: Hanjun Guo @ 2016-09-20 2:51 UTC (permalink / raw)
To: linux-arm-kernel
On 2016/9/19 22:07, Mark Rutland wrote:
> [adding LAKML, arm64 maintainers]
>
> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
>> hi all,
> Hi,
>
> In future, please make sure to Cc LAKML along with relevant parties when
> sending arm64 patches/queries.
>
> For everyone newly Cc'd, the original message (with attachments) can be
> found at:
>
> http://lkml.kernel.org/r/7618d76d-bfa8-d8aa-59aa-06f9d90c1a98@huawei.com
>
>> When I enable NUMA in BIOS for arm64, it failed to boot on v4.8-rc4-162-g071e31e.
> That commit ID doesn't seem to be in mainline (I can't find it in my
> local tree). Which tree are you using? Do you have local patches
> applied?
Yes, we have GICv3 ITS and mbigen patches on top, which try to enable PCI MSI
and native SAS on the board.
>
> I take it that by "enable NUMA in BIOS", you mean exposing SRAT to the
> OS?
Yes, SRAT and SLIT.
>
>> For the crash log, it seems caused by error number of cpumask.
>> Any ideas about it?
> Much earlier in your log, there was a (non-fatal) warning, as below. Do
> you see this without NUMA/SRAT enabled in your FW?
It works OK without NUMA/SRAT enabled; we will check the SRAT table.
> I don't see how the SRAT should affect the secondaries we try to bring
> online.
Yes, CPU masks and secondary boot-up are related to the MADT, not the SRAT.
Thanks
Hanjun
>
> Given your MPIDRs have Aff2 bits set, I wonder if we've conflated a
> logical ID with a physical ID somewhere, and it just so happens that the
> NUMA code is more likely to poke something based on that.
>
> Can you modify the warning in cpumask.h to dump the bad CPU number? That
> would make it fairly clear if that's the case.
>
> Thanks,
> Mark.
>
>> [ 0.297337] Detected PIPT I-cache on CPU1
>> [ 0.297347] GICv3: CPU1: found redistributor 10001 region 1:0x000000004d140000
>> [ 0.297356] CPU1: Booted secondary processor [410fd082]
>> [ 0.297375] ------------[ cut here ]------------
>> [ 0.320390] WARNING: CPU: 1 PID: 0 at ./include/linux/cpumask.h:121 gic_raise_softirq+0x128/0x17c
>> [ 0.329356] Modules linked in:
>> [ 0.332434]
>> [ 0.333932] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc4-00163-g803ea3a #21
>> [ 0.341581] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
>> [ 0.347735] task: ffff8013e9dd0000 task.stack: ffff8013e9dcc000
>> [ 0.353714] PC is at gic_raise_softirq+0x128/0x17c
>> [ 0.358550] LR is at gic_raise_softirq+0xa0/0x17c
>> [ 0.363298] pc : [<ffff00000838c124>] lr : [<ffff00000838c09c>] pstate: 200001c5
>> [ 0.370770] sp : ffff8013e9dcfde0
>> [ 0.374112] x29: ffff8013e9dcfde0 x28: 0000000000000000
>> [ 0.379476] x27: 000000000083207c x26: ffff000008ca5d70
>> [ 0.384841] x25: 0000000100000001 x24: ffff000008d63ff3
>> [ 0.390205] x23: 0000000000000000 x22: ffff000008cb0000
>> [ 0.395569] x21: ffff00000884edb0 x20: 0000000000000001
>> [ 0.400933] x19: 0000000100000000 x18: 0000000000000000
>> [ 0.406298] x17: 0000000000000000 x16: 0000000003010066
>> [ 0.411661] x15: ffff000008ca8000 x14: 0000000000000013
>> [ 0.417025] x13: 0000000000000000 x12: 0000000000000013
>> [ 0.422389] x11: 0000000000000013 x10: 0000000002e92aa7
>> [ 0.427754] x9 : 0000000000000000 x8 : ffff8413eb6ca668
>> [ 0.433118] x7 : ffff8413eb6ca690 x6 : 0000000000000000
>> [ 0.438482] x5 : fffffffffffffffe x4 : 0000000000000000
>> [ 0.443845] x3 : 0000000000000040 x2 : 0000000000000041
>> [ 0.449209] x1 : 0000000000000000 x0 : 0000000000000001
>> [ 0.454573]
>> [ 0.456069] ---[ end trace b58e70f3295a8cd7 ]---
>> [ 0.460730] Call trace:
>> [ 0.463193] Exception stack(0xffff8013e9dcfc10 to 0xffff8013e9dcfd40)
>> [ 0.469699] fc00: 0000000100000000 0001000000000000
>> [ 0.477611] fc20: ffff8013e9dcfde0 ffff00000838c124 ffff000008d72228 ffff8013e9dcff70
>> [ 0.485524] fc40: ffff000008d72608 ffff000008ab02a4 0000000000000000 0000000000000000
>> [ 0.493436] fc60: 0000000000000000 3464313430303030 0000000000000000 0000000000000000
>> [ 0.501348] fc80: ffff8013e9dcfc90 ffff00000836e678 ffff8013e9dcfca0 ffff00000836e910
>> [ 0.509259] fca0: ffff8013e9dcfd30 ffff00000836ec10 0000000000000001 0000000000000000
>> [ 0.517171] fcc0: 0000000000000041 0000000000000040 0000000000000000 fffffffffffffffe
>> [ 0.525083] fce0: 0000000000000000 ffff8413eb6ca690 ffff8413eb6ca668 0000000000000000
>> [ 0.532995] fd00: 0000000002e92aa7 0000000000000013 0000000000000013 0000000000000000
>> [ 0.540907] fd20: 0000000000000013 ffff000008ca8000 0000000003010066 0000000000000000
>> [ 0.548819] [<ffff00000838c124>] gic_raise_softirq+0x128/0x17c
>> [ 0.554713] [<ffff00000808e1f4>] smp_send_reschedule+0x34/0x3c
>> [ 0.560605] [<ffff0000080ddf18>] resched_curr+0x40/0x5c
>> [ 0.565881] [<ffff0000080de650>] check_preempt_curr+0x58/0xa0
>> [ 0.571685] [<ffff0000080de6b0>] ttwu_do_wakeup+0x18/0x80
>> [ 0.577136] [<ffff0000080de790>] ttwu_do_activate+0x78/0x88
>> [ 0.582763] [<ffff0000080df5cc>] try_to_wake_up+0x1f8/0x300
>> [ 0.588390] [<ffff0000080df79c>] default_wake_function+0x10/0x18
>> [ 0.594458] [<ffff0000080f3210>] __wake_up_common+0x5c/0x9c
>> [ 0.600085] [<ffff0000080f3264>] __wake_up_locked+0x14/0x1c
>> [ 0.605712] [<ffff0000080f3e10>] complete+0x40/0x5c
>> [ 0.610635] [<ffff00000808dba8>] secondary_start_kernel+0x148/0x1a8
>> [ 0.616965] [<00000000000831a8>] 0x831a8
>
* [RFC] Arm64 boot fail with numa enable in BIOS
2016-09-19 14:07 ` [RFC] Arm64 boot fail with numa enable in BIOS Mark Rutland
` (2 preceding siblings ...)
2016-09-20 2:51 ` Hanjun Guo
@ 2016-09-20 3:29 ` Yisheng Xie
2016-09-20 8:33 ` Will Deacon
3 siblings, 1 reply; 7+ messages in thread
From: Yisheng Xie @ 2016-09-20 3:29 UTC (permalink / raw)
To: linux-arm-kernel
On 2016/9/19 22:07, Mark Rutland wrote:
> [adding LAKML, arm64 maintainers]
>
> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
>> hi all,
>
> Can you modify the warning in cpumask.h to dump the bad CPU number? That
> would make it fairly clear if that's the case.
>
hi Mark,
I dumped the bad CPU number; it is 64,
and the cpumask obtained from the task is 00000000,00000000.
[ 3.873044] select_task_rq: allowed 0, allow_cpumask 00000000,00000000
[ 3.879727] cpumask_check: cpu 64, nr_cpumask_bits:64, nr_cpu_ids= 64
[ 3.895989] ------------[ cut here ]------------
[ 3.900652] WARNING: CPU: 16 PID: 103 at ./include/linux/cpumask.h:122 try_to_wake_up+0x410/0x4ac
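For readers following along: a value equal to nr_cpu_ids is what a cpumask search returns when no bit is set, which is consistent with the empty allowed mask printed above. The sketch below is a user-space model in C under that assumption, not kernel code. NR_CPU_IDS, mask_first() and cpu_index_ok() are hypothetical names, and a 64-bit integer stands in for struct cpumask.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for nr_cpu_ids on a 64-CPU system. */
#define NR_CPU_IDS 64u

/*
 * Return the index of the lowest set bit in the mask, or NR_CPU_IDS when
 * no bit is set. This mirrors the "past the end" convention of cpumask
 * searches; it is not the kernel's implementation.
 */
static unsigned int mask_first(uint64_t mask)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/*
 * Range check in the spirit of the debug print requested earlier in the
 * thread: report the offending value rather than only warning.
 * Returns 1 if cpu is a valid index, 0 otherwise.
 */
static int cpu_index_ok(unsigned int cpu)
{
	if (cpu >= NR_CPU_IDS) {
		fprintf(stderr, "cpumask_check: cpu %u, nr_cpu_ids=%u\n",
			cpu, NR_CPU_IDS);
		return 0;
	}
	return 1;
}
```

In this model, mask_first(0) yields 64 and cpu_index_ok(64) prints the diagnostic and returns 0. Whether the kernel reached the warning by exactly this path is not established in this thread.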
Thanks.
Yisheng Xie
> Thanks,
> Mark.
>
>> [ 0.297337] Detected PIPT I-cache on CPU1
>> [ 0.297347] GICv3: CPU1: found redistributor 10001 region 1:0x000000004d140000
>> [ 0.297356] CPU1: Booted secondary processor [410fd082]
>> [ 0.297375] ------------[ cut here ]------------
>> [ 0.320390] WARNING: CPU: 1 PID: 0 at ./include/linux/cpumask.h:121 gic_raise_softirq+0x128/0x17c
>> [ 0.329356] Modules linked in:
>> [ 0.332434]
>> [ 0.333932] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc4-00163-g803ea3a #21
>> [ 0.341581] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
>> [ 0.347735] task: ffff8013e9dd0000 task.stack: ffff8013e9dcc000
>> [ 0.353714] PC is at gic_raise_softirq+0x128/0x17c
>> [ 0.358550] LR is at gic_raise_softirq+0xa0/0x17c
>> [ 0.363298] pc : [<ffff00000838c124>] lr : [<ffff00000838c09c>] pstate: 200001c5
>> [ 0.370770] sp : ffff8013e9dcfde0
>> [ 0.374112] x29: ffff8013e9dcfde0 x28: 0000000000000000
>> [ 0.379476] x27: 000000000083207c x26: ffff000008ca5d70
>> [ 0.384841] x25: 0000000100000001 x24: ffff000008d63ff3
>> [ 0.390205] x23: 0000000000000000 x22: ffff000008cb0000
>> [ 0.395569] x21: ffff00000884edb0 x20: 0000000000000001
>> [ 0.400933] x19: 0000000100000000 x18: 0000000000000000
>> [ 0.406298] x17: 0000000000000000 x16: 0000000003010066
>> [ 0.411661] x15: ffff000008ca8000 x14: 0000000000000013
>> [ 0.417025] x13: 0000000000000000 x12: 0000000000000013
>> [ 0.422389] x11: 0000000000000013 x10: 0000000002e92aa7
>> [ 0.427754] x9 : 0000000000000000 x8 : ffff8413eb6ca668
>> [ 0.433118] x7 : ffff8413eb6ca690 x6 : 0000000000000000
>> [ 0.438482] x5 : fffffffffffffffe x4 : 0000000000000000
>> [ 0.443845] x3 : 0000000000000040 x2 : 0000000000000041
>> [ 0.449209] x1 : 0000000000000000 x0 : 0000000000000001
>> [ 0.454573]
>> [ 0.456069] ---[ end trace b58e70f3295a8cd7 ]---
>> [ 0.460730] Call trace:
>> [ 0.463193] Exception stack(0xffff8013e9dcfc10 to 0xffff8013e9dcfd40)
>> [ 0.469699] fc00: 0000000100000000 0001000000000000
>> [ 0.477611] fc20: ffff8013e9dcfde0 ffff00000838c124 ffff000008d72228 ffff8013e9dcff70
>> [ 0.485524] fc40: ffff000008d72608 ffff000008ab02a4 0000000000000000 0000000000000000
>> [ 0.493436] fc60: 0000000000000000 3464313430303030 0000000000000000 0000000000000000
>> [ 0.501348] fc80: ffff8013e9dcfc90 ffff00000836e678 ffff8013e9dcfca0 ffff00000836e910
>> [ 0.509259] fca0: ffff8013e9dcfd30 ffff00000836ec10 0000000000000001 0000000000000000
>> [ 0.517171] fcc0: 0000000000000041 0000000000000040 0000000000000000 fffffffffffffffe
>> [ 0.525083] fce0: 0000000000000000 ffff8413eb6ca690 ffff8413eb6ca668 0000000000000000
>> [ 0.532995] fd00: 0000000002e92aa7 0000000000000013 0000000000000013 0000000000000000
>> [ 0.540907] fd20: 0000000000000013 ffff000008ca8000 0000000003010066 0000000000000000
>> [ 0.548819] [<ffff00000838c124>] gic_raise_softirq+0x128/0x17c
>> [ 0.554713] [<ffff00000808e1f4>] smp_send_reschedule+0x34/0x3c
>> [ 0.560605] [<ffff0000080ddf18>] resched_curr+0x40/0x5c
>> [ 0.565881] [<ffff0000080de650>] check_preempt_curr+0x58/0xa0
>> [ 0.571685] [<ffff0000080de6b0>] ttwu_do_wakeup+0x18/0x80
>> [ 0.577136] [<ffff0000080de790>] ttwu_do_activate+0x78/0x88
>> [ 0.582763] [<ffff0000080df5cc>] try_to_wake_up+0x1f8/0x300
>> [ 0.588390] [<ffff0000080df79c>] default_wake_function+0x10/0x18
>> [ 0.594458] [<ffff0000080f3210>] __wake_up_common+0x5c/0x9c
>> [ 0.600085] [<ffff0000080f3264>] __wake_up_locked+0x14/0x1c
>> [ 0.605712] [<ffff0000080f3e10>] complete+0x40/0x5c
>> [ 0.610635] [<ffff00000808dba8>] secondary_start_kernel+0x148/0x1a8
>> [ 0.616965] [<00000000000831a8>] 0x831a8
>
>
* [RFC] Arm64 boot fail with numa enable in BIOS
2016-09-20 3:29 ` Yisheng Xie
@ 2016-09-20 8:33 ` Will Deacon
0 siblings, 0 replies; 7+ messages in thread
From: Will Deacon @ 2016-09-20 8:33 UTC (permalink / raw)
To: linux-arm-kernel
Hi Yisheng,
On Tue, Sep 20, 2016 at 11:29:24AM +0800, Yisheng Xie wrote:
> On 2016/9/19 22:07, Mark Rutland wrote:
> > On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
> > Can you modify the warning in cpumask.h to dump the bad CPU number? That
> > would make it fairly clear if that's the case.
> >
> hi Mark,
> I dump the bad CPU number, it is 64,
> And the cpumask get from task is 00000000,00000000.
>
> [ 3.873044] select_task_rq: allowed 0, allow_cpumask 00000000,00000000
> [ 3.879727] cpumask_check: cpu 64, nr_cpumask_bits:64, nr_cpu_ids= 64
> [ 3.895989] ------------[ cut here ]------------
> [ 3.900652] WARNING: CPU: 16 PID: 103 at ./include/linux/cpumask.h:122 try_to_wake_up+0x410/0x4ac
Can you look at this patch from David, please:
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-September/458110.html
and offer a Tested-by if it fixes your problem?
Thanks,
Will