linux-arm-kernel.lists.infradead.org archive mirror
* [RFC] Arm64 boot fail with numa enable in BIOS
       [not found] <7618d76d-bfa8-d8aa-59aa-06f9d90c1a98@huawei.com>
@ 2016-09-19 14:07 ` Mark Rutland
  2016-09-19 14:45   ` Will Deacon
                     ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Mark Rutland @ 2016-09-19 14:07 UTC
  To: linux-arm-kernel

[adding LAKML, arm64 maintainers]

On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
> hi all,

Hi,

In future, please make sure to Cc LAKML along with relevant parties when
sending arm64 patches/queries.

For everyone newly Cc'd, the original message (with attachments) can be
found at:

http://lkml.kernel.org/r/7618d76d-bfa8-d8aa-59aa-06f9d90c1a98@huawei.com

> When I enable NUMA in BIOS for arm64, it failed to boot on v4.8-rc4-162-g071e31e.

That commit ID doesn't seem to be in mainline (I can't find it in my
local tree). Which tree are you using? Do you have local patches
applied?

I take it that by "enable NUMA in BIOS", you mean exposing SRAT to the
OS?

> For the crash log, it seems caused by error number of cpumask.
> Any ideas about it?

Much earlier in your log, there was a (non-fatal) warning, as below. Do
you see this without NUMA/SRAT enabled in your FW? I don't see how the
SRAT should affect the secondaries we try to bring online.

Given your MPIDRs have Aff2 bits set, I wonder if we've conflated a
logical ID with a physical ID somewhere, and it just so happens that the
NUMA code is more likely to poke something based on that.
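To make the suspected logical/physical confusion concrete, here is a small userspace C sketch (not kernel code). It decodes the MPIDR value 0x10001 that the GICv3 driver printed for CPU1 ("found redistributor 10001") into its affinity fields, and shows that the raw MPIDR is far outside the range of a valid logical CPU number. NR_CPU_IDS = 64 is an assumption taken from the reporter's later dump; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: nr_cpu_ids on the reporter's kernel is 64. */
#define NR_CPU_IDS 64u

/* MPIDR_EL1 affinity fields: Aff0 [7:0], Aff1 [15:8], Aff2 [23:16], Aff3 [39:32]. */
static unsigned int mpidr_aff(uint64_t mpidr, int level)
{
	assert(level >= 0 && level <= 3);
	unsigned int shift = (level == 3) ? 32u : (unsigned int)level * 8u;
	return (unsigned int)((mpidr >> shift) & 0xffu);
}

/* A logical CPU number is only meaningful if it indexes a cpumask bit. */
static int is_valid_logical_cpu(uint64_t id)
{
	return id < NR_CPU_IDS;
}
```

Logical CPU1 is a fine cpumask index, but its MPIDR (0x10001, i.e. Aff2 = 1, Aff1 = 0, Aff0 = 1, or 65537 as a plain integer) is not, so any path that hands a physical ID to the cpumask helpers where a logical ID is expected could trip the DEBUG_PER_CPU_MAPS check.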

Can you modify the warning in cpumask.h to dump the bad CPU number? That
would make it fairly clear if that's the case.
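With CONFIG_DEBUG_PER_CPU_MAPS enabled, cpumask_check() in include/linux/cpumask.h only does WARN_ON_ONCE(cpu >= nr_cpumask_bits), so the warning says nothing about the value. The userspace model below sketches the requested change by also printing the offending CPU; fprintf() and the warn_count counter are stand-ins (assumptions) for printk() and WARN_ON_ONCE(). In-tree, something like WARN_ONCE(cpu >= nr_cpumask_bits, "bad cpu %u\n", cpu) would be one way to get the same information.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed: nr_cpumask_bits == 64, as in the reporter's configuration. */
static unsigned int nr_cpumask_bits = 64;
static unsigned int warn_count; /* stands in for WARN_ON_ONCE() firing */

/*
 * Userspace model of cpumask_check() with CONFIG_DEBUG_PER_CPU_MAPS=y,
 * extended to report the out-of-range CPU number before warning.
 */
static unsigned int cpumask_check(unsigned int cpu)
{
	if (cpu >= nr_cpumask_bits) {
		fprintf(stderr, "cpumask_check: cpu %u, nr_cpumask_bits:%u\n",
			cpu, nr_cpumask_bits);
		warn_count++;
	}
	return cpu; /* like the kernel helper, pass the value through */
}
```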

Thanks,
Mark.

> [    0.297337] Detected PIPT I-cache on CPU1
> [    0.297347] GICv3: CPU1: found redistributor 10001 region 1:0x000000004d140000
> [    0.297356] CPU1: Booted secondary processor [410fd082]
> [    0.297375] ------------[ cut here ]------------
> [    0.320390] WARNING: CPU: 1 PID: 0 at ./include/linux/cpumask.h:121 gic_raise_softirq+0x128/0x17c
> [    0.329356] Modules linked in:
> [    0.332434] 
> [    0.333932] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc4-00163-g803ea3a #21
> [    0.341581] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
> [    0.347735] task: ffff8013e9dd0000 task.stack: ffff8013e9dcc000
> [    0.353714] PC is at gic_raise_softirq+0x128/0x17c
> [    0.358550] LR is at gic_raise_softirq+0xa0/0x17c
> [    0.363298] pc : [<ffff00000838c124>] lr : [<ffff00000838c09c>] pstate: 200001c5
> [    0.370770] sp : ffff8013e9dcfde0
> [    0.374112] x29: ffff8013e9dcfde0 x28: 0000000000000000 
> [    0.379476] x27: 000000000083207c x26: ffff000008ca5d70 
> [    0.384841] x25: 0000000100000001 x24: ffff000008d63ff3 
> [    0.390205] x23: 0000000000000000 x22: ffff000008cb0000 
> [    0.395569] x21: ffff00000884edb0 x20: 0000000000000001 
> [    0.400933] x19: 0000000100000000 x18: 0000000000000000 
> [    0.406298] x17: 0000000000000000 x16: 0000000003010066 
> [    0.411661] x15: ffff000008ca8000 x14: 0000000000000013 
> [    0.417025] x13: 0000000000000000 x12: 0000000000000013 
> [    0.422389] x11: 0000000000000013 x10: 0000000002e92aa7 
> [    0.427754] x9 : 0000000000000000 x8 : ffff8413eb6ca668 
> [    0.433118] x7 : ffff8413eb6ca690 x6 : 0000000000000000 
> [    0.438482] x5 : fffffffffffffffe x4 : 0000000000000000 
> [    0.443845] x3 : 0000000000000040 x2 : 0000000000000041 
> [    0.449209] x1 : 0000000000000000 x0 : 0000000000000001 
> [    0.454573] 
> [    0.456069] ---[ end trace b58e70f3295a8cd7 ]---
> [    0.460730] Call trace:
> [    0.463193] Exception stack(0xffff8013e9dcfc10 to 0xffff8013e9dcfd40)
> [    0.469699] fc00:                                   0000000100000000 0001000000000000
> [    0.477611] fc20: ffff8013e9dcfde0 ffff00000838c124 ffff000008d72228 ffff8013e9dcff70
> [    0.485524] fc40: ffff000008d72608 ffff000008ab02a4 0000000000000000 0000000000000000
> [    0.493436] fc60: 0000000000000000 3464313430303030 0000000000000000 0000000000000000
> [    0.501348] fc80: ffff8013e9dcfc90 ffff00000836e678 ffff8013e9dcfca0 ffff00000836e910
> [    0.509259] fca0: ffff8013e9dcfd30 ffff00000836ec10 0000000000000001 0000000000000000
> [    0.517171] fcc0: 0000000000000041 0000000000000040 0000000000000000 fffffffffffffffe
> [    0.525083] fce0: 0000000000000000 ffff8413eb6ca690 ffff8413eb6ca668 0000000000000000
> [    0.532995] fd00: 0000000002e92aa7 0000000000000013 0000000000000013 0000000000000000
> [    0.540907] fd20: 0000000000000013 ffff000008ca8000 0000000003010066 0000000000000000
> [    0.548819] [<ffff00000838c124>] gic_raise_softirq+0x128/0x17c
> [    0.554713] [<ffff00000808e1f4>] smp_send_reschedule+0x34/0x3c
> [    0.560605] [<ffff0000080ddf18>] resched_curr+0x40/0x5c
> [    0.565881] [<ffff0000080de650>] check_preempt_curr+0x58/0xa0
> [    0.571685] [<ffff0000080de6b0>] ttwu_do_wakeup+0x18/0x80
> [    0.577136] [<ffff0000080de790>] ttwu_do_activate+0x78/0x88
> [    0.582763] [<ffff0000080df5cc>] try_to_wake_up+0x1f8/0x300
> [    0.588390] [<ffff0000080df79c>] default_wake_function+0x10/0x18
> [    0.594458] [<ffff0000080f3210>] __wake_up_common+0x5c/0x9c
> [    0.600085] [<ffff0000080f3264>] __wake_up_locked+0x14/0x1c
> [    0.605712] [<ffff0000080f3e10>] complete+0x40/0x5c
> [    0.610635] [<ffff00000808dba8>] secondary_start_kernel+0x148/0x1a8
> [    0.616965] [<00000000000831a8>] 0x831a8

* [RFC] Arm64 boot fail with numa enable in BIOS
  2016-09-19 14:07 ` [RFC] Arm64 boot fail with numa enable in BIOS Mark Rutland
@ 2016-09-19 14:45   ` Will Deacon
  2016-09-20  1:19     ` Leizhen (ThunderTown)
  2016-09-19 17:41   ` James Morse
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Will Deacon @ 2016-09-19 14:45 UTC
  To: linux-arm-kernel

On Mon, Sep 19, 2016 at 03:07:19PM +0100, Mark Rutland wrote:
> [adding LAKML, arm64 maintainers]

I've also looped in Euler ThunderTown, since (a) he's at Huawei and is
assumedly testing this stuff and (b) he has a fairly big NUMA patch
series doing the rounds (some of which I've queued).

> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
> In future, please make sure to Cc LAKML along with relevant parties when
> sending arm64 patches/queries.
> 
> For everyone newly Cc'd, the original message (with attachments) can be
> found at:
> 
> http://lkml.kernel.org/r/7618d76d-bfa8-d8aa-59aa-06f9d90c1a98@huawei.com
> 
> > When I enable NUMA in BIOS for arm64, it failed to boot on v4.8-rc4-162-g071e31e.
> 
> That commit ID doesn't seem to be in mainline (I can't find it in my
> local tree). Which tree are you using? Do you have local patches
> applied?

That commit is in mainline:

  http://git.kernel.org/linus/071e31e

It would be nice to know if the problem also exists on the arm64
for-next/core branch.

Will


> I take it that by "enable NUMA in BIOS", you mean exposing SRAT to the
> OS?
> 
> > For the crash log, it seems caused by error number of cpumask.
> > Any ideas about it?
> 
> Much earlier in your log, there was a (non-fatal) warning, as below. Do
> you see this without NUMA/SRAT enabled in your FW? I don't see how the
> SRAT should affect the secondaries we try to bring online.
> 
> Given your MPIDRs have Aff2 bits set, I wonder if we've conflated a
> logical ID with a physical ID somewhere, and it just so happens that the
> NUMA code is more likely to poke something based on that.
> 
> Can you modify the warning in cpumask.h to dump the bad CPU number? That
> would make it fairly clear if that's the case.
> 
> Thanks,
> Mark.

* [RFC] Arm64 boot fail with numa enable in BIOS
  2016-09-19 14:07 ` [RFC] Arm64 boot fail with numa enable in BIOS Mark Rutland
  2016-09-19 14:45   ` Will Deacon
@ 2016-09-19 17:41   ` James Morse
  2016-09-20  2:51   ` Hanjun Guo
  2016-09-20  3:29   ` Yisheng Xie
  3 siblings, 0 replies; 7+ messages in thread
From: James Morse @ 2016-09-19 17:41 UTC
  To: linux-arm-kernel

On 19/09/16 15:07, Mark Rutland wrote:
> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
>> For the crash log, it seems caused by error number of cpumask.
>> Any ideas about it?

> Much earlier in your log, there was a (non-fatal) warning, as below. Do
> you see this without NUMA/SRAT enabled in your FW?

>> [    0.297337] Detected PIPT I-cache on CPU1
>> [    0.297347] GICv3: CPU1: found redistributor 10001 region 1:0x000000004d140000
>> [    0.297356] CPU1: Booted secondary processor [410fd082]
>> [    0.297375] ------------[ cut here ]------------
>> [    0.320390] WARNING: CPU: 1 PID: 0 at ./include/linux/cpumask.h:121 gic_raise_softirq+0x128/0x17c
>> [    0.329356] Modules linked in:
>> [    0.332434] 
>> [    0.333932] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc4-00163-g803ea3a #21
>> [    0.341581] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
>> [    0.347735] task: ffff8013e9dd0000 task.stack: ffff8013e9dcc000
>> [    0.353714] PC is at gic_raise_softirq+0x128/0x17c
>> [    0.358550] LR is at gic_raise_softirq+0xa0/0x17c

I've seen this first trace when built with DEBUG_PER_CPU_MAPS. My version of
this trace[0] was just noise due to gic_compute_target_list() and
gic_raise_softirq() sharing an iterator.

This patch silenced it for me:
https://lkml.org/lkml/2016/9/19/623

Yours may be a different problem with the same symptom.
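The shared-iterator problem can be modelled in userspace. In this sketch, a hypothetical consume_group() plays the role of gic_compute_target_list(): it advances a cursor owned by an outer for_each_cpu()-style loop (standing in for gic_raise_softirq()). If the helper leaves the cursor at nr_cpu_ids after running off the end of the mask, the outer loop's next cpumask_next() passes nr_cpu_ids to cpumask_check(), and DEBUG_PER_CPU_MAPS warns even though no bad CPU is ever targeted. The names, the 64-bit mask and the grouping logic are simplifications, not the real driver code or the linked patch. The Hi1616 dump has x3 = 0x40 (64), which would be consistent with this pattern, although the thread does not confirm which register holds the CPU number.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPU_IDS 64u /* assumed nr_cpu_ids */

static unsigned int checks_failed; /* would-be WARN_ON_ONCE() hits */

static unsigned int cpumask_check(unsigned int cpu)
{
	if (cpu >= NR_CPU_IDS)
		checks_failed++;
	return cpu;
}

/* Model of cpumask_next(): first set bit after n, or NR_CPU_IDS if none. */
static unsigned int cpumask_next(int n, uint64_t mask)
{
	if (n != -1) /* -1 is a legal "start from the beginning" value */
		cpumask_check((unsigned int)n);
	for (unsigned int cpu = (unsigned int)(n + 1); cpu < NR_CPU_IDS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/*
 * Hypothetical inner helper: consumes CPUs from *cursor onwards and
 * returns how many it handled. With fix == 0 it leaves *cursor at
 * NR_CPU_IDS when it runs off the end of the mask; with fix != 0 it
 * leaves *cursor on the last CPU it actually consumed.
 */
static unsigned int consume_group(unsigned int *cursor, uint64_t mask, int fix)
{
	unsigned int cpu = *cursor, n = 0;

	assert(*cursor < NR_CPU_IDS);
	for (;;) {
		unsigned int next;

		n++;
		next = cpumask_next((int)cpu, mask);
		if (next >= NR_CPU_IDS) {
			if (!fix)
				cpu = next; /* cursor ends up == NR_CPU_IDS */
			break;
		}
		cpu = next;
	}
	*cursor = cpu;
	return n;
}

/* Outer loop modelled on for_each_cpu(), resuming from the shared cursor. */
static unsigned int send_to_mask(uint64_t mask, int fix)
{
	unsigned int total = 0;

	for (unsigned int cpu = cpumask_next(-1, mask); cpu < NR_CPU_IDS;
	     cpu = cpumask_next((int)cpu, mask))
		total += consume_group(&cpu, mask, fix);
	return total;
}
```

With mask 0x6 (CPUs 1 and 2), both variants visit the same two CPUs, but only the unfixed one records a failed check, when the outer loop calls cpumask_next(64).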


Thanks,

James


[0] gicv3 trace when built with DEBUG_PER_CPU_MAPS
[    3.077738] GICv3: CPU1: found redistributor 1 region 0:0x000000002f120000
[    3.077943] CPU1: Booted secondary processor [410fd0f0]
[    3.078542] ------------[ cut here ]------------
[    3.078746] WARNING: CPU: 1 PID: 0 at ../include/linux/cpumask.h:121 gic_raise_softirq+0x12c/0x170
[    3.078812] Modules linked in:
[    3.078869]
[    3.078930] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc5+ #5188
[    3.078994] Hardware name: Foundation-v8A (DT)
[    3.079059] task: ffff80087a1a0080 task.stack: ffff80087a19c000
[    3.079145] PC is at gic_raise_softirq+0x12c/0x170
[    3.079226] LR is at gic_raise_softirq+0xa4/0x170

* [RFC] Arm64 boot fail with numa enable in BIOS
  2016-09-19 14:45   ` Will Deacon
@ 2016-09-20  1:19     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 7+ messages in thread
From: Leizhen (ThunderTown) @ 2016-09-20  1:19 UTC
  To: linux-arm-kernel



On 2016/9/19 22:45, Will Deacon wrote:
> On Mon, Sep 19, 2016 at 03:07:19PM +0100, Mark Rutland wrote:
>> [adding LAKML, arm64 maintainers]
> 
> I've also looped in Euler ThunderTown, since (a) he's at Huawei and is
> assumedly testing this stuff and (b) he has a fairly big NUMA patch
> series doing the rounds (some of which I've queued).
In my patch series, only one patch fixes a crash, but it is device-tree related.

> 
>> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
>> In future, please make sure to Cc LAKML along with relevant parties when
>> sending arm64 patches/queries.
>>
>> For everyone newly Cc'd, the original message (with attachments) can be
>> found at:
>>
>> http://lkml.kernel.org/r/7618d76d-bfa8-d8aa-59aa-06f9d90c1a98@huawei.com
>>
>>> When I enable NUMA in BIOS for arm64, it failed to boot on v4.8-rc4-162-g071e31e.
>>
>> That commit ID doesn't seem to be in mainline (I can't find it in my
>> local tree). Which tree are you using? Do you have local patches
>> applied?
> 
> That commit is in mainline:
> 
>   http://git.kernel.org/linus/071e31e
> 
> It would be nice to know if the problem also exists on the arm64
> for-next/core branch.
> 
> Will
> 
> 
>> I take it that by "enable NUMA in BIOS", you mean exposing SRAT to the
>> OS?
>>
>>> For the crash log, it seems caused by error number of cpumask.
>>> Any ideas about it?
>>
>> Much earlier in your log, there was a (non-fatal) warning, as below. Do
>> you see this without NUMA/SRAT enabled in your FW? I don't see how the
>> SRAT should affect the secondaries we try to bring online.
>>
>> Given your MPIDRs have Aff2 bits set, I wonder if we've conflated a
>> logical ID with a physical ID somewhere, and it just so happens that the
>> NUMA code is more likely to poke something based on that.
>>
>> Can you modify the warning in cpumask.h to dump the bad CPU number? That
>> would make it fairly clear if that's the case.
>>
>> Thanks,
>> Mark.

* [RFC] Arm64 boot fail with numa enable in BIOS
  2016-09-19 14:07 ` [RFC] Arm64 boot fail with numa enable in BIOS Mark Rutland
  2016-09-19 14:45   ` Will Deacon
  2016-09-19 17:41   ` James Morse
@ 2016-09-20  2:51   ` Hanjun Guo
  2016-09-20  3:29   ` Yisheng Xie
  3 siblings, 0 replies; 7+ messages in thread
From: Hanjun Guo @ 2016-09-20  2:51 UTC
  To: linux-arm-kernel

On 2016/9/19 22:07, Mark Rutland wrote:
> [adding LAKML, arm64 maintainers]
>
> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
>> hi all,
> Hi,
>
> In future, please make sure to Cc LAKML along with relevant parties when
> sending arm64 patches/queries.
>
> For everyone newly Cc'd, the original message (with attachments) can be
> found at:
>
> http://lkml.kernel.org/r/7618d76d-bfa8-d8aa-59aa-06f9d90c1a98@huawei.com
>
>> When I enable NUMA in BIOS for arm64, it failed to boot on v4.8-rc4-162-g071e31e.
> That commit ID doesn't seem to be in mainline (I can't find it in my
> local tree). Which tree are you using? Do you have local patches
> applied?

Yes, we have GICv3 ITS and mbigen patches on top, which aim to enable PCI MSI
and native SAS on the board.

>
> I take it that by "enable NUMA in BIOS", you mean exposing SRAT to the
> OS?

Yes, SRAT and SLIT.

>
>> For the crash log, it seems caused by error number of cpumask.
>> Any ideas about it?
> Much earlier in your log, there was a (non-fatal) warning, as below. Do
> you see this without NUMA/SRAT enabled in your FW?

It works OK without NUMA/SRAT enabled; we will check the SRAT table.

> I don't see how the SRAT should affect the secondaries we try to bring online.

Yes, CPU masks and secondary CPU bring-up are related to the MADT, not the SRAT.

Thanks
Hanjun

>
> Given your MPIDRs have Aff2 bits set, I wonder if we've conflated a
> logical ID with a physical ID somewhere, and it just so happens that the
> NUMA code is more likely to poke something based on that.
>
> Can you modify the warning in cpumask.h to dump the bad CPU number? That
> would make it fairly clear if that's the case.
>
> Thanks,
> Mark.

* [RFC] Arm64 boot fail with numa enable in BIOS
  2016-09-19 14:07 ` [RFC] Arm64 boot fail with numa enable in BIOS Mark Rutland
                     ` (2 preceding siblings ...)
  2016-09-20  2:51   ` Hanjun Guo
@ 2016-09-20  3:29   ` Yisheng Xie
  2016-09-20  8:33     ` Will Deacon
  3 siblings, 1 reply; 7+ messages in thread
From: Yisheng Xie @ 2016-09-20  3:29 UTC
  To: linux-arm-kernel



On 2016/9/19 22:07, Mark Rutland wrote:
> [adding LAKML, arm64 maintainers]
> 
> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
>> hi all,
> 
> Can you modify the warning in cpumask.h to dump the bad CPU number? That
> would make it fairly clear if that's the case.
> 
Hi Mark,

I dumped the bad CPU number: it is 64, and the cpumask obtained from the
task is 00000000,00000000 (empty).

[    3.873044] select_task_rq: allowed 0, allow_cpumask 00000000,00000000
[    3.879727] cpumask_check: cpu 64, nr_cpumask_bits:64, nr_cpu_ids= 64
[    3.895989] ------------[ cut here ]------------
[    3.900652] WARNING: CPU: 16 PID: 103 at ./include/linux/cpumask.h:122 try_to_wake_up+0x410/0x4ac
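The dump shows an empty allowed mask and a bad CPU of 64, which equals nr_cpu_ids. One plausible reading (not confirmed in this thread) is that a cpumask search returned its "nothing found" value, nr_cpu_ids, and that value was then used as a real CPU number. The userspace sketch below models that convention; mask_first() mimics cpumask_first()/cpumask_any() on a 64-bit mask, and pick_cpu() is a hypothetical caller-side guard, not the actual scheduler code or the proposed fix.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPU_IDS 64u /* assumed, matches "nr_cpu_ids= 64" in the dump */

/*
 * Model of cpumask_first()/cpumask_any(): index of the lowest set bit,
 * or NR_CPU_IDS when the mask is empty ("no CPU found").
 */
static unsigned int mask_first(uint64_t mask)
{
	for (unsigned int cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/* Hypothetical guard: only hand out the result if it is a real CPU. */
static int pick_cpu(uint64_t allowed, unsigned int *out)
{
	unsigned int cpu = mask_first(allowed);

	assert(out);
	if (cpu >= NR_CPU_IDS)
		return -1; /* empty mask: fall back instead of using CPU 64 */
	*out = cpu;
	return 0;
}
```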

Thanks.
Yisheng Xie

> Thanks,
> Mark.

* [RFC] Arm64 boot fail with numa enable in BIOS
  2016-09-20  3:29   ` Yisheng Xie
@ 2016-09-20  8:33     ` Will Deacon
  0 siblings, 0 replies; 7+ messages in thread
From: Will Deacon @ 2016-09-20  8:33 UTC
  To: linux-arm-kernel

Hi Yisheng,

On Tue, Sep 20, 2016 at 11:29:24AM +0800, Yisheng Xie wrote:
> On 2016/9/19 22:07, Mark Rutland wrote:
> > On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
> > Can you modify the warning in cpumask.h to dump the bad CPU number? That
> > would make it fairly clear if that's the case.
> > 
> hi Mark,
> I dump the bad CPU number, it is 64,
> And the cpumask get from task is 00000000,00000000.
> 
> [    3.873044] select_task_rq: allowed 0, allow_cpumask 00000000,00000000
> [    3.879727] cpumask_check: cpu 64, nr_cpumask_bits:64, nr_cpu_ids= 64
> [    3.895989] ------------[ cut here ]------------
> [    3.900652] WARNING: CPU: 16 PID: 103 at ./include/linux/cpumask.h:122 try_to_wake_up+0x410/0x4ac

Can you look at this patch from David, please:

http://lists.infradead.org/pipermail/linux-arm-kernel/2016-September/458110.html

and offer a Tested-by if it fixes your problem?

Thanks,

Will

end of thread

Thread overview: 7+ messages
     [not found] <7618d76d-bfa8-d8aa-59aa-06f9d90c1a98@huawei.com>
2016-09-19 14:07 ` [RFC] Arm64 boot fail with numa enable in BIOS Mark Rutland
2016-09-19 14:45   ` Will Deacon
2016-09-20  1:19     ` Leizhen (ThunderTown)
2016-09-19 17:41   ` James Morse
2016-09-20  2:51   ` Hanjun Guo
2016-09-20  3:29   ` Yisheng Xie
2016-09-20  8:33     ` Will Deacon