* [PATCH]: AMD Northbridge: Verify NB's node is online
@ 2009-11-12 18:09 Prarit Bhargava
2009-11-14 0:58 ` Ingo Molnar
2009-11-16 16:10 ` [tip:x86/urgent] x86: " tip-bot for Prarit Bhargava
0 siblings, 2 replies; 5+ messages in thread
From: Prarit Bhargava @ 2009-11-12 18:09 UTC (permalink / raw)
To: linux-kernel, bhavna.sarathy, jbarnes, andreas.herrmann3, mingo
Cc: Prarit Bhargava
Panic seen on some IBM and HP systems on 2.6.32-rc6.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
PGD 2735ba067 PUD 2735d5067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/platform/pcspkr/modalias
CPU 7
Modules linked in: k8temp(+) pcspkr edac_core serio_raw hwmon shpchp cciss dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod
Pid: 616, comm: modprobe Not tainted 2.6.32-rc6 #2 ProLiant DL585 G2
RIP: 0010:[<ffffffff8120bf3f>] [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
RSP: 0018:ffff8802736fdd18 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffffff8182f680 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000008
RBP: ffff8802736fdd18 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff81d922e0 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffffa007e720 R14: 0000000000000001 R15: 00000000015b19e0
FS: 00007f0a474086f0(0000) GS:ffff880036400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000273cbb000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 616, threadinfo ffff8802736fc000, task ffff8802743b5c00)
Stack:
ffff8802736fdd38 ffffffff8120bbde ffff88027646b0d8 ffff88027646b168
<0> ffff8802736fdd88 ffffffff81225c62 ffffffffa007e720 ffff88027646b0d8
<0> ffffffffa007e930 ffffffff812b9be6 ffff88027646b168 ffffffffa007e780
Call Trace:
[<ffffffff8120bbde>] cpumask_next_and+0x2e/0x3b
[<ffffffff81225c62>] pci_device_probe+0x8e/0xf5
[<ffffffff812b9be6>] ? driver_sysfs_add+0x47/0x6c
[<ffffffff812b9da5>] driver_probe_device+0xd9/0x1f9
[<ffffffff812b9f1d>] __driver_attach+0x58/0x7c
[<ffffffff812b9ec5>] ? __driver_attach+0x0/0x7c
[<ffffffff812b9298>] bus_for_each_dev+0x54/0x89
[<ffffffff812b9b4f>] driver_attach+0x19/0x1b
[<ffffffff812b97ae>] bus_add_driver+0xd3/0x23d
[<ffffffff812ba1e7>] driver_register+0x98/0x109
[<ffffffff81225ed0>] __pci_register_driver+0x63/0xd3
[<ffffffff81072776>] ? up_read+0x26/0x2a
[<ffffffffa0081000>] ? k8temp_init+0x0/0x20 [k8temp]
[<ffffffffa008101e>] k8temp_init+0x1e/0x20 [k8temp]
[<ffffffff8100a073>] do_one_initcall+0x6d/0x185
[<ffffffff8108d765>] sys_init_module+0xd3/0x236
[<ffffffff81011ac2>] system_call_fastpath+0x16/0x1b
Code: 49 83 c0 40 eb 14 49 8b 01 48 85 c0 75 39 49 83 c1 08 49 83 c0 40 48 83 ef 40 48 f7 c7 c0 ff ff ff 75 e3 48 85 ff 4c 89 c0 74 23 <49> 8b 01 b9 40 00 00 00 48 83 ca ff 29 f9 48 d3 ea 48 21 d0 75
RIP [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
RSP <ffff8802736fdd18>
CR2: 0000000000000000
---[ end trace a3d7e2941e8a6320 ]---
Hardware may be programmed incorrectly and return a bogus node ID. Check to
see if the node is actually online before setting the NUMA node for an AMD
northbridge in quirk_amd_nb_node().
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index 6c3b2c6..9308ba7 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -507,7 +507,8 @@ static void __init quirk_amd_nb_node(struct pci_dev *dev)
return;
pci_read_config_dword(nb_ht, 0x60, &val);
- set_dev_node(&dev->dev, val & 7);
+ if (node_online(val & 7))
+ set_dev_node(&dev->dev, val & 7);
pci_dev_put(nb_ht);
}
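The guard added by the diff above can be tried outside the kernel with a small userspace model. This is only a sketch: model_online_mask, model_node_online(), struct model_dev and model_quirk_amd_nb_node() are hypothetical stand-ins invented for illustration, not the kernel's node_online(), struct device or set_dev_node(). The only parts taken from the patch are the "val & 7" extraction and the rule that the node is assigned only if it is online.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_NODES 8	/* val & 7 can name nodes 0..7 */

/* Hypothetical stand-in for the kernel's online-node bitmap:
 * bit N set means node N is online. Only node 0 is online here. */
static uint8_t model_online_mask = 0x01;

/* Hypothetical stand-in for node_online(). */
static bool model_node_online(unsigned int node)
{
	assert(node < MODEL_MAX_NODES);
	return (model_online_mask >> node) & 1u;
}

/* Hypothetical stand-in for the NUMA node stored in struct device. */
struct model_dev {
	int numa_node;
};

/* Mirrors the patched logic: take the low three bits of the config
 * register value and assign them only if that node is online.
 * Otherwise the device keeps whatever node it already had. */
static void model_quirk_amd_nb_node(struct model_dev *dev, uint32_t val)
{
	unsigned int node = val & 7;

	if (model_node_online(node))
		dev->numa_node = (int)node;
}
```

With only node 0 online, passing a register value whose low bits are 3 leaves numa_node untouched. Once node 3 is marked online in the mask, the same call assigns it.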
* Re: [PATCH]: AMD Northbridge: Verify NB's node is online
2009-11-12 18:09 [PATCH]: AMD Northbridge: Verify NB's node is online Prarit Bhargava
@ 2009-11-14 0:58 ` Ingo Molnar
2009-11-16 13:39 ` Prarit Bhargava
2009-11-16 16:10 ` [tip:x86/urgent] x86: " tip-bot for Prarit Bhargava
1 sibling, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2009-11-14 0:58 UTC (permalink / raw)
To: Prarit Bhargava; +Cc: linux-kernel, bhavna.sarathy, jbarnes, andreas.herrmann3
* Prarit Bhargava <prarit@redhat.com> wrote:
> Panic seen on some IBM and HP systems on 2.6.32-rc6.
>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
> PGD 2735ba067 PUD 2735d5067 PMD 0
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/devices/platform/pcspkr/modalias
> CPU 7
> Modules linked in: k8temp(+) pcspkr edac_core serio_raw hwmon shpchp cciss dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod
> Pid: 616, comm: modprobe Not tainted 2.6.32-rc6 #2 ProLiant DL585 G2
> RIP: 0010:[<ffffffff8120bf3f>] [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
> RSP: 0018:ffff8802736fdd18 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffffffff8182f680 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000008
> RBP: ffff8802736fdd18 R08: 0000000000000000 R09: 0000000000000000
> R10: ffffffff81d922e0 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffffffa007e720 R14: 0000000000000001 R15: 00000000015b19e0
> FS: 00007f0a474086f0(0000) GS:ffff880036400000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 0000000273cbb000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process modprobe (pid: 616, threadinfo ffff8802736fc000, task ffff8802743b5c00)
> Stack:
> ffff8802736fdd38 ffffffff8120bbde ffff88027646b0d8 ffff88027646b168
> <0> ffff8802736fdd88 ffffffff81225c62 ffffffffa007e720 ffff88027646b0d8
> <0> ffffffffa007e930 ffffffff812b9be6 ffff88027646b168 ffffffffa007e780
> Call Trace:
> [<ffffffff8120bbde>] cpumask_next_and+0x2e/0x3b
> [<ffffffff81225c62>] pci_device_probe+0x8e/0xf5
> [<ffffffff812b9be6>] ? driver_sysfs_add+0x47/0x6c
> [<ffffffff812b9da5>] driver_probe_device+0xd9/0x1f9
> [<ffffffff812b9f1d>] __driver_attach+0x58/0x7c
> [<ffffffff812b9ec5>] ? __driver_attach+0x0/0x7c
> [<ffffffff812b9298>] bus_for_each_dev+0x54/0x89
> [<ffffffff812b9b4f>] driver_attach+0x19/0x1b
> [<ffffffff812b97ae>] bus_add_driver+0xd3/0x23d
> [<ffffffff812ba1e7>] driver_register+0x98/0x109
> [<ffffffff81225ed0>] __pci_register_driver+0x63/0xd3
> [<ffffffff81072776>] ? up_read+0x26/0x2a
> [<ffffffffa0081000>] ? k8temp_init+0x0/0x20 [k8temp]
> [<ffffffffa008101e>] k8temp_init+0x1e/0x20 [k8temp]
> [<ffffffff8100a073>] do_one_initcall+0x6d/0x185
> [<ffffffff8108d765>] sys_init_module+0xd3/0x236
> [<ffffffff81011ac2>] system_call_fastpath+0x16/0x1b
> Code: 49 83 c0 40 eb 14 49 8b 01 48 85 c0 75 39 49 83 c1 08 49 83 c0 40 48 83 ef 40 48 f7 c7 c0 ff ff ff 75 e3 48 85 ff 4c 89 c0 74 23 <49> 8b 01 b9 40 00 00 00 48 83 ca ff 29 f9 48 d3 ea 48 21 d0 75
> RIP [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
> RSP <ffff8802736fdd18>
> CR2: 0000000000000000
> ---[ end trace a3d7e2941e8a6320 ]---
>
> Hardware may be programmed incorrectly and return a bogus node ID.
> Check to see if the node is actually online before setting the numa
> node for an AMD northbridge in quirk_amd_nb_node().
Hm, could you stick a printk in there, what precise node ID does the
hardware return?
Ingo
* Re: [PATCH]: AMD Northbridge: Verify NB's node is online
2009-11-14 0:58 ` Ingo Molnar
@ 2009-11-16 13:39 ` Prarit Bhargava
2009-11-16 14:44 ` Ingo Molnar
0 siblings, 1 reply; 5+ messages in thread
From: Prarit Bhargava @ 2009-11-16 13:39 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, bhavna.sarathy, jbarnes, andreas.herrmann3
>>
>> Hardware may be programmed incorrectly and return a bogus node ID.
>> Check to see if the node is actually online before setting the numa
>> node for an AMD northbridge in quirk_amd_nb_node().
>>
>
> Hm, could you stick a printk in there, what precise node ID does the
> hardware return?
>
>
Ingo, yup -- I put in a printk and commented out the set_dev_node() call
when debugging this and got this output:
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x0
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x1
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x2
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x3
The issue appears to be that the HW has set val to a valid value;
however, the system is only configured for a single node -- 0.
I realize that I'm working around broken HW ... but I think that a
quirk, quirk_amd_nb_node(), should at least keep systems booting ...
P.
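One plausible reading of the oops, given the debug output above, is that per-node CPU masks are only set up for online nodes, so a device pointed at node 1, 2 or 3 hands a NULL mask to cpumask_next_and() during pci_device_probe(). The sketch below models that assumption in userspace. Everything in it is hypothetical (model_node_cpumask, model_cpumask_of_node(), model_count_unsafe(), and the choice of CPUs 0-7 for node 0); it is not the kernel's data structure, only an illustration of why an offline node ID is dangerous to store.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_NODES 8

/* Assumed CPU mask for node 0 (CPUs 0-7); purely illustrative. */
static const unsigned long model_node0_cpus = 0xffUL;

/* Hypothetical per-node mask table: only online nodes get an entry.
 * Nodes 1..7 are offline in this model and stay NULL. */
static const unsigned long *model_node_cpumask[MODEL_MAX_NODES] = {
	&model_node0_cpus,
};

/* Returns the mask for a node, or NULL if the node is offline. */
static const unsigned long *model_cpumask_of_node(unsigned int node)
{
	assert(node < MODEL_MAX_NODES);
	return model_node_cpumask[node];
}

/* Counts how many reported node IDs have no mask, i.e. how many
 * would lead to a NULL dereference if stored and used unchecked. */
static int model_count_unsafe(const unsigned int *ids, size_t n)
{
	int unsafe = 0;

	for (size_t i = 0; i < n; i++)
		if (model_cpumask_of_node(ids[i] & 7) == NULL)
			unsafe++;
	return unsafe;
}
```

Feeding the four values from the debug output (0, 1, 2, 3) into model_count_unsafe() flags three of them, which matches the observation that only node 0 is configured.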
* Re: [PATCH]: AMD Northbridge: Verify NB's node is online
2009-11-16 13:39 ` Prarit Bhargava
@ 2009-11-16 14:44 ` Ingo Molnar
0 siblings, 0 replies; 5+ messages in thread
From: Ingo Molnar @ 2009-11-16 14:44 UTC (permalink / raw)
To: Prarit Bhargava; +Cc: linux-kernel, bhavna.sarathy, jbarnes, andreas.herrmann3
* Prarit Bhargava <prarit@redhat.com> wrote:
>
> >>
> >>Hardware may be programmed incorrectly and return a bogus node
> >>ID. Check to see if the node is actually online before setting
> >>the numa node for an AMD northbridge in quirk_amd_nb_node().
> >
> >Hm, could you stick a printk in there, what precise node ID does
> >the hardware return?
> >
>
> Ingo, yup -- I put in a printk and commented out the set_dev_node()
> call when debugging this
> and got this output:
>
> quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x0
> quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x1
> quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x2
> quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x3
>
> The issue appears to be that the HW has set val to a valid value,
> however, the system is only configured for a single node -- 0.
>
> I realize that I'm working around broken HW ... but I think that a
> quirk, quirk_amd_nb_node(), should at least keep systems booting ...
Ok. I cleaned up the patch a bit and added a comment explaining the
logic - and also expanded the changelog with your new debug data, and
applied it to tip:x86/urgent. Please check the commit notification email
whether it's all OK.
Thanks,
Ingo
* [tip:x86/urgent] x86: AMD Northbridge: Verify NB's node is online
2009-11-12 18:09 [PATCH]: AMD Northbridge: Verify NB's node is online Prarit Bhargava
2009-11-14 0:58 ` Ingo Molnar
@ 2009-11-16 16:10 ` tip-bot for Prarit Bhargava
1 sibling, 0 replies; 5+ messages in thread
From: tip-bot for Prarit Bhargava @ 2009-11-16 16:10 UTC (permalink / raw)
To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx, prarit, mingo
Commit-ID: 303fc0870f8fbfabe260c5c32b18e53458d597ea
Gitweb: http://git.kernel.org/tip/303fc0870f8fbfabe260c5c32b18e53458d597ea
Author: Prarit Bhargava <prarit@redhat.com>
AuthorDate: Thu, 12 Nov 2009 13:09:31 -0500
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 16 Nov 2009 15:43:05 +0100
x86: AMD Northbridge: Verify NB's node is online
Fix panic seen on some IBM and HP systems on 2.6.32-rc6:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
[...]
[<ffffffff8120bbde>] cpumask_next_and+0x2e/0x3b
[<ffffffff81225c62>] pci_device_probe+0x8e/0xf5
[<ffffffff812b9be6>] ? driver_sysfs_add+0x47/0x6c
[<ffffffff812b9da5>] driver_probe_device+0xd9/0x1f9
[<ffffffff812b9f1d>] __driver_attach+0x58/0x7c
[<ffffffff812b9ec5>] ? __driver_attach+0x0/0x7c
[<ffffffff812b9298>] bus_for_each_dev+0x54/0x89
[<ffffffff812b9b4f>] driver_attach+0x19/0x1b
[<ffffffff812b97ae>] bus_add_driver+0xd3/0x23d
[<ffffffff812ba1e7>] driver_register+0x98/0x109
[<ffffffff81225ed0>] __pci_register_driver+0x63/0xd3
[<ffffffff81072776>] ? up_read+0x26/0x2a
[<ffffffffa0081000>] ? k8temp_init+0x0/0x20 [k8temp]
[<ffffffffa008101e>] k8temp_init+0x1e/0x20 [k8temp]
[<ffffffff8100a073>] do_one_initcall+0x6d/0x185
[<ffffffff8108d765>] sys_init_module+0xd3/0x236
[<ffffffff81011ac2>] system_call_fastpath+0x16/0x1b
I put in a printk and commented out the set_dev_node()
call when debugging this, and got this output:
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x0
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x1
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x2
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x3
I.e. the issue appears to be that the HW has set val to a valid
value; however, the system is only configured for a single
node -- 0; the others are offline.
Check to see if the node is actually online before setting
the numa node for an AMD northbridge in quirk_amd_nb_node().
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: bhavna.sarathy@amd.com
Cc: jbarnes@virtuousgeek.org
Cc: andreas.herrmann3@amd.com
LKML-Reference: <20091112180933.12532.98685.sendpatchset@prarit.bos.redhat.com>
[ v2: clean up the code and add comments ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/quirks.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index 6c3b2c6..18093d7 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -499,6 +499,7 @@ static void __init quirk_amd_nb_node(struct pci_dev *dev)
{
struct pci_dev *nb_ht;
unsigned int devfn;
+ u32 node;
u32 val;
devfn = PCI_DEVFN(PCI_SLOT(dev->devfn), 0);
@@ -507,7 +508,13 @@ static void __init quirk_amd_nb_node(struct pci_dev *dev)
return;
pci_read_config_dword(nb_ht, 0x60, &val);
- set_dev_node(&dev->dev, val & 7);
+ node = val & 7;
+ /*
+ * Some hardware may return an invalid node ID,
+ * so check it first:
+ */
+ if (node_online(node))
+ set_dev_node(&dev->dev, node);
pci_dev_put(nb_ht);
}