public inbox for linux-kernel@vger.kernel.org
* [PATCH]: AMD Northbridge: Verify NB's node is online
@ 2009-11-12 18:09 Prarit Bhargava
  2009-11-14  0:58 ` Ingo Molnar
  2009-11-16 16:10 ` [tip:x86/urgent] x86: " tip-bot for Prarit Bhargava
  0 siblings, 2 replies; 5+ messages in thread
From: Prarit Bhargava @ 2009-11-12 18:09 UTC
  To: linux-kernel, bhavna.sarathy, jbarnes, andreas.herrmann3, mingo
  Cc: Prarit Bhargava

Panic seen on some IBM and HP systems on 2.6.32-rc6.

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
PGD 2735ba067 PUD 2735d5067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/platform/pcspkr/modalias
CPU 7 
Modules linked in: k8temp(+) pcspkr edac_core serio_raw hwmon shpchp cciss dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod
Pid: 616, comm: modprobe Not tainted 2.6.32-rc6 #2 ProLiant DL585 G2   
RIP: 0010:[<ffffffff8120bf3f>]  [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
RSP: 0018:ffff8802736fdd18  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffffff8182f680 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000008
RBP: ffff8802736fdd18 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff81d922e0 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffffa007e720 R14: 0000000000000001 R15: 00000000015b19e0
FS:  00007f0a474086f0(0000) GS:ffff880036400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000273cbb000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 616, threadinfo ffff8802736fc000, task ffff8802743b5c00)
Stack:
 ffff8802736fdd38 ffffffff8120bbde ffff88027646b0d8 ffff88027646b168
<0> ffff8802736fdd88 ffffffff81225c62 ffffffffa007e720 ffff88027646b0d8
<0> ffffffffa007e930 ffffffff812b9be6 ffff88027646b168 ffffffffa007e780
Call Trace:
 [<ffffffff8120bbde>] cpumask_next_and+0x2e/0x3b
 [<ffffffff81225c62>] pci_device_probe+0x8e/0xf5
 [<ffffffff812b9be6>] ? driver_sysfs_add+0x47/0x6c
 [<ffffffff812b9da5>] driver_probe_device+0xd9/0x1f9
 [<ffffffff812b9f1d>] __driver_attach+0x58/0x7c
 [<ffffffff812b9ec5>] ? __driver_attach+0x0/0x7c
 [<ffffffff812b9298>] bus_for_each_dev+0x54/0x89
 [<ffffffff812b9b4f>] driver_attach+0x19/0x1b
 [<ffffffff812b97ae>] bus_add_driver+0xd3/0x23d
 [<ffffffff812ba1e7>] driver_register+0x98/0x109
 [<ffffffff81225ed0>] __pci_register_driver+0x63/0xd3
 [<ffffffff81072776>] ? up_read+0x26/0x2a
 [<ffffffffa0081000>] ? k8temp_init+0x0/0x20 [k8temp]
 [<ffffffffa008101e>] k8temp_init+0x1e/0x20 [k8temp]
 [<ffffffff8100a073>] do_one_initcall+0x6d/0x185
 [<ffffffff8108d765>] sys_init_module+0xd3/0x236
 [<ffffffff81011ac2>] system_call_fastpath+0x16/0x1b
Code: 49 83 c0 40 eb 14 49 8b 01 48 85 c0 75 39 49 83 c1 08 49 83 c0 40 48 83 ef 40 48 f7 c7 c0 ff ff ff 75 e3 48 85 ff 4c 89 c0 74 23 <49> 8b 01 b9 40 00 00 00 48 83 ca ff 29 f9 48 d3 ea 48 21 d0 75 
RIP  [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
 RSP <ffff8802736fdd18>
CR2: 0000000000000000
---[ end trace a3d7e2941e8a6320 ]---

Hardware may be programmed incorrectly and return a bogus node ID.  Check to
see if the node is actually online before setting the numa node for an AMD
northbridge in quirk_amd_nb_node().

Signed-off-by: Prarit Bhargava <prarit@redhat.com>

diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index 6c3b2c6..9308ba7 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -507,7 +507,8 @@ static void __init quirk_amd_nb_node(struct pci_dev *dev)
 		return;
 
 	pci_read_config_dword(nb_ht, 0x60, &val);
-	set_dev_node(&dev->dev, val & 7);
+	if (node_online(val & 7))
+		set_dev_node(&dev->dev, val & 7);
 	pci_dev_put(nb_ht);
 }
 


* Re: [PATCH]: AMD Northbridge: Verify NB's node is online
  2009-11-12 18:09 [PATCH]: AMD Northbridge: Verify NB's node is online Prarit Bhargava
@ 2009-11-14  0:58 ` Ingo Molnar
  2009-11-16 13:39   ` Prarit Bhargava
  2009-11-16 16:10 ` [tip:x86/urgent] x86: " tip-bot for Prarit Bhargava
  1 sibling, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2009-11-14  0:58 UTC
  To: Prarit Bhargava; +Cc: linux-kernel, bhavna.sarathy, jbarnes, andreas.herrmann3


* Prarit Bhargava <prarit@redhat.com> wrote:

> Panic seen on some IBM and HP systems on 2.6.32-rc6.
> 
> Hardware may be programmed incorrectly and return a bogus node ID.
> Check to see if the node is actually online before setting the numa
> node for an AMD northbridge in quirk_amd_nb_node().

Hm, could you stick a printk in there? What precise node ID does the
hardware return?

	Ingo


* Re: [PATCH]: AMD Northbridge: Verify NB's node is online
  2009-11-14  0:58 ` Ingo Molnar
@ 2009-11-16 13:39   ` Prarit Bhargava
  2009-11-16 14:44     ` Ingo Molnar
  0 siblings, 1 reply; 5+ messages in thread
From: Prarit Bhargava @ 2009-11-16 13:39 UTC
  To: Ingo Molnar; +Cc: linux-kernel, bhavna.sarathy, jbarnes, andreas.herrmann3


>>
>> Hardware may be programmed incorrectly and return a bogus node ID.
>> Check to see if the node is actually online before setting the numa
>> node for an AMD northbridge in quirk_amd_nb_node().
>>     
>
> Hm, could you stick a printk in there, what precise node ID does the 
> hardware return?
>
>   

Ingo, yup -- I put in a printk and commented out the set_dev_node() call
when debugging this and got this output:

quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x0
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x1
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x2
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x3

The issue appears to be that the HW has set val to a valid value;
however, the system is only configured for a single node -- 0.

I realize that I'm working around broken HW ... but I think that a 
quirk, quirk_amd_nb_node(), should at least keep systems booting ...

P.
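
The debug output above can be reproduced in userspace with a small sketch.
This is not the code that was run on the affected machines; it only assumes
what the thread states, namely that the node ID is the low three bits of the
dword read from config offset 0x60 and that only node 0 is online. The names
sim_node_online(), nb_node_id() and format_debug_line(), and the idea of
representing the online nodes as a plain bitmask, are hypothetical stand-ins
for the kernel's node_online() and nodemask machinery.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the kernel's node_online(): bit n of
 * online_mask is set when NUMA node n is online.
 */
static bool sim_node_online(unsigned int node, uint32_t online_mask)
{
	return node < 32 && ((online_mask >> node) & 1u);
}

/* Extract the node ID as the quirk does: the low three bits of val. */
static unsigned int nb_node_id(uint32_t val)
{
	return val & 7;
}

/*
 * Format one debug line in the style of the output quoted in the thread.
 * Returns whatever snprintf() returns.
 */
static int format_debug_line(char *buf, size_t len, int cur_node, uint32_t val)
{
	return snprintf(buf, len,
			"quirk_amd_nb_node: current numa_node = 0x%x, "
			"would set to val & 7 = 0x%x",
			(unsigned int)cur_node, nb_node_id(val));
}
```

With an online mask of 0x1 (node 0 only), sim_node_online() accepts the
first northbridge's ID and rejects IDs 1 through 3, which matches the
situation described for the single-node configuration.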



* Re: [PATCH]: AMD Northbridge: Verify NB's node is online
  2009-11-16 13:39   ` Prarit Bhargava
@ 2009-11-16 14:44     ` Ingo Molnar
  0 siblings, 0 replies; 5+ messages in thread
From: Ingo Molnar @ 2009-11-16 14:44 UTC
  To: Prarit Bhargava; +Cc: linux-kernel, bhavna.sarathy, jbarnes, andreas.herrmann3


* Prarit Bhargava <prarit@redhat.com> wrote:

> 
> >>
> >>Hardware may be programmed incorrectly and return a bogus node
> >>ID.  Check to see if the node is actually online before setting
> >>the numa node for an AMD northbridge in quirk_amd_nb_node().
> >
> >Hm, could you stick a printk in there, what precise node ID does
> >the hardware return?
> >
> 
> Ingo, yup -- I put in a printk and commented out the set_dev_node()
> call when debugging this
> and got this output:
> 
> quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x0
> quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x1
> quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x2
> quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x3
> 
> The issue appears to be that the HW has set val to a valid value, 
> however, the system is only configured for a single node -- 0.
> 
> I realize that I'm working around broken HW ... but I think that a 
> quirk, quirk_amd_nb_node(), should at least keep systems booting ...

Ok. I cleaned up the patch a bit and added a comment explaining the
logic - and also expanded the changelog with your new debug data, and
applied it to tip:x86/urgent. Please check the commit notification email
to confirm that it's all OK.

Thanks,

	Ingo


* [tip:x86/urgent] x86: AMD Northbridge: Verify NB's node is online
  2009-11-12 18:09 [PATCH]: AMD Northbridge: Verify NB's node is online Prarit Bhargava
  2009-11-14  0:58 ` Ingo Molnar
@ 2009-11-16 16:10 ` tip-bot for Prarit Bhargava
  1 sibling, 0 replies; 5+ messages in thread
From: tip-bot for Prarit Bhargava @ 2009-11-16 16:10 UTC
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx, prarit, mingo

Commit-ID:  303fc0870f8fbfabe260c5c32b18e53458d597ea
Gitweb:     http://git.kernel.org/tip/303fc0870f8fbfabe260c5c32b18e53458d597ea
Author:     Prarit Bhargava <prarit@redhat.com>
AuthorDate: Thu, 12 Nov 2009 13:09:31 -0500
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 16 Nov 2009 15:43:05 +0100

x86: AMD Northbridge: Verify NB's node is online

Fix panic seen on some IBM and HP systems on 2.6.32-rc6:

 BUG: unable to handle kernel NULL pointer dereference at (null)
 IP: [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
  [...]
  [<ffffffff8120bbde>] cpumask_next_and+0x2e/0x3b
  [<ffffffff81225c62>] pci_device_probe+0x8e/0xf5
  [<ffffffff812b9be6>] ? driver_sysfs_add+0x47/0x6c
  [<ffffffff812b9da5>] driver_probe_device+0xd9/0x1f9
  [<ffffffff812b9f1d>] __driver_attach+0x58/0x7c
  [<ffffffff812b9ec5>] ? __driver_attach+0x0/0x7c
  [<ffffffff812b9298>] bus_for_each_dev+0x54/0x89
  [<ffffffff812b9b4f>] driver_attach+0x19/0x1b
  [<ffffffff812b97ae>] bus_add_driver+0xd3/0x23d
  [<ffffffff812ba1e7>] driver_register+0x98/0x109
  [<ffffffff81225ed0>] __pci_register_driver+0x63/0xd3
  [<ffffffff81072776>] ? up_read+0x26/0x2a
  [<ffffffffa0081000>] ? k8temp_init+0x0/0x20 [k8temp]
  [<ffffffffa008101e>] k8temp_init+0x1e/0x20 [k8temp]
  [<ffffffff8100a073>] do_one_initcall+0x6d/0x185
  [<ffffffff8108d765>] sys_init_module+0xd3/0x236
  [<ffffffff81011ac2>] system_call_fastpath+0x16/0x1b

I put in a printk and commented out the set_dev_node()
call when debugging this, and got this output:

 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x0
 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x1
 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x2
 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x3

I.e. the issue appears to be that the HW has set val to a valid
value; however, the system is only configured for a single
node -- 0, and the others are offline.

Check to see if the node is actually online before setting
the numa node for an AMD northbridge in quirk_amd_nb_node().

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: bhavna.sarathy@amd.com
Cc: jbarnes@virtuousgeek.org
Cc: andreas.herrmann3@amd.com
LKML-Reference: <20091112180933.12532.98685.sendpatchset@prarit.bos.redhat.com>
[ v2: clean up the code and add comments ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/quirks.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index 6c3b2c6..18093d7 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -499,6 +499,7 @@ static void __init quirk_amd_nb_node(struct pci_dev *dev)
 {
 	struct pci_dev *nb_ht;
 	unsigned int devfn;
+	u32 node;
 	u32 val;
 
 	devfn = PCI_DEVFN(PCI_SLOT(dev->devfn), 0);
@@ -507,7 +508,13 @@ static void __init quirk_amd_nb_node(struct pci_dev *dev)
 		return;
 
 	pci_read_config_dword(nb_ht, 0x60, &val);
-	set_dev_node(&dev->dev, val & 7);
+	node = val & 7;
+	/*
+	 * Some hardware may return an invalid node ID,
+	 * so check it first:
+	 */
+	if (node_online(node))
+		set_dev_node(&dev->dev, node);
 	pci_dev_put(nb_ht);
 }
 

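
The effect of the guarded update in the patch above can be illustrated
outside the kernel with a minimal sketch. It assumes a mock device holding
only a numa_node field and a bitmask of online nodes; struct sim_device,
mock_node_online() and sim_quirk_set_node() are hypothetical names used for
illustration and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mock of the single device field the quirk updates. */
struct sim_device {
	int numa_node;
};

/* Hypothetical stand-in for node_online(): bit n set means node n is online. */
static bool mock_node_online(unsigned int node, uint32_t online_mask)
{
	return node < 32 && ((online_mask >> node) & 1u);
}

/*
 * Mirrors the shape of the guarded update: take the low three bits of
 * val as the node ID and assign it only when that node is online.
 * Returns true if the device's node was updated, false if it was left alone.
 */
static bool sim_quirk_set_node(struct sim_device *dev, uint32_t val,
			       uint32_t online_mask)
{
	uint32_t node = val & 7;

	if (!mock_node_online(node, online_mask))
		return false;

	dev->numa_node = (int)node;
	return true;
}
```

On a mock system where only node 0 is online (mask 0x1), a register value
reporting node 3 leaves numa_node unchanged, so later code never sees an
offline node attached to the device.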
