* [PATCH] Prevent AMD MCE oops on multi-server system
From: Daniel J Blueman @ 2012-10-01 6:42 UTC
To: Ingo Molnar
Cc: Borislav Petkov, Thomas Gleixner, H. Peter Anvin, x86, linux-kernel, Daniel J Blueman

When booting on a federated multi-server system, the processor Northbridge
lookup returns NULL; add guards to prevent this causing an oops.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index c4e916d..698b6ec 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -576,12 +576,10 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
 	int err = 0;
 
 	if (shared_bank[bank]) {
-
 		nb = node_to_amd_nb(amd_get_nb_id(cpu));
-		WARN_ON(!nb);
 
 		/* threshold descriptor already initialized on this node? */
-		if (nb->bank4) {
+		if (nb && nb->bank4) {
 			/* yes, use it */
 			b = nb->bank4;
 			err = kobject_add(b->kobj, &dev->kobj, name);
@@ -615,8 +613,10 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
 		atomic_set(&b->cpus, 1);
 
 		/* nb is already initialized, see above */
-		WARN_ON(nb->bank4);
-		nb->bank4 = b;
+		if (nb) {
+			WARN_ON(nb->bank4);
+			nb->bank4 = b;
+		}
 	}
 
 	err = allocate_threshold_blocks(cpu, bank, 0,
-- 
1.7.9.5
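For readers following the thread, the guard in the patch above can be illustrated with a minimal user-space sketch. Everything named here (`struct northbridge`, `struct threshold_bank`, `struct nb_table`, `lookup_nb()`, `attach_bank()`) is a hypothetical stand-in and not the kernel's definition; the only assumption taken from the patch description is that the northbridge lookup may return NULL when no descriptor exists for a node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct threshold_bank { int cpus; };
struct northbridge { struct threshold_bank *bank4; };

struct nb_table {
	struct northbridge *nb;	/* descriptors for the enumerated northbridges */
	int num;		/* how many descriptors were enumerated */
};

/* Returns NULL when no descriptor exists for this node ID. */
static struct northbridge *lookup_nb(const struct nb_table *t, int node)
{
	assert(t != NULL);
	return (node >= 0 && node < t->num) ? &t->nb[node] : NULL;
}

/*
 * Guarded flow modelled on the patch: reuse the shared bank if a descriptor
 * exists and already holds one; otherwise use the fresh bank and publish it
 * only when there is a descriptor to publish it in.
 */
static struct threshold_bank *attach_bank(struct northbridge *nb,
					  struct threshold_bank *fresh)
{
	if (nb && nb->bank4)
		return nb->bank4;	/* already initialized on this node */

	if (nb)
		nb->bank4 = fresh;	/* first CPU seen on this node */

	return fresh;
}
```

With a NULL descriptor the caller simply gets its own fresh bank back and nothing is dereferenced, which is the behaviour the two added `nb` checks aim for.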
* Re: [PATCH] Prevent AMD MCE oops on multi-server system
From: Borislav Petkov @ 2012-10-01 10:06 UTC
To: Daniel J Blueman
Cc: Ingo Molnar, Borislav Petkov, Thomas Gleixner, H. Peter Anvin, x86, linux-kernel

On Mon, Oct 01, 2012 at 02:42:05PM +0800, Daniel J Blueman wrote:
> When booting on a federated multi-server system, the processor Northbridge
> lookup returns NULL; add guards to prevent this causing an oops.

Interesting.

What does lspci say on those systems?

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
* Re: [PATCH] Prevent AMD MCE oops on multi-server system
From: Daniel J Blueman @ 2012-10-01 16:12 UTC
To: Borislav Petkov
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, x86, linux-kernel, Steffen Persvold

On 01/10/2012 18:06, Borislav Petkov wrote:
> On Mon, Oct 01, 2012 at 02:42:05PM +0800, Daniel J Blueman wrote:
>> When booting on a federated multi-server system, the processor Northbridge
>> lookup returns NULL; add guards to prevent this causing an oops.
> Interesting.
>
> What does lspci say on those systems?
>
> Thanks.

As NumaConnect remote-server I/O is in a pre-release stage, we only
expose I/O on the first (root) server, so the lspci on eg my three
server, single-socket C32 development system is uninteresting [1].

We map MMCONFIG addresses in the global address map to the
respective server, which is how we access the processor Northbridges
in the bootloader before Linux loads, so they are accessible and get
enumerated when we enable remote I/O with the ACPI SSDT we generate,
however since the AMD APIC IDs (hence NB IDs) are only 8-bit, the
present amd_get_nb_id will produce duplicate NB IDs at best (but in
this case, as we disable I/O routing, there is no structure); later,
we may propose using eg bits 23:8 for the server ID. That's
another discussion though.

The minimal patch at least corrects the oops regression which didn't
happen in earlier kernels.

Thanks!
  Daniel

--- [1]

root@oct1:~# lspci
00:00.0 Host bridge: ATI Technologies Inc RD890 Northbridge only dual slot (2x16) PCI-e GFX Hydra part (rev 02)
00:00.2 Generic system peripheral [0806]: ATI Technologies Inc Device 5a23
00:02.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port B)
00:04.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port D)
00:05.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port E)
00:06.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port F)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3d)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
00:19.0 Host bridge: Device 1b47:0601 (rev 02)
00:19.1 Host bridge: Device 1b47:0602 (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc Device 68ba
01:00.1 Audio device: ATI Technologies Inc Juniper HDMI Audio [Radeon HD 5700 Series]
02:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
05:06.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 10)
-- 
Daniel J Blueman
Principal Software Engineer, Numascale Asia
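The idea floated in the message above, putting a server ID in bits 23:8 above the existing 8-bit ID, can be sketched as follows. This is only an illustration of the bit layout mentioned in the mail; the names `global_nb_id()`, `global_nb_server()` and `global_nb_local()` are hypothetical and do not exist in the kernel, and nothing in the thread confirms that this layout was ever adopted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 7:0 hold the per-server NB ID, bits 23:8 the server ID. */
#define NB_LOCAL_BITS	8
#define NB_SERVER_BITS	16
static_assert(NB_LOCAL_BITS + NB_SERVER_BITS == 24,
	      "server ID must end at bit 23");

/* Combine a server ID and the existing 8-bit ID into one wider ID. */
static inline uint32_t global_nb_id(uint16_t server, uint8_t local_nb)
{
	return ((uint32_t)server << NB_LOCAL_BITS) | local_nb;
}

/* Extract the server ID (bits 23:8). */
static inline uint16_t global_nb_server(uint32_t id)
{
	return (uint16_t)((id >> NB_LOCAL_BITS) & 0xffffu);
}

/* Extract the per-server NB ID (bits 7:0). */
static inline uint8_t global_nb_local(uint32_t id)
{
	return (uint8_t)(id & 0xffu);
}
```

Under this layout, northbridge 0 on server 0 and northbridge 0 on server 1 no longer collide, since the second one becomes 0x100.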
* Re: [PATCH] Prevent AMD MCE oops on multi-server system
From: Borislav Petkov @ 2012-10-01 18:01 UTC
To: Daniel J Blueman
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, x86, linux-kernel, Steffen Persvold

On Tue, Oct 02, 2012 at 12:12:31AM +0800, Daniel J Blueman wrote:
> On 01/10/2012 18:06, Borislav Petkov wrote:
> >On Mon, Oct 01, 2012 at 02:42:05PM +0800, Daniel J Blueman wrote:
> >>When booting on a federated multi-server system, the processor Northbridge
> >>lookup returns NULL; add guards to prevent this causing an oops.
> >Interesting.
> >
> >What does lspci say on those systems?
> >
> >Thanks.
> As NumaConnect remote-server I/O is in a pre-release stage, we only
> expose I/O on the first (root) server, so the lspci on eg my three
> server, single-socket C32 development system is uninteresting [1].

Yeah, I was looking for the NB devices:

> 00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
> 00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
> 00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
> 00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
> 00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control

[ … ]

> We map MMCONFIG addresses in the global address map to the
> respective server, which is how we access the processor Northbridges
> in the bootloader before Linux loads, so they are accessible and get
> enumerated when we enable remote I/O with the ACPI SSDT we generate,
> however since the AMD APIC IDs (hence NB IDs) are only 8-bit, the
> present amd_get_nb_id will produce duplicate NB IDs at best (but in
> this case, as we disable I/O routing, there is no structure); later,
> we may propose using eg bits 23:8 for the server ID. That's
> another discussion though.

Ah yes, I remember now. We had this discussion already, AFAIR. So if you
say you disable I/O routing, what actually doesn't work out as expected
is the NB enumeration in amd_nb.c where pci_get_device simply fails?

Because if you had duplicate APIC IDs, you'd at least get some NB
descriptor, even if not the correct one?

> The minimal patch at least corrects the oops regression which didn't
> happen in earlier kernels.

Right, I beefed it up a bit and added a stable tag, pls take a look and
let me know if it is ok. I'll run it on a couple of machines but I don't
expect any issues so I'll send it upstream soon.

Thanks.

---
From 91388e9d34b44080bbe127c9721b6df36358654c Mon Sep 17 00:00:00 2001
From: Daniel J Blueman <daniel@numascale-asia.com>
Date: Mon, 1 Oct 2012 14:42:05 +0800
Subject: [PATCH] x86, AMD, MCE: Prevent oops on multi-server system

When booting on a federated multi-server system (NumaScale), the
processor Northbridge lookup returns NULL; add guards to prevent this
causing an oops.

On those systems, the northbridge is accessed through MMIO and the
"normal" northbridge enumeration in amd_nb.c doesn't work since we're
generating the northbridge ID from the initial APIC ID and the last
is not unique on those systems. Long story short, we end up without
northbridge descriptors.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Cc: stable@vger.kernel.org # 3.6
Link: http://lkml.kernel.org/r/1349073725-14093-1-git-send-email-daniel@numascale-asia.com
[ Boris: beef up commit message ]
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index c4e916d77378..698b6ec12e0f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -576,12 +576,10 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
 	int err = 0;
 
 	if (shared_bank[bank]) {
-
 		nb = node_to_amd_nb(amd_get_nb_id(cpu));
-		WARN_ON(!nb);
 
 		/* threshold descriptor already initialized on this node? */
-		if (nb->bank4) {
+		if (nb && nb->bank4) {
 			/* yes, use it */
 			b = nb->bank4;
 			err = kobject_add(b->kobj, &dev->kobj, name);
@@ -615,8 +613,10 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
 		atomic_set(&b->cpus, 1);
 
 		/* nb is already initialized, see above */
-		WARN_ON(nb->bank4);
-		nb->bank4 = b;
+		if (nb) {
+			WARN_ON(nb->bank4);
+			nb->bank4 = b;
+		}
 	}
 
 	err = allocate_threshold_blocks(cpu, bank, 0,
-- 
1.7.11.rc1

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
* Re: [PATCH] Prevent AMD MCE oops on multi-server system
From: Daniel J Blueman @ 2012-10-03 6:54 UTC
To: Borislav Petkov
Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, x86, linux-kernel, Steffen Persvold

On 02/10/2012 02:01, Borislav Petkov wrote:
> On Tue, Oct 02, 2012 at 12:12:31AM +0800, Daniel J Blueman wrote:
>> On 01/10/2012 18:06, Borislav Petkov wrote:
>>> On Mon, Oct 01, 2012 at 02:42:05PM +0800, Daniel J Blueman wrote:
>>>> When booting on a federated multi-server system, the processor Northbridge
>>>> lookup returns NULL; add guards to prevent this causing an oops.
>>> Interesting.
>>>
>>> What does lspci say on those systems?
>>>
>>> Thanks.
>> As NumaConnect remote-server I/O is in a pre-release stage, we only
>> expose I/O on the first (root) server, so the lspci on eg my three
>> server, single-socket C32 development system is uninteresting [1].
>
> Yeah, I was looking for the NB devices:
>
>> 00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
>> 00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
>> 00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
>> 00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
>> 00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
>
> [ … ]
>
>> We map MMCONFIG addresses in the global address map to the
>> respective server, which is how we access the processor Northbridges
>> in the bootloader before Linux loads, so they are accessible and get
>> enumerated when we enable remote I/O with the ACPI SSDT we generate,
>> however since the AMD APIC IDs (hence NB IDs) are only 8-bit, the
>> present amd_get_nb_id will produce duplicate NB IDs at best (but in
>> this case, as we disable I/O routing, there is no structure); later,
>> we may propose using eg bits 23:8 for the server ID. That's
>> another discussion though.
>
> Ah yes, I remember now. We had this discussion already, AFAIR. So if you
> say you disable I/O routing, what actually doesn't work out as expected
> is the NB enumeration in amd_nb.c where pci_get_device simply fails?
>
> Because if you had duplicate APIC IDs, you'd at least get some NB
> descriptor, even if not the correct one?

With remote-I/O disabled, since only the first PCI domain has been
enumerated, the array of Northbridge IDs has structures only for the
root (first) server's northbridges, thus the lookup returns NULL for
later ones.

Yes, we see the duplicates with remote I/O enabled [1, 2], stemming from
amd64_edac.h:

static inline u8 get_node_id(struct pci_dev *pdev)
{
	return PCI_SLOT(pdev->devfn) - 0x18;
}

How about a patch that would add the PCI domain eg in bits 8 and up?

>> The minimal patch at least corrects the oops regression which didn't
>> happen in earlier kernels.
>
> Right, I beefed it up a bit and added a stable tag, pls take a look and
> let me know if it is ok. I'll run it on a couple of machines but I don't
> expect any issues so I'll send it upstream soon.

Looks good!

Thanks Boris,
  Daniel

--- [1]

EDAC MC: Ver: 3.0.0
AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 0MB 3: 0MB
EDAC amd64: MC: 4: 2048MB 5: 2048MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 0MB 3: 0MB
EDAC amd64: MC: 4: 2048MB 5: 2048MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x4 syndromes.
EDAC amd64: MCT channel count: 2
EDAC amd64: CS4: Unbuffered DDR3 RAM
EDAC amd64: CS5: Unbuffered DDR3 RAM
EDAC MC0: Giving out device to 'amd64_edac' 'F10h': DEV 0000:00:18.2
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 0MB 3: 0MB
EDAC amd64: MC: 4: 2048MB 5: 2048MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 0MB 3: 0MB
EDAC amd64: MC: 4: 2048MB 5: 2048MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x4 syndromes.
EDAC amd64: MCT channel count: 2
EDAC amd64: CS4: Unbuffered DDR3 RAM
EDAC amd64: CS5: Unbuffered DDR3 RAM
EDAC MC: bug in low-level driver: attempt to assign duplicate mc_idx 0 in add_mc_to_global_list()
EDAC amd64: Error probing instance: 0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 0MB 3: 0MB
EDAC amd64: MC: 4: 2048MB 5: 2048MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0: 0MB 1: 0MB
EDAC amd64: MC: 2: 0MB 3: 0MB
EDAC amd64: MC: 4: 2048MB 5: 2048MB
EDAC amd64: MC: 6: 0MB 7: 0MB
EDAC amd64: using x4 syndromes.
EDAC amd64: MCT channel count: 2
EDAC amd64: CS4: Unbuffered DDR3 RAM
EDAC amd64: CS5: Unbuffered DDR3 RAM
EDAC MC: bug in low-level driver: attempt to assign duplicate mc_idx 0 in add_mc_to_global_list()
EDAC amd64: Error probing instance: 0
EDAC PCI0: Giving out device to module 'amd64_edac' controller 'EDAC PCI controller': DEV '0000:00:18.2' (POLLED)

--- [2]

0000:00:00.0 Host bridge: ATI Technologies Inc RD890 Northbridge only dual slot (2x16) PCI-e GFX Hydra part (rev 02)
0000:00:00.2 Generic system peripheral [0806]: ATI Technologies Inc Device 5a23
0000:00:02.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port B)
0000:00:04.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port D)
0000:00:05.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port E)
0000:00:06.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port F)
0000:00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
0000:00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
0000:00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
0000:00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
0000:00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
0000:00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
0000:00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
0000:00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3d)
0000:00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
0000:00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
0000:00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
0000:00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
0000:00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
0000:00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
0000:00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
0000:00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
0000:00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
0000:00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
0000:00:19.0 Host bridge: Device 1b47:0601 (rev 02)
0000:00:19.1 Host bridge: Device 1b47:0602 (rev 02)
0000:01:00.0 VGA compatible controller: ATI Technologies Inc Device 68ba
0000:01:00.1 Audio device: ATI Technologies Inc Juniper HDMI Audio [Radeon HD 5700 Series]
0000:02:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
0000:03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
0000:04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
0000:05:06.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 10)
0001:00:00.0 Host bridge: ATI Technologies Inc RD890 Northbridge only dual slot (2x16) PCI-e GFX Hydra part (rev 02)
0001:00:04.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port D)
0001:00:05.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port E)
0001:00:06.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port F)
0001:00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
0001:00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
0001:00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
0001:00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
0001:00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
0001:00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
0001:00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
0001:00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3d)
0001:00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
0001:00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
0001:00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
0001:00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
0001:00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
0001:00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
0001:00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
0001:00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
0001:00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
0001:00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
0001:00:19.0 Host bridge: Device 1b47:0601 (rev 02)
0001:00:19.1 Host bridge: Device 1b47:0602 (rev 02)
0001:01:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
0001:02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
0001:03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
0001:04:06.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 10)
0002:00:00.0 Host bridge: ATI Technologies Inc RD890 Northbridge only dual slot (2x16) PCI-e GFX Hydra part (rev 02)
0002:00:04.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port D)
0002:00:05.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port E)
0002:00:06.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port F)
0002:00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
0002:00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
0002:00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
0002:00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
0002:00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
0002:00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
0002:00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
0002:00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3d)
0002:00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
0002:00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
0002:00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
0002:00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
0002:00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
0002:00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
0002:00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
0002:00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
0002:00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
0002:00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
0002:00:19.0 Host bridge: Device 1b47:0601 (rev 02)
0002:00:19.1 Host bridge: Device 1b47:0602 (rev 02)
0002:01:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
0002:02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
0002:03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
0002:04:06.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 10)
-- 
Daniel J Blueman
Principal Software Engineer, Numascale Asia
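The duplicate "node 0" results in the EDAC log follow from the slot arithmetic in the `get_node_id()` helper quoted in the message above: device 18.2 sits in slot 0x18 in every PCI domain, so each domain yields node ID 0. A minimal user-space sketch of that arithmetic, and of the "domain in bits 8 and up" variant the message asks about, might look like this. `SLOT_OF()` mirrors the bit layout of the kernel's `PCI_SLOT()` macro; `node_id_from_slot()` and `node_id_with_domain()` are hypothetical names for illustration, not existing kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as the kernel's PCI_SLOT(): the slot is devfn bits 7:3. */
#define SLOT_OF(devfn)	(((devfn) >> 3) & 0x1f)

/* devfn 0xc2 encodes slot 0x18, function 2 (the "18.2" DRAM controller). */
static_assert(SLOT_OF(0xc2) == 0x18, "18.2 must decode to slot 0x18");

/* Scheme quoted in the thread: slot 0x18 is node 0, slot 0x19 node 1, ... */
static inline uint8_t node_id_from_slot(uint8_t devfn)
{
	return (uint8_t)(SLOT_OF(devfn) - 0x18);
}

/* Hypothetical variant: place the PCI domain in bits 8 and up. */
static inline uint32_t node_id_with_domain(uint16_t domain, uint8_t devfn)
{
	return ((uint32_t)domain << 8) | node_id_from_slot(devfn);
}
```

With this variant, 0000:00:18.2, 0001:00:18.2 and 0002:00:18.2 would map to 0x000, 0x100 and 0x200 instead of all mapping to 0. Whether consumers of the ID can cope with values above 255 is a separate question that the sketch does not address.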
* [tip:x86/urgent] x86, amd, mce: Avoid NULL pointer reference on CPU northbridge lookup
From: tip-bot for Daniel J Blueman @ 2012-10-17 19:18 UTC
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, tglx, hpa, borislav.petkov, daniel

Commit-ID:  21c5e50e15b1abd797e62f18fd7f90b9cc004cbd
Gitweb:     http://git.kernel.org/tip/21c5e50e15b1abd797e62f18fd7f90b9cc004cbd
Author:     Daniel J Blueman <daniel@numascale-asia.com>
AuthorDate: Mon, 1 Oct 2012 14:42:05 +0800
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Wed, 17 Oct 2012 11:25:32 -0700

x86, amd, mce: Avoid NULL pointer reference on CPU northbridge lookup

When booting on a federated multi-server system (NumaScale), the
processor Northbridge lookup returns NULL; add guards to prevent this
causing an oops.

On those systems, the northbridge is accessed through MMIO and the
"normal" northbridge enumeration in amd_nb.c doesn't work since we're
generating the northbridge ID from the initial APIC ID and the last
is not unique on those systems. Long story short, we end up without
northbridge descriptors.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Cc: stable@vger.kernel.org # 3.6
Link: http://lkml.kernel.org/r/1349073725-14093-1-git-send-email-daniel@numascale-asia.com
[ Boris: beef up commit message ]
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index c4e916d..698b6ec 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -576,12 +576,10 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
 	int err = 0;
 
 	if (shared_bank[bank]) {
-
 		nb = node_to_amd_nb(amd_get_nb_id(cpu));
-		WARN_ON(!nb);
 
 		/* threshold descriptor already initialized on this node? */
-		if (nb->bank4) {
+		if (nb && nb->bank4) {
 			/* yes, use it */
 			b = nb->bank4;
 			err = kobject_add(b->kobj, &dev->kobj, name);
@@ -615,8 +613,10 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
 		atomic_set(&b->cpus, 1);
 
 		/* nb is already initialized, see above */
-		WARN_ON(nb->bank4);
-		nb->bank4 = b;
+		if (nb) {
+			WARN_ON(nb->bank4);
+			nb->bank4 = b;
+		}
 	}
 
 	err = allocate_threshold_blocks(cpu, bank, 0,
* [tip:x86/urgent] x86, AMD, MCE: Prevent oops on multi-server system
From: tip-bot for Daniel J Blueman @ 2012-10-21 18:16 UTC
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, tglx, borislav.petkov, daniel

Commit-ID:  124556ec1555b89af76cec3e41375b6f9a557ead
Gitweb:     http://git.kernel.org/tip/124556ec1555b89af76cec3e41375b6f9a557ead
Author:     Daniel J Blueman <daniel@numascale-asia.com>
AuthorDate: Mon, 1 Oct 2012 14:42:05 +0800
Committer:  Borislav Petkov <borislav.petkov@amd.com>
CommitDate: Tue, 9 Oct 2012 14:48:43 +0200

x86, AMD, MCE: Prevent oops on multi-server system

When booting on a federated multi-server system (NumaScale), the
processor Northbridge lookup returns NULL; add guards to prevent this
causing an oops.

On those systems, the northbridge is accessed through MMIO and the
"normal" northbridge enumeration in amd_nb.c doesn't work since we're
generating the northbridge ID from the initial APIC ID and the last
is not unique on those systems. Long story short, we end up without
northbridge descriptors.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Cc: stable@vger.kernel.org # 3.6
Link: http://lkml.kernel.org/r/1349073725-14093-1-git-send-email-daniel@numascale-asia.com
[ Boris: beef up commit message ]
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index c4e916d..698b6ec 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -576,12 +576,10 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
 	int err = 0;
 
 	if (shared_bank[bank]) {
-
 		nb = node_to_amd_nb(amd_get_nb_id(cpu));
-		WARN_ON(!nb);
 
 		/* threshold descriptor already initialized on this node? */
-		if (nb->bank4) {
+		if (nb && nb->bank4) {
 			/* yes, use it */
 			b = nb->bank4;
 			err = kobject_add(b->kobj, &dev->kobj, name);
@@ -615,8 +613,10 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
 		atomic_set(&b->cpus, 1);
 
 		/* nb is already initialized, see above */
-		WARN_ON(nb->bank4);
-		nb->bank4 = b;
+		if (nb) {
+			WARN_ON(nb->bank4);
+			nb->bank4 = b;
+		}
 	}
 
 	err = allocate_threshold_blocks(cpu, bank, 0,
Thread overview: 7+ messages (end of thread, newest: 2012-10-21 18:16 UTC)
2012-10-01 6:42 [PATCH] Prevent AMD MCE oops on multi-server system Daniel J Blueman
2012-10-01 10:06 ` Borislav Petkov
2012-10-01 16:12   ` Daniel J Blueman
2012-10-01 18:01     ` Borislav Petkov
2012-10-03 6:54       ` Daniel J Blueman
2012-10-17 19:18 ` [tip:x86/urgent] x86, amd, mce: Avoid NULL pointer reference on CPU northbridge lookup tip-bot for Daniel J Blueman
2012-10-21 18:16 ` [tip:x86/urgent] x86, AMD, MCE: Prevent oops on multi-server system tip-bot for Daniel J Blueman