* Machine Check Exception
@ 2002-12-15 20:22 Felix von Leitner
2002-12-15 20:39 ` Jan-Benedict Glaw
2002-12-16 19:55 ` Felipe W Damasio
0 siblings, 2 replies; 18+ messages in thread
From: Felix von Leitner @ 2002-12-15 20:22 UTC (permalink / raw)
To: linux-kernel
As soon as I start oggenc on my 2.5 kernel, I get this message:
CPU 0: Machine Check Exception: 0000000000000004
Bank 0: f60600000000135 at 000000001ea46db0
Kernel panic: CPU context corrupt
This vc then hangs, but I could log in and write down the message on
another vc. Is this a hardware error? Should I replace my CPU? My
memory? Is my machine overheating? I have had several strange and
unexplained segfaults and reboots under 2.4 recently.
Felix
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine Check Exception
2002-12-15 20:22 Machine Check Exception Felix von Leitner
@ 2002-12-15 20:39 ` Jan-Benedict Glaw
2002-12-16 19:55 ` Felipe W Damasio
1 sibling, 0 replies; 18+ messages in thread
From: Jan-Benedict Glaw @ 2002-12-15 20:39 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1072 bytes --]
On Sun, 2002-12-15 21:22:27 +0100, Felix von Leitner <felix-kernel@fefe.de>
wrote in message <20021215202227.GA7375@codeblau.de>:
> As soon as I start oggenc on my 2.5 kernel, I get this message:
>
> CPU 0: Machine Check Exception: 0000000000000004
> Bank 0: f60600000000135 at 000000001ea46db0
> Kernel panic: CPU context corrupt
>
> This vc then hangs, but I could log in and write down the message on
> another vc. Is this a hardware error? Should I replace my CPU? My
> memory? Is my machine overheating? I have had several strange and
> unexplained segfaults and reboots under 2.4 recently.
You're probably suffering from bad RAM. Please create a memtest86 boot
floppy and try it on your own... Segfaults and reboots are mostly caused
by bad CPU fans or bad RAM :-p
Kind regards, JBG
--
Jan-Benedict Glaw jbglaw@lug-owl.de . +49-172-7608481
"A free opinion in a free mind | Against censorship
for a free state full of free citizens" | on the Internet!
Shell Script APT-Proxy: http://lug-owl.de/~jbglaw/software/ap2/
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine Check Exception
2002-12-15 20:22 Machine Check Exception Felix von Leitner
2002-12-15 20:39 ` Jan-Benedict Glaw
@ 2002-12-16 19:55 ` Felipe W Damasio
1 sibling, 0 replies; 18+ messages in thread
From: Felipe W Damasio @ 2002-12-16 19:55 UTC (permalink / raw)
To: Felix von Leitner; +Cc: linux-kernel
Felix von Leitner wrote:
> As soon as I start oggenc on my 2.5 kernel, I get this message:
>
> CPU 0: Machine Check Exception: 0000000000000004
> Bank 0: f60600000000135 at 000000001ea46db0
> Kernel panic: CPU context corrupt
>
> This vc then hangs, but I could log in and write down the message on
> another vc. Is this a hardware error? Should I replace my CPU? My
> memory? Is my machine overheating? I have had several strange and
> unexplained segfaults and reboots under 2.4 recently.
Looks like an instruction fetch error from the level 1 cache.
Your CPU may be overheating, yes. Or it could even be a faulty processor.
Could you please check your cooler? What's the average CPU temp.?
Felipe
^ permalink raw reply [flat|nested] 18+ messages in thread
* machine check exception
@ 2003-12-03 11:05 Nicholas Mucci
2003-12-03 13:05 ` Russell Coker
0 siblings, 1 reply; 18+ messages in thread
From: Nicholas Mucci @ 2003-12-03 11:05 UTC (permalink / raw)
To: selinux
Hi,
Recently I installed the 2.6.0-test6-selinux1 kernel on my box and
noticed that it will occasionally throw a machine check exception after
detecting IDE devices on startup. I never saw this occur with the
2.6.0-test3 selinux kernel, 2.4.18 from RedHat 8, 2.4.9 or 2.4.8 on this
machine. I know there is probably a hardware issue that is causing this,
but I am curious as to why the MCE only surfaces now. If I use the older
kernels this problem does not show up and I don't see an MCE. Has anyone
seen or heard of this before? The MCE handler was modified in test5 to
remove "useless junk" that caused problems on some cpus. This may be just
a mainstream 2.6 thing, but I figured I would check and see if anybody
here has any thoughts or ideas about this.
Thanks,
---------------------------------------
Nicholas Mucci
Institute for Educational Research and Public Service
University of Kansas
Home: 785-812-2520
Cell: 847-445-7023
--
This message was distributed to subscribers of the selinux mailing list.
If you no longer wish to subscribe, send mail to majordomo@tycho.nsa.gov with
the words "unsubscribe selinux" without quotes as the message.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: machine check exception
2003-12-03 11:05 machine check exception Nicholas Mucci
@ 2003-12-03 13:05 ` Russell Coker
2003-12-03 13:19 ` Stephen Smalley
0 siblings, 1 reply; 18+ messages in thread
From: Russell Coker @ 2003-12-03 13:05 UTC (permalink / raw)
To: Nicholas Mucci, selinux
On Wed, 3 Dec 2003 22:05, Nicholas Mucci <nmucci@eecs.ku.edu> wrote:
> Recently I installed the 2.6.0-test6-selinux1 kernel on my box and
> noticed that it will occasionally throw a machine check exception after
> detecting IDE devices on startup. I never saw this occur with the
> 2.6.0-test3 selinux kernel, 2.4.18 from RedHat 8, 2.4.9 or 2.4.8 on this
Does this only happen when running SE Linux? What happens when SE Linux is
compiled in but not enabled (boot with selinux=0)? What happens when SE
Linux is not compiled in?
If the problem occurs when SE Linux is not compiled in to the kernel then it
is something that is best discussed on the linux-kernel mailing list.
--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
--
This message was distributed to subscribers of the selinux mailing list.
If you no longer wish to subscribe, send mail to majordomo@tycho.nsa.gov with
the words "unsubscribe selinux" without quotes as the message.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: machine check exception
2003-12-03 13:05 ` Russell Coker
@ 2003-12-03 13:19 ` Stephen Smalley
0 siblings, 0 replies; 18+ messages in thread
From: Stephen Smalley @ 2003-12-03 13:19 UTC (permalink / raw)
To: Russell Coker; +Cc: Nicholas Mucci, selinux
On Wed, 2003-12-03 at 08:05, Russell Coker wrote:
> Does this only happen when running SE Linux? What happens when SE Linux is
> compiled in but not enabled (boot with selinux=0)? What happens when SE
> Linux is not compiled in?
>
> If the problem occurs when SE Linux is not compiled in to the kernel then it
> is something that is best discussed on the linux-kernel mailing list.
I'd also suggest using a more modern SELinux kernel, e.g. the kernel
RPMs from under http://people.redhat.com/arjanv/2.5. Latest is
2.6.0-0.test11.1.99
--
Stephen Smalley <sds@epoch.ncsc.mil>
National Security Agency
--
This message was distributed to subscribers of the selinux mailing list.
If you no longer wish to subscribe, send mail to majordomo@tycho.nsa.gov with
the words "unsubscribe selinux" without quotes as the message.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Machine Check Exception
@ 2008-10-05 17:41 Matteo Croce
0 siblings, 0 replies; 18+ messages in thread
From: Matteo Croce @ 2008-10-05 17:41 UTC (permalink / raw)
To: hpa; +Cc: tglx, mingo, linux-kernel
Hi,
while bugging the wireless devs about a freeze, I've discovered a hardware
bug:
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge Northbridge Watchdog error
bit57 = processor context corrupt
bit61 = error uncorrected
bus error 'generic participation, request timed out
generic error mem transaction
generic access, level generic'
STATUS b200000000070f0f MCGSTATUS 4
this is triggered by the ath9k wireless driver when booting with smp > 1
here is the kernel bug report:
http://bugzilla.kernel.org/show_bug.cgi?id=11527#c68
here I asked support in the official AMD forum:
http://forums.amd.com/forum/messageview.cfm?catid=22&threadid=101051
I ask you this as you are the "CPU ERRATA WORKAROUNDS" man,
and I have CC'd the people who are listed in the "X86 ARCHITECTURE (32-BIT AND 64-BIT)"
entry.
Best Regards,
Matteo Croce
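Archive note: the STATUS and MCGSTATUS words in the report above can be checked by hand. The following is a minimal sketch, assuming the common x86 machine-check layout for the top flag bits of a bank status register and for the global status register; the names `MCI_STATUS_FLAGS`, `MCG_STATUS_FLAGS` and `set_flags` are made up for illustration, and the model-specific low bits (the `070f0f` part) are deliberately not interpreted.

```python
# Sketch: name the architectural flag bits that are set in a machine-check
# bank status word (STATUS) and in the global status word (MCGSTATUS).
# Assumption: standard x86 MCA bit positions; helper names are hypothetical.

MCI_STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error uncorrected
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MISC register valid
    58: "ADDRV",  # ADDR register valid
    57: "PCC",    # processor context corrupt
}

MCG_STATUS_FLAGS = {
    0: "RIPV",  # restart instruction pointer valid
    1: "EIPV",  # error instruction pointer valid
    2: "MCIP",  # machine check in progress
}


def set_flags(value, table):
    """Return the names of all flags in `table` whose bit is set in `value`."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


status = 0xb200000000070f0f   # STATUS from the report above
mcgstatus = 0x4               # MCGSTATUS from the report above

print("STATUS   :", ", ".join(set_flags(status, MCI_STATUS_FLAGS)))
print("MCGSTATUS:", ", ".join(set_flags(mcgstatus, MCG_STATUS_FLAGS)))
```

For the values in this report the set bits include 57 and 61, which agrees with the "processor context corrupt" and "error uncorrected" lines in the decoded output.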
^ permalink raw reply [flat|nested] 18+ messages in thread
* Machine check exception
@ 2011-07-27 11:05 F. P. Beekhof
2011-07-27 13:03 ` Borislav Petkov
0 siblings, 1 reply; 18+ messages in thread
From: F. P. Beekhof @ 2011-07-27 11:05 UTC (permalink / raw)
To: Jeff Garzik, Mikael Pettersson, linux-ide
Hello,
I'm having some trouble with my Promise TX4302: it seems to cause
kernel panics due to machine check exceptions.
The MCE reads:
[Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 4: ...
[Hardware Error]: TSC 264f8ca046
and then continues about a NorthBridge Watchdog timeout.
After that, there is kernel stack dump:
See http://tech.unige.ch/mce.jpg or http://tech.unige.ch/mce.png for
a screenshot.
These always happen when I'm accessing disks attached to the TX4302,
as far as I can tell. Motherboard is an Asus a8v, cpu is amd
athlon64, stock ubuntu 11.04 kernel.
I have no idea where to go from here though. I can provide any
information required or test patches, please ask.
Best, and thanks in advance,
F. Beekhof
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine check exception
2011-07-27 11:05 Machine check exception F. P. Beekhof
@ 2011-07-27 13:03 ` Borislav Petkov
2011-07-27 15:31 ` F. P. Beekhof
0 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2011-07-27 13:03 UTC (permalink / raw)
To: F. P. Beekhof; +Cc: Jeff Garzik, Mikael Pettersson, linux-ide
On Wed, Jul 27, 2011 at 01:05:16PM +0200, F. P. Beekhof wrote:
> Hello,
>
> I'm having some trouble with my Promise TX4302: it seems to cause
> kernel panics due to machine check exceptions.
>
> The MCE reads:
> [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 4: ...
> [Hardware Error]: TSC 264f8ca046
> and then continues about a NorthBridge Watchdog timeout.
> After that, there is kernel stack dump:
>
> See http://tech.unige.ch/mce.jpg or http://tech.unige.ch/mce.png for
> a screenshot.
>
> These always happen when I'm accessing disks attached to the TX4302,
> as far as I can tell. Motherboard is an Asus a8v, cpu is amd
> athlon64, stock ubuntu 11.04 kernel.
>
> I have no idea where to go from here though. I can provide any
> information required or test patches, please ask.
can you install msr-tools and do
rdmsr 0xc001001f
as root and send me the output please?
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine check exception
2011-07-27 13:03 ` Borislav Petkov
@ 2011-07-27 15:31 ` F. P. Beekhof
2011-07-27 17:03 ` Borislav Petkov
0 siblings, 1 reply; 18+ messages in thread
From: F. P. Beekhof @ 2011-07-27 15:31 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Jeff Garzik, Mikael Pettersson, linux-ide
On 07/27/2011 03:03 PM, Borislav Petkov wrote:
> On Wed, Jul 27, 2011 at 01:05:16PM +0200, F. P. Beekhof wrote:
>> Hello,
>>
>> I'm having some trouble with my Promise TX4302: it seems to cause
>> kernel panics due to machine check exceptions.
>>
>> The MCE reads:
>> [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 4: ...
>> [Hardware Error]: TSC 264f8ca046
>> and then continues about a NorthBridge Watchdog timeout.
>> After that, there is kernel stack dump:
>>
>> See http://tech.unige.ch/mce.jpg or http://tech.unige.ch/mce.png for
>> a screenshot.
>>
>> These always happen when I'm accessing disks attached to the TX4302,
>> as far as I can tell. Motherboard is an Asus a8v, cpu is amd
>> athlon64, stock ubuntu 11.04 kernel.
>>
>> I have no idea where to go from here though. I can provide any
>> information required or test patches, please ask.
>
> can you install msr-tools and do
>
> rdmsr 0xc001001f
>
> as root and send me the output please?
>
> Thanks.
>
$ sudo rdmsr 0xc001001f
8
Is there anything else I can do ?
cpuinfo and lspci output below, hope that that is helpful...
Best, Fokko $ cat /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 47 model name : AMD Athlon(tm) 64 Processor 3200+ stepping : 0 cpu MHz : 2000.000 cache size : 512 KB fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up rep_good nopl extd_apicid pni lahf_lm bogomips : 4005.11 TLB size : 1024 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp tm stc $ sudo lspci -vvv 00:00.0 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge Subsystem: ASUSTeK Computer Inc. A8V Deluxe Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx- Latency: 64 Region 0: Memory at <ignored> (32-bit, prefetchable) Capabilities: [80] AGP version 3.0 Status: RQ=32 Iso- ArqSz=0 Cal=2 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3+ Rate=x4,x8 Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x8 Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Capabilities: [60] HyperTransport: Slave or Primary Interface Command: BaseUnitID=0 UnitCnt=3 MastHost- DefDir- DUL- Link Control 0: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b- Link Config 0: MLWI=16bit DwFcIn- MLWO=16bit DwFcOut- LWI=16bit DwFcInEn- LWO=16bit DwFcOutEn- Link Control 1: CFlE- CST- CFE- <LkFail+ Init- EOC+ TXO+ <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b- Link Config 1: MLWI=8bit DwFcIn- MLWO=8bit DwFcOut- LWI=8bit DwFcInEn- LWO=8bit DwFcOutEn- Revision ID: 1.02 Link Frequency 0: 1.0GHz Link Error 0: <Prot- <Ovfl- <EOC- CTLTm- Link Frequency Capability 0: 200MHz+ 
300MHz- 400MHz+ 500MHz- 600MHz+ 800MHz+ 1.0GHz+ 1.2GHz- 1.4GHz- 1.6GHz- Vend- Feature Capability: IsocFC- LDTSTOP+ CRCTM- ECTLT- 64bA- UIDRD- Link Frequency 1: 200MHz Link Error 1: <Prot- <Ovfl- <EOC- CTLTm- Link Frequency Capability 1: 200MHz- 300MHz- 400MHz- 500MHz- 600MHz- 800MHz- 1.0GHz- 1.2GHz- 1.4GHz- 1.6GHz- Vend- Error Handling: PFlE- OFlE- PFE- OFE- EOCFE- RFE- CRCFE- SERRFE- CF- RE- PNFE- ONFE- EOCNFE- RNFE- CRCNFE- SERRNFE- Prefetchable memory behind bridge Upper: 00-00 Bus Number: 00 Capabilities: [58] HyperTransport: Interrupt Discovery and Configuration Kernel driver in use: agpgart-amd64 00:00.1 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 00:00.2 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 00:00.3 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 00:00.4 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 00:00.7 Host bridge: VIA Technologies, Inc. 
K8T800Pro Host Bridge Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890 South] (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: 0000e000-0000efff Memory behind bridge: fbd00000-fbffffff Prefetchable memory behind bridge: e8000000-faffffff Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [80] Power Management version 2 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel modules: shpchp 00:0a.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 13) Subsystem: ASUSTeK Computer Inc. 
Marvell 88E8001 Gigabit Ethernet Controller (Asus) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 64 (5750ns min, 7750ns max), Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 17 Region 0: Memory at fb700000 (32-bit, non-prefetchable) [size=16K] Region 1: I/O ports at 9400 [size=256] Expansion ROM at fb600000 [disabled] [size=128K] Capabilities: [48] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=7 DScale=1 PME- Capabilities: [50] Vital Product Data Product Name: Yukon Gigabit Ethernet 10/100/1000Base-T Adapter Read-only fields: [PN] Part number: Yukon 88E8001 [EC] Engineering changes: Rev. 1.3 [MN] Manufacture ID: 4d 61 72 76 65 6c 6c [SN] Serial number: AbCdEfG334454 [CP] Extended capability: 01 10 cc 03 [RV] Reserved: checksum good, 10 byte(s) reserved Read/write fields: [RW] Read-write area: 121 byte(s) free End Kernel driver in use: skge Kernel modules: skge 00:0d.0 Mass storage controller: Promise Technology, Inc. PDC40718 (SATA 300 TX4) (rev 02) Subsystem: Promise Technology, Inc. 
PDC40718 (SATA 300 TX4) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 88 (1000ns min, 4500ns max), Cache Line Size: 4 bytes Interrupt: pin A routed to IRQ 18 Region 0: I/O ports at a000 [size=128] Region 2: I/O ports at 9800 [size=256] Region 3: Memory at fba00000 (32-bit, non-prefetchable) [size=4K] Region 4: Memory at fb900000 (32-bit, non-prefetchable) [size=128K] Expansion ROM at fb800000 [disabled] [size=32K] Capabilities: [60] Power Management version 2 Flags: PMEClk- DSI+ D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: sata_promise Kernel modules: sata_promise 00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80) Subsystem: ASUSTeK Computer Inc. A7V600/K8V Deluxe/K8V-X/A8V Deluxe motherboard Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 64 Interrupt: pin B routed to IRQ 20 Region 0: I/O ports at c000 [size=8] Region 1: I/O ports at b800 [size=4] Region 2: I/O ports at b400 [size=8] Region 3: I/O ports at b000 [size=4] Region 4: I/O ports at a800 [size=16] Region 5: I/O ports at a400 [size=256] Capabilities: [c0] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: sata_via Kernel modules: sata_via 00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP]) Subsystem: ASUSTeK Computer Inc. 
A7V600/K8V-X/A8V Deluxe motherboard Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32 Interrupt: pin A routed to IRQ 20 Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8] Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable) [size=1] Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8] Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable) [size=1] Region 4: I/O ports at fc00 [size=16] Capabilities: [c0] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: pata_via Kernel modules: pata_via 00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81) (prog-if 00 [UHCI]) Subsystem: ASUSTeK Computer Inc. A7V600/K8V-X/A8V Deluxe motherboard Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 64, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 21 Region 4: I/O ports at c400 [size=32] Capabilities: [80] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: uhci_hcd 00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81) (prog-if 00 [UHCI]) Subsystem: ASUSTeK Computer Inc. 
A7V600/K8V-X/A8V Deluxe motherboard Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 64, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 21 Region 4: I/O ports at c800 [size=32] Capabilities: [80] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: uhci_hcd 00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81) (prog-if 00 [UHCI]) Subsystem: ASUSTeK Computer Inc. A7V600/K8V-X/A8V Deluxe motherboard Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 64, Cache Line Size: 64 bytes Interrupt: pin B routed to IRQ 21 Region 4: I/O ports at d000 [size=32] Capabilities: [80] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: uhci_hcd 00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81) (prog-if 00 [UHCI]) Subsystem: ASUSTeK Computer Inc. 
A7V600/K8V-X/A8V Deluxe motherboard Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 64, Cache Line Size: 64 bytes Interrupt: pin B routed to IRQ 21 Region 4: I/O ports at d400 [size=32] Capabilities: [80] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME+ Kernel driver in use: uhci_hcd 00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86) (prog-if 20 [EHCI]) Subsystem: ASUSTeK Computer Inc. A7V600/K8V-X/A8V Deluxe motherboard Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 64, Cache Line Size: 64 bytes Interrupt: pin C routed to IRQ 21 Region 0: Memory at fbc00000 (32-bit, non-prefetchable) [size=256] Capabilities: [80] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: ehci_hcd 00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South] Subsystem: ASUSTeK Computer Inc. A7V600/K8V-X/A8V Deluxe motherboard Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Capabilities: [c0] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel modules: i2c-viapro 00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60) Subsystem: ASUSTeK Computer Inc. 
A8V Deluxe motherboard (Realtek ALC850 codec) Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Interrupt: pin C routed to IRQ 22 Region 0: I/O ports at d800 [size=256] Capabilities: [c0] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: VIA 82xx Audio Kernel modules: snd-via82xx 00:11.6 Communication controller: VIA Technologies, Inc. AC'97 Modem Controller (rev 80) Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Interrupt: pin C routed to IRQ 22 Region 0: I/O ports at 1000 [size=256] Capabilities: [d0] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel modules: snd-via82xx-modem 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Capabilities: [80] HyperTransport: Host or Secondary Interface Command: WarmRst+ DblEnd- DevNum=0 ChainSide- HostHide+ Slave- <EOCErr- DUL- Link Control: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b- Link Config: MLWI=16bit DwFcIn- MLWO=16bit DwFcOut- LWI=16bit DwFcInEn- LWO=16bit DwFcOutEn- Revision ID: 1.02 Link Frequency: 1.0GHz Link Error: <Prot- <Ovfl- <EOC- CTLTm- Link Frequency Capability: 200MHz+ 300MHz- 400MHz+ 500MHz- 600MHz+ 800MHz+ 1.0GHz+ 1.2GHz- 1.4GHz- 1.6GHz- Vend- Feature Capability: 
IsocFC- LDTSTOP+ CRCTM- ECTLT- 64bA- UIDRD- ExtRS- UCnfE- 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Kernel driver in use: amd64_edac Kernel modules: amd64_edac_mod 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Kernel driver in use: k8temp Kernel modules: k8temp 01:00.0 VGA compatible controller: ATI Technologies Inc RV350 AR [Radeon 9600] (prog-if 00 [VGA controller]) Subsystem: PC Partner Limited Sapphire Radeon 9600XT Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 64 (2000ns min), Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 Region 0: Memory at e8000000 (32-bit, prefetchable) [size=128M] Region 1: I/O ports at e000 [size=256] Region 2: Memory at fbe00000 (32-bit, non-prefetchable) [size=64K] Expansion ROM at fbd00000 [disabled] [size=128K] Capabilities: [58] AGP version 3.0 Status: RQ=256 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3+ Rate=x4,x8 Command: RQ=32 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x8 Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ 
AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: radeon Kernel modules: radeon, radeonfb 01:00.1 Display controller: ATI Technologies Inc RV350 AR [Radeon 9600] (Secondary) Subsystem: PC Partner Limited Sapphire Radeon 9600XT (Secondary) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 64 (2000ns min), Cache Line Size: 64 bytes Region 0: Memory at f0000000 (32-bit, prefetchable) [size=128M] Region 1: Memory at fbf00000 (32-bit, non-prefetchable) [size=64K] Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine check exception
2011-07-27 15:31 ` F. P. Beekhof
@ 2011-07-27 17:03 ` Borislav Petkov
2011-07-27 20:54 ` F. P. Beekhof
0 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2011-07-27 17:03 UTC (permalink / raw)
To: F. P. Beekhof; +Cc: Jeff Garzik, Mikael Pettersson, linux-ide
On Wed, Jul 27, 2011 at 05:31:56PM +0200, F. P. Beekhof wrote:
> $ sudo rdmsr 0xc001001f
> 8
>
> Is there anything else I can do ?
Ok, I'd like you to try something out:
Boot into runlevel 1: you need this because you get the MCE before
you've done starting apache. For that, add a "1" to your kernel command
line and boot. When you get the prompt, type in your root pwd and do
wrmsr 0xc001001f $(( $(rdmsr -u 0xc001001f) | (1 << 20) ))
Then do
rdmsr -x 0xc001001f
to verify that the write has succeeded. It should say
0x00100008.
then exit the prompt to continue to runlevel 2 to see whether this
setting fixes your MCE issue.
This is for now,
thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632
WEEE Registernr: 129 19551
^ permalink raw reply [flat|nested] 18+ messages in thread
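Archive note: the shell arithmetic in the message above reads the current register value, ORs in bit 20, and writes the result back. A small sketch of just that arithmetic follows, using the value 8 reported earlier in the thread; `set_bit` is a hypothetical helper name, and nothing here touches a real MSR.

```python
# Sketch of the read-modify-write arithmetic only; no MSR access happens here.

def set_bit(value, bit):
    """Return `value` with bit number `bit` set, leaving other bits unchanged."""
    return value | (1 << bit)


current = 0x8                    # value rdmsr printed earlier in this thread
updated = set_bit(current, 20)   # same operation as `| (1 << 20)` in the shell

print(f"0x{updated:08x}")        # format to compare against the expected readback
```

OR-ing rather than writing a constant preserves whatever bits were already set (here bit 3), and applying it a second time leaves the value unchanged.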
* Re: Machine check exception
2011-07-27 17:03 ` Borislav Petkov
@ 2011-07-27 20:54 ` F. P. Beekhof
2011-07-27 21:30 ` F. P. Beekhof
0 siblings, 1 reply; 18+ messages in thread
From: F. P. Beekhof @ 2011-07-27 20:54 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Jeff Garzik, Mikael Pettersson, linux-ide
On 07/27/2011 07:03 PM, Borislav Petkov wrote:
> On Wed, Jul 27, 2011 at 05:31:56PM +0200, F. P. Beekhof wrote:
>> $ sudo rdmsr 0xc001001f
>> 8
>>
>> Is there anything else I can do ?
>
> Ok, I'd like you to try something out:
>
> Boot into runlevel 1: you need this because you get the MCE before
> you've done starting apache. For that, add a "1" to your kernel command
> line and boot. When you get the prompt, type in your root pwd and do
>
> wrmsr 0xc001001f $(( $(rdmsr -u 0xc001001f) | (1<< 20) ))
>
> Then do
>
> rdmsr -x 0xc001001f
>
> to verify that the write has succeeded. It should say
>
> 0x00100008.
>
> then exit the prompt to continue to runlevel 2 to see whether this
> setting fixes your MCE issue.
>
> This is for now,
> thanks.
>
Ok, writing the register worked.
Now we just need to wait, these crashes occur at random moments.
Sometimes there is 5 minutes between two crashes, sometimes a few days...
I'll post an update as soon as I have more information.
Thanks for helping out!
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine check exception 2011-07-27 20:54 ` F. P. Beekhof @ 2011-07-27 21:30 ` F. P. Beekhof 2011-07-28 7:47 ` Borislav Petkov 0 siblings, 1 reply; 18+ messages in thread From: F. P. Beekhof @ 2011-07-27 21:30 UTC (permalink / raw) To: Borislav Petkov; +Cc: Jeff Garzik, Mikael Pettersson, linux-ide Note: after a suspend/resume cycle, the register value is back at 8, so I have to run the commands again to set it to 100008 # rdmsr -x 0xc001001f 100008 (suspend / resume) # rdmsr -x 0xc001001f 8 On 07/27/2011 10:54 PM, F. P. Beekhof wrote: > On 07/27/2011 07:03 PM, Borislav Petkov wrote: >> On Wed, Jul 27, 2011 at 05:31:56PM +0200, F. P. Beekhof wrote: >>> $ sudo rdmsr 0xc001001f >>> 8 >>> >>> Is there anything else I can do ? >> >> Ok, I'd like you to try something out: >> >> Boot into runlevel 1: you need this because you get the MCE before >> you've done starting apache. For that, add a "1" to your kernel command >> line and boot. When you get the prompt, type in your root pwd and do >> >> wrmsr 0xc001001f $(( $(rdmsr -u 0xc001001f) | (1<< 20) )) >> >> Then do >> >> rdmsr -x 0xc001001f >> >> to verify that the write has succeeded. It should say >> >> 0x00100008. >> >> then exit the prompt to continue to runlevel 2 to see whether this >> setting fixes your MCE issue. >> >> This is for now, >> thanks. >> > > Ok, writing the register worked. > > Now we just need to wait, these crashes occur at random moments. > Sometimes there is 5 minutes between two crashes, sometimes a few days... > > I'll post an update as soon as I have more information. > > Thanks for helping out! > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine check exception 2011-07-27 21:30 ` F. P. Beekhof @ 2011-07-28 7:47 ` Borislav Petkov 2011-07-31 8:22 ` F. P. Beekhof 0 siblings, 1 reply; 18+ messages in thread From: Borislav Petkov @ 2011-07-28 7:47 UTC (permalink / raw) To: F. P. Beekhof; +Cc: Borislav Petkov, Jeff Garzik, Mikael Pettersson, linux-ide On Wed, Jul 27, 2011 at 11:30:08PM +0200, F. P. Beekhof wrote: > Note: after a suspend/resume cycle, the register value is back at 8, > so I have to run the commands again to set it to 100008 > > # rdmsr -x 0xc001001f > 100008 > (suspend / resume) > # rdmsr -x 0xc001001f > 8 Yeah, that's ok for now, just to test whether this fixes your issue. You can add the wrmsr call to some post-resume hooks on your system. -- Regards/Gruss, Boris. ^ permalink raw reply [flat|nested] 18+ messages in thread
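A post-resume hook of the kind suggested above might look roughly like the sketch below. It assumes pm-utils is what drives suspend on this system (the later kernel log shows `pm-suspend`), that hooks live in /etc/pm/sleep.d/ and receive the phase as their first argument, and that msr-tools is installed. The file name and the `apply_fix` function are hypothetical.

```shell
#!/bin/sh
# Sketch of a pm-utils hook, e.g. /etc/pm/sleep.d/99_msr_fix (hypothetical name).
# pm-utils passes the phase as $1: suspend, hibernate, resume or thaw.

MSR=0xc001001f

apply_fix() {
    # rdmsr/wrmsr need the msr kernel module.
    modprobe msr
    # Same read-modify-write as in the thread: set bit 20, keep the other bits.
    wrmsr "$MSR" $(( $(rdmsr -u "$MSR") | (1 << 20) ))
}

case "$1" in
    resume|thaw) apply_fix ;;
esac
```

The hook has to be executable and run as root. As in the original command, it writes the MSR without selecting a particular CPU.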
* Re: Machine check exception 2011-07-28 7:47 ` Borislav Petkov @ 2011-07-31 8:22 ` F. P. Beekhof 2011-07-31 12:09 ` Borislav Petkov 0 siblings, 1 reply; 18+ messages in thread From: F. P. Beekhof @ 2011-07-31 8:22 UTC (permalink / raw) To: Borislav Petkov Cc: Borislav Petkov, Jeff Garzik, Mikael Pettersson, linux-ide Hi, On 07/28/2011 09:47 AM, Borislav Petkov wrote: > On Wed, Jul 27, 2011 at 11:30:08PM +0200, F. P. Beekhof wrote: >> Note: after a suspend/resume cycle, the register value is back at 8, >> so I have to run the commands again to set it to 100008 >> >> # rdmsr -x 0xc001001f >> 100008 >> (suspend / resume) >> # rdmsr -x 0xc001001f >> 8 > > Yeah, that's ok for now, just to test whether this fixes your issue. You > can add the wrmsr call to some post-resume hooks on your system. > I've used the hooks to call a script, the value is 100008 after resume, and I'm booting the system by going onto 'recovery console', running the script to set msr 0xc001001f to 100008, then completing the normal boot procedure. So far, it seems to have fixed the issue, in the sense that there have been no MCEs yet. There was some call trace after a suspend/resume (see below), but that's it. I found that one can enable ECC on ram in the bios, which I did. As far as I know, this is non-ECC ram, so frankly I'm at a loss about To provoke MCEs, I've added a firewire card, that I had pulled out before. Removing that thing had reduced the number of MCEs, but not eliminated them. With a regular boot sequence (no msr setting), the radeon driver complained of something and the system froze within 5 minutes. I then rebooted and followed your instructions, so far the system is working perfectly fine. I've also switched two eSATA on and off a few times, they are detected fine now with no crash, and let banshee run. That has frequently proven to be too much, but now it is fine. All of this is no definite proof that all is well, but it certainly seems more stable. 
Is there anything else I can do ? Are there any conclusions that can be drawn from this experiment ? Best, Fokko
[18297.261773] WARNING: at /build/buildd/linux-2.6.38/kernel/power/suspend_test.c:53 suspend_test_finish+0x86/0x90()
[18297.261775] Hardware name: System Product Name
[18297.261777] Component: resume devices, time: 17880
[18297.261778] Modules linked in: parport_pc ppdev binfmt_misc msr snd_via82xx snd_via82xx_modem gameport snd_ac97_codec ac97_bus snd_pcm snd_mpu401_uart snd_seq_midi radeon snd_rawmidi snd_seq_midi_event ttm drm_kms_helper snd_seq drm snd_timer amd64_edac_mod snd_seq_device edac_core i2c_algo_bit snd snd_page_alloc edac_mce_amd lp soundcore i2c_viapro k8temp parport shpchp reiserfs usb_storage uas usbhid hid firewire_ohci skge sata_via pata_via sata_promise firewire_core crc_itu_t raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear
[18297.261815] Pid: 16135, comm: pm-suspend Not tainted 2.6.38-10-generic #46-Ubuntu
[18297.261817] Call Trace:
[18297.261824] [<ffffffff81065cbf>] ? warn_slowpath_common+0x7f/0xc0
[18297.261828] [<ffffffff81065db6>] ? warn_slowpath_fmt+0x46/0x50
[18297.261831] [<ffffffff810a75d6>] ? suspend_test_finish+0x86/0x90
[18297.261834] [<ffffffff810a72f7>] ? suspend_devices_and_enter+0xa7/0x160
[18297.261837] [<ffffffff810a74d5>] ? enter_state+0x125/0x150
[18297.261840] [<ffffffff810a6936>] ? state_store+0xc6/0x100
[18297.261845] [<ffffffff812dcb67>] ? kobj_attr_store+0x17/0x20
[18297.261848] [<ffffffff811d3d4e>] ? sysfs_write_file+0xde/0x160
[18297.261852] [<ffffffff81164e16>] ? vfs_write+0xc6/0x180
[18297.261855] [<ffffffff81165131>] ? sys_write+0x51/0x90
[18297.261859] [<ffffffff8100c002>] ? system_call_fastpath+0x16/0x1b
[18297.261861] ---[ end trace d1b3663bc80e2f9e ]---
[18297.271611] PM: Finishing wakeup.
[18297.271613] Restarting tasks ... done.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine check exception 2011-07-31 8:22 ` F. P. Beekhof @ 2011-07-31 12:09 ` Borislav Petkov 2011-07-31 15:56 ` F. P. Beekhof 0 siblings, 1 reply; 18+ messages in thread From: Borislav Petkov @ 2011-07-31 12:09 UTC (permalink / raw) To: F. P. Beekhof Cc: Borislav Petkov, Borislav Petkov, Jeff Garzik, Mikael Pettersson, linux-ide@vger.kernel.org Hi, On Sun, Jul 31, 2011 at 04:22:41AM -0400, F. P. Beekhof wrote: > I've used the hooks to call a script, the value is 100008 after > resume, and I'm booting the system by going onto 'recovery console', > running the script to set msr 0xc001001f to 100008, then completing > the normal boot procedure. Hmm, there has to be a way to automate that. Maybe push /etc/init.d/rc.local up in the call prio so that it gets run as early as possible? > So far, it seems to have fixed the issue, in the sense that there have > been no MCEs yet. There was some call trace after a suspend/resume > (see below), but that's it. Yeah, it's on resume. This warning fires because it took the system 17880 msecs to resume and the test was expecting something under 10000. It could be an unstable RTC clock or something. You could disable it by turning CONFIG_PM_TEST_SUSPEND off for your kernel if there are no other issues with suspend/resume besides that warning firing. > I found that one can enable ECC on ram in the bios, which I did. As > far as I know, this is non-ECC ram, so frankly I'm at a loss about Maybe the BIOS is not properly detecting whether DRAM is ECC or not. Normally, if it is not, it should simply remove the option to enable ECC from the menu. To check what the hw says, do $ setpci -s 18.3 0x44.l as root and send me the result pls. > To provoke MCEs, I've added a firewire card, that I had pulled out > before. Removing that thing had reduced the number of MCEs, but not > eliminated them. With a regular boot sequence (no msr setting), the > radeon driver complained of something and the system froze within 5 > minutes. 
I then rebooted and followed your instructions, so far the > system is working perfectly fine. good. > I've also switched two eSATA on and off a few times, they are detected > fine now with no crash, and let banshee run. That has frequently > proven to be too much, but now it is fine. good. > All of this is no definite proof that all is well, but it certainly > seems more stable. I'd suggest you run your system at full swing and watch it for signs of trouble a couple of days longer just in case. > Are there any conclusions that can be drawn from this experiment ? Yeah, it means that your BIOS doesn't seem to have the fix for erratum #131: http://support.amd.com/us/Processor_TechDocs/25759.pdf, page 83. I don't know whether there is a BIOS for your ancient CPU :-) and if there were, whether upgrading it won't break something else. If I were you, I'd run the automated script hooks and not care about the upgrade... provided we don't see any other hiccups that is and provided we manage to automate them so that you don't have to boot into recovery console every time. Let me know how it all plays out. HTH. -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach GM: Alberto Bozzo Reg: Dornach, Landkreis Muenchen HRB Nr. 43632 WEEE Registernr: 129 19551 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine check exception 2011-07-31 12:09 ` Borislav Petkov @ 2011-07-31 15:56 ` F. P. Beekhof 2011-08-01 8:48 ` Borislav Petkov 0 siblings, 1 reply; 18+ messages in thread From: F. P. Beekhof @ 2011-07-31 15:56 UTC (permalink / raw) To: Borislav Petkov Cc: Borislav Petkov, Mikael Pettersson, linux-ide@vger.kernel.org >> I found that one can enable ECC on ram in the bios, which I did. As >> far as I know, this is non-ECC ram, so frankly I'm at a loss about > > Maybe the BIOS is not properly detecting whether DRAM is ECC or not. > Normally, if it is not, it should simply remove the option to enable ECC > from the menu. > > To check what the hw says, do > > $ setpci -s 18.3 0x44.l > > as root and send me the result pls. > $ sudo setpci -s 18.3 0x44.l 00400040 > I'd suggest you run your system at full swing and watch it for signs of > trouble a couple of days longer just in case. Ok. >> Are there any conclusions that can be drawn from this experiment ? > > Yeah, it means that your BIOS doesn't seem to have the fix for erratum > #131: http://support.amd.com/us/Processor_TechDocs/25759.pdf, page 83. So, this is not a problem with the promise-sata driver as I originally suspected. I guess then we can take this discussion off the linux-ide list... > I don't know whether there is BIOS for your ancient CPU :-) and if there > were, whether upgrading it won't break something else. > > If I were you, I'd run the automated script hooks and don't care about > upgrade... provided we don't see any other hickups that is and provided > we manage to automate them so that you don't have to boot into recovery > console every time. > > Let me know how it all plays out. Will do! Many thanks! > HTH. > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Machine check exception 2011-07-31 15:56 ` F. P. Beekhof @ 2011-08-01 8:48 ` Borislav Petkov 0 siblings, 0 replies; 18+ messages in thread From: Borislav Petkov @ 2011-08-01 8:48 UTC (permalink / raw) To: F. P. Beekhof Cc: Borislav Petkov, Mikael Pettersson, linux-ide@vger.kernel.org On Sun, Jul 31, 2011 at 11:56:33AM -0400, F. P. Beekhof wrote: > $ sudo setpci -s 18.3 0x44.l > 00400040 Interesting, this says ECC is enabled on your machine. Do you know the exact models of your DIMMs? Sometimes they can be found in dmidecode output so can you do dmidecode > dmidecode.out and lspci -s 18.3 -xxxx > f3.out as root and send me both files pls? .. > >> Are there any conclusions that can be drawn from this experiment ? > > > > Yeah, it means that your BIOS doesn't seem to have the fix for erratum > > #131: http://support.amd.com/us/Processor_TechDocs/25759.pdf, page 83. > > So, this is not a problem with the promise-sata driver as I originally > suspected. I guess then we can take this discussion off the linux-ide > list... Yeah, looks like it. -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach GM: Alberto Bozzo Reg: Dornach, Landkreis Muenchen HRB Nr. 43632 WEEE Registernr: 129 19551 ^ permalink raw reply [flat|nested] 18+ messages in thread
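The reading of the setpci value above can be reproduced with a small bit test. This is only a sketch under an assumption the thread does not state: that bit 22 of this register (offset 0x44 of PCI function 18.3) is the DRAM ECC enable flag, which is how the kernel's amd64_edac driver treats it. The helper name `bit_is_set` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if bit $2 is set in hex value $1
# ($1 as printed by setpci, i.e. without a 0x prefix).
bit_is_set() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

val=00400040   # output of "setpci -s 18.3 0x44.l" quoted above

# Assumption: bit 22 is the DRAM ECC enable flag on this CPU family.
if bit_is_set "$val" 22; then
    echo "ECC enable bit set"
else
    echo "ECC enable bit clear"
fi
```

For the value reported in the thread, bit 22 (0x00400000) is set, which is consistent with the statement that ECC is enabled on this machine.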
end of thread, other threads:[~2011-08-01 8:49 UTC | newest]

Thread overview: 18+ messages
2002-12-15 20:22 Machine Check Exception Felix von Leitner
2002-12-15 20:39 ` Jan-Benedict Glaw
2002-12-16 19:55 ` Felipe W Damasio
-- strict thread matches above, loose matches on Subject: below --
2003-12-03 11:05 machine check exception Nicholas Mucci
2003-12-03 13:05 ` Russell Coker
2003-12-03 13:19 ` Stephen Smalley
2008-10-05 17:41 Machine Check Exception Matteo Croce
2011-07-27 11:05 Machine check exception F. P. Beekhof
2011-07-27 13:03 ` Borislav Petkov
2011-07-27 15:31 ` F. P. Beekhof
2011-07-27 17:03 ` Borislav Petkov
2011-07-27 20:54 ` F. P. Beekhof
2011-07-27 21:30 ` F. P. Beekhof
2011-07-28 7:47 ` Borislav Petkov
2011-07-31 8:22 ` F. P. Beekhof
2011-07-31 12:09 ` Borislav Petkov
2011-07-31 15:56 ` F. P. Beekhof
2011-08-01 8:48 ` Borislav Petkov