From: "Prasanna S. Panchamukhi" <prasanna.panchamukhi@riverbed.com>
To: Keith Mannthey <kmannth@us.ibm.com>
Cc: Rob Becker <Rob.Becker@riverbed.com>,
	"bluesmoke-devel@lists.sourceforge.net"
	<bluesmoke-devel@lists.sourceforge.net>,
	Arthur Jones <Arthur.Jones@riverbed.com>,
	"dougthompson@xmission.com" <dougthompson@xmission.com>
Subject: Re: EDAC linux-2.6.34-rc5 non correctable errors not reported on AMD64 opteron
Date: Thu, 29 Apr 2010 15:31:21 -0700
Message-ID: <20100429223121.GA20899@ppanchamukhi>
In-Reply-To: <1272579222.3792.18.camel@keith-laptop>

On Thu, Apr 29, 2010 at 03:13:42PM -0700, Keith Mannthey wrote:
> On Thu, 2010-04-29 at 11:30 -0700, Prasanna S. Panchamukhi wrote:
> > Hi Doug,
> > 
> > I am testing the Linux-2.6.34-rc5 EDAC driver on an AMD64 Opteron.
> > I am able to inject single-bit errors and get the EDAC driver to report
> > the correctable errors.
> > But when I inject 2-bit errors, I do not see any notification or kernel
> > log; the system simply hangs.
> > This happens with or without edac_mc_panic_on_ue enabled.
> > Please let me know if I am missing something.
> > Below are the details.
> 
> I would have to recheck the specs to be 100% sure but I would consider
> double bit errors to be fatal on normal Opteron boxes. There is a good
> chance your BIOS detects the fatal error and freezes the box to prevent
> data corruption.

Shouldn't the EDAC driver report uncorrectable errors before the BIOS
detects the fatal error and freezes the box?
Has anyone already tested 2-bit error injection and reporting on AMD64?
Does the EDAC driver report uncorrectable errors on other architectures,
such as PowerPC or Intel?
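Whether the EDAC core recorded anything before the hang can be checked from its per-controller error counters. The sketch below assumes the documented EDAC sysfs layout, with `ce_count` and `ue_count` files under each `/sys/devices/system/edac/mc/mcN` directory; the function name `edac_counts` and its optional root argument are hypothetical.

```shell
# Hypothetical helper: print corrected (CE) and uncorrected (UE) error
# counters for every memory controller under an EDAC sysfs root.
# Assumes the mcN/ce_count and mcN/ue_count files described above.
edac_counts() {
    root=${1:-/sys/devices/system/edac/mc}
    for mc in "$root"/mc[0-9]*; do
        # An unmatched glob stays literal, so skip anything that is not a directory.
        [ -d "$mc" ] || continue
        printf '%s ce_count=%s ue_count=%s\n' "${mc##*/}" \
            "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
}
```

Running it before and after a single-bit injection should show `ce_count` rising. Whether `ue_count` ever moves in the 2-bit case is the open question here; since nothing can be read after the hang, it would have to be sampled repeatedly while the test runs.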

Thanks
Prasanna


> 
> Thanks,
>   Keith 
> 
> > Thanks
> > Prasanna
> > 
> > Steps to reproduce the problem:
> > 
> > 
> > 1. Build Linux-2.6.34-rc5 using x86_64_defconfig with the following additional config options enabled:
> > CONFIG_EDAC_DECODE_MCE=y
> > CONFIG_EDAC_MM_EDAC=y
> > CONFIG_EDAC_AMD64=m
> > CONFIG_EDAC_AMD64_ERROR_INJECTION=y
> > CONFIG_EDAC_E752X=m
> > CONFIG_EDAC_I82975X=m
> > CONFIG_EDAC_I3000=m
> > CONFIG_EDAC_I3200=m
> > CONFIG_EDAC_X38=m
> > CONFIG_EDAC_I5400=m
> > CONFIG_EDAC_I5000=m
> > CONFIG_EDAC_I5100=m
> > 
> > 2. Insert the kernel module:
> > # insmod amd64_edac_mod.ko
> > 
> > 3. Inject errors
> > 
> > # echo 3 > /sys/devices/system/edac/mc/mc0/inject_section  
> > # echo 7 > /sys/devices/system/edac/mc/mc0/inject_word
> > # echo 0x88 > /sys/devices/system/edac/mc/mc0/inject_ecc_vector
> > # echo 1 > /sys/devices/system/edac/mc/mc0/inject_read
> > # echo 1 > /sys/devices/system/edac/mc/mc0/inject_write
> > 
> > 4. The system should hang within a few minutes.
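Step 3 above can be wrapped in a small script that also states what kind of error the chosen ECC vector should produce. This sketch rests on an assumption: that `inject_ecc_vector` is a 16-bit mask in which each set bit corrupts one bit of the ECC word, so `0x88` (bits 3 and 7) should give an uncorrectable 2-bit error and a single-bit mask such as `0x8` a correctable one. The function name `edac_inject` and its optional directory argument are hypothetical; the sysfs attribute names and the order of the writes are taken from step 3.

```shell
# Hypothetical wrapper around the step-3 sysfs writes.
# Usage: edac_inject SECTION WORD ECC_VECTOR [MC_DIR]
# Assumes ECC_VECTOR is a 16-bit mask of ECC bits to corrupt.
edac_inject() {
    section=$1 word=$2 vector=$3
    mc=${4:-/sys/devices/system/edac/mc/mc0}

    # Count the set bits in the low 16 bits of the vector.
    v=$(( $vector & 0xffff )) bits=0
    while [ "$v" -ne 0 ]; do
        bits=$(( bits + (v & 1) ))
        v=$(( v >> 1 ))
    done
    case $bits in
        0) echo "vector $vector flips no bits" >&2; return 1 ;;  # nothing written
        1) echo "expect a correctable (single-bit) error" ;;
        *) echo "expect an uncorrectable ($bits-bit) error" ;;
    esac

    # Same writes, in the same order, as step 3.
    echo "$section" > "$mc/inject_section" &&
    echo "$word"    > "$mc/inject_word" &&
    echo "$vector"  > "$mc/inject_ecc_vector" &&
    echo 1 > "$mc/inject_read" &&
    echo 1 > "$mc/inject_write"
}
```

For example, `edac_inject 3 7 0x88` repeats the quoted reproduction, and `edac_inject 3 7 0x8` is a single-bit counterpart that, under the same assumption, should only raise a correctable error.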
> > 
> > Additional info:
> > - AMD64 opteron
> > # cat /proc/cpuinfo
> > processor    : 0
> > vendor_id    : AuthenticAMD
> > cpu family   : 16
> > model          : 2
> > model name    : Quad-Core AMD Opteron(tm) Processor 2346 HE
> > stepping       : 3
> > cpu MHz      : 1800.023
> > cache size    : 512 KB
> > physical id   : 0
> > siblings       : 4
> > core id        : 0
> > cpu cores    : 4
> > apicid          : 0
> > initial apicid    : 0
> > fpu        : yes
> > fpu_exception    : yes
> > cpuid level    : 5
> > wp        : yes
> > flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
> > cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt 
> > pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc 
> > extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic 
> > cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
> > bogomips    : 3600.04
> > TLB size    : 1024 4K pages
> > clflush size    : 64
> > cache_alignment    : 64
> > address sizes    : 48 bits physical, 48 bits virtual
> > power management: ts ttp tm stc 100mhzsteps hwpstate
> > 
> > processor    : 1
> > vendor_id    : AuthenticAMD
> > cpu family    : 16
> > model        : 2
> > model name    : Quad-Core AMD Opteron(tm) Processor 2346 HE
> > stepping    : 3
> > cpu MHz        : 1800.023
> > cache size    : 512 KB
> > physical id    : 0
> > siblings    : 4
> > core id        : 1
> > cpu cores    : 4
> > apicid        : 1
> > initial apicid    : 1
> > fpu        : yes
> > fpu_exception    : yes
> > cpuid level    : 5
> > wp        : yes
> > flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
> > cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt 
> > pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc 
> > extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic 
> > cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
> > bogomips    : 3600.08
> > TLB size    : 1024 4K pages
> > clflush size    : 64
> > cache_alignment    : 64
> > address sizes    : 48 bits physical, 48 bits virtual
> > power management: ts ttp tm stc 100mhzsteps hwpstate
> > 
> > processor    : 2
> > vendor_id    : AuthenticAMD
> > cpu family    : 16
> > model        : 2
> > model name    : Quad-Core AMD Opteron(tm) Processor 2346 HE
> > stepping    : 3
> > cpu MHz        : 1800.023
> > cache size    : 512 KB
> > physical id    : 0
> > siblings    : 4
> > core id        : 2
> > cpu cores    : 4
> > apicid        : 2
> > initial apicid    : 2
> > fpu        : yes
> > fpu_exception    : yes
> > cpuid level    : 5
> > wp        : yes
> > flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
> > cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt 
> > pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc 
> > extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic 
> > cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
> > bogomips    : 3599.96
> > TLB size    : 1024 4K pages
> > clflush size    : 64
> > cache_alignment    : 64
> > address sizes    : 48 bits physical, 48 bits virtual
> > power management: ts ttp tm stc 100mhzsteps hwpstate
> > 
> > processor    : 3
> > vendor_id    : AuthenticAMD
> > cpu family    : 16
> > model        : 2
> > model name    : Quad-Core AMD Opteron(tm) Processor 2346 HE
> > stepping    : 3
> > cpu MHz        : 1800.023
> > cache size    : 512 KB
> > physical id    : 0
> > siblings    : 4
> > core id        : 3
> > cpu cores    : 4
> > apicid        : 3
> > initial apicid    : 3
> > fpu        : yes
> > fpu_exception    : yes
> > cpuid level    : 5
> > wp        : yes
> > flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
> > cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt 
> > pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc 
> > extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic 
> > cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
> > bogomips    : 3600.01
> > TLB size    : 1024 4K pages
> > clflush size    : 64
> > cache_alignment    : 64
> > address sizes    : 48 bits physical, 48 bits virtual
> > power management: ts ttp tm stc 100mhzsteps hwpstate
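`/proc/cpuinfo` reports `cpu family` in decimal, while AMD documentation names families in hexadecimal, so `cpu family : 16` above corresponds to Family 10h. A minimal sketch that extracts and converts the value from cpuinfo-formatted text; the function name `amd_family` is hypothetical, and only the first `cpu family` line is read.

```shell
# Hypothetical helper: print the CPU family of the first processor in
# AMD's hexadecimal naming (e.g. "Family 10h").
# Reads /proc/cpuinfo by default; pass another file to test.
amd_family() {
    file=${1:-/proc/cpuinfo}
    # Lines look like "cpu family<tabs/spaces>: 16"; strip whitespace from the value.
    fam=$(awk -F: '/^cpu family[ \t]*:/ { gsub(/[ \t]/, "", $2); print $2; exit }' "$file")
    [ -n "$fam" ] || return 1
    printf 'Family %02Xh\n' "$fam"
}
```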
> > 
> > 
> > 
> > 
> > ------------------------------------------------------------------------------
> > _______________________________________________
> > bluesmoke-devel mailing list
> > bluesmoke-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/bluesmoke-devel
> 
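Before repeating the test, it is worth confirming that the AMD64-related options from step 1 of the quoted report actually ended up in the build configuration. A minimal sketch, assuming a standard kernel `.config` where enabled options appear as whole lines such as `CONFIG_FOO=y` or `CONFIG_FOO=m`; the function name `check_edac_config` is hypothetical, and the Intel driver options from the list are deliberately not checked.

```shell
# Hypothetical check: verify the AMD64 EDAC options listed in step 1.
# Usage: check_edac_config [path/to/.config]
# Prints each missing option and returns non-zero if any are missing.
check_edac_config() {
    cfg=${1:-.config}
    missing=0
    for opt in CONFIG_EDAC_DECODE_MCE=y CONFIG_EDAC_MM_EDAC=y \
               CONFIG_EDAC_AMD64=m CONFIG_EDAC_AMD64_ERROR_INJECTION=y; do
        # Match the whole line so that, e.g., "=m" does not satisfy "=y".
        if ! grep -qx "$opt" "$cfg"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return $missing
}
```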
