From: Borislav Petkov <bp@alien8.de>
To: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: [GIT PULL] EDAC fixes for 3.8
Date: Sat, 9 Mar 2013 16:46:35 +0100
Message-ID: <20130309154635.GA18316@pd.tnic>
In-Reply-To: <20130307110213.7a5a9978@redhat.com>

On Thu, Mar 07, 2013 at 11:02:13AM -0300, Mauro Carvalho Chehab wrote:
> Sure. See below:
> 
> [   19.062902] EDAC MC: Ver: 3.0.0
> [   19.088757] EDAC DEBUG: edac_mc_sysfs_init: device mc created
> [   19.284745] AMD64 EDAC driver v3.4.0
> [   19.299082] EDAC amd64: DRAM ECC enabled.
> [   19.315960] EDAC DEBUG: amd64_nb_mce_bank_enabled_on_node: core: 0, MCG_CTL: 0x3f, NB MSR is enabled

								^^^^^^^
Whoops, where did core 1 go? Strange.

> [   19.321115] EDAC DEBUG: amd64_nb_mce_bank_enabled_on_node: core: 2, MCG_CTL: 0x3f, NB MSR is enabled
> [   19.321118] EDAC DEBUG: amd64_nb_mce_bank_enabled_on_node: core: 3, MCG_CTL: 0x3f, NB MSR is enabled
> [   19.321120] EDAC DEBUG: amd64_nb_mce_bank_enabled_on_node: core: 4, MCG_CTL: 0x3f, NB MSR is enabled
> [   19.321123] EDAC DEBUG: amd64_nb_mce_bank_enabled_on_node: core: 5, MCG_CTL: 0x3f, NB MSR is enabled
> [   19.321125] EDAC DEBUG: amd64_nb_mce_bank_enabled_on_node: core: 6, MCG_CTL: 0x3f, NB MSR is enabled
> [   19.321140] EDAC amd64: F10h detected (node 0).
> [   19.327072] EDAC DEBUG: reserve_mc_sibling_devs: F1: 0000:00:18.1
> [   19.327074] EDAC DEBUG: reserve_mc_sibling_devs: F2: 0000:00:18.2
> [   19.327076] EDAC DEBUG: reserve_mc_sibling_devs: F3: 0000:00:18.3
> [   19.327078] EDAC DEBUG: read_mc_regs:   TOP_MEM:  0x00000000e0000000
> [   19.327081] EDAC DEBUG: read_mc_regs:   TOP_MEM2: 0x0000000420000000

Looks about right - 16G.
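
For reference, the 16G figure can be reproduced from the two registers
quoted above. This is only a sketch, assuming TOP_MEM marks the end of
DRAM below 4G and TOP_MEM2 the end of DRAM above 4G; the helper name is
made up for illustration and is not anything from the driver:

```python
GIB = 1 << 30

def total_dram_bytes(top_mem: int, top_mem2: int) -> int:
    """Assumption: DRAM below 4G ends at top_mem and DRAM above 4G
    spans [4G, top_mem2). The MMIO hole in between is not counted."""
    four_gib = 4 * GIB
    above_4g = max(top_mem2 - four_gib, 0)
    return top_mem + above_4g

# Values taken from the read_mc_regs lines in the log above.
total = total_dram_bytes(0x00000000e0000000, 0x0000000420000000)
print(total / GIB)  # 16.0
```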

> [   19.327087] EDAC DEBUG: read_dram_ctl_register: F2x110 (DCTSelLow): 0x000005e4, High range addrs at: 0x0
> [   19.327089] EDAC DEBUG: read_dram_ctl_register:   DCTs operate in unganged mode
> [   19.327091] EDAC DEBUG: read_dram_ctl_register:   Address range split per DCT: no
> [   19.327093] EDAC DEBUG: read_dram_ctl_register:   data interleave for ECC: enabled, DRAM cleared since last warm reset: yes
> [   19.327095] EDAC DEBUG: read_dram_ctl_register:   channel interleave: enabled, interleave bits selector: 0x3
> [   19.327099] EDAC DEBUG: read_mc_regs:   DRAM range[0], base: 0x0000000000000000; limit: 0x000000021fffffff
> [   19.327101] EDAC DEBUG: read_mc_regs:    IntlvEn=Disabled; Range access: RW IntlvSel=0 DstNode=0
> [   19.327104] EDAC DEBUG: read_mc_regs:   DRAM range[1], base: 0x0000000220000000; limit: 0x000000041fffffff
> [   19.327107] EDAC DEBUG: read_mc_regs:    IntlvEn=Disabled; Range access: RW IntlvSel=0 DstNode=1
> [   19.327114] EDAC DEBUG: read_dct_base_mask:   DCSB0[0]=0x00000000 reg: F2x40
> [   19.327117] EDAC DEBUG: read_dct_base_mask:   DCSB1[0]=0x00000000 reg: F2x140
> [   19.327119] EDAC DEBUG: read_dct_base_mask:   DCSB0[1]=0x00000000 reg: F2x44
> [   19.327121] EDAC DEBUG: read_dct_base_mask:   DCSB1[1]=0x00000000 reg: F2x144
> [   19.327123] EDAC DEBUG: read_dct_base_mask:   DCSB0[2]=0x00000001 reg: F2x48
> [   19.327125] EDAC DEBUG: read_dct_base_mask:   DCSB1[2]=0x00000001 reg: F2x148
> [   19.327129] EDAC DEBUG: read_dct_base_mask:   DCSB0[3]=0x00000101 reg: F2x4c
> [   19.327131] EDAC DEBUG: read_dct_base_mask:   DCSB1[3]=0x00000101 reg: F2x14c
> [   19.327134] EDAC DEBUG: read_dct_base_mask:   DCSB0[4]=0x00000000 reg: F2x50
> [   19.327136] EDAC DEBUG: read_dct_base_mask:   DCSB1[4]=0x00000000 reg: F2x150
> [   19.327138] EDAC DEBUG: read_dct_base_mask:   DCSB0[5]=0x00000000 reg: F2x54
> [   19.327140] EDAC DEBUG: read_dct_base_mask:   DCSB1[5]=0x00000000 reg: F2x154
> [   19.327142] EDAC DEBUG: read_dct_base_mask:   DCSB0[6]=0x00000201 reg: F2x58
> [   19.327144] EDAC DEBUG: read_dct_base_mask:   DCSB1[6]=0x00000201 reg: F2x158
> [   19.327146] EDAC DEBUG: read_dct_base_mask:   DCSB0[7]=0x00000301 reg: F2x5c
> [   19.327148] EDAC DEBUG: read_dct_base_mask:   DCSB1[7]=0x00000301 reg: F2x15c
> [   19.327150] EDAC DEBUG: read_dct_base_mask:     DCSM0[0]=0x00000000 reg: F2x60
> [   19.327152] EDAC DEBUG: read_dct_base_mask:     DCSM1[0]=0x00000000 reg: F2x160
> [   19.327155] EDAC DEBUG: read_dct_base_mask:     DCSM0[1]=0x00f83ce0 reg: F2x64
> [   19.327157] EDAC DEBUG: read_dct_base_mask:     DCSM1[1]=0x00f83ce0 reg: F2x164
> [   19.327159] EDAC DEBUG: read_dct_base_mask:     DCSM0[2]=0x00000000 reg: F2x68
> [   19.327161] EDAC DEBUG: read_dct_base_mask:     DCSM1[2]=0x00000000 reg: F2x168
> [   19.327163] EDAC DEBUG: read_dct_base_mask:     DCSM0[3]=0x00f83ce0 reg: F2x6c
> [   19.327165] EDAC DEBUG: read_dct_base_mask:     DCSM1[3]=0x00f83ce0 reg: F2x16c
> [   19.327169] EDAC DEBUG: dump_misc_regs: F3xE8 (NB Cap): 0x0200df5f
> [   19.327170] EDAC DEBUG: dump_misc_regs:   NB two channel DRAM capable: yes
> [   19.327172] EDAC DEBUG: dump_misc_regs:   ECC capable: yes, ChipKill ECC capable: yes
> [   19.327175] EDAC DEBUG: amd64_dump_dramcfg_low: F2x090 (DRAM Cfg Low): 0x00080100
> [   19.327179] EDAC DEBUG: amd64_dump_dramcfg_low:   DIMM type: buffered; all DIMMs support ECC: yes
> [   19.327181] EDAC DEBUG: amd64_dump_dramcfg_low:   PAR/ERR parity: enabled
> [   19.327183] EDAC DEBUG: amd64_dump_dramcfg_low:   DCT 128bit mode width: 64b
> [   19.327185] EDAC DEBUG: amd64_dump_dramcfg_low:   x4 logical DIMMs present: L0: no L1: no L2: no L3: no
> [   19.327187] EDAC DEBUG: dump_misc_regs: F3xB0 (Online Spare): 0x00000000
> [   19.327189] EDAC DEBUG: dump_misc_regs: F1xF0 (DRAM Hole Address): 0xe0002003, base: 0xe0000000, offset: 0x20000000
> [   19.327190] EDAC DEBUG: dump_misc_regs:   DramHoleValid: yes
> [   19.327193] EDAC DEBUG: amd64_debug_display_dimm_sizes: F2x080 (DRAM Bank Address Mapping): 0x00005050
> [   19.327195] EDAC MC: DCT0 chip selects:
> [   19.327196] EDAC amd64: MC: 0:     0MB 1:     0MB
> [   19.333141] EDAC amd64: MC: 2:  1024MB 3:  1024MB
> [   19.339225] EDAC amd64: MC: 4:     0MB 5:     0MB
> [   19.344247] EDAC amd64: MC: 6:  1024MB 7:  1024MB
> [   19.348948] EDAC DEBUG: amd64_debug_display_dimm_sizes: F2x180 (DRAM Bank Address Mapping): 0x00005050
> [   19.348949] EDAC MC: DCT1 chip selects:
> [   19.348954] EDAC amd64: MC: 0:     0MB 1:     0MB
> [   19.353656] EDAC amd64: MC: 2:  1024MB 3:  1024MB
> [   19.358365] EDAC amd64: MC: 4:     0MB 5:     0MB
> [   19.363086] EDAC amd64: MC: 6:  1024MB 7:  1024MB
> [   19.367799] EDAC amd64: using x8 syndromes.
> [   19.371996] EDAC DEBUG: amd64_dump_dramcfg_low: F2x190 (DRAM Cfg Low): 0x00080100
> [   19.371998] EDAC DEBUG: amd64_dump_dramcfg_low:   DIMM type: buffered; all DIMMs support ECC: yes
> [   19.372003] EDAC DEBUG: amd64_dump_dramcfg_low:   PAR/ERR parity: enabled
> [   19.372005] EDAC DEBUG: amd64_dump_dramcfg_low:   DCT 128bit mode width: 64b
> [   19.372007] EDAC DEBUG: amd64_dump_dramcfg_low:   x4 logical DIMMs present: L0: no L1: no L2: no L3: no
> [   19.372009] EDAC DEBUG: f1x_early_channel_count: Data width is not 128 bits - need more decoding
> [   19.372011] EDAC amd64: MCT channel count: 2
> [   19.376292] EDAC DEBUG: edac_mc_alloc: allocating 1904 bytes for mci data (16 ranks, 16 csrows/channels)
> [   19.376323] EDAC DEBUG: init_csrows: node 0, NBCFG=0x4af0005c[ChipKillEccCap: 1|DramEccEn: 1]
> [   19.376325] EDAC DEBUG: init_csrows: MC node: 0, csrow: 2
> [   19.376327] EDAC DEBUG: amd64_csrow_nr_pages: csrow: 2, channel: 0, DBAM idx: 5
> [   19.376329] EDAC DEBUG: amd64_csrow_nr_pages: nr_pages/channel: 262144
> [   19.376331] EDAC DEBUG: amd64_csrow_nr_pages: csrow: 2, channel: 1, DBAM idx: 5
> [   19.376333] EDAC DEBUG: amd64_csrow_nr_pages: nr_pages/channel: 262144
> [   19.376335] EDAC amd64: CS2: Registered DDR3 RAM
> [   19.380967] EDAC DEBUG: init_csrows: Total csrow2 pages: 524288
> [   19.380970] EDAC DEBUG: init_csrows: MC node: 0, csrow: 3
> [   19.380971] EDAC DEBUG: amd64_csrow_nr_pages: csrow: 3, channel: 0, DBAM idx: 5
> [   19.380973] EDAC DEBUG: amd64_csrow_nr_pages: nr_pages/channel: 262144
> [   19.380975] EDAC DEBUG: amd64_csrow_nr_pages: csrow: 3, channel: 1, DBAM idx: 5
> [   19.380977] EDAC DEBUG: amd64_csrow_nr_pages: nr_pages/channel: 262144
> [   19.380978] EDAC amd64: CS3: Registered DDR3 RAM
> [   19.385610] EDAC DEBUG: init_csrows: Total csrow3 pages: 524288
> [   19.385612] EDAC DEBUG: init_csrows: MC node: 0, csrow: 6
> [   19.385614] EDAC DEBUG: amd64_csrow_nr_pages: csrow: 6, channel: 0, DBAM idx: 5
> [   19.385615] EDAC DEBUG: amd64_csrow_nr_pages: nr_pages/channel: 262144
> [   19.385617] EDAC DEBUG: amd64_csrow_nr_pages: csrow: 6, channel: 1, DBAM idx: 5
> [   19.385619] EDAC DEBUG: amd64_csrow_nr_pages: nr_pages/channel: 262144
> [   19.385620] EDAC amd64: CS6: Registered DDR3 RAM
> [   19.390240] EDAC DEBUG: init_csrows: Total csrow6 pages: 524288
> [   19.390242] EDAC DEBUG: init_csrows: MC node: 0, csrow: 7
> [   19.390244] EDAC DEBUG: amd64_csrow_nr_pages: csrow: 7, channel: 0, DBAM idx: 5
> [   19.390246] EDAC DEBUG: amd64_csrow_nr_pages: nr_pages/channel: 262144
> [   19.390248] EDAC DEBUG: amd64_csrow_nr_pages: csrow: 7, channel: 1, DBAM idx: 5
> [   19.390250] EDAC DEBUG: amd64_csrow_nr_pages: nr_pages/channel: 262144
> [   19.390254] EDAC amd64: CS7: Registered DDR3 RAM
> [   19.394875] EDAC DEBUG: init_csrows: Total csrow7 pages: 524288

[ … ]

> [   19.395385] EDAC MC0: Giving out device to 'amd64_edac' 'F10h': DEV 0000:00:18.2
> [   19.402852] EDAC amd64: DRAM ECC enabled.
> [   19.406879] EDAC DEBUG: amd64_nb_mce_bank_enabled_on_node: core: 1, MCG_CTL: 0x3f, NB MSR is enabled

Here's core 1. WTF, on the second node? Great.

> [   19.406882] EDAC DEBUG: amd64_nb_mce_bank_enabled_on_node: core: 7, MCG_CTL: 0x3f, NB MSR is enabled
> [   19.406884] EDAC DEBUG: amd64_nb_mce_bank_enabled_on_node: core: 8, MCG_CTL: 0x3f, NB MSR is enabled
> [   19.406887] EDAC DEBUG: amd64_nb_mce_bank_enabled_on_node: core: 9, MCG_CTL: 0x3f, NB MSR is enabled
> [   19.406889] EDAC DEBUG: amd64_nb_mce_bank_enabled_on_node: core: 10, MCG_CTL: 0x3f, NB MSR is enabled
> [   19.406891] EDAC DEBUG: amd64_nb_mce_bank_enabled_on_node: core: 11, MCG_CTL: 0x3f, NB MSR is enabled

[ … ]

On Thu, Mar 07, 2013 at 09:57:03AM -0300, Mauro Carvalho Chehab wrote:
> This is what the csrows nodes show:
>
> /sys/devices/system/edac/mc/mc0/csrow2/size_mb:2048
> /sys/devices/system/edac/mc/mc0/csrow3/size_mb:2048
> /sys/devices/system/edac/mc/mc0/csrow6/size_mb:2048
> /sys/devices/system/edac/mc/mc0/csrow7/size_mb:2048
> /sys/devices/system/edac/mc/mc1/csrow2/size_mb:2048
> /sys/devices/system/edac/mc/mc1/csrow3/size_mb:2048
> /sys/devices/system/edac/mc/mc1/csrow6/size_mb:2048
> /sys/devices/system/edac/mc/mc1/csrow7/size_mb:2048

This is correct.

Each chip select has 1024M per DCT, but since we have 2 DCTs per node,
that's 1024M * 2 = 2G per chip select of an MC.
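
The same arithmetic, spelled out against the numbers in the debug log.
A rough sketch only: the 4K page size is an assumption, and the constant
names are invented for this example.

```python
PAGE_SIZE = 4096                 # assumption: 4 KiB pages
NR_PAGES_PER_CHANNEL = 262144    # "nr_pages/channel" from the log
DCTS_PER_NODE = 2
POPULATED_CSROWS = (2, 3, 6, 7)  # chip selects with non-zero size
NODES = 2

# 262144 pages * 4 KiB = 1024 MB per chip select per DCT.
mb_per_cs_per_dct = NR_PAGES_PER_CHANNEL * PAGE_SIZE // (1 << 20)

# Two DCTs contribute to each csrow, hence size_mb = 2048.
mb_per_csrow = mb_per_cs_per_dct * DCTS_PER_NODE

# Four populated csrows per node, two nodes.
total_mb = mb_per_csrow * len(POPULATED_CSROWS) * NODES

print(mb_per_cs_per_dct, mb_per_csrow, total_mb)  # 1024 2048 16384
```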

> Total size is 16Gb, but the number of ranks are wrong.

Well, chip select != rank, remember?

> This is what's reported by the new API:
> 
> /sys/devices/system/edac/mc/mc0/rank12/size:2048
> /sys/devices/system/edac/mc/mc0/rank13/size:2048
> /sys/devices/system/edac/mc/mc0/rank14/size:2048
> /sys/devices/system/edac/mc/mc0/rank15/size:2048
> /sys/devices/system/edac/mc/mc0/rank4/size:2048
> /sys/devices/system/edac/mc/mc0/rank5/size:2048
> /sys/devices/system/edac/mc/mc0/rank6/size:2048
> /sys/devices/system/edac/mc/mc0/rank7/size:2048
> /sys/devices/system/edac/mc/mc1/rank12/size:2048
> /sys/devices/system/edac/mc/mc1/rank13/size:2048
> /sys/devices/system/edac/mc/mc1/rank14/size:2048
> /sys/devices/system/edac/mc/mc1/rank15/size:2048
> /sys/devices/system/edac/mc/mc1/rank4/size:2048
> /sys/devices/system/edac/mc/mc1/rank5/size:2048
> /sys/devices/system/edac/mc/mc1/rank6/size:2048
> /sys/devices/system/edac/mc/mc1/rank7/size:2048
> 
> Here, the number of ranks are ok, but the size is wrong.
> 
> This is what the edac debug logs say:
> 
> [   18.829184] EDAC amd64: F10h detected (node 0).
> [   18.829206] EDAC MC: DCT0 chip selects:
> [   18.829207] EDAC amd64: MC: 0:     0MB 1:     0MB
> [   18.829219] EDAC amd64: MC: 2:  1024MB 3:  1024MB
> [   18.829220] EDAC amd64: MC: 4:     0MB 5:     0MB
> [   18.829221] EDAC amd64: MC: 6:  1024MB 7:  1024MB
> [   18.829222] EDAC MC: DCT1 chip selects:
> [   18.829223] EDAC amd64: MC: 0:     0MB 1:     0MB
> [   18.829223] EDAC amd64: MC: 2:  1024MB 3:  1024MB
> [   18.829224] EDAC amd64: MC: 4:     0MB 5:     0MB
> [   18.829225] EDAC amd64: MC: 6:  1024MB 7:  1024MB
> 
> [   18.923914] EDAC amd64: F10h detected (node 1).
> [   18.956025] EDAC MC: DCT0 chip selects:
> [   18.956028] EDAC amd64: MC: 0:     0MB 1:     0MB
> [   18.962055] EDAC amd64: MC: 2:  1024MB 3:  1024MB
> [   18.968167] EDAC amd64: MC: 4:     0MB 5:     0MB
> [   18.974252] EDAC amd64: MC: 6:  1024MB 7:  1024MB
> [   18.980333] EDAC MC: DCT1 chip selects:
> [   18.980335] EDAC amd64: MC: 0:     0MB 1:     0MB
> [   18.986415] EDAC amd64: MC: 2:  1024MB 3:  1024MB
> [   18.991454] EDAC amd64: MC: 4:     0MB 5:     0MB
> [   18.996155] EDAC amd64: MC: 6:  1024MB 7:  1024MB
> [   19.000854] EDAC amd64: using x8 syndromes.
> 
> Here, everything is fine.

So, to satisfy the new API, you'll probably need to report the
information above, i.e. the chip selects *per* DCT, which also equal
the ranks.
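
To illustrate what per-DCT reporting would look like, here is a sketch
that builds the rank list from the chip select table above. The index
formula (csrow * number of DCTs + DCT) is an assumption; it happens to
match the rank names in the sysfs listing, but it is not taken from the
driver source. The function name is hypothetical.

```python
# Chip select sizes in MB for one DCT, from the "DCT0 chip selects" dump.
# DCT1 reports the same values in this log, so one table is reused.
DCT_CS_SIZE_MB = {0: 0, 1: 0, 2: 1024, 3: 1024,
                  4: 0, 5: 0, 6: 1024, 7: 1024}
NR_DCTS = 2

def rank_sizes_mb(cs_size_mb, nr_dcts):
    """Map each populated (chip select, DCT) pair to one rank.
    Assumed index: csrow * nr_dcts + dct."""
    ranks = {}
    for csrow, size in sorted(cs_size_mb.items()):
        if size == 0:
            continue  # unpopulated chip select, no rank
        for dct in range(nr_dcts):
            ranks[csrow * nr_dcts + dct] = size
    return ranks

ranks = rank_sizes_mb(DCT_CS_SIZE_MB, NR_DCTS)
for idx in sorted(ranks):
    print(f"rank{idx}: {ranks[idx]} MB")
print("per-MC total:", sum(ranks.values()), "MB")
```

With 1024M per rank, the eight ranks of one MC add up to 8G, so the two
MCs together give the expected 16G instead of double that.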

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.