xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Andreas Pflug <pgadmin@pse-consulting.de>
To: Jan Beulich <JBeulich@suse.com>
Cc: 810964@bugs.debian.org, xen-devel@lists.xen.org
Subject: Re: [BUG] EDAC infomation partially missing
Date: Fri, 22 Jan 2016 10:09:04 +0100	[thread overview]
Message-ID: <56A1F1B0.9030508@pse-consulting.de> (raw)
In-Reply-To: <56A1184902000078000C9B3D@prv-mh.provo.novell.com>

Am 21.01.16 um 17:41 schrieb Jan Beulich:
>>>> On 20.01.16 at 16:01, <andreas.pflug@web.de> wrote:
>> Initially reported to debian
>> (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810964), redirected here:
>>
>> With AMD Opteron 6xxx processors, half of the memory controllers are
>> missing from /sys/devices/system/edac/mc
>> Checked with single 6120 (dual memory controller) and twin 6344 (2x dual
>> MC), other dual-module CPUs might be affected too.
>>
>> Booting plain Linux (3.2, 3.16, 4.1, 4.3), all memory controllers are
>> listed under /sys/devices/system/edac/mc as expected. Same happens, when
>> Xen 4.1 is used: all MCs present.
>>
>> Starting with Xen 4.4 (Debian Jessie), only mc1 (on the single CPU
>> machine) or mc2/mc3 (dual CPU machine) are present, although the full
>> system memory is accessible. Checked versions were 4.1.4 (Debian
>> Wheezy), 4.4.1 (Jessie) and 4.6.0 (Sid)
> As already indicated by Ian in that bug, you should supply us with
> full kernel and hypervisor logs for both the good and bad cases
> (ideally with the same kernel version use in both runs, so that we
> can exclude kernel behavior differences).
Here are some dmesg excerpts, all performed with Linux 4.1.3.

When booting with Xen 4.1.4:

AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC0: Giving out device to module amd64_edac controller F10h: DEV
0000:00:18.2 (INTERRUPT)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV
0000:00:19.2 (INTERRUPT)

When booting with Xen 4.4.1:

AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will
not load.
 Either enable ECC checking or force module loading by setting
'ecc_enable_override'.
 (Note that use of the override may cause unknown side effects.)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV
0000:00:19.2 (INTERRUPT)

Apparently Xen4.4 doesn't report the BIOS flag correctly. I added
ecc_enable_override=1 to amd64_edac_mod, and then I get

EDAC MC: Ver: 3.0.0
AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will
not load.
EDAC amd64: Forcing ECC on!
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC0: Giving out device to module amd64_edac controller F10h: DEV
0000:00:18.2 (INTERRUPT)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV
0000:00:19.2 (INTERRUPT)

This restored both MCs, so the BIOS flag seems to be the culprit.

Regards,
Andreas

  reply	other threads:[~2016-01-22  9:09 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <569FA160.6070308@web.de>
2016-01-21 16:41 ` [BUG] EDAC infomation partially missing Jan Beulich
2016-01-22  9:09   ` Andreas Pflug [this message]
2016-01-22 10:40     ` Jan Beulich
2016-01-22 11:33       ` Andreas Pflug
     [not found] <20170513223656.GA40303@scollay.m5p.com>
2017-05-15  8:02 ` Jan Beulich
2017-05-16  3:47   ` Elliott Mitchell
2017-05-16  9:54     ` Jan Beulich
2017-05-16 10:08       ` Andrew Cooper
2017-05-16 18:02       ` Elliott Mitchell
2017-05-13 22:36 Elliott Mitchell
  -- strict thread matches above, loose matches on Subject: below --
2016-01-20 15:01 Andreas Pflug

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56A1F1B0.9030508@pse-consulting.de \
    --to=pgadmin@pse-consulting.de \
    --cc=810964@bugs.debian.org \
    --cc=JBeulich@suse.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).