From: "Kani, Toshimitsu" <toshi.kani@hpe.com>
To: "mchehab@s-opensource.com" <mchehab@s-opensource.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"mchehab@kernel.org" <mchehab@kernel.org>,
"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
"srinivas.pandruvada@linux.intel.com"
<srinivas.pandruvada@linux.intel.com>,
"bp@alien8.de" <bp@alien8.de>,
"tony.luck@intel.com" <tony.luck@intel.com>,
"lenb@kernel.org" <lenb@kernel.org>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Subject: Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
Date: Wed, 19 Jul 2017 16:40:25 +0000
Message-ID: <1500481869.2042.29.camel@hpe.com>
In-Reply-To: <20170718181545.32bd9181@vento.lan>
On Tue, 2017-07-18 at 18:15 -0300, Mauro Carvalho Chehab wrote:
> On Tue, 18 Jul 2017 19:58:54 +0000:
> We had a similar discussion several years ago when I wrote this
> driver. At that time, I talked with Red Hat, HP, Dell, Intel people
> and with some customers with large clusters.
>
> The way it is, ghes_edac is a poor man's driver. What it hopefully
> provides is detection that an error happened, without really telling
> the user what component should be replaced.
"Poor man's driver" is a bit misleading, but yes, firmware-first
platforms have RAS features built into the platform, and they do not
need intelligence in EDAC drivers, which may conflict with the
platform's RAS features. I cannot speak for other vendors, but HPE
platforms log errors and provide FRU info. ghes_edac allows errors to
be reported to OS management tools like rasdaemon, in addition to the
platform-specific management tools.
> Ok, on machines with their own error reporting mechanism (like
> HP servers), a sys admin can look at some proprietary software
> (or the BIOS) in order to identify what happened.
>
> Yet, the BIOS doesn't provide any clue about the memory
> architecture, as it maps memory as if it was a single DIMM memory:
>
> (from ghes_edac_register)
>
> layers[0].type = EDAC_MC_LAYER_ALL_MEM;
> layers[0].size = num_dimm;
> layers[0].is_virt_csrow = true;
>
> So, even on systems where the BIOS actually knows how the memory
> cards are wired, it will mask the memory controller data.
>
> Now, the EDAC driver can also be used to identify which
> channels are used. That helps the sys admin to know whether the
> memories are connected in a way that uses multiple
> channels, helping to set up the machine to obtain
> the maximum possible performance.
>
> So, for example, on my Intel-based HP server, I can check
> such info with:
>
> $ ras-mc-ctl --mainboard
> ras-mc-ctl: mainboard: HP model ProLiant ML350 Gen9
> $ ras-mc-ctl --layout
>        +-----------------------------------------------------------------------+
>        |                mc0                |                mc1                |
>        | channel0  | channel1  | channel2  | channel0  | channel1  | channel2  |
> -------+-----------------------------------------------------------------------+
> slot2: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
> slot1: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
> slot0: | 16384 MB  |     0 MB  | 16384 MB  | 16384 MB  |     0 MB  | 16384 MB  |
> -------+-----------------------------------------------------------------------+
>
> So, I know that both CPUs will be connected to my memories, and,
> on both, it is using 2 channels.
>
> If I was using the ghes driver, that information would be hidden.
>
> So, due to all the problems with ghes, it is enabled only if there is
> no better solution, e.g. on systems where there's no way to talk
> directly to the hardware (like on E7 Xeon machines, where the memory
> controller is actually on a separate chip that is controlled only by
> the BIOS).
Thanks for the info! That's very helpful. I will check to see if
ghes_edac provides the info that we need.
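
To make the comparison concrete, here is a minimal Python sketch of the
check Mauro describes above: given a per-controller layout, list which
channels are populated, and show what is left once the hierarchy is
collapsed into one flat layer. The sizes in LAYOUT are copied from the
quoted ras-mc-ctl --layout output. The names LAYOUT, populated_channels
and flatten are made up for this illustration and are not part of
ras-mc-ctl, rasdaemon, or the kernel.

```python
# Sizes in MB, indexed as LAYOUT[mc][channel][slot] (slot0 first).
# Values are taken from the ras-mc-ctl --layout output quoted above.
LAYOUT = {
    "mc0": {
        "channel0": [16384, 0, 0],
        "channel1": [0, 0, 0],
        "channel2": [16384, 0, 0],
    },
    "mc1": {
        "channel0": [16384, 0, 0],
        "channel1": [0, 0, 0],
        "channel2": [16384, 0, 0],
    },
}


def populated_channels(layout):
    """Return {mc: [names of channels with at least one non-empty slot]}."""
    return {
        mc: [ch for ch, slots in channels.items() if any(slots)]
        for mc, channels in layout.items()
    }


def flatten(layout):
    """Collapse the mc/channel/slot hierarchy into one flat list of sizes.

    Hypothetical helper: it only illustrates what a single-layer view
    keeps (the sizes) and what it drops (which mc and channel each
    slot belongs to).
    """
    return [
        size
        for channels in layout.values()
        for slots in channels.values()
        for size in slots
    ]


if __name__ == "__main__":
    print(populated_channels(LAYOUT))
    print(flatten(LAYOUT))
```

With the hierarchical layout, the two populated channels per controller
can be read off directly. The flat list still carries the sizes, but
the channel assignment can no longer be recovered from it.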
-Toshi