From: Borislav Petkov <bp@alien8.de>
To: Max Asbock <masbock@linux.vnet.ibm.com>
Cc: tony.luck@intel.com, lkml@vger.kernel.org,
linux-acpi@vger.kernel.org, ying.huang@intel.com,
naveen.n.rao@in.ibm.com, ananth@in.ibm.com,
lcm@linux.vnet.ibm.com
Subject: Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in HEST for corrected machine checks
Date: Tue, 7 May 2013 01:25:37 +0200 [thread overview]
Message-ID: <20130506232537.GF22041@pd.tnic> (raw)
In-Reply-To: <1367881102.4518.68.camel@oc3432500282.ibm.com>
On Mon, May 06, 2013 at 03:58:22PM -0700, Max Asbock wrote:
> This patch adds code to the machine check handler to parse the
> APEI Hardware Error Source Table (HEST) and check if Firmware First mode
> is set for the Correct Machine Check structure in HEST.
> If Firmware First mode is set, the corrected machine checks are expected
> to be reported through a Generic Hardware Error Source (GHES). Therefore in this
> case CMCI will not be enabled to avoid interference with the firmware's
> error reporting.
>
> Signed-off-by: Max Asbock <masbock@linux.vnet.ibm.com>
> ---
> include/asm/mce.h | 2 +
> kernel/cpu/mcheck/mce-apei.c | 70 +++++++++++++++++++++++++++++++++++++++
> kernel/cpu/mcheck/mce-internal.h | 5 ++
> kernel/cpu/mcheck/mce_intel.c | 6 ++-
> 4 files changed, 81 insertions(+), 2 deletions(-)
>
>
> diff -burpN linux-3.9/arch/x86/include/asm/mce.h linux-3.9.ff/arch/x86/include/asm/mce.h
> --- linux-3.9/arch/x86/include/asm/mce.h 2013-05-02 09:38:16.000000000 -0700
> +++ linux-3.9.ff/arch/x86/include/asm/mce.h 2013-05-03 13:20:36.000000000 -0700
> @@ -144,12 +144,14 @@ DECLARE_PER_CPU(struct device *, mce_dev
>
> #ifdef CONFIG_X86_MCE_INTEL
> void mce_intel_feature_init(struct cpuinfo_x86 *c);
> +void mce_intel_init_cmci(void);
> void cmci_clear(void);
> void cmci_reenable(void);
> void cmci_rediscover(void);
> void cmci_recheck(void);
> #else
> static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
> +static inline void mce_intel_init_cmci(void) {}
> static inline void cmci_clear(void) {}
> static inline void cmci_reenable(void) {}
> static inline void cmci_rediscover(void) {}
> diff -burpN linux-3.9/arch/x86/kernel/cpu/mcheck/mce-apei.c linux-3.9.ff/arch/x86/kernel/cpu/mcheck/mce-apei.c
> --- linux-3.9/arch/x86/kernel/cpu/mcheck/mce-apei.c 2013-05-02 09:38:16.000000000 -0700
> +++ linux-3.9.ff/arch/x86/kernel/cpu/mcheck/mce-apei.c 2013-05-06 14:25:48.000000000 -0700
> @@ -32,6 +32,7 @@
> #include <linux/kernel.h>
> #include <linux/acpi.h>
> #include <linux/cper.h>
> +#include <linux/init.h>
> #include <acpi/apei.h>
> #include <asm/mce.h>
>
> @@ -148,3 +149,72 @@ int apei_clear_mce(u64 record_id)
> {
> return erst_clear(record_id);
> }
> +
> +
> +/* Support for Firmware First (FF) mode for Corrected Machine Checks
> + * as defined by APEI in the Hardware Error Source Table (HEST).
> + * (see section 18.3.2.2 in the ACPI spec, version 5.0)
> + *
> + * When Firmware First mode is specified in HEST for Corrected Machine Checks,
> + * these errors are expected to be reported by the firmware through a
> + * Generic Harware Event Source (GHES). In this case error reporting
> + * through CMCI should not be enabled so it won't interfere with the firmware.
> + *
> + * In the boot sequence HEST parsing is initialized after
> + * CPUs, therefore initially we don't know if Firmware First is set.
> + * We just assume yes, and enable CMCI in a late init call if FF is not set.
I'm very sceptical about that assumption - I don't think that the
majority of boxes out there *now* have APEI and FF enabled (and those
working properly :)).
Besides, there's a bigger problem with this - in the window between
mce_intel_feature_init() and honor_cmc_firmware_first() you don't handle
any correctable errors, AFAICT.
> + */
> +static bool cmc_firmware_first;
> +static bool cmc_firmware_first_checked;
> +
> +bool apei_cmc_firmware_first(void)
> +{
> + if (!cmc_firmware_first_checked)
> + return true;
> + return cmc_firmware_first;
> +}
> +
> +static int check_cmc_firmware_first(struct acpi_hest_header *hest_hdr, void *d)
> +{
> + if (hest_hdr->type == ACPI_HEST_TYPE_IA32_CORRECTED_CHECK) {
> + struct acpi_hest_ia_corrected *cmc;
> +
> + cmc = (struct acpi_hest_ia_corrected *)hest_hdr;
> +
> + if (cmc->flags & ACPI_HEST_FIRMWARE_FIRST) {
> + cmc_firmware_first = true;
> + return 1;
> + }
> + }
> + return 0;
> +}
> +
> +static void intel_late_init_cmci(void *data)
> +{
> + if (!mce_available(__this_cpu_ptr(&cpu_info)))
> + return;
> +
> + mce_intel_init_cmci();
> +}
> +
> +static __init int honor_cmc_firmware_first(void)
> +{
> + struct cpuinfo_x86 *c = __this_cpu_ptr(&cpu_info);
> +
> + if (c->x86_vendor != X86_VENDOR_INTEL)
> + return 0;
> +
> + apei_hest_parse(check_cmc_firmware_first, NULL);
> + cmc_firmware_first_checked = true;
> +
> + if (cmc_firmware_first)
> + mca_cfg.cmci_disabled = true;
> +
> + if (mca_cfg.cmci_disabled || mca_cfg.ignore_ce)
> + return 0;
> +
> + on_each_cpu(intel_late_init_cmci, NULL, 1);
> + return 0;
> +}
> +
> +late_initcall(honor_cmc_firmware_first);
This glue code should go to drivers/acpi/apei/ where the rest of the
APEI gunk lives.
> diff -burpN linux-3.9/arch/x86/kernel/cpu/mcheck/mce_intel.c linux-3.9.ff/arch/x86/kernel/cpu/mcheck/mce_intel.c
> --- linux-3.9/arch/x86/kernel/cpu/mcheck/mce_intel.c 2013-05-02 09:38:16.000000000 -0700
> +++ linux-3.9.ff/arch/x86/kernel/cpu/mcheck/mce_intel.c 2013-05-03 13:21:53.000000000 -0700
> @@ -315,7 +315,7 @@ void cmci_reenable(void)
> cmci_discover(banks);
> }
>
> -static void intel_init_cmci(void)
> +void mce_intel_init_cmci(void)
> {
> int banks;
>
> @@ -337,5 +337,7 @@ static void intel_init_cmci(void)
> void mce_intel_feature_init(struct cpuinfo_x86 *c)
> {
> intel_init_thermal(c);
> - intel_init_cmci();
> +
> + if (!apei_cmc_firmware_first())
> + mce_intel_init_cmci();
Right, so I think the right approach should be to let CMCI get
initialized, then when apei gets to run, it calls cmci_clear() to turn
it off and continue its own FF handling. And this should be doable
without touching any MCA internals like mca_cfg and such. Simply
exporting the CMCI interfaces should be much cleaner.
Btw, we should keep the possibility open to be able to fall back to CMCI
at any point when, for some reason, it has been determined that APEI
functionality is out of order but we still want to have CMCI polling,
independent from the BIOS.
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
next prev parent reply other threads:[~2013-05-06 23:24 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-05-06 22:58 [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in HEST for corrected machine checks Max Asbock
2013-05-06 23:25 ` Borislav Petkov [this message]
2013-05-07 3:32 ` Max Asbock
2013-05-07 13:19 ` Borislav Petkov
2013-05-07 15:40 ` Max Asbock
2013-05-07 18:51 ` Borislav Petkov
2013-05-07 21:34 ` Luck, Tony
2013-05-07 21:55 ` Borislav Petkov
2013-05-08 21:22 ` Borislav Petkov
2013-05-08 21:55 ` Luck, Tony
2013-05-08 22:15 ` Borislav Petkov
2013-05-08 22:22 ` Luck, Tony
2013-05-10 17:59 ` Max Asbock
2013-05-12 14:47 ` Borislav Petkov
2013-06-14 18:17 ` [PATCH] Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in Naveen N. Rao
2013-06-15 14:48 ` Borislav Petkov
2013-06-17 7:00 ` Naveen N. Rao
2013-06-17 7:00 ` Naveen N. Rao
2013-06-17 7:06 ` Borislav Petkov
2013-06-17 8:11 ` Naveen N. Rao
2013-06-17 8:21 ` Borislav Petkov
2013-06-17 10:31 ` Naveen N. Rao
2013-06-18 6:43 ` Naveen N. Rao
2013-06-18 22:29 ` Tony Luck
2013-06-19 6:58 ` Naveen N. Rao
2013-06-16 12:20 ` Borislav Petkov
2013-06-17 7:00 ` Naveen N. Rao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130506232537.GF22041@pd.tnic \
--to=bp@alien8.de \
--cc=ananth@in.ibm.com \
--cc=lcm@linux.vnet.ibm.com \
--cc=linux-acpi@vger.kernel.org \
--cc=lkml@vger.kernel.org \
--cc=masbock@linux.vnet.ibm.com \
--cc=naveen.n.rao@in.ibm.com \
--cc=tony.luck@intel.com \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.