From: Borislav Petkov <bp@alien8.de>
To: Jue Wang <juew@google.com>
Cc: Tony Luck <tony.luck@intel.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev
Subject: Re: [PATCH] x86/mce: Add workaround for SKX/CLX/CPX spurious machine checks
Date: Tue, 15 Feb 2022 23:08:43 +0100
Message-ID: <Ygwka++3eipjQzB2@zn.tnic>
In-Reply-To: <20220208150945.266978-1-juew@google.com>

On Tue, Feb 08, 2022 at 07:09:45AM -0800, Jue Wang wrote:
> Subject: Re: [PATCH] x86/mce: Add workaround for SKX/CLX/CPX spurious machine checks

Please replace the Intel-internal model abbreviations. Spelling out the
actual model names here is a lot more precise than using abbreviations
which don't even have a public mapping to the models they denote.

Also, that subject needs to be more precise - "add workaround for
spurious MCEs" is too vague.

> The fast string copy instructions ("rep movs*") could consume an

REP MOVS* - we usually spell instructions in all caps. Pls fix
everywhere.

> uncorrectable memory error in the cache line _right after_ the
> desired region to copy and raise an MCE.
> 
> Bit 0 of MSR_IA32_MISC_ENABLE can be cleared to disable fast string copy
> and will avoid such spurious machine checks. However, that is less
> preferrable due to the permanent performance impact. Considering memory

Unknown word [preferrable] in commit message.
Suggestions: ['preferable', 'preferably', 'deferrable']

> poison is rare, it's desirable to keep fast string enabled until an MCE
> is seen.
> 
> Intel has confirmed the following:
> 1. The CPU erratum of fast string copy only applies to
> SKX/CLX/CPL generations.
> 2. Directly return from MCE handler will result in complete execution
> of the fast string copy (rep movs*) with no data loss or corruption.
> 3. Directly return from MCE handler will not result in another MCE
> firing on the next poisoned cache line due to rep movs*.
> 4. Directly return from MCE handler will resume execution from a
> correct point in code.
> 5. Directly return from MCE handler due to any other SRAR MCEs will
> result in the same instruction that triggered the MCE firing a second
> MCE immediately.

Simplify this: "Directly return from MCE handler" in every sentence is
not helping.

> 6. It's not safe to directly return without disabling the fast string
> copy, as the next fast string copy of the same buffer on the same CPU
> would result in a PANIC MCE.
> 
> The mitigation in this patch should mitigate the erratum completely with

Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.

Also, do

$ git grep 'This patch' Documentation/process

for more details.

> the only caveat that the fast string copy is disabled on the affected
> hyper thread thus performance degradation.
> 
> This is still better than the OS crashes on MCEs raised on an
> irrelevant process due to 'rep movs*' accesses in a kernel context,
> e.g., copy_page.

Wait a minute: so the MCE happens for a part of the buffer that REP
MOVS *wasn't* supposed to copy.

So why are we even disabling fast string operations? Why aren't we
simply ignoring this MCE with a warning in dmesg since, reportedly, we
can recover safely?

Nothing has gone wrong, has it?

> Since a host drain / fail-over usually starts right after the first

What is a "host drain"?

> MCE is signaled, which results in VM migration or termination, the
> performance degradation is a transient effect.

This sounds like a Google-specific policy and doesn't belong in the
commit message.

> Tested:
> 
> Injected errors on 1st cache line of 8 anonymous pages of process
> 'proc1' and observed MCE consumption from 'proc2' with no panic
> (directly returned).
> 
> Without the fix, the host panicked within a few minutes on a
> random 'proc2' process due to kernel access from copy_page.

We usually do not keep how a patch has been tested in the commit
message, but I guess with MCEs that is important enough to keep.

> 
> Signed-off-by: Jue Wang <juew@google.com>
> ---
>  arch/x86/kernel/cpu/mce/core.c     | 53 ++++++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/mce/internal.h |  5 ++-
>  2 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5818b837fd4d..abbd4936dfa8 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -834,6 +834,49 @@ static void quirk_sandybridge_ifu(int bank, struct mce *m, struct pt_regs *regs)
>  	m->cs = regs->cs;
>  }
>  
> +/*
> + * Disable fast string copy and return from the MCE handler upon the first SRAR
> + * MCE on bank 1 due to a CPU erratum on Intel SKX/CLX/CPL CPUs.
> + * The fast string copy instructions ("rep movs*") could consume an
> + * uncorrectable memory error in the cache line _right after_ the
> + * desired region to copy and raise an MCE with RIP pointing to the
> + * instruction _after_ the "rep movs*".
> + * This mitigation addresses the issue completely with the caveat of
> + * performance degradation on the CPU affected. This is still better
> + * than the OS crashes on MCEs raised on an irrelevant process due to
> + * 'rep movs*' accesses in a kernel context (e.g., copy_page).
> + * Since a host drain / fail-over usually starts right after the first
> + * MCE is signaled, which results in VM migration or termination, the
> + * performance degradation is a transient effect.
> + *
> + * Returns true when fast string copy on cpu should be disabled.

Unknown word [cpu] in comment.
Suggestions: ['CPU', 'cup', 'cp', 'cu', 'cps', 'cru', 'cpl', 'cpd', 'APU', 'vCPU']

> + */
> +static bool quirk_skylake_repmov(void)
> +{
> +	u64 mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
> +	u64 misc_enable = __rdmsr(MSR_IA32_MISC_ENABLE);
> +
> +	if ((mcgstatus & MCG_STATUS_LMCES) &&
> +	    unlikely(misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)) {
> +		u64 mc1_status = mce_rdmsrl(MSR_IA32_MCx_STATUS(1));
> +
> +		if ((mc1_status &
> +		     (MCI_STATUS_VAL|MCI_STATUS_OVER|MCI_STATUS_UC|MCI_STATUS_EN|
> +		      MCI_STATUS_ADDRV|MCI_STATUS_MISCV|MCI_STATUS_PCC|
> +		      MCI_STATUS_AR|MCI_STATUS_S)) ==
> +		    (MCI_STATUS_VAL|MCI_STATUS_UC|MCI_STATUS_EN|MCI_STATUS_ADDRV|
> +		     MCI_STATUS_MISCV|MCI_STATUS_AR|MCI_STATUS_S)) {

You can write that by paying attention to the vertical alignment so that
it is visible which bits we're looking at:

                if ((mc1_status &
                     (MCI_STATUS_VAL   | MCI_STATUS_OVER  | MCI_STATUS_UC  | MCI_STATUS_EN |
                      MCI_STATUS_ADDRV | MCI_STATUS_MISCV | MCI_STATUS_PCC |
                      MCI_STATUS_AR    | MCI_STATUS_S)) ==

                     (MCI_STATUS_VAL   |                    MCI_STATUS_UC  | MCI_STATUS_EN |
                      MCI_STATUS_ADDRV | MCI_STATUS_MISCV |
                      MCI_STATUS_AR    | MCI_STATUS_S)) {

i.e., MCI_STATUS_OVER and MCI_STATUS_PCC must *not* be set.


> +			msr_clear_bit(MSR_IA32_MISC_ENABLE,
> +				      MSR_IA32_MISC_ENABLE_FAST_STRING_BIT);
> +			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> +			mce_wrmsrl(MSR_IA32_MCx_STATUS(1), 0);
> +			pr_err_once("Errata detected, disable fast string copy instructions.\n");
> +			return true;
> +		}
> +	}
> +	return false;
> +}
> +
>  /*
>   * Do a quick check if any of the events requires a panic.
>   * This decides if we keep the events around or clear them.
> @@ -1403,6 +1446,9 @@ noinstr void do_machine_check(struct pt_regs *regs)
>  	else if (unlikely(!mca_cfg.initialized))
>  		return unexpected_machine_check(regs);
>  
> +	if (mce_flags.skx_repmov_quirk && quirk_skylake_repmov())

What about the MCE broadcasting synchronization? This early return
bypasses everything, including the mce_exception_count accounting.

In any case, if this function is gonna be called by do_machine_check, it
needs to be noinstr. You can test with CONFIG_DEBUG_ENTRY=y.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
