From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v4 12/21] arm64: kernel: Survive corrected RAS errors notified by SError
Date: Tue, 31 Oct 2017 13:50:42 +0000
Message-ID: <20171031135041.GL5584@arm.com>
In-Reply-To: <20171019145807.23251-13-james.morse@arm.com>
On Thu, Oct 19, 2017 at 03:57:58PM +0100, James Morse wrote:
> Prior to v8.2, SError is an uncontainable fatal exception. The v8.2 RAS
> extensions use SError to notify software about RAS errors, these can be
> contained by the ESB instruction.
>
> An ACPI system with firmware-first may use SError as its 'SEI'
> notification. Future patches may add code to 'claim' this SError as a
> notification.
>
> Other systems can distinguish these RAS errors from the SError ESR and
> use the AET bits and additional data from RAS-Error registers to handle
> the error. Future patches may add this kernel-first handling.
>
> Without support for either of these we will panic(), even if we received
> a corrected error. Add code to decode the severity of RAS errors. We can
> safely ignore contained errors where the CPU can continue to make
> progress. For all other errors we continue to panic().
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>
> ---
> I couldn't come up with a concise way to capture 'can continue to make
> progress', so opted for 'blocking' instead.
>
> arch/arm64/include/asm/esr.h | 10 ++++++++
> arch/arm64/include/asm/traps.h | 36 ++++++++++++++++++++++++++
> arch/arm64/kernel/traps.c | 58 ++++++++++++++++++++++++++++++++++++++----
> 3 files changed, 99 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index 66ed8b6b9976..8ea52f15bf1c 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -85,6 +85,15 @@
> #define ESR_ELx_WNR_SHIFT (6)
> #define ESR_ELx_WNR (UL(1) << ESR_ELx_WNR_SHIFT)
>
> +/* Asynchronous Error Type */
> +#define ESR_ELx_AET (UL(0x7) << 10)
Can you add a #define for the AET shift in the SError ISS, please? (We have
other blocks in this file for different abort types.) e.g.

  /* ISS field definitions for SError interrupts */
  #define ESR_ELx_AET_SHIFT	10

then use it below.
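For illustration only, here is a rough sketch of how the AET block might look with the shift factored out. The name ESR_ELx_AET_SHIFT and the helper esr_aet_field() are assumptions and not part of the posted patch; UL() is redefined locally just so the fragment builds outside the kernel tree.

```c
#include <stdint.h>

/* Local stand-in for the kernel's UL() so this sketch compiles on its own. */
#define UL(x) ((unsigned long)(x))

/* ISS field definitions for SError interrupts */
#define ESR_ELx_AET_SHIFT	(10)	/* assumed name for the shift */
#define ESR_ELx_AET		(UL(0x7) << ESR_ELx_AET_SHIFT)

#define ESR_ELx_AET_UC		(UL(0) << ESR_ELx_AET_SHIFT)	/* Uncontainable */
#define ESR_ELx_AET_UEU		(UL(1) << ESR_ELx_AET_SHIFT)	/* Uncorrected Unrecoverable */
#define ESR_ELx_AET_UEO		(UL(2) << ESR_ELx_AET_SHIFT)	/* Uncorrected Restartable */
#define ESR_ELx_AET_UER		(UL(3) << ESR_ELx_AET_SHIFT)	/* Uncorrected Recoverable */
#define ESR_ELx_AET_CE		(UL(6) << ESR_ELx_AET_SHIFT)	/* Corrected */

/* Hypothetical helper: return the raw 3-bit AET value from an SError ESR. */
static inline uint32_t esr_aet_field(uint32_t esr)
{
	return (uint32_t)((esr & ESR_ELx_AET) >> ESR_ELx_AET_SHIFT);
}
```

With a single shift definition, the mask and every encoding stay in sync if the field position ever needs to be referenced elsewhere.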
> +#define ESR_ELx_AET_UC (UL(0) << 10) /* Uncontainable */
> +#define ESR_ELx_AET_UEU (UL(1) << 10) /* Uncorrected Unrecoverable */
> +#define ESR_ELx_AET_UEO (UL(2) << 10) /* Uncorrected Restartable */
> +#define ESR_ELx_AET_UER (UL(3) << 10) /* Uncorrected Recoverable */
> +#define ESR_ELx_AET_CE (UL(6) << 10) /* Corrected */
> +
> /* Shared ISS field definitions for Data/Instruction aborts */
> #define ESR_ELx_SET_SHIFT (11)
> #define ESR_ELx_SET_MASK (UL(3) << ESR_ELx_SET_SHIFT)
> @@ -99,6 +108,7 @@
> #define ESR_ELx_FSC (0x3F)
> #define ESR_ELx_FSC_TYPE (0x3C)
> #define ESR_ELx_FSC_EXTABT (0x10)
> +#define ESR_ELx_FSC_SERROR (0x11)
> #define ESR_ELx_FSC_ACCESS (0x08)
> #define ESR_ELx_FSC_FAULT (0x04)
> #define ESR_ELx_FSC_PERM (0x0C)
> diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
> index d131501c6222..8d2a1fff5c6b 100644
> --- a/arch/arm64/include/asm/traps.h
> +++ b/arch/arm64/include/asm/traps.h
> @@ -19,6 +19,7 @@
> #define __ASM_TRAP_H
>
> #include <linux/list.h>
> +#include <asm/esr.h>
> #include <asm/sections.h>
>
> struct pt_regs;
> @@ -58,4 +59,39 @@ static inline int in_entry_text(unsigned long ptr)
> return ptr >= (unsigned long)&__entry_text_start &&
> ptr < (unsigned long)&__entry_text_end;
> }
> +
> +static inline bool arm64_is_ras_serror(u32 esr)
> +{
> + bool impdef = esr & ESR_ELx_ISV; /* aka IDS */
I think you should add an IDS field along with the AET one I suggested.
> +
> + if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
> + return !impdef;
> +
> + return false;
> +}
> +
> +/* Return the AET bits of an SError ESR, or 0/uncontainable/uncategorized */
> +static inline u32 arm64_ras_serror_get_severity(u32 esr)
> +{
> + u32 aet = esr & ESR_ELx_AET;
> +
> + if (!arm64_is_ras_serror(esr)) {
> + /* Not a RAS error, we can't interpret the ESR */
> + return 0;
> + }
> +
> + /*
> + * AET is RES0 if 'the value returned in the DFSC field is not
> + * [ESR_ELx_FSC_SERROR]'
> + */
> + if ((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR) {
> + /* No severity information */
> + return 0;
> + }
Hmm, this means we can't distinguish impdef or RES0 encodings from
uncontainable errors. Is that desirable?

Also, could we end up in a situation where some CPUs support RAS and some
don't, so arm64_is_ras_serror returns false yet a correctable error is
reported by one of the CPUs and we treat it as uncontainable?
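To make the first concern concrete, here is a minimal sketch of one way the "no severity information" case could be kept apart from a genuine uncontainable (AET == 0) report. This is not a proposal for the final code: SERROR_SEVERITY_UNKNOWN, serror_severity() and the has_ras_extn flag are hypothetical names, with the flag standing in for cpus_have_const_cap(ARM64_HAS_RAS_EXTN), and the ESR_ELx_IDS name is assumed for bit 24 of the SError ISS. It does not address the mixed-CPU question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins so the sketch builds outside the kernel tree. */
#define UL(x) ((unsigned long)(x))

#define ESR_ELx_IDS		(UL(1) << 24)	/* assumed name; same bit as ISV */
#define ESR_ELx_AET_SHIFT	(10)
#define ESR_ELx_AET		(UL(0x7) << ESR_ELx_AET_SHIFT)
#define ESR_ELx_AET_UC		(UL(0) << ESR_ELx_AET_SHIFT)
#define ESR_ELx_AET_CE		(UL(6) << ESR_ELx_AET_SHIFT)
#define ESR_ELx_FSC		(0x3F)
#define ESR_ELx_FSC_SERROR	(0x11)

/* Hypothetical sentinel: a value no AET encoding can produce. */
#define SERROR_SEVERITY_UNKNOWN	0xffffffffu

/* Stand-in for cpus_have_const_cap(ARM64_HAS_RAS_EXTN). */
static bool has_ras_extn = true;

static uint32_t serror_severity(uint32_t esr)
{
	/* Impdef syndrome, or no RAS extensions: the ESR can't be decoded. */
	if (!has_ras_extn || (esr & ESR_ELx_IDS))
		return SERROR_SEVERITY_UNKNOWN;

	/* AET is RES0 unless DFSC reports an SError. */
	if ((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)
		return SERROR_SEVERITY_UNKNOWN;

	return (uint32_t)(esr & ESR_ELx_AET);
}
```

A caller would presumably still panic on the sentinel, but it could at least log the two cases differently.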
> +
> + return aet;
> +}
> +
> +bool arm64_blocking_ras_serror(struct pt_regs *regs, unsigned int esr);
> +void __noreturn arm64_serror_panic(struct pt_regs *regs, u32 esr);
> #endif
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 773aae69c376..53aeb25158b0 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -709,17 +709,65 @@ asmlinkage void handle_bad_stack(struct pt_regs *regs)
> }
> #endif
>
> -asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
> +void __noreturn arm64_serror_panic(struct pt_regs *regs, u32 esr)
> {
> - nmi_enter();
> -
> console_verbose();
>
> pr_crit("SError Interrupt on CPU%d, code 0x%08x -- %s\n",
> smp_processor_id(), esr, esr_get_class_string(esr));
> - __show_regs(regs);
> + if (regs)
> + __show_regs(regs);
> +
> + /* KVM may call this this from a preemptible context */
> + preempt_disable();
> +
> + /*
> + * panic() unmasks interrupts, which unmasks SError. Use nmi_panic()
> + * to avoid re-entering panic.
> + */
> + nmi_panic(regs, "Asynchronous SError Interrupt");
> +
> + cpu_park_loop();
> + unreachable();
> +}
> +
> +bool arm64_blocking_ras_serror(struct pt_regs *regs, unsigned int esr)
> +{
Since you asked... what about "fatal" instead of "blocking"?
Will