linux-arm-kernel.lists.infradead.org archive mirror
From: tbaicar@codeaurora.org (Baicar, Tyler)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2 10/16] arm64: kernel: Survive corrected RAS errors notified by SError
Date: Wed, 13 Sep 2017 14:52:21 -0600
Message-ID: <eba92679-bbb7-6cb7-843c-7cfdbc793b6b@codeaurora.org>
In-Reply-To: <20170728141019.9084-11-james.morse@arm.com>

On 7/28/2017 8:10 AM, James Morse wrote:
> On v8.0, SError is an uncontainable fatal exception. The v8.2 RAS
> extensions use SError to notify software about RAS errors, these can be
> contained by the ESB instruction.
>
> An ACPI system with firmware-first may use SError as its 'SEI'
> notification. Future patches may add code to 'claim' this SError as
> notification.
>
> Other systems can distinguish these RAS errors from the SError ESR and
> use the AET bits and additional data from RAS-Error registers to handle
> the error.  Future patches may add this kernel-first handling.
>
> In the meantime, on both kinds of system we can safely ignore corrected
> errors.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>   arch/arm64/include/asm/esr.h | 10 ++++++++++
>   arch/arm64/kernel/traps.c    | 35 ++++++++++++++++++++++++++++++++---
>   2 files changed, 42 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index 8cabd57b6348..77d5b1baf1a4 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -83,6 +83,15 @@
>   /* ISS field definitions shared by different classes */
>   #define ESR_ELx_WNR		(UL(1) << 6)
>   
> +/* Asynchronous Error Type */
> +#define ESR_ELx_AET		(UL(0x7) << 10)
> +
> +#define ESR_ELx_AET_UC		(UL(0) << 10)	/* Uncontainable */
> +#define ESR_ELx_AET_UEU		(UL(1) << 10)	/* Uncorrected Unrecoverable */
> +#define ESR_ELx_AET_UEO		(UL(2) << 10)	/* Uncorrected Restartable */
> +#define ESR_ELx_AET_UER		(UL(3) << 10)	/* Uncorrected Recoverable */
> +#define ESR_ELx_AET_CE		(UL(6) << 10)	/* Corrected */
> +
>   /* Shared ISS field definitions for Data/Instruction aborts */
>   #define ESR_ELx_FnV		(UL(1) << 10)
>   #define ESR_ELx_EA		(UL(1) << 9)
> @@ -92,6 +101,7 @@
>   #define ESR_ELx_FSC		(0x3F)
>   #define ESR_ELx_FSC_TYPE	(0x3C)
>   #define ESR_ELx_FSC_EXTABT	(0x10)
> +#define ESR_ELx_FSC_SERROR	(0x11)
>   #define ESR_ELx_FSC_ACCESS	(0x08)
>   #define ESR_ELx_FSC_FAULT	(0x04)
>   #define ESR_ELx_FSC_PERM	(0x0C)
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 943a0e242dbc..e1eaccc66548 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -685,10 +685,8 @@ asmlinkage void bad_el0_sync(struct pt_regs *regs, int reason, unsigned int esr)
>   	force_sig_info(info.si_signo, &info, current);
>   }
>   
> -asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
> +static void do_serror_panic(struct pt_regs *regs, unsigned int esr)
>   {
> -	nmi_enter();
> -
>   	console_verbose();
>   
>   	pr_crit("SError Interrupt on CPU%d, code 0x%08x -- %s\n",
> @@ -696,6 +694,37 @@ asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
>   	__show_regs(regs);
>   
>   	nmi_panic(regs, "Asynchronous SError Interrupt");
> +}
> +
> +static void _do_serror(struct pt_regs *regs, unsigned int esr)
> +{
> +	bool impdef_syndrome = esr & ESR_ELx_ISV;	/* aka IDS */
> +	unsigned int aet = esr & ESR_ELx_AET;
> +
> +	if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN) || impdef_syndrome)
> +		return do_serror_panic(regs, esr);
> +
> +	/*
> +	 * AET is RES0 if 'the value returned in the DFSC field is not
> +	 * [ESR_ELx_FSC_SERROR]'
> +	 */
> +	if ((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)
> +		return do_serror_panic(regs, esr);
> +
> +	switch (aet) {
Hello James,

Here only corrected and restartable errors are ignored; every other
error type panics. For corrected and restartable errors, we should at
least log that an error happened and provide the syndrome info
(address, context, etc.). We should also trigger a trace event to
notify user space that an error happened, so that tools like RAS
Daemon can report it. This will need a new trace event, since the
current ones are based on the CPER structures from the firmware-first
case.
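
To make the logging suggestion concrete, here is a minimal sketch of the
decoding such a log line would need. It is standalone user-space C, not
kernel code, so it can be compiled on its own. The AET encodings are
copied from the esr.h hunk quoted above; the helper names
serror_aet_name() and serror_format() and the message layout are
hypothetical and not part of the patch.

```c
#include <stdint.h>
#include <stdio.h>

/* AET field encodings, copied from the esr.h hunk in the quoted patch */
#define ESR_ELx_AET		(UINT32_C(0x7) << 10)
#define ESR_ELx_AET_UC		(UINT32_C(0) << 10)	/* Uncontainable */
#define ESR_ELx_AET_UEU		(UINT32_C(1) << 10)	/* Uncorrected Unrecoverable */
#define ESR_ELx_AET_UEO		(UINT32_C(2) << 10)	/* Uncorrected Restartable */
#define ESR_ELx_AET_UER		(UINT32_C(3) << 10)	/* Uncorrected Recoverable */
#define ESR_ELx_AET_CE		(UINT32_C(6) << 10)	/* Corrected */

/* Hypothetical helper: map the AET field of an ESR value to a short name. */
static const char *serror_aet_name(uint32_t esr)
{
	switch (esr & ESR_ELx_AET) {
	case ESR_ELx_AET_UC:	return "uncontainable";
	case ESR_ELx_AET_UEU:	return "unrecoverable";
	case ESR_ELx_AET_UEO:	return "restartable";
	case ESR_ELx_AET_UER:	return "recoverable";
	case ESR_ELx_AET_CE:	return "corrected";
	default:		return "reserved";
	}
}

/*
 * Hypothetical helper: build the line a handler could log instead of
 * silently ignoring the error. Returns what snprintf() returns.
 */
static int serror_format(char *buf, size_t len, int cpu, uint32_t esr)
{
	return snprintf(buf, len, "SError on CPU%d, code 0x%08x: %s error",
			cpu, (unsigned int)esr, serror_aet_name(esr));
}
```

In the kernel the same decoding would feed pr_*() and the new trace
event rather than a buffer; the address and context information would
have to come from the RAS error record registers, which this sketch
does not model.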

Recoverable UEs should not need to trigger a panic; we should be able
to recover in a way similar to the memory fault handling in
mm/memory-failure.c. Since recoverable UEs won't cause a panic, they
should also trigger a trace event to user space.
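
Put together, the policy suggested here could look roughly like the
sketch below. Again this is standalone user-space C for illustration
only: the enum serror_action and serror_policy() are hypothetical
names, and "try recovery" stands in for a handoff to a
memory-failure-style path that the patch does not implement.

```c
#include <stdint.h>

/* AET field encodings, copied from the esr.h hunk in the quoted patch */
#define ESR_ELx_AET		(UINT32_C(0x7) << 10)
#define ESR_ELx_AET_UEO		(UINT32_C(2) << 10)	/* Uncorrected Restartable */
#define ESR_ELx_AET_UER		(UINT32_C(3) << 10)	/* Uncorrected Recoverable */
#define ESR_ELx_AET_CE		(UINT32_C(6) << 10)	/* Corrected */

/* Hypothetical set of outcomes for an SError with a valid AET field. */
enum serror_action {
	SERROR_LOG_AND_CONTINUE,	/* log + trace event, then return */
	SERROR_TRY_RECOVERY,		/* log + trace event, attempt recovery */
	SERROR_PANIC,			/* everything else stays fatal */
};

/* Hypothetical helper: choose an action from the AET field alone. */
static enum serror_action serror_policy(uint32_t esr)
{
	switch (esr & ESR_ELx_AET) {
	case ESR_ELx_AET_CE:	/* corrected error */
	case ESR_ELx_AET_UEO:	/* restartable, not yet consumed */
		return SERROR_LOG_AND_CONTINUE;
	case ESR_ELx_AET_UER:	/* recoverable */
		return SERROR_TRY_RECOVERY;
	default:
		return SERROR_PANIC;
	}
}
```

The only difference from the switch in the patch is that the
recoverable case gets its own outcome instead of falling through to the
panic path, and that the ignored cases are reported rather than
dropped.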

Thanks,
Tyler
> +	case ESR_ELx_AET_CE:	/* corrected error */
> +	case ESR_ELx_AET_UEO:	/* restartable, not yet consumed */
> +		break;
> +	default:
> +		return do_serror_panic(regs, esr);
> +	}
> +}
> +
> +asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
> +{
> +	nmi_enter();
> +
> +	_do_serror(regs, esr);
>   
>   	nmi_exit();
>   }

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
