kvmarm.lists.cs.columbia.edu archive mirror
 help / color / mirror / Atom feed
From: James Morse <james.morse@arm.com>
To: Will Deacon <will.deacon@arm.com>
Cc: Jonathan.Zhang@cavium.com, Marc Zyngier <marc.zyngier@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Julien Thierry <julien.thierry@arm.com>,
	wangxiongfeng2@huawei.com, linux-arm-kernel@lists.infradead.org,
	Dongjiu Geng <gengdongjiu@huawei.com>,
	kvmarm@lists.cs.columbia.edu
Subject: Re: [PATCH v4 12/21] arm64: kernel: Survive corrected RAS errors notified by SError
Date: Thu, 02 Nov 2017 12:15:11 +0000	[thread overview]
Message-ID: <59FB0C4F.9060707@arm.com> (raw)
In-Reply-To: <20171031135041.GL5584@arm.com>

Hi Will,

On 31/10/17 13:50, Will Deacon wrote:
> On Thu, Oct 19, 2017 at 03:57:58PM +0100, James Morse wrote:
>> Prior to v8.2, SError is an uncontainable fatal exception. The v8.2 RAS
>> extensions use SError to notify software about RAS errors, these can be
>> contained by the ESB instruction.
>>
>> An ACPI system with firmware-first may use SError as its 'SEI'
>> notification. Future patches may add code to 'claim' this SError as a
>> notification.
>>
>> Other systems can distinguish these RAS errors from the SError ESR and
>> use the AET bits and additional data from RAS-Error registers to handle
>> the error. Future patches may add this kernel-first handling.
>>
>> Without support for either of these we will panic(), even if we received
>> a corrected error. Add code to decode the severity of RAS errors. We can
>> safely ignore contained errors where the CPU can continue to make
>> progress. For all other errors we continue to panic().

>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index 66ed8b6b9976..8ea52f15bf1c 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -85,6 +85,15 @@
>>  #define ESR_ELx_WNR_SHIFT	(6)
>>  #define ESR_ELx_WNR		(UL(1) << ESR_ELx_WNR_SHIFT)
>>  
>> +/* Asynchronous Error Type */
>> +#define ESR_ELx_AET		(UL(0x7) << 10)

> Can you add a #define for the AET shift in the Srror ISS, please? (we have
> other blocks in this file for different abort types). e.g.
> 
> /* ISS fields definitions for SError interrupts */
> #define ESR_ELx_AER_SHIFT	10
> 
> then use it below.

Yes,  I should have done that..


>> +#define ESR_ELx_AET_UC		(UL(0) << 10)	/* Uncontainable */
>> +#define ESR_ELx_AET_UEU		(UL(1) << 10)	/* Uncorrected Unrecoverable */
>> +#define ESR_ELx_AET_UEO		(UL(2) << 10)	/* Uncorrected Restartable */
>> +#define ESR_ELx_AET_UER		(UL(3) << 10)	/* Uncorrected Recoverable */
>> +#define ESR_ELx_AET_CE		(UL(6) << 10)	/* Corrected */
>> +
>>  /* Shared ISS field definitions for Data/Instruction aborts */
>>  #define ESR_ELx_SET_SHIFT	(11)
>>  #define ESR_ELx_SET_MASK	(UL(3) << ESR_ELx_SET_SHIFT)
>> @@ -99,6 +108,7 @@
>>  #define ESR_ELx_FSC		(0x3F)
>>  #define ESR_ELx_FSC_TYPE	(0x3C)
>>  #define ESR_ELx_FSC_EXTABT	(0x10)
>> +#define ESR_ELx_FSC_SERROR	(0x11)
>>  #define ESR_ELx_FSC_ACCESS	(0x08)
>>  #define ESR_ELx_FSC_FAULT	(0x04)

>>  #define ESR_ELx_FSC_PERM	(0x0C)
>> diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
>> index d131501c6222..8d2a1fff5c6b 100644
>> --- a/arch/arm64/include/asm/traps.h
>> +++ b/arch/arm64/include/asm/traps.h
>> @@ -19,6 +19,7 @@
>>  #define __ASM_TRAP_H
>>  
>>  #include <linux/list.h>
>> +#include <asm/esr.h>
>>  #include <asm/sections.h>
>>  
>>  struct pt_regs;
>> @@ -58,4 +59,39 @@ static inline int in_entry_text(unsigned long ptr)
>>  	return ptr >= (unsigned long)&__entry_text_start &&
>>  	       ptr < (unsigned long)&__entry_text_end;
>>  }
>> +
>> +static inline bool arm64_is_ras_serror(u32 esr)
>> +{
>> +	bool impdef = esr & ESR_ELx_ISV; /* aka IDS */
> 
> I think you should add an IDS field along with the AET one I suggested.

Sure,

>> +
>> +	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
>> +		return !impdef;
>> +
>> +	return false;
>> +}
>> +
>> +/* Return the AET bits of an SError ESR, or 0/uncontainable/uncategorized */
>> +static inline u32 arm64_ras_serror_get_severity(u32 esr)
>> +{
>> +	u32 aet = esr & ESR_ELx_AET;
>> +
>> +	if (!arm64_is_ras_serror(esr)) {
>> +		/* Not a RAS error, we can't interpret the ESR */
>> +		return 0;
>> +	}
>> +
>> +	/*
>> +	 * AET is RES0 if 'the value returned in the DFSC field is not
>> +	 * [ESR_ELx_FSC_SERROR]'
>> +	 */
>> +	if ((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR) {
>> +		/* No severity information */
>> +		return 0;
>> +	}

> Hmm, this means we can't distinguish impdef or RES0 encodings from
> uncontainable errors. Is that desirable?

We panic for for both impdef and uncontainable ESR values, so the difference
doesn't matter. I'll remove the 'is_ras_serror()' in here and make it the
callers problem to check...


RES0 encodings?
If this is an imp-def 'all zeros', those should all be matched as impdef by
arm64_is_ras_serror().
Otherwise its a RAS encoding with {I,D}FSC bits that indicate we can't know the
severity.
The ARM-ARM calls these 'uncategorized'. Yes I'm treating them as uncontained,
(on aarch32 these share an encoding). I'll add a comment to call it out.


> Also, could we end up in a situation where some CPUs support RAS and some
> don't, 

Ooer, differing CPU support. I hadn't considered that... wouldn't cpufeature
declare such a system insane?


> so arm64_is_ras_serror returns false yet a correctable error is
> reported by one the CPUs and we treat it as uncontainable?

Makeing the HAS_RAS tests use this_cpu_has_cap() should cover this, but will
cause problems for KVM as it calls these from a pre-emptible context.


>> +
>> +	return aet;
>> +}
>> +
>> +bool arm64_blocking_ras_serror(struct pt_regs *regs, unsigned int esr);
>> +void __noreturn arm64_serror_panic(struct pt_regs *regs, u32 esr);
>>  #endif
>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>> index 773aae69c376..53aeb25158b0 100644
>> --- a/arch/arm64/kernel/traps.c
>> +++ b/arch/arm64/kernel/traps.c
>> @@ -709,17 +709,65 @@ asmlinkage void handle_bad_stack(struct pt_regs *regs)

>> +bool arm64_blocking_ras_serror(struct pt_regs *regs, unsigned int esr)
>> +{

> Since you asked... what about "fatal" instead of "blocking"?

.. well that was obvious. Yes, I was looking too much at whether we could return
to the interrupted context instead of what we do next!


Thanks,

James

  reply	other threads:[~2017-11-02 12:15 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-19 14:57 [PATCH v4 00/21] SError rework + RAS&IESB for firmware first support James Morse
2017-10-19 14:57 ` [PATCH v4 01/21] arm64: explicitly mask all exceptions James Morse
2017-10-19 14:57 ` [PATCH v4 02/21] arm64: introduce an order for exceptions James Morse
2017-10-19 14:57 ` [PATCH v4 03/21] arm64: Move the async/fiq helpers to explicitly set process context flags James Morse
2017-10-19 14:57 ` [PATCH v4 04/21] arm64: Mask all exceptions during kernel_exit James Morse
2017-10-19 14:57 ` [PATCH v4 05/21] arm64: entry.S: Remove disable_dbg James Morse
2017-10-19 14:57 ` [PATCH v4 06/21] arm64: entry.S: convert el1_sync James Morse
2017-10-19 14:57 ` [PATCH v4 07/21] arm64: entry.S convert el0_sync James Morse
2017-10-19 14:57 ` [PATCH v4 08/21] arm64: entry.S: convert elX_irq James Morse
2017-10-19 14:57 ` [PATCH v4 09/21] KVM: arm/arm64: mask/unmask daif around VHE guests James Morse
2017-10-30  7:40   ` Christoffer Dall
2017-11-02 12:14     ` James Morse
2017-11-03 12:45       ` Christoffer Dall
2017-11-03 17:19         ` James Morse
2017-11-06 12:42           ` Christoffer Dall
2017-10-19 14:57 ` [PATCH v4 10/21] arm64: entry.S: move SError handling into a C function for future expansion James Morse
2018-01-02 21:07   ` Adam Wallis
2018-01-03 16:00     ` James Morse
2017-10-19 14:57 ` [PATCH v4 11/21] arm64: cpufeature: Detect CPU RAS Extentions James Morse
2017-10-31 13:14   ` Will Deacon
2017-11-02 12:15     ` James Morse
2017-10-19 14:57 ` [PATCH v4 12/21] arm64: kernel: Survive corrected RAS errors notified by SError James Morse
2017-10-31 13:50   ` Will Deacon
2017-11-02 12:15     ` James Morse [this message]
2017-10-19 14:57 ` [PATCH v4 13/21] arm64: cpufeature: Enable IESB on exception entry/return for firmware-first James Morse
2017-10-31 13:56   ` Will Deacon
2017-10-19 14:58 ` [PATCH v4 14/21] arm64: kernel: Prepare for a DISR user James Morse
2017-10-19 14:58 ` [PATCH v4 15/21] KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2 James Morse
2017-10-20 16:44   ` gengdongjiu
2017-10-23 15:26     ` James Morse
2017-10-24  9:53       ` gengdongjiu
2017-10-30  7:59   ` Christoffer Dall
2017-10-30 10:51     ` Christoffer Dall
2017-10-30 15:44       ` James Morse
2017-10-31  5:48         ` Christoffer Dall
2017-10-31  6:34   ` Marc Zyngier
2017-10-19 14:58 ` [PATCH v4 16/21] KVM: arm64: Save/Restore guest DISR_EL1 James Morse
2017-10-31  4:27   ` Marc Zyngier
2017-10-31  5:27   ` Christoffer Dall
2017-10-19 14:58 ` [PATCH v4 17/21] KVM: arm64: Save ESR_EL2 on guest SError James Morse
2017-10-31  4:26   ` Marc Zyngier
2017-10-31  5:47     ` Marc Zyngier
2017-11-01 17:42       ` James Morse
2017-10-19 14:58 ` [PATCH v4 18/21] KVM: arm64: Handle RAS SErrors from EL1 on guest exit James Morse
2017-10-31  5:55   ` Marc Zyngier
2017-10-31  5:56   ` Christoffer Dall
2017-10-19 14:58 ` [PATCH v4 19/21] KVM: arm64: Handle RAS SErrors from EL2 " James Morse
2017-10-27  6:26   ` gengdongjiu
2017-10-27 17:38     ` James Morse
2017-10-31  6:13   ` Marc Zyngier
2017-10-31  6:13   ` Christoffer Dall
2017-10-19 14:58 ` [PATCH v4 20/21] KVM: arm64: Take any host SError before entering the guest James Morse
2017-10-31  6:23   ` Christoffer Dall
2017-10-31 11:43     ` James Morse
2017-11-01  4:55       ` Christoffer Dall
2017-11-02 12:18         ` James Morse
2017-11-03 12:49           ` Christoffer Dall
2017-11-03 16:14             ` James Morse
2017-11-06 12:45               ` Christoffer Dall
2017-10-19 14:58 ` [PATCH v4 21/21] KVM: arm64: Trap RAS error registers and set HCR_EL2's TERR & TEA James Morse
2017-10-31  6:32   ` Christoffer Dall
2017-10-31  6:32   ` Marc Zyngier
2017-10-31  6:35 ` [PATCH v4 00/21] SError rework + RAS&IESB for firmware first support Christoffer Dall
2017-10-31 10:08   ` Will Deacon
2017-11-01 15:23     ` James Morse
2017-11-02  8:14       ` Christoffer Dall
2017-11-09 18:14 ` James Morse
2017-11-10 12:03   ` gengdongjiu
2017-11-13 11:29   ` Christoffer Dall
2017-11-13 13:05     ` Peter Maydell
2017-11-20  8:53       ` Christoffer Dall
2017-11-13 16:14     ` Andrew Jones
2017-11-13 17:56       ` Peter Maydell
2017-11-14 16:11       ` James Morse
2017-11-15  9:59         ` gengdongjiu
2017-11-14 16:03     ` James Morse
2017-11-15  9:15       ` gengdongjiu
2017-11-15 18:25         ` James Morse
2017-11-21 11:31           ` gengdongjiu
2017-11-20  8:55       ` Christoffer Dall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=59FB0C4F.9060707@arm.com \
    --to=james.morse@arm.com \
    --cc=Jonathan.Zhang@cavium.com \
    --cc=catalin.marinas@arm.com \
    --cc=gengdongjiu@huawei.com \
    --cc=julien.thierry@arm.com \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=marc.zyngier@arm.com \
    --cc=wangxiongfeng2@huawei.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).