linux-arm-kernel.lists.infradead.org archive mirror
From: james.morse@arm.com (James Morse)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v4 20/21] KVM: arm64: Take any host SError before entering the guest
Date: Thu, 02 Nov 2017 12:18:20 +0000
Message-ID: <59FB0D0C.1010208@arm.com>
In-Reply-To: <20171101045550.GB11166@lvm>

Hi Christoffer,

On 01/11/17 04:55, Christoffer Dall wrote:
> On Tue, Oct 31, 2017 at 11:43:42AM +0000, James Morse wrote:
>> On 31/10/17 06:23, Christoffer Dall wrote:
>>> On Thu, Oct 19, 2017 at 03:58:06PM +0100, James Morse wrote:
>>>> On VHE systems KVM masks SError before switching the VBAR value. Any
>>>> host RAS error that the CPU knew about before world-switch may become
>>>> pending as an SError during world-switch, and only be taken once we enter
>>>> the guest.
>>>>
>>>> Until KVM can take RAS SErrors during world switch, add an ESB to
>>>> force any RAS errors to be synchronised and taken on the host before
>>>> we enter world switch.
>>>>
>>>> RAS errors that become pending during world switch are still taken
>>>> once we enter the guest.
>>
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index cf5d78ba14b5..5dc6f2877762 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -392,6 +392,7 @@ static inline void __cpu_init_stage2(void)
>>>>  
>>>>  static inline void kvm_arm_vhe_guest_enter(void)
>>>>  {
>>>> +	esb();
>>
>>> I don't fully appreciate what the point of this is.
>>>
>>> As I understand it, our fundamental goal here is to try to distinguish
>>> between errors happening on the host or in the guest.
>>
>> Not just host/guest, but also those we can and can't handle.
>>
>> KVM can't currently take an SError during world switch, so a RAS error that the
>> CPU was hoping to defer may spread from the host into KVM's
>> no-SError:world-switch code. If this happens it will (almost certainly) have to
>> be re-classified as uncontainable.
>>
>> There is also a firmware-first angle here: NOTIFY_SEI can't be delivered if the
>> normal world has SError masked, so any error that spreads past this point
>> becomes a reboot-by-firmware instead of an OS notification and almost-helpful
>> error message.
>>
>>
>>> If that's correct, then why don't we do it at the last possible moment
>>> when we still have a scratch register left, in the world switch code
>>> itself, and in that case abort the guest entry and report back a "host
>>> SError" return code.
>>
>> We have IESB to run the error-barrier as we enter the guest. This would make any
>> host error pending as an SError, and we would exit the guest immediately. But if
>> there was an RAS error during world switch, by this point it's likely to be
>> classified as uncontainable.
>>
>> This esb() is trying to keep this window of code as small as possible, limiting
>> it to just the errors that occur during world switch.
>>
>> With your vcpu load/save this window becomes a lot smaller; it may be possible
>> to get a VHE-host's arch-code SError handler to take errors from EL2, in which
>> case this barrier can disappear.
>> (note to self: guest may still own the debug hardware)
>>
> 
> ok, thanks for your detailed explanation.  I didn't consider that the
> classification of a RAS error as containable vs. non-containable
> depended on where we take the exception.

Will makes the point over on patch 11 that until we have different handling for
these different classifications of error, there isn't much point doing this now
(i.e. we treat an error taken here the same way as one taken when we enter the guest).

I was trying to keep my eye on what we need for kernel-first support, so that we
don't have to change the code twice; we would just expand the error handling to do better.

I'll drop this patch for now; it will come back if/when we get kernel-first
support for RAS.
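To make the window concrete, here is a minimal C sketch. It is a toy model, not kernel code, and every name in it (where_taken(), likely_uncontainable() and the enums) is hypothetical. It only encodes the claims made in this thread: while SError is still unmasked, the esb() makes a host error that is already pending get taken on the host; anything still pending once SError is masked for world switch is only taken when we enter the guest, and by then it may have spread and be classified as uncontainable.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model, not kernel code: where a RAS error ends up being
 * taken on a VHE host, with or without the esb() before world switch.
 */
enum err_origin {
	ORIGIN_HOST,         /* error the CPU knew about before world switch */
	ORIGIN_WORLD_SWITCH, /* error that becomes pending during world switch */
};

enum take_point {
	TAKEN_ON_HOST,        /* host SError handler, before SError is masked */
	TAKEN_ON_GUEST_ENTRY, /* stays pending until the guest is entered */
};

enum take_point where_taken(enum err_origin origin, bool esb_before_switch)
{
	assert(origin == ORIGIN_HOST || origin == ORIGIN_WORLD_SWITCH);

	/* SError is still unmasked when the esb() runs, so an error that is
	 * already pending is synchronised and taken here, on the host. */
	if (origin == ORIGIN_HOST && esb_before_switch)
		return TAKEN_ON_HOST;

	/* SError is masked for world switch, so anything pending from here on
	 * is only taken once we enter the guest. */
	return TAKEN_ON_GUEST_ENTRY;
}

/* An error taken on guest entry may have spread into world-switch code, so
 * the model treats it as likely to be classified uncontainable. */
bool likely_uncontainable(enum take_point point)
{
	return point == TAKEN_ON_GUEST_ENTRY;
}
```

For example, where_taken(ORIGIN_HOST, false) gives TAKEN_ON_GUEST_ENTRY, which is the case the dropped barrier was meant to avoid; errors that originate during world switch land there regardless of the barrier.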


What about firmware-first? Firmware can always take these errors when the normal
world is running. Dropping the barrier means it's up to the CPU when any error
gets reported. If firmware has to use NOTIFY_SEI, it will have to reboot if
the error occurs during world-switch (as SError is masked). If an error spreads
over this boundary, that's just tough luck; the kernel would have panicked anyway.
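The firmware-first case can be sketched the same way, again as a hypothetical toy model rather than firmware or kernel code (notify_sei(), serror_masked() and the phase names are illustrative only). It assumes, as described here, that SError is unmasked while the host kernel runs and masked during world switch, and that NOTIFY_SEI cannot be delivered while the normal world has SError masked.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model: the outcome when firmware takes a RAS error while
 * the normal world is running and tries to report it with NOTIFY_SEI.
 */
enum nw_phase {
	PHASE_HOST,         /* host kernel running, SError unmasked (assumed) */
	PHASE_WORLD_SWITCH, /* KVM world switch, SError masked */
};

enum ff_outcome {
	FF_NOTIFY_OS, /* SError delivered, the OS prints an error report */
	FF_REBOOT,    /* NOTIFY_SEI cannot be delivered, so firmware reboots */
};

bool serror_masked(enum nw_phase phase)
{
	return phase == PHASE_WORLD_SWITCH;
}

enum ff_outcome notify_sei(enum nw_phase phase)
{
	/* NOTIFY_SEI can't be delivered while the normal world has SError masked. */
	return serror_masked(phase) ? FF_REBOOT : FF_NOTIFY_OS;
}
```

The model deliberately ignores where the error originated: without the barrier, a host error that the CPU defers into world switch falls into the FF_REBOOT case.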


Sorry for the noise,


Thanks,

James

Thread overview: 80+ messages
2017-10-19 14:57 [PATCH v4 00/21] SError rework + RAS&IESB for firmware first support James Morse
2017-10-19 14:57 ` [PATCH v4 01/21] arm64: explicitly mask all exceptions James Morse
2017-10-19 14:57 ` [PATCH v4 02/21] arm64: introduce an order for exceptions James Morse
2017-10-19 14:57 ` [PATCH v4 03/21] arm64: Move the async/fiq helpers to explicitly set process context flags James Morse
2017-10-19 14:57 ` [PATCH v4 04/21] arm64: Mask all exceptions during kernel_exit James Morse
2017-10-19 14:57 ` [PATCH v4 05/21] arm64: entry.S: Remove disable_dbg James Morse
2017-10-19 14:57 ` [PATCH v4 06/21] arm64: entry.S: convert el1_sync James Morse
2017-10-19 14:57 ` [PATCH v4 07/21] arm64: entry.S convert el0_sync James Morse
2017-10-19 14:57 ` [PATCH v4 08/21] arm64: entry.S: convert elX_irq James Morse
2017-10-19 14:57 ` [PATCH v4 09/21] KVM: arm/arm64: mask/unmask daif around VHE guests James Morse
2017-10-30  7:40   ` Christoffer Dall
2017-11-02 12:14     ` James Morse
2017-11-03 12:45       ` Christoffer Dall
2017-11-03 17:19         ` James Morse
2017-11-06 12:42           ` Christoffer Dall
2017-10-19 14:57 ` [PATCH v4 10/21] arm64: entry.S: move SError handling into a C function for future expansion James Morse
2018-01-02 21:07   ` Adam Wallis
2018-01-03 16:00     ` James Morse
2017-10-19 14:57 ` [PATCH v4 11/21] arm64: cpufeature: Detect CPU RAS Extentions James Morse
2017-10-31 13:14   ` Will Deacon
2017-11-02 12:15     ` James Morse
2017-10-19 14:57 ` [PATCH v4 12/21] arm64: kernel: Survive corrected RAS errors notified by SError James Morse
2017-10-31 13:50   ` Will Deacon
2017-11-02 12:15     ` James Morse
2017-10-19 14:57 ` [PATCH v4 13/21] arm64: cpufeature: Enable IESB on exception entry/return for firmware-first James Morse
2017-10-31 13:56   ` Will Deacon
2017-10-19 14:58 ` [PATCH v4 14/21] arm64: kernel: Prepare for a DISR user James Morse
2017-10-19 14:58 ` [PATCH v4 15/21] KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2 James Morse
2017-10-20 16:44   ` gengdongjiu
2017-10-23 15:26     ` James Morse
2017-10-24  9:53       ` gengdongjiu
2017-10-30  7:59   ` Christoffer Dall
2017-10-30 10:51     ` Christoffer Dall
2017-10-30 15:44       ` James Morse
2017-10-31  5:48         ` Christoffer Dall
2017-10-31  6:34   ` Marc Zyngier
2017-10-19 14:58 ` [PATCH v4 16/21] KVM: arm64: Save/Restore guest DISR_EL1 James Morse
2017-10-31  4:27   ` Marc Zyngier
2017-10-31  5:27   ` Christoffer Dall
2017-10-19 14:58 ` [PATCH v4 17/21] KVM: arm64: Save ESR_EL2 on guest SError James Morse
2017-10-31  4:26   ` Marc Zyngier
2017-10-31  5:47     ` Marc Zyngier
2017-11-01 17:42       ` James Morse
2017-10-19 14:58 ` [PATCH v4 18/21] KVM: arm64: Handle RAS SErrors from EL1 on guest exit James Morse
2017-10-31  5:55   ` Marc Zyngier
2017-10-31  5:56   ` Christoffer Dall
2017-10-19 14:58 ` [PATCH v4 19/21] KVM: arm64: Handle RAS SErrors from EL2 " James Morse
2017-10-27  6:26   ` gengdongjiu
2017-10-27 17:38     ` James Morse
2017-10-31  6:13   ` Marc Zyngier
2017-10-31  6:13   ` Christoffer Dall
2017-10-19 14:58 ` [PATCH v4 20/21] KVM: arm64: Take any host SError before entering the guest James Morse
2017-10-31  6:23   ` Christoffer Dall
2017-10-31 11:43     ` James Morse
2017-11-01  4:55       ` Christoffer Dall
2017-11-02 12:18         ` James Morse [this message]
2017-11-03 12:49           ` Christoffer Dall
2017-11-03 16:14             ` James Morse
2017-11-06 12:45               ` Christoffer Dall
2017-10-19 14:58 ` [PATCH v4 21/21] KVM: arm64: Trap RAS error registers and set HCR_EL2's TERR & TEA James Morse
2017-10-31  6:32   ` Christoffer Dall
2017-10-31  6:32   ` Marc Zyngier
2017-10-31  6:35 ` [PATCH v4 00/21] SError rework + RAS&IESB for firmware first support Christoffer Dall
2017-10-31 10:08   ` Will Deacon
2017-11-01 15:23     ` James Morse
2017-11-02  8:14       ` Christoffer Dall
2017-11-09 18:14 ` James Morse
2017-11-10 12:03   ` gengdongjiu
2017-11-13 11:29   ` Christoffer Dall
2017-11-13 13:05     ` Peter Maydell
2017-11-20  8:53       ` Christoffer Dall
2017-11-13 16:14     ` Andrew Jones
2017-11-13 17:56       ` Peter Maydell
2017-11-14 16:11       ` James Morse
2017-11-15  9:59         ` gengdongjiu
2017-11-14 16:03     ` James Morse
2017-11-15  9:15       ` gengdongjiu
2017-11-15 18:25         ` James Morse
2017-11-21 11:31           ` gengdongjiu
2017-11-20  8:55       ` Christoffer Dall
