From: Christoffer Dall <cdall@linaro.org>
To: James Morse <james.morse@arm.com>
Cc: Jonathan.Zhang@cavium.com, Marc Zyngier <marc.zyngier@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Julien Thierry <julien.thierry@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	wangxiongfeng2@huawei.com, linux-arm-kernel@lists.infradead.org,
	Dongjiu Geng <gengdongjiu@huawei.com>,
	kvmarm@lists.cs.columbia.edu
Subject: Re: [PATCH v4 00/21] SError rework + RAS&IESB for firmware first support
Date: Mon, 20 Nov 2017 09:55:18 +0100
Message-ID: <20171120085518.GD28855@cbox>
In-Reply-To: <5A0B13B5.3000205@arm.com>

On Tue, Nov 14, 2017 at 04:03:01PM +0000, James Morse wrote:
> Hi Christoffer,
> 
> On 13/11/17 11:29, Christoffer Dall wrote:
> > On Thu, Nov 09, 2017 at 06:14:56PM +0000, James Morse wrote:
> >> On 19/10/17 15:57, James Morse wrote:
> >>> Known issues:
> >> [...]
> >>>  * KVM-Migration: VDISR_EL2 is exposed to userspace as DISR_EL1, but how should
> >>>    HCR_EL2.VSE or VSESR_EL2 be migrated when the guest has an SError pending but
> >>>    hasn't taken it yet...?
> >>
> >> I've been trying to work out how this pending-SError-migration could work.
> >>
> >> If HCR_EL2.VSE is set then the guest will take a virtual SError when it next
> >> unmasks SError. Today this doesn't get migrated, but only KVM sets this bit as
> >> an attempt to kill the guest.
> >>
> >> This will be more of a problem with GengDongjiu's SError CAP for triggering
> >> guest SError from user-space, which will also allow the VSESR_EL2 to be
> >> specified. (this register becomes the guest ESR_EL1 when the virtual SError is
> >> taken and is used to emulate firmware-first's NOTIFY_SEI and eventually
> >> kernel-first RAS). These errors are likely to be handled by the guest.
> >>
> >>
> >> We don't want to expose VSESR_EL2 to user-space, and for migration it isn't
> >> enough as a value of '0' doesn't tell us if HCR_EL2.VSE is set.
> >>
> >> To get out of this corner: why not declare pending-SError-migration an invalid
> >> thing to do?
> 
> > To answer that question we'd have to know if that is generally a valid
> > thing to require.  How will higher level tools in the stack deal with
> > this (e.g. libvirt, and OpenStack).  Is it really valid to tell them
> > "nope, can't migrate right now".  I'm thinking if you have a failing
> > host and want to signal some error to the guest, that's probably a
> > really good time to migrate your mission-critical VM away to a different
> > host, and being told, "sorry, cannot do this" would be painful.  I'm
> > cc'ing Drew for his insight into libvirt and how this is done on x86,
> 
> Thanks,
> 
> 
> > but I'm not really crazy about this idea.
> 
> Excellent, so at the other extreme we could have an API to query all of this
> state, and another to set it. On systems without the RAS extensions this just
> moves the HCR_EL2.VSE bit. On systems with the RAS extensions it moves VSESR_EL2
> too.
> 
> I was hoping to avoid exposing different information. I need to look into how
> that works. (and this is all while avoiding adding an EL2 register to
> vcpu_sysreg [0])
> 
> 
> >> We can give Qemu a way to query if a virtual SError is (still) pending. Qemu
> >> would need to check this on each vcpu after migration, just before it throws the
> >> switch and the guest runs on the new host. This way the VSESR_EL2 value doesn't
> >> need migrating at all.
> >>
> >> In the ideal world, Qemu could re-inject the last SError it triggered if there
> >> is still one pending when it migrates... but because KVM injects errors too, it
> >> would need to block migration until this flag is cleared.
> 
> > I don't understand your conclusion here.
> 
> I was trying to reduce it to exposing just HCR_EL2.VSE as 'bool
> serror_still_pending()', then let Qemu re-inject whatever SError it injected
> last. This then behaves the same regardless of the RAS support.
> But KVM's kvm_inject_vabt() breaks this, Qemu can't know whether this pending
> SError was from Qemu, or from KVM.
> 
> ... So we need VSESR_EL2 on systems which have that register ...
> 
> (or, get rid of kvm_inject_vabt(), but that would involve a new exit type, and
> some trickery for existing user-space)
> 
> > If QEMU can query the virtual SError pending state, it can also inject
> > that before running the VM after a restore, and we should have preserved
> > the same state.
> 
> [..]
> 
> >> Can anyone suggest a better way?
> 
> > I'm thinking this is analogous to migrating a VM that uses an irqchip in
> > userspace and has set the IRQ or FIQ lines using KVM_IRQ_LINE.  My
> > feeling is that this is also not supported today.
> 
> Does KVM change/update these values behind Qemu's back? It's kvm_inject_vabt()
> that is making this tricky. (or at least confusing me)
> 

Yes: userspace can set the IRQ line high, and KVM can then lower it once
the guest has taken the virtual IRQ/FIQ.  I think that is directly
analogous to your problem.
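
To make the shape of the problem concrete, here is a rough sketch of the
state that would have to travel with the vcpu.  It is only a toy model:
struct vcpu_model, struct serror_state and all four helpers are made-up
names for illustration, not existing KVM code or a proposed ABI.  The point
it tries to show is that the pending flag has to be carried separately from
the syndrome, since a syndrome of 0 says nothing about HCR_EL2.VSE, and that
the state must be queried at save time because the guest may have taken the
SError in the meantime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of per-vcpu virtual SError state; not a kernel struct. */
struct vcpu_model {
	bool     vse;      /* models HCR_EL2.VSE: a virtual SError is pending */
	bool     has_ras;  /* models a host that has the RAS extensions */
	uint64_t vsesr;    /* models VSESR_EL2; only meaningful with has_ras */
};

/* Hypothetical migration record: pending flag and syndrome kept apart. */
struct serror_state {
	bool     pending;
	bool     has_esr;
	uint64_t esr;
};

/* Userspace or the hypervisor makes a virtual SError pending. */
static void inject_serror(struct vcpu_model *v, uint64_t esr)
{
	v->vse = true;
	v->vsesr = v->has_ras ? esr : 0;
}

/* The guest unmasks and takes the SError: pending state goes away
 * without userspace being told. */
static void guest_takes_serror(struct vcpu_model *v)
{
	v->vse = false;
}

/* Query the current state just before the vcpu is migrated. */
static struct serror_state save_serror(const struct vcpu_model *v)
{
	struct serror_state s = {
		.pending = v->vse,
		.has_esr = v->vse && v->has_ras,
		.esr     = 0,
	};

	if (s.has_esr)
		s.esr = v->vsesr;
	return s;
}

/* Re-create the same pending state on the destination before it runs. */
static void restore_serror(struct vcpu_model *v, const struct serror_state *s)
{
	/* A syndrome without a pending SError would be an inconsistent record. */
	assert(!s->has_esr || s->pending);

	if (!s->pending) {
		v->vse = false;
		v->vsesr = 0;
		return;
	}
	inject_serror(v, s->has_esr ? s->esr : 0);
}
```

With something of this shape it would not matter whether the pending SError
came from userspace or from kvm_inject_vabt(): save_serror() reports
whatever is pending at that moment, and restore_serror() puts the same
thing back on the new host.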

Thanks,
-Christoffer
