linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: james.morse@arm.com (James Morse)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v6 11/13] KVM: arm64: Handle RAS SErrors from EL1 on guest exit
Date: Tue, 30 Jan 2018 19:18:57 +0000	[thread overview]
Message-ID: <5A70C521.2020903@arm.com> (raw)
In-Reply-To: <20180123153245.GN21802@cbox>

Hi Christoffer,

On 23/01/18 15:32, Christoffer Dall wrote:
> On Mon, Jan 22, 2018 at 06:18:54PM +0000, James Morse wrote:
>> On 19/01/18 19:20, Christoffer Dall wrote:
>>> On Mon, Jan 15, 2018 at 07:39:04PM +0000, James Morse wrote:
>>>> If only some of the CPUs support RAS the guest will see the cpufeature
>>>> sanitised version of the id registers, but we may still take RAS SError
>>>> on this CPU. Move the SError handling out of handle_exit() into a new
>>>> handler that runs before we can be preempted. This allows us to use
>>>> this_cpu_has_cap(), via arm64_is_ras_serror().
>>>
>>> Would it be possible to optimize this a bit later on by caching
>>> this_cpu_has_cap() in vcpu_load() so that we can use a single
>>> handle_exit function to process all exits?
>>
>> If vcpu_load() prevents pre-emption between the guest-exit exception and the
>> this_cpu_has_cap() test then we wouldn't need a separate handle_exit().
> 
> It doesn't, but you'd get another vcpu_put() / vcpu_load() if you get
> preempted, and you could record anything you need to know about the CPU
> that actually ran the guest in vcpu_put().

Snazzy!

> So it might be possible to call some "process pending serror" function
> in vcpu_put().

Hmm, maybe. When we exit the guest its because we've had a notification an error
occurred, but we don't know what/where yet. The case that worries me is we
reschedule() onto some other affected task, and it gets notification of the
error too.

For notifications signalled by an SError I'd like to feed them into the RAS
machinery before we unmask SError on the host, so that the first error is
handled first. Otherwise KVM has to eyeball the SError ESR and guess as to
whether the host is affected by the error, before re-enabling preemption on the
grounds it 'probably only affects this guest'.


>> But, if we support kernel-first RAS or firmware-first's NOTIFY_SEI we shouldn't
>> unmask SError until we've fed the guest-exit:SError into the RAS code. This
>> would also need the SError related handle_exit() calls to be separate/earlier.
>> (there was some verbiage on this in the cover letter).
> 
> Yeah, I sort-of understood where this was going...

(sorry, I assume not everyone reads the cover letter!)


>> (I started down the 'make handle_exit() non-preemptible', but WF{E,I}'s
>> kvm_vcpu_block()->schedule() and kvm_vcpu_on_spin()s use of kvm_vcpu_yield_to()
>> put an end to that).
> 
> It's not clear to me exactly how that would work, as handle_exit() can
> also block on stuff like allocating memory.  

Yes, it was a dead end. I figured two handle_exit()s was a bit ugly, I assumed
you were asking about moving back to a single handle_exit()...


> I suppose enabling
> preemption could be per exit reason, but that might be hard to maintain.


>> In terms of caching this_cpu_has_cap() value, is this due to a performance
>> concern? It's all called behind 'exception_index == ARM_EXCEPTION_EL1_SERROR',
>> so we've already taken an SError out of the guest. Once its all put together
>> we're likely to have a pending signal for user-space.
>> 'Corrected' (or at least ignorable) errors are going to be the odd one out, I
>> don't think we should worry about these!
> 
> The performance concern is having to call another function to check the
> return value again in the critical path.

My justification for this sort of thing has been we've taken an SError, we may
panic() the host if its uncontained. Provided there is no extra cost on the 'no
SError' path, I don't think the 'we've taken an SError' paths need to be fast.


> On older implementations this
> kind of thing is actually measureable, and there's a tendency to add a
> call here and a call there for any new aspect of the architecture, and
> it will eventually weigh things down, I believe.

I'll keep this in mind.


> On the other hand,having a "process some things before we enable preemption"
>  which is your handle_exit_early() function (could this also have been called
> handle_exit_nopreempt() ?)

Yes, and that would have been a better name!


Thanks,

James


> is a potentially generally useful thing to
> have and a reasonable thing overall.

  reply	other threads:[~2018-01-30 19:18 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-15 19:38 [PATCH v6 00/13] arm64/KVM: RAS & IESB for firmware first support James Morse
2018-01-15 19:38 ` [PATCH v6 01/13] arm64: cpufeature: __this_cpu_has_cap() shouldn't stop early James Morse
2018-01-15 19:38 ` [PATCH v6 02/13] arm64: sysreg: Move to use definitions for all the SCTLR bits James Morse
2018-01-15 19:38 ` [PATCH v6 03/13] arm64: cpufeature: Detect CPU RAS Extentions James Morse
     [not found]   ` <78d4374a-f75c-6860-38ab-6373ffae3eee@huawei.com>
2018-01-22 19:32     ` James Morse
2018-01-23  9:06       ` gengdongjiu
2018-01-23 19:05         ` James Morse
2018-01-25  8:27           ` gengdongjiu
2018-01-15 19:38 ` [PATCH v6 04/13] arm64: kernel: Survive corrected RAS errors notified by SError James Morse
2018-01-15 19:38 ` [PATCH v6 05/13] arm64: Unconditionally enable IESB on exception entry/return for firmware-first James Morse
2018-01-15 19:38 ` [PATCH v6 06/13] arm64: kernel: Prepare for a DISR user James Morse
2018-01-15 19:39 ` [PATCH v6 07/13] KVM: arm/arm64: mask/unmask daif around VHE guests James Morse
2018-01-15 19:39 ` [PATCH v6 08/13] KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2 James Morse
2018-01-15 19:39 ` [PATCH v6 09/13] KVM: arm64: Save/Restore guest DISR_EL1 James Morse
2018-01-15 19:39 ` [PATCH v6 10/13] KVM: arm64: Save ESR_EL2 on guest SError James Morse
2018-01-15 19:39 ` [PATCH v6 11/13] KVM: arm64: Handle RAS SErrors from EL1 on guest exit James Morse
2018-01-19 19:20   ` Christoffer Dall
2018-01-22 18:18     ` James Morse
2018-01-23 15:32       ` Christoffer Dall
2018-01-30 19:18         ` James Morse [this message]
2018-01-15 19:39 ` [PATCH v6 12/13] KVM: arm64: Handle RAS SErrors from EL2 " James Morse
2018-01-19 19:54   ` Christoffer Dall
2018-01-15 19:39 ` [PATCH v6 13/13] KVM: arm64: Emulate RAS error registers and set HCR_EL2's TERR & TEA James Morse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5A70C521.2020903@arm.com \
    --to=james.morse@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).