From: Oliver Upton <oliver.upton@linux.dev>
To: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: maz@kernel.org, Will Deacon <will@kernel.org>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: KVM/arm64: SPE: Translate VA to IPA on a stage 2 fault instead of pinning VM memory
Date: Tue, 26 Jul 2022 10:51:21 -0700
Message-ID: <YuApmZFdZzTi5ROu@google.com>
In-Reply-To: <Yt5nFAscgrRGNGoH@monolith.localdoman>

Hi Alex,

On Mon, Jul 25, 2022 at 11:06:24AM +0100, Alexandru Elisei wrote:

[...]

> > A funkier approach might be to defer pinning of the buffer until the SPE is
> > enabled and avoid pinning all of VM memory that way, although I can't
> > immediately tell how flexible the architecture is in allowing you to cache
> > the base/limit values.
> 
> I was investigating this approach, and Mark raised a concern that I think
> might be a showstopper.
> 
> Let's consider this scenario:
> 
> Initial conditions: guest at EL1, profiling disabled (PMBLIMITR_EL1.E = 0,
> PMBSR_EL1.S = 0, PMSCR_EL1.{E0SPE,E1SPE} = {0,0}).
> 
> 1. Guest programs the buffer and enables it (PMBLIMITR_EL1.E = 1).
> 2. Guest programs SPE to enable profiling at **EL0**
> (PMSCR_EL1.{E0SPE,E1SPE} = {1,0}).
> 3. Guest changes the translation table entries for the buffer. The
> architecture allows this.
> 4. Guest does an ERET to EL0, thus enabling profiling.
> 
> Since KVM cannot trap the ERET to EL0, it will be impossible for KVM to pin
> the buffer at stage 2 when profiling gets enabled at EL0.

Not saying we necessarily should, but this is possible with FGT, no?
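
To make the scenario above concrete, here is a minimal sketch of the
enable condition as a predicate. It is a simplified model built only from
the fields named in the initial conditions; the struct and function names
are hypothetical, and the other architectural controls that gate profiling
are ignored. The point is that step 4 changes the result without any
system register write:

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the guest-visible SPE state. */
struct spe_model {
	bool pmblimitr_e;	/* PMBLIMITR_EL1.E: buffer enabled */
	bool pmbsr_s;		/* PMBSR_EL1.S: buffer management event pending */
	bool e0spe;		/* PMSCR_EL1.E0SPE: sample at EL0 */
	bool e1spe;		/* PMSCR_EL1.E1SPE: sample at EL1 */
};

/*
 * Returns true if, in this model, sampling is active while the guest
 * runs at exception level el (0 or 1).
 */
static bool spe_model_active(const struct spe_model *s, int el)
{
	if (!s->pmblimitr_e || s->pmbsr_s)
		return false;

	return el == 0 ? s->e0spe : s->e1spe;
}
```

With the state left by steps 1 and 2, spe_model_active() is false for EL1
and true for EL0, so the transition happens on the ERET itself.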

> I can see two solutions here:
> 
> a. Accept the limitation (and advertise it in the documentation) that if
> someone wants to use SPE when running as a Linux guest, the kernel used by
> the guest must not change the buffer translation table entries after the
> buffer has been enabled (PMBLIMITR_EL1.E = 1). Linux already does that, so
> running a Linux guest should not be a problem. I don't know how other OSes
> do it (but I can find out). We could also phrase it that the buffer
> translation table entries can be changed after enabling the buffer, but
> only if profiling happens at EL1. But that sounds very arbitrary.
> 
> b. Pin the buffer after the stage 2 DABT that SPE will report in the
> situation above. This means that there is a blackout window, but will
> happen only once after each time the guest reprograms the buffer. I don't
> know if this is acceptable. We could say that if this blackout window
> is not acceptable, then the guest kernel shouldn't change the translation
> table entries after enabling the buffer.
> 
> Or drop the approach of pinning the buffer and go back to pinning the
> entire memory of the VM.
> 
> Any thoughts on this? I would very much prefer to try to pin only the
> buffer.

Doesn't pinning the buffer also imply pinning the stage 1 tables
responsible for its translation as well? I agree that pinning the buffer
is likely the best way forward as pinning the whole of guest memory is
entirely impractical.

I'm also a bit confused on how we would manage to un-pin memory on the
way out with this. The guest is free to muck with the stage 1 and could
cause the SPU to spew a bunch of stage 2 aborts if it wanted to be
annoying. One way to tackle it would be to only allow a single
root-to-target walk to be pinned by a vCPU at a time. Any time a new
stage 2 abort comes from the SPU, we un-pin the old walk and pin the new
one instead.
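
A rough sketch of that bookkeeping, to show that the number of pinned
pages per vCPU stays bounded no matter how many aborts the guest provokes.
Everything here is hypothetical: the names, the WALK_MAX_PAGES bound and
the pin/unpin callbacks are stand-ins, not existing KVM interfaces, and the
stage 1 walk that produces the list of IPAs is out of scope:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: at most 4 stage 1 table levels plus the target page. */
#define WALK_MAX_PAGES	5

/* The single walk a vCPU is allowed to keep pinned. */
struct pinned_walk {
	uint64_t ipa[WALK_MAX_PAGES];
	size_t nr;
};

/* Stand-ins for whatever would really pin/unpin a page at stage 2. */
typedef bool (*pin_fn)(uint64_t ipa);
typedef void (*unpin_fn)(uint64_t ipa);

/* Drop every page of the current walk, newest first. */
static void walk_unpin(struct pinned_walk *w, unpin_fn unpin)
{
	while (w->nr > 0)
		unpin(w->ipa[--w->nr]);
}

/*
 * Called on a stage 2 abort from the SPU: un-pin the old walk, then pin
 * the new one. A request that is too long is rejected and the old walk
 * is kept; if pinning fails part way, nothing is left pinned.
 */
static bool walk_replace(struct pinned_walk *cur, const uint64_t *ipa,
			 size_t nr, pin_fn pin, unpin_fn unpin)
{
	if (nr > WALK_MAX_PAGES)
		return false;

	walk_unpin(cur, unpin);

	for (size_t i = 0; i < nr; i++) {
		if (!pin(ipa[i])) {
			walk_unpin(cur, unpin);
			return false;
		}
		cur->ipa[cur->nr++] = ipa[i];
	}

	return true;
}

/* Demo callbacks that only count how many pages are pinned. */
static int demo_pinned;

static bool demo_pin(uint64_t ipa)
{
	(void)ipa;
	demo_pinned++;
	return true;
}

static void demo_unpin(uint64_t ipa)
{
	(void)ipa;
	demo_pinned--;
}
```

Calling walk_replace() twice with different walks leaves only the second
one pinned, so a guest that keeps changing its stage 1 can never hold more
than WALK_MAX_PAGES pages per vCPU.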

Live migration also throws a wrench in this. IOW, there are still potential
sources of blackout unattributable to guest manipulation of the SPU.

Going to think on this some more..

--
Thanks,
Oliver