From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>,
maz@kernel.org, kvmarm@lists.cs.columbia.edu,
linux-arm-kernel@lists.infradead.org
Subject: Re: KVM/arm64: SPE: Translate VA to IPA on a stage 2 fault instead of pinning VM memory
Date: Wed, 27 Jul 2022 11:19:20 +0100
Message-ID: <YuERKEjJh1qsZf8x@monolith.localdoman>
In-Reply-To: <YuApmZFdZzTi5ROu@google.com>

Hi Oliver,

Thank you for the help, replies below.

On Tue, Jul 26, 2022 at 10:51:21AM -0700, Oliver Upton wrote:
> Hi Alex,
>
> On Mon, Jul 25, 2022 at 11:06:24AM +0100, Alexandru Elisei wrote:
>
> [...]
>
> > > A funkier approach might be to defer pinning of the buffer until the SPE is
> > > enabled and avoid pinning all of VM memory that way, although I can't
> > > immediately tell how flexible the architecture is in allowing you to cache
> > > the base/limit values.
> >
> > I was investigating this approach, and Mark raised a concern that I think
> > might be a showstopper.
> >
> > Let's consider this scenario:
> >
> > Initial conditions: guest at EL1, profiling disabled (PMBLIMITR_EL1.E = 0,
> > PMBSR_EL1.S = 0, PMSCR_EL1.{E0SPE,E1SPE} = {0,0}).
> >
> > 1. Guest programs the buffer and enables it (PMBLIMITR_EL1.E = 1).
> > 2. Guest programs SPE to enable profiling at **EL0**
> > (PMSCR_EL1.{E0SPE,E1SPE} = {1,0}).
> > 3. Guest changes the translation table entries for the buffer. The
> > architecture allows this.
> > 4. Guest does an ERET to EL0, thus enabling profiling.
> >
> > Since KVM cannot trap the ERET to EL0, it will be impossible for KVM to pin
> > the buffer at stage 2 when profiling gets enabled at EL0.
>
> Not saying we necessarily should, but this is possible with FGT no?

It doesn't look to me like FEAT_FGT offers any knobs to trap ERET from EL1.
Unless there's no other way, I would prefer not to have the emulation of one
feature depend on the presence of another feature.

>
> > I can see two solutions here:
> >
> > a. Accept the limitation (and advertise it in the documentation) that if
> > someone wants to use SPE when running as a Linux guest, the kernel used by
> > the guest must not change the buffer translation table entries after the
> > buffer has been enabled (PMBLIMITR_EL1.E = 1). Linux already does that, so
> > running a Linux guest should not be a problem. I don't know how other OSes
> > do it (but I can find out). We could also phrase it that the buffer
> > translation table entries can be changed after enabling the buffer, but
> > only if profiling happens at EL1. But that sounds very arbitrary.
> >
> > b. Pin the buffer after the stage 2 DABT that SPE will report in the
> > situation above. This means that there is a blackout window, but will
> > happen only once after each time the guest reprograms the buffer. I don't
> > know if this is acceptable. We could say that this if this blackout window
> > is not acceptable, then the guest kernel shouldn't change the translation
> > table entries after enabling the buffer.
> >
> > Or drop the approach of pinning the buffer and go back to pinning the
> > entire memory of the VM.
> >
> > Any thoughts on this? I would very much prefer to try to pin only the
> > buffer.
>
> Doesn't pinning the buffer also imply pinning the stage 1 tables
> responsible for its translation as well? I agree that pinning the buffer

See my reply [1] to a question someone asked in an earlier iteration of the
pKVM series. My conclusion is that it's impossible to stop the
invalidate_range_start() MMU notifiers from being invoked for pinned pages.
But I believe that can be circumvented by passing the enum mmu_notifier_event
event field to the arm64 KVM code and using it to decide whether to do the
unmapping. I am still investigating that, but it looks promising.

[1] https://lore.kernel.org/all/YuEMkKY2RU%2F2KiZW@monolith.localdoman/
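
Very roughly, the filtering could look like the userspace sketch below. The
enum is a local stand-in that mirrors, to the best of my knowledge, the
kernel's enum mmu_notifier_event; struct s2_range and s2_should_unmap() are
hypothetical names, and the choice of which events still force an unmap is
purely illustrative, since that is exactly the part I am still investigating.

```c
#include <stdbool.h>

/*
 * Local stand-in for the kernel's enum mmu_notifier_event
 * (include/linux/mmu_notifier.h), redefined so the sketch builds in
 * userspace.
 */
enum mmu_notifier_event {
	MMU_NOTIFY_UNMAP = 0,
	MMU_NOTIFY_CLEAR,
	MMU_NOTIFY_PROTECTION_VMA,
	MMU_NOTIFY_PROTECTION_PAGE,
	MMU_NOTIFY_SOFT_DIRTY,
	MMU_NOTIFY_RELEASE,
	MMU_NOTIFY_MIGRATE,
	MMU_NOTIFY_EXCLUSIVE,
};

/* Hypothetical description of the range being invalidated. */
struct s2_range {
	bool pinned_for_spe;            /* range backs a pinned SPE buffer */
	enum mmu_notifier_event event;  /* reason reported by the notifier */
};

/* Decide whether stage 2 should be unmapped for this invalidation. */
static bool s2_should_unmap(const struct s2_range *range)
{
	/* Ranges that are not pinned keep today's behaviour. */
	if (!range->pinned_for_spe)
		return true;

	/* Illustrative policy only: honour teardown, skip everything else. */
	switch (range->event) {
	case MMU_NOTIFY_UNMAP:
	case MMU_NOTIFY_RELEASE:
		return true;
	default:
		return false;
	}
}
```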
> is likely the best way forward as pinning the whole of guest memory is
> entirely impractical.

I would say it's undesirable, not impractical. As Marc said, VFIO already
pins the entire guest memory with the VFIO_IOMMU_MAP_DMA ioctl. The
difference there is that the SMMU tables are unmapped via the explicit
ioctl VFIO_IOMMU_UNMAP_DMA; the SMMU doesn't use the MMU notifiers to keep
in sync with the host's stage 1 like KVM does.

>
> I'm also a bit confused on how we would manage to un-pin memory on the
> way out with this. The guest is free to muck with the stage 1 and could
> cause the SPU to spew a bunch of stage 2 aborts if it wanted to be
> annoying. One way to tackle it would be to only allow a single
> root-to-target walk to be pinned by a vCPU at a time. Any time a new
> stage 2 abort comes from the SPU, we un-pin the old walk and pin the new
> one instead.
>
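
If I understand the single-walk idea correctly, the bookkeeping would be
something like the toy model below. Everything in it is an assumption made
for illustration: the names, the pin_count[] array standing in for real page
pinning, and the limit of five pages per walk (four table levels plus the
target page).

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_WALK_PAGES 5   /* assumption: 4 table levels + the target page */
#define NR_PAGES       64  /* toy guest size, in pages */

/* Stand-in for real page pinning: one counter per guest frame. */
static unsigned int pin_count[NR_PAGES];

/* Hypothetical per-vCPU record of the walk currently pinned for SPE. */
struct vcpu_spe_pin {
	uint64_t gfn[MAX_WALK_PAGES];
	size_t nr;
};

/* Drop the pins held for the vCPU's current walk, if any. */
static void spe_unpin_walk(struct vcpu_spe_pin *p)
{
	for (size_t i = 0; i < p->nr; i++)
		pin_count[p->gfn[i]]--;
	p->nr = 0;
}

/*
 * Called on a stage 2 abort reported by the SPU: pin the new walk and
 * release the old one. Returns 0 on success, -1 on invalid input.
 */
static int spe_repin_walk(struct vcpu_spe_pin *p, const uint64_t *gfn, size_t nr)
{
	if (nr > MAX_WALK_PAGES)
		return -1;
	for (size_t i = 0; i < nr; i++)
		if (gfn[i] >= NR_PAGES)
			return -1;

	/* Pin new pages first so pages shared by both walks stay pinned. */
	for (size_t i = 0; i < nr; i++)
		pin_count[gfn[i]]++;
	spe_unpin_walk(p);

	for (size_t i = 0; i < nr; i++)
		p->gfn[i] = gfn[i];
	p->nr = nr;
	return 0;
}
```

This bounds the pinned memory per vCPU no matter how many aborts the guest
provokes, at the cost of a new blackout window on every abort.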
> Live migration also throws a wrench in this. IOW, there are still potential
> sources of blackout unattributable to guest manipulation of the SPU.

I have a proposal to handle that [2], if you want to have a look.
Basically, userspace tells KVM to never allow the guest to start profiling.
That means a possibly huge blackout window while the guest is being
migrated, but I don't see any better solutions.

[2] https://lore.kernel.org/all/20211117153842.302159-35-alexandru.elisei@arm.com/

Thanks,
Alex