public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Nikolas Wipper <nikwip@amazon.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	 Nicolas Saenz Julienne <nsaenz@amazon.com>,
	Alexander Graf <graf@amazon.de>,
	James Gowans <jgowans@amazon.com>,
	 nh-open-source@amazon.com, Thomas Gleixner <tglx@linutronix.de>,
	 Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	linux-kernel@vger.kernel.org,  kvm@vger.kernel.org,
	x86@kernel.org, linux-doc@vger.kernel.org,
	 linux-kselftest@vger.kernel.org, kvmarm@lists.linux.dev,
	 kvm-riscv@lists.infradead.org
Subject: Re: [PATCH 00/15] KVM: x86: Introduce new ioctl KVM_TRANSLATE2
Date: Wed, 11 Dec 2024 14:05:00 -0800	[thread overview]
Message-ID: <Z1oMjKa3gExDxsCN@google.com> (raw)
In-Reply-To: <20240910152207.38974-1-nikwip@amazon.de>

On Tue, Sep 10, 2024, Nikolas Wipper wrote:
> This series introduces a new ioctl KVM_TRANSLATE2, which expands on
> KVM_TRANSLATE. It is required to implement Hyper-V's
> HvTranslateVirtualAddress hyper-call as part of the ongoing effort to
> emulate HyperV's Virtual Secure Mode (VSM) within KVM and QEMU. The hyper-
> call requires several new KVM APIs, one of which is KVM_TRANSLATE2, which
> implements the core functionality of the hyper-call. The rest of the
> required functionality will be implemented in subsequent series.
> 
> Other than translating guest virtual addresses, the ioctl allows the
> caller to control whether the access and dirty bits are set during the
> page walk. It also allows specifying an access mode instead of returning
> viable access modes, which enables setting the bits up to the level that
> caused a failure. Additionally, the ioctl provides more information about
> why the page walk failed, and which page table is responsible. This
> functionality is not available within KVM_TRANSLATE, and can't be added
> without breaking backwards compatiblity, thus a new ioctl is required.

...

>  Documentation/virt/kvm/api.rst                | 131 ++++++++
>  arch/x86/include/asm/kvm_host.h               |  18 +-
>  arch/x86/kvm/hyperv.c                         |   3 +-
>  arch/x86/kvm/kvm_emulate.h                    |   8 +
>  arch/x86/kvm/mmu.h                            |  10 +-
>  arch/x86/kvm/mmu/mmu.c                        |   7 +-
>  arch/x86/kvm/mmu/paging_tmpl.h                |  80 +++--
>  arch/x86/kvm/x86.c                            | 123 ++++++-
>  include/linux/kvm_host.h                      |   6 +
>  include/uapi/linux/kvm.h                      |  33 ++
>  tools/testing/selftests/kvm/Makefile          |   1 +
>  .../selftests/kvm/x86_64/kvm_translate2.c     | 310 ++++++++++++++++++
>  virt/kvm/kvm_main.c                           |  41 +++
>  13 files changed, 724 insertions(+), 47 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/kvm_translate2.c

...

> The simple reason for keeping this functionality in KVM, is that it already
> has a mature, production-level page walker (which is already exposed) and
> creating something similar QEMU would take a lot longer and would be much
> harder to maintain than just creating an API that leverages the existing
> walker.

I'm not convinced that implementing targeted support in QEMU (or any other VMM)
would be at all challenging or a burden to maintain.  I do think duplicating
functionality across multiple VMMs is undesirable, but that's an argument for
creating modular userspace libraries for such functionality.  E.g. I/O APIC
emulation is another one I'd love to move to a common library.

Traversing page tables isn't difficult.  Checking permission bits isn't complex.
Tedious, perhaps.  But not complex.  KVM's rather insane code comes from KVM's
desire to make the checks as performant as possible, because eking out every little
bit of performance matters for legacy shadow paging.  I doubt VSM needs _that_
level of performance.

I say "targeted", because I assume the only use case for VSM is 64-bit non-nested
guests.  QEMU already has a rudimentary supporting for walking guest page tables,
and that code is all of 40 LoC.  Granted, it's heinous and lacks permission checks
and A/D updates, but I would expect a clean implementation with permission checks
and A/D support would clock in around 200 LoC.  Maybe 300.

And ignoring docs and selftests, that's roughly what's being added in this series.
Much of the code being added is quite simple, but there are non-trivial changes
here as well.  E.g. the different ways of setting A/D bits.

My biggest concern is taking on ABI that restricts what KVM can do in its walker.
E.g. I *really* don't like the PKU change.  Yeah, Intel doesn't explicitly define
architectural behavior, but diverging from hardware behavior is rarely a good idea.

Similarly, the behavior of FNAME(protect_clean_gpte)() probably isn't desirable
for the VSM use case.

      parent reply	other threads:[~2024-12-11 22:05 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-10 15:21 [PATCH 00/15] KVM: x86: Introduce new ioctl KVM_TRANSLATE2 Nikolas Wipper
2024-09-10 15:21 ` [PATCH 01/15] KVM: Add API documentation for KVM_TRANSLATE2 Nikolas Wipper
2024-09-10 15:21 ` [PATCH 02/15] KVM: x86/mmu: Abort page walk if permission checks fail Nikolas Wipper
2024-09-10 15:21 ` [PATCH 03/15] KVM: x86/mmu: Introduce exception flag for unmapped GPAs Nikolas Wipper
2024-09-10 15:21 ` [PATCH 04/15] KVM: x86/mmu: Store GPA in exception if applicable Nikolas Wipper
2024-09-10 15:21 ` [PATCH 05/15] KVM: x86/mmu: Introduce flags parameter to page walker Nikolas Wipper
2024-09-10 15:21 ` [PATCH 06/15] KVM: x86/mmu: Implement PWALK_SET_ACCESSED in " Nikolas Wipper
2024-09-10 15:21 ` [PATCH 07/15] KVM: x86/mmu: Implement PWALK_SET_DIRTY " Nikolas Wipper
2024-09-10 15:22 ` [PATCH 08/15] KVM: x86/mmu: Implement PWALK_FORCE_SET_ACCESSED " Nikolas Wipper
2024-09-10 15:22 ` [PATCH 09/15] KVM: x86/mmu: Introduce status parameter to " Nikolas Wipper
2024-09-10 15:22 ` [PATCH 10/15] KVM: x86/mmu: Implement PWALK_STATUS_READ_ONLY_PTE_GPA in " Nikolas Wipper
2024-09-10 15:22 ` [PATCH 11/15] KVM: x86: Introduce generic gva to gpa translation function Nikolas Wipper
2024-09-10 15:22 ` [PATCH 12/15] KVM: Introduce KVM_TRANSLATE2 Nikolas Wipper
2024-09-10 15:22 ` [PATCH 13/15] KVM: Add KVM_TRANSLATE2 stub Nikolas Wipper
2024-09-10 15:22 ` [PATCH 14/15] KVM: x86: Implement KVM_TRANSLATE2 Nikolas Wipper
2024-12-11 22:06   ` Sean Christopherson
2024-09-10 15:22 ` [PATCH 15/15] KVM: selftests: Add test for KVM_TRANSLATE2 Nikolas Wipper
2024-10-04 10:44 ` [PATCH 00/15] KVM: x86: Introduce new ioctl KVM_TRANSLATE2 Nikolas Wipper
2024-12-11 22:05 ` Sean Christopherson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z1oMjKa3gExDxsCN@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=graf@amazon.de \
    --cc=jgowans@amazon.com \
    --cc=kvm-riscv@lists.infradead.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=nh-open-source@amazon.com \
    --cc=nikwip@amazon.de \
    --cc=nsaenz@amazon.com \
    --cc=pbonzini@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox