public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: Anish Moorthy <amoorthy@google.com>,
	maz@kernel.org, kvm@vger.kernel.org,  kvmarm@lists.linux.dev,
	robert.hoo.linux@gmail.com, jthoughton@google.com,
	 dmatlack@google.com, axelrasmussen@google.com,
	peterx@redhat.com,  nadav.amit@gmail.com,
	isaku.yamahata@gmail.com, kconsul@linux.vnet.ibm.com
Subject: Re: [PATCH v7 08/14] KVM: arm64: Enable KVM_CAP_MEMORY_FAULT_INFO and annotate fault in the stage-2 fault handler
Date: Tue, 5 Mar 2024 07:39:45 -0800	[thread overview]
Message-ID: <Zec8stHiFS3KOOPt@google.com> (raw)
In-Reply-To: <ZeZu9D3Ic_1O5CIO@linux.dev>

On Tue, Mar 05, 2024, Oliver Upton wrote:
> On Mon, Mar 04, 2024 at 02:49:07PM -0800, Sean Christopherson wrote:
> 
> [...]
> 
> > The presense of MTE stuff shouldn't affect the fundamental access information,
> 
>   "When FEAT_MTE is implemented, for a synchronous Data Abort on an
>   instruction that directly accesses Allocation Tags, ISV is 0."
> 
> If there is no instruction syndrome, there's insufficient fault context
> to determine if the guest was doing a read or a write.
> 
> > e.g. if the guest was attempting to write, then KVM should set KVM_MEMORY_EXIT_FLAG_WRITE
> > irrespective of whether or not MTE is in play.
> 
> When the MMU generates such an abort, it *is not* read, write, or execute.
> It is a NoTagAccess fault. There is no sane way to describe this in
> terms of RWX.

RWX=0, with KVM_MEMORY_EXIT_FLAG_MTE seems like a reasonable way to communicate
that, no?

> > > > E.g. on the x86 side, KVM intentionally sets reserved bits in SPTEs for
> > > > "caching" emulated MMIO accesses, and the resulting fault captures the
> > > > "reserved bits set" information in register state.  But that's purely an
> > > > (optional) imlementation detail of KVM that should never be exposed to
> > > > userspace.
> > > 
> > > MMIO accesses would show up elsewhere though, right?
> > 
> > Yes, but I don't see how that's relevant.  Maybe I'm just misunderstanding what
> > you're saying/asking.
> 
> If "reserved" EPT violations found their way to userspace via the
> "memory fault" exit structure then that'd likely be due to a KVM bug.

Heh, sadly no.  Userspace can convert a GFN to private at any time, and the
TDX and SNP specs allow for implicit converstion "requests" from the guest, i.e.
KVM isn't allowed to kill the guest if the guest accesses a GFN with the "wrong"
private/shared attribute.

This particular case would likely be hit only if there's a userspace and/or guest
bug, but whether the setup is broken or just unique isn't KVM's call to make.

> The only expected flows in the near term are this and CoCo crap.
> 
> > > Either way, I have no issues whatsoever if the direction for x86 is to
> > > provide abstracted fault information.
> > 
> > I don't understand how ARM can get away with NOT providing a layer of abstraction.
> > Copying fault state verbatim to userspace will bleed KVM implementation details
> > into userspace,
> 
> The memslot flag already bleeds KVM implementation detail into userspace
> to a degree. The event we're trying to let userspace handle is at the
> intersection of a specific hardware/software state.

Yes, but IMO there's a huge difference between userspace knowing that KVM uses gup()
and memslots to translate gfn=>hva=>pfn, or even knowing that KVM plays games with
reserved stage-2 PTE bits, and userspace knowing exactly how KVM configures stage-2
PTEs.

Another example would be dirty logging on Intel CPUs.  The *admin* can decide
whether to use a write-protection scheme or page-modification logging, but KVM's
ABI with userspace provides a layer of abstraction (dirty ring or bitmap) so that
the userspace VMM doesn't need to do X for write-protection and Y for PML.

> > Abstracting gory hardware details from userspace is one of the main roles of the
> > kernel.
> 
> Where it can be accomplished without a loss (or misrepresentation) of
> information, agreed. But KVM UAPI is so architecture-specific that it
> seems arbitrary to draw the line here.

I don't think it's arbitrary.  KVM's existing uAPI for mapping memory into the
guest is almost entirely arch-neutral, and I want to preserve that for related
functionality unless it's literally impossible to do so.

> > A concrete example of hardware throwing a wrench in things is AMD's upcoming
> > "encrypted" flag (in the stage-2 page fault error code), which is set by SNP-capable
> > CPUs for *any* VM that supports guest-controlled encrypted memory.  If KVM reported
> > the page fault error code directly to userspace, then running the same VM on
> > different hardware generations, e.g. after live migration, would generate different
> > error codes.
> >  
> > Are we talking past each other?  I'm genuinely confused by the pushback on
> > capturing RWX information.  Yes, the RWX info may be insufficient in some cases,
> > but its existence doesn't preclude KVM from providing more information as needed.
> 
> My pushback isn't exactly on RWX (even though I noted the MTE quirk
> above). What I'm poking at here is the general infrastructure for
> reflecting faults into userspace, which is aggressively becoming more
> relevant.

But the purpose of memory_fault isn't to reflect faults into userspace, it's to
alert userspace that KVM has encountered a memory fault that requires userspace
action to resolve.

That distinction matters because there are and will be MMU features that KVM
supports, and that can generate novel faults, but such faults won't be punted to
userspace unless KVM provides a way for userspace to explicitly control the MMU
feature.

And if KVM lets userspace control a feature, then KVM needs new uAPI to expose
the controls.  Which means that we should always have an opportunity to expand
memory_fault, e.g. with new flags, to support such features.

  reply	other threads:[~2024-03-05 15:39 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-15 23:53 [PATCH v7 00/14] Improve KVM + userfaultfd performance via KVM_EXIT_MEMORY_FAULTs on stage-2 faults Anish Moorthy
2024-02-15 23:53 ` [PATCH v7 01/14] KVM: Clarify meaning of hva_to_pfn()'s 'atomic' parameter Anish Moorthy
2024-02-15 23:53 ` [PATCH v7 02/14] KVM: Add function comments for __kvm_read/write_guest_page() Anish Moorthy
2024-02-15 23:53 ` [PATCH v7 03/14] KVM: Documentation: Make note of the KVM_MEM_GUEST_MEMFD memslot flag Anish Moorthy
2024-04-09 22:47   ` Sean Christopherson
2024-02-15 23:53 ` [PATCH v7 04/14] KVM: Simplify error handling in __gfn_to_pfn_memslot() Anish Moorthy
2024-04-09 22:44   ` Sean Christopherson
2024-02-15 23:53 ` [PATCH v7 05/14] KVM: Define and communicate KVM_EXIT_MEMORY_FAULT RWX flags to userspace Anish Moorthy
2024-02-15 23:53 ` [PATCH v7 06/14] KVM: Add memslot flag to let userspace force an exit on missing hva mappings Anish Moorthy
2024-03-08 22:07   ` Sean Christopherson
2024-03-09  0:46     ` David Matlack
2024-03-11  4:45       ` Oliver Upton
2024-03-11 16:20         ` David Matlack
2024-07-03 17:34           ` Nikita Kalyazin
2024-07-03 20:11             ` David Matlack
2024-07-04 10:10               ` Nikita Kalyazin
2024-03-11 16:36         ` Sean Christopherson
2024-03-11 17:08           ` Anish Moorthy
2024-03-11 21:21             ` Oliver Upton
2024-02-15 23:53 ` [PATCH v7 07/14] KVM: x86: Enable KVM_CAP_EXIT_ON_MISSING and annotate EFAULTs from stage-2 fault handler Anish Moorthy
2024-02-15 23:53 ` [PATCH v7 08/14] KVM: arm64: Enable KVM_CAP_MEMORY_FAULT_INFO and annotate fault in the " Anish Moorthy
2024-03-04 20:00   ` Oliver Upton
2024-03-04 20:10     ` Oliver Upton
2024-03-04 20:32       ` Sean Christopherson
2024-03-04 21:03         ` Oliver Upton
2024-03-04 22:49           ` Sean Christopherson
2024-03-05  1:01             ` Oliver Upton
2024-03-05 15:39               ` Sean Christopherson [this message]
2024-02-15 23:54 ` [PATCH v7 09/14] KVM: arm64: Implement and advertise KVM_CAP_EXIT_ON_MISSING Anish Moorthy
2024-02-15 23:54 ` [PATCH v7 10/14] KVM: selftests: Report per-vcpu demand paging rate from demand paging test Anish Moorthy
2024-04-09 22:49   ` Sean Christopherson
2024-02-15 23:54 ` [PATCH v7 11/14] KVM: selftests: Allow many vCPUs and reader threads per UFFD in " Anish Moorthy
2024-04-09 22:58   ` Sean Christopherson
2024-02-15 23:54 ` [PATCH v7 12/14] KVM: selftests: Use EPOLL in userfaultfd_util reader threads and signal errors via TEST_ASSERT Anish Moorthy
2024-02-15 23:54 ` [PATCH v7 13/14] KVM: selftests: Add memslot_flags parameter to memstress_create_vm() Anish Moorthy
2024-02-15 23:54 ` [PATCH v7 14/14] KVM: selftests: Handle memory fault exits in demand_paging_test Anish Moorthy
2024-02-16  7:36 ` [PATCH v7 00/14] Improve KVM + userfaultfd performance via KVM_EXIT_MEMORY_FAULTs on stage-2 faults Gupta, Pankaj
2024-02-16 20:00   ` Anish Moorthy
2024-02-16 23:40     ` Axel Rasmussen
2024-02-21  7:35       ` Gupta, Pankaj
2024-04-10  0:19 ` Sean Christopherson
2024-05-07 17:38   ` Anish Moorthy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zec8stHiFS3KOOPt@google.com \
    --to=seanjc@google.com \
    --cc=amoorthy@google.com \
    --cc=axelrasmussen@google.com \
    --cc=dmatlack@google.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=jthoughton@google.com \
    --cc=kconsul@linux.vnet.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=maz@kernel.org \
    --cc=nadav.amit@gmail.com \
    --cc=oliver.upton@linux.dev \
    --cc=peterx@redhat.com \
    --cc=robert.hoo.linux@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox