public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	 isaku.yamahata@intel.com, xiaoyao.li@intel.com,
	binbin.wu@linux.intel.com,  rick.p.edgecombe@intel.com
Subject: Re: [PATCH 1/7] KVM: Document KVM_MAP_MEMORY ioctl
Date: Wed, 17 Apr 2024 13:28:06 -0700	[thread overview]
Message-ID: <ZiAw1jd8840jXqok@google.com> (raw)
In-Reply-To: <20240417153450.3608097-2-pbonzini@redhat.com>

On Wed, Apr 17, 2024, Paolo Bonzini wrote:
> +4.143 KVM_MAP_MEMORY
> +------------------------
> +
> +:Capability: KVM_CAP_MAP_MEMORY
> +:Architectures: none
> +:Type: vcpu ioctl
> +:Parameters: struct kvm_map_memory (in/out)
> +:Returns: 0 on success, < 0 on error
> +
> +Errors:
> +
> +  ========== ===============================================================
> +  EINVAL     The specified `base_address` and `size` were invalid (e.g. not
> +             page aligned or outside the defined memory slots).

"outside the memslots" should probably be -EFAULT, i.e. keep EINVAL for things
that can _never_ succeed.

> +  EAGAIN     The ioctl should be invoked again and no page was processed.
> +  EINTR      An unmasked signal is pending and no page was processed.

I'm guessing we'll want to handle large ranges, at which point we'll likely end
up with EAGAIN and/or EINTR after processing at least one page.

> +  EFAULT     The parameter address was invalid.
> +  EOPNOTSUPP The architecture does not support this operation, or the
> +             guest state does not allow it.

I would phrase this as something like:

                Mapping memory given for a GPA is unsupported by the
                architecture, and/or for the current vCPU state/mode.

It's not that the guest state doesn't "allow" it, it's that it's explicitly
unsupported because it's nonsensical without a GVA (or L2 GPA).

> +  ========== ===============================================================
> +
> +::
> +
> +  struct kvm_map_memory {
> +	/* in/out */
> +	__u64 base_address;

I think we should commit to this being limited to gpa mappings, e.g. go with
"gpa", or "guest_physical_address" if we want to be verbose (I vote for "gpa").

> +	__u64 size;
> +	/* in */
> +	__u64 flags;
> +	__u64 padding[5];
> +  };
> +
> +KVM_MAP_MEMORY populates guest memory in the page tables of a vCPU.

I think we should word this very carefully and explicitly so that KVM doesn't
commit to behavior that can't be guaranteed.  We might even want to use a name
that explicitly captures the semantics, e.g. KVM_PRE_FAULT_MEMORY?

Also, this doesn't populate guest _memory_, and "in the page tables of a vCPU"
could be interpreted as the _guest's_ page tables.

Something like:

  KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory
  for the current vCPU state.  KVM maps memory as if the vCPU generated a
  stage-2 read page fault, e.g. faults in memory as needed, but doesn't break
  CoW.  However, KVM does not mark any newly created stage-2 PTE as Accessed.

> +When the ioctl returns, the input values are updated to point to the
> +remaining range.  If `size` > 0 on return, the caller can just issue
> +the ioctl again with the same `struct kvm_map_memory` argument.

This is likely misleading.  Unless KVM explicitly zeros size on *every* failure,
a pedantic reading of this would suggest that userspace can retry and it should
eventually succeed.

> +In some cases, multiple vCPUs might share the page tables.  In this
> +case, if this ioctl is called in parallel for multiple vCPUs the
> +ioctl might return with `size` > 0.

Why?  If there's already a valid mapping, mission accomplished.  I don't see any
reason to return an error.  If x86's page fault path returns RET_PF_RETRY, then I
think it makes sense to retry in KVM, not punt this to userspace.

> +The ioctl may not be supported for all VMs, and may just return
> +an `EOPNOTSUPP` error if a VM does not support it.  You may use
> +`KVM_CHECK_EXTENSION` on the VM file descriptor to check if it is
> +supported.

Why per-VM?  I don't think there's any per-VM state that would change the behavior.
The TDP MMU being enabled is KVM wide, and the guest state modifiers that cause
problems are per-vCPU, not per-VM.

Adding support for KVM_CHECK_EXTENSION on vCPU FDs is probably overkill, e.g. I
don't think it would add much value beyond returning EOPNOTSUPP for the ioctl()
itself.

> +Also, shadow page tables cannot support this ioctl because they
> +are indexed by virtual address or nested guest physical address.
> +Calling this ioctl when the guest is using shadow page tables (for
> +example because it is running a nested guest) will also fail.

Running a nested guest using TDP.

> +
> +`flags` must currently be zero.
> +
> +
>  5. The kvm_run structure
>  ========================
>  
> -- 
> 2.43.0
> 
> 

  reply	other threads:[~2024-04-17 20:28 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-17 15:34 [PATCH v3 0/7] KVM: Guest Memory Pre-Population API Paolo Bonzini
2024-04-17 15:34 ` [PATCH 1/7] KVM: Document KVM_MAP_MEMORY ioctl Paolo Bonzini
2024-04-17 20:28   ` Sean Christopherson [this message]
2024-04-17 20:37     ` Paolo Bonzini
2024-04-19 13:57       ` Xu Yilun
2024-04-17 15:34 ` [PATCH 2/7] KVM: Add KVM_MAP_MEMORY vcpu ioctl to pre-populate guest memory Paolo Bonzini
2024-04-17 19:36   ` Isaku Yamahata
2024-04-17 21:07     ` Sean Christopherson
2024-04-17 21:13       ` Paolo Bonzini
2024-04-19 13:59     ` Xu Yilun
2024-04-19 14:08       ` Sean Christopherson
2024-04-19 14:01   ` Xu Yilun
2024-04-17 15:34 ` [PATCH 3/7] KVM: x86/mmu: Extract __kvm_mmu_do_page_fault() Paolo Bonzini
2024-04-17 19:47   ` Isaku Yamahata
2024-04-17 15:34 ` [PATCH 4/7] KVM: x86/mmu: Make __kvm_mmu_do_page_fault() return mapped level Paolo Bonzini
2024-04-17 15:34 ` [PATCH 5/7] KVM: x86/mmu: Introduce kvm_tdp_map_page() to populate guest memory Paolo Bonzini
2024-04-17 21:24   ` Sean Christopherson
2024-04-17 21:31     ` Paolo Bonzini
2024-04-17 22:26       ` Sean Christopherson
2024-04-17 21:34     ` Sean Christopherson
2024-04-17 21:47       ` Paolo Bonzini
2024-04-17 15:34 ` [PATCH 6/7] KVM: x86: Implement kvm_arch_vcpu_map_memory() Paolo Bonzini
2024-04-17 19:28   ` Isaku Yamahata
2024-04-17 21:37   ` Sean Christopherson
2024-04-17 15:34 ` [PATCH 7/7] KVM: selftests: x86: Add test for KVM_MAP_MEMORY Paolo Bonzini
2024-04-18  0:01 ` [PATCH v3 0/7] KVM: Guest Memory Pre-Population API Edgecombe, Rick P
2024-04-18  0:31   ` Paolo Bonzini
2024-04-18  0:33     ` Edgecombe, Rick P

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZiAw1jd8840jXqok@google.com \
    --to=seanjc@google.com \
    --cc=binbin.wu@linux.intel.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=xiaoyao.li@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox