Re: [PATCH 1/7] KVM: Document KVM_MAP_MEMORY ioctl


From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	isaku.yamahata@intel.com, xiaoyao.li@intel.com,
	binbin.wu@linux.intel.com, rick.p.edgecombe@intel.com
Subject: Re: [PATCH 1/7] KVM: Document KVM_MAP_MEMORY ioctl
Date: Wed, 17 Apr 2024 13:28:06 -0700
Message-ID: <ZiAw1jd8840jXqok@google.com>
In-Reply-To: <20240417153450.3608097-2-pbonzini@redhat.com>

On Wed, Apr 17, 2024, Paolo Bonzini wrote:
> +4.143 KVM_MAP_MEMORY
> +------------------------
> +
> +:Capability: KVM_CAP_MAP_MEMORY
> +:Architectures: none
> +:Type: vcpu ioctl
> +:Parameters: struct kvm_map_memory (in/out)
> +:Returns: 0 on success, < 0 on error
> +
> +Errors:
> +
> +  ========== ===============================================================
> +  EINVAL     The specified `base_address` and `size` were invalid (e.g. not
> +             page aligned or outside the defined memory slots).

"outside the memslots" should probably be -EFAULT, i.e. keep EINVAL for things
that can _never_ succeed.
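
To make the split concrete, here is a rough sketch of the classification,
not KVM code.  Every name in it (slot_sketch, check_range_sketch, the 4KiB
page size) is made up for illustration.  Misaligned or wrapping ranges can
never succeed and get -EINVAL; a well-formed range that misses the current
memslots gets -EFAULT, since a later memslot update could make it valid.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096ULL	/* assumed page size, illustration only */

/* Hypothetical stand-in for a memslot: [base_gpa, base_gpa + npages pages). */
struct slot_sketch {
	uint64_t base_gpa;
	uint64_t npages;
};

static bool gpa_in_slots(const struct slot_sketch *slots, int nr, uint64_t gpa)
{
	for (int i = 0; i < nr; i++) {
		uint64_t end = slots[i].base_gpa +
			       slots[i].npages * PAGE_SIZE_SKETCH;

		if (gpa >= slots[i].base_gpa && gpa < end)
			return true;
	}
	return false;
}

/*
 * Returns 0 if the range is acceptable, -EINVAL if the request can never
 * succeed, or -EFAULT if it fails only because of the current memslots.
 */
static int check_range_sketch(const struct slot_sketch *slots, int nr,
			      uint64_t gpa, uint64_t size)
{
	/* Misaligned or wrapping ranges are malformed regardless of VM state. */
	if (((gpa | size) & (PAGE_SIZE_SKETCH - 1)) || gpa + size < gpa)
		return -EINVAL;

	/* Well-formed, but every page must be backed by some memslot. */
	for (uint64_t cur = gpa; cur < gpa + size; cur += PAGE_SIZE_SKETCH) {
		if (!gpa_in_slots(slots, nr, cur))
			return -EFAULT;
	}
	return 0;
}
```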

> +  EAGAIN     The ioctl should be invoked again and no page was processed.
> +  EINTR      An unmasked signal is pending and no page was processed.

I'm guessing we'll want to handle large ranges, at which point we'll likely end
up with EAGAIN and/or EINTR after processing at least one page.

> +  EFAULT     The parameter address was invalid.
> +  EOPNOTSUPP The architecture does not support this operation, or the
> +             guest state does not allow it.

I would phrase this as something like:

                Mapping memory for a given GPA is unsupported by the
                architecture, and/or for the current vCPU state/mode.

It's not that the guest state doesn't "allow" it, it's that it's explicitly
unsupported because it's nonsensical without a GVA (or L2 GPA).

> +  ========== ===============================================================
> +
> +::
> +
> +  struct kvm_map_memory {
> +	/* in/out */
> +	__u64 base_address;

I think we should commit to this being limited to gpa mappings, e.g. go with
"gpa", or "guest_physical_address" if we want to be verbose (I vote for "gpa").

> +	__u64 size;
> +	/* in */
> +	__u64 flags;
> +	__u64 padding[5];
> +  };
> +
> +KVM_MAP_MEMORY populates guest memory in the page tables of a vCPU.

I think we should word this very carefully and explicitly so that KVM doesn't
commit to behavior that can't be guaranteed.  We might even want to use a name
that explicitly captures the semantics, e.g. KVM_PRE_FAULT_MEMORY?

Also, this doesn't populate guest _memory_, and "in the page tables of a vCPU"
could be interpreted as the _guest's_ page tables.

Something like:

  KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory
  for the current vCPU state.  KVM maps memory as if the vCPU generated a
  stage-2 read page fault, e.g. faults in memory as needed, but doesn't break
  CoW.  However, KVM does not mark any newly created stage-2 PTE as Accessed.

> +When the ioctl returns, the input values are updated to point to the
> +remaining range.  If `size` > 0 on return, the caller can just issue
> +the ioctl again with the same `struct kvm_map_memory` argument.

This is likely misleading.  Unless KVM explicitly zeros size on *every* failure,
a pedantic reading of this would suggest that userspace can retry and it should
eventually succeed.
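
For reference, a sketch of the userspace loop that a careful caller would
have to write: retry the remaining range only on EAGAIN/EINTR and treat any
other errno as final, whatever `size` says.  This is an illustration under
assumptions, not the uAPI.  The struct copies the layout quoted above, the
ioctl is hidden behind a hypothetical function pointer (map_fn), and
fake_map exists only so the loop can be exercised without /dev/kvm.

```c
#include <errno.h>
#include <stdint.h>

/* Copies the layout quoted above; the final uAPI may differ. */
struct kvm_map_memory_sketch {
	uint64_t base_address;
	uint64_t size;
	uint64_t flags;
	uint64_t padding[5];
};

/*
 * Hypothetical wrapper around the ioctl: returns 0, or -1 with errno set,
 * and advances base_address/size past whatever was processed.
 */
typedef int (*map_fn)(void *vcpu, struct kvm_map_memory_sketch *range);

static int map_range_sketch(map_fn do_map, void *vcpu,
			    uint64_t gpa, uint64_t size)
{
	struct kvm_map_memory_sketch range = {
		.base_address = gpa,
		.size = size,
	};

	while (range.size) {
		if (do_map(vcpu, &range) == 0)
			continue;	/* progress made, map what is left */
		if (errno == EAGAIN || errno == EINTR)
			continue;	/* transient, retry the remainder */
		return -errno;		/* retrying will not help */
	}
	return 0;
}

/*
 * Fake backend for trying the loop out: maps one 4KiB page per call,
 * reports EINTR on every other call, and EFAULT at or above `limit`.
 */
struct fake_vcpu {
	uint64_t limit;
	unsigned int calls;
};

static int fake_map(void *opaque, struct kvm_map_memory_sketch *range)
{
	struct fake_vcpu *v = opaque;

	if (v->calls++ & 1) {
		errno = EINTR;
		return -1;
	}
	if (range->base_address >= v->limit) {
		errno = EFAULT;
		return -1;
	}
	range->base_address += 4096;
	range->size -= 4096;
	return 0;
}
```

The point of the sketch is the errno check: a loop keyed only on `size` > 0
would spin forever on the EFAULT case.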

> +In some cases, multiple vCPUs might share the page tables.  In this
> +case, if this ioctl is called in parallel for multiple vCPUs the
> +ioctl might return with `size` > 0.

Why?  If there's already a valid mapping, mission accomplished.  I don't see any
reason to return an error.  If x86's page fault path returns RET_PF_RETRY, then I
think it makes sense to retry in KVM, not punt this to userspace.
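
Roughly the shape of what retrying in KVM could look like, as a standalone
sketch rather than real MMU code.  The enum, the ops struct and both
callbacks are hypothetical stand-ins for the fault handler and the pending
signal check; only the control flow is the point.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for KVM's internal fault results; the values are arbitrary. */
enum ret_pf_sketch {
	RET_PF_FIXED_SKETCH,
	RET_PF_RETRY_SKETCH,
};

/* Hypothetical hooks so the loop can be compiled and run outside KVM. */
struct fault_ops_sketch {
	enum ret_pf_sketch (*handle_fault)(void *ctx);
	bool (*signal_pending)(void *ctx);
};

/* Keep retrying the fault internally; bail out only on a pending signal. */
static int map_page_sketch(const struct fault_ops_sketch *ops, void *ctx)
{
	enum ret_pf_sketch r;

	do {
		if (ops->signal_pending(ctx))
			return -EINTR;
		r = ops->handle_fault(ctx);
	} while (r == RET_PF_RETRY_SKETCH);

	return 0;
}

/* Fake backend: asks for a retry `retries` times, then reports success. */
struct fake_ctx {
	int retries;
	bool signal;
};

static enum ret_pf_sketch fake_fault(void *p)
{
	struct fake_ctx *c = p;

	return c->retries-- > 0 ? RET_PF_RETRY_SKETCH : RET_PF_FIXED_SKETCH;
}

static bool fake_signal(void *p)
{
	return ((struct fake_ctx *)p)->signal;
}
```

With that shape, a racing vCPU that installs the mapping first just makes
the next pass succeed, and userspace never sees the retry.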

> +The ioctl may not be supported for all VMs, and may just return
> +an `EOPNOTSUPP` error if a VM does not support it.  You may use
> +`KVM_CHECK_EXTENSION` on the VM file descriptor to check if it is
> +supported.

Why per-VM?  I don't think there's any per-VM state that would change the behavior.
The TDP MMU being enabled is KVM wide, and the guest state modifiers that cause
problems are per-vCPU, not per-VM.

Adding support for KVM_CHECK_EXTENSION on vCPU FDs is probably overkill, e.g. I
don't think it would add much value beyond returning EOPNOTSUPP for the ioctl()
itself.

> +Also, shadow page tables cannot support this ioctl because they
> +are indexed by virtual address or nested guest physical address.
> +Calling this ioctl when the guest is using shadow page tables (for
> +example because it is running a nested guest) will also fail.

Nit: this should say "running a nested guest using TDP".

> +
> +`flags` must currently be zero.
> +
> +
>  5. The kvm_run structure
>  ========================
>  
> -- 
> 2.43.0
> 
>
