From: Peter Xu <peterx@redhat.com>
To: Xu Yilun <yilun.xu@linux.intel.com>
Cc: "Alexey Kardashevskiy" <aik@amd.com>,
	"Chenyi Qiang" <chenyi.qiang@intel.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Michael Roth" <michael.roth@amd.com>,
	qemu-devel@nongnu.org, kvm@vger.kernel.org,
	"Williams Dan J" <dan.j.williams@intel.com>,
	"Peng Chao P" <chao.p.peng@intel.com>,
	"Gao Chao" <chao.gao@intel.com>, "Xu Yilun" <yilun.xu@intel.com>
Subject: Re: [PATCH 2/7] guest_memfd: Introduce an object to manage the guest-memfd with RamDiscardManager
Date: Thu, 6 Feb 2025 15:03:10 -0500
Message-ID: <Z6UVfrllv-Ahex73@x1.local>
In-Reply-To: <Z6SRxV83I9/kamop@yilunxu-OptiPlex-7050>

On Thu, Feb 06, 2025 at 06:41:09PM +0800, Xu Yilun wrote:
> On Thu, Jan 30, 2025 at 11:28:11AM -0500, Peter Xu wrote:
> > On Sun, Jan 26, 2025 at 11:34:29AM +0800, Xu Yilun wrote:
> > > > Definitely not suggesting to install an invalid pointer anywhere.  The
> > > > mapped pointer will still be valid for gmem, for example, but the fault
> > > > isn't.  We need to differentiate two things: (1) the virtual address
> > > > mapping, and (2) permissions and accesses on the folios / pages of the
> > > > mapping.  Here I think it's okay if the host pointer is correctly mapped.
> > > > 
> > > > For your private MMIO use case, my question is: if there's no host pointer
> > > > to be mapped anyway, then what's the benefit of making the MR ram=on?
> > > > Can we simply make it a normal IO memory region?  The only benefit of a
> > > 
> > > Guest access to a normal IO memory region would be emulated by QEMU,
> > > while privately assigned MMIO requires direct guest access via Secure EPT.
> > > 
> > > It seems the existing code doesn't support direct guest access when
> > > mr->ram == false:
> > 
> > Ah, it's about this, OK.
> > 
> > I'm not sure what the best approach is, but IMHO it's still better that we
> > stick with the host pointer always being available when ram=on.  OTOH, VFIO
> > private regions may be able to provide a special mark somewhere, just like
> > what was done for romd_mode previously as below (qemu 235e8982ad39), so that
> > KVM would still apply these MRs even if they're not RAM.
> 
> Also good to me.
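
For the record, a rough sketch of how such a mark could look (untested;
the "private_mmio" field and the vfio_private_ops/vbasedev names are made
up here for illustration, not existing QEMU API):

    /* When setting up the VFIO private MMIO region, tag it: */
    memory_region_init_io(mr, owner, &vfio_private_ops, vbasedev,
                          "vfio-private-mmio", size);
    mr->private_mmio = true;    /* hypothetical flag, like romd_mode */

    /* kvm_set_phys_mem() could then keep the memslot for such MRs: */
    if (!memory_region_is_ram(mr)) {
        if (mr->private_mmio) {
            /* Keep "add" as-is so KVM still installs the slot even
             * though there is no RAM behind it. */
        } else if (writable || !kvm_readonly_mem_allowed) {
            return;
        } else if (!mr->romd_mode) {
            add = false;
        }
    }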
> 
> > 
> > > 
> > > static void kvm_set_phys_mem(KVMMemoryListener *kml,
> > >                              MemoryRegionSection *section, bool add)
> > > {
> > >     [...]
> > > 
> > >     if (!memory_region_is_ram(mr)) {
> > >         if (writable || !kvm_readonly_mem_allowed) {
> > >             return;
> > >         } else if (!mr->romd_mode) {
> > >             /* If the memory device is not in romd_mode, then we actually want
> > >              * to remove the kvm memory slot so all accesses will trap. */
> > >             add = false;
> > >         }
> > >     }
> > > 
> > >     [...]
> > > 
> > >     /* register the new slot */
> > >     do {
> > > 
> > >         [...]
> > > 
> > >         err = kvm_set_user_memory_region(kml, mem, true);
> > >     }
> > > }
> > > 
> > > > ram=on MR is, IMHO, being able to be accessed as RAM-like.  If there's no
> > > > host pointer at all, I don't yet understand how that helps private MMIO
> > > > work.
> > > 
> > > I expect private MMIO to be inaccessible from the host but accessible from
> > > the guest, so it has kvm_userspace_memory_region2 set.  That means resolving
> > > its PFN during an EPT fault cannot depend on the host pointer.
> > > 
> > > https://lore.kernel.org/all/20250107142719.179636-1-yilun.xu@linux.intel.com/
> > 
> > I'll leave this to KVM experts, but I actually didn't follow exactly why
> > the mmu notifier is an issue here, as I thought that was per-mm anyway, and
> > KVM should logically be able to skip all VFIO private MMIO regions if affected.
> 
> I think this creates a logical inconsistency.  You build the private MMIO
> EPT mapping on fault based on the HVA<->HPA mapping, but don't follow the
> HVA<->HPA mapping when it changes.  Why does KVM believe the mapping at
> fault time but not at mmu notify time?

IMHO, as long as KVM knows it's private MMIO and it's guaranteed that there
is no host mapping under it, KVM can safely skip those ranges to speed up
the mmu notifier.
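
To illustrate the idea (pseudo-code only; kvm_is_private_mmio_slot() is a
made-up helper, not an existing KVM function):

    /* In the HVA-range walk that backs the mmu notifier: */
    kvm_for_each_memslot_in_hva_range(node, slots, start, end) {
        struct kvm_memory_slot *slot = ...;

        if (kvm_is_private_mmio_slot(slot)) {
            /* No host mapping can exist under private MMIO, so an
             * invalidation in this HVA range can't affect the slot. */
            continue;
        }
        /* ... the normal invalidation path ... */
    }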

That said, I'm not suggesting we stick with HVAs if there are better
alternatives.  It's only that this paragraph confused me a bit.

> 
> > This is a comment to this part of your commit message:
> > 
> >         Relying on the userspace mapping also means the private MMIO
> >         mapping should follow userspace mapping changes via mmu_notifier.
> >         This conflicts with the current design that mmu_notifier never
> >         impacts private mappings.  It also makes no sense to support
> >         mmu_notifier just for private MMIO: the private MMIO mapping
> >         should be fixed once the CoCo-VM accepts the private MMIO, and
> >         any subsequent mapping change without guest permission should
> >         be invalid.
> > 
> > So I don't yet see a hard no to reusing the userspace mapping even if it's
> > not faultable as of now - what if it becomes faultable in the future?  I
> 
> The first commit of guest_memfd puts a lot of emphasis on the benefit of
> decoupling the KVM mapping from the host mapping.  My understanding is
> that even if guest_memfd becomes faultable later, KVM should still work
> without the userspace mapping.

I may have come across as suggesting to use the HVA; that was not my
intention.  I agree the fd-based API is better in this case, at least as
of now.

What I'm not sure about is how the whole thing evolves with either gmemfd
or the device fd when they're used with shared and mappable pages.  We can
leave that for later discussion, for sure.

> 
> > am not sure..
> > 
> > OTOH, I also don't think we need KVM_SET_USER_MEMORY_REGION3 anyway.  The
> > _REGION2 API already leaves some reserved fields:
> > 
> > /* for KVM_SET_USER_MEMORY_REGION2 */
> > struct kvm_userspace_memory_region2 {
> > 	__u32 slot;
> > 	__u32 flags;
> > 	__u64 guest_phys_addr;
> > 	__u64 memory_size;
> > 	__u64 userspace_addr;
> > 	__u64 guest_memfd_offset;
> > 	__u32 guest_memfd;
> > 	__u32 pad1;
> > 	__u64 pad2[14];
> > };
> > 
> > I think we _could_ reuse some pad*?  Reusing the guest_memfd field sounds
> > error-prone to me.
> 
> It truly is. I'm expecting some suggestions here.

Maybe a generic fd+offset pair carved out of pad*?  I'm not sure whether at
some point that could also support guest-memfd; after all, it's easy for
KVM to check the file->f_op of whatever is backing it, so logically KVM
should allow backing a memslot with any file, without an HVA.  Just my 2
cents.
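
Roughly, such a layout could look like this (field names invented purely
for illustration, carved out of pad2[] so the struct size is unchanged):

    struct kvm_userspace_memory_region2 {
    	__u32 slot;
    	__u32 flags;
    	__u64 guest_phys_addr;
    	__u64 memory_size;
    	__u64 userspace_addr;
    	__u64 guest_memfd_offset;
    	__u32 guest_memfd;
    	__u32 pad1;
    	__u64 backing_fd_offset;	/* was pad2[0] */
    	__u32 backing_fd;		/* low half of pad2[1] */
    	__u32 pad3;
    	__u64 pad2[12];
    };

KVM could then check backing_fd's file->f_op at memslot registration time
to decide how to resolve PFNs for the slot.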

Thanks,

-- 
Peter Xu


