From: Jason Gunthorpe <jgg@nvidia.com>
To: David Hildenbrand <david@redhat.com>
Cc: "Chenyi Qiang" <chenyi.qiang@intel.com>,
"Alexey Kardashevskiy" <aik@amd.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Peter Xu" <peterx@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Michael Roth" <michael.roth@amd.com>,
qemu-devel@nongnu.org, kvm@vger.kernel.org,
"Williams Dan J" <dan.j.williams@intel.com>,
"Peng Chao P" <chao.p.peng@intel.com>,
"Gao Chao" <chao.gao@intel.com>, "Xu Yilun" <yilun.xu@intel.com>
Subject: Re: [PATCH 0/7] Enable shared device assignment
Date: Fri, 10 Jan 2025 09:20:21 -0400
Message-ID: <20250110132021.GE5556@nvidia.com>
In-Reply-To: <c318c89b-967d-456e-ade1-3a8cacb21bd7@redhat.com>
On Fri, Jan 10, 2025 at 09:26:02AM +0100, David Hildenbrand wrote:
> > > > > > > > > > One limitation (also discussed in the guest_memfd
> > > > > > > > > > meeting) is that VFIO expects the DMA mapping for
> > > > > > > > > > a specific IOVA to be mapped and unmapped with the
> > > > > > > > > > same granularity.
Not just the same granularity: whatever you map, you have to unmap as a
whole. map/unmap must be perfectly paired by userspace.
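
To illustrate the pairing rule, here is a toy model in Python. It is only a sketch of the constraint described above, not the VFIO or iommufd API; the class name `PairedMappings` and its methods are hypothetical. An unmap request may remove whole mappings but is rejected if it would cut one in two.

```python
class PairedMappings:
    """Toy model of the map/unmap pairing rule (hypothetical, not a real API)."""

    def __init__(self):
        self._maps = {}  # start IOVA -> length in bytes

    def map(self, iova, length):
        end = iova + length
        for start, size in self._maps.items():
            if iova < start + size and start < end:
                raise ValueError("overlaps an existing mapping")
        self._maps[iova] = length

    def unmap(self, iova, length):
        end = iova + length
        hit = []
        for start, size in self._maps.items():
            if iova < start + size and start < end:  # mapping overlaps the request
                if start < iova or start + size > end:
                    # The request covers only part of this mapping.
                    raise ValueError("unmap would cut a mapping in two")
                hit.append(start)
        for start in hit:
            del self._maps[start]
        return len(hit)  # number of whole mappings removed
```

In this model, mapping 2M at one IOVA and then trying to unmap a single 4K page inside it raises an error, while unmapping the full 2M range succeeds. That is why the series falls back to 4K map operations when small conversions are expected.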
> > > > > > > > > > such as converting a small region within a larger
> > > > > > > > > > region. To prevent such invalid cases, all
> > > > > > > > > > operations are performed with 4K granularity. The
> > > > > > > > > > possible solutions we can think of are either to
> > > > > > > > > > enable VFIO to support partial unmap
Yes, you can do that, but it is awful for performance everywhere.
> > > > > > iopt_cut_iova() happens in iommufd vfio_compat.c, which is to make
> > > > > > iommufd be compatible with old VFIO_TYPE1. IIUC, it happens with
> > > > > > > disable_large_pages=true. That means the large IOPTE is also disabled in
> > > > > > IOMMU. So it can do the split easily. See the comment in
> > > > > > iommufd_vfio_set_iommu().
Yes. But I am working on a project to make this more general purpose
and not have the 4k limitation. There are now several use cases for
this kind of cut feature.
https://lore.kernel.org/linux-iommu/7-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com/
> > > > > This is all true but this also means that "The former requires complex
> > > > > changes in VFIO" is not entirely true - some code is already there.
Well, to do it without forcing 4k requires complex changes.
> > > > Hmm, my statement is a little confusing. The bottleneck is that the
> > > > IOMMU driver doesn't support the large page split. So if we want to
> > > > enable large page and want to do partial unmap, it requires complex
> > > > change.
Yes, this is what I'm working on.
> > > We won't need to split large pages (if we stick to 4K for now), we need
> > > to split large mappings (not large pages) to allow partial unmapping and
> > > iopt_area_split() seems to be doing this. Thanks,
Correct
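
The distinction between splitting a large mapping and splitting a large page can be sketched with plain range bookkeeping. The following Python is a minimal illustration under the 4K assumption discussed here; it is not the implementation of iopt_area_split(), and the helper names `split_mapping` and `punch_hole` are hypothetical. A mapping is just an (iova, length) pair, and no page table is involved.

```python
PAGE = 4096  # 4K granule, as assumed in this discussion


def split_mapping(mapping, cut):
    """Split one (iova, length) mapping at `cut` into two adjacent mappings."""
    iova, length = mapping
    if cut % PAGE:
        raise ValueError("cut must be 4K aligned")
    if not iova < cut < iova + length:
        raise ValueError("cut must fall strictly inside the mapping")
    return (iova, cut - iova), (cut, iova + length - cut)


def punch_hole(mapping, hole_iova, hole_len):
    """Return the mappings left after removing [hole_iova, hole_iova + hole_len)."""
    iova, length = mapping
    end = iova + length
    hole_end = hole_iova + hole_len
    if hole_len <= 0 or hole_iova < iova or hole_end > end:
        raise ValueError("hole must be non-empty and inside the mapping")
    pieces = []
    rest = mapping
    if hole_iova > iova:
        left, rest = split_mapping(rest, hole_iova)  # cut at the hole start
        pieces.append(left)
    if hole_end < end:
        _hole, right = split_mapping(rest, hole_end)  # cut at the hole end
        pieces.append(right)
    return pieces
```

Punching one 4K hole at offset 0x1000 in a 2M mapping leaves two mappings, (0, 0x1000) and (0x2000, 0x1fe000). Only the bookkeeping changes; whether the hardware entries must change too depends on the page sizes in use.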
> > You mean we can disable large page in iommufd and then VFIO will be able
> > to do partial unmap. Yes, I think it is doable and we can avoid much of
> > the ioctl context-switch overhead.
Right
> So, do I understand this correctly: disable_large_pages=true will imply that
> we never have PMD mappings such that we can atomically poke a hole in a
> mapping, without temporarily having to remove a PMD mapping in the iommu
> table to insert a PTE table?
Yes
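
A small model makes the difference concrete. This Python sketch lists the table updates needed to poke a hole in a 2 MiB region, depending on the leaf size used to map it. It is a simplification under stated assumptions (2 MiB PMD leaves and 4K PTEs as on x86-64, a hole that is 4K aligned), and the function `hole_steps` is hypothetical rather than anything in iommufd.

```python
PAGE = 4096             # 4K PTE
PMD = 2 * 1024 * 1024   # 2 MiB PMD leaf (x86-64 sizes assumed)


def hole_steps(leaf_size, hole_iova, hole_len, base=0, span=PMD):
    """List the updates needed to unmap a hole from [base, base + span)."""
    hole_end = hole_iova + hole_len
    steps = []
    for leaf in range(base, base + span, leaf_size):
        leaf_end = leaf + leaf_size
        if leaf_end <= hole_iova or leaf >= hole_end:
            continue  # this leaf does not overlap the hole
        if hole_iova <= leaf and leaf_end <= hole_end:
            # The whole leaf lies inside the hole: clear it in one step.
            steps.append(("clear", leaf))
        else:
            # The leaf straddles the hole boundary: it has to be replaced
            # by smaller entries, which is not a single in-place clear.
            steps.append(("replace", leaf))
    return steps
```

With 4K leaves, a 4K-aligned hole yields only "clear" steps, and entries outside the hole are never touched. With a single 2 MiB leaf, the same hole yields a "replace" step, which is the case the question above is concerned about.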
> batch_iommu_map_small() seems to document that behavior.
Yes
> It's interesting that that comment points out that this is purely "VFIO
> compatibility", and that it otherwise violates the iommufd invariant:
> "pairing map/unmap". So, it is against the real iommufd design ...
IIRC you can only trigger split using the VFIO type 1 legacy API. We
would need to formalize split as an IOMMUFD native ioctl.
Nobody should use this stuff through the legacy type 1 API!
> Back when working on virtio-mem support (RamDiscardManager), I thought there
> was no way to reliably do atomic partial unmappings.
Correct
Jason