From: "Christian König" <christian.koenig@amd.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Xu Yilun <yilun.xu@linux.intel.com>,
Christoph Hellwig <hch@lst.de>,
Leon Romanovsky <leonro@nvidia.com>,
kvm@vger.kernel.org, dri-devel@lists.freedesktop.org,
linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
sumit.semwal@linaro.org, pbonzini@redhat.com, seanjc@google.com,
alex.williamson@redhat.com, vivek.kasireddy@intel.com,
dan.j.williams@intel.com, aik@amd.com, yilun.xu@intel.com,
linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org,
lukas@wunner.de, yan.y.zhao@intel.com, leon@kernel.org,
baolu.lu@linux.intel.com, zhenzhong.duan@intel.com,
tao1.su@intel.com
Subject: Re: [RFC PATCH 01/12] dma-buf: Introduce dma_buf_get_pfn_unlocked() kAPI
Date: Thu, 16 Jan 2025 16:13:13 +0100 [thread overview]
Message-ID: <5f588dac-d3e2-445d-9389-067b875412dc@amd.com> (raw)
In-Reply-To: <20250115170942.GT5556@nvidia.com>
[-- Attachment #1: Type: text/plain, Size: 7399 bytes --]
Am 15.01.25 um 18:09 schrieb Jason Gunthorpe:
> On Wed, Jan 15, 2025 at 05:34:23PM +0100, Christian König wrote:
>> Granted, let me try to improve this.
>> Here is a real world example of one of the issues we ran into and why
>> CPU mappings of importers are redirected to the exporter.
>> We have a good bunch of different exporters who track the CPU mappings
>> of their backing store using address_space objects in one way or
>> another and then uses unmap_mapping_range() to invalidate those CPU
>> mappings.
>> But when importers get the PFNs of the backing store they can look
>> behind the curtain and directly insert this PFN into the CPU page
>> tables.
>> We had literally tons of cases like this where drivers developers cause
>> access after free issues because the importer created a CPU mappings on
>> their own without the exporter knowing about it.
>> This is just one example of what we ran into. Additional to that
>> basically the whole synchronization between drivers was overhauled as
>> well because we found that we can't trust importers to always do the
>> right thing.
> But this, fundamentally, is importers creating attachments and then
> *ignoring the lifetime rules of DMABUF*. If you created an attachment,
> got a move and *ignored the move* because you put the PFN in your own
> VMA, then you are not following the attachment lifetime rules!
Move notify is solely for informing the importer that they need to
re-fresh their DMA mappings and eventually block for ongoing DMA to end.
This semantics doesn't work well for CPU mappings because you need to
hold the reservation lock to make sure that the information stay valid
and you can't hold a lock while returning from a page fault.
In other words page faults are opportunistically and happen in
concurrent with invalidation, while the move_notify approach is
serialized through a common lock.
I think that's what Sima tried to point out as well.
> To implement this safely the driver would need to use
> unma_mapping_range() on the driver VMA inside the move callback, and
> hook into the VMA fault callback to re-attach the dmabuf.
Yeah and exactly that is something we don't want to allow because it
means that every importer need to get things right to prevent exporters
from running into problems.
> This is where I get into trouble with your argument. It is not that
> the API has an issue, or that the rules of the API are not logical and
> functional.
>
> You are arguing that even a logical and functional API will be
> mis-used by some people and that reviewers will not catch
> it.
Well it's not miss-used, it's just a very bad design decision to let
every importer implement functionality which actually belong into a
single point in the exporter.
> Honestly, I don't think that is consistent with the kernel philosophy.
>
> We should do our best to make APIs that are had to mis-use, but if we
> can't achieve that it doesn't mean we stop and give up on problems,
> we go into the world of APIs that can be mis-used and we are supposed
> to rely on the reviewer system to catch it.
This is not giving up, but rather just apply good design approaches.
See I can only repeat my self that we came up with this approach because
of experience and finding that what you suggest here doesn't work.
In other words we already tried what you suggest here and it doesn't work.
>> You can already turn both a TEE allocated buffer as well as a memfd
>> into a DMA-buf. So basically TEE and memfd already provides different
>> interfaces which go beyond what DMA-buf does and allows.
>> In other words if you want to do things like direct I/O to block or
>> network devices you can mmap() your memfd and do this while at the same
>> time send your memfd as DMA-buf to your GPU, V4L or neural accelerator.
>> Would this be a way you could work with as well?
> I guess, but this still requires creating a dmabuf2 type thing with
> very similar semantics and then shimming dmabuf2 to 1 for DRM consumers.
>
> I don't see how it addresses your fundamental concern that the
> semantics we want are an API that is too easy for drivers to abuse.
>
> And being more functional and efficient we'd just see people wanting
> to use dmabuf2 directly instead of bothering with 1.
Why would you want to do a dmabuf2 here?
>> separate file descriptor representing the private MMIO which iommufd
>> and KVM uses but you can turn it into a DMA-buf whenever you need to
>> give it to a DMA-buf importer?
> Well, it would end up just being used everywhere. I think one person
> wanted to use this with DRM drivers for some reason, but RDMA would
> use the dmabuf2 directly because it will be much more efficient than
> using scatterlist.
>
> Honestly, I'd much rather extend dmabuf and see DRM institute some
> rule that DRM drivers may not use XYZ parts of the improvement. Like
> maybe we could use some symbol namespaces to really enforce it
> eg. MODULE_IMPORT_NS(DMABUF_NOT_FOR_DRM_USAGE)
>
> Some of the improvements we want like the revoke rules for lifetime
> seem to be agreeable.
>
> Block the API that gives you the non-scatterlist attachment. Only
> VFIO/RDMA/kvm/iommufd will get to implement it.
I don't mind improving the scatterlist approach in any way possible. I'm
just rejecting things which we already tried and turned out to be a bad
idea.
If you make an interface which gives DMA addresses plus additional
information like address space, access hints etc.. to importers that
would be really welcomed.
But exposing PFNs and letting the importers created their DMA mappings
themselves and making CPU mappings themselves is an absolutely clear no-go.
>> In this case Xu is exporting MMIO from VFIO and importing to KVM and
>> iommufd.
>>
>> So basically a portion of a PCIe BAR is imported into iommufd?
> And KVM. We need to get the CPU address into KVM and IOMMU page
> tables. It must go through a private FD path and not a VMA because of
> the CC rules about machine check I mentioned earlier. The private FD
> must have a lifetime model to ensure we don't UAF the PCIe BAR memory.
Then create an interface between VFIO and KVM/iommufd which allows to
pass data between these two.
We already do this between DMA-buf exporters/importers all the time.
Just don't make it general DMA-buf API.
> Someone else had some use case where they wanted to put the VFIO MMIO
> PCIe BAR into a DMABUF and ship it into a GPU driver for
> somethingsomething virtualization but I didn't understand it.
Yeah, that is already perfectly supported.
>> Let's just say that both the ARM guys as well as the GPU people already
>> have some pretty "interesting" ways of doing digital rights management
>> and content protection.
> Well, that is TEE stuff, TEE and CC are not the same thing, though
> they have some high level conceptual overlap.
>
> In a certain sense CC is a TEE that is built using KVM instead of the
> TEE subsystem. Using KVM and integrating with the MM brings a whole
> set of unique challenges that TEE got to avoid..
Please go over those challenges in more detail. I need to get a better
understanding of what's going on here.
E.g. who manages encryption keys, who raises the machine check on
violations etc...
Regards,
Christian.
>
> Jason
[-- Attachment #2: Type: text/html, Size: 10071 bytes --]
next prev parent reply other threads:[~2025-01-16 15:13 UTC|newest]
Thread overview: 150+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-07 14:27 [RFC PATCH 00/12] Private MMIO support for private assigned dev Xu Yilun
2025-01-07 14:27 ` [RFC PATCH 01/12] dma-buf: Introduce dma_buf_get_pfn_unlocked() kAPI Xu Yilun
2025-01-08 8:01 ` Christian König
2025-01-08 13:23 ` Jason Gunthorpe
2025-01-08 13:44 ` Christian König
2025-01-08 14:58 ` Jason Gunthorpe
2025-01-08 15:25 ` Christian König
2025-01-08 16:22 ` Jason Gunthorpe
2025-01-08 17:56 ` Xu Yilun
2025-01-10 19:24 ` Simona Vetter
2025-01-10 20:16 ` Jason Gunthorpe
2025-01-08 18:44 ` Simona Vetter
2025-01-08 19:22 ` Xu Yilun
2025-01-09 8:04 ` Christian König
2025-01-08 23:06 ` Xu Yilun
2025-01-10 19:34 ` Simona Vetter
2025-01-10 20:38 ` Jason Gunthorpe
2025-01-12 22:10 ` Xu Yilun
2025-01-14 14:44 ` Simona Vetter
2025-01-14 17:31 ` Jason Gunthorpe
2025-01-15 8:55 ` Simona Vetter
2025-01-15 9:32 ` Christoph Hellwig
2025-01-15 13:34 ` Jason Gunthorpe
2025-01-16 5:33 ` Christoph Hellwig
2024-06-19 23:39 ` Xu Yilun
2025-01-16 13:28 ` Jason Gunthorpe
2025-01-15 10:06 ` Christian König
2025-01-17 14:42 ` Simona Vetter
2025-01-20 12:14 ` Christian König
2025-01-20 17:59 ` Jason Gunthorpe
2025-01-20 18:50 ` Simona Vetter
2025-01-20 19:48 ` Jason Gunthorpe
2025-01-21 16:11 ` Simona Vetter
2025-01-21 17:36 ` Jason Gunthorpe
2025-01-22 11:04 ` Simona Vetter
2025-01-22 13:28 ` Jason Gunthorpe
2025-01-22 13:29 ` Christian König
2025-01-22 14:37 ` Jason Gunthorpe
2025-01-22 14:59 ` Christian König
2025-01-23 13:59 ` Jason Gunthorpe
2025-01-23 14:32 ` Christian König
2025-01-23 14:35 ` Christian König
2025-01-23 15:02 ` Jason Gunthorpe
2025-01-23 15:48 ` Christian König
2025-01-23 16:08 ` Jason Gunthorpe
2025-01-09 8:09 ` Christian König
2025-01-10 20:54 ` Jason Gunthorpe
2025-01-15 9:38 ` Christian König
2025-01-15 13:38 ` Jason Gunthorpe
2025-01-15 13:45 ` Christian König
2025-01-15 13:46 ` Christian König
2025-01-15 14:14 ` Jason Gunthorpe
2025-01-15 14:29 ` Christian König
2025-01-15 14:30 ` Christian König
2025-01-15 15:10 ` Jason Gunthorpe
2025-01-15 16:34 ` Christian König
2025-01-15 17:09 ` Jason Gunthorpe
2025-01-16 15:13 ` Christian König [this message]
2024-06-20 22:02 ` Xu Yilun
2025-01-20 13:44 ` Christian König
2025-01-22 4:16 ` Xu Yilun
2025-01-16 16:07 ` Jason Gunthorpe
2025-01-17 14:37 ` Simona Vetter
2025-01-09 9:10 ` Christian König
2025-01-09 9:28 ` Leon Romanovsky
2025-01-07 14:27 ` [RFC PATCH 02/12] vfio: Export vfio device get and put registration helpers Xu Yilun
2025-01-07 14:27 ` [RFC PATCH 03/12] vfio/pci: Share the core device pointer while invoking feature functions Xu Yilun
2025-01-07 14:27 ` [RFC PATCH 04/12] vfio/pci: Allow MMIO regions to be exported through dma-buf Xu Yilun
2026-05-06 2:35 ` Alexey Kardashevskiy
2026-05-06 13:16 ` Jason Gunthorpe
2026-05-07 7:16 ` Alexey Kardashevskiy
2026-05-11 12:01 ` Jason Gunthorpe
2026-05-11 23:42 ` Alexey Kardashevskiy
2026-05-11 23:56 ` Jason Gunthorpe
2026-05-12 5:49 ` Alexey Kardashevskiy
2025-01-07 14:27 ` [RFC PATCH 05/12] vfio/pci: Support get_pfn() callback for dma-buf Xu Yilun
2025-01-07 14:27 ` [RFC PATCH 06/12] KVM: Support vfio_dmabuf backed MMIO region Xu Yilun
2025-01-07 14:27 ` [RFC PATCH 07/12] KVM: x86/mmu: Handle page fault for vfio_dmabuf backed MMIO Xu Yilun
2025-01-07 14:27 ` [RFC PATCH 08/12] vfio/pci: Create host unaccessible dma-buf for private device Xu Yilun
2025-01-08 13:30 ` Jason Gunthorpe
2025-01-08 16:57 ` Xu Yilun
2025-01-09 14:40 ` Jason Gunthorpe
2025-01-09 16:40 ` Xu Yilun
2025-01-10 13:31 ` Jason Gunthorpe
2025-01-11 3:48 ` Xu Yilun
2025-01-13 16:49 ` Jason Gunthorpe
2024-06-17 23:28 ` Xu Yilun
2025-01-14 13:35 ` Jason Gunthorpe
2025-01-15 12:57 ` Alexey Kardashevskiy
2025-01-15 13:01 ` Jason Gunthorpe
2025-01-17 1:57 ` Baolu Lu
2025-01-17 13:25 ` Jason Gunthorpe
2024-06-23 19:59 ` Xu Yilun
2025-01-20 13:25 ` Jason Gunthorpe
2024-06-24 21:12 ` Xu Yilun
2025-01-21 17:43 ` Jason Gunthorpe
2025-01-22 4:32 ` Xu Yilun
2025-01-22 12:55 ` Jason Gunthorpe
2025-01-23 7:41 ` Xu Yilun
2025-01-23 13:08 ` Jason Gunthorpe
2025-01-20 4:41 ` Baolu Lu
2025-01-20 9:45 ` Alexey Kardashevskiy
2025-01-20 13:28 ` Jason Gunthorpe
2025-03-12 1:37 ` Dan Williams
2025-03-17 16:38 ` Jason Gunthorpe
2025-01-07 14:27 ` [RFC PATCH 09/12] vfio/pci: Export vfio dma-buf specific info for importers Xu Yilun
2025-01-07 14:27 ` [RFC PATCH 10/12] KVM: vfio_dmabuf: Fetch VFIO specific dma-buf data for sanity check Xu Yilun
2025-01-07 14:27 ` [RFC PATCH 11/12] KVM: x86/mmu: Export kvm_is_mmio_pfn() Xu Yilun
2025-01-07 14:27 ` [RFC PATCH 12/12] KVM: TDX: Implement TDX specific private MMIO map/unmap for SEPT Xu Yilun
2025-04-29 6:48 ` [RFC PATCH 00/12] Private MMIO support for private assigned dev Alexey Kardashevskiy
2025-04-29 7:50 ` Alexey Kardashevskiy
2025-05-09 3:04 ` Alexey Kardashevskiy
2025-05-09 11:12 ` Xu Yilun
2025-05-09 16:28 ` Xu Yilun
2025-05-09 18:43 ` Jason Gunthorpe
2025-05-10 3:47 ` Xu Yilun
2025-05-12 9:30 ` Alexey Kardashevskiy
2025-05-12 14:06 ` Jason Gunthorpe
2025-05-13 10:03 ` Zhi Wang
2025-05-14 9:47 ` Xu Yilun
2025-05-14 20:05 ` Zhi Wang
2025-05-15 18:02 ` Xu Yilun
2025-05-15 19:21 ` Jason Gunthorpe
2025-05-16 6:19 ` Xu Yilun
2025-05-16 12:49 ` Jason Gunthorpe
2025-05-17 2:33 ` Xu Yilun
2025-05-20 10:57 ` Alexey Kardashevskiy
2025-05-24 3:33 ` Xu Yilun
2025-05-15 10:29 ` Alexey Kardashevskiy
2025-05-15 16:44 ` Zhi Wang
2025-05-15 16:53 ` Zhi Wang
2025-05-21 10:41 ` Alexey Kardashevskiy
2025-05-14 7:02 ` Xu Yilun
2025-05-14 16:33 ` Jason Gunthorpe
2025-05-15 16:04 ` Xu Yilun
2025-05-15 17:56 ` Jason Gunthorpe
2025-05-16 6:03 ` Xu Yilun
2025-05-22 3:45 ` Alexey Kardashevskiy
2025-05-24 3:13 ` Xu Yilun
2025-05-26 7:18 ` Alexey Kardashevskiy
2025-05-29 14:41 ` Xu Yilun
2025-05-29 16:29 ` Jason Gunthorpe
2025-05-30 16:07 ` Xu Yilun
2025-05-30 2:29 ` Alexey Kardashevskiy
2025-05-30 16:23 ` Xu Yilun
2025-06-10 4:20 ` Alexey Kardashevskiy
2025-06-10 5:19 ` Baolu Lu
2025-06-10 6:53 ` Xu Yilun
2025-05-14 3:20 ` Xu Yilun
2025-06-10 4:37 ` Alexey Kardashevskiy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5f588dac-d3e2-445d-9389-067b875412dc@amd.com \
--to=christian.koenig@amd.com \
--cc=aik@amd.com \
--cc=alex.williamson@redhat.com \
--cc=baolu.lu@linux.intel.com \
--cc=dan.j.williams@intel.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=hch@lst.de \
--cc=jgg@nvidia.com \
--cc=kvm@vger.kernel.org \
--cc=leon@kernel.org \
--cc=leonro@nvidia.com \
--cc=linaro-mm-sig@lists.linaro.org \
--cc=linux-coco@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-media@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
--cc=sumit.semwal@linaro.org \
--cc=tao1.su@intel.com \
--cc=vivek.kasireddy@intel.com \
--cc=yan.y.zhao@intel.com \
--cc=yilun.xu@intel.com \
--cc=yilun.xu@linux.intel.com \
--cc=zhenzhong.duan@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.