From: Jason Gunthorpe <jgg@nvidia.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: Xu Yilun <yilun.xu@linux.intel.com>,
	Christoph Hellwig <hch@lst.de>,
	Leon Romanovsky <leonro@nvidia.com>,
	kvm@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
	sumit.semwal@linaro.org, pbonzini@redhat.com, seanjc@google.com,
	alex.williamson@redhat.com, vivek.kasireddy@intel.com,
	dan.j.williams@intel.com, aik@amd.com, yilun.xu@intel.com,
	linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org,
	lukas@wunner.de, yan.y.zhao@intel.com, leon@kernel.org,
	baolu.lu@linux.intel.com, zhenzhong.duan@intel.com,
	tao1.su@intel.com
Subject: Re: [RFC PATCH 01/12] dma-buf: Introduce dma_buf_get_pfn_unlocked() kAPI
Date: Thu, 16 Jan 2025 12:07:47 -0400	[thread overview]
Message-ID: <20250116160747.GV5556@nvidia.com> (raw)
In-Reply-To: <5f588dac-d3e2-445d-9389-067b875412dc@amd.com>

On Thu, Jan 16, 2025 at 04:13:13PM +0100, Christian König wrote:
>> But this, fundamentally, is importers creating attachments and then
>> *ignoring the lifetime rules of DMABUF*. If you created an attachment,
>> got a move and *ignored the move* because you put the PFN in your own
>> VMA, then you are not following the attachment lifetime rules!
> 
>    Move notify is solely for informing the importer that they need to
>    re-fresh their DMA mappings and eventually block for ongoing DMA to
>    end.

I feel that it is a bit pedantic to say DMA and CPU are somehow
different. The DMABUF API gives you a scatterlist; it is reasonable to
say that a move invalidates the entire scatterlist, CPU and DMA equally.
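
To make this concrete, here is a minimal sketch of the importer side I
have in mind, where a move drops the DMA mapping and any cached CPU-side
state alike. The my_* names and the cached fields are illustrative, not
from this RFC; only the dma_buf_attach_ops/move_notify plumbing is the
existing kAPI:

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>

struct my_importer {
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;		/* DMA-side state */
	phys_addr_t cached_phys;	/* CPU-side state, e.g. for KVM/VMA use */
	bool valid;
};

/* The exporter calls this with the dma_resv lock already held. */
static void my_move_notify(struct dma_buf_attachment *attach)
{
	struct my_importer *imp = attach->importer_priv;

	if (imp->sgt) {
		dma_buf_unmap_attachment(attach, imp->sgt, DMA_BIDIRECTIONAL);
		imp->sgt = NULL;
	}
	imp->cached_phys = 0;		/* drop the CPU-visible state too */
	imp->valid = false;		/* next access re-maps under dma_resv_lock() */
}

static const struct dma_buf_attach_ops my_attach_ops = {
	.allow_peer2peer = true,
	.move_notify	 = my_move_notify,
};

The attachment would be created with dma_buf_dynamic_attach(dmabuf, dev,
&my_attach_ops, imp), which is what makes move_notify fire in the first
place.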

>    This semantics doesn't work well for CPU mappings because you need to
>    hold the reservation lock to make sure that the information stay valid
>    and you can't hold a lock while returning from a page fault.

Sure, I imagine hooking up a VMA is hard - but that doesn't change my
point. The semantics can be reasonable and well defined.

>    Yeah and exactly that is something we don't want to allow because it
>    means that every importer need to get things right to prevent exporters
>    from running into problems.

You can make the same argument about the DMA address: by that logic we
should just get rid of DMABUF entirely because people are going to
misuse it and wrongly implement the invalidation callback.

I have no idea why GPU drivers want to implement mmap of dmabuf; that
seems to be a uniquely GPU thing. We are not going to be doing stuff
like that in KVM and other places. And we can implement the
invalidation callback with correct locking. Why should we all be
punished because DRM drivers seem to have this weird historical mmap
problem?

I don't think that is a reasonable way to approach building a general
purpose Linux kernel API.
 
>    Well it's not miss-used, it's just a very bad design decision to let
>    every importer implement functionality which actually belong into a
>    single point in the exporter.

Well, this is the problem. Sure, it may be that importers should not
implement mmap - but using the PFN-side address is needed for more
than just mmap!

DMA mapping belongs in the importer, and the new DMA API makes this
even more explicit by giving the importer a lot of options to
optimize the process of building the HW datastructures. The scatterlist,
and the representation of the DMA list it enforces, is very inefficient
and we are working to get rid of it. It isn't going to be replaced by
any sort of list of DMA addresses though.
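
For reference, the shape I mean is roughly the two-step IOVA flow from
the DMA API series under discussion. The dma_iova_* names and signatures
below follow my reading of that posting and are not a settled upstream
API, so treat this purely as a sketch of the direction:

#include <linux/dma-mapping.h>

/*
 * Importer-driven mapping: allocate IOVA space once, then link each
 * physical range into it while walking the importer's own HW
 * structures - no struct scatterlist in the middle.
 */
static int my_map_ranges(struct device *dev, struct dma_iova_state *state,
			 phys_addr_t *phys, size_t *len, int nr, size_t total)
{
	size_t off = 0;
	int i, ret;

	if (!dma_iova_try_alloc(dev, state, phys[0], total))
		return -EOPNOTSUPP;	/* caller falls back to dma_map_*() */

	for (i = 0; i < nr; i++) {
		ret = dma_iova_link(dev, state, phys[i], off, len[i],
				    DMA_BIDIRECTIONAL, 0);
		if (ret)
			goto err_destroy;
		off += len[i];
	}

	ret = dma_iova_sync(dev, state, 0, off);
	if (ret)
		goto err_destroy;
	return 0;

err_destroy:
	dma_iova_destroy(dev, state, off, DMA_BIDIRECTIONAL, 0);
	return ret;
}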

If you really disagree you can try to convince the NVMe people to give
up the optimizations the new DMA API allows so that DRM can avoid this
code-review problem.

I also want the same optimizations in RDMA, and I am also not
convinced giving them up is a worthwhile tradeoff.

>    Why would you want to do a dmabuf2 here?

Because I need the same kind of common framework. I need to hook VFIO
to RDMA as well. I need to fix RDMA to have working P2P in all
cases. I need to hook KVM virtual device stuff to iommufd. Someone
else needs VFIO to hook into DRM.

How many different times do I need to implement a buffer sharing
lifetime model? No, we should not make a VFIO-specific thing; we need
a general tool to do this properly and cover all the different use
cases. That's "dmabuf2" or whatever you want to call it, and there are
more than enough use cases to justify doing it. Though I think splitting
it off is a bad idea: we do not need two things, we should have dmabuf
handle all the use cases people have, not just DRM's.

>    I don't mind improving the scatterlist approach in any way possible.
>    I'm just rejecting things which we already tried and turned out to be a
>    bad idea.
>    If you make an interface which gives DMA addresses plus additional
>    information like address space, access hints etc.. to importers that
>    would be really welcomed.

This is not welcome; having lists of DMA addresses is inefficient and
does not match the direction of the DMA API. We are trying very hard
to completely remove the lists of DMA addresses in common fast paths.

>    But exposing PFNs and letting the importers created their DMA mappings
>    themselves and making CPU mappings themselves is an absolutely clear
>    no-go.

Again, this is what we must have to support the new DMA API and the KVM
and IOMMUFD use cases I mentioned.
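
And to spell out what "the importer creates its own mapping" means for
the MMIO case: once an importer has a PFN range describing exporter MMIO
(say, from the get_pfn style interface this RFC proposes - I am not
reproducing its exact signature here), the existing resource-mapping DMA
API already does the job, no scatterlist needed. A minimal sketch, with
the my_* names being mine:

#include <linux/dma-mapping.h>
#include <linux/pfn.h>

/* Map a PFN range of exporter MMIO (e.g. a VFIO BAR slice) for our device. */
static dma_addr_t my_map_mmio(struct device *dev, unsigned long pfn,
			      size_t size)
{
	phys_addr_t phys = PFN_PHYS(pfn);
	dma_addr_t dma;

	dma = dma_map_resource(dev, phys, size, DMA_BIDIRECTIONAL, 0);
	if (dma_mapping_error(dev, dma))
		return DMA_MAPPING_ERROR;
	return dma;
}

static void my_unmap_mmio(struct device *dev, dma_addr_t dma, size_t size)
{
	dma_unmap_resource(dev, dma, size, DMA_BIDIRECTIONAL, 0);
}

KVM is the importer that doesn't even need the DMA side - it only wants
the PFN to install in its stage-2 page tables, which is exactly why a
DMA-address-only interface does not cover these cases.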

>> In this case Xu is exporting MMIO from VFIO and importing to KVM and
>> iommufd.
> 
>    So basically a portion of a PCIe BAR is imported into iommufd?

Yeah, and KVM. And RDMA.

>    Then create an interface between VFIO and KVM/iommufd which allows to
>    pass data between these two.
>    We already do this between DMA-buf exporters/importers all the time.
>    Just don't make it general DMA-buf API.

I have no idea what this means. We'd need a new API linked to DMABUF
that would be optional and used only by this part of the world. As I said
above, we could protect it with some module namespace so you can keep
it out of DRM. If you can agree to that then it seems fine.

> > Someone else had some use case where they wanted to put the VFIO MMIO
> > PCIe BAR into a DMABUF and ship it into a GPU driver for
> > somethingsomething virtualization but I didn't understand it.
> 
>    Yeah, that is already perfectly supported.

No, it isn't. Christoph is blocking DMABUF in VFIO because he does not
want the scatterlist abuses that dmabuf is doing to proliferate. We
already have some ARM systems where the naive way typical DMABUF
implementations set up P2P does not work. Those systems have a PCI
offset.
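
For anyone who has not hit "PCI offset" before: on those systems the CPU
physical address of a BAR and the address a peer has to emit on the bus
are different numbers, and the kernel already tracks both. A quick
illustrative check (not from any of the patches):

#include <linux/pci.h>

/*
 * A dmabuf exporter that blindly puts the CPU physical address of its
 * BAR into a scatterlist gets P2P wrong whenever these two differ.
 */
static void show_bar_addresses(struct pci_dev *pdev, int bar)
{
	resource_size_t cpu_phys = pci_resource_start(pdev, bar);
	dma_addr_t bus_addr = pci_bus_address(pdev, bar);

	if ((u64)cpu_phys != (u64)bus_addr)
		dev_info(&pdev->dev,
			 "BAR%d: CPU phys %pa != bus address %pad, PCI offset present\n",
			 bar, &cpu_phys, &bus_addr);
}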

Getting this to be "perfectly supported" is why we are working on all
these aspects to improve the DMA API and remove the scatterlist
abuses.

>> In a certain sense CC is a TEE that is built using KVM instead of the
>> TEE subsystem. Using KVM and integrating with the MM brings a whole
>> set of unique challenges that TEE got to avoid..
> 
>    Please go over those challenges in more detail. I need to get a better
>    understanding of what's going on here.
>    E.g. who manages encryption keys, who raises the machine check on
>    violations etc...

TEE broadly has Linux launch a secure world that does some private
work. The secure worlds tend to be very limited; they are not really
VMs and they don't run full Linux inside.

CC broadly has the secure world exist at boot and launch Linux and
provide services to Linux. The secure world enforces memory isolation
on Linux and generates faults on violations. KVM is the gateway to
launch new secure worlds and the secure worlds are full VMs with all
the device emulation and more.

CC is much more like Xen with its hypervisor and DOM0 concepts.

From this perspective, the only thing that matters is that CC secure
memory is different and special - it is very much like your private
memory concept. Only special places that understand it and have the
right HW capability can use it. All the consumers need a CPU address
to program their HW because of how the secure world security works.

Jason
