Re: [PATCH 0/4] Allow MMIO regions to be exported through dma-buf

From: "Christian König" <christian.koenig@amd.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-media@vger.kernel.org,
	Sumit Semwal <sumit.semwal@linaro.org>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	Leon Romanovsky <leon@kernel.org>,
	linux-rdma@vger.kernel.org, Maor Gottlieb <maorg@nvidia.com>,
	Oded Gabbay <ogabbay@kernel.org>
Subject: Re: [PATCH 0/4] Allow MMIO regions to be exported through dma-buf
Date: Thu, 18 Aug 2022 14:58:10 +0200
Message-ID: <d12fdf94-fbef-b981-2eff-660470ceca22@amd.com>
In-Reply-To: <Yv4qlOp9n78B8TFb@nvidia.com>



On 18.08.22 at 14:03, Jason Gunthorpe wrote:
> On Thu, Aug 18, 2022 at 01:07:16PM +0200, Christian König wrote:
>> On 17.08.22 at 18:11, Jason Gunthorpe wrote:
>>> dma-buf has become a way to safely acquire a handle to non-struct page
>>> memory that can still have lifetime controlled by the exporter. Notably
>>> RDMA can now import dma-buf FDs and build them into MRs which allows for
>>> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
>>> from PCI device BARs.
>>>
>>> This series supports a use case for SPDK where an NVMe device will be owned
>>> by SPDK through VFIO while interacting with an RDMA device. The RDMA device
>>> may directly access the NVMe CMB or directly manipulate the NVMe device's
>>> doorbell using PCI P2P.
>>>
>>> However, as a general mechanism, it can support many other scenarios with
>>> VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
>>> generic and safe P2P mappings.
>> In general looks good to me, but we really need to get away from using
>> sg_tables for this here.
>>
>> The only thing I'm not 100% convinced of is dma_buf_try_get(). I've seen
>> this used incorrectly so many times that I can't count them any more.
>>
>> Would that be somehow avoidable? Or could you at least explain the use case
>> a bit better?
> I didn't see a way; maybe you know of one.

For GEM objects we usually don't use the reference count of the DMA-buf, 
but rather that of the GEM object for this. But that's not an ideal 
solution either.

>
> VFIO needs to maintain a list of dmabuf FDs that have been created by
> the user attached to each vfio_device:
>
> int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> 				  struct vfio_device_feature_dma_buf __user *arg,
> 				  size_t argsz)
> {
> 	down_write(&vdev->memory_lock);
> 	list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
> 	up_write(&vdev->memory_lock);
>
> And dmabuf FDs are removed from the list when the user closes the FD:
>
> static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
> {
> 		down_write(&priv->vdev->memory_lock);
> 		list_del_init(&priv->dmabufs_elm);
> 		up_write(&priv->vdev->memory_lock);
>
> Which then poses the problem: How do you iterate over only the dma_bufs
> that are still alive to execute move?
>
> This seems necessary as parts of the dma_buf have already been
> destroyed by the time the user's release function is called.
>
> Which I solved like this:
>
> 	down_write(&vdev->memory_lock);
> 	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> 		if (!dma_buf_try_get(priv->dmabuf))
> 			continue;

What would happen if you don't skip destroyed dma-bufs here? In other
words, why do you maintain that list in the first place?

Regards,
Christian.

>
> So the scenarios resolve as:
>   - Concurrent release is not in progress: dma_buf_try_get() succeeds
>     and prevents concurrent release from starting
>   - Release has started but not reached its memory_lock:
>     dma_buf_try_get() fails
>   - Release has started but passed its memory_lock: dmabuf is not on
>     the list so dma_buf_try_get() is not called.
>
> Jason
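
To illustrate the three scenarios listed above, here is a minimal userspace
sketch. It assumes that dma_buf_try_get() behaves like an
"increment unless zero" on the buffer's reference count; the thread does not
show its implementation, so treat that as an assumption. The names struct buf,
buf_try_get(), buf_put(), count_visited() and the on_list flag are
hypothetical stand-ins, not kernel API, and the memory_lock that serializes
the real list walk is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dma_buf: only the refcount matters here. */
struct buf {
	atomic_long refs;	/* 0 means the final put already happened */
	bool on_list;		/* stand-in for membership in vdev->dmabufs */
};

/* Take a reference only if the count is still non-zero. */
static bool buf_try_get(struct buf *b)
{
	long old = atomic_load(&b->refs);

	while (old != 0) {
		/* On failure 'old' is reloaded and the zero check repeats. */
		if (atomic_compare_exchange_weak(&b->refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool buf_put(struct buf *b)
{
	long old = atomic_fetch_sub(&b->refs, 1);

	assert(old > 0);
	return old == 1;
}

/* Mimics the list walk: visit only buffers that are listed and alive. */
static int count_visited(struct buf *bufs, size_t n)
{
	int visited = 0;

	for (size_t i = 0; i < n; i++) {
		if (!bufs[i].on_list)
			continue;	/* release already unlinked it */
		if (!buf_try_get(&bufs[i]))
			continue;	/* release started, not yet unlinked */
		visited++;		/* safe to operate on the buffer here */
		buf_put(&bufs[i]);
	}
	return visited;
}
```

With one live buffer (refs = 1, on the list), one buffer whose count has
already dropped to zero but which is still on the list, and one that has been
unlinked, count_visited() touches only the first. In the sketch a successful
buf_try_get() keeps the count above zero until the matching buf_put(), which
is what would hold off a concurrent release in the first scenario.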
