linux-rdma.vger.kernel.org archive mirror
* [PATCH 1/4] vfio: Add vfio_device_get()
       [not found] <20241216095429.210792-1-wguay@fb.com>
@ 2024-12-16  9:54 ` Wei Lin Guay
  2024-12-16  9:54 ` [PATCH 2/4] dma-buf: Add dma_buf_try_get() Wei Lin Guay
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: Wei Lin Guay @ 2024-12-16  9:54 UTC (permalink / raw)
  To: alex.williamson, dri-devel, kvm, linux-rdma
  Cc: jgg, vivek.kasireddy, dagmoxnes, kbusch, nviljoen, Wei Lin Guay

From: Jason Gunthorpe <jgg@nvidia.com>

Summary:
Add vfio_device_get() to increment a reference the caller already
holds. Export vfio_device_put_registration() to pair with it.
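
As an illustration, a hypothetical caller that already holds a
registration reference could take and drop an extra one like this (the
surrounding driver code is assumed, not part of this patch):

	/* the caller already holds a registration reference */
	vfio_device_get(&vdev->vdev);
	...
	/* drop it again; the last put completes device->comp */
	vfio_device_put_registration(&vdev->vdev);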

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Wei Lin Guay <wguay@meta.com>
Reviewed-by: Dag Moxnes <dagmoxnes@meta.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Nic Viljoen <nviljoen@meta.com>
---
 drivers/vfio/vfio_main.c | 1 +
 include/linux/vfio.h     | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index a5a62d9d963f..7e318e15abd5 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -171,6 +171,7 @@ void vfio_device_put_registration(struct vfio_device *device)
 	if (refcount_dec_and_test(&device->refcount))
 		complete(&device->comp);
 }
+EXPORT_SYMBOL_GPL(vfio_device_put_registration);

 bool vfio_device_try_get_registration(struct vfio_device *device)
 {
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 000a6cab2d31..d7c790be4bbc 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -279,6 +279,12 @@ static inline void vfio_put_device(struct vfio_device *device)
 int vfio_register_group_dev(struct vfio_device *device);
 int vfio_register_emulated_iommu_dev(struct vfio_device *device);
 void vfio_unregister_group_dev(struct vfio_device *device);
+void vfio_device_put_registration(struct vfio_device *device);
+
+static inline void vfio_device_get(struct vfio_device *device)
+{
+	refcount_inc(&device->refcount);
+}

 int vfio_assign_device_set(struct vfio_device *device, void *set_id);
 unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set);
--
2.43.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/4] dma-buf: Add dma_buf_try_get()
       [not found] <20241216095429.210792-1-wguay@fb.com>
  2024-12-16  9:54 ` [PATCH 1/4] vfio: Add vfio_device_get() Wei Lin Guay
@ 2024-12-16  9:54 ` Wei Lin Guay
  2024-12-16  9:54 ` [PATCH 3/4] vfio/pci: Allow MMIO regions to be exported through dma-buf Wei Lin Guay
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: Wei Lin Guay @ 2024-12-16  9:54 UTC (permalink / raw)
  To: alex.williamson, dri-devel, kvm, linux-rdma
  Cc: jgg, vivek.kasireddy, dagmoxnes, kbusch, nviljoen, Wei Lin Guay

From: Jason Gunthorpe <jgg@nvidia.com>

Summary:
Add dma_buf_try_get() to increment the refcount of the dma-buf's struct
file, but only if the refcount is not already zero. This allows the
struct file's lifetime to control the lifetime of the dmabuf while still
letting the driver keep track of the dmabufs it has created.
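
A minimal sketch of the intended pattern, assuming the driver keeps its
dmabufs on a list that the release function also cleans up (the list,
its lock and do_something() are illustrative only):

	/* under a lock shared with the dmabuf's release function */
	list_for_each_entry(priv, &drv->dmabufs, elm) {
		if (!dma_buf_try_get(priv->dmabuf))
			continue;	/* already on its way to release */
		/* the dmabuf cannot be freed until dma_buf_put() */
		do_something(priv->dmabuf);
		dma_buf_put(priv->dmabuf);
	}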

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Wei Lin Guay <wguay@meta.com>
Reviewed-by: Dag Moxnes <dagmoxnes@meta.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Nic Viljoen <nviljoen@meta.com>
---
 include/linux/dma-buf.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 36216d28d8bd..9854578afecd 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -614,6 +614,19 @@ int dma_buf_fd(struct dma_buf *dmabuf, int flags);
 struct dma_buf *dma_buf_get(int fd);
 void dma_buf_put(struct dma_buf *dmabuf);

+/**
+ * dma_buf_try_get - try to get a reference on a dmabuf
+ * @dmabuf: the dmabuf to get
+ *
+ * Returns true if a reference was successfully obtained. The caller must
+ * interlock with the dmabuf's release function in some way, such as RCU, to
+ * ensure that this is not called on freed memory.
+ */
+static inline bool dma_buf_try_get(struct dma_buf *dmabuf)
+{
+	return get_file_rcu(&dmabuf->file);
+}
+
 struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
 					enum dma_data_direction);
 void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
--
2.43.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/4] vfio/pci: Allow MMIO regions to be exported through dma-buf
       [not found] <20241216095429.210792-1-wguay@fb.com>
  2024-12-16  9:54 ` [PATCH 1/4] vfio: Add vfio_device_get() Wei Lin Guay
  2024-12-16  9:54 ` [PATCH 2/4] dma-buf: Add dma_buf_try_get() Wei Lin Guay
@ 2024-12-16  9:54 ` Wei Lin Guay
  2024-12-16  9:54 ` [PATCH 4/4] vfio/pci: Allow export dmabuf without move_notify from importer Wei Lin Guay
  2024-12-16 10:21 ` [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf Christian König
  4 siblings, 0 replies; 29+ messages in thread
From: Wei Lin Guay @ 2024-12-16  9:54 UTC (permalink / raw)
  To: alex.williamson, dri-devel, kvm, linux-rdma
  Cc: jgg, vivek.kasireddy, dagmoxnes, kbusch, nviljoen, Wei Lin Guay

From: Jason Gunthorpe <jgg@nvidia.com>

Summary:
dma-buf has become a way to safely acquire a handle to non-struct page
memory that can still have lifetime controlled by the exporter. Notably
RDMA can now import dma-buf FDs and build them into MRs which allows for
PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
from PCI device BARs.

The patch design loosely follows the pattern in commit
db1a8dd916aa ("habanalabs: add support for dma-buf exporter") except this
does not support pinning.

Instead, this implements what, in the past, we've called a revocable
attachment using move. In normal situations the attachment is pinned, as a
BAR does not change physical address. However when the VFIO device is
closed, or a PCI reset is issued, access to the MMIO memory is revoked.

Revoked means that move occurs, but an attempt to immediately re-map the
memory will fail. In the reset case a future move will be triggered when
MMIO access returns. As both close and reset are under userspace control
it is expected that userspace will suspend use of the dma-buf before
doing these operations; the revoke is purely kernel self-defense against
a hostile userspace.
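
From userspace the export looks roughly like this (a sketch, with error
handling omitted):

	char buf[sizeof(struct vfio_device_feature) +
		 sizeof(struct vfio_device_feature_dma_buf)] = {};
	struct vfio_device_feature *feat = (void *)buf;
	struct vfio_device_feature_dma_buf *get = (void *)feat->data;

	feat->argsz = sizeof(buf);
	feat->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_BUF;
	get->region_index = 0;		/* BAR 0 */
	get->open_flags = O_CLOEXEC;
	/* offset/length left at 0: export the whole BAR */

	int dmabuf_fd = ioctl(device_fd, VFIO_DEVICE_FEATURE, feat);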

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Wei Lin Guay <wguay@meta.com>
Reviewed-by: Dag Moxnes <dagmoxnes@meta.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Nic Viljoen <nviljoen@meta.com>
---
 drivers/vfio/pci/Makefile          |   1 +
 drivers/vfio/pci/dma_buf.c         | 269 +++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_config.c |   8 +-
 drivers/vfio/pci/vfio_pci_core.c   |  28 ++-
 drivers/vfio/pci/vfio_pci_priv.h   |  23 +++
 include/linux/vfio_pci_core.h      |   1 +
 include/uapi/linux/vfio.h          |  18 ++
 7 files changed, 340 insertions(+), 8 deletions(-)
 create mode 100644 drivers/vfio/pci/dma_buf.c

diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index cf00c0a7e55c..0cfdc9ede82f 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -2,6 +2,7 @@

 vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
 vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) += vfio_pci_zdev.o
+vfio-pci-core-$(CONFIG_DMA_SHARED_BUFFER) += dma_buf.o
 obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o

 vfio-pci-y := vfio_pci.o
diff --git a/drivers/vfio/pci/dma_buf.c b/drivers/vfio/pci/dma_buf.c
new file mode 100644
index 000000000000..fd772b520cd7
--- /dev/null
+++ b/drivers/vfio/pci/dma_buf.c
@@ -0,0 +1,269 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ */
+#include <linux/dma-buf.h>
+#include <linux/pci-p2pdma.h>
+#include <linux/dma-resv.h>
+
+#include "vfio_pci_priv.h"
+
+MODULE_IMPORT_NS(DMA_BUF);
+
+struct vfio_pci_dma_buf {
+	struct dma_buf *dmabuf;
+	struct vfio_pci_core_device *vdev;
+	struct list_head dmabufs_elm;
+	unsigned int index;
+	unsigned int orig_nents;
+	size_t offset;
+	bool revoked;
+};
+
+static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
+				   struct dma_buf_attachment *attachment)
+{
+	struct vfio_pci_dma_buf *priv = dmabuf->priv;
+	int rc;
+
+	rc = pci_p2pdma_distance_many(priv->vdev->pdev, &attachment->dev, 1,
+				      true);
+	if (rc < 0)
+		attachment->peer2peer = false;
+	return 0;
+}
+
+static void vfio_pci_dma_buf_unpin(struct dma_buf_attachment *attachment)
+{
+}
+
+static int vfio_pci_dma_buf_pin(struct dma_buf_attachment *attachment)
+{
+	/*
+	 * Uses the dynamic interface but must always allow for
+	 * dma_buf_move_notify() to do revoke
+	 */
+	return -EINVAL;
+}
+
+static struct sg_table *
+vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
+		     enum dma_data_direction dir)
+{
+	size_t sgl_size = dma_get_max_seg_size(attachment->dev);
+	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
+	struct scatterlist *sgl;
+	struct sg_table *sgt;
+	dma_addr_t dma_addr;
+	unsigned int nents;
+	size_t offset;
+	int ret;
+
+	dma_resv_assert_held(priv->dmabuf->resv);
+
+	if (!attachment->peer2peer)
+		return ERR_PTR(-EPERM);
+
+	if (priv->revoked)
+		return ERR_PTR(-ENODEV);
+
+	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt)
+		return ERR_PTR(-ENOMEM);
+
+	nents = DIV_ROUND_UP(priv->dmabuf->size, sgl_size);
+	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
+	if (ret)
+		goto err_kfree_sgt;
+
+	/*
+	 * Since the memory being mapped is device memory it can never be in
+	 * CPU caches.
+	 */
+	dma_addr = dma_map_resource(
+		attachment->dev,
+		pci_resource_start(priv->vdev->pdev, priv->index) +
+			priv->offset,
+		priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
+	ret = dma_mapping_error(attachment->dev, dma_addr);
+	if (ret)
+		goto err_free_sgt;
+
+	/*
+	 * Break the BAR's physical range up into max-sized SGLs according to
+	 * the device's requirement.
+	 */
+	sgl = sgt->sgl;
+	for (offset = 0; offset != priv->dmabuf->size;) {
+		size_t chunk_size = min(priv->dmabuf->size - offset, sgl_size);
+
+		sg_set_page(sgl, NULL, chunk_size, 0);
+		sg_dma_address(sgl) = dma_addr + offset;
+		sg_dma_len(sgl) = chunk_size;
+		sgl = sg_next(sgl);
+		offset += chunk_size;
+	}
+
+	/*
+	 * Because we are not going to include a CPU list we want to have some
+	 * chance that other users will detect this by setting the orig_nents to
+	 * 0 and using only nents (length of DMA list) when going over the sgl
+	 */
+	priv->orig_nents = sgt->orig_nents;
+	sgt->orig_nents = 0;
+	return sgt;
+
+err_free_sgt:
+	sg_free_table(sgt);
+err_kfree_sgt:
+	kfree(sgt);
+	return ERR_PTR(ret);
+}
+
+static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
+				   struct sg_table *sgt,
+				   enum dma_data_direction dir)
+{
+	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
+
+	sgt->orig_nents = priv->orig_nents;
+	dma_unmap_resource(attachment->dev, sg_dma_address(sgt->sgl),
+			   priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
+	sg_free_table(sgt);
+	kfree(sgt);
+}
+
+static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
+{
+	struct vfio_pci_dma_buf *priv = dmabuf->priv;
+
+	/*
+	 * Either this or vfio_pci_dma_buf_cleanup() will remove the dmabuf
+	 * from the list. The refcount prevents both from doing so.
+	 */
+	if (priv->vdev) {
+		down_write(&priv->vdev->memory_lock);
+		list_del_init(&priv->dmabufs_elm);
+		up_write(&priv->vdev->memory_lock);
+		vfio_device_put_registration(&priv->vdev->vdev);
+	}
+	kfree(priv);
+}
+
+static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
+	.attach = vfio_pci_dma_buf_attach,
+	.map_dma_buf = vfio_pci_dma_buf_map,
+	.pin = vfio_pci_dma_buf_pin,
+	.unpin = vfio_pci_dma_buf_unpin,
+	.release = vfio_pci_dma_buf_release,
+	.unmap_dma_buf = vfio_pci_dma_buf_unmap,
+};
+
+int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
+				  struct vfio_device_feature_dma_buf __user *arg,
+				  size_t argsz)
+{
+	struct vfio_device_feature_dma_buf get_dma_buf;
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+	struct vfio_pci_dma_buf *priv;
+	int ret;
+
+	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
+				 sizeof(get_dma_buf));
+	if (ret != 1)
+		return ret;
+
+	if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
+		return -EFAULT;
+
+	/* For PCI the region_index is the BAR number like everything else */
+	if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX)
+		return -EINVAL;
+
+	exp_info.ops = &vfio_pci_dmabuf_ops;
+	exp_info.size = pci_resource_len(vdev->pdev, get_dma_buf.region_index);
+	if (!exp_info.size)
+		return -EINVAL;
+	if (get_dma_buf.offset || get_dma_buf.length) {
+		if (get_dma_buf.length > exp_info.size ||
+		    get_dma_buf.offset >= exp_info.size ||
+		    get_dma_buf.length > exp_info.size - get_dma_buf.offset ||
+		    get_dma_buf.offset % PAGE_SIZE ||
+		    get_dma_buf.length % PAGE_SIZE)
+			return -EINVAL;
+		exp_info.size = get_dma_buf.length;
+	}
+	exp_info.flags = get_dma_buf.open_flags;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&priv->dmabufs_elm);
+	priv->offset = get_dma_buf.offset;
+	priv->index = get_dma_buf.region_index;
+
+	exp_info.priv = priv;
+	priv->dmabuf = dma_buf_export(&exp_info);
+	if (IS_ERR(priv->dmabuf)) {
+		ret = PTR_ERR(priv->dmabuf);
+		kfree(priv);
+		return ret;
+	}
+
+	/* dma_buf_put() now frees priv */
+
+	down_write(&vdev->memory_lock);
+	dma_resv_lock(priv->dmabuf->resv, NULL);
+	priv->revoked = !__vfio_pci_memory_enabled(vdev);
+	priv->vdev = vdev;
+	vfio_device_get(&vdev->vdev);
+	list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
+	dma_resv_unlock(priv->dmabuf->resv);
+	up_write(&vdev->memory_lock);
+
+	/*
+	 * dma_buf_fd() consumes the reference; when the file closes, the
+	 * dmabuf will be released.
+	 */
+	return dma_buf_fd(priv->dmabuf, get_dma_buf.open_flags);
+}
+
+void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
+{
+	struct vfio_pci_dma_buf *priv;
+	struct vfio_pci_dma_buf *tmp;
+
+	lockdep_assert_held_write(&vdev->memory_lock);
+
+	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
+		if (!dma_buf_try_get(priv->dmabuf))
+			continue;
+		if (priv->revoked != revoked) {
+			dma_resv_lock(priv->dmabuf->resv, NULL);
+			priv->revoked = revoked;
+			dma_buf_move_notify(priv->dmabuf);
+			dma_resv_unlock(priv->dmabuf->resv);
+		}
+		dma_buf_put(priv->dmabuf);
+	}
+}
+
+void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
+{
+	struct vfio_pci_dma_buf *priv;
+	struct vfio_pci_dma_buf *tmp;
+
+	down_write(&vdev->memory_lock);
+	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
+		if (!dma_buf_try_get(priv->dmabuf))
+			continue;
+		dma_resv_lock(priv->dmabuf->resv, NULL);
+		list_del_init(&priv->dmabufs_elm);
+		priv->vdev = NULL;
+		priv->revoked = true;
+		dma_buf_move_notify(priv->dmabuf);
+		dma_resv_unlock(priv->dmabuf->resv);
+		vfio_device_put_registration(&vdev->vdev);
+		dma_buf_put(priv->dmabuf);
+	}
+	up_write(&vdev->memory_lock);
+}
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 97422aafaa7b..c605c5cb0078 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -585,10 +585,12 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
 		virt_mem = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY);
 		new_mem = !!(new_cmd & PCI_COMMAND_MEMORY);

-		if (!new_mem)
+		if (!new_mem) {
 			vfio_pci_zap_and_down_write_memory_lock(vdev);
-		else
+			vfio_pci_dma_buf_move(vdev, true);
+		} else {
 			down_write(&vdev->memory_lock);
+		}

 		/*
 		 * If the user is writing mem/io enable (new_mem/io) and we
@@ -623,6 +625,8 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
 		*virt_cmd &= cpu_to_le16(~mask);
 		*virt_cmd |= cpu_to_le16(new_cmd & mask);

+		if (__vfio_pci_memory_enabled(vdev))
+			vfio_pci_dma_buf_move(vdev, false);
 		up_write(&vdev->memory_lock);
 	}

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index ba0ce0075b2f..bb97b4d94eb7 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -700,6 +700,8 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
 #endif
 	vfio_pci_core_disable(vdev);

+	vfio_pci_dma_buf_cleanup(vdev);
+
 	mutex_lock(&vdev->igate);
 	if (vdev->err_trigger) {
 		eventfd_ctx_put(vdev->err_trigger);
@@ -1244,7 +1246,10 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
 	 */
 	vfio_pci_set_power_state(vdev, PCI_D0);

+	vfio_pci_dma_buf_move(vdev, true);
 	ret = pci_try_reset_function(vdev->pdev);
+	if (__vfio_pci_memory_enabled(vdev))
+		vfio_pci_dma_buf_move(vdev, false);
 	up_write(&vdev->memory_lock);

 	return ret;
@@ -1490,11 +1495,10 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_ioctl);

-static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
-				       uuid_t __user *arg, size_t argsz)
+static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
+				       u32 flags, uuid_t __user *arg,
+				       size_t argsz)
 {
-	struct vfio_pci_core_device *vdev =
-		container_of(device, struct vfio_pci_core_device, vdev);
 	uuid_t uuid;
 	int ret;

@@ -1521,6 +1525,9 @@ static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
 int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 				void __user *arg, size_t argsz)
 {
+	struct vfio_pci_core_device *vdev =
+		container_of(device, struct vfio_pci_core_device, vdev);
+
 	switch (flags & VFIO_DEVICE_FEATURE_MASK) {
 	case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY:
 		return vfio_pci_core_pm_entry(device, flags, arg, argsz);
@@ -1530,7 +1537,9 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 	case VFIO_DEVICE_FEATURE_LOW_POWER_EXIT:
 		return vfio_pci_core_pm_exit(device, flags, arg, argsz);
 	case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
-		return vfio_pci_core_feature_token(device, flags, arg, argsz);
+		return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
+	case VFIO_DEVICE_FEATURE_DMA_BUF:
+		return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
 	default:
 		return -ENOTTY;
 	}
@@ -2083,6 +2092,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
 	init_rwsem(&vdev->memory_lock);
 	xa_init(&vdev->ctx);
+	INIT_LIST_HEAD(&vdev->dmabufs);

 	return 0;
 }
@@ -2463,11 +2473,17 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 	 * cause the PCI config space reset without restoring the original
 	 * state (saved locally in 'vdev->pm_save').
 	 */
-	list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list)
+	list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list) {
+		vfio_pci_dma_buf_move(vdev, true);
 		vfio_pci_set_power_state(vdev, PCI_D0);
+	}

 	ret = pci_reset_bus(pdev);

+	list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list)
+		if (__vfio_pci_memory_enabled(vdev))
+			vfio_pci_dma_buf_move(vdev, false);
+
 	vdev = list_last_entry(&dev_set->device_list,
 			       struct vfio_pci_core_device, vdev.dev_set_list);

diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 5e4fa69aee16..09d3c300918c 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -101,4 +101,27 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
 	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
 }

+#ifdef CONFIG_DMA_SHARED_BUFFER
+int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
+				  struct vfio_device_feature_dma_buf __user *arg,
+				  size_t argsz);
+void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
+void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
+#else
+static inline int
+vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
+			      struct vfio_device_feature_dma_buf __user *arg,
+			      size_t argsz)
+{
+	return -ENOTTY;
+}
+static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
+{
+}
+static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,
+					 bool revoked)
+{
+}
+#endif
+
 #endif
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index fbb472dd99b3..da5d8955ae56 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -94,6 +94,7 @@ struct vfio_pci_core_device {
 	struct vfio_pci_core_device	*sriov_pf_core_dev;
 	struct notifier_block	nb;
 	struct rw_semaphore	memory_lock;
+	struct list_head	dmabufs;
 };

 /* Will be exported for vfio pci drivers usage */
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 2b68e6cdf190..8812b4750cc5 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1458,6 +1458,24 @@ struct vfio_device_feature_bus_master {
 };
 #define VFIO_DEVICE_FEATURE_BUS_MASTER 10

+/**
+ * Upon VFIO_DEVICE_FEATURE_GET, create a dma_buf fd for the selected
+ * region.
+ *
+ * open_flags are the typical flags passed to open(2), e.g. O_RDWR,
+ * O_CLOEXEC, etc. offset/length specify a slice of the region to create the
+ * dmabuf from. If both are 0 then the whole region is used.
+ *
+ * Return: The fd number on success, -1 and errno is set on failure.
+ */
+struct vfio_device_feature_dma_buf {
+	__u32 region_index;
+	__u32 open_flags;
+	__u32 offset;
+	__u64 length;
+};
+#define VFIO_DEVICE_FEATURE_DMA_BUF 11
+
 /* -------- API for Type1 VFIO IOMMU -------- */

 /**
--
2.43.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/4] vfio/pci: Allow export dmabuf without move_notify from importer
       [not found] <20241216095429.210792-1-wguay@fb.com>
                   ` (2 preceding siblings ...)
  2024-12-16  9:54 ` [PATCH 3/4] vfio/pci: Allow MMIO regions to be exported through dma-buf Wei Lin Guay
@ 2024-12-16  9:54 ` Wei Lin Guay
  2024-12-16 10:21 ` [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf Christian König
  4 siblings, 0 replies; 29+ messages in thread
From: Wei Lin Guay @ 2024-12-16  9:54 UTC (permalink / raw)
  To: alex.williamson, dri-devel, kvm, linux-rdma
  Cc: jgg, vivek.kasireddy, dagmoxnes, kbusch, nviljoen, Wei Lin Guay

From: Wei Lin Guay <wguay@meta.com>

Summary:
Allow vfio to export a dmabuf to an importer, such as an RDMA NIC, that
does not support the move_notify callback, since not all RDMA drivers
support on-demand paging (ODP).

One use-case is an accelerator, bound to VFIO, whose device memory is
always pinned and exported to an RDMA NIC such as EFA, BNXT_RE or IRDMA,
none of which support ODP.
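
With this change a non-ODP importer can use the static dma-buf API; a
rough sketch of the importer side (names are illustrative):

	/* no importer ops means no move_notify; the dma-buf core pins the
	 * buffer through the exporter's .pin at attach time */
	attach = dma_buf_attach(dmabuf, &pdev->dev);
	sgt = dma_buf_map_attachment_unlocked(attach, DMA_BIDIRECTIONAL);
	/* ... program the HW; the mapping stays valid until detach ... */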

Signed-off-by: Wei Lin Guay <wguay@meta.com>
Reviewed-by: Dag Moxnes <dagmoxnes@meta.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Nic Viljoen <nviljoen@meta.com>
---
 drivers/vfio/pci/dma_buf.c       | 34 +++++++++++++++++++++++++++++-----
 drivers/vfio/pci/vfio_pci_core.c | 16 ++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h |  7 +++++++
 3 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/dma_buf.c b/drivers/vfio/pci/dma_buf.c
index fd772b520cd7..8017f48296cb 100644
--- a/drivers/vfio/pci/dma_buf.c
+++ b/drivers/vfio/pci/dma_buf.c
@@ -17,6 +17,7 @@ struct vfio_pci_dma_buf {
 	unsigned int orig_nents;
 	size_t offset;
 	bool revoked;
+	bool pinned;
 };

 static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
@@ -32,17 +33,40 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
 	return 0;
 }

+bool vfio_pci_dma_buf_pinned(struct vfio_pci_core_device *vdev)
+{
+	struct vfio_pci_dma_buf *priv;
+	struct vfio_pci_dma_buf *tmp;
+	bool pinned = false;
+
+	down_write(&vdev->memory_lock);
+	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
+		if (!dma_buf_try_get(priv->dmabuf))
+			continue;
+		if (priv->pinned)
+			pinned = true;
+		/* drop the reference taken by dma_buf_try_get() */
+		dma_buf_put(priv->dmabuf);
+		if (pinned)
+			break;
+	}
+	up_write(&vdev->memory_lock);
+	return pinned;
+}
+
 static void vfio_pci_dma_buf_unpin(struct dma_buf_attachment *attachment)
 {
+	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
+
+	priv->pinned = false;
 }

 static int vfio_pci_dma_buf_pin(struct dma_buf_attachment *attachment)
 {
-	/*
-	 * Uses the dynamic interface but must always allow for
-	 * dma_buf_move_notify() to do revoke
-	 */
-	return -EINVAL;
+	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
+
+	priv->pinned = true;
+	return 0;
 }

 static struct sg_table *
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index bb97b4d94eb7..db28fa2cc9a8 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1246,6 +1246,13 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
 	 */
 	vfio_pci_set_power_state(vdev, PCI_D0);

+	/*
+	 * Prevent the reset if a dmabuf is pinned, to avoid leaving the
+	 * importer with a stale pinned mapping.
+	 */
+	if (vfio_pci_dma_buf_pinned(vdev))
+		return -EINVAL;
+
 	vfio_pci_dma_buf_move(vdev, true);
 	ret = pci_try_reset_function(vdev->pdev);
 	if (__vfio_pci_memory_enabled(vdev))
@@ -2444,6 +2451,15 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 			break;
 		}

+		/*
+		 * Prevent the reset if a dmabuf is pinned, to avoid leaving
+		 * the importer with a stale pinned mapping.
+		 */
+		if (vfio_pci_dma_buf_pinned(vdev)) {
+			ret = -EINVAL;
+			break;
+		}
+
 		/*
 		 * Take the memory write lock for each device and zap BAR
 		 * mappings to prevent the user accessing the device while in
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 09d3c300918c..43c40dc4751c 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -107,6 +107,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 				  size_t argsz);
 void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
 void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
+bool vfio_pci_dma_buf_pinned(struct vfio_pci_core_device *vdev);
 #else
 static inline int
 vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
@@ -115,6 +116,12 @@ vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 {
 	return -ENOTTY;
 }
+
+static inline bool vfio_pci_dma_buf_pinned(struct vfio_pci_core_device *vdev)
+{
+	return false;
+}
+
 static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
 {
 }
--
2.43.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
       [not found] <20241216095429.210792-1-wguay@fb.com>
                   ` (3 preceding siblings ...)
  2024-12-16  9:54 ` [PATCH 4/4] vfio/pci: Allow export dmabuf without move_notify from importer Wei Lin Guay
@ 2024-12-16 10:21 ` Christian König
  2024-12-16 16:54   ` Keith Busch
  4 siblings, 1 reply; 29+ messages in thread
From: Christian König @ 2024-12-16 10:21 UTC (permalink / raw)
  To: Wei Lin Guay, alex.williamson, dri-devel, kvm, linux-rdma
  Cc: jgg, vivek.kasireddy, dagmoxnes, kbusch, nviljoen, Wei Lin Guay,
	Oded Gabbay, Daniel Vetter, Leon Romanovsky, Maor Gottlieb

On 16.12.24 at 10:54, Wei Lin Guay wrote:
> From: Wei Lin Guay <wguay@meta.com>
>
> This is another attempt to revive the patches posted by Jason
> Gunthorpe and Vivek Kasireddy, at
> https://patchwork.kernel.org/project/linux-media/cover/0-v2-472615b3877e+28f7-vfio_dma_buf_jgg@nvidia.com/
> https://lwn.net/Articles/970751/
>
> In addition to the initial proposal by Jason, another promising
> application is exposing memory from an AI accelerator (bound to VFIO)
> to an RDMA device. This would allow the RDMA device to directly access
> the accelerator's memory, thereby facilitating direct data
> transactions between the RDMA device and the accelerator.
>
> Below is the text/motivation from the original cover letter.
>
> dma-buf has become a way to safely acquire a handle to non-struct page
> memory that can still have lifetime controlled by the exporter. Notably
> RDMA can now import dma-buf FDs and build them into MRs which allows for
> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
> from PCI device BARs.
>
> This series supports a use case for SPDK where a NVMe device will be owned
> by SPDK through VFIO but interacting with a RDMA device. The RDMA device
> may directly access the NVMe CMB or directly manipulate the NVMe device's
> doorbell using PCI P2P.
>
> However, as a general mechanism, it can support many other scenarios with
> VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
> generic and safe P2P mappings.
>
> This series goes after the "Break up ioctl dispatch functions to one
> function per ioctl" series.

Yeah that sounds like it should work.

But where is the rest of the series, I only see the cover letter?

>
> v2:
>   - Name the new file dma_buf.c
>   - Restore orig_nents before freeing
>   - Fix reversed logic around priv->revoked
>   - Set priv->index
>   - Rebased on v2 "Break up ioctl dispatch functions"
> v1: https://lore.kernel.org/r/0-v1-9e6e1739ed95+5fa-vfio_dma_buf_jgg@nvidia.com
> Cc: linux-rdma@vger.kernel.org
> Cc: Oded Gabbay <ogabbay@kernel.org>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: Maor Gottlieb <maorg@nvidia.com>
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>
> Jason Gunthorpe (3):
>    vfio: Add vfio_device_get()
>    dma-buf: Add dma_buf_try_get()

That is usually a no-go. We have rejected adding dma_buf_try_get() 
multiple times.

Please explain *exactly* what you need that for and how you protect 
against races with teardown.

Regards,
Christian.

>    vfio/pci: Allow MMIO regions to be exported through dma-buf
>
> Wei Lin Guay (1):
>    vfio/pci: Allow export dmabuf without move_notify from importer
>
>   drivers/vfio/pci/Makefile          |   1 +
>   drivers/vfio/pci/dma_buf.c         | 291 +++++++++++++++++++++++++++++
>   drivers/vfio/pci/vfio_pci_config.c |   8 +-
>   drivers/vfio/pci/vfio_pci_core.c   |  44 ++++-
>   drivers/vfio/pci/vfio_pci_priv.h   |  30 +++
>   drivers/vfio/vfio_main.c           |   1 +
>   include/linux/dma-buf.h            |  13 ++
>   include/linux/vfio.h               |   6 +
>   include/linux/vfio_pci_core.h      |   1 +
>   include/uapi/linux/vfio.h          |  18 ++
>   10 files changed, 405 insertions(+), 8 deletions(-)
>   create mode 100644 drivers/vfio/pci/dma_buf.c
>
> --
> 2.43.5


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2024-12-16 10:21 ` [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf Christian König
@ 2024-12-16 16:54   ` Keith Busch
  2024-12-17  9:53     ` Christian König
  0 siblings, 1 reply; 29+ messages in thread
From: Keith Busch @ 2024-12-16 16:54 UTC (permalink / raw)
  To: Christian König
  Cc: Wei Lin Guay, alex.williamson, dri-devel, kvm, linux-rdma, jgg,
	vivek.kasireddy, dagmoxnes, nviljoen, Wei Lin Guay, Oded Gabbay,
	Daniel Vetter, Leon Romanovsky, Maor Gottlieb

On Mon, Dec 16, 2024 at 11:21:39AM +0100, Christian König wrote:
> On 16.12.24 at 10:54, Wei Lin Guay wrote:
> > From: Wei Lin Guay <wguay@meta.com>
> > However, as a general mechanism, it can support many other scenarios with
> > VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
> > generic and safe P2P mappings.
> > 
> > This series goes after the "Break up ioctl dispatch functions to one
> > function per ioctl" series.
> 
> Yeah that sounds like it should work.
> 
> But where is the rest of the series, I only see the cover letter?

Should be here:

  https://lore.kernel.org/linux-rdma/20241216095429.210792-2-wguay@fb.com/T/#u

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
       [not found] <20241216095920.237117-1-wguay@fb.com>
@ 2024-12-16 17:34 ` Kasireddy, Vivek
  2024-12-17 12:19   ` Wei Lin Guay
  0 siblings, 1 reply; 29+ messages in thread
From: Kasireddy, Vivek @ 2024-12-16 17:34 UTC (permalink / raw)
  To: Wei Lin Guay, alex.williamson@redhat.com,
	dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
	linux-rdma@vger.kernel.org
  Cc: jgg@nvidia.com, dagmoxnes@meta.com, kbusch@kernel.org,
	nviljoen@meta.com, Wei Lin Guay, Oded Gabbay,
	Christian König, Daniel Vetter, Leon Romanovsky,
	Maor Gottlieb

Hi Wei Lin,

> Subject: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported
> through dmabuf
> 
> From: Wei Lin Guay <wguay@meta.com>
> 
> This is another attempt to revive the patches posted by Jason
> Gunthorpe and Vivek Kasireddy, at
> https://patchwork.kernel.org/project/linux-media/cover/0-v2-
> 472615b3877e+28f7-vfio_dma_buf_jgg@nvidia.com/
> https://lwn.net/Articles/970751/
v2: https://lore.kernel.org/dri-devel/20240624065552.1572580-1-vivek.kasireddy@intel.com/
addresses review comments from Alex and Jason and also includes the ability
to create the dmabuf from multiple ranges. This is really needed to future-proof
the feature.

Also, my understanding is that this patchset cannot proceed until Leon's series is merged:
https://lore.kernel.org/kvm/cover.1733398913.git.leon@kernel.org/


Thanks,
Vivek

> 
> In addition to the initial proposal by Jason, another promising
> application is exposing memory from an AI accelerator (bound to VFIO)
> to an RDMA device. This would allow the RDMA device to directly access
> the accelerator's memory, thereby facilitating direct data
> transactions between the RDMA device and the accelerator.
> 
> > Below is the text/motivation from the original cover letter.
> 
> dma-buf has become a way to safely acquire a handle to non-struct page
> memory that can still have lifetime controlled by the exporter. Notably
> RDMA can now import dma-buf FDs and build them into MRs which allows
> for
> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
> from PCI device BARs.
> 
> This series supports a use case for SPDK where a NVMe device will be owned
> by SPDK through VFIO but interacting with a RDMA device. The RDMA device
> may directly access the NVMe CMB or directly manipulate the NVMe device's
> doorbell using PCI P2P.
> 
> However, as a general mechanism, it can support many other scenarios with
> VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
> generic and safe P2P mappings.
> 
> This series goes after the "Break up ioctl dispatch functions to one
> function per ioctl" series.
> 
> v2:
>  - Name the new file dma_buf.c
>  - Restore orig_nents before freeing
>  - Fix reversed logic around priv->revoked
>  - Set priv->index
>  - Rebased on v2 "Break up ioctl dispatch functions"
> v1: https://lore.kernel.org/r/0-v1-9e6e1739ed95+5fa-
> vfio_dma_buf_jgg@nvidia.com
> Cc: linux-rdma@vger.kernel.org
> Cc: Oded Gabbay <ogabbay@kernel.org>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: Maor Gottlieb <maorg@nvidia.com>
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Jason Gunthorpe (3):
>   vfio: Add vfio_device_get()
>   dma-buf: Add dma_buf_try_get()
>   vfio/pci: Allow MMIO regions to be exported through dma-buf
> 
> Wei Lin Guay (1):
>   vfio/pci: Allow export dmabuf without move_notify from importer
> 
>  drivers/vfio/pci/Makefile          |   1 +
>  drivers/vfio/pci/dma_buf.c         | 291 +++++++++++++++++++++++++++++
>  drivers/vfio/pci/vfio_pci_config.c |   8 +-
>  drivers/vfio/pci/vfio_pci_core.c   |  44 ++++-
>  drivers/vfio/pci/vfio_pci_priv.h   |  30 +++
>  drivers/vfio/vfio_main.c           |   1 +
>  include/linux/dma-buf.h            |  13 ++
>  include/linux/vfio.h               |   6 +
>  include/linux/vfio_pci_core.h      |   1 +
>  include/uapi/linux/vfio.h          |  18 ++
>  10 files changed, 405 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/vfio/pci/dma_buf.c
> 
> --
> 2.43.5

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2024-12-16 16:54   ` Keith Busch
@ 2024-12-17  9:53     ` Christian König
  2024-12-17 11:06       ` Wei Lin Guay
  2024-12-18 10:44       ` Simona Vetter
  0 siblings, 2 replies; 29+ messages in thread
From: Christian König @ 2024-12-17  9:53 UTC (permalink / raw)
  To: Keith Busch
  Cc: Wei Lin Guay, alex.williamson, dri-devel, kvm, linux-rdma, jgg,
	vivek.kasireddy, dagmoxnes, nviljoen, Wei Lin Guay, Oded Gabbay,
	Daniel Vetter, Leon Romanovsky, Maor Gottlieb

On 16.12.24 at 17:54, Keith Busch wrote:
> On Mon, Dec 16, 2024 at 11:21:39AM +0100, Christian König wrote:
>> On 16.12.24 at 10:54, Wei Lin Guay wrote:
>>> From: Wei Lin Guay <wguay@meta.com>
>>> However, as a general mechanism, it can support many other scenarios with
>>> VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
>>> generic and safe P2P mappings.
>>>
>>> This series goes after the "Break up ioctl dispatch functions to one
>>> function per ioctl" series.
>> Yeah that sounds like it should work.
>>
>> But where is the rest of the series, I only see the cover letter?
> Should be here:
>
>    https://lore.kernel.org/linux-rdma/20241216095429.210792-2-wguay@fb.com/T/#u

Please send that out once more with me on explicit CC.

Apart from that I have to reject the adding of dma_buf_try_get(), that 
is clearly not something we should do.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2024-12-17  9:53     ` Christian König
@ 2024-12-17 11:06       ` Wei Lin Guay
       [not found]         ` <e8759159-b141-410b-be08-aad54eaed41f@amd.com>
  2024-12-18 10:44       ` Simona Vetter
  1 sibling, 1 reply; 29+ messages in thread
From: Wei Lin Guay @ 2024-12-17 11:06 UTC (permalink / raw)
  To: Christian König
  Cc: Keith Busch, Wei Lin Guay, alex.williamson@redhat.com,
	dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
	linux-rdma@vger.kernel.org, jgg@nvidia.com,
	vivek.kasireddy@intel.com, Dag Moxnes, Nicolaas Viljoen,
	Oded Gabbay, Daniel Vetter, Leon Romanovsky, Maor Gottlieb

Hi Christian, 

Thanks again for your prompt response/review.

> On 17 Dec 2024, at 10:53, Christian König <christian.koenig@amd.com> wrote:
> 
> > 
> On 16.12.24 at 17:54, Keith Busch wrote:
>> On Mon, Dec 16, 2024 at 11:21:39AM +0100, Christian König wrote:
>>> On 16.12.24 at 10:54, Wei Lin Guay wrote:
>>>> From: Wei Lin Guay <wguay@meta.com>
>>>> However, as a general mechanism, it can support many other scenarios with
>>>> VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
>>>> generic and safe P2P mappings.
>>>> 
>>>> This series goes after the "Break up ioctl dispatch functions to one
>>>> function per ioctl" series.
>>> Yeah that sounds like it should work.
>>> 
>>> But where is the rest of the series, I only see the cover letter?
>> Should be here:
>> 
>>   https://lore.kernel.org/linux-rdma/20241216095429.210792-2-wguay@fb.com/T/#u
> 
> Please send that out once more with me on explicit CC.

I will resend the patch series. I was experiencing issues with my email client, which inadvertently split the series into two separate emails.

> 
> Apart from that I have to reject the adding of dma_buf_try_get(), that is clearly not something we should do.


Understood. It appears that Vivek has confirmed that his v2 has resolved the issue. I will follow up with him to determine if he plans to resume his patch series, and if so, I will apply my last patch on top of his updated patch series.

Thanks,
Wei Lin
> 
> Thanks,
> Christian.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2024-12-16 17:34 ` Kasireddy, Vivek
@ 2024-12-17 12:19   ` Wei Lin Guay
  2024-12-18  7:02     ` Kasireddy, Vivek
  0 siblings, 1 reply; 29+ messages in thread
From: Wei Lin Guay @ 2024-12-17 12:19 UTC (permalink / raw)
  To: Kasireddy, Vivek
  Cc: Wei Lin Guay, alex.williamson@redhat.com,
	dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
	linux-rdma@vger.kernel.org, jgg@nvidia.com, Dag Moxnes,
	kbusch@kernel.org, Nicolaas Viljoen, Oded Gabbay,
	Christian König, Daniel Vetter, Leon Romanovsky,
	Maor Gottlieb

Hi Vivek, 

> On 16 Dec 2024, at 18:34, Kasireddy, Vivek <vivek.kasireddy@intel.com> wrote:
> 
> > 
> Hi Wei Lin,
> 
>> Subject: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported
>> through dmabuf
>> 
>> From: Wei Lin Guay <wguay@meta.com>
>> 
>> This is another attempt to revive the patches posted by Jason
>> Gunthorpe and Vivek Kasireddy, at
>> https://patchwork.kernel.org/project/linux-media/cover/0-v2-
>> 472615b3877e+28f7-vfio_dma_buf_jgg@nvidia.com/
>> https://lwn.net/Articles/970751/
> v2: https://lore.kernel.org/dri-devel/20240624065552.1572580-1-vivek.kasireddy@intel.com/
> addresses review comments from Alex and Jason and also includes the ability
> to create the dmabuf from multiple ranges. This is really needed to future-proof
> the feature. 

The good thing is that your patch series addressed Christian’s concerns about adding dma_buf_try_get().

However, several questions regarding your v2 patch:
- Does the dma-buf still support the redundant mmap handler? (One of the review comments from v1?)
- Rather than handling different regions within a single dma-buf, would having vfio open multiple distinct file descriptors work?
For our specific use case, we don't require multiple regions and prefer Jason’s original patch.

> 
> Also, my understanding is that this patchset cannot proceed until Leon's series is merged:
> https://lore.kernel.org/kvm/cover.1733398913.git.leon@kernel.org/

Thanks for the pointer. 
I will rebase my local patch series on top of that. 

Thanks,
Wei Lin

> 
> 
> Thanks,
> Vivek
> 
>> 
>> In addition to the initial proposal by Jason, another promising
>> application is exposing memory from an AI accelerator (bound to VFIO)
>> to an RDMA device. This would allow the RDMA device to directly access
>> the accelerator's memory, thereby facilitating direct data
>> transactions between the RDMA device and the accelerator.
>> 
>> Below is the text/motivation from the original cover letter.
>> 
>> dma-buf has become a way to safely acquire a handle to non-struct page
>> memory that can still have lifetime controlled by the exporter. Notably
>> RDMA can now import dma-buf FDs and build them into MRs which allows
>> for
>> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
>> from PCI device BARs.
>> 
>> This series supports a use case for SPDK where a NVMe device will be owned
>> by SPDK through VFIO but interacting with a RDMA device. The RDMA device
>> may directly access the NVMe CMB or directly manipulate the NVMe device's
>> doorbell using PCI P2P.
>> 
>> However, as a general mechanism, it can support many other scenarios with
>> VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
>> generic and safe P2P mappings.
>> 
>> This series goes after the "Break up ioctl dispatch functions to one
>> function per ioctl" series.
>> 
>> v2:
>> - Name the new file dma_buf.c
>> - Restore orig_nents before freeing
>> - Fix reversed logic around priv->revoked
>> - Set priv->index
>> - Rebased on v2 "Break up ioctl dispatch functions"
>> v1: https://lore.kernel.org/r/0-v1-9e6e1739ed95+5fa-
>> vfio_dma_buf_jgg@nvidia.com
>> Cc: linux-rdma@vger.kernel.org
>> Cc: Oded Gabbay <ogabbay@kernel.org>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Leon Romanovsky <leon@kernel.org>
>> Cc: Maor Gottlieb <maorg@nvidia.com>
>> Cc: dri-devel@lists.freedesktop.org
>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>> 
>> Jason Gunthorpe (3):
>>  vfio: Add vfio_device_get()
>>  dma-buf: Add dma_buf_try_get()
>>  vfio/pci: Allow MMIO regions to be exported through dma-buf
>> 
>> Wei Lin Guay (1):
>>  vfio/pci: Allow export dmabuf without move_notify from importer
>> 
>> drivers/vfio/pci/Makefile          |   1 +
>> drivers/vfio/pci/dma_buf.c         | 291 +++++++++++++++++++++++++++++
>> drivers/vfio/pci/vfio_pci_config.c |   8 +-
>> drivers/vfio/pci/vfio_pci_core.c   |  44 ++++-
>> drivers/vfio/pci/vfio_pci_priv.h   |  30 +++
>> drivers/vfio/vfio_main.c           |   1 +
>> include/linux/dma-buf.h            |  13 ++
>> include/linux/vfio.h               |   6 +
>> include/linux/vfio_pci_core.h      |   1 +
>> include/uapi/linux/vfio.h          |  18 ++
>> 10 files changed, 405 insertions(+), 8 deletions(-)
>> create mode 100644 drivers/vfio/pci/dma_buf.c
>> 
>> --
>> 2.43.5


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
       [not found]         ` <e8759159-b141-410b-be08-aad54eaed41f@amd.com>
@ 2024-12-18  6:16           ` Kasireddy, Vivek
  2024-12-18 10:01             ` Christian König
  0 siblings, 1 reply; 29+ messages in thread
From: Kasireddy, Vivek @ 2024-12-18  6:16 UTC (permalink / raw)
  To: Christian König, Wei Lin Guay
  Cc: Keith Busch, alex.williamson@redhat.com,
	dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
	linux-rdma@vger.kernel.org, jgg@nvidia.com, Dag Moxnes,
	Nicolaas Viljoen, Oded Gabbay, Simona Vetter, Leon Romanovsky,
	Maor Gottlieb

Hi Christian,

> Subject: Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported
> through dmabuf
> 
> 
>> 	I will resend the patch series. I was experiencing issues with my email
>> client, which inadvertently split the series into two separate emails.
> 
> 
> Alternatively I can also copy the code from the list archive and explain why
> that doesn't work:
> 
> +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
> +{
> +    struct vfio_pci_dma_buf *priv;
> +    struct vfio_pci_dma_buf *tmp;
> +
> +    lockdep_assert_held_write(&vdev->memory_lock);
> 
> This makes sure that the caller is holding vdev->memory_lock.
> 
> +
> +    list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> +        if (!dma_buf_try_get(priv->dmabuf))
> 
> This here only works because vfio_pci_dma_buf_release() also grabs
> vdev->memory_lock and so we are protected against concurrent releases.
> 
> +            continue;
> +        if (priv->revoked != revoked) {
> +            dma_resv_lock(priv->dmabuf->resv, NULL);
> +            priv->revoked = revoked;
> +            dma_buf_move_notify(priv->dmabuf);
> +            dma_resv_unlock(priv->dmabuf->resv);
> +        }
> +        dma_buf_put(priv->dmabuf);
> 
> The problem is now that this here might drop the last reference which in turn
> calls vfio_pci_dma_buf_release() which also tries to grab
> vdev->memory_lock and so results in a deadlock.
AFAICS, vfio_pci_dma_buf_release() would not be called synchronously after the
last reference is dropped by dma_buf_put(). This is because fput(), which is
called by dma_buf_put(), triggers f_op->release() asynchronously; therefore, a
deadlock is unlikely to occur in this scenario, unless I am overlooking something.

Thanks,
Vivek

> 
> +    }
> +}
> 
> This pattern was suggested multiple times and I had to reject it every time
> because the whole idea is just fundamentally broken.
> 
> It's really astonishing how people always come up with the same broken
> pattern.
> 
> Regards,
> Christian.
> 
> 
> 
> 
> 
> 
> 
> 		Apart from that I have to reject the adding of
> dma_buf_try_get(), that is clearly not something we should do.
> 
> 
> 
> 	Understood. It appears that Vivek has confirmed that his v2 has
> resolved the issue. I will follow up with him to determine if he plans to
> resume his patch, and if so, I will apply my last patch on top of his updated
> patch series
> 
> 	Thanks,
> 	Wei Lin
> 
> 
> 		Thanks,
> 		Christian.
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2024-12-17 12:19   ` Wei Lin Guay
@ 2024-12-18  7:02     ` Kasireddy, Vivek
       [not found]       ` <61DF4F0E-D947-436B-9160-A40079DB9085@meta.com>
  0 siblings, 1 reply; 29+ messages in thread
From: Kasireddy, Vivek @ 2024-12-18  7:02 UTC (permalink / raw)
  To: Wei Lin Guay
  Cc: alex.williamson@redhat.com, dri-devel@lists.freedesktop.org,
	kvm@vger.kernel.org, linux-rdma@vger.kernel.org, jgg@nvidia.com,
	Dag Moxnes, kbusch@kernel.org, Nicolaas Viljoen, Oded Gabbay,
	Christian König, Daniel Vetter, Leon Romanovsky,
	Maor Gottlieb

Hi Wei Lin,

> Subject: Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported
> through dmabuf
> >>
> >> From: Wei Lin Guay <wguay@meta.com>
> >>
> >> This is another attempt to revive the patches posted by Jason
> >> Gunthorpe and Vivek Kasireddy, at
> >> https://patchwork.kernel.org/project/linux-media/cover/0-v2-
> >> 472615b3877e+28f7-vfio_dma_buf_jgg@nvidia.com/
> >>
> https://lwn.net/Articles/970751/
> > v2: https://lore.kernel.org/dri-devel/20240624065552.1572580-1-
> vivek.kasireddy@intel.com/
> > addresses review comments from Alex and Jason and also includes the
> ability
> > to create the dmabuf from multiple ranges. This is really needed to future-
> proof
> > the feature.
> 
> The good thing is that your patch series addressed Christian’s concerns about
> adding dma_buf_try_get().
No, it did not address his concern. In v2, instead of adding a new function, we
are now calling get_file_rcu() directly, which was suggested by Christian in
the original review.

> 
> However, several questions regarding your v2 patch:
> - Does the dma-buf still support the redundant mmap handler? (One of the
> review comments from v1?)
Yeah, the mmap handler is really needed as a debugging tool given that the
importer would not be able to provide access to the dmabuf's underlying
memory via the CPU in any other way.
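
For example (assuming the v2 mmap handler is wired up), a debug tool could
inspect the exported slice with a plain mmap of the dmabuf fd:

	/* hypothetical debug snippet, not part of the series */
	void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, dmabuf_fd, 0);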

> - Rather than handling different regions within a single dma-buf, would
> having vfio open multiple distinct file descriptors work?
> For our specific use case, we don't require multiple regions and prefer
> Jason’s original patch.
Restricting the dmabuf to a single region (or having to create multiple dmabufs
to represent multiple regions/ranges associated with a single scattered buffer)
would not be feasible or ideal in all cases. For instance, in my use-case, I am
sharing a large framebuffer (FB) located in GPU's VRAM. And, allocating a large
FB contiguously (nr_ranges = 1) in VRAM is not possible when there is memory
pressure.

Furthermore, since we are adding a new UAPI with this patch/feature, we cannot
go back and tweak it later (to add support for nr_ranges > 1) should there be
a need in the future, whereas you can always use nr_ranges = 1. That is why it
makes sense to be flexible in terms of the number of ranges/regions.
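
For reference, the multi-range uAPI in v2 is shaped roughly like this
(paraphrased; see the linked series for the exact definition):

	struct vfio_device_feature_dma_buf {
		__u32	open_flags;
		__u32	nr_ranges;
		struct vfio_region_dma_range dma_ranges[];
	};

where each range carries a region_index plus an offset/length pair, so one
dmabuf can cover a scattered allocation while nr_ranges = 1 degenerates to
the single-region case discussed here.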

> 
> >
> > Also, my understanding is that this patchset cannot proceed until Leon's
> series is merged:
> > https://lore.kernel.org/kvm/cover.1733398913.git.leon@kernel.org/
> 
> Thanks for the pointer.
> I will rebase my local patch series on top of that.
AFAIK, Leon's work includes new mechanisms/APIs to do P2P DMA, which we
should be using in this patchset. And, I think he is also planning to use the
new APIs to augment dmabuf usage so that it is not forced to use the
scatter-gather list, particularly in cases where the underlying memory is not
backed by struct page.

I was just waiting for all of this to happen, before posting a v3.

Thanks,
Vivek

> 
> Thanks,
> Wei Lin
> 
> >
> >
> > Thanks,
> > Vivek
> >
> >>
> >> In addition to the initial proposal by Jason, another promising
> >> application is exposing memory from an AI accelerator (bound to VFIO)
> >> to an RDMA device. This would allow the RDMA device to directly access
> >> the accelerator's memory, thereby facilitating direct data
> >> transactions between the RDMA device and the accelerator.
> >>
> >> Below is the text/motivation from the original cover letter.
> >>
> >> dma-buf has become a way to safely acquire a handle to non-struct page
> >> memory that can still have lifetime controlled by the exporter. Notably
> >> RDMA can now import dma-buf FDs and build them into MRs which
> allows
> >> for
> >> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
> >> from PCI device BARs.
> >>
> >> This series supports a use case for SPDK where a NVMe device will be
> owned
> >> by SPDK through VFIO but interacting with a RDMA device. The RDMA
> device
> >> may directly access the NVMe CMB or directly manipulate the NVMe
> device's
> >> doorbell using PCI P2P.
> >>
> >> However, as a general mechanism, it can support many other scenarios
> with
> >> VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
> >> generic and safe P2P mappings.
> >>
> >> This series goes after the "Break up ioctl dispatch functions to one
> >> function per ioctl" series.
> >>
> >> v2:
> >> - Name the new file dma_buf.c
> >> - Restore orig_nents before freeing
> >> - Fix reversed logic around priv->revoked
> >> - Set priv->index
> >> - Rebased on v2 "Break up ioctl dispatch functions"
> >> v1: https://lore.kernel.org/r/0-v1-9e6e1739ed95+5fa-
> >> vfio_dma_buf_jgg@nvidia.com
> >> Cc: linux-rdma@vger.kernel.org
> >> Cc: Oded Gabbay <ogabbay@kernel.org>
> >> Cc: Christian König <christian.koenig@amd.com>
> >> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >> Cc: Leon Romanovsky <leon@kernel.org>
> >> Cc: Maor Gottlieb <maorg@nvidia.com>
> >> Cc: dri-devel@lists.freedesktop.org
> >> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> >>
> >> Jason Gunthorpe (3):
> >>  vfio: Add vfio_device_get()
> >>  dma-buf: Add dma_buf_try_get()
> >>  vfio/pci: Allow MMIO regions to be exported through dma-buf
> >>
> >> Wei Lin Guay (1):
> >>  vfio/pci: Allow export dmabuf without move_notify from importer
> >>
> >> drivers/vfio/pci/Makefile          |   1 +
> >> drivers/vfio/pci/dma_buf.c         | 291 +++++++++++++++++++++++++++++
> >> drivers/vfio/pci/vfio_pci_config.c |   8 +-
> >> drivers/vfio/pci/vfio_pci_core.c   |  44 ++++-
> >> drivers/vfio/pci/vfio_pci_priv.h   |  30 +++
> >> drivers/vfio/vfio_main.c           |   1 +
> >> include/linux/dma-buf.h            |  13 ++
> >> include/linux/vfio.h               |   6 +
> >> include/linux/vfio_pci_core.h      |   1 +
> >> include/uapi/linux/vfio.h          |  18 ++
> >> 10 files changed, 405 insertions(+), 8 deletions(-)
> >> create mode 100644 drivers/vfio/pci/dma_buf.c
> >>
> >> --
> >> 2.43.5


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2024-12-18  6:16           ` Kasireddy, Vivek
@ 2024-12-18 10:01             ` Christian König
  2024-12-19  7:02               ` Kasireddy, Vivek
  0 siblings, 1 reply; 29+ messages in thread
From: Christian König @ 2024-12-18 10:01 UTC (permalink / raw)
  To: Kasireddy, Vivek, Wei Lin Guay
  Cc: Keith Busch, alex.williamson@redhat.com,
	dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
	linux-rdma@vger.kernel.org, jgg@nvidia.com, Dag Moxnes,
	Nicolaas Viljoen, Oded Gabbay, Simona Vetter, Leon Romanovsky,
	Maor Gottlieb

On 18.12.24 at 07:16, Kasireddy, Vivek wrote:
> Hi Christian,
>
>> Subject: Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported
>> through dmabuf
>>
>>
>>> 	I will resend the patch series. I was experiencing issues with my email
>>> client, which inadvertently split the series into two separate emails.
>>
>> Alternatively I can also copy the code from the list archive and explain why
>> that doesn't work:
>>
>> +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
>> +{
>> +    struct vfio_pci_dma_buf *priv;
>> +    struct vfio_pci_dma_buf *tmp;
>> +
>> +    lockdep_assert_held_write(&vdev->memory_lock);
>>
>> This makes sure that the caller is holding vdev->memory_lock.
>>
>> +
>> +    list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
>> +        if (!dma_buf_try_get(priv->dmabuf))
>>
>> This here only works because vfio_pci_dma_buf_release() also grabs
>> vdev->memory_lock and so we are protected against concurrent releases.
>> +            continue;
>> +        if (priv->revoked != revoked) {
>> +            dma_resv_lock(priv->dmabuf->resv, NULL);
>> +            priv->revoked = revoked;
>> +            dma_buf_move_notify(priv->dmabuf);
>> +            dma_resv_unlock(priv->dmabuf->resv);
>> +        }
>> +        dma_buf_put(priv->dmabuf);
>>
>> The problem is now that this here might drop the last reference which in
>> turn calls vfio_pci_dma_buf_release() which also tries to grab
>> vdev->memory_lock and so results in a deadlock.
> AFAICS, vfio_pci_dma_buf_release() would not be called synchronously after the
> last reference is dropped by dma_buf_put(). This is because fput(), which is called
> by dma_buf_put() triggers f_op->release() asynchronously; therefore, a deadlock
> is unlikely to occur in this scenario, unless I am overlooking something.

My recollection is that the f_op->release handler is only called 
asynchronously if fput() was issued in interrupt context.

But could be that this information is outdated.

Regards,
Christian.

>
> Thanks,
> Vivek
>
>> +    }
>> +}
>>
>> This pattern was suggested multiple times and I had to reject it every time
>> because the whole idea is just fundamentally broken.
>>
>> It's really astonishing how people always come up with the same broken
>> pattern.
>>
>> Regards,
>> Christian.
>>
>>
>>
>>
>>
>>
>>
>> 		Apart from that I have to reject the adding of dma_buf_try_get(),
>> 		that is clearly not something we should do.
>>
>>
>>
>> 	Understood. It appears that Vivek has confirmed that his v2 has
>> 	resolved the issue. I will follow up with him to determine if he plans
>> 	to resume his patch, and if so, I will apply my last patch on top of
>> 	his updated patch series.
>>
>> 	Thanks,
>> 	Wei Lin
>>
>>
>> 		Thanks,
>> 		Christian.
>>
>>
>>
>>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2024-12-17  9:53     ` Christian König
  2024-12-17 11:06       ` Wei Lin Guay
@ 2024-12-18 10:44       ` Simona Vetter
  1 sibling, 0 replies; 29+ messages in thread
From: Simona Vetter @ 2024-12-18 10:44 UTC (permalink / raw)
  To: Christian König
  Cc: Keith Busch, Wei Lin Guay, alex.williamson, dri-devel, kvm,
	linux-rdma, jgg, vivek.kasireddy, dagmoxnes, nviljoen,
	Wei Lin Guay, Oded Gabbay, Daniel Vetter, Leon Romanovsky,
	Maor Gottlieb

On Tue, Dec 17, 2024 at 10:53:32AM +0100, Christian König wrote:
> Am 16.12.24 um 17:54 schrieb Keith Busch:
> > On Mon, Dec 16, 2024 at 11:21:39AM +0100, Christian König wrote:
> > > Am 16.12.24 um 10:54 schrieb Wei Lin Guay:
> > > > From: Wei Lin Guay <wguay@meta.com>
> > > > However, as a general mechanism, it can support many other scenarios with
> > > > VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
> > > > generic and safe P2P mappings.
> > > > 
> > > > This series goes after the "Break up ioctl dispatch functions to one
> > > > function per ioctl" series.
> > > Yeah that sounds like it should work.
> > > 
> > > But where is the rest of the series, I only see the cover letter?
> > Should be here:
> > 
> >    https://lore.kernel.org/linux-rdma/20241216095429.210792-2-wguay@fb.com/T/#u
> 
> Please send that out once more with me on explicit CC.
> 
> Apart from that I have to reject the adding of dma_buf_try_get(), that is
> clearly not something we should do.

Yeah if we do try_get it would need to be at least on a specific dma_buf
type (so checking for dma_buf->ops), since in general this does not work
(unless we add the general code in dma_buf.c somehow to make it work).
Aside from any other design concerns.
-Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2024-12-18 10:01             ` Christian König
@ 2024-12-19  7:02               ` Kasireddy, Vivek
  2024-12-19 10:04                 ` Christian König
  0 siblings, 1 reply; 29+ messages in thread
From: Kasireddy, Vivek @ 2024-12-19  7:02 UTC (permalink / raw)
  To: Christian König, Wei Lin Guay
  Cc: Keith Busch, alex.williamson@redhat.com,
	dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
	linux-rdma@vger.kernel.org, jgg@nvidia.com, Dag Moxnes,
	Nicolaas Viljoen, Oded Gabbay, Simona Vetter, Leon Romanovsky,
	Maor Gottlieb

Hi Christian,

> Subject: Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported
> through dmabuf
> 
> >>
> >>> 	I will resend the patch series. I was experiencing issues with my email
> >>> client, which inadvertently split the series into two separate emails.
> >>
> >> Alternatively I can also copy the code from the list archive and explain why
> >> that doesn't work:
> >>
> >> +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
> >> +{
> >> +    struct vfio_pci_dma_buf *priv;
> >> +    struct vfio_pci_dma_buf *tmp;
> >> +
> >> +    lockdep_assert_held_write(&vdev->memory_lock);
> >>
> >> This makes sure that the caller is holding vdev->memory_lock.
> >>
> >> +
> >> +    list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> >> +        if (!dma_buf_try_get(priv->dmabuf))
> >>
> >> This here only works because vfio_pci_dma_buf_release() also grabs
> >> vdev->memory_lock and so we are protected against concurrent releases.
> >> +            continue;
> >> +        if (priv->revoked != revoked) {
> >> +            dma_resv_lock(priv->dmabuf->resv, NULL);
> >> +            priv->revoked = revoked;
> >> +            dma_buf_move_notify(priv->dmabuf);
> >> +            dma_resv_unlock(priv->dmabuf->resv);
> >> +        }
> >> +        dma_buf_put(priv->dmabuf);
> >>
> >> The problem is now that this here might drop the last reference which in
> >> turn calls vfio_pci_dma_buf_release() which also tries to grab
> >> vdev->memory_lock and so results in a deadlock.
> > AFAICS, vfio_pci_dma_buf_release() would not be called synchronously after
> > the last reference is dropped by dma_buf_put(). This is because fput(),
> > which is called by dma_buf_put(), triggers f_op->release() asynchronously;
> > therefore, a deadlock is unlikely to occur in this scenario, unless I am
> > overlooking something.
> 
> My recollection is that the f_op->release handler is only called
> asynchronously if fput() was issued in interrupt context.
Here is the code of fput() from the current master:
void fput(struct file *file)
{
        if (file_ref_put(&file->f_ref)) {
                struct task_struct *task = current;

                if (unlikely(!(file->f_mode & (FMODE_BACKING | FMODE_OPENED)))) {
                        file_free(file);
                        return;
                }
                if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
                        init_task_work(&file->f_task_work, ____fput);
                        if (!task_work_add(task, &file->f_task_work, TWA_RESUME))
                                return;
                        /*
                         * After this task has run exit_task_work(),
                         * task_work_add() will fail.  Fall through to delayed
                         * fput to avoid leaking *file.
                         */
                }

                if (llist_add(&file->f_llist, &delayed_fput_list))
                        schedule_delayed_work(&delayed_fput_work, 1);
        }
}

IIUC, based on the above code, it looks like there are two ways in which the
f_op->release() handler is triggered from fput():
- via delayed_fput() for kernel threads and code in interrupt context
- via task_work_run() just before the task/process returns to the user-mode

The first scenario above is definitely asynchronous as the release() handler is
called from a worker thread. But I think the second case (which is the most
common and relevant for our use-case) can also be considered asynchronous,
because the execution of the system call or ioctl that led to the context in
which dma_buf_put() was called is completed.

Here is a trace from my light testing with the udmabuf driver, where you can
see the release() handler being called by syscall_exit_to_user_mode():
[  158.464203] Call Trace:
[  158.466681]  <TASK>
[  158.468815]  dump_stack_lvl+0x60/0x80
[  158.472507]  dump_stack+0x14/0x16
[  158.475853]  release_udmabuf+0x2f/0x9f
[  158.479631]  dma_buf_release+0x3c/0x90
[  158.483408]  __dentry_kill+0x8f/0x180
[  158.487098]  dput+0xe7/0x1a0
[  158.490013]  __fput+0x131/0x2b0
[  158.493178]  ____fput+0x19/0x20
[  158.496352]  task_work_run+0x61/0x90
[  158.499959]  syscall_exit_to_user_mode+0x1a4/0x1b0
[  158.504769]  do_syscall_64+0x5b/0x110
[  158.508458]  entry_SYSCALL_64_after_hwframe+0x4b/0x53

And, here is the relevant syscall code (from arch/x86/entry/common.c):
__visible noinstr bool do_syscall_64(struct pt_regs *regs, int nr)
{
        add_random_kstack_offset();
        nr = syscall_enter_from_user_mode(regs, nr);

        instrumentation_begin();

        if (!do_syscall_x64(regs, nr) && !do_syscall_x32(regs, nr) && nr != -1) {
                /* Invalid system call, but still a system call. */
                regs->ax = __x64_sys_ni_syscall(regs);
        }

        instrumentation_end();
        syscall_exit_to_user_mode(regs);

I also confirmed that the release() handler is indeed called after dma_buf_put()
(and not by dma_buf_put()) by adding debug prints before and after
dma_buf_put() and one in the release() handler. Furthermore, I also found
that calling close() on the dmabuf fd in the user-space is one scenario in
which fput() is called synchronously. Here is the relevant trace:
[  302.770910] Call Trace:
[  302.773389]  <TASK>
[  302.775516]  dump_stack_lvl+0x60/0x80
[  302.779209]  dump_stack+0x14/0x16
[  302.782549]  release_udmabuf+0x2f/0x9f
[  302.786329]  dma_buf_release+0x3c/0x90
[  302.790105]  __dentry_kill+0x8f/0x180
[  302.793789]  dput+0xe7/0x1a0
[  302.796703]  __fput+0x131/0x2b0
[  302.799866]  __fput_sync+0x53/0x70
[  302.803299]  __x64_sys_close+0x58/0xc0
[  302.807076]  x64_sys_call+0x126a/0x17d0
[  302.810938]  do_syscall_64+0x4f/0x110
[  302.814622]  entry_SYSCALL_64_after_hwframe+0x4b/0x53

As you can see above, there is indeed a synchronous version of fput() defined
just below fput():
/*
 * synchronous analog of fput(); for kernel threads that might be needed
 * in some umount() (and thus can't use flush_delayed_fput() without
 * risking deadlocks), need to wait for completion of __fput() and know
 * for this specific struct file it won't involve anything that would
 * need them.  Use only if you really need it - at the very least,
 * don't blindly convert fput() by kernel thread to that.
 */
void __fput_sync(struct file *file)
{
	if (file_ref_put(&file->f_ref))
		__fput(file);
}

Based on all the above, I think we can conclude that since dma_buf_put()
does not directly (or synchronously) call the f_op->release() handler, a
deadlock is unlikely to occur in the scenario you described.
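
To make the ordering concrete, here is an annotated sketch of the pattern
under discussion (my own illustration based on the vfio_pci_dma_buf_move()
snippet quoted above, not code from the series):

	/* process context, e.g. inside a VFIO ioctl */
	down_write(&vdev->memory_lock);
	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
		if (!dma_buf_try_get(priv->dmabuf))
			continue;
		/* ... toggle priv->revoked, dma_buf_move_notify() ... */
		dma_buf_put(priv->dmabuf);	/* may drop the last reference */
	}
	up_write(&vdev->memory_lock);
	/*
	 * Even if dma_buf_put() dropped the last reference, fput() only
	 * queued ____fput() via task_work_add(). dma_buf_release(), and
	 * with it vfio_pci_dma_buf_release(), runs on the return to user
	 * mode, after memory_lock has been released.
	 */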

Thanks,
Vivek

> 
> But could be that this information is outdated.
> 
> Regards,
> Christian.
> 
> >
> > Thanks,
> > Vivek
> >
> >> +    }
> >> +}
> >>
> >> This pattern was suggested multiple times and I had to reject it every
> >> time because the whole idea is just fundamentally broken.
> >>
> >> It's really astonishing how people always come up with the same broken
> >> pattern.
> >>
> >> Regards,
> >> Christian.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> 		Apart from that I have to reject the adding of dma_buf_try_get(),
> >> 		that is clearly not something we should do.
> >>
> >>
> >>
> >> 	Understood. It appears that Vivek has confirmed that his v2 has
> >> 	resolved the issue. I will follow up with him to determine if he plans
> >> 	to resume his patch, and if so, I will apply my last patch on top of
> >> 	his updated patch series.
> >>
> >> 	Thanks,
> >> 	Wei Lin
> >>
> >>
> >> 		Thanks,
> >> 		Christian.
> >>
> >>
> >>
> >>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2024-12-19  7:02               ` Kasireddy, Vivek
@ 2024-12-19 10:04                 ` Christian König
  2025-01-02 13:39                   ` Jason Gunthorpe
  0 siblings, 1 reply; 29+ messages in thread
From: Christian König @ 2024-12-19 10:04 UTC (permalink / raw)
  To: Kasireddy, Vivek, Wei Lin Guay
  Cc: Keith Busch, alex.williamson@redhat.com,
	dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
	linux-rdma@vger.kernel.org, jgg@nvidia.com, Dag Moxnes,
	Nicolaas Viljoen, Oded Gabbay, Simona Vetter, Leon Romanovsky,
	Maor Gottlieb

Am 19.12.24 um 08:02 schrieb Kasireddy, Vivek:
> Hi Christian,
>
>> Subject: Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported
>> through dmabuf
>>
>>>>> 	I will resend the patch series. I was experiencing issues with my email
>>>>> client, which inadvertently split the series into two separate emails.
>>>> Alternatively I can also copy the code from the list archive and explain why
>>>> that doesn't work:
>>>>
>>>> +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
>>>> +{
>>>> +    struct vfio_pci_dma_buf *priv;
>>>> +    struct vfio_pci_dma_buf *tmp;
>>>> +
>>>> +    lockdep_assert_held_write(&vdev->memory_lock);
>>>>
>>>> This makes sure that the caller is holding vdev->memory_lock.
>>>>
>>>> +
>>>> +    list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
>>>> +        if (!dma_buf_try_get(priv->dmabuf))
>>>>
>>>> This here only works because vfio_pci_dma_buf_release() also grabs
>>>> vdev->memory_lock and so we are protected against concurrent releases.
>>>> +            continue;
>>>> +        if (priv->revoked != revoked) {
>>>> +            dma_resv_lock(priv->dmabuf->resv, NULL);
>>>> +            priv->revoked = revoked;
>>>> +            dma_buf_move_notify(priv->dmabuf);
>>>> +            dma_resv_unlock(priv->dmabuf->resv);
>>>> +        }
>>>> +        dma_buf_put(priv->dmabuf);
>>>>
>>>> The problem is now that this here might drop the last reference which in
>>>> turn calls vfio_pci_dma_buf_release() which also tries to grab
>>>> vdev->memory_lock and so results in a deadlock.
>>> AFAICS, vfio_pci_dma_buf_release() would not be called synchronously after
>>> the last reference is dropped by dma_buf_put(). This is because fput(),
>>> which is called by dma_buf_put(), triggers f_op->release() asynchronously;
>>> therefore, a deadlock is unlikely to occur in this scenario, unless I am
>>> overlooking something.
>> My recollection is that the f_op->release handler is only called
>> asynchronously if fput() was issued in interrupt context.
> Here is the code of fput() from the current master:
> void fput(struct file *file)
> {
>          if (file_ref_put(&file->f_ref)) {
>                  struct task_struct *task = current;
>
>                  if (unlikely(!(file->f_mode & (FMODE_BACKING | FMODE_OPENED)))) {
>                          file_free(file);
>                          return;
>                  }
>                  if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
>                          init_task_work(&file->f_task_work, ____fput);
>                          if (!task_work_add(task, &file->f_task_work, TWA_RESUME))
>                                  return;
>                          /*
>                           * After this task has run exit_task_work(),
>                           * task_work_add() will fail.  Fall through to delayed
>                           * fput to avoid leaking *file.
>                           */
>                  }
>
>                  if (llist_add(&file->f_llist, &delayed_fput_list))
>                          schedule_delayed_work(&delayed_fput_work, 1);
>          }
> }
>
> IIUC, based on the above code, it looks like there are two ways in which the
> f_op->release() handler is triggered from fput():
> - via delayed_fput() for kernel threads and code in interrupt context
> - via task_work_run() just before the task/process returns to the user-mode
>
> The first scenario above is definitely asynchronous as the release() handler is
> called from a worker thread. But I think the second case (which is the most
> common and relevant for our use-case) can also be considered asynchronous,
> because the execution of the system call or ioctl that led to the context in
> which dma_buf_put() was called is completed.
>
> Here is a trace from my light testing with the udmabuf driver, where you can
> see the release() handler being called by syscall_exit_to_user_mode():
> [  158.464203] Call Trace:
> [  158.466681]  <TASK>
> [  158.468815]  dump_stack_lvl+0x60/0x80
> [  158.472507]  dump_stack+0x14/0x16
> [  158.475853]  release_udmabuf+0x2f/0x9f
> [  158.479631]  dma_buf_release+0x3c/0x90
> [  158.483408]  __dentry_kill+0x8f/0x180
> [  158.487098]  dput+0xe7/0x1a0
> [  158.490013]  __fput+0x131/0x2b0
> [  158.493178]  ____fput+0x19/0x20
> [  158.496352]  task_work_run+0x61/0x90
> [  158.499959]  syscall_exit_to_user_mode+0x1a4/0x1b0
> [  158.504769]  do_syscall_64+0x5b/0x110
> [  158.508458]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>
> And, here is the relevant syscall code (from arch/x86/entry/common.c):
> __visible noinstr bool do_syscall_64(struct pt_regs *regs, int nr)
> {
>          add_random_kstack_offset();
>          nr = syscall_enter_from_user_mode(regs, nr);
>
>          instrumentation_begin();
>
>          if (!do_syscall_x64(regs, nr) && !do_syscall_x32(regs, nr) && nr != -1) {
>                  /* Invalid system call, but still a system call. */
>                  regs->ax = __x64_sys_ni_syscall(regs);
>          }
>
>          instrumentation_end();
>          syscall_exit_to_user_mode(regs);
>
> I also confirmed that the release() handler is indeed called after dma_buf_put()
> (and not by dma_buf_put()) by adding debug prints before and after
> dma_buf_put() and one in the release() handler. Furthermore, I also found
> that calling close() on the dmabuf fd in the user-space is one scenario in
> which fput() is called synchronously. Here is the relevant trace:
> [  302.770910] Call Trace:
> [  302.773389]  <TASK>
> [  302.775516]  dump_stack_lvl+0x60/0x80
> [  302.779209]  dump_stack+0x14/0x16
> [  302.782549]  release_udmabuf+0x2f/0x9f
> [  302.786329]  dma_buf_release+0x3c/0x90
> [  302.790105]  __dentry_kill+0x8f/0x180
> [  302.793789]  dput+0xe7/0x1a0
> [  302.796703]  __fput+0x131/0x2b0
> [  302.799866]  __fput_sync+0x53/0x70
> [  302.803299]  __x64_sys_close+0x58/0xc0
> [  302.807076]  x64_sys_call+0x126a/0x17d0
> [  302.810938]  do_syscall_64+0x4f/0x110
> [  302.814622]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>
> As you can see above, there is indeed a synchronous version of fput() defined
> just below fput():
> /*
>   * synchronous analog of fput(); for kernel threads that might be needed
>   * in some umount() (and thus can't use flush_delayed_fput() without
>   * risking deadlocks), need to wait for completion of __fput() and know
>   * for this specific struct file it won't involve anything that would
>   * need them.  Use only if you really need it - at the very least,
>   * don't blindly convert fput() by kernel thread to that.
>   */
> void __fput_sync(struct file *file)
> {
> 	if (file_ref_put(&file->f_ref))
> 		__fput(file);
> }
>
> Based on all the above, I think we can conclude that since dma_buf_put()
> does not directly (or synchronously) call the f_op->release() handler, a
> deadlock is unlikely to occur in the scenario you described.

Yeah, I agree.

Interesting to know, I wasn't aware that the task_work functionality 
exists for that use case.

Thanks,
Christian.

>
> Thanks,
> Vivek
>
>> But could be that this information is outdated.
>>
>> Regards,
>> Christian.
>>
>>> Thanks,
>>> Vivek
>>>
>>>> +    }
>>>> +}
>>>>
>>>> This pattern was suggested multiple times and I had to reject it every
>>>> time because the whole idea is just fundamentally broken.
>>>>
>>>> It's really astonishing how people always come up with the same broken
>>>> pattern.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> 		Apart from that I have to reject the adding of dma_buf_try_get(),
>>>> 		that is clearly not something we should do.
>>>>
>>>>
>>>>
>>>> 	Understood. It appears that Vivek has confirmed that his v2 has
>>>> 	resolved the issue. I will follow up with him to determine if he plans
>>>> 	to resume his patch, and if so, I will apply my last patch on top of
>>>> 	his updated patch series.
>>>>
>>>> 	Thanks,
>>>> 	Wei Lin
>>>>
>>>>
>>>> 		Thanks,
>>>> 		Christian.
>>>>
>>>>
>>>>
>>>>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2024-12-19 10:04                 ` Christian König
@ 2025-01-02 13:39                   ` Jason Gunthorpe
  2025-01-06 12:05                     ` Simona Vetter
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2025-01-02 13:39 UTC (permalink / raw)
  To: Christian König
  Cc: Kasireddy, Vivek, Wei Lin Guay, Keith Busch,
	alex.williamson@redhat.com, dri-devel@lists.freedesktop.org,
	kvm@vger.kernel.org, linux-rdma@vger.kernel.org, Dag Moxnes,
	Nicolaas Viljoen, Oded Gabbay, Simona Vetter, Leon Romanovsky,
	Maor Gottlieb

On Thu, Dec 19, 2024 at 11:04:54AM +0100, Christian König wrote:

> > Based on all the above, I think we can conclude that since dma_buf_put()
> > does not directly (or synchronously) call the f_op->release() handler, a
> > deadlock is unlikely to occur in the scenario you described.

Right, there is no deadlock here, and there is nothing inherently
wrong with using try_get like this. The locking here is ugly ugly
ugly, I forget why, but this was the best I came up with to untangle
it without deadlocks or races.

> Yeah, I agree.
> 
> Interesting to know, I wasn't aware that the task_work functionality exists
> for that use case.

IIRC it was changed a while back because call chains were getting kind of
crazy long with the file release functions and stack overflow was a
problem in some cases.

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2025-01-02 13:39                   ` Jason Gunthorpe
@ 2025-01-06 12:05                     ` Simona Vetter
  2025-01-06 16:27                       ` Jason Gunthorpe
  0 siblings, 1 reply; 29+ messages in thread
From: Simona Vetter @ 2025-01-06 12:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christian König, Kasireddy, Vivek, Wei Lin Guay, Keith Busch,
	alex.williamson@redhat.com, dri-devel@lists.freedesktop.org,
	kvm@vger.kernel.org, linux-rdma@vger.kernel.org, Dag Moxnes,
	Nicolaas Viljoen, Oded Gabbay, Simona Vetter, Leon Romanovsky,
	Maor Gottlieb

On Thu, Jan 02, 2025 at 09:39:51AM -0400, Jason Gunthorpe wrote:
> On Thu, Dec 19, 2024 at 11:04:54AM +0100, Christian König wrote:
> 
> > > Based on all the above, I think we can conclude that since dma_buf_put()
> > > does not directly (or synchronously) call the f_op->release() handler, a
> > > deadlock is unlikely to occur in the scenario you described.
> 
> > Right, there is no deadlock here, and there is nothing inherently
> wrong with using try_get like this. The locking here is ugly ugly
> ugly, I forget why, but this was the best I came up with to untangle
> it without deadlocks or races.

Yeah, imo try_get is perfectly fine pattern. With that sorted my only
request is to make the try_get specific to the dma_ops, because it only
works if both ->release and the calling context of try_get follow the same
rules, which by necessity are exporter specific.

In full generality as a dma_buf.c interface it's just busted and there's
no way to make it work, unless we inflict that locking ugliness on
absolutely every exporter.

> > Yeah, I agree.
> > 
> > Interesting to know, I wasn't aware that the task_work functionality exists
> > for that use case.
> 
> IIRC it was changed a while back because call chains were getting kind of
> crazy long with the file release functions and stack overflow was a
> problem in some cases.

Hm I thought it was also a "oh fuck deadlocks" moment, because usually
most of the very deep fput calls are for temporary reference and not the
final one. Unless userspace is sneaky and drops its own fd reference in a
separate thread with close(), then everything goes boom. And untangling
all these was impossible.
-Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2025-01-06 12:05                     ` Simona Vetter
@ 2025-01-06 16:27                       ` Jason Gunthorpe
  2025-01-08 16:48                         ` Simona Vetter
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2025-01-06 16:27 UTC (permalink / raw)
  To: Simona Vetter
  Cc: Christian König, Kasireddy, Vivek, Wei Lin Guay, Keith Busch,
	alex.williamson@redhat.com, dri-devel@lists.freedesktop.org,
	kvm@vger.kernel.org, linux-rdma@vger.kernel.org, Dag Moxnes,
	Nicolaas Viljoen, Oded Gabbay, Leon Romanovsky, Maor Gottlieb

On Mon, Jan 06, 2025 at 01:05:08PM +0100, Simona Vetter wrote:
> On Thu, Jan 02, 2025 at 09:39:51AM -0400, Jason Gunthorpe wrote:
> > On Thu, Dec 19, 2024 at 11:04:54AM +0100, Christian König wrote:
> > 
> > > > Based on all the above, I think we can conclude that since dma_buf_put()
> > > > does not directly (or synchronously) call the f_op->release() handler, a
> > > > deadlock is unlikely to occur in the scenario you described.
> > 
> > > Right, there is no deadlock here, and there is nothing inherently
> > wrong with using try_get like this. The locking here is ugly ugly
> > ugly, I forget why, but this was the best I came up with to untangle
> > it without deadlocks or races.
> 
> Yeah, imo try_get is perfectly fine pattern. With that sorted my only
> request is to make the try_get specific to the dma_ops, because it only
> works if both ->release and the calling context of try_get follow the same
> rules, which by necessity are exporter specific.

We already know the fd is a dma_ops one because it is on an internal
list and it could not get there otherwise.

The pointer cannot become invalid and be freed back to the typesafe RCU
slab cache while on the list, meaning the try_get is safe as is.

I think Christian's original advice to just open code it is the best
option.
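
Open coded in vfio it would be roughly the following sketch (my
assumption: a tree with the file_ref conversion visible in the fput()
excerpt earlier in the thread; older trees would use
atomic_long_inc_not_zero(&file->f_count) instead):

/*
 * Only safe because priv stays on vdev->dmabufs until
 * vfio_pci_dma_buf_release() removes it under memory_lock, so the
 * struct file backing the dmabuf cannot be recycled while the caller
 * holds that lock.
 */
static bool vfio_pci_dma_buf_try_get(struct dma_buf *dmabuf)
{
	return file_ref_get(&dmabuf->file->f_ref);
}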

> In full generality as a dma_buf.c interface it's just busted and there's
> no way to make it work, unless we inflict that locking ugliness on
> absolutely every exporter.

I'm not sure about that, the struct file code has special logic to
accommodate the type safe RCU trick. I didn't try to digest it fully,
but I expect there are ways to use it safely without the locking on
release.

> > IIRC it was changed a while back because call chains were getting kind of
> > crazy long with the file release functions and stack overflow was a
> > problem in some cases.
> 
> Hm I thought it was also a "oh fuck deadlocks" moment, because usually
> most of the very deep fput calls are for temporary reference and not the
> final one.

That sounds motivating too, but I've also seen cases of being careful
to unlock before fputting things.

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2025-01-06 16:27                       ` Jason Gunthorpe
@ 2025-01-08 16:48                         ` Simona Vetter
  0 siblings, 0 replies; 29+ messages in thread
From: Simona Vetter @ 2025-01-08 16:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Simona Vetter, Christian König, Kasireddy, Vivek,
	Wei Lin Guay, Keith Busch, alex.williamson@redhat.com,
	dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
	linux-rdma@vger.kernel.org, Dag Moxnes, Nicolaas Viljoen,
	Oded Gabbay, Leon Romanovsky, Maor Gottlieb

On Mon, Jan 06, 2025 at 12:27:57PM -0400, Jason Gunthorpe wrote:
> On Mon, Jan 06, 2025 at 01:05:08PM +0100, Simona Vetter wrote:
> > On Thu, Jan 02, 2025 at 09:39:51AM -0400, Jason Gunthorpe wrote:
> > > On Thu, Dec 19, 2024 at 11:04:54AM +0100, Christian König wrote:
> > > 
> > > > > Based on all the above, I think we can conclude that since dma_buf_put()
> > > > > does not directly (or synchronously) call the f_op->release() handler, a
> > > > > deadlock is unlikely to occur in the scenario you described.
> > > 
> > > Right, there is no deadlock here, and there is nothing inherently
> > > wrong with using try_get like this. The locking here is ugly ugly
> > > ugly, I forget why, but this was the best I came up with to untangle
> > > it without deadlocks or races.
> > 
> > Yeah, imo try_get is perfectly fine pattern. With that sorted my only
> > request is to make the try_get specific to the dma_ops, because it only
> > works if both ->release and the calling context of try_get follow the same
> > rules, which by necessity are exporter specific.
> 
> We already know the fd is a dma_ops one because it is on an internal
> list and it could not get there otherwise.
> 
> The pointer cannot become invalid and freed back to the type safe RCU
> while on the list, meaning the try_get is safe as is.
> 
> I think Christian's original advice to just open code it is the best
> option.

Yeah open coding in vfio is imo best, agreed on that.

> > In full generality as a dma_buf.c interface it's just busted and there's
> > no way to make it work, unless we inflict that locking ugliness on
> > absolutely every exporter.
> 
> I'm not sure about that, the struct file code has special logic to
> accommodate the type safe RCU trick. I didn't try to digest it fully,
> but I expect there are ways to use it safely without the locking on
> release.

Hm yes, if you use a write barrier when you set your file pointer and clear it
in release, then you can use get_file_rcu with just rcu_read_lock on the
read side. But you have to directly use your own struct file * pointer
since it needs to reload that in a loop, you can't use dma_buf->file.

At that point you're already massively peeking behind the dma_buf
abstraction, so just directly using get_file_rcu is imo better.

And for anything else, whether it's rcu or plain locks, it's all subsystem
specific anyway, which is why I don't think a dma_buf.c wrapper really works.

Not entirely sure, but for the dma_buf_try_get wrapper you have to put a
full lock acquisition or synchronize_rcu() into your ->release callback,
otherwise I don't think it's safe to use.
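
Roughly the shape I mean, as an untested sketch (exporter_buf and its
field are made-up names; get_file_rcu() here is the current variant that
takes an __rcu double pointer and re-loads it in a loop):

struct exporter_buf {
	struct file __rcu *filp;	/* the exporter's own pointer */
};

/* at export time; rcu_assign_pointer() provides the write barrier */
static void exporter_publish(struct exporter_buf *ebuf, struct dma_buf *dmabuf)
{
	rcu_assign_pointer(ebuf->filp, dmabuf->file);
}

/* in the exporter's ->release callback: unpublish the pointer */
static void exporter_unpublish(struct exporter_buf *ebuf)
{
	rcu_assign_pointer(ebuf->filp, NULL);
}

/* read side: no exporter lock needed, just RCU */
static struct file *exporter_try_get(struct exporter_buf *ebuf)
{
	struct file *f;

	rcu_read_lock();
	f = get_file_rcu(&ebuf->filp);	/* re-loads ebuf->filp in a loop */
	rcu_read_unlock();
	return f;	/* NULL if already released; fput(f) when done */
}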

> > > IIRC it was changed a while back because call chains were getting kind of
> > > crazy long with the file release functions and stack overflow was a
> > > problem in some cases.
> > 
> > Hm I thought it was also a "oh fuck deadlocks" moment, because usually
> > most of the very deep fput calls are for temporary reference and not the
> > final one.
> 
> That sounds motivating too, but I've also seen cases of being careful
> to unlock before fputting things..

Yeah I think knowledge about this issue was very uneven in different
subsystems.
-Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
       [not found]       ` <61DF4F0E-D947-436B-9160-A40079DB9085@meta.com>
@ 2025-02-26  7:55         ` Kasireddy, Vivek
  2025-02-26 13:38           ` Jason Gunthorpe
  2025-02-26 18:24           ` Wei Lin Guay
  0 siblings, 2 replies; 29+ messages in thread
From: Kasireddy, Vivek @ 2025-02-26  7:55 UTC (permalink / raw)
  To: Wei Lin Guay
  Cc: alex.williamson@redhat.com, dri-devel@lists.freedesktop.org,
	kvm@vger.kernel.org, linux-rdma@vger.kernel.org, jgg@nvidia.com,
	Dag Moxnes, kbusch@kernel.org, Nicolaas Viljoen, Oded Gabbay,
	Christian König, Simona Vetter, Leon Romanovsky,
	Maor Gottlieb

Hi Wei Lin,

[...]
> 
> 	Yeah, the mmap handler is really needed as a debugging tool given that
> 	the importer would not be able to provide access to the dmabuf's
> 	underlying memory via the CPU in any other way.
> 
> 		- Rather than handle different regions within a single dma-buf,
> 		would vfio-user open multiple distinct file descriptors work?
> 		For our specific use case, we don't require multiple regions
> 		and prefer Jason’s original patch.
> 
> 	Restricting the dmabuf to a single region (or having to create multiple
> 	dmabufs to represent multiple regions/ranges associated with a single
> 	scattered buffer) would not be feasible or ideal in all cases. For
> 	instance, in my use-case, I am sharing a large framebuffer (FB) located
> 	in GPU's VRAM. And, allocating a large FB contiguously (nr_ranges = 1)
> 	in VRAM is not possible when there is memory pressure.
> 
> 	Furthermore, since we are adding a new UAPI with this patch/feature, we
> 	cannot go back and tweak it (to add support for nr_ranges > 1) should
> 	there be a need in the future, but you can always use nr_ranges = 1
> 	anytime. That is why it makes sense to be flexible in terms of the
> 	number of ranges/regions.
> 
> 			Also, my understanding is that this patchset cannot
> 			proceed until Leon's series is merged:
> 			https://lore.kernel.org/kvm/cover.1733398913.git.leon@kernel.org/
> 
> 		Thanks for the pointer.
> 		I will rebase my local patch series on top of that.
> 
> 	AFAIK, Leon's work includes new mechanism/APIs to do P2P DMA, which we
> 	should be using in this patchset. And, I think he is also planning to
> 	use the new APIs to augment dmabuf usage and not be forced to use the
> 	scatter-gather list particularly in cases where the underlying memory
> 	is not backed by struct page.
> 
> 	I was just waiting for all of this to happen, before posting a v3.
> 
> Is there any update or ETA for the v3? Are there any ways we can help?
I believe Leon's series is very close to getting merged. Once it lands, this series can
be revived.

> 
> Additionally, do you have any repo that we can access to begin validating our
> user API changes. This would greatly aid us in our software development.
Sure, here is the branch associated with this series (v2):
https://gitlab.freedesktop.org/Vivek/drm-tip/-/commits/vfio_dmabuf_2

Note that, the above branch is based off of Kernel 6.10 but I think it shouldn't be
too hard to forklift the patches onto 6.14. Also, here is the Qemu branch that
includes patches that demonstrate and make use of this new feature:
https://gitlab.freedesktop.org/Vivek/qemu/-/commits/vfio_dmabuf_2

On a different note, if it is not too much trouble, could you please reply to emails
in text (preferred format for emails on mailing lists) instead of HTML?

Thanks,
Vivek

> 
> Thanks,
> Wei Lin
> 
> 
> 	Thanks,
> 	Vivek
> 
> 
> 
> 
> 		Thanks,
> 		Wei Lin
> 
> 
> 
> 
> 
> 			Thanks,
> 			Vivek
> 
> 
> 
> 
> 				In addition to the initial proposal by Jason, another promising
> 				application is exposing memory from an AI accelerator (bound to
> 				VFIO) to an RDMA device. This would allow the RDMA device to
> 				directly access the accelerator's memory, thereby facilitating
> 				direct data transactions between the RDMA device and the
> 				accelerator.
> 
> 				Below is the text/motivation from the original cover letter.
> 
> 				dma-buf has become a way to safely acquire a handle to non-struct
> 				page memory that can still have lifetime controlled by the
> 				exporter. Notably RDMA can now import dma-buf FDs and build them
> 				into MRs which allows for PCI P2P operations. Extend this to
> 				allow vfio-pci to export MMIO memory from PCI device BARs.
> 
> 				This series supports a use case for SPDK where a NVMe device
> 				will be owned by SPDK through VFIO but interacting with a RDMA
> 				device. The RDMA device may directly access the NVMe CMB or
> 				directly manipulate the NVMe device's doorbell using PCI P2P.
> 
> 				However, as a general mechanism, it can support many other
> 				scenarios with VFIO. I imagine this dmabuf approach to be usable
> 				by iommufd as well for generic and safe P2P mappings.
> 
> 				This series goes after the "Break up ioctl dispatch functions to
> 				one function per ioctl" series.
> 
> 				v2:
> 				- Name the new file dma_buf.c
> 				- Restore orig_nents before freeing
> 				- Fix reversed logic around priv->revoked
> 				- Set priv->index
> 				- Rebased on v2 "Break up ioctl dispatch functions"
> 				v1: https://lore.kernel.org/r/0-v1-9e6e1739ed95+5fa-vfio_dma_buf_jgg@nvidia.com
> 				Cc: linux-rdma@vger.kernel.org
> 				Cc: Oded Gabbay <ogabbay@kernel.org>
> 				Cc: Christian König <christian.koenig@amd.com>
> 				Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> 				Cc: Leon Romanovsky <leon@kernel.org>
> 				Cc: Maor Gottlieb <maorg@nvidia.com>
> 				Cc: dri-devel@lists.freedesktop.org
> 				Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> 				Jason Gunthorpe (3):
> 				  vfio: Add vfio_device_get()
> 				  dma-buf: Add dma_buf_try_get()
> 				  vfio/pci: Allow MMIO regions to be exported through dma-buf
> 
> 				Wei Lin Guay (1):
> 				  vfio/pci: Allow export dmabuf without move_notify from importer
> 
> 				 drivers/vfio/pci/Makefile          |   1 +
> 				 drivers/vfio/pci/dma_buf.c         | 291 +++++++++++++++++++++++++++++
> 				 drivers/vfio/pci/vfio_pci_config.c |   8 +-
> 				 drivers/vfio/pci/vfio_pci_core.c   |  44 ++++-
> 				 drivers/vfio/pci/vfio_pci_priv.h   |  30 +++
> 				 drivers/vfio/vfio_main.c           |   1 +
> 				 include/linux/dma-buf.h            |  13 ++
> 				 include/linux/vfio.h               |   6 +
> 				 include/linux/vfio_pci_core.h      |   1 +
> 				 include/uapi/linux/vfio.h          |  18 ++
> 				 10 files changed, 405 insertions(+), 8 deletions(-)
> 				 create mode 100644 drivers/vfio/pci/dma_buf.c
> 
> 				--
> 				2.43.5
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2025-02-26  7:55         ` Kasireddy, Vivek
@ 2025-02-26 13:38           ` Jason Gunthorpe
  2025-02-26 17:59             ` Leon Romanovsky
                               ` (2 more replies)
  2025-02-26 18:24           ` Wei Lin Guay
  1 sibling, 3 replies; 29+ messages in thread
From: Jason Gunthorpe @ 2025-02-26 13:38 UTC (permalink / raw)
  To: Kasireddy, Vivek
  Cc: Wei Lin Guay, alex.williamson@redhat.com,
	dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
	linux-rdma@vger.kernel.org, Dag Moxnes, kbusch@kernel.org,
	Nicolaas Viljoen, Oded Gabbay, Christian König,
	Simona Vetter, Leon Romanovsky, Maor Gottlieb

On Wed, Feb 26, 2025 at 07:55:07AM +0000, Kasireddy, Vivek wrote:

> > Is there any update or ETA for the v3? Are there any ways we can help?

> I believe Leon's series is very close to getting merged. Once it
> lands, this series can be revived.

The recent drama has made what happens next unclear.

I would like it if interested parties could contribute reviews to
Leon's v7 series to help it along.

We may want to start considering pushing ahead with this patch series
and using the usual hack of abusing the scatterlist in the meantime.
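
For reference, the hack is a single-entry sg_table where only the DMA
side is populated and no struct pages sit behind it; a rough sketch (my
illustration, with the BAR's DMA address and size passed in, not code
from any posted series):

/* illustration only: sg_table carrying a raw P2P DMA address */
static struct sg_table *mmio_sgt_hack(dma_addr_t dma_addr, size_t size)
{
	struct sg_table *sgt;

	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
	if (!sgt)
		return ERR_PTR(-ENOMEM);
	if (sg_alloc_table(sgt, 1, GFP_KERNEL)) {
		kfree(sgt);
		return ERR_PTR(-ENOMEM);
	}
	/* no sg_set_page(): the CPU-side fields stay unused */
	sg_dma_address(sgt->sgl) = dma_addr;
	sg_dma_len(sgt->sgl) = size;
	return sgt;
}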

Thanks,
Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2025-02-26 13:38           ` Jason Gunthorpe
@ 2025-02-26 17:59             ` Leon Romanovsky
  2025-02-26 18:27               ` Wei Lin Guay
  2025-03-04  7:15             ` Kasireddy, Vivek
  2025-03-04 14:29             ` Christian König
  2 siblings, 1 reply; 29+ messages in thread
From: Leon Romanovsky @ 2025-02-26 17:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Kasireddy, Vivek, Wei Lin Guay, alex.williamson@redhat.com,
	dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
	linux-rdma@vger.kernel.org, Dag Moxnes, kbusch@kernel.org,
	Nicolaas Viljoen, Oded Gabbay, Christian König,
	Simona Vetter, Maor Gottlieb, Marek Szyprowski

On Wed, Feb 26, 2025 at 09:38:22AM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 26, 2025 at 07:55:07AM +0000, Kasireddy, Vivek wrote:
> 
> > > Is there any update or ETA for the v3? Are there any ways we can help?
> 
> > I believe Leon's series is very close to getting merged. Once it
> > lands, this series can be revived.
> 
> The recent drama has made what happens next unclear.
> 
> I would like it if interested parties could contribute reviews to
> Leon's v7 series to help it along.

Link to v7 https://lore.kernel.org/all/cover.1738765879.git.leonro@nvidia.com/

Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2025-02-26  7:55         ` Kasireddy, Vivek
  2025-02-26 13:38           ` Jason Gunthorpe
@ 2025-02-26 18:24           ` Wei Lin Guay
  1 sibling, 0 replies; 29+ messages in thread
From: Wei Lin Guay @ 2025-02-26 18:24 UTC (permalink / raw)
  To: Kasireddy, Vivek
  Cc: alex.williamson@redhat.com, dri-devel@lists.freedesktop.org,
	kvm@vger.kernel.org, linux-rdma@vger.kernel.org, jgg@nvidia.com,
	Dag Moxnes, kbusch@kernel.org, Nicolaas Viljoen, Oded Gabbay,
	Christian König, Simona Vetter, Leon Romanovsky,
	Maor Gottlieb



> On 26 Feb 2025, at 08:55, Kasireddy, Vivek <vivek.kasireddy@intel.com> wrote:
> 
> > 
> Hi Wei Lin,
> 
> [...]
>> 
>> Yeah, the mmap handler is really needed as a debugging tool given that the
>> importer would not be able to provide access to the dmabuf's underlying
>> memory via the CPU in any other way.
>> 
>> - Rather than handle different regions within a single dma-buf, would
>> vfio-user open multiple distinct file descriptors work?
>> For our specific use case, we don't require multiple regions and prefer
>> Jason’s original patch.
>> 
>> Restricting the dmabuf to a single region (or having to create multiple
>> dmabufs to represent multiple regions/ranges associated with a single
>> scattered buffer) would not be feasible or ideal in all cases. For
>> instance, in my use-case, I am sharing a large framebuffer (FB) located in
>> GPU's VRAM. And, allocating a large FB contiguously (nr_ranges = 1) in
>> VRAM is not possible when there is memory pressure.
>> 
>> Furthermore, since we are adding a new UAPI with this patch/feature, we
>> cannot go back and tweak it (to add support for nr_ranges > 1) should
>> there be a need in the future, but you can always use nr_ranges = 1
>> anytime. That is why it makes sense to be flexible in terms of the number
>> of ranges/regions.
>> 
>> Also, my understanding is that this patchset cannot proceed until Leon's
>> series is merged:
>> https://lore.kernel.org/kvm/cover.1733398913.git.leon@kernel.org/
>> 
>> Thanks for the pointer.
>> I will rebase my local patch series on top of that.
>> 
>> AFAIK, Leon's work includes new mechanism/APIs to do P2P DMA, which we
>> should be using in this patchset. And, I think he is also planning to use
>> the new APIs to augment dmabuf usage and not be forced to use the
>> scatter-gather list particularly in cases where the underlying memory is
>> not backed by struct page.
>> 
>> I was just waiting for all of this to happen, before posting a v3.
>> 
>> Is there any update or ETA for the v3? Are there any ways we can help?
> I believe Leon's series is very close to getting merged. Once it lands, this series can
> be revived.
> 
>> 
>> Additionally, do you have any repo that we can access to begin validating our
>> user API changes. This would greatly aid us in our software development.
> Sure, here is the branch associated with this series (v2):
> https://gitlab.freedesktop.org/Vivek/drm-tip/-/commits/vfio_dmabuf_2
> 
> Note that, the above branch is based off of Kernel 6.10 but I think it shouldn't be
> too hard to forklift the patches onto 6.14. Also, here is the Qemu branch that
> includes patches that demonstrate and make use of this new feature:
> https://gitlab.freedesktop.org/Vivek/qemu/-/commits/vfio_dmabuf_2


Thanks for the pointer, Vivek. 
> 
> On a different note, if it is not too much trouble, could you please reply to emails
> in text (preferred format for emails on mailing lists) instead of HTML?

Apologies. It was an issue with my email client setup again. It should be fixed now. 

Thanks,
Wei Lin

> 
> Thanks,
> Vivek
> 
>> 
>> Thanks,
>> Wei Lin
>> 
>> 
>> Thanks,
>> Vivek
>> 
>> 
>> 
>> 
>> Thanks,
>> Wei Lin
>> 
>> 
>> 
>> 
>> 
>> Thanks,
>> Vivek
>> 
>> 
>> 
>> 
>> In addition to the initial proposal by Jason, another promising
>> application is exposing memory from an AI accelerator (bound to VFIO) to
>> an RDMA device. This would allow the RDMA device to directly access the
>> accelerator's memory, thereby facilitating direct data transactions
>> between the RDMA device and the accelerator.
>> 
>> Below is the text/motivation from the original cover letter.
>> 
>> dma-buf has become a way to safely acquire a handle to non-struct page
>> memory that can still have lifetime controlled by the exporter. Notably
>> RDMA can now import dma-buf FDs and build them into MRs which allows for
>> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
>> from PCI device BARs.
>> 
>> This series supports a use case for SPDK where a NVMe device will be
>> owned by SPDK through VFIO but interacting with a RDMA device. The RDMA
>> device may directly access the NVMe CMB or directly manipulate the NVMe
>> device's doorbell using PCI P2P.
>> 
>> However, as a general mechanism, it can support many other scenarios
>> with VFIO. I imagine this dmabuf approach to be usable by iommufd as
>> well for generic and safe P2P mappings.
>> 
>> This series goes after the "Break up ioctl dispatch functions to one
>> function per ioctl" series.
>> 
>> v2:
>> - Name the new file dma_buf.c
>> - Restore orig_nents before freeing
>> - Fix reversed logic around priv->revoked
>> - Set priv->index
>> - Rebased on v2 "Break up ioctl dispatch functions"
>> v1: https://lore.kernel.org/r/0-v1-9e6e1739ed95+5fa-vfio_dma_buf_jgg@nvidia.com
>> Cc: linux-rdma@vger.kernel.org
>> Cc: Oded Gabbay <ogabbay@kernel.org>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Leon Romanovsky <leon@kernel.org>
>> Cc: Maor Gottlieb <maorg@nvidia.com>
>> Cc: dri-devel@lists.freedesktop.org
>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>> 
>> Jason Gunthorpe (3):
>>   vfio: Add vfio_device_get()
>>   dma-buf: Add dma_buf_try_get()
>>   vfio/pci: Allow MMIO regions to be exported through dma-buf
>> 
>> Wei Lin Guay (1):
>>   vfio/pci: Allow export dmabuf without move_notify from importer
>> 
>>  drivers/vfio/pci/Makefile          |   1 +
>>  drivers/vfio/pci/dma_buf.c         | 291 +++++++++++++++++++++++++++++
>>  drivers/vfio/pci/vfio_pci_config.c |   8 +-
>>  drivers/vfio/pci/vfio_pci_core.c   |  44 ++++-
>>  drivers/vfio/pci/vfio_pci_priv.h   |  30 +++
>>  drivers/vfio/vfio_main.c           |   1 +
>>  include/linux/dma-buf.h            |  13 ++
>>  include/linux/vfio.h               |   6 +
>>  include/linux/vfio_pci_core.h      |   1 +
>>  include/uapi/linux/vfio.h          |  18 ++
>>  10 files changed, 405 insertions(+), 8 deletions(-)
>>  create mode 100644 drivers/vfio/pci/dma_buf.c
>> 
>> --
>> 2.43.5
>> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2025-02-26 17:59             ` Leon Romanovsky
@ 2025-02-26 18:27               ` Wei Lin Guay
  0 siblings, 0 replies; 29+ messages in thread
From: Wei Lin Guay @ 2025-02-26 18:27 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Kasireddy, Vivek, alex.williamson@redhat.com,
	dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
	linux-rdma@vger.kernel.org, Dag Moxnes, kbusch@kernel.org,
	Nicolaas Viljoen, Oded Gabbay, Christian König,
	Simona Vetter, Maor Gottlieb, Marek Szyprowski



> On 26 Feb 2025, at 18:59, Leon Romanovsky <leon@kernel.org> wrote:
> 
> > 
> On Wed, Feb 26, 2025 at 09:38:22AM -0400, Jason Gunthorpe wrote:
>> On Wed, Feb 26, 2025 at 07:55:07AM +0000, Kasireddy, Vivek wrote:
>> 
>>>> Is there any update or ETA for the v3? Are there any ways we can help?
>> 
>>> I believe Leon's series is very close to getting merged. Once it
>>> lands, this series can be revived.
>> 
>> The recent drama has made what happens next unclear.
>> 
>> I would like it if interested parties could contribute reviews to
>> Leon's v7 series to help it along.
> 
> Link to v7 https://lore.kernel.org/all/cover.1738765879.git.leonro@nvidia.com/

Thanks, Jason/Leon for the v7. We will begin by reviewing the patch series.   
> 
> Thanks


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2025-02-26 13:38           ` Jason Gunthorpe
  2025-02-26 17:59             ` Leon Romanovsky
@ 2025-03-04  7:15             ` Kasireddy, Vivek
  2025-03-04 14:29             ` Christian König
  2 siblings, 0 replies; 29+ messages in thread
From: Kasireddy, Vivek @ 2025-03-04  7:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Wei Lin Guay, alex.williamson@redhat.com,
	dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
	linux-rdma@vger.kernel.org, Dag Moxnes, kbusch@kernel.org,
	Nicolaas Viljoen, Oded Gabbay, Christian König,
	Simona Vetter, Leon Romanovsky, Maor Gottlieb

Hi Jason,

> Subject: Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported
> through dmabuf
> 
> On Wed, Feb 26, 2025 at 07:55:07AM +0000, Kasireddy, Vivek wrote:
> 
> > > Is there any update or ETA for the v3? Are there any ways we can help?
> 
> > I believe Leon's series is very close to getting merged. Once it
> > lands, this series can be revived.
> 
> The recent drama has made what happens next unclear.
> 
> I would like it if interested parties could contribute reviews to
> Leon's v7 series to help it along.
> 
> We may want to start considering pushing ahead with this patch series
> and using the usual hack of abusing the scatterlist in the mean time.
Yeah, given that this series is gaining more adoption (in newer use-cases)
and since Leon's series would likely take more time (and iterations) to get
merged, I agree that it makes sense to continue using the scatterlist for now
with this series.

Although Simona has implicitly Ack'd this series, an explicit Ack would be
needed from them or Christian for the dmabuf parts. I'll post a v3 soon
(sometime this week) and we can take it from there.

Thanks,
Vivek

> 
> Thanks,
> Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2025-02-26 13:38           ` Jason Gunthorpe
  2025-02-26 17:59             ` Leon Romanovsky
  2025-03-04  7:15             ` Kasireddy, Vivek
@ 2025-03-04 14:29             ` Christian König
  2025-03-04 14:33               ` Jason Gunthorpe
  2025-03-04 14:54               ` Leon Romanovsky
  2 siblings, 2 replies; 29+ messages in thread
From: Christian König @ 2025-03-04 14:29 UTC (permalink / raw)
  To: Jason Gunthorpe, Kasireddy, Vivek
  Cc: Wei Lin Guay, alex.williamson@redhat.com,
	dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
	linux-rdma@vger.kernel.org, Dag Moxnes, kbusch@kernel.org,
	Nicolaas Viljoen, Oded Gabbay, Simona Vetter, Leon Romanovsky,
	Maor Gottlieb

Am 26.02.25 um 14:38 schrieb Jason Gunthorpe:
> On Wed, Feb 26, 2025 at 07:55:07AM +0000, Kasireddy, Vivek wrote:
>
>>> Is there any update or ETA for the v3? Are there any ways we can help?
>> I believe Leon's series is very close to getting merged. Once it
>> lands, this series can be revived.
> The recent drama has made what happens next unclear.
>
> I would like it if interested parties could contribute reviews to
> Leon's v7 series to help it along.

I think demonstrating how any new interface would work with the existing importers/exporters would help.

> We may want to start considering pushing ahead with this patch series
> and using the usual hack of abusing the scatterlist in the mean time.

That works for me as well.

I think getting away from scatterlist should be a separate set of patches, or maybe a separate project, to get done for all DMA-buf parties.

E.g. propose something like an iterator-based interface and demonstrate that V4L and a DRM driver can live with it and that this approach works as expected.
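
Purely to illustrate the shape, something like the below (entirely made
up, nothing like this exists in dma-buf today):

/* hypothetical: walk a mapping one contiguous DMA range at a time */
struct dma_buf_map_iter {
	struct dma_buf_attachment *attach;
	dma_addr_t addr;	/* current range, valid after a true return */
	size_t len;
	unsigned int cursor;	/* opaque exporter state */
};

bool dma_buf_map_iter_next(struct dma_buf_map_iter *iter);

An importer would then loop over dma_buf_map_iter_next() instead of
walking a scatterlist, and an exporter could back it with whatever
internal representation it already has.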

Regards,
Christian.


>
> Thanks,
> Jason


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2025-03-04 14:29             ` Christian König
@ 2025-03-04 14:33               ` Jason Gunthorpe
  2025-03-04 14:54               ` Leon Romanovsky
  1 sibling, 0 replies; 29+ messages in thread
From: Jason Gunthorpe @ 2025-03-04 14:33 UTC (permalink / raw)
  To: Christian König
  Cc: Kasireddy, Vivek, Wei Lin Guay, alex.williamson@redhat.com,
	dri-devel@lists.freedesktop.org, kvm@vger.kernel.org,
	linux-rdma@vger.kernel.org, Dag Moxnes, kbusch@kernel.org,
	Nicolaas Viljoen, Oded Gabbay, Simona Vetter, Leon Romanovsky,
	Maor Gottlieb

On Tue, Mar 04, 2025 at 03:29:42PM +0100, Christian König wrote:
> Am 26.02.25 um 14:38 schrieb Jason Gunthorpe:
> > On Wed, Feb 26, 2025 at 07:55:07AM +0000, Kasireddy, Vivek wrote:
> >
> >>> Is there any update or ETA for the v3? Are there any ways we can help?
> >> I believe Leon's series is very close to getting merged. Once it
> >> lands, this series can be revived.
> > The recent drama has made what happens next unclear.
> >
> > I would like it if interested parties could contribute reviews to
> > Leon's v7 series to help it along.
> 
> I think demonstrating how any new interface would work with the
> existing importers/exporters would help.

I can see that, but also it is a huge amount of work to make trial
changes to all sorts of subsystems :\

> > We may want to start considering pushing ahead with this patch series
> > and using the usual hack of abusing the scatterlist in the mean time.
> 
> That works for me as well.
> 
> I think when we should get away from scatterlist then that should be
> a separate set of patches or maybe call it a separate project to get
> done for all DMA-buf parties.
> 
> E.g. propose something like an iterator based interface and
> demonstrate that V4L and a DRM driver can live with that and that
> this approach works like expected.

Yeah, maybe so. We are some ways from that yet..

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf
  2025-03-04 14:29             ` Christian König
  2025-03-04 14:33               ` Jason Gunthorpe
@ 2025-03-04 14:54               ` Leon Romanovsky
  1 sibling, 0 replies; 29+ messages in thread
From: Leon Romanovsky @ 2025-03-04 14:54 UTC (permalink / raw)
  To: Christian König
  Cc: Jason Gunthorpe, Kasireddy, Vivek, Wei Lin Guay,
	alex.williamson@redhat.com, dri-devel@lists.freedesktop.org,
	kvm@vger.kernel.org, linux-rdma@vger.kernel.org, Dag Moxnes,
	kbusch@kernel.org, Nicolaas Viljoen, Oded Gabbay, Simona Vetter,
	Maor Gottlieb, Marek Szyprowski

On Tue, Mar 04, 2025 at 03:29:42PM +0100, Christian König wrote:
> Am 26.02.25 um 14:38 schrieb Jason Gunthorpe:
> > On Wed, Feb 26, 2025 at 07:55:07AM +0000, Kasireddy, Vivek wrote:
> >
> >>> Is there any update or ETA for the v3? Are there any ways we can help?
> >> I believe Leon's series is very close to getting merged. Once it
> >> lands, this series can be revived.
> > The recent drama has made what happens next unclear.
> >
> > I would like it if interested parties could contribute reviews to
> > Leon's v7 series to help it along.
> 
> I think demonstrating how any new interface would work with the existing importers/exporters would help.

Unfortunately, it is a huge waste of time given the current situation where
even basic building blocks are not merged yet [1].

We do see a clear path to fix dmabuf; see this roadmap [2].

[1] https://lore.kernel.org/all/20250302085717.GO53094@unreal/
[2] https://lore.kernel.org/linux-rdma/20250122071600.GC10702@unreal/T/#u

Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2025-03-04 14:54 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20241216095429.210792-1-wguay@fb.com>
2024-12-16  9:54 ` [PATCH 1/4] vfio: Add vfio_device_get() Wei Lin Guay
2024-12-16  9:54 ` [PATCH 2/4] dma-buf: Add dma_buf_try_get() Wei Lin Guay
2024-12-16  9:54 ` [PATCH 3/4] vfio/pci: Allow MMIO regions to be exported through dma-buf Wei Lin Guay
2024-12-16  9:54 ` [PATCH 4/4] vfio/pci: Allow export dmabuf without move_notify from importer Wei Lin Guay
2024-12-16 10:21 ` [PATCH 0/4] cover-letter: Allow MMIO regions to be exported through dmabuf Christian König
2024-12-16 16:54   ` Keith Busch
2024-12-17  9:53     ` Christian König
2024-12-17 11:06       ` Wei Lin Guay
     [not found]         ` <e8759159-b141-410b-be08-aad54eaed41f@amd.com>
2024-12-18  6:16           ` Kasireddy, Vivek
2024-12-18 10:01             ` Christian König
2024-12-19  7:02               ` Kasireddy, Vivek
2024-12-19 10:04                 ` Christian König
2025-01-02 13:39                   ` Jason Gunthorpe
2025-01-06 12:05                     ` Simona Vetter
2025-01-06 16:27                       ` Jason Gunthorpe
2025-01-08 16:48                         ` Simona Vetter
2024-12-18 10:44       ` Simona Vetter
     [not found] <20241216095920.237117-1-wguay@fb.com>
2024-12-16 17:34 ` Kasireddy, Vivek
2024-12-17 12:19   ` Wei Lin Guay
2024-12-18  7:02     ` Kasireddy, Vivek
     [not found]       ` <61DF4F0E-D947-436B-9160-A40079DB9085@meta.com>
2025-02-26  7:55         ` Kasireddy, Vivek
2025-02-26 13:38           ` Jason Gunthorpe
2025-02-26 17:59             ` Leon Romanovsky
2025-02-26 18:27               ` Wei Lin Guay
2025-03-04  7:15             ` Kasireddy, Vivek
2025-03-04 14:29             ` Christian König
2025-03-04 14:33               ` Jason Gunthorpe
2025-03-04 14:54               ` Leon Romanovsky
2025-02-26 18:24           ` Wei Lin Guay
