public inbox for linux-media@vger.kernel.org
* [PATCH v5 0/8] dma-buf: Use revoke mechanism to invalidate shared buffers
@ 2026-01-24 19:14 Leon Romanovsky
  2026-01-24 19:14 ` [PATCH v5 1/8] dma-buf: Rename .move_notify() callback to a clearer identifier Leon Romanovsky
                   ` (7 more replies)
  0 siblings, 8 replies; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-24 19:14 UTC (permalink / raw)
  To: Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Leon Romanovsky, Kevin Tian, Joerg Roedel,
	Will Deacon, Robin Murphy, Felix Kuehling, Alex Williamson,
	Ankit Agrawal, Vivek Kasireddy
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

Changelog:
v5:
 * Documented the DMA-BUF expectations around DMA unmap.
 * Added wait support in VFIO for DMA unmap.
 * Reordered patches.
 * Expanded commit messages with additional documentation.
v4: https://lore.kernel.org/all/20260121-dmabuf-revoke-v4-0-d311cbc8633d@nvidia.com
 * Changed DMA_RESV_USAGE_KERNEL to DMA_RESV_USAGE_BOOKKEEP.
 * Made .invalidate_mapping() truly optional.
 * Added patch which renames dma_buf_move_notify() to
   dma_buf_invalidate_mappings().
 * Restored dma_buf_attachment_is_dynamic() function.
v3: https://lore.kernel.org/all/20260120-dmabuf-revoke-v3-0-b7e0b07b8214@nvidia.com/
 * Used Jason's wording for the commits and cover letter.
 * Removed IOMMUFD patch.
 * Renamed dma_buf_attachment_is_revoke() to dma_buf_attach_revocable().
 * Added patch to remove CONFIG_DMABUF_MOVE_NOTIFY.
 * Added Reviewed-by tags.
 * Called dma_resv_wait_timeout() after dma_buf_move_notify() in VFIO.
 * Added dma_buf_attach_revocable() check to VFIO DMABUF attach function.
 * Slightly changed commit messages.
v2: https://patch.msgid.link/20260118-dmabuf-revoke-v2-0-a03bb27c0875@nvidia.com
 * Changed series to document the revoke semantics instead of
   implementing it.
v1: https://patch.msgid.link/20260111-dmabuf-revoke-v1-0-fb4bcc8c259b@nvidia.com

-------------------------------------------------------------------------
This series is based on the latest VFIO fix, which will be sent to Linus
very soon.

https://lore.kernel.org/all/20260121-vfio-add-pin-v1-1-4e04916b17f1@nvidia.com/

Thanks
-------------------------------------------------------------------------
This series documents a dma-buf “revoke” mechanism that allows a dma-buf
exporter to explicitly invalidate (“kill”) a shared buffer after it has
been distributed to importers, so that further CPU and device access is
prevented and importers reliably observe failure.

The change in this series is to properly document and use the existing
core “revoked” state on the dma-buf object and the corresponding
exporter-triggered revoke operation.

dma-buf has quietly allowed calling move_notify on pinned dma-bufs, even
though legacy importers using dma_buf_attach() would simply ignore
these calls.

The intention was that move_notify() would tell the importer to expedite
its unmapping process, and once the importer is fully finished with DMA
it would unmap the dma-buf, which finally signals that the importer is
no longer ever going to touch the memory again. Importers that touch the
memory past their unmap() call can trigger IOMMU errors, AER and beyond;
however, read-and-discard access between move_notify() and unmap() is
allowed.
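
For illustration, a dynamic importer's invalidate_mappings() callback
would roughly follow this sketch. This is not taken from any in-tree
driver: the my_*() helpers, the last_fence field and the unmap_work
worker are hypothetical placeholders for driver-specific logic.

	static void my_invalidate_mappings(struct dma_buf_attachment *attach)
	{
		struct my_importer *imp = attach->importer_priv;

		dma_resv_assert_held(attach->dmabuf->resv);

		/* Stop submitting new DMA against the stale mapping */
		my_block_new_work(imp);

		/*
		 * DMA that cannot be stopped immediately must be covered by
		 * a fence on the reservation object, so the exporter can
		 * wait for it via dma_resv_wait_timeout().
		 */
		dma_resv_add_fence(attach->dmabuf->resv, imp->last_fence,
				   DMA_RESV_USAGE_BOOKKEEP);

		/* Defer the actual dma_buf_unmap_attachment() to a worker */
		queue_work(system_wq, &imp->unmap_work);
	}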

Thus, we can define the exporter's revoke sequence for pinned dma-buf as:

	dma_resv_lock(dmabuf->resv, NULL);
	// Prevent new mappings from being established
	priv->revoked = true;

	// Tell all importers to eventually unmap
	dma_buf_invalidate_mappings(dmabuf);

	// Wait for any in-progress fences on the old mapping
	dma_resv_wait_timeout(dmabuf->resv,
			      DMA_RESV_USAGE_BOOKKEEP, false,
			      MAX_SCHEDULE_TIMEOUT);
	dma_resv_unlock(dmabuf->resv);

	// Wait for all importers to complete unmap
	wait_for_completion(&priv->unmap_comp);

However, dma-buf also supports importers that don't do anything on
move_notify(), and will not unmap the buffer in bounded time.

Since such importers would cause the above sequence to hang, a new
mechanism is needed to detect incompatible importers.

Introduce dma_buf_attach_revocable(), which, when it returns true,
indicates that the above sequence is safe to use and will complete in
kernel-only bounded time for this attachment.
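
An exporter that relies on the bounded revoke sequence can then refuse
incompatible importers at attach time, along these lines (an
illustrative sketch only; the function name and error code are
placeholders, not the exact VFIO patch):

	static int my_exporter_attach(struct dma_buf *dmabuf,
				      struct dma_buf_attachment *attach)
	{
		/* Only accept importers that promise bounded-time unmap */
		if (!dma_buf_attach_revocable(attach))
			return -EOPNOTSUPP;

		return 0;
	}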

Unfortunately dma_buf_attach_revocable() is going to fail for the popular
RDMA pinned importer, which means we cannot introduce it to existing
places using pinned move_notify() without potentially breaking existing
userspace flows.

Existing exporters that only trigger this flow for RAS errors should not
call dma_buf_attach_revocable() and will suffer an unbounded block on the
final completion, hoping that userspace will notice the RAS event and
clean things up. Without revoke support in the RDMA pinned importers, no
other non-breaking option currently seems possible.

For new exporters, like VFIO and RDMA, that have userspace-triggered
revoke events, the unbounded sleep would not be acceptable. They can call
dma_buf_attach_revocable() and will not work with the RDMA pinned
importer from day 0, preventing regressions.

In the process, add documentation explaining the above details.

Thanks

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
Leon Romanovsky (8):
      dma-buf: Rename .move_notify() callback to a clearer identifier
      dma-buf: Rename dma_buf_move_notify() to dma_buf_invalidate_mappings()
      dma-buf: Always build with DMABUF_MOVE_NOTIFY
      vfio: Wait for dma-buf invalidation to complete
      dma-buf: Make .invalidate_mapping() truly optional
      dma-buf: Add dma_buf_attach_revocable()
      vfio: Permit VFIO to work with pinned importers
      iommufd: Add dma_buf_pin()

 drivers/dma-buf/Kconfig                     | 12 ----
 drivers/dma-buf/dma-buf.c                   | 69 +++++++++++++++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 14 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/Kconfig          |  2 +-
 drivers/gpu/drm/virtio/virtgpu_prime.c      |  2 +-
 drivers/gpu/drm/xe/tests/xe_dma_buf.c       |  7 +--
 drivers/gpu/drm/xe/xe_bo.c                  |  2 +-
 drivers/gpu/drm/xe/xe_dma_buf.c             | 14 ++---
 drivers/infiniband/core/umem_dmabuf.c       | 13 -----
 drivers/infiniband/hw/mlx5/mr.c             |  2 +-
 drivers/iommu/iommufd/pages.c               | 11 +++-
 drivers/iommu/iommufd/selftest.c            |  2 +-
 drivers/vfio/pci/vfio_pci_dmabuf.c          | 90 +++++++++++++++++++++++------
 include/linux/dma-buf.h                     | 17 +++---
 15 files changed, 164 insertions(+), 95 deletions(-)
---
base-commit: 61ceaf236115f20f4fdd7cf60f883ada1063349a
change-id: 20251221-dmabuf-revoke-b90ef16e4236

Best regards,
--  
Leon Romanovsky <leonro@nvidia.com>


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v5 1/8] dma-buf: Rename .move_notify() callback to a clearer identifier
  2026-01-24 19:14 [PATCH v5 0/8] dma-buf: Use revoke mechanism to invalidate shared buffers Leon Romanovsky
@ 2026-01-24 19:14 ` Leon Romanovsky
  2026-01-24 19:14 ` [PATCH v5 2/8] dma-buf: Rename dma_buf_move_notify() to dma_buf_invalidate_mappings() Leon Romanovsky
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-24 19:14 UTC (permalink / raw)
  To: Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Leon Romanovsky, Kevin Tian, Joerg Roedel,
	Will Deacon, Robin Murphy, Felix Kuehling, Alex Williamson,
	Ankit Agrawal, Vivek Kasireddy
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

From: Leon Romanovsky <leonro@nvidia.com>

Rename the .move_notify() callback to .invalidate_mappings() to make its
purpose explicit and highlight that it is responsible for invalidating
existing mappings.

Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/dma-buf/dma-buf.c                   | 6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 4 ++--
 drivers/gpu/drm/virtio/virtgpu_prime.c      | 2 +-
 drivers/gpu/drm/xe/tests/xe_dma_buf.c       | 6 +++---
 drivers/gpu/drm/xe/xe_dma_buf.c             | 2 +-
 drivers/infiniband/core/umem_dmabuf.c       | 4 ++--
 drivers/infiniband/hw/mlx5/mr.c             | 2 +-
 drivers/iommu/iommufd/pages.c               | 2 +-
 include/linux/dma-buf.h                     | 6 +++---
 9 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index edaa9e4ee4ae..59cc647bf40e 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -948,7 +948,7 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
 	if (WARN_ON(!dmabuf || !dev))
 		return ERR_PTR(-EINVAL);
 
-	if (WARN_ON(importer_ops && !importer_ops->move_notify))
+	if (WARN_ON(importer_ops && !importer_ops->invalidate_mappings))
 		return ERR_PTR(-EINVAL);
 
 	attach = kzalloc(sizeof(*attach), GFP_KERNEL);
@@ -1055,7 +1055,7 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_pin, "DMA_BUF");
  *
  * This unpins a buffer pinned by dma_buf_pin() and allows the exporter to move
  * any mapping of @attach again and inform the importer through
- * &dma_buf_attach_ops.move_notify.
+ * &dma_buf_attach_ops.invalidate_mappings.
  */
 void dma_buf_unpin(struct dma_buf_attachment *attach)
 {
@@ -1262,7 +1262,7 @@ void dma_buf_move_notify(struct dma_buf *dmabuf)
 
 	list_for_each_entry(attach, &dmabuf->attachments, node)
 		if (attach->importer_ops)
-			attach->importer_ops->move_notify(attach);
+			attach->importer_ops->invalidate_mappings(attach);
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, "DMA_BUF");
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index c1461317eb29..cd4944ceb047 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -438,7 +438,7 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct dma_buf *dma_buf)
 }
 
 /**
- * amdgpu_dma_buf_move_notify - &attach.move_notify implementation
+ * amdgpu_dma_buf_move_notify - &attach.invalidate_mappings implementation
  *
  * @attach: the DMA-buf attachment
  *
@@ -509,7 +509,7 @@ amdgpu_dma_buf_move_notify(struct dma_buf_attachment *attach)
 
 static const struct dma_buf_attach_ops amdgpu_dma_buf_attach_ops = {
 	.allow_peer2peer = true,
-	.move_notify = amdgpu_dma_buf_move_notify
+	.invalidate_mappings = amdgpu_dma_buf_move_notify
 };
 
 /**
diff --git a/drivers/gpu/drm/virtio/virtgpu_prime.c b/drivers/gpu/drm/virtio/virtgpu_prime.c
index ce49282198cb..19c78dd2ca77 100644
--- a/drivers/gpu/drm/virtio/virtgpu_prime.c
+++ b/drivers/gpu/drm/virtio/virtgpu_prime.c
@@ -288,7 +288,7 @@ static void virtgpu_dma_buf_move_notify(struct dma_buf_attachment *attach)
 
 static const struct dma_buf_attach_ops virtgpu_dma_buf_attach_ops = {
 	.allow_peer2peer = true,
-	.move_notify = virtgpu_dma_buf_move_notify
+	.invalidate_mappings = virtgpu_dma_buf_move_notify
 };
 
 struct drm_gem_object *virtgpu_gem_prime_import(struct drm_device *dev,
diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
index 5df98de5ba3c..1f2cca5c2f81 100644
--- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
@@ -23,7 +23,7 @@ static bool p2p_enabled(struct dma_buf_test_params *params)
 static bool is_dynamic(struct dma_buf_test_params *params)
 {
 	return IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY) && params->attach_ops &&
-		params->attach_ops->move_notify;
+		params->attach_ops->invalidate_mappings;
 }
 
 static void check_residency(struct kunit *test, struct xe_bo *exported,
@@ -60,7 +60,7 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
 
 	/*
 	 * Evict exporter. Evicting the exported bo will
-	 * evict also the imported bo through the move_notify() functionality if
+	 * evict also the imported bo through the invalidate_mappings() functionality if
 	 * importer is on a different device. If they're on the same device,
 	 * the exporter and the importer should be the same bo.
 	 */
@@ -198,7 +198,7 @@ static void xe_test_dmabuf_import_same_driver(struct xe_device *xe)
 
 static const struct dma_buf_attach_ops nop2p_attach_ops = {
 	.allow_peer2peer = false,
-	.move_notify = xe_dma_buf_move_notify
+	.invalidate_mappings = xe_dma_buf_move_notify
 };
 
 /*
diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index 7c74a31d4486..1b9cd043e517 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -287,7 +287,7 @@ static void xe_dma_buf_move_notify(struct dma_buf_attachment *attach)
 
 static const struct dma_buf_attach_ops xe_dma_buf_attach_ops = {
 	.allow_peer2peer = true,
-	.move_notify = xe_dma_buf_move_notify
+	.invalidate_mappings = xe_dma_buf_move_notify
 };
 
 #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
index 0ec2e4120cc9..d77a739cfe7a 100644
--- a/drivers/infiniband/core/umem_dmabuf.c
+++ b/drivers/infiniband/core/umem_dmabuf.c
@@ -129,7 +129,7 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
 	if (check_add_overflow(offset, (unsigned long)size, &end))
 		return ret;
 
-	if (unlikely(!ops || !ops->move_notify))
+	if (unlikely(!ops || !ops->invalidate_mappings))
 		return ret;
 
 	dmabuf = dma_buf_get(fd);
@@ -195,7 +195,7 @@ ib_umem_dmabuf_unsupported_move_notify(struct dma_buf_attachment *attach)
 
 static struct dma_buf_attach_ops ib_umem_dmabuf_attach_pinned_ops = {
 	.allow_peer2peer = true,
-	.move_notify = ib_umem_dmabuf_unsupported_move_notify,
+	.invalidate_mappings = ib_umem_dmabuf_unsupported_move_notify,
 };
 
 struct ib_umem_dmabuf *
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 325fa04cbe8a..97099d3b1688 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1620,7 +1620,7 @@ static void mlx5_ib_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
 
 static struct dma_buf_attach_ops mlx5_ib_dmabuf_attach_ops = {
 	.allow_peer2peer = 1,
-	.move_notify = mlx5_ib_dmabuf_invalidate_cb,
+	.invalidate_mappings = mlx5_ib_dmabuf_invalidate_cb,
 };
 
 static struct ib_mr *
diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index dbe51ecb9a20..76f900fa1687 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -1451,7 +1451,7 @@ static void iopt_revoke_notify(struct dma_buf_attachment *attach)
 
 static struct dma_buf_attach_ops iopt_dmabuf_attach_revoke_ops = {
 	.allow_peer2peer = true,
-	.move_notify = iopt_revoke_notify,
+	.invalidate_mappings = iopt_revoke_notify,
 };
 
 /*
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 0bc492090237..1b397635c793 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -407,7 +407,7 @@ struct dma_buf {
 	 *   through the device.
 	 *
 	 * - Dynamic importers should set fences for any access that they can't
-	 *   disable immediately from their &dma_buf_attach_ops.move_notify
+	 *   disable immediately from their &dma_buf_attach_ops.invalidate_mappings
 	 *   callback.
 	 *
 	 * IMPORTANT:
@@ -458,7 +458,7 @@ struct dma_buf_attach_ops {
 	bool allow_peer2peer;
 
 	/**
-	 * @move_notify: [optional] notification that the DMA-buf is moving
+	 * @invalidate_mappings: [optional] notification that the DMA-buf is moving
 	 *
 	 * If this callback is provided the framework can avoid pinning the
 	 * backing store while mappings exists.
@@ -475,7 +475,7 @@ struct dma_buf_attach_ops {
 	 * New mappings can be created after this callback returns, and will
 	 * point to the new location of the DMA-buf.
 	 */
-	void (*move_notify)(struct dma_buf_attachment *attach);
+	void (*invalidate_mappings)(struct dma_buf_attachment *attach);
 };
 
 /**

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v5 2/8] dma-buf: Rename dma_buf_move_notify() to dma_buf_invalidate_mappings()
  2026-01-24 19:14 [PATCH v5 0/8] dma-buf: Use revoke mechanism to invalidate shared buffers Leon Romanovsky
  2026-01-24 19:14 ` [PATCH v5 1/8] dma-buf: Rename .move_notify() callback to a clearer identifier Leon Romanovsky
@ 2026-01-24 19:14 ` Leon Romanovsky
  2026-01-27  9:21   ` Christian König
  2026-01-24 19:14 ` [PATCH v5 3/8] dma-buf: Always build with DMABUF_MOVE_NOTIFY Leon Romanovsky
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-24 19:14 UTC (permalink / raw)
  To: Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Leon Romanovsky, Kevin Tian, Joerg Roedel,
	Will Deacon, Robin Murphy, Felix Kuehling, Alex Williamson,
	Ankit Agrawal, Vivek Kasireddy
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

From: Leon Romanovsky <leonro@nvidia.com>

Along with renaming the .move_notify() callback, rename the corresponding
dma-buf core function. This makes the expected behavior clear to exporters
calling this function.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/dma-buf/dma-buf.c                  | 8 ++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
 drivers/gpu/drm/xe/xe_bo.c                 | 2 +-
 drivers/iommu/iommufd/selftest.c           | 2 +-
 drivers/vfio/pci/vfio_pci_dmabuf.c         | 4 ++--
 include/linux/dma-buf.h                    | 2 +-
 6 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 59cc647bf40e..e12db540c413 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -912,7 +912,7 @@ dma_buf_pin_on_map(struct dma_buf_attachment *attach)
  * 3. Exporters must hold the dma-buf reservation lock when calling these
  *    functions:
  *
- *     - dma_buf_move_notify()
+ *     - dma_buf_invalidate_mappings()
  */
 
 /**
@@ -1247,14 +1247,14 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
 EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment_unlocked, "DMA_BUF");
 
 /**
- * dma_buf_move_notify - notify attachments that DMA-buf is moving
+ * dma_buf_invalidate_mappings - notify attachments that DMA-buf is moving
  *
  * @dmabuf:	[in]	buffer which is moving
  *
  * Informs all attachments that they need to destroy and recreate all their
  * mappings.
  */
-void dma_buf_move_notify(struct dma_buf *dmabuf)
+void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
 {
 	struct dma_buf_attachment *attach;
 
@@ -1264,7 +1264,7 @@ void dma_buf_move_notify(struct dma_buf *dmabuf)
 		if (attach->importer_ops)
 			attach->importer_ops->invalidate_mappings(attach);
 }
-EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, "DMA_BUF");
+EXPORT_SYMBOL_NS_GPL(dma_buf_invalidate_mappings, "DMA_BUF");
 
 /**
  * DOC: cpu access
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index e08f58de4b17..f73dc99d1887 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1270,7 +1270,7 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo,
 
 	if (abo->tbo.base.dma_buf && !drm_gem_is_imported(&abo->tbo.base) &&
 	    old_mem && old_mem->mem_type != TTM_PL_SYSTEM)
-		dma_buf_move_notify(abo->tbo.base.dma_buf);
+		dma_buf_invalidate_mappings(abo->tbo.base.dma_buf);
 
 	/* move_notify is called before move happens */
 	trace_amdgpu_bo_move(abo, new_mem ? new_mem->mem_type : -1,
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index bf4ee976b680..7d02cd9a8501 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -819,7 +819,7 @@ static int xe_bo_move_notify(struct xe_bo *bo,
 
 	/* Don't call move_notify() for imported dma-bufs. */
 	if (ttm_bo->base.dma_buf && !ttm_bo->base.import_attach)
-		dma_buf_move_notify(ttm_bo->base.dma_buf);
+		dma_buf_invalidate_mappings(ttm_bo->base.dma_buf);
 
 	/*
 	 * TTM has already nuked the mmap for us (see ttm_bo_unmap_virtual),
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 550ff36dec3a..f60cbd5328cc 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -2081,7 +2081,7 @@ static int iommufd_test_dmabuf_revoke(struct iommufd_ucmd *ucmd, int fd,
 	priv = dmabuf->priv;
 	dma_resv_lock(dmabuf->resv, NULL);
 	priv->revoked = revoked;
-	dma_buf_move_notify(dmabuf);
+	dma_buf_invalidate_mappings(dmabuf);
 	dma_resv_unlock(dmabuf->resv);
 
 err_put:
diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index 4be4a85005cb..d8ceafabef48 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -332,7 +332,7 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
 		if (priv->revoked != revoked) {
 			dma_resv_lock(priv->dmabuf->resv, NULL);
 			priv->revoked = revoked;
-			dma_buf_move_notify(priv->dmabuf);
+			dma_buf_invalidate_mappings(priv->dmabuf);
 			dma_resv_unlock(priv->dmabuf->resv);
 		}
 		fput(priv->dmabuf->file);
@@ -353,7 +353,7 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
 		list_del_init(&priv->dmabufs_elm);
 		priv->vdev = NULL;
 		priv->revoked = true;
-		dma_buf_move_notify(priv->dmabuf);
+		dma_buf_invalidate_mappings(priv->dmabuf);
 		dma_resv_unlock(priv->dmabuf->resv);
 		vfio_device_put_registration(&vdev->vdev);
 		fput(priv->dmabuf->file);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 1b397635c793..d5c3ce2b3aa4 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -600,7 +600,7 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
 					enum dma_data_direction);
 void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
 				enum dma_data_direction);
-void dma_buf_move_notify(struct dma_buf *dma_buf);
+void dma_buf_invalidate_mappings(struct dma_buf *dma_buf);
 int dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
 			     enum dma_data_direction dir);
 int dma_buf_end_cpu_access(struct dma_buf *dma_buf,

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v5 3/8] dma-buf: Always build with DMABUF_MOVE_NOTIFY
  2026-01-24 19:14 [PATCH v5 0/8] dma-buf: Use revoke mechanism to invalidate shared buffers Leon Romanovsky
  2026-01-24 19:14 ` [PATCH v5 1/8] dma-buf: Rename .move_notify() callback to a clearer identifier Leon Romanovsky
  2026-01-24 19:14 ` [PATCH v5 2/8] dma-buf: Rename dma_buf_move_notify() to dma_buf_invalidate_mappings() Leon Romanovsky
@ 2026-01-24 19:14 ` Leon Romanovsky
  2026-01-27  9:26   ` Christian König
  2026-01-24 19:14 ` [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete Leon Romanovsky
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-24 19:14 UTC (permalink / raw)
  To: Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Leon Romanovsky, Kevin Tian, Joerg Roedel,
	Will Deacon, Robin Murphy, Felix Kuehling, Alex Williamson,
	Ankit Agrawal, Vivek Kasireddy
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

From: Leon Romanovsky <leonro@nvidia.com>

DMABUF_MOVE_NOTIFY was introduced in 2018 and has been marked as
experimental and disabled by default ever since. Eight years later,
all new importers implement this callback.

It is therefore reasonable to drop CONFIG_DMABUF_MOVE_NOTIFY and
always build DMABUF with support for it enabled.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/dma-buf/Kconfig                     | 12 ------------
 drivers/dma-buf/dma-buf.c                   |  3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 10 +++-------
 drivers/gpu/drm/amd/amdkfd/Kconfig          |  2 +-
 drivers/gpu/drm/xe/tests/xe_dma_buf.c       |  3 +--
 drivers/gpu/drm/xe/xe_dma_buf.c             | 12 ++++--------
 6 files changed, 10 insertions(+), 32 deletions(-)

diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig
index b46eb8a552d7..84d5e9b24e20 100644
--- a/drivers/dma-buf/Kconfig
+++ b/drivers/dma-buf/Kconfig
@@ -40,18 +40,6 @@ config UDMABUF
 	  A driver to let userspace turn memfd regions into dma-bufs.
 	  Qemu can use this to create host dmabufs for guest framebuffers.
 
-config DMABUF_MOVE_NOTIFY
-	bool "Move notify between drivers (EXPERIMENTAL)"
-	default n
-	depends on DMA_SHARED_BUFFER
-	help
-	  Don't pin buffers if the dynamic DMA-buf interface is available on
-	  both the exporter as well as the importer. This fixes a security
-	  problem where userspace is able to pin unrestricted amounts of memory
-	  through DMA-buf.
-	  This is marked experimental because we don't yet have a consistent
-	  execution context and memory management between drivers.
-
 config DMABUF_DEBUG
 	bool "DMA-BUF debug checks"
 	depends on DMA_SHARED_BUFFER
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index e12db540c413..cd68c1c0bfd7 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -847,8 +847,7 @@ static bool
 dma_buf_pin_on_map(struct dma_buf_attachment *attach)
 {
 	return attach->dmabuf->ops->pin &&
-		(!dma_buf_attachment_is_dynamic(attach) ||
-		 !IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY));
+	       !dma_buf_attachment_is_dynamic(attach);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index cd4944ceb047..b7f85b8653cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -133,13 +133,9 @@ static int amdgpu_dma_buf_pin(struct dma_buf_attachment *attach)
 	 * notifiers are disabled, only allow pinning in VRAM when move
 	 * notiers are enabled.
 	 */
-	if (!IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY)) {
-		domains &= ~AMDGPU_GEM_DOMAIN_VRAM;
-	} else {
-		list_for_each_entry(attach, &dmabuf->attachments, node)
-			if (!attach->peer2peer)
-				domains &= ~AMDGPU_GEM_DOMAIN_VRAM;
-	}
+	list_for_each_entry(attach, &dmabuf->attachments, node)
+		if (!attach->peer2peer)
+			domains &= ~AMDGPU_GEM_DOMAIN_VRAM;
 
 	if (domains & AMDGPU_GEM_DOMAIN_VRAM)
 		bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
diff --git a/drivers/gpu/drm/amd/amdkfd/Kconfig b/drivers/gpu/drm/amd/amdkfd/Kconfig
index 16e12c9913f9..a5d7467c2f34 100644
--- a/drivers/gpu/drm/amd/amdkfd/Kconfig
+++ b/drivers/gpu/drm/amd/amdkfd/Kconfig
@@ -27,7 +27,7 @@ config HSA_AMD_SVM
 
 config HSA_AMD_P2P
 	bool "HSA kernel driver support for peer-to-peer for AMD GPU devices"
-	depends on HSA_AMD && PCI_P2PDMA && DMABUF_MOVE_NOTIFY
+	depends on HSA_AMD && PCI_P2PDMA
 	help
 	  Enable peer-to-peer (P2P) communication between AMD GPUs over
 	  the PCIe bus. This can improve performance of multi-GPU compute
diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
index 1f2cca5c2f81..c107687ef3c0 100644
--- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
@@ -22,8 +22,7 @@ static bool p2p_enabled(struct dma_buf_test_params *params)
 
 static bool is_dynamic(struct dma_buf_test_params *params)
 {
-	return IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY) && params->attach_ops &&
-		params->attach_ops->invalidate_mappings;
+	return params->attach_ops && params->attach_ops->invalidate_mappings;
 }
 
 static void check_residency(struct kunit *test, struct xe_bo *exported,
diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index 1b9cd043e517..ea370cd373e9 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -56,14 +56,10 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
 	bool allow_vram = true;
 	int ret;
 
-	if (!IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY)) {
-		allow_vram = false;
-	} else {
-		list_for_each_entry(attach, &dmabuf->attachments, node) {
-			if (!attach->peer2peer) {
-				allow_vram = false;
-				break;
-			}
+	list_for_each_entry(attach, &dmabuf->attachments, node) {
+		if (!attach->peer2peer) {
+			allow_vram = false;
+			break;
 		}
 	}
 

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-24 19:14 [PATCH v5 0/8] dma-buf: Use revoke mechanism to invalidate shared buffers Leon Romanovsky
                   ` (2 preceding siblings ...)
  2026-01-24 19:14 ` [PATCH v5 3/8] dma-buf: Always build with DMABUF_MOVE_NOTIFY Leon Romanovsky
@ 2026-01-24 19:14 ` Leon Romanovsky
  2026-01-26 20:53   ` Pranjal Shrivastava
  2026-01-30  8:30   ` Christian König
  2026-01-24 19:14 ` [PATCH v5 5/8] dma-buf: Make .invalidate_mapping() truly optional Leon Romanovsky
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-24 19:14 UTC (permalink / raw)
  To: Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Leon Romanovsky, Kevin Tian, Joerg Roedel,
	Will Deacon, Robin Murphy, Felix Kuehling, Alex Williamson,
	Ankit Agrawal, Vivek Kasireddy
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

From: Leon Romanovsky <leonro@nvidia.com>

dma-buf invalidation is handled asynchronously by the hardware, so VFIO
must wait until all affected objects have been fully invalidated.

In addition, the dma-buf exporter expects all importers to unmap any
buffers they previously mapped.

Fixes: 5d74781ebc86 ("vfio/pci: Add dma-buf export support for MMIO regions")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_dmabuf.c | 71 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 68 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index d8ceafabef48..485515629fe4 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -17,6 +17,8 @@ struct vfio_pci_dma_buf {
 	struct dma_buf_phys_vec *phys_vec;
 	struct p2pdma_provider *provider;
 	u32 nr_ranges;
+	struct kref kref;
+	struct completion comp;
 	u8 revoked : 1;
 };
 
@@ -44,27 +46,46 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
 	return 0;
 }
 
+static void vfio_pci_dma_buf_done(struct kref *kref)
+{
+	struct vfio_pci_dma_buf *priv =
+		container_of(kref, struct vfio_pci_dma_buf, kref);
+
+	complete(&priv->comp);
+}
+
 static struct sg_table *
 vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
 		     enum dma_data_direction dir)
 {
 	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
+	struct sg_table *ret;
 
 	dma_resv_assert_held(priv->dmabuf->resv);
 
 	if (priv->revoked)
 		return ERR_PTR(-ENODEV);
 
-	return dma_buf_phys_vec_to_sgt(attachment, priv->provider,
-				       priv->phys_vec, priv->nr_ranges,
-				       priv->size, dir);
+	ret = dma_buf_phys_vec_to_sgt(attachment, priv->provider,
+				      priv->phys_vec, priv->nr_ranges,
+				      priv->size, dir);
+	if (IS_ERR(ret))
+		return ret;
+
+	kref_get(&priv->kref);
+	return ret;
 }
 
 static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
 				   struct sg_table *sgt,
 				   enum dma_data_direction dir)
 {
+	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
+
+	dma_resv_assert_held(priv->dmabuf->resv);
+
 	dma_buf_free_sgt(attachment, sgt, dir);
+	kref_put(&priv->kref, vfio_pci_dma_buf_done);
 }
 
 static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
@@ -287,6 +308,9 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 		goto err_dev_put;
 	}
 
+	kref_init(&priv->kref);
+	init_completion(&priv->comp);
+
 	/* dma_buf_put() now frees priv */
 	INIT_LIST_HEAD(&priv->dmabufs_elm);
 	down_write(&vdev->memory_lock);
@@ -326,6 +350,8 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
 	lockdep_assert_held_write(&vdev->memory_lock);
 
 	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
+		unsigned long wait;
+
 		if (!get_file_active(&priv->dmabuf->file))
 			continue;
 
@@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
 			dma_resv_lock(priv->dmabuf->resv, NULL);
 			priv->revoked = revoked;
 			dma_buf_invalidate_mappings(priv->dmabuf);
+			dma_resv_wait_timeout(priv->dmabuf->resv,
+					      DMA_RESV_USAGE_BOOKKEEP, false,
+					      MAX_SCHEDULE_TIMEOUT);
 			dma_resv_unlock(priv->dmabuf->resv);
+			if (revoked) {
+				kref_put(&priv->kref, vfio_pci_dma_buf_done);
+				/* Wait until all DMA unmaps have completed. */
+				wait = wait_for_completion_timeout(
+					&priv->comp, secs_to_jiffies(1));
+				/*
+				 * If this WARN fires, the importer didn't
+				 * call unmap in response to
+				 * dma_buf_invalidate_mappings(), which is
+				 * not allowed.
+				 */
+				WARN(!wait,
+				     "Timed out waiting for DMABUF unmap, importer has a broken invalidate_mapping()");
+			} else {
+				/*
+				 * Re-initialize the kref: the revoke path
+				 * dropped the reference count to zero to
+				 * trigger the completion.
+				 */
+				kref_init(&priv->kref);
+				/*
+				 * There is no need to wait as no mapping was
+				 * performed when the previous status was
+				 * priv->revoked == true.
+				 */
+				reinit_completion(&priv->comp);
+			}
 		}
 		fput(priv->dmabuf->file);
 	}
@@ -346,6 +402,8 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
 
 	down_write(&vdev->memory_lock);
 	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
+		unsigned long wait;
+
 		if (!get_file_active(&priv->dmabuf->file))
 			continue;
 
@@ -354,7 +412,14 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
 		priv->vdev = NULL;
 		priv->revoked = true;
 		dma_buf_invalidate_mappings(priv->dmabuf);
+		dma_resv_wait_timeout(priv->dmabuf->resv,
+				      DMA_RESV_USAGE_BOOKKEEP, false,
+				      MAX_SCHEDULE_TIMEOUT);
 		dma_resv_unlock(priv->dmabuf->resv);
+		kref_put(&priv->kref, vfio_pci_dma_buf_done);
+		wait = wait_for_completion_timeout(&priv->comp,
+						   secs_to_jiffies(1));
+		WARN_ON(!wait);
 		vfio_device_put_registration(&vdev->vdev);
 		fput(priv->dmabuf->file);
 	}

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v5 5/8] dma-buf: Make .invalidate_mapping() truly optional
  2026-01-24 19:14 [PATCH v5 0/8] dma-buf: Use revoke mechanism to invalidate shared buffers Leon Romanovsky
                   ` (3 preceding siblings ...)
  2026-01-24 19:14 ` [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete Leon Romanovsky
@ 2026-01-24 19:14 ` Leon Romanovsky
  2026-01-30  8:30   ` Christian König
  2026-01-24 19:14 ` [PATCH v5 6/8] dma-buf: Add dma_buf_attach_revocable() Leon Romanovsky
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-24 19:14 UTC (permalink / raw)
  To: Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Leon Romanovsky, Kevin Tian, Joerg Roedel,
	Will Deacon, Robin Murphy, Felix Kuehling, Alex Williamson,
	Ankit Agrawal, Vivek Kasireddy
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

From: Leon Romanovsky <leonro@nvidia.com>

The .invalidate_mapping() callback is documented as optional, yet it
effectively became mandatory whenever importer_ops were provided. This
led to cases where RDMA non-ODP code had to supply an empty stub.

Relax the checks in the dma-buf core so the callback can be omitted,
allowing RDMA code to drop the unnecessary function.

Removing the stub allows the next patch to detect that RDMA does not
support .invalidate_mapping() by checking for a NULL op.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/dma-buf/dma-buf.c             |  6 ++----
 drivers/infiniband/core/umem_dmabuf.c | 13 -------------
 2 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index cd68c1c0bfd7..1629312d364a 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -947,9 +947,6 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
 	if (WARN_ON(!dmabuf || !dev))
 		return ERR_PTR(-EINVAL);
 
-	if (WARN_ON(importer_ops && !importer_ops->invalidate_mappings))
-		return ERR_PTR(-EINVAL);
-
 	attach = kzalloc(sizeof(*attach), GFP_KERNEL);
 	if (!attach)
 		return ERR_PTR(-ENOMEM);
@@ -1260,7 +1257,8 @@ void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
 	dma_resv_assert_held(dmabuf->resv);
 
 	list_for_each_entry(attach, &dmabuf->attachments, node)
-		if (attach->importer_ops)
+		if (attach->importer_ops &&
+		    attach->importer_ops->invalidate_mappings)
 			attach->importer_ops->invalidate_mappings(attach);
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_invalidate_mappings, "DMA_BUF");
diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
index d77a739cfe7a..256e34c15e6b 100644
--- a/drivers/infiniband/core/umem_dmabuf.c
+++ b/drivers/infiniband/core/umem_dmabuf.c
@@ -129,9 +129,6 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
 	if (check_add_overflow(offset, (unsigned long)size, &end))
 		return ret;
 
-	if (unlikely(!ops || !ops->invalidate_mappings))
-		return ret;
-
 	dmabuf = dma_buf_get(fd);
 	if (IS_ERR(dmabuf))
 		return ERR_CAST(dmabuf);
@@ -184,18 +181,8 @@ struct ib_umem_dmabuf *ib_umem_dmabuf_get(struct ib_device *device,
 }
 EXPORT_SYMBOL(ib_umem_dmabuf_get);
 
-static void
-ib_umem_dmabuf_unsupported_move_notify(struct dma_buf_attachment *attach)
-{
-	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
-
-	ibdev_warn_ratelimited(umem_dmabuf->umem.ibdev,
-			       "Invalidate callback should not be called when memory is pinned\n");
-}
-
 static struct dma_buf_attach_ops ib_umem_dmabuf_attach_pinned_ops = {
 	.allow_peer2peer = true,
-	.invalidate_mappings = ib_umem_dmabuf_unsupported_move_notify,
 };
 
 struct ib_umem_dmabuf *

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v5 6/8] dma-buf: Add dma_buf_attach_revocable()
  2026-01-24 19:14 [PATCH v5 0/8] dma-buf: Use revoke mechanism to invalidate shared buffers Leon Romanovsky
                   ` (4 preceding siblings ...)
  2026-01-24 19:14 ` [PATCH v5 5/8] dma-buf: Make .invalidate_mapping() truly optional Leon Romanovsky
@ 2026-01-24 19:14 ` Leon Romanovsky
  2026-01-26 20:38   ` Pranjal Shrivastava
  2026-01-30  8:43   ` Christian König
  2026-01-24 19:14 ` [PATCH v5 7/8] vfio: Permit VFIO to work with pinned importers Leon Romanovsky
  2026-01-24 19:14 ` [PATCH v5 8/8] iommufd: Add dma_buf_pin() Leon Romanovsky
  7 siblings, 2 replies; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-24 19:14 UTC (permalink / raw)
  To: Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Leon Romanovsky, Kevin Tian, Joerg Roedel,
	Will Deacon, Robin Murphy, Felix Kuehling, Alex Williamson,
	Ankit Agrawal, Vivek Kasireddy
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

From: Leon Romanovsky <leonro@nvidia.com>

Some exporters need a flow to synchronously revoke access to the DMA-buf
by importers. Once revoke is completed the importer is not permitted to
touch the memory, otherwise it may get IOMMU faults, AERs, or worse.

DMA-buf today defines a revoke flow, for both pinned and dynamic
importers, which is broadly:

	dma_resv_lock(dmabuf->resv, NULL);
	// Prevent new mappings from being established
	priv->revoked = true;

	// Tell all importers to eventually unmap
	dma_buf_invalidate_mappings(dmabuf);

	// Wait for any in-progress fences on the old mapping
	dma_resv_wait_timeout(dmabuf->resv,
			      DMA_RESV_USAGE_BOOKKEEP, false,
			      MAX_SCHEDULE_TIMEOUT);
	dma_resv_unlock(dmabuf->resv);

	// Wait for all importers to complete unmap
	wait_for_completion(&priv->unmapped_comp);

This works well, and an importer that continues to access the DMA-buf
after unmapping it is very buggy.

However, the final wait for unmap is effectively unbounded. Several
importers do not support invalidate_mappings() at all and won't unmap
until userspace triggers it.

This unbounded wait is not suitable for exporters like VFIO and RDMA that
need to issue revoke as part of their normal operations.

Add dma_buf_attach_revocable() to allow exporters to determine the
difference between importers that can complete the above in bounded time,
and those that can't. It can be called inside the exporter's attach op to
reject incompatible importers.

Document these details about how dma_buf_invalidate_mappings() works and
what the required sequence is to achieve a full revocation.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/dma-buf/dma-buf.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/dma-buf.h   |  9 +++------
 2 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 1629312d364a..f0e05227bda8 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1242,13 +1242,59 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment_unlocked, "DMA_BUF");
 
+/**
+ * dma_buf_attach_revocable - check if a DMA-buf importer implements
+ * revoke semantics.
+ * @attach: the DMA-buf attachment to check
+ *
+ * Returns true if the DMA-buf importer can support the revoke sequence
+ * explained in dma_buf_invalidate_mappings() within bounded time, meaning the
+ * importer implements invalidate_mappings() and ensures that unmap is called
+ * as a result.
+ */
+bool dma_buf_attach_revocable(struct dma_buf_attachment *attach)
+{
+	return attach->importer_ops &&
+	       attach->importer_ops->invalidate_mappings;
+}
+EXPORT_SYMBOL_NS_GPL(dma_buf_attach_revocable, "DMA_BUF");
+
 /**
  * dma_buf_invalidate_mappings - notify attachments that DMA-buf is moving
  *
  * @dmabuf:	[in]	buffer which is moving
  *
  * Informs all attachments that they need to destroy and recreate all their
- * mappings.
+ * mappings. If the attachment is dynamic then the dynamic importer is expected
+ * to invalidate any caches it has of the mapping result and perform a new
+ * mapping request before allowing HW to do any further DMA.
+ *
+ * If the attachment is pinned then this informs the pinned importer that the
+ * underlying mapping is no longer available. Pinned importers may take this
+ * as a permanent revocation and never establish new mappings, so exporters
+ * should not trigger it lightly.
+ *
+ * Upon return importers may continue to access the DMA-buf memory. The caller
+ * must do two additional waits to ensure that the memory is no longer being
+ * accessed:
+ *  1) Until dma_resv_wait_timeout() retires fences the importer is allowed to
+ *     fully access the memory.
+ *  2) Until the importer calls unmap it is allowed to speculatively
+ *     read-and-discard the memory. It must not write to the memory.
+ *
+ * A caller wishing to use dma_buf_invalidate_mappings() to fully stop access to
+ * the DMA-buf must wait for both. Dynamic callers can often use just the first.
+ *
+ * All importers providing an invalidate_mappings() op must ensure that unmap is
+ * called within bounded time after the op.
+ *
+ * Pinned importers that do not support an invalidate_mappings() op will
+ * eventually perform unmap when they are done with the buffer, which may be an
+ * unbounded time from calling this function. dma_buf_attach_revocable() can be
+ * used to prevent such importers from attaching.
+ *
+ * Importers are free to request a new mapping as soon as this function
+ * returns.
  */
 void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
 {
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index d5c3ce2b3aa4..84a7ec8f5359 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -468,12 +468,8 @@ struct dma_buf_attach_ops {
 	 * called with this lock held as well. This makes sure that no mapping
 	 * is created concurrently with an ongoing move operation.
 	 *
-	 * Mappings stay valid and are not directly affected by this callback.
-	 * But the DMA-buf can now be in a different physical location, so all
-	 * mappings should be destroyed and re-created as soon as possible.
-	 *
-	 * New mappings can be created after this callback returns, and will
-	 * point to the new location of the DMA-buf.
+	 * See the kdoc for dma_buf_invalidate_mappings() for details on the
+	 * required behavior.
 	 */
 	void (*invalidate_mappings)(struct dma_buf_attachment *attach);
 };
@@ -601,6 +597,7 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
 void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
 				enum dma_data_direction);
 void dma_buf_invalidate_mappings(struct dma_buf *dma_buf);
+bool dma_buf_attach_revocable(struct dma_buf_attachment *attach);
 int dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
 			     enum dma_data_direction dir);
 int dma_buf_end_cpu_access(struct dma_buf *dma_buf,

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v5 7/8] vfio: Permit VFIO to work with pinned importers
  2026-01-24 19:14 [PATCH v5 0/8] dma-buf: Use revoke mechanism to invalidate shared buffers Leon Romanovsky
                   ` (5 preceding siblings ...)
  2026-01-24 19:14 ` [PATCH v5 6/8] dma-buf: Add dma_buf_attach_revocable() Leon Romanovsky
@ 2026-01-24 19:14 ` Leon Romanovsky
  2026-01-29 21:04   ` Alex Williamson
  2026-01-30  3:14   ` Tian, Kevin
  2026-01-24 19:14 ` [PATCH v5 8/8] iommufd: Add dma_buf_pin() Leon Romanovsky
  7 siblings, 2 replies; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-24 19:14 UTC (permalink / raw)
  To: Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Leon Romanovsky, Kevin Tian, Joerg Roedel,
	Will Deacon, Robin Murphy, Felix Kuehling, Alex Williamson,
	Ankit Agrawal, Vivek Kasireddy
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

From: Leon Romanovsky <leonro@nvidia.com>

Until now, VFIO has rejected pinned importers, largely to avoid being used
with the RDMA pinned importer, which cannot handle a move_notify() to revoke
access.

Using dma_buf_attach_revocable() it can tell the difference between pinned
importers that support the flow described in dma_buf_invalidate_mappings()
and those that don't.

Thus permit compatible pinned importers.

This is one of two items IOMMUFD requires to remove its private interface
to VFIO's dma-buf.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_dmabuf.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index 485515629fe4..3c8dc56e2238 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -22,16 +22,6 @@ struct vfio_pci_dma_buf {
 	u8 revoked : 1;
 };
 
-static int vfio_pci_dma_buf_pin(struct dma_buf_attachment *attachment)
-{
-	return -EOPNOTSUPP;
-}
-
-static void vfio_pci_dma_buf_unpin(struct dma_buf_attachment *attachment)
-{
-	/* Do nothing */
-}
-
 static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
 				   struct dma_buf_attachment *attachment)
 {
@@ -43,6 +33,9 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
 	if (priv->revoked)
 		return -ENODEV;
 
+	if (!dma_buf_attach_revocable(attachment))
+		return -EOPNOTSUPP;
+
 	return 0;
 }
 
@@ -107,8 +100,6 @@ static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
 }
 
 static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
-	.pin = vfio_pci_dma_buf_pin,
-	.unpin = vfio_pci_dma_buf_unpin,
 	.attach = vfio_pci_dma_buf_attach,
 	.map_dma_buf = vfio_pci_dma_buf_map,
 	.unmap_dma_buf = vfio_pci_dma_buf_unmap,

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v5 8/8] iommufd: Add dma_buf_pin()
  2026-01-24 19:14 [PATCH v5 0/8] dma-buf: Use revoke mechanism to invalidate shared buffers Leon Romanovsky
                   ` (6 preceding siblings ...)
  2026-01-24 19:14 ` [PATCH v5 7/8] vfio: Permit VFIO to work with pinned importers Leon Romanovsky
@ 2026-01-24 19:14 ` Leon Romanovsky
  2026-01-29  7:08   ` Tian, Kevin
  2026-01-30  0:17   ` Jason Gunthorpe
  7 siblings, 2 replies; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-24 19:14 UTC (permalink / raw)
  To: Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Leon Romanovsky, Kevin Tian, Joerg Roedel,
	Will Deacon, Robin Murphy, Felix Kuehling, Alex Williamson,
	Ankit Agrawal, Vivek Kasireddy
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

From: Leon Romanovsky <leonro@nvidia.com>

IOMMUFD relies on a private protocol with VFIO, which has always operated
in pinned mode.

Now that VFIO can support pinned importers, update IOMMUFD to invoke the
normal dma-buf flow to request pin.

This isn't enough to allow IOMMUFD to work with other exporters; it still
needs a way to get the physical address list, which is another series.

IOMMUFD supports the defined revoke semantics. It immediately stops and
fences access to the memory inside its invalidate_mappings() callback,
and it currently doesn't use scatterlists so doesn't call map/unmap at
all.

It is expected that a future revision can synchronously call unmap from
the move_notify callback as well.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/iommu/iommufd/pages.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index 76f900fa1687..a5eb2bc4ef48 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -1501,16 +1501,22 @@ static int iopt_map_dmabuf(struct iommufd_ctx *ictx, struct iopt_pages *pages,
 		mutex_unlock(&pages->mutex);
 	}
 
-	rc = sym_vfio_pci_dma_buf_iommufd_map(attach, &pages->dmabuf.phys);
+	rc = dma_buf_pin(attach);
 	if (rc)
 		goto err_detach;
 
+	rc = sym_vfio_pci_dma_buf_iommufd_map(attach, &pages->dmabuf.phys);
+	if (rc)
+		goto err_unpin;
+
 	dma_resv_unlock(dmabuf->resv);
 
 	/* On success iopt_release_pages() will detach and put the dmabuf. */
 	pages->dmabuf.attach = attach;
 	return 0;
 
+err_unpin:
+	dma_buf_unpin(attach);
 err_detach:
 	dma_resv_unlock(dmabuf->resv);
 	dma_buf_detach(dmabuf, attach);
@@ -1656,6 +1662,7 @@ void iopt_release_pages(struct kref *kref)
 	if (iopt_is_dmabuf(pages) && pages->dmabuf.attach) {
 		struct dma_buf *dmabuf = pages->dmabuf.attach->dmabuf;
 
+		dma_buf_unpin(pages->dmabuf.attach);
 		dma_buf_detach(dmabuf, pages->dmabuf.attach);
 		dma_buf_put(dmabuf);
 		WARN_ON(!list_empty(&pages->dmabuf.tracker));

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 6/8] dma-buf: Add dma_buf_attach_revocable()
  2026-01-24 19:14 ` [PATCH v5 6/8] dma-buf: Add dma_buf_attach_revocable() Leon Romanovsky
@ 2026-01-26 20:38   ` Pranjal Shrivastava
  2026-01-26 21:01     ` Jason Gunthorpe
  2026-01-30  8:43   ` Christian König
  1 sibling, 1 reply; 51+ messages in thread
From: Pranjal Shrivastava @ 2026-01-26 20:38 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel, amd-gfx, virtualization, intel-xe, linux-rdma,
	iommu, kvm

Hi Leon,

On Sat, Jan 24, 2026 at 09:14:18PM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Some exporters need a flow to synchronously revoke access to the DMA-buf
> by importers. Once revoke is completed the importer is not permitted to
> touch the memory otherwise they may get IOMMU faults, AERs, or worse.
> 
> DMA-buf today defines a revoke flow, for both pinned and dynamic
> importers, which is broadly:
> 
> 	dma_resv_lock(dmabuf->resv, NULL);
> 	// Prevent new mappings from being established
> 	priv->revoked = true;
> 
> 	// Tell all importers to eventually unmap
> 	dma_buf_invalidate_mappings(dmabuf);
> 
> 	// Wait for any inprogress fences on the old mapping
> 	dma_resv_wait_timeout(dmabuf->resv,
> 			      DMA_RESV_USAGE_BOOKKEEP, false,
> 			      MAX_SCHEDULE_TIMEOUT);
> 	dma_resv_unlock(dmabuf->resv, NULL);
> 
> 	// Wait for all importers to complete unmap
> 	wait_for_completion(&priv->unmapped_comp);
> 
> This works well, and an importer that continues to access the DMA-buf
> after unmapping it is very buggy.
> 
> However, the final wait for unmap is effectively unbounded. Several
> importers do not support invalidate_mappings() at all and won't unmap
> until userspace triggers it.
> 
> This unbounded wait is not suitable for exporters like VFIO and RDMA tha
> need to issue revoke as part of their normal operations.
> 
> Add dma_buf_attach_revocable() to allow exporters to determine the
> difference between importers that can complete the above in bounded time,
> and those that can't. It can be called inside the exporter's attach op to
> reject incompatible importers.
> 
> Document these details about how dma_buf_invalidate_mappings() works and
> what the required sequence is to achieve a full revocation.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/dma-buf/dma-buf.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++-
>  include/linux/dma-buf.h   |  9 +++------
>  2 files changed, 50 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 1629312d364a..f0e05227bda8 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -1242,13 +1242,59 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
>  }
>  EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment_unlocked, "DMA_BUF");
>  
> +/**
> + * dma_buf_attach_revocable - check if a DMA-buf importer implements
> + * revoke semantics.
> + * @attach: the DMA-buf attachment to check
> + *
> + * Returns true if the DMA-buf importer can support the revoke sequence
> + * explained in dma_buf_invalidate_mappings() within bounded time. Meaning the
> + * importer implements invalidate_mappings() and ensures that unmap is called as
> + * a result.
> + */
> +bool dma_buf_attach_revocable(struct dma_buf_attachment *attach)
> +{
> +	return attach->importer_ops &&
> +	       attach->importer_ops->invalidate_mappings;
> +}
> +EXPORT_SYMBOL_NS_GPL(dma_buf_attach_revocable, "DMA_BUF");
> +

I noticed that Patch 5 removes the invalidate_mappings stub from 
umem_dmabuf.c, effectively making the callback NULL for an RDMA 
importer. Consequently, dma_buf_attach_revocable() (introduced here)
will return false for these importers.

Since the cover letter mentions that VFIO will use
dma_buf_attach_revocable() to prevent unbounded waits, this appears to
effectively block paths like the VFIO-export -> RDMA-import path.

Given that RDMA is a significant consumer of dma-bufs, are there plans
to implement proper revocation support in the IB/RDMA core (umem_dmabuf)? 

It would be good to know if there's a plan for bringing such importers
into compliance with the new revocation semantics so they can interoperate
with VFIO, or whether we are completely ruling out users like RDMA/IB
importing any DMABUFs exported by VFIO?

Thanks,
Praan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-24 19:14 ` [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete Leon Romanovsky
@ 2026-01-26 20:53   ` Pranjal Shrivastava
  2026-01-27  8:58     ` Leon Romanovsky
  2026-01-30  8:30   ` Christian König
  1 sibling, 1 reply; 51+ messages in thread
From: Pranjal Shrivastava @ 2026-01-26 20:53 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel, amd-gfx, virtualization, intel-xe, linux-rdma,
	iommu, kvm

On Sat, Jan 24, 2026 at 09:14:16PM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> dma-buf invalidation is handled asynchronously by the hardware, so VFIO
> must wait until all affected objects have been fully invalidated.
> 
> In addition, the dma-buf exporter is expecting that all importers unmap any
> buffers they previously mapped.
> 
> Fixes: 5d74781ebc86 ("vfio/pci: Add dma-buf export support for MMIO regions")
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/vfio/pci/vfio_pci_dmabuf.c | 71 ++++++++++++++++++++++++++++++++++++--
>  1 file changed, 68 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> index d8ceafabef48..485515629fe4 100644
> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> @@ -17,6 +17,8 @@ struct vfio_pci_dma_buf {
>  	struct dma_buf_phys_vec *phys_vec;
>  	struct p2pdma_provider *provider;
>  	u32 nr_ranges;
> +	struct kref kref;
> +	struct completion comp;
>  	u8 revoked : 1;
>  };
>  
> @@ -44,27 +46,46 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
>  	return 0;
>  }
>  
> +static void vfio_pci_dma_buf_done(struct kref *kref)
> +{
> +	struct vfio_pci_dma_buf *priv =
> +		container_of(kref, struct vfio_pci_dma_buf, kref);
> +
> +	complete(&priv->comp);
> +}
> +
>  static struct sg_table *
>  vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
>  		     enum dma_data_direction dir)
>  {
>  	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> +	struct sg_table *ret;
>  
>  	dma_resv_assert_held(priv->dmabuf->resv);
>  
>  	if (priv->revoked)
>  		return ERR_PTR(-ENODEV);
>  
> -	return dma_buf_phys_vec_to_sgt(attachment, priv->provider,
> -				       priv->phys_vec, priv->nr_ranges,
> -				       priv->size, dir);
> +	ret = dma_buf_phys_vec_to_sgt(attachment, priv->provider,
> +				      priv->phys_vec, priv->nr_ranges,
> +				      priv->size, dir);
> +	if (IS_ERR(ret))
> +		return ret;
> +
> +	kref_get(&priv->kref);
> +	return ret;
>  }
>  
>  static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
>  				   struct sg_table *sgt,
>  				   enum dma_data_direction dir)
>  {
> +	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> +
> +	dma_resv_assert_held(priv->dmabuf->resv);
> +
>  	dma_buf_free_sgt(attachment, sgt, dir);
> +	kref_put(&priv->kref, vfio_pci_dma_buf_done);
>  }
>  
>  static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
> @@ -287,6 +308,9 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>  		goto err_dev_put;
>  	}
>  
> +	kref_init(&priv->kref);
> +	init_completion(&priv->comp);
> +
>  	/* dma_buf_put() now frees priv */
>  	INIT_LIST_HEAD(&priv->dmabufs_elm);
>  	down_write(&vdev->memory_lock);
> @@ -326,6 +350,8 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
>  	lockdep_assert_held_write(&vdev->memory_lock);
>  
>  	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> +		unsigned long wait;
> +
>  		if (!get_file_active(&priv->dmabuf->file))
>  			continue;
>  
> @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
>  			dma_resv_lock(priv->dmabuf->resv, NULL);
>  			priv->revoked = revoked;
>  			dma_buf_invalidate_mappings(priv->dmabuf);
> +			dma_resv_wait_timeout(priv->dmabuf->resv,
> +					      DMA_RESV_USAGE_BOOKKEEP, false,
> +					      MAX_SCHEDULE_TIMEOUT);
>  			dma_resv_unlock(priv->dmabuf->resv);
> +			if (revoked) {
> +				kref_put(&priv->kref, vfio_pci_dma_buf_done);
> +				/* Let's wait till all DMA unmap are completed. */
> +				wait = wait_for_completion_timeout(
> +					&priv->comp, secs_to_jiffies(1));

Is the 1-second constant sufficient for all hardware, or should the 
invalidate_mappings() contract require the callback to block until 
speculative reads are strictly fenced? I'm wondering about a case where
a device's firmware has a high response latency, perhaps due to internal
management tasks like error recovery or thermal throttling, and it exceeds
the 1s timeout.

If the device is in the middle of a large DMA burst and the firmware is
slow to flush the internal pipelines to a fully "quiesced"
read-and-discard state, reclaiming the memory at exactly 1.001 seconds
risks triggering platform-level faults.

Since we explicitly permit these speculative reads until unmap is
complete, relying on a hardcoded timeout in the exporter seems to 
introduce a hardware-dependent race condition that could compromise
system stability via IOMMU errors or AER faults. 

Should the importer instead be required to guarantee that all 
speculative access has ceased before the invalidation call returns?

Thanks
Praan

> +				/*
> +				 * If you see this WARN, it means that the
> +				 * importer didn't call unmap in response to
> +				 * dma_buf_invalidate_mappings(), which is
> +				 * not allowed.
> +				 */
> +				WARN(!wait,
> +				     "Timed out waiting for DMABUF unmap, importer has a broken invalidate_mappings()");
> +			} else {
> +				/*
> +				 * Kref is initialized again, because when revoke
> +				 * was performed the reference counter was decreased
> +				 * to zero to trigger completion.
> +				 */
> +				kref_init(&priv->kref);
> +				/*
> +				 * There is no need to wait as no mapping was
> +				 * performed when the previous status was
> +				 * priv->revoked == true.
> +				 */
> +				reinit_completion(&priv->comp);
> +			}
>  		}
>  		fput(priv->dmabuf->file);
>  	}
> @@ -346,6 +402,8 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
>  
>  	down_write(&vdev->memory_lock);
>  	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> +		unsigned long wait;
> +
>  		if (!get_file_active(&priv->dmabuf->file))
>  			continue;
>  
> @@ -354,7 +412,14 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
>  		priv->vdev = NULL;
>  		priv->revoked = true;
>  		dma_buf_invalidate_mappings(priv->dmabuf);
> +		dma_resv_wait_timeout(priv->dmabuf->resv,
> +				      DMA_RESV_USAGE_BOOKKEEP, false,
> +				      MAX_SCHEDULE_TIMEOUT);
>  		dma_resv_unlock(priv->dmabuf->resv);
> +		kref_put(&priv->kref, vfio_pci_dma_buf_done);
> +		wait = wait_for_completion_timeout(&priv->comp,
> +						   secs_to_jiffies(1));
> +		WARN_ON(!wait);
>  		vfio_device_put_registration(&vdev->vdev);
>  		fput(priv->dmabuf->file);
>  	}
> 
> -- 
> 2.52.0
> 
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 6/8] dma-buf: Add dma_buf_attach_revocable()
  2026-01-26 20:38   ` Pranjal Shrivastava
@ 2026-01-26 21:01     ` Jason Gunthorpe
  0 siblings, 0 replies; 51+ messages in thread
From: Jason Gunthorpe @ 2026-01-26 21:01 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: Leon Romanovsky, Sumit Semwal, Christian König, Alex Deucher,
	David Airlie, Simona Vetter, Gerd Hoffmann, Dmitry Osipenko,
	Gurchetan Singh, Chia-I Wu, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Lucas De Marchi, Thomas Hellström,
	Rodrigo Vivi, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Felix Kuehling, Alex Williamson, Ankit Agrawal, Vivek Kasireddy,
	linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

On Mon, Jan 26, 2026 at 08:38:44PM +0000, Pranjal Shrivastava wrote:
> I noticed that Patch 5 removes the invalidate_mappings stub from 
> umem_dmabuf.c, effectively making the callback NULL for an RDMA 
> importer. Consequently, dma_buf_attach_revocable() (introduced here)
> will return false for these importers.

Yes, that is the intention.

> Since the cover letter mentions that VFIO will use
> dma_buf_attach_revocable() to prevent unbounded waits, this appears to
> effectively block paths like the VFIO-export -> RDMA-import path.

It remains usable with the ODP path and people are using that right
now.

> Given that RDMA is a significant consumer of dma-bufs, are there plans
> to implement proper revocation support in the IB/RDMA core (umem_dmabuf)? 

This depends on each HW; they need a way to implement the revoke
semantic. I can't guess what is possible, but I would hope that most
HW could at least do a revoke on a real MR.

E.g. an MR rereg operation to a kernel-owned empty PD is an effective
"revoke", and MR rereg is at least defined by standards, so HW should
implement it.
 
> It would be good to know if there's a plan for bringing such importers
> into compliance with the new revocation semantics so they can interop
> with VFIO OR are we completely ruling out users like RDMA / IB importing
> any DMABUFs exported by VFIO?

It will be driver-dependent; there is no one-shot update here.

Jason

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-26 20:53   ` Pranjal Shrivastava
@ 2026-01-27  8:58     ` Leon Romanovsky
  2026-01-27 16:27       ` Jason Gunthorpe
  0 siblings, 1 reply; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-27  8:58 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel, amd-gfx, virtualization, intel-xe, linux-rdma,
	iommu, kvm

On Mon, Jan 26, 2026 at 08:53:57PM +0000, Pranjal Shrivastava wrote:
> On Sat, Jan 24, 2026 at 09:14:16PM +0200, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > dma-buf invalidation is handled asynchronously by the hardware, so VFIO
> > must wait until all affected objects have been fully invalidated.
> > 
> > In addition, the dma-buf exporter expects all importers to unmap any
> > buffers they previously mapped.
> > 
> > Fixes: 5d74781ebc86 ("vfio/pci: Add dma-buf export support for MMIO regions")
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_dmabuf.c | 71 ++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 68 insertions(+), 3 deletions(-)

<...>

> > @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
> >  			dma_resv_lock(priv->dmabuf->resv, NULL);
> >  			priv->revoked = revoked;
> >  			dma_buf_invalidate_mappings(priv->dmabuf);
> > +			dma_resv_wait_timeout(priv->dmabuf->resv,
> > +					      DMA_RESV_USAGE_BOOKKEEP, false,
> > +					      MAX_SCHEDULE_TIMEOUT);
> >  			dma_resv_unlock(priv->dmabuf->resv);
> > +			if (revoked) {
> > +				kref_put(&priv->kref, vfio_pci_dma_buf_done);
> > +				/* Wait until all DMA unmaps have completed. */
> > +				wait = wait_for_completion_timeout(
> > +					&priv->comp, secs_to_jiffies(1));
> 
> Is the 1-second constant sufficient for all hardware, or should the 
> invalidate_mappings() contract require the callback to block until 
> speculative reads are strictly fenced? I'm wondering about a case where
> a device's firmware has a high response latency, perhaps due to
> internal management tasks like error recovery or thermal throttling,
> and it exceeds the 1s timeout.
> 
> If the device is in the middle of a large DMA burst and the firmware is
> slow to flush the internal pipelines to a fully "quiesced"
> read-and-discard state, reclaiming the memory at exactly 1.001 seconds
> risks triggering platform-level faults.
> 
> Since we explicitly permit these speculative reads until unmap is
> complete, relying on a hardcoded timeout in the exporter seems to 
> introduce a hardware-dependent race condition that could compromise
> system stability via IOMMU errors or AER faults. 
> 
> Should the importer instead be required to guarantee that all 
> speculative access has ceased before the invalidation call returns?

It is guaranteed by the dma_resv_wait_timeout() call above. That call ensures
that the hardware has completed all pending operations. The 1‑second delay is
meant to catch cases where an in-kernel DMA unmap call is missing, which should
not trigger any DMA activity at that point.

So yes, one second is more than sufficient.

Thanks

> 
> Thanks
> Praan
> 
> > +				/*
> > +				 * If you see this WARN, it means that the
> > +				 * importer didn't call unmap in response to
> > +				 * dma_buf_invalidate_mappings(), which is
> > +				 * not allowed.
> > +				 */
> > +				WARN(!wait,
> > +				     "Timed out waiting for DMABUF unmap, importer has a broken invalidate_mappings()");
> > +			} else {
> > +				/*
> > +				 * Kref is initialized again, because when revoke
> > +				 * was performed the reference counter was decreased
> > +				 * to zero to trigger completion.
> > +				 */
> > +				kref_init(&priv->kref);
> > +				/*
> > +				 * There is no need to wait as no mapping was
> > +				 * performed when the previous status was
> > +				 * priv->revoked == true.
> > +				 */
> > +				reinit_completion(&priv->comp);
> > +			}
> >  		}
> >  		fput(priv->dmabuf->file);
> >  	}
> > @@ -346,6 +402,8 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
> >  
> >  	down_write(&vdev->memory_lock);
> >  	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> > +		unsigned long wait;
> > +
> >  		if (!get_file_active(&priv->dmabuf->file))
> >  			continue;
> >  
> > @@ -354,7 +412,14 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
> >  		priv->vdev = NULL;
> >  		priv->revoked = true;
> >  		dma_buf_invalidate_mappings(priv->dmabuf);
> > +		dma_resv_wait_timeout(priv->dmabuf->resv,
> > +				      DMA_RESV_USAGE_BOOKKEEP, false,
> > +				      MAX_SCHEDULE_TIMEOUT);
> >  		dma_resv_unlock(priv->dmabuf->resv);
> > +		kref_put(&priv->kref, vfio_pci_dma_buf_done);
> > +		wait = wait_for_completion_timeout(&priv->comp,
> > +						   secs_to_jiffies(1));
> > +		WARN_ON(!wait);
> >  		vfio_device_put_registration(&vdev->vdev);
> >  		fput(priv->dmabuf->file);
> >  	}
> > 
> > -- 
> > 2.52.0
> > 
> > 
> 

* Re: [PATCH v5 2/8] dma-buf: Rename dma_buf_move_notify() to dma_buf_invalidate_mappings()
  2026-01-24 19:14 ` [PATCH v5 2/8] dma-buf: Rename dma_buf_move_notify() to dma_buf_invalidate_mappings() Leon Romanovsky
@ 2026-01-27  9:21   ` Christian König
  0 siblings, 0 replies; 51+ messages in thread
From: Christian König @ 2026-01-27  9:21 UTC (permalink / raw)
  To: Leon Romanovsky, Sumit Semwal, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

On 1/24/26 20:14, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Along with renaming the .move_notify() callback, rename the corresponding
> dma-buf core function. This makes the expected behavior clear to exporters
> calling this function.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>  drivers/dma-buf/dma-buf.c                  | 8 ++++----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
>  drivers/gpu/drm/xe/xe_bo.c                 | 2 +-
>  drivers/iommu/iommufd/selftest.c           | 2 +-
>  drivers/vfio/pci/vfio_pci_dmabuf.c         | 4 ++--
>  include/linux/dma-buf.h                    | 2 +-
>  6 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 59cc647bf40e..e12db540c413 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -912,7 +912,7 @@ dma_buf_pin_on_map(struct dma_buf_attachment *attach)
>   * 3. Exporters must hold the dma-buf reservation lock when calling these
>   *    functions:
>   *
> - *     - dma_buf_move_notify()
> + *     - dma_buf_invalidate_mappings()
>   */
>  
>  /**
> @@ -1247,14 +1247,14 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
>  EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment_unlocked, "DMA_BUF");
>  
>  /**
> - * dma_buf_move_notify - notify attachments that DMA-buf is moving
> + * dma_buf_invalidate_mappings - notify attachments that DMA-buf is moving
>   *
>   * @dmabuf:	[in]	buffer which is moving
>   *
>   * Informs all attachments that they need to destroy and recreate all their
>   * mappings.
>   */
> -void dma_buf_move_notify(struct dma_buf *dmabuf)
> +void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
>  {
>  	struct dma_buf_attachment *attach;
>  
> @@ -1264,7 +1264,7 @@ void dma_buf_move_notify(struct dma_buf *dmabuf)
>  		if (attach->importer_ops)
>  			attach->importer_ops->invalidate_mappings(attach);
>  }
> -EXPORT_SYMBOL_NS_GPL(dma_buf_move_notify, "DMA_BUF");
> +EXPORT_SYMBOL_NS_GPL(dma_buf_invalidate_mappings, "DMA_BUF");
>  
>  /**
>   * DOC: cpu access
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index e08f58de4b17..f73dc99d1887 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -1270,7 +1270,7 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo,
>  
>  	if (abo->tbo.base.dma_buf && !drm_gem_is_imported(&abo->tbo.base) &&
>  	    old_mem && old_mem->mem_type != TTM_PL_SYSTEM)
> -		dma_buf_move_notify(abo->tbo.base.dma_buf);
> +		dma_buf_invalidate_mappings(abo->tbo.base.dma_buf);
>  
>  	/* move_notify is called before move happens */
>  	trace_amdgpu_bo_move(abo, new_mem ? new_mem->mem_type : -1,
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index bf4ee976b680..7d02cd9a8501 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -819,7 +819,7 @@ static int xe_bo_move_notify(struct xe_bo *bo,
>  
>  	/* Don't call move_notify() for imported dma-bufs. */
>  	if (ttm_bo->base.dma_buf && !ttm_bo->base.import_attach)
> -		dma_buf_move_notify(ttm_bo->base.dma_buf);
> +		dma_buf_invalidate_mappings(ttm_bo->base.dma_buf);
>  
>  	/*
>  	 * TTM has already nuked the mmap for us (see ttm_bo_unmap_virtual),
> diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
> index 550ff36dec3a..f60cbd5328cc 100644
> --- a/drivers/iommu/iommufd/selftest.c
> +++ b/drivers/iommu/iommufd/selftest.c
> @@ -2081,7 +2081,7 @@ static int iommufd_test_dmabuf_revoke(struct iommufd_ucmd *ucmd, int fd,
>  	priv = dmabuf->priv;
>  	dma_resv_lock(dmabuf->resv, NULL);
>  	priv->revoked = revoked;
> -	dma_buf_move_notify(dmabuf);
> +	dma_buf_invalidate_mappings(dmabuf);
>  	dma_resv_unlock(dmabuf->resv);
>  
>  err_put:
> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> index 4be4a85005cb..d8ceafabef48 100644
> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> @@ -332,7 +332,7 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
>  		if (priv->revoked != revoked) {
>  			dma_resv_lock(priv->dmabuf->resv, NULL);
>  			priv->revoked = revoked;
> -			dma_buf_move_notify(priv->dmabuf);
> +			dma_buf_invalidate_mappings(priv->dmabuf);
>  			dma_resv_unlock(priv->dmabuf->resv);
>  		}
>  		fput(priv->dmabuf->file);
> @@ -353,7 +353,7 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
>  		list_del_init(&priv->dmabufs_elm);
>  		priv->vdev = NULL;
>  		priv->revoked = true;
> -		dma_buf_move_notify(priv->dmabuf);
> +		dma_buf_invalidate_mappings(priv->dmabuf);
>  		dma_resv_unlock(priv->dmabuf->resv);
>  		vfio_device_put_registration(&vdev->vdev);
>  		fput(priv->dmabuf->file);
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index 1b397635c793..d5c3ce2b3aa4 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -600,7 +600,7 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
>  					enum dma_data_direction);
>  void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
>  				enum dma_data_direction);
> -void dma_buf_move_notify(struct dma_buf *dma_buf);
> +void dma_buf_invalidate_mappings(struct dma_buf *dma_buf);
>  int dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
>  			     enum dma_data_direction dir);
>  int dma_buf_end_cpu_access(struct dma_buf *dma_buf,
> 


* Re: [PATCH v5 3/8] dma-buf: Always build with DMABUF_MOVE_NOTIFY
  2026-01-24 19:14 ` [PATCH v5 3/8] dma-buf: Always build with DMABUF_MOVE_NOTIFY Leon Romanovsky
@ 2026-01-27  9:26   ` Christian König
  2026-01-27  9:58     ` Leon Romanovsky
  0 siblings, 1 reply; 51+ messages in thread
From: Christian König @ 2026-01-27  9:26 UTC (permalink / raw)
  To: Leon Romanovsky, Sumit Semwal, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

On 1/24/26 20:14, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> DMABUF_MOVE_NOTIFY was introduced in 2018 and has been marked as
> experimental and disabled by default ever since. Six years later,
> all new importers implement this callback.
> 
> It is therefore reasonable to drop CONFIG_DMABUF_MOVE_NOTIFY and
> always build DMABUF with support for it enabled.
> 
> Suggested-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

As long as nobody starts screaming in the last second or I encounter some other problem I'm going to push the first three patches to drm-misc-next now.

They are basically just cleanups we should have done a long time ago.

Regards,
Christian.

> ---
>  drivers/dma-buf/Kconfig                     | 12 ------------
>  drivers/dma-buf/dma-buf.c                   |  3 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 10 +++-------
>  drivers/gpu/drm/amd/amdkfd/Kconfig          |  2 +-
>  drivers/gpu/drm/xe/tests/xe_dma_buf.c       |  3 +--
>  drivers/gpu/drm/xe/xe_dma_buf.c             | 12 ++++--------
>  6 files changed, 10 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig
> index b46eb8a552d7..84d5e9b24e20 100644
> --- a/drivers/dma-buf/Kconfig
> +++ b/drivers/dma-buf/Kconfig
> @@ -40,18 +40,6 @@ config UDMABUF
>  	  A driver to let userspace turn memfd regions into dma-bufs.
>  	  Qemu can use this to create host dmabufs for guest framebuffers.
>  
> -config DMABUF_MOVE_NOTIFY
> -	bool "Move notify between drivers (EXPERIMENTAL)"
> -	default n
> -	depends on DMA_SHARED_BUFFER
> -	help
> -	  Don't pin buffers if the dynamic DMA-buf interface is available on
> -	  both the exporter as well as the importer. This fixes a security
> -	  problem where userspace is able to pin unrestricted amounts of memory
> -	  through DMA-buf.
> -	  This is marked experimental because we don't yet have a consistent
> -	  execution context and memory management between drivers.
> -
>  config DMABUF_DEBUG
>  	bool "DMA-BUF debug checks"
>  	depends on DMA_SHARED_BUFFER
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index e12db540c413..cd68c1c0bfd7 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -847,8 +847,7 @@ static bool
>  dma_buf_pin_on_map(struct dma_buf_attachment *attach)
>  {
>  	return attach->dmabuf->ops->pin &&
> -		(!dma_buf_attachment_is_dynamic(attach) ||
> -		 !IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY));
> +	       !dma_buf_attachment_is_dynamic(attach);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> index cd4944ceb047..b7f85b8653cf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> @@ -133,13 +133,9 @@ static int amdgpu_dma_buf_pin(struct dma_buf_attachment *attach)
>  	 * notifiers are disabled, only allow pinning in VRAM when move
>  	 * notiers are enabled.
>  	 */
> -	if (!IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY)) {
> -		domains &= ~AMDGPU_GEM_DOMAIN_VRAM;
> -	} else {
> -		list_for_each_entry(attach, &dmabuf->attachments, node)
> -			if (!attach->peer2peer)
> -				domains &= ~AMDGPU_GEM_DOMAIN_VRAM;
> -	}
> +	list_for_each_entry(attach, &dmabuf->attachments, node)
> +		if (!attach->peer2peer)
> +			domains &= ~AMDGPU_GEM_DOMAIN_VRAM;
>  
>  	if (domains & AMDGPU_GEM_DOMAIN_VRAM)
>  		bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
> diff --git a/drivers/gpu/drm/amd/amdkfd/Kconfig b/drivers/gpu/drm/amd/amdkfd/Kconfig
> index 16e12c9913f9..a5d7467c2f34 100644
> --- a/drivers/gpu/drm/amd/amdkfd/Kconfig
> +++ b/drivers/gpu/drm/amd/amdkfd/Kconfig
> @@ -27,7 +27,7 @@ config HSA_AMD_SVM
>  
>  config HSA_AMD_P2P
>  	bool "HSA kernel driver support for peer-to-peer for AMD GPU devices"
> -	depends on HSA_AMD && PCI_P2PDMA && DMABUF_MOVE_NOTIFY
> +	depends on HSA_AMD && PCI_P2PDMA
>  	help
>  	  Enable peer-to-peer (P2P) communication between AMD GPUs over
>  	  the PCIe bus. This can improve performance of multi-GPU compute
> diff --git a/drivers/gpu/drm/xe/tests/xe_dma_buf.c b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> index 1f2cca5c2f81..c107687ef3c0 100644
> --- a/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/tests/xe_dma_buf.c
> @@ -22,8 +22,7 @@ static bool p2p_enabled(struct dma_buf_test_params *params)
>  
>  static bool is_dynamic(struct dma_buf_test_params *params)
>  {
> -	return IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY) && params->attach_ops &&
> -		params->attach_ops->invalidate_mappings;
> +	return params->attach_ops && params->attach_ops->invalidate_mappings;
>  }
>  
>  static void check_residency(struct kunit *test, struct xe_bo *exported,
> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> index 1b9cd043e517..ea370cd373e9 100644
> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> @@ -56,14 +56,10 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
>  	bool allow_vram = true;
>  	int ret;
>  
> -	if (!IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY)) {
> -		allow_vram = false;
> -	} else {
> -		list_for_each_entry(attach, &dmabuf->attachments, node) {
> -			if (!attach->peer2peer) {
> -				allow_vram = false;
> -				break;
> -			}
> +	list_for_each_entry(attach, &dmabuf->attachments, node) {
> +		if (!attach->peer2peer) {
> +			allow_vram = false;
> +			break;
>  		}
>  	}
>  
> 


* Re: [PATCH v5 3/8] dma-buf: Always build with DMABUF_MOVE_NOTIFY
  2026-01-27  9:26   ` Christian König
@ 2026-01-27  9:58     ` Leon Romanovsky
  2026-01-27 10:02       ` Christian König
  0 siblings, 1 reply; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-27  9:58 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel, amd-gfx, virtualization, intel-xe, linux-rdma,
	iommu, kvm

On Tue, Jan 27, 2026 at 10:26:26AM +0100, Christian König wrote:
> On 1/24/26 20:14, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > DMABUF_MOVE_NOTIFY was introduced in 2018 and has been marked as
> > experimental and disabled by default ever since. Six years later,
> > all new importers implement this callback.
> > 
> > It is therefore reasonable to drop CONFIG_DMABUF_MOVE_NOTIFY and
> > always build DMABUF with support for it enabled.
> > 
> > Suggested-by: Christian König <christian.koenig@amd.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> 
> Reviewed-by: Christian König <christian.koenig@amd.com>
> 
> As long as nobody starts screaming in the last second or I encounter some other problem I'm going to push the first three patches to drm-misc-next now.

How do you see progress of other patches?
Can they be queued for that tree as well?

Thanks

> 
> They are basically just cleanups we should have done a long time ago.
> 
> Regards,
> Christian.
> 

* Re: [PATCH v5 3/8] dma-buf: Always build with DMABUF_MOVE_NOTIFY
  2026-01-27  9:58     ` Leon Romanovsky
@ 2026-01-27 10:02       ` Christian König
  2026-01-27 11:42         ` Leon Romanovsky
  2026-01-30  7:00         ` Leon Romanovsky
  0 siblings, 2 replies; 51+ messages in thread
From: Christian König @ 2026-01-27 10:02 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Sumit Semwal, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel, amd-gfx, virtualization, intel-xe, linux-rdma,
	iommu, kvm

On 1/27/26 10:58, Leon Romanovsky wrote:
> On Tue, Jan 27, 2026 at 10:26:26AM +0100, Christian König wrote:
>> On 1/24/26 20:14, Leon Romanovsky wrote:
>>> From: Leon Romanovsky <leonro@nvidia.com>
>>>
>>> DMABUF_MOVE_NOTIFY was introduced in 2018 and has been marked as
>>> experimental and disabled by default ever since. Six years later,
>>> all new importers implement this callback.
>>>
>>> It is therefore reasonable to drop CONFIG_DMABUF_MOVE_NOTIFY and
>>> always build DMABUF with support for it enabled.
>>>
>>> Suggested-by: Christian König <christian.koenig@amd.com>
>>> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>> As long as nobody starts screaming in the last second or I encounter some other problem I'm going to push the first three patches to drm-misc-next now.
> 
> How do you see progress of other patches?
> Can they be queued for that tree as well?

I was hoping to get through them by the end of the week.

Just wanted to make sure that CI systems start to see the first three patches (which affect everybody), so that we get early feedback should we have missed something.

Regards,
Christian.

> 
> Thanks
> 
>>
>> They are basically just cleanups we should have done a long time ago.
>>
>> Regards,
>> Christian.
>>


* Re: [PATCH v5 3/8] dma-buf: Always build with DMABUF_MOVE_NOTIFY
  2026-01-27 10:02       ` Christian König
@ 2026-01-27 11:42         ` Leon Romanovsky
  2026-01-27 20:45           ` Leon Romanovsky
  2026-01-30  7:00         ` Leon Romanovsky
  1 sibling, 1 reply; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-27 11:42 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel, amd-gfx, virtualization, intel-xe, linux-rdma,
	iommu, kvm

On Tue, Jan 27, 2026 at 11:02:03AM +0100, Christian König wrote:
> On 1/27/26 10:58, Leon Romanovsky wrote:
> > On Tue, Jan 27, 2026 at 10:26:26AM +0100, Christian König wrote:
> >> On 1/24/26 20:14, Leon Romanovsky wrote:
> >>> From: Leon Romanovsky <leonro@nvidia.com>
> >>>
> >>> DMABUF_MOVE_NOTIFY was introduced in 2018 and has been marked as
> >>> experimental and disabled by default ever since. Six years later,
> >>> all new importers implement this callback.
> >>>
> >>> It is therefore reasonable to drop CONFIG_DMABUF_MOVE_NOTIFY and
> >>> always build DMABUF with support for it enabled.
> >>>
> >>> Suggested-by: Christian König <christian.koenig@amd.com>
> >>> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> >>
> >> Reviewed-by: Christian König <christian.koenig@amd.com>
> >>
> >> As long as nobody starts screaming in the last second or I encounter some other problem I'm going to push the first three patches to drm-misc-next now.
> > 
> > How do you see progress of other patches?
> > Can they be queued for that tree as well?
> 
> I was hoping to get through them by the end of the week.
> 
> Just wanted to make sure that CI systems start to see the first three patches (which affect everybody), so that we get early feedback should we have missed something.

Perfect, I based my patches on these two commits:
61ceaf236115 (vfio/for-linus) vfio: Prevent from pinned DMABUF importers to attach to VFIO DMABUF
24d479d26b25 (tag: v6.19-rc6) Linux 6.19-rc6

Thanks

> 
> Regards,
> Christian.
> 
> > 
> > Thanks
> > 
> >>
> >> They are basically just cleanups we should have done a long time ago.
> >>
> >> Regards,
> >> Christian.
> >>
> 

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-27  8:58     ` Leon Romanovsky
@ 2026-01-27 16:27       ` Jason Gunthorpe
  2026-01-29  7:06         ` Tian, Kevin
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Gunthorpe @ 2026-01-27 16:27 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Pranjal Shrivastava, Sumit Semwal, Christian König,
	Alex Deucher, David Airlie, Simona Vetter, Gerd Hoffmann,
	Dmitry Osipenko, Gurchetan Singh, Chia-I Wu, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Lucas De Marchi,
	Thomas Hellström, Rodrigo Vivi, Kevin Tian, Joerg Roedel,
	Will Deacon, Robin Murphy, Felix Kuehling, Alex Williamson,
	Ankit Agrawal, Vivek Kasireddy, linux-media, dri-devel,
	linaro-mm-sig, linux-kernel, amd-gfx, virtualization, intel-xe,
	linux-rdma, iommu, kvm

On Tue, Jan 27, 2026 at 10:58:35AM +0200, Leon Romanovsky wrote:
> > > @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
> > >  			dma_resv_lock(priv->dmabuf->resv, NULL);
> > >  			priv->revoked = revoked;
> > >  			dma_buf_invalidate_mappings(priv->dmabuf);
> > > +			dma_resv_wait_timeout(priv->dmabuf->resv,
> > > +					      DMA_RESV_USAGE_BOOKKEEP, false,
> > > +					      MAX_SCHEDULE_TIMEOUT);
> > >  			dma_resv_unlock(priv->dmabuf->resv);
> > > +			if (revoked) {
> > > +				kref_put(&priv->kref, vfio_pci_dma_buf_done);
> > > +				/* Let's wait till all DMA unmap are completed. */
> > > +				wait = wait_for_completion_timeout(
> > > +					&priv->comp, secs_to_jiffies(1));
> > 
> > Is the 1-second constant sufficient for all hardware, or should the 
> > invalidate_mappings() contract require the callback to block until 
> > speculative reads are strictly fenced? I'm wondering about a case where
> > a device's firmware has a high response latency, perhaps due to
> > internal management tasks like error recovery or thermal throttling,
> > and it exceeds the 1s timeout.
> > 
> > If the device is in the middle of a large DMA burst and the firmware is
> > slow to flush the internal pipelines to a fully "quiesced"
> > read-and-discard state, reclaiming the memory at exactly 1.001 seconds
> > risks triggering platform-level faults.
> > 
> > Since we explicitly permit these speculative reads until unmap is
> > complete, relying on a hardcoded timeout in the exporter seems to 
> > introduce a hardware-dependent race condition that could compromise
> > system stability via IOMMU errors or AER faults. 
> > 
> > Should the importer instead be required to guarantee that all 
> > speculative access has ceased before the invalidation call returns?
> 
> It is guaranteed by the dma_resv_wait_timeout() call above. That call ensures
> that the hardware has completed all pending operations. The 1‑second delay is
> meant to catch cases where an in-kernel DMA unmap call is missing, which should
> not trigger any DMA activity at that point.

Christian may know actual examples, but my general feeling is he was
worrying about drivers that have pushed the DMABUF to visibility on
the GPU and the move notify & fences only shoot down some access. So
it has to wait until the DMABUF is finally unmapped.

Pranjal's example should be covered by the driver adding a fence and
then the unbounded fence wait will complete it.

I think the open question here is if drivers that can't rely on their
fences should return dma_buf_attach_revocable() = false ? It depends
on how long they will leave the buffers mapped, and if it is "bounded
time".

The converse is we want to detect bugs where drivers have wrongly set
dma_buf_attach_revocable() = true and this turns into an infinite
sleep, so the logging is necessary, IMHO.

At worst the code should sleep 1s, print, then keep sleeping..

Jason

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 3/8] dma-buf: Always build with DMABUF_MOVE_NOTIFY
  2026-01-27 11:42         ` Leon Romanovsky
@ 2026-01-27 20:45           ` Leon Romanovsky
  0 siblings, 0 replies; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-27 20:45 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel, amd-gfx, virtualization, intel-xe, linux-rdma,
	iommu, kvm

On Tue, Jan 27, 2026 at 01:42:43PM +0200, Leon Romanovsky wrote:
> On Tue, Jan 27, 2026 at 11:02:03AM +0100, Christian König wrote:
> > On 1/27/26 10:58, Leon Romanovsky wrote:
> > > On Tue, Jan 27, 2026 at 10:26:26AM +0100, Christian König wrote:
> > >> On 1/24/26 20:14, Leon Romanovsky wrote:
> > >>> From: Leon Romanovsky <leonro@nvidia.com>
> > >>>
> > >>> DMABUF_MOVE_NOTIFY was introduced in 2018 and has been marked as
> > >>> experimental and disabled by default ever since. Six years later,
> > >>> all new importers implement this callback.
> > >>>
> > >>> It is therefore reasonable to drop CONFIG_DMABUF_MOVE_NOTIFY and
> > >>> always build DMABUF with support for it enabled.
> > >>>
> > >>> Suggested-by: Christian König <christian.koenig@amd.com>
> > >>> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > >>
> > >> Reviewed-by: Christian König <christian.koenig@amd.com>
> > >>
> > >> As long as nobody starts screaming in the last second or I encounter some other problem I'm going to push the first three patches to drm-misc-next now.
> > > 
> > > How do you see progress of other patches?
> > > Can they be queued for that tree as well?
> > 
> > I was hoping to get through them by the end of the week.
> > 
> > Just wanted to make sure that CI systems start to see the first three patches (which affect everybody), so that we get early feedback should we have missed something.
> 
> Perfect, I based my patches on these two commits:
> 61ceaf236115 (vfio/for-linus) vfio: Prevent from pinned DMABUF importers to attach to VFIO DMABUF
> 24d479d26b25 (tag: v6.19-rc6) Linux 6.19-rc6

The fix was merged https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f97d9dcf53649c41c33227b345a36902cbb08ad

Thanks

> 
> Thanks
> 
> > 
> > Regards,
> > Christian.
> > 
> > > 
> > > Thanks
> > > 
> > >>
> > >> They are basically just cleanups we should have done a long time ago.
> > >>
> > >> Regards,
> > >> Christian.
> > >>
> > 
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-27 16:27       ` Jason Gunthorpe
@ 2026-01-29  7:06         ` Tian, Kevin
  2026-01-29  7:33           ` Leon Romanovsky
  2026-01-29 14:58           ` Jason Gunthorpe
  0 siblings, 2 replies; 51+ messages in thread
From: Tian, Kevin @ 2026-01-29  7:06 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky
  Cc: Pranjal Shrivastava, Sumit Semwal, Christian König,
	Alex Deucher, David Airlie, Simona Vetter, Gerd Hoffmann,
	Dmitry Osipenko, Gurchetan Singh, Chia-I Wu, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Lucas De Marchi,
	Thomas Hellström, Vivi, Rodrigo, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Kasireddy, Vivek, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org,
	linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
	virtualization@lists.linux.dev, intel-xe@lists.freedesktop.org,
	linux-rdma@vger.kernel.org, iommu@lists.linux.dev,
	kvm@vger.kernel.org

> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Wednesday, January 28, 2026 12:28 AM
> 
> On Tue, Jan 27, 2026 at 10:58:35AM +0200, Leon Romanovsky wrote:
> > > > @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct
> vfio_pci_core_device *vdev, bool revoked)
> > > >  			dma_resv_lock(priv->dmabuf->resv, NULL);
> > > >  			priv->revoked = revoked;
> > > >  			dma_buf_invalidate_mappings(priv->dmabuf);
> > > > +			dma_resv_wait_timeout(priv->dmabuf->resv,
> > > > +					      DMA_RESV_USAGE_BOOKKEEP,
> false,
> > > > +					      MAX_SCHEDULE_TIMEOUT);
> > > >  			dma_resv_unlock(priv->dmabuf->resv);
> > > > +			if (revoked) {
> > > > +				kref_put(&priv->kref,
> vfio_pci_dma_buf_done);
> > > > +				/* Let's wait till all DMA unmap are
> completed. */
> > > > +				wait = wait_for_completion_timeout(
> > > > +					&priv->comp, secs_to_jiffies(1));
> > >
> > > Is the 1-second constant sufficient for all hardware, or should the
> > > invalidate_mappings() contract require the callback to block until
> > > speculative reads are strictly fenced? I'm wondering about a case where
> > > a device's firmware has a high response latency, perhaps due to internal
> > > management tasks like error recovery or thermal and it exceeds the 1s
> > > timeout.
> > >
> > > If the device is in the middle of a large DMA burst and the firmware is
> > > slow to flush the internal pipelines to a fully "quiesced"
> > > read-and-discard state, reclaiming the memory at exactly 1.001 seconds
> > > risks triggering platform-level faults..
> > >
> > > Since we explicitly permit these speculative reads until unmap is
> > > complete, relying on a hardcoded timeout in the exporter seems to
> > > introduce a hardware-dependent race condition that could compromise
> > > system stability via IOMMU errors or AER faults.
> > >
> > > Should the importer instead be required to guarantee that all
> > > speculative access has ceased before the invalidation call returns?
> >
> > It is guaranteed by the dma_resv_wait_timeout() call above. That call
> ensures
> > that the hardware has completed all pending operations. The 1‑second
> delay is
> > meant to catch cases where an in-kernel DMA unmap call is missing, which
> should
> > not trigger any DMA activity at that point.
> 
> Christian may know actual examples, but my general feeling is he was
> worrying about drivers that have pushed the DMABUF to visibility on
> the GPU and the move notify & fences only shoot down some access. So
> it has to wait until the DMABUF is finally unmapped.
> 
> Pranjal's example should be covered by the driver adding a fence and
> then the unbounded fence wait will complete it.
> 

Bear with me if this is an ignorant question.

The commit msg of patch6 says that VFIO doesn't tolerate unbounded
wait, which is the reason behind the 2nd timeout wait here.

Then why is "the unbounded fence wait" not a problem in the same
code path? The use of MAX_SCHEDULE_TIMEOUT implies a worst-case
timeout in hundreds of years...

and it'd be helpful to put some words in the code based on what's
discussed here.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH v5 8/8] iommufd: Add dma_buf_pin()
  2026-01-24 19:14 ` [PATCH v5 8/8] iommufd: Add dma_buf_pin() Leon Romanovsky
@ 2026-01-29  7:08   ` Tian, Kevin
  2026-01-30  0:17   ` Jason Gunthorpe
  1 sibling, 0 replies; 51+ messages in thread
From: Tian, Kevin @ 2026-01-29  7:08 UTC (permalink / raw)
  To: Leon Romanovsky, Sumit Semwal, Christian König, Alex Deucher,
	David Airlie, Simona Vetter, Gerd Hoffmann, Dmitry Osipenko,
	Gurchetan Singh, Chia-I Wu, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Lucas De Marchi, Thomas Hellström,
	Vivi, Rodrigo, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Kasireddy, Vivek
  Cc: linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, virtualization@lists.linux.dev,
	intel-xe@lists.freedesktop.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, kvm@vger.kernel.org

> From: Leon Romanovsky <leon@kernel.org>
> Sent: Sunday, January 25, 2026 3:14 AM
> 
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> IOMMUFD relies on a private protocol with VFIO, and this always operated
> in pinned mode.
> 
> Now that VFIO can support pinned importers, update IOMMUFD to invoke
> the normal dma-buf flow to request pin.
> 
> This isn't enough to allow IOMMUFD to work with other exporters, it still
> needs a way to get the physical address list which is another series.
> 
> IOMMUFD supports the defined revoke semantics. It immediately stops and
> fences access to the memory inside its invalidate_mappings() callback,
> and it currently doesn't use scatterlists so doesn't call map/unmap at
> all.
> 
> It is expected that a future revision can synchronously call unmap from
> the move_notify callback as well.
> 
> Acked-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-29  7:06         ` Tian, Kevin
@ 2026-01-29  7:33           ` Leon Romanovsky
  2026-01-29  8:13             ` Tian, Kevin
  2026-01-29 14:58           ` Jason Gunthorpe
  1 sibling, 1 reply; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-29  7:33 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Jason Gunthorpe, Pranjal Shrivastava, Sumit Semwal,
	Christian König, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Vivi, Rodrigo,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Kasireddy, Vivek,
	linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, virtualization@lists.linux.dev,
	intel-xe@lists.freedesktop.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, kvm@vger.kernel.org

On Thu, Jan 29, 2026 at 07:06:37AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@ziepe.ca>
> > Sent: Wednesday, January 28, 2026 12:28 AM
> > 
> > On Tue, Jan 27, 2026 at 10:58:35AM +0200, Leon Romanovsky wrote:
> > > > > @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct
> > vfio_pci_core_device *vdev, bool revoked)
> > > > >  			dma_resv_lock(priv->dmabuf->resv, NULL);
> > > > >  			priv->revoked = revoked;
> > > > >  			dma_buf_invalidate_mappings(priv->dmabuf);
> > > > > +			dma_resv_wait_timeout(priv->dmabuf->resv,
> > > > > +					      DMA_RESV_USAGE_BOOKKEEP,
> > false,
> > > > > +					      MAX_SCHEDULE_TIMEOUT);
> > > > >  			dma_resv_unlock(priv->dmabuf->resv);
> > > > > +			if (revoked) {
> > > > > +				kref_put(&priv->kref,
> > vfio_pci_dma_buf_done);
> > > > > +				/* Let's wait till all DMA unmap are
> > completed. */
> > > > > +				wait = wait_for_completion_timeout(
> > > > > +					&priv->comp, secs_to_jiffies(1));
> > > >
> > > > Is the 1-second constant sufficient for all hardware, or should the
> > > > invalidate_mappings() contract require the callback to block until
> > > > speculative reads are strictly fenced? I'm wondering about a case where
> > > > a device's firmware has a high response latency, perhaps due to internal
> > > > management tasks like error recovery or thermal and it exceeds the 1s
> > > > timeout.
> > > >
> > > > If the device is in the middle of a large DMA burst and the firmware is
> > > > slow to flush the internal pipelines to a fully "quiesced"
> > > > read-and-discard state, reclaiming the memory at exactly 1.001 seconds
> > > > risks triggering platform-level faults..
> > > >
> > > > Since we explicitly permit these speculative reads until unmap is
> > > > complete, relying on a hardcoded timeout in the exporter seems to
> > > > introduce a hardware-dependent race condition that could compromise
> > > > system stability via IOMMU errors or AER faults.
> > > >
> > > > Should the importer instead be required to guarantee that all
> > > > speculative access has ceased before the invalidation call returns?
> > >
> > > It is guaranteed by the dma_resv_wait_timeout() call above. That call
> > ensures
> > > that the hardware has completed all pending operations. The 1‑second
> > delay is
> > > meant to catch cases where an in-kernel DMA unmap call is missing, which
> > should
> > > not trigger any DMA activity at that point.
> > 
> > Christian may know actual examples, but my general feeling is he was
> > worrying about drivers that have pushed the DMABUF to visibility on
> > the GPU and the move notify & fences only shoot down some access. So
> > it has to wait until the DMABUF is finally unmapped.
> > 
> > Pranjal's example should be covered by the driver adding a fence and
> > then the unbounded fence wait will complete it.
> > 
> 
> Bear with me if this is an ignorant question.
> 
> The commit msg of patch6 says that VFIO doesn't tolerate unbounded
> wait, which is the reason behind the 2nd timeout wait here.

It is not accurate. A second timeout is present both in the
description of patch 6 and in the VFIO implementation. The difference is
that the timeout is enforced within VFIO.

> 
> Then why is "the unbounded fence wait" not a problem in the same
> code path? The use of MAX_SCHEDULE_TIMEOUT implies a worst-case
> timeout in hundreds of years...

"An unbounded fence wait" is a different class of wait. It indicates broken
hardware that continues to issue DMA transactions even after it has been told to
stop.

The second wait exists to catch software bugs or misuse, where the dma-buf
importer has misrepresented its capabilities.

> 
> and it'd be helpful to put some words in the code based on what's
> discussed here.

We've documented as much as we can in dma_buf_attach_revocable() and
dma_buf_invalidate_mappings(). Do you have any suggestions on what else
should be added here?

Thanks

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-29  7:33           ` Leon Romanovsky
@ 2026-01-29  8:13             ` Tian, Kevin
  2026-01-29  8:41               ` Leon Romanovsky
  0 siblings, 1 reply; 51+ messages in thread
From: Tian, Kevin @ 2026-01-29  8:13 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Pranjal Shrivastava, Sumit Semwal,
	Christian König, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Vivi, Rodrigo,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Kasireddy, Vivek,
	linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, virtualization@lists.linux.dev,
	intel-xe@lists.freedesktop.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, kvm@vger.kernel.org

> From: Leon Romanovsky <leon@kernel.org>
> Sent: Thursday, January 29, 2026 3:34 PM
> 
> On Thu, Jan 29, 2026 at 07:06:37AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@ziepe.ca>
> > > Sent: Wednesday, January 28, 2026 12:28 AM
> > >
> > > On Tue, Jan 27, 2026 at 10:58:35AM +0200, Leon Romanovsky wrote:
> > > > > > @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct
> > > vfio_pci_core_device *vdev, bool revoked)
> > > > > >  			dma_resv_lock(priv->dmabuf->resv, NULL);
> > > > > >  			priv->revoked = revoked;
> > > > > >  			dma_buf_invalidate_mappings(priv-
> >dmabuf);
> > > > > > +			dma_resv_wait_timeout(priv->dmabuf->resv,
> > > > > > +
> DMA_RESV_USAGE_BOOKKEEP,
> > > false,
> > > > > > +
> MAX_SCHEDULE_TIMEOUT);
> > > > > >  			dma_resv_unlock(priv->dmabuf->resv);
> > > > > > +			if (revoked) {
> > > > > > +				kref_put(&priv->kref,
> > > vfio_pci_dma_buf_done);
> > > > > > +				/* Let's wait till all DMA unmap are
> > > completed. */
> > > > > > +				wait = wait_for_completion_timeout(
> > > > > > +					&priv->comp,
> secs_to_jiffies(1));
> > > > >
> > > > > Is the 1-second constant sufficient for all hardware, or should the
> > > > > invalidate_mappings() contract require the callback to block until
> > > > > speculative reads are strictly fenced? I'm wondering about a case
> where
> > > > > a device's firmware has a high response latency, perhaps due to
> internal
> > > > > management tasks like error recovery or thermal and it exceeds the
> 1s
> > > > > timeout.
> > > > >
> > > > > If the device is in the middle of a large DMA burst and the firmware is
> > > > > slow to flush the internal pipelines to a fully "quiesced"
> > > > > read-and-discard state, reclaiming the memory at exactly 1.001
> seconds
> > > > > risks triggering platform-level faults..
> > > > >
> > > > > Since we explicitly permit these speculative reads until unmap is
> > > > > complete, relying on a hardcoded timeout in the exporter seems to
> > > > > introduce a hardware-dependent race condition that could
> compromise
> > > > > system stability via IOMMU errors or AER faults.
> > > > >
> > > > > Should the importer instead be required to guarantee that all
> > > > > speculative access has ceased before the invalidation call returns?
> > > >
> > > > It is guaranteed by the dma_resv_wait_timeout() call above. That call
> > > ensures
> > > > that the hardware has completed all pending operations. The 1‑second
> > > delay is
> > > > meant to catch cases where an in-kernel DMA unmap call is missing,
> which
> > > should
> > > > not trigger any DMA activity at that point.
> > >
> > > Christian may know actual examples, but my general feeling is he was
> > > worrying about drivers that have pushed the DMABUF to visibility on
> > > the GPU and the move notify & fences only shoot down some access. So
> > > it has to wait until the DMABUF is finally unmapped.
> > >
> > > Pranjal's example should be covered by the driver adding a fence and
> > > then the unbounded fence wait will complete it.
> > >
> >
> > Bear with me if this is an ignorant question.
> >
> > The commit msg of patch6 says that VFIO doesn't tolerate unbounded
> > wait, which is the reason behind the 2nd timeout wait here.
> 
> It is not accurate. A second timeout is present both in the
> description of patch 6 and in the VFIO implementation. The difference is
> that the timeout is enforced within VFIO.
> 
> >
> > Then why is "the unbounded fence wait" not a problem in the same
> > code path? The use of MAX_SCHEDULE_TIMEOUT implies a worst-case
> > timeout in hundreds of years...
> 
> "An unbounded fence wait" is a different class of wait. It indicates broken
> hardware that continues to issue DMA transactions even after it has been
> told to
> stop.
> 
> The second wait exists to catch software bugs or misuse, where the dma-buf
> importer has misrepresented its capabilities.
> 

Okay I see.

> >
> > and it'd be helpful to put some words in the code based on what's
> > discussed here.
> 
> We've documented as much as we can in dma_buf_attach_revocable() and
> dma_buf_invalidate_mappings(). Do you have any suggestions on what else
> should be added here?
> 

the selection of 1s?

then,

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-29  8:13             ` Tian, Kevin
@ 2026-01-29  8:41               ` Leon Romanovsky
  2026-01-29 21:04                 ` Alex Williamson
  2026-01-30  3:10                 ` Tian, Kevin
  0 siblings, 2 replies; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-29  8:41 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Jason Gunthorpe, Pranjal Shrivastava, Sumit Semwal,
	Christian König, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Vivi, Rodrigo,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Kasireddy, Vivek,
	linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, virtualization@lists.linux.dev,
	intel-xe@lists.freedesktop.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, kvm@vger.kernel.org

On Thu, Jan 29, 2026 at 08:13:18AM +0000, Tian, Kevin wrote:
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Thursday, January 29, 2026 3:34 PM
> > 
> > On Thu, Jan 29, 2026 at 07:06:37AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@ziepe.ca>
> > > > Sent: Wednesday, January 28, 2026 12:28 AM
> > > >
> > > > On Tue, Jan 27, 2026 at 10:58:35AM +0200, Leon Romanovsky wrote:
> > > > > > > @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct
> > > > vfio_pci_core_device *vdev, bool revoked)
> > > > > > >  			dma_resv_lock(priv->dmabuf->resv, NULL);
> > > > > > >  			priv->revoked = revoked;
> > > > > > >  			dma_buf_invalidate_mappings(priv-
> > >dmabuf);
> > > > > > > +			dma_resv_wait_timeout(priv->dmabuf->resv,
> > > > > > > +
> > DMA_RESV_USAGE_BOOKKEEP,
> > > > false,
> > > > > > > +
> > MAX_SCHEDULE_TIMEOUT);
> > > > > > >  			dma_resv_unlock(priv->dmabuf->resv);
> > > > > > > +			if (revoked) {
> > > > > > > +				kref_put(&priv->kref,
> > > > vfio_pci_dma_buf_done);
> > > > > > > +				/* Let's wait till all DMA unmap are
> > > > completed. */
> > > > > > > +				wait = wait_for_completion_timeout(
> > > > > > > +					&priv->comp,
> > secs_to_jiffies(1));
> > > > > >
> > > > > > Is the 1-second constant sufficient for all hardware, or should the
> > > > > > invalidate_mappings() contract require the callback to block until
> > > > > > speculative reads are strictly fenced? I'm wondering about a case
> > where
> > > > > > a device's firmware has a high response latency, perhaps due to
> > internal
> > > > > > management tasks like error recovery or thermal and it exceeds the
> > 1s
> > > > > > timeout.
> > > > > >
> > > > > > If the device is in the middle of a large DMA burst and the firmware is
> > > > > > slow to flush the internal pipelines to a fully "quiesced"
> > > > > > read-and-discard state, reclaiming the memory at exactly 1.001
> > seconds
> > > > > > risks triggering platform-level faults..
> > > > > >
> > > > > > Since we explicitly permit these speculative reads until unmap is
> > > > > > complete, relying on a hardcoded timeout in the exporter seems to
> > > > > > introduce a hardware-dependent race condition that could
> > compromise
> > > > > > system stability via IOMMU errors or AER faults.
> > > > > >
> > > > > > Should the importer instead be required to guarantee that all
> > > > > > speculative access has ceased before the invalidation call returns?
> > > > >
> > > > > It is guaranteed by the dma_resv_wait_timeout() call above. That call
> > > > ensures
> > > > > that the hardware has completed all pending operations. The 1‑second
> > > > delay is
> > > > > meant to catch cases where an in-kernel DMA unmap call is missing,
> > which
> > > > should
> > > > > not trigger any DMA activity at that point.
> > > >
> > > > Christian may know actual examples, but my general feeling is he was
> > > > worrying about drivers that have pushed the DMABUF to visibility on
> > > > the GPU and the move notify & fences only shoot down some access. So
> > > > it has to wait until the DMABUF is finally unmapped.
> > > >
> > > > Pranjal's example should be covered by the driver adding a fence and
> > > > then the unbounded fence wait will complete it.
> > > >
> > >
> > > Bear with me if this is an ignorant question.
> > >
> > > The commit msg of patch6 says that VFIO doesn't tolerate unbounded
> > > wait, which is the reason behind the 2nd timeout wait here.
> > 
> > It is not accurate. A second timeout is present both in the
> > description of patch 6 and in the VFIO implementation. The difference is
> > that the timeout is enforced within VFIO.
> > 
> > >
> > > Then why is "the unbounded fence wait" not a problem in the same
> > > code path? The use of MAX_SCHEDULE_TIMEOUT implies a worst-case
> > > timeout in hundreds of years...
> > 
> > "An unbounded fence wait" is a different class of wait. It indicates broken
> > hardware that continues to issue DMA transactions even after it has been
> > told to
> > stop.
> > 
> > The second wait exists to catch software bugs or misuse, where the dma-buf
> > importer has misrepresented its capabilities.
> > 
> 
> Okay I see.
> 
> > >
> > > and it'd be helpful to put some words in the code based on what's
> > > discussed here.
> > 
> > We've documented as much as we can in dma_buf_attach_revocable() and
> > dma_buf_invalidate_mappings(). Do you have any suggestions on what else
> > should be added here?
> > 
> 
> the selection of 1s?

It is indirectly written in the description of the WARN_ON(), but let's add
more. What about the following?

diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index 93795ad2e025..948ba75288c6 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -357,7 +357,13 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
                        dma_resv_unlock(priv->dmabuf->resv);
                        if (revoked) {
                                kref_put(&priv->kref, vfio_pci_dma_buf_done);
-                               /* Let's wait till all DMA unmap are completed. */
+                               /*
+                                * Wait up to 1 second for all DMA unmaps to
+                                * complete. This is meant to catch dma-buf
+                                * importers that lied about their support for
+                                * dmabuf revoke. See dma_buf_invalidate_mappings()
+                                * for the expected behaviour.
+                                */
                                wait = wait_for_completion_timeout(
                                        &priv->comp, secs_to_jiffies(1));
                                /*

> 
> then,
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Thanks

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-29  7:06         ` Tian, Kevin
  2026-01-29  7:33           ` Leon Romanovsky
@ 2026-01-29 14:58           ` Jason Gunthorpe
  2026-01-30  3:12             ` Tian, Kevin
  2026-01-30  8:46             ` Christian König
  1 sibling, 2 replies; 51+ messages in thread
From: Jason Gunthorpe @ 2026-01-29 14:58 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Leon Romanovsky, Pranjal Shrivastava, Sumit Semwal,
	Christian König, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Vivi, Rodrigo,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Kasireddy, Vivek,
	linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, virtualization@lists.linux.dev,
	intel-xe@lists.freedesktop.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, kvm@vger.kernel.org

On Thu, Jan 29, 2026 at 07:06:37AM +0000, Tian, Kevin wrote:
> Bear with me if this is an ignorant question.
> 
> The commit msg of patch6 says that VFIO doesn't tolerate unbounded
> wait, which is the reason behind the 2nd timeout wait here.

As far as I understand the dmabuf design, a fence wait should
complete eventually under kernel control, because these sleeps are
sprinkled all around the kernel today.

I suspect that is not actually true for every HW, probably something
like "shader programs can run forever technically".

We can argue if those cases should not report revocable either, but at
least this will work "correctly" even if it takes a huge amount of
time.

I wouldn't mind seeing a shorter timeout and print on the fence too
just in case.

Jason

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-29  8:41               ` Leon Romanovsky
@ 2026-01-29 21:04                 ` Alex Williamson
  2026-01-30  3:10                 ` Tian, Kevin
  1 sibling, 0 replies; 51+ messages in thread
From: Alex Williamson @ 2026-01-29 21:04 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Tian, Kevin, Jason Gunthorpe, Pranjal Shrivastava, Sumit Semwal,
	Christian König, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Vivi, Rodrigo,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Ankit Agrawal, Kasireddy, Vivek, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org,
	linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
	virtualization@lists.linux.dev, intel-xe@lists.freedesktop.org,
	linux-rdma@vger.kernel.org, iommu@lists.linux.dev,
	kvm@vger.kernel.org

On Thu, 29 Jan 2026 10:41:56 +0200
Leon Romanovsky <leon@kernel.org> wrote:

> On Thu, Jan 29, 2026 at 08:13:18AM +0000, Tian, Kevin wrote:
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Sent: Thursday, January 29, 2026 3:34 PM
> > > 
> > > On Thu, Jan 29, 2026 at 07:06:37AM +0000, Tian, Kevin wrote:  
> > > > > From: Jason Gunthorpe <jgg@ziepe.ca>
> > > > > Sent: Wednesday, January 28, 2026 12:28 AM
> > > > >
> > > > > On Tue, Jan 27, 2026 at 10:58:35AM +0200, Leon Romanovsky wrote:  
> > > > > > > > @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct  
> > > > > vfio_pci_core_device *vdev, bool revoked)  
> > > > > > > >  			dma_resv_lock(priv->dmabuf->resv, NULL);
> > > > > > > >  			priv->revoked = revoked;
> > > > > > > >  			dma_buf_invalidate_mappings(priv-  
> > > >dmabuf);  
> > > > > > > > +			dma_resv_wait_timeout(priv->dmabuf->resv,
> > > > > > > > +  
> > > DMA_RESV_USAGE_BOOKKEEP,  
> > > > > false,  
> > > > > > > > +  
> > > MAX_SCHEDULE_TIMEOUT);  
> > > > > > > >  			dma_resv_unlock(priv->dmabuf->resv);
> > > > > > > > +			if (revoked) {
> > > > > > > > +				kref_put(&priv->kref,  
> > > > > vfio_pci_dma_buf_done);  
> > > > > > > > +				/* Let's wait till all DMA unmap are  
> > > > > completed. */  
> > > > > > > > +				wait = wait_for_completion_timeout(
> > > > > > > > +					&priv->comp,  
> > > secs_to_jiffies(1));  
> > > > > > >
> > > > > > > Is the 1-second constant sufficient for all hardware, or should the
> > > > > > > invalidate_mappings() contract require the callback to block until
> > > > > > > speculative reads are strictly fenced? I'm wondering about a case
> > > > > > > where a device's firmware has a high response latency, perhaps due
> > > > > > > to internal management tasks like error recovery or thermal, and it
> > > > > > > exceeds the 1s timeout.
> > > > > > >
> > > > > > > If the device is in the middle of a large DMA burst and the firmware
> > > > > > > is slow to flush the internal pipelines to a fully "quiesced"
> > > > > > > read-and-discard state, reclaiming the memory at exactly 1.001
> > > > > > > seconds risks triggering platform-level faults.
> > > > > > >
> > > > > > > Since we explicitly permit these speculative reads until unmap is
> > > > > > > complete, relying on a hardcoded timeout in the exporter seems to
> > > > > > > introduce a hardware-dependent race condition that could compromise
> > > > > > > system stability via IOMMU errors or AER faults.
> > > > > > >
> > > > > > > Should the importer instead be required to guarantee that all
> > > > > > > speculative access has ceased before the invalidation call returns?
> > > > > >
> > > > > > It is guaranteed by the dma_resv_wait_timeout() call above. That
> > > > > > call ensures that the hardware has completed all pending operations.
> > > > > > The 1-second delay is meant to catch cases where an in-kernel DMA
> > > > > > unmap call is missing, which should not trigger any DMA activity at
> > > > > > that point.
> > > > >
> > > > > Christian may know actual examples, but my general feeling is he was
> > > > > worrying about drivers that have pushed the DMABUF to visibility on
> > > > > the GPU and the move notify & fences only shoot down some access. So
> > > > > it has to wait until the DMABUF is finally unmapped.
> > > > >
> > > > > Pranjal's example should be covered by the driver adding a fence and
> > > > > then the unbounded fence wait will complete it.
> > > > >  
> > > >
> > > > Bear me if it's an ignorant question.
> > > >
> > > > The commit msg of patch6 says that VFIO doesn't tolerate unbounded
> > > > wait, which is the reason behind the 2nd timeout wait here.  
> > > 
> > > It is not accurate. A second timeout is present both in the
> > > description of patch 6 and in the VFIO implementation. The difference is
> > > that the timeout is enforced within VFIO.
> > >   
> > > >
> > > > Then why is "the unbounded fence wait" not a problem in the same
> > > > code path? The use of MAX_SCHEDULE_TIMEOUT implies a worst-case
> > > > timeout in hundreds of years...  
> > > 
> > > "An unbounded fence wait" is a different class of wait. It indicates
> > > broken hardware that continues to issue DMA transactions even after it
> > > has been told to stop.
> > > 
> > > The second wait exists to catch software bugs or misuse, where the dma-buf
> > > importer has misrepresented its capabilities.
> > >   
> > 
> > Okay I see.
> >   
> > > >
> > > > and it'd be helpful to put some words in the code based on what's
> > > > discussed here.  
> > > 
> > > We've documented as much as we can in dma_buf_attach_revocable() and
> > > dma_buf_invalidate_mappings(). Do you have any suggestions on what else
> > > should be added here?
> > >   
> > 
> > the selection of 1s?  
> 
> It is indirectly written in the description of WARN_ON(), but let's add
> more. What about the following?
> 
> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> index 93795ad2e025..948ba75288c6 100644
> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> @@ -357,7 +357,13 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
>                         dma_resv_unlock(priv->dmabuf->resv);
>                         if (revoked) {
>                                 kref_put(&priv->kref, vfio_pci_dma_buf_done);
> -                               /* Let's wait till all DMA unmap are completed. */
> +                               /*
> +                                * Let's wait for 1 second till all DMA unmap
> +                                * are completed. It is supposed to catch dma-buf
> +                                * importers which lied about their support
> +                                * of dmabuf revoke. See dma_buf_invalidate_mappings()
> +                                * for the expected behaviour,
> +                                */
>                                 wait = wait_for_completion_timeout(
>                                         &priv->comp, secs_to_jiffies(1));
>                                 /*
> 
> > 
> > then,
> > 
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>  

Reviewed-by: Alex Williamson <alex@shazbot.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 7/8] vfio: Permit VFIO to work with pinned importers
  2026-01-24 19:14 ` [PATCH v5 7/8] vfio: Permit VFIO to work with pinned importers Leon Romanovsky
@ 2026-01-29 21:04   ` Alex Williamson
  2026-01-30  3:14   ` Tian, Kevin
  1 sibling, 0 replies; 51+ messages in thread
From: Alex Williamson @ 2026-01-29 21:04 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Ankit Agrawal, Vivek Kasireddy,
	linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

On Sat, 24 Jan 2026 21:14:19 +0200
Leon Romanovsky <leon@kernel.org> wrote:

> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Till now VFIO has rejected pinned importers, largely to avoid being used
> with the RDMA pinned importer that cannot handle a move_notify() to revoke
> access.
> 
> Using dma_buf_attach_revocable() it can tell the difference between pinned
> importers that support the flow described in dma_buf_invalidate_mappings()
> and those that don't.
> 
> Thus permit compatible pinned importers.
> 
> This is one of two items IOMMUFD requires to remove its private interface
> to VFIO's dma-buf.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/vfio/pci/vfio_pci_dmabuf.c | 15 +++------------
>  1 file changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> index 485515629fe4..3c8dc56e2238 100644
> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> @@ -22,16 +22,6 @@ struct vfio_pci_dma_buf {
>  	u8 revoked : 1;
>  };
>  
> -static int vfio_pci_dma_buf_pin(struct dma_buf_attachment *attachment)
> -{
> -	return -EOPNOTSUPP;
> -}
> -
> -static void vfio_pci_dma_buf_unpin(struct dma_buf_attachment *attachment)
> -{
> -	/* Do nothing */
> -}
> -
>  static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
>  				   struct dma_buf_attachment *attachment)
>  {
> @@ -43,6 +33,9 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
>  	if (priv->revoked)
>  		return -ENODEV;
>  
> +	if (!dma_buf_attach_revocable(attachment))
> +		return -EOPNOTSUPP;
> +
>  	return 0;
>  }
>  
> @@ -107,8 +100,6 @@ static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
>  }
>  
>  static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
> -	.pin = vfio_pci_dma_buf_pin,
> -	.unpin = vfio_pci_dma_buf_unpin,
>  	.attach = vfio_pci_dma_buf_attach,
>  	.map_dma_buf = vfio_pci_dma_buf_map,
>  	.unmap_dma_buf = vfio_pci_dma_buf_unmap,
> 

I'm not sure what the merge plan is for the remaining patches, but I'll
toss my ack in so that it can all go through Christian's tree if he
prefers.  Thanks

Reviewed-by: Alex Williamson <alex@shazbot.org>


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 8/8] iommufd: Add dma_buf_pin()
  2026-01-24 19:14 ` [PATCH v5 8/8] iommufd: Add dma_buf_pin() Leon Romanovsky
  2026-01-29  7:08   ` Tian, Kevin
@ 2026-01-30  0:17   ` Jason Gunthorpe
  1 sibling, 0 replies; 51+ messages in thread
From: Jason Gunthorpe @ 2026-01-30  0:17 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Kevin Tian,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Vivek Kasireddy, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, amd-gfx, virtualization,
	intel-xe, linux-rdma, iommu, kvm

On Sat, Jan 24, 2026 at 09:14:20PM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> IOMMUFD relies on a private protocol with VFIO, and this always operated
> in pinned mode.
> 
> Now that VFIO can support pinned importers update IOMMUFD to invoke the
> normal dma-buf flow to request pin.
> 
> This isn't enough to allow IOMMUFD to work with other exporters; it still
> needs a way to get the physical address list, which is another series.
> 
> IOMMUFD supports the defined revoke semantics. It immediately stops and
> fences access to the memory inside its invalidate_mappings() callback,
> and it currently doesn't use scatterlists so doesn't call map/unmap at
> all.
> 
> It is expected that a future revision can synchronously call unmap from
> the move_notify callback as well.
> 
> Acked-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/iommu/iommufd/pages.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-29  8:41               ` Leon Romanovsky
  2026-01-29 21:04                 ` Alex Williamson
@ 2026-01-30  3:10                 ` Tian, Kevin
  1 sibling, 0 replies; 51+ messages in thread
From: Tian, Kevin @ 2026-01-30  3:10 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Pranjal Shrivastava, Sumit Semwal,
	Christian König, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Vivi, Rodrigo,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Kasireddy, Vivek,
	linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, virtualization@lists.linux.dev,
	intel-xe@lists.freedesktop.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, kvm@vger.kernel.org

> From: Leon Romanovsky <leon@kernel.org>
> Sent: Thursday, January 29, 2026 4:42 PM
> 
> On Thu, Jan 29, 2026 at 08:13:18AM +0000, Tian, Kevin wrote:
> > > From: Leon Romanovsky <leon@kernel.org>
> > > Sent: Thursday, January 29, 2026 3:34 PM
> > >
> > > On Thu, Jan 29, 2026 at 07:06:37AM +0000, Tian, Kevin wrote:
> > > > > From: Jason Gunthorpe <jgg@ziepe.ca>
> > > > > Sent: Wednesday, January 28, 2026 12:28 AM
> > > > >
> > > > > On Tue, Jan 27, 2026 at 10:58:35AM +0200, Leon Romanovsky wrote:
> > > > > > > > @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
> > > > > > > >  			dma_resv_lock(priv->dmabuf->resv, NULL);
> > > > > > > >  			priv->revoked = revoked;
> > > > > > > >  			dma_buf_invalidate_mappings(priv->dmabuf);
> > > > > > > > +			dma_resv_wait_timeout(priv->dmabuf->resv,
> > > > > > > > +					      DMA_RESV_USAGE_BOOKKEEP,
> > > > > > > > +					      false,
> > > > > > > > +					      MAX_SCHEDULE_TIMEOUT);
> > > > > > > >  			dma_resv_unlock(priv->dmabuf->resv);
> > > > > > > > +			if (revoked) {
> > > > > > > > +				kref_put(&priv->kref, vfio_pci_dma_buf_done);
> > > > > > > > +				/* Let's wait till all DMA unmap are completed. */
> > > > > > > > +				wait = wait_for_completion_timeout(
> > > > > > > > +					&priv->comp, secs_to_jiffies(1));
> > > > > > >
> > > > > > > Is the 1-second constant sufficient for all hardware, or should the
> > > > > > > invalidate_mappings() contract require the callback to block until
> > > > > > > speculative reads are strictly fenced? I'm wondering about a case
> > > > > > > where a device's firmware has a high response latency, perhaps due
> > > > > > > to internal management tasks like error recovery or thermal, and it
> > > > > > > exceeds the 1s timeout.
> > > > > > >
> > > > > > > If the device is in the middle of a large DMA burst and the firmware
> > > > > > > is slow to flush the internal pipelines to a fully "quiesced"
> > > > > > > read-and-discard state, reclaiming the memory at exactly 1.001
> > > > > > > seconds risks triggering platform-level faults.
> > > > > > >
> > > > > > > Since we explicitly permit these speculative reads until unmap is
> > > > > > > complete, relying on a hardcoded timeout in the exporter seems to
> > > > > > > introduce a hardware-dependent race condition that could compromise
> > > > > > > system stability via IOMMU errors or AER faults.
> > > > > > >
> > > > > > > Should the importer instead be required to guarantee that all
> > > > > > > speculative access has ceased before the invalidation call returns?
> > > > > >
> > > > > > It is guaranteed by the dma_resv_wait_timeout() call above. That
> > > > > > call ensures that the hardware has completed all pending operations.
> > > > > > The 1-second delay is meant to catch cases where an in-kernel DMA
> > > > > > unmap call is missing, which should not trigger any DMA activity at
> > > > > > that point.
> > > > >
> > > > > Christian may know actual examples, but my general feeling is he was
> > > > > worrying about drivers that have pushed the DMABUF to visibility on
> > > > > the GPU and the move notify & fences only shoot down some access. So
> > > > > it has to wait until the DMABUF is finally unmapped.
> > > > >
> > > > > Pranjal's example should be covered by the driver adding a fence and
> > > > > then the unbounded fence wait will complete it.
> > > > >
> > > >
> > > > Bear me if it's an ignorant question.
> > > >
> > > > The commit msg of patch6 says that VFIO doesn't tolerate unbounded
> > > > wait, which is the reason behind the 2nd timeout wait here.
> > >
> > > It is not accurate. A second timeout is present both in the
> > > description of patch 6 and in the VFIO implementation. The difference is
> > > that the timeout is enforced within VFIO.
> > >
> > > >
> > > > Then why is "the unbounded fence wait" not a problem in the same
> > > > code path? The use of MAX_SCHEDULE_TIMEOUT implies a worst-case
> > > > timeout in hundreds of years...
> > >
> > > "An unbounded fence wait" is a different class of wait. It indicates
> > > broken hardware that continues to issue DMA transactions even after it
> > > has been told to stop.
> > >
> > > The second wait exists to catch software bugs or misuse, where the
> > > dma-buf importer has misrepresented its capabilities.
> > >
> >
> > Okay I see.
> >
> > > >
> > > > and it'd be helpful to put some words in the code based on what's
> > > > discussed here.
> > >
> > > We've documented as much as we can in dma_buf_attach_revocable() and
> > > dma_buf_invalidate_mappings(). Do you have any suggestions on what else
> > > should be added here?
> > >
> >
> > the selection of 1s?
> 
> It is indirectly written in the description of WARN_ON(), but let's add
> more. What about the following?
> 
> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> index 93795ad2e025..948ba75288c6 100644
> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> @@ -357,7 +357,13 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
>                         dma_resv_unlock(priv->dmabuf->resv);
>                         if (revoked) {
>                                 kref_put(&priv->kref, vfio_pci_dma_buf_done);
> -                               /* Let's wait till all DMA unmap are completed. */
> +                               /*
> +                                * Let's wait for 1 second till all DMA unmap
> +                                * are completed. It is supposed to catch dma-buf
> +                                * importers which lied about their support
> +                                * of dmabuf revoke. See dma_buf_invalidate_mappings()
> +                                * for the expected behaviour,
> +                                */
>                                 wait = wait_for_completion_timeout(
>                                         &priv->comp, secs_to_jiffies(1));
>                                 /*
> 

Looks good. Just replace the trailing "," with "."

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-29 14:58           ` Jason Gunthorpe
@ 2026-01-30  3:12             ` Tian, Kevin
  2026-01-30  5:43               ` Mauro Carvalho Chehab
  2026-01-30  8:46             ` Christian König
  1 sibling, 1 reply; 51+ messages in thread
From: Tian, Kevin @ 2026-01-30  3:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Pranjal Shrivastava, Sumit Semwal,
	Christian König, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Vivi, Rodrigo,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Kasireddy, Vivek,
	linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, virtualization@lists.linux.dev,
	intel-xe@lists.freedesktop.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, kvm@vger.kernel.org

> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, January 29, 2026 10:59 PM
> 
> On Thu, Jan 29, 2026 at 07:06:37AM +0000, Tian, Kevin wrote:
> > Bear me if it's an ignorant question.
> >
> > The commit msg of patch6 says that VFIO doesn't tolerate unbounded
> > wait, which is the reason behind the 2nd timeout wait here.
> 
> As far as I understand dmabuf design a fence wait should complete
> eventually under kernel control, because these sleeps are
> sprinkled all around the kernel today.
> 
> I suspect that is not actually true for every HW, probably something
> like "shader programs can run forever technically".
> 
> We can argue if those cases should not report revocable either, but at
> least this will work "correctly" even if it takes a huge amount of
> time.

good to know those background.

> 
> I wouldn't mind seeing a shorter timeout and print on the fence too
> just in case.
> 

either way is OK. It's not difficult to figure out a long wait anyway. 😊

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH v5 7/8] vfio: Permit VFIO to work with pinned importers
  2026-01-24 19:14 ` [PATCH v5 7/8] vfio: Permit VFIO to work with pinned importers Leon Romanovsky
  2026-01-29 21:04   ` Alex Williamson
@ 2026-01-30  3:14   ` Tian, Kevin
  1 sibling, 0 replies; 51+ messages in thread
From: Tian, Kevin @ 2026-01-30  3:14 UTC (permalink / raw)
  To: Leon Romanovsky, Sumit Semwal, Christian König, Alex Deucher,
	David Airlie, Simona Vetter, Gerd Hoffmann, Dmitry Osipenko,
	Gurchetan Singh, Chia-I Wu, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Lucas De Marchi, Thomas Hellström,
	Vivi, Rodrigo, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Kasireddy, Vivek
  Cc: linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, virtualization@lists.linux.dev,
	intel-xe@lists.freedesktop.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, kvm@vger.kernel.org

> From: Leon Romanovsky <leon@kernel.org>
> Sent: Sunday, January 25, 2026 3:14 AM
> 
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Till now VFIO has rejected pinned importers, largely to avoid being used
> with the RDMA pinned importer that cannot handle a move_notify() to revoke
> access.
> 
> Using dma_buf_attach_revocable() it can tell the difference between pinned
> importers that support the flow described in dma_buf_invalidate_mappings()
> and those that don't.
> 
> Thus permit compatible pinned importers.
> 
> This is one of two items IOMMUFD requires to remove its private interface
> to VFIO's dma-buf.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-30  3:12             ` Tian, Kevin
@ 2026-01-30  5:43               ` Mauro Carvalho Chehab
  2026-01-30  5:48                 ` Tian, Kevin
  0 siblings, 1 reply; 51+ messages in thread
From: Mauro Carvalho Chehab @ 2026-01-30  5:43 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Jason Gunthorpe, Leon Romanovsky, Pranjal Shrivastava,
	Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Vivi, Rodrigo,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Kasireddy, Vivek,
	linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, virtualization@lists.linux.dev,
	intel-xe@lists.freedesktop.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, kvm@vger.kernel.org

On Fri, 30 Jan 2026 03:12:02 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Jason Gunthorpe <jgg@ziepe.ca>
> > Sent: Thursday, January 29, 2026 10:59 PM
> > 
> > On Thu, Jan 29, 2026 at 07:06:37AM +0000, Tian, Kevin wrote:  
> > > Bear me if it's an ignorant question.
> > >
> > > The commit msg of patch6 says that VFIO doesn't tolerate unbounded
> > > wait, which is the reason behind the 2nd timeout wait here.  
> > 
> > As far as I understand dmabuf design a fence wait should complete
> > eventually under kernel control, because these sleeps are
> > sprinkled all around the kernel today.
> > 
> > I suspect that is not actually true for every HW, probably something
> > like "shader programs can run forever technically".
> > 
> > We can argue if those cases should not report revocable either, but at
> > least this will work "correctly" even if it takes a huge amount of
> > time.  
> 
> good to know those background.
> 
> > 
> > I wouldn't mind seeing a shorter timeout and print on the fence too
> > just in case.
> >   
> 
> either way is OK. It's not difficult to figure out a long wait anyway. 😊

Please don't use Outlook when replying to patches - or ensure that
it is properly patched to only send plain text - which I don't
think is possible.

If you look at this message's source code, it is not in plain text:

	Content-Type: text/plain; charset="utf-8"
	Content-Transfer-Encoding: base64

Your message content is:

	PiBGcm9tOiBKYXNvbiBHdW50aG9ycGUgPGpnZ0B6aWVwZS5jYT4NCj4gU2VudDogVGh1cnNkYXks
	IEphbnVhcnkgMjksIDIwMjYgMTA6NTkgUE0NCj4gDQo+IE9uIFRodSwgSmFuIDI5LCAyMDI2IGF0
	IDA3OjA2OjM3QU0gKzAwMDAsIFRpYW4sIEtldmluIHdyb3RlOg0KPiA+IEJlYXIgbWUgaWYgaXQn
	cyBhbiBpZ25vcmFudCBxdWVzdGlvbi4NCj4gPg0KPiA+IFRoZSBjb21taXQgbXNnIG9mIHBhdGNo
	NiBzYXlzIHRoYXQgVkZJTyBkb2Vzbid0IHRvbGVyYXRlIHVuYm91bmRlZA0KPiA+IHdhaXQsIHdo
	aWNoIGlzIHRoZSByZWFzb24gYmVoaW5kIHRoZSAybmQgdGltZW91dCB3YWl0IGhlcmUuDQo+IA0K
	PiBBcyBmYXIgYXMgSSB1bmRlcnN0YW5kIGRtYWJ1ZiBkZXNpZ24gYSBmZW5jZSB3YWl0IHNob3Vs
	ZCBjb21wbGV0ZQ0KPiBldmVudHVhbGx5IHVuZGVyIGtlcm5lbCBjb250cm9sLCBiZWNhdXNlIHRo
	ZXNlIHNsZWVwcyBhcmUNCj4gc3ByaW5rbGVkIGFsbCBhcm91bmQgdGhlIGtlcm5lbCB0b2RheS4N
	Cj4gDQo+IEkgc3VzcGVjdCB0aGF0IGlzIG5vdCBhY3R1YWxseSB0cnVlIGZvciBldmVyeSBIVywg
	cHJvYmFibHkgc29tZXRoaW5nDQo+IGxpa2UgInNoYWRlciBwcm9ncmFtcyBjYW4gcnVuIGZvcmV2
	ZXIgdGVjaG5pY2FsbHkiLg0KPiANCj4gV2UgY2FuIGFyZ3VlIGlmIHRob3NlIGNhc2VzIHNob3Vs
	ZCBub3QgcmVwb3J0IHJldm9jYWJsZSBlaXRoZXIsIGJ1dCBhdA0KPiBsZWFzdCB0aGlzIHdpbGwg
	d29yayAiY29ycmVjdGx5IiBldmVuIGlmIGl0IHRha2VzIGEgaHVnZSBhbW91bnQgb2YNCj4gdGlt
	ZS4NCg0KZ29vZCB0byBrbm93IHRob3NlIGJhY2tncm91bmQuDQoNCj4gDQo+IEkgd291bGRuJ3Qg
	bWluZCBzZWVpbmcgYSBzaG9ydGVyIHRpbWVvdXQgYW5kIHByaW50IG9uIHRoZSBmZW5jZSB0b28N
	Cj4ganVzdCBpbiBjYXNlLg0KPiANCg0KZWl0aGVyIHdheSBpcyBPSy4gSXQncyBub3QgZGlmZmlj
	dWx0IHRvIGZpZ3VyZSBvdXQgYSBsb25nIHdhaXQgYW55d2F5LiDwn5iKDQo=

which is something that patch tools - in particular patchwork - won't handle.

Thanks,
Mauro
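[Editor's note: the check Mauro describes is easy to reproduce with standard tools. The sample string below is only the first fragment of the encoded body quoted above, not the full message.]

```shell
# Decode a base64 Content-Transfer-Encoding body fragment, as patch
# tools would have to do before applying the mail as a patch.
printf 'PiBGcm9tOiBKYXNvbiBHdW50aG9ycGU=' | base64 -d
# prints: > From: Jason Gunthorpe
```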

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-30  5:43               ` Mauro Carvalho Chehab
@ 2026-01-30  5:48                 ` Tian, Kevin
  0 siblings, 0 replies; 51+ messages in thread
From: Tian, Kevin @ 2026-01-30  5:48 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jason Gunthorpe, Leon Romanovsky, Pranjal Shrivastava,
	Sumit Semwal, Christian König, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Vivi, Rodrigo,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Kasireddy, Vivek,
	linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, virtualization@lists.linux.dev,
	intel-xe@lists.freedesktop.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, kvm@vger.kernel.org

> From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Sent: Friday, January 30, 2026 1:43 PM
> 
> On Fri, 30 Jan 2026 03:12:02 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Jason Gunthorpe <jgg@ziepe.ca>
> > > Sent: Thursday, January 29, 2026 10:59 PM
> > >
> > > On Thu, Jan 29, 2026 at 07:06:37AM +0000, Tian, Kevin wrote:
> > > > Bear me if it's an ignorant question.
> > > >
> > > > The commit msg of patch6 says that VFIO doesn't tolerate unbounded
> > > > wait, which is the reason behind the 2nd timeout wait here.
> > >
> > > As far as I understand dmabuf design a fence wait should complete
> > > eventually under kernel control, because these sleeps are
> > > sprinkled all around the kernel today.
> > >
> > > I suspect that is not actually true for every HW, probably something
> > > like "shader programs can run forever technically".
> > >
> > > We can argue if those cases should not report revocable either, but at
> > > least this will work "correctly" even if it takes a huge amount of
> > > time.
> >
> > good to know those background.
> >
> > >
> > > I wouldn't mind seeing a shorter timeout and print on the fence too
> > > just in case.
> > >
> >
> > either way is OK. It's not difficult to figure out a long wait anyway. 😊
> 
> Please don't use Outlook when replying to patches - or ensure that
> it is properly patched to only send plain text - which I don't
> think is possible.
> 
> If you look at this message's source code, it is not in plain text:
> 
> 	Content-Type: text/plain; charset="utf-8"
> 	Content-Transfer-Encoding: base64

it's likely caused by the trailing smiley icon. I'll pay attention to it.

> 
> Your message content is:
> 
> 	PiBGcm9tOiBKYXNvbiBHdW50aG9ycGUgPGpnZ0B6aWVwZS5jYT4NCj
> 4gU2VudDogVGh1cnNkYXks
> 	IEphbnVhcnkgMjksIDIwMjYgMTA6NTkgUE0NCj4gDQo+IE9uIFRodSw
> gSmFuIDI5LCAyMDI2IGF0
> 	IDA3OjA2OjM3QU0gKzAwMDAsIFRpYW4sIEtldmluIHdyb3RlOg0KPiA+
> IEJlYXIgbWUgaWYgaXQn
> 	cyBhbiBpZ25vcmFudCBxdWVzdGlvbi4NCj4gPg0KPiA+IFRoZSBjb21taX
> QgbXNnIG9mIHBhdGNo
> 	NiBzYXlzIHRoYXQgVkZJTyBkb2Vzbid0IHRvbGVyYXRlIHVuYm91bmRlZ
> A0KPiA+IHdhaXQsIHdo
> 	aWNoIGlzIHRoZSByZWFzb24gYmVoaW5kIHRoZSAybmQgdGltZW91dC
> B3YWl0IGhlcmUuDQo+IA0K
> 	PiBBcyBmYXIgYXMgSSB1bmRlcnN0YW5kIGRtYWJ1ZiBkZXNpZ24gYSB
> mZW5jZSB3YWl0IHNob3Vs
> 	ZCBjb21wbGV0ZQ0KPiBldmVudHVhbGx5IHVuZGVyIGtlcm5lbCBjb250
> cm9sLCBiZWNhdXNlIHRo
> 	ZXNlIHNsZWVwcyBhcmUNCj4gc3ByaW5rbGVkIGFsbCBhcm91bmQgd
> GhlIGtlcm5lbCB0b2RheS4N
> 	Cj4gDQo+IEkgc3VzcGVjdCB0aGF0IGlzIG5vdCBhY3R1YWxseSB0cnVlIG
> ZvciBldmVyeSBIVywg
> 	cHJvYmFibHkgc29tZXRoaW5nDQo+IGxpa2UgInNoYWRlciBwcm9ncmF
> tcyBjYW4gcnVuIGZvcmV2
> 	ZXIgdGVjaG5pY2FsbHkiLg0KPiANCj4gV2UgY2FuIGFyZ3VlIGlmIHRob3
> NlIGNhc2VzIHNob3Vs
> 	ZCBub3QgcmVwb3J0IHJldm9jYWJsZSBlaXRoZXIsIGJ1dCBhdA0KPiBsZ
> WFzdCB0aGlzIHdpbGwg
> 	d29yayAiY29ycmVjdGx5IiBldmVuIGlmIGl0IHRha2VzIGEgaHVnZSBhbW
> 91bnQgb2YNCj4gdGlt
> 	ZS4NCg0KZ29vZCB0byBrbm93IHRob3NlIGJhY2tncm91bmQuDQoNCj4
> gDQo+IEkgd291bGRuJ3Qg
> 	bWluZCBzZWVpbmcgYSBzaG9ydGVyIHRpbWVvdXQgYW5kIHByaW50I
> G9uIHRoZSBmZW5jZSB0b28N
> 	Cj4ganVzdCBpbiBjYXNlLg0KPiANCg0KZWl0aGVyIHdheSBpcyBPSy4gSX
> QncyBub3QgZGlmZmlj
> 	dWx0IHRvIGZpZ3VyZSBvdXQgYSBsb25nIHdhaXQgYW55d2F5LiDwn5i
> KDQo=
> 
> which is something that patch tools - in particular patchwork - won't handle.
> 
> Thanks,
> Mauro

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 3/8] dma-buf: Always build with DMABUF_MOVE_NOTIFY
  2026-01-27 10:02       ` Christian König
  2026-01-27 11:42         ` Leon Romanovsky
@ 2026-01-30  7:00         ` Leon Romanovsky
  1 sibling, 0 replies; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-30  7:00 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel, amd-gfx, virtualization, intel-xe, linux-rdma,
	iommu, kvm

On Tue, Jan 27, 2026 at 11:02:03AM +0100, Christian König wrote:
> On 1/27/26 10:58, Leon Romanovsky wrote:
> > On Tue, Jan 27, 2026 at 10:26:26AM +0100, Christian König wrote:
> >> On 1/24/26 20:14, Leon Romanovsky wrote:
> >>> From: Leon Romanovsky <leonro@nvidia.com>
> >>>
> >>> DMABUF_MOVE_NOTIFY was introduced in 2018 and has been marked as
> >>> experimental and disabled by default ever since. Six years later,
> >>> all new importers implement this callback.
> >>>
> >>> It is therefore reasonable to drop CONFIG_DMABUF_MOVE_NOTIFY and
> >>> always build DMABUF with support for it enabled.
> >>>
> >>> Suggested-by: Christian König <christian.koenig@amd.com>
> >>> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> >>
> >> Reviewed-by: Christian König <christian.koenig@amd.com>
> >>
> >> As long as nobody starts screaming in the last second or I encounter some other problem I'm going to push the first three patches to drm-misc-next now.
> > 
> > How do you see progress of other patches?
> > Can they be queued for that tree as well?
> 
> I was hoping to get through them by the end of the week.

Christian,

Can we please merge the whole series now?

Thanks

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 5/8] dma-buf: Make .invalidate_mapping() truly optional
  2026-01-24 19:14 ` [PATCH v5 5/8] dma-buf: Make .invalidate_mapping() truly optional Leon Romanovsky
@ 2026-01-30  8:30   ` Christian König
  2026-01-30 12:55     ` Leon Romanovsky
  0 siblings, 1 reply; 51+ messages in thread
From: Christian König @ 2026-01-30  8:30 UTC (permalink / raw)
  To: Leon Romanovsky, Sumit Semwal, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

On 1/24/26 20:14, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> The .invalidate_mapping() callback is documented as optional, yet it
> effectively became mandatory whenever importer_ops were provided. This
> led to cases where RDMA non-ODP code had to supply an empty stub.
> 
> Relax the checks in the dma-buf core so the callback can be omitted,
> allowing RDMA code to drop the unnecessary function.
> 
> Removing the stub allows the next patch to tell that RDMA does not support
> .invalidate_mapping() by checking for a NULL op.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/dma-buf/dma-buf.c             |  6 ++----
>  drivers/infiniband/core/umem_dmabuf.c | 13 -------------
>  2 files changed, 2 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index cd68c1c0bfd7..1629312d364a 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -947,9 +947,6 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
>  	if (WARN_ON(!dmabuf || !dev))
>  		return ERR_PTR(-EINVAL);
>  
> -	if (WARN_ON(importer_ops && !importer_ops->invalidate_mappings))
> -		return ERR_PTR(-EINVAL);
> -
>  	attach = kzalloc(sizeof(*attach), GFP_KERNEL);
>  	if (!attach)
>  		return ERR_PTR(-ENOMEM);
> @@ -1260,7 +1257,8 @@ void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
>  	dma_resv_assert_held(dmabuf->resv);
>  
>  	list_for_each_entry(attach, &dmabuf->attachments, node)
> -		if (attach->importer_ops)
> +		if (attach->importer_ops &&
> +		    attach->importer_ops->invalidate_mappings)
>  			attach->importer_ops->invalidate_mappings(attach);
>  }
>  EXPORT_SYMBOL_NS_GPL(dma_buf_invalidate_mappings, "DMA_BUF");
> diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
> index d77a739cfe7a..256e34c15e6b 100644
> --- a/drivers/infiniband/core/umem_dmabuf.c
> +++ b/drivers/infiniband/core/umem_dmabuf.c
> @@ -129,9 +129,6 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
>  	if (check_add_overflow(offset, (unsigned long)size, &end))
>  		return ret;
>  
> -	if (unlikely(!ops || !ops->invalidate_mappings))

You should probably keep "if (unlikely(!ops)).." here.

Apart from that the patch looks good to me.

Regards,
Christian.

> -		return ret;
> -
>  	dmabuf = dma_buf_get(fd);
>  	if (IS_ERR(dmabuf))
>  		return ERR_CAST(dmabuf);
> @@ -184,18 +181,8 @@ struct ib_umem_dmabuf *ib_umem_dmabuf_get(struct ib_device *device,
>  }
>  EXPORT_SYMBOL(ib_umem_dmabuf_get);
>  
> -static void
> -ib_umem_dmabuf_unsupported_move_notify(struct dma_buf_attachment *attach)
> -{
> -	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
> -
> -	ibdev_warn_ratelimited(umem_dmabuf->umem.ibdev,
> -			       "Invalidate callback should not be called when memory is pinned\n");
> -}
> -
>  static struct dma_buf_attach_ops ib_umem_dmabuf_attach_pinned_ops = {
>  	.allow_peer2peer = true,
> -	.invalidate_mappings = ib_umem_dmabuf_unsupported_move_notify,
>  };
>  
>  struct ib_umem_dmabuf *
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread
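
The relaxed dispatch this patch introduces — invoke the callback only when both the importer_ops table and its invalidate_mappings member are non-NULL — can be modeled in plain C. The struct and function names below are illustrative stand-ins, not the kernel API:

```c
#include <stddef.h>

/* Minimal model of an optional callback in an ops table. */
struct attach_ops {
	void (*invalidate_mappings)(void *priv);	/* optional */
};

static int invalidate_calls;

static void count_invalidate(void *priv)
{
	(void)priv;
	invalidate_calls++;
}

/*
 * Mirrors the two-level NULL check in dma_buf_invalidate_mappings()
 * after this patch: returns 1 if the callback ran, 0 if it was absent
 * at either level.
 */
int maybe_invalidate(const struct attach_ops *ops, void *priv)
{
	if (ops && ops->invalidate_mappings) {
		ops->invalidate_mappings(priv);
		return 1;
	}
	return 0;
}
```

With this shape, an importer can pass an ops table with a NULL invalidate_mappings and nothing is called, which is exactly what lets RDMA drop its warning stub.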

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-24 19:14 ` [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete Leon Romanovsky
  2026-01-26 20:53   ` Pranjal Shrivastava
@ 2026-01-30  8:30   ` Christian König
  2026-01-30 13:01     ` Leon Romanovsky
  1 sibling, 1 reply; 51+ messages in thread
From: Christian König @ 2026-01-30  8:30 UTC (permalink / raw)
  To: Leon Romanovsky, Sumit Semwal, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

On 1/24/26 20:14, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> dma-buf invalidation is handled asynchronously by the hardware, so VFIO
> must wait until all affected objects have been fully invalidated.
> 
> In addition, the dma-buf exporter expects that all importers unmap any
> buffers they previously mapped.
> 
> Fixes: 5d74781ebc86 ("vfio/pci: Add dma-buf export support for MMIO regions")
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/vfio/pci/vfio_pci_dmabuf.c | 71 ++++++++++++++++++++++++++++++++++++--
>  1 file changed, 68 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> index d8ceafabef48..485515629fe4 100644
> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> @@ -17,6 +17,8 @@ struct vfio_pci_dma_buf {
>  	struct dma_buf_phys_vec *phys_vec;
>  	struct p2pdma_provider *provider;
>  	u32 nr_ranges;
> +	struct kref kref;
> +	struct completion comp;
>  	u8 revoked : 1;
>  };
>  
> @@ -44,27 +46,46 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
>  	return 0;
>  }
>  
> +static void vfio_pci_dma_buf_done(struct kref *kref)
> +{
> +	struct vfio_pci_dma_buf *priv =
> +		container_of(kref, struct vfio_pci_dma_buf, kref);
> +
> +	complete(&priv->comp);
> +}
> +
>  static struct sg_table *
>  vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
>  		     enum dma_data_direction dir)
>  {
>  	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> +	struct sg_table *ret;
>  
>  	dma_resv_assert_held(priv->dmabuf->resv);
>  
>  	if (priv->revoked)
>  		return ERR_PTR(-ENODEV);
>  
> -	return dma_buf_phys_vec_to_sgt(attachment, priv->provider,
> -				       priv->phys_vec, priv->nr_ranges,
> -				       priv->size, dir);
> +	ret = dma_buf_phys_vec_to_sgt(attachment, priv->provider,
> +				      priv->phys_vec, priv->nr_ranges,
> +				      priv->size, dir);
> +	if (IS_ERR(ret))
> +		return ret;
> +
> +	kref_get(&priv->kref);
> +	return ret;
>  }
>  
>  static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
>  				   struct sg_table *sgt,
>  				   enum dma_data_direction dir)
>  {
> +	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> +
> +	dma_resv_assert_held(priv->dmabuf->resv);
> +
>  	dma_buf_free_sgt(attachment, sgt, dir);
> +	kref_put(&priv->kref, vfio_pci_dma_buf_done);
>  }
>  
>  static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
> @@ -287,6 +308,9 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>  		goto err_dev_put;
>  	}
>  
> +	kref_init(&priv->kref);
> +	init_completion(&priv->comp);
> +
>  	/* dma_buf_put() now frees priv */
>  	INIT_LIST_HEAD(&priv->dmabufs_elm);
>  	down_write(&vdev->memory_lock);
> @@ -326,6 +350,8 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
>  	lockdep_assert_held_write(&vdev->memory_lock);
>  
>  	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> +		unsigned long wait;
> +
>  		if (!get_file_active(&priv->dmabuf->file))
>  			continue;
>  
> @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
>  			dma_resv_lock(priv->dmabuf->resv, NULL);
>  			priv->revoked = revoked;
>  			dma_buf_invalidate_mappings(priv->dmabuf);
> +			dma_resv_wait_timeout(priv->dmabuf->resv,
> +					      DMA_RESV_USAGE_BOOKKEEP, false,
> +					      MAX_SCHEDULE_TIMEOUT);
>  			dma_resv_unlock(priv->dmabuf->resv);
> +			if (revoked) {
> +				kref_put(&priv->kref, vfio_pci_dma_buf_done);
> +				/* Wait until all DMA unmaps have completed. */
> +				wait = wait_for_completion_timeout(
> +					&priv->comp, secs_to_jiffies(1));
> +				/*
> +				 * If you see this warning, it means the
> +				 * importer didn't call unmap in response to
> +				 * dma_buf_invalidate_mappings(), which is not
> +				 * allowed.
> +				 */
> +				WARN(!wait,
> +				     "Timed out waiting for DMABUF unmap, importer has a broken invalidate_mapping()");

You can use revoke for resource management, for example to re-use the backing store for something else.

But it is mandatory that you keep the mapping around indefinitely until the importer closes it.

Before that, you can't do things like runtime PM or device removal, or anything else that would make the DMA addresses invalid.

As far as I can see vfio_pci_dma_buf_move() is used exactly for that use case, so this is an absolutely clear NAK from my side for this approach.

You can either split up the functionality of vfio_pci_dma_buf_move() into vfio_pci_dma_buf_invalidate_mappings() and vfio_pci_dma_buf_flush() and then call the latter whenever necessary, or you keep it in one function and block everybody until the importer has dropped all mappings.
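
The "one function that blocks until the importer has dropped all mappings" option can be modeled in plain userspace C with pthreads. The names here are illustrative, and the mutex/condvar pair stands in for the kernel's kref/completion combination; this is a sketch of the semantics, not the kernel code:

```c
#include <pthread.h>
#include <stdbool.h>

/*
 * Illustrative model of the revoke flow under discussion: map/unmap
 * track outstanding importer mappings, and revoke both refuses new
 * mappings and blocks (without a timeout) until every existing
 * mapping has been dropped.
 */
struct revoke_state {
	pthread_mutex_t lock;
	pthread_cond_t all_unmapped;
	int mappings;	/* outstanding importer mappings */
	bool revoked;	/* once set, map attempts fail */
};

void revoke_state_init(struct revoke_state *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->all_unmapped, NULL);
	s->mappings = 0;
	s->revoked = false;
}

bool importer_map(struct revoke_state *s)
{
	bool ok;

	pthread_mutex_lock(&s->lock);
	ok = !s->revoked;	/* mirrors the priv->revoked check in map */
	if (ok)
		s->mappings++;	/* mirrors kref_get() */
	pthread_mutex_unlock(&s->lock);
	return ok;
}

void importer_unmap(struct revoke_state *s)
{
	pthread_mutex_lock(&s->lock);
	if (--s->mappings == 0)	/* mirrors kref_put() firing the completion */
		pthread_cond_broadcast(&s->all_unmapped);
	pthread_mutex_unlock(&s->lock);
}

void exporter_revoke(struct revoke_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->revoked = true;	/* invalidate_mappings() analogue */
	while (s->mappings > 0)	/* block everybody, no timeout */
		pthread_cond_wait(&s->all_unmapped, &s->lock);
	pthread_mutex_unlock(&s->lock);
}
```

With no outstanding mappings the revoke returns immediately; with a concurrent importer it blocks until that importer unmaps, which is the unbounded-wait semantics suggested here in place of the one-second timeout.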

> +			} else {
> +				/*
> +				 * The kref is initialized again because, when
> +				 * revoke was performed, the reference count was
> +				 * dropped to zero to trigger the completion.
> +				 */
> +				kref_init(&priv->kref);
> +				/*
> +				 * There is no need to wait, as no mapping could
> +				 * have been established while priv->revoked
> +				 * was true.
> +				 */
> +				reinit_completion(&priv->comp);
> +			}
>  		}
>  		fput(priv->dmabuf->file);

This is also extremely questionable. Why doesn't the dmabuf have a reference while on the linked list?

Regards,
Christian.

>  	}
> @@ -346,6 +402,8 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
>  
>  	down_write(&vdev->memory_lock);
>  	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> +		unsigned long wait;
> +
>  		if (!get_file_active(&priv->dmabuf->file))
>  			continue;
>  
> @@ -354,7 +412,14 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
>  		priv->vdev = NULL;
>  		priv->revoked = true;
>  		dma_buf_invalidate_mappings(priv->dmabuf);
> +		dma_resv_wait_timeout(priv->dmabuf->resv,
> +				      DMA_RESV_USAGE_BOOKKEEP, false,
> +				      MAX_SCHEDULE_TIMEOUT);
>  		dma_resv_unlock(priv->dmabuf->resv);
> +		kref_put(&priv->kref, vfio_pci_dma_buf_done);
> +		wait = wait_for_completion_timeout(&priv->comp,
> +						   secs_to_jiffies(1));
> +		WARN_ON(!wait);
>  		vfio_device_put_registration(&vdev->vdev);
>  		fput(priv->dmabuf->file);
>  	}
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 6/8] dma-buf: Add dma_buf_attach_revocable()
  2026-01-24 19:14 ` [PATCH v5 6/8] dma-buf: Add dma_buf_attach_revocable() Leon Romanovsky
  2026-01-26 20:38   ` Pranjal Shrivastava
@ 2026-01-30  8:43   ` Christian König
  2026-01-30 14:00     ` Jason Gunthorpe
  1 sibling, 1 reply; 51+ messages in thread
From: Christian König @ 2026-01-30  8:43 UTC (permalink / raw)
  To: Leon Romanovsky, Sumit Semwal, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx,
	virtualization, intel-xe, linux-rdma, iommu, kvm

On 1/24/26 20:14, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Some exporters need a flow to synchronously revoke access to the DMA-buf
> by importers. Once revoke is completed the importer is not permitted to
> touch the memory otherwise they may get IOMMU faults, AERs, or worse.

That approach is seriously not going to fly.

You can use the invalidate_mappings approach to trigger the importer to give back the mapping, but when the mapping is actually given back is still entirely up to the importer.

In other words, you can't do the shoot-down revoke semantics you are trying to establish here.

Regards,
Christian.

> 
> DMA-buf today defines a revoke flow, for both pinned and dynamic
> importers, which is broadly:
> 
> 	dma_resv_lock(dmabuf->resv, NULL);
> 	// Prevent new mappings from being established
> 	priv->revoked = true;
> 
> 	// Tell all importers to eventually unmap
> 	dma_buf_invalidate_mappings(dmabuf);
> 
> 	// Wait for any inprogress fences on the old mapping
> 	dma_resv_wait_timeout(dmabuf->resv,
> 			      DMA_RESV_USAGE_BOOKKEEP, false,
> 			      MAX_SCHEDULE_TIMEOUT);
> 	dma_resv_unlock(dmabuf->resv);
> 
> 	// Wait for all importers to complete unmap
> 	wait_for_completion(&priv->unmapped_comp);
> 
> This works well, and an importer that continues to access the DMA-buf
> after unmapping it is very buggy.
> 
> However, the final wait for unmap is effectively unbounded. Several
> importers do not support invalidate_mappings() at all and won't unmap
> until userspace triggers it.
> 
> This unbounded wait is not suitable for exporters like VFIO and RDMA that
> need to issue revoke as part of their normal operations.
> 
> Add dma_buf_attach_revocable() to allow exporters to determine the
> difference between importers that can complete the above in bounded time,
> and those that can't. It can be called inside the exporter's attach op to
> reject incompatible importers.
> 
> Document these details about how dma_buf_invalidate_mappings() works and
> what the required sequence is to achieve a full revocation.
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  drivers/dma-buf/dma-buf.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++-
>  include/linux/dma-buf.h   |  9 +++------
>  2 files changed, 50 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 1629312d364a..f0e05227bda8 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -1242,13 +1242,59 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
>  }
>  EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment_unlocked, "DMA_BUF");
>  
> +/**
> + * dma_buf_attach_revocable - check if a DMA-buf importer implements
> + * revoke semantics.
> + * @attach: the DMA-buf attachment to check
> + *
> + * Returns true if the DMA-buf importer can support the revoke sequence
> + * explained in dma_buf_invalidate_mappings() within bounded time, meaning the
> + * importer implements invalidate_mappings() and ensures that unmap is called as
> + * a result.
> + */
> +bool dma_buf_attach_revocable(struct dma_buf_attachment *attach)
> +{
> +	return attach->importer_ops &&
> +	       attach->importer_ops->invalidate_mappings;
> +}
> +EXPORT_SYMBOL_NS_GPL(dma_buf_attach_revocable, "DMA_BUF");
> +
>  /**
>   * dma_buf_invalidate_mappings - notify attachments that DMA-buf is moving
>   *
>   * @dmabuf:	[in]	buffer which is moving
>   *
>   * Informs all attachments that they need to destroy and recreate all their
> - * mappings.
> + * mappings. If the attachment is dynamic then the dynamic importer is expected
> + * to invalidate any caches it has of the mapping result and perform a new
> + * mapping request before allowing HW to do any further DMA.
> + *
> + * If the attachment is pinned then this informs the pinned importer that the
> + * underlying mapping is no longer available. Pinned importers may take this
> + * as a permanent revocation and never establish new mappings, so exporters
> + * should not trigger it lightly.
> + *
> + * Upon return importers may continue to access the DMA-buf memory. The caller
> + * must do two additional waits to ensure that the memory is no longer being
> + * accessed:
> + *  1) Until dma_resv_wait_timeout() retires fences the importer is allowed to
> + *     fully access the memory.
> + *  2) Until the importer calls unmap it is allowed to speculatively
> + *     read-and-discard the memory. It must not write to the memory.
> + *
> + * A caller wishing to use dma_buf_invalidate_mappings() to fully stop access to
> + * the DMA-buf must wait for both. Dynamic callers can often use just the first.
> + *
> + * All importers providing an invalidate_mappings() op must ensure that unmap is
> + * called within bounded time after the op.
> + *
> + * Pinned importers that do not support an invalidate_mappings() op will
> + * eventually perform unmap when they are done with the buffer, which may be an
> + * unbounded time from calling this function. dma_buf_attach_revocable() can be
> + * used to prevent such importers from attaching.
> + *
> + * Importers are free to request a new mapping in parallel as this function
> + * returns.
>   */
>  void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
>  {
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index d5c3ce2b3aa4..84a7ec8f5359 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -468,12 +468,8 @@ struct dma_buf_attach_ops {
>  	 * called with this lock held as well. This makes sure that no mapping
>  	 * is created concurrently with an ongoing move operation.
>  	 *
> -	 * Mappings stay valid and are not directly affected by this callback.
> -	 * But the DMA-buf can now be in a different physical location, so all
> -	 * mappings should be destroyed and re-created as soon as possible.
> -	 *
> -	 * New mappings can be created after this callback returns, and will
> -	 * point to the new location of the DMA-buf.
> +	 * See the kdoc for dma_buf_invalidate_mappings() for details on the
> +	 * required behavior.
>  	 */
>  	void (*invalidate_mappings)(struct dma_buf_attachment *attach);
>  };
> @@ -601,6 +597,7 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
>  void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
>  				enum dma_data_direction);
>  void dma_buf_invalidate_mappings(struct dma_buf *dma_buf);
> +bool dma_buf_attach_revocable(struct dma_buf_attachment *attach);
>  int dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
>  			     enum dma_data_direction dir);
>  int dma_buf_end_cpu_access(struct dma_buf *dma_buf,
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-29 14:58           ` Jason Gunthorpe
  2026-01-30  3:12             ` Tian, Kevin
@ 2026-01-30  8:46             ` Christian König
  1 sibling, 0 replies; 51+ messages in thread
From: Christian König @ 2026-01-30  8:46 UTC (permalink / raw)
  To: Jason Gunthorpe, Tian, Kevin
  Cc: Leon Romanovsky, Pranjal Shrivastava, Sumit Semwal, Alex Deucher,
	David Airlie, Simona Vetter, Gerd Hoffmann, Dmitry Osipenko,
	Gurchetan Singh, Chia-I Wu, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Lucas De Marchi, Thomas Hellström,
	Vivi, Rodrigo, Joerg Roedel, Will Deacon, Robin Murphy,
	Felix Kuehling, Alex Williamson, Ankit Agrawal, Kasireddy, Vivek,
	linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org,
	amd-gfx@lists.freedesktop.org, virtualization@lists.linux.dev,
	intel-xe@lists.freedesktop.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, kvm@vger.kernel.org

On 1/29/26 15:58, Jason Gunthorpe wrote:
> On Thu, Jan 29, 2026 at 07:06:37AM +0000, Tian, Kevin wrote:
> >> Bear with me if this is an ignorant question.
>>
>> The commit msg of patch6 says that VFIO doesn't tolerate unbounded
>> wait, which is the reason behind the 2nd timeout wait here.
> 
> As far as I understand dmabuf design a fence wait should complete
> eventually under kernel control, because these sleeps are
> sprinkled all around the kernel today.

Well it's a bit different, but we indeed guarantee that dma_fences complete in finite time.

> I suspect that is not actually true for every HW, probably something
> like "shader programs can run forever technically".

Nope, stuff like that is strictly forbidden.

Regards,
Christian.

> 
> We can argue if those cases should not report revocable either, but at
> least this will work "correctly" even if it takes a huge amount of
> time.
> 
> I wouldn't mind seeing a shorter timeout and print on the fence too
> just in case.
> 
> Jason


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 5/8] dma-buf: Make .invalidate_mapping() truly optional
  2026-01-30  8:30   ` Christian König
@ 2026-01-30 12:55     ` Leon Romanovsky
  0 siblings, 0 replies; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-30 12:55 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel, amd-gfx, virtualization, intel-xe, linux-rdma,
	iommu, kvm

On Fri, Jan 30, 2026 at 09:30:05AM +0100, Christian König wrote:
> On 1/24/26 20:14, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > The .invalidate_mapping() callback is documented as optional, yet it
> > effectively became mandatory whenever importer_ops were provided. This
> > led to cases where RDMA non-ODP code had to supply an empty stub.
> > 
> > Relax the checks in the dma-buf core so the callback can be omitted,
> > allowing RDMA code to drop the unnecessary function.
> > 
> > Removing the stub allows the next patch to tell that RDMA does not support
> > .invalidate_mapping() by checking for a NULL op.
> > 
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/dma-buf/dma-buf.c             |  6 ++----
> >  drivers/infiniband/core/umem_dmabuf.c | 13 -------------
> >  2 files changed, 2 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index cd68c1c0bfd7..1629312d364a 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -947,9 +947,6 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
> >  	if (WARN_ON(!dmabuf || !dev))
> >  		return ERR_PTR(-EINVAL);
> >  
> > -	if (WARN_ON(importer_ops && !importer_ops->invalidate_mappings))
> > -		return ERR_PTR(-EINVAL);
> > -
> >  	attach = kzalloc(sizeof(*attach), GFP_KERNEL);
> >  	if (!attach)
> >  		return ERR_PTR(-ENOMEM);
> > @@ -1260,7 +1257,8 @@ void dma_buf_invalidate_mappings(struct dma_buf *dmabuf)
> >  	dma_resv_assert_held(dmabuf->resv);
> >  
> >  	list_for_each_entry(attach, &dmabuf->attachments, node)
> > -		if (attach->importer_ops)
> > +		if (attach->importer_ops &&
> > +		    attach->importer_ops->invalidate_mappings)
> >  			attach->importer_ops->invalidate_mappings(attach);
> >  }
> >  EXPORT_SYMBOL_NS_GPL(dma_buf_invalidate_mappings, "DMA_BUF");
> > diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
> > index d77a739cfe7a..256e34c15e6b 100644
> > --- a/drivers/infiniband/core/umem_dmabuf.c
> > +++ b/drivers/infiniband/core/umem_dmabuf.c
> > @@ -129,9 +129,6 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
> >  	if (check_add_overflow(offset, (unsigned long)size, &end))
> >  		return ret;
> >  
> > -	if (unlikely(!ops || !ops->invalidate_mappings))
> 
> You should probably keep "if (unlikely(!ops)).." here.

The unlikely is useless in this path.

> 
> Apart from that the patch looks good to me.
> 
> Regards,
> Christian.
> 
> > -		return ret;
> > -
> >  	dmabuf = dma_buf_get(fd);
> >  	if (IS_ERR(dmabuf))
> >  		return ERR_CAST(dmabuf);
> > @@ -184,18 +181,8 @@ struct ib_umem_dmabuf *ib_umem_dmabuf_get(struct ib_device *device,
> >  }
> >  EXPORT_SYMBOL(ib_umem_dmabuf_get);
> >  
> > -static void
> > -ib_umem_dmabuf_unsupported_move_notify(struct dma_buf_attachment *attach)
> > -{
> > -	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
> > -
> > -	ibdev_warn_ratelimited(umem_dmabuf->umem.ibdev,
> > -			       "Invalidate callback should not be called when memory is pinned\n");
> > -}
> > -
> >  static struct dma_buf_attach_ops ib_umem_dmabuf_attach_pinned_ops = {
> >  	.allow_peer2peer = true,
> > -	.invalidate_mappings = ib_umem_dmabuf_unsupported_move_notify,
> >  };
> >  
> >  struct ib_umem_dmabuf *
> > 
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-30  8:30   ` Christian König
@ 2026-01-30 13:01     ` Leon Romanovsky
  2026-01-30 13:21       ` Christian König
  0 siblings, 1 reply; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-30 13:01 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel, amd-gfx, virtualization, intel-xe, linux-rdma,
	iommu, kvm

On Fri, Jan 30, 2026 at 09:30:59AM +0100, Christian König wrote:
> On 1/24/26 20:14, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > dma-buf invalidation is handled asynchronously by the hardware, so VFIO
> > must wait until all affected objects have been fully invalidated.
> > 
> > In addition, the dma-buf exporter expects that all importers unmap any
> > buffers they previously mapped.
> > 
> > Fixes: 5d74781ebc86 ("vfio/pci: Add dma-buf export support for MMIO regions")
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_dmabuf.c | 71 ++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 68 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> > index d8ceafabef48..485515629fe4 100644
> > --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> > +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> > @@ -17,6 +17,8 @@ struct vfio_pci_dma_buf {
> >  	struct dma_buf_phys_vec *phys_vec;
> >  	struct p2pdma_provider *provider;
> >  	u32 nr_ranges;
> > +	struct kref kref;
> > +	struct completion comp;
> >  	u8 revoked : 1;
> >  };
> >  
> > @@ -44,27 +46,46 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
> >  	return 0;
> >  }
> >  
> > +static void vfio_pci_dma_buf_done(struct kref *kref)
> > +{
> > +	struct vfio_pci_dma_buf *priv =
> > +		container_of(kref, struct vfio_pci_dma_buf, kref);
> > +
> > +	complete(&priv->comp);
> > +}
> > +
> >  static struct sg_table *
> >  vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
> >  		     enum dma_data_direction dir)
> >  {
> >  	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> > +	struct sg_table *ret;
> >  
> >  	dma_resv_assert_held(priv->dmabuf->resv);
> >  
> >  	if (priv->revoked)
> >  		return ERR_PTR(-ENODEV);
> >  
> > -	return dma_buf_phys_vec_to_sgt(attachment, priv->provider,
> > -				       priv->phys_vec, priv->nr_ranges,
> > -				       priv->size, dir);
> > +	ret = dma_buf_phys_vec_to_sgt(attachment, priv->provider,
> > +				      priv->phys_vec, priv->nr_ranges,
> > +				      priv->size, dir);
> > +	if (IS_ERR(ret))
> > +		return ret;
> > +
> > +	kref_get(&priv->kref);
> > +	return ret;
> >  }
> >  
> >  static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
> >  				   struct sg_table *sgt,
> >  				   enum dma_data_direction dir)
> >  {
> > +	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> > +
> > +	dma_resv_assert_held(priv->dmabuf->resv);
> > +
> >  	dma_buf_free_sgt(attachment, sgt, dir);
> > +	kref_put(&priv->kref, vfio_pci_dma_buf_done);
> >  }
> >  
> >  static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
> > @@ -287,6 +308,9 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> >  		goto err_dev_put;
> >  	}
> >  
> > +	kref_init(&priv->kref);
> > +	init_completion(&priv->comp);
> > +
> >  	/* dma_buf_put() now frees priv */
> >  	INIT_LIST_HEAD(&priv->dmabufs_elm);
> >  	down_write(&vdev->memory_lock);
> > @@ -326,6 +350,8 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
> >  	lockdep_assert_held_write(&vdev->memory_lock);
> >  
> >  	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> > +		unsigned long wait;
> > +
> >  		if (!get_file_active(&priv->dmabuf->file))
> >  			continue;
> >  
> > @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
> >  			dma_resv_lock(priv->dmabuf->resv, NULL);
> >  			priv->revoked = revoked;
> >  			dma_buf_invalidate_mappings(priv->dmabuf);
> > +			dma_resv_wait_timeout(priv->dmabuf->resv,
> > +					      DMA_RESV_USAGE_BOOKKEEP, false,
> > +					      MAX_SCHEDULE_TIMEOUT);
> >  			dma_resv_unlock(priv->dmabuf->resv);
> > +			if (revoked) {
> > +				kref_put(&priv->kref, vfio_pci_dma_buf_done);
> > +				/* Wait until all DMA unmaps have completed. */
> > +				wait = wait_for_completion_timeout(
> > +					&priv->comp, secs_to_jiffies(1));
> > +				/*
> > +				 * If you see this warning, it means the
> > +				 * importer didn't call unmap in response to
> > +				 * dma_buf_invalidate_mappings(), which is not
> > +				 * allowed.
> > +				 */
> > +				WARN(!wait,
> > +				     "Timed out waiting for DMABUF unmap, importer has a broken invalidate_mapping()");
> 
> You can use revoke for resource management, for example to re-use the backing store for something else.
> 
> But it is mandatory that you keep the mapping around indefinitely until the importer closes it.
> 
> Before that, you can't do things like runtime PM or device removal, or anything else that would make the DMA addresses invalid.
> 
> As far as I can see vfio_pci_dma_buf_move() is used exactly for that use case, so this is an absolutely clear NAK from my side for this approach.
> 
> You can either split up the functionality of vfio_pci_dma_buf_move() into vfio_pci_dma_buf_invalidate_mappings() and vfio_pci_dma_buf_flush() and then call the latter whenever necessary, or you keep it in one function and block everybody until the importer has dropped all mappings.

No problem, I can change it to be:

diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
index d087d018d547..53772a84c93b 100644
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -357,23 +357,7 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
                        dma_resv_unlock(priv->dmabuf->resv);
                        if (revoked) {
                                kref_put(&priv->kref, vfio_pci_dma_buf_done);
-                               /*
-                                * Let's wait for 1 second till all DMA unmap
-                                * are completed. It is supposed to catch dma-buf
-                                * importers which lied about their support
-                                * of dmabuf revoke. See dma_buf_invalidate_mappings()
-                                * for the expected behaviour.
-                                */
-                               wait = wait_for_completion_timeout(
-                                       &priv->comp, secs_to_jiffies(1));
-                               /*
-                                * If you see this WARN_ON, it means that
-                                * importer didn't call unmap in response to
-                                * dma_buf_invalidate_mappings() which is not
-                                * allowed.
-                                */
-                               WARN(!wait,
-                                    "Timed out waiting for DMABUF unmap, importer has a broken invalidate_mapping()");
+                               wait_for_completion(&priv->comp);
                        } else {
                                /*
                                 * Kref is initialized

Do you want me to send v6?

Thanks

> 
> > +			} else {
> > +				/*
> > +				 * Kref is initialized again because, when revoke
> > +				 * was performed, the reference counter was decreased
> > +				 * to zero to trigger completion.
> > +				 */
> > +				kref_init(&priv->kref);
> > +				/*
> > +				 * There is no need to wait as no mapping was
> > +				 * performed when the previous status was
> > +				 * priv->revoked == true.
> > +				 */
> > +				reinit_completion(&priv->comp);
> > +			}
> >  		}
> >  		fput(priv->dmabuf->file);
> 
> This is also extremely questionable. Why doesn't the dmabuf have a reference while on the linked list?
> 
> Regards,
> Christian.
> 
> >  	}
> > @@ -346,6 +402,8 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
> >  
> >  	down_write(&vdev->memory_lock);
> >  	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> > +		unsigned long wait;
> > +
> >  		if (!get_file_active(&priv->dmabuf->file))
> >  			continue;
> >  
> > @@ -354,7 +412,14 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
> >  		priv->vdev = NULL;
> >  		priv->revoked = true;
> >  		dma_buf_invalidate_mappings(priv->dmabuf);
> > +		dma_resv_wait_timeout(priv->dmabuf->resv,
> > +				      DMA_RESV_USAGE_BOOKKEEP, false,
> > +				      MAX_SCHEDULE_TIMEOUT);
> >  		dma_resv_unlock(priv->dmabuf->resv);
> > +		kref_put(&priv->kref, vfio_pci_dma_buf_done);
> > +		wait = wait_for_completion_timeout(&priv->comp,
> > +						   secs_to_jiffies(1));
> > +		WARN_ON(!wait);
> >  		vfio_device_put_registration(&vdev->vdev);
> >  		fput(priv->dmabuf->file);
> >  	}
> > 
> 
> 

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-30 13:01     ` Leon Romanovsky
@ 2026-01-30 13:21       ` Christian König
  2026-01-30 13:31         ` Leon Romanovsky
  2026-01-30 13:56         ` Jason Gunthorpe
  0 siblings, 2 replies; 51+ messages in thread
From: Christian König @ 2026-01-30 13:21 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Sumit Semwal, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel, amd-gfx, virtualization, intel-xe, linux-rdma,
	iommu, kvm

On 1/30/26 14:01, Leon Romanovsky wrote:
> On Fri, Jan 30, 2026 at 09:30:59AM +0100, Christian König wrote:
>> On 1/24/26 20:14, Leon Romanovsky wrote:
>>> From: Leon Romanovsky <leonro@nvidia.com>
>>>
>>> dma-buf invalidation is handled asynchronously by the hardware, so VFIO
>>> must wait until all affected objects have been fully invalidated.
>>>
>>> In addition, the dma-buf exporter is expecting that all importers unmap any
>>> buffers they previously mapped.
>>>
>>> Fixes: 5d74781ebc86 ("vfio/pci: Add dma-buf export support for MMIO regions")
>>> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>>> ---
>>>  drivers/vfio/pci/vfio_pci_dmabuf.c | 71 ++++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 68 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
>>> index d8ceafabef48..485515629fe4 100644
>>> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
>>> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
>>> @@ -17,6 +17,8 @@ struct vfio_pci_dma_buf {
>>>  	struct dma_buf_phys_vec *phys_vec;
>>>  	struct p2pdma_provider *provider;
>>>  	u32 nr_ranges;
>>> +	struct kref kref;
>>> +	struct completion comp;
>>>  	u8 revoked : 1;
>>>  };
>>>  
>>> @@ -44,27 +46,46 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
>>>  	return 0;
>>>  }
>>>  
>>> +static void vfio_pci_dma_buf_done(struct kref *kref)
>>> +{
>>> +	struct vfio_pci_dma_buf *priv =
>>> +		container_of(kref, struct vfio_pci_dma_buf, kref);
>>> +
>>> +	complete(&priv->comp);
>>> +}
>>> +
>>>  static struct sg_table *
>>>  vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
>>>  		     enum dma_data_direction dir)
>>>  {
>>>  	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
>>> +	struct sg_table *ret;
>>>  
>>>  	dma_resv_assert_held(priv->dmabuf->resv);
>>>  
>>>  	if (priv->revoked)
>>>  		return ERR_PTR(-ENODEV);
>>>  
>>> -	return dma_buf_phys_vec_to_sgt(attachment, priv->provider,
>>> -				       priv->phys_vec, priv->nr_ranges,
>>> -				       priv->size, dir);
>>> +	ret = dma_buf_phys_vec_to_sgt(attachment, priv->provider,
>>> +				      priv->phys_vec, priv->nr_ranges,
>>> +				      priv->size, dir);
>>> +	if (IS_ERR(ret))
>>> +		return ret;
>>> +
>>> +	kref_get(&priv->kref);
>>> +	return ret;
>>>  }
>>>  
>>>  static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
>>>  				   struct sg_table *sgt,
>>>  				   enum dma_data_direction dir)
>>>  {
>>> +	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
>>> +
>>> +	dma_resv_assert_held(priv->dmabuf->resv);
>>> +
>>>  	dma_buf_free_sgt(attachment, sgt, dir);
>>> +	kref_put(&priv->kref, vfio_pci_dma_buf_done);
>>>  }
>>>  
>>>  static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
>>> @@ -287,6 +308,9 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
>>>  		goto err_dev_put;
>>>  	}
>>>  
>>> +	kref_init(&priv->kref);
>>> +	init_completion(&priv->comp);
>>> +
>>>  	/* dma_buf_put() now frees priv */
>>>  	INIT_LIST_HEAD(&priv->dmabufs_elm);
>>>  	down_write(&vdev->memory_lock);
>>> @@ -326,6 +350,8 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
>>>  	lockdep_assert_held_write(&vdev->memory_lock);
>>>  
>>>  	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
>>> +		unsigned long wait;
>>> +
>>>  		if (!get_file_active(&priv->dmabuf->file))
>>>  			continue;
>>>  
>>> @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
>>>  			dma_resv_lock(priv->dmabuf->resv, NULL);
>>>  			priv->revoked = revoked;
>>>  			dma_buf_invalidate_mappings(priv->dmabuf);
>>> +			dma_resv_wait_timeout(priv->dmabuf->resv,
>>> +					      DMA_RESV_USAGE_BOOKKEEP, false,
>>> +					      MAX_SCHEDULE_TIMEOUT);
>>>  			dma_resv_unlock(priv->dmabuf->resv);
>>> +			if (revoked) {
>>> +				kref_put(&priv->kref, vfio_pci_dma_buf_done);
>>> +				/* Let's wait until all DMA unmaps have completed. */
>>> +				wait = wait_for_completion_timeout(
>>> +					&priv->comp, secs_to_jiffies(1));
>>> +				/*
>>> +				 * If you see this WARN_ON, it means that
>>> +				 * the importer didn't call unmap in response to
>>> +				 * dma_buf_invalidate_mappings(), which is not
>>> +				 * allowed.
>>> +				 */
>>> +				WARN(!wait,
>>> +				     "Timed out waiting for DMABUF unmap, importer has a broken invalidate_mapping()");
>>
>> You can do the revoke to do your resource management, for example re-use the backing store for something else.
>>
>> But it is mandatory that you keep the mapping around indefinitely until the importer closes it.
>>
>> Before that you can't do things like runtime PM or remove or anything which would make the DMA addresses invalid.
>>
>> As far as I can see vfio_pci_dma_buf_move() is used exactly for that use case so this here is an absolutely clear NAK from my side for this approach.
>>
>> You can either split up the functionality of vfio_pci_dma_buf_move() into vfio_pci_dma_buf_invalidate_mappings() and vfio_pci_dma_buf_flush() and then call the later whenever necessary or you keep it in one function and block everybody until the importer has dropped all mappings.
> 
> No problem, I can change it to be:
> 
> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> index d087d018d547..53772a84c93b 100644
> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> @@ -357,23 +357,7 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
>                         dma_resv_unlock(priv->dmabuf->resv);
>                         if (revoked) {
>                                 kref_put(&priv->kref, vfio_pci_dma_buf_done);
> -                               /*
> -                                * Let's wait for 1 second until all DMA unmaps
> -                                * have completed. It is supposed to catch dma-buf
> -                                * importers that lied about their support
> -                                * of dmabuf revoke. See dma_buf_invalidate_mappings()
> -                                * for the expected behaviour.
> -                                */
> -                               wait = wait_for_completion_timeout(
> -                                       &priv->comp, secs_to_jiffies(1));
> -                               /*
> -                                * If you see this WARN_ON, it means that
> -                                * the importer didn't call unmap in response to
> -                                * dma_buf_invalidate_mappings(), which is not
> -                                * allowed.
> -                                */
> -                               WARN(!wait,
> -                                    "Timed out waiting for DMABUF unmap, importer has a broken invalidate_mapping()");
> +                               wait_for_completion(&priv->comp);
>                         } else {
>                                 /*
>                                  * Kref is initialized
> 
> Do you want me to send v6?

That would work for me.

Question is if you really want to do it this way? See usually exporters try to avoid blocking such functions.

What exporters usually do instead is to grab references, e.g. call pm_runtime_get_sync() when either a DMA-buf, a DMA-buf attachment or in your case here a mapping of this attachment is made.

But all of this is just a suggestion, if you are fine with blocking then feel free to add my rb.

Regards,
Christian.

> 
> Thanks
> 
>>
>>> +			} else {
>>> +				/*
>>> +				 * Kref is initialized again because, when revoke
>>> +				 * was performed, the reference counter was decreased
>>> +				 * to zero to trigger completion.
>>> +				 */
>>> +				kref_init(&priv->kref);
>>> +				/*
>>> +				 * There is no need to wait as no mapping was
>>> +				 * performed when the previous status was
>>> +				 * priv->revoked == true.
>>> +				 */
>>> +				reinit_completion(&priv->comp);
>>> +			}
>>>  		}
>>>  		fput(priv->dmabuf->file);
>>
>> This is also extremely questionable. Why doesn't the dmabuf have a reference while on the linked list?
>>
>> Regards,
>> Christian.
>>
>>>  	}
>>> @@ -346,6 +402,8 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
>>>  
>>>  	down_write(&vdev->memory_lock);
>>>  	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
>>> +		unsigned long wait;
>>> +
>>>  		if (!get_file_active(&priv->dmabuf->file))
>>>  			continue;
>>>  
>>> @@ -354,7 +412,14 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
>>>  		priv->vdev = NULL;
>>>  		priv->revoked = true;
>>>  		dma_buf_invalidate_mappings(priv->dmabuf);
>>> +		dma_resv_wait_timeout(priv->dmabuf->resv,
>>> +				      DMA_RESV_USAGE_BOOKKEEP, false,
>>> +				      MAX_SCHEDULE_TIMEOUT);
>>>  		dma_resv_unlock(priv->dmabuf->resv);
>>> +		kref_put(&priv->kref, vfio_pci_dma_buf_done);
>>> +		wait = wait_for_completion_timeout(&priv->comp,
>>> +						   secs_to_jiffies(1));
>>> +		WARN_ON(!wait);
>>>  		vfio_device_put_registration(&vdev->vdev);
>>>  		fput(priv->dmabuf->file);
>>>  	}
>>>
>>
>>


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-30 13:21       ` Christian König
@ 2026-01-30 13:31         ` Leon Romanovsky
  2026-01-30 13:56         ` Jason Gunthorpe
  1 sibling, 0 replies; 51+ messages in thread
From: Leon Romanovsky @ 2026-01-30 13:31 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Alex Deucher, David Airlie, Simona Vetter,
	Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh, Chia-I Wu,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi,
	Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Felix Kuehling, Alex Williamson, Ankit Agrawal,
	Vivek Kasireddy, linux-media, dri-devel, linaro-mm-sig,
	linux-kernel, amd-gfx, virtualization, intel-xe, linux-rdma,
	iommu, kvm

On Fri, Jan 30, 2026 at 02:21:08PM +0100, Christian König wrote:
> On 1/30/26 14:01, Leon Romanovsky wrote:
> > On Fri, Jan 30, 2026 at 09:30:59AM +0100, Christian König wrote:
> >> On 1/24/26 20:14, Leon Romanovsky wrote:
> >>> From: Leon Romanovsky <leonro@nvidia.com>
> >>>
> >>> dma-buf invalidation is handled asynchronously by the hardware, so VFIO
> >>> must wait until all affected objects have been fully invalidated.
> >>>
> >>> In addition, the dma-buf exporter is expecting that all importers unmap any
> >>> buffers they previously mapped.
> >>>
> >>> Fixes: 5d74781ebc86 ("vfio/pci: Add dma-buf export support for MMIO regions")
> >>> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> >>> ---
> >>>  drivers/vfio/pci/vfio_pci_dmabuf.c | 71 ++++++++++++++++++++++++++++++++++++--
> >>>  1 file changed, 68 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> >>> index d8ceafabef48..485515629fe4 100644
> >>> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> >>> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> >>> @@ -17,6 +17,8 @@ struct vfio_pci_dma_buf {
> >>>  	struct dma_buf_phys_vec *phys_vec;
> >>>  	struct p2pdma_provider *provider;
> >>>  	u32 nr_ranges;
> >>> +	struct kref kref;
> >>> +	struct completion comp;
> >>>  	u8 revoked : 1;
> >>>  };
> >>>  
> >>> @@ -44,27 +46,46 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
> >>>  	return 0;
> >>>  }
> >>>  
> >>> +static void vfio_pci_dma_buf_done(struct kref *kref)
> >>> +{
> >>> +	struct vfio_pci_dma_buf *priv =
> >>> +		container_of(kref, struct vfio_pci_dma_buf, kref);
> >>> +
> >>> +	complete(&priv->comp);
> >>> +}
> >>> +
> >>>  static struct sg_table *
> >>>  vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
> >>>  		     enum dma_data_direction dir)
> >>>  {
> >>>  	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> >>> +	struct sg_table *ret;
> >>>  
> >>>  	dma_resv_assert_held(priv->dmabuf->resv);
> >>>  
> >>>  	if (priv->revoked)
> >>>  		return ERR_PTR(-ENODEV);
> >>>  
> >>> -	return dma_buf_phys_vec_to_sgt(attachment, priv->provider,
> >>> -				       priv->phys_vec, priv->nr_ranges,
> >>> -				       priv->size, dir);
> >>> +	ret = dma_buf_phys_vec_to_sgt(attachment, priv->provider,
> >>> +				      priv->phys_vec, priv->nr_ranges,
> >>> +				      priv->size, dir);
> >>> +	if (IS_ERR(ret))
> >>> +		return ret;
> >>> +
> >>> +	kref_get(&priv->kref);
> >>> +	return ret;
> >>>  }
> >>>  
> >>>  static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
> >>>  				   struct sg_table *sgt,
> >>>  				   enum dma_data_direction dir)
> >>>  {
> >>> +	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> >>> +
> >>> +	dma_resv_assert_held(priv->dmabuf->resv);
> >>> +
> >>>  	dma_buf_free_sgt(attachment, sgt, dir);
> >>> +	kref_put(&priv->kref, vfio_pci_dma_buf_done);
> >>>  }
> >>>  
> >>>  static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
> >>> @@ -287,6 +308,9 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> >>>  		goto err_dev_put;
> >>>  	}
> >>>  
> >>> +	kref_init(&priv->kref);
> >>> +	init_completion(&priv->comp);
> >>> +
> >>>  	/* dma_buf_put() now frees priv */
> >>>  	INIT_LIST_HEAD(&priv->dmabufs_elm);
> >>>  	down_write(&vdev->memory_lock);
> >>> @@ -326,6 +350,8 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
> >>>  	lockdep_assert_held_write(&vdev->memory_lock);
> >>>  
> >>>  	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> >>> +		unsigned long wait;
> >>> +
> >>>  		if (!get_file_active(&priv->dmabuf->file))
> >>>  			continue;
> >>>  
> >>> @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
> >>>  			dma_resv_lock(priv->dmabuf->resv, NULL);
> >>>  			priv->revoked = revoked;
> >>>  			dma_buf_invalidate_mappings(priv->dmabuf);
> >>> +			dma_resv_wait_timeout(priv->dmabuf->resv,
> >>> +					      DMA_RESV_USAGE_BOOKKEEP, false,
> >>> +					      MAX_SCHEDULE_TIMEOUT);
> >>>  			dma_resv_unlock(priv->dmabuf->resv);
> >>> +			if (revoked) {
> >>> +				kref_put(&priv->kref, vfio_pci_dma_buf_done);
> >>> +				/* Let's wait until all DMA unmaps have completed. */
> >>> +				wait = wait_for_completion_timeout(
> >>> +					&priv->comp, secs_to_jiffies(1));
> >>> +				/*
> >>> +				 * If you see this WARN_ON, it means that
> >>> +				 * the importer didn't call unmap in response to
> >>> +				 * dma_buf_invalidate_mappings(), which is not
> >>> +				 * allowed.
> >>> +				 */
> >>> +				WARN(!wait,
> >>> +				     "Timed out waiting for DMABUF unmap, importer has a broken invalidate_mapping()");
> >>
> >> You can do the revoke to do your resource management, for example re-use the backing store for something else.
> >>
> >> But it is mandatory that you keep the mapping around indefinitely until the importer closes it.
> >>
> >> Before that you can't do things like runtime PM or remove or anything which would make the DMA addresses invalid.
> >>
> >> As far as I can see vfio_pci_dma_buf_move() is used exactly for that use case so this here is an absolutely clear NAK from my side for this approach.
> >>
> >> You can either split up the functionality of vfio_pci_dma_buf_move() into vfio_pci_dma_buf_invalidate_mappings() and vfio_pci_dma_buf_flush() and then call the later whenever necessary or you keep it in one function and block everybody until the importer has dropped all mappings.
> > 
> > No problem, I can change it to be:
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> > index d087d018d547..53772a84c93b 100644
> > --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> > +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> > @@ -357,23 +357,7 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
> >                         dma_resv_unlock(priv->dmabuf->resv);
> >                         if (revoked) {
> >                                 kref_put(&priv->kref, vfio_pci_dma_buf_done);
> > -                               /*
> > -                                * Let's wait for 1 second until all DMA unmaps
> > -                                * have completed. It is supposed to catch dma-buf
> > -                                * importers that lied about their support
> > -                                * of dmabuf revoke. See dma_buf_invalidate_mappings()
> > -                                * for the expected behaviour.
> > -                                */
> > -                               wait = wait_for_completion_timeout(
> > -                                       &priv->comp, secs_to_jiffies(1));
> > -                               /*
> > -                                * If you see this WARN_ON, it means that
> > -                                * the importer didn't call unmap in response to
> > -                                * dma_buf_invalidate_mappings(), which is not
> > -                                * allowed.
> > -                                */
> > -                               WARN(!wait,
> > -                                    "Timed out waiting for DMABUF unmap, importer has a broken invalidate_mapping()");
> > +                               wait_for_completion(&priv->comp);
> >                         } else {
> >                                 /*
> >                                  * Kref is initialized
> > 
> > Do you want me to send v6?
> 
> That would work for me.
> 
> Question is if you really want to do it this way? See usually exporters try to avoid blocking such functions.
> 
> What exporters usually do instead is to grab references, e.g. call pm_runtime_get_sync() when either a DMA-buf, a DMA-buf attachment or in your case here a mapping of this attachment is made.

I view this as an enhancement that can be addressed down the road.

> 
> But all of this is just a suggestion, if you are fine with blocking then feel free to add my rb.

It is fine for the initial version. We need to start somewhere.

Thanks

> 
> Regards,
> Christian.
> 
> > 
> > Thanks
> > 
> >>
> >>> +			} else {
> >>> +				/*
> >>> +				 * Kref is initialized again because, when revoke
> >>> +				 * was performed, the reference counter was decreased
> >>> +				 * to zero to trigger completion.
> >>> +				 */
> >>> +				kref_init(&priv->kref);
> >>> +				/*
> >>> +				 * There is no need to wait as no mapping was
> >>> +				 * performed when the previous status was
> >>> +				 * priv->revoked == true.
> >>> +				 */
> >>> +				reinit_completion(&priv->comp);
> >>> +			}
> >>>  		}
> >>>  		fput(priv->dmabuf->file);
> >>
> >> This is also extremely questionable. Why doesn't the dmabuf have a reference while on the linked list?
> >>
> >> Regards,
> >> Christian.
> >>
> >>>  	}
> >>> @@ -346,6 +402,8 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
> >>>  
> >>>  	down_write(&vdev->memory_lock);
> >>>  	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> >>> +		unsigned long wait;
> >>> +
> >>>  		if (!get_file_active(&priv->dmabuf->file))
> >>>  			continue;
> >>>  
> >>> @@ -354,7 +412,14 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
> >>>  		priv->vdev = NULL;
> >>>  		priv->revoked = true;
> >>>  		dma_buf_invalidate_mappings(priv->dmabuf);
> >>> +		dma_resv_wait_timeout(priv->dmabuf->resv,
> >>> +				      DMA_RESV_USAGE_BOOKKEEP, false,
> >>> +				      MAX_SCHEDULE_TIMEOUT);
> >>>  		dma_resv_unlock(priv->dmabuf->resv);
> >>> +		kref_put(&priv->kref, vfio_pci_dma_buf_done);
> >>> +		wait = wait_for_completion_timeout(&priv->comp,
> >>> +						   secs_to_jiffies(1));
> >>> +		WARN_ON(!wait);
> >>>  		vfio_device_put_registration(&vdev->vdev);
> >>>  		fput(priv->dmabuf->file);
> >>>  	}
> >>>
> >>
> >>
> 
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-30 13:21       ` Christian König
  2026-01-30 13:31         ` Leon Romanovsky
@ 2026-01-30 13:56         ` Jason Gunthorpe
  2026-01-30 14:11           ` Christian König
  1 sibling, 1 reply; 51+ messages in thread
From: Jason Gunthorpe @ 2026-01-30 13:56 UTC (permalink / raw)
  To: Christian König
  Cc: Leon Romanovsky, Sumit Semwal, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Kevin Tian,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Vivek Kasireddy, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, amd-gfx, virtualization,
	intel-xe, linux-rdma, iommu, kvm

On Fri, Jan 30, 2026 at 02:21:08PM +0100, Christian König wrote:

> That would work for me.
> 
> Question is if you really want to do it this way? See usually
> exporters try to avoid blocking such functions.

Yes, it has to be this way: revoke is a synchronous, user-space-triggered
operation around things like FLR or device close. We can't defer it into
some background operation like runtime PM.

> >>>  		}
> >>>  		fput(priv->dmabuf->file);
> >>
> >> This is also extremely questionable. Why doesn't the dmabuf have
> >> a reference while on the linked list?

If we hold a refcount while on the list then the FD can never be
closed.

There is locking protecting the list so that it is safe and close
continues to work right.

Jason

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 6/8] dma-buf: Add dma_buf_attach_revocable()
  2026-01-30  8:43   ` Christian König
@ 2026-01-30 14:00     ` Jason Gunthorpe
  0 siblings, 0 replies; 51+ messages in thread
From: Jason Gunthorpe @ 2026-01-30 14:00 UTC (permalink / raw)
  To: Christian König
  Cc: Leon Romanovsky, Sumit Semwal, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Kevin Tian,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Vivek Kasireddy, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, amd-gfx, virtualization,
	intel-xe, linux-rdma, iommu, kvm

On Fri, Jan 30, 2026 at 09:43:22AM +0100, Christian König wrote:
> On 1/24/26 20:14, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Some exporters need a flow to synchronously revoke access to the DMA-buf
> > by importers. Once revoke is completed the importer is not permitted to
> > touch the memory otherwise they may get IOMMU faults, AERs, or worse.
> 
> That approach is seriously not going to fly.
> 
> You can use the invalidate_mappings approach to trigger the importer
> to give back the mapping, but when the mapping is really given back
> is still completely on the importer side.

Yes, and that is what this is all doing, there is the wait for the
importer's unmap to happen in the sequence.

> In other words you can't do the shot down revoke semantics you are
> trying to establish here.

All this is doing is saying that if dma_buf_attach_revocable() == true,
then the importer will call unmap within a bounded time after
dma_buf_invalidate_mappings().

That's it. If the importing driver doesn't want to do that then it
should make dma_buf_attach_revocable()=false.

VFIO/etc only want to interwork with importers that can do this.

Jason

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-30 13:56         ` Jason Gunthorpe
@ 2026-01-30 14:11           ` Christian König
  2026-01-30 14:44             ` Jason Gunthorpe
  0 siblings, 1 reply; 51+ messages in thread
From: Christian König @ 2026-01-30 14:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Sumit Semwal, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Kevin Tian,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Vivek Kasireddy, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, amd-gfx, virtualization,
	intel-xe, linux-rdma, iommu, kvm

On 1/30/26 14:56, Jason Gunthorpe wrote:
> On Fri, Jan 30, 2026 at 02:21:08PM +0100, Christian König wrote:
> 
>> That would work for me.
>>
>> Question is if you really want to do it this way? See usually
>> exporters try to avoid blocking such functions.
> 
> Yes, it has to be this way, revoke is a synchronous user space
> triggered operation around things like FLR or device close. We can't
> defer it into some background operation like pm.

Yeah, but you only need that in a couple of use cases and not all.

Especially when the device looks idle, exporters usually don't want runtime PM to kick in and try to suspend the device while importers are still using it.

> 
>>>>>  		}
>>>>>  		fput(priv->dmabuf->file);
>>>>
>>>> This is also extremely questionable. Why doesn't the dmabuf have
>>>> a reference while on the linked list?
> 
> If we hold a refcount while on the list then the FD can never be
> closed.
> 
> There is locking protecting the list so that it is safe and close
> continues to work right.

Ah! You use get_file_active(), yeah that should work.

Thanks,
Christian.


> 
> Jason


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-30 14:11           ` Christian König
@ 2026-01-30 14:44             ` Jason Gunthorpe
  2026-02-02  8:42               ` Christian König
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Gunthorpe @ 2026-01-30 14:44 UTC (permalink / raw)
  To: Christian König
  Cc: Leon Romanovsky, Sumit Semwal, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Kevin Tian,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Vivek Kasireddy, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, amd-gfx, virtualization,
	intel-xe, linux-rdma, iommu, kvm

On Fri, Jan 30, 2026 at 03:11:48PM +0100, Christian König wrote:
> On 1/30/26 14:56, Jason Gunthorpe wrote:
> > On Fri, Jan 30, 2026 at 02:21:08PM +0100, Christian König wrote:
> > 
> >> That would work for me.
> >>
> >> Question is if you really want to do it this way? See usually
> >> exporters try to avoid blocking such functions.
> > 
> > Yes, it has to be this way, revoke is a synchronous user space
> > triggered operation around things like FLR or device close. We can't
> > defer it into some background operation like pm.
> 
> Yeah, but you only need that in a couple of use cases and not all.

Not all, that is why the dma_buf_attach_revocable() is there to
distinguish this case from others.

Jason

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-01-30 14:44             ` Jason Gunthorpe
@ 2026-02-02  8:42               ` Christian König
  2026-02-02 15:12                 ` Jason Gunthorpe
  0 siblings, 1 reply; 51+ messages in thread
From: Christian König @ 2026-02-02  8:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Sumit Semwal, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Kevin Tian,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Vivek Kasireddy, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, amd-gfx, virtualization,
	intel-xe, linux-rdma, iommu, kvm

On 1/30/26 15:44, Jason Gunthorpe wrote:
> On Fri, Jan 30, 2026 at 03:11:48PM +0100, Christian König wrote:
>> On 1/30/26 14:56, Jason Gunthorpe wrote:
>>> On Fri, Jan 30, 2026 at 02:21:08PM +0100, Christian König wrote:
>>>
>>>> That would work for me.
>>>>
>>>> Question is if you really want to do it this way? See usually
>>>> exporters try to avoid blocking such functions.
>>>
>>> Yes, it has to be this way, revoke is a synchronous user space
>>> triggered operation around things like FLR or device close. We can't
>>> defer it into some background operation like pm.
>>
>> Yeah, but you only need that in a couple of use cases and not all.
> 
> Not all, that is why the dma_buf_attach_revocable() is there to
> distinguish this case from others.

No, no that's not what I mean.

See, on the one hand you have runtime PM, which automatically shuts down your device some time after the last user stops using it.

Then on the other hand you have an intentional revoke triggered by userspace.

As far as I can tell from the code, both are currently handled the same way, and that doesn't sound correct to me.

Runtime PM should *not* trigger automatically when there are still mappings or even DMA-bufs in existence for the VFIO device.

Regards,
Christian.

> 
> Jason



* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-02-02  8:42               ` Christian König
@ 2026-02-02 15:12                 ` Jason Gunthorpe
  2026-02-02 15:21                   ` Christian König
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Gunthorpe @ 2026-02-02 15:12 UTC (permalink / raw)
  To: Christian König
  Cc: Leon Romanovsky, Sumit Semwal, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Kevin Tian,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Vivek Kasireddy, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, amd-gfx, virtualization,
	intel-xe, linux-rdma, iommu, kvm

On Mon, Feb 02, 2026 at 09:42:22AM +0100, Christian König wrote:
> On 1/30/26 15:44, Jason Gunthorpe wrote:
> > On Fri, Jan 30, 2026 at 03:11:48PM +0100, Christian König wrote:
> >> On 1/30/26 14:56, Jason Gunthorpe wrote:
> >>> On Fri, Jan 30, 2026 at 02:21:08PM +0100, Christian König wrote:
> >>>
> >>>> That would work for me.
> >>>>
> >>>> Question is if you really want to do it this way? See usually
> >>>> exporters try to avoid blocking such functions.
> >>>
> >>> Yes, it has to be this way, revoke is a synchronous user space
> >>> triggered operation around things like FLR or device close. We can't
> >>> defer it into some background operation like pm.
> >>
> >> Yeah, but you only need that in a couple of use cases and not all.
> > 
> > Not all, that is why the dma_buf_attach_revocable() is there to
> > distinguish this case from others.
> 
> No, no that's not what I mean.
> 
> See on the one hand you have runtime PM which automatically shuts
> down your device after some time when the last user stops using it.
> 
> Then on the other hand you have an intentional revoke triggered by
> userspace.
> 
> As far as I can tell from the code, both are currently handled the
> same way, and that doesn't sound correct to me.
> 
> Runtime PM should *not* trigger automatically when there are still
> mappings or even DMA-bufs in existence for the VFIO device.

I'm a little confused why we are talking about runtime PM - are you
pointing out an issue in VFIO today where its PM support is not good?

I admit I don't know a lot about VFIO PM support. Though I thought in
the VFIO case PM was actually under userspace control as generally the
PM control is delegated to the VM.

Through that lens, what is happening here is correct. If the VM
requests to shut down VFIO PM (through a hypervisor vfio ioctl) then
we do want to revoke the DMABUF so that the VM can't trigger an AER/etc
by trying to access the sleeping PCI device.

I don't think VFIO uses automatic PM on a timer; that doesn't make
sense for its programming model.

Jason


* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-02-02 15:12                 ` Jason Gunthorpe
@ 2026-02-02 15:21                   ` Christian König
  2026-02-02 15:55                     ` Jason Gunthorpe
  0 siblings, 1 reply; 51+ messages in thread
From: Christian König @ 2026-02-02 15:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Sumit Semwal, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Kevin Tian,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Alex Williamson, Ankit Agrawal, Vivek Kasireddy, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, amd-gfx, virtualization,
	intel-xe, linux-rdma, iommu, kvm

On 2/2/26 16:12, Jason Gunthorpe wrote:
> On Mon, Feb 02, 2026 at 09:42:22AM +0100, Christian König wrote:
>> On 1/30/26 15:44, Jason Gunthorpe wrote:
>>> On Fri, Jan 30, 2026 at 03:11:48PM +0100, Christian König wrote:
>>>> On 1/30/26 14:56, Jason Gunthorpe wrote:
>>>>> On Fri, Jan 30, 2026 at 02:21:08PM +0100, Christian König wrote:
>>>>>
>>>>>> That would work for me.
>>>>>>
>>>>>> Question is if you really want to do it this way? See usually
>>>>>> exporters try to avoid blocking such functions.
>>>>>
>>>>> Yes, it has to be this way, revoke is a synchronous user space
>>>>> triggered operation around things like FLR or device close. We can't
>>>>> defer it into some background operation like pm.
>>>>
>>>> Yeah, but you only need that in a couple of use cases and not all.
>>>
>>> Not all, that is why the dma_buf_attach_revocable() is there to
>>> distinguish this case from others.
>>
>> No, no that's not what I mean.
>>
>> See on the one hand you have runtime PM which automatically shuts
>> down your device after some time when the last user stops using it.
>>
>> Then on the other hand you have an intentional revoke triggered by
>> userspace.
>>
>> As far as I can tell from the code, both are currently handled the
>> same way, and that doesn't sound correct to me.
>>
>> Runtime PM should *not* trigger automatically when there are still
>> mappings or even DMA-bufs in existence for the VFIO device.
> 
> I'm a little confused why we are talking about runtime PM - are you
> pointing out an issue in VFIO today where its PM support is not good?

Exactly that, yes. This patch set here doesn't break it, but it most likely makes the effect quite a bit worse.

> I admit I don't know a lot about VFIO PM support. Though I thought in
> the VFIO case PM was actually under userspace control as generally the
> PM control is delegated to the VM.
> 
> Through that lens, what is happening here is correct. If the VM
> requests to shut down VFIO PM (through a hypervisor vfio ioctl) then
> we do want to revoke the DMABUF so that the VM can't trigger an AER/etc
> by trying to access the sleeping PCI device.
> 
> I don't think VFIO uses automatic PM on a timer; that doesn't make
> sense for its programming model.

From your description I agree that this doesn't make sense, but from the code it looks like exactly that is done.

Grep for pm_runtime_* in drivers/vfio/pci, but it could be that I misunderstood the functionality; I didn't spend too much time on it.

Just keep it in the back of your mind and maybe double check if that is actually the desired behavior.

Regards,
Christian.

> Jason



* Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete
  2026-02-02 15:21                   ` Christian König
@ 2026-02-02 15:55                     ` Jason Gunthorpe
  0 siblings, 0 replies; 51+ messages in thread
From: Jason Gunthorpe @ 2026-02-02 15:55 UTC (permalink / raw)
  To: Christian König, Alex Williamson
  Cc: Leon Romanovsky, Sumit Semwal, Alex Deucher, David Airlie,
	Simona Vetter, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Lucas De Marchi, Thomas Hellström, Rodrigo Vivi, Kevin Tian,
	Joerg Roedel, Will Deacon, Robin Murphy, Felix Kuehling,
	Ankit Agrawal, Vivek Kasireddy, linux-media, dri-devel,
	linaro-mm-sig, linux-kernel, amd-gfx, virtualization, intel-xe,
	linux-rdma, iommu, kvm

On Mon, Feb 02, 2026 at 04:21:50PM +0100, Christian König wrote:
> > I admit I don't know a lot about VFIO PM support. Though I thought in
> > the VFIO case PM was actually under userspace control as generally the
> > PM control is delegated to the VM.
> > 
> > Through that lens, what is happening here is correct. If the VM
> > requests to shut down VFIO PM (through a hypervisor vfio ioctl) then
> > we do want to revoke the DMABUF so that the VM can't trigger an AER/etc
> > by trying to access the sleeping PCI device.
> > 
> > I don't think VFIO uses automatic PM on a timer; that doesn't make
> > sense for its programming model.
> 
> From your description I agree that this doesn't make sense, but from
> the code it looks like exactly that is done.
> 
> Grep for pm_runtime_* in drivers/vfio/pci, but it could be that I
> misunderstood the functionality; I didn't spend too much time on
> it.
> 
> Just keep it in the back of your mind and maybe double check if that
> is actually the desired behavior.

I had a small conversation with AlexW and we think VFIO is OK (bugs
excluded).

The use of the PM timer is under userspace control, even though a
timer is still involved.

Basically there is a series of IOCTLs defined in VFIO, like
LOW_POWER_ENTRY, that all isolate the PCI device from userspace. The
mmap is blocked with SIGBUS and the DMABUFs are revoked.

The VFIO uAPI contract requires userspace to stop touching the device
immediately when using these IOCTLs. The PM timer may still be
involved, but it is an implementation detail.

Effectively VFIO has a device state "isolated" meaning that userspace
cannot access the MMIO, and it enters this state based on various
IOCTLs from userspace. It ties mmap and DMABUF together so that if
mmap SIGBUS's the DMABUF is unmapped.

I understand your remarks, and this use of PM is certainly nothing
that any other driver should copy, but it does make sense for VFIO. If
there are bugs/issues we would continue to keep the overall property
that SIGBUS == DMABUF unmapped and only adjust when that happens.

TBH, I don't think people use the VFIO PM feature very much.

Jason


end of thread, other threads:[~2026-02-02 15:55 UTC | newest]

Thread overview: 51+ messages
-- links below jump to the message on this page --
2026-01-24 19:14 [PATCH v5 0/8] dma-buf: Use revoke mechanism to invalidate shared buffers Leon Romanovsky
2026-01-24 19:14 ` [PATCH v5 1/8] dma-buf: Rename .move_notify() callback to a clearer identifier Leon Romanovsky
2026-01-24 19:14 ` [PATCH v5 2/8] dma-buf: Rename dma_buf_move_notify() to dma_buf_invalidate_mappings() Leon Romanovsky
2026-01-27  9:21   ` Christian König
2026-01-24 19:14 ` [PATCH v5 3/8] dma-buf: Always build with DMABUF_MOVE_NOTIFY Leon Romanovsky
2026-01-27  9:26   ` Christian König
2026-01-27  9:58     ` Leon Romanovsky
2026-01-27 10:02       ` Christian König
2026-01-27 11:42         ` Leon Romanovsky
2026-01-27 20:45           ` Leon Romanovsky
2026-01-30  7:00         ` Leon Romanovsky
2026-01-24 19:14 ` [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete Leon Romanovsky
2026-01-26 20:53   ` Pranjal Shrivastava
2026-01-27  8:58     ` Leon Romanovsky
2026-01-27 16:27       ` Jason Gunthorpe
2026-01-29  7:06         ` Tian, Kevin
2026-01-29  7:33           ` Leon Romanovsky
2026-01-29  8:13             ` Tian, Kevin
2026-01-29  8:41               ` Leon Romanovsky
2026-01-29 21:04                 ` Alex Williamson
2026-01-30  3:10                 ` Tian, Kevin
2026-01-29 14:58           ` Jason Gunthorpe
2026-01-30  3:12             ` Tian, Kevin
2026-01-30  5:43               ` Mauro Carvalho Chehab
2026-01-30  5:48                 ` Tian, Kevin
2026-01-30  8:46             ` Christian König
2026-01-30  8:30   ` Christian König
2026-01-30 13:01     ` Leon Romanovsky
2026-01-30 13:21       ` Christian König
2026-01-30 13:31         ` Leon Romanovsky
2026-01-30 13:56         ` Jason Gunthorpe
2026-01-30 14:11           ` Christian König
2026-01-30 14:44             ` Jason Gunthorpe
2026-02-02  8:42               ` Christian König
2026-02-02 15:12                 ` Jason Gunthorpe
2026-02-02 15:21                   ` Christian König
2026-02-02 15:55                     ` Jason Gunthorpe
2026-01-24 19:14 ` [PATCH v5 5/8] dma-buf: Make .invalidate_mapping() truly optional Leon Romanovsky
2026-01-30  8:30   ` Christian König
2026-01-30 12:55     ` Leon Romanovsky
2026-01-24 19:14 ` [PATCH v5 6/8] dma-buf: Add dma_buf_attach_revocable() Leon Romanovsky
2026-01-26 20:38   ` Pranjal Shrivastava
2026-01-26 21:01     ` Jason Gunthorpe
2026-01-30  8:43   ` Christian König
2026-01-30 14:00     ` Jason Gunthorpe
2026-01-24 19:14 ` [PATCH v5 7/8] vfio: Permit VFIO to work with pinned importers Leon Romanovsky
2026-01-29 21:04   ` Alex Williamson
2026-01-30  3:14   ` Tian, Kevin
2026-01-24 19:14 ` [PATCH v5 8/8] iommufd: Add dma_buf_pin() Leon Romanovsky
2026-01-29  7:08   ` Tian, Kevin
2026-01-30  0:17   ` Jason Gunthorpe
