All of lore.kernel.org
 help / color / mirror / Atom feed
From: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, Alexey Kardashevskiy <aik@amd.com>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Joerg Roedel <joro@8bytes.org>,
	Jonathan Cameron <jic23@kernel.org>,
	Kevin Tian <kevin.tian@intel.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	Samuel Ortiz <sameo@rivosinc.com>,
	Steven Price <steven.price@arm.com>,
	Suzuki K Poulose <Suzuki.Poulose@arm.com>,
	Will Deacon <will@kernel.org>,
	Xu Yilun <yilun.xu@linux.intel.com>,
	Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>,
	Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v4 1/4] iommufd/device: Associate a kvm pointer to iommufd_device
Date: Wed, 29 Apr 2026 19:22:46 +0530	[thread overview]
Message-ID: <yq5atsstzxe9.fsf@kernel.org> (raw)
In-Reply-To: <20260428125015.GB849557@ziepe.ca>


Jason Gunthorpe <jgg@ziepe.ca> writes:

> On Tue, Apr 28, 2026 at 05:31:04PM +0530, Aneesh Kumar K.V wrote:
>> > I thought we were trying to get away from struct kvm?
>> >
>> > https://lore.kernel.org/all/adf29Rn7q9Db0hxc@google.com/
>> >
>> > Ie this should be a 'struct file *kvm_fd'
>> >
>> > ?
>> >
>> > Though I am wondering how practical it is to do this at this moment :\
>> >
>> 
>> Should we also switch 
>> 
>> modified   drivers/vfio/vfio.h
>> @@ -22,8 +22,8 @@ struct vfio_device_file {
>>  
>>  	u8 access_granted;
>>  	u32 devid; /* only valid when iommufd is valid */
>> -	spinlock_t kvm_ref_lock; /* protect kvm field */
>> -	struct kvm *kvm;
>> +	spinlock_t kvm_ref_lock; /* protect kvm_file */
>> +	struct file *kvm_file;
>>  	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
>>  };
>>  
>> @@ -88,7 +88,7 @@ struct vfio_group {
>>  #endif
>>  	enum vfio_group_type		type;
>>  	struct mutex			group_lock;
>> -	struct kvm			*kvm;
>> +	struct file			*kvm_file;
>>  	struct file			*opened_file;
>>  	struct blocking_notifier_head	notifier;
>>  	struct iommufd_ctx		*iommufd;
>
> Ideally, yes everything outside kvm should just use the struct file
> and there should be quite narrow places where you do some kvm
> operation on it.
>  
>> ie,
>> KVM_CREATE_DEVICE with KVM_DEV_TYPE_VFIO still use kvm->users_count,
>> KVM_DEV_VFIO_FILE_ADD -> will switch to get_file(kvm->_file);
>> and VFIO_DEVICE_BIND_IOMMUFD -> will switch to get_file(df->kvm_file)
>> 
>> > Maybe ask Paolo how his series is going?
>
> I thought the above is what Paolo's series was going after?
>

Hi Paolo,

Could you please take a look at this? I’ve tried to document the
reference count details and hope I’ve captured everything correctly, but
I’d appreciate your review to make sure I haven’t missed anything.

From 141fb9430d6c4227144221410121780c6681d1ce Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org>
Date: Wed, 29 Apr 2026 12:14:36 +0530
Subject: [PATCH] vfio: cache KVM VM file references instead of raw struct kvm
 pointers

VFIO currently records struct kvm pointers on vfio_group, vfio_device_file
and the opened vfio_device. Switch VFIO to track the VM's struct file
instead of a raw struct kvm pointer so that VFIO holds a reference to the
VM fd's struct file, rather than caching a struct kvm pointer and using
KVM-managed object lifetime (kvm->users_count) to make that pointer usable.

This also lets VFIO stop taking KVM object references through
kvm_get_kvm_safe()/kvm_put_kvm() via symbol_get(), and instead pin the VM
through references to kvm->_file.

KVM publishes the association through KVM_DEV_VFIO_FILE_ADD:
  - kvm_vfio_file_add() gets the userspace VFIO file with fget(fd)
  - it then takes its own persistent reference with kvf->file =
    get_file(filp)
  - that kvf->file reference is owned by the KVM VFIO device and lives on
    kv->file_list until KVM_DEV_VFIO_FILE_DEL or kvm_vfio_release()
  - kvm_vfio_file_set_kvm(kvf->file, dev->kvm) then updates the VFIO side
    association to the VM

On the VFIO side, vfio_kvm_file_replace() converts struct kvm to a VM file
reference with get_file_active(&kvm->_file). get_file_active() is required
because dev->kvm may still be alive due to the KVM device fd's
kvm_get_kvm() reference from KVM_CREATE_DEVICE, while the VM fd itself may
already be in final __fput(). In that case kvm->_file must not be
re-acquired with plain get_file().

The stored group/device-file kvm_file references are long-lived association
references managed by KVM:
  - they are installed from KVM_DEV_VFIO_FILE_ADD
  - they are replaced/cleared by kvm_vfio_file_set_kvm()
  - they are dropped from kvm_vfio_file_del() and kvm_vfio_release(), which
    call vfio_file_set_kvm(file, NULL) before dropping kvf->file

The opened vfio_device still takes its own kvm_file reference for driver
use:
  - group path: vfio_device_group_get_kvm_safe()
  - cdev/iommufd path: vfio_df_get_kvm_safe() from VFIO_DEVICE_BIND_IOMMUFD
  - both paths call vfio_device_get_kvm_safe(), which get_file()s the
    already-associated kvm_file and stores it in device->kvm_file
  - that open-time reference is dropped by vfio_device_put_kvm() when the
    device is closed or unbound

This gives each layer a clear ownership model:
  - KVM device fd: pins struct kvm through kvm->users_count via
    kvm_get_kvm()/kvm_put_kvm()
  - VM fd and VFIO kvm_file references: pin the VM file through kvm->_file
  - kvf->file: pins the VFIO file until KVM removes the association
  - device->kvm_file: pins the VM file for the duration of an active VFIO
    device open


Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
 drivers/vfio/device_cdev.c | 10 ++---
 drivers/vfio/group.c       | 14 +++----
 drivers/vfio/vfio.h        | 16 +++++---
 drivers/vfio/vfio_main.c   | 76 ++++++++++++++++++--------------------
 include/linux/kvm_host.h   |  3 ++
 include/linux/vfio.h       |  3 +-
 virt/kvm/kvm_main.c        |  1 +
 7 files changed, 62 insertions(+), 61 deletions(-)

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index 8ceca24ac136..b3e7e1b725c1 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -56,7 +56,7 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
 static void vfio_df_get_kvm_safe(struct vfio_device_file *df)
 {
 	spin_lock(&df->kvm_ref_lock);
-	vfio_device_get_kvm_safe(df->device, df->kvm);
+	vfio_device_get_kvm_safe(df->device, df->kvm_file);
 	spin_unlock(&df->kvm_ref_lock);
 }
 
@@ -133,10 +133,10 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
 	}
 
 	/*
-	 * Before the device open, get the KVM pointer currently
-	 * associated with the device file (if there is) and obtain
-	 * a reference.  This reference is held until device closed.
-	 * Save the pointer in the device for use by drivers.
+	 * Before the device open, get the VM struct file currently
+	 * associated with the device file (if there is one) and obtain a
+	 * reference. This reference is held until the device is closed.
+	 * Save the file in the device for use by drivers.
 	 */
 	vfio_df_get_kvm_safe(df);
 
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 4f15016d2a5f..8ed6a676f1c9 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -158,7 +158,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
 {
 	spin_lock(&device->group->kvm_ref_lock);
-	vfio_device_get_kvm_safe(device, device->group->kvm);
+	vfio_device_get_kvm_safe(device, device->group->kvm_file);
 	spin_unlock(&device->group->kvm_ref_lock);
 }
 
@@ -176,10 +176,10 @@ static int vfio_df_group_open(struct vfio_device_file *df)
 	mutex_lock(&device->dev_set->lock);
 
 	/*
-	 * Before the first device open, get the KVM pointer currently
-	 * associated with the group (if there is one) and obtain a reference
-	 * now that will be held until the open_count reaches 0 again.  Save
-	 * the pointer in the device for use by drivers.
+	 * Before the first device open, get the VM struct file currently
+	 * associated with the group (if there is one) and obtain a
+	 * reference now that will be held until the open_count reaches 0
+	 * again. Save the file in the device for use by drivers.
 	 */
 	if (device->open_count == 0)
 		vfio_device_group_get_kvm_safe(device);
@@ -860,9 +860,7 @@ bool vfio_group_enforced_coherent(struct vfio_group *group)
 
 void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
 {
-	spin_lock(&group->kvm_ref_lock);
-	group->kvm = kvm;
-	spin_unlock(&group->kvm_ref_lock);
+	vfio_kvm_file_replace(&group->kvm_file, &group->kvm_ref_lock, kvm);
 }
 
 /**
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 50128da18bca..5b4bda2aeb47 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -22,8 +22,8 @@ struct vfio_device_file {
 
 	u8 access_granted;
 	u32 devid; /* only valid when iommufd is valid */
-	spinlock_t kvm_ref_lock; /* protect kvm field */
-	struct kvm *kvm;
+	spinlock_t kvm_ref_lock; /* protect kvm_file */
+	struct file *kvm_file;
 	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
 };
 
@@ -88,7 +88,7 @@ struct vfio_group {
 #endif
 	enum vfio_group_type		type;
 	struct mutex			group_lock;
-	struct kvm			*kvm;
+	struct file			*kvm_file;
 	struct file			*opened_file;
 	struct blocking_notifier_head	notifier;
 	struct iommufd_ctx		*iommufd;
@@ -435,11 +435,17 @@ static inline void vfio_virqfd_exit(void)
 #endif
 
 #if IS_ENABLED(CONFIG_KVM)
-void vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm);
+void vfio_kvm_file_replace(struct file **dst, spinlock_t *lock, struct kvm *kvm);
+void vfio_device_get_kvm_safe(struct vfio_device *device, struct file *kvm_file);
 void vfio_device_put_kvm(struct vfio_device *device);
 #else
+static inline void vfio_kvm_file_replace(struct file **dst,
+		spinlock_t *lock, struct kvm *kvm)
+{
+}
+
 static inline void vfio_device_get_kvm_safe(struct vfio_device *device,
-					    struct kvm *kvm)
+					    struct file *kvm_file)
 {
 }
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 742477546b15..05ddd7344c18 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -433,54 +433,51 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 EXPORT_SYMBOL_GPL(vfio_unregister_group_dev);
 
 #if IS_ENABLED(CONFIG_KVM)
-void vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm)
+void vfio_kvm_file_replace(struct file **dst, spinlock_t *lock, struct kvm *kvm)
 {
-	void (*pfn)(struct kvm *kvm);
-	bool (*fn)(struct kvm *kvm);
-	bool ret;
+	struct file *old_kvm_file, *new_kvm_file = NULL;
 
-	lockdep_assert_held(&device->dev_set->lock);
+	/*
+	 * @kvm can outlive the VM fd and its final __fput(). Only take a
+	 * new reference if the VM file is still active.
+	 */
+	if (kvm)
+		new_kvm_file = get_file_active(&kvm->_file);
 
-	if (!kvm)
-		return;
+	spin_lock(lock);
+	old_kvm_file = *dst;
+	*dst = new_kvm_file;
+	spin_unlock(lock);
 
-	pfn = symbol_get(kvm_put_kvm);
-	if (WARN_ON(!pfn))
-		return;
+	if (old_kvm_file)
+		fput(old_kvm_file);
+}
 
-	fn = symbol_get(kvm_get_kvm_safe);
-	if (WARN_ON(!fn)) {
-		symbol_put(kvm_put_kvm);
-		return;
-	}
+void vfio_device_get_kvm_safe(struct vfio_device *device, struct file *kvm_file)
+{
+	lockdep_assert_held(&device->dev_set->lock);
 
-	ret = fn(kvm);
-	symbol_put(kvm_get_kvm_safe);
-	if (!ret) {
-		symbol_put(kvm_put_kvm);
-		return;
-	}
+	/*
+	 * Take a VM file reference if the KVM fd is still active.
+	 */
+	if (kvm_file)
+		kvm_file = get_file(kvm_file);
 
-	device->put_kvm = pfn;
-	device->kvm = kvm;
+	device->kvm_file = kvm_file;
 }
 
 void vfio_device_put_kvm(struct vfio_device *device)
 {
+	struct file *kvm_file;
+
 	lockdep_assert_held(&device->dev_set->lock);
 
-	if (!device->kvm)
+	kvm_file = device->kvm_file;
+	if (!kvm_file)
 		return;
 
-	if (WARN_ON(!device->put_kvm))
-		goto clear;
-
-	device->put_kvm(device->kvm);
-	device->put_kvm = NULL;
-	symbol_put(kvm_put_kvm);
-
-clear:
-	device->kvm = NULL;
+	device->kvm_file = NULL;
+	fput(kvm_file);
 }
 #endif
 
@@ -1488,13 +1485,10 @@ static void vfio_device_file_set_kvm(struct file *file, struct kvm *kvm)
 	struct vfio_device_file *df = file->private_data;
 
 	/*
-	 * The kvm is first recorded in the vfio_device_file, and will
-	 * be propagated to vfio_device::kvm when the file is bound to
-	 * iommufd successfully in the vfio device cdev path.
+	 * Cache the VM file reference associated with this VFIO file so it
+	 * can be pinned into vfio_device while the device is open.
 	 */
-	spin_lock(&df->kvm_ref_lock);
-	df->kvm = kvm;
-	spin_unlock(&df->kvm_ref_lock);
+	vfio_kvm_file_replace(&df->kvm_file, &df->kvm_ref_lock, kvm);
 }
 
 /**
@@ -1502,8 +1496,8 @@ static void vfio_device_file_set_kvm(struct file *file, struct kvm *kvm)
  * @file: VFIO group file or VFIO device file
  * @kvm: KVM to link
  *
- * When a VFIO device is first opened the KVM will be available in
- * device->kvm if one was associated with the file.
+ * When a VFIO device is first opened, VFIO caches a VM file reference if
+ * one was associated with the file.
  */
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6b76e7a6f4c2..94812fb8e4cf 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -45,6 +45,8 @@
 #include <asm/kvm_host.h>
 #include <linux/kvm_dirty_ring.h>
 
+struct file;
+
 #ifndef KVM_MAX_VCPU_IDS
 #define KVM_MAX_VCPU_IDS KVM_MAX_VCPUS
 #endif
@@ -860,6 +862,7 @@ struct kvm {
 	struct srcu_struct srcu;
 	struct srcu_struct irq_srcu;
 	pid_t userspace_pid;
+	struct file *_file;
 	bool override_halt_poll_ns;
 	unsigned int max_halt_poll_ns;
 	u32 dirty_ring_size;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e90859956514..e5fa5fb7f3fe 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -52,7 +52,7 @@ struct vfio_device {
 	struct vfio_device_set *dev_set;
 	struct list_head dev_set_list;
 	unsigned int migration_flags;
-	struct kvm *kvm;
+	struct file *kvm_file;
 
 	/* Members below here are private, not for driver use */
 	unsigned int index;
@@ -64,7 +64,6 @@ struct vfio_device {
 	unsigned int open_count;
 	struct completion comp;
 	struct iommufd_access *iommufd_access;
-	void (*put_kvm)(struct kvm *kvm);
 	struct inode *inode;
 #if IS_ENABLED(CONFIG_IOMMUFD)
 	struct iommufd_device *iommufd_device;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9093251beb39..45e70ae78215 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5507,6 +5507,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
 		r = PTR_ERR(file);
 		goto put_kvm;
 	}
+	kvm->_file = file;
 
 	/*
 	 * Don't call kvm_put_kvm anymore at this point; file->f_op is
-- 
2.43.0


  reply	other threads:[~2026-04-29 13:52 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-27  6:10 [PATCH v4 0/4] Add iommufd ioctls to support TSM operations Aneesh Kumar K.V (Arm)
2026-04-27  6:10 ` [PATCH v4 1/4] iommufd/device: Associate a kvm pointer to iommufd_device Aneesh Kumar K.V (Arm)
2026-04-27  9:07   ` Baolu Lu
2026-04-27 14:01     ` Jason Gunthorpe
2026-04-27 13:59   ` Jason Gunthorpe
2026-04-28 12:01     ` Aneesh Kumar K.V
2026-04-28 12:50       ` Jason Gunthorpe
2026-04-29 13:52         ` Aneesh Kumar K.V [this message]
2026-04-27  6:10 ` [PATCH v4 2/4] iommufd/viommu: Associate a kvm pointer to iommufd_viommu Aneesh Kumar K.V (Arm)
2026-04-27 14:03   ` Jason Gunthorpe
2026-04-27  6:10 ` [PATCH v4 3/4] iommufd/tsm: add vdevice TSM bind/unbind ioctl Aneesh Kumar K.V (Arm)
2026-04-27  6:10 ` [PATCH v4 4/4] iommufd/vdevice: add TSM guest request ioctl Aneesh Kumar K.V (Arm)
2026-04-27 14:05   ` Jason Gunthorpe
2026-04-28 12:13     ` Aneesh Kumar K.V
2026-04-28 12:48       ` Jason Gunthorpe
2026-05-08  3:12         ` Tian, Kevin
2026-05-08  4:12           ` Aneesh Kumar K.V

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=yq5atsstzxe9.fsf@kernel.org \
    --to=aneesh.kumar@kernel.org \
    --cc=Suzuki.Poulose@arm.com \
    --cc=aik@amd.com \
    --cc=dan.j.williams@intel.com \
    --cc=helgaas@kernel.org \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=jic23@kernel.org \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nicolinc@nvidia.com \
    --cc=pbonzini@redhat.com \
    --cc=sameo@rivosinc.com \
    --cc=seanjc@google.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=steven.price@arm.com \
    --cc=will@kernel.org \
    --cc=yilun.xu@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.