public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] iommu/vt-d: Introduce rbtree for probed devices
@ 2024-02-20  6:59 Lu Baolu
  2024-02-20  6:59 ` [PATCH v2 1/2] iommu/vt-d: Use rbtree to track iommu " Lu Baolu
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Lu Baolu @ 2024-02-20  6:59 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian
  Cc: Huang Jiaqing, Ethan Zhao, iommu, linux-kernel, Lu Baolu

This is a follow-up to the discussion thread here:

https://lore.kernel.org/linux-iommu/20240201193427.GQ50608@ziepe.ca/

This proposes a per-IOMMU red-black tree for iommu probed devices. It
benefits fault handling paths, where the VT-d hardware reports the
Source ID of the related device, and software needs to retrieve the
corresponding device pointer based on this ID.

This is also available at github:

https://github.com/LuBaolu/intel-iommu/commits/rbtree-for-device-info-v2

Change log:

v2:
 - Drop get/put_device() in device_rbtree_find();
 - Add a mutex to synchronize iopf report and device release paths.

v1: https://lore.kernel.org/linux-iommu/20240215072249.4465-1-baolu.lu@linux.intel.com/

Lu Baolu (2):
  iommu/vt-d: Use rbtree to track iommu probed devices
  iommu/vt-d: Use device rbtree in iopf reporting path

 drivers/iommu/intel/iommu.h | 10 ++++
 drivers/iommu/intel/dmar.c  |  4 +-
 drivers/iommu/intel/iommu.c | 91 ++++++++++++++++++++++++++++++++++++-
 drivers/iommu/intel/svm.c   | 17 +++----
 4 files changed, 111 insertions(+), 11 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 1/2] iommu/vt-d: Use rbtree to track iommu probed devices
  2024-02-20  6:59 [PATCH v2 0/2] iommu/vt-d: Introduce rbtree for probed devices Lu Baolu
@ 2024-02-20  6:59 ` Lu Baolu
  2024-02-20  6:59 ` [PATCH v2 2/2] iommu/vt-d: Use device rbtree in iopf reporting path Lu Baolu
  2024-02-26  6:15 ` [PATCH v2 0/2] iommu/vt-d: Introduce rbtree for probed devices Baolu Lu
  2 siblings, 0 replies; 5+ messages in thread
From: Lu Baolu @ 2024-02-20  6:59 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian
  Cc: Huang Jiaqing, Ethan Zhao, iommu, linux-kernel, Lu Baolu,
	Jason Gunthorpe

Use a red-black tree(rbtree) to track devices probed by the driver's
probe_device callback. These devices need to be looked up quickly by
a source ID when the hardware reports a fault, either recoverable or
unrecoverable.

Fault reporting paths are critical. Searching a list in this scenario
is inefficient, with an algorithm complexity of O(n). An rbtree is a
self-balancing binary search tree, offering an average search time
complexity of O(log(n)). This significant performance improvement
makes rbtrees a better choice.

Furthermore, rbtrees are implemented on a per-iommu basis, eliminating
the need for global searches and further enhancing efficiency in
critical fault paths. The rbtree is protected by a spin lock with
interrupts disabled to ensure thread-safe access even within interrupt
contexts.

Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/intel/iommu.h |  8 ++++
 drivers/iommu/intel/dmar.c  |  3 +-
 drivers/iommu/intel/iommu.c | 88 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 96 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index eaef932ad7c9..2b67ad0d6fe9 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -722,6 +722,11 @@ struct intel_iommu {
 	struct q_inval  *qi;            /* Queued invalidation info */
 	u32 iommu_state[MAX_SR_DMAR_REGS]; /* Store iommu states between suspend and resume.*/
 
+	/* rb tree for all probed devices */
+	struct rb_root device_rbtree;
+	/* protect the device_rbtree */
+	spinlock_t device_rbtree_lock;
+
 #ifdef CONFIG_IRQ_REMAP
 	struct ir_table *ir_table;	/* Interrupt remapping info */
 	struct irq_domain *ir_domain;
@@ -755,6 +760,8 @@ struct device_domain_info {
 	struct intel_iommu *iommu; /* IOMMU used by this device */
 	struct dmar_domain *domain; /* pointer to domain */
 	struct pasid_table *pasid_table; /* pasid table */
+	/* device tracking node(lookup by PCI RID) */
+	struct rb_node node;
 #ifdef CONFIG_INTEL_IOMMU_DEBUGFS
 	struct dentry *debugfs_dentry; /* pointer to device directory dentry */
 #endif
@@ -1081,6 +1088,7 @@ void free_pgtable_page(void *vaddr);
 void iommu_flush_write_buffer(struct intel_iommu *iommu);
 struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
 					       const struct iommu_user_data *user_data);
+struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid);
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
 void intel_svm_check(struct intel_iommu *iommu);
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 23cb80d62a9a..f9b63c2875f7 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1095,7 +1095,8 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
 	iommu->agaw = agaw;
 	iommu->msagaw = msagaw;
 	iommu->segment = drhd->segment;
-
+	iommu->device_rbtree = RB_ROOT;
+	spin_lock_init(&iommu->device_rbtree_lock);
 	iommu->node = NUMA_NO_NODE;
 
 	ver = readl(iommu->reg + DMAR_VER_REG);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 61bb35046ea4..acfe27bd3448 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -96,6 +96,81 @@ static phys_addr_t root_entry_uctp(struct root_entry *re)
 	return re->hi & VTD_PAGE_MASK;
 }
 
+static int device_rid_cmp_key(const void *key, const struct rb_node *node)
+{
+	struct device_domain_info *info =
+		rb_entry(node, struct device_domain_info, node);
+	const u16 *rid_lhs = key;
+
+	if (*rid_lhs < PCI_DEVID(info->bus, info->devfn))
+		return -1;
+
+	if (*rid_lhs > PCI_DEVID(info->bus, info->devfn))
+		return 1;
+
+	return 0;
+}
+
+static int device_rid_cmp(struct rb_node *lhs, const struct rb_node *rhs)
+{
+	struct device_domain_info *info =
+		rb_entry(lhs, struct device_domain_info, node);
+	u16 key = PCI_DEVID(info->bus, info->devfn);
+
+	return device_rid_cmp_key(&key, rhs);
+}
+
+/*
+ * Looks up an IOMMU-probed device using its source ID.
+ *
+ * Returns the pointer to the device if there is a match. Otherwise,
+ * returns NULL.
+ *
+ * Note that this helper doesn't guarantee that the device won't be
+ * released by the iommu subsystem after being returned. The caller
+ * should use its own synchronization mechanism to avoid the device
+ * being released during its use if its possibly the case.
+ */
+struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid)
+{
+	struct device_domain_info *info = NULL;
+	struct rb_node *node;
+	unsigned long flags;
+
+	spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
+	node = rb_find(&rid, &iommu->device_rbtree, device_rid_cmp_key);
+	if (node)
+		info = rb_entry(node, struct device_domain_info, node);
+	spin_unlock_irqrestore(&iommu->device_rbtree_lock, flags);
+
+	return info ? info->dev : NULL;
+}
+
+static int device_rbtree_insert(struct intel_iommu *iommu,
+				struct device_domain_info *info)
+{
+	struct rb_node *curr;
+	unsigned long flags;
+
+	spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
+	curr = rb_find_add(&info->node, &iommu->device_rbtree, device_rid_cmp);
+	spin_unlock_irqrestore(&iommu->device_rbtree_lock, flags);
+	if (WARN_ON(curr))
+		return -EEXIST;
+
+	return 0;
+}
+
+static void device_rbtree_remove(struct device_domain_info *info)
+{
+	struct intel_iommu *iommu = info->iommu;
+	unsigned long flags;
+
+	spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
+	rb_erase(&info->node, &iommu->device_rbtree);
+	spin_unlock_irqrestore(&iommu->device_rbtree_lock, flags);
+}
+
 /*
  * This domain is a statically identity mapping domain.
  *	1. This domain creats a static 1:1 mapping to all usable memory.
@@ -4329,25 +4404,34 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
 	}
 
 	dev_iommu_priv_set(dev, info);
+	ret = device_rbtree_insert(iommu, info);
+	if (ret)
+		goto free;
 
 	if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
 		ret = intel_pasid_alloc_table(dev);
 		if (ret) {
 			dev_err(dev, "PASID table allocation failed\n");
-			kfree(info);
-			return ERR_PTR(ret);
+			goto clear_rbtree;
 		}
 	}
 
 	intel_iommu_debugfs_create_dev(info);
 
 	return &iommu->iommu;
+clear_rbtree:
+	device_rbtree_remove(info);
+free:
+	kfree(info);
+
+	return ERR_PTR(ret);
 }
 
 static void intel_iommu_release_device(struct device *dev)
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
 
+	device_rbtree_remove(info);
 	dmar_remove_one_dev_info(dev);
 	intel_pasid_free_table(dev);
 	intel_iommu_debugfs_remove_dev(info);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 2/2] iommu/vt-d: Use device rbtree in iopf reporting path
  2024-02-20  6:59 [PATCH v2 0/2] iommu/vt-d: Introduce rbtree for probed devices Lu Baolu
  2024-02-20  6:59 ` [PATCH v2 1/2] iommu/vt-d: Use rbtree to track iommu " Lu Baolu
@ 2024-02-20  6:59 ` Lu Baolu
  2024-02-21 15:34   ` Jason Gunthorpe
  2024-02-26  6:15 ` [PATCH v2 0/2] iommu/vt-d: Introduce rbtree for probed devices Baolu Lu
  2 siblings, 1 reply; 5+ messages in thread
From: Lu Baolu @ 2024-02-20  6:59 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian
  Cc: Huang Jiaqing, Ethan Zhao, iommu, linux-kernel, Lu Baolu

The existing I/O page fault handler currently locates the PCI device by
calling pci_get_domain_bus_and_slot(). This function searches the list
of all PCI devices until the desired device is found. To improve lookup
efficiency, replace it with device_rbtree_find() to search the device
within the probed device rbtree.

The I/O page fault is initiated by the device, which does not have any
synchronization mechanism with the software to ensure that the device
stays in the probed device tree. Theoretically, a device could be released
by the IOMMU subsystem after device_rbtree_find() and before
iopf_get_dev_fault_param(), which would cause a use-after-free problem.

Add a mutex to synchronize the I/O page fault reporting path and the IOMMU
release device path. This lock doesn't introduce any performance overhead,
as the conflict between I/O page fault reporting and device releasing is
very rare.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.h |  2 ++
 drivers/iommu/intel/dmar.c  |  1 +
 drivers/iommu/intel/iommu.c |  3 +++
 drivers/iommu/intel/svm.c   | 17 +++++++++--------
 4 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 2b67ad0d6fe9..404d2476a877 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -719,6 +719,8 @@ struct intel_iommu {
 #endif
 	struct iopf_queue *iopf_queue;
 	unsigned char iopfq_name[16];
+	/* Synchronization between fault report and iommu device release. */
+	struct mutex iopf_lock;
 	struct q_inval  *qi;            /* Queued invalidation info */
 	u32 iommu_state[MAX_SR_DMAR_REGS]; /* Store iommu states between suspend and resume.*/
 
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index f9b63c2875f7..d14797aabb7a 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1097,6 +1097,7 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
 	iommu->segment = drhd->segment;
 	iommu->device_rbtree = RB_ROOT;
 	spin_lock_init(&iommu->device_rbtree_lock);
+	mutex_init(&iommu->iopf_lock);
 	iommu->node = NUMA_NO_NODE;
 
 	ver = readl(iommu->reg + DMAR_VER_REG);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index acfe27bd3448..6743fe6c7a36 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4430,8 +4430,11 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
 static void intel_iommu_release_device(struct device *dev)
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
 
+	mutex_lock(&iommu->iopf_lock);
 	device_rbtree_remove(info);
+	mutex_unlock(&iommu->iopf_lock);
 	dmar_remove_one_dev_info(dev);
 	intel_pasid_free_table(dev);
 	intel_iommu_debugfs_remove_dev(info);
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index b644d57da841..dda276e28325 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -645,7 +645,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 	struct intel_iommu *iommu = d;
 	struct page_req_dsc *req;
 	int head, tail, handled;
-	struct pci_dev *pdev;
+	struct device *dev;
 	u64 address;
 
 	/*
@@ -691,21 +691,22 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 		if (unlikely(req->lpig && !req->rd_req && !req->wr_req))
 			goto prq_advance;
 
-		pdev = pci_get_domain_bus_and_slot(iommu->segment,
-						   PCI_BUS_NUM(req->rid),
-						   req->rid & 0xff);
 		/*
 		 * If prq is to be handled outside iommu driver via receiver of
 		 * the fault notifiers, we skip the page response here.
 		 */
-		if (!pdev)
+		mutex_lock(&iommu->iopf_lock);
+		dev = device_rbtree_find(iommu, req->rid);
+		if (!dev) {
+			mutex_unlock(&iommu->iopf_lock);
 			goto bad_req;
+		}
 
-		intel_svm_prq_report(iommu, &pdev->dev, req);
-		trace_prq_report(iommu, &pdev->dev, req->qw_0, req->qw_1,
+		intel_svm_prq_report(iommu, dev, req);
+		trace_prq_report(iommu, dev, req->qw_0, req->qw_1,
 				 req->priv_data[0], req->priv_data[1],
 				 iommu->prq_seq_number++);
-		pci_dev_put(pdev);
+		mutex_unlock(&iommu->iopf_lock);
 prq_advance:
 		head = (head + sizeof(*req)) & PRQ_RING_MASK;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 2/2] iommu/vt-d: Use device rbtree in iopf reporting path
  2024-02-20  6:59 ` [PATCH v2 2/2] iommu/vt-d: Use device rbtree in iopf reporting path Lu Baolu
@ 2024-02-21 15:34   ` Jason Gunthorpe
  0 siblings, 0 replies; 5+ messages in thread
From: Jason Gunthorpe @ 2024-02-21 15:34 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Huang Jiaqing, Ethan Zhao, iommu, linux-kernel

On Tue, Feb 20, 2024 at 02:59:39PM +0800, Lu Baolu wrote:
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index acfe27bd3448..6743fe6c7a36 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4430,8 +4430,11 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
>  static void intel_iommu_release_device(struct device *dev)
>  {
>  	struct device_domain_info *info = dev_iommu_priv_get(dev);
> +	struct intel_iommu *iommu = info->iommu;
>  
> +	mutex_lock(&iommu->iopf_lock);
>  	device_rbtree_remove(info);
> +	mutex_unlock(&iommu->iopf_lock);

This seems like a pretty reasonable solution, maybe someday it can
become lockless.. This is a fast path right?

> @@ -691,21 +691,22 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>  		if (unlikely(req->lpig && !req->rd_req && !req->wr_req))
>  			goto prq_advance;
>  
> -		pdev = pci_get_domain_bus_and_slot(iommu->segment,
> -						   PCI_BUS_NUM(req->rid),
> -						   req->rid & 0xff);
>  		/*
>  		 * If prq is to be handled outside iommu driver via receiver of
>  		 * the fault notifiers, we skip the page response here.
>  		 */
> -		if (!pdev)
> +		mutex_lock(&iommu->iopf_lock);
> +		dev = device_rbtree_find(iommu, req->rid);
> +		if (!dev) {
> +			mutex_unlock(&iommu->iopf_lock);
>  			goto bad_req;
> +		}

Though now we have a mutex and a spinlock covering the same data
structure.. It could be optimized some more.

But maybe we should leave micro optimization aside for now.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 0/2] iommu/vt-d: Introduce rbtree for probed devices
  2024-02-20  6:59 [PATCH v2 0/2] iommu/vt-d: Introduce rbtree for probed devices Lu Baolu
  2024-02-20  6:59 ` [PATCH v2 1/2] iommu/vt-d: Use rbtree to track iommu " Lu Baolu
  2024-02-20  6:59 ` [PATCH v2 2/2] iommu/vt-d: Use device rbtree in iopf reporting path Lu Baolu
@ 2024-02-26  6:15 ` Baolu Lu
  2 siblings, 0 replies; 5+ messages in thread
From: Baolu Lu @ 2024-02-26  6:15 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian
  Cc: baolu.lu, Huang Jiaqing, Ethan Zhao, iommu, linux-kernel

On 2/20/24 2:59 PM, Lu Baolu wrote:
> This is a follow-up to the discussion thread here:
> 
> https://lore.kernel.org/linux-iommu/20240201193427.GQ50608@ziepe.ca/
> 
> This proposes a per-IOMMU red-black tree for iommu probed devices. It
> benefits fault handling paths, where the VT-d hardware reports the
> Source ID of the related device, and software needs to retrieve the
> corresponding device pointer based on this ID.
> 
> This is also available at github:
> 
> https://github.com/LuBaolu/intel-iommu/commits/rbtree-for-device-info-v2
> 
> Change log:
> 
> v2:
>   - Drop get/put_device() in device_rbtree_find();
>   - Add a mutex to synchronize iopf report and device release paths.
> 
> v1:https://lore.kernel.org/linux-iommu/20240215072249.4465-1-baolu.lu@linux.intel.com/
> 
> Lu Baolu (2):
>    iommu/vt-d: Use rbtree to track iommu probed devices
>    iommu/vt-d: Use device rbtree in iopf reporting path

Above queued for v6.9.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-02-26  6:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-20  6:59 [PATCH v2 0/2] iommu/vt-d: Introduce rbtree for probed devices Lu Baolu
2024-02-20  6:59 ` [PATCH v2 1/2] iommu/vt-d: Use rbtree to track iommu " Lu Baolu
2024-02-20  6:59 ` [PATCH v2 2/2] iommu/vt-d: Use device rbtree in iopf reporting path Lu Baolu
2024-02-21 15:34   ` Jason Gunthorpe
2024-02-26  6:15 ` [PATCH v2 0/2] iommu/vt-d: Introduce rbtree for probed devices Baolu Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox