* [PATCH 0/2] iommu/vt-d: Introduce rbtree for probed devices
@ 2024-02-15 7:22 Lu Baolu
2024-02-15 7:22 ` [PATCH 1/2] iommu/vt-d: Use rbtree to track iommu " Lu Baolu
2024-02-15 7:22 ` [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path Lu Baolu
0 siblings, 2 replies; 19+ messages in thread
From: Lu Baolu @ 2024-02-15 7:22 UTC (permalink / raw)
To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
Kevin Tian
Cc: Huang Jiaqing, Ethan Zhao, iommu, linux-kernel, Lu Baolu
This is a follow-up to the discussion thread here:
https://lore.kernel.org/linux-iommu/20240201193427.GQ50608@ziepe.ca/
This proposes a per-IOMMU red-black tree for iommu probed devices. It
benefits fault handling paths, where the VT-d hardware reports the
Source ID of the related device, and software needs to retrieve the
corresponding device pointer based on this ID.
Lu Baolu (2):
iommu/vt-d: Use rbtree to track iommu probed devices
iommu/vt-d: Use device rbtree in iopf reporting path
drivers/iommu/intel/iommu.h | 8 ++++
drivers/iommu/intel/dmar.c | 3 +-
drivers/iommu/intel/iommu.c | 91 ++++++++++++++++++++++++++++++++++++-
drivers/iommu/intel/svm.c | 14 +++---
4 files changed, 105 insertions(+), 11 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/2] iommu/vt-d: Use rbtree to track iommu probed devices
2024-02-15 7:22 [PATCH 0/2] iommu/vt-d: Introduce rbtree for probed devices Lu Baolu
@ 2024-02-15 7:22 ` Lu Baolu
2024-02-15 17:47 ` Jason Gunthorpe
2024-02-19 2:45 ` Ethan Zhao
2024-02-15 7:22 ` [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path Lu Baolu
1 sibling, 2 replies; 19+ messages in thread
From: Lu Baolu @ 2024-02-15 7:22 UTC (permalink / raw)
To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
Kevin Tian
Cc: Huang Jiaqing, Ethan Zhao, iommu, linux-kernel, Lu Baolu
Use a red-black tree(rbtree) to track devices probed by the driver's
probe_device callback. These devices need to be looked up quickly by
a source ID when the hardware reports a fault, either recoverable or
unrecoverable.
Fault reporting paths are critical. Searching a list in this scenario
is inefficient, with an algorithm complexity of O(n). An rbtree is a
self-balancing binary search tree, offering an average search time
complexity of O(log(n)). This significant performance improvement
makes rbtrees a better choice.
Furthermore, rbtrees are implemented on a per-iommu basis, eliminating
the need for global searches and further enhancing efficiency in
critical fault paths. The rbtree is protected by a spin lock with
interrupts disabled to ensure thread-safe access even within interrupt
contexts.
Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
drivers/iommu/intel/iommu.h | 7 +++++
drivers/iommu/intel/dmar.c | 3 +-
drivers/iommu/intel/iommu.c | 62 +++++++++++++++++++++++++++++++++++--
3 files changed, 69 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index cf9a28c7fab8..54eeaa8e35a9 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -716,6 +716,11 @@ struct intel_iommu {
struct q_inval *qi; /* Queued invalidation info */
u32 iommu_state[MAX_SR_DMAR_REGS]; /* Store iommu states between suspend and resume.*/
+ /* rb tree for all probed devices */
+ struct rb_root device_rbtree;
+ /* protect the device_rbtree */
+ spinlock_t device_rbtree_lock;
+
#ifdef CONFIG_IRQ_REMAP
struct ir_table *ir_table; /* Interrupt remapping info */
struct irq_domain *ir_domain;
@@ -749,6 +754,8 @@ struct device_domain_info {
struct intel_iommu *iommu; /* IOMMU used by this device */
struct dmar_domain *domain; /* pointer to domain */
struct pasid_table *pasid_table; /* pasid table */
+ /* device tracking node(lookup by PCI RID) */
+ struct rb_node node;
#ifdef CONFIG_INTEL_IOMMU_DEBUGFS
struct dentry *debugfs_dentry; /* pointer to device directory dentry */
#endif
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 23cb80d62a9a..f9b63c2875f7 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1095,7 +1095,8 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
iommu->agaw = agaw;
iommu->msagaw = msagaw;
iommu->segment = drhd->segment;
-
+ iommu->device_rbtree = RB_ROOT;
+ spin_lock_init(&iommu->device_rbtree_lock);
iommu->node = NUMA_NO_NODE;
ver = readl(iommu->reg + DMAR_VER_REG);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index a81a2be9b870..09009d96e553 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -96,6 +96,55 @@ static phys_addr_t root_entry_uctp(struct root_entry *re)
return re->hi & VTD_PAGE_MASK;
}
+static int device_rid_cmp_key(const void *key, const struct rb_node *node)
+{
+ struct device_domain_info *info =
+ rb_entry(node, struct device_domain_info, node);
+ const u16 *rid_lhs = key;
+
+ if (*rid_lhs < PCI_DEVID(info->bus, info->devfn))
+ return -1;
+
+ if (*rid_lhs > PCI_DEVID(info->bus, info->devfn))
+ return 1;
+
+ return 0;
+}
+
+static int device_rid_cmp(struct rb_node *lhs, const struct rb_node *rhs)
+{
+ struct device_domain_info *info =
+ rb_entry(lhs, struct device_domain_info, node);
+ u16 key = PCI_DEVID(info->bus, info->devfn);
+
+ return device_rid_cmp_key(&key, rhs);
+}
+
+static int device_rbtree_insert(struct intel_iommu *iommu,
+ struct device_domain_info *info)
+{
+ struct rb_node *curr;
+ unsigned long flags;
+
+ spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
+ curr = rb_find_add(&info->node, &iommu->device_rbtree, device_rid_cmp);
+ spin_unlock_irqrestore(&iommu->device_rbtree_lock, flags);
+ if (curr)
+ dev_warn(info->dev, "device already in rbtree\n");
+
+ return curr ? -EEXIST : 0;
+}
+
+static void device_rbtree_remove(struct device_domain_info *info)
+{
+ struct intel_iommu *iommu = info->iommu;
+ unsigned long flags;
+
+ spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
+ rb_erase(&info->node, &iommu->device_rbtree);
+ spin_unlock_irqrestore(&iommu->device_rbtree_lock, flags);
+}
+
/*
* This domain is a statically identity mapping domain.
* 1. This domain creats a static 1:1 mapping to all usable memory.
@@ -4264,25 +4313,34 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
}
dev_iommu_priv_set(dev, info);
+ ret = device_rbtree_insert(iommu, info);
+ if (ret)
+ goto free;
if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
ret = intel_pasid_alloc_table(dev);
if (ret) {
dev_err(dev, "PASID table allocation failed\n");
- kfree(info);
- return ERR_PTR(ret);
+ goto clear_rbtree;
}
}
intel_iommu_debugfs_create_dev(info);
return &iommu->iommu;
+clear_rbtree:
+ device_rbtree_remove(info);
+free:
+ kfree(info);
+
+ return ERR_PTR(ret);
}
static void intel_iommu_release_device(struct device *dev)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
+ device_rbtree_remove(info);
dmar_remove_one_dev_info(dev);
intel_pasid_free_table(dev);
intel_iommu_debugfs_remove_dev(info);
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path
2024-02-15 7:22 [PATCH 0/2] iommu/vt-d: Introduce rbtree for probed devices Lu Baolu
2024-02-15 7:22 ` [PATCH 1/2] iommu/vt-d: Use rbtree to track iommu " Lu Baolu
@ 2024-02-15 7:22 ` Lu Baolu
2024-02-15 17:55 ` Jason Gunthorpe
2024-02-19 6:54 ` Ethan Zhao
1 sibling, 2 replies; 19+ messages in thread
From: Lu Baolu @ 2024-02-15 7:22 UTC (permalink / raw)
To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
Kevin Tian
Cc: Huang Jiaqing, Ethan Zhao, iommu, linux-kernel, Lu Baolu
The existing IO page fault handler currently locates the PCI device by
calling pci_get_domain_bus_and_slot(). This function searches the list
of all PCI devices until the desired device is found. To improve lookup
efficiency, a helper function named device_rbtree_find() is introduced
to search for the device within the rbtree. Replace
pci_get_domain_bus_and_slot() in the IO page fault handling path.
Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
drivers/iommu/intel/iommu.h | 1 +
drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++
drivers/iommu/intel/svm.c | 14 ++++++--------
3 files changed, 36 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 54eeaa8e35a9..f13c228924f8 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr);
void iommu_flush_write_buffer(struct intel_iommu *iommu);
struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
const struct iommu_user_data *user_data);
+struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid);
#ifdef CONFIG_INTEL_IOMMU_SVM
void intel_svm_check(struct intel_iommu *iommu);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 09009d96e553..d92c680bcc96 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, const struct rb_node *rhs)
return device_rid_cmp_key(&key, rhs);
}
+/*
+ * Looks up an IOMMU-probed device using its source ID.
+ *
+ * If the device is found:
+ * - Increments its reference count.
+ * - Returns a pointer to the device.
+ * - The caller must call put_device() after using the pointer.
+ *
+ * If the device is not found, returns NULL.
+ */
+struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid)
+{
+ struct device_domain_info *info;
+ struct device *dev = NULL;
+ struct rb_node *node;
+ unsigned long flags;
+
+ spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
+ node = rb_find(&rid, &iommu->device_rbtree, device_rid_cmp_key);
+ if (node) {
+ info = rb_entry(node, struct device_domain_info, node);
+ dev = info->dev;
+ get_device(dev);
+ }
+ spin_unlock_irqrestore(&iommu->device_rbtree_lock, flags);
+
+ return dev;
+}
+
static int device_rbtree_insert(struct intel_iommu *iommu,
struct device_domain_info *info)
{
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index b644d57da841..717b7041973c 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -645,7 +645,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
struct intel_iommu *iommu = d;
struct page_req_dsc *req;
int head, tail, handled;
- struct pci_dev *pdev;
+ struct device *dev;
u64 address;
/*
@@ -691,21 +691,19 @@ static irqreturn_t prq_event_thread(int irq, void *d)
if (unlikely(req->lpig && !req->rd_req && !req->wr_req))
goto prq_advance;
- pdev = pci_get_domain_bus_and_slot(iommu->segment,
- PCI_BUS_NUM(req->rid),
- req->rid & 0xff);
/*
* If prq is to be handled outside iommu driver via receiver of
* the fault notifiers, we skip the page response here.
*/
- if (!pdev)
+ dev = device_rbtree_find(iommu, req->rid);
+ if (!dev)
goto bad_req;
- intel_svm_prq_report(iommu, &pdev->dev, req);
- trace_prq_report(iommu, &pdev->dev, req->qw_0, req->qw_1,
+ intel_svm_prq_report(iommu, dev, req);
+ trace_prq_report(iommu, dev, req->qw_0, req->qw_1,
req->priv_data[0], req->priv_data[1],
iommu->prq_seq_number++);
- pci_dev_put(pdev);
+ put_device(dev);
prq_advance:
head = (head + sizeof(*req)) & PRQ_RING_MASK;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] iommu/vt-d: Use rbtree to track iommu probed devices
2024-02-15 7:22 ` [PATCH 1/2] iommu/vt-d: Use rbtree to track iommu " Lu Baolu
@ 2024-02-15 17:47 ` Jason Gunthorpe
2024-02-18 4:22 ` Baolu Lu
2024-02-19 2:45 ` Ethan Zhao
1 sibling, 1 reply; 19+ messages in thread
From: Jason Gunthorpe @ 2024-02-15 17:47 UTC (permalink / raw)
To: Lu Baolu
Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
Huang Jiaqing, Ethan Zhao, iommu, linux-kernel
On Thu, Feb 15, 2024 at 03:22:48PM +0800, Lu Baolu wrote:
> Use a red-black tree(rbtree) to track devices probed by the driver's
> probe_device callback. These devices need to be looked up quickly by
> a source ID when the hardware reports a fault, either recoverable or
> unrecoverable.
>
> Fault reporting paths are critical. Searching a list in this scenario
> is inefficient, with an algorithm complexity of O(n). An rbtree is a
> self-balancing binary search tree, offering an average search time
> complexity of O(log(n)). This significant performance improvement
> makes rbtrees a better choice.
>
> Furthermore, rbtrees are implemented on a per-iommu basis, eliminating
> the need for global searches and further enhancing efficiency in
> critical fault paths. The rbtree is protected by a spin lock with
> interrupts disabled to ensure thread-safe access even within interrupt
> contexts.
>
> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
> drivers/iommu/intel/iommu.h | 7 +++++
> drivers/iommu/intel/dmar.c | 3 +-
> drivers/iommu/intel/iommu.c | 62 +++++++++++++++++++++++++++++++++++--
> 3 files changed, 69 insertions(+), 3 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> +static int device_rbtree_insert(struct intel_iommu *iommu,
> + struct device_domain_info *info)
> +{
> + struct rb_node *curr;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
> + curr = rb_find_add(&info->node, &iommu->device_rbtree, device_rid_cmp);
> + spin_unlock_irqrestore(&iommu->device_rbtree_lock, flags);
> + if (curr)
> + dev_warn(info->dev, "device already in rbtree\n");
I would suggest
WARN_ON(curr);
Something has gone really wonky at this point, right?
Jason
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path
2024-02-15 7:22 ` [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path Lu Baolu
@ 2024-02-15 17:55 ` Jason Gunthorpe
2024-02-18 7:02 ` Baolu Lu
2024-02-21 7:04 ` Ethan Zhao
2024-02-19 6:54 ` Ethan Zhao
1 sibling, 2 replies; 19+ messages in thread
From: Jason Gunthorpe @ 2024-02-15 17:55 UTC (permalink / raw)
To: Lu Baolu
Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
Huang Jiaqing, Ethan Zhao, iommu, linux-kernel
On Thu, Feb 15, 2024 at 03:22:49PM +0800, Lu Baolu wrote:
> The existing IO page fault handler currently locates the PCI device by
> calling pci_get_domain_bus_and_slot(). This function searches the list
> of all PCI devices until the desired device is found. To improve lookup
> efficiency, a helper function named device_rbtree_find() is introduced
> to search for the device within the rbtree. Replace
> pci_get_domain_bus_and_slot() in the IO page fault handling path.
>
> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
> drivers/iommu/intel/iommu.h | 1 +
> drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++
> drivers/iommu/intel/svm.c | 14 ++++++--------
> 3 files changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 54eeaa8e35a9..f13c228924f8 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr);
> void iommu_flush_write_buffer(struct intel_iommu *iommu);
> struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
> const struct iommu_user_data *user_data);
> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid);
>
> #ifdef CONFIG_INTEL_IOMMU_SVM
> void intel_svm_check(struct intel_iommu *iommu);
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 09009d96e553..d92c680bcc96 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, const struct rb_node *rhs)
> return device_rid_cmp_key(&key, rhs);
> }
>
> +/*
> + * Looks up an IOMMU-probed device using its source ID.
> + *
> + * If the device is found:
> + * - Increments its reference count.
> + * - Returns a pointer to the device.
> + * - The caller must call put_device() after using the pointer.
> + *
> + * If the device is not found, returns NULL.
> + */
> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid)
> +{
> + struct device_domain_info *info;
> + struct device *dev = NULL;
> + struct rb_node *node;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
> + node = rb_find(&rid, &iommu->device_rbtree, device_rid_cmp_key);
> + if (node) {
> + info = rb_entry(node, struct device_domain_info, node);
> + dev = info->dev;
> + get_device(dev);
This get_device() is a bit troubling. It eventually calls into
iommu_report_device_fault() which does:
struct dev_iommu *param = dev->iommu;
Which is going to explode if the iomm driver release has already
happened, which is a precondition to getting to a unref'd struct
device.
The driver needs to do something to fence these events during it's
release function.
If we are already doing that then I'd suggest to drop the get_device
and add a big fat comment explaining the special rules about lifetime
that are in effect here.
Otherwise you need to do that barrier rethink the way the locking
works..
Aside from that this looks like a great improvement to me
Thanks,
Jason
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] iommu/vt-d: Use rbtree to track iommu probed devices
2024-02-15 17:47 ` Jason Gunthorpe
@ 2024-02-18 4:22 ` Baolu Lu
0 siblings, 0 replies; 19+ messages in thread
From: Baolu Lu @ 2024-02-18 4:22 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: baolu.lu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
Huang Jiaqing, Ethan Zhao, iommu, linux-kernel
On 2024/2/16 1:47, Jason Gunthorpe wrote:
> On Thu, Feb 15, 2024 at 03:22:48PM +0800, Lu Baolu wrote:
>> Use a red-black tree(rbtree) to track devices probed by the driver's
>> probe_device callback. These devices need to be looked up quickly by
>> a source ID when the hardware reports a fault, either recoverable or
>> unrecoverable.
>>
>> Fault reporting paths are critical. Searching a list in this scenario
>> is inefficient, with an algorithm complexity of O(n). An rbtree is a
>> self-balancing binary search tree, offering an average search time
>> complexity of O(log(n)). This significant performance improvement
>> makes rbtrees a better choice.
>>
>> Furthermore, rbtrees are implemented on a per-iommu basis, eliminating
>> the need for global searches and further enhancing efficiency in
>> critical fault paths. The rbtree is protected by a spin lock with
>> interrupts disabled to ensure thread-safe access even within interrupt
>> contexts.
>>
>> Co-developed-by: Huang Jiaqing<jiaqing.huang@intel.com>
>> Signed-off-by: Huang Jiaqing<jiaqing.huang@intel.com>
>> Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
>> ---
>> drivers/iommu/intel/iommu.h | 7 +++++
>> drivers/iommu/intel/dmar.c | 3 +-
>> drivers/iommu/intel/iommu.c | 62 +++++++++++++++++++++++++++++++++++--
>> 3 files changed, 69 insertions(+), 3 deletions(-)
> Reviewed-by: Jason Gunthorpe<jgg@nvidia.com>
>
>> +static int device_rbtree_insert(struct intel_iommu *iommu,
>> + struct device_domain_info *info)
>> +{
>> + struct rb_node *curr;
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
>> + curr = rb_find_add(&info->node, &iommu->device_rbtree, device_rid_cmp);
>> + spin_unlock_irqrestore(&iommu->device_rbtree_lock, flags);
>> + if (curr)
>> + dev_warn(info->dev, "device already in rbtree\n");
> I would suggest
>
> WARN_ON(curr);
>
> Something has gone really wonky at this point, right?
Yes. This is not expected, and it is worth a WARN_ON().
Best regards,
baolu
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path
2024-02-15 17:55 ` Jason Gunthorpe
@ 2024-02-18 7:02 ` Baolu Lu
2024-02-21 15:31 ` Jason Gunthorpe
2024-02-21 7:04 ` Ethan Zhao
1 sibling, 1 reply; 19+ messages in thread
From: Baolu Lu @ 2024-02-18 7:02 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: baolu.lu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
Huang Jiaqing, Ethan Zhao, iommu, linux-kernel
On 2024/2/16 1:55, Jason Gunthorpe wrote:
> On Thu, Feb 15, 2024 at 03:22:49PM +0800, Lu Baolu wrote:
>> The existing IO page fault handler currently locates the PCI device by
>> calling pci_get_domain_bus_and_slot(). This function searches the list
>> of all PCI devices until the desired device is found. To improve lookup
>> efficiency, a helper function named device_rbtree_find() is introduced
>> to search for the device within the rbtree. Replace
>> pci_get_domain_bus_and_slot() in the IO page fault handling path.
>>
>> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
>> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>> drivers/iommu/intel/iommu.h | 1 +
>> drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++
>> drivers/iommu/intel/svm.c | 14 ++++++--------
>> 3 files changed, 36 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
>> index 54eeaa8e35a9..f13c228924f8 100644
>> --- a/drivers/iommu/intel/iommu.h
>> +++ b/drivers/iommu/intel/iommu.h
>> @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr);
>> void iommu_flush_write_buffer(struct intel_iommu *iommu);
>> struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
>> const struct iommu_user_data *user_data);
>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid);
>>
>> #ifdef CONFIG_INTEL_IOMMU_SVM
>> void intel_svm_check(struct intel_iommu *iommu);
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index 09009d96e553..d92c680bcc96 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, const struct rb_node *rhs)
>> return device_rid_cmp_key(&key, rhs);
>> }
>>
>> +/*
>> + * Looks up an IOMMU-probed device using its source ID.
>> + *
>> + * If the device is found:
>> + * - Increments its reference count.
>> + * - Returns a pointer to the device.
>> + * - The caller must call put_device() after using the pointer.
>> + *
>> + * If the device is not found, returns NULL.
>> + */
>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid)
>> +{
>> + struct device_domain_info *info;
>> + struct device *dev = NULL;
>> + struct rb_node *node;
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
>> + node = rb_find(&rid, &iommu->device_rbtree, device_rid_cmp_key);
>> + if (node) {
>> + info = rb_entry(node, struct device_domain_info, node);
>> + dev = info->dev;
>> + get_device(dev);
>
> This get_device() is a bit troubling. It eventually calls into
> iommu_report_device_fault() which does:
>
> struct dev_iommu *param = dev->iommu;
>
> Which is going to explode if the iomm driver release has already
> happened, which is a precondition to getting to a unref'd struct
> device.
>
> The driver needs to do something to fence these events during it's
> release function.
Yes, theoretically the dev->iommu should be protected in the
iommu_report_device_fault() path.
>
> If we are already doing that then I'd suggest to drop the get_device
> and add a big fat comment explaining the special rules about lifetime
> that are in effect here.
>
> Otherwise you need to do that barrier rethink the way the locking
> works..
A device hot removing goes through at least the following steps:
- Disable PRI.
- Drain all outstanding I/O page faults.
- Stop DMA.
- Unload the device driver.
- Call iommu_release_device() upon the BUS_NOTIFY_REMOVED_DEVICE event.
This sequence ensures that a device cannot generate an I/O page fault
after PRI has been disabled. So in reality it's impossible for a device
to generate an I/O page fault before disabling PRI and then go through
the long journey to reach iommu_release_device() before
iopf_get_dev_fault_param() is called in page fault interrupt handling
thread.
Considering this behavior, adding a comment to the code explaining the
sequence and removing put_device() may be a simpler solution?
>
> Aside from that this looks like a great improvement to me
>
> Thanks,
> Jason
Best regards,
baolu
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] iommu/vt-d: Use rbtree to track iommu probed devices
2024-02-15 7:22 ` [PATCH 1/2] iommu/vt-d: Use rbtree to track iommu " Lu Baolu
2024-02-15 17:47 ` Jason Gunthorpe
@ 2024-02-19 2:45 ` Ethan Zhao
2024-02-19 4:04 ` Baolu Lu
1 sibling, 1 reply; 19+ messages in thread
From: Ethan Zhao @ 2024-02-19 2:45 UTC (permalink / raw)
To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe, Kevin Tian
Cc: Huang Jiaqing, iommu, linux-kernel
On 2/15/2024 3:22 PM, Lu Baolu wrote:
> Use a red-black tree(rbtree) to track devices probed by the driver's
> probe_device callback. These devices need to be looked up quickly by
> a source ID when the hardware reports a fault, either recoverable or
> unrecoverable.
>
> Fault reporting paths are critical. Searching a list in this scenario
> is inefficient, with an algorithm complexity of O(n). An rbtree is a
> self-balancing binary search tree, offering an average search time
> complexity of O(log(n)). This significant performance improvement
> makes rbtrees a better choice.
>
> Furthermore, rbtrees are implemented on a per-iommu basis, eliminating
> the need for global searches and further enhancing efficiency in
> critical fault paths. The rbtree is protected by a spin lock with
> interrupts disabled to ensure thread-safe access even within interrupt
> contexts.
>
> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
> drivers/iommu/intel/iommu.h | 7 +++++
> drivers/iommu/intel/dmar.c | 3 +-
> drivers/iommu/intel/iommu.c | 62 +++++++++++++++++++++++++++++++++++--
> 3 files changed, 69 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index cf9a28c7fab8..54eeaa8e35a9 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -716,6 +716,11 @@ struct intel_iommu {
> struct q_inval *qi; /* Queued invalidation info */
> u32 iommu_state[MAX_SR_DMAR_REGS]; /* Store iommu states between suspend and resume.*/
>
> + /* rb tree for all probed devices */
> + struct rb_root device_rbtree;
> + /* protect the device_rbtree */
> + spinlock_t device_rbtree_lock;
> +
> #ifdef CONFIG_IRQ_REMAP
> struct ir_table *ir_table; /* Interrupt remapping info */
> struct irq_domain *ir_domain;
> @@ -749,6 +754,8 @@ struct device_domain_info {
> struct intel_iommu *iommu; /* IOMMU used by this device */
> struct dmar_domain *domain; /* pointer to domain */
> struct pasid_table *pasid_table; /* pasid table */
> + /* device tracking node(lookup by PCI RID) */
> + struct rb_node node;
> #ifdef CONFIG_INTEL_IOMMU_DEBUGFS
> struct dentry *debugfs_dentry; /* pointer to device directory dentry */
> #endif
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 23cb80d62a9a..f9b63c2875f7 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1095,7 +1095,8 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
> iommu->agaw = agaw;
> iommu->msagaw = msagaw;
> iommu->segment = drhd->segment;
> -
> + iommu->device_rbtree = RB_ROOT;
> + spin_lock_init(&iommu->device_rbtree_lock);
> iommu->node = NUMA_NO_NODE;
>
> ver = readl(iommu->reg + DMAR_VER_REG);
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index a81a2be9b870..09009d96e553 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -96,6 +96,55 @@ static phys_addr_t root_entry_uctp(struct root_entry *re)
> return re->hi & VTD_PAGE_MASK;
> }
>
> +static int device_rid_cmp_key(const void *key, const struct rb_node *node)
> +{
> + struct device_domain_info *info =
> + rb_entry(node, struct device_domain_info, node);
> + const u16 *rid_lhs = key;
> +
> + if (*rid_lhs < PCI_DEVID(info->bus, info->devfn))
> + return -1;
> +
> + if (*rid_lhs > PCI_DEVID(info->bus, info->devfn))
> + return 1;
> +
> + return 0;
> +}
> +
> +static int device_rid_cmp(struct rb_node *lhs, const struct rb_node *rhs)
> +{
> + struct device_domain_info *info =
> + rb_entry(lhs, struct device_domain_info, node);
> + u16 key = PCI_DEVID(info->bus, info->devfn);
> +
> + return device_rid_cmp_key(&key, rhs);
> +}
> +
> +static int device_rbtree_insert(struct intel_iommu *iommu,
> + struct device_domain_info *info)
> +{
> + struct rb_node *curr;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
> + curr = rb_find_add(&info->node, &iommu->device_rbtree, device_rid_cmp);
> + spin_unlock_irqrestore(&iommu->device_rbtree_lock, flags);
> + if (curr)
> + dev_warn(info->dev, "device already in rbtree\n");
> +
> + return curr ? -EEXIST : 0;
> +}
> +
> +static void device_rbtree_remove(struct device_domain_info *info)
> +{
> + struct intel_iommu *iommu = info->iommu;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
> + rb_erase(&info->node, &iommu->device_rbtree);
> + spin_unlock_irqrestore(&iommu->device_rbtree_lock, flags);
> +}
> +
> /*
> * This domain is a statically identity mapping domain.
> * 1. This domain creats a static 1:1 mapping to all usable memory.
> @@ -4264,25 +4313,34 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
> }
>
> dev_iommu_priv_set(dev, info);
> + ret = device_rbtree_insert(iommu, info);
> + if (ret)
> + goto free;
>
> if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
> ret = intel_pasid_alloc_table(dev);
> if (ret) {
> dev_err(dev, "PASID table allocation failed\n");
> - kfree(info);
> - return ERR_PTR(ret);
> + goto clear_rbtree;
> }
> }
>
> intel_iommu_debugfs_create_dev(info);
>
> return &iommu->iommu;
> +clear_rbtree:
> + device_rbtree_remove(info);
> +free:
> + kfree(info);
> +
> + return ERR_PTR(ret);
> }
>
> static void intel_iommu_release_device(struct device *dev)
> {
> struct device_domain_info *info = dev_iommu_priv_get(dev);
>
> + device_rbtree_remove(info);
Perhpas too early here to remove dev from the rbtree, if it is wanted in
devTLB invalidation steps in intel_pasid_tear_down_entry().
Thanks,
Ethan
> dmar_remove_one_dev_info(dev);
> intel_pasid_free_table(dev);
> intel_iommu_debugfs_remove_dev(info);
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] iommu/vt-d: Use rbtree to track iommu probed devices
2024-02-19 2:45 ` Ethan Zhao
@ 2024-02-19 4:04 ` Baolu Lu
2024-02-19 5:33 ` Ethan Zhao
0 siblings, 1 reply; 19+ messages in thread
From: Baolu Lu @ 2024-02-19 4:04 UTC (permalink / raw)
To: Ethan Zhao, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe, Kevin Tian
Cc: baolu.lu, Huang Jiaqing, iommu, linux-kernel
On 2024/2/19 10:45, Ethan Zhao wrote:
>> @@ -4264,25 +4313,34 @@ static struct iommu_device
>> *intel_iommu_probe_device(struct device *dev)
>> }
>> dev_iommu_priv_set(dev, info);
>> + ret = device_rbtree_insert(iommu, info);
>> + if (ret)
>> + goto free;
>> if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
>> ret = intel_pasid_alloc_table(dev);
>> if (ret) {
>> dev_err(dev, "PASID table allocation failed\n");
>> - kfree(info);
>> - return ERR_PTR(ret);
>> + goto clear_rbtree;
>> }
>> }
>> intel_iommu_debugfs_create_dev(info);
>> return &iommu->iommu;
>> +clear_rbtree:
>> + device_rbtree_remove(info);
>> +free:
>> + kfree(info);
>> +
>> + return ERR_PTR(ret);
>> }
>> static void intel_iommu_release_device(struct device *dev)
>> {
>> struct device_domain_info *info = dev_iommu_priv_get(dev);
>> + device_rbtree_remove(info);
>
> Perhpas too early here to remove dev from the rbtree, if it is wanted in
> devTLB invalidation steps in intel_pasid_tear_down_entry().
Perhaps the caller of device_rbtree_find() should not depend on the
order in the release_device callback. For the device TLB invalidation
timed-out case, probably it could be checked in this way:
struct device *dev = device_rbtree_find(iommu, ite_sid);
if (!dev || !pci_device_is_present(to_pci_dev(dev)))
return -ETIMEDOUT;
Best regards,
baolu
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] iommu/vt-d: Use rbtree to track iommu probed devices
2024-02-19 4:04 ` Baolu Lu
@ 2024-02-19 5:33 ` Ethan Zhao
2024-02-19 6:47 ` Baolu Lu
0 siblings, 1 reply; 19+ messages in thread
From: Ethan Zhao @ 2024-02-19 5:33 UTC (permalink / raw)
To: Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe, Kevin Tian
Cc: Huang Jiaqing, iommu, linux-kernel
On 2/19/2024 12:04 PM, Baolu Lu wrote:
> On 2024/2/19 10:45, Ethan Zhao wrote:
>>> @@ -4264,25 +4313,34 @@ static struct iommu_device
>>> *intel_iommu_probe_device(struct device *dev)
>>> }
>>> dev_iommu_priv_set(dev, info);
>>> + ret = device_rbtree_insert(iommu, info);
>>> + if (ret)
>>> + goto free;
>>> if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
>>> ret = intel_pasid_alloc_table(dev);
>>> if (ret) {
>>> dev_err(dev, "PASID table allocation failed\n");
>>> - kfree(info);
>>> - return ERR_PTR(ret);
>>> + goto clear_rbtree;
>>> }
>>> }
>>> intel_iommu_debugfs_create_dev(info);
>>> return &iommu->iommu;
>>> +clear_rbtree:
>>> + device_rbtree_remove(info);
>>> +free:
>>> + kfree(info);
>>> +
>>> + return ERR_PTR(ret);
>>> }
>>> static void intel_iommu_release_device(struct device *dev)
>>> {
>>> struct device_domain_info *info = dev_iommu_priv_get(dev);
>>> + device_rbtree_remove(info);
>>
>> Perhpas too early here to remove dev from the rbtree, if it is wanted in
>> devTLB invalidation steps in intel_pasid_tear_down_entry().
>
> Perhaps the caller of device_rbtree_find() should not depend on the
I didn't catch up here. seems have to maintain the lifecycle as PCI subsystem
does, or there would be mutli instances for the same BDF(e.g. the device is
removed then plugged, again and again.....in the same slot) in the rbtree ?
> order in the release_device callback. For the device TLB invalidation
> timed-out case, probably it could be checked in this way:
>
> struct device *dev = device_rbtree_find(iommu, ite_sid);
> if (!dev || !pci_device_is_present(to_pci_dev(dev)))
> return -ETIMEDOUT;
looks much advanced with such infrastructure !
Thanks,
Ethan
>
> Best regards,
> baolu
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] iommu/vt-d: Use rbtree to track iommu probed devices
2024-02-19 5:33 ` Ethan Zhao
@ 2024-02-19 6:47 ` Baolu Lu
2024-02-19 7:24 ` Ethan Zhao
0 siblings, 1 reply; 19+ messages in thread
From: Baolu Lu @ 2024-02-19 6:47 UTC (permalink / raw)
To: Ethan Zhao, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe, Kevin Tian
Cc: baolu.lu, Huang Jiaqing, iommu, linux-kernel
On 2024/2/19 13:33, Ethan Zhao wrote:
> On 2/19/2024 12:04 PM, Baolu Lu wrote:
>> On 2024/2/19 10:45, Ethan Zhao wrote:
>>>> @@ -4264,25 +4313,34 @@ static struct iommu_device
>>>> *intel_iommu_probe_device(struct device *dev)
>>>> }
>>>> dev_iommu_priv_set(dev, info);
>>>> + ret = device_rbtree_insert(iommu, info);
>>>> + if (ret)
>>>> + goto free;
>>>> if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
>>>> ret = intel_pasid_alloc_table(dev);
>>>> if (ret) {
>>>> dev_err(dev, "PASID table allocation failed\n");
>>>> - kfree(info);
>>>> - return ERR_PTR(ret);
>>>> + goto clear_rbtree;
>>>> }
>>>> }
>>>> intel_iommu_debugfs_create_dev(info);
>>>> return &iommu->iommu;
>>>> +clear_rbtree:
>>>> + device_rbtree_remove(info);
>>>> +free:
>>>> + kfree(info);
>>>> +
>>>> + return ERR_PTR(ret);
>>>> }
>>>> static void intel_iommu_release_device(struct device *dev)
>>>> {
>>>> struct device_domain_info *info = dev_iommu_priv_get(dev);
>>>> + device_rbtree_remove(info);
>>>
>>> Perhpas too early here to remove dev from the rbtree, if it is wanted in
>>> devTLB invalidation steps in intel_pasid_tear_down_entry().
>>
>> Perhaps the caller of device_rbtree_find() should not depend on the
>
> I didn't catch up here. seems have to maintain the lifecycle as PCI
> subsystem
> does, or there would be mutli instances for the same BDF(e.g. the device is
> removed then plugged, again and again.....in the same slot) in the rbtree ?
There should not be multiple instances for a same BDF. The lifecycle of
a device is managed by the device and driver core. The iommu subsystem
registers a notification to the core and take actions on device ADD and
REMOVE events.
Best regards,
baolu
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path
2024-02-15 7:22 ` [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path Lu Baolu
2024-02-15 17:55 ` Jason Gunthorpe
@ 2024-02-19 6:54 ` Ethan Zhao
2024-02-19 6:58 ` Baolu Lu
1 sibling, 1 reply; 19+ messages in thread
From: Ethan Zhao @ 2024-02-19 6:54 UTC (permalink / raw)
To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe, Kevin Tian
Cc: Huang Jiaqing, iommu, linux-kernel
On 2/15/2024 3:22 PM, Lu Baolu wrote:
> The existing IO page fault handler currently locates the PCI device by
> calling pci_get_domain_bus_and_slot(). This function searches the list
> of all PCI devices until the desired device is found. To improve lookup
> efficiency, a helper function named device_rbtree_find() is introduced
> to search for the device within the rbtree. Replace
> pci_get_domain_bus_and_slot() in the IO page fault handling path.
>
> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
> drivers/iommu/intel/iommu.h | 1 +
> drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++
> drivers/iommu/intel/svm.c | 14 ++++++--------
> 3 files changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 54eeaa8e35a9..f13c228924f8 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr);
> void iommu_flush_write_buffer(struct intel_iommu *iommu);
> struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
> const struct iommu_user_data *user_data);
> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid);
>
> #ifdef CONFIG_INTEL_IOMMU_SVM
> void intel_svm_check(struct intel_iommu *iommu);
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 09009d96e553..d92c680bcc96 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, const struct rb_node *rhs)
> return device_rid_cmp_key(&key, rhs);
> }
>
> +/*
> + * Looks up an IOMMU-probed device using its source ID.
> + *
> + * If the device is found:
> + * - Increments its reference count.
> + * - Returns a pointer to the device.
> + * - The caller must call put_device() after using the pointer.
> + *
> + * If the device is not found, returns NULL.
> + */
> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid)
> +{
> + struct device_domain_info *info;
> + struct device *dev = NULL;
> + struct rb_node *node;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
Though per iommu device rbtree isn't a big tree, given already holds spin_lock
why still needs irq off ?
Thanks,
Ethan
> + node = rb_find(&rid, &iommu->device_rbtree, device_rid_cmp_key);
> + if (node) {
> + info = rb_entry(node, struct device_domain_info, node);
> + dev = info->dev;
> + get_device(dev);
> + }
> + spin_unlock_irqrestore(&iommu->device_rbtree_lock, flags);
> +
> + return dev;
> +}
> +
> static int device_rbtree_insert(struct intel_iommu *iommu,
> struct device_domain_info *info)
> {
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index b644d57da841..717b7041973c 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -645,7 +645,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> struct intel_iommu *iommu = d;
> struct page_req_dsc *req;
> int head, tail, handled;
> - struct pci_dev *pdev;
> + struct device *dev;
> u64 address;
>
> /*
> @@ -691,21 +691,19 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> if (unlikely(req->lpig && !req->rd_req && !req->wr_req))
> goto prq_advance;
>
> - pdev = pci_get_domain_bus_and_slot(iommu->segment,
> - PCI_BUS_NUM(req->rid),
> - req->rid & 0xff);
> /*
> * If prq is to be handled outside iommu driver via receiver of
> * the fault notifiers, we skip the page response here.
> */
> - if (!pdev)
> + dev = device_rbtree_find(iommu, req->rid);
> + if (!dev)
> goto bad_req;
>
> - intel_svm_prq_report(iommu, &pdev->dev, req);
> - trace_prq_report(iommu, &pdev->dev, req->qw_0, req->qw_1,
> + intel_svm_prq_report(iommu, dev, req);
> + trace_prq_report(iommu, dev, req->qw_0, req->qw_1,
> req->priv_data[0], req->priv_data[1],
> iommu->prq_seq_number++);
> - pci_dev_put(pdev);
> + put_device(dev);
> prq_advance:
> head = (head + sizeof(*req)) & PRQ_RING_MASK;
> }
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path
2024-02-19 6:54 ` Ethan Zhao
@ 2024-02-19 6:58 ` Baolu Lu
2024-02-19 7:06 ` Ethan Zhao
0 siblings, 1 reply; 19+ messages in thread
From: Baolu Lu @ 2024-02-19 6:58 UTC (permalink / raw)
To: Ethan Zhao, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe, Kevin Tian
Cc: baolu.lu, Huang Jiaqing, iommu, linux-kernel
On 2024/2/19 14:54, Ethan Zhao wrote:
> On 2/15/2024 3:22 PM, Lu Baolu wrote:
>> The existing IO page fault handler currently locates the PCI device by
>> calling pci_get_domain_bus_and_slot(). This function searches the list
>> of all PCI devices until the desired device is found. To improve lookup
>> efficiency, a helper function named device_rbtree_find() is introduced
>> to search for the device within the rbtree. Replace
>> pci_get_domain_bus_and_slot() in the IO page fault handling path.
>>
>> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
>> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>> drivers/iommu/intel/iommu.h | 1 +
>> drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++
>> drivers/iommu/intel/svm.c | 14 ++++++--------
>> 3 files changed, 36 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
>> index 54eeaa8e35a9..f13c228924f8 100644
>> --- a/drivers/iommu/intel/iommu.h
>> +++ b/drivers/iommu/intel/iommu.h
>> @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr);
>> void iommu_flush_write_buffer(struct intel_iommu *iommu);
>> struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain
>> *parent,
>> const struct iommu_user_data *user_data);
>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid);
>> #ifdef CONFIG_INTEL_IOMMU_SVM
>> void intel_svm_check(struct intel_iommu *iommu);
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index 09009d96e553..d92c680bcc96 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs,
>> const struct rb_node *rhs)
>> return device_rid_cmp_key(&key, rhs);
>> }
>> +/*
>> + * Looks up an IOMMU-probed device using its source ID.
>> + *
>> + * If the device is found:
>> + * - Increments its reference count.
>> + * - Returns a pointer to the device.
>> + * - The caller must call put_device() after using the pointer.
>> + *
>> + * If the device is not found, returns NULL.
>> + */
>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid)
>> +{
>> + struct device_domain_info *info;
>> + struct device *dev = NULL;
>> + struct rb_node *node;
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
>
> Though per iommu device rbtree isn't a big tree, given already holds
> spin_lock
> why still needs irq off ?
I want it to be usable not only in the normal execution flow, but also
in the interrupt context, such as for the DMA remapping fault
(unrecoverable) reporting path.
Best regards,
baolu
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path
2024-02-19 6:58 ` Baolu Lu
@ 2024-02-19 7:06 ` Ethan Zhao
2024-02-19 7:22 ` Baolu Lu
0 siblings, 1 reply; 19+ messages in thread
From: Ethan Zhao @ 2024-02-19 7:06 UTC (permalink / raw)
To: Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe, Kevin Tian
Cc: Huang Jiaqing, iommu, linux-kernel
On 2/19/2024 2:58 PM, Baolu Lu wrote:
> On 2024/2/19 14:54, Ethan Zhao wrote:
>> On 2/15/2024 3:22 PM, Lu Baolu wrote:
>>> The existing IO page fault handler currently locates the PCI device by
>>> calling pci_get_domain_bus_and_slot(). This function searches the list
>>> of all PCI devices until the desired device is found. To improve lookup
>>> efficiency, a helper function named device_rbtree_find() is introduced
>>> to search for the device within the rbtree. Replace
>>> pci_get_domain_bus_and_slot() in the IO page fault handling path.
>>>
>>> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
>>> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
>>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>>> ---
>>> drivers/iommu/intel/iommu.h | 1 +
>>> drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++
>>> drivers/iommu/intel/svm.c | 14 ++++++--------
>>> 3 files changed, 36 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
>>> index 54eeaa8e35a9..f13c228924f8 100644
>>> --- a/drivers/iommu/intel/iommu.h
>>> +++ b/drivers/iommu/intel/iommu.h
>>> @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr);
>>> void iommu_flush_write_buffer(struct intel_iommu *iommu);
>>> struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain
>>> *parent,
>>> const struct iommu_user_data *user_data);
>>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid);
>>> #ifdef CONFIG_INTEL_IOMMU_SVM
>>> void intel_svm_check(struct intel_iommu *iommu);
>>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>>> index 09009d96e553..d92c680bcc96 100644
>>> --- a/drivers/iommu/intel/iommu.c
>>> +++ b/drivers/iommu/intel/iommu.c
>>> @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs,
>>> const struct rb_node *rhs)
>>> return device_rid_cmp_key(&key, rhs);
>>> }
>>> +/*
>>> + * Looks up an IOMMU-probed device using its source ID.
>>> + *
>>> + * If the device is found:
>>> + * - Increments its reference count.
>>> + * - Returns a pointer to the device.
>>> + * - The caller must call put_device() after using the pointer.
>>> + *
>>> + * If the device is not found, returns NULL.
>>> + */
>>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid)
>>> +{
>>> + struct device_domain_info *info;
>>> + struct device *dev = NULL;
>>> + struct rb_node *node;
>>> + unsigned long flags;
>>> +
>>> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
>>
>> Though per iommu device rbtree isn't a big tree, given already holds
>> spin_lock
>> why still needs irq off ?
>
> I want it to be usable not only in the normal execution flow, but also
> in the interrupt context, such as for the DMA remapping fault
> (unrecoverable) reporting path.
Holding rbtree_lock only should work in interrrupt context, I missed something ?
But with irq_off, as side effect, telling all the CPUs not to handle interrrupt
that moment.
Thanks,
Ethan
>
> Best regards,
> baolu
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path
2024-02-19 7:06 ` Ethan Zhao
@ 2024-02-19 7:22 ` Baolu Lu
0 siblings, 0 replies; 19+ messages in thread
From: Baolu Lu @ 2024-02-19 7:22 UTC (permalink / raw)
To: Ethan Zhao, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe, Kevin Tian
Cc: baolu.lu, Huang Jiaqing, iommu, linux-kernel
On 2024/2/19 15:06, Ethan Zhao wrote:
> On 2/19/2024 2:58 PM, Baolu Lu wrote:
>> On 2024/2/19 14:54, Ethan Zhao wrote:
>>> On 2/15/2024 3:22 PM, Lu Baolu wrote:
>>>> The existing IO page fault handler currently locates the PCI device by
>>>> calling pci_get_domain_bus_and_slot(). This function searches the list
>>>> of all PCI devices until the desired device is found. To improve lookup
>>>> efficiency, a helper function named device_rbtree_find() is introduced
>>>> to search for the device within the rbtree. Replace
>>>> pci_get_domain_bus_and_slot() in the IO page fault handling path.
>>>>
>>>> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
>>>> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
>>>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>>>> ---
>>>> drivers/iommu/intel/iommu.h | 1 +
>>>> drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++
>>>> drivers/iommu/intel/svm.c | 14 ++++++--------
>>>> 3 files changed, 36 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
>>>> index 54eeaa8e35a9..f13c228924f8 100644
>>>> --- a/drivers/iommu/intel/iommu.h
>>>> +++ b/drivers/iommu/intel/iommu.h
>>>> @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr);
>>>> void iommu_flush_write_buffer(struct intel_iommu *iommu);
>>>> struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain
>>>> *parent,
>>>> const struct iommu_user_data *user_data);
>>>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid);
>>>> #ifdef CONFIG_INTEL_IOMMU_SVM
>>>> void intel_svm_check(struct intel_iommu *iommu);
>>>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>>>> index 09009d96e553..d92c680bcc96 100644
>>>> --- a/drivers/iommu/intel/iommu.c
>>>> +++ b/drivers/iommu/intel/iommu.c
>>>> @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs,
>>>> const struct rb_node *rhs)
>>>> return device_rid_cmp_key(&key, rhs);
>>>> }
>>>> +/*
>>>> + * Looks up an IOMMU-probed device using its source ID.
>>>> + *
>>>> + * If the device is found:
>>>> + * - Increments its reference count.
>>>> + * - Returns a pointer to the device.
>>>> + * - The caller must call put_device() after using the pointer.
>>>> + *
>>>> + * If the device is not found, returns NULL.
>>>> + */
>>>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid)
>>>> +{
>>>> + struct device_domain_info *info;
>>>> + struct device *dev = NULL;
>>>> + struct rb_node *node;
>>>> + unsigned long flags;
>>>> +
>>>> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
>>>
>>> Though per iommu device rbtree isn't a big tree, given already holds
>>> spin_lock
>>> why still needs irq off ?
>>
>> I want it to be usable not only in the normal execution flow, but also
>> in the interrupt context, such as for the DMA remapping fault
>> (unrecoverable) reporting path.
>
> Holding rbtree_lock only should work in interrrupt context, I missed
> something ?
No. That will possibly cause dead lock.
Best regards,
baolu
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] iommu/vt-d: Use rbtree to track iommu probed devices
2024-02-19 6:47 ` Baolu Lu
@ 2024-02-19 7:24 ` Ethan Zhao
0 siblings, 0 replies; 19+ messages in thread
From: Ethan Zhao @ 2024-02-19 7:24 UTC (permalink / raw)
To: Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy,
Jason Gunthorpe, Kevin Tian
Cc: Huang Jiaqing, iommu, linux-kernel
On 2/19/2024 2:47 PM, Baolu Lu wrote:
> On 2024/2/19 13:33, Ethan Zhao wrote:
>> On 2/19/2024 12:04 PM, Baolu Lu wrote:
>>> On 2024/2/19 10:45, Ethan Zhao wrote:
>>>>> @@ -4264,25 +4313,34 @@ static struct iommu_device
>>>>> *intel_iommu_probe_device(struct device *dev)
>>>>> }
>>>>> dev_iommu_priv_set(dev, info);
>>>>> + ret = device_rbtree_insert(iommu, info);
>>>>> + if (ret)
>>>>> + goto free;
>>>>> if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
>>>>> ret = intel_pasid_alloc_table(dev);
>>>>> if (ret) {
>>>>> dev_err(dev, "PASID table allocation failed\n");
>>>>> - kfree(info);
>>>>> - return ERR_PTR(ret);
>>>>> + goto clear_rbtree;
>>>>> }
>>>>> }
>>>>> intel_iommu_debugfs_create_dev(info);
>>>>> return &iommu->iommu;
>>>>> +clear_rbtree:
>>>>> + device_rbtree_remove(info);
>>>>> +free:
>>>>> + kfree(info);
>>>>> +
>>>>> + return ERR_PTR(ret);
>>>>> }
>>>>> static void intel_iommu_release_device(struct device *dev)
>>>>> {
>>>>> struct device_domain_info *info = dev_iommu_priv_get(dev);
>>>>> + device_rbtree_remove(info);
>>>>
>>>> Perhpas too early here to remove dev from the rbtree, if it is
>>>> wanted in
>>>> devTLB invalidation steps in intel_pasid_tear_down_entry().
>>>
>>> Perhaps the caller of device_rbtree_find() should not depend on the
>>
>> I didn't catch up here. seems have to maintain the lifecycle as PCI
>> subsystem
>> does, or there would be mutli instances for the same BDF(e.g. the
>> device is
>> removed then plugged, again and again.....in the same slot) in the
>> rbtree ?
>
> There should not be multiple instances for a same BDF. The lifecycle of
> a device is managed by the device and driver core. The iommu subsystem
> registers a notification to the core and take actions on device ADD and
> REMOVE events.
Move device_rbtree_remove(info) to end of this REMOVED notifier callback, or such
beautiful code wouldn't work :)
struct device *dev = device_rbtree_find(iommu, ite_sid);
if (!dev || !pci_device_is_present(to_pci_dev(dev)))
return -ETIMEDOUT;
Thanks,
Ethan
>
> Best regards,
> baolu
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path
2024-02-15 17:55 ` Jason Gunthorpe
2024-02-18 7:02 ` Baolu Lu
@ 2024-02-21 7:04 ` Ethan Zhao
2024-02-21 7:37 ` Baolu Lu
1 sibling, 1 reply; 19+ messages in thread
From: Ethan Zhao @ 2024-02-21 7:04 UTC (permalink / raw)
To: Jason Gunthorpe, Lu Baolu
Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
Huang Jiaqing, iommu, linux-kernel
On 2/16/2024 1:55 AM, Jason Gunthorpe wrote:
> On Thu, Feb 15, 2024 at 03:22:49PM +0800, Lu Baolu wrote:
>> The existing IO page fault handler currently locates the PCI device by
>> calling pci_get_domain_bus_and_slot(). This function searches the list
>> of all PCI devices until the desired device is found. To improve lookup
>> efficiency, a helper function named device_rbtree_find() is introduced
>> to search for the device within the rbtree. Replace
>> pci_get_domain_bus_and_slot() in the IO page fault handling path.
>>
>> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
>> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>> drivers/iommu/intel/iommu.h | 1 +
>> drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++
>> drivers/iommu/intel/svm.c | 14 ++++++--------
>> 3 files changed, 36 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
>> index 54eeaa8e35a9..f13c228924f8 100644
>> --- a/drivers/iommu/intel/iommu.h
>> +++ b/drivers/iommu/intel/iommu.h
>> @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr);
>> void iommu_flush_write_buffer(struct intel_iommu *iommu);
>> struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
>> const struct iommu_user_data *user_data);
>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid);
>>
>> #ifdef CONFIG_INTEL_IOMMU_SVM
>> void intel_svm_check(struct intel_iommu *iommu);
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index 09009d96e553..d92c680bcc96 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, const struct rb_node *rhs)
>> return device_rid_cmp_key(&key, rhs);
>> }
>>
>> +/*
>> + * Looks up an IOMMU-probed device using its source ID.
>> + *
>> + * If the device is found:
>> + * - Increments its reference count.
>> + * - Returns a pointer to the device.
>> + * - The caller must call put_device() after using the pointer.
>> + *
>> + * If the device is not found, returns NULL.
>> + */
>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid)
>> +{
>> + struct device_domain_info *info;
>> + struct device *dev = NULL;
>> + struct rb_node *node;
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
>> + node = rb_find(&rid, &iommu->device_rbtree, device_rid_cmp_key);
>> + if (node) {
>> + info = rb_entry(node, struct device_domain_info, node);
>> + dev = info->dev;
>> + get_device(dev);
> This get_device() is a bit troubling. It eventually calls into
> iommu_report_device_fault() which does:
>
> struct dev_iommu *param = dev->iommu;
So far no protection to dev->iommu structure access in generic
iommu layer between different threads, such as hot removal interrupt &
iopf handling thread, so we should enhance that in generic iommu code ?
Thanks,
Ethan
>
> Which is going to explode if the iomm driver release has already
> happened, which is a precondition to getting to a unref'd struct
> device.
>
> The driver needs to do something to fence these events during it's
> release function.
>
> If we are already doing that then I'd suggest to drop the get_device
> and add a big fat comment explaining the special rules about lifetime
> that are in effect here.
>
> Otherwise you need to do that barrier rethink the way the locking
> works..
>
> Aside from that this looks like a great improvement to me
>
> Thanks,
> Jason
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path
2024-02-21 7:04 ` Ethan Zhao
@ 2024-02-21 7:37 ` Baolu Lu
0 siblings, 0 replies; 19+ messages in thread
From: Baolu Lu @ 2024-02-21 7:37 UTC (permalink / raw)
To: Ethan Zhao, Jason Gunthorpe
Cc: baolu.lu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
Huang Jiaqing, iommu, linux-kernel
On 2024/2/21 15:04, Ethan Zhao wrote:
> On 2/16/2024 1:55 AM, Jason Gunthorpe wrote:
>> On Thu, Feb 15, 2024 at 03:22:49PM +0800, Lu Baolu wrote:
>>> The existing IO page fault handler currently locates the PCI device by
>>> calling pci_get_domain_bus_and_slot(). This function searches the list
>>> of all PCI devices until the desired device is found. To improve lookup
>>> efficiency, a helper function named device_rbtree_find() is introduced
>>> to search for the device within the rbtree. Replace
>>> pci_get_domain_bus_and_slot() in the IO page fault handling path.
>>>
>>> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
>>> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
>>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>>> ---
>>> drivers/iommu/intel/iommu.h | 1 +
>>> drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++
>>> drivers/iommu/intel/svm.c | 14 ++++++--------
>>> 3 files changed, 36 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
>>> index 54eeaa8e35a9..f13c228924f8 100644
>>> --- a/drivers/iommu/intel/iommu.h
>>> +++ b/drivers/iommu/intel/iommu.h
>>> @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr);
>>> void iommu_flush_write_buffer(struct intel_iommu *iommu);
>>> struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain
>>> *parent,
>>> const struct iommu_user_data *user_data);
>>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid);
>>> #ifdef CONFIG_INTEL_IOMMU_SVM
>>> void intel_svm_check(struct intel_iommu *iommu);
>>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>>> index 09009d96e553..d92c680bcc96 100644
>>> --- a/drivers/iommu/intel/iommu.c
>>> +++ b/drivers/iommu/intel/iommu.c
>>> @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs,
>>> const struct rb_node *rhs)
>>> return device_rid_cmp_key(&key, rhs);
>>> }
>>> +/*
>>> + * Looks up an IOMMU-probed device using its source ID.
>>> + *
>>> + * If the device is found:
>>> + * - Increments its reference count.
>>> + * - Returns a pointer to the device.
>>> + * - The caller must call put_device() after using the pointer.
>>> + *
>>> + * If the device is not found, returns NULL.
>>> + */
>>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid)
>>> +{
>>> + struct device_domain_info *info;
>>> + struct device *dev = NULL;
>>> + struct rb_node *node;
>>> + unsigned long flags;
>>> +
>>> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
>>> + node = rb_find(&rid, &iommu->device_rbtree, device_rid_cmp_key);
>>> + if (node) {
>>> + info = rb_entry(node, struct device_domain_info, node);
>>> + dev = info->dev;
>>> + get_device(dev);
>> This get_device() is a bit troubling. It eventually calls into
>> iommu_report_device_fault() which does:
>>
>> struct dev_iommu *param = dev->iommu;
>
> So far no protection to dev->iommu structure access in generic
> iommu layer between different threads, such as hot removal interrupt &
> iopf handling thread, so we should enhance that in generic iommu code ?
The iommu core is sitting between the upper layers and individual iommu
drivers. All calls from the upper layers are made within the driver
context. The device driver framework and iommu subsystem guarantee that
dev->iommu is always valid in the driver context.
However, iommu_report_device_fault() is different. It's called in the
interrupt thread, not in any driver context. Hence, the individual iommu
driver should guarantee that dev->iommu is valid when calling
iommu_report_device_fault().
Best regards,
baolu
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path
2024-02-18 7:02 ` Baolu Lu
@ 2024-02-21 15:31 ` Jason Gunthorpe
0 siblings, 0 replies; 19+ messages in thread
From: Jason Gunthorpe @ 2024-02-21 15:31 UTC (permalink / raw)
To: Baolu Lu
Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
Huang Jiaqing, Ethan Zhao, iommu, linux-kernel
On Sun, Feb 18, 2024 at 03:02:00PM +0800, Baolu Lu wrote:
> A device hot removing goes through at least the following steps:
>
> - Disable PRI.
> - Drain all outstanding I/O page faults.
> - Stop DMA.
> - Unload the device driver.
> - Call iommu_release_device() upon the BUS_NOTIFY_REMOVED_DEVICE event.
>
> This sequence ensures that a device cannot generate an I/O page fault
> after PRI has been disabled. So in reality it's impossible for a device
> to generate an I/O page fault before disabling PRI and then go through
> the long journey to reach iommu_release_device() before
> iopf_get_dev_fault_param() is called in page fault interrupt handling
> thread.
Why is this impossible? Seems like a classic race..
Flush the HW page fault queue as part of the above to ensure there is
no concurrent iopf_get_dev_fault_param() on the now PRI disabled BDF.
> Considering this behavior, adding a comment to the code explaining the
> sequence and removing put_device() may be a simpler solution?
A comment is definitely needed
Jason
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2024-02-21 15:31 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-15 7:22 [PATCH 0/2] iommu/vt-d: Introduce rbtree for probed devices Lu Baolu
2024-02-15 7:22 ` [PATCH 1/2] iommu/vt-d: Use rbtree to track iommu " Lu Baolu
2024-02-15 17:47 ` Jason Gunthorpe
2024-02-18 4:22 ` Baolu Lu
2024-02-19 2:45 ` Ethan Zhao
2024-02-19 4:04 ` Baolu Lu
2024-02-19 5:33 ` Ethan Zhao
2024-02-19 6:47 ` Baolu Lu
2024-02-19 7:24 ` Ethan Zhao
2024-02-15 7:22 ` [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path Lu Baolu
2024-02-15 17:55 ` Jason Gunthorpe
2024-02-18 7:02 ` Baolu Lu
2024-02-21 15:31 ` Jason Gunthorpe
2024-02-21 7:04 ` Ethan Zhao
2024-02-21 7:37 ` Baolu Lu
2024-02-19 6:54 ` Ethan Zhao
2024-02-19 6:58 ` Baolu Lu
2024-02-19 7:06 ` Ethan Zhao
2024-02-19 7:22 ` Baolu Lu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox