public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Baolu Lu <baolu.lu@linux.intel.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: baolu.lu@linux.intel.com, Joerg Roedel <joro@8bytes.org>,
	Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Huang Jiaqing <jiaqing.huang@intel.com>,
	Ethan Zhao <haifeng.zhao@linux.intel.com>,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path
Date: Sun, 18 Feb 2024 15:02:00 +0800	[thread overview]
Message-ID: <67391b2d-b441-4d43-aa46-2a30c95420a3@linux.intel.com> (raw)
In-Reply-To: <20240215175534.GD1299735@ziepe.ca>

On 2024/2/16 1:55, Jason Gunthorpe wrote:
> On Thu, Feb 15, 2024 at 03:22:49PM +0800, Lu Baolu wrote:
>> The existing IO page fault handler currently locates the PCI device by
>> calling pci_get_domain_bus_and_slot(). This function searches the list
>> of all PCI devices until the desired device is found. To improve lookup
>> efficiency, a helper function named device_rbtree_find() is introduced
>> to search for the device within the rbtree. Replace
>> pci_get_domain_bus_and_slot() in the IO page fault handling path.
>>
>> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
>> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/intel/iommu.h |  1 +
>>   drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++
>>   drivers/iommu/intel/svm.c   | 14 ++++++--------
>>   3 files changed, 36 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
>> index 54eeaa8e35a9..f13c228924f8 100644
>> --- a/drivers/iommu/intel/iommu.h
>> +++ b/drivers/iommu/intel/iommu.h
>> @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr);
>>   void iommu_flush_write_buffer(struct intel_iommu *iommu);
>>   struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
>>   					       const struct iommu_user_data *user_data);
>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid);
>>   
>>   #ifdef CONFIG_INTEL_IOMMU_SVM
>>   void intel_svm_check(struct intel_iommu *iommu);
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index 09009d96e553..d92c680bcc96 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, const struct rb_node *rhs)
>>   	return device_rid_cmp_key(&key, rhs);
>>   }
>>   
>> +/*
>> + * Looks up an IOMMU-probed device using its source ID.
>> + *
>> + * If the device is found:
>> + *  - Increments its reference count.
>> + *  - Returns a pointer to the device.
>> + *  - The caller must call put_device() after using the pointer.
>> + *
>> + * If the device is not found, returns NULL.
>> + */
>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid)
>> +{
>> +	struct device_domain_info *info;
>> +	struct device *dev = NULL;
>> +	struct rb_node *node;
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
>> +	node = rb_find(&rid, &iommu->device_rbtree, device_rid_cmp_key);
>> +	if (node) {
>> +		info = rb_entry(node, struct device_domain_info, node);
>> +		dev = info->dev;
>> +		get_device(dev);
> 
> This get_device() is a bit troubling. It eventually calls into
> iommu_report_device_fault() which does:
> 
> 	struct dev_iommu *param = dev->iommu;
> 
> Which is going to explode if the iomm driver release has already
> happened, which is a precondition to getting to a unref'd struct
> device.
> 
> The driver needs to do something to fence these events during it's
> release function.

Yes, theoretically the dev->iommu should be protected in the
iommu_report_device_fault() path.

> 
> If we are already doing that then I'd suggest to drop the get_device
> and add a big fat comment explaining the special rules about lifetime
> that are in effect here.
> 
> Otherwise you need to do that barrier rethink the way the locking
> works..

A device hot removing goes through at least the following steps:

- Disable PRI.
- Drain all outstanding I/O page faults.
- Stop DMA.
- Unload the device driver.
- Call iommu_release_device() upon the BUS_NOTIFY_REMOVED_DEVICE event.

This sequence ensures that a device cannot generate an I/O page fault
after PRI has been disabled. So in reality it's impossible for a device
to generate an I/O page fault before disabling PRI and then go through
the long journey to reach iommu_release_device() before
iopf_get_dev_fault_param() is called in page fault interrupt handling
thread.

Considering this behavior, adding a comment to the code explaining the
sequence and removing put_device() may be a simpler solution?

> 
> Aside from that this looks like a great improvement to me
> 
> Thanks,
> Jason

Best regards,
baolu

  reply	other threads:[~2024-02-18  7:02 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-15  7:22 [PATCH 0/2] iommu/vt-d: Introduce rbtree for probed devices Lu Baolu
2024-02-15  7:22 ` [PATCH 1/2] iommu/vt-d: Use rbtree to track iommu " Lu Baolu
2024-02-15 17:47   ` Jason Gunthorpe
2024-02-18  4:22     ` Baolu Lu
2024-02-19  2:45   ` Ethan Zhao
2024-02-19  4:04     ` Baolu Lu
2024-02-19  5:33       ` Ethan Zhao
2024-02-19  6:47         ` Baolu Lu
2024-02-19  7:24           ` Ethan Zhao
2024-02-15  7:22 ` [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path Lu Baolu
2024-02-15 17:55   ` Jason Gunthorpe
2024-02-18  7:02     ` Baolu Lu [this message]
2024-02-21 15:31       ` Jason Gunthorpe
2024-02-21  7:04     ` Ethan Zhao
2024-02-21  7:37       ` Baolu Lu
2024-02-19  6:54   ` Ethan Zhao
2024-02-19  6:58     ` Baolu Lu
2024-02-19  7:06       ` Ethan Zhao
2024-02-19  7:22         ` Baolu Lu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=67391b2d-b441-4d43-aa46-2a30c95420a3@linux.intel.com \
    --to=baolu.lu@linux.intel.com \
    --cc=haifeng.zhao@linux.intel.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=jiaqing.huang@intel.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox