public inbox for linux-kernel@vger.kernel.org
From: Ethan Zhao <haifeng.zhao@linux.intel.com>
To: bhelgaas@google.com, baolu.lu@linux.intel.com,
	dwmw2@infradead.org, will@kernel.org, robin.murphy@arm.com,
	lukas@wunner.de
Cc: linux-pci@vger.kernel.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: [RFC PATCH v8 4/5] iommu/vt-d: don't issue device-TLB invalidate request when device is disconnected
Date: Tue, 26 Dec 2023 22:09:17 -0500
Message-ID: <20231227030918.536413-5-haifeng.zhao@linux.intel.com>
In-Reply-To: <20231227030918.536413-1-haifeng.zhao@linux.intel.com>

Setting aside the aggressive hotplug cases, where a hotplug device is
surprise-removed while its safe removal has been requested and is still
being handled, because:

1. it is pulled out directly,
2. its power is turned off,
3. its link is brought down,
4. it simply dies at that moment,

and so on (in a word, the device is 'gone' or 'disconnected'), most
hot-unplug events are regular safe removals or surprise removals.

The handling of these hot-unplug events can be optimized to fix the ATS
invalidation hang: call pci_dev_is_disconnected() in
devtlb_invalidation_with_pasid() to check the target device's state, and
avoid sending a meaningless ATS invalidation request to the IOMMU when
the device is gone (see the IMPLEMENTATION NOTE in PCIe spec r6.1,
section 10.3.1).

For safe removal, the device is not removed until the whole software
handling process is done, so it cannot trigger the hard lockup caused by
waiting too long for an ATS invalidation timeout. In the safe removal
path, pciehp_unconfigure_device() checks its 'presence' parameter and
does not set the device state to pci_channel_io_perm_failure, so
pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() returns
false and the function proceeds as before.

For surprise removal, pciehp_unconfigure_device() sets the device state
to pci_channel_io_perm_failure, which means the device is already gone
(disconnected). pci_dev_is_disconnected() in
devtlb_invalidation_with_pasid() then returns true, and the function
returns early instead of blindly sending an ATS invalidation request to
the disconnected device. This avoids the long wait that triggers the
hard lockup.
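
The effect of the new check can be illustrated with a small user-space
model. This is only a sketch, not kernel code: struct model_dev,
model_dev_is_disconnected() and model_devtlb_invalidate() are
hypothetical stand-ins for struct pci_dev, pci_dev_is_disconnected() and
devtlb_invalidation_with_pasid(), and the counter stands in for the
request that would otherwise be queued to the IOMMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's pci_channel_state_t values. */
enum channel_state {
	CHANNEL_IO_NORMAL = 1,
	CHANNEL_IO_FROZEN,
	CHANNEL_IO_PERM_FAILURE,
};

/* Hypothetical, cut-down model of a PCI device with ATS information. */
struct model_dev {
	enum channel_state error_state;
	bool ats_enabled;
	unsigned int invalidations_sent;	/* requests that were queued */
};

/* Models pci_dev_is_disconnected(): true once the state is perm failure. */
static bool model_dev_is_disconnected(const struct model_dev *dev)
{
	return dev->error_state == CHANNEL_IO_PERM_FAILURE;
}

/*
 * Models the guarded invalidation path. Returns true if a request was
 * queued, false if the function returned early.
 */
static bool model_devtlb_invalidate(struct model_dev *dev)
{
	assert(dev != NULL);

	if (!dev->ats_enabled)
		return false;

	/* The check this patch adds: skip devices that are already gone. */
	if (model_dev_is_disconnected(dev))
		return false;

	dev->invalidations_sent++;
	return true;
}
```

In this model a device in the normal state still gets its invalidation
request, while a device marked as permanently failed never has one
queued, so there is nothing to time out on.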

Call path shared by safe removal and surprise removal:

pciehp_ist()
   pciehp_handle_presence_or_link_change()
     pciehp_disable_slot()
       remove_board()
         pciehp_unconfigure_device(presence)
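
How the 'presence' argument at the end of this call path decides the
device state can be sketched in user space as follows. This is a
simplified model under stated assumptions, not the pciehp code: struct
slot_dev and slot_unconfigure() are hypothetical names, the array stands
in for the devices below the slot, and the actual teardown is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's pci_channel_state_t values. */
enum slot_channel_state {
	SLOT_IO_NORMAL = 1,
	SLOT_IO_FROZEN,
	SLOT_IO_PERM_FAILURE,
};

/* Hypothetical, cut-down model of one device below the hotplug slot. */
struct slot_dev {
	enum slot_channel_state error_state;
};

/*
 * Models the state handling in the unconfigure step: only when the
 * device is no longer present (surprise removal) is every device below
 * the slot marked as permanently failed.
 */
static void slot_unconfigure(struct slot_dev *devs, size_t n, bool presence)
{
	size_t i;

	assert(devs != NULL || n == 0);

	if (!presence) {
		for (i = 0; i < n; i++)
			devs[i].error_state = SLOT_IO_PERM_FAILURE;
	}
	/* Driver unbinding and device removal are left out of this model. */
}

/* Models pci_dev_is_disconnected() for the devices above. */
static bool slot_dev_is_disconnected(const struct slot_dev *dev)
{
	return dev->error_state == SLOT_IO_PERM_FAILURE;
}
```

With presence set to true (safe removal) the devices keep their state
and the check reports false; with presence set to false (surprise
removal) the check reports true for every device below the slot.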

Tested-by: Haorong Ye <yehaorong@bytedance.com>
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
---
 drivers/iommu/intel/pasid.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 1c87fb1b1039..a08bdbec90eb 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -481,6 +481,9 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
 	if (!info || !info->ats_enabled)
 		return;
 
+	if (pci_dev_is_disconnected(to_pci_dev(dev)))
+		return;
+
 	sid = info->bus << 8 | info->devfn;
 	qdep = info->ats_qdep;
 	pfsid = info->pfsid;
-- 
2.31.1


Thread overview: 9+ messages
2023-12-27  3:09 [RFC PATCH v8 0/5] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
2023-12-27  3:09 ` [RFC PATCH v8 1/5] iommu/vt-d: add flush_target_dev member to struct intel_iommu and pass device info to all ATS invalidation functions Ethan Zhao
2023-12-27  3:09 ` [RFC PATCH v8 2/5] iommu/vt-d: break out device-TLB invalidation if target device is gone Ethan Zhao
2023-12-27  3:09 ` [RFC PATCH v8 3/5] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
2023-12-27  3:09 ` Ethan Zhao [this message]
2023-12-27  3:09 ` [RFC PATCH v8 5/5] iommu/vt-d: don't loop for timeout device-TLB invalidation request forever Ethan Zhao
  -- strict thread matches above, loose matches on Subject: below --
2023-12-27  2:59 [RFC PATCH v8 0/5] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
2023-12-27  2:59 ` [RFC PATCH v8 4/5] iommu/vt-d: don't issue device-TLB invalidate request when device is disconnected Ethan Zhao
2023-12-27 13:11   ` Bjorn Helgaas
2023-12-27 23:31     ` Ethan Zhao
