From: Ethan Zhao <haifeng.zhao@linux.intel.com>
To: kevin.tian@intel.com, bhelgaas@google.com,
baolu.lu@linux.intel.com, dwmw2@infradead.org, will@kernel.org,
robin.murphy@arm.com, lukas@wunner.de
Cc: linux-pci@vger.kernel.org, iommu@lists.linux.dev,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v10 0/5] fix vt-d hard lockup when hotplug ATS capable device
Date: Tue, 9 Jan 2024 09:24:05 +0800
Message-ID: <fb4f62a9-131d-4c3c-b290-e77041ee0019@linux.intel.com>
In-Reply-To: <20231228170206.720675-1-haifeng.zhao@linux.intel.com>
On 12/29/2023 1:02 AM, Ethan Zhao wrote:
> This patchset fixes a VT-d hard lockup reported when an ATS-capable
> endpoint device, connected to the system via a PCIe switch as in the
> following topology, is surprise-unplugged.
>
> +-[0000:15]-+-00.0  Intel Corporation Ice Lake Memory Map/VT-d
> |           +-00.1  Intel Corporation Ice Lake Mesh 2 PCIe
> |           +-00.2  Intel Corporation Ice Lake RAS
> |           +-00.4  Intel Corporation Device 0b23
> |           \-01.0-[16-1b]----00.0-[17-1b]--+-00.0-[18]----00.0  NVIDIA Corporation Device 2324
> |                                           +-01.0-[19]----00.0  Mellanox Technologies MT2910 Family [ConnectX-7]
>
> The user brought the link of endpoint device 19:00.0 down by flapping the
> link control register of its hotplug-capable slot 17:01.0. In response to
> the resulting DLLSC event, pciehp_ist() unloads the device driver and
> powers the device off. While the driver is unloading, an IOMMU device-TLB
> invalidation (Intel VT-d spec; 'ATS Invalidation' in the PCIe spec)
> request is issued to the link-down device, so a long wait for
> completion/timeout in interrupt context causes continuous hard lockup
> warnings and a system hang.
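
To make the failure mode above concrete, here is a minimal userspace
sketch of a completion wait that gives up early when the target device
is gone and stops after a bounded number of polls. It is only an
illustration under assumptions: struct fake_dev, poll_done() and
wait_for_invalidation() are hypothetical names, not kernel APIs, and the
real wait in qi_submit_sync() is structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
enum wait_result { WAIT_DONE, WAIT_DEVICE_GONE, WAIT_TIMED_OUT };

struct fake_dev {
	bool disconnected;     /* true once the link is down or the device removed */
	int  polls_until_done; /* polls left before completion; negative = never */
};

/* Simulate one poll of the invalidation wait status. */
static bool poll_done(struct fake_dev *dev)
{
	if (dev->polls_until_done < 0)
		return false;          /* device never answers */
	if (dev->polls_until_done == 0)
		return true;           /* completion observed */
	dev->polls_until_done--;
	return false;
}

/*
 * Wait for the simulated invalidation to complete, but
 *  - break out at once if the target device is gone, and
 *  - give up after max_polls instead of spinning forever.
 */
static enum wait_result wait_for_invalidation(struct fake_dev *dev, int max_polls)
{
	assert(dev != NULL);

	for (int i = 0; i < max_polls; i++) {
		if (dev->disconnected)
			return WAIT_DEVICE_GONE;
		if (poll_done(dev))
			return WAIT_DONE;
	}
	return WAIT_TIMED_OUT;
}
```

Without the two exits, a device that never answers would keep the caller
polling indefinitely, which corresponds to the lockup described above.
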
>
> For other details, see the commit log of each patch.
>
> Patches 3 and 4 were tested by yehaorong@bytedance.com on stable v6.7-rc4.
> Patches 1-5 compile cleanly on stable v6.7-rc6.
>
>
> change log:
> v10:
> - refactor qi_submit_sync() and its callers to take a pci_dev instance, as
> Kevin pointed out that adding target_flush_dev to iommu is not right.
> v9:
> - unify all spellings of ATS Invalidation to adhere to the PCIe spec, per
> Bjorn's suggestion.
> v8:
> - add a patch to break the loop for a timed-out device-TLB invalidation, as
> Bjorn said the device may simply not respond without being gone.
> v7:
> - reorder patches and revise commit logs per Bjorn's guidance.
> - revise other code and commit logs per Lukas' suggestion.
> - rebased to stable v6.7-rc6.
> v6:
> - add two patches to break out of device-TLB invalidation if the device is gone.
> v5:
> - add a patch that tries to fix a rare case (surprise removal of a device
> during the safe removal process). It does not work, because surprise
> removal handling can't re-enter while another safe removal is in progress.
> v4:
> - move the PCI device state checking after ATS per Baolu's suggestion.
> v3:
> - fix commit description typo.
> v2:
> - revise commit[1] description part according to Lukas' suggestion.
> - revise commit[2] description to clarify the issue's impact.
> v1:
> - https://lore.kernel.org/lkml/20231213034637.2603013-1-haifeng.zhao@linux.intel.com/T/
>
>
> Thanks,
> Ethan
>
>
> Ethan Zhao (5):
> iommu/vt-d: add pci_dev parameter to qi_submit_sync and refactor
> callers
> iommu/vt-d: break out ATS Invalidation if target device is gone
> PCI: make pci_dev_is_disconnected() helper public for other drivers
> iommu/vt-d: don't issue ATS Invalidation request when device is
> disconnected
> iommu/vt-d: don't loop for timeout ATS Invalidation request forever
>
> drivers/iommu/intel/dmar.c | 55 ++++++++++++++++++++++-------
> drivers/iommu/intel/iommu.c | 26 ++++----------
> drivers/iommu/intel/iommu.h | 17 +++++----
> drivers/iommu/intel/irq_remapping.c | 2 +-
> drivers/iommu/intel/pasid.c | 13 +++----
> drivers/iommu/intel/svm.c | 13 ++++---
> drivers/pci/pci.h | 5 ---
> include/linux/pci.h | 5 +++
> 8 files changed, 74 insertions(+), 62 deletions(-)
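
For readers skimming the shortlog, the idea behind patches 3 and 4 (check
for a disconnected device before issuing an ATS Invalidation request) can
be sketched in userspace as below. This is a mock under assumptions, not
the patch code: struct mock_pci_dev, mock_dev_is_disconnected() and
mock_flush_dev_tlb() are hypothetical names, and the sketch assumes the
helper treats a permanent-failure error state as "disconnected".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace mock; it mirrors only the fields this sketch needs. */
enum mock_channel_state {
	MOCK_IO_NORMAL,
	MOCK_IO_FROZEN,
	MOCK_IO_PERM_FAILURE,  /* assumed to mean "device is gone" */
};

struct mock_pci_dev {
	enum mock_channel_state error_state;
	int invalidations_sent; /* counts requests actually issued */
};

/* Assumption: a device in permanent failure is treated as disconnected. */
static bool mock_dev_is_disconnected(const struct mock_pci_dev *dev)
{
	return dev->error_state == MOCK_IO_PERM_FAILURE;
}

/*
 * Issue a (simulated) device-TLB invalidation only if the device is still
 * present. Returns true when a request was issued, false when skipped.
 */
static bool mock_flush_dev_tlb(struct mock_pci_dev *dev)
{
	assert(dev != NULL);

	if (mock_dev_is_disconnected(dev))
		return false;          /* nothing will answer; skip the request */

	dev->invalidations_sent++;
	return true;
}
```

Skipping the request up front avoids entering the completion wait at all
for a device that is already known to be gone.
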

Any new comments on this patchset?

Thanks,
Ethan

Thread overview: 18+ messages
2023-12-28 17:02 [RFC PATCH v10 0/5] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
2023-12-28 17:02 ` [RFC PATCH v10 1/5] iommu/vt-d: add pci_dev parameter to qi_submit_sync and refactor callers Ethan Zhao
2024-01-10 4:59 ` Baolu Lu
2024-01-10 7:51 ` Ethan Zhao
2023-12-28 17:02 ` [RFC PATCH v10 2/5] iommu/vt-d: break out ATS Invalidation if target device is gone Ethan Zhao
2024-01-10 5:17 ` Baolu Lu
2024-01-10 8:29 ` Ethan Zhao
2023-12-29 8:18 ` [RFC PATCH v10 0/5] fix vt-d hard lockup when hotplug ATS capable device Tian, Kevin
2023-12-29 9:28 ` Ethan Zhao
2024-01-09 1:24 ` Ethan Zhao [this message]
2024-01-15 7:58 ` Ethan Zhao
2024-01-17 3:24 ` Baolu Lu
2024-01-17 5:26 ` Ethan Zhao
2024-01-17 5:38 ` Ethan Zhao
2024-01-17 9:00 ` Ethan Zhao
2024-01-18 0:46 ` Baolu Lu
2024-01-18 2:26 ` Ethan Zhao
2024-01-18 2:32 ` Ethan Zhao