From: Robin Murphy <robin.murphy@arm.com>
To: Lukas Wunner <lukas@wunner.de>,
Ethan Zhao <haifeng.zhao@linux.intel.com>
Cc: bhelgaas@google.com, baolu.lu@linux.intel.com,
dwmw2@infradead.org, will@kernel.org, linux-pci@vger.kernel.org,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
Haorong Ye <yehaorong@bytedance.com>
Subject: Re: [PATCH 2/2] iommu/vt-d: don's issue devTLB flush request when device is disconnected
Date: Wed, 13 Dec 2023 11:54:05 +0000 [thread overview]
Message-ID: <3b7742c4-bbae-4a78-a5a6-30df936a17d4@arm.com> (raw)
In-Reply-To: <20231213104417.GA31964@wunner.de>
On 13/12/2023 10:44 am, Lukas Wunner wrote:
> On Tue, Dec 12, 2023 at 10:46:37PM -0500, Ethan Zhao wrote:
>> For those endpoint devices connect to system via hotplug capable ports,
>> users could request a warm reset to the device by flapping device's link
>> through setting the slot's link control register,
>
> Well, users could just *unplug* the device, right? Why is it relevant
> that thay could fiddle with registers in config space?
>
>
>> as pciehpt_ist() DLLSC
>> interrupt sequence response, pciehp will unload the device driver and
>> then power it off. thus cause an IOMMU devTLB flush request for device to
>> be sent and a long time completion/timeout waiting in interrupt context.
>
> A completion timeout should be on the order of usecs or msecs, why does it
> cause a hard lockup? The dmesg excerpt you've provided shows a 12 *second*
> delay between hot removal and watchdog reaction.
The PCIe spec only requires an endpoint to respond to an ATS invalidate
within a rather hilarious 90 seconds, so it's primarily a question of
how patient the root complex and bridges in between are prepared to be.
>> Fix it by checking the device's error_state in
>> devtlb_invalidation_with_pasid() to avoid sending meaningless devTLB flush
>> request to link down device that is set to pci_channel_io_perm_failure and
>> then powered off in
>
> This doesn't seem to be a proper fix. It will work most of the time
> but not always. A user might bring down the slot via sysfs, then yank
> the card from the slot just when the iommu flush occurs such that the
> pci_dev_is_disconnected(pdev) check returns false but the card is
> physically gone immediately afterwards. In other words, you've shrunk
> the time window during which the issue may occur, but haven't eliminated
> it completely.
Yeah, I think we have a subtle but fundamental issue here in that the
iommu_release_device() callback is hooked to BUS_NOTIFY_REMOVED_DEVICE,
so in general probably shouldn't be assuming it's safe to do anything
with the device itself *after* it's already been removed from its bus -
this step is primarily about cleaning up any of the IOMMU's own state
relating to the given device.
I think if we want to ensure ATCs are invalidated on hot-unplug we need
an additional pre-removal notifier to take care of that, and that step
would then want to distinguish between an orderly removal where cleaning
up is somewhat meaningful, and a surprise removal where it definitely isn't.
Thanks,
Robin.
next prev parent reply other threads:[~2023-12-13 11:54 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-13 3:46 [PATCH RFC 0/2] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
2023-12-13 3:46 ` [PATCH 1/2] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
2023-12-13 10:49 ` Lukas Wunner
2023-12-14 0:58 ` Ethan Zhao
2023-12-21 10:51 ` Lukas Wunner
2023-12-22 2:35 ` Ethan Zhao
2023-12-13 3:46 ` [PATCH 2/2] iommu/vt-d: don's issue devTLB flush request when device is disconnected Ethan Zhao
2023-12-13 10:44 ` Lukas Wunner
2023-12-13 11:54 ` Robin Murphy [this message]
2023-12-14 2:40 ` Ethan Zhao
2023-12-21 10:42 ` Lukas Wunner
2023-12-21 11:01 ` Robin Murphy
2023-12-21 11:07 ` Lukas Wunner
2023-12-22 3:20 ` Ethan Zhao
2023-12-14 2:16 ` Ethan Zhao
2023-12-15 0:43 ` Ethan Zhao
2023-12-13 11:59 ` Baolu Lu
2023-12-14 2:26 ` Ethan Zhao
2023-12-15 1:03 ` Ethan Zhao
2023-12-15 1:34 ` Baolu Lu
2023-12-15 1:51 ` Ethan Zhao
-- strict thread matches above, loose matches on Subject: below --
2023-12-20 0:51 [PATCH v4 0/2] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
2023-12-20 0:51 ` [PATCH v4 1/2] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
2023-12-20 0:51 ` [PATCH v4 2/2] iommu/vt-d: don's issue devTLB flush request when device is disconnected Ethan Zhao
2023-12-21 10:39 ` Lukas Wunner
2023-12-21 11:01 ` Lukas Wunner
2023-12-22 2:08 ` Ethan Zhao
2023-12-22 3:56 ` Ethan Zhao
2023-12-22 1:56 ` Ethan Zhao
2023-12-22 8:14 ` Lukas Wunner
2023-12-22 9:01 ` Ethan Zhao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3b7742c4-bbae-4a78-a5a6-30df936a17d4@arm.com \
--to=robin.murphy@arm.com \
--cc=baolu.lu@linux.intel.com \
--cc=bhelgaas@google.com \
--cc=dwmw2@infradead.org \
--cc=haifeng.zhao@linux.intel.com \
--cc=iommu@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=will@kernel.org \
--cc=yehaorong@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox