From: Ethan Zhao <haifeng.zhao@linux.intel.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: bhelgaas@google.com, baolu.lu@linux.intel.com,
dwmw2@infradead.org, will@kernel.org, robin.murphy@arm.com,
linux-pci@vger.kernel.org, iommu@lists.linux.dev,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 2/2] iommu/vt-d: don's issue devTLB flush request when device is disconnected
Date: Fri, 22 Dec 2023 09:56:39 +0800
Message-ID: <94b08bab-6488-4c4a-9742-30a69972ba50@linux.intel.com>
In-Reply-To: <20231221103940.GA12714@wunner.de>
On 12/21/2023 6:39 PM, Lukas Wunner wrote:
> On Tue, Dec 19, 2023 at 07:51:53PM -0500, Ethan Zhao wrote:
>> For endpoint devices connected to the system via hotplug-capable ports,
>> users can request a warm reset of the device by flapping the device's link
>> through the slot's Link Control register. In response to the resulting DLLSC
>> interrupt, pciehp_ist() unloads the device driver and then powers the
>> device off. This causes an IOMMU devTLB flush request to be sent to the
>> device, followed by a long completion/timeout wait in interrupt context.
> I think the problem is in the "waiting in interrupt context".
>
> Can you change qi_submit_sync() to *sleep* until the queue is done?
> Instead of busy-waiting in atomic context?
If you read that function carefully, you wouldn't suggest sleeping there:
it is synchronous by design.
>
> Is the hardware capable of sending an interrupt once the queue is done?
> If it is not capable, would it be viable to poll with exponential backoff
> and sleep in-between polling once the polling delay increases beyond, say,
> 10 usec?
I don't know whether polling, with sleeps in between, for completion of a
meaningless devTLB invalidation request blindly sent to a removed, powered-down
or link-down device makes sense.
But according to PCIe spec 6.1, section 10.3.1:
"Software ensures no invalidations are issued to a Function when its
ATS capability is disabled."
>
> Again, the proposed patch is not a proper solution. It will paper over
> the issue most of the time but every once in a while someone will still
> get a hard lockup splat and it will then be more difficult to reproduce
> and fix if the proposed patch is accepted.
Could you point out why it is not proper? Is there any other window in
which the hard lockup could still happen in the ATS-capable device
surprise-removal case if we check the connection state first?
Please help to elaborate.
>
>
>> [ 4223.822622] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S OE kernel version xxxx
> I don't see any reason to hide the kernel version.
> This isn't Intel Confidential information.
>
Yes, this is a stack trace from an old kernel, but the customer also tried
the latest 6.7-rc4 (doesn't work) and the patched 6.7-rc4 (fixed).
Thanks,
Ethan
>> [ 4223.822628] Call Trace:
>> [ 4223.822628] qi_flush_dev_iotlb+0xb1/0xd0
>> [ 4223.822628] __dmar_remove_one_dev_info+0x224/0x250
>> [ 4223.822629] dmar_remove_one_dev_info+0x3e/0x50
> __dmar_remove_one_dev_info() was removed by db75c9573b08 in v6.0
> one and a half years ago, so the stack trace appears to be from
> an older kernel version.
>
> Thanks,
>
> Lukas
Thread overview: 31+ messages
2023-12-20 0:51 [PATCH v4 0/2] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
2023-12-20 0:51 ` [PATCH v4 1/2] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
2023-12-20 0:51 ` [PATCH v4 2/2] iommu/vt-d: don's issue devTLB flush request when device is disconnected Ethan Zhao
2023-12-21 10:39 ` Lukas Wunner
2023-12-21 11:01 ` Lukas Wunner
2023-12-22 2:08 ` Ethan Zhao
2023-12-22 3:56 ` Ethan Zhao
2023-12-22 1:56 ` Ethan Zhao [this message]
2023-12-22 8:14 ` Lukas Wunner
2023-12-22 9:01 ` Ethan Zhao
-- strict thread matches above, loose matches on Subject: below --
2023-12-13 3:46 [PATCH RFC 0/2] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
2023-12-13 3:46 ` [PATCH 1/2] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
2023-12-13 10:49 ` Lukas Wunner
2023-12-14 0:58 ` Ethan Zhao
2023-12-21 10:51 ` Lukas Wunner
2023-12-22 2:35 ` Ethan Zhao
2023-12-13 3:46 ` [PATCH 2/2] iommu/vt-d: don's issue devTLB flush request when device is disconnected Ethan Zhao
2023-12-13 10:44 ` Lukas Wunner
2023-12-13 11:54 ` Robin Murphy
2023-12-14 2:40 ` Ethan Zhao
2023-12-21 10:42 ` Lukas Wunner
2023-12-21 11:01 ` Robin Murphy
2023-12-21 11:07 ` Lukas Wunner
2023-12-22 3:20 ` Ethan Zhao
2023-12-14 2:16 ` Ethan Zhao
2023-12-15 0:43 ` Ethan Zhao
2023-12-13 11:59 ` Baolu Lu
2023-12-14 2:26 ` Ethan Zhao
2023-12-15 1:03 ` Ethan Zhao
2023-12-15 1:34 ` Baolu Lu
2023-12-15 1:51 ` Ethan Zhao