From: Lukas Wunner <lukas@wunner.de>
To: Ziming Du <duziming2@huawei.com>
Cc: bhelgaas@google.com, okaya@kernel.org, keith.busch@intel.com,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
liuyongqiang13@huawei.com,
Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH] PCI: Fix AB-BA deadlock between aer_isr() and device_shutdown()
Date: Tue, 24 Feb 2026 07:40:32 +0100
Message-ID: <aZ1H4IMMW_4w60EH@wunner.de>
In-Reply-To: <20260109095603.1088620-1-duziming2@huawei.com>

On Fri, Jan 09, 2026 at 05:56:03PM +0800, Ziming Du wrote:
> During system shutdown, a deadlock may occur between AER recovery process
> and device shutdown as follows:

The subject is slightly misleading as this isn't an AB-BA deadlock,
which involves two locks. It's a deadlock involving a single lock
(device_lock), where one task (shutdown) acquires the lock, then
waits for the AER interrupt thread to finish, but that thread is
waiting on the lock.
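A minimal userspace sketch of this single-lock pattern, using POSIX threads. It is a model under stated assumptions, not kernel code: dev_lock is a hypothetical stand-in for device_lock, aer_irq_thread() for the AER interrupt thread and model_shutdown_vs_aer() for device_shutdown(). Because the real interrupt thread would block forever, the model uses pthread_mutex_timedlock() with a one-second timeout so that it terminates and reports ETIMEDOUT instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>	/* ETIMEDOUT */
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for device_lock. */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models the AER interrupt thread: recovery needs dev_lock.
 * A 1 s timeout replaces the indefinite block of the real scenario.
 */
static void *aer_irq_thread(void *arg)
{
	int *result = arg;
	struct timespec deadline;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += 1;
	*result = pthread_mutex_timedlock(&dev_lock, &deadline);
	if (*result == 0)
		pthread_mutex_unlock(&dev_lock);
	return NULL;
}

/*
 * Models device_shutdown(): take dev_lock, then wait for the
 * interrupt thread to finish. Returns what the interrupt thread got
 * from its lock attempt; ETIMEDOUT means it would have deadlocked.
 */
int model_shutdown_vs_aer(void)
{
	pthread_t irq;
	int irq_result = -1;
	int rc;

	pthread_mutex_lock(&dev_lock);		/* shutdown holds the lock ... */
	rc = pthread_create(&irq, NULL, aer_irq_thread, &irq_result);
	assert(rc == 0);
	pthread_join(irq, NULL);		/* ... and waits for the thread */
	pthread_mutex_unlock(&dev_lock);
	return irq_result;
}
```

Calling model_shutdown_vs_aer() returns ETIMEDOUT after about a second: the modelled interrupt thread never gets the lock while the modelled shutdown path holds it and waits for that same thread.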

device_shutdown() acquires the device_lock to avoid invoking a driver's
->shutdown() callback while its ->probe() callback is still running or
while the driver is being removed, cf. d1c6c030fcec. That seems
reasonable.
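The serialisation described above can be modelled the same way. In this hypothetical sketch (struct model_dev, model_probe() and model_shutdown() are illustrative names, not kernel API), probe and shutdown take the same per-device mutex, so shutdown can only observe an unbound or a fully bound device, never one that is halfway through ->probe().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

enum dev_state { DEV_UNBOUND, DEV_PROBING, DEV_BOUND };

struct model_dev {
	pthread_mutex_t lock;		/* stands in for device_lock */
	enum dev_state state;
};

/* Models ->probe(): the intermediate state exists only under the lock. */
static void *model_probe(void *arg)
{
	struct model_dev *dev = arg;

	pthread_mutex_lock(&dev->lock);
	dev->state = DEV_PROBING;
	/* ... driver initialisation would run here ... */
	dev->state = DEV_BOUND;
	pthread_mutex_unlock(&dev->lock);
	return NULL;
}

/* Models device_shutdown(): takes the same lock before looking. */
static enum dev_state model_shutdown(struct model_dev *dev)
{
	enum dev_state seen;

	pthread_mutex_lock(&dev->lock);
	seen = dev->state;
	/* ... ->shutdown() would be called here if seen == DEV_BOUND ... */
	pthread_mutex_unlock(&dev->lock);
	return seen;
}

/* Races probe against shutdown; returns the state shutdown observed. */
enum dev_state race_probe_and_shutdown(void)
{
	struct model_dev dev = { .state = DEV_UNBOUND };
	pthread_t prober;
	enum dev_state seen;
	int rc;

	rc = pthread_mutex_init(&dev.lock, NULL);
	assert(rc == 0);
	rc = pthread_create(&prober, NULL, model_probe, &dev);
	assert(rc == 0);
	seen = model_shutdown(&dev);	/* runs concurrently with model_probe() */
	pthread_join(prober, NULL);
	pthread_mutex_destroy(&dev.lock);
	return seen;
}
```

Depending on scheduling, race_probe_and_shutdown() returns DEV_UNBOUND or DEV_BOUND, but never DEV_PROBING.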

It's unclear why pci_bus_reset() needs to acquire device_lock. This was
introduced by 090a3c5322e9. I'm adding Alex (the author) to cc.

Another question to ask is whether it makes sense at all to attempt
error recovery when the system is shutting down. Maybe we should log
the errors, but no longer try to recover from them?

It's possible to determine whether shutdown is in progress by querying
system_state (set by kernel/reboot.c). However, we can't just skip
calling pci_bus_error_reset() in aer_root_reset() if system_state
indicates shutdown, because that would still be racy: shutdown may
begin right after the check. The only race-free solution would be to
register a notifier with reboot_notifier_list which sets a flag that
shutdown is in progress and waits for the interrupt thread to finish.
It's quite a complicated solution just to work around a deadlock, so
I suggest first looking into removing the device_lock acquisition
from pci_bus_reset().
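The notifier-based approach can be sketched in userspace as follows. This is a hypothetical model, not kernel code: model_reboot_notifier() plays the role of a callback on reboot_notifier_list (in the kernel it would be registered with register_reboot_notifier()), and model_aer_recover() stands in for the recovery path in the AER interrupt thread. The shutdown check and the in-flight accounting happen under one lock, which closes the window between "shutdown not in progress" and "start the reset" that a bare system_state check leaves open.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the kernel. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t drained = PTHREAD_COND_INITIALIZER;
static bool shutting_down;		/* protected by state_lock */
static int recoveries_in_flight;	/* protected by state_lock */
static int resets_done;			/* only touched by the single irq thread */

/* Returns true if recovery was attempted, false if it was skipped. */
bool model_aer_recover(void)
{
	pthread_mutex_lock(&state_lock);
	if (shutting_down) {
		pthread_mutex_unlock(&state_lock);
		return false;		/* log the error, skip recovery */
	}
	recoveries_in_flight++;		/* checked and counted atomically */
	pthread_mutex_unlock(&state_lock);

	/* The bus reset (which takes device_lock) would run here. */
	resets_done++;

	pthread_mutex_lock(&state_lock);
	recoveries_in_flight--;
	assert(recoveries_in_flight >= 0);
	if (recoveries_in_flight == 0)
		pthread_cond_broadcast(&drained);
	pthread_mutex_unlock(&state_lock);
	return true;
}

/* Models the reboot notifier: set the flag, then drain recovery. */
void model_reboot_notifier(void)
{
	pthread_mutex_lock(&state_lock);
	shutting_down = true;
	while (recoveries_in_flight > 0)
		pthread_cond_wait(&drained, &state_lock);
	pthread_mutex_unlock(&state_lock);
}
```

Reboot notifiers are called before device_shutdown(), so in this model the wait completes while device_lock is still free and an in-flight reset can finish; any recovery attempted afterwards sees the flag and only logs the error.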

Simply using trylock doesn't seem bullet-proof.

Thanks,
Lukas