From: Bjorn Helgaas <bhelgaas@google.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] e1000e pci_disable_link_state_locked() issues
Date: Wed, 20 May 2015 14:47:49 -0500 [thread overview]
Message-ID: <20150520194749.GA10210@google.com> (raw)
I think we have some issues with the e1000e usage of
pci_disable_link_state_locked(), which Yinghai added with 9f728f53dd70
("PCI/e1000e: Add and use pci_disable_link_state_locked()").
That fixed an AER deadlock in the following path, where pci_bus_sem is held
by pci_walk_bus(), and we deadlocked when we tried to re-acquire it in
pci_disable_link_state():
do_recovery
broadcast_error_message(..., report_slot_reset)
pci_walk_bus
down_read(&pci_bus_sem)
cb(...) # report_slot_reset
report_slot_reset
dev->driver->err_handler->slot_reset # e1000_io_slot_reset
e1000_io_slot_reset
e1000e_disable_aspm
pci_disable_link_state
down_read(&pci_bus_sem)
9f728f53dd70 fixed that by changing e1000e_disable_aspm() to use
pci_disable_link_state_locked() instead, which assumes pci_bus_sem is
already held.
That's fine for the e1000_io_slot_reset() path, where pci_bus_sem really
*is* held. But e1000e_disable_aspm() is also called from e1000_probe() and
__e1000_resume(), and in those paths, we *don't* hold pci_bus_sem.
In effect, the caller of pci_disable_link_state_locked() is promising that
pci_bus_sem is held, and __pci_disable_link_state() relies on that promise
for its locking. But e1000e isn't upholding its end of the bargain.
I'm not 100% sure __pci_disable_link_state() actually *needs* that locking:
it is only called from a driver, and it should be impossible for a device
or any upstream bridge to go away while a driver is bound to it. If
somebody wanted to analyze this further and propose a patch to remove the
locking (if it seems safe), that would be great.
But in any case, __pci_disable_link_state() should be able to rely on its
callers following the rules, so I'd like to see an e1000e change to use
pci_disable_link_state() from the paths where pci_bus_sem is not held.
Bjorn
WARNING: multiple messages have this Message-ID (diff)
From: Bjorn Helgaas <bhelgaas@google.com>
To: Yinghai Lu <yinghai@kernel.org>,
Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org
Subject: e1000e pci_disable_link_state_locked() issues
Date: Wed, 20 May 2015 14:47:49 -0500 [thread overview]
Message-ID: <20150520194749.GA10210@google.com> (raw)
I think we have some issues with the e1000e usage of
pci_disable_link_state_locked(), which Yinghai added with 9f728f53dd70
("PCI/e1000e: Add and use pci_disable_link_state_locked()").
That fixed an AER deadlock in the following path, where pci_bus_sem is held
by pci_walk_bus(), and we deadlocked when we tried to re-acquire it in
pci_disable_link_state():
do_recovery
broadcast_error_message(..., report_slot_reset)
pci_walk_bus
down_read(&pci_bus_sem)
cb(...) # report_slot_reset
report_slot_reset
dev->driver->err_handler->slot_reset # e1000_io_slot_reset
e1000_io_slot_reset
e1000e_disable_aspm
pci_disable_link_state
down_read(&pci_bus_sem)
9f728f53dd70 fixed that by changing e1000e_disable_aspm() to use
pci_disable_link_state_locked() instead, which assumes pci_bus_sem is
already held.
That's fine for the e1000_io_slot_reset() path, where pci_bus_sem really
*is* held. But e1000e_disable_aspm() is also called from e1000_probe() and
__e1000_resume(), and in those paths, we *don't* hold pci_bus_sem.
In effect, the caller of pci_disable_link_state_locked() is promising that
pci_bus_sem is held, and __pci_disable_link_state() relies on that promise
for its locking. But e1000e isn't upholding its end of the bargain.
I'm not 100% sure __pci_disable_link_state() actually *needs* that locking:
it is only called from a driver, and it should be impossible for a device
or any upstream bridge to go away while a driver is bound to it. If
somebody wanted to analyze this further and propose a patch to remove the
locking (if it seems safe), that would be great.
But in any case, __pci_disable_link_state() should be able to rely on its
callers following the rules, so I'd like to see an e1000e change to use
pci_disable_link_state() from the paths where pci_bus_sem is not held.
Bjorn
next reply other threads:[~2015-05-20 19:47 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-20 19:47 Bjorn Helgaas [this message]
2015-05-20 19:47 ` e1000e pci_disable_link_state_locked() issues Bjorn Helgaas
2015-05-21 15:56 ` [Intel-wired-lan] " Lubetkin, YanirX
2015-05-21 15:56 ` Lubetkin, YanirX
2015-05-22 7:14 ` Yijing Wang
2015-05-22 7:14 ` Yijing Wang
2015-05-22 7:55 ` [Intel-wired-lan] " Yijing Wang
2015-05-22 7:55 ` Yijing Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150520194749.GA10210@google.com \
--to=bhelgaas@google.com \
--cc=intel-wired-lan@osuosl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.