From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Huang Ying <ying.huang@intel.com>
Cc: Ben Hutchings <ben@decadent.org.uk>,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	alan@lxorguk.ukuu.org.uk, Bjorn Helgaas <bhelgaas@google.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Zhang Yanmin <yanmin.zhang@intel.com>
Subject: Re: [ 02/38] PCI/PM: Fix deadlock when unbinding device if parent in D3cold
Date: Tue, 11 Dec 2012 10:08:12 -0800
Message-ID: <20121211180812.GC25714@kroah.com>
In-Reply-To: <1355213572.7216.140.camel@yhuang-dev>

On Tue, Dec 11, 2012 at 04:12:52PM +0800, Huang Ying wrote:
> On Fri, 2012-11-30 at 10:54 +0800, Huang Ying wrote:
> > On Thu, 2012-11-29 at 18:01 -0800, Greg Kroah-Hartman wrote:
> > > On Fri, Nov 23, 2012 at 03:47:42PM +0800, Huang Ying wrote:
> > > > On Fri, 2012-11-23 at 11:09 +0800, Huang Ying wrote:
> > > > > On Fri, 2012-11-23 at 02:35 +0000, Ben Hutchings wrote:
> > > > > > On Wed, 2012-11-21 at 16:39 -0800, Greg Kroah-Hartman wrote:
> > > > > > > 3.0-stable review patch.  If anyone has any objections, please let me know.
> > > > > > > 
> > > > > > > ------------------
> > > > > > > 
> > > > > > > From: Huang Ying <ying.huang@intel.com>
> > > > > > > 
> > > > > > > commit 90b5c1d7c45eeb622302680ff96ed30c1a2b6f0e upstream.
> > > > > > > 
> > > > > > > > If a PCI device and its parents are put into D3cold, unbinding the
> > > > > > > > device will trigger a deadlock as follows:
> > > > > > > 
> > > > > > > - driver_unbind
> > > > > > >   - device_release_driver
> > > > > > >     - device_lock(dev)				<--- previous lock here
> > > > > > >     - __device_release_driver
> > > > > > >       - pm_runtime_get_sync
> > > > > > >         ...
> > > > > > >           - rpm_resume(dev)
> > > > > > >             - rpm_resume(dev->parent)
> > > > > > >               ...
> > > > > > >                 - pci_pm_runtime_resume
> > > > > > >                   ...
> > > > > > >                   - pci_set_power_state
> > > > > > >                     - __pci_start_power_transition
> > > > > > >                       - pci_wakeup_bus(dev->parent->subordinate)
> > > > > > >                         - pci_walk_bus
> > > > > > >                           - device_lock(dev)	<--- deadlock here
> > > > > > > 
> > > > > > > 
> > > > > > > If we do not do device_lock in pci_walk_bus, we can avoid deadlock.
> > > > > > > Device_lock in pci_walk_bus is introduced in commit:
> > > > > > > d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread
> > > > > > > is: https://lkml.org/lkml/2006/5/26/38.  The patch author Zhang Yanmin
> > > > > > > said device_lock is added to pci_walk_bus because:
> > > > > > > 
> > > > > > >   Some error handling functions call pci_walk_bus. For example, PCIe
> > > > > > >   aer. Here we lock the device, so the driver wouldn't detach from the
> > > > > > >   device, as the cb might call driver's callback function.
> > > > > > > 
> > > > > > > So I fixed the deadlock as follows:
> > > > > > > 
> > > > > > > - remove device_lock from pci_walk_bus
> > > > > > > - add device_lock into callback if callback will call driver's callback
> > > > > > > 
> > > > > > > I checked pci_walk_bus users one by one, and found only PCIe aer needs
> > > > > > > device lock.
> > > > > > [...]
> > > > > > 
> > > > > > What about eeh_report_error() in
> > > > > > arch/powerpc/platforms/pseries/eeh_driver.c?
> > > > > 
> > > > > > Hmm...  Because the pci_walk_bus() invocation was removed in 3.7, this
> > > > > > patch is only valid for 3.7.  We need another version for 3.6.
> > > > 
> > > > Here is the patch for 3.6.  I have no powerpc machine, so build test
> > > > only.
> > > > 
> > > > Subject: [BUGFIX] PCI/PM: Fix deadlock when unbind device if its parent in D3cold
> > > > 
> > > > > If a PCI device and its parents are put into D3cold, unbinding the
> > > > > device will trigger a deadlock as follows:
> > > > 
> > > > - driver_unbind
> > > >   - device_release_driver
> > > >     - device_lock(dev)				<--- previous lock here
> > > >     - __device_release_driver
> > > >       - pm_runtime_get_sync
> > > >         ...
> > > >           - rpm_resume(dev)
> > > >             - rpm_resume(dev->parent)
> > > >               ...
> > > >                 - pci_pm_runtime_resume
> > > >                   ...
> > > >                   - pci_set_power_state
> > > >                     - __pci_start_power_transition
> > > >                       - pci_wakeup_bus(dev->parent->subordinate)
> > > >                         - pci_walk_bus
> > > > >                           - device_lock(dev)	<--- deadlock here
> > > > 
> > > > 
> > > > > If we do not do device_lock in pci_walk_bus, we can avoid the deadlock.
> > > > Device_lock in pci_walk_bus is introduced in commit:
> > > > d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread
> > > > is: https://lkml.org/lkml/2006/5/26/38.  The patch author Zhang Yanmin
> > > > said device_lock is added to pci_walk_bus because:
> > > > 
> > > >   Some error handling functions call pci_walk_bus. For example, PCIe
> > > >   aer. Here we lock the device, so the driver wouldn't detach from the
> > > >   device, as the cb might call driver's callback function.
> > > > 
> > > > > So I fixed the deadlock as follows:
> > > > 
> > > > - remove device_lock from pci_walk_bus
> > > > - add device_lock into callback if callback will call driver's callback
> > > > 
> > > > I checked pci_walk_bus users one by one, and found only PCIe aer needs
> > > > device lock.
> > > > 
> > > > Signed-off-by: Huang Ying <ying.huang@intel.com>
> > > > Cc: Zhang Yanmin <yanmin.zhang@intel.com>
> > > > ---
> > > >  arch/powerpc/platforms/pseries/eeh_driver.c |   51 ++++++++++++++++++----------
> > > 
> > > Due to me applying a power pci patch,
> > > feadf7c0a1a7c08c74bebb4a13b755f8c40e3bbc in Linus's tree to 3.6-stable,
> > > this patch doesn't apply here anymore.
> > > 
> > > Because that patch is in the tree, is it now just safe to take your
> > > original, unmodified, version of this patch for 3.6-stable?
> > 
> > > No.  My original version does not work.  I need to rebase my patch on
> > > this patch.  Which tree should I base it on?
> 
> Which tree should I base the patch on?

3.6.10 would be great.

thanks,

greg k-h
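
For readers following the call chain quoted above, here is a minimal
userspace sketch in C of why locking inside the bus walker deadlocks
when the caller already holds the device lock, and why a walker that
takes no lock does not.  This is not kernel code: every toy_* name is
hypothetical, the "lock" is a plain flag, and a second acquisition
returns -EDEADLK where a real non-recursive lock would block forever.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device; NOT the kernel type. */
struct toy_dev {
	int locked;	/* 1 while the toy device lock is held */
	int woken;	/* set by the wakeup callback */
};

typedef void (*toy_cb)(struct toy_dev *dev);
typedef int (*toy_walk_fn)(struct toy_dev *devs, size_t n, toy_cb cb);

/*
 * Toy non-recursive lock (assumption for illustration): taking it
 * twice reports -EDEADLK instead of hanging like a real lock would.
 */
static int toy_lock(struct toy_dev *dev)
{
	if (dev->locked)
		return -EDEADLK;
	dev->locked = 1;
	return 0;
}

static void toy_unlock(struct toy_dev *dev)
{
	dev->locked = 0;
}

/* Callback that never calls into a driver, so it needs no device lock. */
static void toy_wakeup(struct toy_dev *dev)
{
	dev->woken = 1;
}

/* Old scheme: the walker locks every device around the callback. */
static int toy_walk_locked(struct toy_dev *devs, size_t n, toy_cb cb)
{
	for (size_t i = 0; i < n; i++) {
		int err = toy_lock(&devs[i]);

		if (err)
			return err;	/* lock already held by the caller */
		cb(&devs[i]);
		toy_unlock(&devs[i]);
	}
	return 0;
}

/* Fixed scheme: the walker takes no lock; callbacks lock if they must. */
static int toy_walk_unlocked(struct toy_dev *devs, size_t n, toy_cb cb)
{
	for (size_t i = 0; i < n; i++)
		cb(&devs[i]);
	return 0;
}

/*
 * Models the unbind path: hold the lock of devs[idx], then run a
 * resume step that walks the whole bus, devs[idx] included.
 */
static int toy_unbind(struct toy_dev *devs, size_t n, size_t idx,
		      toy_walk_fn walk)
{
	int err = toy_lock(&devs[idx]);

	if (err)
		return err;
	err = walk(devs, n, toy_wakeup);
	toy_unlock(&devs[idx]);
	return err;
}
```

With two zero-initialised devices, toy_unbind(devs, 2, 1, toy_walk_locked)
fails at the device whose lock is already held, while the same call with
toy_walk_unlocked visits both devices.  A callback that does invoke a
driver callback would have to call toy_lock() itself, which mirrors the
second half of the fix described in the quoted commit message.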
