All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yinghai Lu <yinghai.lu@oracle.com>
To: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: [PATCH -v4] PCI: Make sriov work with hotplug remove
Date: Tue, 22 Nov 2011 21:01:21 -0800	[thread overview]
Message-ID: <4ECC7E21.8060805@oracle.com> (raw)
In-Reply-To: <4ECC5328.40406@jp.fujitsu.com>


When hot remove pci express module that have pcie switch and support SRIOV, got

[ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1
[ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt received
[ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3)
[ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 1f9
[ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due to button press.
[ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10
[ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
[ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling domain:bus:device=0000:b0:00
[ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 2f9
[ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device: domain:bus:dev = 0000:b0:00
[ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6
[ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14
[ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15
[ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16
[ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled
[ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 5926.127189] IP: [<ffffffff81353a3b>] pci_stop_bus_device+0x33/0x83
[ 5926.133377] PGD 0
[ 5926.135402] Oops: 0000 [#1] SMP
[ 5926.138659] CPU 2
[ 5926.140499] Modules linked in:
...
[ 5926.143754]
[ 5926.275823] Call Trace:
[ 5926.278267]  [<ffffffff81353a38>] pci_stop_bus_device+0x30/0x83
[ 5926.284180]  [<ffffffff81353af4>] pci_remove_bus_device+0x1a/0xba
[ 5926.290264]  [<ffffffff81366311>] pciehp_unconfigure_device+0x110/0x17b
[ 5926.296866]  [<ffffffff81365dd9>] ? pciehp_disable_slot+0x188/0x188
[ 5926.303123]  [<ffffffff81365d6f>] pciehp_disable_slot+0x11e/0x188
[ 5926.309206]  [<ffffffff81365e68>] pciehp_power_thread+0x8f/0xe0
...

 +-[0000:80]-+-00.0-[81-8f]--
 |           +-01.0-[90-9f]--
 |           +-02.0-[a0-af]--
 |           +-02.2-[b0-bf]----00.0-[b1-b3]--+-02.0-[b2]--+-00.0 Device
 |           |                               |            +-00.1 Device
 |           |                               |            +-00.2 Device
 |           |                               |            \-00.3 Device
 |           |                               \-03.0-[b3]--+-00.0 Device
 |           |                                            +-00.1 Device
 |           |                                            +-00.2 Device
 |           |                                            \-00.3 Device

root complex: 80:02.2
pci express modules: have pcie switch and are listed as b0:00.0, b1:02.0 and b1:03.0.
                end devices  are b2:00.0 and b3.00.0.
                VFs are: b2:00.1,... b2:00.3, and b3:00.1,...,b3:00.3

Root cause: when doing pci_stop_bus_device() with phys fn, it will stop virt fn and
remove the fn, so
	list_for_each_safe(l, n, &bus->devices)
will have problem to refer freed n that is pointed to vf entry.

Solution is just replacing list_for_each_safe() with list_for_each().
	pci_stop_bus_device(dev) will not remove dev from bus->devices list,
	so We only need use for_each here.

During reviewing the patch, Bjorn said:
|   The PCI hot-remove path calls pci_stop_bus_devices() via
|   pci_remove_bus_device().
|
|   pci_stop_bus_devices() traverses the bus->devices list (point A below),
|   stopping each device in turn, which calls the driver remove() method.  When
|   the device is an SR-IOV PF, the driver calls pci_disable_sriov(), which
|   also uses pci_remove_bus_device() to remove the VF devices from the
|   bus->devices list (point B).
|
|       pci_remove_bus_device
|         pci_stop_bus_device
|           pci_stop_bus_devices(subordinate)
|             list_for_each(bus->devices)             <-- A
|               pci_stop_bus_device(PF)
|                 ...
|                   driver->remove
|                     pci_disable_sriov
|                       ...
|                         pci_remove_bus_device(VF)
|                             <remove from bus_list>  <-- B
|
|   At B, we're changing the same list we're iterating through at A, so when
|   the driver remove() method returns, the pci_stop_bus_devices() iterator has
|   a pointer to a list entry that has already been freed.

Discussion thread can be found : https://lkml.org/lkml/2011/10/15/141

-v2: updated changelog.
-v3: According to Kenji, do not allocate another list for PF.
     Also We don't need to check dev->is_virtfn anymore, because PF should come
	first in the list, and even VF come first, it is still ok.
-v4: According to Kenji, expand list_for_each(), and add is_virtfn checking

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/remove.c |   28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -123,10 +123,36 @@ void pci_remove_behind_bridge(struct pci
 static void pci_stop_bus_devices(struct pci_bus *bus)
 {
 	struct list_head *l, *n;
+	struct list_head *head = &bus->devices;
 
+	/*
+	 * pci_stop_bus_device(dev) will not remove dev from bus->devices list,
+	 *  so We don't need use _safe version for_each here.
+	 * Also _safe version has problem when pci_stop_bus_device() for PF try
+	 *  to remove VFs.
+	 */
+	for (l = head->next; l != head;) {
+		struct pci_dev *dev = pci_dev_b(l);
+
+		/*
+		 * VFs are removed by pci_remove_bus_device() in the
+		 *  pci_stop_bus_devices() code path for PF.
+		 *  aka, bus->devices get updated in the process.
+		 * barrier() will make sure we get right next from that list.
+		 */
+		if (!dev->is_virtfn) {
+			pci_stop_bus_device(dev);
+			barrier();
+		}
+		l = l->next;
+	}
+
+	/* For left over VFs in case */
 	list_for_each_safe(l, n, &bus->devices) {
 		struct pci_dev *dev = pci_dev_b(l);
-		pci_stop_bus_device(dev);
+
+		if (dev->is_virtfn)
+			pci_stop_bus_device(dev);
 	}
 }
 

  reply	other threads:[~2011-11-23  5:02 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <4E9A3092.4080309@oracle.com>
2011-10-16  1:31 ` [PATCH 1/8] pci: Make sriov work with hotplug removal Yinghai Lu
2011-10-17 17:16   ` Bjorn Helgaas
2011-10-17 18:08     ` Yinghai Lu
2011-10-17 22:12       ` Bjorn Helgaas
2011-10-17 22:24         ` Yinghai Lu
2011-10-18 16:49           ` Bjorn Helgaas
2011-10-18 17:02             ` Yinghai Lu
2011-10-25 12:34               ` Kenji Kaneshige
2011-10-25 15:52                 ` Yinghai Lu
2011-11-11 21:00               ` [RESEND PATCH] PCI: Make sriov work with hotplug remove Yinghai Lu
2011-11-16  5:54                 ` Kenji Kaneshige
2011-11-16 21:23                   ` Yinghai Lu
2011-11-23  1:58                     ` Kenji Kaneshige
2011-11-23  5:01                       ` Yinghai Lu [this message]
2011-10-16  1:31 ` [PATCH 2/8] pci: Calculate right add_size Yinghai Lu
2011-10-16  1:32 ` [PATCH 3/8] pci: Try to assign required+option size at first Yinghai Lu
2011-10-16  1:32 ` [PATCH 4/8] PCI: Using add_list in pcie hotplug path Yinghai Lu
2011-10-16  1:32 ` [PATCH 5/8] PCI: Make rescan bus could increase bridge resource size if needed Yinghai Lu
2011-10-16  1:32 ` [PATCH 6/8] PCI: Make pci_rescan_bus handle add_list Yinghai Lu
2011-10-16  1:32 ` [PATCH 7/8] PCI, sysfs: merge dev and bus cpuaffinity show handling Yinghai Lu
2011-10-16  1:32 ` [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges Yinghai Lu
2011-10-16  2:39   ` Greg KH
2011-10-16  5:34     ` Yinghai Lu
2011-10-16 15:55       ` Greg KH
2011-10-16 23:35         ` Yinghai Lu
2011-10-17  1:45           ` Greg KH
2011-10-17 18:27             ` [PATCH -v4 8_1/8] PCI, sys: Use device_type and attr_groups with pci dev Yinghai Lu
2011-10-17 18:38               ` Greg KH
2011-10-17 18:29             ` [PATCH -v4 8_2/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges Yinghai Lu
2011-10-17 18:33             ` [PATCH -v4 8_3/8] PCI, sys: Use is_visable() with boot_vga attribute for pci_dev Yinghai Lu
2011-10-17 18:36             ` [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges Yinghai Lu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ECC7E21.8060805@oracle.com \
    --to=yinghai.lu@oracle.com \
    --cc=bhelgaas@google.com \
    --cc=jbarnes@virtuousgeek.org \
    --cc=kaneshige.kenji@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.