linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/8] pci: Make sriov work with hotplug removal
       [not found] <4E9A3092.4080309@oracle.com>
@ 2011-10-16  1:31 ` Yinghai Lu
  2011-10-17 17:16   ` Bjorn Helgaas
  2011-10-16  1:31 ` [PATCH 2/8] pci: Calculate right add_size Yinghai Lu
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Yinghai Lu @ 2011-10-16  1:31 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org


When hot remove pci express module, got

[ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1
[ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt received
[ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3)
[ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 1f9
[ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due to button press.
[ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10
[ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
[ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling domain:bus:device=0000:b0:00
[ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 2f9
[ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device: domain:bus:dev = 0000:b0:00
[ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6
[ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14
[ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15
[ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16
[ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled
[ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 5926.127189] IP: [<ffffffff81353a3b>] pci_stop_bus_device+0x33/0x83
[ 5926.133377] PGD 0
[ 5926.135402] Oops: 0000 [#1] SMP
[ 5926.138659] CPU 2
[ 5926.140499] Modules linked in:
...
[ 5926.143754]
[ 5926.275823] Call Trace:
[ 5926.278267]  [<ffffffff81353a38>] pci_stop_bus_device+0x30/0x83
[ 5926.284180]  [<ffffffff81353af4>] pci_remove_bus_device+0x1a/0xba
[ 5926.290264]  [<ffffffff81366311>] pciehp_unconfigure_device+0x110/0x17b
[ 5926.296866]  [<ffffffff81365dd9>] ? pciehp_disable_slot+0x188/0x188
[ 5926.303123]  [<ffffffff81365d6f>] pciehp_disable_slot+0x11e/0x188
[ 5926.309206]  [<ffffffff81365e68>] pciehp_power_thread+0x8f/0xe0
...

Root cause: when doing pci_stop_bus_device() with phys fn, it will stop virt fn and
remove the fn, so
	list_for_each_safe(l, n, &bus->devices)
will have problem to refer freed n that is pointed to vf entry.

Solution is just call pci_stop_bus_device() with phys fn only. and before that need to
save phys fn aside and avoid to use bus->devices to loop.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/remove.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -120,10 +120,43 @@ void pci_remove_behind_bridge(struct pci
 			pci_remove_bus_device(pci_dev_b(l));
 }
 
+struct dev_list {
+	struct pci_dev *dev;
+	struct list_head list;
+};
+
 static void pci_stop_bus_devices(struct pci_bus *bus)
 {
 	struct list_head *l, *n;
+	struct dev_list *dl, *dn;
+	LIST_HEAD(physfn_list);
+
+	/* Save phys_fn aside at first */
+	list_for_each(l, &bus->devices) {
+		struct pci_dev *dev = pci_dev_b(l);
+
+		if (!dev->is_virtfn) {
+			dl = kmalloc(sizeof(*dl), GFP_KERNEL);
+			if (!dl)
+				continue;
+			dl->dev = dev;
+			list_add_tail(&dl->list, &physfn_list);
+		}
+	}
+
+	/*
+	 * stop bus device for phys_fn at first
+	 *  it will stop and remove vf in driver remove action
+	 */
+	list_for_each_entry_safe(dl, dn, &physfn_list, list) {
+		struct pci_dev *dev = dl->dev;
+
+		pci_stop_bus_device(dev);
+
+		kfree(dl);
+	}
 
+	/* Do it again for left over in case */
 	list_for_each_safe(l, n, &bus->devices) {
 		struct pci_dev *dev = pci_dev_b(l);
 		pci_stop_bus_device(dev);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 2/8] pci: Calculate right add_size
       [not found] <4E9A3092.4080309@oracle.com>
  2011-10-16  1:31 ` [PATCH 1/8] pci: Make sriov work with hotplug removal Yinghai Lu
@ 2011-10-16  1:31 ` Yinghai Lu
  2011-10-16  1:32 ` [PATCH 3/8] pci: Try to assign required+option size at first Yinghai Lu
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2011-10-16  1:31 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org


During debug one SRIOV enabled hotplug, found add_size is not passed properly.

the device have devices under two level bridges..

 +-[0000:80]-+-00.0-[81-8f]--
 |           +-01.0-[90-9f]--
 |           +-02.0-[a0-af]----00.0-[a1-a3]--+-02.0-[a2]--+-00.0  Oracle Corporation Device
 |           |                               \-03.0-[a3]--+-00.0  Oracle Corporation Device

so later parent bridge will not try to add big range.
[  557.455077] pci 0000:a0:00.0: BAR 14: assigned [mem 0xf9000000-0xf93fffff]
[  557.461974] pci 0000:a0:00.0: BAR 15: assigned [mem 0xf6000000-0xf61fffff pref]
[  557.469340] pci 0000:a1:02.0: BAR 14: assigned [mem 0xf9000000-0xf91fffff]
[  557.476231] pci 0000:a1:02.0: BAR 15: assigned [mem 0xf6000000-0xf60fffff pref]
[  557.483582] pci 0000:a1:03.0: BAR 14: assigned [mem 0xf9200000-0xf93fffff]
[  557.490468] pci 0000:a1:03.0: BAR 15: assigned [mem 0xf6100000-0xf61fffff pref]
[  557.497833] pci 0000:a1:03.0: BAR 14: can't assign mem (size 0x200000)
[  557.504378] pci 0000:a1:03.0: failed to add optional resources res=[mem 0xf9200000-0xf93fffff]
[  557.513026] pci 0000:a1:02.0: BAR 14: can't assign mem (size 0x200000)
[  557.519578] pci 0000:a1:02.0: failed to add optional resources res=[mem 0xf9000000-0xf91fffff]

it turns out We did not calculate size1 properly.

static resource_size_t calculate_memsize(resource_size_t size,
                resource_size_t min_size,
                resource_size_t size1,
                resource_size_t old_size,
                resource_size_t align)
{
        if (size < min_size)
                size = min_size;
        if (old_size == 1 )
                old_size = 0;
        if (size < old_size)
                size = old_size;
        size = ALIGN(size + size1, align);
        return size;
}

We should not pass add_size with min_size in calculate_memsize.
that will make add_size not contribute final add_size.

Just pass add_size with size1 to calculate_memsize()

With this change, We should have chance to remove extra addon in pci_reassign_resource...

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/setup-bus.c |   21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

Index: linux-2.6/drivers/pci/setup-bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/setup-bus.c
+++ linux-2.6/drivers/pci/setup-bus.c
@@ -612,7 +616,7 @@ static void pbus_size_io(struct pci_bus
 	if (children_add_size > add_size)
 		add_size = children_add_size;
 	size1 = (!realloc_head || (realloc_head && !add_size)) ? size0 :
-		calculate_iosize(size, min_size+add_size, size1,
+		calculate_iosize(size, min_size, add_size + size1,
 			resource_size(b_res), 4096);
 	if (!size0 && !size1) {
 		if (b_res->start || b_res->end)
@@ -726,7 +734,7 @@ static int pbus_size_mem(struct pci_bus
 	if (children_add_size > add_size)
 		add_size = children_add_size;
 	size1 = (!realloc_head || (realloc_head && !add_size)) ? size0 :
-		calculate_memsize(size, min_size+add_size, 0,
+		calculate_memsize(size, min_size, add_size,
 				resource_size(b_res), min_align);
 	if (!size0 && !size1) {
 		if (b_res->start || b_res->end)
Index: linux-2.6/drivers/pci/setup-res.c
===================================================================
--- linux-2.6.orig/drivers/pci/setup-res.c
+++ linux-2.6/drivers/pci/setup-res.c
@@ -232,11 +232,12 @@ int pci_reassign_resource(struct pci_dev
 		return -EINVAL;
 	}
 
-	new_size = resource_size(res) + addsize + min_align;
+	/* already aligned with min_align */
+	new_size = resource_size(res) + addsize;
 	ret = _pci_assign_resource(dev, resno, new_size, min_align);
 	if (!ret) {
 		res->flags &= ~IORESOURCE_STARTALIGN;
-		dev_info(&dev->dev, "BAR %d: assigned %pR\n", resno, res);
+		dev_info(&dev->dev, "BAR %d: reassigned %pR\n", resno, res);
 		if (resno < PCI_BRIDGE_RESOURCES)
 			pci_update_resource(dev, resno);
 	}

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 3/8] pci: Try to assign required+option size at first
       [not found] <4E9A3092.4080309@oracle.com>
  2011-10-16  1:31 ` [PATCH 1/8] pci: Make sriov work with hotplug removal Yinghai Lu
  2011-10-16  1:31 ` [PATCH 2/8] pci: Calculate right add_size Yinghai Lu
@ 2011-10-16  1:32 ` Yinghai Lu
  2011-10-16  1:32 ` [PATCH 4/8] PCI: Using add_list in pcie hotplug path Yinghai Lu
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2011-10-16  1:32 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org


Found reassign can not find right range for one resource. even total range is enough.

bridge b1:02.0 will need 2M+3M
bridge b1:03.0 will need 2M+3M

so bridge b0:00.0 will get assigned: 4M : [f8000000-f83fffff]
   later is reassigned to 10M : [f8000000-f9ffffff]

b1:02.0 is assigned to 2M : [f8000000-f81fffff]
b1:03.0 is assigned to 2M : [f8200000-f83fffff]

after that b1:03.0 get chance to be reassigned to [f8200000-f86fffff]
but b1:02.0 will not have chance to expand, because b1:03.0 is using in middle one.

[  187.911401] pci 0000:b1:02.0: bridge window [mem 0x00100000-0x002fffff] to [bus b2-b2] add_size 300000
[  187.920764] pci 0000:b1:03.0: bridge window [mem 0x00100000-0x002fffff] to [bus b3-b3] add_size 300000
[  187.930129] pci 0000:b1:02.0: [mem 0x00100000-0x002fffff] get_res_add_size  add_size 300000
[  187.938500] pci 0000:b1:03.0: [mem 0x00100000-0x002fffff] get_res_add_size  add_size 300000
[  187.946857] pci 0000:b0:00.0: bridge window [mem 0x00100000-0x004fffff] to [bus b1-b3] add_size 600000
[  187.956206] pci 0000:b0:00.0: BAR 14: assigned [mem 0xf8000000-0xf83fffff]
[  187.963102] pci 0000:b0:00.0: BAR 15: assigned [mem 0xf5000000-0xf51fffff pref]
[  187.970434] pci 0000:b0:00.0: BAR 14: reassigned [mem 0xf8000000-0xf89fffff]
[  187.977497] pci 0000:b1:02.0: BAR 14: assigned [mem 0xf8000000-0xf81fffff]
[  187.984383] pci 0000:b1:02.0: BAR 15: assigned [mem 0xf5000000-0xf50fffff pref]
[  187.991695] pci 0000:b1:03.0: BAR 14: assigned [mem 0xf8200000-0xf83fffff]
[  187.998576] pci 0000:b1:03.0: BAR 15: assigned [mem 0xf5100000-0xf51fffff pref]
[  188.005888] pci 0000:b1:03.0: BAR 14: reassigned [mem 0xf8200000-0xf86fffff]
[  188.012939] pci 0000:b1:02.0: BAR 14: can't assign mem (size 0x200000)
[  188.019471] pci 0000:b1:02.0: failed to add 300000 to res=[mem 0xf8000000-0xf81fffff]
[  188.027326] pci 0000:b2:00.0: reg 184: [mem 0x00000000-0x00003fff 64bit]
[  188.034071] pci 0000:b2:00.0: reg 18c: [mem 0x00000000-0x000fffff 64bit]
[  188.040795] pci 0000:b2:00.0: BAR 2: assigned [mem 0xf8000000-0xf80fffff 64bit]
[  188.048119] pci 0000:b2:00.0: BAR 2: set to [mem 0xf8000000-0xf80fffff 64bit] (PCI address [0xf8000000-0xf80fffff])
[  188.058550] pci 0000:b2:00.0: BAR 6: assigned [mem 0xf5000000-0xf50fffff pref]
[  188.065802] pci 0000:b2:00.0: BAR 0: assigned [mem 0xf8100000-0xf8103fff 64bit]
[  188.073125] pci 0000:b2:00.0: BAR 0: set to [mem 0xf8100000-0xf8103fff 64bit] (PCI address [0xf8100000-0xf8103fff])
[  188.083596] pci 0000:b2:00.0: reg 18c: [mem 0x00000000-0x000fffff 64bit]
[  188.090310] pci 0000:b2:00.0: BAR 9: can't assign mem (size 0x300000)
[  188.096773] pci 0000:b2:00.0: reg 184: [mem 0x00000000-0x00003fff 64bit]
[  188.103479] pci 0000:b2:00.0: BAR 7: assigned [mem 0xf8104000-0xf810ffff 64bit]
[  188.110801] pci 0000:b2:00.0: BAR 7: set to [mem 0xf8104000-0xf810ffff 64bit] (PCI address [0xf8104000-0xf810ffff])
[  188.121256] pci 0000:b1:02.0: PCI bridge to [bus b2-b2]
[  188.126512] pci 0000:b1:02.0:   bridge window [mem 0xf8000000-0xf81fffff]
[  188.133328] pci 0000:b1:02.0:   bridge window [mem 0xf5000000-0xf50fffff pref]
[  188.140608] pci 0000:b3:00.0: reg 184: [mem 0x00000000-0x00003fff 64bit]
[  188.147341] pci 0000:b3:00.0: reg 18c: [mem 0x00000000-0x000fffff 64bit]
[  188.154076] pci 0000:b3:00.0: BAR 2: assigned [mem 0xf8200000-0xf82fffff 64bit]
[  188.161417] pci 0000:b3:00.0: BAR 2: set to [mem 0xf8200000-0xf82fffff 64bit] (PCI address [0xf8200000-0xf82fffff])
[  188.171865] pci 0000:b3:00.0: BAR 6: assigned [mem 0xf5100000-0xf51fffff pref]
[  188.179090] pci 0000:b3:00.0: BAR 0: assigned [mem 0xf8300000-0xf8303fff 64bit]
[  188.186431] pci 0000:b3:00.0: BAR 0: set to [mem 0xf8300000-0xf8303fff 64bit] (PCI address [0xf8300000-0xf8303fff])
[  188.196884] pci 0000:b3:00.0: reg 18c: [mem 0x00000000-0x000fffff 64bit]
[  188.203591] pci 0000:b3:00.0: BAR 9: assigned [mem 0xf8400000-0xf86fffff 64bit]
[  188.210909] pci 0000:b3:00.0: BAR 9: set to [mem 0xf8400000-0xf86fffff 64bit] (PCI address [0xf8400000-0xf86fffff])
[  188.221379] pci 0000:b3:00.0: reg 184: [mem 0x00000000-0x00003fff 64bit]
[  188.228089] pci 0000:b3:00.0: BAR 7: assigned [mem 0xf8304000-0xf830ffff 64bit]
[  188.235407] pci 0000:b3:00.0: BAR 7: set to [mem 0xf8304000-0xf830ffff 64bit] (PCI address [0xf8304000-0xf830ffff])
[  188.245843] pci 0000:b1:03.0: PCI bridge to [bus b3-b3]
[  188.251107] pci 0000:b1:03.0:   bridge window [mem 0xf8200000-0xf86fffff]
[  188.257922] pci 0000:b1:03.0:   bridge window [mem 0xf5100000-0xf51fffff pref]
[  188.265180] pci 0000:b0:00.0: PCI bridge to [bus b1-b3]
[  188.270443] pci 0000:b0:00.0:   bridge window [mem 0xf8000000-0xf89fffff]
[  188.277250] pci 0000:b0:00.0:   bridge window [mem 0xf5000000-0xf51fffff pref]
[  188.284512] pcieport 0000:80:02.2: PCI bridge to [bus b0-bf]
[  188.290184] pcieport 0000:80:02.2:   bridge window [io  0xa000-0xbfff]
[  188.296735] pcieport 0000:80:02.2:   bridge window [mem 0xf8000000-0xf8ffffff]
[  188.303963] pcieport 0000:80:02.2:   bridge window [mem 0xf5000000-0xf5ffffff 64bit pref]

b2:00.0 BAR 9 has not get assigned...

root cause:
b1:02.0 can not be added more range, because b1:03.0 is just after it.
not space between required ranges.

Solution:
Try to assign required + optional all together at first, and if it fails, go with required then reassign path.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/setup-bus.c |  113 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 97 insertions(+), 16 deletions(-)

Index: linux-2.6/drivers/pci/setup-bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/setup-bus.c
+++ linux-2.6/drivers/pci/setup-bus.c
@@ -64,7 +64,7 @@ void pci_realloc(void)
  * @add_size:	additional size to be optionally added
  *              to the resource
  */
-static void add_to_list(struct resource_list_x *head,
+static int add_to_list(struct resource_list_x *head,
 		 struct pci_dev *dev, struct resource *res,
 		 resource_size_t add_size, resource_size_t min_align)
 {
@@ -75,7 +75,7 @@ static void add_to_list(struct resource_
 	tmp = kmalloc(sizeof(*tmp), GFP_KERNEL);
 	if (!tmp) {
 		pr_warning("add_to_list: kmalloc() failed!\n");
-		return;
+		return -ENOMEM;
 	}
 
 	tmp->next = ln;
@@ -87,6 +87,8 @@ static void add_to_list(struct resource_
 	tmp->add_size = add_size;
 	tmp->min_align = min_align;
 	list->next = tmp;
+
+	return 0;
 }
 
 static void add_to_failed_list(struct resource_list_x *head,
@@ -97,6 +99,42 @@ static void add_to_failed_list(struct re
 			0 /* dont care */);
 }
 
+static void remove_from_list(struct resource_list_x *realloc_head,
+				 struct resource *res)
+{
+	struct resource_list_x *prev, *tmp, *list;
+
+	prev = realloc_head;
+	for (list = realloc_head->next; list;) {
+		if (list->res != res) {
+			prev = list;
+			list = list->next;
+			continue;
+		}
+		tmp = list;
+		prev->next = list = list->next;
+		kfree(tmp);
+	}
+}
+
+static resource_size_t get_res_add_size(struct resource_list_x *realloc_head,
+					struct resource *res)
+{
+	struct resource_list_x *list;
+
+	/* check if it is in realloc_head list */
+	for (list = realloc_head->next; list; list = list->next) {
+		if (list->res != res)
+			continue;
+		dev_printk(KERN_DEBUG, &list->dev->dev,
+			 "%pR get_res_add_size  add_size %llx\n",
+			 list->res, (unsigned long long)list->add_size);
+		return list->add_size;
+	}
+
+	return 0;
+}
+
 static void __dev_sort_resources(struct pci_dev *dev,
 				 struct resource_list *head)
 {
@@ -221,6 +259,63 @@ static void __assign_resources_sorted(st
 				 struct resource_list_x *realloc_head,
 				 struct resource_list_x *fail_head)
 {
+	/*
+	 * Should not assign requested resources at first.
+	 *   they could be adjacent, so later reassign can not reallocate
+	 *   them one by one in parent resource window.
+	 * Try to assign requested + add_size at begining
+	 *  if could do that, could get out early.
+	 *  if could not do that, we still try to assign requested at first,
+	 *    then try to reassign add_size for some resources.
+	 */
+	struct resource_list_x save_head, local_fail_head, *list;
+	struct resource_list *l;
+
+	if (!realloc_head)
+		goto requested_and_reassign;
+
+	/* Save original start, end, flags etc */
+	save_head.next = NULL;
+	for (l = head->next; l; l = l->next)
+		if (add_to_list(&save_head, l->dev, l->res, 0, 0)) {
+			free_list(resource_list_x, &save_head);
+			goto requested_and_reassign;
+		}
+
+	/* Update res in head list with add_size in realloc_head list */
+	for (l = head->next; l; l = l->next)
+		l->res->end += get_res_add_size(realloc_head, l->res);
+
+	/* Try updated head list with add_size added */
+	local_fail_head.next = NULL;
+	assign_requested_resources_sorted(head, &local_fail_head);
+
+	/* all assigned with add_size ? */
+	if (!local_fail_head.next) {
+		/* Remove head list from realloc_head list */
+		for (l = head->next; l; l = l->next)
+			remove_from_list(realloc_head, l->res);
+		free_list(resource_list_x, &save_head);
+		free_list(resource_list, head);
+		return;
+	}
+
+	free_list(resource_list_x, &local_fail_head);
+	/* Release assigned resource */
+	for (l = head->next; l; l = l->next)
+		if (l->res->parent)
+			release_resource(l->res);
+	/* Restore start/end/flags from save list */
+	for (list = save_head.next; list; list = list->next) {
+		struct resource *res = list->res;
+
+		res->start = list->start;
+		res->end = list->end;
+		res->flags = list->flags;
+	}
+	free_list(resource_list_x, &save_head);
+
+requested_and_reassign:
 	/* Satisfy the must-have resource requests */
 	assign_requested_resources_sorted(head, fail_head);
 
@@ -548,20 +643,6 @@ static resource_size_t calculate_memsize
 	return size;
 }
 
-static resource_size_t get_res_add_size(struct resource_list_x *realloc_head,
-					struct resource *res)
-{
-	struct resource_list_x *list;
-
-	/* check if it is in realloc_head list */
-	for (list = realloc_head->next; list && list->res != res;
-			list = list->next);
-	if (list)
-		return list->add_size;
-
-	return 0;
-}
-
 /**
  * pbus_size_io() - size the io window of a given bus
  *

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 4/8] PCI: Using add_list in pcie hotplug path
       [not found] <4E9A3092.4080309@oracle.com>
                   ` (2 preceding siblings ...)
  2011-10-16  1:32 ` [PATCH 3/8] pci: Try to assign required+option size at first Yinghai Lu
@ 2011-10-16  1:32 ` Yinghai Lu
  2011-10-16  1:32 ` [PATCH 5/8] PCI: Make rescan bus could increase bridge resource size if needed Yinghai Lu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2011-10-16  1:32 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org


We need add size for hot plug path when pluging in hotplug chassis without cards.
-v2: change descriptions. make it can be applied after
	pci: Check bridge resources after resource allocation.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/setup-bus.c |   18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

Index: linux-2.6/drivers/pci/setup-bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/setup-bus.c
+++ linux-2.6/drivers/pci/setup-bus.c
@@ -327,13 +327,14 @@ requested_and_reassign:
 }
 
 static void pdev_assign_resources_sorted(struct pci_dev *dev,
+				 struct resource_list_x *add_head,
 				 struct resource_list_x *fail_head)
 {
 	struct resource_list head;
 
 	head.next = NULL;
 	__dev_sort_resources(dev, &head);
-	__assign_resources_sorted(&head, NULL, fail_head);
+	__assign_resources_sorted(&head, add_head, fail_head);
 
 }
 
@@ -1003,17 +1004,19 @@ void __ref pci_bus_assign_resources(cons
 EXPORT_SYMBOL(pci_bus_assign_resources);
 
 static void __ref __pci_bridge_assign_resources(const struct pci_dev *bridge,
+					 struct resource_list_x *add_head,
 					 struct resource_list_x *fail_head)
 {
 	struct pci_bus *b;
 
-	pdev_assign_resources_sorted((struct pci_dev *)bridge, fail_head);
+	pdev_assign_resources_sorted((struct pci_dev *)bridge,
+					 add_head, fail_head);
 
 	b = bridge->subordinate;
 	if (!b)
 		return;
 
-	__pci_bus_assign_resources(b, NULL, fail_head);
+	__pci_bus_assign_resources(b, add_head, fail_head);
 
 	switch (bridge->class >> 8) {
 	case PCI_CLASS_BRIDGE_PCI:
@@ -1288,6 +1291,8 @@ enable_and_dump:
 void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
 {
 	struct pci_bus *parent = bridge->subordinate;
+	struct resource_list_x add_list; /* list of resources that
+					want additional resources */
 	int tried_times = 0;
 	struct resource_list_x head, *list;
 	int retval;
@@ -1295,11 +1300,12 @@ void pci_assign_unassigned_bridge_resour
 				  IORESOURCE_PREFETCH;
 
 	head.next = NULL;
+	add_list.next = NULL;
 
 again:
-	pci_bus_size_bridges(parent);
-	__pci_bridge_assign_resources(bridge, &head);
-
+	__pci_bus_size_bridges(parent, &add_list);
+	__pci_bridge_assign_resources(bridge, &add_list, &head);
+	BUG_ON(add_list.next);
 	tried_times++;
 
 	if (!head.next)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 5/8] PCI: Make rescan bus could increase bridge resource size if needed
       [not found] <4E9A3092.4080309@oracle.com>
                   ` (3 preceding siblings ...)
  2011-10-16  1:32 ` [PATCH 4/8] PCI: Using add_list in pcie hotplug path Yinghai Lu
@ 2011-10-16  1:32 ` Yinghai Lu
  2011-10-16  1:32 ` [PATCH 6/8] PCI: Make pci_rescan_bus handle add_list Yinghai Lu
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2011-10-16  1:32 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org



Current rescan will not touch bridge MMIO and IO.

Try to reuse pci_assign_unassigned_bridge_resources(bridge) to update bridge
resource, if child devices need more resource.

only do that for bridges that all children get removed before. So do not
release resources that could already be used by drivers of child devices.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci-sysfs.c |    5 ++++-
 drivers/pci/probe.c     |   24 ++++++++++++++++++++++++
 include/linux/pci.h     |    1 +
 3 files changed, 29 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -365,7 +365,10 @@ dev_bus_rescan_store(struct device *dev,
 
 	if (val) {
 		mutex_lock(&pci_remove_rescan_mutex);
-		pci_rescan_bus(bus);
+		if (!pci_is_root_bus(bus) && list_empty(&bus->devices))
+			pci_rescan_bus_bridge_resize(bus->self);
+		else
+			pci_rescan_bus(bus);
 		mutex_unlock(&pci_remove_rescan_mutex);
 	}
 	return count;
Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -1601,6 +1601,30 @@ EXPORT_SYMBOL(pci_scan_bus_parented);
 
 #ifdef CONFIG_HOTPLUG
 /**
+ * pci_rescan_bus_bridge_resize - scan a PCI bus for devices.
+ * @bridge: PCI bridge for the bus to scan
+ *
+ * Scan a PCI bus and child buses for new devices, adds them,
+ * and enables them, it will resize bridge mmio/io resource if it is possible
+ * for safe, caller should make sure that children get removed already.
+ *
+ * Returns the max number of subordinate bus discovered.
+ */
+unsigned int __ref pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
+{
+	unsigned int max;
+	struct pci_bus *bus = bridge->subordinate;
+
+	max = pci_scan_child_bus(bus);
+
+	pci_assign_unassigned_bridge_resources(bridge);
+
+	pci_bus_add_devices(bus);
+
+	return max;
+}
+
+/**
  * pci_rescan_bus - scan a PCI bus for devices.
  * @bus: PCI bus to scan
  *
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -875,6 +875,7 @@ void set_pcie_hotplug_bridge(struct pci_
 /* Functions for PCI Hotplug drivers to use */
 int pci_bus_find_capability(struct pci_bus *bus, unsigned int devfn, int cap);
 #ifdef CONFIG_HOTPLUG
+unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge);
 unsigned int pci_rescan_bus(struct pci_bus *bus);
 #endif
 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 6/8] PCI: Make pci_rescan_bus handle add_list
       [not found] <4E9A3092.4080309@oracle.com>
                   ` (4 preceding siblings ...)
  2011-10-16  1:32 ` [PATCH 5/8] PCI: Make rescan bus could increase bridge resource size if needed Yinghai Lu
@ 2011-10-16  1:32 ` Yinghai Lu
  2011-10-16  1:32 ` [PATCH 7/8] PCI, sysfs: merge dev and bus cpuaffinity show handling Yinghai Lu
  2011-10-16  1:32 ` [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges Yinghai Lu
  7 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2011-10-16  1:32 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org


So it will allocate resource to hotplug bridge during remove/rescan.

Need to move that function to setup-bus.c so it could use
  __pci_bus_size_bridges and __pci_bus_assign_resources
directly to take add_list.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/probe.c     |   32 --------------------------------
 drivers/pci/setup-bus.c |   39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+), 32 deletions(-)

Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -1624,38 +1624,6 @@ unsigned int __ref pci_rescan_bus_bridge
 	return max;
 }
 
-/**
- * pci_rescan_bus - scan a PCI bus for devices.
- * @bus: PCI bus to scan
- *
- * Scan a PCI bus and child buses for new devices, adds them,
- * and enables them.
- *
- * Returns the max number of subordinate bus discovered.
- */
-unsigned int __ref pci_rescan_bus(struct pci_bus *bus)
-{
-	unsigned int max;
-	struct pci_dev *dev;
-
-	max = pci_scan_child_bus(bus);
-
-	down_read(&pci_bus_sem);
-	list_for_each_entry(dev, &bus->devices, bus_list)
-		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE ||
-		    dev->hdr_type == PCI_HEADER_TYPE_CARDBUS)
-			if (dev->subordinate)
-				pci_bus_size_bridges(dev->subordinate);
-	up_read(&pci_bus_sem);
-
-	pci_bus_assign_resources(bus);
-	pci_enable_bridges(bus);
-	pci_bus_add_devices(bus);
-
-	return max;
-}
-EXPORT_SYMBOL_GPL(pci_rescan_bus);
-
 EXPORT_SYMBOL(pci_add_new_bus);
 EXPORT_SYMBOL(pci_scan_slot);
 EXPORT_SYMBOL(pci_scan_bridge);
Index: linux-2.6/drivers/pci/setup-bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/setup-bus.c
+++ linux-2.6/drivers/pci/setup-bus.c
@@ -1354,3 +1354,42 @@ enable_all:
 	pci_enable_bridges(parent);
 }
 EXPORT_SYMBOL_GPL(pci_assign_unassigned_bridge_resources);
+
+#ifdef CONFIG_HOTPLUG
+/**
+ * pci_rescan_bus - scan a PCI bus for devices.
+ * @bus: PCI bus to scan
+ *
+ * Scan a PCI bus and child buses for new devices, adds them,
+ * and enables them.
+ *
+ * Returns the max number of subordinate bus discovered.
+ */
+unsigned int __ref pci_rescan_bus(struct pci_bus *bus)
+{
+	unsigned int max;
+	struct pci_dev *dev;
+	struct resource_list_x add_list; /* list of resources that
+					want additional resources */
+
+	max = pci_scan_child_bus(bus);
+
+	add_list.next = NULL;
+	down_read(&pci_bus_sem);
+	list_for_each_entry(dev, &bus->devices, bus_list)
+		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE ||
+		    dev->hdr_type == PCI_HEADER_TYPE_CARDBUS)
+			if (dev->subordinate)
+				__pci_bus_size_bridges(dev->subordinate,
+							 &add_list);
+	up_read(&pci_bus_sem);
+	__pci_bus_assign_resources(bus, &add_list, NULL);
+	BUG_ON(add_list.next);
+
+	pci_enable_bridges(bus);
+	pci_bus_add_devices(bus);
+
+	return max;
+}
+EXPORT_SYMBOL_GPL(pci_rescan_bus);
+#endif

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 7/8] PCI, sysfs: merge dev and bus cpuaffinity show handling
       [not found] <4E9A3092.4080309@oracle.com>
                   ` (5 preceding siblings ...)
  2011-10-16  1:32 ` [PATCH 6/8] PCI: Make pci_rescan_bus handle add_list Yinghai Lu
@ 2011-10-16  1:32 ` Yinghai Lu
  2011-10-16  1:32 ` [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges Yinghai Lu
  7 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2011-10-16  1:32 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org


merge them one function with two extra parameters.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci-sysfs.c |   69 ++++++++++++++++++------------------------------
 1 file changed, 27 insertions(+), 42 deletions(-)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -71,75 +71,60 @@ static ssize_t broken_parity_status_stor
 	return count;
 }
 
-static ssize_t local_cpus_show(struct device *dev,
-			struct device_attribute *attr, char *buf)
-{		
-	const struct cpumask *mask;
+static ssize_t pci_show_cpuaffinity(struct device *dev,
+					int type, bool is_pci_dev,
+					struct device_attribute *attr,
+					char *buf)
+{
 	int len;
+	const struct cpumask *mask;
 
+	if (is_pci_dev) {
 #ifdef CONFIG_NUMA
-	mask = (dev_to_node(dev) == -1) ? cpu_online_mask :
-					  cpumask_of_node(dev_to_node(dev));
+		int node = dev_to_node(dev);
+
+		mask = (node == -1) ? cpu_online_mask : cpumask_of_node(node);
 #else
-	mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
+		mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
 #endif
-	len = cpumask_scnprintf(buf, PAGE_SIZE-2, mask);
+	} else
+		mask = cpumask_of_pcibus(to_pci_bus(dev));
+	len = type ?
+		cpulist_scnprintf(buf, PAGE_SIZE-2, mask) :
+		cpumask_scnprintf(buf, PAGE_SIZE-2, mask);
 	buf[len++] = '\n';
 	buf[len] = '\0';
 	return len;
 }
 
+static ssize_t local_cpus_show(struct device *dev,
+			struct device_attribute *attr, char *buf)
+{
+	return pci_show_cpuaffinity(dev, 0, true, attr, buf);
+}
+
 
 static ssize_t local_cpulist_show(struct device *dev,
 			struct device_attribute *attr, char *buf)
 {
-	const struct cpumask *mask;
-	int len;
-
-#ifdef CONFIG_NUMA
-	mask = (dev_to_node(dev) == -1) ? cpu_online_mask :
-					  cpumask_of_node(dev_to_node(dev));
-#else
-	mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
-#endif
-	len = cpulist_scnprintf(buf, PAGE_SIZE-2, mask);
-	buf[len++] = '\n';
-	buf[len] = '\0';
-	return len;
+	return pci_show_cpuaffinity(dev, 1, true, attr, buf);
 }
 
 /*
  * PCI Bus Class Devices
  */
-static ssize_t pci_bus_show_cpuaffinity(struct device *dev,
-					int type,
-					struct device_attribute *attr,
-					char *buf)
-{
-	int ret;
-	const struct cpumask *cpumask;
-
-	cpumask = cpumask_of_pcibus(to_pci_bus(dev));
-	ret = type ?
-		cpulist_scnprintf(buf, PAGE_SIZE-2, cpumask) :
-		cpumask_scnprintf(buf, PAGE_SIZE-2, cpumask);
-	buf[ret++] = '\n';
-	buf[ret] = '\0';
-	return ret;
-}
-
-static inline ssize_t pci_bus_show_cpumaskaffinity(struct device *dev,
+static ssize_t pci_bus_show_cpumaskaffinity(struct device *dev,
 					struct device_attribute *attr,
 					char *buf)
 {
-	return pci_bus_show_cpuaffinity(dev, 0, attr, buf);
+	return pci_show_cpuaffinity(dev, 0, false, attr, buf);
 }
 
-static inline ssize_t pci_bus_show_cpulistaffinity(struct device *dev,
+static ssize_t pci_bus_show_cpulistaffinity(struct device *dev,
 					struct device_attribute *attr,
 					char *buf)
 {
-	return pci_bus_show_cpuaffinity(dev, 1, attr, buf);
+	return pci_show_cpuaffinity(dev, 1, false, attr, buf);
 }
 
 /* show resources */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges
       [not found] <4E9A3092.4080309@oracle.com>
                   ` (6 preceding siblings ...)
  2011-10-16  1:32 ` [PATCH 7/8] PCI, sysfs: merge dev and bus cpuaffinity show handling Yinghai Lu
@ 2011-10-16  1:32 ` Yinghai Lu
  2011-10-16  2:39   ` Greg KH
  7 siblings, 1 reply; 31+ messages in thread
From: Yinghai Lu @ 2011-10-16  1:32 UTC (permalink / raw)
  To: Jesse Barnes, Greg Kroah-Hartman
  Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org


Current code will create rescan for every pci device,
that is not right. the device is already there, there is no reason to rescan it.

So only have rescan for pci bridges. less confusing

Need to add cond to be new member of device_attribute.
device_add_attrs will check cond to see if need to create the attr.

Also the rescan will rescan bridge's secondary bus instead of primary bus.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/base/bus.c      |    7 ++++++-
 drivers/pci/pci-sysfs.c |   16 ++++++++++++++--
 include/linux/device.h  |    1 +
 include/linux/sysfs.h   |    7 +++++++
 4 files changed, 28 insertions(+), 3 deletions(-)

Index: linux-2.6/drivers/base/bus.c
===================================================================
--- linux-2.6.orig/drivers/base/bus.c
+++ linux-2.6/drivers/base/bus.c
@@ -419,7 +419,12 @@ static int device_add_attrs(struct bus_t
 		return 0;
 
 	for (i = 0; attr_name(bus->dev_attrs[i]); i++) {
-		error = device_create_file(dev, &bus->dev_attrs[i]);
+		struct device_attribute *attr;
+
+		attr = &bus->dev_attrs[i];
+		if (attr->cond && !attr->cond(dev, attr))
+			continue;
+		error = device_create_file(dev, attr);
 		if (error) {
 			while (--i >= 0)
 				device_remove_file(dev, &bus->dev_attrs[i]);
Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -303,12 +303,23 @@ dev_rescan_store(struct device *dev, str
 
 	if (val) {
 		mutex_lock(&pci_remove_rescan_mutex);
-		pci_rescan_bus(pdev->bus);
+		pci_rescan_bus(pdev->subordinate);
 		mutex_unlock(&pci_remove_rescan_mutex);
 	}
 	return count;
 }
 
+static int
+dev_rescan_cond(struct device *dev, struct device_attribute *attr)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if (pdev && pdev->subordinate)
+		return 1;
+
+	return 0;
+}
+
 static void remove_callback(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -383,7 +394,8 @@ struct device_attribute pci_dev_attrs[]
 	__ATTR(msi_bus, 0644, msi_bus_show, msi_bus_store),
 #ifdef CONFIG_HOTPLUG
 	__ATTR(remove, (S_IWUSR|S_IWGRP), NULL, remove_store),
-	__ATTR(rescan, (S_IWUSR|S_IWGRP), NULL, dev_rescan_store),
+	__ATTR_COND(rescan, (S_IWUSR|S_IWGRP), NULL, dev_rescan_store,
+			dev_rescan_cond),
 #endif
 	__ATTR_NULL,
 };
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -427,6 +427,7 @@ struct device_attribute {
 			char *buf);
 	ssize_t (*store)(struct device *dev, struct device_attribute *attr,
 			 const char *buf, size_t count);
+	int (*cond)(struct device *dev, struct device_attribute *attr);
 };
 
 #define DEVICE_ATTR(_name, _mode, _show, _store) \
Index: linux-2.6/include/linux/sysfs.h
===================================================================
--- linux-2.6.orig/include/linux/sysfs.h
+++ linux-2.6/include/linux/sysfs.h
@@ -73,6 +73,13 @@ struct attribute_group {
 	.store	= _store,					\
 }
 
+#define __ATTR_COND(_name, _mode, _show, _store, _cond) { \
+	.attr = {.name = __stringify(_name), .mode = _mode },	\
+	.show	= _show,					\
+	.store	= _store,					\
+	.cond	= _cond,					\
+}
+
 #define __ATTR_RO(_name) { \
 	.attr	= { .name = __stringify(_name), .mode = 0444 },	\
 	.show	= _name##_show,					\

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges
  2011-10-16  1:32 ` [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges Yinghai Lu
@ 2011-10-16  2:39   ` Greg KH
  2011-10-16  5:34     ` Yinghai Lu
  0 siblings, 1 reply; 31+ messages in thread
From: Greg KH @ 2011-10-16  2:39 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jesse Barnes, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Sat, Oct 15, 2011 at 06:32:38PM -0700, Yinghai Lu wrote:
> 
> Current code will create rescan for every pci device,
> that is not right. the device is already there, there is no reason to rescan it.
> 
> So only have rescan for pci bridges. less confusing
> 
> Need to add cond to be new member of device_attribute.
> device_add_attrs will check cond to see if need to create the attr.

No, we already have a way to do this, this is not acceptable, sorry.

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges
  2011-10-16  2:39   ` Greg KH
@ 2011-10-16  5:34     ` Yinghai Lu
  2011-10-16 15:55       ` Greg KH
  0 siblings, 1 reply; 31+ messages in thread
From: Yinghai Lu @ 2011-10-16  5:34 UTC (permalink / raw)
  To: Greg KH
  Cc: Jesse Barnes, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

On 10/15/2011 07:39 PM, Greg KH wrote:

> 
> No, we already have a way to do this, this is not acceptable, sorry.
> 


hope this one is ok...

[PATCH -v2 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges

Current code will create rescan for every pci device,
that is not right. the device is already there, there is no reason to rescan it.

So only have rescan for pci bridges. less confusing

Also the rescan will rescan bridge's secondary bus instead of primary bus.

-v2: don't add __ATTR_COND() as GregKH does not like it

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci-sysfs.c |   29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -303,12 +303,15 @@ dev_rescan_store(struct device *dev, str
 
 	if (val) {
 		mutex_lock(&pci_remove_rescan_mutex);
-		pci_rescan_bus(pdev->bus);
+		pci_rescan_bus(pdev->subordinate);
 		mutex_unlock(&pci_remove_rescan_mutex);
 	}
 	return count;
 }
 
+struct device_attribute bridge_rescan_attr =
+	__ATTR(rescan, (S_IWUSR|S_IWGRP), NULL, dev_rescan_store);
+
 static void remove_callback(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -383,7 +386,6 @@ struct device_attribute pci_dev_attrs[]
 	__ATTR(msi_bus, 0644, msi_bus_show, msi_bus_store),
 #ifdef CONFIG_HOTPLUG
 	__ATTR(remove, (S_IWUSR|S_IWGRP), NULL, remove_store),
-	__ATTR(rescan, (S_IWUSR|S_IWGRP), NULL, dev_rescan_store),
 #endif
 	__ATTR_NULL,
 };
@@ -1217,20 +1219,34 @@ int __must_check pci_create_sysfs_dev_fi
 			goto err_rom_file;
 	}
 
+#ifdef CONFIG_HOTPLUG
+	if (pdev->subordinate) {
+		retval = device_create_file(&pdev->dev, &bridge_rescan_attr);
+		if (retval)
+			goto err_vga_file;
+	}
+#endif
+
 	/* add platform-specific attributes */
 	retval = pcibios_add_platform_entries(pdev);
 	if (retval)
-		goto err_vga_file;
+		goto err_bridge_rescan_file;
 
 	/* add sysfs entries for various capabilities */
 	retval = pci_create_capabilities_sysfs(pdev);
 	if (retval)
-		goto err_vga_file;
+		goto err_bridge_rescan_file;
 
 	pci_create_firmware_label_files(pdev);
 
 	return 0;
 
+err_bridge_rescan_file:
+#ifdef CONFIG_HOTPLUG
+	if (pdev->subordinate)
+		device_remove_file(&pdev->dev, &bridge_rescan_attr);
+#endif
+
 err_vga_file:
 	if ((pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
 		device_remove_file(&pdev->dev, &vga_attr);
@@ -1297,6 +1313,11 @@ void pci_remove_sysfs_dev_files(struct p
 		kfree(pdev->rom_attr);
 	}
 
+#ifdef CONFIG_HOTPLUG
+	if (pdev->subordinate)
+		device_remove_file(&pdev->dev, &bridge_rescan_attr);
+#endif
+
 	pci_remove_firmware_label_files(pdev);
 
 }

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges
  2011-10-16  5:34     ` Yinghai Lu
@ 2011-10-16 15:55       ` Greg KH
  2011-10-16 23:35         ` Yinghai Lu
  0 siblings, 1 reply; 31+ messages in thread
From: Greg KH @ 2011-10-16 15:55 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jesse Barnes, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Sat, Oct 15, 2011 at 10:34:03PM -0700, Yinghai Lu wrote:
> On 10/15/2011 07:39 PM, Greg KH wrote:
> 
> > 
> > No, we already have a way to do this, this is not acceptable, sorry.
> > 
> 
> 
> hope this one is ok...

Nope, not at all, don't use #ifdef in .c files.

Again, we have a way to do this, built into sysfs, that can dynamically
know to add or not add, attributes when they are registered with the
system.

Is there some reason you want to create a new way to do this, or just
ignore it by using another forbidden thing (i.e. #ifdefs)?

confused,

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges
  2011-10-16 15:55       ` Greg KH
@ 2011-10-16 23:35         ` Yinghai Lu
  2011-10-17  1:45           ` Greg KH
  0 siblings, 1 reply; 31+ messages in thread
From: Yinghai Lu @ 2011-10-16 23:35 UTC (permalink / raw)
  To: Greg KH
  Cc: Jesse Barnes, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

On 10/16/2011 08:55 AM, Greg KH wrote:

> On Sat, Oct 15, 2011 at 10:34:03PM -0700, Yinghai Lu wrote:
>> On 10/15/2011 07:39 PM, Greg KH wrote:
>>
>>>
>>> No, we already have a way to do this, this is not acceptable, sorry.
>>>
>>
>>
>> hope this one is ok...
> 
> Nope, not at all, don't use #ifdef in .c files.
> 
> Again, we have a way to do this, built into sysfs, that can dynamically
> know to add or not add, attributes when they are registered with the
> system.

thank you for the comments.

i got it. now introduce device type for pci dev.

please check if this one is what you want.

[PATCH -v3 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges

Current code will create rescan for every pci device,
that is not right. the device is already there, there is no reason to rescan it.

So only have rescan for pci bridges. less confusing

Also the rescan will rescan bridge's secondary bus instead of primary bus.

-v3: Use device_type for pci dev.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci-sysfs.c |   39 +++++++++++++++++++++++++++++++++++++--
 drivers/pci/pci.h       |    1 +
 drivers/pci/probe.c     |    1 +
 3 files changed, 39 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -303,12 +303,15 @@ dev_rescan_store(struct device *dev, str
 
 	if (val) {
 		mutex_lock(&pci_remove_rescan_mutex);
-		pci_rescan_bus(pdev->bus);
+		pci_rescan_bus(pdev->subordinate);
 		mutex_unlock(&pci_remove_rescan_mutex);
 	}
 	return count;
 }
 
+static struct device_attribute pci_dev_bridge_rescan_attr =
+	__ATTR(rescan, (S_IWUSR|S_IWGRP), NULL, dev_rescan_store);
+
 static void remove_callback(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -358,8 +361,41 @@ dev_bus_rescan_store(struct device *dev,
 	}
 	return count;
 }
+#endif
+
 
+static struct attribute *pci_dev_bridge_attrs[] = {
+#ifdef CONFIG_HOTPLUG
+	&pci_dev_bridge_rescan_attr.attr,
 #endif
+	NULL,
+};
+
+static mode_t pci_dev_bridge_attrs_are_visible(struct kobject *kobj,
+						struct attribute *a, int n)
+{
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if (!pdev->subordinate)
+		return 0;
+
+	return a->mode;
+}
+
+static struct attribute_group pci_dev_bridge_attr_group = {
+	.attrs = pci_dev_bridge_attrs,
+	.is_visible = pci_dev_bridge_attrs_are_visible,
+};
+
+static const struct attribute_group *pci_dev_attr_groups[] = {
+	&pci_dev_bridge_attr_group,
+	NULL,
+};
+
+struct device_type pci_dev_type = {
+	.groups = pci_dev_attr_groups,
+};
 
 struct device_attribute pci_dev_attrs[] = {
 	__ATTR_RO(resource),
@@ -383,7 +419,6 @@ struct device_attribute pci_dev_attrs[]
 	__ATTR(msi_bus, 0644, msi_bus_show, msi_bus_store),
 #ifdef CONFIG_HOTPLUG
 	__ATTR(remove, (S_IWUSR|S_IWGRP), NULL, remove_store),
-	__ATTR(rescan, (S_IWUSR|S_IWGRP), NULL, dev_rescan_store),
 #endif
 	__ATTR_NULL,
 };
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -159,6 +159,7 @@ static inline int pci_no_d1d2(struct pci
 }
 extern struct device_attribute pci_dev_attrs[];
 extern struct device_attribute pcibus_dev_attrs[];
+extern struct device_type pci_dev_type;
 #ifdef CONFIG_HOTPLUG
 extern struct bus_attribute pci_bus_attrs[];
 #else
Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -902,6 +902,7 @@ int pci_setup_device(struct pci_dev *dev
 	dev->sysdata = dev->bus->sysdata;
 	dev->dev.parent = dev->bus->bridge;
 	dev->dev.bus = &pci_bus_type;
+	dev->dev.type = &pci_dev_type;
 	dev->hdr_type = hdr_type & 0x7f;
 	dev->multifunction = !!(hdr_type & 0x80);
 	dev->error_state = pci_channel_io_normal;

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges
  2011-10-16 23:35         ` Yinghai Lu
@ 2011-10-17  1:45           ` Greg KH
  2011-10-17 18:27             ` [PATCH -v4 8_1/8] PCI, sys: Use device_type and attr_groups with pci dev Yinghai Lu
                               ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Greg KH @ 2011-10-17  1:45 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jesse Barnes, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Sun, Oct 16, 2011 at 04:35:11PM -0700, Yinghai Lu wrote:
> On 10/16/2011 08:55 AM, Greg KH wrote:
> 
> > On Sat, Oct 15, 2011 at 10:34:03PM -0700, Yinghai Lu wrote:
> >> On 10/15/2011 07:39 PM, Greg KH wrote:
> >>
> >>>
> >>> No, we already have a way to do this, this is not acceptable, sorry.
> >>>
> >>
> >>
> >> hope this one is ok...
> > 
> > Nope, not at all, don't use #ifdef in .c files.
> > 
> > Again, we have a way to do this, built into sysfs, that can dynamically
> > know to add or not add, attributes when they are registered with the
> > system.
> 
> thank you for the comments.
> 
> i got it. now introduce device type for pci dev.
> 
> please check if this one is what you want.
> 
> [PATCH -v3 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges
> 
> Current code will create rescan for every pci device,
> that is not right. the device is already there, there is no reason to rescan it.
> 
> So only have rescan for pci bridges. less confusing
> 
> Also the rescan will rescan bridge's secondary bus instead of primary bus.
> 
> -v3: Use device_type for pci dev.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  drivers/pci/pci-sysfs.c |   39 +++++++++++++++++++++++++++++++++++++--
>  drivers/pci/pci.h       |    1 +
>  drivers/pci/probe.c     |    1 +
>  3 files changed, 39 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6/drivers/pci/pci-sysfs.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/pci-sysfs.c
> +++ linux-2.6/drivers/pci/pci-sysfs.c
> @@ -303,12 +303,15 @@ dev_rescan_store(struct device *dev, str
>  
>  	if (val) {
>  		mutex_lock(&pci_remove_rescan_mutex);
> -		pci_rescan_bus(pdev->bus);
> +		pci_rescan_bus(pdev->subordinate);
>  		mutex_unlock(&pci_remove_rescan_mutex);
>  	}
>  	return count;
>  }

This chunk is separate from the rest of the patch, right?

Please split this up into doing one thing per patch please.

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/8] pci: Make sriov work with hotplug removal
  2011-10-16  1:31 ` [PATCH 1/8] pci: Make sriov work with hotplug removal Yinghai Lu
@ 2011-10-17 17:16   ` Bjorn Helgaas
  2011-10-17 18:08     ` Yinghai Lu
  0 siblings, 1 reply; 31+ messages in thread
From: Bjorn Helgaas @ 2011-10-17 17:16 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jesse Barnes, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Sat, Oct 15, 2011 at 7:31 PM, Yinghai Lu <yinghai.lu@oracle.com> wrote:
>
> When hot remove pci express module, got
>
> [ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1
> [ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt received
> [ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3)
> [ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 1f9
> [ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due to button press.
> [ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10
> [ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
> [ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
> [ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling domain:bus:device=0000:b0:00
> [ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 2f9
> [ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device: domain:bus:dev = 0000:b0:00
> [ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6
> [ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14
> [ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15
> [ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16
> [ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled
> [ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 5926.127189] IP: [<ffffffff81353a3b>] pci_stop_bus_device+0x33/0x83
> [ 5926.133377] PGD 0
> [ 5926.135402] Oops: 0000 [#1] SMP
> [ 5926.138659] CPU 2
> [ 5926.140499] Modules linked in:
> ...
> [ 5926.143754]
> [ 5926.275823] Call Trace:
> [ 5926.278267]  [<ffffffff81353a38>] pci_stop_bus_device+0x30/0x83
> [ 5926.284180]  [<ffffffff81353af4>] pci_remove_bus_device+0x1a/0xba
> [ 5926.290264]  [<ffffffff81366311>] pciehp_unconfigure_device+0x110/0x17b
> [ 5926.296866]  [<ffffffff81365dd9>] ? pciehp_disable_slot+0x188/0x188
> [ 5926.303123]  [<ffffffff81365d6f>] pciehp_disable_slot+0x11e/0x188
> [ 5926.309206]  [<ffffffff81365e68>] pciehp_power_thread+0x8f/0xe0
> ...
>
> Root cause: when doing pci_stop_bus_device() with phys fn, it will stop virt fn and
> remove the fn, so
>        list_for_each_safe(l, n, &bus->devices)
> will have problem to refer freed n that is pointed to vf entry.
>
> Solution is just call pci_stop_bus_device() with phys fn only. and before that need to
> save phys fn aside and avoid to use bus->devices to loop.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> ---
>  drivers/pci/remove.c |   33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
>
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -120,10 +120,43 @@ void pci_remove_behind_bridge(struct pci
>                        pci_remove_bus_device(pci_dev_b(l));
>  }
>
> +struct dev_list {
> +       struct pci_dev *dev;
> +       struct list_head list;
> +};
> +
>  static void pci_stop_bus_devices(struct pci_bus *bus)
>  {
>        struct list_head *l, *n;
> +       struct dev_list *dl, *dn;
> +       LIST_HEAD(physfn_list);
> +
> +       /* Save phys_fn aside at first */
> +       list_for_each(l, &bus->devices) {
> +               struct pci_dev *dev = pci_dev_b(l);
> +
> +               if (!dev->is_virtfn) {
> +                       dl = kmalloc(sizeof(*dl), GFP_KERNEL);
> +                       if (!dl)
> +                               continue;
> +                       dl->dev = dev;
> +                       list_add_tail(&dl->list, &physfn_list);
> +               }
> +       }
> +
> +       /*
> +        * stop bus device for phys_fn at first
> +        *  it will stop and remove vf in driver remove action
> +        */
> +       list_for_each_entry_safe(dl, dn, &physfn_list, list) {
> +               struct pci_dev *dev = dl->dev;
> +
> +               pci_stop_bus_device(dev);
> +
> +               kfree(dl);
> +       }
>
> +       /* Do it again for left over in case */
>        list_for_each_safe(l, n, &bus->devices) {
>                struct pci_dev *dev = pci_dev_b(l);
>                pci_stop_bus_device(dev);

Apparently it's a problem to stop a VF before its PF?  Why is that?
Sounds like there's an important dependency here, but I can't figure
it out.  I would have naively expected that you would *want* to stop a
VF first, since it depends on the PF.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/8] pci: Make sriov work with hotplug removal
  2011-10-17 17:16   ` Bjorn Helgaas
@ 2011-10-17 18:08     ` Yinghai Lu
  2011-10-17 22:12       ` Bjorn Helgaas
  0 siblings, 1 reply; 31+ messages in thread
From: Yinghai Lu @ 2011-10-17 18:08 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jesse Barnes, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

On 10/17/2011 10:16 AM, Bjorn Helgaas wrote:
> On Sat, Oct 15, 2011 at 7:31 PM, Yinghai Lu<yinghai.lu@oracle.com>  wrote:
>>
>> When hot remove pci express module, got
>>
>> [ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1
>> [ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt received
>> [ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3)
>> [ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 1f9
>> [ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due to button press.
>> [ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10
>> [ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
>> [ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
>> [ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling domain:bus:device=0000:b0:00
>> [ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 2f9
>> [ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device: domain:bus:dev = 0000:b0:00
>> [ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6
>> [ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14
>> [ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15
>> [ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16
>> [ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled
>> [ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at (null)
>> [ 5926.127189] IP: [<ffffffff81353a3b>] pci_stop_bus_device+0x33/0x83
>> [ 5926.133377] PGD 0
>> [ 5926.135402] Oops: 0000 [#1] SMP
>> [ 5926.138659] CPU 2
>> [ 5926.140499] Modules linked in:
>> ...
>> [ 5926.143754]
>> [ 5926.275823] Call Trace:
>> [ 5926.278267]  [<ffffffff81353a38>] pci_stop_bus_device+0x30/0x83
>> [ 5926.284180]  [<ffffffff81353af4>] pci_remove_bus_device+0x1a/0xba
>> [ 5926.290264]  [<ffffffff81366311>] pciehp_unconfigure_device+0x110/0x17b
>> [ 5926.296866]  [<ffffffff81365dd9>] ? pciehp_disable_slot+0x188/0x188
>> [ 5926.303123]  [<ffffffff81365d6f>] pciehp_disable_slot+0x11e/0x188
>> [ 5926.309206]  [<ffffffff81365e68>] pciehp_power_thread+0x8f/0xe0
>> ...
>>
>> Root cause: when doing pci_stop_bus_device() with phys fn, it will stop virt fn and
>> remove the fn, so
>>         list_for_each_safe(l, n,&bus->devices)
>> will have problem to refer freed n that is pointed to vf entry.
>>
>> Solution is just call pci_stop_bus_device() with phys fn only. and before that need to
>> save phys fn aside and avoid to use bus->devices to loop.
>>
>> Signed-off-by: Yinghai Lu<yinghai@kernel.org>
>>
>> ---
>>   drivers/pci/remove.c |   33 +++++++++++++++++++++++++++++++++
>>   1 file changed, 33 insertions(+)
>>
>> Index: linux-2.6/drivers/pci/remove.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/pci/remove.c
>> +++ linux-2.6/drivers/pci/remove.c
>> @@ -120,10 +120,43 @@ void pci_remove_behind_bridge(struct pci
>>                         pci_remove_bus_device(pci_dev_b(l));
>>   }
>>
>> +struct dev_list {
>> +       struct pci_dev *dev;
>> +       struct list_head list;
>> +};
>> +
>>   static void pci_stop_bus_devices(struct pci_bus *bus)
>>   {
>>         struct list_head *l, *n;
>> +       struct dev_list *dl, *dn;
>> +       LIST_HEAD(physfn_list);
>> +
>> +       /* Save phys_fn aside at first */
>> +       list_for_each(l,&bus->devices) {
>> +               struct pci_dev *dev = pci_dev_b(l);
>> +
>> +               if (!dev->is_virtfn) {
>> +                       dl = kmalloc(sizeof(*dl), GFP_KERNEL);
>> +                       if (!dl)
>> +                               continue;
>> +                       dl->dev = dev;
>> +                       list_add_tail(&dl->list,&physfn_list);
>> +               }
>> +       }
>> +
>> +       /*
>> +        * stop bus device for phys_fn at first
>> +        *  it will stop and remove vf in driver remove action
>> +        */
>> +       list_for_each_entry_safe(dl, dn,&physfn_list, list) {
>> +               struct pci_dev *dev = dl->dev;
>> +
>> +               pci_stop_bus_device(dev);
>> +
>> +               kfree(dl);
>> +       }
>>
>> +       /* Do it again for left over in case */
>>         list_for_each_safe(l, n,&bus->devices) {
>>                 struct pci_dev *dev = pci_dev_b(l);
>>                 pci_stop_bus_device(dev);
>
> Apparently it's a problem to stop a VF before its PF?  Why is that?
> Sounds like there's an important dependency here, but I can't figure
> it out.  I would have naively expected that you would *want* to stop a
> VF first, since it depends on the PF.

when you stop the VF, that VF will not be removed.

when you stop the PF, the driver will be unloaded, it will *STOP* the VF 
at first and *REMOVE* the VF there. So the bus->devices will be touch at 
that time.
for example when you have three VFs. those VFs will be removed from 
bus->devices, and n in the list_for_each_safe will have invalid pointer 
to freed entry.

So stop the VF at first will *NOT* help.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH -v4 8_1/8] PCI, sys: Use device_type and attr_groups with pci dev
  2011-10-17  1:45           ` Greg KH
@ 2011-10-17 18:27             ` Yinghai Lu
  2011-10-17 18:38               ` Greg KH
  2011-10-17 18:29             ` [PATCH -v4 8_2/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges Yinghai Lu
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: Yinghai Lu @ 2011-10-17 18:27 UTC (permalink / raw)
  To: Greg KH, Jesse Barnes
  Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org


We want to create rescan in sys only for pci bridge instead of all pci dev.

We could use attribute_groups/is_visible method to do that.

Now pci dev does not use device type yet. So add pci_dev_type to take
attr_groups with it.

Add pci_dev_bridge_attrs_are_visible() to control attr_bridge_group only create
attr for bridge.

This is the framework related change, later could attr to bridge_attr_group,
to make those attr only show up on pci bridge device.

Also We could add more attr groups with is_visible to reduce messness in
	pci_create_sysfs_dev_files. ( at least for boot_vga one )

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci-sysfs.c |   30 ++++++++++++++++++++++++++++++
 drivers/pci/pci.h       |    1 +
 drivers/pci/probe.c     |    1 +
 3 files changed, 32 insertions(+)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -1319,3 +1319,33 @@ static int __init pci_sysfs_init(void)
 }
 
 late_initcall(pci_sysfs_init);
+
+static struct attribute *pci_dev_bridge_attrs[] = {
+	NULL,
+};
+
+static mode_t pci_dev_bridge_attrs_are_visible(struct kobject *kobj,
+						struct attribute *a, int n)
+{
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if (!pdev->subordinate)
+		return 0;
+
+	return a->mode;
+}
+
+static struct attribute_group pci_dev_bridge_attr_group = {
+	.attrs = pci_dev_bridge_attrs,
+	.is_visible = pci_dev_bridge_attrs_are_visible,
+};
+
+static const struct attribute_group *pci_dev_attr_groups[] = {
+	&pci_dev_bridge_attr_group,
+	NULL,
+};
+
+struct device_type pci_dev_type = {
+	.groups = pci_dev_attr_groups,
+};
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -159,6 +159,7 @@ static inline int pci_no_d1d2(struct pci
 }
 extern struct device_attribute pci_dev_attrs[];
 extern struct device_attribute pcibus_dev_attrs[];
+extern struct device_type pci_dev_type;
 #ifdef CONFIG_HOTPLUG
 extern struct bus_attribute pci_bus_attrs[];
 #else
Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -902,6 +902,7 @@ int pci_setup_device(struct pci_dev *dev
 	dev->sysdata = dev->bus->sysdata;
 	dev->dev.parent = dev->bus->bridge;
 	dev->dev.bus = &pci_bus_type;
+	dev->dev.type = &pci_dev_type;
 	dev->hdr_type = hdr_type & 0x7f;
 	dev->multifunction = !!(hdr_type & 0x80);
 	dev->error_state = pci_channel_io_normal;

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH -v4 8_2/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges
  2011-10-17  1:45           ` Greg KH
  2011-10-17 18:27             ` [PATCH -v4 8_1/8] PCI, sys: Use device_type and attr_groups with pci dev Yinghai Lu
@ 2011-10-17 18:29             ` Yinghai Lu
  2011-10-17 18:33             ` [PATCH -v4 8_3/8] PCI, sys: Use is_visable() with boot_vga attribute for pci_dev Yinghai Lu
  2011-10-17 18:36             ` [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges Yinghai Lu
  3 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2011-10-17 18:29 UTC (permalink / raw)
  To: Greg KH, Jesse Barnes
  Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org


Current code will create rescan for every pci device,
that is not right. the device is already there, there is no reason to rescan it.

We could have rescan for pci bridges. less confusing.

Need to move rescan attr to pci dev bridge attribute group.
And We should rescan bridge's secondary bus instead of primary bus.

-v3: Use device_type for pci dev.
-v4: Seperate pci device type change out

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci-sysfs.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -303,12 +303,15 @@ dev_rescan_store(struct device *dev, str
 
 	if (val) {
 		mutex_lock(&pci_remove_rescan_mutex);
-		pci_rescan_bus(pdev->bus);
+		pci_rescan_bus(pdev->subordinate);
 		mutex_unlock(&pci_remove_rescan_mutex);
 	}
 	return count;
 }
 
+static struct device_attribute pci_dev_bridge_rescan_attr =
+	__ATTR(rescan, (S_IWUSR|S_IWGRP), NULL, dev_rescan_store);
+
 static void remove_callback(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -383,7 +386,6 @@ struct device_attribute pci_dev_attrs[]
 	__ATTR(msi_bus, 0644, msi_bus_show, msi_bus_store),
 #ifdef CONFIG_HOTPLUG
 	__ATTR(remove, (S_IWUSR|S_IWGRP), NULL, remove_store),
-	__ATTR(rescan, (S_IWUSR|S_IWGRP), NULL, dev_rescan_store),
 #endif
 	__ATTR_NULL,
 };
@@ -1321,6 +1323,9 @@ static int __init pci_sysfs_init(void)
 late_initcall(pci_sysfs_init);
 
 static struct attribute *pci_dev_bridge_attrs[] = {
+#ifdef CONFIG_HOTPLUG
+	&pci_dev_bridge_rescan_attr.attr,
+#endif
 	NULL,
 };
 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH -v4 8_3/8] PCI, sys: Use is_visable() with boot_vga attribute for pci_dev
  2011-10-17  1:45           ` Greg KH
  2011-10-17 18:27             ` [PATCH -v4 8_1/8] PCI, sys: Use device_type and attr_groups with pci dev Yinghai Lu
  2011-10-17 18:29             ` [PATCH -v4 8_2/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges Yinghai Lu
@ 2011-10-17 18:33             ` Yinghai Lu
  2011-10-17 18:36             ` [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges Yinghai Lu
  3 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2011-10-17 18:33 UTC (permalink / raw)
  To: Greg KH, Jesse Barnes
  Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org


That could let pci_create_sysfs_dev_files more simple.

also fix possible fix memleak during removing path.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/pci-sysfs.c |   36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -1213,29 +1213,20 @@ int __must_check pci_create_sysfs_dev_fi
 		pdev->rom_attr = attr;
 	}
 
-	if ((pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA) {
-		retval = device_create_file(&pdev->dev, &vga_attr);
-		if (retval)
-			goto err_rom_file;
-	}
-
 	/* add platform-specific attributes */
 	retval = pcibios_add_platform_entries(pdev);
 	if (retval)
-		goto err_vga_file;
+		goto err_rom_file;
 
 	/* add sysfs entries for various capabilities */
 	retval = pci_create_capabilities_sysfs(pdev);
 	if (retval)
-		goto err_vga_file;
+		goto err_rom_file;
 
 	pci_create_firmware_label_files(pdev);
 
 	return 0;
 
-err_vga_file:
-	if ((pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
-		device_remove_file(&pdev->dev, &vga_attr);
 err_rom_file:
 	if (rom_size) {
 		sysfs_remove_bin_file(&pdev->dev.kobj, pdev->rom_attr);
@@ -1322,6 +1313,28 @@ static int __init pci_sysfs_init(void)
 
 late_initcall(pci_sysfs_init);
 
+static struct attribute *pci_dev_vga_attrs[] = {
+	&vga_attr.attr,
+	NULL,
+};
+
+static mode_t pci_dev_vga_attrs_are_visible(struct kobject *kobj,
+						struct attribute *a, int n)
+{
+	struct device *dev = container_of(kobj, struct device, kobj);
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
+		return 0;
+
+	return a->mode;
+}
+
+static struct attribute_group pci_dev_vga_attr_group = {
+	.attrs = pci_dev_vga_attrs,
+	.is_visible = pci_dev_vga_attrs_are_visible,
+};
+
 static struct attribute *pci_dev_bridge_attrs[] = {
 #ifdef CONFIG_HOTPLUG
 	&pci_dev_bridge_rescan_attr.attr,
@@ -1348,6 +1361,7 @@ static struct attribute_group pci_dev_br
 
 static const struct attribute_group *pci_dev_attr_groups[] = {
 	&pci_dev_bridge_attr_group,
+	&pci_dev_vga_attr_group,
 	NULL,
 };
 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges
  2011-10-17  1:45           ` Greg KH
                               ` (2 preceding siblings ...)
  2011-10-17 18:33             ` [PATCH -v4 8_3/8] PCI, sys: Use is_visable() with boot_vga attribute for pci_dev Yinghai Lu
@ 2011-10-17 18:36             ` Yinghai Lu
  3 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2011-10-17 18:36 UTC (permalink / raw)
  To: Greg KH
  Cc: Jesse Barnes, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

On 10/16/2011 06:45 PM, Greg KH wrote:

> 
> This chunk is separate from the rest of the patch, right?
> 
> Please split this up into doing one thing per patch please.


just sent split versions out. please check it.

Thanks

Yinghai Lu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH -v4 8_1/8] PCI, sys: Use device_type and attr_groups with pci dev
  2011-10-17 18:27             ` [PATCH -v4 8_1/8] PCI, sys: Use device_type and attr_groups with pci dev Yinghai Lu
@ 2011-10-17 18:38               ` Greg KH
  0 siblings, 0 replies; 31+ messages in thread
From: Greg KH @ 2011-10-17 18:38 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jesse Barnes, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Mon, Oct 17, 2011 at 11:27:18AM -0700, Yinghai Lu wrote:
> 
> We want to create rescan in sys only for pci bridge instead of all pci dev.

Much nicer, and the best part, the code is much smaller :)

> Also We could add more attr groups with is_visible to reduce messness in
> 	pci_create_sysfs_dev_files. ( at least for boot_vga one )

Yes, you probably should.

Thanks for making the changes.

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/8] pci: Make sriov work with hotplug removal
  2011-10-17 18:08     ` Yinghai Lu
@ 2011-10-17 22:12       ` Bjorn Helgaas
  2011-10-17 22:24         ` Yinghai Lu
  0 siblings, 1 reply; 31+ messages in thread
From: Bjorn Helgaas @ 2011-10-17 22:12 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jesse Barnes, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Mon, Oct 17, 2011 at 12:08 PM, Yinghai Lu <yinghai.lu@oracle.com> wrote:
> On 10/17/2011 10:16 AM, Bjorn Helgaas wrote:
>>
>> On Sat, Oct 15, 2011 at 7:31 PM, Yinghai Lu<yinghai.lu@oracle.com>  wrote:
>>>
>>> When hot remove pci express module, got
>>>
>>> [ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1
>>> [ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt
>>> received
>>> [ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3)
>>> [ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status:
>>> SLOTCTRL a8 value read 1f9
>>> [ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due
>>> to button press.
>>> [ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10
>>> [ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink:
>>> SLOTCTRL a8 write cmd 200
>>> [ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status:
>>> SLOTCTRL a8 write cmd c0
>>> [ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling
>>> domain:bus:device=0000:b0:00
>>> [ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status:
>>> SLOTCTRL a8 value read 2f9
>>> [ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device:
>>> domain:bus:dev = 0000:b0:00
>>> [ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6
>>> [ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14
>>> [ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15
>>> [ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16
>>> [ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled
>>> [ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at
>>> (null)
>>> [ 5926.127189] IP: [<ffffffff81353a3b>] pci_stop_bus_device+0x33/0x83
>>> [ 5926.133377] PGD 0
>>> [ 5926.135402] Oops: 0000 [#1] SMP
>>> [ 5926.138659] CPU 2
>>> [ 5926.140499] Modules linked in:
>>> ...
>>> [ 5926.143754]
>>> [ 5926.275823] Call Trace:
>>> [ 5926.278267]  [<ffffffff81353a38>] pci_stop_bus_device+0x30/0x83
>>> [ 5926.284180]  [<ffffffff81353af4>] pci_remove_bus_device+0x1a/0xba
>>> [ 5926.290264]  [<ffffffff81366311>]
>>> pciehp_unconfigure_device+0x110/0x17b
>>> [ 5926.296866]  [<ffffffff81365dd9>] ? pciehp_disable_slot+0x188/0x188
>>> [ 5926.303123]  [<ffffffff81365d6f>] pciehp_disable_slot+0x11e/0x188
>>> [ 5926.309206]  [<ffffffff81365e68>] pciehp_power_thread+0x8f/0xe0
>>> ...
>>>
>>> Root cause: when doing pci_stop_bus_device() with phys fn, it will stop
>>> virt fn and
>>> remove the fn, so
>>>        list_for_each_safe(l, n,&bus->devices)
>>> will have problem to refer freed n that is pointed to vf entry.
>>>
>>> Solution is just call pci_stop_bus_device() with phys fn only. and before
>>> that need to
>>> save phys fn aside and avoid to use bus->devices to loop.
>>>
>>> Signed-off-by: Yinghai Lu<yinghai@kernel.org>
>>>
>>> ---
>>>  drivers/pci/remove.c |   33 +++++++++++++++++++++++++++++++++
>>>  1 file changed, 33 insertions(+)
>>>
>>> Index: linux-2.6/drivers/pci/remove.c
>>> ===================================================================
>>> --- linux-2.6.orig/drivers/pci/remove.c
>>> +++ linux-2.6/drivers/pci/remove.c
>>> @@ -120,10 +120,43 @@ void pci_remove_behind_bridge(struct pci
>>>                        pci_remove_bus_device(pci_dev_b(l));
>>>  }
>>>
>>> +struct dev_list {
>>> +       struct pci_dev *dev;
>>> +       struct list_head list;
>>> +};
>>> +
>>>  static void pci_stop_bus_devices(struct pci_bus *bus)
>>>  {
>>>        struct list_head *l, *n;
>>> +       struct dev_list *dl, *dn;
>>> +       LIST_HEAD(physfn_list);
>>> +
>>> +       /* Save phys_fn aside at first */
>>> +       list_for_each(l,&bus->devices) {
>>> +               struct pci_dev *dev = pci_dev_b(l);
>>> +
>>> +               if (!dev->is_virtfn) {
>>> +                       dl = kmalloc(sizeof(*dl), GFP_KERNEL);
>>> +                       if (!dl)
>>> +                               continue;
>>> +                       dl->dev = dev;
>>> +                       list_add_tail(&dl->list,&physfn_list);
>>> +               }
>>> +       }
>>> +
>>> +       /*
>>> +        * stop bus device for phys_fn at first
>>> +        *  it will stop and remove vf in driver remove action
>>> +        */
>>> +       list_for_each_entry_safe(dl, dn,&physfn_list, list) {
>>> +               struct pci_dev *dev = dl->dev;
>>> +
>>> +               pci_stop_bus_device(dev);
>>> +
>>> +               kfree(dl);
>>> +       }
>>>
>>> +       /* Do it again for left over in case */
>>>        list_for_each_safe(l, n,&bus->devices) {
>>>                struct pci_dev *dev = pci_dev_b(l);
>>>                pci_stop_bus_device(dev);
>>
>> Apparently it's a problem to stop a VF before its PF?  Why is that?
>> Sounds like there's an important dependency here, but I can't figure
>> it out.  I would have naively expected that you would *want* to stop a
>> VF first, since it depends on the PF.
>
> when you stop the VF, that VF will not be removed.
>
> when you stop the PF, the driver will be unloaded, it will *STOP* the VF at
> first and *REMOVE* the VF there. So the bus->devices will be touch at that
> time.
> for example when you have three VFs. those VFs will be removed from
> bus->devices, and n in the list_for_each_safe will have invalid pointer to
> freed entry.
>
> So stop the VF at first will *NOT* help.

No need to shout.  I wasn't suggesting that stopping VFs first was the
right solution.  It's just that stopping PFs first violates the
general design pattern of "discover top-down and remove bottom-up," so
it's not obvious why this is correct.

Let me see if I understand this.  Here's the path where I think the problem is:

    pciehp_unconfigure_device
      pci_remove_bus_device
        pci_stop_bus_device
          pci_stop_bus_devices(subordinate)
            list_for_each(...)    <-- A
              pci_stop_bus_device(PF)
                pci_stop_dev
                  device_unregister
                    ...
                      driver->remove, e.g., igb_remove
                        pci_disable_sriov
                          sriov_disable
                            virtfn_remove
                              pci_remove_bus_device
                                pci_destroy_dev
                                  <remove from bus_list>    <-- B

The hotplug removal starts with pci_remove_bus_device(), which calls
pci_stop_bus_device() on the PF, which ultimately calls the
driver->remove() method, which (since it's the PF) calls
pci_disable_sriov(), which destroys all the VFs and re-enters
pci_remove_bus_device() for each of them.

We're manipulating the same list at A and B, so when we get back to A
after removing all the VFs, the list has been changed out from under
us.  Your patch avoids the problem by first putting the PFs on a
separate list and walking that instead, so the list at A is now
different from the list at B.  Right?

Maybe this is the best we can do, but it still doesn't seem ideal, and
it's certainly not obvious when reading the code.  It doesn't seem
right for the driver ->remove() method to be calling
pci_destroy_dev().   Won't the core data structures be corrupted if a
defective driver doesn't call pci_disable_sriov()?  Seems like we
could end up with a device that's been physically removed, but still
has pci_dev structs hanging around.

Bjorn

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/8] pci: Make sriov work with hotplug removal
  2011-10-17 22:12       ` Bjorn Helgaas
@ 2011-10-17 22:24         ` Yinghai Lu
  2011-10-18 16:49           ` Bjorn Helgaas
  0 siblings, 1 reply; 31+ messages in thread
From: Yinghai Lu @ 2011-10-17 22:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jesse Barnes, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

On 10/17/2011 03:12 PM, Bjorn Helgaas wrote:

> 
> Maybe this is the best we can do, but it still doesn't seem ideal, and
> it's certainly not obvious when reading the code.  It doesn't seem
> right for the driver ->remove() method to be calling
> pci_destroy_dev().   Won't the core data structures be corrupted if a
> defective driver doesn't call pci_disable_sriov()?  Seems like we
> could end up with a device that's been physically removed, but still
> has pci_dev structs hanging around.


i did add some print out in
	pci_stop_bus_device
when stop PF, that function is called for those VFs.

also driver have to call pci_disable_sriov() and that is current design.

Yinghai

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/8] pci: Make sriov work with hotplug removal
  2011-10-17 22:24         ` Yinghai Lu
@ 2011-10-18 16:49           ` Bjorn Helgaas
  2011-10-18 17:02             ` Yinghai Lu
  0 siblings, 1 reply; 31+ messages in thread
From: Bjorn Helgaas @ 2011-10-18 16:49 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jesse Barnes, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Mon, Oct 17, 2011 at 4:24 PM, Yinghai Lu <yinghai.lu@oracle.com> wrote:
> On 10/17/2011 03:12 PM, Bjorn Helgaas wrote:
>>
>> Maybe this is the best we can do, but it still doesn't seem ideal, and
>> it's certainly not obvious when reading the code.  It doesn't seem
>> right for the driver ->remove() method to be calling
>> pci_destroy_dev().   Won't the core data structures be corrupted if a
>> defective driver doesn't call pci_disable_sriov()?  Seems like we
>> could end up with a device that's been physically removed, but still
>> has pci_dev structs hanging around.
>
> i did add some print out in
>        pci_stop_bus_device
> when stop PF, that function is called for those VFs.
>
> also driver have to call pci_disable_sriov() and that is current design.

Yep.  But I don't have to like the current design :)  It doesn't seem
as robust as it could be.

It took me a long time to puzzle out what was happening here.  Here's
some possible changelog text that would have saved me a lot of time:

    The PCI hot-remove path calls pci_stop_bus_devices() via
    pci_remove_bus_device().

    pci_stop_bus_devices() traverses the bus->devices list (point A below),
    stopping each device in turn, which calls the driver remove() method.  When
    the device is an SR-IOV PF, the driver calls pci_disable_sriov(), which
    also uses pci_remove_bus_device() to remove the VF devices from the
    bus->devices list (point B).

        pci_remove_bus_device
          pci_stop_bus_device
            pci_stop_bus_devices(subordinate)
              list_for_each(bus->devices)             <-- A
                pci_stop_bus_device(PF)
                  ...
                    driver->remove
                      pci_disable_sriov
                        ...
                          pci_remove_bus_device(VF)
                              <remove from bus_list>  <-- B

    At B, we're changing the same list we're iterating through at A, so when
    the driver remove() method returns, the pci_stop_bus_devices() iterator has
    a pointer to a list entry that has already been freed.

    This patch avoids the problem by building a separate list of all PFs on
    the bus and traversing that at A instead of the bus->devices list.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/8] pci: Make sriov work with hotplug removal
  2011-10-18 16:49           ` Bjorn Helgaas
@ 2011-10-18 17:02             ` Yinghai Lu
  2011-10-25 12:34               ` Kenji Kaneshige
  2011-11-11 21:00               ` [RESEND PATCH] PCI: Make sriov work with hotplug remove Yinghai Lu
  0 siblings, 2 replies; 31+ messages in thread
From: Yinghai Lu @ 2011-10-18 17:02 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jesse Barnes, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

On 10/18/2011 09:49 AM, Bjorn Helgaas wrote:
> On Mon, Oct 17, 2011 at 4:24 PM, Yinghai Lu<yinghai.lu@oracle.com>  wrote:
>> On 10/17/2011 03:12 PM, Bjorn Helgaas wrote:
>>>
>>> Maybe this is the best we can do, but it still doesn't seem ideal, and
>>> it's certainly not obvious when reading the code.  It doesn't seem
>>> right for the driver ->remove() method to be calling
>>> pci_destroy_dev().   Won't the core data structures be corrupted if a
>>> defective driver doesn't call pci_disable_sriov()?  Seems like we
>>> could end up with a device that's been physically removed, but still
>>> has pci_dev structs hanging around.
>>
>> i did add some print out in
>>         pci_stop_bus_device
>> when stop PF, that function is called for those VFs.
>>
>> also driver have to call pci_disable_sriov() and that is current design.
>
> Yep.  But I don't have to like the current design :)  It doesn't seem
> as robust as it could be.
>
> It took me a long time to puzzle out what was happening here.  Here's
> some possible changelog text that would have saved me a lot of time:
>
>      The PCI hot-remove path calls pci_stop_bus_devices() via
>      pci_remove_bus_device().
>
>      pci_stop_bus_devices() traverses the bus->devices list (point A below),
>      stopping each device in turn, which calls the driver remove() method.  When
>      the device is an SR-IOV PF, the driver calls pci_disable_sriov(), which
>      also uses pci_remove_bus_device() to remove the VF devices from the
>      bus->devices list (point B).
>
>          pci_remove_bus_device
>            pci_stop_bus_device
>              pci_stop_bus_devices(subordinate)
>                list_for_each(bus->devices)<-- A
>                  pci_stop_bus_device(PF)
>                    ...
>                      driver->remove
>                        pci_disable_sriov
>                          ...
>                            pci_remove_bus_device(VF)
>                                <remove from bus_list>   <-- B
>
>      At B, we're changing the same list we're iterating through at A, so when
>      the driver remove() method returns, the pci_stop_bus_devices() iterator has
>      a pointer to a list entry that has already been freed.
>
>      This patch avoids the problem by building a separate list of all PFs on
>      the bus and traversing that at A instead of the bus->devices list.

yes.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/8] pci: Make sriov work with hotplug removal
  2011-10-18 17:02             ` Yinghai Lu
@ 2011-10-25 12:34               ` Kenji Kaneshige
  2011-10-25 15:52                 ` Yinghai Lu
  2011-11-11 21:00               ` [RESEND PATCH] PCI: Make sriov work with hotplug remove Yinghai Lu
  1 sibling, 1 reply; 31+ messages in thread
From: Kenji Kaneshige @ 2011-10-25 12:34 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, Jesse Barnes, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

(2011/10/19 2:02), Yinghai Lu wrote:
> On 10/18/2011 09:49 AM, Bjorn Helgaas wrote:
>> On Mon, Oct 17, 2011 at 4:24 PM, Yinghai Lu<yinghai.lu@oracle.com> wrote:
>>> On 10/17/2011 03:12 PM, Bjorn Helgaas wrote:
>>>>
>>>> Maybe this is the best we can do, but it still doesn't seem ideal, and
>>>> it's certainly not obvious when reading the code. It doesn't seem
>>>> right for the driver ->remove() method to be calling
>>>> pci_destroy_dev(). Won't the core data structures be corrupted if a
>>>> defective driver doesn't call pci_disable_sriov()? Seems like we
>>>> could end up with a device that's been physically removed, but still
>>>> has pci_dev structs hanging around.
>>>
>>> i did add some print out in
>>> pci_stop_bus_device
>>> when stop PF, that function is called for those VFs.
>>>
>>> also driver have to call pci_disable_sriov() and that is current design.
>>
>> Yep. But I don't have to like the current design :) It doesn't seem
>> as robust as it could be.
>>
>> It took me a long time to puzzle out what was happening here. Here's
>> some possible changelog text that would have saved me a lot of time:
>>
>> The PCI hot-remove path calls pci_stop_bus_devices() via
>> pci_remove_bus_device().
>>
>> pci_stop_bus_devices() traverses the bus->devices list (point A below),
>> stopping each device in turn, which calls the driver remove() method. When
>> the device is an SR-IOV PF, the driver calls pci_disable_sriov(), which
>> also uses pci_remove_bus_device() to remove the VF devices from the
>> bus->devices list (point B).
>>
>> pci_remove_bus_device
>> pci_stop_bus_device
>> pci_stop_bus_devices(subordinate)
>> list_for_each(bus->devices)<-- A
>> pci_stop_bus_device(PF)
>> ...
>> driver->remove
>> pci_disable_sriov
>> ...
>> pci_remove_bus_device(VF)
>> <remove from bus_list> <-- B
>>
>> At B, we're changing the same list we're iterating through at A, so when
>> the driver remove() method returns, the pci_stop_bus_devices() iterator has
>> a pointer to a list entry that has already been freed.
>>
>> This patch avoids the problem by building a separate list of all PFs on
>> the bus and traversing that at A instead of the bus->devices list.
> 
> yes.

I have one question.

I think pci_stop_bus_devices() is called only when bridge device is removed.
Are you trying to hot-remove the bridge device that has SR-IOV capable
devices on the subordinate bus?

Regards,
Kenji Kaneshige

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/8] pci: Make sriov work with hotplug removal
  2011-10-25 12:34               ` Kenji Kaneshige
@ 2011-10-25 15:52                 ` Yinghai Lu
  0 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2011-10-25 15:52 UTC (permalink / raw)
  To: Kenji Kaneshige
  Cc: Bjorn Helgaas, Jesse Barnes, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

2011/10/25 Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>:
>
> I have one question.
>
> I think pci_stop_bus_devices() is called only when bridge device is removed.
> Are you trying to hot-remove the bridge device that has SR-IOV capable
> devices on the subordinate bus?

Yes, I should include more pci info in patch comments:

root complex: 80:02.2
pci express modules: have pcie switch and are listed as b0:00.0,
b1:02.0 and b1:03.0.
                 end devices  are b2:00.0 and b3.00.0.
                 VFs are: b2:00.1,... b2:00.3, and b3:00.1,...,b3:00.3

Thanks

Yinghai Lu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RESEND PATCH] PCI: Make sriov work with hotplug remove
  2011-10-18 17:02             ` Yinghai Lu
  2011-10-25 12:34               ` Kenji Kaneshige
@ 2011-11-11 21:00               ` Yinghai Lu
  2011-11-16  5:54                 ` Kenji Kaneshige
  1 sibling, 1 reply; 31+ messages in thread
From: Yinghai Lu @ 2011-11-11 21:00 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Bjorn Helgaas, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, Kenji Kaneshige


When hot remove pci express module that have pcie switch and support SRIOV, got

[ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1
[ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt received
[ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3)
[ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 1f9
[ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due to button press.
[ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10
[ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
[ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling domain:bus:device=0000:b0:00
[ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 2f9
[ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device: domain:bus:dev = 0000:b0:00
[ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6
[ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14
[ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15
[ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16
[ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled
[ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 5926.127189] IP: [<ffffffff81353a3b>] pci_stop_bus_device+0x33/0x83
[ 5926.133377] PGD 0
[ 5926.135402] Oops: 0000 [#1] SMP
[ 5926.138659] CPU 2
[ 5926.140499] Modules linked in:
...
[ 5926.143754]
[ 5926.275823] Call Trace:
[ 5926.278267]  [<ffffffff81353a38>] pci_stop_bus_device+0x30/0x83
[ 5926.284180]  [<ffffffff81353af4>] pci_remove_bus_device+0x1a/0xba
[ 5926.290264]  [<ffffffff81366311>] pciehp_unconfigure_device+0x110/0x17b
[ 5926.296866]  [<ffffffff81365dd9>] ? pciehp_disable_slot+0x188/0x188
[ 5926.303123]  [<ffffffff81365d6f>] pciehp_disable_slot+0x11e/0x188
[ 5926.309206]  [<ffffffff81365e68>] pciehp_power_thread+0x8f/0xe0
...

 +-[0000:80]-+-00.0-[81-8f]--
 |           +-01.0-[90-9f]--
 |           +-02.0-[a0-af]--
 |           +-02.2-[b0-bf]----00.0-[b1-b3]--+-02.0-[b2]--+-00.0 Device
 |           |                               |            +-00.1 Device
 |           |                               |            +-00.2 Device
 |           |                               |            \-00.3 Device
 |           |                               \-03.0-[b3]--+-00.0 Device
 |           |                                            +-00.1 Device
 |           |                                            +-00.2 Device
 |           |                                            \-00.3 Device

root complex: 80:02.2
pci express modules: have pcie switch and are listed as b0:00.0, b1:02.0 and b1:03.0.
                end devices  are b2:00.0 and b3.00.0.
                VFs are: b2:00.1,... b2:00.3, and b3:00.1,...,b3:00.3

Root cause: when doing pci_stop_bus_device() with phys fn, it will stop virt fn and
remove the fn, so
	list_for_each_safe(l, n, &bus->devices)
will have problem to refer freed n that is pointed to vf entry.

Solution is just call pci_stop_bus_device() with phys fn only. and before that need to
save phys fn aside and avoid to use bus->devices to loop.

During reviewing the patch, Bjorn said:
|   The PCI hot-remove path calls pci_stop_bus_devices() via
|   pci_remove_bus_device().
|
|   pci_stop_bus_devices() traverses the bus->devices list (point A below),
|   stopping each device in turn, which calls the driver remove() method.  When
|   the device is an SR-IOV PF, the driver calls pci_disable_sriov(), which
|   also uses pci_remove_bus_device() to remove the VF devices from the
|   bus->devices list (point B).
|
|       pci_remove_bus_device
|         pci_stop_bus_device
|           pci_stop_bus_devices(subordinate)
|             list_for_each(bus->devices)             <-- A
|               pci_stop_bus_device(PF)
|                 ...
|                   driver->remove
|                     pci_disable_sriov
|                       ...
|                         pci_remove_bus_device(VF)
|                             <remove from bus_list>  <-- B
|
|   At B, we're changing the same list we're iterating through at A, so when
|   the driver remove() method returns, the pci_stop_bus_devices() iterator has
|   a pointer to a list entry that has already been freed.
|
|   This patch avoids the problem by building a separate list of all PFs on
|   the bus and traversing that at A instead of the bus->devices list.

Discussion thread can be found : https://lkml.org/lkml/2011/10/15/141

-v2: updated changelog.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/remove.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -120,10 +120,43 @@ void pci_remove_behind_bridge(struct pci
 			pci_remove_bus_device(pci_dev_b(l));
 }
 
+struct dev_list {
+	struct pci_dev *dev;
+	struct list_head list;
+};
+
 static void pci_stop_bus_devices(struct pci_bus *bus)
 {
 	struct list_head *l, *n;
+	struct dev_list *dl, *dn;
+	LIST_HEAD(physfn_list);
+
+	/* Save phys_fn aside at first */
+	list_for_each(l, &bus->devices) {
+		struct pci_dev *dev = pci_dev_b(l);
+
+		if (!dev->is_virtfn) {
+			dl = kmalloc(sizeof(*dl), GFP_KERNEL);
+			if (!dl)
+				continue;
+			dl->dev = dev;
+			list_add_tail(&dl->list, &physfn_list);
+		}
+	}
+
+	/*
+	 * stop bus device for phys_fn at first
+	 *  it will stop and remove vf in driver remove action
+	 */
+	list_for_each_entry_safe(dl, dn, &physfn_list, list) {
+		struct pci_dev *dev = dl->dev;
+
+		pci_stop_bus_device(dev);
+
+		kfree(dl);
+	}
 
+	/* Do it again for left over in case */
 	list_for_each_safe(l, n, &bus->devices) {
 		struct pci_dev *dev = pci_dev_b(l);
 		pci_stop_bus_device(dev);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RESEND PATCH] PCI: Make sriov work with hotplug remove
  2011-11-11 21:00               ` [RESEND PATCH] PCI: Make sriov work with hotplug remove Yinghai Lu
@ 2011-11-16  5:54                 ` Kenji Kaneshige
  2011-11-16 21:23                   ` Yinghai Lu
  0 siblings, 1 reply; 31+ messages in thread
From: Kenji Kaneshige @ 2011-11-16  5:54 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jesse Barnes, Bjorn Helgaas, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

(2011/11/12 6:00), Yinghai Lu wrote:
> 
> When hot remove pci express module that have pcie switch and support SRIOV, got
> 
> [ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1
> [ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt received
> [ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3)
> [ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 1f9
> [ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due to button press.
> [ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10
> [ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
> [ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
> [ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling domain:bus:device=0000:b0:00
> [ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 2f9
> [ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device: domain:bus:dev = 0000:b0:00
> [ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6
> [ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14
> [ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15
> [ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16
> [ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled
> [ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 5926.127189] IP: [<ffffffff81353a3b>] pci_stop_bus_device+0x33/0x83
> [ 5926.133377] PGD 0
> [ 5926.135402] Oops: 0000 [#1] SMP
> [ 5926.138659] CPU 2
> [ 5926.140499] Modules linked in:
> ...
> [ 5926.143754]
> [ 5926.275823] Call Trace:
> [ 5926.278267]  [<ffffffff81353a38>] pci_stop_bus_device+0x30/0x83
> [ 5926.284180]  [<ffffffff81353af4>] pci_remove_bus_device+0x1a/0xba
> [ 5926.290264]  [<ffffffff81366311>] pciehp_unconfigure_device+0x110/0x17b
> [ 5926.296866]  [<ffffffff81365dd9>] ? pciehp_disable_slot+0x188/0x188
> [ 5926.303123]  [<ffffffff81365d6f>] pciehp_disable_slot+0x11e/0x188
> [ 5926.309206]  [<ffffffff81365e68>] pciehp_power_thread+0x8f/0xe0
> ...
> 
>   +-[0000:80]-+-00.0-[81-8f]--
>   |           +-01.0-[90-9f]--
>   |           +-02.0-[a0-af]--
>   |           +-02.2-[b0-bf]----00.0-[b1-b3]--+-02.0-[b2]--+-00.0 Device
>   |           |                               |            +-00.1 Device
>   |           |                               |            +-00.2 Device
>   |           |                               |            \-00.3 Device
>   |           |                               \-03.0-[b3]--+-00.0 Device
>   |           |                                            +-00.1 Device
>   |           |                                            +-00.2 Device
>   |           |                                            \-00.3 Device
> 
> root complex: 80:02.2
> pci express modules: have pcie switch and are listed as b0:00.0, b1:02.0 and b1:03.0.
>                  end devices  are b2:00.0 and b3.00.0.
>                  VFs are: b2:00.1,... b2:00.3, and b3:00.1,...,b3:00.3
> 
> Root cause: when doing pci_stop_bus_device() with phys fn, it will stop virt fn and
> remove the fn, so
> 	list_for_each_safe(l, n,&bus->devices)
> will have problem to refer freed n that is pointed to vf entry.
> 
> Solution is just call pci_stop_bus_device() with phys fn only. and before that need to
> save phys fn aside and avoid to use bus->devices to loop.
> 
> During reviewing the patch, Bjorn said:
> |   The PCI hot-remove path calls pci_stop_bus_devices() via
> |   pci_remove_bus_device().
> |
> |   pci_stop_bus_devices() traverses the bus->devices list (point A below),
> |   stopping each device in turn, which calls the driver remove() method.  When
> |   the device is an SR-IOV PF, the driver calls pci_disable_sriov(), which
> |   also uses pci_remove_bus_device() to remove the VF devices from the
> |   bus->devices list (point B).
> |
> |       pci_remove_bus_device
> |         pci_stop_bus_device
> |           pci_stop_bus_devices(subordinate)
> |             list_for_each(bus->devices)<-- A
> |               pci_stop_bus_device(PF)
> |                 ...
> |                   driver->remove
> |                     pci_disable_sriov
> |                       ...
> |                         pci_remove_bus_device(VF)
> |<remove from bus_list>   <-- B
> |
> |   At B, we're changing the same list we're iterating through at A, so when
> |   the driver remove() method returns, the pci_stop_bus_devices() iterator has
> |   a pointer to a list entry that has already been freed.
> |
> |   This patch avoids the problem by building a separate list of all PFs on
> |   the bus and traversing that at A instead of the bus->devices list.
> 
> Discussion thread can be found : https://lkml.org/lkml/2011/10/15/141
> 
> -v2: updated changelog.
> 
> Signed-off-by: Yinghai Lu<yinghai@kernel.org>
> 
> ---
>   drivers/pci/remove.c |   33 +++++++++++++++++++++++++++++++++
>   1 file changed, 33 insertions(+)
> 
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -120,10 +120,43 @@ void pci_remove_behind_bridge(struct pci
>   			pci_remove_bus_device(pci_dev_b(l));
>   }
> 
> +struct dev_list {
> +	struct pci_dev *dev;
> +	struct list_head list;
> +};
> +
>   static void pci_stop_bus_devices(struct pci_bus *bus)
>   {
>   	struct list_head *l, *n;
> +	struct dev_list *dl, *dn;
> +	LIST_HEAD(physfn_list);
> +
> +	/* Save phys_fn aside at first */
> +	list_for_each(l,&bus->devices) {
> +		struct pci_dev *dev = pci_dev_b(l);
> +
> +		if (!dev->is_virtfn) {
> +			dl = kmalloc(sizeof(*dl), GFP_KERNEL);
> +			if (!dl)
> +				continue;
> +			dl->dev = dev;
> +			list_add_tail(&dl->list,&physfn_list);
> +		}
> +	}
> +
> +	/*
> +	 * stop bus device for phys_fn at first
> +	 *  it will stop and remove vf in driver remove action
> +	 */
> +	list_for_each_entry_safe(dl, dn,&physfn_list, list) {
> +		struct pci_dev *dev = dl->dev;
> +
> +		pci_stop_bus_device(dev);
> +
> +		kfree(dl);
> +	}
> 
> +	/* Do it again for left over in case */
>   	list_for_each_safe(l, n,&bus->devices) {
>   		struct pci_dev *dev = pci_dev_b(l);
>   		pci_stop_bus_device(dev);

Sorry for the delayed comment.

I don't think allocating new memory for device removal is good idea. And
your code doesn't seem to work if memory allocation failed. What about
something like below? Note that it is not tested at all, sorry.

By the way, I think the root cause is that current SR-IOV design doesn't
match pci device driver model very well, IMHO.

Regards,
Kenji Kaneshige

---
 drivers/pci/remove.c |   20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

Index: 20111116/drivers/pci/remove.c
===================================================================
--- 20111116.orig/drivers/pci/remove.c
+++ 20111116/drivers/pci/remove.c
@@ -122,11 +122,25 @@ void pci_remove_behind_bridge(struct pci
 
 static void pci_stop_bus_devices(struct pci_bus *bus)
 {
-	struct list_head *l, *n;
+	struct list_head *pos, *head;
 
-	list_for_each_safe(l, n, &bus->devices) {
-		struct pci_dev *dev = pci_dev_b(l);
+	head = &bus->devices;
+	pos = head->next;
+	while (pos != head) {
+		struct pci_dev *dev = pci_dev_b(pos);
+		/*
+		 * NOTE: We need special handling here because VFs are
+		 * removed by pci_remove_bus_device() in the
+		 * pci_stop_bus_devices() code path for PF. Seems
+		 * current SR-IOV code doesn't match linux pci driver
+		 * model. Need refactoring.
+		 */
+		if (dev->is_virtfn)
+			goto next;
 		pci_stop_bus_device(dev);
+	next:
+		barrier();
+		pos = pos->next;
 	}
 }

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RESEND PATCH] PCI: Make sriov work with hotplug remove
  2011-11-16  5:54                 ` Kenji Kaneshige
@ 2011-11-16 21:23                   ` Yinghai Lu
  2011-11-23  1:58                     ` Kenji Kaneshige
  0 siblings, 1 reply; 31+ messages in thread
From: Yinghai Lu @ 2011-11-16 21:23 UTC (permalink / raw)
  To: Kenji Kaneshige
  Cc: Jesse Barnes, Bjorn Helgaas, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

On 11/15/2011 09:54 PM, Kenji Kaneshige wrote:

> Sorry for the delayed comment.
> 
> I don't think allocating new memory for device removal is good idea. And
> your code doesn't seem to work if memory allocation failed. What about
> something like below? Note that it is not tested at all, sorry.
> 

Yes, it should work.

I updated the patch, use list_entry_for_each directly. also remove VF checking.

Please check it.

Thanks

Yinghai

Subject: [PATCH -v3] PCI: Make sriov work with hotplug remove

When hot remove pci express module that have pcie switch and support SRIOV, got

[ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1
[ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt received
[ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3)
[ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 1f9
[ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due to button press.
[ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10
[ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
[ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling domain:bus:device=0000:b0:00
[ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 2f9
[ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device: domain:bus:dev = 0000:b0:00
[ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6
[ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14
[ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15
[ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16
[ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled
[ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 5926.127189] IP: [<ffffffff81353a3b>] pci_stop_bus_device+0x33/0x83
[ 5926.133377] PGD 0
[ 5926.135402] Oops: 0000 [#1] SMP
[ 5926.138659] CPU 2
[ 5926.140499] Modules linked in:
...
[ 5926.143754]
[ 5926.275823] Call Trace:
[ 5926.278267]  [<ffffffff81353a38>] pci_stop_bus_device+0x30/0x83
[ 5926.284180]  [<ffffffff81353af4>] pci_remove_bus_device+0x1a/0xba
[ 5926.290264]  [<ffffffff81366311>] pciehp_unconfigure_device+0x110/0x17b
[ 5926.296866]  [<ffffffff81365dd9>] ? pciehp_disable_slot+0x188/0x188
[ 5926.303123]  [<ffffffff81365d6f>] pciehp_disable_slot+0x11e/0x188
[ 5926.309206]  [<ffffffff81365e68>] pciehp_power_thread+0x8f/0xe0
...

 +-[0000:80]-+-00.0-[81-8f]--
 |           +-01.0-[90-9f]--
 |           +-02.0-[a0-af]--
 |           +-02.2-[b0-bf]----00.0-[b1-b3]--+-02.0-[b2]--+-00.0 Device
 |           |                               |            +-00.1 Device
 |           |                               |            +-00.2 Device
 |           |                               |            \-00.3 Device
 |           |                               \-03.0-[b3]--+-00.0 Device
 |           |                                            +-00.1 Device
 |           |                                            +-00.2 Device
 |           |                                            \-00.3 Device

root complex: 80:02.2
pci express modules: have pcie switch and are listed as b0:00.0, b1:02.0 and b1:03.0.
                end devices  are b2:00.0 and b3.00.0.
                VFs are: b2:00.1,... b2:00.3, and b3:00.1,...,b3:00.3

Root cause: when doing pci_stop_bus_device() with phys fn, it will stop virt fn and
remove the fn, so
	list_for_each_safe(l, n, &bus->devices)
will have problem to refer freed n that is pointed to vf entry.

Solution is just replacing list_for_each_safe() with list_for_each().
	pci_stop_bus_device(dev) will not remove dev from bus->devices list,
	so We only need use for_each here.

During reviewing the patch, Bjorn said:
|   The PCI hot-remove path calls pci_stop_bus_devices() via
|   pci_remove_bus_device().
|
|   pci_stop_bus_devices() traverses the bus->devices list (point A below),
|   stopping each device in turn, which calls the driver remove() method.  When
|   the device is an SR-IOV PF, the driver calls pci_disable_sriov(), which
|   also uses pci_remove_bus_device() to remove the VF devices from the
|   bus->devices list (point B).
|
|       pci_remove_bus_device
|         pci_stop_bus_device
|           pci_stop_bus_devices(subordinate)
|             list_for_each(bus->devices)             <-- A
|               pci_stop_bus_device(PF)
|                 ...
|                   driver->remove
|                     pci_disable_sriov
|                       ...
|                         pci_remove_bus_device(VF)
|                             <remove from bus_list>  <-- B
|
|   At B, we're changing the same list we're iterating through at A, so when
|   the driver remove() method returns, the pci_stop_bus_devices() iterator has
|   a pointer to a list entry that has already been freed.

Discussion thread can be found : https://lkml.org/lkml/2011/10/15/141

-v2: updated changelog.
-v3: According to Kenji, do not allocate another list for PF.
     Also We don't need to check dev->is_virtfn anymore, because PF should come first
	in the list, and even VF come first, it is still ok.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/remove.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -122,11 +122,24 @@ void pci_remove_behind_bridge(struct pci
 
 static void pci_stop_bus_devices(struct pci_bus *bus)
 {
-	struct list_head *l, *n;
+	struct list_head *l;
 
-	list_for_each_safe(l, n, &bus->devices) {
+	/*
+	 * pci_stop_bus_device(dev) will not remove dev from bus->devices list,
+	 *  so We don't need use _safe version for_each here.
+	 * Also _safe version has problem when pci_stop_bus_device() for PF try
+	 *  to remove VFs.
+	 */
+	list_for_each(l, &bus->devices) {
 		struct pci_dev *dev = pci_dev_b(l);
 		pci_stop_bus_device(dev);
+		/*
+		 * VFs are removed by pci_remove_bus_device() in the
+		 *  pci_stop_bus_devices() code path for PF.
+		 *  aka, bus->devices get updated in the process.
+		 * barrier() will make sure we get right next from that list.
+		 */
+		barrier();
 	}
 }
 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RESEND PATCH] PCI: Make sriov work with hotplug remove
  2011-11-16 21:23                   ` Yinghai Lu
@ 2011-11-23  1:58                     ` Kenji Kaneshige
  2011-11-23  5:01                       ` [PATCH -v4] " Yinghai Lu
  0 siblings, 1 reply; 31+ messages in thread
From: Kenji Kaneshige @ 2011-11-23  1:58 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Jesse Barnes, Bjorn Helgaas, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org

(2011/11/17 6:23), Yinghai Lu wrote:
> On 11/15/2011 09:54 PM, Kenji Kaneshige wrote:
>
>> Sorry for the delayed comment.
>>
>> I don't think allocating new memory for device removal is good idea. And
>> your code doesn't seem to work if memory allocation failed. What about
>> something like below? Note that it is not tested at all, sorry.
>>
>
> Yes, it should work.
>
> I updated the patch, use list_entry_for_each directly. also remove VF checking.
>
> Please check it.
>
> Thanks
>
> Yinghai
>
> Subject: [PATCH -v3] PCI: Make sriov work with hotplug remove
>
> When hot remove pci express module that have pcie switch and support SRIOV, got
>
> [ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1
> [ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt received
> [ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3)
> [ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 1f9
> [ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due to button press.
> [ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10
> [ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
> [ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
> [ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling domain:bus:device=0000:b0:00
> [ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 2f9
> [ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device: domain:bus:dev = 0000:b0:00
> [ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6
> [ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14
> [ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15
> [ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16
> [ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled
> [ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 5926.127189] IP: [<ffffffff81353a3b>] pci_stop_bus_device+0x33/0x83
> [ 5926.133377] PGD 0
> [ 5926.135402] Oops: 0000 [#1] SMP
> [ 5926.138659] CPU 2
> [ 5926.140499] Modules linked in:
> ...
> [ 5926.143754]
> [ 5926.275823] Call Trace:
> [ 5926.278267]  [<ffffffff81353a38>] pci_stop_bus_device+0x30/0x83
> [ 5926.284180]  [<ffffffff81353af4>] pci_remove_bus_device+0x1a/0xba
> [ 5926.290264]  [<ffffffff81366311>] pciehp_unconfigure_device+0x110/0x17b
> [ 5926.296866]  [<ffffffff81365dd9>] ? pciehp_disable_slot+0x188/0x188
> [ 5926.303123]  [<ffffffff81365d6f>] pciehp_disable_slot+0x11e/0x188
> [ 5926.309206]  [<ffffffff81365e68>] pciehp_power_thread+0x8f/0xe0
> ...
>
>   +-[0000:80]-+-00.0-[81-8f]--
>   |           +-01.0-[90-9f]--
>   |           +-02.0-[a0-af]--
>   |           +-02.2-[b0-bf]----00.0-[b1-b3]--+-02.0-[b2]--+-00.0 Device
>   |           |                               |            +-00.1 Device
>   |           |                               |            +-00.2 Device
>   |           |                               |            \-00.3 Device
>   |           |                               \-03.0-[b3]--+-00.0 Device
>   |           |                                            +-00.1 Device
>   |           |                                            +-00.2 Device
>   |           |                                            \-00.3 Device
>
> root complex: 80:02.2
> pci express modules: have pcie switch and are listed as b0:00.0, b1:02.0 and b1:03.0.
>                  end devices  are b2:00.0 and b3.00.0.
>                  VFs are: b2:00.1,... b2:00.3, and b3:00.1,...,b3:00.3
>
> Root cause: when doing pci_stop_bus_device() with phys fn, it will stop virt fn and
> remove the fn, so
> 	list_for_each_safe(l, n,&bus->devices)
> will have problem to refer freed n that is pointed to vf entry.
>
> Solution is just replacing list_for_each_safe() with list_for_each().
> 	pci_stop_bus_device(dev) will not remove dev from bus->devices list,
> 	so We only need use for_each here.
>
> During reviewing the patch, Bjorn said:
> |   The PCI hot-remove path calls pci_stop_bus_devices() via
> |   pci_remove_bus_device().
> |
> |   pci_stop_bus_devices() traverses the bus->devices list (point A below),
> |   stopping each device in turn, which calls the driver remove() method.  When
> |   the device is an SR-IOV PF, the driver calls pci_disable_sriov(), which
> |   also uses pci_remove_bus_device() to remove the VF devices from the
> |   bus->devices list (point B).
> |
> |       pci_remove_bus_device
> |         pci_stop_bus_device
> |           pci_stop_bus_devices(subordinate)
> |             list_for_each(bus->devices)<-- A
> |               pci_stop_bus_device(PF)
> |                 ...
> |                   driver->remove
> |                     pci_disable_sriov
> |                       ...
> |                         pci_remove_bus_device(VF)
> |<remove from bus_list>   <-- B
> |
> |   At B, we're changing the same list we're iterating through at A, so when
> |   the driver remove() method returns, the pci_stop_bus_devices() iterator has
> |   a pointer to a list entry that has already been freed.
>
> Discussion thread can be found : https://lkml.org/lkml/2011/10/15/141
>
> -v2: updated changelog.
> -v3: According to Kenji, do not allocate another list for PF.
>       Also We don't need to check dev->is_virtfn anymore, because PF should come first
> 	in the list, and even VF come first, it is still ok.

I think we should not have an assumption that PF comes first in
the list. So I think dev->is_virtfn check is needed.

On the other hand, if we add the dev->is_virtfn check,
pci_stop_bus_devices() doesn't work for buses whose &bus->devices
list have only VFs(, though I think pci_stop_bus_devices() is not
used for this kind of bus currently). How about adding the
pci_stop_bus_device() loop for VFs after the loop for PFs, as you
did in your first try?

>
> Signed-off-by: Yinghai Lu<yinghai@kernel.org>
>
> ---
>   drivers/pci/remove.c |   17 +++++++++++++++--
>   1 file changed, 15 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -122,11 +122,24 @@ void pci_remove_behind_bridge(struct pci
>
>   static void pci_stop_bus_devices(struct pci_bus *bus)
>   {
> -	struct list_head *l, *n;
> +	struct list_head *l;
>
> -	list_for_each_safe(l, n,&bus->devices) {
> +	/*
> +	 * pci_stop_bus_device(dev) will not remove dev from bus->devices list,
> +	 *  so We don't need use _safe version for_each here.
> +	 * Also _safe version has problem when pci_stop_bus_device() for PF try
> +	 *  to remove VFs.
> +	 */
> +	list_for_each(l,&bus->devices) {
>   		struct pci_dev *dev = pci_dev_b(l);
>   		pci_stop_bus_device(dev);
> +		/*
> +		 * VFs are removed by pci_remove_bus_device() in the
> +		 *  pci_stop_bus_devices() code path for PF.
> +		 *  aka, bus->devices get updated in the process.
> +		 * barrier() will make sure we get right next from that list.
> +		 */
> +		barrier();

I think we should not use list_for_each() because

   - difficult to understand the intention of the barrier().
   - might not work if list_for_each() implementation were changed.

Regards,
Kenji Kaneshige

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH -v4] PCI: Make sriov work with hotplug remove
  2011-11-23  1:58                     ` Kenji Kaneshige
@ 2011-11-23  5:01                       ` Yinghai Lu
  0 siblings, 0 replies; 31+ messages in thread
From: Yinghai Lu @ 2011-11-23  5:01 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Kenji Kaneshige, Bjorn Helgaas, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org


When hot remove pci express module that have pcie switch and support SRIOV, got

[ 5918.610127] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 1
[ 5918.615779] pciehp 0000:80:02.2:pcie04: Attention button interrupt received
[ 5918.622730] pciehp 0000:80:02.2:pcie04: Button pressed on Slot(3)
[ 5918.629002] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 1f9
[ 5918.637416] pciehp 0000:80:02.2:pcie04: PCI slot #3 - powering off due to button press.
[ 5918.647125] pciehp 0000:80:02.2:pcie04: pcie_isr: intr_loc 10
[ 5918.653039] pciehp 0000:80:02.2:pcie04: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
[ 5918.661229] pciehp 0000:80:02.2:pcie04: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0
[ 5924.667627] pciehp 0000:80:02.2:pcie04: Disabling domain:bus:device=0000:b0:00
[ 5924.674909] pciehp 0000:80:02.2:pcie04: pciehp_get_power_status: SLOTCTRL a8 value read 2f9
[ 5924.683262] pciehp 0000:80:02.2:pcie04: pciehp_unconfigure_device: domain:bus:dev = 0000:b0:00
[ 5924.693976] libfcoe_device_notification: NETDEV_UNREGISTER eth6
[ 5924.764979] libfcoe_device_notification: NETDEV_UNREGISTER eth14
[ 5924.873539] libfcoe_device_notification: NETDEV_UNREGISTER eth15
[ 5924.995209] libfcoe_device_notification: NETDEV_UNREGISTER eth16
[ 5926.114407] sxge 0000:b2:00.0: PCI INT A disabled
[ 5926.119342] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 5926.127189] IP: [<ffffffff81353a3b>] pci_stop_bus_device+0x33/0x83
[ 5926.133377] PGD 0
[ 5926.135402] Oops: 0000 [#1] SMP
[ 5926.138659] CPU 2
[ 5926.140499] Modules linked in:
...
[ 5926.143754]
[ 5926.275823] Call Trace:
[ 5926.278267]  [<ffffffff81353a38>] pci_stop_bus_device+0x30/0x83
[ 5926.284180]  [<ffffffff81353af4>] pci_remove_bus_device+0x1a/0xba
[ 5926.290264]  [<ffffffff81366311>] pciehp_unconfigure_device+0x110/0x17b
[ 5926.296866]  [<ffffffff81365dd9>] ? pciehp_disable_slot+0x188/0x188
[ 5926.303123]  [<ffffffff81365d6f>] pciehp_disable_slot+0x11e/0x188
[ 5926.309206]  [<ffffffff81365e68>] pciehp_power_thread+0x8f/0xe0
...

 +-[0000:80]-+-00.0-[81-8f]--
 |           +-01.0-[90-9f]--
 |           +-02.0-[a0-af]--
 |           +-02.2-[b0-bf]----00.0-[b1-b3]--+-02.0-[b2]--+-00.0 Device
 |           |                               |            +-00.1 Device
 |           |                               |            +-00.2 Device
 |           |                               |            \-00.3 Device
 |           |                               \-03.0-[b3]--+-00.0 Device
 |           |                                            +-00.1 Device
 |           |                                            +-00.2 Device
 |           |                                            \-00.3 Device

root complex: 80:02.2
pci express modules: have pcie switch and are listed as b0:00.0, b1:02.0 and b1:03.0.
                end devices  are b2:00.0 and b3.00.0.
                VFs are: b2:00.1,... b2:00.3, and b3:00.1,...,b3:00.3

Root cause: when doing pci_stop_bus_device() with phys fn, it will stop virt fn and
remove the fn, so
	list_for_each_safe(l, n, &bus->devices)
will have problem to refer freed n that is pointed to vf entry.

Solution is just replacing list_for_each_safe() with list_for_each().
	pci_stop_bus_device(dev) will not remove dev from bus->devices list,
	so We only need use for_each here.

During reviewing the patch, Bjorn said:
|   The PCI hot-remove path calls pci_stop_bus_devices() via
|   pci_remove_bus_device().
|
|   pci_stop_bus_devices() traverses the bus->devices list (point A below),
|   stopping each device in turn, which calls the driver remove() method.  When
|   the device is an SR-IOV PF, the driver calls pci_disable_sriov(), which
|   also uses pci_remove_bus_device() to remove the VF devices from the
|   bus->devices list (point B).
|
|       pci_remove_bus_device
|         pci_stop_bus_device
|           pci_stop_bus_devices(subordinate)
|             list_for_each(bus->devices)             <-- A
|               pci_stop_bus_device(PF)
|                 ...
|                   driver->remove
|                     pci_disable_sriov
|                       ...
|                         pci_remove_bus_device(VF)
|                             <remove from bus_list>  <-- B
|
|   At B, we're changing the same list we're iterating through at A, so when
|   the driver remove() method returns, the pci_stop_bus_devices() iterator has
|   a pointer to a list entry that has already been freed.

Discussion thread can be found : https://lkml.org/lkml/2011/10/15/141

-v2: updated changelog.
-v3: According to Kenji, do not allocate another list for PF.
     Also We don't need to check dev->is_virtfn anymore, because PF should come
	first in the list, and even VF come first, it is still ok.
-v4: According to Kenji, expand list_for_each(), and add is_virtfn checking

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/remove.c |   28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -123,10 +123,36 @@ void pci_remove_behind_bridge(struct pci
 static void pci_stop_bus_devices(struct pci_bus *bus)
 {
 	struct list_head *l, *n;
+	struct list_head *head = &bus->devices;
 
+	/*
+	 * pci_stop_bus_device(dev) will not remove dev from bus->devices list,
+	 *  so We don't need use _safe version for_each here.
+	 * Also _safe version has problem when pci_stop_bus_device() for PF try
+	 *  to remove VFs.
+	 */
+	for (l = head->next; l != head;) {
+		struct pci_dev *dev = pci_dev_b(l);
+
+		/*
+		 * VFs are removed by pci_remove_bus_device() in the
+		 *  pci_stop_bus_devices() code path for PF.
+		 *  aka, bus->devices get updated in the process.
+		 * barrier() will make sure we get right next from that list.
+		 */
+		if (!dev->is_virtfn) {
+			pci_stop_bus_device(dev);
+			barrier();
+		}
+		l = l->next;
+	}
+
+	/* For left over VFs in case */
 	list_for_each_safe(l, n, &bus->devices) {
 		struct pci_dev *dev = pci_dev_b(l);
-		pci_stop_bus_device(dev);
+
+		if (dev->is_virtfn)
+			pci_stop_bus_device(dev);
 	}
 }
 

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2011-11-23  5:02 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <4E9A3092.4080309@oracle.com>
2011-10-16  1:31 ` [PATCH 1/8] pci: Make sriov work with hotplug removal Yinghai Lu
2011-10-17 17:16   ` Bjorn Helgaas
2011-10-17 18:08     ` Yinghai Lu
2011-10-17 22:12       ` Bjorn Helgaas
2011-10-17 22:24         ` Yinghai Lu
2011-10-18 16:49           ` Bjorn Helgaas
2011-10-18 17:02             ` Yinghai Lu
2011-10-25 12:34               ` Kenji Kaneshige
2011-10-25 15:52                 ` Yinghai Lu
2011-11-11 21:00               ` [RESEND PATCH] PCI: Make sriov work with hotplug remove Yinghai Lu
2011-11-16  5:54                 ` Kenji Kaneshige
2011-11-16 21:23                   ` Yinghai Lu
2011-11-23  1:58                     ` Kenji Kaneshige
2011-11-23  5:01                       ` [PATCH -v4] " Yinghai Lu
2011-10-16  1:31 ` [PATCH 2/8] pci: Calculate right add_size Yinghai Lu
2011-10-16  1:32 ` [PATCH 3/8] pci: Try to assign required+option size at first Yinghai Lu
2011-10-16  1:32 ` [PATCH 4/8] PCI: Using add_list in pcie hotplug path Yinghai Lu
2011-10-16  1:32 ` [PATCH 5/8] PCI: Make rescan bus could increase bridge resource size if needed Yinghai Lu
2011-10-16  1:32 ` [PATCH 6/8] PCI: Make pci_rescan_bus handle add_list Yinghai Lu
2011-10-16  1:32 ` [PATCH 7/8] PCI, sysfs: merge dev and bus cpuaffinity show handling Yinghai Lu
2011-10-16  1:32 ` [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges Yinghai Lu
2011-10-16  2:39   ` Greg KH
2011-10-16  5:34     ` Yinghai Lu
2011-10-16 15:55       ` Greg KH
2011-10-16 23:35         ` Yinghai Lu
2011-10-17  1:45           ` Greg KH
2011-10-17 18:27             ` [PATCH -v4 8_1/8] PCI, sys: Use device_type and attr_groups with pci dev Yinghai Lu
2011-10-17 18:38               ` Greg KH
2011-10-17 18:29             ` [PATCH -v4 8_2/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges Yinghai Lu
2011-10-17 18:33             ` [PATCH -v4 8_3/8] PCI, sys: Use is_visable() with boot_vga attribute for pci_dev Yinghai Lu
2011-10-17 18:36             ` [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/... for pci bridges Yinghai Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).