linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] PCI: Fix hotplug remove with sriov again
@ 2013-07-19 19:14 Yinghai Lu
  2013-07-19 19:14 ` [PATCH] PCI: Separate stop and remove devices in pciehp Yinghai Lu
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Yinghai Lu @ 2013-07-19 19:14 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Yinghai Lu, Yijing Wang, stable

Found hot-remove pcie card with sriov enabled cause crash in v3.10.

It is regression caused by commit ba518e3c177547dfebf7fa7252cea0c850e7ce25
(PCI: pciehp: Iterate over all devices in slot, not functions 0-7)

That commit change to use bus->devices to iterate devices under
bus to run pci_stop_and_remove_bus_device().
Actually it duplicates the problem with those bus->devices iteratation
that we try to fix in commit ac205b7bb72fa4227d2e79979bbe2b4687cdf44d
(PCI: make sriov work with hotplug remove)

Change to iterate reversely as we did last time.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: <stable@vger.kernel.org> v3.9+

---
 drivers/pci/hotplug/pciehp_pci.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_pci.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_pci.c
@@ -92,7 +92,13 @@ int pciehp_unconfigure_device(struct slo
 	if (ret)
 		presence = 0;
 
-	list_for_each_entry_safe(dev, temp, &parent->devices, bus_list) {
+	/*
+	 * Need to iterate device reversely, as during
+	 * stop PF driver, VF will be removed, the list_for_each
+	 * could point to removed VF with temp.
+	 */
+	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
+					 bus_list) {
 		pci_dev_get(dev);
 		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) {
 			pci_read_config_byte(dev, PCI_BRIDGE_CONTROL, &bctl);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH] PCI: Separate stop and remove devices in pciehp
  2013-07-19 19:14 [PATCH] PCI: Fix hotplug remove with sriov again Yinghai Lu
@ 2013-07-19 19:14 ` Yinghai Lu
  2013-07-22 21:23   ` Bjorn Helgaas
  2013-07-19 19:14 ` [PATCH] PCI: Stop sriov before remove PF Yinghai Lu
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Yinghai Lu @ 2013-07-19 19:14 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Yinghai Lu, Yijing Wang, Jiang Liu

After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
(PCI: Simplify IOV implementation and fix reference count races)
VF need to be removed via virtfn_remove to make sure ref to PF
is put back.

So we can not call stop_and_remove for VF before PF, that will
make VF get removed directly before PF's driver try to use
virtfn_remove to do it.

Solution is separating stop and remove in two iterations.

In first iteration, we stop VF driver at first with iterating devices
reversely, and later during stop PF driver, disable_sriov will use
virtfn_remove to remove VFs.

Also some driver (like mlx4_core) need VF's driver get stopped before PF.

To make it simple, separate VGA checking out and do that at first,
if there is VGA in the chain, do even try to stop or remove any device
under that bus.

Need this one for v3.11.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Jiang Liu <liuj97@gmail.com>

---
 drivers/pci/hotplug/pciehp_pci.c |   46 ++++++++++++++++++++++++++-------------
 drivers/pci/remove.c             |    6 +++--
 include/linux/pci.h              |    2 +
 3 files changed, 37 insertions(+), 17 deletions(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_pci.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_pci.c
@@ -92,26 +92,37 @@ int pciehp_unconfigure_device(struct slo
 	if (ret)
 		presence = 0;
 
+	/* check if VGA is around */
+	if (presence) {
+		list_for_each_entry(dev, &parent->devices, bus_list) {
+			if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+				pci_read_config_byte(dev, PCI_BRIDGE_CONTROL,
+							&bctl);
+				if (bctl & PCI_BRIDGE_CTL_VGA) {
+					ctrl_err(ctrl,
+						 "Cannot remove display device %s\n",
+						 pci_name(dev));
+					return -EINVAL;
+				}
+			}
+		}
+	}
+
 	/*
-	 * Need to iterate device reversely, as during
+	 * Now VF need to be removed via virtfn_remove to make
+	 * sure ref to PF is put back. Some driver (mlx4_core) need
+	 * VF's driver get stopped before PF.
+	 * So we need to stop VF driver at first, that means
+	 * loop reversely, and later during stop PF driver,
+	 * disable_sriov will use virtfn_remove to remove VFs.
+	 * Also we can not loop without reverse, as during
 	 * stop PF driver, VF will be removed, the list_for_each
 	 * could point to removed VF with temp.
 	 */
 	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
-					 bus_list) {
-		pci_dev_get(dev);
-		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) {
-			pci_read_config_byte(dev, PCI_BRIDGE_CONTROL, &bctl);
-			if (bctl & PCI_BRIDGE_CTL_VGA) {
-				ctrl_err(ctrl,
-					 "Cannot remove display device %s\n",
-					 pci_name(dev));
-				pci_dev_put(dev);
-				rc = -EINVAL;
-				break;
-			}
-		}
-		pci_stop_and_remove_bus_device(dev);
+						 bus_list) {
+		pci_stop_bus_device(dev);
+
 		/*
 		 * Ensure that no new Requests will be generated from
 		 * the device.
@@ -122,6 +133,11 @@ int pciehp_unconfigure_device(struct slo
 			command |= PCI_COMMAND_INTX_DISABLE;
 			pci_write_config_word(dev, PCI_COMMAND, command);
 		}
+	}
+
+	list_for_each_entry_safe(dev, temp, &parent->devices, bus_list) {
+		pci_dev_get(dev);
+		pci_remove_bus_device(dev);
 		pci_dev_put(dev);
 	}
 
Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -56,7 +56,7 @@ void pci_remove_bus(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_remove_bus);
 
-static void pci_stop_bus_device(struct pci_dev *dev)
+void pci_stop_bus_device(struct pci_dev *dev)
 {
 	struct pci_bus *bus = dev->subordinate;
 	struct pci_dev *child, *tmp;
@@ -75,8 +75,9 @@ static void pci_stop_bus_device(struct p
 
 	pci_stop_dev(dev);
 }
+EXPORT_SYMBOL(pci_stop_bus_device);
 
-static void pci_remove_bus_device(struct pci_dev *dev)
+void pci_remove_bus_device(struct pci_dev *dev)
 {
 	struct pci_bus *bus = dev->subordinate;
 	struct pci_dev *child, *tmp;
@@ -92,6 +93,7 @@ static void pci_remove_bus_device(struct
 
 	pci_destroy_dev(dev);
 }
+EXPORT_SYMBOL(pci_remove_bus_device);
 
 /**
  * pci_stop_and_remove_bus_device - remove a PCI device and any children
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -754,6 +754,8 @@ u8 pci_common_swizzle(struct pci_dev *de
 struct pci_dev *pci_dev_get(struct pci_dev *dev);
 void pci_dev_put(struct pci_dev *dev);
 void pci_remove_bus(struct pci_bus *b);
+void pci_stop_bus_device(struct pci_dev *dev);
+void pci_remove_bus_device(struct pci_dev *dev);
 void pci_stop_and_remove_bus_device(struct pci_dev *dev);
 void pci_stop_root_bus(struct pci_bus *bus);
 void pci_remove_root_bus(struct pci_bus *bus);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH] PCI: Stop sriov before remove PF
  2013-07-19 19:14 [PATCH] PCI: Fix hotplug remove with sriov again Yinghai Lu
  2013-07-19 19:14 ` [PATCH] PCI: Separate stop and remove devices in pciehp Yinghai Lu
@ 2013-07-19 19:14 ` Yinghai Lu
  2013-07-19 21:46   ` Alexander Duyck
  2013-07-22 23:15   ` Bjorn Helgaas
  2013-07-22  7:07 ` [PATCH] PCI: Fix hotplug remove with sriov again Yijing Wang
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 26+ messages in thread
From: Yinghai Lu @ 2013-07-19 19:14 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Yinghai Lu, Jiang Liu, Alexander Duyck, Donald Dutile,
	Greg Rose

After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
(PCI: Simplify IOV implementation and fix reference count races)
VF need to be removed via virtfn_remove to make sure ref to PF
is put back.

Some driver (like ixgbe) does not call pci_disable_sriov() if
sriov is enabled via /sys/.../sriov_numvfs setting.
ixgbe does allow driver for PF get detached, but still have VFs
around.

But how about PF get removed via /sys or pciehp?

During hot-remove, VF will still hold one ref to PF and it
prevent PF to be removed.
That make the next hot-add fails, as old PF dev struct is still around.

We need to add pci_disable_sriov() calling during pci dev removing.

Need this one for v3.11

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Donald Dutile <ddutile@redhat.com>
Cc: Greg Rose <gregory.v.rose@intel.com>

---
 drivers/pci/remove.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -34,6 +34,9 @@ static void pci_stop_dev(struct pci_dev
 
 static void pci_destroy_dev(struct pci_dev *dev)
 {
+	/* remove VF, if PF driver skip that */
+	pci_disable_sriov(dev);
+
 	down_write(&pci_bus_sem);
 	list_del(&dev->bus_list);
 	up_write(&pci_bus_sem);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Stop sriov before remove PF
  2013-07-19 19:14 ` [PATCH] PCI: Stop sriov before remove PF Yinghai Lu
@ 2013-07-19 21:46   ` Alexander Duyck
  2013-07-19 22:44     ` Yinghai Lu
  2013-07-22 23:15   ` Bjorn Helgaas
  1 sibling, 1 reply; 26+ messages in thread
From: Alexander Duyck @ 2013-07-19 21:46 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Bjorn Helgaas, linux-pci, Jiang Liu, Donald Dutile, Greg Rose

On 07/19/2013 12:14 PM, Yinghai Lu wrote:
> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
> (PCI: Simplify IOV implementation and fix reference count races)
> VF need to be removed via virtfn_remove to make sure ref to PF
> is put back.
> 
> Some driver (like ixgbe) does not call pci_disable_sriov() if
> sriov is enabled via /sys/.../sriov_numvfs setting.
> ixgbe does allow driver for PF get detached, but still have VFs
> around.
> 
> But how about PF get removed via /sys or pciehp?
> 
> During hot-remove, VF will still hold one ref to PF and it
> prevent PF to be removed.
> That make the next hot-add fails, as old PF dev struct is still around.
> 
> We need to add pci_disable_sriov() calling during pci dev removing.
> 
> Need this one for v3.11
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Jiang Liu <liuj97@gmail.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Cc: Donald Dutile <ddutile@redhat.com>
> Cc: Greg Rose <gregory.v.rose@intel.com>
> 
> ---
>  drivers/pci/remove.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -34,6 +34,9 @@ static void pci_stop_dev(struct pci_dev
>  
>  static void pci_destroy_dev(struct pci_dev *dev)
>  {
> +	/* remove VF, if PF driver skip that */
> +	pci_disable_sriov(dev);
> +
>  	down_write(&pci_bus_sem);
>  	list_del(&dev->bus_list);
>  	up_write(&pci_bus_sem);
> 

How are you able to hot-remove the PF if the VFs are still holding
references to it?

The issue I see with this patch is that if the PF has any VFs direct
assigned, hot plug removing the PF will cause the guests containing
those VFs to panic.

Thanks,

Alex



Thanks,

Alex


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Stop sriov before remove PF
  2013-07-19 21:46   ` Alexander Duyck
@ 2013-07-19 22:44     ` Yinghai Lu
  2013-07-19 23:22       ` Alexander Duyck
  0 siblings, 1 reply; 26+ messages in thread
From: Yinghai Lu @ 2013-07-19 22:44 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, Jiang Liu,
	Donald Dutile, Greg Rose

On Fri, Jul 19, 2013 at 2:46 PM, Alexander Duyck
<alexander.h.duyck@intel.com> wrote:
> On 07/19/2013 12:14 PM, Yinghai Lu wrote:
>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>> (PCI: Simplify IOV implementation and fix reference count races)
>> VF need to be removed via virtfn_remove to make sure ref to PF
>> is put back.
>>
>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>> sriov is enabled via /sys/.../sriov_numvfs setting.
>> ixgbe does allow driver for PF get detached, but still have VFs
>> around.
>>
>> But how about PF get removed via /sys or pciehp?
>>
>> During hot-remove, VF will still hold one ref to PF and it
>> prevent PF to be removed.
>> That make the next hot-add fails, as old PF dev struct is still around.
>>
>> We need to add pci_disable_sriov() calling during pci dev removing.
>>
>> Need this one for v3.11
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> Cc: Jiang Liu <liuj97@gmail.com>
>> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
>> Cc: Donald Dutile <ddutile@redhat.com>
>> Cc: Greg Rose <gregory.v.rose@intel.com>
>>
>> ---
>>  drivers/pci/remove.c |    3 +++
>>  1 file changed, 3 insertions(+)
>>
>> Index: linux-2.6/drivers/pci/remove.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/pci/remove.c
>> +++ linux-2.6/drivers/pci/remove.c
>> @@ -34,6 +34,9 @@ static void pci_stop_dev(struct pci_dev
>>
>>  static void pci_destroy_dev(struct pci_dev *dev)
>>  {
>> +     /* remove VF, if PF driver skip that */
>> +     pci_disable_sriov(dev);
>> +
>>       down_write(&pci_bus_sem);
>>       list_del(&dev->bus_list);
>>       up_write(&pci_bus_sem);
>>
>
> How are you able to hot-remove the PF if the VFs are still holding
> references to it?

usually pci_stop_and_remove_bus_device always successfully, and
power get turned off for that card.

>
> The issue I see with this patch is that if the PF has any VFs direct
> assigned, hot plug removing the PF will cause the guests containing
> those VFs to panic.

Then you should make guest support hotplug or suprise removal.

If the guest does panic because it does support hotplug, that is right behavior.

Just like in bare metal machine, if it does not support hotplug, and user would
know what is going to happen if he remove one pcie card.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Stop sriov before remove PF
  2013-07-19 22:44     ` Yinghai Lu
@ 2013-07-19 23:22       ` Alexander Duyck
  2013-07-23 15:34         ` Don Dutile
  0 siblings, 1 reply; 26+ messages in thread
From: Alexander Duyck @ 2013-07-19 23:22 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, Jiang Liu,
	Donald Dutile, Greg Rose

On 07/19/2013 03:44 PM, Yinghai Lu wrote:
> On Fri, Jul 19, 2013 at 2:46 PM, Alexander Duyck
> <alexander.h.duyck@intel.com> wrote:
>> On 07/19/2013 12:14 PM, Yinghai Lu wrote:
>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>> (PCI: Simplify IOV implementation and fix reference count races)
>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>> is put back.
>>>
>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>> ixgbe does allow driver for PF get detached, but still have VFs
>>> around.
>>>
>>> But how about PF get removed via /sys or pciehp?
>>>
>>> During hot-remove, VF will still hold one ref to PF and it
>>> prevent PF to be removed.
>>> That make the next hot-add fails, as old PF dev struct is still around.
>>>
>>> We need to add pci_disable_sriov() calling during pci dev removing.
>>>
>>> Need this one for v3.11
>>>
>>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>>> Cc: Jiang Liu <liuj97@gmail.com>
>>> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
>>> Cc: Donald Dutile <ddutile@redhat.com>
>>> Cc: Greg Rose <gregory.v.rose@intel.com>
>>>
>>> ---
>>>  drivers/pci/remove.c |    3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> Index: linux-2.6/drivers/pci/remove.c
>>> ===================================================================
>>> --- linux-2.6.orig/drivers/pci/remove.c
>>> +++ linux-2.6/drivers/pci/remove.c
>>> @@ -34,6 +34,9 @@ static void pci_stop_dev(struct pci_dev
>>>
>>>  static void pci_destroy_dev(struct pci_dev *dev)
>>>  {
>>> +     /* remove VF, if PF driver skip that */
>>> +     pci_disable_sriov(dev);
>>> +
>>>       down_write(&pci_bus_sem);
>>>       list_del(&dev->bus_list);
>>>       up_write(&pci_bus_sem);
>>>
>> How are you able to hot-remove the PF if the VFs are still holding
>> references to it?
> usually pci_stop_and_remove_bus_device always successfully, and
> power get turned off for that card.

I'm not an expert in this area, but that doesn't seem right.  How is it
you can remove a device if there are still outstanding references to
it?  Is this one of those cases where we have to succeed because the
system is removing the device and there is nothing we can do to stop it?

>> The issue I see with this patch is that if the PF has any VFs direct
>> assigned, hot plug removing the PF will cause the guests containing
>> those VFs to panic.
> Then you should make guest support hotplug or suprise removal.
>
> If the guest does panic because it does support hotplug, that is right behavior.
>
> Just like in bare metal machine, if it does not support hotplug, and user would
> know what is going to happen if he remove one pcie card.
>
> Thanks
>
> Yinghai

I suspect that is much easier said than done.  We probably need somebody
familiar with the KVM side of things to address the feasibility of
something like that.  I believe it was Greg and Don that worked on the
original patches that made it so that we could leave the VFs in place on
driver removal.  They would likely have a better answer as to why it is
preferable to leave the VFs in place than panic a non-compliant guest.

Thanks,

Alex

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Fix hotplug remove with sriov again
  2013-07-19 19:14 [PATCH] PCI: Fix hotplug remove with sriov again Yinghai Lu
  2013-07-19 19:14 ` [PATCH] PCI: Separate stop and remove devices in pciehp Yinghai Lu
  2013-07-19 19:14 ` [PATCH] PCI: Stop sriov before remove PF Yinghai Lu
@ 2013-07-22  7:07 ` Yijing Wang
  2013-07-22 17:39 ` Bjorn Helgaas
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Yijing Wang @ 2013-07-22  7:07 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Bjorn Helgaas, linux-pci, stable

On 2013/7/20 3:14, Yinghai Lu wrote:
> Found hot-remove pcie card with sriov enabled cause crash in v3.10.
> 
> It is regression caused by commit ba518e3c177547dfebf7fa7252cea0c850e7ce25
> (PCI: pciehp: Iterate over all devices in slot, not functions 0-7)
> 
> That commit change to use bus->devices to iterate devices under
> bus to run pci_stop_and_remove_bus_device().
> Actually it duplicates the problem with those bus->devices iteratation
> that we try to fix in commit ac205b7bb72fa4227d2e79979bbe2b4687cdf44d
> (PCI: make sriov work with hotplug remove)
> 
> Change to iterate reversely as we did last time.

It looks fine to me. Thanks!

> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Yijing Wang <wangyijing@huawei.com>
> Cc: <stable@vger.kernel.org> v3.9+
> 
> ---
>  drivers/pci/hotplug/pciehp_pci.c |    8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/drivers/pci/hotplug/pciehp_pci.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/hotplug/pciehp_pci.c
> +++ linux-2.6/drivers/pci/hotplug/pciehp_pci.c
> @@ -92,7 +92,13 @@ int pciehp_unconfigure_device(struct slo
>  	if (ret)
>  		presence = 0;
>  
> -	list_for_each_entry_safe(dev, temp, &parent->devices, bus_list) {
> +	/*
> +	 * Need to iterate device reversely, as during
> +	 * stop PF driver, VF will be removed, the list_for_each
> +	 * could point to removed VF with temp.
> +	 */
> +	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
> +					 bus_list) {
>  		pci_dev_get(dev);
>  		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) {
>  			pci_read_config_byte(dev, PCI_BRIDGE_CONTROL, &bctl);
> 
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Fix hotplug remove with sriov again
  2013-07-19 19:14 [PATCH] PCI: Fix hotplug remove with sriov again Yinghai Lu
                   ` (2 preceding siblings ...)
  2013-07-22  7:07 ` [PATCH] PCI: Fix hotplug remove with sriov again Yijing Wang
@ 2013-07-22 17:39 ` Bjorn Helgaas
  2013-07-22 17:48   ` Yinghai Lu
  2013-07-23 17:40 ` Bjorn Helgaas
  2013-07-24  2:01 ` Yijing Wang
  5 siblings, 1 reply; 26+ messages in thread
From: Bjorn Helgaas @ 2013-07-22 17:39 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-pci@vger.kernel.org, Yijing Wang, stable@vger.kernel.org

On Fri, Jul 19, 2013 at 1:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> Found hot-remove pcie card with sriov enabled cause crash in v3.10.
>
> It is regression caused by commit ba518e3c177547dfebf7fa7252cea0c850e7ce25
> (PCI: pciehp: Iterate over all devices in slot, not functions 0-7)

Can you post the dmesg or console log showing the crash?  If somebody
else sees this crash, having the log in the mailing list archive will
help them figure out that this is the fix they need.

> That commit change to use bus->devices to iterate devices under
> bus to run pci_stop_and_remove_bus_device().
> Actually it duplicates the problem with those bus->devices iteratation
> that we try to fix in commit ac205b7bb72fa4227d2e79979bbe2b4687cdf44d
> (PCI: make sriov work with hotplug remove)
>
> Change to iterate reversely as we did last time.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Yijing Wang <wangyijing@huawei.com>
> Cc: <stable@vger.kernel.org> v3.9+
>
> ---
>  drivers/pci/hotplug/pciehp_pci.c |    8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/drivers/pci/hotplug/pciehp_pci.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/hotplug/pciehp_pci.c
> +++ linux-2.6/drivers/pci/hotplug/pciehp_pci.c
> @@ -92,7 +92,13 @@ int pciehp_unconfigure_device(struct slo
>         if (ret)
>                 presence = 0;
>
> -       list_for_each_entry_safe(dev, temp, &parent->devices, bus_list) {
> +       /*
> +        * Need to iterate device reversely, as during
> +        * stop PF driver, VF will be removed, the list_for_each
> +        * could point to removed VF with temp.
> +        */
> +       list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
> +                                        bus_list) {
>                 pci_dev_get(dev);
>                 if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) {
>                         pci_read_config_byte(dev, PCI_BRIDGE_CONTROL, &bctl);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Fix hotplug remove with sriov again
  2013-07-22 17:39 ` Bjorn Helgaas
@ 2013-07-22 17:48   ` Yinghai Lu
  0 siblings, 0 replies; 26+ messages in thread
From: Yinghai Lu @ 2013-07-22 17:48 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci@vger.kernel.org, Yijing Wang, stable@vger.kernel.org

On Mon, Jul 22, 2013 at 10:39 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, Jul 19, 2013 at 1:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> Found hot-remove pcie card with sriov enabled cause crash in v3.10.
>>
>> It is regression caused by commit ba518e3c177547dfebf7fa7252cea0c850e7ce25
>> (PCI: pciehp: Iterate over all devices in slot, not functions 0-7)
>
> Can you post the dmesg or console log showing the crash?  If somebody
> else sees this crash, having the log in the mailing list archive will
> help them figure out that this is the fix they need.

Rescue:~ # echo -n 0 > /sys/bus/pci/slots/2/power
[ 5445.937864] pci_hotplug: power_write_file: power = 0
[ 5445.938153] pciehp 0000:00:03.0:pcie04: disable_slot: physical_slot = 2
[ 5445.949164] pciehp 0000:00:03.0:pcie04: pciehp_get_power_status:
SLOTCTRL a8 value read 1f9
[ 5445.949672] pciehp 0000:00:03.0:pcie04: pciehp_unconfigure_device:
domain:bus:dev = 0000:02:00
[ 5445.969878] mlx4_core 0000:02:00.0: Disabling SR-IOV
[ 5446.215792] mlx4_core 0000:02:00.0: Received reset from slave:1
[ 5446.217265]   free irq_desc for 1174
[ 5446.217567]   free irq_desc for 1175
[ 5446.229637]   free irq_desc for 1176
[ 5446.229691]   free irq_desc for 1177
[ 5446.230758] pci 0000:02:00.1: freeing pci_dev info
[ 5446.463034] mlx4_core 0000:02:00.0: Received reset from slave:2
[ 5446.463984]   free irq_desc for 1178
[ 5446.464274]   free irq_desc for 1179
[ 5446.479397]   free irq_desc for 1180
[ 5446.479804]   free irq_desc for 1181
[ 5446.480364] pci 0000:02:00.2: freeing pci_dev info
[ 5446.718817] mlx4_core 0000:02:00.0: Received reset from slave:3
[ 5446.719737]   free irq_desc for 1182
[ 5446.720123]   free irq_desc for 1183
[ 5446.729482]   free irq_desc for 1184
[ 5446.729533]   free irq_desc for 1185
[ 5446.730128] pci 0000:02:00.3: freeing pci_dev info
[ 5446.967233] mlx4_core 0000:02:00.0: Received reset from slave:4
[ 5446.968242]   free irq_desc for 1186
[ 5446.968508]   free irq_desc for 1187
[ 5446.979492]   free irq_desc for 1188
[ 5446.979905]   free irq_desc for 1189
[ 5446.981335] pci 0000:02:00.4: freeing pci_dev info
[ 5447.215089] mlx4_core 0000:02:00.0: Received reset from slave:5
[ 5447.216123]   free irq_desc for 1190
[ 5447.216460]   free irq_desc for 1191
[ 5447.229640]   free irq_desc for 1192
[ 5447.229692]   free irq_desc for 1193
[ 5447.230212] pci 0000:02:00.5: freeing pci_dev info
[ 5447.463557] mlx4_core 0000:02:00.0: Received reset from slave:6
[ 5447.464562]   free irq_desc for 1194
[ 5447.464867]   free irq_desc for 1195
[ 5447.479527]   free irq_desc for 1196
[ 5447.479579]   free irq_desc for 1197
[ 5447.480518] pci 0000:02:00.6: freeing pci_dev info
[ 5447.731734] mlx4_core 0000:02:00.0: Received reset from slave:7
[ 5447.732717]   free irq_desc for 1198
[ 5447.733032]   free irq_desc for 1199
[ 5447.749623]   free irq_desc for 1200
[ 5447.749675]   free irq_desc for 1201
[ 5447.750238] pci 0000:02:00.7: freeing pci_dev info
[ 5447.966630] mlx4_core 0000:02:00.0: Received reset from slave:8
[ 5447.967646]   free irq_desc for 1202
[ 5447.967947]   free irq_desc for 1203
[ 5447.979694]   free irq_desc for 1204
[ 5447.979746]   free irq_desc for 1205
[ 5447.980168] pci 0000:02:01.0: freeing pci_dev info
[ 5448.222466] mlx4_core 0000:02:00.0: Received reset from slave:9
[ 5448.223455]   free irq_desc for 1206
[ 5448.223830]   free irq_desc for 1207
[ 5448.239969]   free irq_desc for 1208
[ 5448.240022]   free irq_desc for 1209
[ 5448.240782] pci 0000:02:01.1: freeing pci_dev info
[ 5448.478746] mlx4_core 0000:02:00.0: Received reset from slave:10
[ 5448.479740]   free irq_desc for 1210
[ 5448.480071]   free irq_desc for 1211
[ 5448.489773]   free irq_desc for 1212
[ 5448.490298]   free irq_desc for 1213
[ 5448.491051] pci 0000:02:01.2: freeing pci_dev info
[ 5448.720910] mlx4_core 0000:02:00.0: Received reset from slave:11
[ 5448.721946]   free irq_desc for 1214
[ 5448.722198]   free irq_desc for 1215
[ 5448.739749]   free irq_desc for 1216
[ 5448.739800]   free irq_desc for 1217
[ 5448.740644] pci 0000:02:01.3: freeing pci_dev info
[ 5448.970280] mlx4_core 0000:02:00.0: Received reset from slave:12
[ 5448.971247]   free irq_desc for 1218
[ 5448.971515]   free irq_desc for 1219
[ 5448.989981]   free irq_desc for 1220
[ 5448.990405]   free irq_desc for 1221
[ 5448.991519] pci 0000:02:01.4: freeing pci_dev info
[ 5449.223394] mlx4_core 0000:02:00.0: Received reset from slave:13
[ 5449.224348]   free irq_desc for 1222
[ 5449.224651]   free irq_desc for 1223
[ 5449.239806]   free irq_desc for 1224
[ 5449.239872]   free irq_desc for 1225
[ 5449.240365] pci 0000:02:01.5: freeing pci_dev info
[ 5449.483929] mlx4_core 0000:02:00.0: Received reset from slave:14
[ 5449.484909]   free irq_desc for 1226
[ 5449.485193]   free irq_desc for 1227
[ 5449.499872]   free irq_desc for 1228
[ 5449.499939]   free irq_desc for 1229
[ 5449.500420] pci 0000:02:01.6: freeing pci_dev info
[ 5449.736091] mlx4_core 0000:02:00.0: Received reset from slave:15
[ 5449.737126]   free irq_desc for 1230
[ 5449.737430]   free irq_desc for 1231
[ 5449.749963]   free irq_desc for 1232
[ 5449.750016]   free irq_desc for 1233
[ 5449.750655] pci 0000:02:01.7: freeing pci_dev info
[ 5449.976423] mlx4_core 0000:02:00.0: Received reset from slave:16
[ 5449.977463]   free irq_desc for 1234
[ 5449.977704]   free irq_desc for 1235
[ 5449.989977]   free irq_desc for 1236
[ 5449.990027]   free irq_desc for 1237
[ 5449.990567] pci 0000:02:02.0: freeing pci_dev info
[ 5450.203801] mlx4_core 0000:02:00.0: Received reset from slave:17
[ 5450.204757]   free irq_desc for 1238
[ 5450.205063]   free irq_desc for 1239
[ 5450.220138]   free irq_desc for 1240
[ 5450.220526]   free irq_desc for 1241
[ 5450.221462] pci 0000:02:02.1: freeing pci_dev info
[ 5450.459183] mlx4_core 0000:02:00.0: Received reset from slave:18
[ 5450.460094]   free irq_desc for 1242
[ 5450.460433]   free irq_desc for 1243
[ 5450.473318]   free irq_desc for 1244
[ 5450.473657]   free irq_desc for 1245
[ 5450.474055] pci 0000:02:02.2: freeing pci_dev info
[ 5450.716685] mlx4_core 0000:02:00.0: Received reset from slave:19
[ 5450.717715]   free irq_desc for 1246
[ 5450.717954]   free irq_desc for 1247
[ 5450.730326]   free irq_desc for 1248
[ 5450.730376]   free irq_desc for 1249
[ 5450.731344] pci 0000:02:02.3: freeing pci_dev info
[ 5450.963028] mlx4_core 0000:02:00.0: Received reset from slave:20
[ 5450.963980]   free irq_desc for 1250
[ 5450.964309]   free irq_desc for 1251
[ 5450.980321]   free irq_desc for 1252
[ 5450.980407]   free irq_desc for 1253
[ 5450.981486] pci 0000:02:02.4: freeing pci_dev info
[ 5451.216104] mlx4_core 0000:02:00.0: Received reset from slave:21
[ 5451.217006]   free irq_desc for 1254
[ 5451.217402]   free irq_desc for 1255
[ 5451.230218]   free irq_desc for 1256
[ 5451.230580]   free irq_desc for 1257
[ 5451.231433] pci 0000:02:02.5: freeing pci_dev info
[ 5451.468278] mlx4_core 0000:02:00.0: Received reset from slave:22
[ 5451.469250]   free irq_desc for 1258
[ 5451.469527]   free irq_desc for 1259
[ 5451.480235]   free irq_desc for 1260
[ 5451.480286]   free irq_desc for 1261
[ 5451.480821] pci 0000:02:02.6: freeing pci_dev info
[ 5451.708808] mlx4_core 0000:02:00.0: Received reset from slave:23
[ 5451.709831]   free irq_desc for 1262
[ 5451.710109]   free irq_desc for 1263
[ 5451.720320]   free irq_desc for 1264
[ 5451.720774]   free irq_desc for 1265
[ 5451.721286] pci 0000:02:02.7: freeing pci_dev info
[ 5451.967833] mlx4_core 0000:02:00.0: Received reset from slave:24
[ 5451.968861]   free irq_desc for 1266
[ 5451.969165]   free irq_desc for 1267
[ 5451.980340]   free irq_desc for 1268
[ 5451.980725]   free irq_desc for 1269
[ 5451.981814] pci 0000:02:03.0: freeing pci_dev info
[ 5452.215387] mlx4_core 0000:02:00.0: Received reset from slave:25
[ 5452.216410]   free irq_desc for 1270
[ 5452.216711]   free irq_desc for 1271
[ 5452.230791]   free irq_desc for 1272
[ 5452.230841]   free irq_desc for 1273
[ 5452.231804] pci 0000:02:03.1: freeing pci_dev info
[ 5452.479753] mlx4_core 0000:02:00.0: Received reset from slave:26
[ 5452.480677]   free irq_desc for 1274
[ 5452.481045]   free irq_desc for 1275
[ 5452.490411]   free irq_desc for 1276
[ 5452.490460]   free irq_desc for 1277
[ 5452.491277] pci 0000:02:03.2: freeing pci_dev info
[ 5452.728961] mlx4_core 0000:02:00.0: Received reset from slave:27
[ 5452.729876]   free irq_desc for 1278
[ 5452.730469]   free irq_desc for 1279
[ 5452.730525]   free irq_desc for 1280
[ 5452.730575]   free irq_desc for 1281
[ 5452.731132] pci 0000:02:03.3: freeing pci_dev info
[ 5452.968678] mlx4_core 0000:02:00.0: Received reset from slave:28
[ 5452.969655]   free irq_desc for 1282
[ 5452.969939]   free irq_desc for 1283
[ 5452.980494]   free irq_desc for 1284
[ 5452.980558]   free irq_desc for 1285
[ 5452.981101] pci 0000:02:03.4: freeing pci_dev info
[ 5453.223899] mlx4_core 0000:02:00.0: Received reset from slave:29
[ 5453.224896]   free irq_desc for 1286
[ 5453.225157]   free irq_desc for 1287
[ 5453.240827]   free irq_desc for 1288
[ 5453.240877]   free irq_desc for 1289
[ 5453.241357] pci 0000:02:03.5: freeing pci_dev info
[ 5453.480534] mlx4_core 0000:02:00.0: Received reset from slave:30
[ 5453.481469]   free irq_desc for 1290
[ 5453.481754]   free irq_desc for 1291
[ 5453.500604]   free irq_desc for 1292
[ 5453.501013]   free irq_desc for 1293
[ 5453.501753] pci 0000:02:03.6: freeing pci_dev info
[ 5453.753006] mlx4_core 0000:02:00.0: Received reset from slave:31
[ 5453.754081]   free irq_desc for 1294
[ 5453.754395]   free irq_desc for 1295
[ 5453.770626]   free irq_desc for 1296
[ 5453.770689]   free irq_desc for 1297
[ 5453.771160] pci 0000:02:03.7: freeing pci_dev info
[ 5453.996813] mlx4_core 0000:02:00.0: Received reset from slave:32
[ 5453.997762]   free irq_desc for 1298
[ 5453.998103]   free irq_desc for 1299
[ 5454.010692]   free irq_desc for 1300
[ 5454.010742]   free irq_desc for 1301
[ 5454.011668] pci 0000:02:04.0: freeing pci_dev info
[ 5454.248027] mlx4_core 0000:02:00.0: Received reset from slave:33
[ 5454.248994]   free irq_desc for 1302
[ 5454.249287]   free irq_desc for 1303
[ 5454.260953]   free irq_desc for 1304
[ 5454.261279]   free irq_desc for 1305
[ 5454.261696] pci 0000:02:04.1: freeing pci_dev info
[ 5454.497131] mlx4_core 0000:02:00.0: Received reset from slave:34
[ 5454.498033]   free irq_desc for 1306
[ 5454.498368]   free irq_desc for 1307
[ 5454.510715]   free irq_desc for 1308
[ 5454.511040]   free irq_desc for 1309
[ 5454.511637] pci 0000:02:04.2: freeing pci_dev info
[ 5454.756143] mlx4_core 0000:02:00.0: Received reset from slave:35
[ 5454.757169]   free irq_desc for 1310
[ 5454.757450]   free irq_desc for 1311
[ 5454.770945]   free irq_desc for 1312
[ 5454.770995]   free irq_desc for 1313
[ 5454.771448] pci 0000:02:04.3: freeing pci_dev info
[ 5455.025076] mlx4_core 0000:02:00.0: Received reset from slave:36
[ 5455.026029]   free irq_desc for 1314
[ 5455.026291]   free irq_desc for 1315
[ 5455.040816]   free irq_desc for 1316
[ 5455.040865]   free irq_desc for 1317
[ 5455.041315] pci 0000:02:04.4: freeing pci_dev info
[ 5455.271839] mlx4_core 0000:02:00.0: Received reset from slave:37
[ 5455.272777]   free irq_desc for 1318
[ 5455.273083]   free irq_desc for 1319
[ 5455.290978]   free irq_desc for 1320
[ 5455.291240]   free irq_desc for 1321
[ 5455.291721] pci 0000:02:04.5: freeing pci_dev info
[ 5455.515457] mlx4_core 0000:02:00.0: Received reset from slave:38
[ 5455.516424]   free irq_desc for 1322
[ 5455.516538]   free irq_desc for 1323
[ 5455.516587]   free irq_desc for 1324
[ 5455.516635]   free irq_desc for 1325
[ 5455.517140] pci 0000:02:04.6: freeing pci_dev info
[ 5455.757061] mlx4_core 0000:02:00.0: Received reset from slave:39
[ 5455.757998]   free irq_desc for 1326
[ 5455.758287]   free irq_desc for 1327
[ 5455.770962]   free irq_desc for 1328
[ 5455.771028]   free irq_desc for 1329
[ 5455.771488] pci 0000:02:04.7: freeing pci_dev info
[ 5456.005454] mlx4_core 0000:02:00.0: Received reset from slave:40
[ 5456.006393]   free irq_desc for 1330
[ 5456.006644]   free irq_desc for 1331
[ 5456.020994]   free irq_desc for 1332
[ 5456.021057]   free irq_desc for 1333
[ 5456.021527] pci 0000:02:05.0: freeing pci_dev info
[ 5456.249555] mlx4_core 0000:02:00.0: Received reset from slave:41
[ 5456.250495]   free irq_desc for 1334
[ 5456.250798]   free irq_desc for 1335
[ 5456.261188]   free irq_desc for 1336
[ 5456.261251]   free irq_desc for 1337
[ 5456.262184] pci 0000:02:05.1: freeing pci_dev info
[ 5456.496812] mlx4_core 0000:02:00.0: Received reset from slave:42
[ 5456.497782]   free irq_desc for 1338
[ 5456.498102]   free irq_desc for 1339
[ 5456.511340]   free irq_desc for 1340
[ 5456.511403]   free irq_desc for 1341
[ 5456.511908] pci 0000:02:05.2: freeing pci_dev info
[ 5456.752499] mlx4_core 0000:02:00.0: Received reset from slave:43
[ 5456.753720]   free irq_desc for 1342
[ 5456.753972]   free irq_desc for 1343
[ 5456.771268]   free irq_desc for 1344
[ 5456.771329]   free irq_desc for 1345
[ 5456.771727] pci 0000:02:05.3: freeing pci_dev info
[ 5457.007932] mlx4_core 0000:02:00.0: Received reset from slave:44
[ 5457.008893]   free irq_desc for 1346
[ 5457.009117]   free irq_desc for 1347
[ 5457.021232]   free irq_desc for 1348
[ 5457.021472]   free irq_desc for 1349
[ 5457.022457] pci 0000:02:05.4: freeing pci_dev info
[ 5457.261323] mlx4_core 0000:02:00.0: Received reset from slave:45
[ 5457.262191]   free irq_desc for 1350
[ 5457.262480]   free irq_desc for 1351
[ 5457.281299]   free irq_desc for 1352
[ 5457.281346]   free irq_desc for 1353
[ 5457.281710] pci 0000:02:05.5: freeing pci_dev info
[ 5457.513048] mlx4_core 0000:02:00.0: Received reset from slave:46
[ 5457.513944]   free irq_desc for 1354
[ 5457.514207]   free irq_desc for 1355
[ 5457.531258]   free irq_desc for 1356
[ 5457.531306]   free irq_desc for 1357
[ 5457.531670] pci 0000:02:05.6: freeing pci_dev info
[ 5457.760620] mlx4_core 0000:02:00.0: Received reset from slave:47
[ 5457.761494]   free irq_desc for 1358
[ 5457.761794]   free irq_desc for 1359
[ 5457.771264]   free irq_desc for 1360
[ 5457.771312]   free irq_desc for 1361
[ 5457.771773] pci 0000:02:05.7: freeing pci_dev info
[ 5458.008640] mlx4_core 0000:02:00.0: Received reset from slave:48
[ 5458.009717]   free irq_desc for 1362
[ 5458.009947]   free irq_desc for 1363
[ 5458.021417]   free irq_desc for 1364
[ 5458.021795]   free irq_desc for 1365
[ 5458.022351] pci 0000:02:06.0: freeing pci_dev info
[ 5458.252793] mlx4_core 0000:02:00.0: Received reset from slave:49
[ 5458.253677]   free irq_desc for 1366
[ 5458.253972]   free irq_desc for 1367
[ 5458.271677]   free irq_desc for 1368
[ 5458.271724]   free irq_desc for 1369
[ 5458.272084] pci 0000:02:06.1: freeing pci_dev info
[ 5458.489329] mlx4_core 0000:02:00.0: Received reset from slave:50
[ 5458.490216]   free irq_desc for 1370
[ 5458.490505]   free irq_desc for 1371
[ 5458.501513]   free irq_desc for 1372
[ 5458.501776]   free irq_desc for 1373
[ 5458.502129] pci 0000:02:06.2: freeing pci_dev info
[ 5458.731210] mlx4_core 0000:02:00.0: Received reset from slave:51
[ 5458.732090]   free irq_desc for 1374
[ 5458.732437]   free irq_desc for 1375
[ 5458.751457]   free irq_desc for 1376
[ 5458.751503]   free irq_desc for 1377
[ 5458.751859] pci 0000:02:06.3: freeing pci_dev info
[ 5458.973317] mlx4_core 0000:02:00.0: Received reset from slave:52
[ 5458.974191]   free irq_desc for 1378
[ 5458.974489]   free irq_desc for 1379
[ 5458.991634]   free irq_desc for 1380
[ 5458.991894]   free irq_desc for 1381
[ 5458.992269] pci 0000:02:06.4: freeing pci_dev info
[ 5459.228617] mlx4_core 0000:02:00.0: Received reset from slave:53
[ 5459.229475]   free irq_desc for 1382
[ 5459.229736]   free irq_desc for 1383
[ 5459.241833]   free irq_desc for 1384
[ 5459.242083]   free irq_desc for 1385
[ 5459.242445] pci 0000:02:06.5: freeing pci_dev info
[ 5459.457288] mlx4_core 0000:02:00.0: Received reset from slave:54
[ 5459.458150]   free irq_desc for 1386
[ 5459.458462]   free irq_desc for 1387
[ 5459.471683]   free irq_desc for 1388
[ 5459.471927]   free irq_desc for 1389
[ 5459.472283] pci 0000:02:06.6: freeing pci_dev info
[ 5459.713679] mlx4_core 0000:02:00.0: Received reset from slave:55
[ 5459.714556]   free irq_desc for 1390
[ 5459.714867]   free irq_desc for 1391
[ 5459.731666]   free irq_desc for 1392
[ 5459.731711]   free irq_desc for 1393
[ 5459.732073] pci 0000:02:06.7: freeing pci_dev info
[ 5459.957250] mlx4_core 0000:02:00.0: Received reset from slave:56
[ 5459.958154]   free irq_desc for 1394
[ 5459.958417]   free irq_desc for 1395
[ 5459.971922]   free irq_desc for 1396
[ 5459.971968]   free irq_desc for 1397
[ 5459.972321] pci 0000:02:07.0: freeing pci_dev info
[ 5460.191680] mlx4_core 0000:02:00.0: Received reset from slave:57
[ 5460.192550]   free irq_desc for 1398
[ 5460.192787]   free irq_desc for 1399
[ 5460.211683]   free irq_desc for 1400
[ 5460.211729]   free irq_desc for 1401
[ 5460.212086] pci 0000:02:07.1: freeing pci_dev info
[ 5460.433863] mlx4_core 0000:02:00.0: Received reset from slave:58
[ 5460.434719]   free irq_desc for 1402
[ 5460.435006]   free irq_desc for 1403
[ 5460.451914]   free irq_desc for 1404
[ 5460.451959]   free irq_desc for 1405
[ 5460.452316] pci 0000:02:07.2: freeing pci_dev info
[ 5460.673364] mlx4_core 0000:02:00.0: Received reset from slave:59
[ 5460.674225]   free irq_desc for 1406
[ 5460.674555]   free irq_desc for 1407
[ 5460.692072]   free irq_desc for 1408
[ 5460.692338]   free irq_desc for 1409
[ 5460.692695] pci 0000:02:07.3: freeing pci_dev info
[ 5460.922618] mlx4_core 0000:02:00.0: Received reset from slave:60
[ 5460.923478]   free irq_desc for 1410
[ 5460.923741]   free irq_desc for 1411
[ 5460.941851]   free irq_desc for 1412
[ 5460.941897]   free irq_desc for 1413
[ 5460.942254] pci 0000:02:07.4: freeing pci_dev info
[ 5461.161658] mlx4_core 0000:02:00.0: Received reset from slave:61
[ 5461.162534]   free irq_desc for 1414
[ 5461.162851]   free irq_desc for 1415
[ 5461.182001]   free irq_desc for 1416
[ 5461.182251]   free irq_desc for 1417
[ 5461.182608] pci 0000:02:07.5: freeing pci_dev info
[ 5461.412367] mlx4_core 0000:02:00.0: Received reset from slave:62
[ 5461.413303]   free irq_desc for 1418
[ 5461.413350]   free irq_desc for 1419
[ 5461.413394]   free irq_desc for 1420
[ 5461.413441]   free irq_desc for 1421
[ 5461.413991] pci 0000:02:07.6: freeing pci_dev info
[ 5461.664620] mlx4_core 0000:02:00.0: Received reset from slave:63
[ 5461.665489]   free irq_desc for 1422
[ 5461.665732]   free irq_desc for 1423
[ 5461.682056]   free irq_desc for 1424
[ 5461.682104]   free irq_desc for 1425
[ 5461.682465] pci 0000:02:07.7: freeing pci_dev info
[ 5463.037163] mlx4_core 0000:02:00.0: command 0x14 failed: fw status = 0x9
[ 5463.037635] mlx4_core 0000:02:00.0: vhcr command:0x14 slave:0
failed with error:0, status -9
[ 5463.052599] mlx4_core 0000:02:00.0: HW2SW_EQ failed (-5)
[ 5463.052605] remove_mtt_ok-761: state RES_MTT_ALLOCATED, ref_count 1
[ 5463.052607] mlx4_core 0000:02:00.0: vhcr command:0xf01 slave:0
failed with error:0, status -16
[ 5463.052611] mlx4_core 0000:02:00.0: Failed to free mtt range at:512 order:9
[ 5463.054664] mlx4_core 0000:02:00.0: command 0x14 failed: fw status = 0x9
[ 5463.054669] mlx4_core 0000:02:00.0: vhcr command:0x14 slave:0
failed with error:0, status -9
[ 5463.054673] mlx4_core 0000:02:00.0: HW2SW_EQ failed (-5)
[ 5463.054678] remove_mtt_ok-761: state RES_MTT_ALLOCATED, ref_count 1
[ 5463.054681] mlx4_core 0000:02:00.0: vhcr command:0xf01 slave:0
failed with error:0, status -16
[ 5463.054684] mlx4_core 0000:02:00.0: Failed to free mtt range at:1024 order:9
[ 5463.056586] mlx4_core 0000:02:00.0: command 0x14 failed: fw status = 0x9
[ 5463.056591] mlx4_core 0000:02:00.0: vhcr command:0x14 slave:0
failed with error:0, status -9
[ 5463.056595] mlx4_core 0000:02:00.0: HW2SW_EQ failed (-5)
[ 5463.056599] remove_mtt_ok-761: state RES_MTT_ALLOCATED, ref_count 1
[ 5463.056602] mlx4_core 0000:02:00.0: vhcr command:0xf01 slave:0
failed with error:0, status -16
[ 5463.056605] mlx4_core 0000:02:00.0: Failed to free mtt range at:1536 order:9
[ 5463.058522] mlx4_core 0000:02:00.0: command 0x14 failed: fw status = 0x9
[ 5463.058527] mlx4_core 0000:02:00.0: vhcr command:0x14 slave:0
failed with error:0, status -9
[ 5463.058531] mlx4_core 0000:02:00.0: HW2SW_EQ failed (-5)
[ 5463.058534] remove_mtt_ok-761: state RES_MTT_ALLOCATED, ref_count 1
[ 5463.058537] mlx4_core 0000:02:00.0: vhcr command:0xf01 slave:0
failed with error:0, status -16
[ 5463.058540] mlx4_core 0000:02:00.0: Failed to free mtt range at:32 order:2
[ 5463.072428]   free irq_desc for 1170
[ 5463.072475]   free irq_desc for 1171
[ 5463.072520]   free irq_desc for 1172
[ 5463.072565]   free irq_desc for 1173
[ 5464.075050] pci 0000:02:00.0: freeing pci_dev info
[ 5464.075364] ------------[ cut here ]------------
[ 5464.075602] WARNING: CPU: 20 PID: 25098 at include/linux/kref.h:47
kobject_get+0x40/0x60()
[ 5464.092574] Modules linked in:
[ 5464.092748] CPU: 20 PID: 25098 Comm: bash Not tainted
3.10.0-yh-04644-g692d9ae-dirty #1787
[ 5464.112615] Hardware name: Oracle Corporation  unknown       /
, BIOS 11016600    05/17/2011
[ 5464.112620]  0000000000000009 ffff885011f51cd8 ffffffff820b9680
0000000000004880
[ 5464.112623]  0000000000000000 ffff885011f51d18 ffffffff81096957
ffffffff82b28f20
[ 5464.112627]  ffff8880263dd000 0000000000000000 ffff883027056828
ffff882025616200
[ 5464.112629] Call Trace:
[ 5464.112640]  [<ffffffff820b9680>] dump_stack+0x46/0x58
[ 5464.112649]  [<ffffffff81096957>] warn_slowpath_common+0x87/0xb0
[ 5464.112654]  [<ffffffff8109699a>] warn_slowpath_null+0x1a/0x20
[ 5464.112657]  [<ffffffff81504aa0>] kobject_get+0x40/0x60
[ 5464.112664]  [<ffffffff81760db7>] get_device+0x17/0x30
[ 5464.112671]  [<ffffffff815387a2>] pci_dev_get+0x22/0x30
[ 5464.112677]  [<ffffffff8154e758>] pciehp_unconfigure_device+0xa8/0x190
[ 5464.112681]  [<ffffffff8154e150>] pciehp_disable_slot+0x160/0x200
[ 5464.112683]  [<ffffffff8154e494>] pciehp_sysfs_disable_slot+0x64/0x130
[ 5464.112686]  [<ffffffff8154d03a>] disable_slot+0x5a/0x70
[ 5464.112695]  [<ffffffff8154997f>] power_write_file+0xaf/0x130
[ 5464.112699]  [<ffffffff8153f471>] pci_slot_attr_store+0x21/0x30
[ 5464.112708]  [<ffffffff8124272b>] sysfs_write_file+0x10b/0x160
[ 5464.112718]  [<ffffffff811d1b1b>] vfs_write+0xeb/0x1d0
[ 5464.112721]  [<ffffffff811d1fb5>] SyS_write+0x55/0xb0
[ 5464.112728]  [<ffffffff820d609a>] tracesys+0xd4/0xd9
[ 5464.112730] ---[ end trace a3110aa40b91f7fc ]---
[ 5464.112744] pci : freeing pci_dev info
[ 5464.112745] ------------[ cut here ]------------
[ 5464.112753] WARNING: CPU: 20 PID: 25098 at lib/list_debug.c:56
__list_del_entry+0x63/0xe0()
[ 5464.112755] list_del corruption, ffff8880263dd000->prev is
LIST_POISON2 (dead000000200200)
[ 5464.112756] Modules linked in:
[ 5464.112759] CPU: 20 PID: 25098 Comm: bash Tainted: G        W
3.10.0-yh-04644-g692d9ae-dirty #1787
[ 5464.112760] Hardware name: Oracle Corporation  unknown       /
, BIOS 11016600    05/17/2011
[ 5464.112764]  0000000000000009 ffff885011f51bc8 ffffffff820b9680
0000000000000d70
[ 5464.112768]  ffff885011f51c18 ffff885011f51c08 ffffffff81096957
ffff885011f51c58
[ 5464.112771]  ffff8880263dd000 ffff8880263dd000 ffff8880263dd098
ffff8880263dd0a8
[ 5464.112772] Call Trace:
[ 5464.112776]  [<ffffffff820b9680>] dump_stack+0x46/0x58
[ 5464.112779]  [<ffffffff81096957>] warn_slowpath_common+0x87/0xb0
[ 5464.112782]  [<ffffffff81096a36>] warn_slowpath_fmt+0x46/0x50
[ 5464.112786]  [<ffffffff81518273>] __list_del_entry+0x63/0xe0
[ 5464.112790]  [<ffffffff81518301>] list_del+0x11/0x40
[ 5464.112797]  [<ffffffff8153064c>] pci_release_dev+0x4c/0x150
[ 5464.112801]  [<ffffffff817610f5>] device_release+0xa5/0x110
[ 5464.112804]  [<ffffffff81504c4f>] kobject_release+0x6f/0x90
[ 5464.112807]  [<ffffffff81504b0c>] kobject_put+0x4c/0x60
[ 5464.112810]  [<ffffffff81760e07>] put_device+0x17/0x20
[ 5464.112812]  [<ffffffff815386aa>] pci_dev_put+0x1a/0x20
[ 5464.112815]  [<ffffffff8154e812>] pciehp_unconfigure_device+0x162/0x190
[ 5464.112818]  [<ffffffff8154e150>] pciehp_disable_slot+0x160/0x200
[ 5464.112821]  [<ffffffff8154e494>] pciehp_sysfs_disable_slot+0x64/0x130
[ 5464.112823]  [<ffffffff8154d03a>] disable_slot+0x5a/0x70
[ 5464.112826]  [<ffffffff8154997f>] power_write_file+0xaf/0x130
[ 5464.112830]  [<ffffffff8153f471>] pci_slot_attr_store+0x21/0x30
[ 5464.112833]  [<ffffffff8124272b>] sysfs_write_file+0x10b/0x160
[ 5464.112836]  [<ffffffff811d1b1b>] vfs_write+0xeb/0x1d0
[ 5464.112839]  [<ffffffff811d1fb5>] SyS_write+0x55/0xb0
[ 5464.112842]  [<ffffffff820d609a>] tracesys+0xd4/0xd9
[ 5464.112844] ---[ end trace a3110aa40b91f7fd ]---
[ 5464.112859] BUG: unable to handle kernel NULL pointer dereference
at           (null)
[ 5464.112863] IP: [<ffffffff8154e815>] pciehp_unconfigure_device+0x165/0x190
[ 5464.112866] PGD 0
[ 5464.112868] Oops: 0000 [#1] SMP
[ 5464.112871] Modules linked in:
[ 5464.112873] CPU: 20 PID: 25098 Comm: bash Tainted: G        W
3.10.0-yh-04644-g692d9ae-dirty #1787
[ 5464.112874] Hardware name: Oracle Corporation  unknown       /
, BIOS 11016600    05/17/2011
[ 5464.112875] task: ffff8850278acb40 ti: ffff885011f50000 task.ti:
ffff885011f50000
[ 5464.112878] RIP: 0010:[<ffffffff8154e815>]  [<ffffffff8154e815>]
pciehp_unconfigure_device+0x165/0x190
[ 5464.112879] RSP: 0018:ffff885011f51d88  EFLAGS: 00010296
[ 5464.112881] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000181000094
[ 5464.112882] RDX: 0000000181000095 RSI: 0000000000000001 RDI: ffff88103d807d00
[ 5464.112883] RBP: ffff885011f51db8 R08: 0000000000000007 R09: ffff88303e7d70c0
[ 5464.112884] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 5464.112886] R13: ffff883027056828 R14: ffff882025616200 R15: ffff886025383ea0
[ 5464.112888] FS:  00007f6ef7efa700(0000) GS:ffff88303e600000(0000)
knlGS:0000000000000000
[ 5464.112889] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5464.112891] CR2: 0000000000000000 CR3: 0000003010ce7000 CR4: 00000000000007e0
[ 5464.112892] Stack:
[ 5464.112895]  ffff885011f51da8 fefb010000000002 ffff882025616000
ffff882025616200
[ 5464.112898]  ffff882025616200 ffffffff822a36a0 ffff885011f51df8
ffffffff8154e150
[ 5464.112900]  0000000000000001 01ff882025616000 ffff882025616000
ffff8820256160f8
[ 5464.112900] Call Trace:
[ 5464.112903]  [<ffffffff8154e150>] pciehp_disable_slot+0x160/0x200
[ 5464.112905]  [<ffffffff8154e494>] pciehp_sysfs_disable_slot+0x64/0x130
[ 5464.112908]  [<ffffffff8154d03a>] disable_slot+0x5a/0x70
[ 5464.112910]  [<ffffffff8154997f>] power_write_file+0xaf/0x130
[ 5464.112913]  [<ffffffff8153f471>] pci_slot_attr_store+0x21/0x30
[ 5464.112916]  [<ffffffff8124272b>] sysfs_write_file+0x10b/0x160
[ 5464.112919]  [<ffffffff811d1b1b>] vfs_write+0xeb/0x1d0
[ 5464.112921]  [<ffffffff811d1fb5>] SyS_write+0x55/0xb0
[ 5464.112923]  [<ffffffff820d609a>] tracesys+0xd4/0xd9
[ 5464.112947] Code: ba 04 00 00 00 48 8b 7b 10 66 81 e1 fb fe 80 cd
04 66 89 4d de 0f b7 c9 e8 39 fd fd ff 48 89 df 4c 89 e3 e8 7e 9e fe
ff 4c 89 e0 <4d> 8b 24 24 4c 39 e8 0f 85 2e ff ff ff e9 21 ff ff ff 66
0f 1f
[ 5464.112949] RIP  [<ffffffff8154e815>] pciehp_unconfigure_device+0x165/0x190
[ 5464.112950]  RSP <ffff885011f51d88>
[ 5464.112951] CR2: 0000000000000000
[ 5464.113457] ---[ end trace a3110aa40b91f7fe ]---

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Separate stop and remove devices in pciehp
  2013-07-19 19:14 ` [PATCH] PCI: Separate stop and remove devices in pciehp Yinghai Lu
@ 2013-07-22 21:23   ` Bjorn Helgaas
  2013-07-23  2:32     ` Yinghai Lu
  0 siblings, 1 reply; 26+ messages in thread
From: Bjorn Helgaas @ 2013-07-22 21:23 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-pci@vger.kernel.org, Yijing Wang, Jiang Liu

Evidently this is really part of a series, but you didn't label it
that way.  It looks like this applies on top of your "PCI: Fix hotplug
remove with sriov again" patch?

On Fri, Jul 19, 2013 at 1:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
> (PCI: Simplify IOV implementation and fix reference count races)
> VF need to be removed via virtfn_remove to make sure ref to PF
> is put back.

I'm sure this makes sense, but it needs background.  I haven't figured
out exactly what the problem is.  You're describing the lowest-level
details, but not the overall picture that would make it understandable
to the rest of us.

> So we can not call stop_and_remove for VF before PF, that will
> make VF get removed directly before PF's driver try to use
> virtfn_remove to do it.
>
> Solution is separating stop and remove in two iterations.
>
> In first iteration, we stop VF driver at first with iterating devices
> reversely, and later during stop PF driver, disable_sriov will use
> virtfn_remove to remove VFs.
>
> Also some driver (like mlx4_core) need VF's driver get stopped before PF.

If this is relevant, please give a pointer to the mlx4_core code that
requires VFs to be stopped before the PF.

> To make it simple, separate VGA checking out and do that at first,
> if there is VGA in the chain, do even try to stop or remove any device
> under that bus.

This part seems like it could be a separate patch.

> Need this one for v3.11.

OK, but why?  We need a better explanation of what problem this fixes.
 It sounds like it fixes a reference counting problem in v3.11-rc1,
but I don't know what the effect of that problem is.  Maybe it means a
device isn't freed when it should be?  Maybe it means we can't add a
new device after hot-removing an SR-IOV device?

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Yijing Wang <wangyijing@huawei.com>
> Cc: Jiang Liu <liuj97@gmail.com>
>
> ---
>  drivers/pci/hotplug/pciehp_pci.c |   46 ++++++++++++++++++++++++++-------------
>  drivers/pci/remove.c             |    6 +++--
>  include/linux/pci.h              |    2 +
>  3 files changed, 37 insertions(+), 17 deletions(-)
>
> Index: linux-2.6/drivers/pci/hotplug/pciehp_pci.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/hotplug/pciehp_pci.c
> +++ linux-2.6/drivers/pci/hotplug/pciehp_pci.c
> @@ -92,26 +92,37 @@ int pciehp_unconfigure_device(struct slo
>         if (ret)
>                 presence = 0;
>
> +       /* check if VGA is around */
> +       if (presence) {
> +               list_for_each_entry(dev, &parent->devices, bus_list) {
> +                       if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> +                               pci_read_config_byte(dev, PCI_BRIDGE_CONTROL,
> +                                                       &bctl);
> +                               if (bctl & PCI_BRIDGE_CTL_VGA) {
> +                                       ctrl_err(ctrl,
> +                                                "Cannot remove display device %s\n",
> +                                                pci_name(dev));
> +                                       return -EINVAL;
> +                               }
> +                       }
> +               }
> +       }
> +
>         /*
> -        * Need to iterate device reversely, as during
> +        * Now VF need to be removed via virtfn_remove to make
> +        * sure ref to PF is put back. Some driver (mlx4_core) need
> +        * VF's driver get stopped before PF.
> +        * So we need to stop VF driver at first, that means
> +        * loop reversely, and later during stop PF driver,
> +        * disable_sriov will use virtfn_remove to remove VFs.
> +        * Also we can not loop without reverse, as during
>          * stop PF driver, VF will be removed, the list_for_each
>          * could point to removed VF with temp.
>          */
>         list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
> -                                        bus_list) {
> -               pci_dev_get(dev);
> -               if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) {
> -                       pci_read_config_byte(dev, PCI_BRIDGE_CONTROL, &bctl);
> -                       if (bctl & PCI_BRIDGE_CTL_VGA) {
> -                               ctrl_err(ctrl,
> -                                        "Cannot remove display device %s\n",
> -                                        pci_name(dev));
> -                               pci_dev_put(dev);
> -                               rc = -EINVAL;
> -                               break;
> -                       }
> -               }
> -               pci_stop_and_remove_bus_device(dev);
> +                                                bus_list) {
> +               pci_stop_bus_device(dev);
> +
>                 /*
>                  * Ensure that no new Requests will be generated from
>                  * the device.
> @@ -122,6 +133,11 @@ int pciehp_unconfigure_device(struct slo
>                         command |= PCI_COMMAND_INTX_DISABLE;
>                         pci_write_config_word(dev, PCI_COMMAND, command);
>                 }
> +       }
> +
> +       list_for_each_entry_safe(dev, temp, &parent->devices, bus_list) {
> +               pci_dev_get(dev);
> +               pci_remove_bus_device(dev);
>                 pci_dev_put(dev);
>         }
>
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -56,7 +56,7 @@ void pci_remove_bus(struct pci_bus *bus)
>  }
>  EXPORT_SYMBOL(pci_remove_bus);
>
> -static void pci_stop_bus_device(struct pci_dev *dev)
> +void pci_stop_bus_device(struct pci_dev *dev)
>  {
>         struct pci_bus *bus = dev->subordinate;
>         struct pci_dev *child, *tmp;
> @@ -75,8 +75,9 @@ static void pci_stop_bus_device(struct p
>
>         pci_stop_dev(dev);
>  }
> +EXPORT_SYMBOL(pci_stop_bus_device);
>
> -static void pci_remove_bus_device(struct pci_dev *dev)
> +void pci_remove_bus_device(struct pci_dev *dev)
>  {
>         struct pci_bus *bus = dev->subordinate;
>         struct pci_dev *child, *tmp;
> @@ -92,6 +93,7 @@ static void pci_remove_bus_device(struct
>
>         pci_destroy_dev(dev);
>  }
> +EXPORT_SYMBOL(pci_remove_bus_device);

I really don't want to export these two symbols unless we have to.
Obviously this is the heart of your patch, so we probably *will* have
to.

But maybe they can at least be confined to drivers/pci code, and not
exported to modules?  I don't think there's any reason pciehp needs to
be a module.  I was planning to merge a patch restricting it to be
built-in this cycle anyway.

>  /**
>   * pci_stop_and_remove_bus_device - remove a PCI device and any children
> Index: linux-2.6/include/linux/pci.h
> ===================================================================
> --- linux-2.6.orig/include/linux/pci.h
> +++ linux-2.6/include/linux/pci.h
> @@ -754,6 +754,8 @@ u8 pci_common_swizzle(struct pci_dev *de
>  struct pci_dev *pci_dev_get(struct pci_dev *dev);
>  void pci_dev_put(struct pci_dev *dev);
>  void pci_remove_bus(struct pci_bus *b);
> +void pci_stop_bus_device(struct pci_dev *dev);
> +void pci_remove_bus_device(struct pci_dev *dev);
>  void pci_stop_and_remove_bus_device(struct pci_dev *dev);
>  void pci_stop_root_bus(struct pci_bus *bus);
>  void pci_remove_root_bus(struct pci_bus *bus);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Stop sriov before remove PF
  2013-07-19 19:14 ` [PATCH] PCI: Stop sriov before remove PF Yinghai Lu
  2013-07-19 21:46   ` Alexander Duyck
@ 2013-07-22 23:15   ` Bjorn Helgaas
  2013-07-23  1:59     ` Yinghai Lu
  1 sibling, 1 reply; 26+ messages in thread
From: Bjorn Helgaas @ 2013-07-22 23:15 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-pci@vger.kernel.org, Jiang Liu, Alexander Duyck,
	Donald Dutile, Greg Rose

On Fri, Jul 19, 2013 at 1:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
> (PCI: Simplify IOV implementation and fix reference count races)
> VF need to be removed via virtfn_remove to make sure ref to PF
> is put back.
>
> Some driver (like ixgbe) does not call pci_disable_sriov() if
> sriov is enabled via /sys/.../sriov_numvfs setting.
> ixgbe does allow driver for PF get detached, but still have VFs
> around.
>
> But how about PF get removed via /sys or pciehp?
>
> During hot-remove, VF will still hold one ref to PF and it
> prevent PF to be removed.
> That make the next hot-add fails, as old PF dev struct is still around.
>
> We need to add pci_disable_sriov() calling during pci dev removing.
>
> Need this one for v3.11

Needs explanation.  Pretend Linus is asking why we should put this in
after the merge window :)

I think the answer is that dc087f2f introduced a regression in certain
hot-remove/hot-add scenarios, but an example transcript showing the
issue would help a lot.

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Jiang Liu <liuj97@gmail.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Cc: Donald Dutile <ddutile@redhat.com>
> Cc: Greg Rose <gregory.v.rose@intel.com>
>
> ---
>  drivers/pci/remove.c |    3 +++
>  1 file changed, 3 insertions(+)
>
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -34,6 +34,9 @@ static void pci_stop_dev(struct pci_dev
>
>  static void pci_destroy_dev(struct pci_dev *dev)
>  {
> +       /* remove VF, if PF driver skip that */
> +       pci_disable_sriov(dev);

How did you decide to call pci_disable_sriov() here rather than, for
example, in pci_stop_dev()?  We already have some PME and ASPM cleanup
in pci_stop_dev(), and this seems sort of similar to those.

>         down_write(&pci_bus_sem);
>         list_del(&dev->bus_list);
>         up_write(&pci_bus_sem);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Stop sriov before remove PF
  2013-07-22 23:15   ` Bjorn Helgaas
@ 2013-07-23  1:59     ` Yinghai Lu
  0 siblings, 0 replies; 26+ messages in thread
From: Yinghai Lu @ 2013-07-23  1:59 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci@vger.kernel.org, Jiang Liu, Alexander Duyck,
	Donald Dutile, Greg Rose

On Mon, Jul 22, 2013 at 4:15 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, Jul 19, 2013 at 1:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>> (PCI: Simplify IOV implementation and fix reference count races)
>> VF need to be removed via virtfn_remove to make sure ref to PF
>> is put back.
>>
>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>> sriov is enabled via /sys/.../sriov_numvfs setting.
>> ixgbe does allow driver for PF get detached, but still have VFs
>> around.
>>
>> But how about PF get removed via /sys or pciehp?
>>
>> During hot-remove, VF will still hold one ref to PF and it
>> prevent PF to be removed.
>> That make the next hot-add fails, as old PF dev struct is still around.
>>
>> We need to add pci_disable_sriov() calling during pci dev removing.
>>
>> Need this one for v3.11
>
> Needs explanation.  Pretend Linus is asking why we should put this in
> after the merge window :)
>
> I think the answer is that dc087f2f introduced a regression in certain
> hot-remove/hot-add scenarios, but an example transcript showing the
> issue would help a lot.

for intel 10GB ethernet, user could use /sys/../num_vfs to enable SRIOV
after PF driver is loaded.

If the cards in slots for pciehp, when user press button or use
/sys/bus/pci/slots/../power to turn off the power.

The PF driver will be stopped (but it does not call pci_disable_sriov),
and try to remove the PF, then turn off the power.
Actually the PF's pci_dev struct is not freed, because VFs still hold
some reference to it.

That is not fun, hotadd will not work, as old pci_dev struct is still there.
those VF struct still have old ref to it.

>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> Cc: Jiang Liu <liuj97@gmail.com>
>> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
>> Cc: Donald Dutile <ddutile@redhat.com>
>> Cc: Greg Rose <gregory.v.rose@intel.com>
>>
>> ---
>>  drivers/pci/remove.c |    3 +++
>>  1 file changed, 3 insertions(+)
>>
>> Index: linux-2.6/drivers/pci/remove.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/pci/remove.c
>> +++ linux-2.6/drivers/pci/remove.c
>> @@ -34,6 +34,9 @@ static void pci_stop_dev(struct pci_dev
>>
>>  static void pci_destroy_dev(struct pci_dev *dev)
>>  {
>> +       /* remove VF, if PF driver skip that */
>> +       pci_disable_sriov(dev);
>
> How did you decide to call pci_disable_sriov() here rather than, for
> example, in pci_stop_dev()?  We already have some PME and ASPM cleanup
> in pci_stop_dev(), and this seems sort of similar to those.

yes, pci_stop_dev is better.

I was thinking that pci_stop_dev could be used when PF's driver is
unloaded or detached.

Yinghai

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Separate stop and remove devices in pciehp
  2013-07-22 21:23   ` Bjorn Helgaas
@ 2013-07-23  2:32     ` Yinghai Lu
  2013-07-23 15:56       ` Bjorn Helgaas
  0 siblings, 1 reply; 26+ messages in thread
From: Yinghai Lu @ 2013-07-23  2:32 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci@vger.kernel.org, Yijing Wang, Jiang Liu

On Mon, Jul 22, 2013 at 2:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Evidently this is really part of a series, but you didn't label it
> that way.  It looks like this applies on top of your "PCI: Fix hotplug
> remove with sriov again" patch?

Yes.

that one need be back ported to 3.9 and later.

this one need to be for 3.11 final, as Jiang's patch is in 3.11-rc0.

>
> On Fri, Jul 19, 2013 at 1:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>> (PCI: Simplify IOV implementation and fix reference count races)
>> VF need to be removed via virtfn_remove to make sure ref to PF
>> is put back.
>
> I'm sure this makes sense, but it needs background.  I haven't figured
> out exactly what the problem is.  You're describing the lowest-level
> details, but not the overall picture that would make it understandable
> to the rest of us.

Overall, after we reversely loop the bus_devices, VF get stop and removed,
but fail to release ref to PF, and prevent PF to be removed and freed.

>
>> So we can not call stop_and_remove for VF before PF, that will
>> make VF get removed directly before PF's driver try to use
>> virtfn_remove to do it.
>>
>> Solution is separating stop and remove in two iterations.
>>
>> In first iteration, we stop VF driver at first with iterating devices
>> reversely, and later during stop PF driver, disable_sriov will use
>> virtfn_remove to remove VFs.
>>
>> Also some driver (like mlx4_core) need VF's driver get stopped before PF.
>
> If this is relevant, please give a pointer to the mlx4_core code that
> requires VFs to be stopped before the PF.

that is add-on benefits.

drivers/net/ethernet/mellanox/mlx4/main.c::
static void mlx4_remove_one(struct pci_dev *pdev)
{
        struct mlx4_dev  *dev  = pci_get_drvdata(pdev);
        struct mlx4_priv *priv = mlx4_priv(dev);
        int p;

        if (dev) {
                /* in SRIOV it is not allowed to unload the pf's
                 * driver while there are alive vf's */
                if (mlx4_is_master(dev)) {
                        if (mlx4_how_many_lives_vf(dev))
                                printk(KERN_ERR "Removing PF when
there are assigned VF's !!!\n");
                }

mlx4_how_many_lives_vf() is checking how VFs have driver loaded.

>
>> To make it simple, separate VGA checking out and do that at first,
>> if there is VGA in the chain, do even try to stop or remove any device
>> under that bus.
>
> This part seems like it could be a separate patch.

ok, will separate it out in next rev.

>
>> Need this one for v3.11.
>
> OK, but why?  We need a better explanation of what problem this fixes.
>  It sounds like it fixes a reference counting problem in v3.11-rc1,
> but I don't know what the effect of that problem is.  Maybe it means a
> device isn't freed when it should be?  Maybe it means we can't add a
> new device after hot-removing an SR-IOV device?

Yes.

>
...
>> Index: linux-2.6/drivers/pci/remove.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/pci/remove.c
>> +++ linux-2.6/drivers/pci/remove.c
>> @@ -56,7 +56,7 @@ void pci_remove_bus(struct pci_bus *bus)
>>  }
>>  EXPORT_SYMBOL(pci_remove_bus);
>>
>> -static void pci_stop_bus_device(struct pci_dev *dev)
>> +void pci_stop_bus_device(struct pci_dev *dev)
>>  {
>>         struct pci_bus *bus = dev->subordinate;
>>         struct pci_dev *child, *tmp;
>> @@ -75,8 +75,9 @@ static void pci_stop_bus_device(struct p
>>
>>         pci_stop_dev(dev);
>>  }
>> +EXPORT_SYMBOL(pci_stop_bus_device);
>>
>> -static void pci_remove_bus_device(struct pci_dev *dev)
>> +void pci_remove_bus_device(struct pci_dev *dev)
>>  {
>>         struct pci_bus *bus = dev->subordinate;
>>         struct pci_dev *child, *tmp;
>> @@ -92,6 +93,7 @@ static void pci_remove_bus_device(struct
>>
>>         pci_destroy_dev(dev);
>>  }
>> +EXPORT_SYMBOL(pci_remove_bus_device);
>
> I really don't want to export these two symbols unless we have to.
> Obviously this is the heart of your patch, so we probably *will* have
> to.

Agree.

>
> But maybe they can at least be confined to drivers/pci code, and not
> exported to modules?  I don't think there's any reason pciehp needs to
> be a module.  I was planning to merge a patch restricting it to be
> built-in this cycle anyway.

sure, you can apply that patch (make pciehp to be built-in) at first.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Stop sriov before remove PF
  2013-07-19 23:22       ` Alexander Duyck
@ 2013-07-23 15:34         ` Don Dutile
  2013-07-23 16:10           ` Yinghai Lu
  0 siblings, 1 reply; 26+ messages in thread
From: Don Dutile @ 2013-07-23 15:34 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Yinghai Lu, Bjorn Helgaas, linux-pci@vger.kernel.org, Jiang Liu,
	Greg Rose

On 07/19/2013 07:22 PM, Alexander Duyck wrote:
> On 07/19/2013 03:44 PM, Yinghai Lu wrote:
>> On Fri, Jul 19, 2013 at 2:46 PM, Alexander Duyck
>> <alexander.h.duyck@intel.com>  wrote:
>>> On 07/19/2013 12:14 PM, Yinghai Lu wrote:
>>>> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>>>> (PCI: Simplify IOV implementation and fix reference count races)
>>>> VF need to be removed via virtfn_remove to make sure ref to PF
>>>> is put back.
>>>>
>>>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>>>> sriov is enabled via /sys/.../sriov_numvfs setting.
>>>> ixgbe does allow driver for PF get detached, but still have VFs
>>>> around.
>>>>
>>>> But how about PF get removed via /sys or pciehp?
>>>>
>>>> During hot-remove, VF will still hold one ref to PF and it
>>>> prevent PF to be removed.
>>>> That make the next hot-add fails, as old PF dev struct is still around.
>>>>
>>>> We need to add pci_disable_sriov() calling during pci dev removing.
>>>>
>>>> Need this one for v3.11
>>>>
>>>> Signed-off-by: Yinghai Lu<yinghai@kernel.org>
>>>> Cc: Jiang Liu<liuj97@gmail.com>
>>>> Cc: Alexander Duyck<alexander.h.duyck@intel.com>
>>>> Cc: Donald Dutile<ddutile@redhat.com>
>>>> Cc: Greg Rose<gregory.v.rose@intel.com>
>>>>
>>>> ---
>>>>   drivers/pci/remove.c |    3 +++
>>>>   1 file changed, 3 insertions(+)
>>>>
>>>> Index: linux-2.6/drivers/pci/remove.c
>>>> ===================================================================
>>>> --- linux-2.6.orig/drivers/pci/remove.c
>>>> +++ linux-2.6/drivers/pci/remove.c
>>>> @@ -34,6 +34,9 @@ static void pci_stop_dev(struct pci_dev
>>>>
>>>>   static void pci_destroy_dev(struct pci_dev *dev)
>>>>   {
>>>> +     /* remove VF, if PF driver skip that */
>>>> +     pci_disable_sriov(dev);
>>>> +
>>>>        down_write(&pci_bus_sem);
>>>>        list_del(&dev->bus_list);
>>>>        up_write(&pci_bus_sem);
>>>>
>>> How are you able to hot-remove the PF if the VFs are still holding
>>> references to it?
>> usually pci_stop_and_remove_bus_device always successfully, and
>> power get turned off for that card.
>
> I'm not an expert in this area, but that doesn't seem right.  How is it
> you can remove a device if there are still outstanding references to
> it?  Is this one of those cases where we have to succeed because the
> system is removing the device and there is nothing we can do to stop it?
>
>>> The issue I see with this patch is that if the PF has any VFs direct
>>> assigned, hot plug removing the PF will cause the guests containing
>>> those VFs to panic.
>> Then you should make guest support hotplug or suprise removal.
>>
>> If the guest does panic because it does support hotplug, that is right behavior.
>>
>> Just like in bare metal machine, if it does not support hotplug, and user would
>> know what is going to happen if he remove one pcie card.
>>
>> Thanks
>>
>> Yinghai
>
> I suspect that is much easier said than done.  We probably need somebody
> familiar with the KVM side of things to address the feasibility of
> something like that.  I believe it was Greg and Don that worked on the
> original patches that made it so that we could leave the VFs in place on
> driver removal.  They would likely have a better answer as to why it is
> preferable to leave the VFs in place than panic a non-compliant guest.

The virtual effect of leaving the VFs in place was the equivalent of unplugging
the cable from the VF device in the guest. When the PF driver was reloaded, it
caused the virtual effect of the network cable being reconnected.  Before that
patch set (in ixgbe & igb), a PF driver unload in the host would result in the VF
assigned to KVM a guest caused a *host crash*.
So, start up a KVM (linux) guest, hot-remove the PF with a VF
assigned to a guest, and with these patches applied, ensure the host doesn't crash.
if it does crash, that's a regression that can't be tolerated, and this patch (set)
will need further work.
- Don

>
> Thanks,
>
> Alex


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Separate stop and remove devices in pciehp
  2013-07-23  2:32     ` Yinghai Lu
@ 2013-07-23 15:56       ` Bjorn Helgaas
  2013-07-23 22:44         ` Yinghai Lu
  2013-07-23 23:15         ` Yinghai Lu
  0 siblings, 2 replies; 26+ messages in thread
From: Bjorn Helgaas @ 2013-07-23 15:56 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-pci@vger.kernel.org, Yijing Wang, Jiang Liu

On Mon, Jul 22, 2013 at 07:32:06PM -0700, Yinghai Lu wrote:
> On Mon, Jul 22, 2013 at 2:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > Evidently this is really part of a series, but you didn't label it
> > that way.  It looks like this applies on top of your "PCI: Fix hotplug
> > remove with sriov again" patch?
> 
> Yes.
> 
> that one need be back ported to 3.9 and later.
> 
> this one need to be for 3.11 final, as Jiang's patch is in 3.11-rc0.
> 
> >
> > On Fri, Jul 19, 2013 at 1:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> >> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
> >> (PCI: Simplify IOV implementation and fix reference count races)
> >> VF need to be removed via virtfn_remove to make sure ref to PF
> >> is put back.
> >
> > I'm sure this makes sense, but it needs background.  I haven't figured
> > out exactly what the problem is.  You're describing the lowest-level
> > details, but not the overall picture that would make it understandable
> > to the rest of us.
> 
> Overall, after we reversely loop the bus_devices, VF get stop and removed,
> but fail to release ref to PF, and prevent PF to be removed and freed.
> 
> >
> >> So we can not call stop_and_remove for VF before PF, that will
> >> make VF get removed directly before PF's driver try to use
> >> virtfn_remove to do it.
> >>
> >> Solution is separating stop and remove in two iterations.
> >>
> >> In first iteration, we stop VF driver at first with iterating devices
> >> reversely, and later during stop PF driver, disable_sriov will use
> >> virtfn_remove to remove VFs.
> >>
> >> Also some driver (like mlx4_core) need VF's driver get stopped before PF.
> >
> > If this is relevant, please give a pointer to the mlx4_core code that
> > requires VFs to be stopped before the PF.
> 
> that is add-on benefits.
> 
> drivers/net/ethernet/mellanox/mlx4/main.c::
> static void mlx4_remove_one(struct pci_dev *pdev)
> {
>         struct mlx4_dev  *dev  = pci_get_drvdata(pdev);
>         struct mlx4_priv *priv = mlx4_priv(dev);
>         int p;
> 
>         if (dev) {
>                 /* in SRIOV it is not allowed to unload the pf's
>                  * driver while there are alive vf's */
>                 if (mlx4_is_master(dev)) {
>                         if (mlx4_how_many_lives_vf(dev))
>                                 printk(KERN_ERR "Removing PF when
> there are assigned VF's !!!\n");
>                 }
> 
> mlx4_how_many_lives_vf() is checking how VFs have driver loaded.
> 
> >
> >> To make it simple, separate VGA checking out and do that at first,
> >> if there is VGA in the chain, do even try to stop or remove any device
> >> under that bus.
> >
> > This part seems like it could be a separate patch.
> 
> ok, will separate it out in next rev.
> 
> >
> >> Need this one for v3.11.
> >
> > OK, but why?  We need a better explanation of what problem this fixes.
> >  It sounds like it fixes a reference counting problem in v3.11-rc1,
> > but I don't know what the effect of that problem is.  Maybe it means a
> > device isn't freed when it should be?  Maybe it means we can't add a
> > new device after hot-removing an SR-IOV device?
> 
> Yes.

Can you include an example that shows the failure, like you did for
the "Fix hotplug remove" patch?

> ...
> >> Index: linux-2.6/drivers/pci/remove.c
> >> ===================================================================
> >> --- linux-2.6.orig/drivers/pci/remove.c
> >> +++ linux-2.6/drivers/pci/remove.c
> >> @@ -56,7 +56,7 @@ void pci_remove_bus(struct pci_bus *bus)
> >>  }
> >>  EXPORT_SYMBOL(pci_remove_bus);
> >>
> >> -static void pci_stop_bus_device(struct pci_dev *dev)
> >> +void pci_stop_bus_device(struct pci_dev *dev)
> >>  {
> >>         struct pci_bus *bus = dev->subordinate;
> >>         struct pci_dev *child, *tmp;
> >> @@ -75,8 +75,9 @@ static void pci_stop_bus_device(struct p
> >>
> >>         pci_stop_dev(dev);
> >>  }
> >> +EXPORT_SYMBOL(pci_stop_bus_device);
> >>
> >> -static void pci_remove_bus_device(struct pci_dev *dev)
> >> +void pci_remove_bus_device(struct pci_dev *dev)
> >>  {
> >>         struct pci_bus *bus = dev->subordinate;
> >>         struct pci_dev *child, *tmp;
> >> @@ -92,6 +93,7 @@ static void pci_remove_bus_device(struct
> >>
> >>         pci_destroy_dev(dev);
> >>  }
> >> +EXPORT_SYMBOL(pci_remove_bus_device);
> >
> > I really don't want to export these two symbols unless we have to.
> > Obviously this is the heart of your patch, so we probably *will* have
> > to.
> 
> Agree.
> 
> >
> > But maybe they can at least be confined to drivers/pci code, and not
> > exported to modules?  I don't think there's any reason pciehp needs to
> > be a module.  I was planning to merge a patch restricting it to be
> > built-in this cycle anyway.
> 
> sure, you can apply that patch (make pciehp to be built-in) at first.

Below is the patch I intend to apply.  We can easily do this for v3.12.
But your patch needs to be in v3.11, so we'll have to figure out how
to handle that.  Maybe we can put the Kconfig change in v3.11, too.
It should be low-risk because it doesn't add any new code paths that
weren't in v3.10.

Bjorn


commit 04216ce0f2381090572142ebab28f63bae157d8d
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Thu Jun 27 10:16:19 2013 -0600

    PCI: pciehp: Convert pciehp to be builtin only, not modular
    
    Convert pciehp to be builtin only, with no module option.
    
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index a82e70a..7958e59 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -14,15 +14,12 @@ config PCIEPORTBUS
 # Include service Kconfig here
 #
 config HOTPLUG_PCI_PCIE
-	tristate "PCI Express Hotplug driver"
+	bool "PCI Express Hotplug driver"
 	depends on HOTPLUG_PCI && PCIEPORTBUS
 	help
 	  Say Y here if you have a motherboard that supports PCI Express Native
 	  Hotplug
 
-	  To compile this driver as a module, choose M here: the
-	  module will be called pciehp.
-
 	  When in doubt, say N.
 
 source "drivers/pci/pcie/aer/Kconfig"

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Stop sriov before remove PF
  2013-07-23 15:34         ` Don Dutile
@ 2013-07-23 16:10           ` Yinghai Lu
  0 siblings, 0 replies; 26+ messages in thread
From: Yinghai Lu @ 2013-07-23 16:10 UTC (permalink / raw)
  To: Don Dutile
  Cc: Alexander Duyck, Bjorn Helgaas, linux-pci@vger.kernel.org,
	Jiang Liu, Greg Rose

On Tue, Jul 23, 2013 at 8:34 AM, Don Dutile <ddutile@redhat.com> wrote:
>>
>> I suspect that is much easier said than done.  We probably need somebody
>> familiar with the KVM side of things to address the feasibility of
>> something like that.  I believe it was Greg and Don that worked on the
>> original patches that made it so that we could leave the VFs in place on
>> driver removal.  They would likely have a better answer as to why it is
>> preferable to leave the VFs in place than panic a non-compliant guest.
>
>
> The virtual effect of leaving the VFs in place was the equivalent of
> unplugging
> the cable from the VF device in the guest. When the PF driver was reloaded,
> it
> caused the virtual effect of the network cable being reconnected.  Before
> that
> patch set (in ixgbe & igb), a PF driver unload in the host would result in
> the VF
> assigned to KVM a guest caused a *host crash*.
> So, start up a KVM (linux) guest, hot-remove the PF with a VF
> assigned to a guest, and with these patches applied, ensure the host doesn't
> crash.
> if it does crash, that's a regression that can't be tolerated, and this
> patch (set)
> will need further work.

at beginning, we have pcie native hotlpug working even with sriov enabled.

later patchset (in ixgbe & igb) will not call disable_sriov, that cause

hotplug does not work anymore, so that is first regression at all.

Anyway, guest host crash is not regression when PF get removed, that
is old behavior when guest/sriov pci passthrough support is added.

So need to pci_stub to notify guest to do hotremove, when PF's driver
get detached?

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Fix hotplug remove with sriov again
  2013-07-19 19:14 [PATCH] PCI: Fix hotplug remove with sriov again Yinghai Lu
                   ` (3 preceding siblings ...)
  2013-07-22 17:39 ` Bjorn Helgaas
@ 2013-07-23 17:40 ` Bjorn Helgaas
  2013-07-24  2:01 ` Yijing Wang
  5 siblings, 0 replies; 26+ messages in thread
From: Bjorn Helgaas @ 2013-07-23 17:40 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-pci@vger.kernel.org, Yijing Wang, stable@vger.kernel.org

On Fri, Jul 19, 2013 at 1:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> Found hot-remove pcie card with sriov enabled cause crash in v3.10.
>
> It is regression caused by commit ba518e3c177547dfebf7fa7252cea0c850e7ce25
> (PCI: pciehp: Iterate over all devices in slot, not functions 0-7)
>
> That commit change to use bus->devices to iterate devices under
> bus to run pci_stop_and_remove_bus_device().
> Actually it duplicates the problem with those bus->devices iteratation
> that we try to fix in commit ac205b7bb72fa4227d2e79979bbe2b4687cdf44d
> (PCI: make sriov work with hotplug remove)
>
> Change to iterate reversely as we did last time.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Yijing Wang <wangyijing@huawei.com>
> Cc: <stable@vger.kernel.org> v3.9+

Applied to for-linus for v3.11, thanks.

> ---
>  drivers/pci/hotplug/pciehp_pci.c |    8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/drivers/pci/hotplug/pciehp_pci.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/hotplug/pciehp_pci.c
> +++ linux-2.6/drivers/pci/hotplug/pciehp_pci.c
> @@ -92,7 +92,13 @@ int pciehp_unconfigure_device(struct slo
>         if (ret)
>                 presence = 0;
>
> -       list_for_each_entry_safe(dev, temp, &parent->devices, bus_list) {
> +       /*
> +        * Need to iterate device reversely, as during
> +        * stop PF driver, VF will be removed, the list_for_each
> +        * could point to removed VF with temp.
> +        */
> +       list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
> +                                        bus_list) {
>                 pci_dev_get(dev);
>                 if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) {
>                         pci_read_config_byte(dev, PCI_BRIDGE_CONTROL, &bctl);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Separate stop and remove devices in pciehp
  2013-07-23 15:56       ` Bjorn Helgaas
@ 2013-07-23 22:44         ` Yinghai Lu
  2013-07-23 23:15         ` Yinghai Lu
  1 sibling, 0 replies; 26+ messages in thread
From: Yinghai Lu @ 2013-07-23 22:44 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci@vger.kernel.org, Yijing Wang, Jiang Liu

On Tue, Jul 23, 2013 at 8:56 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Can you include an example that shows the failure, like you did for
> the "Fix hotplug remove" patch?

Rescue:~ # echo -n 0 > /sys/bus/pci/slots/2/power
[  342.236825] pci_hotplug: power_write_file: power = 0
[  342.237149] pciehp 0000:00:03.0:pcie04: disable_slot: physical_slot = 2
[  342.247370] pciehp 0000:00:03.0:pcie04: pciehp_get_power_status:
SLOTCTRL a8 value read 1f9
[  342.247922] pciehp 0000:00:03.0:pcie04: pciehp_unconfigure_device:
domain:bus:dev = 0000:02:00
[  342.467075] mlx4_core 0000:02:00.0: Received reset from slave:63
[  342.467909]   free irq_desc for 1422
[  342.468154]   free irq_desc for 1423
[  342.468181]   free irq_desc for 1424
[  342.468208]   free irq_desc for 1425
[  342.469192] pci 0000:02:07.7: freeing pci_dev info
[  342.708175] mlx4_core 0000:02:00.0: Received reset from slave:62
[  342.708990]   free irq_desc for 1418
[  342.709223]   free irq_desc for 1419
[  342.727400]   free irq_desc for 1420
[  342.727656]   free irq_desc for 1421
[  342.728072] pci 0000:02:07.6: freeing pci_dev info
[  342.946429] mlx4_core 0000:02:00.0: Received reset from slave:61
[  342.947283]   free irq_desc for 1414
[  342.947511]   free irq_desc for 1415
[  342.957275]   free irq_desc for 1416
[  342.957301]   free irq_desc for 1417
[  342.957722] pci 0000:02:07.5: freeing pci_dev info
[  343.189692] mlx4_core 0000:02:00.0: Received reset from slave:60
[  343.190545]   free irq_desc for 1410
[  343.190814]   free irq_desc for 1411
[  343.207337]   free irq_desc for 1412
[  343.207576]   free irq_desc for 1413
[  343.207985] pci 0000:02:07.4: freeing pci_dev info
[  343.410663] mlx4_core 0000:02:00.0: Received reset from slave:59
[  343.411639]   free irq_desc for 1406
[  343.411880]   free irq_desc for 1407
[  343.427370]   free irq_desc for 1408
[  343.427412]   free irq_desc for 1409
[  343.427741] pci 0000:02:07.3: freeing pci_dev info
[  343.630818] mlx4_core 0000:02:00.0: Received reset from slave:58
[  343.631652]   free irq_desc for 1402
[  343.631842]   free irq_desc for 1403
[  343.647404]   free irq_desc for 1404
[  343.647433]   free irq_desc for 1405
[  343.647769] pci 0000:02:07.2: freeing pci_dev info
[  343.877043] mlx4_core 0000:02:00.0: Received reset from slave:57
[  343.877884]   free irq_desc for 1398
[  343.878128]   free irq_desc for 1399
[  343.887382]   free irq_desc for 1400
[  343.887409]   free irq_desc for 1401
[  343.887827] pci 0000:02:07.1: freeing pci_dev info
[  344.117975] mlx4_core 0000:02:00.0: Received reset from slave:56
[  344.118777]   free irq_desc for 1394
[  344.118978]   free irq_desc for 1395
[  344.137653]   free irq_desc for 1396
[  344.137683]   free irq_desc for 1397
[  344.138013] pci 0000:02:07.0: freeing pci_dev info
[  344.353527] mlx4_core 0000:02:00.0: Received reset from slave:55
[  344.354368]   free irq_desc for 1390
[  344.354616]   free irq_desc for 1391
[  344.367628]   free irq_desc for 1392
[  344.367654]   free irq_desc for 1393
[  344.368080] pci 0000:02:06.7: freeing pci_dev info
[  344.578265] mlx4_core 0000:02:00.0: Received reset from slave:54
[  344.579170]   free irq_desc for 1386
[  344.579393]   free irq_desc for 1387
[  344.597591]   free irq_desc for 1388
[  344.597829]   free irq_desc for 1389
[  344.598375] pci 0000:02:06.6: freeing pci_dev info
[  344.823697] mlx4_core 0000:02:00.0: Received reset from slave:53
[  344.824566]   free irq_desc for 1382
[  344.824796]   free irq_desc for 1383
[  344.837932]   free irq_desc for 1384
[  344.837958]   free irq_desc for 1385
[  344.838295] pci 0000:02:06.5: freeing pci_dev info
[  345.080025] mlx4_core 0000:02:00.0: Received reset from slave:52
[  345.080991]   free irq_desc for 1378
[  345.081017]   free irq_desc for 1379
[  345.081044]   free irq_desc for 1380
[  345.081072]   free irq_desc for 1381
[  345.081673] pci 0000:02:06.4: freeing pci_dev info
[  345.324705] mlx4_core 0000:02:00.0: Received reset from slave:51
[  345.325533]   free irq_desc for 1374
[  345.325776]   free irq_desc for 1375
[  345.337761]   free irq_desc for 1376
[  345.338003]   free irq_desc for 1377
[  345.338344] pci 0000:02:06.3: freeing pci_dev info
[  345.561761] mlx4_core 0000:02:00.0: Received reset from slave:50
[  345.562572]   free irq_desc for 1370
[  345.562854]   free irq_desc for 1371
[  345.577696]   free irq_desc for 1372
[  345.577900]   free irq_desc for 1373
[  345.578428] pci 0000:02:06.2: freeing pci_dev info
[  345.817280] mlx4_core 0000:02:00.0: Received reset from slave:49
[  345.818069]   free irq_desc for 1366
[  345.818334]   free irq_desc for 1367
[  345.827792]   free irq_desc for 1368
[  345.828041]   free irq_desc for 1369
[  345.828479] pci 0000:02:06.1: freeing pci_dev info
[  346.043593] mlx4_core 0000:02:00.0: Received reset from slave:48
[  346.044435]   free irq_desc for 1362
[  346.044696]   free irq_desc for 1363
[  346.057780]   free irq_desc for 1364
[  346.058026]   free irq_desc for 1365
[  346.058468] pci 0000:02:06.0: freeing pci_dev info
[  346.285626] mlx4_core 0000:02:00.0: Received reset from slave:47
[  346.286447]   free irq_desc for 1358
[  346.286654]   free irq_desc for 1359
[  346.298437]   free irq_desc for 1360
[  346.298468]   free irq_desc for 1361
[  346.298883] pci 0000:02:05.7: freeing pci_dev info
[  346.521972] mlx4_core 0000:02:00.0: Received reset from slave:46
[  346.522804]   free irq_desc for 1354
[  346.523021]   free irq_desc for 1355
[  346.537805]   free irq_desc for 1356
[  346.538014]   free irq_desc for 1357
[  346.538345] pci 0000:02:05.6: freeing pci_dev info
[  346.758293] mlx4_core 0000:02:00.0: Received reset from slave:45
[  346.759116]   free irq_desc for 1350
[  346.759326]   free irq_desc for 1351
[  346.777970]   free irq_desc for 1352
[  346.778225]   free irq_desc for 1353
[  346.778594] pci 0000:02:05.5: freeing pci_dev info
[  347.002991] mlx4_core 0000:02:00.0: Received reset from slave:44
[  347.003844]   free irq_desc for 1346
[  347.004075]   free irq_desc for 1347
[  347.017996]   free irq_desc for 1348
[  347.018022]   free irq_desc for 1349
[  347.018428] pci 0000:02:05.4: freeing pci_dev info
[  347.246432] mlx4_core 0000:02:00.0: Received reset from slave:43
[  347.247341]   free irq_desc for 1342
[  347.247584]   free irq_desc for 1343
[  347.258171]   free irq_desc for 1344
[  347.258198]   free irq_desc for 1345
[  347.258711] pci 0000:02:05.3: freeing pci_dev info
[  347.479329] mlx4_core 0000:02:00.0: Received reset from slave:42
[  347.480199]   free irq_desc for 1338
[  347.480428]   free irq_desc for 1339
[  347.498712]   free irq_desc for 1340
[  347.498740]   free irq_desc for 1341
[  347.499261] pci 0000:02:05.2: freeing pci_dev info
[  347.722833] mlx4_core 0000:02:00.0: Received reset from slave:41
[  347.723694]   free irq_desc for 1334
[  347.723909]   free irq_desc for 1335
[  347.738117]   free irq_desc for 1336
[  347.738145]   free irq_desc for 1337
[  347.738865] pci 0000:02:05.1: freeing pci_dev info
[  347.966906] mlx4_core 0000:02:00.0: Received reset from slave:40
[  347.967833]   free irq_desc for 1330
[  347.968129]   free irq_desc for 1331
[  347.978209]   free irq_desc for 1332
[  347.978235]   free irq_desc for 1333
[  347.978779] pci 0000:02:05.0: freeing pci_dev info
[  348.202972] mlx4_core 0000:02:00.0: Received reset from slave:39
[  348.203801]   free irq_desc for 1326
[  348.204106]   free irq_desc for 1327
[  348.218210]   free irq_desc for 1328
[  348.218239]   free irq_desc for 1329
[  348.218707] pci 0000:02:04.7: freeing pci_dev info
[  348.440833] mlx4_core 0000:02:00.0: Received reset from slave:38
[  348.441566]   free irq_desc for 1322
[  348.441592]   free irq_desc for 1323
[  348.441618]   free irq_desc for 1324
[  348.441645]   free irq_desc for 1325
[  348.442210] pci 0000:02:04.6: freeing pci_dev info
[  348.674736] mlx4_core 0000:02:00.0: Received reset from slave:37
[  348.675620]   free irq_desc for 1318
[  348.675836]   free irq_desc for 1319
[  348.688249]   free irq_desc for 1320
[  348.688279]   free irq_desc for 1321
[  348.688913] pci 0000:02:04.5: freeing pci_dev info
[  348.915084] mlx4_core 0000:02:00.0: Received reset from slave:36
[  348.915924]   free irq_desc for 1314
[  348.916166]   free irq_desc for 1315
[  348.928316]   free irq_desc for 1316
[  348.928342]   free irq_desc for 1317
[  348.928891] pci 0000:02:04.4: freeing pci_dev info
[  349.163498] mlx4_core 0000:02:00.0: Received reset from slave:35
[  349.164328]   free irq_desc for 1310
[  349.164557]   free irq_desc for 1311
[  349.178429]   free irq_desc for 1312
[  349.178457]   free irq_desc for 1313
[  349.178925] pci 0000:02:04.3: freeing pci_dev info
[  349.408023] mlx4_core 0000:02:00.0: Received reset from slave:34
[  349.408968]   free irq_desc for 1306
[  349.409262]   free irq_desc for 1307
[  349.428433]   free irq_desc for 1308
[  349.428458]   free irq_desc for 1309
[  349.429123] pci 0000:02:04.2: freeing pci_dev info
[  349.654194] mlx4_core 0000:02:00.0: Received reset from slave:33
[  349.655147]   free irq_desc for 1302
[  349.655371]   free irq_desc for 1303
[  349.668461]   free irq_desc for 1304
[  349.668493]   free irq_desc for 1305
[  349.669050] pci 0000:02:04.1: freeing pci_dev info
[  349.891204] mlx4_core 0000:02:00.0: Received reset from slave:32
[  349.892084]   free irq_desc for 1298
[  349.892379]   free irq_desc for 1299
[  349.910554]   free irq_desc for 1300
[  349.910580]   free irq_desc for 1301
[  349.910977] pci 0000:02:04.0: freeing pci_dev info
[  350.135234] mlx4_core 0000:02:00.0: Received reset from slave:31
[  350.136159]   free irq_desc for 1294
[  350.136402]   free irq_desc for 1295
[  350.148679]   free irq_desc for 1296
[  350.148706]   free irq_desc for 1297
[  350.149284] pci 0000:02:03.7: freeing pci_dev info
[  350.375403] mlx4_core 0000:02:00.0: Received reset from slave:30
[  350.376317]   free irq_desc for 1290
[  350.376562]   free irq_desc for 1291
[  350.388723]   free irq_desc for 1292
[  350.388752]   free irq_desc for 1293
[  350.389227] pci 0000:02:03.6: freeing pci_dev info
[  350.623964] mlx4_core 0000:02:00.0: Received reset from slave:29
[  350.624553]   free irq_desc for 1286
[  350.624797]   free irq_desc for 1287
[  350.639613]   free irq_desc for 1288
[  350.639639]   free irq_desc for 1289
[  350.640102] pci 0000:02:03.5: freeing pci_dev info
[  350.859688] mlx4_core 0000:02:00.0: Received reset from slave:28
[  350.860599]   free irq_desc for 1282
[  350.860867]   free irq_desc for 1283
[  350.878685]   free irq_desc for 1284
[  350.878713]   free irq_desc for 1285
[  350.879210] pci 0000:02:03.4: freeing pci_dev info
[  351.120632] mlx4_core 0000:02:00.0: Received reset from slave:27
[  351.121627]   free irq_desc for 1278
[  351.121876]   free irq_desc for 1279
[  351.138726]   free irq_desc for 1280
[  351.138754]   free irq_desc for 1281
[  351.139239] pci 0000:02:03.3: freeing pci_dev info
[  351.371352] mlx4_core 0000:02:00.0: Received reset from slave:26
[  351.372238]   free irq_desc for 1274
[  351.372446]   free irq_desc for 1275
[  351.388802]   free irq_desc for 1276
[  351.388830]   free irq_desc for 1277
[  351.389344] pci 0000:02:03.2: freeing pci_dev info
[  351.607354] mlx4_core 0000:02:00.0: Received reset from slave:25
[  351.608232]   free irq_desc for 1270
[  351.608542]   free irq_desc for 1271
[  351.618836]   free irq_desc for 1272
[  351.618863]   free irq_desc for 1273
[  351.619517] pci 0000:02:03.1: freeing pci_dev info
[  351.847766] mlx4_core 0000:02:00.0: Received reset from slave:24
[  351.848654]   free irq_desc for 1266
[  351.848939]   free irq_desc for 1267
[  351.859021]   free irq_desc for 1268
[  351.859050]   free irq_desc for 1269
[  351.859523] pci 0000:02:03.0: freeing pci_dev info
[  352.088053] mlx4_core 0000:02:00.0: Received reset from slave:23
[  352.088968]   free irq_desc for 1262
[  352.089296]   free irq_desc for 1263
[  352.099156]   free irq_desc for 1264
[  352.099183]   free irq_desc for 1265
[  352.099819] pci 0000:02:02.7: freeing pci_dev info
[  352.315173] mlx4_core 0000:02:00.0: Received reset from slave:22
[  352.316039]   free irq_desc for 1258
[  352.316318]   free irq_desc for 1259
[  352.329593]   free irq_desc for 1260
[  352.329621]   free irq_desc for 1261
[  352.330101] pci 0000:02:02.6: freeing pci_dev info
[  352.552492] mlx4_core 0000:02:00.0: Received reset from slave:21
[  352.553419]   free irq_desc for 1254
[  352.553618]   free irq_desc for 1255
[  352.568953]   free irq_desc for 1256
[  352.569255]   free irq_desc for 1257
[  352.569765] pci 0000:02:02.5: freeing pci_dev info
[  352.795614] mlx4_core 0000:02:00.0: Received reset from slave:20
[  352.796516]   free irq_desc for 1250
[  352.796782]   free irq_desc for 1251
[  352.810942]   free irq_desc for 1252
[  352.810970]   free irq_desc for 1253
[  352.811667] pci 0000:02:02.4: freeing pci_dev info
[  353.036489] mlx4_core 0000:02:00.0: Received reset from slave:19
[  353.037399]   free irq_desc for 1246
[  353.037618]   free irq_desc for 1247
[  353.049243]   free irq_desc for 1248
[  353.049269]   free irq_desc for 1249
[  353.049919] pci 0000:02:02.3: freeing pci_dev info
[  353.268173] mlx4_core 0000:02:00.0: Received reset from slave:18
[  353.269072]   free irq_desc for 1242
[  353.269338]   free irq_desc for 1243
[  353.279047]   free irq_desc for 1244
[  353.279075]   free irq_desc for 1245
[  353.279578] pci 0000:02:02.2: freeing pci_dev info
[  353.495897] mlx4_core 0000:02:00.0: Received reset from slave:17
[  353.496792]   free irq_desc for 1238
[  353.497020]   free irq_desc for 1239
[  353.509130]   free irq_desc for 1240
[  353.509155]   free irq_desc for 1241
[  353.509591] pci 0000:02:02.1: freeing pci_dev info
[  353.727846] mlx4_core 0000:02:00.0: Received reset from slave:16
[  353.728760]   free irq_desc for 1234
[  353.729031]   free irq_desc for 1235
[  353.739199]   free irq_desc for 1236
[  353.739226]   free irq_desc for 1237
[  353.739710] pci 0000:02:02.0: freeing pci_dev info
[  353.964528] mlx4_core 0000:02:00.0: Received reset from slave:15
[  353.965411]   free irq_desc for 1230
[  353.965702]   free irq_desc for 1231
[  353.979235]   free irq_desc for 1232
[  353.979264]   free irq_desc for 1233
[  353.979754] pci 0000:02:01.7: freeing pci_dev info
[  354.192662] mlx4_core 0000:02:00.0: Received reset from slave:14
[  354.193571]   free irq_desc for 1226
[  354.193818]   free irq_desc for 1227
[  354.211294]   free irq_desc for 1228
[  354.211323]   free irq_desc for 1229
[  354.211725] pci 0000:02:01.6: freeing pci_dev info
[  354.430697] mlx4_core 0000:02:00.0: Received reset from slave:13
[  354.431291]   free irq_desc for 1222
[  354.431596]   free irq_desc for 1223
[  354.449506]   free irq_desc for 1224
[  354.449868]   free irq_desc for 1225
[  354.450335] pci 0000:02:01.5: freeing pci_dev info
[  354.671942] mlx4_core 0000:02:00.0: Received reset from slave:12
[  354.672829]   free irq_desc for 1218
[  354.673044]   free irq_desc for 1219
[  354.689367]   free irq_desc for 1220
[  354.689710]   free irq_desc for 1221
[  354.690078] pci 0000:02:01.4: freeing pci_dev info
[  354.905203] mlx4_core 0000:02:00.0: Received reset from slave:11
[  354.906106]   free irq_desc for 1214
[  354.906403]   free irq_desc for 1215
[  354.919409]   free irq_desc for 1216
[  354.919441]   free irq_desc for 1217
[  354.919868] pci 0000:02:01.3: freeing pci_dev info
[  355.136846] mlx4_core 0000:02:00.0: Received reset from slave:10
[  355.137760]   free irq_desc for 1210
[  355.138031]   free irq_desc for 1211
[  355.149449]   free irq_desc for 1212
[  355.149474]   free irq_desc for 1213
[  355.150103] pci 0000:02:01.2: freeing pci_dev info
[  355.376467] mlx4_core 0000:02:00.0: Received reset from slave:9
[  355.377307]   free irq_desc for 1206
[  355.377613]   free irq_desc for 1207
[  355.389630]   free irq_desc for 1208
[  355.389658]   free irq_desc for 1209
[  355.390091] pci 0000:02:01.1: freeing pci_dev info
[  355.614027] mlx4_core 0000:02:00.0: Received reset from slave:8
[  355.614871]   free irq_desc for 1202
[  355.615125]   free irq_desc for 1203
[  355.615151]   free irq_desc for 1204
[  355.615177]   free irq_desc for 1205
[  355.615608] pci 0000:02:01.0: freeing pci_dev info
[  355.845129] mlx4_core 0000:02:00.0: Received reset from slave:7
[  355.846105]   free irq_desc for 1198
[  355.846343]   free irq_desc for 1199
[  355.859707]   free irq_desc for 1200
[  355.859738]   free irq_desc for 1201
[  355.860170] pci 0000:02:00.7: freeing pci_dev info
[  356.080643] mlx4_core 0000:02:00.0: Received reset from slave:6
[  356.081540]   free irq_desc for 1194
[  356.081821]   free irq_desc for 1195
[  356.099602]   free irq_desc for 1196
[  356.099628]   free irq_desc for 1197
[  356.100048] pci 0000:02:00.6: freeing pci_dev info
[  356.317310] mlx4_core 0000:02:00.0: Received reset from slave:5
[  356.318150]   free irq_desc for 1190
[  356.318414]   free irq_desc for 1191
[  356.329658]   free irq_desc for 1192
[  356.329687]   free irq_desc for 1193
[  356.330131] pci 0000:02:00.5: freeing pci_dev info
[  356.560616] mlx4_core 0000:02:00.0: Received reset from slave:4
[  356.561558]   free irq_desc for 1186
[  356.561796]   free irq_desc for 1187
[  356.579690]   free irq_desc for 1188
[  356.579716]   free irq_desc for 1189
[  356.580123] pci 0000:02:00.4: freeing pci_dev info
[  356.805358] mlx4_core 0000:02:00.0: Received reset from slave:3
[  356.806231]   free irq_desc for 1182
[  356.806512]   free irq_desc for 1183
[  356.819726]   free irq_desc for 1184
[  356.819755]   free irq_desc for 1185
[  356.820244] pci 0000:02:00.3: freeing pci_dev info
[  357.037258] mlx4_core 0000:02:00.0: Received reset from slave:2
[  357.038125]   free irq_desc for 1178
[  357.038350]   free irq_desc for 1179
[  357.049932]   free irq_desc for 1180
[  357.049956]   free irq_desc for 1181
[  357.050484] pci 0000:02:00.2: freeing pci_dev info
[  357.280942] mlx4_core 0000:02:00.0: Received reset from slave:1
[  357.281875]   free irq_desc for 1174
[  357.282105]   free irq_desc for 1175
[  357.299910]   free irq_desc for 1176
[  357.299936]   free irq_desc for 1177
[  357.300380] pci 0000:02:00.1: freeing pci_dev info
[  357.666540]   free irq_desc for 1170
[  357.666769]   free irq_desc for 1171
[  357.666969]   free irq_desc for 1172
[  357.667331]   free irq_desc for 1173
[  357.679938] mlx4_core 0000:02:00.0: Disabling SR-IOV
[  359.692517] pcieport 0000:00:03.0: pcie_link_disable_set: lnk_ctrl = 50
[  359.693860] pciehp 0000:00:03.0:pcie04: pcie_isr: intr_loc 10
[  359.710375] pciehp 0000:00:03.0:pcie04: pciehp_power_off_slot:
SLOTCTRL a8 write cmd 400
[  360.713164] pciehp 0000:00:03.0:pcie04: pcie_isr: intr_loc 10
[  360.713568] pciehp 0000:00:03.0:pcie04: pciehp_green_led_off:
SLOTCTRL a8 write cmd 300
Rescue:~ #
Rescue:~ # echo -n 1 > /sys/bus/pci/slots/2/power
[  386.009999] pci_hotplug: power_write_file: power = 1
[  386.010279] pciehp 0000:00:03.0:pcie04: enable_slot: physical_slot = 2
[  386.014703] pciehp 0000:00:03.0:pcie04: pciehp_get_power_status:
SLOTCTRL a8 value read 7f9
[  386.016722] pciehp 0000:00:03.0:pcie04: pcie_isr: intr_loc 10
[  386.034851] pciehp 0000:00:03.0:pcie04: pciehp_power_on_slot:
SLOTCTRL a8 write cmd 0
[  386.054793] pcieport 0000:00:03.0: pcie_link_disable_set: lnk_ctrl = 40
[  386.055342] pciehp 0000:00:03.0:pcie04: pcie_isr: intr_loc 10
[  386.075495] pciehp 0000:00:03.0:pcie04: pciehp_green_led_blink:
SLOTCTRL a8 write cmd 200
[  386.345818] pciehp 0000:00:03.0:pcie04: check_link_active: lnk_status = f082
[  386.449877] pciehp 0000:00:03.0:pcie04: pciehp_check_link_status:
lnk_status = f082
[  386.450350] pciehp 0000:00:03.0:pcie04: Device 0000:02:00.0 already
exists at 0000:02:00, cannot hot-add
[  386.464941] pciehp 0000:00:03.0:pcie04: Cannot add device at 0000:02:00
[  386.485177] pcieport 0000:00:03.0: pcie_link_disable_set: lnk_ctrl = 50
[  386.485180] pciehp 0000:00:03.0:pcie04: check_link_active: lnk_status = f882
[  386.499216] pciehp 0000:00:03.0:pcie04: pcie_isr: intr_loc 10
[  386.499236] pciehp 0000:00:03.0:pcie04: pciehp_power_off_slot:
SLOTCTRL a8 write cmd 400
[  387.503092] pciehp 0000:00:03.0:pcie04: pcie_isr: intr_loc 10
[  387.503425] pciehp 0000:00:03.0:pcie04: pciehp_green_led_off:
SLOTCTRL a8 write cmd 300
[  387.515045] pciehp 0000:00:03.0:pcie04:
pciehp_set_attention_status: SLOTCTRL a8 write cmd 40
[  387.536437] pciehp 0000:00:03.0:pcie04: pcie_isr: intr_loc 10
-bash: echo: write error: Invalid argument

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Separate stop and remove devices in pciehp
  2013-07-23 15:56       ` Bjorn Helgaas
  2013-07-23 22:44         ` Yinghai Lu
@ 2013-07-23 23:15         ` Yinghai Lu
  2013-07-24  4:00           ` Yijing Wang
  1 sibling, 1 reply; 26+ messages in thread
From: Yinghai Lu @ 2013-07-23 23:15 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci@vger.kernel.org, Yijing Wang, Jiang Liu

[-- Attachment #1: Type: text/plain, Size: 6481 bytes --]

On Tue, Jul 23, 2013 at 8:56 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Mon, Jul 22, 2013 at 07:32:06PM -0700, Yinghai Lu wrote:
>> On Mon, Jul 22, 2013 at 2:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> > Evidently this is really part of a series, but you didn't label it
>> > that way.  It looks like this applies on top of your "PCI: Fix hotplug
>> > remove with sriov again" patch?
>>
>> Yes.
>>
>> that one need be back ported to 3.9 and later.
>>
>> this one need to be for 3.11 final, as Jiang's patch is in 3.11-rc0.
>>
>> >
>> > On Fri, Jul 19, 2013 at 1:14 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> >> After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
>> >> (PCI: Simplify IOV implementation and fix reference count races)
>> >> VF need to be removed via virtfn_remove to make sure ref to PF
>> >> is put back.
>> >
>> > I'm sure this makes sense, but it needs background.  I haven't figured
>> > out exactly what the problem is.  You're describing the lowest-level
>> > details, but not the overall picture that would make it understandable
>> > to the rest of us.
>>
>> Overall, after we reversely loop the bus_devices, VF get stop and removed,
>> but fail to release ref to PF, and prevent PF to be removed and freed.
>>
>> >
>> >> So we can not call stop_and_remove for VF before PF, that will
>> >> make VF get removed directly before PF's driver try to use
>> >> virtfn_remove to do it.
>> >>
>> >> Solution is separating stop and remove in two iterations.
>> >>
>> >> In first iteration, we stop VF driver at first with iterating devices
>> >> reversely, and later during stop PF driver, disable_sriov will use
>> >> virtfn_remove to remove VFs.
>> >>
>> >> Also some driver (like mlx4_core) need VF's driver get stopped before PF.
>> >
>> > If this is relevant, please give a pointer to the mlx4_core code that
>> > requires VFs to be stopped before the PF.
>>
>> that is add-on benefits.
>>
>> drivers/net/ethernet/mellanox/mlx4/main.c::
>> static void mlx4_remove_one(struct pci_dev *pdev)
>> {
>>         struct mlx4_dev  *dev  = pci_get_drvdata(pdev);
>>         struct mlx4_priv *priv = mlx4_priv(dev);
>>         int p;
>>
>>         if (dev) {
>>                 /* in SRIOV it is not allowed to unload the pf's
>>                  * driver while there are alive vf's */
>>                 if (mlx4_is_master(dev)) {
>>                         if (mlx4_how_many_lives_vf(dev))
>>                                 printk(KERN_ERR "Removing PF when
>> there are assigned VF's !!!\n");
>>                 }
>>
>> mlx4_how_many_lives_vf() is checking how VFs have driver loaded.
>>
>> >
>> >> To make it simple, separate VGA checking out and do that at first,
>> >> if there is VGA in the chain, do even try to stop or remove any device
>> >> under that bus.
>> >
>> > This part seems like it could be a separate patch.
>>
>> ok, will separate it out in next rev.
>>
>> >
>> >> Need this one for v3.11.
>> >
>> > OK, but why?  We need a better explanation of what problem this fixes.
>> >  It sounds like it fixes a reference counting problem in v3.11-rc1,
>> > but I don't know what the effect of that problem is.  Maybe it means a
>> > device isn't freed when it should be?  Maybe it means we can't add a
>> > new device after hot-removing an SR-IOV device?
>>
>> Yes.
>
> Can you include an example that shows the failure, like you did for
> the "Fix hotplug remove" patch?
>
>> ...
>> >> Index: linux-2.6/drivers/pci/remove.c
>> >> ===================================================================
>> >> --- linux-2.6.orig/drivers/pci/remove.c
>> >> +++ linux-2.6/drivers/pci/remove.c
>> >> @@ -56,7 +56,7 @@ void pci_remove_bus(struct pci_bus *bus)
>> >>  }
>> >>  EXPORT_SYMBOL(pci_remove_bus);
>> >>
>> >> -static void pci_stop_bus_device(struct pci_dev *dev)
>> >> +void pci_stop_bus_device(struct pci_dev *dev)
>> >>  {
>> >>         struct pci_bus *bus = dev->subordinate;
>> >>         struct pci_dev *child, *tmp;
>> >> @@ -75,8 +75,9 @@ static void pci_stop_bus_device(struct p
>> >>
>> >>         pci_stop_dev(dev);
>> >>  }
>> >> +EXPORT_SYMBOL(pci_stop_bus_device);
>> >>
>> >> -static void pci_remove_bus_device(struct pci_dev *dev)
>> >> +void pci_remove_bus_device(struct pci_dev *dev)
>> >>  {
>> >>         struct pci_bus *bus = dev->subordinate;
>> >>         struct pci_dev *child, *tmp;
>> >> @@ -92,6 +93,7 @@ static void pci_remove_bus_device(struct
>> >>
>> >>         pci_destroy_dev(dev);
>> >>  }
>> >> +EXPORT_SYMBOL(pci_remove_bus_device);
>> >
>> > I really don't want to export these two symbols unless we have to.
>> > Obviously this is the heart of your patch, so we probably *will* have
>> > to.
>>
>> Agree.
>>
>> >
>> > But maybe they can at least be confined to drivers/pci code, and not
>> > exported to modules?  I don't think there's any reason pciehp needs to
>> > be a module.  I was planning to merge a patch restricting it to be
>> > built-in this cycle anyway.
>>
>> sure, you can apply that patch (make pciehp to be built-in) at first.
>
> Below is the patch I intend to apply.  We can easily do this for v3.12.
> But your patch needs to be in v3.11, so we'll have to figure out how
> to handle that.  Maybe we can put the Kconfig change in v3.11, too.
> It should be low-risk because it doesn't add any new code paths that
> weren't in v3.10.
>
> Bjorn
>
>
> commit 04216ce0f2381090572142ebab28f63bae157d8d
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Thu Jun 27 10:16:19 2013 -0600
>
>     PCI: pciehp: Convert pciehp to be builtin only, not modular
>
>     Convert pciehp to be builtin only, with no module option.
>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index a82e70a..7958e59 100644
> --- a/drivers/pci/pcie/Kconfig
> +++ b/drivers/pci/pcie/Kconfig
> @@ -14,15 +14,12 @@ config PCIEPORTBUS
>  # Include service Kconfig here
>  #
>  config HOTPLUG_PCI_PCIE
> -       tristate "PCI Express Hotplug driver"
> +       bool "PCI Express Hotplug driver"
>         depends on HOTPLUG_PCI && PCIEPORTBUS
>         help
>           Say Y here if you have a motherboard that supports PCI Express Native
>           Hotplug
>
> -         To compile this driver as a module, choose M here: the
> -         module will be called pciehp.
> -
>           When in doubt, say N.
>
>  source "drivers/pci/pcie/aer/Kconfig"

on top of for-linus, please check three updated patches.

Thanks

Yinghai

[-- Attachment #2: fix_cx3_hotplug_5.patch --]
[-- Type: application/octet-stream, Size: 1791 bytes --]

Subject: [PATCH] PCI/pciehp: Separate VGA checking out from loop

Separate VGA checking out and do that at first,
if there is VGA in the chain, do even try to stop or remove any device
under that bus.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/hotplug/pciehp_pci.c |   27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_pci.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_pci.c
@@ -92,6 +92,22 @@ int pciehp_unconfigure_device(struct slo
 	if (ret)
 		presence = 0;
 
+	/* check if VGA is around */
+	if (presence) {
+		list_for_each_entry(dev, &parent->devices, bus_list) {
+			if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+				pci_read_config_byte(dev, PCI_BRIDGE_CONTROL,
+							&bctl);
+				if (bctl & PCI_BRIDGE_CTL_VGA) {
+					ctrl_err(ctrl,
+						 "Cannot remove display device %s\n",
+						 pci_name(dev));
+					return -EINVAL;
+				}
+			}
+		}
+	}
+
 	/*
 	 * Stopping an SR-IOV PF device removes all the associated VFs,
 	 * which will update the bus->devices list and confuse the
@@ -101,17 +117,6 @@ int pciehp_unconfigure_device(struct slo
 	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
 					 bus_list) {
 		pci_dev_get(dev);
-		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) {
-			pci_read_config_byte(dev, PCI_BRIDGE_CONTROL, &bctl);
-			if (bctl & PCI_BRIDGE_CTL_VGA) {
-				ctrl_err(ctrl,
-					 "Cannot remove display device %s\n",
-					 pci_name(dev));
-				pci_dev_put(dev);
-				rc = -EINVAL;
-				break;
-			}
-		}
 		pci_stop_and_remove_bus_device(dev);
 		/*
 		 * Ensure that no new Requests will be generated from

[-- Attachment #3: fix_cx3_hotplug_1.patch --]
[-- Type: application/octet-stream, Size: 3933 bytes --]

Subject: [PATCH] PCI: Separate stop and remove devices in pciehp
From: Yinghai Lu <yinghai@kernel.org>

commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
(PCI: Simplify IOV implementation and fix reference count races)
broke the pcie native hotplug with SRIOV enabled: PF is not freed
during hot-remove, and later hot-add do not work as old pci_dev
is still around, and can not create new pci_dev.

That commit need VFs to be removed via pci_disable_sriov/virtfn_remove
to make sure ref to PF is put back.

So we can not call stop_and_remove for VF before PF, that will
make VF get removed directly before PF's driver try to use
pci_disable_sriov/virtfn_remove to do it.

Solution is separating stop and remove in two iterations.

In first iteration, we stop VF driver at first with iterating devices
reversely, and later during stop PF driver, disable_sriov will use
virtfn_remove to remove VFs.

Also some driver (like mlx4_core) need VF's driver get stopped before PF's.

Need this one for v3.11.

-v2: separate VGA checking to another patch according to Bjorn.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Jiang Liu <liuj97@gmail.com>

---
 drivers/pci/hotplug/pciehp_pci.c |   15 +++++++++++++--
 drivers/pci/pci.h                |    3 +++
 drivers/pci/remove.c             |    4 ++--
 3 files changed, 18 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_pci.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_pci.c
@@ -109,6 +109,13 @@ int pciehp_unconfigure_device(struct slo
 	}
 
 	/*
+	 * Now VF need to be removed via virtfn_remove to make
+	 * sure ref to PF is put back. Some driver (mlx4_core) need
+	 * VF's driver get stopped before PF.
+	 * So we need to stop VF driver at first, that means
+	 * loop reversely, and later during stop PF driver,
+	 * disable_sriov will use virtfn_remove to remove VFs.
+	 *
 	 * Stopping an SR-IOV PF device removes all the associated VFs,
 	 * which will update the bus->devices list and confuse the
 	 * iterator.  Therefore, iterate in reverse so we remove the VFs
@@ -116,8 +123,7 @@ int pciehp_unconfigure_device(struct slo
 	 */
 	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
 					 bus_list) {
-		pci_dev_get(dev);
-		pci_stop_and_remove_bus_device(dev);
+		pci_stop_bus_device(dev);
 		/*
 		 * Ensure that no new Requests will be generated from
 		 * the device.
@@ -128,6 +134,11 @@ int pciehp_unconfigure_device(struct slo
 			command |= PCI_COMMAND_INTX_DISABLE;
 			pci_write_config_word(dev, PCI_COMMAND, command);
 		}
+	}
+
+	list_for_each_entry_safe(dev, temp, &parent->devices, bus_list) {
+		pci_dev_get(dev);
+		pci_remove_bus_device(dev);
 		pci_dev_put(dev);
 	}
 
Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -56,7 +56,7 @@ void pci_remove_bus(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_remove_bus);
 
-static void pci_stop_bus_device(struct pci_dev *dev)
+void pci_stop_bus_device(struct pci_dev *dev)
 {
 	struct pci_bus *bus = dev->subordinate;
 	struct pci_dev *child, *tmp;
@@ -76,7 +76,7 @@ static void pci_stop_bus_device(struct p
 	pci_stop_dev(dev);
 }
 
-static void pci_remove_bus_device(struct pci_dev *dev)
+void pci_remove_bus_device(struct pci_dev *dev)
 {
 	struct pci_bus *bus = dev->subordinate;
 	struct pci_dev *child, *tmp;
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -317,4 +317,7 @@ static inline int pci_dev_specific_reset
 }
 #endif
 
+void pci_stop_bus_device(struct pci_dev *dev);
+void pci_remove_bus_device(struct pci_dev *dev);
+
 #endif /* DRIVERS_PCI_H */

[-- Attachment #4: fix_cx3_hotplug_2.patch --]
[-- Type: application/octet-stream, Size: 1568 bytes --]

Subject: [PATCH] PCI: Stop sriov after stop PF if PF's driver skip that
From: Yinghai Lu <yinghai@kernel.org>

After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
(PCI: Simplify IOV implementation and fix reference count races)
VF need to be removed via virtfn_remove to make sure ref to PF
is put back.

Some driver (like ixgbe) does not call pci_disable_sriov() if
sriov is enabled via /sys/.../sriov_numvfs setting.
ixgbe does allow driver for PF get detached, but still have VFs
around.

But how about PF get removed via /sys or pciehp finally?

During hot-remove, VF will still hold one ref to PF and it
prevent PF to be removed.
That make the next hot-add fails, as old PF dev struct is still around.

We need to add pci_disable_sriov() calling during stop PF .

Need this one for v3.11

-v2: Accoring to Bjorn, move that calling to pci_stop_dev.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Donald Dutile <ddutile@redhat.com>
Cc: Greg Rose <gregory.v.rose@intel.com>

---
 drivers/pci/remove.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
 		pci_proc_detach_device(dev);
 		pci_remove_sysfs_dev_files(dev);
 		device_del(&dev->dev);
+		/* remove VF, if PF driver skip that */
+		pci_disable_sriov(dev);
 		dev->is_added = 0;
 	}
 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Fix hotplug remove with sriov again
  2013-07-19 19:14 [PATCH] PCI: Fix hotplug remove with sriov again Yinghai Lu
                   ` (4 preceding siblings ...)
  2013-07-23 17:40 ` Bjorn Helgaas
@ 2013-07-24  2:01 ` Yijing Wang
  2013-07-24  2:04   ` Yinghai Lu
  5 siblings, 1 reply; 26+ messages in thread
From: Yijing Wang @ 2013-07-24  2:01 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Bjorn Helgaas, linux-pci, stable

Hi Yinghai,
   It seems to have the the same problem in acpiphp,

diable_device(..):

	while ((pdev = dev_in_slot(slot))) {
		pci_stop_and_remove_bus_device(pdev);
		pci_dev_put(pdev);
	}


static struct pci_dev *dev_in_slot(struct acpiphp_slot *slot)
{
	struct pci_bus *bus = slot->bridge->pci_bus;
	struct pci_dev *dev;
	struct pci_dev *ret = NULL;

	down_read(&pci_bus_sem);
	list_for_each_entry(dev, &bus->devices, bus_list)
		if (PCI_SLOT(dev->devfn) == slot->device) {
			ret = pci_dev_get(dev);
			break;
		}
	up_read(&pci_bus_sem);


Thanks!
Yijing.

On 2013/7/20 3:14, Yinghai Lu wrote:
> Found hot-remove pcie card with sriov enabled cause crash in v3.10.
> 
> It is regression caused by commit ba518e3c177547dfebf7fa7252cea0c850e7ce25
> (PCI: pciehp: Iterate over all devices in slot, not functions 0-7)
> 
> That commit change to use bus->devices to iterate devices under
> bus to run pci_stop_and_remove_bus_device().
> Actually it duplicates the problem with those bus->devices iteratation
> that we try to fix in commit ac205b7bb72fa4227d2e79979bbe2b4687cdf44d
> (PCI: make sriov work with hotplug remove)
> 
> Change to iterate reversely as we did last time.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Yijing Wang <wangyijing@huawei.com>
> Cc: <stable@vger.kernel.org> v3.9+
> 
> ---
>  drivers/pci/hotplug/pciehp_pci.c |    8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/drivers/pci/hotplug/pciehp_pci.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/hotplug/pciehp_pci.c
> +++ linux-2.6/drivers/pci/hotplug/pciehp_pci.c
> @@ -92,7 +92,13 @@ int pciehp_unconfigure_device(struct slo
>  	if (ret)
>  		presence = 0;
>  
> -	list_for_each_entry_safe(dev, temp, &parent->devices, bus_list) {
> +	/*
> +	 * Need to iterate device reversely, as during
> +	 * stop PF driver, VF will be removed, the list_for_each
> +	 * could point to removed VF with temp.
> +	 */
> +	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
> +					 bus_list) {
>  		pci_dev_get(dev);
>  		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) {
>  			pci_read_config_byte(dev, PCI_BRIDGE_CONTROL, &bctl);
> 
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Fix hotplug remove with sriov again
  2013-07-24  2:01 ` Yijing Wang
@ 2013-07-24  2:04   ` Yinghai Lu
  2013-07-24  2:15     ` Yinghai Lu
  2013-07-24  2:25     ` Yijing Wang
  0 siblings, 2 replies; 26+ messages in thread
From: Yinghai Lu @ 2013-07-24  2:04 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, stable@vger.kernel.org

On Tue, Jul 23, 2013 at 7:01 PM, Yijing Wang <wangyijing@huawei.com> wrote:
> Hi Yinghai,
>    It seems to have the the same problem in acpiphp,
>
> diable_device(..):
>
>         while ((pdev = dev_in_slot(slot))) {
>                 pci_stop_and_remove_bus_device(pdev);
>                 pci_dev_put(pdev);
>         }
>
>
> static struct pci_dev *dev_in_slot(struct acpiphp_slot *slot)
> {
>         struct pci_bus *bus = slot->bridge->pci_bus;
>         struct pci_dev *dev;
>         struct pci_dev *ret = NULL;
>
>         down_read(&pci_bus_sem);
>         list_for_each_entry(dev, &bus->devices, bus_list)
>                 if (PCI_SLOT(dev->devfn) == slot->device) {
>                         ret = pci_dev_get(dev);
>                         break;
>                 }
>         up_read(&pci_bus_sem);
>

acpiphp is ok.

dev_in_slot will restart from bus->devices again every time.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Fix hotplug remove with sriov again
  2013-07-24  2:04   ` Yinghai Lu
@ 2013-07-24  2:15     ` Yinghai Lu
  2013-07-24  2:25     ` Yijing Wang
  1 sibling, 0 replies; 26+ messages in thread
From: Yinghai Lu @ 2013-07-24  2:15 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, stable@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1194 bytes --]

On Tue, Jul 23, 2013 at 7:04 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, Jul 23, 2013 at 7:01 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>> Hi Yinghai,
>>    It seems to have the the same problem in acpiphp,
>>
>> diable_device(..):
>>
>>         while ((pdev = dev_in_slot(slot))) {
>>                 pci_stop_and_remove_bus_device(pdev);
>>                 pci_dev_put(pdev);
>>         }
>>
>>
>> static struct pci_dev *dev_in_slot(struct acpiphp_slot *slot)
>> {
>>         struct pci_bus *bus = slot->bridge->pci_bus;
>>         struct pci_dev *dev;
>>         struct pci_dev *ret = NULL;
>>
>>         down_read(&pci_bus_sem);
>>         list_for_each_entry(dev, &bus->devices, bus_list)
>>                 if (PCI_SLOT(dev->devfn) == slot->device) {
>>                         ret = pci_dev_get(dev);
>>                         break;
>>                 }
>>         up_read(&pci_bus_sem);
>>
>
> acpiphp is ok.
>
> dev_in_slot will restart from bus->devices again every time.

Actually I had another version to fix the problem, but I did not even
try to compile
and to test it after i figured out that mlx4_core like to VF get stopped before
PF's driver.

Thanks

Yinghai

[-- Attachment #2: fix_cx3_hotplug_1x.patch --]
[-- Type: application/octet-stream, Size: 1420 bytes --]

---
 drivers/pci/hotplug/pciehp_pci.c |   21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_pci.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_pci.c
@@ -76,12 +76,28 @@ int pciehp_configure_device(struct slot
 	return 0;
 }
 
+/* return one device on bus, acquiring a reference on it */
+static struct pci_dev *one_dev_on_bus(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	struct pci_dev *ret = NULL;
+
+	down_read(&pci_bus_sem);
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		ret = pci_dev_get(dev);
+		break;
+	}
+	up_read(&pci_bus_sem);
+
+	return ret;
+}
+
 int pciehp_unconfigure_device(struct slot *p_slot)
 {
 	int ret, rc = 0;
 	u8 bctl = 0;
 	u8 presence = 0;
-	struct pci_dev *dev, *temp;
+	struct pci_dev *dev;
 	struct pci_bus *parent = p_slot->ctrl->pcie->port->subordinate;
 	u16 command;
 	struct controller *ctrl = p_slot->ctrl;
@@ -92,8 +108,7 @@ int pciehp_unconfigure_device(struct slo
 	if (ret)
 		presence = 0;
 
-	list_for_each_entry_safe(dev, temp, &parent->devices, bus_list) {
-		pci_dev_get(dev);
+	while (dev = one_dev_on_bus(parent)) {
 		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) {
 			pci_read_config_byte(dev, PCI_BRIDGE_CONTROL, &bctl);
 			if (bctl & PCI_BRIDGE_CTL_VGA) {

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Fix hotplug remove with sriov again
  2013-07-24  2:04   ` Yinghai Lu
  2013-07-24  2:15     ` Yinghai Lu
@ 2013-07-24  2:25     ` Yijing Wang
  1 sibling, 0 replies; 26+ messages in thread
From: Yijing Wang @ 2013-07-24  2:25 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, stable@vger.kernel.org

On 2013/7/24 10:04, Yinghai Lu wrote:
> On Tue, Jul 23, 2013 at 7:01 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>> Hi Yinghai,
>>    It seems to have the the same problem in acpiphp,
>>
>> diable_device(..):
>>
>>         while ((pdev = dev_in_slot(slot))) {
>>                 pci_stop_and_remove_bus_device(pdev);
>>                 pci_dev_put(pdev);
>>         }
>>
>>
>> static struct pci_dev *dev_in_slot(struct acpiphp_slot *slot)
>> {
>>         struct pci_bus *bus = slot->bridge->pci_bus;
>>         struct pci_dev *dev;
>>         struct pci_dev *ret = NULL;
>>
>>         down_read(&pci_bus_sem);
>>         list_for_each_entry(dev, &bus->devices, bus_list)
>>                 if (PCI_SLOT(dev->devfn) == slot->device) {
>>                         ret = pci_dev_get(dev);
>>                         break;
>>                 }
>>         up_read(&pci_bus_sem);
>>
> 
> acpiphp is ok.
> 
> dev_in_slot will restart from bus->devices again every time.

Ah, yes, thanks for explanation.

Thanks!
Yijing.

> 
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Separate stop and remove devices in pciehp
  2013-07-23 23:15         ` Yinghai Lu
@ 2013-07-24  4:00           ` Yijing Wang
  2013-07-26 22:05             ` Bjorn Helgaas
  0 siblings, 1 reply; 26+ messages in thread
From: Yijing Wang @ 2013-07-24  4:00 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, Jiang Liu

Hi Yinghai,
   The third patch make pci_stop_dev call pci_disable_sriov(dev).
It looks asymmetrical, because pci_enable_sriov(dev, NR_VIRTFN) always
be called by device driver. Why not work around this in device driver like ixgbe?
I'm not familiar with SRIOV, so I'm just a bit puzzled.

Thanks!
Yijing.

------------------------------------------------------------
Some driver (like ixgbe) does not call pci_disable_sriov() if
sriov is enabled via /sys/.../sriov_numvfs setting.
ixgbe does allow driver for PF get detached, but still have VFs
around.

But how about PF get removed via /sys or pciehp finally?

During hot-remove, VF will still hold one ref to PF and it
prevent PF to be removed.
That make the next hot-add fails, as old PF dev struct is still around.

We need to add pci_disable_sriov() calling during stop PF .

Need this one for v3.11

-v2: Accoring to Bjorn, move that calling to pci_stop_dev.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Donald Dutile <ddutile@redhat.com>
Cc: Greg Rose <gregory.v.rose@intel.com>

---
 drivers/pci/remove.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
 		pci_proc_detach_device(dev);
 		pci_remove_sysfs_dev_files(dev);
 		device_del(&dev->dev);
+		/* remove VF, if PF driver skip that */
+		pci_disable_sriov(dev);
 		dev->is_added = 0;
 	}


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Separate stop and remove devices in pciehp
  2013-07-24  4:00           ` Yijing Wang
@ 2013-07-26 22:05             ` Bjorn Helgaas
  2013-07-27 14:30               ` Yinghai Lu
  0 siblings, 1 reply; 26+ messages in thread
From: Bjorn Helgaas @ 2013-07-26 22:05 UTC (permalink / raw)
  To: Yijing Wang; +Cc: Yinghai Lu, linux-pci@vger.kernel.org, Jiang Liu

On Tue, Jul 23, 2013 at 10:00 PM, Yijing Wang <wangyijing@huawei.com> wrote:
> Hi Yinghai,
>    The third patch make pci_stop_dev call pci_disable_sriov(dev).
> It looks asymmetrical, because pci_enable_sriov(dev, NR_VIRTFN) always
> be called by device driver. Why not work around this in device driver like ixgbe?
> I'm not familiar with SRIOV, so I'm just a bit puzzled.
>
> Thanks!
> Yijing.
>
> ------------------------------------------------------------
> Some driver (like ixgbe) does not call pci_disable_sriov() if
> sriov is enabled via /sys/.../sriov_numvfs setting.
> ixgbe does allow driver for PF get detached, but still have VFs
> around.
>
> But how about PF get removed via /sys or pciehp finally?
>
> During hot-remove, VF will still hold one ref to PF and it
> prevent PF to be removed.
> That make the next hot-add fails, as old PF dev struct is still around.
>
> We need to add pci_disable_sriov() calling during stop PF .
>
> Need this one for v3.11
>
> -v2: Accoring to Bjorn, move that calling to pci_stop_dev.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Jiang Liu <liuj97@gmail.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Cc: Donald Dutile <ddutile@redhat.com>
> Cc: Greg Rose <gregory.v.rose@intel.com>
>
> ---
>  drivers/pci/remove.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
>                 pci_proc_detach_device(dev);
>                 pci_remove_sysfs_dev_files(dev);
>                 device_del(&dev->dev);
> +               /* remove VF, if PF driver skip that */
> +               pci_disable_sriov(dev);
>                 dev->is_added = 0;
>         }

Can you post these as a new thread with -v2 patches?

Why did you put pci_disable_sriov() inside the "if (dev->is_added)"
block?  Is it related to is_added?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] PCI: Separate stop and remove devices in pciehp
  2013-07-26 22:05             ` Bjorn Helgaas
@ 2013-07-27 14:30               ` Yinghai Lu
  0 siblings, 0 replies; 26+ messages in thread
From: Yinghai Lu @ 2013-07-27 14:30 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Yijing Wang, linux-pci@vger.kernel.org, Jiang Liu

On Fri, Jul 26, 2013 at 3:05 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Tue, Jul 23, 2013 at 10:00 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>> Hi Yinghai,
>>    The third patch make pci_stop_dev call pci_disable_sriov(dev).
>> It looks asymmetrical, because pci_enable_sriov(dev, NR_VIRTFN) always
>> be called by device driver. Why not work around this in device driver like ixgbe?
>> I'm not familiar with SRIOV, so I'm just a bit puzzled.
>>
>> Thanks!
>> Yijing.
>>
>> ------------------------------------------------------------
>> Some driver (like ixgbe) does not call pci_disable_sriov() if
>> sriov is enabled via /sys/.../sriov_numvfs setting.
>> ixgbe does allow driver for PF get detached, but still have VFs
>> around.
>>
>> But how about PF get removed via /sys or pciehp finally?
>>
>> During hot-remove, VF will still hold one ref to PF and it
>> prevent PF to be removed.
>> That make the next hot-add fails, as old PF dev struct is still around.
>>
>> We need to add pci_disable_sriov() calling during stop PF .
>>
>> Need this one for v3.11
>>
>> -v2: Accoring to Bjorn, move that calling to pci_stop_dev.
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> Cc: Jiang Liu <liuj97@gmail.com>
>> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
>> Cc: Donald Dutile <ddutile@redhat.com>
>> Cc: Greg Rose <gregory.v.rose@intel.com>
>>
>> ---
>>  drivers/pci/remove.c |    2 ++
>>  1 file changed, 2 insertions(+)
>>
>> Index: linux-2.6/drivers/pci/remove.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/pci/remove.c
>> +++ linux-2.6/drivers/pci/remove.c
>> @@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
>>                 pci_proc_detach_device(dev);
>>                 pci_remove_sysfs_dev_files(dev);
>>                 device_del(&dev->dev);
>> +               /* remove VF, if PF driver skip that */
>> +               pci_disable_sriov(dev);
>>                 dev->is_added = 0;
>>         }
>
> Can you post these as a new thread with -v2 patches?

ok, will repost them on top of pci/for-linus

>
> Why did you put pci_disable_sriov() inside the "if (dev->is_added)"
> block?  Is it related to is_added?

yes, only have pci_enable_sriov when is_added is true.
driver is detached when device_del() is called, and at that time
pci_disable_sriov should be just called.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2013-07-27 14:30 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-07-19 19:14 [PATCH] PCI: Fix hotplug remove with sriov again Yinghai Lu
2013-07-19 19:14 ` [PATCH] PCI: Separate stop and remove devices in pciehp Yinghai Lu
2013-07-22 21:23   ` Bjorn Helgaas
2013-07-23  2:32     ` Yinghai Lu
2013-07-23 15:56       ` Bjorn Helgaas
2013-07-23 22:44         ` Yinghai Lu
2013-07-23 23:15         ` Yinghai Lu
2013-07-24  4:00           ` Yijing Wang
2013-07-26 22:05             ` Bjorn Helgaas
2013-07-27 14:30               ` Yinghai Lu
2013-07-19 19:14 ` [PATCH] PCI: Stop sriov before remove PF Yinghai Lu
2013-07-19 21:46   ` Alexander Duyck
2013-07-19 22:44     ` Yinghai Lu
2013-07-19 23:22       ` Alexander Duyck
2013-07-23 15:34         ` Don Dutile
2013-07-23 16:10           ` Yinghai Lu
2013-07-22 23:15   ` Bjorn Helgaas
2013-07-23  1:59     ` Yinghai Lu
2013-07-22  7:07 ` [PATCH] PCI: Fix hotplug remove with sriov again Yijing Wang
2013-07-22 17:39 ` Bjorn Helgaas
2013-07-22 17:48   ` Yinghai Lu
2013-07-23 17:40 ` Bjorn Helgaas
2013-07-24  2:01 ` Yijing Wang
2013-07-24  2:04   ` Yinghai Lu
2013-07-24  2:15     ` Yinghai Lu
2013-07-24  2:25     ` Yijing Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).