* [PATCH V5 0/3] PCI: separate hotplug handling from fatal error handling
@ 2018-07-02 22:52 Sinan Kaya
  2018-07-02 22:52 ` [PATCH V5 1/3] PCI: pciehp: implement mask and unmask interrupt functions Sinan Kaya
                   ` (3 more replies)
  0 siblings, 4 replies; 35+ messages in thread
From: Sinan Kaya @ 2018-07-02 22:52 UTC (permalink / raw)
  To: linux-pci; +Cc: Sinan Kaya, linux-arm-msm, linux-arm-kernel
This is what happens when you insert a fatal error to a hotplug slot. See
multiple link up/down messages.
/_#_[__333.699731]_pcieport_0001:00:00.0:_AER:_Uncorrected_(Fatal)_error_received:_id=0000
[  333.748116] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id
[  333.816044] pcieport 0001:00:00.0:   device [17cb:0404] error status/mask=00040000/00400000
[  333.871581] pcieport 0001:00:00.0:    [18] Malformed TLP (First)
[  333.914675] pcieport 0001:00:00.0:   TLP Header: 40000001 000000ff 00000000 00000000
[  333.968397] pciehp 0001:00:00.0:pcie004: Slot(2): Link Down
[  334.043234] iommu: Removing device 0001:01:00.4 from group 0
[  334.095952] iommu: Removing device 0001:01:00.3 from group 0
[  334.144644] iommu: Removing device 0001:01:00.2 from group 0
[  334.194653] iommu: Removing device 0001:01:00.1 from group 0
[  334.243564] pciehp 0001:00:00.0:pcie004: Slot(2): Link Up
[  334.282556] iommu: Removing device 0001:01:00.0 from group 0
[  334.330994] pciehp 0001:00:00.0:pcie004: Slot(2): Link Up event queued;
currently getting powered off
[  334.890587] pciehp 0001:00:00.0:pcie004: Timeout on hotplug command
0x13f1 (issued 282900 msec ago)
[  335.070190] pciehp 0001:00:00.0:pcie004: Slot(2): Link Down
[  335.106960] pciehp 0001:00:00.0:pcie004: Slot(2): Link Down event queued; currently getting powered on
[  335.191119] pcieport 0001:00:00.0: AER: Device recovery failed
[  346.590153] pciehp 0001:00:00.0:pcie004: Timeout on hotplug command 0x17f1 (issued 10250 msec ago)
Since DPC/AER patch to refactor fatal error handling, both hotplug
driver and AER/DPC driver will try removing devices and perform enumeration on
link events/AER events.
Perfect environment for race condition without a change.
Try to mask hotplug interrupts during AER/DPC fatal error handling.
Changes from v4:
* add mask_irq() and unmask_irq() callbacks to struct pcie_driver
* refactor reset_slot() to use pciehp_mask_irq() an pciehp_unmask_irq()
Sinan Kaya (3):
  PCI: pciehp: implement mask and unmask interrupt functions
  PCI: pciehp: reuse pciehp_mask/unmask_irq() in reset_slot()
  PCI: Mask and unmask hotplug interrupts during reset
 drivers/pci/hotplug/pciehp.h      |  2 ++
 drivers/pci/hotplug/pciehp_core.c | 19 +++++++++++++++++++
 drivers/pci/hotplug/pciehp_hpc.c  | 36 ++++++++++++++++++++++++++++--------
 drivers/pci/pcie/err.c            | 11 +++++++++++
 drivers/pci/pcie/portdrv.h        |  2 ++
 5 files changed, 62 insertions(+), 8 deletions(-)
-- 
2.7.4
^ permalink raw reply	[flat|nested] 35+ messages in thread
* [PATCH V5 1/3] PCI: pciehp: implement mask and unmask interrupt functions
  2018-07-02 22:52 [PATCH V5 0/3] PCI: separate hotplug handling from fatal error handling Sinan Kaya
@ 2018-07-02 22:52 ` Sinan Kaya
  2018-07-02 22:52 ` [PATCH V5 2/3] PCI: pciehp: reuse pciehp_mask/unmask_irq() in reset_slot() Sinan Kaya
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 35+ messages in thread
From: Sinan Kaya @ 2018-07-02 22:52 UTC (permalink / raw)
  To: linux-pci
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Bjorn Helgaas,
	Andy Shevchenko, Greg Kroah-Hartman, Mika Westerberg,
	Frederick Lawler, Rafael J. Wysocki, Keith Busch, Lukas Wunner,
	Oza Pawandeep, Markus Elfring, Kees Cook, open list
Adding pciehp_mask_irq() and pciehp_unmask_irq() to be called from struct
pcie_device as mask_irq() and unmask_irq() callbacks.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/pci/hotplug/pciehp.h      |  2 ++
 drivers/pci/hotplug/pciehp_core.c | 19 +++++++++++++++++++
 drivers/pci/hotplug/pciehp_hpc.c  | 39 +++++++++++++++++++++++++++++++++++++++
 drivers/pci/pcie/portdrv.h        |  2 ++
 4 files changed, 62 insertions(+)
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 5f89206..c579d03 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -138,6 +138,8 @@ int pciehp_check_link_status(struct controller *ctrl);
 bool pciehp_check_link_active(struct controller *ctrl);
 void pciehp_release_ctrl(struct controller *ctrl);
 int pciehp_reset_slot(struct slot *slot, int probe);
+void pciehp_mask_irq(struct slot *slot);
+void pciehp_unmask_irq(struct slot *slot);
 
 int pciehp_set_raw_indicator_status(struct hotplug_slot *h_slot, u8 status);
 int pciehp_get_raw_indicator_status(struct hotplug_slot *h_slot, u8 *status);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 44a6a63..ca2faa1 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -299,6 +299,22 @@ static int pciehp_resume(struct pcie_device *dev)
 }
 #endif /* PM */
 
+static void pciehp_mask_int(struct pcie_device *dev)
+{
+	struct controller *ctrl = get_service_data(dev);
+	struct slot *slot = ctrl->slot;
+
+	pciehp_mask_irq(slot);
+}
+
+static void pciehp_unmask_int(struct pcie_device *dev)
+{
+	struct controller *ctrl = get_service_data(dev);
+	struct slot *slot = ctrl->slot;
+
+	pciehp_unmask_irq(slot);
+}
+
 static struct pcie_port_service_driver hpdriver_portdrv = {
 	.name		= PCIE_MODULE_NAME,
 	.port_type	= PCIE_ANY_PORT,
@@ -311,6 +327,9 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
 	.suspend	= pciehp_suspend,
 	.resume		= pciehp_resume,
 #endif	/* PM */
+
+	.mask_irq	= pciehp_mask_int,
+	.unmask_irq	= pciehp_unmask_int,
 };
 
 static int __init pcied_init(void)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 8dae232..d44e2c6 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -757,6 +757,45 @@ int pciehp_reset_slot(struct slot *slot, int probe)
 	return rc;
 }
 
+void pciehp_mask_irq(struct slot *slot)
+{
+	struct controller *ctrl = slot->ctrl;
+	u16 ctrl_mask = 0;
+
+	if (!ATTN_BUTTN(ctrl))
+		ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
+
+	ctrl_mask |= PCI_EXP_SLTCTL_DLLSCE;
+
+	pcie_write_cmd(ctrl, 0, ctrl_mask);
+	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
+		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
+	if (pciehp_poll_mode)
+		del_timer_sync(&ctrl->poll_timer);
+}
+
+void pciehp_unmask_irq(struct slot *slot)
+{
+	struct controller *ctrl = slot->ctrl;
+	struct pci_dev *pdev = ctrl_dev(ctrl);
+	u16 stat_mask = 0, ctrl_mask = 0;
+
+	if (!ATTN_BUTTN(ctrl)) {
+		ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
+		stat_mask |= PCI_EXP_SLTSTA_PDC;
+	}
+	ctrl_mask |= PCI_EXP_SLTCTL_DLLSCE;
+	stat_mask |= PCI_EXP_SLTSTA_DLLSC;
+
+	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, stat_mask);
+	pcie_write_cmd_nowait(ctrl, ctrl_mask, ctrl_mask);
+	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
+		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
+	if (pciehp_poll_mode)
+		int_poll_timeout(&ctrl->poll_timer);
+}
+
+
 int pcie_init_notification(struct controller *ctrl)
 {
 	if (pciehp_request_irq(ctrl))
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 6ffc797..40bb6f2 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -51,6 +51,8 @@ struct pcie_port_service_driver {
 	void (*remove) (struct pcie_device *dev);
 	int (*suspend) (struct pcie_device *dev);
 	int (*resume) (struct pcie_device *dev);
+	void (*mask_irq)(struct pcie_device *dev);
+	void (*unmask_irq)(struct pcie_device *dev);
 
 	/* Device driver may resume normal operations */
 	void (*error_resume)(struct pci_dev *dev);
-- 
2.7.4
^ permalink raw reply related	[flat|nested] 35+ messages in thread
* [PATCH V5 2/3] PCI: pciehp: reuse pciehp_mask/unmask_irq() in reset_slot()
  2018-07-02 22:52 [PATCH V5 0/3] PCI: separate hotplug handling from fatal error handling Sinan Kaya
  2018-07-02 22:52 ` [PATCH V5 1/3] PCI: pciehp: implement mask and unmask interrupt functions Sinan Kaya
@ 2018-07-02 22:52 ` Sinan Kaya
  2018-07-02 22:52 ` [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset Sinan Kaya
  2018-07-31 18:44 ` [PATCH V5 0/3] PCI: separate hotplug handling from fatal error handling Bjorn Helgaas
  3 siblings, 0 replies; 35+ messages in thread
From: Sinan Kaya @ 2018-07-02 22:52 UTC (permalink / raw)
  To: linux-pci
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Bjorn Helgaas,
	Mika Westerberg, Keith Busch, Markus Elfring, Kees Cook,
	Lukas Wunner, Oza Pawandeep, open list
Now that we have individual functions to mask/unmask hotplug interrupts
avoid code duplication and reuse the new API in multiple places.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/pci/hotplug/pciehp_hpc.c | 23 ++---------------------
 1 file changed, 2 insertions(+), 21 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index d44e2c6..ebf9b3b 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -726,34 +726,15 @@ static void pcie_disable_notification(struct controller *ctrl)
 int pciehp_reset_slot(struct slot *slot, int probe)
 {
 	struct controller *ctrl = slot->ctrl;
-	struct pci_dev *pdev = ctrl_dev(ctrl);
-	u16 stat_mask = 0, ctrl_mask = 0;
 	int rc;
 
 	if (probe)
 		return 0;
 
-	if (!ATTN_BUTTN(ctrl)) {
-		ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
-		stat_mask |= PCI_EXP_SLTSTA_PDC;
-	}
-	ctrl_mask |= PCI_EXP_SLTCTL_DLLSCE;
-	stat_mask |= PCI_EXP_SLTSTA_DLLSC;
-
-	pcie_write_cmd(ctrl, 0, ctrl_mask);
-	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
-		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
-	if (pciehp_poll_mode)
-		del_timer_sync(&ctrl->poll_timer);
-
+	pciehp_mask_irq(slot);
 	rc = pci_bridge_secondary_bus_reset(ctrl->pcie->port);
+	pciehp_unmask_irq(slot);
 
-	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, stat_mask);
-	pcie_write_cmd_nowait(ctrl, ctrl_mask, ctrl_mask);
-	ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
-		 pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
-	if (pciehp_poll_mode)
-		int_poll_timeout(&ctrl->poll_timer);
 	return rc;
 }
 
-- 
2.7.4
^ permalink raw reply related	[flat|nested] 35+ messages in thread
* [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-02 22:52 [PATCH V5 0/3] PCI: separate hotplug handling from fatal error handling Sinan Kaya
  2018-07-02 22:52 ` [PATCH V5 1/3] PCI: pciehp: implement mask and unmask interrupt functions Sinan Kaya
  2018-07-02 22:52 ` [PATCH V5 2/3] PCI: pciehp: reuse pciehp_mask/unmask_irq() in reset_slot() Sinan Kaya
@ 2018-07-02 22:52 ` Sinan Kaya
  2018-07-03  8:34   ` Lukas Wunner
  2018-07-03 14:34   ` Lukas Wunner
  2018-07-31 18:44 ` [PATCH V5 0/3] PCI: separate hotplug handling from fatal error handling Bjorn Helgaas
  3 siblings, 2 replies; 35+ messages in thread
From: Sinan Kaya @ 2018-07-02 22:52 UTC (permalink / raw)
  To: linux-pci
  Cc: linux-arm-msm, linux-arm-kernel, Sinan Kaya, Bjorn Helgaas,
	Oza Pawandeep, Keith Busch, open list
If a bridge supports hotplug and observes a PCIe fatal error, the following
events happen:
1. AER driver removes the devices from PCI tree on fatal error
2. AER driver brings down the link by issuing a secondary bus reset waits
for the link to come up.
3. Hotplug driver observes a link down interrupt
4. Hotplug driver tries to remove the devices waiting for the rescan lock
but devices are already removed by the AER driver and AER driver is waiting
for the link to come back up.
5. AER driver tries to re-enumerate devices after polling for the link
state to go up.
6. Hotplug driver obtains the lock and tries to remove the devices again.
If a bridge is a hotplug capable bridge, mask hotplug interrupts before the
reset and unmask afterwards.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/pci/pcie/err.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index ae72f88..5a2d410 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -285,10 +285,12 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
  */
 void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
 {
+	struct pcie_port_service_driver *hpsvc;
 	struct pci_dev *udev;
 	struct pci_bus *parent;
 	struct pci_dev *pdev, *temp;
 	pci_ers_result_t result;
+	struct device *hpdev;
 
 	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
 		udev = dev;
@@ -308,8 +310,17 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
 		pci_dev_put(pdev);
 	}
 
+	hpsvc = pcie_port_find_service(udev, PCIE_PORT_SERVICE_HP);
+	hpdev = pcie_port_find_device(udev, PCIE_PORT_SERVICE_HP);
+
+	if (hpdev && hpsvc)
+		hpsvc->mask_irq(to_pcie_device(hpdev));
+
 	result = reset_link(udev, service);
 
+	if (hpdev && hpsvc)
+		hpsvc->unmask_irq(to_pcie_device(hpdev));
+
 	if ((service == PCIE_PORT_SERVICE_AER) &&
 	    (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)) {
 		/*
-- 
2.7.4
^ permalink raw reply related	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-02 22:52 ` [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset Sinan Kaya
@ 2018-07-03  8:34   ` Lukas Wunner
  2018-07-03 10:52     ` poza
  2018-07-03 11:30     ` okaya
  2018-07-03 14:34   ` Lukas Wunner
  1 sibling, 2 replies; 35+ messages in thread
From: Lukas Wunner @ 2018-07-03  8:34 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-pci, linux-arm-msm, linux-arm-kernel, Bjorn Helgaas,
	Oza Pawandeep, Keith Busch, open list
On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
> If a bridge supports hotplug and observes a PCIe fatal error, the following
> events happen:
> 
> 1. AER driver removes the devices from PCI tree on fatal error
> 2. AER driver brings down the link by issuing a secondary bus reset waits
> for the link to come up.
> 3. Hotplug driver observes a link down interrupt
> 4. Hotplug driver tries to remove the devices waiting for the rescan lock
> but devices are already removed by the AER driver and AER driver is waiting
> for the link to come back up.
> 5. AER driver tries to re-enumerate devices after polling for the link
> state to go up.
> 6. Hotplug driver obtains the lock and tries to remove the devices again.
> 
> If a bridge is a hotplug capable bridge, mask hotplug interrupts before the
> reset and unmask afterwards.
Would it work for you if you just amended the AER driver to skip
removal and re-enumeration of devices if the port is a hotplug bridge?
Just check for is_hotplug_bridge in struct pci_dev.
That would seem like a much simpler solution, given that it is known
that the link will flap on reset, causing the hotplug driver to remove
and re-enumerate devices.  That would also cover cases where hotplug is
handled by a different driver than pciehp, or by the platform firmware.
Thanks,
Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03  8:34   ` Lukas Wunner
@ 2018-07-03 10:52     ` poza
  2018-07-03 12:04       ` okaya
  2018-07-03 11:30     ` okaya
  1 sibling, 1 reply; 35+ messages in thread
From: poza @ 2018-07-03 10:52 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Sinan Kaya, linux-pci, linux-arm-msm, linux-arm-kernel,
	Bjorn Helgaas, Keith Busch, open list, linux-pci-owner
On 2018-07-03 14:04, Lukas Wunner wrote:
> On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>> If a bridge supports hotplug and observes a PCIe fatal error, the 
>> following
>> events happen:
>> 
>> 1. AER driver removes the devices from PCI tree on fatal error
>> 2. AER driver brings down the link by issuing a secondary bus reset 
>> waits
>> for the link to come up.
>> 3. Hotplug driver observes a link down interrupt
>> 4. Hotplug driver tries to remove the devices waiting for the rescan 
>> lock
>> but devices are already removed by the AER driver and AER driver is 
>> waiting
>> for the link to come back up.
>> 5. AER driver tries to re-enumerate devices after polling for the link
>> state to go up.
>> 6. Hotplug driver obtains the lock and tries to remove the devices 
>> again.
>> 
>> If a bridge is a hotplug capable bridge, mask hotplug interrupts 
>> before the
>> reset and unmask afterwards.
> 
> Would it work for you if you just amended the AER driver to skip
> removal and re-enumeration of devices if the port is a hotplug bridge?
> Just check for is_hotplug_bridge in struct pci_dev.
> 
I tend to agree with you Lukas.
on this line I already have follow up patches
although I am waiting for Bjorn to review some patch-series before that.
[PATCH v2 0/6] Fix issues and cleanup for ERR_FATAL and ERR_NONFATAL
It doesn't look to me a an entirely a race condition since its guarded 
by pci_lock_rescan_remove())
I observed that both hotplug and aer/dpc comes out of it in a quiet sane 
state.
My thinking is: Disabling hotplug interrupts during ERR_FATAL,
is something little away from natural course of link_down event 
handling, which is handled by pciehp more maturely.
so it would be just easy not to take any action e.g. removal and 
re-enumeration of devices from ERR_FATAL handling point of view.
I leave it to Bjorn.
follwing is the patch wich I am trying to set it right and under test.
so till now I am in an opinion to handle this by checking in err.c
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 410c35c..607a234 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -292,15 +292,17 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, 
u32 service)
         parent = udev->subordinate;
         pci_lock_rescan_remove();
-       list_for_each_entry_safe_reverse(pdev, temp, &parent->devices,
-                                        bus_list) {
-               pci_dev_get(pdev);
-               pci_dev_set_disconnected(pdev, NULL);
-               if (pci_has_subordinate(pdev))
-                       pci_walk_bus(pdev->subordinate,
-                                    pci_dev_set_disconnected, NULL);
-               pci_stop_and_remove_bus_device(pdev);
-               pci_dev_put(pdev);
+       if (!udev->is_hotplug_bridge) {
+               list_for_each_entry_safe_reverse(pdev, temp, 
&parent->devices,
+                                                bus_list) {
+                       pci_dev_get(pdev);
+                       pci_dev_set_disconnected(pdev, NULL);
+                       if (pci_has_subordinate(pdev))
+                               pci_walk_bus(pdev->subordinate,
+                                            pci_dev_set_disconnected, 
NULL);
+                       pci_stop_and_remove_bus_device(pdev);
+                       pci_dev_put(pdev);
+               }
         }
         result = reset_link(udev, service);
@@ -318,7 +320,7 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 
service)
         }
         if (result == PCI_ERS_RESULT_RECOVERED) {
-               if (pcie_wait_for_link(udev, true))
+               if (pcie_wait_for_link(udev, true) && 
!udev->is_hotplug_bridge)
                         pci_rescan_bus(udev->bus);
                 pci_info(dev, "Device recovery from fatal error 
successful\n");
                 dev->error_state = pci_channel_io_normal;
> That would seem like a much simpler solution, given that it is known
> that the link will flap on reset, causing the hotplug driver to remove
> and re-enumerate devices.  That would also cover cases where hotplug is
> handled by a different driver than pciehp, or by the platform firmware.
> 
> Thanks,
> 
> Lukas
^ permalink raw reply related	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03  8:34   ` Lukas Wunner
  2018-07-03 10:52     ` poza
@ 2018-07-03 11:30     ` okaya
  2018-07-03 13:11       ` poza
                         ` (2 more replies)
  1 sibling, 3 replies; 35+ messages in thread
From: okaya @ 2018-07-03 11:30 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Oza Pawandeep, linux-pci, open list, Keith Busch, linux-arm-msm,
	Bjorn Helgaas, linux-arm-kernel
On 2018-07-03 04:34, Lukas Wunner wrote:
> On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>> If a bridge supports hotplug and observes a PCIe fatal error, the 
>> following
>> events happen:
>> 
>> 1. AER driver removes the devices from PCI tree on fatal error
>> 2. AER driver brings down the link by issuing a secondary bus reset 
>> waits
>> for the link to come up.
>> 3. Hotplug driver observes a link down interrupt
>> 4. Hotplug driver tries to remove the devices waiting for the rescan 
>> lock
>> but devices are already removed by the AER driver and AER driver is 
>> waiting
>> for the link to come back up.
>> 5. AER driver tries to re-enumerate devices after polling for the link
>> state to go up.
>> 6. Hotplug driver obtains the lock and tries to remove the devices 
>> again.
>> 
>> If a bridge is a hotplug capable bridge, mask hotplug interrupts 
>> before the
>> reset and unmask afterwards.
> 
> Would it work for you if you just amended the AER driver to skip
> removal and re-enumeration of devices if the port is a hotplug bridge?
> Just check for is_hotplug_bridge in struct pci_dev.
The reason why we want to remove devices before secondary bus reset is 
to quiesce pcie bus traffic before issuing a reset.
Skipping this step might cause transactions to be lost in the middle of 
the reset as there will be active traffic flowing and drivers will 
suddenly start reading ffs.
I don't think we can skip this step.
> 
> That would seem like a much simpler solution, given that it is known
> that the link will flap on reset, causing the hotplug driver to remove
> and re-enumerate devices.  That would also cover cases where hotplug is
> handled by a different driver than pciehp, or by the platform firmware.
> 
> Thanks,
> 
> Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 10:52     ` poza
@ 2018-07-03 12:04       ` okaya
  0 siblings, 0 replies; 35+ messages in thread
From: okaya @ 2018-07-03 12:04 UTC (permalink / raw)
  To: poza
  Cc: Lukas Wunner, linux-pci, linux-arm-msm, linux-arm-kernel,
	Bjorn Helgaas, Keith Busch, open list, linux-pci-owner
On 2018-07-03 06:52, poza@codeaurora.org wrote:
> On 2018-07-03 14:04, Lukas Wunner wrote:
>> On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>>> If a bridge supports hotplug and observes a PCIe fatal error, the 
>>> following
>>> events happen:
>>> 
>>> 1. AER driver removes the devices from PCI tree on fatal error
>>> 2. AER driver brings down the link by issuing a secondary bus reset 
>>> waits
>>> for the link to come up.
>>> 3. Hotplug driver observes a link down interrupt
>>> 4. Hotplug driver tries to remove the devices waiting for the rescan 
>>> lock
>>> but devices are already removed by the AER driver and AER driver is 
>>> waiting
>>> for the link to come back up.
>>> 5. AER driver tries to re-enumerate devices after polling for the 
>>> link
>>> state to go up.
>>> 6. Hotplug driver obtains the lock and tries to remove the devices 
>>> again.
>>> 
>>> If a bridge is a hotplug capable bridge, mask hotplug interrupts 
>>> before the
>>> reset and unmask afterwards.
>> 
>> Would it work for you if you just amended the AER driver to skip
>> removal and re-enumeration of devices if the port is a hotplug bridge?
>> Just check for is_hotplug_bridge in struct pci_dev.
>> 
> 
> I tend to agree with you Lukas.
> 
> on this line I already have follow up patches
> although I am waiting for Bjorn to review some patch-series before 
> that.
> [PATCH v2 0/6] Fix issues and cleanup for ERR_FATAL and ERR_NONFATAL
> 
> It doesn't look to me a an entirely a race condition since its guarded
> by pci_lock_rescan_remove())
> I observed that both hotplug and aer/dpc comes out of it in a quiet 
> sane state.
> 
To add more detail on when this issue happens.
This problem is more visible on root ports with MSI-x capability or with 
multiple MSI interrupt numbers.
AFAIK, QDT root ports are single shared MSI interrupt only. Therefore, 
you won't see this issue.
As you can see in the code, rescan lock is held for the entire fatal 
error handling path.
> My thinking is: Disabling hotplug interrupts during ERR_FATAL,
> is something little away from natural course of link_down event
> handling, which is handled by pciehp more maturely.
> so it would be just easy not to take any action e.g. removal and
> re-enumeration of devices from ERR_FATAL handling point of view.
> 
I think it is more unnatural to fragment code flow and allow two drivers 
to do the same thing in parallel or create inter-driver dependency.
I got the idea from pci_reset_slot() function which is already masking 
hotplug interrupts when called by external entries during secondary bus 
reset. We just didn't handle the same for fatal error cases.
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 11:30     ` okaya
@ 2018-07-03 13:11       ` poza
  2018-07-03 13:25         ` Sinan Kaya
  2018-07-29 12:32         ` Lukas Wunner
  2018-07-03 14:12       ` Lukas Wunner
  2018-07-29 12:19       ` Lukas Wunner
  2 siblings, 2 replies; 35+ messages in thread
From: poza @ 2018-07-03 13:11 UTC (permalink / raw)
  To: okaya
  Cc: Lukas Wunner, linux-pci, linux-arm-msm, linux-arm-kernel,
	Bjorn Helgaas, Keith Busch, open list
On 2018-07-03 17:00, okaya@codeaurora.org wrote:
> On 2018-07-03 04:34, Lukas Wunner wrote:
>> On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>>> If a bridge supports hotplug and observes a PCIe fatal error, the 
>>> following
>>> events happen:
>>> 
>>> 1. AER driver removes the devices from PCI tree on fatal error
>>> 2. AER driver brings down the link by issuing a secondary bus reset 
>>> waits
>>> for the link to come up.
>>> 3. Hotplug driver observes a link down interrupt
>>> 4. Hotplug driver tries to remove the devices waiting for the rescan 
>>> lock
>>> but devices are already removed by the AER driver and AER driver is 
>>> waiting
>>> for the link to come back up.
>>> 5. AER driver tries to re-enumerate devices after polling for the 
>>> link
>>> state to go up.
>>> 6. Hotplug driver obtains the lock and tries to remove the devices 
>>> again.
>>> 
>>> If a bridge is a hotplug capable bridge, mask hotplug interrupts 
>>> before the
>>> reset and unmask afterwards.
>> 
>> Would it work for you if you just amended the AER driver to skip
>> removal and re-enumeration of devices if the port is a hotplug bridge?
>> Just check for is_hotplug_bridge in struct pci_dev.
> 
> The reason why we want to remove devices before secondary bus reset is
> to quiesce pcie bus traffic before issuing a reset.
> 
> Skipping this step might cause transactions to be lost in the middle
> of the reset as there will be active traffic flowing and drivers will
> suddenly start reading ffs.
> 
> I don't think we can skip this step.
> 
what if we only have conditional enumeration ?  (leaving removing 
devices followed by SBR as is) ?
following code is doing little more extra work than our normal ERR_FATAL 
path.
pciehp_unconfigure_device doing little more than enumeration to 
quiescence the bus.
/*
		 * Ensure that no new Requests will be generated from
		 * the device.
		 */
		if (presence) {
			pci_read_config_word(dev, PCI_COMMAND, &command);
			command &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_SERR);
			command |= PCI_COMMAND_INTX_DISABLE;
			pci_write_config_word(dev, PCI_COMMAND, command);
		}
> 
>> 
>> That would seem like a much simpler solution, given that it is known
>> that the link will flap on reset, causing the hotplug driver to remove
>> and re-enumerate devices.  That would also cover cases where hotplug 
>> is
>> handled by a different driver than pciehp, or by the platform 
>> firmware.
>> 
>> Thanks,
>> 
>> Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 13:11       ` poza
@ 2018-07-03 13:25         ` Sinan Kaya
  2018-07-03 13:31           ` Sinan Kaya
  2018-07-29 12:32         ` Lukas Wunner
  1 sibling, 1 reply; 35+ messages in thread
From: Sinan Kaya @ 2018-07-03 13:25 UTC (permalink / raw)
  To: poza
  Cc: Lukas Wunner, linux-pci, linux-arm-msm, linux-arm-kernel,
	Bjorn Helgaas, Keith Busch, open list
> 
> what if we only have conditional enumeration ?  (leaving removing devices followed by SBR as is) ?
> 
If we leave it as is, hotplug driver observes a link down interrupt as soon
as secondary bus reset is issued. Hotplug driver will try device removal while
AER driver is working on the currently observe FATAL error causing a race
condition. Hotplug driver wait for rescan lock to be obtained.
> following code is doing little more extra work than our normal ERR_FATAL path.
> 
See this comment and the pseudo code below.
/*
 * pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
 * bus reset of the bridge, but at the same time we want to ensure that it is
 * not seen as a hot-unplug, followed by the hot-plug of the device. Thus,
 * disable link state notification and presence detection change notification
 * momentarily, if we see that they could interfere. Also, clear any spurious
 * events after.
 */
int pciehp_reset_slot(struct slot *slot, int probe)
{
	<mask hotplug interrupts>
	<issue secondary bus reset>
	<unmask hotplug interrupts>
}
This patch is trying to mask the interrupt before secondary bus reset in AER
path not after.
Otherwise, hotplug driver will remove the devices as soon as it obtains the
rescan lock following AER recovery since hotplug interrupts will be pending.
> pciehp_unconfigure_device doing little more than enumeration to quiescence the bus.
> 
> /*
>          * Ensure that no new Requests will be generated from
>          * the device.
>          */
>         if (presence) {
>             pci_read_config_word(dev, PCI_COMMAND, &command);
>             command &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_SERR);
>             command |= PCI_COMMAND_INTX_DISABLE;
>             pci_write_config_word(dev, PCI_COMMAND, command);
>         }
> 
> 
> 
>>
>>>
>>> That would seem like a much simpler solution, given that it is known
>>> that the link will flap on reset, causing the hotplug driver to remove
>>> and re-enumerate devices.  That would also cover cases where hotplug is
>>> handled by a different driver than pciehp, or by the platform firmware.
>>>
>>> Thanks,
>>>
>>> Lukas
> 
-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 13:25         ` Sinan Kaya
@ 2018-07-03 13:31           ` Sinan Kaya
  2018-07-03 13:59             ` Lukas Wunner
  0 siblings, 1 reply; 35+ messages in thread
From: Sinan Kaya @ 2018-07-03 13:31 UTC (permalink / raw)
  To: poza
  Cc: Lukas Wunner, linux-pci, linux-arm-msm, linux-arm-kernel,
	Bjorn Helgaas, Keith Busch, open list
On 7/3/2018 9:25 AM, Sinan Kaya wrote:
>> what if we only have conditional enumeration ?  (leaving removing devices followed by SBR as is) ?
>>
Sorry, I think I didn't quite get your question. 
Are you asking about the enumeration following link up while keeping the
current code as is?
Issue is observing hotplug link down event in the middle of AER recovery as in
my previous reply.
If we mask hotplug interrupts before secondary bus reset via my patch,
then hotplug driver will not observe both link up and link down interrupts.
If we don't mask hotplug interrupts, we have a race condition.
I hope I answered your question.
-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 13:31           ` Sinan Kaya
@ 2018-07-03 13:59             ` Lukas Wunner
  2018-07-03 14:10               ` poza
  2018-07-03 15:34               ` Sinan Kaya
  0 siblings, 2 replies; 35+ messages in thread
From: Lukas Wunner @ 2018-07-03 13:59 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: poza, linux-pci, linux-arm-msm, linux-arm-kernel, Bjorn Helgaas,
	Keith Busch, open list
On Tue, Jul 03, 2018 at 09:31:24AM -0400, Sinan Kaya wrote:
> Issue is observing hotplug link down event in the middle of AER recovery
> as in my previous reply.
> 
> If we mask hotplug interrupts before secondary bus reset via my patch,
> then hotplug driver will not observe both link up and link down interrupts.
> 
> If we don't mask hotplug interrupts, we have a race condition.
I assume that a bus reset not only causes a link and presence event but
also clears the Presence Detect State bit in the Slot Status register
and the Data Link Layer Link Active bit in the Link Status register
momentarily.
pciehp may access those two bits concurrently to the AER driver
performing a slot reset.  So it may not be sufficient to mask
the interrupt.
I've posted this patch to address the issue:
https://patchwork.ozlabs.org/patch/930391/
Thanks,
Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 13:59             ` Lukas Wunner
@ 2018-07-03 14:10               ` poza
  2018-07-03 14:17                 ` Lukas Wunner
  2018-07-03 15:34               ` Sinan Kaya
  1 sibling, 1 reply; 35+ messages in thread
From: poza @ 2018-07-03 14:10 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Sinan Kaya, linux-pci, linux-arm-msm, linux-arm-kernel,
	Bjorn Helgaas, Keith Busch, open list
On 2018-07-03 19:29, Lukas Wunner wrote:
> On Tue, Jul 03, 2018 at 09:31:24AM -0400, Sinan Kaya wrote:
>> Issue is observing hotplug link down event in the middle of AER 
>> recovery
>> as in my previous reply.
>> 
>> If we mask hotplug interrupts before secondary bus reset via my patch,
>> then hotplug driver will not observe both link up and link down 
>> interrupts.
>> 
>> If we don't mask hotplug interrupts, we have a race condition.
> 
> I assume that a bus reset not only causes a link and presence event but
> also clears the Presence Detect State bit in the Slot Status register
> and the Data Link Layer Link Active bit in the Link Status register
> momentarily.
> 
> pciehp may access those two bits concurrently to the AER driver
> performing a slot reset.  So it may not be sufficient to mask
> the interrupt.
Was just wondering that you are protecting Presence Detect State bit 
with reset_lock, mainly in pciehp_ist
but with hotplug interrupt disabled, is there another way that it 
hotplug code gets activated ?
> 
> I've posted this patch to address the issue:
> https://patchwork.ozlabs.org/patch/930391/
> 
> Thanks,
> 
> Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 11:30     ` okaya
  2018-07-03 13:11       ` poza
@ 2018-07-03 14:12       ` Lukas Wunner
  2018-07-03 14:29         ` poza
  2018-07-29 12:19       ` Lukas Wunner
  2 siblings, 1 reply; 35+ messages in thread
From: Lukas Wunner @ 2018-07-03 14:12 UTC (permalink / raw)
  To: okaya
  Cc: linux-pci, linux-arm-msm, linux-arm-kernel, Bjorn Helgaas,
	Oza Pawandeep, Keith Busch, open list
On Tue, Jul 03, 2018 at 07:30:28AM -0400, okaya@codeaurora.org wrote:
> On 2018-07-03 04:34, Lukas Wunner wrote:
> >On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
> >>If a bridge supports hotplug and observes a PCIe fatal error, the
> >>following
> >>events happen:
> >>
> >>1. AER driver removes the devices from PCI tree on fatal error
> >>2. AER driver brings down the link by issuing a secondary bus reset
> >>waits
> >>for the link to come up.
> >>3. Hotplug driver observes a link down interrupt
> >>4. Hotplug driver tries to remove the devices waiting for the rescan
> >>lock
> >>but devices are already removed by the AER driver and AER driver is
> >>waiting
> >>for the link to come back up.
> >>5. AER driver tries to re-enumerate devices after polling for the link
> >>state to go up.
> >>6. Hotplug driver obtains the lock and tries to remove the devices
> >>again.
> >>
> >>If a bridge is a hotplug capable bridge, mask hotplug interrupts before
> >>the
> >>reset and unmask afterwards.
> >
> >Would it work for you if you just amended the AER driver to skip
> >removal and re-enumeration of devices if the port is a hotplug bridge?
> >Just check for is_hotplug_bridge in struct pci_dev.
> 
> The reason why we want to remove devices before secondary bus reset is to
> quiesce pcie bus traffic before issuing a reset.
> 
> Skipping this step might cause transactions to be lost in the middle of the
> reset as there will be active traffic flowing and drivers will suddenly
> start reading ffs.
Interesting, I think that merits a code comment.
FWIW, macOS has a "PCI pause" callback to quiesce a device:
https://opensource.apple.com/source/IOPCIFamily/IOPCIFamily-239.1.2/pause.rtf
They're using it to reconfigure a device's BAR and bus number
at runtime (sic!), e.g. if mmio windows need to be moved around
on Thunderbolt hotplug if there's insufficient space:
	"During pause reconfiguration, the following may be changed:
	 - device BAR registers
	 - the devices bus number
	 - registry properties reflecting these values ("ranges",
	   "assigned-addresses", "reg")
	 - device MSI block values for address and value, but not the
	   number of MSIs allocated"
Conceptually, "PCI pause" is similar to putting the device in a suspend
state.  I'm wondering if suspending the devices below the bridge would
make more sense than removing them in the AER driver.
Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 14:10               ` poza
@ 2018-07-03 14:17                 ` Lukas Wunner
  0 siblings, 0 replies; 35+ messages in thread
From: Lukas Wunner @ 2018-07-03 14:17 UTC (permalink / raw)
  To: poza
  Cc: Sinan Kaya, linux-pci, linux-arm-msm, linux-arm-kernel,
	Bjorn Helgaas, Keith Busch, open list
On Tue, Jul 03, 2018 at 07:40:46PM +0530, poza@codeaurora.org wrote:
> On 2018-07-03 19:29, Lukas Wunner wrote:
> >On Tue, Jul 03, 2018 at 09:31:24AM -0400, Sinan Kaya wrote:
> >>Issue is observing hotplug link down event in the middle of AER recovery
> >>as in my previous reply.
> >>
> >>If we mask hotplug interrupts before secondary bus reset via my patch,
> >>then hotplug driver will not observe both link up and link down
> >>interrupts.
> >>
> >>If we don't mask hotplug interrupts, we have a race condition.
> >
> >I assume that a bus reset not only causes a link and presence event but
> >also clears the Presence Detect State bit in the Slot Status register
> >and the Data Link Layer Link Active bit in the Link Status register
> >momentarily.
> >
> >pciehp may access those two bits concurrently to the AER driver
> >performing a slot reset.  So it may not be sufficient to mask
> >the interrupt.
> >
> >I've posted this patch to address the issue:
> >https://patchwork.ozlabs.org/patch/930391/
> 
> Was just wondering that you are protecting Presence Detect State bit with
> reset_lock, mainly in pciehp_ist
> but with hotplug interrupt disabled, is there another way that it hotplug
> code gets activated ?
The user may turn the slot on/off via sysfs.  If an Attention Button
is present, the user may also press that button to turn the slot on/off
after 5 seconds.  Either way, it may cause pciehp's IRQ thread to run
concurrently to a reset initiated by the AER driver, independently of
any events signalled by the slot.
Thanks,
Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 14:12       ` Lukas Wunner
@ 2018-07-03 14:29         ` poza
  0 siblings, 0 replies; 35+ messages in thread
From: poza @ 2018-07-03 14:29 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: okaya, linux-pci, linux-arm-msm, linux-arm-kernel, Bjorn Helgaas,
	Keith Busch, open list
On 2018-07-03 19:42, Lukas Wunner wrote:
> On Tue, Jul 03, 2018 at 07:30:28AM -0400, okaya@codeaurora.org wrote:
>> On 2018-07-03 04:34, Lukas Wunner wrote:
>> >On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>> >>If a bridge supports hotplug and observes a PCIe fatal error, the
>> >>following
>> >>events happen:
>> >>
>> >>1. AER driver removes the devices from PCI tree on fatal error
>> >>2. AER driver brings down the link by issuing a secondary bus reset
>> >>waits
>> >>for the link to come up.
>> >>3. Hotplug driver observes a link down interrupt
>> >>4. Hotplug driver tries to remove the devices waiting for the rescan
>> >>lock
>> >>but devices are already removed by the AER driver and AER driver is
>> >>waiting
>> >>for the link to come back up.
>> >>5. AER driver tries to re-enumerate devices after polling for the link
>> >>state to go up.
>> >>6. Hotplug driver obtains the lock and tries to remove the devices
>> >>again.
>> >>
>> >>If a bridge is a hotplug capable bridge, mask hotplug interrupts before
>> >>the
>> >>reset and unmask afterwards.
>> >
>> >Would it work for you if you just amended the AER driver to skip
>> >removal and re-enumeration of devices if the port is a hotplug bridge?
>> >Just check for is_hotplug_bridge in struct pci_dev.
>> 
>> The reason why we want to remove devices before secondary bus reset is 
>> to
>> quiesce pcie bus traffic before issuing a reset.
>> 
>> Skipping this step might cause transactions to be lost in the middle 
>> of the
>> reset as there will be active traffic flowing and drivers will 
>> suddenly
>> start reading ffs.
> 
> Interesting, I think that merits a code comment.
> 
> FWIW, macOS has a "PCI pause" callback to quiesce a device:
> https://opensource.apple.com/source/IOPCIFamily/IOPCIFamily-239.1.2/pause.rtf
> 
> They're using it to reconfigure a device's BAR and bus number
> at runtime (sic!), e.g. if mmio windows need to be moved around
> on Thunderbolt hotplug if there's insufficient space:
> 
> 	"During pause reconfiguration, the following may be changed:
> 	 - device BAR registers
> 	 - the devices bus number
> 	 - registry properties reflecting these values ("ranges",
> 	   "assigned-addresses", "reg")
> 	 - device MSI block values for address and value, but not the
> 	   number of MSIs allocated"
> 
> Conceptually, "PCI pause" is similar to putting the device in a suspend
> state.  I'm wondering if suspending the devices below the bridge would
> make more sense than removing them in the AER driver.
> 
the code is shared by not only AER but also DPC, where if DPC event 
happens the devices are removed.
also if the bridge is hotplug capable, then the devices beneath might 
have changed and resume might break.
> Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-02 22:52 ` [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset Sinan Kaya
  2018-07-03  8:34   ` Lukas Wunner
@ 2018-07-03 14:34   ` Lukas Wunner
  2018-07-03 15:12     ` poza
  2018-07-03 15:43     ` Sinan Kaya
  1 sibling, 2 replies; 35+ messages in thread
From: Lukas Wunner @ 2018-07-03 14:34 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-pci, linux-arm-msm, linux-arm-kernel, Bjorn Helgaas,
	Oza Pawandeep, Keith Busch, open list
On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
> @@ -308,8 +310,17 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
>  		pci_dev_put(pdev);
>  	}
>  
> +	hpsvc = pcie_port_find_service(udev, PCIE_PORT_SERVICE_HP);
> +	hpdev = pcie_port_find_device(udev, PCIE_PORT_SERVICE_HP);
> +
> +	if (hpdev && hpsvc)
> +		hpsvc->mask_irq(to_pcie_device(hpdev));
> +
>  	result = reset_link(udev, service);
>  
> +	if (hpdev && hpsvc)
> +		hpsvc->unmask_irq(to_pcie_device(hpdev));
> +
We've already got the ->reset_slot callback in struct hotplug_slot_ops,
I'm wondering if we really need additional ones for this use case.
If you just do...
	if (!pci_probe_reset_slot(dev->slot))
		pci_reset_slot(dev->slot)
	else {
		/* regular code path */
	}
would that not be equivalent?
(It's worth noting though that pciehp is the only hotplug driver
implementing the ->reset_slot callback.  If hotplug is handled by
a different driver or by the platform firmware, devices may still
be removed and re-enumerated twice.)
Thanks,
Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 14:34   ` Lukas Wunner
@ 2018-07-03 15:12     ` poza
  2018-07-03 15:49       ` Sinan Kaya
  2018-07-03 15:43     ` Sinan Kaya
  1 sibling, 1 reply; 35+ messages in thread
From: poza @ 2018-07-03 15:12 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Sinan Kaya, linux-pci, linux-arm-msm, linux-arm-kernel,
	Bjorn Helgaas, Keith Busch, open list
On 2018-07-03 20:04, Lukas Wunner wrote:
> On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>> @@ -308,8 +310,17 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, 
>> u32 service)
>>  		pci_dev_put(pdev);
>>  	}
>> 
>> +	hpsvc = pcie_port_find_service(udev, PCIE_PORT_SERVICE_HP);
>> +	hpdev = pcie_port_find_device(udev, PCIE_PORT_SERVICE_HP);
>> +
>> +	if (hpdev && hpsvc)
>> +		hpsvc->mask_irq(to_pcie_device(hpdev));
>> +
>>  	result = reset_link(udev, service);
>> 
>> +	if (hpdev && hpsvc)
>> +		hpsvc->unmask_irq(to_pcie_device(hpdev));
>> +
> 
> We've already got the ->reset_slot callback in struct hotplug_slot_ops,
> I'm wondering if we really need additional ones for this use case.
> 
> If you just do...
> 
> 	if (!pci_probe_reset_slot(dev->slot))
> 		pci_reset_slot(dev->slot)
> 	else {
> 		/* regular code path */
> 	}
> 
> would that not be equivalent?
> 
pcie_do_fatal_recovery() calls
           reset_link()
              which calls dpc_reset_link() or aer_root_reset() depending 
on the event.
and dpc_reset_link() and aer_root_reset() do their own cleanup stuffs.
infact, aer_root_reset() is the only function which actually triggers 
pci_reset_bridge_secondary_bus().
So if we try to fit in your suggestion:
if (!pci_probe_reset_slot(dev->slot))
{
     pci_reset_slot(dev->slot)
     result = reset_link(udev, service); >> in this case aer_root_reset 
must not call pci_reset_bridge_secondary_bus()
} else
     result = reset_link(udev, service); >> in this case aer_root_reset 
must call pci_reset_bridge_secondary_bus() [since bridge is not hotplug 
capable)
Did I get your suggestion right ?
Regards,
Oza.
> (It's worth noting though that pciehp is the only hotplug driver
> implementing the ->reset_slot callback.  If hotplug is handled by
> a different driver or by the platform firmware, devices may still
> be removed and re-enumerated twice.)
> 
> Thanks,
> 
> Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 13:59             ` Lukas Wunner
  2018-07-03 14:10               ` poza
@ 2018-07-03 15:34               ` Sinan Kaya
  1 sibling, 0 replies; 35+ messages in thread
From: Sinan Kaya @ 2018-07-03 15:34 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: poza, linux-pci, linux-arm-msm, linux-arm-kernel, Bjorn Helgaas,
	Keith Busch, open list
On 7/3/2018 9:59 AM, Lukas Wunner wrote:
> On Tue, Jul 03, 2018 at 09:31:24AM -0400, Sinan Kaya wrote:
>> Issue is observing hotplug link down event in the middle of AER recovery
>> as in my previous reply.
>>
>> If we mask hotplug interrupts before secondary bus reset via my patch,
>> then hotplug driver will not observe both link up and link down interrupts.
>>
>> If we don't mask hotplug interrupts, we have a race condition.
> 
> I assume that a bus reset not only causes a link and presence event but
> also clears the Presence Detect State bit in the Slot Status register
> and the Data Link Layer Link Active bit in the Link Status register
> momentarily.
> 
> pciehp may access those two bits concurrently to the AER driver
> performing a slot reset.  So it may not be sufficient to mask
> the interrupt.
> 
> I've posted this patch to address the issue:
> https://patchwork.ozlabs.org/patch/930391/
Very interesting!
I missed this completely.
I know for a fact that bus reset clears the Data Link Layer Active bit
as soon as link goes down. It gets set again following link up.
Presence detect depends on the HW implementation. 
QDT root ports don't change presence detect for instance since nobody
actually removed the card.
If an implementation supports in-band presence detect, the answer is yes.
As soon as the link goes down, presence detect bit will get cleared until
recovery.
It sounds like we need to update your lock change with my proposal.
lock() in mask_irq() and unlock() in unmask_irq()
> 
> Thanks,
> 
> Lukas
> 
-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 14:34   ` Lukas Wunner
  2018-07-03 15:12     ` poza
@ 2018-07-03 15:43     ` Sinan Kaya
  2018-07-08 17:14       ` Lukas Wunner
  1 sibling, 1 reply; 35+ messages in thread
From: Sinan Kaya @ 2018-07-03 15:43 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-arm-msm, linux-arm-kernel, Bjorn Helgaas,
	Oza Pawandeep, Keith Busch, open list
On 7/3/2018 10:34 AM, Lukas Wunner wrote:
> On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>> @@ -308,8 +310,17 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
>>  		pci_dev_put(pdev);
>>  	}
>>  
>> +	hpsvc = pcie_port_find_service(udev, PCIE_PORT_SERVICE_HP);
>> +	hpdev = pcie_port_find_device(udev, PCIE_PORT_SERVICE_HP);
>> +
>> +	if (hpdev && hpsvc)
>> +		hpsvc->mask_irq(to_pcie_device(hpdev));
>> +
>>  	result = reset_link(udev, service);
>>  
>> +	if (hpdev && hpsvc)
>> +		hpsvc->unmask_irq(to_pcie_device(hpdev));
>> +
> 
> We've already got the ->reset_slot callback in struct hotplug_slot_ops,
> I'm wondering if we really need additional ones for this use case.
> 
> If you just do...
> 
> 	if (!pci_probe_reset_slot(dev->slot))
> 		pci_reset_slot(dev->slot)
> 	else {
> 		/* regular code path */
> 	}
> 
> would that not be equivalent?
As I have informed you before on my previous reply, the pdev->slot is
only valid for children objects such as endpoints not for a bridge when
using pciehp.
The pointer is NULL for the host bridge itself.
I reached out to reset_slot() callback in v4 of this implementation.
https://patchwork.kernel.org/patch/10494971/
However, as Oza explained FATAL error handling gets called from two different
paths as AER and DPC.
If the link goes down due to DPC, calling pci_reset_slot() would be a mistake
as DPC has its own recovery mechanism by programming the DPC capabilities.
pci_reset_slot() performs a secondary bus reset following hotplug interrupt
mask.
Issuing a secondary bus reset to a DPC event would be a mistake for recovery.
That's why, I extracted the hotplug mask and unmask IRQ calls into service
layer so that I can mask hotplug interrupts regardless of the source of the
FATAL error whether it is DPC or AER.
If error source is DPC, it still goes to DPC driver's reset_link() callback
for DPC specific clean up.
If error source is AER, it still goes to AER driver's reset_link() callback
for secondary bus reset.
Remember that AER driver completely bypasses pci_reset_slot() today. The lock
mechanism you are putting will not be useful for FATAL error case where
pci_secondary_bus_reset() is called directly.
pci_reset_slot() only gets called from external drivers such as VFIO to initiate
a reset to the slot if hotplug is supported.
> 
> (It's worth noting though that pciehp is the only hotplug driver
> implementing the ->reset_slot callback.  If hotplug is handled by
> a different driver or by the platform firmware, devices may still
> be removed and re-enumerated twice.)
> 
> Thanks,
> 
> Lukas
> 
-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 15:12     ` poza
@ 2018-07-03 15:49       ` Sinan Kaya
  0 siblings, 0 replies; 35+ messages in thread
From: Sinan Kaya @ 2018-07-03 15:49 UTC (permalink / raw)
  To: poza, Lukas Wunner
  Cc: linux-pci, linux-arm-msm, linux-arm-kernel, Bjorn Helgaas,
	Keith Busch, open list
On 7/3/2018 11:12 AM, poza@codeaurora.org wrote:
> if (!pci_probe_reset_slot(dev->slot))
> {
>     pci_reset_slot(dev->slot)
>     result = reset_link(udev, service); >> in this case aer_root_reset must not call pci_reset_bridge_secondary_bus()
> } else
>     result = reset_link(udev, service); >> in this case aer_root_reset must call pci_reset_bridge_secondary_bus() [since bridge is not hotplug capable)
Here are two different flow for two different FATAL error sources
dpc_irq
	link is down due to DPC
	pcie_do_fatal_recovery()
		pci_reset_slot()
			mask hotplug IRQ
			secondary bus reset
			unmask hotplug IRQ
			undefined behavior as link went down due to DPC
		dpc_reset_link()
			undefined behavior secondary bus reset happened
					   while a DPC event is pending
			link may or may not be up at this moment
			recover the link via DPC way if HW can cope with this undefined behavior.
aer_irq
	link is up
	pcie_do_fatal_recovery()
		pci_reset_slot()
			mask hotplug IRQ
			secondary bus reset
			unmask hotplug IRQ
			link goes up
		aer_reset_link()
			secondary bus reset
			hotplug link down interrupt again
I tried to change pci_reset_slot() such that we do
mask hotplug IRQ
go to AER/DPC reset_link based on error source
unmask hotplug IRQ
-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 15:43     ` Sinan Kaya
@ 2018-07-08 17:14       ` Lukas Wunner
  2018-07-09 14:48         ` Sinan Kaya
  0 siblings, 1 reply; 35+ messages in thread
From: Lukas Wunner @ 2018-07-08 17:14 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-pci, linux-arm-msm, linux-arm-kernel, Bjorn Helgaas,
	Oza Pawandeep, Keith Busch, open list
On Tue, Jul 03, 2018 at 11:43:26AM -0400, Sinan Kaya wrote:
> On 7/3/2018 10:34 AM, Lukas Wunner wrote:
> > We've already got the ->reset_slot callback in struct hotplug_slot_ops,
> > I'm wondering if we really need additional ones for this use case.
> 
> As I have informed you before on my previous reply, the pdev->slot is
> only valid for children objects such as endpoints not for a bridge when
> using pciehp.
> 
> The pointer is NULL for the host bridge itself.
Right, sorry, I had misremembered how this works.  So essentially the
pointer is only set for the devices "in" the slot, but not for the bridge
"to" that slot.  If the slot isn't occupied, *no* pci_dev points the
struct pci_slot.  Seems counter-intuitive to be honest.
Thanks for the explanation of the various reset codepaths, I'm afraid my
understanding of that portion of the PCI core is limited.
Browsing through drivers/pci/pci.c, I notice pci_dev_lock() and
pci_dev_save_and_disable(), both are called from reset codepaths and
apparently seek to stop access to the device during reset.  I'm wondering
why DPC and AER remove devices in order to avoid accesses to them during
reset, instead of utilizing these two functions?
My guess is that a lot of the reset code is historically grown and
could be simplified and made more consistent, but that requires digging
into the code and getting a complete picture.  I've sort of done that
for pciehp, I think I'm now intimately familiar with 90% of it,
so I'll be happy to review patches for it and answer questions,
but I'm pretty much stumped when it comes to reset code in the PCI core.
I treated the ->reset_slot() callback as one possible entry point into
pciehp and asked myself if it's properly serialized with the rest of the
driver and whether driver ->probe and ->remove is ordered such that
the driver is always properly initialized when the entry point might be
taken.  I did not look too closely at the codepaths actually leading to
invocation of the ->reset_slot() callback.
> I was curious if we could use a single work queue for all pcie portdrv
> services. This would also eliminate the need for locks that Lukas is
> adding.
> 
> If hotplug starts first,  hotplug code would run to completion before AER
> and DPC service starts recovery.
> 
> If DPC/AER starts first, my patch would mask the hotplug interrupts.
> 
> My solution doesn't help if link down interrupt is observed before the AER
> or DPC services.
If pciehp gets an interrupt quicker than dpc/aer, it will (at least with
my patches) remove all devices, check if the presence bit is set, and
if so, try to bring the slot up again.  My (limited) understanding is
that the link will stay down until dpc/aer react.  pciehp_check_link_status()
will wait 1 sec for the link, wait another 100 msec, then poll the vendor
register for 1 sec before giving up.  So if dpc/aer are locked out for this
long, they will only be able to reset the slot after 2100 msec.
I've had a brief look at the PCIe r4.0 base spec and could not find
anything about how pciehp and dpc/aer should coordinate.  Maybe that's
an oversight, or the PCISIG just leaves this to the OS.
> Another possibility is to add synchronization logic between these threads
> obviously.
Maybe call pci_channel_offline() in the poll loops of pcie_wait_for_link()
and pci_bus_check_dev() to avoid waiting for the link if an error needs to
be acted upon first?
Thanks,
Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-08 17:14       ` Lukas Wunner
@ 2018-07-09 14:48         ` Sinan Kaya
  2018-07-09 16:00           ` Lukas Wunner
  0 siblings, 1 reply; 35+ messages in thread
From: Sinan Kaya @ 2018-07-09 14:48 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-arm-msm, linux-arm-kernel, Bjorn Helgaas,
	Oza Pawandeep, Keith Busch, open list
On 7/8/18, Lukas Wunner <lukas@wunner.de> wrote:
> On Tue, Jul 03, 2018 at 11:43:26AM -0400, Sinan Kaya wrote:
>> On 7/3/2018 10:34 AM, Lukas Wunner wrote:
>> > We've already got the ->reset_slot callback in struct hotplug_slot_ops,
>> > I'm wondering if we really need additional ones for this use case.
>>
>> As I have informed you before on my previous reply, the pdev->slot is
>> only valid for children objects such as endpoints not for a bridge when
>> using pciehp.
>>
>> The pointer is NULL for the host bridge itself.
>
> Right, sorry, I had misremembered how this works.  So essentially the
> pointer is only set for the devices "in" the slot, but not for the bridge
> "to" that slot.  If the slot isn't occupied, *no* pci_dev points the
> struct pci_slot.  Seems counter-intuitive to be honest.
That is true. There is a bit of history here. Back in pci days, you
would have each PCI device number as a hotplug slot. There would be
multiple slots under a bridge. There is a one to many relationship.
>
> Thanks for the explanation of the various reset codepaths, I'm afraid my
> understanding of that portion of the PCI core is limited.
>
> Browsing through drivers/pci/pci.c, I notice pci_dev_lock() and
> pci_dev_save_and_disable(), both are called from reset codepaths and
> apparently seek to stop access to the device during reset.  I'm wondering
> why DPC and AER remove devices in order to avoid accesses to them during
> reset, instead of utilizing these two functions?
This was the behavior until 4.18. Since 4.18 all devices are being
removed and reenumerated following a fatal error condition.
>
> My guess is that a lot of the reset code is historically grown and
> could be simplified and made more consistent, but that requires digging
> into the code and getting a complete picture.  I've sort of done that
> for pciehp, I think I'm now intimately familiar with 90% of it,
> so I'll be happy to review patches for it and answer questions,
> but I'm pretty much stumped when it comes to reset code in the PCI core.
>
> I treated the ->reset_slot() callback as one possible entry point into
> pciehp and asked myself if it's properly serialized with the rest of the
> driver and whether driver ->probe and ->remove is ordered such that
> the driver is always properly initialized when the entry point might be
> taken.  I did not look too closely at the codepaths actually leading to
> invocation of the ->reset_slot() callback.
>
Sure, this path gets called from vfio today and possibly via a reset
file in sysfs. A bunch of things need to fail before hitting slot
reset path.
However, vfio is a direct caller if hotplug is supported.
>
>> I was curious if we could use a single work queue for all pcie portdrv
>> services. This would also eliminate the need for locks that Lukas is
>> adding.
>>
>> If hotplug starts first,  hotplug code would run to completion before AER
>> and DPC service starts recovery.
>>
>> If DPC/AER starts first, my patch would mask the hotplug interrupts.
>>
>> My solution doesn't help if link down interrupt is observed before the
>> AER
>> or DPC services.
>
> If pciehp gets an interrupt quicker than dpc/aer, it will (at least with
> my patches) remove all devices, check if the presence bit is set,
Yup.
> and
> if so, try to bring the slot up again.
Hotplug driver should only observe a link down interrupt. Link would
come up in response to a secondary bus reset initiated by the AER
driver.
Can you point me to the code that would bring up the link in hp code?
Maybe, I am missing something.
> My (limited) understanding is
> that the link will stay down until dpc/aer react.
> pciehp_check_link_status()
> will wait 1 sec for the link, wait another 100 msec, then poll the vendor
> register for 1 sec before giving up.  So if dpc/aer are locked out for this
> long, they will only be able to reset the slot after 2100 msec.
>
> I've had a brief look at the PCIe r4.0 base spec and could not find
> anything about how pciehp and dpc/aer should coordinate.  Maybe that's
> an oversight, or the PCISIG just leaves this to the OS.
>
>
>> Another possibility is to add synchronization logic between these threads
>> obviously.
>
> Maybe call pci_channel_offline() in the poll loops of pcie_wait_for_link()
> and pci_bus_check_dev() to avoid waiting for the link if an error needs to
> be acted upon first?
Let me think about this.
>
> Thanks,
>
> Lukas
>
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-09 14:48         ` Sinan Kaya
@ 2018-07-09 16:00           ` Lukas Wunner
  2018-07-10 18:30             ` Sinan Kaya
  0 siblings, 1 reply; 35+ messages in thread
From: Lukas Wunner @ 2018-07-09 16:00 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-pci, linux-arm-msm, linux-arm-kernel, Bjorn Helgaas,
	Oza Pawandeep, Keith Busch, open list
On Mon, Jul 09, 2018 at 08:48:44AM -0600, Sinan Kaya wrote:
> On 7/8/18, Lukas Wunner <lukas@wunner.de> wrote:
> > On Tue, Jul 03, 2018 at 11:43:26AM -0400, Sinan Kaya wrote:
> > > My solution doesn't help if link down interrupt is observed before the
> > > AER
> > > or DPC services.
> >
> > If pciehp gets an interrupt quicker than dpc/aer, it will (at least with
> > my patches) remove all devices, check if the presence bit is set,
> > and if so, try to bring the slot up again.
> 
> Hotplug driver should only observe a link down interrupt. Link would
> come up in response to a secondary bus reset initiated by the AER
> driver.
PCIe hotplug doesn't have separate Link Down and Link Up interrupts,
there is only a Link State *Changed* event.
> Can you point me to the code that would bring up the link in hp code?
I was referring to the situation with my recently posted pciehp patches
applied, in particular patch [21/32] ("PCI: pciehp: Become resilient to
missed events"):
https://patchwork.ozlabs.org/patch/930389/
When I get a presence or link changed event, I turn the slot off.  That
includes removing all devices in the slot.  Because even if the slot is
still occupied or link is up, there was definitely a change and the safe
behavior is to assume that the card in the slot is now a different one
than before.
Afterwards, I check if the slot is occupied or link is up.  If either
of those conditions is true, I try to bring the slot up again.
Thanks,
Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-09 16:00           ` Lukas Wunner
@ 2018-07-10 18:30             ` Sinan Kaya
  2018-07-20 20:01               ` Bjorn Helgaas
  0 siblings, 1 reply; 35+ messages in thread
From: Sinan Kaya @ 2018-07-10 18:30 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Oza Pawandeep, linux-pci, open list, Keith Busch, linux-arm-msm,
	Bjorn Helgaas, linux-arm-kernel
On Mon, Jul 9, 2018 at 12:00 PM, Lukas Wunner <lukas@wunner.de> wrote:
>
> On Mon, Jul 09, 2018 at 08:48:44AM -0600, Sinan Kaya wrote:
> > On 7/8/18, Lukas Wunner <lukas@wunner.de> wrote:
> > > On Tue, Jul 03, 2018 at 11:43:26AM -0400, Sinan Kaya wrote:
> > > > My solution doesn't help if link down interrupt is observed before the
> > > > AER
> > > > or DPC services.
> > >
> > > If pciehp gets an interrupt quicker than dpc/aer, it will (at least with
> > > my patches) remove all devices, check if the presence bit is set,
> > > and if so, try to bring the slot up again.
> >
> > Hotplug driver should only observe a link down interrupt. Link would
> > come up in response to a secondary bus reset initiated by the AER
> > driver.
>
> PCIe hotplug doesn't have separate Link Down and Link Up interrupts,
> there is only a Link State *Changed* event.
>
> > Can you point me to the code that would bring up the link in hp code?
>
> I was referring to the situation with my recently posted pciehp patches
> applied, in particular patch [21/32] ("PCI: pciehp: Become resilient to
> missed events"):
> https://patchwork.ozlabs.org/patch/930389/
>
> When I get a presence or link changed event, I turn the slot off.  That
> includes removing all devices in the slot.  Because even if the slot is
> still occupied or link is up, there was definitely a change and the safe
> behavior is to assume that the card in the slot is now a different one
> than before.
>
We do have a bit of mess unfortunately. Error handling and hotplug drivers
do not play nicely with each other.
When hotplug driver observes a link down, we are not checking if the
link down happened because user really wanted to remove a card or
if it was because it was originated by an error handling service such
as AER/DPC.
I'm thinking that we could potentially check if a hotplug event is pending
at the entrance of fatal error handling. If it is pending, we could poll until
the status bit clears. That should flush the link down event.
Even then, link down indication of hotplug seem to turn off slot power
and LED.
If AER/DPC service runs after the hotplug driver, link won't come back
up as the power to the slot is turned off.
I'd like to hear about Bjorn's opinion before we throw something else
into this problem.
> Afterwards, I check if the slot is occupied or link is up.  If either
> of those conditions is true, I try to bring the slot up again.
>
> Thanks,
>
> Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-10 18:30             ` Sinan Kaya
@ 2018-07-20 20:01               ` Bjorn Helgaas
  2018-07-21  2:58                 ` Sinan Kaya
  0 siblings, 1 reply; 35+ messages in thread
From: Bjorn Helgaas @ 2018-07-20 20:01 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Lukas Wunner, Oza Pawandeep, linux-pci, open list, Keith Busch,
	linux-arm-msm, Bjorn Helgaas, linux-arm-kernel
On Tue, Jul 10, 2018 at 02:30:11PM -0400, Sinan Kaya wrote:
> On Mon, Jul 9, 2018 at 12:00 PM, Lukas Wunner <lukas@wunner.de> wrote:
> > On Mon, Jul 09, 2018 at 08:48:44AM -0600, Sinan Kaya wrote:
> > > On 7/8/18, Lukas Wunner <lukas@wunner.de> wrote:
> > > > On Tue, Jul 03, 2018 at 11:43:26AM -0400, Sinan Kaya wrote:
> > > > > My solution doesn't help if link down interrupt is observed
> > > > > before the AER or DPC services.
> > > >
> > > > If pciehp gets an interrupt quicker than dpc/aer, it will (at
> > > > least with my patches) remove all devices, check if the
> > > > presence bit is set, and if so, try to bring the slot up
> > > > again.
> > >
> > > Hotplug driver should only observe a link down interrupt. Link
> > > would come up in response to a secondary bus reset initiated by
> > > the AER driver.
> >
> > PCIe hotplug doesn't have separate Link Down and Link Up
> > interrupts, there is only a Link State *Changed* event.
> >
> > > Can you point me to the code that would bring up the link in hp
> > > code?
> >
> > I was referring to the situation with my recently posted pciehp
> > patches applied, in particular patch [21/32] ("PCI: pciehp: Become
> > resilient to missed events"):
> > https://patchwork.ozlabs.org/patch/930389/
> >
> > When I get a presence or link changed event, I turn the slot off.
> > That includes removing all devices in the slot.  Because even if
> > the slot is still occupied or link is up, there was definitely a
> > change and the safe behavior is to assume that the card in the
> > slot is now a different one than before.
> 
> We do have a bit of mess unfortunately. Error handling and hotplug
> drivers do not play nicely with each other.
> 
> When hotplug driver observes a link down, we are not checking if the
> link down happened because user really wanted to remove a card or if
> it was because it was originated by an error handling service such
> as AER/DPC.
> 
> I'm thinking that we could potentially check if a hotplug event is
> pending at the entrance of fatal error handling. If it is pending,
> we could poll until the status bit clears. That should flush the
> link down event.
> 
> Even then, link down indication of hotplug seem to turn off slot
> power and LED.
> 
> If AER/DPC service runs after the hotplug driver, link won't come
> back up as the power to the slot is turned off.
> 
> I'd like to hear about Bjorn's opinion before we throw something
> else into this problem.
You guys know way more about this than I do.
I think the separation of AER/DPC/pciehp into separate drivers is
somewhat artificial because there are many interdependencies.  The
driver model doesn't apply very well because there's only one
underlying piece of hardware, which forces us to use the portdrv as
sort of a multiplexer.  The fact that portdrv claims these bridges
also means normal drivers (e.g., for performance counters) can't use
the usual model.
All that is to say that if integrating these services more tightly
would help solve this problem, I'd be open to that.
Bjorn
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-20 20:01               ` Bjorn Helgaas
@ 2018-07-21  2:58                 ` Sinan Kaya
  2018-07-21  6:07                   ` Sinan Kaya
  2018-07-29 18:02                   ` Lukas Wunner
  0 siblings, 2 replies; 35+ messages in thread
From: Sinan Kaya @ 2018-07-21  2:58 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Oza Pawandeep, linux-pci, open list, Keith Busch, Lukas Wunner,
	linux-arm-msm, Bjorn Helgaas, linux-arm-kernel
On 7/20/2018 1:01 PM, Bjorn Helgaas wrote:
> On Tue, Jul 10, 2018 at 02:30:11PM -0400, Sinan Kaya wrote:
>> On Mon, Jul 9, 2018 at 12:00 PM, Lukas Wunner <lukas@wunner.de> wrote:
>>> On Mon, Jul 09, 2018 at 08:48:44AM -0600, Sinan Kaya wrote:
>>>> On 7/8/18, Lukas Wunner <lukas@wunner.de> wrote:
>>>>> On Tue, Jul 03, 2018 at 11:43:26AM -0400, Sinan Kaya wrote:
>>>>>> My solution doesn't help if link down interrupt is observed
>>>>>> before the AER or DPC services.
>>>>>
>>>>> If pciehp gets an interrupt quicker than dpc/aer, it will (at
>>>>> least with my patches) remove all devices, check if the
>>>>> presence bit is set, and if so, try to bring the slot up
>>>>> again.
>>>>
>>>> Hotplug driver should only observe a link down interrupt. Link
>>>> would come up in response to a secondary bus reset initiated by
>>>> the AER driver.
>>>
>>> PCIe hotplug doesn't have separate Link Down and Link Up
>>> interrupts, there is only a Link State *Changed* event.
>>>
>>>> Can you point me to the code that would bring up the link in hp
>>>> code?
>>>
>>> I was referring to the situation with my recently posted pciehp
>>> patches applied, in particular patch [21/32] ("PCI: pciehp: Become
>>> resilient to missed events"):
>>> https://patchwork.ozlabs.org/patch/930389/
>>>
>>> When I get a presence or link changed event, I turn the slot off.
>>> That includes removing all devices in the slot.  Because even if
>>> the slot is still occupied or link is up, there was definitely a
>>> change and the safe behavior is to assume that the card in the
>>> slot is now a different one than before.
>>
>> We do have a bit of mess unfortunately. Error handling and hotplug
>> drivers do not play nicely with each other.
>>
>> When hotplug driver observes a link down, we are not checking if the
>> link down happened because user really wanted to remove a card or if
>> it was because it was originated by an error handling service such
>> as AER/DPC.
>>
>> I'm thinking that we could potentially check if a hotplug event is
>> pending at the entrance of fatal error handling. If it is pending,
>> we could poll until the status bit clears. That should flush the
>> link down event.
>>
>> Even then, link down indication of hotplug seem to turn off slot
>> power and LED.
>>
>> If AER/DPC service runs after the hotplug driver, link won't come
>> back up as the power to the slot is turned off.
>>
>> I'd like to hear about Bjorn's opinion before we throw something
>> else into this problem.
> 
> You guys know way more about this than I do.
> 
> I think the separation of AER/DPC/pciehp into separate drivers is
> somewhat artificial because there are many interdependencies.  The
> driver model doesn't apply very well because there's only one
> underlying piece of hardware, which forces us to use the portdrv as
> sort of a multiplexer.  The fact that portdrv claims these bridges
> also means normal drivers (e.g., for performance counters) can't use
> the usual model.
> 
> All that is to say that if integrating these services more tightly
> would help solve this problem, I'd be open to that.
I was looking at how to destroy the portdrv for a while. It looks like
a much more bigger task to be honest. There are multiple levels of
abstractions in the code as you highlighted.
My patch solves the problem if AER interrupt happens before the hotplug
interrupt. We are masking the data link layer active interrupt. So,
AER/DPC can perform their link operations without hotplug driver race.
We need to figure out how to gracefully return inside hotplug driver
if link down happened and there is an error pending.
My first question is why hotplug driver is reacting to the link event
if there was not an actual device insertion/removal.
Would it help to keep track of presence changed interrupts since last
link event?
IF counter is 0 and device is present, hotplug driver bails out
silently as an example.
> 
> Bjorn
> 
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-21  2:58                 ` Sinan Kaya
@ 2018-07-21  6:07                   ` Sinan Kaya
  2018-07-25  8:29                     ` poza
  2018-07-29 18:02                   ` Lukas Wunner
  1 sibling, 1 reply; 35+ messages in thread
From: Sinan Kaya @ 2018-07-21  6:07 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Oza Pawandeep, linux-pci, open list, Keith Busch, Lukas Wunner,
	linux-arm-msm, Bjorn Helgaas, linux-arm-kernel
On 7/20/2018 7:58 PM, Sinan Kaya wrote:
> We need to figure out how to gracefully return inside hotplug driver
> if link down happened and there is an error pending.
How about adding the following into the hotplug ISR?
1. check if firmware first is disabled
2. check if there is a fatal error pending in the device_status register
of the PCI Express capability on the root port.
3. bail out from hotplug routine if this is the case.
4. otherwise, existing behavior.
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-21  6:07                   ` Sinan Kaya
@ 2018-07-25  8:29                     ` poza
  0 siblings, 0 replies; 35+ messages in thread
From: poza @ 2018-07-25  8:29 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Bjorn Helgaas, Lukas Wunner, linux-pci, open list, Keith Busch,
	linux-arm-msm, Bjorn Helgaas, linux-arm-kernel
On 2018-07-21 11:37, Sinan Kaya wrote:
> On 7/20/2018 7:58 PM, Sinan Kaya wrote:
>> We need to figure out how to gracefully return inside hotplug driver
>> if link down happened and there is an error pending.
> 
> How about adding the following into the hotplug ISR?
> 
> 1. check if firmware first is disabled
> 2. check if there is a fatal error pending in the device_status 
> register
> of the PCI Express capability on the root port.
> 3. bail out from hotplug routine if this is the case.
> 4. otherwise, existing behavior.
This makes sense.
from Lukas's text
"
The user may turn the slot on/off via sysfs.  If an Attention Button
is present, the user may also press that button to turn the slot on/off
after 5 seconds.  Either way, it may cause pciehp's IRQ thread to run
concurrently to a reset initiated by the AER driver, independently of
any events signalled by the slot.
"
so if device gets removed and re-enumerated other than hotplug ISR,
or any other similar path has to take care of checking ERR_FATAL status.
Regards,
Oza.
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 11:30     ` okaya
  2018-07-03 13:11       ` poza
  2018-07-03 14:12       ` Lukas Wunner
@ 2018-07-29 12:19       ` Lukas Wunner
  2 siblings, 0 replies; 35+ messages in thread
From: Lukas Wunner @ 2018-07-29 12:19 UTC (permalink / raw)
  To: okaya
  Cc: linux-pci, linux-arm-msm, linux-arm-kernel, Bjorn Helgaas,
	Oza Pawandeep, Keith Busch, open list
On Tue, Jul 03, 2018 at 07:30:28AM -0400, okaya@codeaurora.org wrote:
> On 2018-07-03 04:34, Lukas Wunner wrote:
> > On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
> > > If a bridge supports hotplug and observes a PCIe fatal error, the
> > > following events happen:
> > >
> > > 1. AER driver removes the devices from PCI tree on fatal error
> > > 2. AER driver brings down the link by issuing a secondary bus reset
> > >    waits for the link to come up.
> > > 3. Hotplug driver observes a link down interrupt
> > > 4. Hotplug driver tries to remove the devices waiting for the rescan
> > >    lock but devices are already removed by the AER driver and AER
> > >    driver is waiting for the link to come back up.
> > > 5. AER driver tries to re-enumerate devices after polling for the link
> > >    state to go up.
> > > 6. Hotplug driver obtains the lock and tries to remove the devices
> > >    again.
> > >
> > > If a bridge is a hotplug capable bridge, mask hotplug interrupts
> > > before the reset and unmask afterwards.
> >
> > Would it work for you if you just amended the AER driver to skip
> > removal and re-enumeration of devices if the port is a hotplug bridge?
> > Just check for is_hotplug_bridge in struct pci_dev.
> 
> The reason why we want to remove devices before secondary bus reset is to
> quiesce pcie bus traffic before issuing a reset.
> 
> Skipping this step might cause transactions to be lost in the middle of
> the reset as there will be active traffic flowing and drivers will
> suddenly start reading ffs.
That's par for the course for any hotplug bridge with support for
surprise removal (e.g. Thunderbolt) and drivers must be able to
cope with it.  Nothing to worry about IMHO.
Thanks,
Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-03 13:11       ` poza
  2018-07-03 13:25         ` Sinan Kaya
@ 2018-07-29 12:32         ` Lukas Wunner
  1 sibling, 0 replies; 35+ messages in thread
From: Lukas Wunner @ 2018-07-29 12:32 UTC (permalink / raw)
  To: poza
  Cc: okaya, linux-pci, linux-arm-msm, linux-arm-kernel, Bjorn Helgaas,
	Keith Busch, open list
On Tue, Jul 03, 2018 at 06:41:33PM +0530, poza@codeaurora.org wrote:
> pciehp_unconfigure_device doing little more than enumeration to quiescence
> the bus.
> 
>		/*
> 		 * Ensure that no new Requests will be generated from
> 		 * the device.
> 		 */
> 		if (presence) {
> 			pci_read_config_word(dev, PCI_COMMAND, &command);
> 			command &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_SERR);
> 			command |= PCI_COMMAND_INTX_DISABLE;
> 			pci_write_config_word(dev, PCI_COMMAND, command);
> 		}
That piece of code is supposed to be executed on safe removal via sysfs
or an Attention Button press:  The card remains in the slot, even though
the slot is brought down.  So the card is quiesced.
However IIUC, on fatal error the link goes down and for pciehp, that's
essentially a surprise removal.  In that case, the above code is not
intended to be executed, rather the devices below the hotplug bridge
are marked disconnected.  See this patch I posted yesterday:
https://www.spinics.net/lists/linux-pci/msg74763.html
Thanks,
Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
  2018-07-21  2:58                 ` Sinan Kaya
  2018-07-21  6:07                   ` Sinan Kaya
@ 2018-07-29 18:02                   ` Lukas Wunner
  1 sibling, 0 replies; 35+ messages in thread
From: Lukas Wunner @ 2018-07-29 18:02 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Bjorn Helgaas, Oza Pawandeep, linux-pci, open list, Keith Busch,
	linux-arm-msm, linux-arm-kernel
On Fri, Jul 20, 2018 at 07:58:20PM -0700, Sinan Kaya wrote:
> My patch solves the problem if AER interrupt happens before the hotplug
> interrupt. We are masking the data link layer active interrupt. So,
> AER/DPC can perform their link operations without hotplug driver race.
> 
> We need to figure out how to gracefully return inside hotplug driver
> if link down happened and there is an error pending.
> 
> My first question is why hotplug driver is reacting to the link event
> if there was not an actual device insertion/removal.
> 
> Would it help to keep track of presence changed interrupts since last
> link event?
> 
> IF counter is 0 and device is present, hotplug driver bails out
> silently as an example.
Counting PDC events doesn't work reliably if multiple such events
occur in very short succession as the interrupt handler may not
run quickly enough.  See this commit message which shows unbalanced
Link Up / Link Down events:
https://patchwork.ozlabs.org/patch/867418/
And on Thunderbolt, interrupts can be signaled even though the port
and its parents are in D3hot (sic!).  A Thunderbolt daisy chain can
consist of up to 6 devices, each comprising a PCI switch, so there's
a cascade of over a dozen Upstream / Downstream ports between the
Root port and the hotplug port at the end of the daisy chain.
God knows how many events have occurred by the time all the parents
are resumed to D0 and the Slot Status register of the hotplug port
is read/written.  That was really the motivation for the event
handling rework.
Thanks,
Lukas
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 0/3] PCI: separate hotplug handling from fatal error handling
  2018-07-02 22:52 [PATCH V5 0/3] PCI: separate hotplug handling from fatal error handling Sinan Kaya
                   ` (2 preceding siblings ...)
  2018-07-02 22:52 ` [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset Sinan Kaya
@ 2018-07-31 18:44 ` Bjorn Helgaas
  2018-07-31 18:54   ` Sinan Kaya
  3 siblings, 1 reply; 35+ messages in thread
From: Bjorn Helgaas @ 2018-07-31 18:44 UTC (permalink / raw)
  To: Sinan Kaya; +Cc: linux-pci, linux-arm-kernel, linux-arm-msm
On Mon, Jul 02, 2018 at 06:52:44PM -0400, Sinan Kaya wrote:
> This is what happens when you insert a fatal error to a hotplug slot. See
> multiple link up/down messages.
> 
> /_#_[__333.699731]_pcieport_0001:00:00.0:_AER:_Uncorrected_(Fatal)_error_received:_id=0000
> [  333.748116] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id
> [  333.816044] pcieport 0001:00:00.0:   device [17cb:0404] error status/mask=00040000/00400000
> [  333.871581] pcieport 0001:00:00.0:    [18] Malformed TLP (First)
> [  333.914675] pcieport 0001:00:00.0:   TLP Header: 40000001 000000ff 00000000 00000000
> [  333.968397] pciehp 0001:00:00.0:pcie004: Slot(2): Link Down
> [  334.043234] iommu: Removing device 0001:01:00.4 from group 0
> [  334.095952] iommu: Removing device 0001:01:00.3 from group 0
> [  334.144644] iommu: Removing device 0001:01:00.2 from group 0
> [  334.194653] iommu: Removing device 0001:01:00.1 from group 0
> [  334.243564] pciehp 0001:00:00.0:pcie004: Slot(2): Link Up
> [  334.282556] iommu: Removing device 0001:01:00.0 from group 0
> [  334.330994] pciehp 0001:00:00.0:pcie004: Slot(2): Link Up event queued;
> currently getting powered off
> [  334.890587] pciehp 0001:00:00.0:pcie004: Timeout on hotplug command
> 0x13f1 (issued 282900 msec ago)
> [  335.070190] pciehp 0001:00:00.0:pcie004: Slot(2): Link Down
> [  335.106960] pciehp 0001:00:00.0:pcie004: Slot(2): Link Down event queued; currently getting powered on
> [  335.191119] pcieport 0001:00:00.0: AER: Device recovery failed
> [  346.590153] pciehp 0001:00:00.0:pcie004: Timeout on hotplug command 0x17f1 (issued 10250 msec ago)
> 
> 
> Since DPC/AER patch to refactor fatal error handling, both hotplug
> driver and AER/DPC driver will try removing devices and perform enumeration on
> link events/AER events.
> 
> Perfect environment for race condition without a change.
> 
> Try to mask hotplug interrupts during AER/DPC fatal error handling.
> 
> Changes from v4:
> * add mask_irq() and unmask_irq() callbacks to struct pcie_driver
> * refactor reset_slot() to use pciehp_mask_irq() an pciehp_unmask_irq()
> 
> Sinan Kaya (3):
>   PCI: pciehp: implement mask and unmask interrupt functions
>   PCI: pciehp: reuse pciehp_mask/unmask_irq() in reset_slot()
>   PCI: Mask and unmask hotplug interrupts during reset
> 
>  drivers/pci/hotplug/pciehp.h      |  2 ++
>  drivers/pci/hotplug/pciehp_core.c | 19 +++++++++++++++++++
>  drivers/pci/hotplug/pciehp_hpc.c  | 36 ++++++++++++++++++++++++++++--------
>  drivers/pci/pcie/err.c            | 11 +++++++++++
>  drivers/pci/pcie/portdrv.h        |  2 ++
>  5 files changed, 62 insertions(+), 8 deletions(-)
There's been a ton of discussion about this, and I didn't see a clear
consensus emerge, so I'm going to drop it for now.  If it's still
needed, please repost it and I'll watch for reviewers.
Bjorn
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 0/3] PCI: separate hotplug handling from fatal error handling
  2018-07-31 18:44 ` [PATCH V5 0/3] PCI: separate hotplug handling from fatal error handling Bjorn Helgaas
@ 2018-07-31 18:54   ` Sinan Kaya
  2018-07-31 20:16     ` Bjorn Helgaas
  0 siblings, 1 reply; 35+ messages in thread
From: Sinan Kaya @ 2018-07-31 18:54 UTC (permalink / raw)
  To: Bjorn Helgaas, Sinan Kaya; +Cc: linux-pci, linux-arm-kernel, linux-arm-msm
Hi Bjorn,
On 7/31/2018 11:44 AM, Bjorn Helgaas wrote:
> There's been a ton of discussion about this, and I didn't see a clear
> consensus emerge, so I'm going to drop it for now.  If it's still
> needed, please repost it and I'll watch for reviewers.
I apologize, let me summarize the conversation for you.
I proposed this [1]. Oza seems to be onboard [2].
Later, I posted v6 with this change last weekend.
[patch v6 1/1] pci: pciehp: ignore link events when there is a fatal 
error pending
Lukas and I discussed the patch last weekend under this v6 thread.
AFAIK, we agreed on the direction.
I'll post V7 with Lukas' comments and capture the detail from email
into the commit message.
Let us know what you think about it.
Sinan
[1] https://lkml.org/lkml/2018/7/21/13.
[2] https://lkml.org/lkml/2018/7/25/183
^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH V5 0/3] PCI: separate hotplug handling from fatal error handling
  2018-07-31 18:54   ` Sinan Kaya
@ 2018-07-31 20:16     ` Bjorn Helgaas
  0 siblings, 0 replies; 35+ messages in thread
From: Bjorn Helgaas @ 2018-07-31 20:16 UTC (permalink / raw)
  To: Sinan Kaya; +Cc: Sinan Kaya, linux-pci, linux-arm-kernel, linux-arm-msm
On Tue, Jul 31, 2018 at 11:54:58AM -0700, Sinan Kaya wrote:
> Hi Bjorn,
> 
> On 7/31/2018 11:44 AM, Bjorn Helgaas wrote:
> > There's been a ton of discussion about this, and I didn't see a clear
> > consensus emerge, so I'm going to drop it for now.  If it's still
> > needed, please repost it and I'll watch for reviewers.
> 
> I apologize, let me summarize the conversation for you.
> I proposed this [1]. Oza seems to be onboard [2].
> 
> Later, I posted v6 with this change last weekend.
Oops, I missed that this had been obsoleted, sorry for the noise.
> [patch v6 1/1] pci: pciehp: ignore link events when there is a fatal error
> pending
> 
> Lukas and I discussed the patch last weekend under this v6 thread.
> AFAIK, we agreed on the direction.
> 
> I'll post V7 with Lukas' comments and capture the detail from email
> into the commit message.
> 
> Let us know what you think about it.
Sounds good, I'll look for that.
^ permalink raw reply	[flat|nested] 35+ messages in thread
end of thread, other threads:[~2018-07-31 20:16 UTC | newest]
Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-07-02 22:52 [PATCH V5 0/3] PCI: separate hotplug handling from fatal error handling Sinan Kaya
2018-07-02 22:52 ` [PATCH V5 1/3] PCI: pciehp: implement mask and unmask interrupt functions Sinan Kaya
2018-07-02 22:52 ` [PATCH V5 2/3] PCI: pciehp: reuse pciehp_mask/unmask_irq() in reset_slot() Sinan Kaya
2018-07-02 22:52 ` [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset Sinan Kaya
2018-07-03  8:34   ` Lukas Wunner
2018-07-03 10:52     ` poza
2018-07-03 12:04       ` okaya
2018-07-03 11:30     ` okaya
2018-07-03 13:11       ` poza
2018-07-03 13:25         ` Sinan Kaya
2018-07-03 13:31           ` Sinan Kaya
2018-07-03 13:59             ` Lukas Wunner
2018-07-03 14:10               ` poza
2018-07-03 14:17                 ` Lukas Wunner
2018-07-03 15:34               ` Sinan Kaya
2018-07-29 12:32         ` Lukas Wunner
2018-07-03 14:12       ` Lukas Wunner
2018-07-03 14:29         ` poza
2018-07-29 12:19       ` Lukas Wunner
2018-07-03 14:34   ` Lukas Wunner
2018-07-03 15:12     ` poza
2018-07-03 15:49       ` Sinan Kaya
2018-07-03 15:43     ` Sinan Kaya
2018-07-08 17:14       ` Lukas Wunner
2018-07-09 14:48         ` Sinan Kaya
2018-07-09 16:00           ` Lukas Wunner
2018-07-10 18:30             ` Sinan Kaya
2018-07-20 20:01               ` Bjorn Helgaas
2018-07-21  2:58                 ` Sinan Kaya
2018-07-21  6:07                   ` Sinan Kaya
2018-07-25  8:29                     ` poza
2018-07-29 18:02                   ` Lukas Wunner
2018-07-31 18:44 ` [PATCH V5 0/3] PCI: separate hotplug handling from fatal error handling Bjorn Helgaas
2018-07-31 18:54   ` Sinan Kaya
2018-07-31 20:16     ` Bjorn Helgaas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).