* [PATCH v7 0/1] PCI: Handle conflict between hotplug and error handling @ 2018-08-05 22:51 Sinan Kaya 2018-08-05 22:51 ` [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending Sinan Kaya 2018-09-04 18:18 ` [PATCH v7 0/1] PCI: Handle conflict between hotplug and error handling Bjorn Helgaas 0 siblings, 2 replies; 17+ messages in thread From: Sinan Kaya @ 2018-08-05 22:51 UTC (permalink / raw) To: linux-pci; +Cc: Sinan Kaya Lukas: 'Your patch "[PATCH v6 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending" only checks once for a pending fatal error, it should poll until either the fatal error is gone or a timeout is hit. If the fatal error is gone and the link is up, you can just return from pciehp_handle_presence_or_link_change(). Else (in the timeout case) fall back to the normal handling of a Link Down, i.e. let it bring down the slot.' Sinan Kaya (1): PCI: pciehp: Ignore link events when there is a fatal error pending drivers/pci/hotplug/pciehp_ctrl.c | 19 ++++++++++ drivers/pci/pci.h | 2 ++ drivers/pci/pcie/err.c | 60 +++++++++++++++++++++++++++++++ 3 files changed, 81 insertions(+) -- 2.17.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending 2018-08-05 22:51 [PATCH v7 0/1] PCI: Handle conflict between hotplug and error handling Sinan Kaya @ 2018-08-05 22:51 ` Sinan Kaya 2018-08-06 7:51 ` poza 2018-08-06 9:30 ` Lukas Wunner 2018-09-04 18:18 ` [PATCH v7 0/1] PCI: Handle conflict between hotplug and error handling Bjorn Helgaas 1 sibling, 2 replies; 17+ messages in thread From: Sinan Kaya @ 2018-08-05 22:51 UTC (permalink / raw) To: linux-pci Cc: Sinan Kaya, Bjorn Helgaas, Lukas Wunner, Mika Westerberg, Gustavo A. R. Silva, Greg Kroah-Hartman, Oza Pawandeep, Keith Busch AER/DPC reset is known as warm-resets. HP link recovery is known as cold-reset via power-off and power-on command to the PCI slot. In the middle of a warm-reset operation (AER/DPC), we are: 1. turning off the slow power. Slot power needs to be kept on in order for recovery to succeed. 2. performing a cold reset causing Fatal Error recovery to fail. If link goes down due to a DPC event, it should be recovered by DPC status trigger. Injecting a cold reset in the middle can cause a HW lockup as it is an undefined behavior. Similarly, If link goes down due to an AER secondary bus reset issue, it should be recovered by HW. Injecting a cold reset in the middle of a secondary bus reset can cause a HW lockup as it is an undefined behavior. 1. HP ISR observes link down interrupt. 2. HP ISR checks that there is a fatal error pending, it doesn't touch the link. 3. HP ISR waits until link recovery happens. 4. HP ISR calls the read vendor id function. 5. If all fails, try the cold-reset approach. If fatal error is pending and a fatal error service such as DPC or AER is running, it is the responsibility of the fatal error service to recover the link. Signed-off-by: Sinan Kaya <okaya@kernel.org> --- drivers/pci/hotplug/pciehp_ctrl.c | 19 ++++++++++ drivers/pci/pci.h | 2 ++ drivers/pci/pcie/err.c | 60 +++++++++++++++++++++++++++++++ 3 files changed, 81 insertions(+) diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c index da7c72372ffc..ba8dd51a3f0f 100644 --- a/drivers/pci/hotplug/pciehp_ctrl.c +++ b/drivers/pci/hotplug/pciehp_ctrl.c @@ -222,9 +222,28 @@ void pciehp_handle_disable_request(struct slot *slot) void pciehp_handle_presence_or_link_change(struct slot *slot, u32 events) { struct controller *ctrl = slot->ctrl; + struct pci_dev *pdev = ctrl->pcie->port; bool link_active; u8 present; + /* If a fatal error is pending, wait for AER or DPC to handle it. */ + if (pcie_fatal_error_pending(pdev, PCI_ERR_UNC_SURPDN)) { + bool recovered; + + recovered = pcie_wait_fatal_error_clear(pdev, + PCI_ERR_UNC_SURPDN); + + /* If the fatal error is gone and the link is up, return */ + if (recovered && pcie_wait_for_link(pdev, true)) { + ctrl_info(ctrl, "Slot(%s): Ignoring Link event due to successful fatal error recovery\n", + slot_name(slot)); + return; + } + + ctrl_info(ctrl, "Slot(%s): Fatal error recovery failed for Link event, tryig hotplug reset\n", + slot_name(slot)); + } + /* * If the slot is on and presence or link has changed, turn it off. * Even if it's occupied again, we cannot assume the card is the same. diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index c358e7a07f3f..2ebb05338368 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -356,6 +356,8 @@ void pci_enable_acs(struct pci_dev *dev); /* PCI error reporting and recovery */ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service); void pcie_do_nonfatal_recovery(struct pci_dev *dev); +bool pcie_fatal_error_pending(struct pci_dev *pdev, u32 usrmask); +bool pcie_wait_fatal_error_clear(struct pci_dev *pdev, u32 usrmask); bool pcie_wait_for_link(struct pci_dev *pdev, bool active); #ifdef CONFIG_PCIEASPM diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index f7ce0cb0b0b7..8ea012dbf063 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -16,6 +16,7 @@ #include <linux/kernel.h> #include <linux/errno.h> #include <linux/aer.h> +#include <linux/delay.h> #include "portdrv.h" #include "../pci.h" @@ -386,3 +387,62 @@ void pcie_do_nonfatal_recovery(struct pci_dev *dev) /* TODO: Should kernel panic here? */ pci_info(dev, "AER: Device recovery failed\n"); } + +bool pcie_fatal_error_pending(struct pci_dev *pdev, u32 usrmask) +{ + u16 err_status = 0; + u32 status, mask, severity; + int rc; + + if (!pci_is_pcie(pdev)) + return false; + + rc = pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, &err_status); + if (rc) + return false; + + if (!(err_status & PCI_EXP_DEVSTA_FED)) + return false; + + if (!pdev->aer_cap) + return false; + + rc = pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, + &status); + if (rc) + return false; + + rc = pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_MASK, + &mask); + if (rc) + return false; + + rc = pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_SEVER, + &severity); + if (rc) + return false; + + status &= mask; + status &= ~usrmask; + status &= severity; + + return !!status; +} + +bool pcie_wait_fatal_error_clear(struct pci_dev *pdev, u32 usrmask) +{ + int timeout = 1000; + bool ret; + + for (;;) { + ret = pcie_fatal_error_pending(pdev, usrmask); + if (ret == false) + return true; + if (timeout <= 0) + break; + msleep(20); + timeout -= 20; + } + + return false; +} -- 2.17.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending 2018-08-05 22:51 ` [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending Sinan Kaya @ 2018-08-06 7:51 ` poza 2018-08-06 9:26 ` Lukas Wunner 2018-08-06 9:30 ` Lukas Wunner 1 sibling, 1 reply; 17+ messages in thread From: poza @ 2018-08-06 7:51 UTC (permalink / raw) To: Sinan Kaya Cc: linux-pci, Bjorn Helgaas, Lukas Wunner, Mika Westerberg, Gustavo A. R. Silva, Greg Kroah-Hartman, Keith Busch On 2018-08-06 04:21, Sinan Kaya wrote: > AER/DPC reset is known as warm-resets. HP link recovery is known as > cold-reset via power-off and power-on command to the PCI slot. > > In the middle of a warm-reset operation (AER/DPC), we are: > > 1. turning off the slow power. Slot power needs to be kept on in order > for recovery to succeed. > > 2. performing a cold reset causing Fatal Error recovery to fail. > > If link goes down due to a DPC event, it should be recovered by DPC > status trigger. Injecting a cold reset in the middle can cause a HW > lockup as it is an undefined behavior. > > Similarly, If link goes down due to an AER secondary bus reset issue, > it should be recovered by HW. Injecting a cold reset in the middle of a > secondary bus reset can cause a HW lockup as it is an undefined > behavior. > > 1. HP ISR observes link down interrupt. > 2. HP ISR checks that there is a fatal error pending, it doesn't touch > the link. > 3. HP ISR waits until link recovery happens. > 4. HP ISR calls the read vendor id function. > 5. If all fails, try the cold-reset approach. > > If fatal error is pending and a fatal error service such as DPC or AER > is running, it is the responsibility of the fatal error service to > recover the link. > > Signed-off-by: Sinan Kaya <okaya@kernel.org> > --- > drivers/pci/hotplug/pciehp_ctrl.c | 19 ++++++++++ > drivers/pci/pci.h | 2 ++ > drivers/pci/pcie/err.c | 60 +++++++++++++++++++++++++++++++ > 3 files changed, 81 insertions(+) > > diff --git a/drivers/pci/hotplug/pciehp_ctrl.c > b/drivers/pci/hotplug/pciehp_ctrl.c > index da7c72372ffc..ba8dd51a3f0f 100644 > --- a/drivers/pci/hotplug/pciehp_ctrl.c > +++ b/drivers/pci/hotplug/pciehp_ctrl.c > @@ -222,9 +222,28 @@ void pciehp_handle_disable_request(struct slot > *slot) > void pciehp_handle_presence_or_link_change(struct slot *slot, u32 > events) > { > struct controller *ctrl = slot->ctrl; > + struct pci_dev *pdev = ctrl->pcie->port; > bool link_active; > u8 present; > > + /* If a fatal error is pending, wait for AER or DPC to handle it. */ > + if (pcie_fatal_error_pending(pdev, PCI_ERR_UNC_SURPDN)) { > + bool recovered; > + > + recovered = pcie_wait_fatal_error_clear(pdev, > + PCI_ERR_UNC_SURPDN); > + > + /* If the fatal error is gone and the link is up, return */ > + if (recovered && pcie_wait_for_link(pdev, true)) { > + ctrl_info(ctrl, "Slot(%s): Ignoring Link event due to successful > fatal error recovery\n", > + slot_name(slot)); > + return; > + } > + > + ctrl_info(ctrl, "Slot(%s): Fatal error recovery failed for Link > event, tryig hotplug reset\n", > + slot_name(slot)); > + } > + > /* > * If the slot is on and presence or link has changed, turn it off. > * Even if it's occupied again, we cannot assume the card is the > same. > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index c358e7a07f3f..2ebb05338368 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -356,6 +356,8 @@ void pci_enable_acs(struct pci_dev *dev); > /* PCI error reporting and recovery */ > void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service); > void pcie_do_nonfatal_recovery(struct pci_dev *dev); > +bool pcie_fatal_error_pending(struct pci_dev *pdev, u32 usrmask); > +bool pcie_wait_fatal_error_clear(struct pci_dev *pdev, u32 usrmask); > > bool pcie_wait_for_link(struct pci_dev *pdev, bool active); > #ifdef CONFIG_PCIEASPM > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > index f7ce0cb0b0b7..8ea012dbf063 100644 > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -16,6 +16,7 @@ > #include <linux/kernel.h> > #include <linux/errno.h> > #include <linux/aer.h> > +#include <linux/delay.h> > #include "portdrv.h" > #include "../pci.h" > > @@ -386,3 +387,62 @@ void pcie_do_nonfatal_recovery(struct pci_dev > *dev) > /* TODO: Should kernel panic here? */ > pci_info(dev, "AER: Device recovery failed\n"); > } > + > +bool pcie_fatal_error_pending(struct pci_dev *pdev, u32 usrmask) > +{ > + u16 err_status = 0; > + u32 status, mask, severity; > + int rc; > + > + if (!pci_is_pcie(pdev)) > + return false; > + > + rc = pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, &err_status); > + if (rc) > + return false; > + > + if (!(err_status & PCI_EXP_DEVSTA_FED)) > + return false; > + > + if (!pdev->aer_cap) > + return false; > + > + rc = pci_read_config_dword(pdev, pdev->aer_cap + > PCI_ERR_UNCOR_STATUS, > + &status); > + if (rc) > + return false; > + > + rc = pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_MASK, > + &mask); > + if (rc) > + return false; > + > + rc = pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_SEVER, > + &severity); > + if (rc) > + return false; > + > + status &= mask; > + status &= ~usrmask; > + status &= severity; > + > + return !!status; > +} > + > +bool pcie_wait_fatal_error_clear(struct pci_dev *pdev, u32 usrmask) > +{ > + int timeout = 1000; > + bool ret; > + > + for (;;) { > + ret = pcie_fatal_error_pending(pdev, usrmask); > + if (ret == false) > + return true; > + if (timeout <= 0) > + break; > + msleep(20); > + timeout -= 20; I assume that this timeout will come into effect if 1) AER/DPC takes longer time than 1 sec for recovery. 2) Lets us say both AER and DPC are disabled....are we going to wait for this timeout before HP can take over ? > + } > + > + return false; > +} ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending 2018-08-06 7:51 ` poza @ 2018-08-06 9:26 ` Lukas Wunner 2018-08-06 14:55 ` poza 0 siblings, 1 reply; 17+ messages in thread From: Lukas Wunner @ 2018-08-06 9:26 UTC (permalink / raw) To: poza; +Cc: Sinan Kaya, linux-pci, Bjorn Helgaas, Mika Westerberg, Keith Busch On Mon, Aug 06, 2018 at 01:21:03PM +0530, poza@codeaurora.org wrote: > On 2018-08-06 04:21, Sinan Kaya wrote: > >+bool pcie_wait_fatal_error_clear(struct pci_dev *pdev, u32 usrmask) > >+{ > >+ int timeout = 1000; > >+ bool ret; > >+ > >+ for (;;) { > >+ ret = pcie_fatal_error_pending(pdev, usrmask); > >+ if (ret == false) > >+ return true; > >+ if (timeout <= 0) > >+ break; > >+ msleep(20); > >+ timeout -= 20; > I assume that this timeout will come into effect if > 1) AER/DPC takes longer time than 1 sec for recovery. > 2) Lets us say both AER and DPC are disabled....are we going to wait for > this timeout before HP can take over ? If CONFIG_PCIEAER is disabled, pdev->aer_cap will not be set because it is populated in pci_aer_init(). pcie_fatal_error_pending(), as introduced by this patch, returns false if pdev->aer_cap is not set. So pciehp will fall back to a cold reset if CONFIG_PCIEAER is disabled. I'm not seeing a similar check for CONFIG_PCIE_DPC=n in this patch, but I'm not familiar enough with PCIe error recovery to know if such a check is at all needed. Thanks, Lukas ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending 2018-08-06 9:26 ` Lukas Wunner @ 2018-08-06 14:55 ` poza 2018-08-06 15:04 ` Sinan Kaya 0 siblings, 1 reply; 17+ messages in thread From: poza @ 2018-08-06 14:55 UTC (permalink / raw) To: Lukas Wunner Cc: Sinan Kaya, linux-pci, Bjorn Helgaas, Mika Westerberg, Keith Busch, linux-pci-owner On 2018-08-06 14:56, Lukas Wunner wrote: > On Mon, Aug 06, 2018 at 01:21:03PM +0530, poza@codeaurora.org wrote: >> On 2018-08-06 04:21, Sinan Kaya wrote: >> >+bool pcie_wait_fatal_error_clear(struct pci_dev *pdev, u32 usrmask) >> >+{ >> >+ int timeout = 1000; >> >+ bool ret; >> >+ >> >+ for (;;) { >> >+ ret = pcie_fatal_error_pending(pdev, usrmask); >> >+ if (ret == false) >> >+ return true; >> >+ if (timeout <= 0) >> >+ break; >> >+ msleep(20); >> >+ timeout -= 20; >> I assume that this timeout will come into effect if >> 1) AER/DPC takes longer time than 1 sec for recovery. >> 2) Lets us say both AER and DPC are disabled....are we going to wait >> for >> this timeout before HP can take over ? > > If CONFIG_PCIEAER is disabled, pdev->aer_cap will not be set because > it is populated in pci_aer_init(). > > pcie_fatal_error_pending(), as introduced by this patch, returns false > if pdev->aer_cap is not set. So pciehp will fall back to a cold reset > if CONFIG_PCIEAER is disabled. > > I'm not seeing a similar check for CONFIG_PCIE_DPC=n in this patch, > but I'm not familiar enough with PCIe error recovery to know if such > a check is at all needed. > Either AER or DPC would get triggered, not both. in that case, if AER is disabled, then this code will return false thinking HP needs to handle it. but it might be that DPC would be triggering as well. but I dont see DPC check anywhere, rather we are relying on PCI_EXP_DEVSTA and following condition: if (!pdev->aer_cap) return false; so here we dont check anything with respect to DPC capability (although there is no such thing as dpc_cap) (except If I missed something) > Thanks, > > Lukas ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending 2018-08-06 14:55 ` poza @ 2018-08-06 15:04 ` Sinan Kaya 2018-08-06 15:06 ` Sinan Kaya 0 siblings, 1 reply; 17+ messages in thread From: Sinan Kaya @ 2018-08-06 15:04 UTC (permalink / raw) To: poza, Lukas Wunner Cc: linux-pci, Bjorn Helgaas, Mika Westerberg, Keith Busch, linux-pci-owner On 8/6/2018 10:55 AM, poza@codeaurora.org wrote: >> > > Either AER or DPC would get triggered, not both. > in that case, if AER is disabled, then this code will return false > thinking HP needs to handle it. > but it might be that DPC would be triggering as well. > but I dont see DPC check anywhere, rather we are relying on PCI_EXP_DEVSTA > and following condition: > if (!pdev->aer_cap) > return false; > so here we dont check anything with respect to DPC capability (although > there is no such thing as dpc_cap) > (except If I missed something) That's true. We either need to go poll every single source (AER/DPC) for an error or rely on the DEVSTA. For ease of implementation, I suggest DEVSTA. Something like this: +bool pcie_fatal_error_pending(struct pci_dev *pdev, u32 usrmask) +{ + u16 err_status = 0; + u32 status, mask, severity; + int rc; + + if (!pci_is_pcie(pdev)) + return false; + + rc = pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, &err_status); + if (rc) + return false; + + if (!(err_status & PCI_EXP_DEVSTA_FED)) + return false; + return true; +} ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending 2018-08-06 15:04 ` Sinan Kaya @ 2018-08-06 15:06 ` Sinan Kaya 2018-08-06 15:12 ` poza 2018-08-06 16:44 ` Lukas Wunner 0 siblings, 2 replies; 17+ messages in thread From: Sinan Kaya @ 2018-08-06 15:06 UTC (permalink / raw) To: poza, Lukas Wunner Cc: linux-pci, Bjorn Helgaas, Mika Westerberg, Keith Busch, linux-pci-owner On 8/6/2018 11:04 AM, Sinan Kaya wrote: >> Either AER or DPC would get triggered, not both. >> in that case, if AER is disabled, then this code will return false >> thinking HP needs to handle it. >> but it might be that DPC would be triggering as well. >> but I dont see DPC check anywhere, rather we are relying on >> PCI_EXP_DEVSTA >> and following condition: >> if (!pdev->aer_cap) >> return false; >> so here we dont check anything with respect to DPC capability >> (although there is no such thing as dpc_cap) >> (except If I missed something) > > That's true. We either need to go poll every single source (AER/DPC) for > an error or rely on the DEVSTA. For ease of implementation, I suggest > DEVSTA. Something like this: Hmm. Too quick... Reduced set doesn't help. Surprise Link Down is also a fatal error. That's what I was trying to exclude. We'll have to poll the error source. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending 2018-08-06 15:06 ` Sinan Kaya @ 2018-08-06 15:12 ` poza 2018-08-06 16:34 ` Sinan Kaya 2018-08-06 16:44 ` Lukas Wunner 1 sibling, 1 reply; 17+ messages in thread From: poza @ 2018-08-06 15:12 UTC (permalink / raw) To: Sinan Kaya Cc: Lukas Wunner, linux-pci, Bjorn Helgaas, Mika Westerberg, Keith Busch, linux-pci-owner On 2018-08-06 20:36, Sinan Kaya wrote: > On 8/6/2018 11:04 AM, Sinan Kaya wrote: >>> Either AER or DPC would get triggered, not both. >>> in that case, if AER is disabled, then this code will return false >>> thinking HP needs to handle it. >>> but it might be that DPC would be triggering as well. >>> but I dont see DPC check anywhere, rather we are relying on >>> PCI_EXP_DEVSTA >>> and following condition: >>> if (!pdev->aer_cap) >>> return false; >>> so here we dont check anything with respect to DPC capability >>> (although there is no such thing as dpc_cap) >>> (except If I missed something) >> >> That's true. We either need to go poll every single source (AER/DPC) >> for >> an error or rely on the DEVSTA. For ease of implementation, I suggest >> DEVSTA. Something like this: > > Hmm. Too quick... > > Reduced set doesn't help. Surprise Link Down is also a fatal error. > That's what I was trying to exclude. We'll have to poll the error > source. How about using pcie_port_find_service() ? Regards, Oza. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending 2018-08-06 15:12 ` poza @ 2018-08-06 16:34 ` Sinan Kaya 0 siblings, 0 replies; 17+ messages in thread From: Sinan Kaya @ 2018-08-06 16:34 UTC (permalink / raw) To: poza Cc: Lukas Wunner, linux-pci, Bjorn Helgaas, Mika Westerberg, Keith Busch, linux-pci-owner On 8/6/2018 11:12 AM, poza@codeaurora.org wrote: >>> That's true. We either need to go poll every single source (AER/DPC) for >>> an error or rely on the DEVSTA. For ease of implementation, I suggest >>> DEVSTA. Something like this: >> >> Hmm. Too quick... >> >> Reduced set doesn't help. Surprise Link Down is also a fatal error. >> That's what I was trying to exclude. We'll have to poll the error >> source. > > > How about using pcie_port_find_service() ? pcie_port_find_service() tells you that there is an active driver. We could potentially use it like follows to query the register themselves. if ((pcie_port_find_service(AER)) { check AER registers } if ((pcie_port_find_service(DPC)) { check DPC registers } ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending 2018-08-06 15:06 ` Sinan Kaya 2018-08-06 15:12 ` poza @ 2018-08-06 16:44 ` Lukas Wunner 2018-08-06 16:49 ` Sinan Kaya 1 sibling, 1 reply; 17+ messages in thread From: Lukas Wunner @ 2018-08-06 16:44 UTC (permalink / raw) To: Sinan Kaya Cc: poza, linux-pci, Bjorn Helgaas, Mika Westerberg, Keith Busch, linux-pci-owner On Mon, Aug 06, 2018 at 11:06:48AM -0400, Sinan Kaya wrote: > Surprise Link Down is also a fatal error. Seriously? On a Downstream Port which has the Hot-Plug Surprise bit in the Slot Capabilities register set, Surprise Link Down is a fatal error? That would seem somewhat contradictory. Do you have a section number in the PCIe Base Spec for this? Thanks, Lukas ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending 2018-08-06 16:44 ` Lukas Wunner @ 2018-08-06 16:49 ` Sinan Kaya 2018-08-06 17:03 ` Sinan Kaya 2018-08-06 17:05 ` Lukas Wunner 0 siblings, 2 replies; 17+ messages in thread From: Sinan Kaya @ 2018-08-06 16:49 UTC (permalink / raw) To: Lukas Wunner Cc: poza, linux-pci, Bjorn Helgaas, Mika Westerberg, Keith Busch, linux-pci-owner On 8/6/2018 12:44 PM, Lukas Wunner wrote: > On Mon, Aug 06, 2018 at 11:06:48AM -0400, Sinan Kaya wrote: >> Surprise Link Down is also a fatal error. > > Seriously? On a Downstream Port which has the Hot-Plug Surprise bit > in the Slot Capabilities register set, Surprise Link Down is a fatal > error? That would seem somewhat contradictory. Do you have a > section number in the PCIe Base Spec for this? Spec 3.0. 7.10.2. Uncorrectable Error Status Register (Offset 04h) bit 5 Surprise Down Error Status (Optional). > > Thanks, > > Lukas > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending 2018-08-06 16:49 ` Sinan Kaya @ 2018-08-06 17:03 ` Sinan Kaya 2018-08-06 17:05 ` Lukas Wunner 1 sibling, 0 replies; 17+ messages in thread From: Sinan Kaya @ 2018-08-06 17:03 UTC (permalink / raw) To: Lukas Wunner Cc: poza, linux-pci, Bjorn Helgaas, Mika Westerberg, Keith Busch, linux-pci-owner On 8/6/2018 12:49 PM, Sinan Kaya wrote: >>> Surprise Link Down is also a fatal error. >> >> Seriously? On a Downstream Port which has the Hot-Plug Surprise bit >> in the Slot Capabilities register set, Surprise Link Down is a fatal >> error? That would seem somewhat contradictory. Do you have a >> section number in the PCIe Base Spec for this? > > Spec 3.0. 7.10.2. Uncorrectable Error Status Register (Offset 04h) > bit 5 Surprise Down Error Status (Optional). I think hotplug surprise capability takes priority over the one in AER register. Other non-hotplug capable systems can report a link down via AER Fatal error and recover via the error handling path. We should maybe leave this one alone since hotplug code wouldn't clear the AER status registers. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending 2018-08-06 16:49 ` Sinan Kaya 2018-08-06 17:03 ` Sinan Kaya @ 2018-08-06 17:05 ` Lukas Wunner 2018-08-06 17:09 ` Sinan Kaya 1 sibling, 1 reply; 17+ messages in thread From: Lukas Wunner @ 2018-08-06 17:05 UTC (permalink / raw) To: Sinan Kaya Cc: poza, linux-pci, Bjorn Helgaas, Mika Westerberg, Keith Busch, linux-pci-owner On Mon, Aug 06, 2018 at 12:49:04PM -0400, Sinan Kaya wrote: > On 8/6/2018 12:44 PM, Lukas Wunner wrote: > >On Mon, Aug 06, 2018 at 11:06:48AM -0400, Sinan Kaya wrote: > >>Surprise Link Down is also a fatal error. > > > >Seriously? On a Downstream Port which has the Hot-Plug Surprise bit > >in the Slot Capabilities register set, Surprise Link Down is a fatal > >error? That would seem somewhat contradictory. Do you have a > >section number in the PCIe Base Spec for this? > > Spec 3.0. 7.10.2. Uncorrectable Error Status Register (Offset 04h) > bit 5 Surprise Down Error Status (Optional). Okay, thanks. Can we always mask out this bit on a hotplug port with surprise removal capability? Doesn't make sense to me to let AER/DPC spring to action in that case. Lukas ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending 2018-08-06 17:05 ` Lukas Wunner @ 2018-08-06 17:09 ` Sinan Kaya 0 siblings, 0 replies; 17+ messages in thread From: Sinan Kaya @ 2018-08-06 17:09 UTC (permalink / raw) To: Lukas Wunner Cc: poza, linux-pci, Bjorn Helgaas, Mika Westerberg, Keith Busch, linux-pci-owner On 8/6/2018 1:05 PM, Lukas Wunner wrote: > On Mon, Aug 06, 2018 at 12:49:04PM -0400, Sinan Kaya wrote: >> On 8/6/2018 12:44 PM, Lukas Wunner wrote: >>> On Mon, Aug 06, 2018 at 11:06:48AM -0400, Sinan Kaya wrote: >>>> Surprise Link Down is also a fatal error. >>> Seriously? On a Downstream Port which has the Hot-Plug Surprise bit >>> in the Slot Capabilities register set, Surprise Link Down is a fatal >>> error? That would seem somewhat contradictory. Do you have a >>> section number in the PCIe Base Spec for this? >> Spec 3.0. 7.10.2. Uncorrectable Error Status Register (Offset 04h) >> bit 5 Surprise Down Error Status (Optional). > Okay, thanks. Can we always mask out this bit on a hotplug port > with surprise removal capability? Doesn't make sense to me to let > AER/DPC spring to action in that case. Yep, good idea. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending 2018-08-05 22:51 ` [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending Sinan Kaya 2018-08-06 7:51 ` poza @ 2018-08-06 9:30 ` Lukas Wunner 1 sibling, 0 replies; 17+ messages in thread From: Lukas Wunner @ 2018-08-06 9:30 UTC (permalink / raw) To: Sinan Kaya Cc: linux-pci, Bjorn Helgaas, Mika Westerberg, Oza Pawandeep, Keith Busch On Sun, Aug 05, 2018 at 03:51:36PM -0700, Sinan Kaya wrote: > --- a/drivers/pci/hotplug/pciehp_ctrl.c > +++ b/drivers/pci/hotplug/pciehp_ctrl.c > @@ -222,9 +222,28 @@ void pciehp_handle_disable_request(struct slot *slot) > void pciehp_handle_presence_or_link_change(struct slot *slot, u32 events) > { > struct controller *ctrl = slot->ctrl; > + struct pci_dev *pdev = ctrl->pcie->port; > bool link_active; > u8 present; > > + /* If a fatal error is pending, wait for AER or DPC to handle it. */ > + if (pcie_fatal_error_pending(pdev, PCI_ERR_UNC_SURPDN)) { > + bool recovered; > + > + recovered = pcie_wait_fatal_error_clear(pdev, > + PCI_ERR_UNC_SURPDN); > + > + /* If the fatal error is gone and the link is up, return */ > + if (recovered && pcie_wait_for_link(pdev, true)) { > + ctrl_info(ctrl, "Slot(%s): Ignoring Link event due to successful fatal error recovery\n", > + slot_name(slot)); > + return; > + } > + > + ctrl_info(ctrl, "Slot(%s): Fatal error recovery failed for Link event, tryig hotplug reset\n", Typo tryig -> trying. Otherwise this is Reviewed-by: Lukas Wunner <lukas@wunner.de> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v7 0/1] PCI: Handle conflict between hotplug and error handling 2018-08-05 22:51 [PATCH v7 0/1] PCI: Handle conflict between hotplug and error handling Sinan Kaya 2018-08-05 22:51 ` [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending Sinan Kaya @ 2018-09-04 18:18 ` Bjorn Helgaas 2018-09-04 18:28 ` Sinan Kaya 1 sibling, 1 reply; 17+ messages in thread From: Bjorn Helgaas @ 2018-09-04 18:18 UTC (permalink / raw) To: Sinan Kaya; +Cc: linux-pci On Sun, Aug 05, 2018 at 03:51:35PM -0700, Sinan Kaya wrote: > Lukas: > 'Your patch "[PATCH v6 1/1] PCI: pciehp: Ignore link events when there > is a fatal error pending" only checks once for a pending fatal error, > it should poll until either the fatal error is gone or a timeout is > hit. If the fatal error is gone and the link is up, you can just return > from pciehp_handle_presence_or_link_change(). Else (in the timeout case) > fall back to the normal handling of a Link Down, i.e. let it bring down > the slot.' > > > Sinan Kaya (1): > PCI: pciehp: Ignore link events when there is a fatal error pending > > drivers/pci/hotplug/pciehp_ctrl.c | 19 ++++++++++ > drivers/pci/pci.h | 2 ++ > drivers/pci/pcie/err.c | 60 +++++++++++++++++++++++++++++++ > 3 files changed, 81 insertions(+) Dropping for now because it sounds like you intend to mask out the error for hotplug ports. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v7 0/1] PCI: Handle conflict between hotplug and error handling 2018-09-04 18:18 ` [PATCH v7 0/1] PCI: Handle conflict between hotplug and error handling Bjorn Helgaas @ 2018-09-04 18:28 ` Sinan Kaya 0 siblings, 0 replies; 17+ messages in thread From: Sinan Kaya @ 2018-09-04 18:28 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: linux-pci On 9/4/2018 11:18 AM, Bjorn Helgaas wrote: > On Sun, Aug 05, 2018 at 03:51:35PM -0700, Sinan Kaya wrote: >> Lukas: >> 'Your patch "[PATCH v6 1/1] PCI: pciehp: Ignore link events when there >> is a fatal error pending" only checks once for a pending fatal error, >> it should poll until either the fatal error is gone or a timeout is >> hit. If the fatal error is gone and the link is up, you can just return >> from pciehp_handle_presence_or_link_change(). Else (in the timeout case) >> fall back to the normal handling of a Link Down, i.e. let it bring down >> the slot.' >> >> >> Sinan Kaya (1): >> PCI: pciehp: Ignore link events when there is a fatal error pending >> >> drivers/pci/hotplug/pciehp_ctrl.c | 19 ++++++++++ >> drivers/pci/pci.h | 2 ++ >> drivers/pci/pcie/err.c | 60 +++++++++++++++++++++++++++++++ >> 3 files changed, 81 insertions(+) > > Dropping for now because it sounds like you intend to mask out the > error for hotplug ports. > Yes, I handed this work off-list to Keith and Lucas. I don't have hardware to test. ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2018-09-04 22:44 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-08-05 22:51 [PATCH v7 0/1] PCI: Handle conflict between hotplug and error handling Sinan Kaya 2018-08-05 22:51 ` [PATCH v7 1/1] PCI: pciehp: Ignore link events when there is a fatal error pending Sinan Kaya 2018-08-06 7:51 ` poza 2018-08-06 9:26 ` Lukas Wunner 2018-08-06 14:55 ` poza 2018-08-06 15:04 ` Sinan Kaya 2018-08-06 15:06 ` Sinan Kaya 2018-08-06 15:12 ` poza 2018-08-06 16:34 ` Sinan Kaya 2018-08-06 16:44 ` Lukas Wunner 2018-08-06 16:49 ` Sinan Kaya 2018-08-06 17:03 ` Sinan Kaya 2018-08-06 17:05 ` Lukas Wunner 2018-08-06 17:09 ` Sinan Kaya 2018-08-06 9:30 ` Lukas Wunner 2018-09-04 18:18 ` [PATCH v7 0/1] PCI: Handle conflict between hotplug and error handling Bjorn Helgaas 2018-09-04 18:28 ` Sinan Kaya
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.