* [RFC PATCH] PCI, kdump: Clear bus master bit upon shutdown in kdump kernel
From: Kairui Song @ 2019-12-25 19:21 UTC (permalink / raw)
To: linux-kernel, linux-pci
Cc: Bjorn Helgaas, kexec, Jerry Hoemann, Baoquan He, Kairui Song
There are reports of kdump hanging upon reboot on some HPE machines: the
kernel hangs when trying to shut down a PCIe port, because an
uncorrectable error occurs and crashes the system.

On the machine where I can reproduce this issue, part of the topology
looks like this:
[0000:00]-+-00.0 Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DMI2
          +-01.0-[02]--
          +-01.1-[05]--
          +-02.0-[06]--+-00.0 Emulex Corporation OneConnect NIC (Skyhawk)
          |            +-00.1 Emulex Corporation OneConnect NIC (Skyhawk)
          |            +-00.2 Emulex Corporation OneConnect NIC (Skyhawk)
          |            +-00.3 Emulex Corporation OneConnect NIC (Skyhawk)
          |            +-00.4 Emulex Corporation OneConnect NIC (Skyhawk)
          |            +-00.5 Emulex Corporation OneConnect NIC (Skyhawk)
          |            +-00.6 Emulex Corporation OneConnect NIC (Skyhawk)
          |            \-00.7 Emulex Corporation OneConnect NIC (Skyhawk)
          +-02.1-[0f]--
          +-02.2-[07]----00.0 Hewlett-Packard Company Smart Array Gen9 Controllers

When shutting down PCIe port 0000:00:02.2 or 0000:00:02.0, the machine
hangs; which of the two triggers the hang depends on which device is
reinitialized in the kdump kernel.

If the unused device is force-removed before triggering kdump, the
problem never happens:
echo 1 > /sys/bus/pci/devices/0000\:00\:02.2/0000\:07\:00.0/remove
echo c > /proc/sysrq-trigger
... Kdump saves the vmcore over the network, so the NIC gets
reinitialized and hpsa is untouched. The machine then reboots with no
problem. (If hpsa is used instead, shutting down the NIC in the first
kernel helps.)

The cause is that some devices are enabled by the first kernel, which
never gets the chance to shut them down, and the kdump kernel is not
aware of them unless it reinitializes them.

Upon reboot, the kdump kernel skips shutting down such a downstream
device and clears its bridge's Bus Master bit directly. The downstream
device can then error out, as it can still send requests but the
upstream bridge refuses them.

So for kdump, let the kernel read the correct hardware power state on
boot, and always clear the Bus Master bit of a PCI device upon shutdown
if the device is on. The PCIe port driver always shuts down all
downstream devices first, so this should ensure that all downstream
devices have their Bus Master bit off before the bridge's Bus Master
bit is cleared.

Signed-off-by: Kairui Song <kasong@redhat.com>
---
drivers/pci/pci-driver.c | 11 ++++++++---
drivers/pci/quirks.c | 20 ++++++++++++++++++++
2 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 0454ca0e4e3f..84a7fd643b4d 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -18,6 +18,7 @@
#include <linux/kexec.h>
#include <linux/of_device.h>
#include <linux/acpi.h>
+#include <linux/crash_dump.h>
#include "pci.h"
#include "pcie/portdrv.h"
@@ -488,10 +489,14 @@ static void pci_device_shutdown(struct device *dev)
* If this is a kexec reboot, turn off Bus Master bit on the
* device to tell it to not continue to do DMA. Don't touch
* devices in D3cold or unknown states.
- * If it is not a kexec reboot, firmware will hit the PCI
- * devices with big hammer and stop their DMA any way.
+ * If this is a kdump kernel, also turn off Bus Master: the device
+ * could have been activated by the previous, crashed kernel and
+ * may block its upstream bridge from shutting down.
+ * Otherwise, firmware will hit the PCI devices with a big hammer
+ * and stop their DMA anyway.
*/
-	if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
+	if ((kexec_in_progress || is_kdump_kernel()) &&
+	    pci_dev->current_state <= PCI_D3hot)
 		pci_clear_master(pci_dev);
}
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 4937a088d7d8..c65d11ab3939 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -28,6 +28,7 @@
#include <linux/platform_data/x86/apple.h>
#include <linux/pm_runtime.h>
#include <linux/switchtec.h>
+#include <linux/crash_dump.h>
#include <asm/dma.h> /* isa_dma_bridge_buggy */
#include "pci.h"
@@ -192,6 +193,25 @@ static int __init pci_apply_final_quirks(void)
}
fs_initcall_sync(pci_apply_final_quirks);
+/*
+ * Read the device power state even if the device is not enabled. The
+ * device could have been activated by the previous, crashed kernel;
+ * this reads the hardware state and corrects the cached one.
+ */
+static void quirk_read_pm_state_in_kdump(struct pci_dev *dev)
+{
+	u16 pmcsr;
+
+	if (!is_kdump_kernel())
+		return;
+
+	if (dev->pm_cap) {
+		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
+	}
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_ANY_ID, PCI_ANY_ID, quirk_read_pm_state_in_kdump);
+
/*
* Decoding should be disabled for a PCI device during BAR sizing to avoid
* conflict. But doing so may cause problems on host bridge and perhaps other
--
2.24.1
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
* Re: [RFC PATCH] PCI, kdump: Clear bus master bit upon shutdown in kdump kernel
From: Kairui Song @ 2020-07-22 14:52 UTC (permalink / raw)
To: Baoquan He
Cc: jroedel, linux-pci, Dave Young, kexec, Linux Kernel Mailing List,
Randy Wright, Jerry Hoemann, Bjorn Helgaas, Deepa Dinamani,
Myron Stowe, Khalid Aziz
On Fri, Mar 6, 2020 at 5:38 PM Baoquan He <bhe@redhat.com> wrote:
>
> On 03/04/20 at 08:53pm, Deepa Dinamani wrote:
> > On Wed, Mar 4, 2020 at 7:53 PM Baoquan He <bhe@redhat.com> wrote:
> > >
> > > +Joerg to CC.
> > >
> > > On 03/03/20 at 01:01pm, Deepa Dinamani wrote:
> > > > I looked at this some more. Looks like we do not clear irqs when we do
> > > > a kexec reboot. And, the bootup code maintains the same table for the
> > > > kexec-ed kernel. I'm looking at the following code in
> > >
> > > I guess you are talking about kdump reboot here, right? Kexec and kdump
> > > boot use a similar mechanism, but differ a little.
> >
> > Right, I meant the kdump kernel here, and specifically the is_kdump_kernel() case below.
> >
> > >
> > > > intel_irq_remapping.c:
> > > >
> > > > 	if (ir_pre_enabled(iommu)) {
> > > > 		if (!is_kdump_kernel()) {
> > > > 			pr_warn("IRQ remapping was enabled on %s but we are not in kdump mode\n",
> > > > 				iommu->name);
> > > > 			clear_ir_pre_enabled(iommu);
> > > > 			iommu_disable_irq_remapping(iommu);
> > > > 		} else if (iommu_load_old_irte(iommu))
> > >
> > > Here, it's for the kdump kernel to copy the old IR table from the 1st kernel.
> >
> > Correct.
> >
> > > > 			pr_err("Failed to copy IR table for %s from previous kernel\n",
> > > > 			       iommu->name);
> > > > 		else
> > > > 			pr_info("Copied IR table for %s from previous kernel\n",
> > > > 				iommu->name);
> > > > 	}
> > > >
> > > > Would clearing the interrupts (like in the non-kdump path above) just
> > > > before shutdown help here? This should clear the interrupts enabled
> > > > for all the devices in the current kernel, so when the kdump kernel
> > > > starts, it starts clean. This should probably help block out the
> > > > interrupts from a device that does not have a driver.
> > >
> > > I think stopping those out-of-control devices from continuing to send
> > > interrupts is a good idea, though I'm not sure clearing the interrupts
> > > alone will be enough. Devices that are initialized by their driver
> > > will stop, but devices whose drivers are not loaded into the kdump
> > > kernel may keep acting. Even if interrupts are cleared at this time,
> > > in-flight DMA could continue triggering interrupts since the IR
> > > table and IO page table are rebuilt.
> >
> > This should be handled by the IOMMU, right? And, hence you are getting
> > UR. This seems like the correct execution flow to me.
>
> Sorry for late reply.
> Yes, this is initializing IOMMU device.
>
> >
> > Anyway, you could just test this theory by removing the
> > is_kdump_kernel() check above and see if it solves your problem.
> > Obviously, check the VT-d spec to figure out the exact sequence to
> > turn off the IR.
>
> OK, I will talk to Kairui and get a machine to test it. Thanks for your
> nice idea; if you have a draft patch, we are happy to test it.
>
> >
> > Note that the device that is causing the problem here is a legit
> > device. We want to have interrupts from devices we don't know about
> > blocked anyway because we can have compromised firmware/devices that
> > could cause a DoS attack. So blocking the unwanted interrupts seems
> > like the right thing to do here.
>
> Kairui said it's a device whose driver is not loaded in the kdump kernel
> because it's not needed by kdump. We try to load only the kernel modules
> that are needed; e.g. if a device is the dump target, its driver has to
> be loaded. In this case, the device is more like an out-of-control
> device to the kdump kernel.
>
Hi Bao, Deepa, sorry for this very late response. The test machine was
not available for some time, and I have restarted work on this problem.

The workaround mentioned by Deepa (removing the is_kdump_kernel()
check) didn't work; the machine still hangs upon shutdown.
The devices that were left in an unknown state and sending interrupts
could be a problem, but that is unrelated to this hang.

I think I didn't make one thing clear: the PCI UR error never arrives
in the kernel. It's the iLO BMC on that HPE machine that catches the
error and sends the kernel an NMI. The kernel is panicked by the NMI;
I'm still trying to figure out why the NMI hangs the kernel, even with
panic=-1, panic_on_io_nmi and panic_on_unknown_nmi all set. But if we
can avoid the NMI by shutting down the devices in the right order,
that's also a solution.
--
Best Regards,
Kairui Song
* Re: [RFC PATCH] PCI, kdump: Clear bus master bit upon shutdown in kdump kernel
From: Bjorn Helgaas @ 2020-07-22 15:21 UTC (permalink / raw)
To: Kairui Song
Cc: jroedel, Baoquan He, linux-pci, Dave Young, kexec,
Linux Kernel Mailing List, Randy Wright, Jerry Hoemann,
Deepa Dinamani, Myron Stowe, Khalid Aziz
On Wed, Jul 22, 2020 at 10:52:26PM +0800, Kairui Song wrote:
> I think I didn't make one thing clear: the PCI UR error never arrives
> in the kernel. It's the iLO BMC on that HPE machine that catches the
> error and sends the kernel an NMI. The kernel is panicked by the NMI;
> I'm still trying to figure out why the NMI hangs the kernel, even with
> panic=-1, panic_on_io_nmi and panic_on_unknown_nmi all set. But if we
> can avoid the NMI by shutting down the devices in the right order,
> that's also a solution.
I'm not sure how much sympathy to have for this situation. A PCIe UR
is fatal for the transaction and maybe even the device, but from the
overall system point of view, it *should* be a recoverable error and
we shouldn't panic.
Errors like that should be reported via the normal AER or ACPI/APEI
mechanisms. It sounds like in this case, the platform has decided
these aren't enough and it is trying to force a reboot? If this is
"special" platform behavior, I'm not sure how much we need to cater
for it.
Bjorn
* Re: [RFC PATCH] PCI, kdump: Clear bus master bit upon shutdown in kdump kernel
From: Jerry Hoemann @ 2020-07-22 21:50 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: jroedel, Kairui Song, Baoquan He, linux-pci, Dave Young, kexec,
Linux Kernel Mailing List, Randy Wright, Deepa Dinamani,
Myron Stowe, Khalid Aziz
On Wed, Jul 22, 2020 at 10:21:23AM -0500, Bjorn Helgaas wrote:
> I'm not sure how much sympathy to have for this situation. A PCIe UR
> is fatal for the transaction and maybe even the device, but from the
> overall system point of view, it *should* be a recoverable error and
> we shouldn't panic.
>
> Errors like that should be reported via the normal AER or ACPI/APEI
> mechanisms. It sounds like in this case, the platform has decided
> these aren't enough and it is trying to force a reboot? If this is
> "special" platform behavior, I'm not sure how much we need to cater
> for it.
>
Are these AER errors the type processed by the GHES code?
I'll note that Red Hat runs its crash kernel with hest_disable,
so the GHES code is disabled in the crash kernel.
--
-----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett Packard Enterprise
-----------------------------------------------------------------------------
* Re: [RFC PATCH] PCI, kdump: Clear bus master bit upon shutdown in kdump kernel
From: Bjorn Helgaas @ 2020-07-23 0:00 UTC (permalink / raw)
To: Jerry Hoemann
Cc: jroedel, Kairui Song, Baoquan He, linux-pci, Dave Young, kexec,
Linux Kernel Mailing List, Randy Wright, Deepa Dinamani,
Myron Stowe, Khalid Aziz
On Wed, Jul 22, 2020 at 03:50:48PM -0600, Jerry Hoemann wrote:
> On Wed, Jul 22, 2020 at 10:21:23AM -0500, Bjorn Helgaas wrote:
> > On Wed, Jul 22, 2020 at 10:52:26PM +0800, Kairui Song wrote:
> > > I think I didn't make one thing clear: the PCI UR error never arrives
> > > in the kernel. It's the iLO BMC on that HPE machine that catches the
> > > error and sends the kernel an NMI. The kernel is panicked by the NMI;
> > > I'm still trying to figure out why the NMI hangs the kernel, even with
> > > panic=-1, panic_on_io_nmi and panic_on_unknown_nmi all set. But if we
> > > can avoid the NMI by shutting down the devices in the right order,
> > > that's also a solution.
ACPI v6.3, chapter 18, does mention NMIs several times, e.g., Table
18-394 and sec 18.4. I'm not familiar enough with APEI to know
whether Linux correctly supports all those cases. Maybe this is a
symptom that we don't?
> > I'm not sure how much sympathy to have for this situation. A PCIe UR
> > is fatal for the transaction and maybe even the device, but from the
> > overall system point of view, it *should* be a recoverable error and
> > we shouldn't panic.
> >
> > Errors like that should be reported via the normal AER or ACPI/APEI
> > mechanisms. It sounds like in this case, the platform has decided
> > these aren't enough and it is trying to force a reboot? If this is
> > "special" platform behavior, I'm not sure how much we need to cater
> > for it.
>
> Are these AER errors the type processed by the GHES code?
My understanding from ACPI v6.3, sec 18.3.2, is that the Hardware
Error Source Table may contain Error Source Descriptors of types like:
IA-32 Machine Check Exception
IA-32 Corrected Machine Check
IA-32 Non-Maskable Interrupt
PCIe Root Port AER
PCIe Device AER
Generic Hardware Error Source (GHES)
Hardware Error Notification
IA-32 Deferred Machine Check
I would naively expect PCIe UR errors to be reported via one of the
PCIe Error Sources, not GHES, but maybe there's some reason to use
GHES.
The kernel should already know how to deal with the PCIe AER errors,
but we'd have to add new device-specific code to handle things
reported via GHES, along the lines of what Shiju is doing here:
https://lore.kernel.org/r/20200722104245.1060-1-shiju.jose@huawei.com
> I'll note that Red Hat runs its crash kernel with hest_disable,
> so the GHES code is disabled in the crash kernel.
That would disable all the HEST error sources, including the PCIe AER
ones as well as GHES ones. If we turn off some of the normal error
handling mechanisms, I guess we have to expect that some errors won't
be handled correctly.
Bjorn
* Re: [RFC PATCH] PCI, kdump: Clear bus master bit upon shutdown in kdump kernel
From: Kairui Song @ 2020-07-23 18:34 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: jroedel, Baoquan He, linux-pci, Dave Young, kexec, Jerry Hoemann,
Randy Wright, Linux Kernel Mailing List, Deepa Dinamani,
Myron Stowe, Khalid Aziz
On Thu, Jul 23, 2020 at 8:00 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Jul 22, 2020 at 03:50:48PM -0600, Jerry Hoemann wrote:
> > On Wed, Jul 22, 2020 at 10:21:23AM -0500, Bjorn Helgaas wrote:
> > > On Wed, Jul 22, 2020 at 10:52:26PM +0800, Kairui Song wrote:
>
> > > > I think I didn't make one thing clear: the PCI UR error never arrives
> > > > in the kernel. It's the iLO BMC on that HPE machine that catches the
> > > > error and sends the kernel an NMI. The kernel is panicked by the NMI;
> > > > I'm still trying to figure out why the NMI hangs the kernel, even with
> > > > panic=-1, panic_on_io_nmi and panic_on_unknown_nmi all set. But if we
> > > > can avoid the NMI by shutting down the devices in the right order,
> > > > that's also a solution.
>
> ACPI v6.3, chapter 18, does mention NMIs several times, e.g., Table
> 18-394 and sec 18.4. I'm not familiar enough with APEI to know
> whether Linux correctly supports all those cases. Maybe this is a
> symptom that we don't?
>
> > > I'm not sure how much sympathy to have for this situation. A PCIe UR
> > > is fatal for the transaction and maybe even the device, but from the
> > > overall system point of view, it *should* be a recoverable error and
> > > we shouldn't panic.
> > >
> > > Errors like that should be reported via the normal AER or ACPI/APEI
> > > mechanisms. It sounds like in this case, the platform has decided
> > > these aren't enough and it is trying to force a reboot? If this is
> > > "special" platform behavior, I'm not sure how much we need to cater
> > > for it.
> >
> > Are these AER errors the type processed by the GHES code?
>
> My understanding from ACPI v6.3, sec 18.3.2, is that the Hardware
> Error Source Table may contain Error Source Descriptors of types like:
>
> IA-32 Machine Check Exception
> IA-32 Corrected Machine Check
> IA-32 Non-Maskable Interrupt
> PCIe Root Port AER
> PCIe Device AER
> Generic Hardware Error Source (GHES)
> Hardware Error Notification
> IA-32 Deferred Machine Check
>
> I would naively expect PCIe UR errors to be reported via one of the
> PCIe Error Sources, not GHES, but maybe there's some reason to use
> GHES.
>
> The kernel should already know how to deal with the PCIe AER errors,
> but we'd have to add new device-specific code to handle things
> reported via GHES, along the lines of what Shiju is doing here:
>
> https://lore.kernel.org/r/20200722104245.1060-1-shiju.jose@huawei.com
>
> > I'll note that Red Hat runs its crash kernel with hest_disable,
> > so the GHES code is disabled in the crash kernel.
>
> That would disable all the HEST error sources, including the PCIe AER
> ones as well as GHES ones. If we turn off some of the normal error
> handling mechanisms, I guess we have to expect that some errors won't
> be handled correctly.
Hi, that's true; hest_disable is added by default to reduce memory
usage in special cases.

But even if I remove hest_disable and have GHES enabled, the hang still
happens: the iLO console log shows that an NMI is still being sent to
the kernel, and the kernel hangs.

The NMI doesn't hang the kernel every time; sometimes the kernel just
panics and reboots, and sometimes it hangs. This behavior is the same
with and without GHES enabled.

Maybe this is "special platform behavior". I'm also not 100 percent
sure if/how we can cover this in a good way for now.
I'll try to figure out how the NMI actually hangs the kernel and see if
it can be fixed in other ways.
--
Best Regards,
Kairui Song