All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Prabhakar Kushwaha <prabhakar.pkin@gmail.com>
Cc: Kuppuswamy Sathyanarayanan
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Ganapatrao Prabhakerrao Kulkarni <gkulkarni@marvell.com>,
	Myron Stowe <myron.stowe@redhat.com>,
	Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com>,
	Marc Zyngier <maz@kernel.org>,
	Bhupesh Sharma <bhsharma@redhat.com>,
	kexec mailing list <kexec@lists.infradead.org>,
	Robin Murphy <robin.murphy@arm.com>,
	linux-pci@vger.kernel.org,
	Prabhakar Kushwaha <pkushwaha@marvell.com>,
	Will Deacon <will@kernel.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH][v2] iommu: arm-smmu-v3: Copy SMMU table for kdump kernel
Date: Wed, 27 May 2020 15:18:33 -0500	[thread overview]
Message-ID: <20200527201833.GA258397@bjorn-Precision-5520> (raw)
In-Reply-To: <CAJ2QiJKkHiY=qd=pEa2+b-cQenvNZ5XT_X1Djh-6P+a0smf++A@mail.gmail.com>

On Wed, May 27, 2020 at 05:14:39PM +0530, Prabhakar Kushwaha wrote:
> On Fri, May 22, 2020 at 4:19 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Thu, May 21, 2020 at 09:28:20AM +0530, Prabhakar Kushwaha wrote:
> > > On Wed, May 20, 2020 at 4:52 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Thu, May 14, 2020 at 12:47:02PM +0530, Prabhakar Kushwaha wrote:
> > > > > On Wed, May 13, 2020 at 3:33 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > On Mon, May 11, 2020 at 07:46:06PM -0700, Prabhakar Kushwaha wrote:
> > > > > > > An SMMU Stream table is created by the primary kernel. This table is
> > > > > > > used by the SMMU to perform address translations for device-originated
> > > > > > > transactions. Any crash (if happened) launches the kdump kernel which
> > > > > > > re-creates the SMMU Stream table. New transactions will be translated
> > > > > > > via this new table..
> > > > > > >
> > > > > > > There are scenarios, where devices are still having old pending
> > > > > > > transactions (configured in the primary kernel). These transactions
> > > > > > > come in-between Stream table creation and device-driver probe.
> > > > > > > As new stream table does not have entry for older transactions,
> > > > > > > it will be aborted by SMMU.
> > > > > > >
> > > > > > > Similar observations were found with PCIe-Intel 82576 Gigabit
> > > > > > > Network card. It sends old Memory Read transaction in kdump kernel.
> > > > > > > Transactions configured for older Stream table entries, that do not
> > > > > > > exist any longer in the new table, will cause a PCIe Completion Abort.
> > > > > >
> > > > > > That sounds like exactly what we want, doesn't it?
> > > > > >
> > > > > > Or do you *want* DMA from the previous kernel to complete?  That will
> > > > > > read or scribble on something, but maybe that's not terrible as long
> > > > > > as it's not memory used by the kdump kernel.
> > > > >
> > > > > Yes, Abort should happen. But it should happen in context of driver.
> > > > > But current abort is happening because of SMMU and no driver/pcie
> > > > > setup present at this moment.
> > > >
> > > > I don't understand what you mean by "in context of driver."  The whole
> > > > problem is that we can't control *when* the abort happens, so it may
> > > > happen in *any* context.  It may happen when a NIC receives a packet
> > > > or at some other unpredictable time.
> > > >
> > > > > Solution of this issue should be at 2 place
> > > > > a) SMMU level: I still believe, this patch has potential to overcome
> > > > > issue till finally driver's probe takeover.
> > > > > b) Device level: Even if something goes wrong. Driver/device should
> > > > > able to recover.
> > > > >
> > > > > > > Returned PCIe completion abort further leads to AER Errors from APEI
> > > > > > > Generic Hardware Error Source (GHES) with completion timeout.
> > > > > > > A network device hang is observed even after continuous
> > > > > > > reset/recovery from driver, Hence device is no more usable.
> > > > > >
> > > > > > The fact that the device is no longer usable is definitely a problem.
> > > > > > But in principle we *should* be able to recover from these errors.  If
> > > > > > we could recover and reliably use the device after the error, that
> > > > > > seems like it would be a more robust solution that having to add
> > > > > > special cases in every IOMMU driver.
> > > > > >
> > > > > > If you have details about this sort of error, I'd like to try to fix
> > > > > > it because we want to recover from that sort of error in normal
> > > > > > (non-crash) situations as well.
> > > > > >
> > > > > Completion abort case should be gracefully handled.  And device should
> > > > > always remain usable.
> > > > >
> > > > > There are 2 scenario which I am testing with Ethernet card PCIe-Intel
> > > > > 82576 Gigabit Network card.
> > > > >
> > > > > I)  Crash testing using kdump root file system: De-facto scenario
> > > > >     -  kdump file system does not have Ethernet driver
> > > > >     -  A lot of AER prints [1], making it impossible to work on shell
> > > > > of kdump root file system.
> > > >
> > > > In this case, I think report_error_detected() is deciding that because
> > > > the device has no driver, we can't do anything.  The flow is like
> > > > this:
> > > >
> > > >   aer_recover_work_func               # aer_recover_work
> > > >     kfifo_get(aer_recover_ring, entry)
> > > >     dev = pci_get_domain_bus_and_slot
> > > >     cper_print_aer(dev, ...)
> > > >       pci_err("AER: aer_status:")
> > > >       pci_err("AER:   [14] CmpltTO")
> > > >       pci_err("AER: aer_layer=")
> > > >     if (AER_NONFATAL)
> > > >       pcie_do_recovery(dev, pci_channel_io_normal)
> > > >         status = CAN_RECOVER
> > > >         pci_walk_bus(report_normal_detected)
> > > >           report_error_detected
> > > >             if (!dev->driver)
> > > >               vote = NO_AER_DRIVER
> > > >               pci_info("can't recover (no error_detected callback)")
> > > >             *result = merge_result(*, NO_AER_DRIVER)
> > > >             # always NO_AER_DRIVER
> > > >         status is now NO_AER_DRIVER
> > > >
> > > > So pcie_do_recovery() does not call .report_mmio_enabled() or .slot_reset(),
> > > > and status is not RECOVERED, so it skips .resume().
> > > >
> > > > I don't remember the history there, but if a device has no driver and
> > > > the device generates errors, it seems like we ought to be able to
> > > > reset it.
> > >
> > > But how to reset the device considering there is no driver.
> > > Hypothetically, this case should be taken care by PCIe subsystem to
> > > perform reset at PCIe level.
> >
> > I don't understand your question.  The PCI core (not the device
> > driver) already does the reset.  When pcie_do_recovery() calls
> > reset_link(), all devices on the other side of the link are reset.
> >
> > > > We should be able to field one (or a few) AER errors, reset the
> > > > device, and you should be able to use the shell in the kdump kernel.
> > > >
> > > here kdump shell is usable only problem is a "lot of AER Errors". One
> > > cannot see what they are typing.
> >
> > Right, that's what I expect.  If the PCI core resets the device, you
> > should get just a few AER errors, and they should stop after the
> > device is reset.
> >
> > > > >     -  Note kdump shell allows to use makedumpfile, vmcore-dmesg applications.
> > > > >
> > > > > II) Crash testing using default root file system: Specific case to
> > > > > test Ethernet driver in second kernel
> > > > >    -  Default root file system have Ethernet driver
> > > > >    -  AER error comes even before the driver probe starts.
> > > > >    -  Driver does reset Ethernet card as part of probe but no success.
> > > > >    -  AER also tries to recover. but no success.  [2]
> > > > >    -  I also tries to remove AER errors by using "pci=noaer" bootargs
> > > > > and commenting ghes_handle_aer() from GHES driver..
> > > > >           than different set of errors come which also never able to recover [3]
> > > > >
> > >
> > > Please suggest your view on this case. Here driver is preset.
> > > (driver/net/ethernet/intel/igb/igb_main.c)
> > > In this case AER errors starts even before driver probe starts.
> > > After probe, driver does the device reset with no success and even AER
> > > recovery does not work.
> >
> > This case should be the same as the one above.  If we can change the
> > PCI core so it can reset the device when there's no driver,  that would
> > apply to case I (where there will never be a driver) and to case II
> > (where there is no driver now, but a driver will probe the device
> > later).
> 
> Does this means change are required in PCI core.

Yes, I am suggesting that the PCI core does not do the right thing
here.

> I tried following changes in pcie_do_recovery() but it did not help.
> Same error as before.
> 
> -- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
>         pci_info(dev, "broadcast resume message\n");
>         pci_walk_bus(bus, report_resume, &status);
> @@ -203,7 +207,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>         return status;
> 
>  failed:
>         pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> +       pci_reset_function(dev);
> +       pci_aer_clear_device_status(dev);
> +       pci_aer_clear_nonfatal_status(dev);

Did you confirm that this resets the devices in question (0000:09:00.0
and 0000:09:00.1, I think), and what reset mechanism this uses (FLR,
PM, etc)?

Case I is using APEI, and it looks like that can queue up 16 errors
(AER_RECOVER_RING_SIZE), so that queue could be completely full before
we even get a chance to reset the device.  But I would think that the
reset should *eventually* stop the errors, even though we might log
30+ of them first.

As an experiment, you could reduce AER_RECOVER_RING_SIZE to 1 or 2 and
see if it reduces the logging.

> > > Problem mentioned in case I and II goes away if do pci_reset_function
> > > during enumeration phase of kdump kernel.
> > > can we thought of doing pci_reset_function for all devices in kdump
> > > kernel or device specific quirk.
> > >
> > > --pk
> > >
> > >
> > > > > As per my understanding, possible solutions are
> > > > >  - Copy SMMU table i.e. this patch
> > > > > OR
> > > > >  - Doing pci_reset_function() during enumeration phase.
> > > > > I also tried clearing "M" bit using pci_clear_master during
> > > > > enumeration but it did not help. Because driver re-set M bit causing
> > > > > same AER error again.
> > > > >
> > > > >
> > > > > -pk
> > > > >
> > > > > ---------------------------------------------------------------------------------------------------------------------------
> > > > > [1] with bootargs having pci=noaer
> > > > >
> > > > > [   22.494648] {4}[Hardware Error]: Hardware error from APEI Generic
> > > > > Hardware Error Source: 1
> > > > > [   22.512773] {4}[Hardware Error]: event severity: recoverable
> > > > > [   22.518419] {4}[Hardware Error]:  Error 0, type: recoverable
> > > > > [   22.544804] {4}[Hardware Error]:   section_type: PCIe error
> > > > > [   22.550363] {4}[Hardware Error]:   port_type: 0, PCIe end point
> > > > > [   22.556268] {4}[Hardware Error]:   version: 3.0
> > > > > [   22.560785] {4}[Hardware Error]:   command: 0x0507, status: 0x4010
> > > > > [   22.576852] {4}[Hardware Error]:   device_id: 0000:09:00.1
> > > > > [   22.582323] {4}[Hardware Error]:   slot: 0
> > > > > [   22.586406] {4}[Hardware Error]:   secondary_bus: 0x00
> > > > > [   22.591530] {4}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> > > > > [   22.608900] {4}[Hardware Error]:   class_code: 000002
> > > > > [   22.613938] {4}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> > > > > [   22.803534] pci 0000:09:00.1: AER: aer_status: 0x00004000,
> > > > > aer_mask: 0x00000000
> > > > > [   22.810838] pci 0000:09:00.1: AER:    [14] CmpltTO                (First)
> > > > > [   22.817613] pci 0000:09:00.1: AER: aer_layer=Transaction Layer,
> > > > > aer_agent=Requester ID
> > > > > [   22.847374] pci 0000:09:00.1: AER: aer_uncor_severity: 0x00062011
> > > > > [   22.866161] mpt3sas_cm0: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED,
> > > > > total mem (8153768 kB)
> > > > > [   22.946178] pci 0000:09:00.0: AER: can't recover (no error_detected callback)
> > > > > [   22.995142] pci 0000:09:00.1: AER: can't recover (no error_detected callback)
> > > > > [   23.002300] pcieport 0000:00:09.0: AER: device recovery failed
> > > > > [   23.027607] pci 0000:09:00.1: AER: aer_status: 0x00004000,
> > > > > aer_mask: 0x00000000
> > > > > [   23.044109] pci 0000:09:00.1: AER:    [14] CmpltTO                (First)
> > > > > [   23.060713] pci 0000:09:00.1: AER: aer_layer=Transaction Layer,
> > > > > aer_agent=Requester ID
> > > > > [   23.068616] pci 0000:09:00.1: AER: aer_uncor_severity: 0x00062011
> > > > > [   23.122056] pci 0000:09:00.0: AER: can't recover (no error_detected callback)

<snip>

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

WARNING: multiple messages have this Message-ID (diff)
From: Bjorn Helgaas <helgaas@kernel.org>
To: Prabhakar Kushwaha <prabhakar.pkin@gmail.com>
Cc: Robin Murphy <robin.murphy@arm.com>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	kexec mailing list <kexec@lists.infradead.org>,
	linux-pci@vger.kernel.org, Marc Zyngier <maz@kernel.org>,
	Will Deacon <will@kernel.org>,
	Ganapatrao Prabhakerrao Kulkarni <gkulkarni@marvell.com>,
	Bhupesh Sharma <bhsharma@redhat.com>,
	Prabhakar Kushwaha <pkushwaha@marvell.com>,
	Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com>,
	Myron Stowe <myron.stowe@redhat.com>
Subject: Re: [PATCH][v2] iommu: arm-smmu-v3: Copy SMMU table for kdump kernel
Date: Wed, 27 May 2020 15:18:33 -0500	[thread overview]
Message-ID: <20200527201833.GA258397@bjorn-Precision-5520> (raw)
In-Reply-To: <CAJ2QiJKkHiY=qd=pEa2+b-cQenvNZ5XT_X1Djh-6P+a0smf++A@mail.gmail.com>

On Wed, May 27, 2020 at 05:14:39PM +0530, Prabhakar Kushwaha wrote:
> On Fri, May 22, 2020 at 4:19 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Thu, May 21, 2020 at 09:28:20AM +0530, Prabhakar Kushwaha wrote:
> > > On Wed, May 20, 2020 at 4:52 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Thu, May 14, 2020 at 12:47:02PM +0530, Prabhakar Kushwaha wrote:
> > > > > On Wed, May 13, 2020 at 3:33 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > On Mon, May 11, 2020 at 07:46:06PM -0700, Prabhakar Kushwaha wrote:
> > > > > > > An SMMU Stream table is created by the primary kernel. This table is
> > > > > > > used by the SMMU to perform address translations for device-originated
> > > > > > > transactions. Any crash (if happened) launches the kdump kernel which
> > > > > > > re-creates the SMMU Stream table. New transactions will be translated
> > > > > > > via this new table..
> > > > > > >
> > > > > > > There are scenarios, where devices are still having old pending
> > > > > > > transactions (configured in the primary kernel). These transactions
> > > > > > > come in-between Stream table creation and device-driver probe.
> > > > > > > As new stream table does not have entry for older transactions,
> > > > > > > it will be aborted by SMMU.
> > > > > > >
> > > > > > > Similar observations were found with PCIe-Intel 82576 Gigabit
> > > > > > > Network card. It sends old Memory Read transaction in kdump kernel.
> > > > > > > Transactions configured for older Stream table entries, that do not
> > > > > > > exist any longer in the new table, will cause a PCIe Completion Abort.
> > > > > >
> > > > > > That sounds like exactly what we want, doesn't it?
> > > > > >
> > > > > > Or do you *want* DMA from the previous kernel to complete?  That will
> > > > > > read or scribble on something, but maybe that's not terrible as long
> > > > > > as it's not memory used by the kdump kernel.
> > > > >
> > > > > Yes, Abort should happen. But it should happen in context of driver.
> > > > > But current abort is happening because of SMMU and no driver/pcie
> > > > > setup present at this moment.
> > > >
> > > > I don't understand what you mean by "in context of driver."  The whole
> > > > problem is that we can't control *when* the abort happens, so it may
> > > > happen in *any* context.  It may happen when a NIC receives a packet
> > > > or at some other unpredictable time.
> > > >
> > > > > Solution of this issue should be at 2 place
> > > > > a) SMMU level: I still believe, this patch has potential to overcome
> > > > > issue till finally driver's probe takeover.
> > > > > b) Device level: Even if something goes wrong. Driver/device should
> > > > > able to recover.
> > > > >
> > > > > > > Returned PCIe completion abort further leads to AER Errors from APEI
> > > > > > > Generic Hardware Error Source (GHES) with completion timeout.
> > > > > > > A network device hang is observed even after continuous
> > > > > > > reset/recovery from driver, Hence device is no more usable.
> > > > > >
> > > > > > The fact that the device is no longer usable is definitely a problem.
> > > > > > But in principle we *should* be able to recover from these errors.  If
> > > > > > we could recover and reliably use the device after the error, that
> > > > > > seems like it would be a more robust solution that having to add
> > > > > > special cases in every IOMMU driver.
> > > > > >
> > > > > > If you have details about this sort of error, I'd like to try to fix
> > > > > > it because we want to recover from that sort of error in normal
> > > > > > (non-crash) situations as well.
> > > > > >
> > > > > Completion abort case should be gracefully handled.  And device should
> > > > > always remain usable.
> > > > >
> > > > > There are 2 scenario which I am testing with Ethernet card PCIe-Intel
> > > > > 82576 Gigabit Network card.
> > > > >
> > > > > I)  Crash testing using kdump root file system: De-facto scenario
> > > > >     -  kdump file system does not have Ethernet driver
> > > > >     -  A lot of AER prints [1], making it impossible to work on shell
> > > > > of kdump root file system.
> > > >
> > > > In this case, I think report_error_detected() is deciding that because
> > > > the device has no driver, we can't do anything.  The flow is like
> > > > this:
> > > >
> > > >   aer_recover_work_func               # aer_recover_work
> > > >     kfifo_get(aer_recover_ring, entry)
> > > >     dev = pci_get_domain_bus_and_slot
> > > >     cper_print_aer(dev, ...)
> > > >       pci_err("AER: aer_status:")
> > > >       pci_err("AER:   [14] CmpltTO")
> > > >       pci_err("AER: aer_layer=")
> > > >     if (AER_NONFATAL)
> > > >       pcie_do_recovery(dev, pci_channel_io_normal)
> > > >         status = CAN_RECOVER
> > > >         pci_walk_bus(report_normal_detected)
> > > >           report_error_detected
> > > >             if (!dev->driver)
> > > >               vote = NO_AER_DRIVER
> > > >               pci_info("can't recover (no error_detected callback)")
> > > >             *result = merge_result(*, NO_AER_DRIVER)
> > > >             # always NO_AER_DRIVER
> > > >         status is now NO_AER_DRIVER
> > > >
> > > > So pcie_do_recovery() does not call .report_mmio_enabled() or .slot_reset(),
> > > > and status is not RECOVERED, so it skips .resume().
> > > >
> > > > I don't remember the history there, but if a device has no driver and
> > > > the device generates errors, it seems like we ought to be able to
> > > > reset it.
> > >
> > > But how to reset the device considering there is no driver.
> > > Hypothetically, this case should be taken care by PCIe subsystem to
> > > perform reset at PCIe level.
> >
> > I don't understand your question.  The PCI core (not the device
> > driver) already does the reset.  When pcie_do_recovery() calls
> > reset_link(), all devices on the other side of the link are reset.
> >
> > > > We should be able to field one (or a few) AER errors, reset the
> > > > device, and you should be able to use the shell in the kdump kernel.
> > > >
> > > here kdump shell is usable only problem is a "lot of AER Errors". One
> > > cannot see what they are typing.
> >
> > Right, that's what I expect.  If the PCI core resets the device, you
> > should get just a few AER errors, and they should stop after the
> > device is reset.
> >
> > > > >     -  Note kdump shell allows to use makedumpfile, vmcore-dmesg applications.
> > > > >
> > > > > II) Crash testing using default root file system: Specific case to
> > > > > test Ethernet driver in second kernel
> > > > >    -  Default root file system have Ethernet driver
> > > > >    -  AER error comes even before the driver probe starts.
> > > > >    -  Driver does reset Ethernet card as part of probe but no success.
> > > > >    -  AER also tries to recover. but no success.  [2]
> > > > >    -  I also tries to remove AER errors by using "pci=noaer" bootargs
> > > > > and commenting ghes_handle_aer() from GHES driver..
> > > > >           than different set of errors come which also never able to recover [3]
> > > > >
> > >
> > > Please suggest your view on this case. Here driver is preset.
> > > (driver/net/ethernet/intel/igb/igb_main.c)
> > > In this case AER errors starts even before driver probe starts.
> > > After probe, driver does the device reset with no success and even AER
> > > recovery does not work.
> >
> > This case should be the same as the one above.  If we can change the
> > PCI core so it can reset the device when there's no driver,  that would
> > apply to case I (where there will never be a driver) and to case II
> > (where there is no driver now, but a driver will probe the device
> > later).
> 
> Does this means change are required in PCI core.

Yes, I am suggesting that the PCI core does not do the right thing
here.

> I tried following changes in pcie_do_recovery() but it did not help.
> Same error as before.
> 
> -- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
>         pci_info(dev, "broadcast resume message\n");
>         pci_walk_bus(bus, report_resume, &status);
> @@ -203,7 +207,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>         return status;
> 
>  failed:
>         pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> +       pci_reset_function(dev);
> +       pci_aer_clear_device_status(dev);
> +       pci_aer_clear_nonfatal_status(dev);

Did you confirm that this resets the devices in question (0000:09:00.0
and 0000:09:00.1, I think), and what reset mechanism this uses (FLR,
PM, etc)?

Case I is using APEI, and it looks like that can queue up 16 errors
(AER_RECOVER_RING_SIZE), so that queue could be completely full before
we even get a chance to reset the device.  But I would think that the
reset should *eventually* stop the errors, even though we might log
30+ of them first.

As an experiment, you could reduce AER_RECOVER_RING_SIZE to 1 or 2 and
see if it reduces the logging.

> > > Problem mentioned in case I and II goes away if do pci_reset_function
> > > during enumeration phase of kdump kernel.
> > > can we thought of doing pci_reset_function for all devices in kdump
> > > kernel or device specific quirk.
> > >
> > > --pk
> > >
> > >
> > > > > As per my understanding, possible solutions are
> > > > >  - Copy SMMU table i.e. this patch
> > > > > OR
> > > > >  - Doing pci_reset_function() during enumeration phase.
> > > > > I also tried clearing "M" bit using pci_clear_master during
> > > > > enumeration but it did not help. Because driver re-set M bit causing
> > > > > same AER error again.
> > > > >
> > > > >
> > > > > -pk
> > > > >
> > > > > ---------------------------------------------------------------------------------------------------------------------------
> > > > > [1] with bootargs having pci=noaer
> > > > >
> > > > > [   22.494648] {4}[Hardware Error]: Hardware error from APEI Generic
> > > > > Hardware Error Source: 1
> > > > > [   22.512773] {4}[Hardware Error]: event severity: recoverable
> > > > > [   22.518419] {4}[Hardware Error]:  Error 0, type: recoverable
> > > > > [   22.544804] {4}[Hardware Error]:   section_type: PCIe error
> > > > > [   22.550363] {4}[Hardware Error]:   port_type: 0, PCIe end point
> > > > > [   22.556268] {4}[Hardware Error]:   version: 3.0
> > > > > [   22.560785] {4}[Hardware Error]:   command: 0x0507, status: 0x4010
> > > > > [   22.576852] {4}[Hardware Error]:   device_id: 0000:09:00.1
> > > > > [   22.582323] {4}[Hardware Error]:   slot: 0
> > > > > [   22.586406] {4}[Hardware Error]:   secondary_bus: 0x00
> > > > > [   22.591530] {4}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> > > > > [   22.608900] {4}[Hardware Error]:   class_code: 000002
> > > > > [   22.613938] {4}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> > > > > [   22.803534] pci 0000:09:00.1: AER: aer_status: 0x00004000,
> > > > > aer_mask: 0x00000000
> > > > > [   22.810838] pci 0000:09:00.1: AER:    [14] CmpltTO                (First)
> > > > > [   22.817613] pci 0000:09:00.1: AER: aer_layer=Transaction Layer,
> > > > > aer_agent=Requester ID
> > > > > [   22.847374] pci 0000:09:00.1: AER: aer_uncor_severity: 0x00062011
> > > > > [   22.866161] mpt3sas_cm0: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED,
> > > > > total mem (8153768 kB)
> > > > > [   22.946178] pci 0000:09:00.0: AER: can't recover (no error_detected callback)
> > > > > [   22.995142] pci 0000:09:00.1: AER: can't recover (no error_detected callback)
> > > > > [   23.002300] pcieport 0000:00:09.0: AER: device recovery failed
> > > > > [   23.027607] pci 0000:09:00.1: AER: aer_status: 0x00004000,
> > > > > aer_mask: 0x00000000
> > > > > [   23.044109] pci 0000:09:00.1: AER:    [14] CmpltTO                (First)
> > > > > [   23.060713] pci 0000:09:00.1: AER: aer_layer=Transaction Layer,
> > > > > aer_agent=Requester ID
> > > > > [   23.068616] pci 0000:09:00.1: AER: aer_uncor_severity: 0x00062011
> > > > > [   23.122056] pci 0000:09:00.0: AER: can't recover (no error_detected callback)

<snip>

WARNING: multiple messages have this Message-ID (diff)
From: Bjorn Helgaas <helgaas@kernel.org>
To: Prabhakar Kushwaha <prabhakar.pkin@gmail.com>
Cc: Kuppuswamy Sathyanarayanan
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Ganapatrao Prabhakerrao Kulkarni <gkulkarni@marvell.com>,
	Myron Stowe <myron.stowe@redhat.com>,
	Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com>,
	Marc Zyngier <maz@kernel.org>,
	Bhupesh Sharma <bhsharma@redhat.com>,
	kexec mailing list <kexec@lists.infradead.org>,
	Robin Murphy <robin.murphy@arm.com>,
	linux-pci@vger.kernel.org,
	Prabhakar Kushwaha <pkushwaha@marvell.com>,
	Will Deacon <will@kernel.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH][v2] iommu: arm-smmu-v3: Copy SMMU table for kdump kernel
Date: Wed, 27 May 2020 15:18:33 -0500	[thread overview]
Message-ID: <20200527201833.GA258397@bjorn-Precision-5520> (raw)
In-Reply-To: <CAJ2QiJKkHiY=qd=pEa2+b-cQenvNZ5XT_X1Djh-6P+a0smf++A@mail.gmail.com>

On Wed, May 27, 2020 at 05:14:39PM +0530, Prabhakar Kushwaha wrote:
> On Fri, May 22, 2020 at 4:19 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Thu, May 21, 2020 at 09:28:20AM +0530, Prabhakar Kushwaha wrote:
> > > On Wed, May 20, 2020 at 4:52 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Thu, May 14, 2020 at 12:47:02PM +0530, Prabhakar Kushwaha wrote:
> > > > > On Wed, May 13, 2020 at 3:33 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > On Mon, May 11, 2020 at 07:46:06PM -0700, Prabhakar Kushwaha wrote:
> > > > > > > An SMMU Stream table is created by the primary kernel. This table is
> > > > > > > used by the SMMU to perform address translations for device-originated
> > > > > > > transactions. Any crash (if happened) launches the kdump kernel which
> > > > > > > re-creates the SMMU Stream table. New transactions will be translated
> > > > > > > via this new table..
> > > > > > >
> > > > > > > There are scenarios, where devices are still having old pending
> > > > > > > transactions (configured in the primary kernel). These transactions
> > > > > > > come in-between Stream table creation and device-driver probe.
> > > > > > > As new stream table does not have entry for older transactions,
> > > > > > > it will be aborted by SMMU.
> > > > > > >
> > > > > > > Similar observations were found with PCIe-Intel 82576 Gigabit
> > > > > > > Network card. It sends old Memory Read transaction in kdump kernel.
> > > > > > > Transactions configured for older Stream table entries, that do not
> > > > > > > exist any longer in the new table, will cause a PCIe Completion Abort.
> > > > > >
> > > > > > That sounds like exactly what we want, doesn't it?
> > > > > >
> > > > > > Or do you *want* DMA from the previous kernel to complete?  That will
> > > > > > read or scribble on something, but maybe that's not terrible as long
> > > > > > as it's not memory used by the kdump kernel.
> > > > >
> > > > > Yes, Abort should happen. But it should happen in context of driver.
> > > > > But current abort is happening because of SMMU and no driver/pcie
> > > > > setup present at this moment.
> > > >
> > > > I don't understand what you mean by "in context of driver."  The whole
> > > > problem is that we can't control *when* the abort happens, so it may
> > > > happen in *any* context.  It may happen when a NIC receives a packet
> > > > or at some other unpredictable time.
> > > >
> > > > > Solution of this issue should be at 2 place
> > > > > a) SMMU level: I still believe, this patch has potential to overcome
> > > > > issue till finally driver's probe takeover.
> > > > > b) Device level: Even if something goes wrong. Driver/device should
> > > > > able to recover.
> > > > >
> > > > > > > Returned PCIe completion abort further leads to AER Errors from APEI
> > > > > > > Generic Hardware Error Source (GHES) with completion timeout.
> > > > > > > A network device hang is observed even after continuous
> > > > > > > reset/recovery from driver, Hence device is no more usable.
> > > > > >
> > > > > > The fact that the device is no longer usable is definitely a problem.
> > > > > > But in principle we *should* be able to recover from these errors.  If
> > > > > > we could recover and reliably use the device after the error, that
> > > > > > seems like it would be a more robust solution that having to add
> > > > > > special cases in every IOMMU driver.
> > > > > >
> > > > > > If you have details about this sort of error, I'd like to try to fix
> > > > > > it because we want to recover from that sort of error in normal
> > > > > > (non-crash) situations as well.
> > > > > >
> > > > > Completion abort case should be gracefully handled.  And device should
> > > > > always remain usable.
> > > > >
> > > > > There are 2 scenario which I am testing with Ethernet card PCIe-Intel
> > > > > 82576 Gigabit Network card.
> > > > >
> > > > > I)  Crash testing using kdump root file system: De-facto scenario
> > > > >     -  kdump file system does not have Ethernet driver
> > > > >     -  A lot of AER prints [1], making it impossible to work on shell
> > > > > of kdump root file system.
> > > >
> > > > In this case, I think report_error_detected() is deciding that because
> > > > the device has no driver, we can't do anything.  The flow is like
> > > > this:
> > > >
> > > >   aer_recover_work_func               # aer_recover_work
> > > >     kfifo_get(aer_recover_ring, entry)
> > > >     dev = pci_get_domain_bus_and_slot
> > > >     cper_print_aer(dev, ...)
> > > >       pci_err("AER: aer_status:")
> > > >       pci_err("AER:   [14] CmpltTO")
> > > >       pci_err("AER: aer_layer=")
> > > >     if (AER_NONFATAL)
> > > >       pcie_do_recovery(dev, pci_channel_io_normal)
> > > >         status = CAN_RECOVER
> > > >         pci_walk_bus(report_normal_detected)
> > > >           report_error_detected
> > > >             if (!dev->driver)
> > > >               vote = NO_AER_DRIVER
> > > >               pci_info("can't recover (no error_detected callback)")
> > > >             *result = merge_result(*, NO_AER_DRIVER)
> > > >             # always NO_AER_DRIVER
> > > >         status is now NO_AER_DRIVER
> > > >
> > > > So pcie_do_recovery() does not call .report_mmio_enabled() or .slot_reset(),
> > > > and status is not RECOVERED, so it skips .resume().
> > > >
> > > > I don't remember the history there, but if a device has no driver and
> > > > the device generates errors, it seems like we ought to be able to
> > > > reset it.
> > >
> > > But how to reset the device considering there is no driver.
> > > Hypothetically, this case should be taken care by PCIe subsystem to
> > > perform reset at PCIe level.
> >
> > I don't understand your question.  The PCI core (not the device
> > driver) already does the reset.  When pcie_do_recovery() calls
> > reset_link(), all devices on the other side of the link are reset.
> >
> > > > We should be able to field one (or a few) AER errors, reset the
> > > > device, and you should be able to use the shell in the kdump kernel.
> > > >
> > > here kdump shell is usable only problem is a "lot of AER Errors". One
> > > cannot see what they are typing.
> >
> > Right, that's what I expect.  If the PCI core resets the device, you
> > should get just a few AER errors, and they should stop after the
> > device is reset.
> >
> > > > >     -  Note kdump shell allows to use makedumpfile, vmcore-dmesg applications.
> > > > >
> > > > > II) Crash testing using default root file system: Specific case to
> > > > > test Ethernet driver in second kernel
> > > > >    -  Default root file system have Ethernet driver
> > > > >    -  AER error comes even before the driver probe starts.
> > > > >    -  Driver does reset Ethernet card as part of probe but no success.
> > > > >    -  AER also tries to recover. but no success.  [2]
> > > > >    -  I also tries to remove AER errors by using "pci=noaer" bootargs
> > > > > and commenting ghes_handle_aer() from GHES driver..
> > > > >           than different set of errors come which also never able to recover [3]
> > > > >
> > >
> > > Please suggest your view on this case. Here driver is preset.
> > > (driver/net/ethernet/intel/igb/igb_main.c)
> > > In this case AER errors starts even before driver probe starts.
> > > After probe, driver does the device reset with no success and even AER
> > > recovery does not work.
> >
> > This case should be the same as the one above.  If we can change the
> > PCI core so it can reset the device when there's no driver,  that would
> > apply to case I (where there will never be a driver) and to case II
> > (where there is no driver now, but a driver will probe the device
> > later).
> 
> Does this means change are required in PCI core.

Yes, I am suggesting that the PCI core does not do the right thing
here.

> I tried following changes in pcie_do_recovery() but it did not help.
> Same error as before.
> 
> -- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
>         pci_info(dev, "broadcast resume message\n");
>         pci_walk_bus(bus, report_resume, &status);
> @@ -203,7 +207,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>         return status;
> 
>  failed:
>         pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> +       pci_reset_function(dev);
> +       pci_aer_clear_device_status(dev);
> +       pci_aer_clear_nonfatal_status(dev);

Did you confirm that this resets the devices in question (0000:09:00.0
and 0000:09:00.1, I think), and what reset mechanism this uses (FLR,
PM, etc)?

Case I is using APEI, and it looks like that can queue up 16 errors
(AER_RECOVER_RING_SIZE), so that queue could be completely full before
we even get a chance to reset the device.  But I would think that the
reset should *eventually* stop the errors, even though we might log
30+ of them first.

As an experiment, you could reduce AER_RECOVER_RING_SIZE to 1 or 2 and
see if it reduces the logging.

> > > Problem mentioned in case I and II goes away if do pci_reset_function
> > > during enumeration phase of kdump kernel.
> > > can we thought of doing pci_reset_function for all devices in kdump
> > > kernel or device specific quirk.
> > >
> > > --pk
> > >
> > >
> > > > > As per my understanding, possible solutions are
> > > > >  - Copy SMMU table i.e. this patch
> > > > > OR
> > > > >  - Doing pci_reset_function() during enumeration phase.
> > > > > I also tried clearing "M" bit using pci_clear_master during
> > > > > enumeration but it did not help. Because driver re-set M bit causing
> > > > > same AER error again.
> > > > >
> > > > >
> > > > > -pk
> > > > >
> > > > > ---------------------------------------------------------------------------------------------------------------------------
> > > > > [1] with bootargs having pci=noaer
> > > > >
> > > > > [   22.494648] {4}[Hardware Error]: Hardware error from APEI Generic
> > > > > Hardware Error Source: 1
> > > > > [   22.512773] {4}[Hardware Error]: event severity: recoverable
> > > > > [   22.518419] {4}[Hardware Error]:  Error 0, type: recoverable
> > > > > [   22.544804] {4}[Hardware Error]:   section_type: PCIe error
> > > > > [   22.550363] {4}[Hardware Error]:   port_type: 0, PCIe end point
> > > > > [   22.556268] {4}[Hardware Error]:   version: 3.0
> > > > > [   22.560785] {4}[Hardware Error]:   command: 0x0507, status: 0x4010
> > > > > [   22.576852] {4}[Hardware Error]:   device_id: 0000:09:00.1
> > > > > [   22.582323] {4}[Hardware Error]:   slot: 0
> > > > > [   22.586406] {4}[Hardware Error]:   secondary_bus: 0x00
> > > > > [   22.591530] {4}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10c9
> > > > > [   22.608900] {4}[Hardware Error]:   class_code: 000002
> > > > > [   22.613938] {4}[Hardware Error]:   serial number: 0xff1b4580, 0x90e2baff
> > > > > [   22.803534] pci 0000:09:00.1: AER: aer_status: 0x00004000,
> > > > > aer_mask: 0x00000000
> > > > > [   22.810838] pci 0000:09:00.1: AER:    [14] CmpltTO                (First)
> > > > > [   22.817613] pci 0000:09:00.1: AER: aer_layer=Transaction Layer,
> > > > > aer_agent=Requester ID
> > > > > [   22.847374] pci 0000:09:00.1: AER: aer_uncor_severity: 0x00062011
> > > > > [   22.866161] mpt3sas_cm0: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED,
> > > > > total mem (8153768 kB)
> > > > > [   22.946178] pci 0000:09:00.0: AER: can't recover (no error_detected callback)
> > > > > [   22.995142] pci 0000:09:00.1: AER: can't recover (no error_detected callback)
> > > > > [   23.002300] pcieport 0000:00:09.0: AER: device recovery failed
> > > > > [   23.027607] pci 0000:09:00.1: AER: aer_status: 0x00004000,
> > > > > aer_mask: 0x00000000
> > > > > [   23.044109] pci 0000:09:00.1: AER:    [14] CmpltTO                (First)
> > > > > [   23.060713] pci 0000:09:00.1: AER: aer_layer=Transaction Layer,
> > > > > aer_agent=Requester ID
> > > > > [   23.068616] pci 0000:09:00.1: AER: aer_uncor_severity: 0x00062011
> > > > > [   23.122056] pci 0000:09:00.0: AER: can't recover (no error_detected callback)

<snip>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-05-27 20:18 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-12  2:46 [PATCH][v2] iommu: arm-smmu-v3: Copy SMMU table for kdump kernel Prabhakar Kushwaha
2020-05-12  2:46 ` Prabhakar Kushwaha
2020-05-12 22:03 ` Bjorn Helgaas
2020-05-12 22:03   ` Bjorn Helgaas
2020-05-12 22:03   ` Bjorn Helgaas
2020-05-14  7:17   ` Prabhakar Kushwaha
2020-05-14  7:17     ` Prabhakar Kushwaha
2020-05-14  7:17     ` Prabhakar Kushwaha
2020-05-19 23:22     ` Bjorn Helgaas
2020-05-19 23:22       ` Bjorn Helgaas
2020-05-19 23:22       ` Bjorn Helgaas
2020-05-21  3:58       ` Prabhakar Kushwaha
2020-05-21  3:58         ` Prabhakar Kushwaha
2020-05-21  3:58         ` Prabhakar Kushwaha
2020-05-21 22:49         ` Bjorn Helgaas
2020-05-21 22:49           ` Bjorn Helgaas
2020-05-21 22:49           ` Bjorn Helgaas
2020-05-27 11:44           ` Prabhakar Kushwaha
2020-05-27 11:44             ` Prabhakar Kushwaha
2020-05-27 11:44             ` Prabhakar Kushwaha
2020-05-27 20:18             ` Bjorn Helgaas [this message]
2020-05-27 20:18               ` Bjorn Helgaas
2020-05-27 20:18               ` Bjorn Helgaas
2020-05-29 14:18               ` Prabhakar Kushwaha
2020-05-29 14:18                 ` Prabhakar Kushwaha
2020-05-29 14:18                 ` Prabhakar Kushwaha
2020-05-29 19:33                 ` Bjorn Helgaas
2020-05-29 19:33                   ` Bjorn Helgaas
2020-05-29 19:33                   ` Bjorn Helgaas
2020-06-03 17:42                   ` Prabhakar Kushwaha
2020-06-03 17:42                     ` Prabhakar Kushwaha
2020-06-03 17:42                     ` Prabhakar Kushwaha
2020-06-04  0:02                     ` Bjorn Helgaas
2020-06-04  0:02                       ` Bjorn Helgaas
2020-06-04  0:02                       ` Bjorn Helgaas
2020-06-07  8:30                       ` Prabhakar Kushwaha
2020-06-07  8:30                         ` Prabhakar Kushwaha
2020-06-07  8:30                         ` Prabhakar Kushwaha
2020-06-11 23:03                         ` Bjorn Helgaas
2020-06-11 23:03                           ` Bjorn Helgaas
2020-06-11 23:03                           ` Bjorn Helgaas
2020-05-18 15:55 ` Will Deacon
2020-05-18 15:55   ` Will Deacon
2020-05-19  2:54   ` Prabhakar Kushwaha
2020-05-19  2:54     ` Prabhakar Kushwaha
2020-05-21  9:23     ` Will Deacon
2020-05-21  9:23       ` Will Deacon
2020-05-21 11:22       ` Prabhakar Kushwaha
2020-05-21 11:22         ` Prabhakar Kushwaha
2020-06-01  7:39         ` Will Deacon
2020-06-01  7:39           ` Will Deacon
2020-06-02 14:04           ` Prabhakar Kushwaha
2020-06-02 14:04             ` Prabhakar Kushwaha
2020-06-08 11:41             ` Will Deacon
2020-06-08 11:41               ` Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200527201833.GA258397@bjorn-Precision-5520 \
    --to=helgaas@kernel.org \
    --cc=bhsharma@redhat.com \
    --cc=gkulkarni@marvell.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=myron.stowe@redhat.com \
    --cc=pkushwaha@marvell.com \
    --cc=prabhakar.pkin@gmail.com \
    --cc=robin.murphy@arm.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=vijaymohan.pandarathil@hp.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.