linux-pci.vger.kernel.org archive mirror
* [PATCH 1/1] pci/quirks: fix a dmar fault for intel 82599 card
@ 2014-09-30  6:09 Li, Zhen-Hua
  2014-09-30  6:15 ` Li, ZhenHua
  0 siblings, 1 reply; 5+ messages in thread
From: Li, Zhen-Hua @ 2014-09-30  6:09 UTC (permalink / raw)
  To: linux-kernel, Bjorn Helgaas, linux-pci
  Cc: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny,
	Don Skidmore, Greg Rose, Alex Duyck, John Ronciak, Mitch Williams,
	Linux NICS, e1000-devel, linda.knippers, Li, Zhen-Hua

On an HP system with an Intel Corporation 82599 Ethernet adapter, when the
kernel crashes and the kdump kernel boots with intel_iommu=on, the adapter may
issue unexpected DMA requests, which cause DMA Remapping (DMAR) faults like:
    dmar: DRHD: handling fault status reg 102
    dmar: DMAR:[DMA Read] Request device [41:00.0] fault addr fff81000
    DMAR:[fault reason 01] Present bit in root entry is clear

Analysis of this bug:

The present bit is set in this function:

static struct context_entry * device_to_context_entry(
		struct intel_iommu *iommu, u8 bus, u8 devfn)
{
    ......
                set_root_present(root);
    ......
}

Call chain:
ixgbe_open
    ixgbe_setup_tx_resources
        intel_alloc_coherent
            __intel_map_single
                domain_context_mapping
                    domain_context_mapping_one
                        device_to_context_entry

This means the present bit in the root entry is not set until the device
driver sets up DMA for the device (here, in ixgbe_open()).

In the kdump kernel, however, the hardware does not know that the OS is now a
second kernel whose drivers have not been loaded yet. A device can therefore
keep issuing DMA requests before it has been re-initialized, and these requests
trigger the DMA Remapping faults.

To fix this DMAR fault, we need to reset the bus that this device is on;
resetting the device itself does not work.

This was also discussed earlier:
https://lkml.org/lkml/2013/5/14/9

Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
---
 drivers/pci/quirks.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 80c2d01..5198af3 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -25,6 +25,7 @@
 #include <linux/sched.h>
 #include <linux/ktime.h>
 #include <asm/dma.h>	/* isa_dma_bridge_buggy */
+#include <linux/crash_dump.h>
 #include "pci.h"
 
 /*
@@ -3832,3 +3833,13 @@ void pci_dev_specific_enable_acs(struct pci_dev *dev)
 		}
 	}
 }
+
+#ifdef CONFIG_CRASH_DUMP
+void quirk_reset_buggy_devices(struct pci_dev *dev)
+{
+	if (unlikely(is_kdump_kernel()))
+		pci_try_reset_bus(dev->bus);
+}
+DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_INTEL, 0x10f8,
+		PCI_CLASS_NETWORK_ETHERNET, 8, quirk_reset_buggy_devices);
+#endif
-- 
2.0.0-rc0



* Re: [PATCH 1/1] pci/quirks: fix a dmar fault for intel 82599 card
  2014-09-30  6:09 [PATCH 1/1] pci/quirks: fix a dmar fault for intel 82599 card Li, Zhen-Hua
@ 2014-09-30  6:15 ` Li, ZhenHua
  2014-10-02 15:09   ` Bjorn Helgaas
  0 siblings, 1 reply; 5+ messages in thread
From: Li, ZhenHua @ 2014-09-30  6:15 UTC (permalink / raw)
  To: linux-kernel, Bjorn Helgaas, linux-pci, Joerg Roedel
  Cc: Li, Zhen-Hua, Jeff Kirsher, Jesse Brandeburg, Bruce Allan,
	Carolyn Wyborny, Don Skidmore, Greg Rose, Alex Duyck,
	John Ronciak, Mitch Williams, Linux NICS, e1000-devel,
	linda.knippers

Adding Joerg to the CC list, since this is also related to the IOMMU module.

Joerg,
There was a previous attempt to fix this DMAR fault:
	https://lkml.org/lkml/2014/8/18/118

This patch tries to fix the same problem.


Zhenhua



* Re: [PATCH 1/1] pci/quirks: fix a dmar fault for intel 82599 card
  2014-09-30  6:15 ` Li, ZhenHua
@ 2014-10-02 15:09   ` Bjorn Helgaas
  2014-10-03 14:28     ` Alexander Duyck
  0 siblings, 1 reply; 5+ messages in thread
From: Bjorn Helgaas @ 2014-10-02 15:09 UTC (permalink / raw)
  To: Li, ZhenHua
  Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	Joerg Roedel, Jeff Kirsher, Jesse Brandeburg, Bruce Allan,
	Carolyn Wyborny, Don Skidmore, Greg Rose, Alex Duyck,
	John Ronciak, Mitch Williams, Linux NICS,
	e1000-devel@lists.sourceforge.net, linda.knippers

On Tue, Sep 30, 2014 at 12:15 AM, Li, ZhenHua <zhen-hual@hp.com> wrote:
> On 09/30/2014 02:09 PM, Li, Zhen-Hua wrote:
>> [...]
>> To fix this DMAR fault, we need to reset the bus that this device is on;
>> resetting the device itself does not work.

This seems like something that could happen with *any* device, not
just the 82599 NIC.  Or is there something in the "kernel crash ->
kexec -> kdump kernel" path that stops DMA for most devices, but not
for the 82599?



* Re: [PATCH 1/1] pci/quirks: fix a dmar fault for intel 82599 card
  2014-10-02 15:09   ` Bjorn Helgaas
@ 2014-10-03 14:28     ` Alexander Duyck
  2014-10-08  1:46       ` Li, ZhenHua
  0 siblings, 1 reply; 5+ messages in thread
From: Alexander Duyck @ 2014-10-03 14:28 UTC (permalink / raw)
  To: Bjorn Helgaas, Li, ZhenHua
  Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	Joerg Roedel, Jeff Kirsher, Jesse Brandeburg, Bruce Allan,
	Carolyn Wyborny, Don Skidmore, Greg Rose, Alex Duyck,
	John Ronciak, Mitch Williams, Linux NICS,
	e1000-devel@lists.sourceforge.net, linda.knippers

On 10/02/2014 08:09 AM, Bjorn Helgaas wrote:
> This seems like something that could happen with *any* device, not
> just the 82599 NIC.  Or is there something in the "kernel crash ->
> kexec -> kdump kernel" path that stops DMA for most devices, but not
> for the 82599?
>

This is an *any* device problem.  Specifically any device that is doing
active DMA when a kdump kernel is triggered will cause this issue since
the IOMMU will not have valid mappings for the DMA events until the
device driver itself is loaded and resets the device.

Thanks,

Alex


* Re: [PATCH 1/1] pci/quirks: fix a dmar fault for intel 82599 card
  2014-10-03 14:28     ` Alexander Duyck
@ 2014-10-08  1:46       ` Li, ZhenHua
  0 siblings, 0 replies; 5+ messages in thread
From: Li, ZhenHua @ 2014-10-08  1:46 UTC (permalink / raw)
  To: Alexander Duyck, Bjorn Helgaas
  Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	Joerg Roedel, Jeff Kirsher, Jesse Brandeburg, Bruce Allan,
	Carolyn Wyborny, Don Skidmore, Greg Rose, Alex Duyck,
	John Ronciak, Mitch Williams, Linux NICS,
	e1000-devel@lists.sourceforge.net, linda.knippers

Well, then I will create a patch for all PCIe devices.
On 10/03/2014 10:28 PM, Alexander Duyck wrote:
> This is an *any* device problem.  Specifically any device that is doing
> active DMA when a kdump kernel is triggered will cause this issue since
> the IOMMU will not have valid mappings for the DMA events until the
> device driver itself is loaded and resets the device.



end of thread

Thread overview: 5+ messages
2014-09-30  6:09 [PATCH 1/1] pci/quirks: fix a dmar fault for intel 82599 card Li, Zhen-Hua
2014-09-30  6:15 ` Li, ZhenHua
2014-10-02 15:09   ` Bjorn Helgaas
2014-10-03 14:28     ` Alexander Duyck
2014-10-08  1:46       ` Li, ZhenHua
