From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lutz Vieweg
Subject: Re: [E1000-devel] AMD-Vi: Event logged IO_PAGE_FAULT - ixgbe Detected Tx Unit Hang - Reset adapter - master disable timed out
Date: Mon, 13 Jun 2016 19:40:11 +0200
Message-ID: <575EEFFB.20004@5t9.de>
References: <5759A009.8040200@5t9.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Sender: iommu-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: iommu-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Cc: e1000-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
List-Id: iommu@lists.linux-foundation.org

On 06/13/2016 04:46 AM, Wan ZongShun wrote:
>> With "iommu=pt":
>>>
>>> [ 4.832580] iommu: Adding device 0000:04:00.0 to group 13
>>> [ 4.832838] iommu: Using direct mapping for device 0000:04:00.0
>>
>
> That is right, you will pass through AMD IOMMU when you set iommu=pt.
>
>> ...
>>>
>>> [ 4.837074] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
>>> [ 4.837305] AMD-Vi: Found IOMMU at 0000:40:00.2 cap 0x40
>>> [ 4.837535] AMD-Vi: Interrupt remapping enabled
>>> [ 4.838062] AMD-Vi: Lazy IO/TLB flushing enabled
>>> [ 4.838291] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
>>> [ 4.838533] software IO TLB [mem 0xd3e80000-0xd7e80000] (64MB) mapped
>>> at [ffff8800d3e80000-ffff8800d7e7ffff]
>>
>>
>> I hope that doesn't mean all my network data is now passing through
>> an additional copy-by-CPU... that would be kind of the opposite of what
>> "iommu=pt" seemed to promise :-)
>
> It depends.
>
> Firstly, I need to know if your ethernet card works well now or not
> after you set iommu=pt.

Too early to tell - the NIC has worked for the last 4 days without failing,
but that is only about as long as it took after the upgrade to linux-4.6.1
before the bug was first encountered. I'd say celebrating "works with
iommu=pt" has to wait at least two weeks or so before it is reasonably
probable that it works for this reason.

> If your ethernet card with 64bit(not 32bit) DMA addressable cap, that
> is ok, you will not be impacted by bounce buffer.
> But iommu=pt is a terrible option, that make all devices bypass the iommu.

Why is that terrible?
The documentation I found on what iommu=pt actually means was pretty
scarce, but I noticed that many places recommend using this option
for 10G NICs.

> If you want to get further help, Please try:
>
> (1)Please add 'amd_iommu_dump' option in your kernel boot option, and
> send your full kernel logs, lspci info, don't add iommu=pt.
> (2) Add amd_iommu=fullflush option to kernel boot option, just try it.

Will try that when the NIC becomes unavailable again.

>> One more thing I find curious, but this didn't change with "iommu=pt":
>>>
>>> [ 0.000000] AGP: Checking aperture...
>>> [ 0.000000] AGP: No AGP bridge found
>>> [ 0.000000] AGP: Node 0: aperture [bus addr 0x00000000-0x01ffffff]
>>> (32MB)
>>> [ 0.000000] AGP: Your BIOS doesn't leave an aperture memory hole
>>> [ 0.000000] AGP: Please enable the IOMMU option in the BIOS setup
>>> [ 0.000000] AGP: This costs you 64MB of RAM
>>> [ 0.000000] AGP: Mapping aperture over RAM [mem 0xcc000000-0xcfffffff]
>>> (65536KB)
>>
>> I checked and the IOMMU-option is definitely enabled in the BIOS setup.
>> So I assume right that these message are irrelevant (since AGP as a whole
>> is irrelevant on this server)?
>
> Please cat /proc/iomem, send the information.

Here it is:

> 00000000-00000fff : reserved
> 00001000-00097bff : System RAM
> 00097c00-0009ffff : reserved
> 000a0000-000bffff : PCI Bus 0000:00
> 000c0000-000c7fff : Video ROM
> 000ce800-000d43ff : Adapter ROM
> 000d4800-000d57ff : Adapter ROM
> 000e6000-000fffff : reserved
>   000f0000-000fffff : System ROM
> 00100000-d7e7ffff : System RAM
>   01000000-01688c05 : Kernel code
>   01688c06-01d4f53f : Kernel data
>   01eea000-02174fff : Kernel bss
> d7e80000-d7e8dfff : RAM buffer
> d7e8e000-d7e8ffff : reserved
> d7e90000-d7eb3fff : ACPI Tables
> d7eb4000-d7edffff : ACPI Non-volatile Storage
> d7ee0000-d7ffffff : reserved
> d9000000-daffffff : PCI Bus 0000:40
>   d9000000-d90003ff : IOAPIC 2
>   d9010000-d9013fff : amd_iommu
> db000000-dcffffff : PCI Bus 0000:00
>   db000000-dbffffff : PCI Bus 0000:01
>     db000000-dbffffff : 0000:01:04.0
>       db000000-dbffffff : mgadrmfb_vram
>   dcd00000-dcffffff : PCI Bus 0000:04
>     dcdfc000-dcdfffff : 0000:04:00.0
>       dcdfc000-dcdfffff : ixgbe
>     dce00000-dcffffff : 0000:04:00.0
>       dce00000-dcffffff : ixgbe
> dd000000-dfffffff : PCI Bus 0000:00
>   def00000-df7fffff : PCI Bus 0000:01
>     deffc000-deffffff : 0000:01:04.0
>       deffc000-deffffff : mgadrmfb_mmio
>     df000000-df7fffff : 0000:01:04.0
>   dfaf6000-dfaf6fff : 0000:00:12.1
>     dfaf6000-dfaf6fff : ohci_hcd
>   dfaf7000-dfaf7fff : 0000:00:12.0
>     dfaf7000-dfaf7fff : ohci_hcd
>   dfaf8400-dfaf87ff : 0000:00:11.0
>     dfaf8400-dfaf87ff : ahci
>   dfaf8800-dfaf88ff : 0000:00:12.2
>     dfaf8800-dfaf88ff : ehci_hcd
>   dfaf8c00-dfaf8cff : 0000:00:13.2
>     dfaf8c00-dfaf8cff : ehci_hcd
>   dfaf9000-dfaf9fff : 0000:00:13.1
>     dfaf9000-dfaf9fff : ohci_hcd
>   dfafa000-dfafafff : 0000:00:13.0
>     dfafa000-dfafafff : ohci_hcd
>   dfafb000-dfafbfff : 0000:00:14.5
>     dfafb000-dfafbfff : ohci_hcd
>   dfb00000-dfbfffff : PCI Bus 0000:02
>     dfb1c000-dfb1ffff : 0000:02:00.1
>       dfb1c000-dfb1ffff : igb
>     dfb20000-dfb3ffff : 0000:02:00.1
>     dfb40000-dfb5ffff : 0000:02:00.1
>       dfb40000-dfb5ffff : igb
>     dfb60000-dfb7ffff : 0000:02:00.1
>       dfb60000-dfb7ffff : igb
>     dfb9c000-dfb9ffff : 0000:02:00.0
>       dfb9c000-dfb9ffff : igb
>     dfba0000-dfbbffff : 0000:02:00.0
>     dfbc0000-dfbdffff : 0000:02:00.0
>       dfbc0000-dfbdffff : igb
>     dfbe0000-dfbfffff : 0000:02:00.0
>       dfbe0000-dfbfffff : igb
>   dfc00000-dfcfffff : PCI Bus 0000:03
>     dfc3c000-dfc3ffff : 0000:03:00.0
>       dfc3c000-dfc3ffff : mpt2sas
>     dfc40000-dfc7ffff : 0000:03:00.0
>       dfc40000-dfc7ffff : mpt2sas
>     dfc80000-dfcfffff : 0000:03:00.0
>   dfd00000-dfdfffff : PCI Bus 0000:04
>     dfd80000-dfdfffff : 0000:04:00.0
>   dfe00000-dfffffff : PCI Bus 0000:05
>     dfeb0000-dfebffff : 0000:05:00.0
>       dfeb0000-dfebffff : mpt2sas
>     dfec0000-dfefffff : 0000:05:00.0
>       dfec0000-dfefffff : mpt2sas
>     dff00000-dfffffff : 0000:05:00.0
> e0000000-efffffff : PCI MMCONFIG 0000 [bus 00-ff]
>   e0000000-efffffff : reserved
>     e0000000-efffffff : pnp 00:0a
> f6000000-f6003fff : amd_iommu
> fec00000-fec003ff : IOAPIC 0
> fec10000-fec1001f : pnp 00:04
> fec20000-fec203ff : IOAPIC 1
> fed00000-fed003ff : HPET 2
>   fed00000-fed003ff : PNP0103:00
> fed40000-fed44fff : PCI Bus 0000:00
> fee00000-fee00fff : Local APIC
>   fee00000-fee00fff : pnp 00:03
> ffb80000-ffbfffff : pnp 00:04
> ffe00000-ffffffff : reserved
>   ffe50000-ffe5e05f : pnp 00:04
> 100000000-2026ffffff : System RAM
> 2027000000-2027ffffff : RAM buffer

Regards,

Lutz Vieweg
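
A dump like the /proc/iomem listing above is easier to compare between boots when the ranges claimed by one driver are pulled out and sized. The following is only a minimal sketch and not part of the original report: the helper names parse_iomem and regions_for are hypothetical, and the sample text is just the ixgbe excerpt copied from the listing. It assumes every entry has the form "start-end : name" with hexadecimal, inclusive bounds, and it tolerates a leading "> " quote prefix.

```python
import re

# One /proc/iomem entry, e.g. "dcdfc000-dcdfffff : ixgbe".
# Leading quote markers ("> ") and indentation are skipped by using search().
IOMEM_ENTRY = re.compile(r"([0-9a-f]+)-([0-9a-f]+) : (.+)")


def parse_iomem(text):
    """Return (start, end, name) tuples; end is inclusive, as in /proc/iomem."""
    entries = []
    for line in text.splitlines():
        match = IOMEM_ENTRY.search(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            entries.append((start, end, match.group(3).strip()))
    return entries


def regions_for(entries, name):
    """Return the (start, end) ranges whose owner name matches exactly."""
    return [(start, end) for start, end, owner in entries if owner == name]


# Excerpt copied from the listing above (PCI Bus 0000:04, the ixgbe NIC).
sample = """\
> dcd00000-dcffffff : PCI Bus 0000:04
> dcdfc000-dcdfffff : 0000:04:00.0
> dcdfc000-dcdfffff : ixgbe
> dce00000-dcffffff : 0000:04:00.0
> dce00000-dcffffff : ixgbe
"""

ixgbe = regions_for(parse_iomem(sample), "ixgbe")
# Bounds are inclusive, so the size is end - start + 1.
sizes = [end - start + 1 for start, end in ixgbe]
print(sizes)
```

Feeding it the full text of /proc/iomem instead of the excerpt would list every range a given driver has claimed.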