From: Andreas Hartmann <andihartmann@freenet.de>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: Joerg Roedel <joro@8bytes.org>,
Alex Williamson <alex.williamson@redhat.com>,
linux-pci <linux-pci@vger.kernel.org>,
iommu <iommu@lists.linux-foundation.org>,
Tejun Heo <tj@kernel.org>,
"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>
Subject: Re: Since Linux 4.1: A lot of AMD-Vi IO_PAGE_FAULTs
Date: Sun, 26 Jul 2015 18:28:39 +0200
Message-ID: <55B50AB7.7050303@maya.org>
In-Reply-To: <55B3D3F3.5020601@maya.org>
[-- Attachment #1: Type: text/plain, Size: 5359 bytes --]
On 07/25/2015 at 08:22 PM, Andreas Hartmann wrote:
> On 07/24/2015 at 06:15 PM, Bjorn Helgaas wrote:
>> [+cc Tejun, linux-ide]
>>
>> On Thu, Jul 23, 2015 at 11:22 PM, Andreas Hartmann
>> <andihartmann@freenet.de> wrote:
>>> On Tue, Jul 21, 2015 at 06:35PM +0200, Joerg Roedel wrote:
>>>> On Tue, Jul 21, 2015 at 06:20:23PM +0200, Andreas Hartmann wrote:
>>>>> [ 48.193901] <6>[fglrx] Firegl kernel thread PID: 1840
>>>>> [ 48.193985] <6>[fglrx] Firegl kernel thread PID: 1841
>>>>> [ 48.194063] <6>[fglrx] Firegl kernel thread PID: 1842
>>>>> [ 48.194172] <6>[fglrx] IRQ 28 Enabled
>>>>> [ 48.261580] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000
>>>>> [ 48.261586] <6>[fglrx] Reserved FB block: Unshared offset:f7b4000, size:4000
>>>>> [ 48.261587] <6>[fglrx] Reserved FB block: Unshared offset:f7b8000, size:548000
>>>>> [ 48.261588] <6>[fglrx] Reserved FB block: Unshared offset:3fff3000, size:d000
>>>>
>>>> From a first glance it doesn't look like an IOMMU driver issue, because
>>>> the addresses where the faults happen are not from the AMD IOMMU driver.
>>>>
>>>> And you have proprietary closed-source drivers loaded, can you reproduce
>>>> the issue without fglrx?
>>>
>>> Yes. I attached this one.
>>>
>>> Meanwhile I tested with 4.0.9, too. I wasn't able to reproduce the
>>> problem with this kernel even after lots of reboots (the problem w/ 4.1
>>> usually comes up during boot process (but not only - it can be seen
>>> after boot process, too)).
>>>
>>> The problem always is, that there are errors w/ one of the sata discs
>>> and at the same time, IO_PAGE_FAULT errors are rising as described before:
>>>
>>> [ 152.533708] ata3.00: failed command: READ FPDMA QUEUED
>>> [ 152.538102] ata3.00: failed command: READ FPDMA QUEUED
>>> [ 152.539862] ata3.00: failed command: READ FPDMA QUEUED
>>> [ 152.541778] ata3.00: failed command: WRITE FPDMA QUEUED
>>> [ 152.543861] ata3.00: failed command: WRITE FPDMA QUEUED
>>>
>>> [ 5818.068050] ata2.00: failed command: WRITE FPDMA QUEUED
>>> [ 5818.068059] ata2.00: failed command: WRITE FPDMA QUEUED
>>>
>>> I compared dmesg from 4.1 w/ 4.0 and I realized the following *missing*
>>> entries in 4.1:
>>>
>>> [ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
>>> [ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
>>> [ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
>>> [ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
>>> [ 0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
>>> [ 0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
>>> [ 0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
>>> [ 0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
>>>
>>>
>>> What does this mean? Is there missing some part of the acpi initialization?
>>>
>>>
>>> Thanks for any hint as Linux 4.1 is completely unusable here with these
>>> errors.
>>
>> This looks more like an AHCI problem than an IOMMU or PCI problem.
>> Seems like the device has the wrong idea about where its DMA buffers
>> are. Maybe something scribbled on its command list?
>
> During further tests I detected, that the problem already occurs in
> Linux 4.0. I couldn't see it in 3.19.8 until now.
>
>
> I tried hard to bisect it. I got stuck 2 times of 3 here (the third
> round, I got stuck later on - unfortunately, sometimes it is working :-( ):
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=be5e6616dd74e17fdd8e16ca015cfef94d49b467

I did a few more bisect runs, narrowing the window each time. The runs
ended at the following two possibly critical changes:

Merge tag 'nfs-for-3.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=24a52e412ef22989b63c35428652598dc995812c

Merge tag 'pm+acpi-3.20-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=cd50b70ccd5c87794ec28bfb87b7fba9961eb0ae

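Because the fault does not show up on every boot, a single clean boot is weak evidence that a bisect step is "good". The following is only a rough sketch of how many clean boots one might want before marking a commit good. The per-boot fault probability is an assumed input, not something measured in this thread, and the function name is made up for illustration.

```python
import math


def clean_boots_needed(p_fault_per_boot, max_false_good):
    """Clean boots to observe before calling a kernel 'good'.

    Assumes a bad kernel shows the fault on any single boot with
    probability p_fault_per_boot, independently per boot (an assumption).
    Then n clean boots in a row still happen with probability (1 - p) ** n,
    so we solve (1 - p) ** n <= max_false_good for the smallest integer n.
    """
    if not 0.0 < p_fault_per_boot < 1.0:
        raise ValueError("p_fault_per_boot must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_fault_per_boot))
```

For example, `clean_boots_needed(0.3, 0.05)` gives the number of fault-free boots needed so that a bad kernel with a 30% per-boot fault rate is mistaken for good at most 5% of the time.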
BTW: I'm a heavy user of XFS and dm-crypt. I attached the config I used
for testing.
>
> Does this help?
>
>
>> From your attachments:
>>
>> # lspci -vvs 00:11.0
>> 00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI]
>> SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40) (prog-if 01 [AHCI
>> 1.0])
>>
>> pci 0000:00:11.0: [1002:4391] type 00 class 0x010601
>> ahci 0000:00:11.0: version 3.0
>> ahci 0000:00:11.0: AHCI 0001.0200 32 slots 6 ports 6 Gbps 0x3f impl SATA mode
>> ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008
>> address=0x40eba32100618000 flags=0x0010]
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008
>> address=0x40eba32100618040 flags=0x0010]
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008
>> address=0x0000000000000000 flags=0x0000]
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008
>> address=0x00000000000000c0 flags=0x0000]
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008
>> address=0x0000000000000040 flags=0x0000]
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008
>> address=0x00000000000001c0 flags=0x0000]
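
To compare boots across kernels it can help to pull the fault events out of a saved log. The following is a minimal sketch, assuming the events look like the "AMD-Vi: Event logged [IO_PAGE_FAULT ...]" lines quoted above; the function names are illustrative and not part of any kernel tooling.

```python
import re
from collections import Counter

# Field layout taken from the quoted "AMD-Vi: Event logged" lines.
FAULT_RE = re.compile(
    r"IO_PAGE_FAULT\s+device=(?P<device>[0-9a-f:.]+)\s+"
    r"domain=(?P<domain>0x[0-9a-f]+)\s+"
    r"address=(?P<address>0x[0-9a-f]+)\s+"
    r"flags=(?P<flags>0x[0-9a-f]+)",
    re.IGNORECASE,
)


def parse_faults(log_text):
    """Return one dict per IO_PAGE_FAULT event found in log_text."""
    # Mail wrapping can split one event over two lines, so drop leading
    # quote markers and join all lines before matching.
    joined = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    faults = []
    for match in FAULT_RE.finditer(joined):
        faults.append(
            {
                "device": match.group("device"),
                "domain": int(match.group("domain"), 16),
                "address": int(match.group("address"), 16),
                "flags": int(match.group("flags"), 16),
            }
        )
    return faults


def summarize(faults):
    """Count faults per device and list the distinct faulting addresses."""
    per_device = Counter(fault["device"] for fault in faults)
    addresses = sorted({fault["address"] for fault in faults})
    return per_device, addresses
```

Running `parse_faults` on the dmesg of each test boot and comparing the `summarize` results gives a quick way to see whether a given kernel produced faults at all and for which device.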
Regards,
Andreas
[-- Attachment #2: config-4.0.gz --]
[-- Type: application/x-gzip, Size: 25544 bytes --]
Thread overview: 15+ messages
2015-07-21 15:04 Since Linux 4.1: A lot of AMD-Vi IO_PAGE_FAULTs Andreas Hartmann
2015-07-21 15:34 ` Alex Williamson
2015-07-21 15:56 ` Joerg Roedel
2015-07-21 16:20 ` Andreas Hartmann
2015-07-21 16:30 ` Joerg Roedel
2015-07-21 16:35 ` Joerg Roedel
2015-07-21 17:47 ` Andreas Hartmann
2015-07-24 4:22 ` Andreas Hartmann
2015-07-24 16:15 ` Bjorn Helgaas
2015-07-24 17:59 ` Andreas Hartmann
2015-07-29 15:22 ` Joerg Roedel
2015-07-29 16:31 ` Andreas Hartmann
2015-07-25 18:22 ` Andreas Hartmann
2015-07-26 16:28 ` Andreas Hartmann [this message]
2015-07-21 18:21 ` Borislav Petkov