linux-pci.vger.kernel.org archive mirror
From: Andreas Hartmann <andihartmann@freenet.de>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: Joerg Roedel <joro@8bytes.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	linux-pci <linux-pci@vger.kernel.org>,
	iommu <iommu@lists.linux-foundation.org>,
	Tejun Heo <tj@kernel.org>,
	"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>
Subject: Re: Since Linux 4.1: A lot of AMD-Vi IO_PAGE_FAULTs
Date: Sun, 26 Jul 2015 18:28:39 +0200
Message-ID: <55B50AB7.7050303@maya.org>
In-Reply-To: <55B3D3F3.5020601@maya.org>


On 07/25/2015 at 08:22 PM, Andreas Hartmann wrote:
> On 07/24/2015 at 06:15 PM, Bjorn Helgaas wrote:
>> [+cc Tejun, linux-ide]
>>
>> On Thu, Jul 23, 2015 at 11:22 PM, Andreas Hartmann
>> <andihartmann@freenet.de> wrote:
>>> On Tue, Jul 21, 2015 at 06:35PM +0200, Joerg Roedel wrote:
>>>> On Tue, Jul 21, 2015 at 06:20:23PM +0200, Andreas Hartmann wrote:
>>>>> [   48.193901] <6>[fglrx] Firegl kernel thread PID: 1840
>>>>> [   48.193985] <6>[fglrx] Firegl kernel thread PID: 1841
>>>>> [   48.194063] <6>[fglrx] Firegl kernel thread PID: 1842
>>>>> [   48.194172] <6>[fglrx] IRQ 28 Enabled
>>>>> [   48.261580] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000
>>>>> [   48.261586] <6>[fglrx] Reserved FB block: Unshared offset:f7b4000, size:4000
>>>>> [   48.261587] <6>[fglrx] Reserved FB block: Unshared offset:f7b8000, size:548000
>>>>> [   48.261588] <6>[fglrx] Reserved FB block: Unshared offset:3fff3000, size:d000
>>>>
>>>> From a first glance it doesn't look like an IOMMU driver issue, because
>>>> the addresses where the faults happen are not from the AMD IOMMU driver.
>>>>
>>>> And you have proprietary closed-source drivers loaded, can you reproduce
>>>> the issue without fglrx?
>>>
>>> Yes. I attached this one.
>>>
>>> Meanwhile I tested with 4.0.9, too. I wasn't able to reproduce the
>>> problem with this kernel even after lots of reboots. With 4.1 the
>>> problem usually comes up during the boot process, but it can also
>>> appear after boot.
>>>
>>> The pattern is always the same: errors on one of the SATA disks
>>> coincide with IO_PAGE_FAULT errors, as described before:
>>>
>>> [  152.533708] ata3.00: failed command: READ FPDMA QUEUED
>>> [  152.538102] ata3.00: failed command: READ FPDMA QUEUED
>>> [  152.539862] ata3.00: failed command: READ FPDMA QUEUED
>>> [  152.541778] ata3.00: failed command: WRITE FPDMA QUEUED
>>> [  152.543861] ata3.00: failed command: WRITE FPDMA QUEUED
>>>
>>> [ 5818.068050] ata2.00: failed command: WRITE FPDMA QUEUED
>>> [ 5818.068059] ata2.00: failed command: WRITE FPDMA QUEUED
>>>
>>> I compared dmesg from 4.1 w/ 4.0 and I realized the following *missing*
>>> entries in 4.1:
>>>
>>> [    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
>>> [    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
>>> [    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
>>> [    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
>>> [    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
>>> [    0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
>>> [    0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
>>> [    0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
>>>
>>>
>>> What does this mean? Is some part of the ACPI initialization missing?
>>>
>>>
>>> Thanks for any hint as Linux 4.1 is completely unusable here with these
>>> errors.
>>
>> This looks more like an AHCI problem than an IOMMU or PCI problem.
>> Seems like the device has the wrong idea about where its DMA buffers
>> are.  Maybe something scribbled on its command list?
> 
> During further tests I found that the problem already occurs in
> Linux 4.0. So far I have not seen it in 3.19.8.
> 
> 
> I tried hard to bisect it. In 2 of 3 runs I got stuck at the commit
> below (in the third run I got stuck later on; unfortunately, the
> problem does not show up on every boot :-( ):
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=be5e6616dd74e17fdd8e16ca015cfef94d49b467

I ran a few more bisects, narrowing the window each time. The runs
ended at the following two possibly critical changes:


Merge tag 'nfs-for-3.20-2' of
git://git.linux-nfs.org/projects/trondmy/linux-nfs

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=24a52e412ef22989b63c35428652598dc995812c



Merge tag 'pm+acpi-3.20-rc1-3' of
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=cd50b70ccd5c87794ec28bfb87b7fba9961eb0ae
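
Because the fault does not show up on every boot, a single clean boot is
weak evidence that a commit is good. A minimal sketch of a classifier that
could back a `git bisect run` helper follows. It assumes the dmesg output
of each test boot has been saved as text; the function names and the
threshold of 5 clean boots are hypothetical, not something used in the
runs above. The exit codes follow the `git bisect run` convention
(0 = good, 1 = bad, 125 = cannot judge this commit).

```python
import re

# Log markers taken from the reports in this thread.
FAULT_MARKERS = (
    re.compile(r"IO_PAGE_FAULT"),
    re.compile(r"failed command: (READ|WRITE) FPDMA QUEUED"),
)

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def boot_is_faulty(dmesg_text):
    """Return True if one boot log contains any of the fault markers."""
    return any(p.search(dmesg_text) for p in FAULT_MARKERS)


def classify(boot_logs, min_clean_boots=5):
    """Classify a commit from the dmesg logs of several test boots.

    One faulty boot marks the commit bad. A commit counts as good only
    after min_clean_boots clean boots (an assumed threshold); with fewer
    clean boots the result is SKIP, i.e. not enough evidence yet.
    """
    if any(boot_is_faulty(log) for log in boot_logs):
        return BAD
    if len(boot_logs) < min_clean_boots:
        return SKIP
    return GOOD
```

A wrapper script would call sys.exit(classify(logs)) after collecting the
logs for the commit under test.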



BTW: I make heavy use of XFS and dm-crypt. I attached the config I used
for testing.


> 
> Does this help?
> 
> 
>> From your attachments:
>>
>> # lspci -vvs 00:11.0
>> 00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40) (prog-if 01 [AHCI 1.0])
>>
>> pci 0000:00:11.0: [1002:4391] type 00 class 0x010601
>> ahci 0000:00:11.0: version 3.0
>> ahci 0000:00:11.0: AHCI 0001.0200 32 slots 6 ports 6 Gbps 0x3f impl SATA mode
>> ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008 address=0x40eba32100618000 flags=0x0010]
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008 address=0x40eba32100618040 flags=0x0010]
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008 address=0x0000000000000000 flags=0x0000]
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008 address=0x00000000000000c0 flags=0x0000]
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008 address=0x0000000000000040 flags=0x0000]
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008 address=0x00000000000001c0 flags=0x0000]
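
To compare fault addresses across boots it may help to pull the fields out
of these records mechanically. A minimal sketch, assuming the log format
shown above; the function names are hypothetical. It strips mail quote
prefixes and tolerates records that are wrapped across two lines.

```python
import re
from collections import Counter

# One AMD-Vi fault record; \s+ lets the fields span a wrapped line.
FAULT_RE = re.compile(
    r"IO_PAGE_FAULT\s+device=(?P<device>[0-9a-f:.]+)\s+"
    r"domain=(?P<domain>0x[0-9a-f]+)\s+"
    r"address=(?P<address>0x[0-9a-f]+)\s+"
    r"flags=(?P<flags>0x[0-9a-f]+)",
    re.IGNORECASE,
)


def parse_faults(log_text):
    """Return one dict per IO_PAGE_FAULT record found in log_text."""
    # Drop leading quote markers ("> ", ">> ") and indentation per line.
    cleaned = re.sub(r"^[> ]+", "", log_text, flags=re.MULTILINE)
    faults = []
    for m in FAULT_RE.finditer(cleaned):
        faults.append({
            "device": m.group("device"),
            "domain": int(m.group("domain"), 16),
            "address": int(m.group("address"), 16),
            "flags": int(m.group("flags"), 16),
        })
    return faults


def faults_per_device(faults):
    """Count fault records per PCI device."""
    return Counter(f["device"] for f in faults)
```

Sorting the parsed records by address makes it easy to separate the faults
near address 0 from those at high addresses.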


Regards,
Andreas


[-- Attachment #2: config-4.0.gz --]
[-- Type: application/x-gzip, Size: 25544 bytes --]
