From: Andreas Hartmann <andihartmann@freenet.de>
To: Mikulas Patocka <mpatocka@redhat.com>,
Andreas Hartmann <andihartmann@freenet.de>,
Joerg Roedel <joro@8bytes.org>,
iommu@lists.linux-foundation.org, Leo Duran <leo.duran@amd.com>
Cc: Christoph Hellwig <hch@lst.de>,
device-mapper development <dm-devel@redhat.com>,
Milan Broz <mbroz@redhat.com>, Jens Axboe <axboe@fb.com>,
linux-pci <linux-pci@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
Date: Sun, 20 Sep 2015 08:50:40 +0200
Message-ID: <55FE5740.2060701@maya.org>
In-Reply-To: <alpine.LRH.2.02.1508021347480.17729@file01.intranet.prod.int.rdu2.redhat.com>
On 08/02/2015 at 07:57 PM, Mikulas Patocka wrote:
>
>
> On Sun, 2 Aug 2015, Andreas Hartmann wrote:
>
>> On 08/01/2015 at 04:20 PM Andreas Hartmann wrote:
>>> On 07/28/2015 at 09:29 PM, Mike Snitzer wrote:
>>> [...]
>>>> Mikulas was saying to bisect what is causing ATA to fail.
>>>
>>> Some good news and some bad news. The good news first:
>>>
>>> Your patchset
>>>
>>> f3396c58fd8442850e759843457d78b6ec3a9589,
>>> cf2f1abfbd0dba701f7f16ef619e4d2485de3366,
>>> 7145c241a1bf2841952c3e297c4080b357b3e52d,
>>> 94f5e0243c48aa01441c987743dc468e2d6eaca2,
>>> dc2676210c425ee8e5cb1bec5bc84d004ddf4179,
>>> 0f5d8e6ee758f7023e4353cca75d785b2d4f6abe,
>>> b3c5fd3052492f1b8d060799d4f18be5a5438add
>>>
>>> seems to work fine w/ 3.18.19 !!
>>>
>>> Why did I test it with 3.18.x now? Because I suddenly got two ata errors
>>> (ata1 and ata2) with clean 3.19.8 (w/o the AMD-Vi IO_PAGE_FAULTs) during
>>> normal operation. This means: 3.19 must already be broken, too.
>>>
>>> Therefore, I applied your patchset to 3.18.x and it seems to work like a
>>> charm - I don't get any AMD-Vi IO_PAGE_FAULTs on boot and no ata errors
>>> (until now).
>>>
>>>
>>> Next I did: I tried to bisect between 3.18 and 3.19 with your patchset
>>> applied, because w/ this patchset applied, the problem can be seen
>>> easily and directly on boot. Unfortunately, this does work only a few
>>> git bisect rounds until I got stuck because of interferences with your
>>> extra patches applied:
>>
>> [Resolved the problems written at the last post.]
>>
>> Bisecting ended here:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34b48db66e08ca1c1bc07cf305d672ac940268dc
>>
>> block: remove artifical max_hw_sectors cap
>>
>>
>> Reverting this patch on 3.19 and 4.1 makes things work again. Didn't
>> test 4.0, but I think it's the same. No more AMD-Vi IO_PAGE_FAULTS with
>> that patch reverted.

After a long period of testing, I can now say that max_sectors_kb can be
set to 1024; higher values produce AMD-Vi IO_PAGE_FAULTs and ata faults.
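
For readers who want to apply the same limit, here is a minimal sketch of
capping max_sectors_kb through sysfs. It assumes the usual
/sys/block/<dev>/queue/ layout; the helper names and the device name used
in the usage note are hypothetical, and the sysfs_root parameter exists
only so the functions can be tried against a scratch directory.

```python
from pathlib import Path


def queue_attr(dev: str, name: str, sysfs_root: str = "/sys") -> Path:
    """Path of a block-queue attribute, e.g. /sys/block/sda/queue/max_sectors_kb."""
    return Path(sysfs_root) / "block" / dev / "queue" / name


def read_kb(dev: str, name: str, sysfs_root: str = "/sys") -> int:
    """Read an integer KiB value such as max_sectors_kb or max_hw_sectors_kb."""
    return int(queue_attr(dev, name, sysfs_root).read_text().strip())


def cap_max_sectors_kb(dev: str, limit_kb: int = 1024, sysfs_root: str = "/sys") -> int:
    """Lower max_sectors_kb to limit_kb if it is currently higher.

    Returns the value in effect afterwards. The limit is never raised.
    """
    current = read_kb(dev, "max_sectors_kb", sysfs_root)
    if current <= limit_kb:
        return current
    queue_attr(dev, "max_sectors_kb", sysfs_root).write_text(f"{limit_kb}\n")
    # Read back instead of trusting the write, in case the value was adjusted.
    return read_kb(dev, "max_sectors_kb", sysfs_root)
```

Run as root, cap_max_sectors_kb("sda") would lower the limit for a disk
named sda (a placeholder). The setting is not persistent and has to be
reapplied after each boot.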

The patch "sd: Fix maximum I/O size for BLOCK_PC requests" [1], which is
part of 4.1.7, also produces ata errors and AMD-Vi IO_PAGE_FAULTs already
during boot, regardless of whether "block: remove artifical
max_hw_sectors cap" [2] has been applied or not.

Next, I tested the "dm crypt: constrain crypt device's max_segment_size
to PAGE_SIZE" patch [3], applied to an otherwise unchanged 4.1.7 kernel
without setting max_sectors_kb to 1024.

The interesting effect was that booting was fine, but I saw lots of ata
errors afterwards as soon as there was load (e.g. during a kernel
compile) on the md RAID 1, which is built on *rotational* disks:

[ 367.264873] ata2.00: exception Emask 0x0 SAct 0x7fbfffff SErr 0x0 action 0x6 frozen
[ 367.264883] ata2.00: failed command: WRITE FPDMA QUEUED
[ 367.264893] ata2.00: cmd 61/40:00:b0:7b:d4/05:00:06:00:00/40 tag 0 ncq 688128 out
[ 367.264893] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 367.264899] ata2.00: status: { DRDY }
...
[ 367.265332] ata2.00: failed command: WRITE FPDMA QUEUED
[ 367.265339] ata2.00: cmd 61/40:f0:30:71:d4/05:00:06:00:00/40 tag 30 ncq 688128 out
[ 367.265339] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 367.265343] ata2.00: status: { DRDY }
[ 367.265350] ata2: hard resetting link
[ 367.775330] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 367.776970] ata2.00: configured for UDMA/133
[ 367.776997] ata2.00: device reported invalid CHS sector 0
...
[ 367.777761] ata2: EH complete
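
To make the request sizes in this log easier to read, here is a small
sketch that decodes a "cmd ..." line. It assumes the register order
libata prints in its error report (command/feature:nsect:lbal:lbam:lbah/
hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device), that NCQ
commands carry the sector count in the feature registers and the tag in
nsect, and 512-byte logical sectors. The function name and the returned
keys are made up for illustration.

```python
import re

# Matches e.g. "cmd 61/40:00:b0:7b:d4/05:00:06:00:00/40 tag 0 ncq 688128 out".
# \s+ is used so that lines wrapped by a mail client still match.
CMD_RE = re.compile(
    r"cmd\s+(?P<regs>[0-9a-f]{2}/(?:[0-9a-f]{2}:){4}[0-9a-f]{2}"
    r"/(?:[0-9a-f]{2}:){4}[0-9a-f]{2}/[0-9a-f]{2})"
    r"\s+tag\s+(?P<tag>\d+)\s+ncq\s+(?P<nbytes>\d+)\s+(?P<dir>in|out)"
)


def decode_ncq_cmd(line: str, sector_size: int = 512) -> dict:
    """Decode sector count, LBA and tag from a libata NCQ 'cmd' log line."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("not an NCQ cmd line")
    command, low, high, device = m["regs"].split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":")
    )
    # Assumption: for NCQ the 16-bit sector count lives in hob_feature:feature.
    sectors = (hob_feature << 8) | feature
    # 48-bit LBA assembled from the six LBA registers.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "command": int(command, 16),
        "sectors": sectors,
        "size_kib": sectors * sector_size // 1024,
        "lba": lba,
        "tag": int(m["tag"]),
        "tag_from_nsect": nsect >> 3,  # assumption: NCQ tag sits in nsect bits 7:3
        "nbytes": int(m["nbytes"]),
        "direction": m["dir"],
    }
```

Under these assumptions, both failed commands above decode to 1344
sectors each, i.e. 688128 bytes or 672 KiB per request, which agrees with
the byte count printed after "ncq".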

In other words: using an unpatched kernel >= 3.19 carries a high risk of
breaking filesystems under some as yet unknown conditions [4].

>>
>>
>> Please check why this patch triggers AMD-Vi IO_PAGE_FAULTS.
>
> I would submit this bug to maintainers of AMD-Vi. They understand the
> hardware, so they should tell why do large I/O requests result in
> IO_PAGE_FAULTs.
>
> It is probably bug either in AMD-Vi driver or in hardware.

So far, I have not heard anything from the AMD-Vi maintainers.

Regards,
Andreas Hartmann

[1] http://thread.gmane.org/gmane.linux.kernel.commits.head/538464
[2] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34b48db66e08ca1c1bc07cf305d672ac940268dc
[3] http://news.gmane.org/find-root.php?group=gmane.linux.kernel&article=2036495
[4] http://thread.gmane.org/gmane.linux.kernel.pci/43851/focus=44011