From: Andreas Hartmann <andihartmann-KuiJ5kEpwI6ELgA04lAiVw@public.gmane.org>
To: Mikulas Patocka
	<mpatocka-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Andreas Hartmann
	<andihartmann-KuiJ5kEpwI6ELgA04lAiVw@public.gmane.org>,
	Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	Leo Duran <leo.duran-5C7GfCeVMHo@public.gmane.org>
Cc: linux-pci <linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Jens Axboe <axboe-b10kYP2dOMg@public.gmane.org>,
	device-mapper development
	<dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Linus Torvalds
	<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>,
	Milan Broz <mbroz-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
Date: Sun, 20 Sep 2015 08:50:40 +0200
Message-ID: <55FE5740.2060701@maya.org>
In-Reply-To: <alpine.LRH.2.02.1508021347480.17729-Hpncn10jQN4oNljnaZt3ZvA+iT7yCHsGwRM8/txMwJMAicBL8TP8PQ@public.gmane.org>

On 08/02/2015 at 07:57 PM, Mikulas Patocka wrote:
> 
> 
> On Sun, 2 Aug 2015, Andreas Hartmann wrote:
> 
>> On 08/01/2015 at 04:20 PM Andreas Hartmann wrote:
>>> On 07/28/2015 at 09:29 PM, Mike Snitzer wrote:
>>> [...]
>>>> Mikulas was saying to bisect what is causing ATA to fail.
>>>
>>> Some good news and some bad news. The good news first:
>>>
>>> Your patchset
>>>
>>> f3396c58fd8442850e759843457d78b6ec3a9589,
>>> cf2f1abfbd0dba701f7f16ef619e4d2485de3366,
>>> 7145c241a1bf2841952c3e297c4080b357b3e52d,
>>> 94f5e0243c48aa01441c987743dc468e2d6eaca2,
>>> dc2676210c425ee8e5cb1bec5bc84d004ddf4179,
>>> 0f5d8e6ee758f7023e4353cca75d785b2d4f6abe,
>>> b3c5fd3052492f1b8d060799d4f18be5a5438add
>>>
>>> seems to work fine w/ 3.18.19 !!
>>>
>>> Why did I test it with 3.18.x now? Because I suddenly got two ata errors
>>> (ata1 and ata2) with clean 3.19.8 (w/o the AMD-Vi IO_PAGE_FAULTs) during
>>> normal operation. This means: 3.19 must already be broken, too.
>>>
>>> Therefore, I applied your patchset to 3.18.x and it seems to work like a
>>> charm - I don't get any AMD-Vi IO_PAGE_FAULTs on boot and no ata errors
>>> (until now).
>>>
>>>
>>> Next, I tried to bisect between 3.18 and 3.19 with your patchset
>>> applied, because with this patchset applied, the problem can be seen
>>> easily and directly on boot. Unfortunately, this worked for only a few
>>> git bisect rounds before I got stuck because of interference with your
>>> extra patches:
>>
>> [Resolved the problems described in the last post.]
>>
>> Bisecting ended here:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34b48db66e08ca1c1bc07cf305d672ac940268dc
>>
>> block: remove artifical max_hw_sectors cap
>>
>>
>> Reverting this patch on 3.19 and 4.1 makes things work again. I didn't
>> test 4.0, but I think it's the same. No more AMD-Vi IO_PAGE_FAULTS with
>> that patch reverted.

After a long period of testing, I can now say that max_sectors_kb can be
set as high as 1024; higher values produce AMD-Vi IO_PAGE_FAULTS and ata
errors.
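
For reference, a minimal sketch of how such a cap could be applied per
disk. It assumes the usual block-layer sysfs knob
/sys/block/<dev>/queue/max_sectors_kb; the function name, the device name
"sda" and the sysfs_root parameter are only illustrative and not taken
from this thread. Writing the knob needs root, and the value does not
survive a reboot.

```python
from pathlib import Path


def cap_max_sectors_kb(device: str, limit_kb: int = 1024,
                       sysfs_root: Path = Path("/sys/block")) -> int:
    """Lower <sysfs_root>/<device>/queue/max_sectors_kb to limit_kb.

    The knob is only written when the current value is above the limit.
    Returns the value that is in effect afterwards.
    """
    knob = sysfs_root / device / "queue" / "max_sectors_kb"
    current = int(knob.read_text().strip())
    if current > limit_kb:
        knob.write_text(f"{limit_kb}\n")
        return limit_kb
    return current


# Hypothetical usage (as root), for the disks behind the md raid:
#   cap_max_sectors_kb("sda")
#   cap_max_sectors_kb("sdb")
```

The sysfs_root parameter exists only so the function can be tried against
a scratch directory before touching a real system.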


The patch "sd: Fix maximum I/O size for BLOCK_PC requests" [1], which is
part of 4.1.7, also produces ata errors / AMD-Vi IO_PAGE_FAULTS, already
during boot, regardless of whether "block: remove artifical
max_hw_sectors cap" [2] has been applied.


Next, I tested the "dm crypt: constrain crypt device's max_segment_size
to PAGE_SIZE" patch [3], applied to an otherwise unchanged 4.1.7 kernel
without setting max_sectors_kb to 1024.

The interesting effect was that booting was fine, but I saw lots of ata
errors afterwards as soon as there was load on the md raid 1 (e.g. during
a kernel compile), which is built on *rotational* disks:


[  367.264873] ata2.00: exception Emask 0x0 SAct 0x7fbfffff SErr 0x0 action 0x6 frozen
[  367.264883] ata2.00: failed command: WRITE FPDMA QUEUED
[  367.264893] ata2.00: cmd 61/40:00:b0:7b:d4/05:00:06:00:00/40 tag 0 ncq 688128 out
[  367.264893]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  367.264899] ata2.00: status: { DRDY }
...
[  367.265332] ata2.00: failed command: WRITE FPDMA QUEUED
[  367.265339] ata2.00: cmd 61/40:f0:30:71:d4/05:00:06:00:00/40 tag 30 ncq 688128 out
[  367.265339]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  367.265343] ata2.00: status: { DRDY }
[  367.265350] ata2: hard resetting link
[  367.775330] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  367.776970] ata2.00: configured for UDMA/133
[  367.776997] ata2.00: device reported invalid CHS sector 0
...
[  367.777761] ata2: EH complete
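
To read the request sizes out of these lines, here is a small sketch that
decodes the "cmd" taskfile dump and the SAct mask. It assumes libata's
usual field order (command/feature:nsect:lbal:lbam:lbah/ followed by the
hob_* copies and the device byte) and the NCQ convention that the sector
count sits in the feature registers and the tag in bits 7:3 of nsect. The
function names and the 512-byte sector size are assumptions for
illustration only.

```python
def decode_ncq_cmd(line: str, sector_size: int = 512) -> dict:
    """Decode the 'cmd xx/..../xx' field of a libata NCQ error line."""
    # Take the token right after "cmd ", e.g. 61/40:00:b0:7b:d4/05:00:06:00:00/40
    field = line.split("cmd ", 1)[1].split()[0]
    command, low, high, device = field.split("/")
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":"))

    # NCQ: the sector count is carried in feature / hob_feature;
    # a count of 0 stands for 65536 sectors.
    sectors = (hob_feature << 8) | feature or 65536
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    return {
        "command": int(command, 16),
        "device": int(device, 16),
        "tag": nsect >> 3,          # NCQ tag lives in bits 7:3 of nsect
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * sector_size,
    }


def active_tags(sact: int) -> list:
    """Return the NCQ tags whose bits are set in the SAct mask."""
    return [tag for tag in range(32) if sact >> tag & 1]
```

Applied to the first "cmd" line above, this gives 0x0540 = 1344 sectors,
i.e. 1344 * 512 = 688128 bytes (672 KiB), which matches the "ncq 688128
out" that libata prints, and SAct 0x7fbfffff decodes to 30 outstanding
tags.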


In other words: using an unpatched kernel >= 3.19 carries a high risk of
breaking filesystems under some as yet unknown conditions [4].

>>
>>
>> Please check why this patch triggers AMD-Vi IO_PAGE_FAULTS.
> 
> I would submit this bug to the maintainers of AMD-Vi. They understand the
> hardware, so they should be able to tell why large I/O requests result in
> IO_PAGE_FAULTs.
>
> It is probably a bug either in the AMD-Vi driver or in the hardware.

So far, I haven't heard anything from the maintainers of AMD-Vi.


Regards,
Andreas Hartmann


[1] http://thread.gmane.org/gmane.linux.kernel.commits.head/538464
[2]
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34b48db66e08ca1c1bc07cf305d672ac940268dc
[3]
http://news.gmane.org/find-root.php?group=gmane.linux.kernel&article=2036495
[4] http://thread.gmane.org/gmane.linux.kernel.pci/43851/focus=44011
