linux-pci.vger.kernel.org archive mirror
From: Andreas Hartmann <andihartmann@freenet.de>
To: Joerg Roedel <joro@8bytes.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>,
	iommu@lists.linux-foundation.org, Leo Duran <leo.duran@amd.com>,
	Christoph Hellwig <hch@lst.de>,
	device-mapper development <dm-devel@redhat.com>,
	Milan Broz <mbroz@redhat.com>, Jens Axboe <axboe@fb.com>,
	linux-pci <linux-pci@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
Date: Fri, 9 Oct 2015 11:15:05 +0200
Message-ID: <56178599.6010807@maya.org>
In-Reply-To: <56174EA6.7000106@maya.org>

On 10/09/2015 at 07:20 AM, Andreas Hartmann wrote:
> On 10/08/2015 at 09:52 PM, Andreas Hartmann wrote:
>> On 10/08/2015 at 08:21 PM, Andreas Hartmann wrote:
>>> On 10/08/2015 at 06:39 PM, Joerg Roedel wrote:
>>>> On Wed, Oct 07, 2015 at 06:52:58PM +0200, Andreas Hartmann wrote:
>>>>> To reproduce the error:
>>>>> First I mounted /daten2, afterwards /raid/mt, which produces the errors.
>>>>> The ssd mounts have been already active (during boot by fstab).
>>>>
>>>> Okay, I spent the day on that problem, and managed to reproduce it here
>>>> on one of my AMD IOMMU boxes. It wasn't an easy journey, as I can only
>>>> reproduce it if I set up the crypto partition and everything above that
>>>> (like mounting the lvm volumes) _after_ the system has finished booting.
>>>> If everything is set up during system boot it works fine and I don't see
>>>> any IO_PAGE_FAULTs.
>>>
>>> Thank you very much for spending so much of your time to reproduce the
>>> problem!
>>>
>>>> I also tried kernel v4.3-rc4 first, to have it tested with a
>>>> self-compiled kernel. It didn't show up there, so I built a 4.1.0, where
>>>> it showed up again. Something seems to have fixed the issue in the
>>>> latest kernels.
>>>>
>>>> So I looked a little bit around at the commits that were merged into the
>>>> respective parts involved here, and found this one:
>>>>
>>>>     586b286 dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE
>>>>
>>>> The problem fixed with this commit looks quite similar to what you have
>>>> seen (except that there was no IOMMU involved). So I cherry-picked that
>>>> commit on 4.1.0 and tested that. The problem was gone.
>>>
>>> That's true - I already knew this patch and tested it some weeks ago -
>>> unfortunately it doesn't fix the problem here.
>>>
>>> To be really sure, I just retested it again. I couldn't see any
>>> IO_PAGE_FAULT errors today (unfortunately I can no longer remember
>>> whether they were also absent a few weeks ago) - but the ata errors
>>> remain. Therefore, this patch isn't a solution for the problem I
>>> encounter here.
>>>
>>>> So it looks like it was a dm-crypt issue. The patch went into v4.3-rc3,
>>>> so either that kernel or rc4 should fix the problem for you too. Can you
>>>> please verify this is fixed for you too with v4.3-rc4?
>>>
>>> As I already wrote, I couldn't even see the problem with v4.3-rc2 any
>>> more (as far as I was able to test because of the other problem). I have
>>> to do some more tests now with this kernel to be really sure.
>>
>> I now tested w/ v4.3-rc4. I couldn't see any IO_PAGE_FAULTs but the ata
>> errors remain. The ata errors can be easily triggered by copying a large
>> file (> 4 GB) from one partition on the raid to another partition on the
>> raid.
> 
> Hmmm, I retested this morning w/ v4.3-rc4 and 4.1.10 (with the above
> mentioned patch applied) - and now, I didn't get any more ata errors.
> 
> I'm confused now. The only difference between yesterday evening and this
> morning was that the machine was completely without power overnight
> (switched off at the socket outlet). Could this really be the reason?
> Let's wait and see if this is a persistent state ... .

No - it is not a persistent state. The ata errors are back again (in
4.1.10 w/ the above mentioned patch applied). It just isn't that easy
any more to trigger them. After a short break with a power off / on
cycle, the errors came up again during the first test copy.
This means: there must be something more broken.

If I revert the original culprit of all of the problems (block: remove
artifical max_hw_sectors cap), it is possible to increase max_sectors_kb
to 1024 - any higher value leads to ata errors or IO_PAGE_FAULTs sooner
or later.
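
For reference, max_sectors_kb is the per-device request size limit exposed
under /sys/block/<device>/queue/. The following is a minimal Python sketch
of setting it, not the procedure used in the tests above. The function name
set_max_sectors_kb, the sysfs_root parameter (there only so the code can be
tried against a scratch directory) and the device name in the usage note are
assumptions made for illustration. Values above max_hw_sectors_kb are clamped
here, since the kernel is expected to reject them.

```python
from pathlib import Path


def read_kb(queue_dir: Path, name: str) -> int:
    """Read one integer attribute (in KiB) from a block queue directory."""
    return int((queue_dir / name).read_text().strip())


def set_max_sectors_kb(device: str, value_kb: int,
                       sysfs_root: str = "/sys/block") -> int:
    """Set max_sectors_kb for `device`, clamped to max_hw_sectors_kb.

    `sysfs_root` is a hypothetical knob for testing; on a real system the
    default path is used and the write needs root privileges.
    Returns the value read back after the write.
    """
    queue_dir = Path(sysfs_root) / device / "queue"
    hw_limit = read_kb(queue_dir, "max_hw_sectors_kb")
    # Never ask for more than the hardware limit reported by the kernel.
    target = min(value_kb, hw_limit)
    (queue_dir / "max_sectors_kb").write_text(f"{target}\n")
    return read_kb(queue_dir, "max_sectors_kb")
```

As a usage example, set_max_sectors_kb("sda", 1024) run as root would request
the 1024 KB value mentioned above; "sda" is a placeholder for whichever disks
back the raid.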

v4.3-rc4 isn't usable at all for me as long as it hangs the machine on
the necessary PCI passthrough for VMs (I need them).


Regards,
Andreas

Thread overview: 55+ messages
2015-07-28 17:40 AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0 Andreas Hartmann
2015-07-28 17:50 ` Mike Snitzer
2015-07-28 18:20   ` Andreas Hartmann
2015-07-28 18:58     ` Mike Snitzer
2015-07-28 19:23       ` Andreas Hartmann
2015-07-28 19:31         ` Mike Snitzer
2015-07-28 20:08           ` Andreas Hartmann
2015-07-28 21:24             ` Mike Snitzer
2015-07-29  6:17               ` [dm-devel] " Ondrej Kozina
2015-07-29  6:41                 ` Milan Broz
2015-07-29 17:23                   ` Andreas Hartmann
2015-07-30 20:30                     ` Andreas Hartmann
2015-07-31  7:23                       ` Milan Broz
2015-07-31  7:55                         ` Andreas Hartmann
2015-07-31  8:15                           ` Andreas Hartmann
2015-07-31  8:28                           ` Milan Broz
2015-07-29 10:37               ` Milan Broz
2015-07-28 18:56   ` Andreas Hartmann
2015-07-28 19:29     ` Mike Snitzer
2015-08-01 14:20       ` [dm-devel] " Andreas Hartmann
2015-08-02 13:38         ` Andreas Hartmann
2015-08-02 17:57           ` Mikulas Patocka
2015-08-02 18:48             ` Andreas Hartmann
2015-08-03  8:12               ` Joerg Roedel
2015-08-04 14:47                 ` Mike Snitzer
2015-08-04 16:10                   ` Jeff Moyer
2015-08-04 18:11                     ` Andreas Hartmann
2015-08-07  6:04                       ` Andreas Hartmann
2015-09-20  6:50             ` [dm-devel] " Andreas Hartmann
2015-09-29 15:21               ` Joerg Roedel
2015-09-29 15:58                 ` Mikulas Patocka
2015-09-29 16:20                   ` Joerg Roedel
2015-09-30 14:52                     ` Andreas Hartmann
2015-10-06 10:13                       ` Joerg Roedel
2015-10-06 18:37                         ` Andreas Hartmann
2015-10-07 15:40                           ` Joerg Roedel
2015-10-07 17:02                             ` Andreas Hartmann
2015-10-08 17:30                               ` Joerg Roedel
2015-10-08 18:59                                 ` Andreas Hartmann
2015-10-08 19:47                                   ` Andreas Hartmann
2015-10-09 10:40                                     ` Joerg Roedel
2015-10-09 14:45                                     ` [PATCH] iommu/amd: Fix NULL pointer deref on device detach " Joerg Roedel
2015-10-09 17:42                                       ` Andreas Hartmann
     [not found]                           ` <56148A1B.5060506@maya.org>
2015-10-07 16:10                             ` [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: " Joerg Roedel
2015-10-07 16:52                               ` Andreas Hartmann
2015-10-08 16:39                                 ` Joerg Roedel
2015-10-08 18:21                                   ` Andreas Hartmann
2015-10-08 19:52                                     ` Andreas Hartmann
2015-10-09  5:20                                       ` Andreas Hartmann
2015-10-09  9:15                                         ` Andreas Hartmann [this message]
2015-10-09 14:59                                           ` Joerg Roedel
2015-10-09 17:46                                             ` Andreas Hartmann
2015-10-11 12:23                                               ` Andreas Hartmann
2015-10-12 12:07                                                 ` Andreas Hartmann
2015-10-12 12:34                                               ` Mikulas Patocka
