* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <55BE1D5E.6020709@maya.org>
@ 2015-08-02 17:57 ` Mikulas Patocka
[not found] ` <alpine.LRH.2.02.1508021347480.17729-Hpncn10jQN4oNljnaZt3ZvA+iT7yCHsGwRM8/txMwJMAicBL8TP8PQ@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Mikulas Patocka @ 2015-08-02 17:57 UTC (permalink / raw)
To: Andreas Hartmann, Joerg Roedel, iommu, Leo Duran
Cc: Christoph Hellwig, device-mapper development, Milan Broz,
Jens Axboe, linux-pci
On Sun, 2 Aug 2015, Andreas Hartmann wrote:
> On 08/01/2015 at 04:20 PM Andreas Hartmann wrote:
> > On 07/28/2015 at 09:29 PM, Mike Snitzer wrote:
> > [...]
> >> Mikulas was saying to bisect what is causing ATA to fail.
> >
> > Some good news and some bad news. The good news first:
> >
> > Your patchset
> >
> > f3396c58fd8442850e759843457d78b6ec3a9589,
> > cf2f1abfbd0dba701f7f16ef619e4d2485de3366,
> > 7145c241a1bf2841952c3e297c4080b357b3e52d,
> > 94f5e0243c48aa01441c987743dc468e2d6eaca2,
> > dc2676210c425ee8e5cb1bec5bc84d004ddf4179,
> > 0f5d8e6ee758f7023e4353cca75d785b2d4f6abe,
> > b3c5fd3052492f1b8d060799d4f18be5a5438add
> >
> > seems to work fine w/ 3.18.19 !!
> >
> > Why did I test it with 3.18.x now? Because I suddenly got two ata errors
> > (ata1 and ata2) with clean 3.19.8 (w/o the AMD-Vi IO_PAGE_FAULTs) during
> > normal operation. This means: 3.19 must already be broken, too.
> >
> > Therefore, I applied your patchset to 3.18.x and it seems to work like a
> > charm - I don't get any AMD-Vi IO_PAGE_FAULTs on boot and no ata errors
> > (until now).
> >
> >
> > Next I did: I tried to bisect between 3.18 and 3.19 with your patchset
> > applied, because w/ this patchset applied, the problem can be seen
> > easily and directly on boot. Unfortunately, this worked for only a few
> > git bisect rounds before I got stuck because of conflicts with your
> > extra patches applied:
>
> [Resolved the problems written at the last post.]
>
> Bisecting ended here:
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34b48db66e08ca1c1bc07cf305d672ac940268dc
>
> block: remove artifical max_hw_sectors cap
>
>
> Removing this patch on 3.19 and 4.1 makes things work again. I didn't
> test 4.0, but I think it's the same. No more AMD-Vi IO_PAGE_FAULTS with
> that patch reverted.
>
>
> Please check why this patch triggers AMD-Vi IO_PAGE_FAULTS.
>
>
> Thanks,
> kind regards,
> Andreas Hartmann
I would submit this bug to the maintainers of AMD-Vi. They understand the
hardware, so they should be able to tell why large I/O requests result in
IO_PAGE_FAULTs.
It is probably a bug either in the AMD-Vi driver or in the hardware.
Mikulas
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <alpine.LRH.2.02.1508021347480.17729-Hpncn10jQN4oNljnaZt3ZvA+iT7yCHsGwRM8/txMwJMAicBL8TP8PQ@public.gmane.org>
@ 2015-08-02 18:48 ` Andreas Hartmann
2015-08-03 8:12 ` Joerg Roedel
2015-09-20 6:50 ` [dm-devel] " Andreas Hartmann
1 sibling, 1 reply; 35+ messages in thread
From: Andreas Hartmann @ 2015-08-02 18:48 UTC (permalink / raw)
To: Mikulas Patocka, Joerg Roedel,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Leo Duran
Cc: Jens Axboe, linux-pci, device-mapper development,
Christoph Hellwig, Milan Broz
On 08/01/2015 at 19:57, Mikulas Patocka wrote:
>
>
> On Sun, 2 Aug 2015, Andreas Hartmann wrote:
>
>> On 08/01/2015 at 04:20 PM Andreas Hartmann wrote:
>>> On 07/28/2015 at 09:29 PM, Mike Snitzer wrote:
>>> [...]
>>>> Mikulas was saying to bisect what is causing ATA to fail.
>>>
>>> Some good news and some bad news. The good news first:
>>>
>>> Your patchset
>>>
>>> f3396c58fd8442850e759843457d78b6ec3a9589,
>>> cf2f1abfbd0dba701f7f16ef619e4d2485de3366,
>>> 7145c241a1bf2841952c3e297c4080b357b3e52d,
>>> 94f5e0243c48aa01441c987743dc468e2d6eaca2,
>>> dc2676210c425ee8e5cb1bec5bc84d004ddf4179,
>>> 0f5d8e6ee758f7023e4353cca75d785b2d4f6abe,
>>> b3c5fd3052492f1b8d060799d4f18be5a5438add
>>>
>>> seems to work fine w/ 3.18.19 !!
>>>
>>> Why did I test it with 3.18.x now? Because I suddenly got two ata errors
>>> (ata1 and ata2) with clean 3.19.8 (w/o the AMD-Vi IO_PAGE_FAULTs) during
>>> normal operation. This means: 3.19 must already be broken, too.
>>>
>>> Therefore, I applied your patchset to 3.18.x and it seems to work like a
>>> charm - I don't get any AMD-Vi IO_PAGE_FAULTs on boot and no ata errors
>>> (until now).
>>>
>>>
>>> Next I did: I tried to bisect between 3.18 and 3.19 with your patchset
>>> applied, because w/ this patchset applied, the problem can be seen
>>> easily and directly on boot. Unfortunately, this worked for only a few
>>> git bisect rounds before I got stuck because of conflicts with your
>>> extra patches applied:
>>
>> [Resolved the problems written at the last post.]
>>
>> Bisecting ended here:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34b48db66e08ca1c1bc07cf305d672ac940268dc
>>
>> block: remove artifical max_hw_sectors cap
>>
>>
>> Removing this patch on 3.19 and 4.1 makes things work again. I didn't
>> test 4.0, but I think it's the same. No more AMD-Vi IO_PAGE_FAULTS with
>> that patch reverted.
>>
>>
>> Please check why this patch triggers AMD-Vi IO_PAGE_FAULTS.
>>
>>
>> Thanks,
>> kind regards,
>> Andreas Hartmann
>
> I would submit this bug to the maintainers of AMD-Vi. They understand the
> hardware, so they should be able to tell why large I/O requests result in
> IO_PAGE_FAULTs.
You forgot the ata errors ... they are gone, too. I got these ata
errors on 3.19 w/o your patchset and w/o AMD-Vi IO_PAGE_FAULTs, too.
Regards,
Andreas
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
2015-08-02 18:48 ` Andreas Hartmann
@ 2015-08-03 8:12 ` Joerg Roedel
2015-08-04 14:47 ` Mike Snitzer
0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2015-08-03 8:12 UTC (permalink / raw)
To: Andreas Hartmann
Cc: Mikulas Patocka, iommu, Leo Duran, Christoph Hellwig,
device-mapper development, Milan Broz, Jens Axboe, linux-pci
On Sun, Aug 02, 2015 at 08:48:06PM +0200, Andreas Hartmann wrote:
> >>https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34b48db66e08ca1c1bc07cf305d672ac940268dc
> >>
> >>block: remove artifical max_hw_sectors cap
Looking at the patch, it seems to me that it just uncovered a bug
elsewhere. It looks like an underlying driver doesn't expect the
big io-requests that the patch enables and does not dma-map the whole
target buffer, causing the IO_PAGE_FAULTs later.
Joerg
* Re: AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
2015-08-03 8:12 ` Joerg Roedel
@ 2015-08-04 14:47 ` Mike Snitzer
2015-08-04 16:10 ` Jeff Moyer
0 siblings, 1 reply; 35+ messages in thread
From: Mike Snitzer @ 2015-08-04 14:47 UTC (permalink / raw)
To: Joerg Roedel
Cc: Andreas Hartmann, linux-pci, device-mapper development,
Jens Axboe, iommu, Leo Duran, Mikulas Patocka, Christoph Hellwig,
Milan Broz, Jeff Moyer
On Mon, Aug 03 2015 at 4:12am -0400,
Joerg Roedel <joro@8bytes.org> wrote:
> On Sun, Aug 02, 2015 at 08:48:06PM +0200, Andreas Hartmann wrote:
> > >>https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34b48db66e08ca1c1bc07cf305d672ac940268dc
> > >>
> > >>block: remove artifical max_hw_sectors cap
>
> Looking at the patch, it seems to me that it just uncovered a bug
> elsewhere. It looks like an underlying driver doesn't expect the
> big io-requests that the patch enables and does not dma-map the whole
> target buffer, causing the IO_PAGE_FAULTs later.
That patch has caused issues elsewhere too; see this 'Revert "block:
remove artifical max_hw_sectors cap"' thread (if/when lkml.org
cooperates): https://lkml.org/lkml/2015/7/20/572
But it could be that there is a need for a horkage fix for this specific
hardware - something comparable to this?:
http://git.kernel.org/linus/af34d637637eabaf49406eb35c948cd51ba262a6
We are running out of time to fix these whack-a-mole issues in 4.2
though.
* Re: AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
2015-08-04 14:47 ` Mike Snitzer
@ 2015-08-04 16:10 ` Jeff Moyer
[not found] ` <x4937zzm3uc.fsf-RRHT56Q3PSP4kTEheFKJxxDDeQx5vsVwAInAS/Ez/D0@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Jeff Moyer @ 2015-08-04 16:10 UTC (permalink / raw)
To: Mike Snitzer
Cc: Joerg Roedel, Andreas Hartmann, linux-pci,
device-mapper development, Jens Axboe, iommu, Leo Duran,
Mikulas Patocka, Christoph Hellwig, Milan Broz, linux-ide
Mike Snitzer <snitzer@redhat.com> writes:
> On Mon, Aug 03 2015 at 4:12am -0400,
> Joerg Roedel <joro@8bytes.org> wrote:
>
>> On Sun, Aug 02, 2015 at 08:48:06PM +0200, Andreas Hartmann wrote:
>> > >>https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34b48db66e08ca1c1bc07cf305d672ac940268dc
>> > >>
>> > >>block: remove artifical max_hw_sectors cap
>>
>> Looking at the patch, it seems to me that it just uncovered a bug
>> elsewhere. It looks like an underlying driver doesn't expect the
>> big io-requests that the patch enables and does not dma-map the whole
>> target buffer, causing the IO_PAGE_FAULTs later.
>
> That patch has caused issues elsewhere too, see this 'Revert "block:
> remove artifical max_hw_sectors cap"' thread (if/when lkml.org
> cooperates): https://lkml.org/lkml/2015/7/20/572
>
> But it could be that there is a need for a horkage fix for this specific
> hardware? something comparable to this?:
> http://git.kernel.org/linus/af34d637637eabaf49406eb35c948cd51ba262a6
>
> We are running out of time to fix these whack-a-mole issues in 4.2
> though.
CC-ing linux-ide. Original dm-devel posting:
https://www.redhat.com/archives/dm-devel/2015-July/msg00178.html
Andreas, I would be curious to know what the value of
/sys/block/sdX/queue/max_hw_sectors_kb is for the affected disks.
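The value Jeff asks about sits next to the active cap in sysfs; a sketch for collecting both for every sd* disk (the sd* naming is an assumption about the system, and the paths are the standard block-queue sysfs attributes):

```shell
# Sketch: print the hardware limit and the active cap for each sd* disk.
# If no sd* disks exist, the glob stays unexpanded and the loop prints nothing.
for q in /sys/block/sd*/queue; do
    [ -d "$q" ] || continue
    dev=${q%/queue}; dev=${dev##*/}
    printf '%s: max_hw_sectors_kb=%s max_sectors_kb=%s\n' \
        "$dev" "$(cat "$q/max_hw_sectors_kb")" "$(cat "$q/max_sectors_kb")"
done
```

max_hw_sectors_kb is the limit the driver reports; max_sectors_kb is the (writable) cap actually used for filesystem I/O.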
Cheers,
Jeff
* Re: AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <x4937zzm3uc.fsf-RRHT56Q3PSP4kTEheFKJxxDDeQx5vsVwAInAS/Ez/D0@public.gmane.org>
@ 2015-08-04 18:11 ` Andreas Hartmann
2015-08-07 6:04 ` Andreas Hartmann
0 siblings, 1 reply; 35+ messages in thread
From: Andreas Hartmann @ 2015-08-04 18:11 UTC (permalink / raw)
To: Jeff Moyer, Mike Snitzer
Cc: device-mapper development, linux-pci, Andreas Hartmann,
Jens Axboe, linux-ide-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Christoph Hellwig, Milan Broz
On 08/04/2015 at 06:10 PM Jeff Moyer wrote:
> Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>
>> On Mon, Aug 03 2015 at 4:12am -0400,
>> Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote:
>>
>>> On Sun, Aug 02, 2015 at 08:48:06PM +0200, Andreas Hartmann wrote:
>>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34b48db66e08ca1c1bc07cf305d672ac940268dc
>>>>>>
>>>>>> block: remove artifical max_hw_sectors cap
>>>
>>> Looking at the patch, it seems to me that it just uncovered a bug
>>> elsewhere. It looks like an underlying driver doesn't expect the
>>> big io-requests that the patch enables and does not dma-map the whole
>>> target buffer, causing the IO_PAGE_FAULTs later.
>>
>> That patch has caused issues elsewhere too, see this 'Revert "block:
>> remove artifical max_hw_sectors cap"' thread (if/when lkml.org
>> cooperates): https://lkml.org/lkml/2015/7/20/572
>>
>> But it could be that there is a need for a horkage fix for this specific
>> hardware? something comparable to this?:
>> http://git.kernel.org/linus/af34d637637eabaf49406eb35c948cd51ba262a6
>>
>> We are running out of time to fix these whack-a-mole issues in 4.2
>> though.
>
> CC-ing linux-ide. Original dm-devel posting:
> https://www.redhat.com/archives/dm-devel/2015-July/msg00178.html
>
> Andreas, I would be curious to know what the value of
> /sys/block/sdX/queue/max_hw_sectors_kb is for the affected disks.
It's always 32767.
Where does this value come from? Is it empirical or is it calculated (on
the basis of which parameters)?
Devices are:
1 x Corsair Force GT (SSD)
2 x ST3000DM001-1CH166 (rotational) (WD)
How can I set a higher max_sectors_kb on boot before the partitions are
mounted (-> as kernel option)? Mounting the partitions here (w/ systemd)
is the easiest and best way to trigger the problem!
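One way to apply such a setting before the partitions are mounted (a dedicated kernel command-line option for this does not appear to exist) is a udev rule, which fires as each block device is registered; a sketch, where the rule file name is illustrative and 32767 mirrors the reported max_hw_sectors_kb:

```
# /etc/udev/rules.d/59-max-sectors.rules  (illustrative file name)
# Raise max_sectors_kb as soon as each sd* disk appears, i.e. before
# systemd mounts anything from it.
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{queue/max_sectors_kb}="32767"
```

udev applies ATTR{} assignments by writing the value into the matching sysfs attribute, so this takes effect before any mount units run.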
Thanks,
Andreas
* Re: AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
2015-08-04 18:11 ` Andreas Hartmann
@ 2015-08-07 6:04 ` Andreas Hartmann
0 siblings, 0 replies; 35+ messages in thread
From: Andreas Hartmann @ 2015-08-07 6:04 UTC (permalink / raw)
To: Andreas Hartmann, Jeff Moyer, Mike Snitzer
Cc: Joerg Roedel, linux-pci, device-mapper development, Jens Axboe,
iommu, Leo Duran, Mikulas Patocka, Christoph Hellwig, Milan Broz,
linux-ide
On 08/04/2015 at 08:11 PM, Andreas Hartmann wrote:
> On 08/04/2015 at 06:10 PM Jeff Moyer wrote:
>> Mike Snitzer <snitzer@redhat.com> writes:
>>
>>> On Mon, Aug 03 2015 at 4:12am -0400,
>>> Joerg Roedel <joro@8bytes.org> wrote:
>>>
>>>> On Sun, Aug 02, 2015 at 08:48:06PM +0200, Andreas Hartmann wrote:
>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34b48db66e08ca1c1bc07cf305d672ac940268dc
>>>>>>>
>>>>>>> block: remove artifical max_hw_sectors cap
>>>>
>>>> Looking at the patch, it seems to me that it just uncovered a bug
>>>> elsewhere. It looks like an underlying driver doesn't expect the
>>>> big io-requests that the patch enables and does not dma-map the whole
>>>> target buffer, causing the IO_PAGE_FAULTs later.
>>>
>>> That patch has caused issues elsewhere too, see this 'Revert "block:
>>> remove artifical max_hw_sectors cap"' thread (if/when lkml.org
>>> cooperates): https://lkml.org/lkml/2015/7/20/572
>>>
>>> But it could be that there is a need for a horkage fix for this specific
>>> hardware? something comparable to this?:
>>> http://git.kernel.org/linus/af34d637637eabaf49406eb35c948cd51ba262a6
>>>
>>> We are running out of time to fix these whack-a-mole issues in 4.2
>>> though.
>>
>> CC-ing linux-ide. Original dm-devel posting:
>> https://www.redhat.com/archives/dm-devel/2015-July/msg00178.html
>>
>> Andreas, I would be curious to know what the value of
>> /sys/block/sdX/queue/max_hw_sectors_kb is for the affected disks.
>
>
> It's always 32767.
>
> Where does this value come from? Is it empirical or is it calculated (on
> the basis of which parameters)?
>
>
> Devices are:
>
> 1 x Corsair Force GT (SSD)
> 2 x ST3000DM001-1CH166 (rotational) (WD)
>
>
> How can I set a higher max_sectors_kb on boot before the partitions are
> mounted (-> as kernel option)? Mounting the partitions here (w/ systemd)
> is the easiest and best way to trigger the problem!
Please - does nobody have an idea how to set a higher max_sectors_kb before the
partitions are mounted? Come on! I want to test exactly this situation!
Regards,
Andreas
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <alpine.LRH.2.02.1508021347480.17729-Hpncn10jQN4oNljnaZt3ZvA+iT7yCHsGwRM8/txMwJMAicBL8TP8PQ@public.gmane.org>
2015-08-02 18:48 ` Andreas Hartmann
@ 2015-09-20 6:50 ` Andreas Hartmann
[not found] ` <55FE5740.2060701-YKS6W9RDU/w@public.gmane.org>
1 sibling, 1 reply; 35+ messages in thread
From: Andreas Hartmann @ 2015-09-20 6:50 UTC (permalink / raw)
To: Mikulas Patocka, Andreas Hartmann, Joerg Roedel,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Leo Duran
Cc: linux-pci, Jens Axboe, device-mapper development, Linus Torvalds,
Christoph Hellwig, Milan Broz
On 08/02/2015 at 07:57 PM, Mikulas Patocka wrote:
>
>
> On Sun, 2 Aug 2015, Andreas Hartmann wrote:
>
>> On 08/01/2015 at 04:20 PM Andreas Hartmann wrote:
>>> On 07/28/2015 at 09:29 PM, Mike Snitzer wrote:
>>> [...]
>>>> Mikulas was saying to bisect what is causing ATA to fail.
>>>
>>> Some good news and some bad news. The good news first:
>>>
>>> Your patchset
>>>
>>> f3396c58fd8442850e759843457d78b6ec3a9589,
>>> cf2f1abfbd0dba701f7f16ef619e4d2485de3366,
>>> 7145c241a1bf2841952c3e297c4080b357b3e52d,
>>> 94f5e0243c48aa01441c987743dc468e2d6eaca2,
>>> dc2676210c425ee8e5cb1bec5bc84d004ddf4179,
>>> 0f5d8e6ee758f7023e4353cca75d785b2d4f6abe,
>>> b3c5fd3052492f1b8d060799d4f18be5a5438add
>>>
>>> seems to work fine w/ 3.18.19 !!
>>>
>>> Why did I test it with 3.18.x now? Because I suddenly got two ata errors
>>> (ata1 and ata2) with clean 3.19.8 (w/o the AMD-Vi IO_PAGE_FAULTs) during
>>> normal operation. This means: 3.19 must already be broken, too.
>>>
>>> Therefore, I applied your patchset to 3.18.x and it seems to work like a
>>> charm - I don't get any AMD-Vi IO_PAGE_FAULTs on boot and no ata errors
>>> (until now).
>>>
>>>
>>> Next I did: I tried to bisect between 3.18 and 3.19 with your patchset
>>> applied, because w/ this patchset applied, the problem can be seen
>>> easily and directly on boot. Unfortunately, this worked for only a few
>>> git bisect rounds before I got stuck because of conflicts with your
>>> extra patches applied:
>>
>> [Resolved the problems written at the last post.]
>>
>> Bisecting ended here:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34b48db66e08ca1c1bc07cf305d672ac940268dc
>>
>> block: remove artifical max_hw_sectors cap
>>
>>
>> Removing this patch on 3.19 and 4.1 makes things work again. I didn't
>> test 4.0, but I think it's the same. No more AMD-Vi IO_PAGE_FAULTS with
>> that patch reverted.
After a long period of testing, I can now say that max_sectors_kb can be
set to 1024 - higher values produce AMD-Vi IO_PAGE_FAULTs and ata faults.
The patch "sd: Fix maximum I/O size for BLOCK_PC requests"[1], which is part
of 4.1.7, produces ata errors / AMD-Vi IO_PAGE_FAULTs already during boot, too -
no matter whether "block: remove artifical max_hw_sectors cap"[2] has been
applied or not.
Next, I tested the "dm crypt: constrain crypt device's max_segment_size
to PAGE_SIZE" patch[3] applied to an unchanged 4.1.7 kernel w/o setting
max_sectors_kb to 1024.
The interesting effect was that booting was fine, but I saw lots
of ata errors afterwards as soon as there was load on the md raid 1
(during a kernel compile, e.g.), which is built on *rotational* disks:
[ 367.264873] ata2.00: exception Emask 0x0 SAct 0x7fbfffff SErr 0x0 action 0x6 frozen
[ 367.264883] ata2.00: failed command: WRITE FPDMA QUEUED
[ 367.264893] ata2.00: cmd 61/40:00:b0:7b:d4/05:00:06:00:00/40 tag 0 ncq 688128 out
[ 367.264893]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 367.264899] ata2.00: status: { DRDY }
...
[ 367.265332] ata2.00: failed command: WRITE FPDMA QUEUED
[ 367.265339] ata2.00: cmd 61/40:f0:30:71:d4/05:00:06:00:00/40 tag 30 ncq 688128 out
[ 367.265339]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 367.265343] ata2.00: status: { DRDY }
[ 367.265350] ata2: hard resetting link
[ 367.775330] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 367.776970] ata2.00: configured for UDMA/133
[ 367.776997] ata2.00: device reported invalid CHS sector 0
...
[ 367.777761] ata2: EH complete
IOW: using an unpatched kernel >= 3.19 means a high risk of breaking
filesystems under some as-yet-unknown conditions [4].
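The 1024 KB cap described above can be applied at runtime with a loop over the sysfs queue attributes; a sketch (the sd* naming is an assumption, and writing the attribute needs root):

```shell
# Cap the active request size for every sd* disk to the value reported safe.
# The write takes effect immediately for new I/O requests.
for q in /sys/block/sd*/queue; do
    [ -w "$q/max_sectors_kb" ] || continue   # skip if absent or not root
    echo 1024 > "$q/max_sectors_kb"
done
```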
>>
>>
>> Please check why this patch triggers AMD-Vi IO_PAGE_FAULTS.
>
> I would submit this bug to the maintainers of AMD-Vi. They understand the
> hardware, so they should be able to tell why large I/O requests result in
> IO_PAGE_FAULTs.
>
> It is probably a bug either in the AMD-Vi driver or in the hardware.
Until now, I haven't heard anything from the maintainers of AMD-Vi.
Regards,
Andreas Hartmann
[1] http://thread.gmane.org/gmane.linux.kernel.commits.head/538464
[2]
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34b48db66e08ca1c1bc07cf305d672ac940268dc
[3]
http://news.gmane.org/find-root.php?group=gmane.linux.kernel&article=2036495
[4] http://thread.gmane.org/gmane.linux.kernel.pci/43851/focus=44011
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <55FE5740.2060701-YKS6W9RDU/w@public.gmane.org>
@ 2015-09-29 15:21 ` Joerg Roedel
[not found] ` <20150929152100.GL3036-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2015-09-29 15:21 UTC (permalink / raw)
To: Andreas Hartmann
Cc: linux-pci, device-mapper development, Jens Axboe,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Linus Torvalds, Christoph Hellwig, Milan Broz
On Sun, Sep 20, 2015 at 08:50:40AM +0200, Andreas Hartmann wrote:
> > I would submit this bug to the maintainers of AMD-Vi. They understand the
> > hardware, so they should be able to tell why large I/O requests result in
> > IO_PAGE_FAULTs.
> >
> > It is probably a bug either in the AMD-Vi driver or in the hardware.
>
> Until now, I haven't heard anything from the maintainers of AMD-Vi.
What do you mean by this? I've been commenting on this issue in the
past and I thought we agreed that this is not an issue of the IOMMU driver.
If it were, bisection should lead to a commit that breaks it, but there
are no commits between v3.18 and v3.19 in the AMD IOMMU driver touching
the DMA-API path.
Joerg
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <20150929152100.GL3036-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
@ 2015-09-29 15:58 ` Mikulas Patocka
2015-09-29 16:20 ` Joerg Roedel
0 siblings, 1 reply; 35+ messages in thread
From: Mikulas Patocka @ 2015-09-29 15:58 UTC (permalink / raw)
To: Joerg Roedel
Cc: device-mapper development, linux-pci,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jens Axboe,
Andreas Hartmann, Linus Torvalds, Christoph Hellwig, Milan Broz
On Tue, 29 Sep 2015, Joerg Roedel wrote:
> On Sun, Sep 20, 2015 at 08:50:40AM +0200, Andreas Hartmann wrote:
> > > I would submit this bug to the maintainers of AMD-Vi. They understand the
> > > hardware, so they should be able to tell why large I/O requests result in
> > > IO_PAGE_FAULTs.
> > >
> > > It is probably a bug either in the AMD-Vi driver or in the hardware.
> >
> > Until now, I haven't heard anything from the maintainers of AMD-Vi.
>
> What do you mean by this? I've been commenting on this issue in the
> past and I thought we agreed that this is not an issue of the IOMMU driver.
>
> If it were, bisection should lead to a commit that breaks it, but there
> are no commits between v3.18 and v3.19 in the AMD IOMMU driver touching
> the DMA-API path.
>
> Joerg
I don't know why you are so certain that the bug is not in the AMD-Vi IOMMU.
There was a patch (34b48db66e08ca1c1bc07cf305d672ac940268dc) that
increased the default block request size. That patch triggers AMD-Vi page
faults. The bug may be in the ATA driver, in the ATA controller or in the
AMD-Vi driver or hardware. I didn't see anything in that thread that proves
that the bug is not in the AMD-Vi IOMMU.
The bug probably existed even before kernel 3.19, but it was masked by the
fact that the I/O request size was artificially capped. Bisecting probably
won't find it, as it may have existed forever.
Mikulas
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
2015-09-29 15:58 ` Mikulas Patocka
@ 2015-09-29 16:20 ` Joerg Roedel
[not found] ` <20150929162042.GR3036-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2015-09-29 16:20 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Andreas Hartmann, iommu, Leo Duran, Christoph Hellwig,
device-mapper development, Milan Broz, Jens Axboe, linux-pci,
Linus Torvalds
On Tue, Sep 29, 2015 at 11:58:10AM -0400, Mikulas Patocka wrote:
> On Tue, 29 Sep 2015, Joerg Roedel wrote:
> There was a patch (34b48db66e08ca1c1bc07cf305d672ac940268dc) that
> increased the default block request size. That patch triggers AMD-Vi page
> faults. The bug may be in the ATA driver, in the ATA controller or in the
> AMD-Vi driver or hardware. I didn't see anything in that thread that proves
> that the bug is not in the AMD-Vi IOMMU.
>
> The bug probably existed even before kernel 3.19, but it was masked by the
> fact that the I/O request size was artificially capped. Bisecting probably
> won't find it, as it may have existed forever.
Okay, I see. But as long as the request-size is not bigger than 128MB
(the biggest chunk the AMD IOMMU driver can map at once), I don't see
how the IOMMU driver could be at fault.
Which ATA driver is in use when this happens and are there instructions
on how to reproduce the issue?
Alternatively, someone who can reproduce it should trace the calls to
__map_single and __unmap_single in the AMD IOMMU driver to find out
whether the addresses which the faults happen on are really mapped, or
at least requested from the AMD IOMMU driver.
Joerg
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <20150929162042.GR3036-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
@ 2015-09-30 14:52 ` Andreas Hartmann
2015-10-06 10:13 ` Joerg Roedel
0 siblings, 1 reply; 35+ messages in thread
From: Andreas Hartmann @ 2015-09-30 14:52 UTC (permalink / raw)
To: Joerg Roedel, Mikulas Patocka
Cc: device-mapper development, linux-pci,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jens Axboe,
Andreas Hartmann, Linus Torvalds, Christoph Hellwig, Milan Broz
On 09/29/2015 at 06:20 PM, Joerg Roedel wrote:
> On Tue, Sep 29, 2015 at 11:58:10AM -0400, Mikulas Patocka wrote:
>> On Tue, 29 Sep 2015, Joerg Roedel wrote:
>> There was a patch (34b48db66e08ca1c1bc07cf305d672ac940268dc) that
>> increased the default block request size. That patch triggers AMD-Vi page
>> faults. The bug may be in the ATA driver, in the ATA controller or in the
>> AMD-Vi driver or hardware. I didn't see anything in that thread that proves
>> that the bug is not in the AMD-Vi IOMMU.
>>
>> The bug probably existed even before kernel 3.19, but it was masked by the
>> fact that the I/O request size was artificially capped. Bisecting probably
>> won't find it, as it may have existed forever.
>
> Okay, I see. But as long as the request-size is not bigger than 128MB
> (the biggest chunk the AMD IOMMU driver can map at once), I don't see
> how the IOMMU driver could be at fault.
>
> Which ATA driver is in use when this happens and are there instructions
> on how to reproduce the issue?
>
> Alternatively, someone who can reproduce it should trace the calls to
> __map_single and __unmap_single in the AMD IOMMU driver to find out
> whether the addresses which the faults happen on are really mapped, or
> at least requested from the AMD IOMMU driver.
How can I trace it?
Thanks,
Andreas
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
2015-09-30 14:52 ` Andreas Hartmann
@ 2015-10-06 10:13 ` Joerg Roedel
[not found] ` <20151006101356.GE12506-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2015-10-06 10:13 UTC (permalink / raw)
To: Andreas Hartmann
Cc: Mikulas Patocka, iommu, Leo Duran, Christoph Hellwig,
device-mapper development, Milan Broz, Jens Axboe, linux-pci,
Linus Torvalds
[-- Attachment #1: Type: text/plain, Size: 839 bytes --]
On Wed, Sep 30, 2015 at 04:52:47PM +0200, Andreas Hartmann wrote:
> > Alternatively, someone who can reproduce it should trace the calls to
> > __map_single and __unmap_single in the AMD IOMMU driver to find out
> > whether the addresses which the faults happen on are really mapped, or
> > at least requested from the AMD IOMMU driver.
>
> How can I trace it?
Please apply the attached debug patch on top of Linux v4.3-rc3 and boot
the machine. After boot, run (as root):
# cat /sys/kernel/debug/tracing/trace_pipe > trace-data
Please run this in a separate shell and keep it running.
Then trigger the problem while the above command is running. Once you have
triggered it, please send me the (compressed) trace-data file, the full
dmesg, and the output of lspci on the box.
Please let me know if you have further questions.
Thanks,
Joerg
[-- Attachment #2: iommu-debug.patch --]
[-- Type: text/x-diff, Size: 2298 bytes --]
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index f82060e7..0002e79 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2465,6 +2465,7 @@ static dma_addr_t __map_single(struct device *dev,
{
dma_addr_t offset = paddr & ~PAGE_MASK;
dma_addr_t address, start, ret;
+ phys_addr_t old_paddr = paddr;
unsigned int pages;
unsigned long align_mask = 0;
int i;
@@ -2521,6 +2522,8 @@ retry:
domain_flush_pages(&dma_dom->domain, address, size);
out:
+ trace_printk("%s: mapped %llx paddr %llx size %zu\n",
+ dev_name(dev), address, old_paddr, size);
return address;
out_unmap:
@@ -2532,6 +2535,9 @@ out_unmap:
dma_ops_free_addresses(dma_dom, address, pages);
+ trace_printk("%s: return DMA_ERROR_CODE paddr %llx size %zu\n",
+ dev_name(dev), old_paddr, size);
+
return DMA_ERROR_CODE;
}
@@ -2628,6 +2634,8 @@ static void unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
spin_lock_irqsave(&domain->lock, flags);
+ trace_printk("%s: unmap dma_addr %llx size %zu\n",
+ dev_name(dev), dma_addr, size);
__unmap_single(domain->priv, dma_addr, size, dir);
domain_flush_complete(domain);
@@ -2683,9 +2691,13 @@ out:
return mapped_elems;
unmap:
for_each_sg(sglist, s, mapped_elems, i) {
- if (s->dma_address)
+ if (s->dma_address) {
+ trace_printk("%s: unmap dma_addr %llx size %u\n",
+ dev_name(dev), s->dma_address,
+ s->dma_length);
__unmap_single(domain->priv, s->dma_address,
s->dma_length, dir);
+ }
s->dma_address = s->dma_length = 0;
}
@@ -2716,6 +2728,9 @@ static void unmap_sg(struct device *dev, struct scatterlist *sglist,
spin_lock_irqsave(&domain->lock, flags);
for_each_sg(sglist, s, nelems, i) {
+ trace_printk("%s: unmap dma_addr %llx size %u\n",
+ dev_name(dev), s->dma_address, s->dma_length);
+
__unmap_single(domain->priv, s->dma_address,
s->dma_length, dir);
s->dma_address = s->dma_length = 0;
@@ -2813,6 +2828,9 @@ static void free_coherent(struct device *dev, size_t size,
spin_lock_irqsave(&domain->lock, flags);
+ trace_printk("%s: unmap dma_addr %llx size %zu\n",
+ dev_name(dev), dma_addr, size);
+
__unmap_single(domain->priv, dma_addr, size, DMA_BIDIRECTIONAL);
domain_flush_complete(domain);
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <20151006101356.GE12506-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
@ 2015-10-06 18:37 ` Andreas Hartmann
[not found] ` <56141507.7040103-YKS6W9RDU/w@public.gmane.org>
2015-10-07 15:40 ` Joerg Roedel
0 siblings, 2 replies; 35+ messages in thread
From: Andreas Hartmann @ 2015-10-06 18:37 UTC (permalink / raw)
To: Joerg Roedel
Cc: linux-pci, device-mapper development, Jens Axboe,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Linus Torvalds, Christoph Hellwig, Milan Broz
[-- Attachment #1: Type: text/plain, Size: 1383 bytes --]
On 10/06/2015 at 12:13 PM, Joerg Roedel wrote:
> On Wed, Sep 30, 2015 at 04:52:47PM +0200, Andreas Hartmann wrote:
>>> Alternatively, someone who can reproduce it should trace the calls to
>>> __map_single and __unmap_single in the AMD IOMMU driver to find out
>>> whether the addresses which the faults happen on are really mapped, or
>>> at least requested from the AMD IOMMU driver.
>>
>> How can I trace it?
>
> Please apply the attached debug patch on-top of Linux v4.3-rc3 and boot
> the machine. After boot you run (as root):
>
>
> # cat /sys/kernel/debug/tracing/trace_pipe > trace-data
>
> Please run this in a separate shell and keep it running.
>
> Then trigger the problem while the above command is running. When you
> triggered it, please send me the (compressed) trace-data file, full
> dmesg and output of lspci on the box.
Hmm, *seems* to work fine w/ 4.3-rc2. But I have to do some more tests
to be really sure.
W/ 4.1.10, the problem can be seen almost always during boot (systemd) -
but at that point it is difficult to trace. I have to take a closer look
to find a way to start the trace that early in the boot process.
But there is another problem w/ 4.3-rc2: Starting a VM w/ PCIe
passthrough doesn't work any more. I'm getting the attached null pointer
dereference and the machine hangs.
Thanks,
regards,
Andreas
[-- Attachment #2: trace --]
[-- Type: text/plain, Size: 21102 bytes --]
Oct 6 20:11:18 localhost kernel: [ 32.461794] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
Oct 6 20:11:18 localhost kernel: [ 32.461853] IP: [<ffffffff8147a8a4>] do_detach+0x24/0xa0
Oct 6 20:11:18 localhost kernel: [ 32.461888] PGD 0
Oct 6 20:11:18 localhost kernel: [ 32.461902] Oops: 0002 [#1] PREEMPT SMP
Oct 6 20:11:18 localhost kernel: [ 32.461929] Modules linked in: nf_log_ipv4 nf_log_common xt_LOG ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables vfio_iommu_type1 vfio_pci vfio vfio_virqfd drbg ansi_cprng nfsd lockd grace nfs_acl auth_rpcgss sunrpc bridge stp llc tun it87 hwmon_vid snd_hda_codec_hdmi kvm_amd snd_hda_codec_realtek kvm snd_hda_codec_generic fam15h_power usb_storage snd_hda_intel pcspkr serio_raw snd_hda_codec edac_core snd_hda_core edac_mce_amd k10temp snd_hwdep snd_pcm firewire_ohci snd_seq e100 firewire_core crc_itu_t amdkfd sp5100_tco amd_iommu_v2 i2c_piix4 mxm_wmi sr_mod cdrom radeon snd_timer snd_seq_device snd ttm drm_kms_helper xhci_pci drm r8169 xhci_hcd mii fb_sys_fops sysimgblt sysfillrect syscopyarea soundcore i2c_algo_bit shpchp tpm_infineon tpm_tis tpm fjes 8250_fintek wmi button acpi_cpufreq sg thermal xfs libcrc32c linear crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ohci_pci processor scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua raid456 async_raid6_recov async_pq async_xor xor async_memcpy async_tx raid6_pq raid10 raid1 raid0 md_mod dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 ata_generic pata_atiixp
Oct 6 20:11:18 localhost kernel: [ 32.462728] CPU: 0 PID: 9374 Comm: qemu-system-x86 Not tainted 4.3.0-rc2-4-desktop #1
Oct 6 20:11:18 localhost kernel: [ 32.462767] Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F14b 01/24/2013
Oct 6 20:11:18 localhost kernel: [ 32.462814] task: ffff8805f4ecc080 ti: ffff8805e1ec4000 task.ti: ffff8805e1ec4000
Oct 6 20:11:18 localhost kernel: [ 32.462851] RIP: 0010:[<ffffffff8147a8a4>] [<ffffffff8147a8a4>] do_detach+0x24/0xa0
Oct 6 20:11:18 localhost kernel: [ 32.462894] RSP: 0018:ffff8805e1ec7ca0 EFLAGS: 00010006
Oct 6 20:11:18 localhost kernel: [ 32.462922] RAX: 0000000000000000 RBX: ffff880614bef640 RCX: 00000000000000ff
Oct 6 20:11:18 localhost kernel: [ 32.462959] RDX: 0000000000000000 RSI: ffff88062e70c098 RDI: ffff880614bef640
Oct 6 20:11:18 localhost kernel: [ 32.462998] RBP: ffff8805e1ec7ca8 R08: ffff880614befc40 R09: 0000000000000000
Oct 6 20:11:18 localhost kernel: [ 32.463033] R10: 0000000000000000 R11: ffffffff81a58df8 R12: ffff880614befc40
Oct 6 20:11:18 localhost kernel: [ 32.463071] R13: ffff8806144b9858 R14: 0000000000000082 R15: ffff88062e70c098
Oct 6 20:11:18 localhost kernel: [ 32.463110] FS: 00007f27cc824b80(0000) GS:ffff88062ec00000(0000) knlGS:0000000000000000
Oct 6 20:11:18 localhost kernel: [ 32.463148] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 6 20:11:18 localhost kernel: [ 32.463177] CR2: 00000000000000b8 CR3: 00000000c8011000 CR4: 00000000000406f0
Oct 6 20:11:18 localhost kernel: [ 32.463211] Stack:
Oct 6 20:11:18 localhost kernel: [ 32.463223] ffff880614bef640 ffff8805e1ec7cd8 ffffffff8147a96c ffff880614befc40
Oct 6 20:11:18 localhost kernel: [ 32.463268] ffff88062e70c098 0000000000000286 ffff8806144b9800 ffff8805e1ec7d08
Oct 6 20:11:18 localhost kernel: [ 32.463310] ffffffff8147aaf5 ffff880615dd0bc0 ffff8805e8354a00 ffff880614befc40
Oct 6 20:11:18 localhost kernel: [ 32.463355] Call Trace:
Oct 6 20:11:18 localhost kernel: [ 32.463377] [<ffffffff8147a96c>] __detach_device+0x4c/0x80
Oct 6 20:11:18 localhost kernel: [ 32.463412] [<ffffffff8147aaf5>] detach_device+0x35/0xa0
Oct 6 20:11:18 localhost kernel: [ 32.463444] [<ffffffff8147b706>] amd_iommu_attach_device+0x66/0x2b0
Oct 6 20:11:18 localhost kernel: [ 32.463481] [<ffffffff81475d8e>] __iommu_attach_device+0x1e/0x80
Oct 6 20:11:18 localhost kernel: [ 32.463513] [<ffffffff81477013>] __iommu_attach_group+0x53/0x80
Oct 6 20:11:18 localhost kernel: [ 32.463547] [<ffffffff8147706b>] iommu_attach_group+0x2b/0x40
Oct 6 20:11:18 localhost kernel: [ 32.463583] [<ffffffffa07e9407>] vfio_iommu_type1_attach_group+0x187/0x4f8 [vfio_iommu_type1]
Oct 6 20:11:18 localhost kernel: [ 32.463655] [<ffffffffa07297e8>] vfio_fops_unl_ioctl+0x1b8/0x290 [vfio]
Oct 6 20:11:18 localhost kernel: [ 32.463699] [<ffffffff811f81cd>] do_vfs_ioctl+0x2cd/0x4c0
Oct 6 20:11:18 localhost kernel: [ 32.463740] [<ffffffff811f8439>] SyS_ioctl+0x79/0x90
Oct 6 20:11:18 localhost kernel: [ 32.463775] [<ffffffff816b3936>] entry_SYSCALL_64_fastpath+0x16/0x75
Oct 6 20:11:18 localhost kernel: [ 32.466223] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x16/0x75
Oct 6 20:11:18 localhost kernel: [ 32.466258]
Oct 6 20:11:18 localhost kernel: [ 32.466267] Leftover inexact backtrace:
Oct 6 20:11:18 localhost kernel: [ 32.466267]
Oct 6 20:11:18 localhost kernel: [ 32.466298] Code: 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 05 23 c7 d9 00 48 89 e5 53 0f b7 57 40 48 89 fb 48 8b 04 d0 48 63 50 10 48 8b 47 38 <83> ac 90 b8 00 00 00 01 48 8b 47 38 83 a8 b4 00 00 00 01 48 8b
Oct 6 20:11:18 localhost kernel: [ 32.466496] RIP [<ffffffff8147a8a4>] do_detach+0x24/0xa0
Oct 6 20:11:18 localhost kernel: [ 32.466527] RSP <ffff8805e1ec7ca0>
Oct 6 20:11:18 localhost kernel: [ 32.466547] CR2: 00000000000000b8
Oct 6 20:11:18 localhost kernel: [ 32.476984] ---[ end trace 09ac28af2000b365 ]---
Oct 6 20:11:18 localhost kernel: [ 32.477031] note: qemu-system-x86[9374] exited with preempt_count 2
Oct 6 20:11:19 localhost kernel: [ 32.577276] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000, dump hierarchy:
Oct 6 20:11:19 localhost kernel: [ 32.577279] ------ spte 0x5db618027 level 4.
Oct 6 20:11:19 localhost kernel: [ 32.577281] ------ spte 0x5db619027 level 3.
Oct 6 20:11:19 localhost kernel: [ 32.577281] ------ spte 0x5db61a027 level 2.
Oct 6 20:11:19 localhost kernel: [ 32.577282] ------ spte 0xffff0000000b8f6f level 1.
Oct 6 20:11:19 localhost kernel: [ 32.577283] ------------[ cut here ]------------
Oct 6 20:11:19 localhost kernel: [ 32.577301] WARNING: CPU: 2 PID: 9389 at ../arch/x86/kvm/mmu.c:3385 handle_mmio_page_fault.isra.85+0x2c/0x31 [kvm]()
Oct 6 20:11:19 localhost kernel: [ 32.577302] Modules linked in: nf_log_ipv4 nf_log_common xt_LOG ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables vfio_iommu_type1 vfio_pci vfio vfio_virqfd drbg ansi_cprng nfsd lockd grace nfs_acl auth_rpcgss sunrpc bridge stp llc tun it87 hwmon_vid snd_hda_codec_hdmi kvm_amd snd_hda_codec_realtek kvm snd_hda_codec_generic fam15h_power usb_storage snd_hda_intel pcspkr serio_raw snd_hda_codec edac_core snd_hda_core edac_mce_amd k10temp snd_hwdep snd_pcm firewire_ohci snd_seq e100 firewire_core crc_itu_t amdkfd sp5100_tco amd_iommu_v2 i2c_piix4 mxm_wmi sr_mod cdrom radeon snd_timer snd_seq_device snd ttm drm_kms_helper xhci_pci drm r8169 xhci_hcd mii fb_sys_fops sysimgblt sysfillrect syscopyarea soundcore i2c_algo_bit shpchp tpm_infineon tpm_tis tpm fjes 8250_fintek wmi button acpi_cpufreq sg thermal xfs libcrc32c linear crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ohci_pci processor scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua raid456 async_raid6_recov async_pq async_xor xor async_memcpy async_tx raid6_pq raid10 raid1 raid0 md_mod dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 ata_generic pata_atiixp
Oct 6 20:11:19 localhost kernel: [ 32.577357] CPU: 2 PID: 9389 Comm: qemu-system-x86 Tainted: G D 4.3.0-rc2-4-desktop #1
Oct 6 20:11:19 localhost kernel: [ 32.577358] Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F14b 01/24/2013
Oct 6 20:11:19 localhost kernel: [ 32.577360] ffffffffa083f3ce ffff8805e0f43ba0 ffffffff81368633 0000000000000000
Oct 6 20:11:19 localhost kernel: [ 32.577362] ffff8805e0f43bd8 ffffffff8106a628 ffff8805e2a70040 00000000000b8000
Oct 6 20:11:19 localhost kernel: [ 32.577364] 0000000000000000 000000000000000f 000000000000000f ffff8805e0f43be8
Oct 6 20:11:19 localhost kernel: [ 32.577366] Call Trace:
Oct 6 20:11:19 localhost kernel: [ 32.577374] [<ffffffff810085ee>] try_stack_unwind+0x17e/0x190
Oct 6 20:11:19 localhost kernel: [ 32.577379] [<ffffffff8100737f>] dump_trace+0x8f/0x3b0
Oct 6 20:11:19 localhost kernel: [ 32.577382] [<ffffffff8100864d>] show_trace_log_lvl+0x4d/0x60
Oct 6 20:11:19 localhost kernel: [ 32.577385] [<ffffffff810077a1>] show_stack_log_lvl+0x101/0x190
Oct 6 20:11:19 localhost kernel: [ 32.577387] [<ffffffff810086a5>] show_stack+0x25/0x50
Oct 6 20:11:19 localhost kernel: [ 32.577390] [<ffffffff81368633>] dump_stack+0x4b/0x78
Oct 6 20:11:19 localhost kernel: [ 32.577394] [<ffffffff8106a628>] warn_slowpath_common+0x88/0xc0
Oct 6 20:11:19 localhost kernel: [ 32.577397] [<ffffffff8106a71a>] warn_slowpath_null+0x1a/0x20
Oct 6 20:11:19 localhost kernel: [ 32.577408] [<ffffffffa08318eb>] handle_mmio_page_fault.isra.85+0x2c/0x31 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.577427] [<ffffffffa0816106>] tdp_page_fault+0x246/0x260 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.577441] [<ffffffffa080fbd4>] kvm_mmu_page_fault+0x24/0x110 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.577446] [<ffffffffa0cdbfc9>] pf_interception+0xc9/0x150 [kvm_amd]
Oct 6 20:11:19 localhost kernel: [ 32.577451] [<ffffffffa0cdf5a0>] handle_exit+0x180/0x9b0 [kvm_amd]
Oct 6 20:11:19 localhost kernel: [ 32.577462] [<ffffffffa0805dd9>] vcpu_enter_guest+0x769/0xde0 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.577475] [<ffffffffa080c62a>] kvm_arch_vcpu_ioctl_run+0x2da/0x400 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.577486] [<ffffffffa07f4d8f>] kvm_vcpu_ioctl+0x30f/0x5c0 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.577490] [<ffffffff811f81cd>] do_vfs_ioctl+0x2cd/0x4c0
Oct 6 20:11:19 localhost kernel: [ 32.577497] [<ffffffff811f8439>] SyS_ioctl+0x79/0x90
Oct 6 20:11:19 localhost kernel: [ 32.577500] [<ffffffff816b3936>] entry_SYSCALL_64_fastpath+0x16/0x75
Oct 6 20:11:19 localhost kernel: [ 32.578805] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x16/0x75
Oct 6 20:11:19 localhost kernel: [ 32.578806]
Oct 6 20:11:19 localhost kernel: [ 32.578806] Leftover inexact backtrace:
Oct 6 20:11:19 localhost kernel: [ 32.578806]
Oct 6 20:11:19 localhost kernel: [ 32.578808] ---[ end trace 09ac28af2000b366 ]---
Oct 6 20:11:19 localhost kernel: [ 32.655728] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000, dump hierarchy:
Oct 6 20:11:19 localhost kernel: [ 32.655731] ------ spte 0x5e6703027 level 4.
Oct 6 20:11:19 localhost kernel: [ 32.655732] ------ spte 0x5e5fb8027 level 3.
Oct 6 20:11:19 localhost kernel: [ 32.655733] ------ spte 0x5e5fb9027 level 2.
Oct 6 20:11:19 localhost kernel: [ 32.655734] ------ spte 0xffff0000000b8f67 level 1.
Oct 6 20:11:19 localhost kernel: [ 32.655735] ------------[ cut here ]------------
Oct 6 20:11:19 localhost kernel: [ 32.655764] WARNING: CPU: 2 PID: 9390 at ../arch/x86/kvm/mmu.c:3385 handle_mmio_page_fault.isra.85+0x2c/0x31 [kvm]()
Oct 6 20:11:19 localhost kernel: [ 32.655765] Modules linked in: nf_log_ipv4 nf_log_common xt_LOG ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables vfio_iommu_type1 vfio_pci vfio vfio_virqfd drbg ansi_cprng nfsd lockd grace nfs_acl auth_rpcgss sunrpc bridge stp llc tun it87 hwmon_vid snd_hda_codec_hdmi kvm_amd snd_hda_codec_realtek kvm snd_hda_codec_generic fam15h_power usb_storage snd_hda_intel pcspkr serio_raw snd_hda_codec edac_core snd_hda_core edac_mce_amd k10temp snd_hwdep snd_pcm firewire_ohci snd_seq e100 firewire_core crc_itu_t amdkfd sp5100_tco amd_iommu_v2 i2c_piix4 mxm_wmi sr_mod cdrom radeon snd_timer snd_seq_device snd ttm drm_kms_helper xhci_pci drm r8169 xhci_hcd mii fb_sys_fops sysimgblt sysfillrect syscopyarea soundcore i2c_algo_bit shpchp tpm_infineon tpm_tis tpm fjes 8250_fintek wmi button acpi_cpufreq sg thermal xfs libcrc32c linear crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ohci_pci processor scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua raid456 async_raid6_recov async_pq async_xor xor async_memcpy async_tx raid6_pq raid10 raid1 raid0 md_mod dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 ata_generic pata_atiixp
Oct 6 20:11:19 localhost kernel: [ 32.655823] CPU: 2 PID: 9390 Comm: qemu-system-x86 Tainted: G D W 4.3.0-rc2-4-desktop #1
Oct 6 20:11:19 localhost kernel: [ 32.655824] Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F14b 01/24/2013
Oct 6 20:11:19 localhost kernel: [ 32.655826] ffffffffa083f3ce ffff8805e221fba0 ffffffff81368633 0000000000000000
Oct 6 20:11:19 localhost kernel: [ 32.655828] ffff8805e221fbd8 ffffffff8106a628 ffff8805e1e30080 00000000000b8000
Oct 6 20:11:19 localhost kernel: [ 32.655830] 0000000000000000 000000000000000f 000000000000000f ffff8805e221fbe8
Oct 6 20:11:19 localhost kernel: [ 32.655832] Call Trace:
Oct 6 20:11:19 localhost kernel: [ 32.655840] [<ffffffff810085ee>] try_stack_unwind+0x17e/0x190
Oct 6 20:11:19 localhost kernel: [ 32.655845] [<ffffffff8100737f>] dump_trace+0x8f/0x3b0
Oct 6 20:11:19 localhost kernel: [ 32.655848] [<ffffffff8100864d>] show_trace_log_lvl+0x4d/0x60
Oct 6 20:11:19 localhost kernel: [ 32.655852] [<ffffffff810077a1>] show_stack_log_lvl+0x101/0x190
Oct 6 20:11:19 localhost kernel: [ 32.655864] [<ffffffff810086a5>] show_stack+0x25/0x50
Oct 6 20:11:19 localhost kernel: [ 32.655869] [<ffffffff81368633>] dump_stack+0x4b/0x78
Oct 6 20:11:19 localhost kernel: [ 32.655877] [<ffffffff8106a628>] warn_slowpath_common+0x88/0xc0
Oct 6 20:11:19 localhost kernel: [ 32.655881] [<ffffffff8106a71a>] warn_slowpath_null+0x1a/0x20
Oct 6 20:11:19 localhost kernel: [ 32.655894] [<ffffffffa08318eb>] handle_mmio_page_fault.isra.85+0x2c/0x31 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.655917] [<ffffffffa0816106>] tdp_page_fault+0x246/0x260 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.655940] [<ffffffffa080fbd4>] kvm_mmu_page_fault+0x24/0x110 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.655956] [<ffffffffa0cdbfc9>] pf_interception+0xc9/0x150 [kvm_amd]
Oct 6 20:11:19 localhost kernel: [ 32.655965] [<ffffffffa0cdf5a0>] handle_exit+0x180/0x9b0 [kvm_amd]
Oct 6 20:11:19 localhost kernel: [ 32.655983] [<ffffffffa0805dd9>] vcpu_enter_guest+0x769/0xde0 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.656000] [<ffffffffa080c62a>] kvm_arch_vcpu_ioctl_run+0x2da/0x400 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.656012] [<ffffffffa07f4d8f>] kvm_vcpu_ioctl+0x30f/0x5c0 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.656017] [<ffffffff811f81cd>] do_vfs_ioctl+0x2cd/0x4c0
Oct 6 20:11:19 localhost kernel: [ 32.656023] [<ffffffff811f8439>] SyS_ioctl+0x79/0x90
Oct 6 20:11:19 localhost kernel: [ 32.656028] [<ffffffff816b3936>] entry_SYSCALL_64_fastpath+0x16/0x75
Oct 6 20:11:19 localhost kernel: [ 32.657322] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x16/0x75
Oct 6 20:11:19 localhost kernel: [ 32.657324]
Oct 6 20:11:19 localhost kernel: [ 32.657324] Leftover inexact backtrace:
Oct 6 20:11:19 localhost kernel: [ 32.657324]
Oct 6 20:11:19 localhost kernel: [ 32.657327] ---[ end trace 09ac28af2000b367 ]---
Oct 6 20:11:19 localhost kernel: [ 32.715379] walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xb8000, dump hierarchy:
Oct 6 20:11:19 localhost kernel: [ 32.715382] ------ spte 0x5e6707027 level 4.
Oct 6 20:11:19 localhost kernel: [ 32.715383] ------ spte 0x5e5781027 level 3.
Oct 6 20:11:19 localhost kernel: [ 32.715384] ------ spte 0x5e2291027 level 2.
Oct 6 20:11:19 localhost kernel: [ 32.715385] ------ spte 0xffff0000000b8f67 level 1.
Oct 6 20:11:19 localhost kernel: [ 32.715386] ------------[ cut here ]------------
Oct 6 20:11:19 localhost kernel: [ 32.715405] WARNING: CPU: 2 PID: 9393 at ../arch/x86/kvm/mmu.c:3385 handle_mmio_page_fault.isra.85+0x2c/0x31 [kvm]()
Oct 6 20:11:19 localhost kernel: [ 32.715407] Modules linked in: nf_log_ipv4 nf_log_common xt_LOG ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables vfio_iommu_type1 vfio_pci vfio vfio_virqfd drbg ansi_cprng nfsd lockd grace nfs_acl auth_rpcgss sunrpc bridge stp llc tun it87 hwmon_vid snd_hda_codec_hdmi kvm_amd snd_hda_codec_realtek kvm snd_hda_codec_generic fam15h_power usb_storage snd_hda_intel pcspkr serio_raw snd_hda_codec edac_core snd_hda_core edac_mce_amd k10temp snd_hwdep snd_pcm firewire_ohci snd_seq e100 firewire_core crc_itu_t amdkfd sp5100_tco amd_iommu_v2 i2c_piix4 mxm_wmi sr_mod cdrom radeon snd_timer snd_seq_device snd ttm drm_kms_helper xhci_pci drm r8169 xhci_hcd mii fb_sys_fops sysimgblt sysfillrect syscopyarea soundcore i2c_algo_bit shpchp tpm_infineon tpm_tis tpm fjes 8250_fintek wmi button acpi_cpufreq sg thermal xfs libcrc32c linear crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ohci_pci processor scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua raid456 async_raid6_recov async_pq async_xor xor async_memcpy async_tx raid6_pq raid10 raid1 raid0 md_mod dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 ata_generic pata_atiixp
Oct 6 20:11:19 localhost kernel: [ 32.715461] CPU: 2 PID: 9393 Comm: qemu-system-x86 Tainted: G D W 4.3.0-rc2-4-desktop #1
Oct 6 20:11:19 localhost kernel: [ 32.715462] Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F14b 01/24/2013
Oct 6 20:11:19 localhost kernel: [ 32.715464] ffffffffa083f3ce ffff8805e93afba0 ffffffff81368633 0000000000000000
Oct 6 20:11:19 localhost kernel: [ 32.715466] ffff8805e93afbd8 ffffffff8106a628 ffff8805ec604140 00000000000b8000
Oct 6 20:11:19 localhost kernel: [ 32.715468] 0000000000000000 000000000000000f 000000000000000f ffff8805e93afbe8
Oct 6 20:11:19 localhost kernel: [ 32.715470] Call Trace:
Oct 6 20:11:19 localhost kernel: [ 32.715478] [<ffffffff810085ee>] try_stack_unwind+0x17e/0x190
Oct 6 20:11:19 localhost kernel: [ 32.715483] [<ffffffff8100737f>] dump_trace+0x8f/0x3b0
Oct 6 20:11:19 localhost kernel: [ 32.715486] [<ffffffff8100864d>] show_trace_log_lvl+0x4d/0x60
Oct 6 20:11:19 localhost kernel: [ 32.715488] [<ffffffff810077a1>] show_stack_log_lvl+0x101/0x190
Oct 6 20:11:19 localhost kernel: [ 32.715491] [<ffffffff810086a5>] show_stack+0x25/0x50
Oct 6 20:11:19 localhost kernel: [ 32.715494] [<ffffffff81368633>] dump_stack+0x4b/0x78
Oct 6 20:11:19 localhost kernel: [ 32.715498] [<ffffffff8106a628>] warn_slowpath_common+0x88/0xc0
Oct 6 20:11:19 localhost kernel: [ 32.715501] [<ffffffff8106a71a>] warn_slowpath_null+0x1a/0x20
Oct 6 20:11:19 localhost kernel: [ 32.715512] [<ffffffffa08318eb>] handle_mmio_page_fault.isra.85+0x2c/0x31 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.715532] [<ffffffffa0816106>] tdp_page_fault+0x246/0x260 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.715546] [<ffffffffa080fbd4>] kvm_mmu_page_fault+0x24/0x110 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.715551] [<ffffffffa0cdbfc9>] pf_interception+0xc9/0x150 [kvm_amd]
Oct 6 20:11:19 localhost kernel: [ 32.715556] [<ffffffffa0cdf5a0>] handle_exit+0x180/0x9b0 [kvm_amd]
Oct 6 20:11:19 localhost kernel: [ 32.715567] [<ffffffffa0805dd9>] vcpu_enter_guest+0x769/0xde0 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.715580] [<ffffffffa080c62a>] kvm_arch_vcpu_ioctl_run+0x2da/0x400 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.715590] [<ffffffffa07f4d8f>] kvm_vcpu_ioctl+0x30f/0x5c0 [kvm]
Oct 6 20:11:19 localhost kernel: [ 32.715595] [<ffffffff811f81cd>] do_vfs_ioctl+0x2cd/0x4c0
Oct 6 20:11:19 localhost kernel: [ 32.715602] [<ffffffff811f8439>] SyS_ioctl+0x79/0x90
Oct 6 20:11:19 localhost kernel: [ 32.715606] [<ffffffff816b3936>] entry_SYSCALL_64_fastpath+0x16/0x75
Oct 6 20:11:19 localhost kernel: [ 32.716901] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x16/0x75
Oct 6 20:11:19 localhost kernel: [ 32.716902]
Oct 6 20:11:19 localhost kernel: [ 32.716902] Leftover inexact backtrace:
Oct 6 20:11:19 localhost kernel: [ 32.716902]
Oct 6 20:11:19 localhost kernel: [ 32.716904] ---[ end trace 09ac28af2000b368 ]---
[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <56141507.7040103-YKS6W9RDU/w@public.gmane.org>
@ 2015-10-07 2:57 ` Andreas Hartmann
[not found] ` <56148A1B.5060506-YKS6W9RDU/w@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Andreas Hartmann @ 2015-10-07 2:57 UTC (permalink / raw)
To: Joerg Roedel
Cc: linux-pci, device-mapper development, Jens Axboe,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Linus Torvalds, Christoph Hellwig, Milan Broz
[-- Attachment #1: Type: text/plain, Size: 1539 bytes --]
On 10/06/2015 at 08:37 PM, Andreas Hartmann wrote:
> On 10/06/2015 at 12:13 PM, Joerg Roedel wrote:
>> On Wed, Sep 30, 2015 at 04:52:47PM +0200, Andreas Hartmann wrote:
>>>> Alternatively someone who can reproduce it should trace the calls to
>>>> __map_single and __unmap_single in the AMD IOMMU driver to find out
>>>> whether the addresses which the faults happen on are really mapped, or
>>>> at least requested from the AMD IOMMU driver.
>>>
>>> How can I trace it?
>>
>> Please apply the attached debug patch on-top of Linux v4.3-rc3 and boot
>> the machine. After boot you run (as root):
>>
>>
>> # cat /sys/kernel/debug/tracing/trace_pipe > trace-data
>>
>> Please run this in a separate shell and keep it running.
>>
>> Then trigger the problem while the above command is running. When you
>> triggered it, please send me the (compressed) trace-data file, full
>> dmesg and output of lspci on the box.
>
> Hmm, *seems* to work fine w/ 4.3-rc2. But I have to do some more tests
> to be really sure.
>
>
> W/ 4.1.10, the problem can be seen almost always during boot (systemd) -
> but at that point it is difficult to trace. I have to take a closer look
> to find a way to start the trace that early in the boot process.
Got it during a single mount (I booted with a massively reduced set of
mounts and did the remaining mounts manually afterwards; the problem
showed up during the second manual mount).
I attached the requested files. The mount starts at 80 seconds.
Hope this helps.
Thanks,
Andreas
[-- Attachment #2: dmesg.mount.gz --]
[-- Type: application/x-gzip, Size: 21873 bytes --]
[-- Attachment #3: trace.xz --]
[-- Type: application/x-xz, Size: 351048 bytes --]
[-- Attachment #4: Type: text/plain, Size: 0 bytes --]
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
2015-10-06 18:37 ` Andreas Hartmann
[not found] ` <56141507.7040103-YKS6W9RDU/w@public.gmane.org>
@ 2015-10-07 15:40 ` Joerg Roedel
2015-10-07 17:02 ` Andreas Hartmann
1 sibling, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2015-10-07 15:40 UTC (permalink / raw)
To: Andreas Hartmann
Cc: Mikulas Patocka, iommu, Leo Duran, Christoph Hellwig,
device-mapper development, Milan Broz, Jens Axboe, linux-pci,
Linus Torvalds
On Tue, Oct 06, 2015 at 08:37:59PM +0200, Andreas Hartmann wrote:
> But there is another problem w/ 4.3-rc2: Starting a VM w/ PCIe
> passthrough doesn't work any more. I'm getting the attached null pointer
> dereference and the machine hangs.
Weird, probably a do_detach call for a device that is already detached.
Anyway, I can't reproduce this here on my two AMD IOMMU machines. Can
you please boot the machine with amd_iommu_dump on the kernel command
line and send me dmesg after boot?
Also, which device are you trying to attach to the guest (pci
bus/device/function)?
Output of lspci might also be helpful.
Thanks,
Joerg
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <56148A1B.5060506-YKS6W9RDU/w@public.gmane.org>
@ 2015-10-07 16:10 ` Joerg Roedel
[not found] ` <20151007161022.GI28811-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2015-10-07 16:10 UTC (permalink / raw)
To: Andreas Hartmann
Cc: linux-pci, device-mapper development, Jens Axboe,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Linus Torvalds, Christoph Hellwig, Milan Broz
On Wed, Oct 07, 2015 at 04:57:31AM +0200, Andreas Hartmann wrote:
> Got it during a single mount (I booted with a massively reduced set of
> mounts and did the remaining mounts manually afterwards; the problem
> showed up during the second manual mount).
>
> I attached the requested files. The mount starts at 80 seconds.
> Hope this helps.
Okay, the lowest dma-addr the AMD IOMMU driver returns is 0x1000 and
the highest is 0x7ff4000. All fault addresses are outside of this range,
so the AMD IOMMU driver never returned these addresses.
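The range check described above can be reproduced with a small script over the trace-data file. This is only a sketch: the regex matches the `dma_addr %llx size %u` format printed by the debug patch, while the device name and the fault addresses in the sample data are made up for illustration.

```python
import re

# Sketch of the range check described above: collect every dma_addr the
# debug patch's trace_printk lines report ("dma_addr %llx size %u") and
# test whether the IO_PAGE_FAULT addresses fall inside that range.
MAP_RE = re.compile(r"dma_addr ([0-9a-f]+) size (\d+)")

def mapped_range(trace_lines):
    """Return (lowest, highest) dma_addr seen in the trace."""
    addrs = []
    for line in trace_lines:
        m = MAP_RE.search(line)
        if m:
            addrs.append(int(m.group(1), 16))
    return min(addrs), max(addrs)

def faults_outside(trace_lines, fault_addrs):
    """Fault addresses the IOMMU driver never handed out."""
    lo, hi = mapped_range(trace_lines)
    return [a for a in fault_addrs if not lo <= a <= hi]

# Made-up sample lines matching the numbers quoted above:
trace = [
    "sda: unmap dma_addr 1000 size 4096",
    "sda: unmap dma_addr 7ff4000 size 512",
]
print([hex(a) for a in faults_outside(trace, [0x2000, 0xfff81000])])
# prints ['0xfff81000']
```

A fault address outside the (lowest, highest) window is a strong hint that the device, not the IOMMU driver, originated the bad address.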
This doesn't mean that it is not at fault, but it still looks unlikely.
Maybe I can reproduce the problem here. Can you please tell me some
details about the partitions you mounted to trigger this?
I remember something about an xfs->lvm->dm_crypt->md_raid->sata setup, but
more details would help me to reproduce it.
Joerg
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <20151007161022.GI28811-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
@ 2015-10-07 16:52 ` Andreas Hartmann
2015-10-08 16:39 ` Joerg Roedel
0 siblings, 1 reply; 35+ messages in thread
From: Andreas Hartmann @ 2015-10-07 16:52 UTC (permalink / raw)
To: Joerg Roedel, Andreas Hartmann
Cc: linux-pci, device-mapper development, Jens Axboe,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Linus Torvalds, Christoph Hellwig, Milan Broz
On 10/07/2015 at 06:10 PM, Joerg Roedel wrote:
> On Wed, Oct 07, 2015 at 04:57:31AM +0200, Andreas Hartmann wrote:
>> Got it during a single mount (I booted with a massively reduced set of
>> mounts and did the remaining mounts manually afterwards; the problem
>> showed up during the second manual mount).
>>
>> I attached the requested files. The mount starts at 80 seconds.
>> Hope this helps.
>
> Okay, the lowest dma-addr the AMD IOMMU driver returns is 0x1000 and
> the highest is 0x7ff4000. All fault addresses are outside of this range,
> so the AMD IOMMU driver never returned these addresses.
>
> This doesn't mean that it is not at fault, but it still looks unlikely.
> Maybe I can reproduce the problem here. Can you please tell me some
> details about the partitions you mounted to trigger this?
>
> I remember something about an xfs->lvm->dm_crypt->md_raid->sata setup, but
> more details would help me to reproduce it.
See attachments in http://article.gmane.org/gmane.linux.kernel.pci/43975
To reproduce the error:
First I mounted /daten2, afterwards /raid/mt, which produces the errors.
The SSD mounts were already active (mounted during boot via fstab).
If I mount all of them during boot, the system usually drops into
emergency mode, which unfortunately is broken here.
Regards,
Andreas
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
2015-10-07 15:40 ` Joerg Roedel
@ 2015-10-07 17:02 ` Andreas Hartmann
2015-10-08 17:30 ` Joerg Roedel
0 siblings, 1 reply; 35+ messages in thread
From: Andreas Hartmann @ 2015-10-07 17:02 UTC (permalink / raw)
To: Joerg Roedel, Andreas Hartmann
Cc: Mikulas Patocka, iommu, Leo Duran, Christoph Hellwig,
device-mapper development, Milan Broz, Jens Axboe, linux-pci,
Linus Torvalds
[-- Attachment #1: Type: text/plain, Size: 997 bytes --]
On 10/07/2015 at 05:40 PM Joerg Roedel wrote:
> On Tue, Oct 06, 2015 at 08:37:59PM +0200, Andreas Hartmann wrote:
>> But there is another problem w/ 4.3-rc2: Starting a VM w/ PCIe
>> passthrough doesn't work any more. I'm getting the attached null pointer
>> dereference and the machine hangs.
>
> Weird, probably a do_detach call for a device that is already detached.
> Anyway, I can't reproduce this here on my two AMD IOMMU machines. Can
> you please boot the machine with amd_iommu_dump on the kernel command
> line and send me dmesg after boot?
Binding the device to vfio isn't a problem (it's done before the VM is
started). The problem occurs during startup of qemu-system-x86_64 (2.3.0).
The attached dmesg.out doesn't show the trace, but it does contain the
requested iommu dump.
> Also, which device are you trying to attach to the guest (pci
> bus/device/function)?
See attached ath9k.device.
> Output of lspci might also be helpful.
I attached lspci and dmesg.
Hope that helps,
thanks,
Andreas
[-- Attachment #2: dmesg.out.gz --]
[-- Type: application/x-gzip, Size: 20466 bytes --]
[-- Attachment #3: lspci.gz --]
[-- Type: application/x-gzip, Size: 2528 bytes --]
[-- Attachment #4: ath9k.device.gz --]
[-- Type: application/x-gzip, Size: 1567 bytes --]
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
2015-10-07 16:52 ` Andreas Hartmann
@ 2015-10-08 16:39 ` Joerg Roedel
[not found] ` <20151008163957.GK28811-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2015-10-08 16:39 UTC (permalink / raw)
To: Andreas Hartmann
Cc: Mikulas Patocka, iommu, Leo Duran, Christoph Hellwig,
device-mapper development, Milan Broz, Jens Axboe, linux-pci,
Linus Torvalds
On Wed, Oct 07, 2015 at 06:52:58PM +0200, Andreas Hartmann wrote:
> To reproduce the error:
> First I mounted /daten2, afterwards /raid/mt, which produces the errors.
> The SSD mounts were already active (mounted during boot via fstab).
Okay, I spent the day on that problem and managed to reproduce it here
on one of my AMD IOMMU boxes. It wasn't an easy journey, as I can only
reproduce it if I set up the crypto partition and everything above it
(like mounting the lvm volumes) _after_ the system has finished booting.
If everything is set up during system boot, it works fine and I don't see
any IO_PAGE_FAULTs.
I first tried kernel v4.3-rc4, to have it tested with a self-compiled
kernel. The problem didn't show up there, so I built 4.1.0, where it
showed up again. Something seems to have fixed the issue in the latest
kernels.
So I looked a little bit around at the commits that were merged into the
respective parts involved here, and found this one:
586b286 dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE
The problem fixed with this commit looks quite similar to what you have
seen (except that there was no IOMMU involved). So I cherry-picked that
commit on 4.1.0 and tested that. The problem was gone.
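A quick way to check whether that cap is in effect on a running system is to read the queue limit from sysfs. A minimal sketch; the helper name is made up and the dm device path is an example:

```shell
# show_seg_limit: print a block device's max_segment_size queue limit.
# With commit 586b286 applied, a dm-crypt device should report 4096
# (PAGE_SIZE on x86). The device path below is an example.
show_seg_limit() {
    q=$1/queue/max_segment_size
    if [ -r "$q" ]; then cat "$q"; else echo "n/a"; fi
}
show_seg_limit /sys/block/dm-0
```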
So it looks like it was a dm-crypt issue, the patch went into v4.3-rc3,
either that kernel or rc4 should fix the problem for you too. Can you
please verify this is fixed for you too with v4.3-rc4?
Thanks,
Joerg
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
2015-10-07 17:02 ` Andreas Hartmann
@ 2015-10-08 17:30 ` Joerg Roedel
[not found] ` <20151008173007.GL28811-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2015-10-08 17:30 UTC (permalink / raw)
To: Andreas Hartmann
Cc: Mikulas Patocka, iommu, Leo Duran, Christoph Hellwig,
device-mapper development, Milan Broz, Jens Axboe, linux-pci,
Linus Torvalds
On Wed, Oct 07, 2015 at 07:02:32PM +0200, Andreas Hartmann wrote:
> Binding the device to vfio isn't a problem (it's done before the vm is
> started). The problem occurs during start of qemu-system-x86_64 (2.3.0).
>
> The attached dmesg.out doesn't show the trace, but the desired iommu dump.
>
> > Also, which device are you trying to attach to the guest (pci
> > bus/device/function)?
>
> See attached ath9k.device.
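The vfio binding step mentioned in the quote above is normally done through sysfs. A sketch, not taken from this thread: the helper name and PCI address are placeholders, and the `driver_override` interface assumes kernel >= 3.16:

```shell
# bind_to_vfio: detach a PCI device from its current driver and hand
# it to vfio-pci via driver_override. Requires root; the address
# format is domain:bus:device.function, e.g. 0000:04:00.0.
bind_to_vfio() {
    addr=$1
    dev=/sys/bus/pci/devices/$addr
    # unbind from whatever driver currently owns the device, if any
    [ -e "$dev/driver" ] && echo "$addr" > "$dev/driver/unbind"
    echo vfio-pci > "$dev/driver_override"
    echo "$addr" > /sys/bus/pci/drivers_probe
}
# bind_to_vfio 0000:04:00.0   # example address
```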
Hmm, can you also test this again with v4.3-rc4, please? The device
you are attaching has its own group and no aliases, so I really can't
see how the trace could happen, and I can't reproduce it here either.
So I just want to make sure it is not a follow-on bug from the previous
problem.
Thanks,
Joerg
P.S.: When you build the kernel with debug symbols and the problem
occurs again, you can find out the source file and line where the
bug happened with
$ objdump -Dlz --start-address=<rip> vmlinux | head
Replace <rip> with the RIP from the kernel oops and the output
should show you where in the source the bug comes from. The
vmlinux file is in the kernel's build directory.
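The recipe above can be wrapped in a small helper. A sketch; the RIP value is a placeholder, and `addr2line` (also from binutils) is an alternative that prints the file:line directly:

```shell
# decode_rip: map a kernel oops RIP to a source location in vmlinux.
# Falls back to printing the command when the image is not readable.
decode_rip() {
    rip=$1 image=$2
    if command -v addr2line >/dev/null 2>&1 && [ -r "$image" ]; then
        addr2line -e "$image" "0x$rip"
    else
        echo "addr2line -e $image 0x$rip"
    fi
}
decode_rip ffffffff813d2f1e vmlinux   # RIP is a placeholder value
```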
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <20151008163957.GK28811-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
@ 2015-10-08 18:21 ` Andreas Hartmann
[not found] ` <5616B436.1000802-YKS6W9RDU/w@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Andreas Hartmann @ 2015-10-08 18:21 UTC (permalink / raw)
To: Joerg Roedel
Cc: linux-pci, device-mapper development, Jens Axboe,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Linus Torvalds, Christoph Hellwig, Milan Broz
On 10/08/2015 at 06:39 PM, Joerg Roedel wrote:
> On Wed, Oct 07, 2015 at 06:52:58PM +0200, Andreas Hartmann wrote:
>> To reproduce the error:
>> First I mounted /daten2, afterwards /raid/mt, which produces the errors.
> The ssd mounts were already active (mounted during boot via fstab).
>
> Okay, I spent the day on that problem, and managed to reproduce it here
> on one of my AMD IOMMU boxes. It wasn't an easy journey, as I can only
> reproduce it if I set up the crypto partition and everything above that
> (like mounting the lvm volumes) _after_ the system has finished booting.
> If everything is set up during system boot, it works fine and I don't see
> any IO_PAGE_FAULTS.
Thank you very much for spending so much of your time to reproduce the
problem!
> I also tried kernel v4.3-rc4 first, to have it tested with a
> self-compiled kernel. It didn't show up there, so I built a 4.1.0, where
> it showed up again. Something seems to have fixed the issue in the
> latest kernels.
>
> So I looked a little bit around at the commits that were merged into the
> respective parts involved here, and found this one:
>
> 586b286 dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE
>
> The problem fixed with this commit looks quite similar to what you have
> seen (except that there was no IOMMU involved). So I cherry-picked that
> commit on 4.1.0 and tested that. The problem was gone.
That's true - I already knew this patch and tested it some weeks ago -
unfortunately it doesn't fix the problem here.
To be really sure, I just retested it again. I couldn't see any
IO_PAGE_FAULTS errors today (unfortunately I can't remember anymore
whether I saw them a few weeks ago, too) - but the ata errors remain.
Therefore, this patch isn't a solution for the problem I encounter here.
> So it looks like it was a dm-crypt issue, the patch went into v4.3-rc3,
> either that kernel or rc4 should fix the problem for you too. Can you
> please verify this is fixed for you too with v4.3-rc4?
As I already wrote, I couldn't even see the problem with v4.3-rc2 any
more (as far as I was able to test because of the other problem). I have
to do some more tests now with this kernel to be really sure.
Kind regards,
Andreas
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <20151008173007.GL28811-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
@ 2015-10-08 18:59 ` Andreas Hartmann
[not found] ` <5616BCF4.10104-YKS6W9RDU/w@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Andreas Hartmann @ 2015-10-08 18:59 UTC (permalink / raw)
To: Joerg Roedel
Cc: linux-pci, device-mapper development, Jens Axboe,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Linus Torvalds, Christoph Hellwig, Milan Broz
On 10/08/2015 at 07:30 PM Joerg Roedel wrote:
> On Wed, Oct 07, 2015 at 07:02:32PM +0200, Andreas Hartmann wrote:
>> Binding the device to vfio isn't a problem (it's done before the vm is
>> started). The problem occurs during start of qemu-system-x86_64 (2.3.0).
>>
>> The attached dmesg.out doesn't show the trace, but the desired iommu dump.
>>
>>> Also, which device are you trying to attach to the guest (pci
>>> bus/device/function)?
>>
>> See attached ath9k.device.
>
> Hmm, can you also test this again with v4.3-rc4, please? The device
> you are attaching has its own group and no aliases, so I really can't
> see how the trace could happen, and I can't reproduce it here either.
Unchanged - this time the machine hard locked and there was no trace at
all, because of data loss after reboot :-(.
Btw: Linux 4.2 doesn't show this problem.
Nevertheless I'll try to get a trace - maybe I'll be lucky and the
machine won't lock up completely next time :-).
Regards,
Andreas
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <5616BCF4.10104-YKS6W9RDU/w@public.gmane.org>
@ 2015-10-08 19:47 ` Andreas Hartmann
2015-10-09 10:40 ` Joerg Roedel
[not found] ` <5616C850.2000906-YKS6W9RDU/w@public.gmane.org>
0 siblings, 2 replies; 35+ messages in thread
From: Andreas Hartmann @ 2015-10-08 19:47 UTC (permalink / raw)
To: Joerg Roedel
Cc: linux-pci, device-mapper development, Jens Axboe,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Linus Torvalds, Christoph Hellwig, Milan Broz
[-- Attachment #1: Type: text/plain, Size: 1385 bytes --]
On 10/08/2015 at 08:59 PM, Andreas Hartmann wrote:
> On 10/08/2015 at 07:30 PM Joerg Roedel wrote:
>> On Wed, Oct 07, 2015 at 07:02:32PM +0200, Andreas Hartmann wrote:
>>> Binding the device to vfio isn't a problem (it's done before the vm is
>>> started). The problem occurs during start of qemu-system-x86_64 (2.3.0).
>>>
>>> The attached dmesg.out doesn't show the trace, but the desired iommu dump.
>>>
>>>> Also, which device are you trying to attach to the guest (pci
>>>> bus/device/function)?
>>>
>>> See attached ath9k.device.
>>
>> Hmm, can you also test this again with v4.3-rc4, please? The device
>> you are attaching has its own group and no aliases, so I really can't
>> see how the trace could happen, and I can't reproduce it here either.
>
> Unchanged - this time the machine hard locked and there was no trace at
> all, because of data loss after reboot :-(.
>
> Btw: Linux 4.2 doesn't show this problem.
>
> Nevertheless I'll try to get a trace - maybe I'll be lucky and the
> machine won't lock up completely next time :-).
Got it. I attached the complete oops and the output of objdump.
Kernel was linux 4.3-rc4
This time, the oops was caused by the second PCI card I'm passing
through to another VM (the ath9k card worked fine this time - coincidence?).
I added the lspci output to the attached file, too.
Thanks,
regards,
Andreas
[-- Attachment #2: oops.gz --]
[-- Type: application/x-gzip, Size: 2833 bytes --]
[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <5616B436.1000802-YKS6W9RDU/w@public.gmane.org>
@ 2015-10-08 19:52 ` Andreas Hartmann
[not found] ` <5616C998.1010309-YKS6W9RDU/w@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Andreas Hartmann @ 2015-10-08 19:52 UTC (permalink / raw)
To: Joerg Roedel
Cc: linux-pci, device-mapper development, Jens Axboe,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Linus Torvalds, Christoph Hellwig, Milan Broz
On 10/08/2015 at 08:21 PM, Andreas Hartmann wrote:
> On 10/08/2015 at 06:39 PM, Joerg Roedel wrote:
>> On Wed, Oct 07, 2015 at 06:52:58PM +0200, Andreas Hartmann wrote:
>>> To reproduce the error:
>>> First I mounted /daten2, afterwards /raid/mt, which produces the errors.
>>> The ssd mounts were already active (mounted during boot via fstab).
>>
>> Okay, I spent the day on that problem, and managed to reproduce it here
>> on one of my AMD IOMMU boxes. It wasn't an easy journey, as I can only
>> reproduce it if I set up the crypto partition and everything above that
>> (like mounting the lvm volumes) _after_ the system has finished booting.
>> If everything is set up during system boot, it works fine and I don't see
>> any IO_PAGE_FAULTS.
>
> Thank you very much for spending so much of your time to reproduce the
> problem!
>
>> I also tried kernel v4.3-rc4 first, to have it tested with a
>> self-compiled kernel. It didn't show up there, so I built a 4.1.0, where
>> it showed up again. Something seems to have fixed the issue in the
>> latest kernels.
>>
>> So I looked a little bit around at the commits that were merged into the
>> respective parts involved here, and found this one:
>>
>> 586b286 dm crypt: constrain crypt device's max_segment_size to
>> PAGE_SIZE
>>
>> The problem fixed with this commit looks quite similar to what you have
>> seen (except that there was no IOMMU involved). So I cherry-picked that
>> commit on 4.1.0 and tested that. The problem was gone.
>
> That's true - I already knew this patch and tested it some weeks ago -
> unfortunately it doesn't fix the problem here.
>
> To be really sure, I just retested it again. I couldn't see any
> IO_PAGE_FAULTS errors today (unfortunately I can't remember anymore
> whether I saw them a few weeks ago, too) - but the ata errors remain.
> Therefore, this patch isn't a solution for the problem I encounter here.
>
>> So it looks like it was a dm-crypt issue, the patch went into v4.3-rc3,
>> either that kernel or rc4 should fix the problem for you too. Can you
>> please verify this is fixed for you too with v4.3-rc4?
>
> As I already wrote, I couldn't even see the problem with v4.3-rc2 any
> more (as far as I was able to test because of the other problem). I have
> to do some more tests now with this kernel to be really sure.
I now tested w/ v4.3-rc4. I couldn't see any IO_PAGE_FAULTS but the ata
errors remain. The ata errors can be easily triggered by copying a large
file (> 4 GB) from one partition on the raid to another partition on the
raid.
Thanks,
Andreas
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <5616C998.1010309-YKS6W9RDU/w@public.gmane.org>
@ 2015-10-09 5:20 ` Andreas Hartmann
[not found] ` <56174EA6.7000106-YKS6W9RDU/w@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Andreas Hartmann @ 2015-10-09 5:20 UTC (permalink / raw)
To: Joerg Roedel
Cc: linux-pci, device-mapper development, Jens Axboe,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Linus Torvalds, Christoph Hellwig, Milan Broz
On 10/08/2015 at 09:52 PM, Andreas Hartmann wrote:
> On 10/08/2015 at 08:21 PM, Andreas Hartmann wrote:
>> On 10/08/2015 at 06:39 PM, Joerg Roedel wrote:
>>> On Wed, Oct 07, 2015 at 06:52:58PM +0200, Andreas Hartmann wrote:
>>>> To reproduce the error:
>>>> First I mounted /daten2, afterwards /raid/mt, which produces the errors.
>>>> The ssd mounts were already active (mounted during boot via fstab).
>>>
>>> Okay, I spent the day on that problem, and managed to reproduce it here
>>> on one of my AMD IOMMU boxes. It wasn't an easy journey, as I can only
>>> reproduce it if I set up the crypto partition and everything above that
>>> (like mounting the lvm volumes) _after_ the system has finished booting.
>>> If everything is set up during system boot, it works fine and I don't see
>>> any IO_PAGE_FAULTS.
>>
>> Thank you very much for spending so much of your time to reproduce the
>> problem!
>>
>>> I also tried kernel v4.3-rc4 first, to have it tested with a
>>> self-compiled kernel. It didn't show up there, so I built a 4.1.0, where
>>> it showed up again. Something seems to have fixed the issue in the
>>> latest kernels.
>>>
>>> So I looked a little bit around at the commits that were merged into the
>>> respective parts involved here, and found this one:
>>>
>>> 586b286 dm crypt: constrain crypt device's max_segment_size to
>>> PAGE_SIZE
>>>
>>> The problem fixed with this commit looks quite similar to what you have
>>> seen (except that there was no IOMMU involved). So I cherry-picked that
>>> commit on 4.1.0 and tested that. The problem was gone.
>>
>> That's true - I already knew this patch and tested it some weeks ago -
>> unfortunately it doesn't fix the problem here.
>>
>> To be really sure, I just retested it again. I couldn't see any
>> IO_PAGE_FAULTS errors today (unfortunately I can't remember anymore
>> whether I saw them a few weeks ago, too) - but the ata errors remain.
>> Therefore, this patch isn't a solution for the problem I encounter here.
>>
>>> So it looks like it was a dm-crypt issue, the patch went into v4.3-rc3,
>>> either that kernel or rc4 should fix the problem for you too. Can you
>>> please verify this is fixed for you too with v4.3-rc4?
>>
>> As I already wrote, I couldn't even see the problem with v4.3-rc2 any
>> more (as far as I was able to test because of the other problem). I have
>> to do some more tests now with this kernel to be really sure.
>
> I now tested w/ v4.3-rc4. I couldn't see any IO_PAGE_FAULTS but the ata
> errors remain. The ata errors can be easily triggered by copying a large
> file (> 4 GB) from one partition on the raid to another partition on the
> raid.
Hmmm, I retested this morning w/ v4.3-rc4 and 4.1.10 (with the
above-mentioned patch applied) - and now I didn't get any more ata errors.
I'm confused now. The only difference between yesterday evening and this
morning is that the machine was completely powered off overnight (via a
switched socket outlet). Could this really be the reason? Let's wait and
see whether this state persists ...
But the other new error w/ 4.3-rc2 or rc4 while starting a VM with PCI
passthrough remains even this morning :-(. It would have been nice if it
had gone away overnight, too ...
Thanks,
regards,
Andreas
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <56174EA6.7000106-YKS6W9RDU/w@public.gmane.org>
@ 2015-10-09 9:15 ` Andreas Hartmann
[not found] ` <56178599.6010807-YKS6W9RDU/w@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Andreas Hartmann @ 2015-10-09 9:15 UTC (permalink / raw)
To: Joerg Roedel
Cc: linux-pci, device-mapper development, Jens Axboe,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Linus Torvalds, Christoph Hellwig, Milan Broz
On 10/09/2015 at 07:20 AM, Andreas Hartmann wrote:
> On 10/08/2015 at 09:52 PM, Andreas Hartmann wrote:
>> On 10/08/2015 at 08:21 PM, Andreas Hartmann wrote:
>>> On 10/08/2015 at 06:39 PM, Joerg Roedel wrote:
>>>> On Wed, Oct 07, 2015 at 06:52:58PM +0200, Andreas Hartmann wrote:
>>>>> To reproduce the error:
>>>>> First I mounted /daten2, afterwards /raid/mt, which produces the errors.
>>>>> The ssd mounts were already active (mounted during boot via fstab).
>>>>
>>>> Okay, I spent the day on that problem, and managed to reproduce it here
>>>> on one of my AMD IOMMU boxes. It wasn't an easy journey, as I can only
>>>> reproduce it if I set up the crypto partition and everything above that
>>>> (like mounting the lvm volumes) _after_ the system has finished booting.
>>>> If everything is set up during system boot, it works fine and I don't see
>>>> any IO_PAGE_FAULTS.
>>>
>>> Thank you very much for spending so much of your time to reproduce the
>>> problem!
>>>
>>>> I also tried kernel v4.3-rc4 first, to have it tested with a
>>>> self-compiled kernel. It didn't show up there, so I built a 4.1.0, where
>>>> it showed up again. Something seems to have fixed the issue in the
>>>> latest kernels.
>>>>
>>>> So I looked a little bit around at the commits that were merged into the
>>>> respective parts involved here, and found this one:
>>>>
>>>> 586b286 dm crypt: constrain crypt device's max_segment_size to
>>>> PAGE_SIZE
>>>>
>>>> The problem fixed with this commit looks quite similar to what you have
>>>> seen (except that there was no IOMMU involved). So I cherry-picked that
>>>> commit on 4.1.0 and tested that. The problem was gone.
>>>
>>> That's true - I already knew this patch and tested it some weeks ago -
>>> unfortunately it doesn't fix the problem here.
>>>
>>> To be really sure, I just retested it again. I couldn't see any
>>> IO_PAGE_FAULTS errors today (unfortunately I can't remember anymore
>>> whether I saw them a few weeks ago, too) - but the ata errors remain.
>>> Therefore, this patch isn't a solution for the problem I encounter here.
>>>
>>>> So it looks like it was a dm-crypt issue, the patch went into v4.3-rc3,
>>>> either that kernel or rc4 should fix the problem for you too. Can you
>>>> please verify this is fixed for you too with v4.3-rc4?
>>>
>>> As I already wrote, I couldn't even see the problem with v4.3-rc2 any
>>> more (as far as I was able to test because of the other problem). I have
>>> to do some more tests now with this kernel to be really sure.
>>
>> I now tested w/ v4.3-rc4. I couldn't see any IO_PAGE_FAULTS but the ata
>> errors remain. The ata errors can be easily triggered by copying a large
>> file (> 4 GB) from one partition on the raid to another partition on the
>> raid.
>
> Hmmm, I retested this morning w/ v4.3-rc4 and 4.1.10 (with the
> above-mentioned patch applied) - and now I didn't get any more ata errors.
>
> I'm confused now. The only difference between yesterday evening and this
> morning is that the machine was completely powered off overnight (via a
> switched socket outlet). Could this really be the reason? Let's wait and
> see whether this state persists ...
No - it is not a persistent state. The ata errors are back again (in
4.1.10 w/ the above-mentioned patch applied). It just isn't as easy to
trigger them any more. After a short intermission w/ a power off/on
cycle, the error came up again during the first test copy. This means:
there must be something else broken.
If I revert the original culprit of all of the problems (block: remove
artifical max_hw_sectors cap), it is possible to increase max_sectors_kb
to 1024 - any higher value leads to ata or IO_PAGE_FAULTS sooner or later.
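The max_sectors_kb knob mentioned above lives in sysfs. A sketch for inspecting and pinning it; the helper name is made up, the device name is an example, and writing requires root:

```shell
# get_max_sectors_kb: print the current request-size cap (in KiB) for a
# block device, or "n/a" when the device does not exist.
get_max_sectors_kb() {
    cat "/sys/block/$1/queue/max_sectors_kb" 2>/dev/null || echo "n/a"
}
get_max_sectors_kb sda
# Pinning the cap to 1024 KiB, the workaround described above (as root):
# echo 1024 > /sys/block/sda/queue/max_sectors_kb
```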
v4.3-rc4 isn't usable at all for me as long as it hangs the machine on
the necessary PCI passthrough for VMs (I need them).
Regards,
Andreas
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
2015-10-08 19:47 ` Andreas Hartmann
@ 2015-10-09 10:40 ` Joerg Roedel
[not found] ` <5616C850.2000906-YKS6W9RDU/w@public.gmane.org>
1 sibling, 0 replies; 35+ messages in thread
From: Joerg Roedel @ 2015-10-09 10:40 UTC (permalink / raw)
To: Andreas Hartmann
Cc: Mikulas Patocka, iommu, Leo Duran, Christoph Hellwig,
device-mapper development, Milan Broz, Jens Axboe, linux-pci,
Linus Torvalds
On Thu, Oct 08, 2015 at 09:47:28PM +0200, Andreas Hartmann wrote:
> Got it. I attached the complete oops and the output of objdump.
>
> Kernel was linux 4.3-rc4
>
>
> This time, the oops was caused by the second PCI card I'm passing
> through to another VM (the ath9k card worked fine this time - coincidence?).
> I added the lspci output to the attached file, too.
Okay, thanks, this makes more sense to me. It looks like you are
attaching a 32-bit PCI device, which has an alias. This is definitely a
bug in the AMD IOMMU driver and I have an idea how to fix it. I'll look
into this after lunch.
Joerg
* [PATCH] iommu/amd: Fix NULL pointer deref on device detach
[not found] ` <5616C850.2000906-YKS6W9RDU/w@public.gmane.org>
@ 2015-10-09 14:45 ` Joerg Roedel
2015-10-09 17:42 ` Andreas Hartmann
0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2015-10-09 14:45 UTC (permalink / raw)
To: Andreas Hartmann
Cc: linux-pci, device-mapper development, Jens Axboe,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Linus Torvalds, Christoph Hellwig, Milan Broz
Hi Andreas,
On Thu, Oct 08, 2015 at 09:47:28PM +0200, Andreas Hartmann wrote:
> This time, the oops was caused by the second PCI card I'm passing
> through to another VM (the ath9k card worked fine this time - coincidence?).
> I added the lspci output to the attached file, too.
I dug around a bit here, found a 32-bit PCI card, and plugged it into
the AMD IOMMU box. I could reproduce the problem, and here is a patch
which fixes it for me. Can you please test it too? I'd like to
send a pull request with this fix included to Linus for rc5.
Thanks,
Joerg
From d07307c04edffaaa045fb83713f8808e55ffa895 Mon Sep 17 00:00:00 2001
From: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
Date: Fri, 9 Oct 2015 16:23:33 +0200
Subject: [PATCH] iommu/amd: Fix NULL pointer deref on device detach
When a device group is detached from its domain, the iommu
core code calls into the iommu driver to detach each device
individually.
Before this functionality went into the iommu core code, it
was implemented in the drivers, also in the AMD IOMMU
driver as the device alias handling code.
This code is still present, as there might be aliases that
don't exist as real PCI devices (and are therefore invisible
to the iommu core code).
Unfortunately it might happen now that a device is unbound
multiple times from its domain, first by the alias handling
code and then by the iommu core code (or vice versa).
This ends up in the do_detach function which dereferences
the dev_data->domain pointer. When the device is already
detached, this pointer is NULL and we get a kernel oops.
Removing the alias code completely is not an option, as that
would also remove the code which handles invisible aliases.
The code could be simplified, but this is too big of a
change outside the merge window.
For now, just check the dev_data->domain pointer in
do_detach and bail out if it is NULL.
Reported-by: Andreas Hartmann <andihartmann-KuiJ5kEpwI6ELgA04lAiVw@public.gmane.org>
Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org>
---
drivers/iommu/amd_iommu.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index f82060e7..08d2775 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2006,6 +2006,15 @@ static void do_detach(struct iommu_dev_data *dev_data)
{
struct amd_iommu *iommu;
+ /*
+ * First check if the device is still attached. It might already
+ * be detached from its domain because the generic
+ * iommu_detach_group code detached it and we try again here in
+ * our alias handling.
+ */
+ if (!dev_data->domain)
+ return;
+
iommu = amd_iommu_rlookup_table[dev_data->devid];
/* decrease reference counters */
--
2.5.1
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <56178599.6010807-YKS6W9RDU/w@public.gmane.org>
@ 2015-10-09 14:59 ` Joerg Roedel
[not found] ` <20151009145951.GC27420-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Joerg Roedel @ 2015-10-09 14:59 UTC (permalink / raw)
To: Andreas Hartmann
Cc: linux-pci, device-mapper development, Jens Axboe,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Linus Torvalds, Christoph Hellwig, Milan Broz
On Fri, Oct 09, 2015 at 11:15:05AM +0200, Andreas Hartmann wrote:
> v4.3-rc4 isn't usable at all for me as long as it hangs the machine on
> the necessary PCI passthrough for VMs (I need them).
If the fix I just sent you works, could you please test this again with
a (patched) v4.3-rc4 kernel?
Thanks,
Joerg
* Re: [PATCH] iommu/amd: Fix NULL pointer deref on device detach
2015-10-09 14:45 ` [PATCH] iommu/amd: Fix NULL pointer deref on device detach " Joerg Roedel
@ 2015-10-09 17:42 ` Andreas Hartmann
0 siblings, 0 replies; 35+ messages in thread
From: Andreas Hartmann @ 2015-10-09 17:42 UTC (permalink / raw)
To: Joerg Roedel
Cc: Mikulas Patocka, iommu, Leo Duran, Christoph Hellwig,
device-mapper development, Milan Broz, Jens Axboe, linux-pci,
Linus Torvalds
Hello Jörg,
On 10/09/2015 at 04:45 PM, Joerg Roedel wrote:
> Hi Andreas,
>
> On Thu, Oct 08, 2015 at 09:47:28PM +0200, Andreas Hartmann wrote:
>> This time, the oops was caused by the second PCI card I'm passing
>> through to another VM (the ath9k card worked fine this time - coincidence?).
>> I added the lspci output to the attached file, too.
>
> I dug around a bit here, found a 32-bit PCI card, and plugged it into
> the AMD IOMMU box. I could reproduce the problem, and here is a
> patch which fixes it for me. Can you please test it too?
Works fine here. Thanks.
But now I can see the next big problem with v4.3-rc4: my VMs are
connected on the host via tun/tap devices, which in turn are connected
to a bridge device. If data is sent between the VMs, the (system) load
is more than double (!) the load seen with 4.1.x, for example. An
overall throughput of only about 35 Mbit/s, as seen by the host across
all tun/tap devices / the bridge, creates a load of 3 (!!) with 3 VMs
involved. That's about half of the load produced by compiling a kernel
(make -j8).
> I'd like to
> send a pull request with this fix included to Linus for rc5.
>
> Thanks,
>
> Joerg
>
> From d07307c04edffaaa045fb83713f8808e55ffa895 Mon Sep 17 00:00:00 2001
> From: Joerg Roedel <jroedel@suse.de>
> Date: Fri, 9 Oct 2015 16:23:33 +0200
> Subject: [PATCH] iommu/amd: Fix NULL pointer deref on device detach
>
> When a device group is detached from its domain, the iommu
> core code calls into the iommu driver to detach each device
> individually.
>
> Before this functionality went into the iommu core code, it
> was implemented in the drivers, also in the AMD IOMMU
> driver as the device alias handling code.
>
> This code is still present, as there might be aliases that
> don't exist as real PCI devices (and are therefore invisible
> to the iommu core code).
>
> Unfortunately it might happen now that a device is unbound
> multiple times from its domain, first by the alias handling
> code and then by the iommu core code (or vice versa).
>
> This ends up in the do_detach function which dereferences
> the dev_data->domain pointer. When the device is already
> detached, this pointer is NULL and we get a kernel oops.
>
> Removing the alias code completely is not an option, as that
> would also remove the code which handles invisible aliases.
> The code could be simplified, but this is too big of a
> change outside the merge window.
>
> For now, just check the dev_data->domain pointer in
> do_detach and bail out if it is NULL.
>
> Reported-by: Andreas Hartmann <andihartmann@freenet.de>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
> drivers/iommu/amd_iommu.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index f82060e7..08d2775 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2006,6 +2006,15 @@ static void do_detach(struct iommu_dev_data *dev_data)
> {
> struct amd_iommu *iommu;
>
> + /*
> + * First check if the device is still attached. It might already
> + * be detached from its domain because the generic
> + * iommu_detach_group code detached it and we try again here in
> + * our alias handling.
> + */
> + if (!dev_data->domain)
> + return;
> +
> iommu = amd_iommu_rlookup_table[dev_data->devid];
>
> /* decrease reference counters */
>
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <20151009145951.GC27420-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
@ 2015-10-09 17:46 ` Andreas Hartmann
[not found] ` <5617FD6E.70802-YKS6W9RDU/w@public.gmane.org>
2015-10-12 12:34 ` Mikulas Patocka
0 siblings, 2 replies; 35+ messages in thread
From: Andreas Hartmann @ 2015-10-09 17:46 UTC (permalink / raw)
To: Joerg Roedel, Andreas Hartmann
Cc: linux-pci, device-mapper development, Jens Axboe,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Linus Torvalds, Christoph Hellwig, Milan Broz
Hello Jörg,
On 10/09/2015 at 04:59 PM, Joerg Roedel wrote:
> On Fri, Oct 09, 2015 at 11:15:05AM +0200, Andreas Hartmann wrote:
>> v4.3-rc4 isn't usable at all for me as long as it hangs the machine on
>> the necessary PCI passthrough for VMs (I need them).
>
> If the fix I just sent you works, could you please test this again with
> a (patched) v4.3-rc4 kernel?
Your IOMMU patch works fine - but the ata problem can be seen here, too.
Same behavior as with 4.1.10.
Thanks,
regards,
Andreas
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
[not found] ` <5617FD6E.70802-YKS6W9RDU/w@public.gmane.org>
@ 2015-10-11 12:23 ` Andreas Hartmann
2015-10-12 12:07 ` Andreas Hartmann
0 siblings, 1 reply; 35+ messages in thread
From: Andreas Hartmann @ 2015-10-11 12:23 UTC (permalink / raw)
To: Joerg Roedel, James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk,
Hannes Reinecke
Cc: linux-pci, device-mapper development, Jens Axboe,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Mikulas Patocka, Linus Torvalds, Christoph Hellwig, Milan Broz
On 10/09/2015 at 07:46 PM, Andreas Hartmann wrote:
> Hello Jörg,
>
> On 10/09/2015 at 04:59 PM, Joerg Roedel wrote:
>> On Fri, Oct 09, 2015 at 11:15:05AM +0200, Andreas Hartmann wrote:
>>> v4.3-rc4 isn't usable at all for me as long as it hangs the machine on
>>> the necessary PCI passthrough for VMs (I need them).
>>
>> If the fix I just sent you works, could you please test this again with
>> a (patched) v4.3-rc4 kernel?
>
> Your IOMMU patch works fine - but the ata problem can be seen here, too.
> Same behavior as with 4.1.10.
>
Ok, this patch seems to fix the ATA errors (I have run a lot of tests so
far w/ v4.1.10 - but I'm still cautious):
http://thread.gmane.org/gmane.linux.scsi/104141/focus=104267
It would be nice to have it in all kernels (and as a stable patch for 4.1.x, too).
Regards,
Andreas
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
2015-10-11 12:23 ` Andreas Hartmann
@ 2015-10-12 12:07 ` Andreas Hartmann
0 siblings, 0 replies; 35+ messages in thread
From: Andreas Hartmann @ 2015-10-12 12:07 UTC (permalink / raw)
To: Joerg Roedel, James.Bottomley, Hannes Reinecke
Cc: Mikulas Patocka, iommu, Leo Duran, Christoph Hellwig,
device-mapper development, Milan Broz, Jens Axboe, linux-pci,
Linus Torvalds
On 10/11/2015 at 02:23 PM, Andreas Hartmann wrote:
> On 10/09/2015 at 07:46 PM, Andreas Hartmann wrote:
>> Hello Jörg,
>>
>> On 10/09/2015 at 04:59 PM, Joerg Roedel wrote:
>>> On Fri, Oct 09, 2015 at 11:15:05AM +0200, Andreas Hartmann wrote:
>>>> v4.3-rc4 isn't usable at all for me as long as it hangs the machine on
>>>> the necessary PCI passthrough for VMs (I need them).
>>>
>>> If the fix I just sent you works, could you please test this again with
>>> a (patched) v4.3-rc4 kernel?
>>
>> Your IOMMU-patch works fine - but the ata-problem can be seen here, too.
>> Same behavior as with 4.1.10.
>>
>
> Ok, this patch seems to fix the ATA errors (I have run a lot of tests so
> far w/ v4.1.10 - but I'm still cautious):
>
> http://thread.gmane.org/gmane.linux.scsi/104141/focus=104267
-> Forget it - it doesn't fix it.
Regards,
Andreas
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0
2015-10-09 17:46 ` Andreas Hartmann
[not found] ` <5617FD6E.70802-YKS6W9RDU/w@public.gmane.org>
@ 2015-10-12 12:34 ` Mikulas Patocka
1 sibling, 0 replies; 35+ messages in thread
From: Mikulas Patocka @ 2015-10-12 12:34 UTC (permalink / raw)
To: Andreas Hartmann
Cc: Joerg Roedel, iommu, Leo Duran, Christoph Hellwig,
device-mapper development, Milan Broz, Jens Axboe, linux-pci,
Linus Torvalds
On Fri, 9 Oct 2015, Andreas Hartmann wrote:
> Hello Jörg,
>
> On 10/09/2015 at 04:59 PM, Joerg Roedel wrote:
> > On Fri, Oct 09, 2015 at 11:15:05AM +0200, Andreas Hartmann wrote:
> >> v4.3-rc4 isn't usable at all for me as long as it hangs the machine on
> >> the necessary PCI passthrough for VMs (I need them).
> >
> > If the fix I just sent you works, could you please test this again with
> > a (patched) v4.3-rc4 kernel?
>
> Your IOMMU-patch works fine - but the ata-problem can be seen here, too.
> Same behavior as with 4.1.10.
Could you try another ATA disk? (Copy the whole filesystem to it and run
the same test.)
It may be a bug in the disk's firmware.
Mikulas
^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2015-10-12 12:34 UTC | newest]
Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <55B7BEA2.30205@01019freenet.de>
[not found] ` <20150728175054.GB24782@redhat.com>
[not found] ` <55B7D054.4070308@maya.org>
[not found] ` <20150728192908.GA25264@redhat.com>
[not found] ` <55BCD5A7.2080708@maya.org>
[not found] ` <55BE1D5E.6020709@maya.org>
2015-08-02 17:57 ` [dm-devel] AMD-Vi IO_PAGE_FAULTs and ata3.00: failed command: READ FPDMA QUEUED errors since Linux 4.0 Mikulas Patocka
[not found] ` <alpine.LRH.2.02.1508021347480.17729-Hpncn10jQN4oNljnaZt3ZvA+iT7yCHsGwRM8/txMwJMAicBL8TP8PQ@public.gmane.org>
2015-08-02 18:48 ` Andreas Hartmann
2015-08-03 8:12 ` Joerg Roedel
2015-08-04 14:47 ` Mike Snitzer
2015-08-04 16:10 ` Jeff Moyer
[not found] ` <x4937zzm3uc.fsf-RRHT56Q3PSP4kTEheFKJxxDDeQx5vsVwAInAS/Ez/D0@public.gmane.org>
2015-08-04 18:11 ` Andreas Hartmann
2015-08-07 6:04 ` Andreas Hartmann
2015-09-20 6:50 ` [dm-devel] " Andreas Hartmann
[not found] ` <55FE5740.2060701-YKS6W9RDU/w@public.gmane.org>
2015-09-29 15:21 ` Joerg Roedel
[not found] ` <20150929152100.GL3036-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2015-09-29 15:58 ` Mikulas Patocka
2015-09-29 16:20 ` Joerg Roedel
[not found] ` <20150929162042.GR3036-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2015-09-30 14:52 ` Andreas Hartmann
2015-10-06 10:13 ` Joerg Roedel
[not found] ` <20151006101356.GE12506-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2015-10-06 18:37 ` Andreas Hartmann
[not found] ` <56141507.7040103-YKS6W9RDU/w@public.gmane.org>
2015-10-07 2:57 ` Andreas Hartmann
[not found] ` <56148A1B.5060506-YKS6W9RDU/w@public.gmane.org>
2015-10-07 16:10 ` Joerg Roedel
[not found] ` <20151007161022.GI28811-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2015-10-07 16:52 ` Andreas Hartmann
2015-10-08 16:39 ` Joerg Roedel
[not found] ` <20151008163957.GK28811-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2015-10-08 18:21 ` Andreas Hartmann
[not found] ` <5616B436.1000802-YKS6W9RDU/w@public.gmane.org>
2015-10-08 19:52 ` Andreas Hartmann
[not found] ` <5616C998.1010309-YKS6W9RDU/w@public.gmane.org>
2015-10-09 5:20 ` Andreas Hartmann
[not found] ` <56174EA6.7000106-YKS6W9RDU/w@public.gmane.org>
2015-10-09 9:15 ` Andreas Hartmann
[not found] ` <56178599.6010807-YKS6W9RDU/w@public.gmane.org>
2015-10-09 14:59 ` Joerg Roedel
[not found] ` <20151009145951.GC27420-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2015-10-09 17:46 ` Andreas Hartmann
[not found] ` <5617FD6E.70802-YKS6W9RDU/w@public.gmane.org>
2015-10-11 12:23 ` Andreas Hartmann
2015-10-12 12:07 ` Andreas Hartmann
2015-10-12 12:34 ` Mikulas Patocka
2015-10-07 15:40 ` Joerg Roedel
2015-10-07 17:02 ` Andreas Hartmann
2015-10-08 17:30 ` Joerg Roedel
[not found] ` <20151008173007.GL28811-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2015-10-08 18:59 ` Andreas Hartmann
[not found] ` <5616BCF4.10104-YKS6W9RDU/w@public.gmane.org>
2015-10-08 19:47 ` Andreas Hartmann
2015-10-09 10:40 ` Joerg Roedel
[not found] ` <5616C850.2000906-YKS6W9RDU/w@public.gmane.org>
2015-10-09 14:45 ` [PATCH] iommu/amd: Fix NULL pointer deref on device detach " Joerg Roedel
2015-10-09 17:42 ` Andreas Hartmann