public inbox for linux-kernel@vger.kernel.org
* Question about: AMD-Vi: Event logged [IO_PAGE_FAULT ...
@ 2015-01-05 15:25 Raimonds Cicans
  2015-01-05 16:49 ` Joerg Roedel
  0 siblings, 1 reply; 2+ messages in thread
From: Raimonds Cicans @ 2015-01-05 15:25 UTC
  To: linux-kernel

Hello.

After a kernel upgrade (3.13 => 3.17) I started to see the following
message in my logs:
AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x001c 
address=0x0000000001355000 flags=0x0000]

I would like to understand this problem more deeply, so it
would be nice if somebody could correct my assumptions and
answer my questions.


Assumptions:

1) This message is generated by the AMD IOMMU subsystem
      because PCIe device 08:00.0 tried to access a memory
      region which was not mapped to any real memory
      (lspci shows that this device is a DVB-S2 receiver card,
       TBS 6981)

2) Because the flags are 0 and because receivers in general
     write to memory rather than read from it, this is a memory
     write operation

3) Possible causes:
     a) memory region was never mapped
     b) device accessed memory region before it was mapped
     c) device accessed memory region after it was unmapped

4) Suspects:
      a) kernel's DMA subsystem: very unlikely
      b) kernel's IOMMU subsystem: very unlikely
      c) AMD IOMMU driver: unlikely? - I had problems with the AMD IOMMU
          itself in kernels 3.14 - 3.17 (AMD-Vi: Completion-Wait loop
          timed out), so maybe this problem is not fully fixed?
      d) Receiver's driver: likely


Questions:
1) What does 'domain=0x001c' mean?
2) Where can I find the definition of the possible flags?
3) What kind of address is written in the message?
      - physical?
      - virtual?
      - address from the device's point of view?


Thank you.


Raimonds Cicans


* Re: Question about: AMD-Vi: Event logged [IO_PAGE_FAULT ...
  2015-01-05 15:25 Question about: AMD-Vi: Event logged [IO_PAGE_FAULT Raimonds Cicans
@ 2015-01-05 16:49 ` Joerg Roedel
  0 siblings, 0 replies; 2+ messages in thread
From: Joerg Roedel @ 2015-01-05 16:49 UTC
  To: Raimonds Cicans; +Cc: linux-kernel

Hello Raimonds,

On Mon, Jan 05, 2015 at 05:25:25PM +0200, Raimonds Cicans wrote:
> After a kernel upgrade (3.13 => 3.17) I started to see the following
> message in my logs:
> AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x001c
> address=0x0000000001355000 flags=0x0000]
> 
> I would like to understand this problem more deeply, so it
> would be nice if somebody could correct my assumptions and
> answer my questions.
> 
> 
> Assumptions:
> 
> 1) This message is generated by the AMD IOMMU subsystem
>      because PCIe device 08:00.0 tried to access a memory
>      region which was not mapped to any real memory
>      (lspci shows that this device is a DVB-S2 receiver card,
>       TBS 6981)
> 
> 2) Because the flags are 0 and because receivers in general
>     write to memory rather than read from it, this is a memory
>     write operation

Almost right, but flags=0 for this fault means it was a read
operation. The access targeted a page marked as non-present, and that
caused the fault.

> 3) Possible causes:
>     a) memory region was never mapped
>     b) device accessed memory region before it was mapped
>     c) device accessed memory region after it was unmapped

I'd vote for option c). The address reported in the fault is a device
virtual address. The value looks like it was handed out from the
DMA-address allocator in the AMD IOMMU driver, which means the address
was once mapped for the device.

> 
> 4) Suspects:
>      a) kernel's DMA subsystem: very unlikely
>      b) kernel's IOMMU subsystem: very unlikely
>      c) AMD IOMMU driver: unlikely? - I had problems with the AMD IOMMU
>          itself in kernels 3.14 - 3.17 (AMD-Vi: Completion-Wait loop
>          timed out), so maybe this problem is not fully fixed?

IO_PAGE_FAULTs are almost always a bug in the device driver for the
peripheral (or a bug in the firmware, but that is unlikely here).

But the "Completion-Wait loop timed out" message is also worrying. It
usually indicates broken firmware or broken hardware.

>      d) Receiver's driver: likely

Yes, my guess is that the driver for the receiver device calls
dma_unmap_$foo on a memory region it still uses for DMA. That call
makes the AMD IOMMU driver unmap the region, so the device's next DMA
access to it fails with the message you see.
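
The unmap-while-still-in-use sequence can be illustrated with a toy
user-space model. This is only a sketch: toy_iommu, toy_map, toy_unmap
and toy_dma_access are hypothetical names invented for illustration and
have nothing to do with the real kernel DMA API or the AMD IOMMU driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, not a kernel API. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGES      16

struct toy_iommu {
    bool present[TOY_PAGES];    /* one "present" bit per tracked page */
};

/* Fold a device virtual address into a slot of the toy table.
 * Different addresses can share a slot; that is fine for a demo. */
static unsigned toy_slot(uint64_t iova)
{
    return (unsigned)((iova >> TOY_PAGE_SHIFT) % TOY_PAGES);
}

/* Stands in for a dma_map_* call: the page becomes present. */
static void toy_map(struct toy_iommu *m, uint64_t iova)
{
    m->present[toy_slot(iova)] = true;
}

/* Stands in for a dma_unmap_* call: the page becomes non-present. */
static void toy_unmap(struct toy_iommu *m, uint64_t iova)
{
    m->present[toy_slot(iova)] = false;
}

/* Device access: true if translation succeeds, false if the access
 * would be reported as a page fault. */
static bool toy_dma_access(const struct toy_iommu *m, uint64_t iova)
{
    return m->present[toy_slot(iova)];
}
```

Calling toy_map(), then toy_unmap(), then toy_dma_access() on the same
address returns false, which corresponds to case c) above: the address
was valid once, but the device kept using it after the unmap.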

> Questions:
> 1) What does 'domain=0x001c' mean?

This is just an internal handle: the domain ID. It is reported in the
fault structure by the hardware and indicates whether the device has
been attached to a DMA domain at all.

> 2) Where can I find the definition of the possible flags?

In the AMD IOMMU specification, look for the IO_PAGE_FAULT reporting
structure. The flags reported in the kernel message are bits 16-27 of
the second 32-bit value.
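
As a rough sketch, the extraction described above looks like this in C.
The shift and mask follow directly from "bits 16-27 of the second
32-bit value". The EVT_* names are made up for this example, and the
two individual flag bits (PR and RW) are assumptions that should be
checked against the specification before relying on them.

```c
#include <stdint.h>

/* Flags field: bits 16-27 of the second 32-bit word of the event. */
#define EVT_FLAGS_SHIFT 16
#define EVT_FLAGS_MASK  0xfffu

/* ASSUMPTION: positions of individual bits inside the 12-bit flags
 * field. Verify against the AMD IOMMU specification. */
#define EVT_FLAG_PR 0x010u  /* set: page was marked present */
#define EVT_FLAG_RW 0x020u  /* set: write access, clear: read access */

/* event points at the 32-bit words of one event log entry;
 * event[1] is the second word. */
static inline uint16_t evt_flags(const uint32_t *event)
{
    return (uint16_t)((event[1] >> EVT_FLAGS_SHIFT) & EVT_FLAGS_MASK);
}

/* Non-zero if the faulting access was a write (under the assumption
 * above about the RW bit). */
static inline int evt_is_write(uint16_t flags)
{
    return (flags & EVT_FLAG_RW) != 0;
}

/* Non-zero if the page was marked present (under the assumption above
 * about the PR bit). */
static inline int evt_page_present(uint16_t flags)
{
    return (flags & EVT_FLAG_PR) != 0;
}
```

With flags=0x0000, as in the reported message, both helpers return 0,
which matches the reading given earlier: a read access to a page marked
non-present.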

> 3) What kind of address is written in the message?
>      - physical?
>      - virtual?
>      - address from the device's point of view?

It is a device virtual address: the address the device tried to access,
which was not mapped at the time.


HTH,

	Joerg

