From: Wei Wang <wei.wang2@amd.com>
To: Sander Eikelenboom <linux@eikelenboom.it>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
Jan Beulich <JBeulich@suse.com>
Subject: Re: [PATCH] amd iommu: Dump flags of IO page faults
Date: Mon, 24 Sep 2012 14:24:16 +0200
Message-ID: <506050F0.7020703@amd.com>
In-Reply-To: <74647167.20120924103835@eikelenboom.it>
On 09/24/2012 10:38 AM, Sander Eikelenboom wrote:
>
> Friday, September 7, 2012, 10:54:40 AM, you wrote:
>
>> On 09/07/2012 09:32 AM, Sander Eikelenboom wrote:
>>>
>>> Thursday, September 6, 2012, 5:03:05 PM, you wrote:
>>>
>>>> On 09/06/2012 03:50 PM, Sander Eikelenboom wrote:
>>>>>
>>>>> Thursday, September 6, 2012, 3:32:51 PM, you wrote:
>>>>>
>>>>>> On 09/06/2012 12:59 AM, Sander Eikelenboom wrote:
>>>>>>>
>>>>>>> Wednesday, September 5, 2012, 4:42:42 PM, you wrote:
>>>>>>>
>>>>>>>> Hi Jan,
>>>>>>>> The attached patch dumps IO page fault flags. The flags show the reason for
>>>>>>>> the fault and tell us whether it is an unmapped interrupt fault or a DMA fault.
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Wei
>>>>>>>
>>>>>>>> Signed-off-by: Wei Wang <wei.wang2@amd.com>
>>>>>>>
>>>>>>>
>>>>>>> I have applied the patch and the flags seem to differ between the faults:
>>>>>>>
>>>>>>> AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
>>>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
>>>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d339e0, flags = 0x020
>>>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d33a40, flags = 0x020
>>>>>
>>>>>> OK, so they are not interrupt requests. I guess further information from
>>>>>> your system would be helpful to debug this issue:
>>>>>> 1) xl info
>>>>>> 2) xl list
>>>>>> 3) lspci -vvv (NOTE: not in dom0 but in your guest)
>>>>>> 4) cat /proc/iomem (in both dom0 and your hvm guest)
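
For reference, the two flags values in the fault lines quoted above (0x000
and 0x020) can be decoded with a minimal sketch like the one below. The bit
positions are an assumption borrowed from the EVENT_FLAG_* constants in the
Linux AMD IOMMU driver, not something taken from the patch in this thread,
and the function name is hypothetical.

```python
# Assumed bit positions within the 12-bit flags field of an IO_PAGE_FAULT
# event (mirrors the Linux driver's EVENT_FLAG_* constants; not from the patch).
EVENT_FLAG_I = 0x008   # request was an interrupt request
EVENT_FLAG_PR = 0x010  # page was present (fault is a permission problem)
EVENT_FLAG_RW = 0x020  # request was a write
EVENT_FLAG_TR = 0x100  # request was a translation request


def decode_io_page_fault_flags(flags):
    """Return the named bits of an IO_PAGE_FAULT flags value as booleans."""
    return {
        "interrupt": bool(flags & EVENT_FLAG_I),
        "present": bool(flags & EVENT_FLAG_PR),
        "write": bool(flags & EVENT_FLAG_RW),
        "translation": bool(flags & EVENT_FLAG_TR),
    }


if __name__ == "__main__":
    # The two values seen in the log above.
    for value in (0x000, 0x020):
        print(f"flags = 0x{value:03x}: {decode_io_page_fault_flags(value)}")
```

Under that assumed layout, neither value has the interrupt bit set, which is
consistent with the faults being DMA accesses: a read for device 0x0a06 and
writes for device 0x0700, in both cases to pages that are not present.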
>>>>>
>>>>> dom14 is not an HVM guest, it's a PV guest.
>>>
>>>> Ah, I see. A PV guest is quite different from HVM: it does not use p2m
>>>> tables as IO page tables, so the no-sharept option has no effect in this
>>>> case. PV guests always use separate IO page tables. There might be some
>>>> incorrect mappings in the page tables. I will check this on my side.
>>>
>>> I have reverted the machine to xen-4.1.4-pre (changeset 23353) and kept everything else the same.
>>> I haven't seen any IO PAGE FAULTS after that.
>>>
>>> I did spot some differences in the output from lspci between xen 4.1 and 4.2, related to MSI enabled or not for the IOMMU device.
>>> Have attached the xl/xm dmesg and lspci from booting with both versions.
>>>
>>> lspci:
>>>
>>> 00:00.2 Generic system peripheral [0806]: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
>>> Subsystem: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
>>> Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>> Latency: 0
>>> Interrupt: pin A routed to IRQ 10
>>> Capabilities: [40] Secure device<?>
>>> 4.1: Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+
>
>> Eh... That is interesting. So which dom0 are you using? There is a c/s
>> in 4.2 that prevents recent dom0 kernels from disabling the iommu interrupt
>> (changeset 25492:61844569a432). Otherwise, the iommu cannot send any events,
>> including IO page faults. You could try reverting dom0 to an old version
>> like 2.6 pv_ops to see if you really have no IO page faults on 4.1.
>
> OK, I finally got the time to do some more testing: I tested 4.2 around that changeset and made a copy of the guest using HVM instead of PV.
>
> The results:
> - On xen-4.1.* and a 3.6-rc6 kernel (dom0 and domU): the passed-through video device works fine in both an HVM and a PV guest; I don't see any IO page faults reported.
> - On xen-4.2 changeset < 25492 and a 3.6-rc6 kernel (dom0 and domU): the passed-through video device works fine in both an HVM and a PV guest; I don't see any IO page faults reported.
> - On xen-4.2 changeset > 25492 and a 3.6-rc6 kernel (dom0 and domU): the passed-through video device works fine for a short while (around 5 to 10 minutes) in a PV guest; after that, IO page faults are reported and the video freezes. I don't see any errors in the guest, though.
> - On xen-unstable tip and a 3.6-rc6 kernel (dom0 and domU):
> PV: the passed-through video device works fine for a short while (around 5 to 10 minutes); after that, IO page faults are reported and the video freezes. I don't see any errors in the guest, though.
> HVM: the passed-through video device doesn't work from the start:
> - The device is there according to lspci
> - The video application starts fine but delivers a green image, so the device is not working properly. I don't see IO page faults, though.
>
> Attached are (all with xen-unstable tip and the guest as HVM, domain 15):
> - xl dmesg
> - Patch which adds some more info, but all values reported seem to be zero (see xl dmesg)
> - lspci dom0
> - lspci HVM guest
Hi,
Thanks for the information; it is very helpful for debugging. I hope to
start looking at this right after sending my next iommu patch queue
upstream. Another question: did you see this issue on a system with a
single pv/hvm guest, or only on a system with about 16 running
VMs?
Thanks,
Wei
>
>
>
>>> 4.2: Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>> Address: 00000000fee0100c Data: 4128
>>> Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+
>>>
>>> Although it seems enabled, shouldn't the IRQ number used be much higher than 10 for MSI interrupts ?
>
>> The IRQ number is fine. MSI vector is stored at Data: 4128
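
To illustrate the point about the vector, here is a minimal sketch that
splits the MSI data value from the lspci output above into its fields. It
assumes that lspci prints the Data field in hexadecimal and that the usual
x86 MSI data layout applies (vector in bits 7:0, delivery mode in bits 10:8,
level in bit 14, trigger mode in bit 15); the function name is hypothetical.

```python
def decode_msi_data(data):
    """Split an x86 MSI data register value into its fields (assumed layout)."""
    return {
        "vector": data & 0xFF,                       # bits 7:0
        "delivery_mode": (data >> 8) & 0x7,          # bits 10:8
        "level_assert": bool(data & (1 << 14)),      # bit 14
        "trigger": "level" if data & (1 << 15) else "edge",  # bit 15
    }


if __name__ == "__main__":
    # "Data: 4128" from the 4.2 lspci output, read as hexadecimal.
    fields = decode_msi_data(0x4128)
    print(f"vector = 0x{fields['vector']:02x}, "
          f"delivery mode = {fields['delivery_mode']}, "
          f"trigger = {fields['trigger']}")
```

Under these assumptions the vector is 0x28, independent of the legacy
"IRQ 10" routing that lspci reports for the interrupt pin.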
>
>>>
>>> There is another difference in the bridge device that's in front of the 0a:00.6 device that faults before the kernel is even booted.
>>>
>>> 00:03.0 PCI bridge [0604]: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port C) [1002:5a17] (prog-if 00 [Normal decode])
>>> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>>> 4.1: Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>> 4.2: Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort+ <MAbort- >SERR- <PERR- INTx-
>>> Latency: 0, Cache Line Size: 64 bytes
>>> Bus: primary=00, secondary=0a, subordinate=0a, sec-latency=0
>>> I/O behind bridge: 0000f000-00000fff
>>> Memory behind bridge: f9f00000-f9ffffff
>>> Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
>>> 4.1: Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
>>> 4.2: Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort- <SERR- <PERR-
>>> BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort->Reset- FastB2B-
>>> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>>> Capabilities: [50] Power Management version 3
>>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>>> Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s<64ns, L1<1us
>>> ExtTag+ RBE+ FLReset-
>>> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>>> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>>> MaxPayload 128 bytes, MaxReadReq 128 bytes
>>> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>>> LnkCap: Port #1, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0<1us, L1<8us
>>> ClockPM- Surprise- LLActRep+ BwNot+
>>> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
>>> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>> LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
>>> SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
>>> Slot #3, PowerLimit 10.000W; Interlock- NoCompl+
>>> SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
>>> Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
>>> SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
>>> Changed: MRL- PresDet+ LinkState+
>
>> That is probably because of the IO_PAGE_FAULT.
>
>> Thanks,
>> Wei
>
>>> serveerstertje:~# lspci -t
>>> -[0000:00]-+-00.0
>>> +-00.2
>>> +-02.0-[0b]----00.0
>>> +-03.0-[0a]--+-00.0
>>> | +-00.1
>>> | +-00.2
>>> | +-00.3
>>> | +-00.4
>>> | +-00.5
>>> | +-00.6
>>> | \-00.7
>>> +-05.0-[09]----00.0
>>> +-06.0-[08]----00.0
>>> +-0a.0-[07]----00.0
>>> +-0b.0-[06]--+-00.0
>>> | \-00.1
>>> +-0c.0-[05]----00.0
>>> +-0d.0-[04]--+-00.0
>>> | +-00.1
>>> | +-00.2
>>> | +-00.3
>>> | +-00.4
>>> | +-00.5
>>> | +-00.6
>>> | \-00.7
>>> +-11.0
>>> +-12.0
>>> +-12.2
>>> +-13.0
>>> +-13.2
>>> +-14.0
>>> +-14.3
>>> +-14.4-[03]----06.0
>>> +-14.5
>>> +-15.0-[02]--
>>> +-16.0
>>> +-16.2
>>> +-18.0
>>> +-18.1
>>> +-18.2
>>> +-18.3
>>> \-18.4
>>>
>>>
>>>
>>>
>>>
>>>> Thanks,
>>>> Wei
>>>
>>>>> I will try to make a complete package, and try with one pv domain only where the devices are being passed through just to simplify the setup.
>>>>>
>>>>>
>>>>>> * I would also like to know the symptoms of device 0x0700 when IO_PF
>>>>>> happened. Did it stop working?
>>>>>
>>>>> Yes it stops working, the video capture just freezes, but the driver doesn't bail out.
>>>>> For the USB controller (0x0a06) it starts to give errors for usbdev_open in the guest.
>>>>>
>>>>>> (BTW: I copied a few options from your boot cmd line and it worked with
>>>>>> my RD890 system
>>>>>
>>>>>> dom0_mem=1024M,max:1024M loglvl=all loglvl_guest=all console_timestamps
>>>>>> cpuidle cpufreq=xen noreboot debug lapic=debug apic_verbosity=debug
>>>>>> apic=debug iommu=on,verbose,debug,no-sharept
>>>>>
>>>>>> * so, what OEM board do you have?)
>>>>>
>>>>> MSI 890FXA-GD70
>>>>>
>>>>>> Also, from your log, these lines look very strange:
>>>>>
>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>> read-only memory page. gfn=0xd5, mfn=0xa4a11
>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>> read-only memory page. gfn=0xd7, mfn=0xa4a0f
>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>> read-only memory page. gfn=0xd9, mfn=0xa4a0d
>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>> read-only memory page. gfn=0xdb, mfn=0xa4a0b
>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>> read-only memory page. gfn=0xdd, mfn=0xa4a09
>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>> read-only memory page. gfn=0xdf, mfn=0xa4a07
>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>> read-only memory page. gfn=0xe1, mfn=0xa4a05
>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>> read-only memory page. gfn=0xe3, mfn=0xa4a03
>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>> read-only memory page. gfn=0xe5, mfn=0xa4a01
>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>> read-only memory page. gfn=0xe7, mfn=0xa463f
>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>> read-only memory page. gfn=0xe9, mfn=0xa463d
>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>> read-only memory page. gfn=0xeb, mfn=0xa463b
>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>> read-only memory page. gfn=0xed, mfn=0xa4639
>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>> read-only memory page. gfn=0xef, mfn=0xa4637
>>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id
>>>>>> = 0x0a06, fault address = 0xc2c2c2c0
>>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>>> id = 0x0700, fault address = 0xa90f8300
>>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>>> id = 0x0700, fault address = 0xa90f8340
>>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>>> id = 0x0700, fault address = 0xa90f8380
>>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>>> id = 0x0700, fault address = 0xa90f83c0
>>>>>
>>>>>> * they are immediately followed by the IO page faults. Do you know where
>>>>>> they come from? Your video card driver maybe?
>>>>>
>>>>> From an HVM domain with an old (3.0.3) kernel, but the faults also occur without this domain being started.
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Wei
>>>>>
>>>>>
>>>>>>> Complete xl dmesg and lspci -vvvknn attached.
>>>>>>>
>>>>>>> Thx
>>>>>>>
>>>>>>> --
>>>>>>> Sander
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>
>