From: Maik Broemme <mbroemme@parallels.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Multi GPU passthrough via VFIO
Date: Fri, 7 Feb 2014 19:07:09 +0100
Message-ID: <20140207180709.GQ995@parallels.com>
In-Reply-To: <20140207002258.GJ994@parallels.com>

Hi Alex,

Maik Broemme <mbroemme@parallels.com> wrote:
> Hi Alex,
> 
> Alex Williamson <alex.williamson@redhat.com> wrote:
> > On Thu, 2014-02-06 at 01:25 +0100, Maik Broemme wrote:
> > > Hi Alex,
> > > 
> > > Maik Broemme <mbroemme@parallels.com> wrote:
> > > > > > > > > Another minor issue is that the R9 290X is not reset during shutdown of
> > > > > > > > > the VM (neither Linux nor Windows), but this can be worked around by doing
> > > > > > > > > "suspend-to-ram" between two starts. That's why I use the '-no-reboot' option
> > > > > > > > > in QEMU. The 7870 does the reset properly.
> > > > > > > 
> > > > > > > 
> > > > > > > Is the NoSoftRst "-" on the 290X vs "+" on the 7870 in lspci -vvv by
> > > > > > > chance?  Thanks,
> > > > > > > 
> > > > > > 
> > > > > > > Here are both. Funnily, it is the opposite of what you described. :)
> > > > > 
> > > > > 
> > > > > Oops, yes.  Does this help?
> > > > > 
> > > > > --- a/hw/misc/vfio.c
> > > > > +++ b/hw/misc/vfio.c
> > > > > @@ -3136,7 +3136,7 @@ static void vfio_pci_reset_handler(void *opaque)
> > > > >  
> > > > >      QLIST_FOREACH(group, &group_list, next) {
> > > > >          QLIST_FOREACH(vdev, &group->device_list, next) {
> > > > > -            if (!vdev->reset_works || (!vdev->has_flr && vdev->has_pm_reset)) {
> > > > > +            if (!vdev->reset_works || !vdev->has_flr) {
> > > > >                  vdev->needs_reset = true;
> > > > >              }
> > > > >          }
> > > > > 
> > > > > I can't figure out why I coded it the way that I did.  Probably overly
> > > > > targeting a specific device.  Thanks,
> > > > > 
> > > > 
> > > > This patch works absolutely fine. After applying it to my 'qemu-git', the
> > > > device reset works flawlessly. So it would be great to push it upstream
> > > > as it looks good.
> > > > 
> > > 
> > > Okay, sorry, I was too fast here. It worked the first time, but now it
> > > no longer works as expected even after a clean reboot, and the behavior
> > > is very strange.
> > > 
> > > Windows:
> > > 
> > >   1st boot works fine - boot VGA and Windows ATI driver loaded, issue
> > >       reboot and qemu stopped due to '-no-reboot'.
> > > 
> > >   2nd boot works partially - boot VGA and Windows ATI driver loaded but
> > >       black screen, and my system becomes terribly slow and mostly
> > >       unresponsive. Immediately after the ATI driver enables the
> > >       device, my dmesg shows the following:
> > > 
> > > [  159.984324] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x19@0x270
> > > [  159.984340] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x1b@0x2d0
> > > [  160.129036] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x19@0x270
> > > [  160.129049] vfio_ecap_init: 0000:02:00.0 hiding ecap 0x1b@0x2d0
> > > [  172.977677] kvm: zapping shadow pages for mmio generation wraparound
> > > [  173.160174] br0: port 2(tap0) entered forwarding state
> > > [  175.902967] vfio-pci 0000:01:00.0: irq 46 for MSI/MSI-X
> > > [  188.340430] Clocksource tsc unstable (delta = -119654611 ns)
> > > [  188.340511] Switched to clocksource hpet
> > > [  191.088693] hpet1: lost 12 rtc interrupts
> > > [  191.926555] hpet1: lost 25 rtc interrupts
> > > 
> > >   So your patch did indeed fix the reset issue of the boot VGA, but
> > >   something else is wrong now. :)
> > 
> > Can you try the cards separately?  If you run lspci on the device in the
> > host, does it report as normal?  Often when the host gets slow and we
> > get these sorts of clock issues it means the bus is fatal and we get
> > timeouts trying to read from it.
> > 
> 
> Okay with only one card I don't have the clock issues anymore, so we
> should look into this a bit later as working reset seems more important
> for now.
> 
> > > Linux (fglrx):
> > > 
> > >   1st boot works fine - boot VGA, fglrx loads fine and X could be
> > >       started, issue reboot via SSH and qemu stopped due to
> > >       '-no-reboot'.
> > > 
> > >   2nd boot works partially - boot VGA, fglrx loads fine but X couldn't
> > >       be started and fails with:
> > > 
> > > [   34.265111] fglrx_pci 0000:02:00.0: irq 50 for MSI/MSI-X
> > > [   34.344313] <6>[fglrx] Firegl kernel thread PID: 318
> > > [   34.344400] <6>[fglrx] Firegl kernel thread PID: 319
> > > [   34.344478] <6>[fglrx] Firegl kernel thread PID: 320
> > > [   34.344589] <6>[fglrx] IRQ 50 Enabled
> > > [   34.356105] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000 
> > > [   34.356107] <6>[fglrx] Reserved FB block: Unshared offset:fac3000, size:3000 
> > > [   34.356109] <6>[fglrx] Reserved FB block: Unshared offset:fac6000, size:23a000 
> > > [   34.356110] <6>[fglrx] Reserved FB block: Unshared offset:7fff4000, size:c000 
> > > [   34.386436] fglrx_pci 0000:01:00.0: irq 51 for MSI/MSI-X
> > > [   34.490902] <6>[fglrx] Firegl kernel thread PID: 321
> > > [   34.490994] <6>[fglrx] Firegl kernel thread PID: 322
> > > [   34.491069] <6>[fglrx] Firegl kernel thread PID: 323
> > > [   34.491166] <6>[fglrx] IRQ 51 Enabled
> > > [   34.505271] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000 
> > > [   34.505273] <6>[fglrx] Reserved FB block: Unshared offset:f9c3000, size:3000 
> > > [   34.505274] <6>[fglrx] Reserved FB block: Unshared offset:f9c6000, size:23a000 
> > > [   34.505276] <6>[fglrx] Reserved FB block: Unshared offset:fc00000, size:100000 
> > > [   34.505277] <6>[fglrx] Reserved FB block: Unshared offset:fff8000, size:8000 
> > > [   34.505278] <6>[fglrx] Reserved FB block: Unshared offset:ffff4000, size:c000 
> > > [   34.526198] BUG: unable to handle kernel paging request at ffff880c724e8008
> > > [   34.526203] IP: [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
> > > [   34.526277] PGD 1b3e067 PUD 0 
> > > [   34.526279] Oops: 0002 [#1] PREEMPT SMP 
> > > [   34.526282] Modules linked in: mousedev crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev aesni_intel snd_hda_codec_hdmi aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_intel microcode snd_hda_codec serio_raw psmouse parport_pc snd_hwdep snd_pcm parport snd_page_alloc processor snd_timer snd soundcore i2c_i801 intel_agp lpc_ich pcspkr intel_gtt i2c_core shpchp evdev fglrx(PO) amd_iommu_v2 button ext4 crc16 mbcache jbd2 atkbd libps2 virtio_blk virtio_net ahci libahci libata scsi_mod i8042 floppy serio virtio_pci virtio_ring virtio
> > > [   34.526307] CPU: 1 PID: 316 Comm: X Tainted: P           O 3.13.1-2-ARCH #1
> > > [   34.526309] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
> > > [   34.526311] task: ffff8800776e2d00 ti: ffff880037a28000 task.ti: ffff880037a28000
> > > [   34.526312] RIP: 0010:[<ffffffffa0399af6>]  [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
> > > [   34.526353] RSP: 0018:ffff880037a29810  EFLAGS: 00010296
> > > [   34.526354] RAX: 0000000000000001 RBX: ffff8800724e800c RCX: 0000000000000006
> > > [   34.526356] RDX: 0000000000000003 RSI: 0000000000000002 RDI: ffff8800724e8264
> > > [   34.526357] RBP: ffff88007b19a00c R08: 00000000000186a0 R09: 000000000001e848
> > > [   34.526358] R10: 00000002fffffffd R11: 00000000ffffffff R12: 0000000000000001
> > > [   34.526359] R13: ffff88007b19a00c R14: 0000000000000000 R15: ffff880037a298b0
> > > [   34.526363] FS:  00007f0ba649b880(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
> > > [   34.526365] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > [   34.526366] CR2: ffff880c724e8008 CR3: 0000000037998000 CR4: 00000000000406e0
> > > [   34.526372] Stack:
> > > [   34.526373]  ffff88007b19a2f4 ffff88007bffcd1c 0000000000000001 ffffffffa0322cf0
> > > [   34.526375]  0000000000000000 0000000000000000 0000000000000000 ffff880077ed2c08
> > > [   34.526378]  0000000000000000 ffff880077ed2c08 ffff880037a298a0 ffffffffa0327f14
> > > [   34.526380] Call Trace:
> > > [   34.526435]  [<ffffffffa0322cf0>] ? PHM_DispatchTable+0xf0/0x220 [fglrx]
> > > [   34.526490]  [<ffffffffa0327f14>] ? PECI_NotifyDALPreAdapterClockChange+0x144/0x160 [fglrx]
> > > [   34.526546]  [<ffffffffa031e321>] ? PHM_SetPowerState+0x31/0xc0 [fglrx]
> > > [   34.526597]  [<ffffffffa0340a5b>] ? PSM_ApplyHardwareAttributes_Dynamic+0x9b/0xf0 [fglrx]
> > > [   34.526651]  [<ffffffffa033fde9>] ? PSM_AdjustPowerState_Dynamic+0x169/0x540 [fglrx]
> > > [   34.526668]  [<ffffffffa0322cf0>] ? PHM_DispatchTable+0xf0/0x220 [fglrx]
> > > [   34.526668]  [<ffffffffa0342ee4>] ? PEM_ExcuteEventChain+0x64/0xe0 [fglrx]
> > > [   34.526668]  [<ffffffffa0341302>] ? PEM_HandleEvent+0x92/0xd0 [fglrx]
> > > [   34.526668]  [<ffffffffa03357c0>] ? PEM_CWDDEPM_NotifyEvent+0xe0/0x4d0 [fglrx]
> > > [   34.526668]  [<ffffffffa0333869>] ? PP_Cwdde+0x109/0x180 [fglrx]
> > > [   34.526668]  [<ffffffffa02091dc>] ? firegl_pplib_cwddepm+0xbc/0x130 [fglrx]
> > > [   34.526668]  [<ffffffffa02092d9>] ? firegl_pplib_notify_event+0x89/0xd0 [fglrx]
> > > [   34.526668]  [<ffffffffa020292f>] ? hal_init_gpu+0x2bf/0x480 [fglrx]
> > > [   34.526668]  [<ffffffffa01dcc7b>] ? firegl_open+0x2db/0x310 [fglrx]
> > > [   34.526668]  [<ffffffffa01cb287>] ? ip_firegl_open+0x17/0x20 [fglrx]
> > > [   34.526668]  [<ffffffffa01ccac8>] ? firegl_stub_open+0x98/0x100 [fglrx]
> > > [   34.526668]  [<ffffffff811a82bf>] ? chrdev_open+0x9f/0x1d0
> > > [   34.526668]  [<ffffffff811a1967>] ? do_dentry_open+0x1b7/0x2c0
> > > [   34.526668]  [<ffffffff811aed41>] ? __inode_permission+0x41/0xb0
> > > [   34.526668]  [<ffffffff811a8220>] ? cdev_put+0x30/0x30
> > > [   34.526668]  [<ffffffff811a1d91>] ? finish_open+0x31/0x40
> > > [   34.526668]  [<ffffffff811b1b72>] ? do_last+0x572/0xe90
> > > [   34.526668]  [<ffffffff811af036>] ? link_path_walk+0x236/0x8d0
> > > [   34.526668]  [<ffffffff811b254b>] ? path_openat+0xbb/0x6b0
> > > [   34.526668]  [<ffffffff811b3c6a>] ? do_filp_open+0x3a/0x90
> > > [   34.526668]  [<ffffffff811c0567>] ? __alloc_fd+0xa7/0x130
> > > [   34.526668]  [<ffffffff811a2f49>] ? do_sys_open+0x129/0x220
> > > [   34.526668]  [<ffffffff811a305e>] ? SyS_open+0x1e/0x20
> > > [   34.526668]  [<ffffffff8152136d>] ? system_call_fastpath+0x1a/0x1f
> > > [   34.526668] Code: 8b 4a 1c 8b 93 e0 18 00 00 48 8d bb 58 02 00 00 85 d2 0f 84 63 02 00 00 f6 c2 01 0f 84 20 01 00 00 44 8b 1b 41 ff cb 4f 8d 14 5b <46> 89 44 93 08 8b 95 3c 02 00 00 48 89 d0 48 c1 e8 07 a8 01 75 
> > > [   34.526668] RIP  [<ffffffffa0399af6>] TF_PhwCIslands_PopulateAndUploadSclkMclkDPMLevels+0x96/0x3d0 [fglrx]
> > > [   34.526668]  RSP <ffff880037a29810>
> > > [   34.526668] CR2: ffff880c724e8008
> > > [   34.526668] ---[ end trace 5431e6dcf1c31dea ]---
> > > [   69.317528] type=1006 audit(1391649552.046:4): pid=324 uid=0 old auid=4294967295 new auid=0 old ses=4294967295 new ses=3 res=1
> > > 
> > > I know it is the binary driver, and I could also retry with the radeon
> > > one, but I believe there will be a similar crash. In my first try I just
> > > rebooted the Linux VM several times without starting X.
> > > 
> > > I got it working one time without getting 'Clocksource tsc unstable', but
> > > now I'm unable to repeat it. So I believe something more is needed.
> > 
> > Bus resets are a mixed blessing: they return the card to a relatively
> > known state, but it's a fairly unusual event from a platform perspective
> > and we have no idea what kind of quirks the host system bios might have
> > in place to work around hardware.  If the bus is not fatal you might try
> > running lspci -vvv in the host at various points to see what changed.
> > For instance, boot a Linux guest to text mode and see if the card is in
> > the same state between first boot and second boot before starting X.
> > Thanks,
> > 
> 
> I tried the R9 290X separately now. You're right, there are some changes
> in the lspci -vvv output between 1st and 2nd boot, and they are reset
> if I do "suspend-to-ram" and resume before the 3rd boot of the VM. Below is the
> lspci from 1st boot and the diffs of the lspci outputs:
> 
> --- 001-lspci.290x.before.1st.log	2014-02-07 01:13:41.498827928 +0100
> +++ 002-lspci.290x.during.1st.before.X.log	2014-02-07 01:14:47.984612423 +0100
> @@ -1,6 +1,6 @@
>  01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970] (prog-if 00 [VGA controller])
>  	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b00
> -	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> +	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>  	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>  	Latency: 0, Cache Line Size: 64 bytes
>  	Interrupt: pin A routed to IRQ 18
> @@ -19,7 +19,7 @@
>  		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>  			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>  			MaxPayload 128 bytes, MaxReadReq 512 bytes
> -		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> +		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
>  		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
>  			ClockPM- Surprise- LLActRep- BwNot-
>  		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
> @@ -39,13 +39,13 @@
>  		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>  		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>  		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> -		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> +		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>  		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>  		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>  	Capabilities: [270 v1] #19
>  	Capabilities: [2b0 v1] Address Translation Service (ATS)
>  		ATSCap:	Invalidate Queue Depth: 00
> -		ATSCtl:	Enable-, Smallest Translation Unit: 00
> +		ATSCtl:	Enable+, Smallest Translation Unit: 00
>  	Capabilities: [2c0 v1] #13
>  	Capabilities: [2d0 v1] #1b
>  	Kernel driver in use: vfio-pci
> 
> --- 002-lspci.290x.during.1st.before.X.log	2014-02-07 01:14:47.984612423 +0100
> +++ 003-lspci.290x.during.1st.after.X.log	2014-02-07 01:16:29.644846503 +0100
> @@ -1,9 +1,9 @@
>  01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970] (prog-if 00 [VGA controller])
>  	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b00
> -	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
> +	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>  	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>  	Latency: 0, Cache Line Size: 64 bytes
> -	Interrupt: pin A routed to IRQ 18
> +	Interrupt: pin A routed to IRQ 47
>  	Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
>  	Region 2: Memory at df800000 (64-bit, prefetchable) [size=8M]
>  	Region 4: I/O ports at be00 [size=256]
> @@ -17,14 +17,14 @@
>  		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
>  			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>  		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> -			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> +			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
>  			MaxPayload 128 bytes, MaxReadReq 512 bytes
> -		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
> +		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>  		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
>  			ClockPM- Surprise- LLActRep- BwNot-
>  		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
>  			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> -		LnkSta:	Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> +		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>  		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
>  		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>  		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> @@ -32,8 +32,8 @@
>  			 Compliance De-emphasis: -6dB
>  		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
>  			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> -	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> -		Address: 0000000000000000  Data: 0000
> +	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> +		Address: 00000000fee00000  Data: 0000
>  	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
>  	Capabilities: [150 v2] Advanced Error Reporting
>  		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 
> Now I stopped X, powered down the VM and started the 2nd cycle:
> 
> --- 003-lspci.290x.during.1st.after.X.log	2014-02-07 01:16:29.644846503 +0100
> +++ 004-lspci.290x.before.2nd.log	2014-02-07 01:16:50.966611282 +0100
> @@ -1,9 +1,9 @@
>  01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970] (prog-if 00 [VGA controller])
>  	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b00
> -	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> +	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>  	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>  	Latency: 0, Cache Line Size: 64 bytes
> -	Interrupt: pin A routed to IRQ 47
> +	Interrupt: pin A routed to IRQ 18
>  	Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
>  	Region 2: Memory at df800000 (64-bit, prefetchable) [size=8M]
>  	Region 4: I/O ports at be00 [size=256]
> @@ -17,7 +17,7 @@
>  		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
>  			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>  		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> -			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> +			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>  			MaxPayload 128 bytes, MaxReadReq 512 bytes
>  		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>  		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
> @@ -32,7 +32,7 @@
>  			 Compliance De-emphasis: -6dB
>  		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
>  			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> -	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> +	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
>  		Address: 00000000fee00000  Data: 0000
>  	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
>  	Capabilities: [150 v2] Advanced Error Reporting
> @@ -45,7 +45,7 @@
>  	Capabilities: [270 v1] #19
>  	Capabilities: [2b0 v1] Address Translation Service (ATS)
>  		ATSCap:	Invalidate Queue Depth: 00
> -		ATSCtl:	Enable+, Smallest Translation Unit: 00
> +		ATSCtl:	Enable-, Smallest Translation Unit: 00
>  	Capabilities: [2c0 v1] #13
>  	Capabilities: [2d0 v1] #1b
>  	Kernel driver in use: vfio-pci
> 
> --- 004-lspci.290x.before.2nd.log	2014-02-07 01:16:50.966611282 +0100
> +++ 005-lspci.290x.during.2nd.before.X.log	2014-02-07 01:17:55.571676376 +0100
> @@ -1,6 +1,6 @@
>  01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970] (prog-if 00 [VGA controller])
>  	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b00
> -	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> +	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>  	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>  	Latency: 0, Cache Line Size: 64 bytes
>  	Interrupt: pin A routed to IRQ 18
> @@ -19,12 +19,12 @@
>  		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>  			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>  			MaxPayload 128 bytes, MaxReadReq 512 bytes
> -		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> +		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
>  		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
>  			ClockPM- Surprise- LLActRep- BwNot-
>  		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
>  			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> -		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> +		LnkSta:	Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>  		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
>  		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>  		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> @@ -33,7 +33,7 @@
>  		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
>  			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>  	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> -		Address: 00000000fee00000  Data: 0000
> +		Address: 0000000000000000  Data: 0000
>  	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
>  	Capabilities: [150 v2] Advanced Error Reporting
>  		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> @@ -45,7 +45,7 @@
>  	Capabilities: [270 v1] #19
>  	Capabilities: [2b0 v1] Address Translation Service (ATS)
>  		ATSCap:	Invalidate Queue Depth: 00
> -		ATSCtl:	Enable-, Smallest Translation Unit: 00
> +		ATSCtl:	Enable+, Smallest Translation Unit: 00
>  	Capabilities: [2c0 v1] #13
>  	Capabilities: [2d0 v1] #1b
>  	Kernel driver in use: vfio-pci
> 
> --- 005-lspci.290x.during.2nd.before.X.log	2014-02-07 01:17:55.571676376 +0100
> +++ 006-lspci.290x.during.2nd.after.X.crash.log	2014-02-07 01:18:16.996855362 +0100
> @@ -1,9 +1,9 @@
>  01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970] (prog-if 00 [VGA controller])
>  	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b00
> -	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
> +	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>  	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>  	Latency: 0, Cache Line Size: 64 bytes
> -	Interrupt: pin A routed to IRQ 18
> +	Interrupt: pin A routed to IRQ 47
>  	Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
>  	Region 2: Memory at df800000 (64-bit, prefetchable) [size=8M]
>  	Region 4: I/O ports at be00 [size=256]
> @@ -17,9 +17,9 @@
>  		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
>  			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>  		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> -			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> +			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
>  			MaxPayload 128 bytes, MaxReadReq 512 bytes
> -		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
> +		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>  		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
>  			ClockPM- Surprise- LLActRep- BwNot-
>  		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
> @@ -32,8 +32,8 @@
>  			 Compliance De-emphasis: -6dB
>  		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
>  			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> -	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> -		Address: 0000000000000000  Data: 0000
> +	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> +		Address: 00000000fee00000  Data: 0000
>  	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
>  	Capabilities: [150 v2] Advanced Error Reporting
>  		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 
> The interesting part is the diff between the 1st and 2nd boot, i.e. the
> lspci output taken prior to booting the VM each time. The only
> differences between the 1st start and the 2nd start are:
> 
> --- 001-lspci.290x.before.1st.log	2014-02-07 01:13:41.498827928 +0100
> +++ 004-lspci.290x.before.2nd.log	2014-02-07 01:16:50.966611282 +0100
> @@ -24,7 +24,7 @@
>  			ClockPM- Surprise- LLActRep- BwNot-
>  		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
>  			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> -		LnkSta:	Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> +		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>  		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
>  		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>  		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> @@ -33,13 +33,13 @@
>  		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
>  			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>  	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> -		Address: 0000000000000000  Data: 0000
> +		Address: 00000000fee00000  Data: 0000
>  	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
>  	Capabilities: [150 v2] Advanced Error Reporting
>  		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>  		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>  		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> -		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> +		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>  		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>  		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>  	Capabilities: [270 v1] #19
> 
> After that, if I do the suspend-to-ram / resume trick, I again get the
> lspci output from before the 1st boot.
> 
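The comparison of lspci snapshots above can be partly automated. The following is a minimal sketch, assuming a single-device `lspci -vvv` dump per snapshot; `parse_flags` and `changed_flags` are hypothetical helper names, and only `+`/`-` style flags are tracked, so value fields such as the LnkSta speed and width would still need a plain diff.

```python
import re

# A flag is a name directly followed by + or -, then whitespace, comma or EOL.
FLAG_RE = re.compile(r"([A-Za-z0-9]+)([+-])(?=[\s,]|$)")


def parse_flags(line):
    """Split 'Label: Flag+ Flag- ...' into (label, {flag: bool})."""
    label, _, rest = line.strip().partition(":")
    return label, {name: sign == "+" for name, sign in FLAG_RE.findall(rest)}


def changed_flags(before, after):
    """Return {label: {flag: (old, new)}} for flags that differ."""
    old = dict(parse_flags(l) for l in before.splitlines() if ":" in l)
    new = dict(parse_flags(l) for l in after.splitlines() if ":" in l)
    result = {}
    for label, flags in new.items():
        prev = old.get(label, {})
        diff = {f: (prev[f], v) for f, v in flags.items()
                if f in prev and prev[f] != v}
        if diff:
            result[label] = diff
    return result


# Two lines taken from the "before 1st" and "during 1st, before X" snapshots.
BEFORE = ("\t\tCESta:\tRxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-\n"
          "\t\tATSCtl:\tEnable-, Smallest Translation Unit: 00\n")
AFTER = ("\t\tCESta:\tRxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+\n"
         "\t\tATSCtl:\tEnable+, Smallest Translation Unit: 00\n")

print(changed_flags(BEFORE, AFTER))
```

Labels that repeat within one dump (for example `Capabilities`) overwrite each other in this sketch, so it is only meant for spotting flipped status bits, not as a replacement for the full diffs.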

Another workaround with which your patch works fine is the following:

  #1 Start VM
  #2 Start X
  #3 Stop X
  #4 rmmod fglrx
  #5 poweroff

After this I'm able to restart the VM as many times as I want with boot
VGA, fglrx and X, but obviously if the VM crashes I need to apply the
"suspend-to-ram" / resume workaround. It looks like fglrx properly
disables the device when it is unloaded.

[   36.081197] <6>[fglrx] IRQ 48 Disabled
[   36.096488] <6>[fglrx] module unloaded - fglrx 13.35.5 [Jan 29 2014]
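Before powering off the guest, one could check that the driver really logged a clean unload. This is a rough sketch in Python; the two patterns are modeled only on the log lines quoted above, other fglrx versions may word them differently, and `fglrx_clean_unload` is a hypothetical helper name.

```python
import re

# Patterns modeled on the two dmesg lines quoted above (assumption:
# other driver versions may print different messages).
IRQ_OFF_RE = re.compile(r"\[fglrx\] IRQ (\d+) Disabled")
UNLOAD_RE = re.compile(r"\[fglrx\] module unloaded - fglrx (\S+)")


def fglrx_clean_unload(dmesg_text):
    """Return (irq, version) if both markers appear in order, else None."""
    irq = IRQ_OFF_RE.search(dmesg_text)
    if irq is None:
        return None
    # Only accept an unload message that follows the IRQ disable line.
    unload = UNLOAD_RE.search(dmesg_text, irq.end())
    if unload is None:
        return None
    return int(irq.group(1)), unload.group(1)


SAMPLE = (
    "[   36.081197] <6>[fglrx] IRQ 48 Disabled\n"
    "[   36.096488] <6>[fglrx] module unloaded - fglrx 13.35.5 [Jan 29 2014]\n"
)

print(fglrx_clean_unload(SAMPLE))  # (48, '13.35.5')
```

In the guest this could be fed with the output of `dmesg` after `rmmod fglrx`; a `None` result would suggest falling back to the suspend-to-ram / resume workaround before the next start.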

Should I retry it with the radeon driver or with VFIO debug enabled?

> > Alex
> > 
> 
> --Maik
> 

--Maik
