qemu-devel.nongnu.org archive mirror
From: Maik Broemme <mbroemme@parallels.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Multi GPU passthrough via VFIO
Date: Fri, 7 Feb 2014 21:17:34 +0100
Message-ID: <20140207201734.GR995@parallels.com>
In-Reply-To: <1391800246.6959.280.camel@bling.home>

Hi Alex,

Alex Williamson <alex.williamson@redhat.com> wrote:
> On Fri, 2014-02-07 at 01:22 +0100, Maik Broemme wrote:
> > The interesting part is the diff between the 1st and 2nd boot, i.e. when
> > I run lspci prior to booting. The only differences between the 1st and
> > 2nd start are:
> > 
> > --- 001-lspci.290x.before.1st.log	2014-02-07 01:13:41.498827928 +0100
> > +++ 004-lspci.290x.before.2nd.log	2014-02-07 01:16:50.966611282 +0100
> > @@ -24,7 +24,7 @@
> >  			ClockPM- Surprise- LLActRep- BwNot-
> >  		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >  			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > -		LnkSta:	Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > +		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >  		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
> >  		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> >  		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > @@ -33,13 +33,13 @@
> >  		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
> >  			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >  	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
> > -		Address: 0000000000000000  Data: 0000
> > +		Address: 00000000fee00000  Data: 0000
> >  	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> >  	Capabilities: [150 v2] Advanced Error Reporting
> >  		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >  		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >  		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> > -		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> > +		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> >  		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> >  		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> >  	Capabilities: [270 v1] #19
> > 
> > After that, if I do the suspend-to-ram / resume trick, I get the lspci
> > output from before the 1st boot again.
> 
> The Link Status change after X is stopped seems the most interesting to
> me.  The MSI change is probably explained by the MSI save/restore of the
> device, but should be harmless since MSI is disabled.  I'm a bit
> surprised the Correctable Error Status in the AER capability didn't get
> cleared.  I would have thought that a bus reset would have caused the
> link to retrain back to the original speed/width as well.  Let's check
> that we're actually getting a bus reset, try this in addition to the
> previous qemu patch.  This just enables debug logging for the bus reset
> function.  Thanks,
> 
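
The speed/width drop discussed above can be checked from the raw Link Status register of the PCI Express capability (offset 0x12). Below is a minimal C sketch, assuming the field layout from the PCI Express Base Specification: bits 3:0 hold Current Link Speed (1 = 2.5GT/s, 2 = 5GT/s, 3 = 8GT/s) and bits 9:4 hold Negotiated Link Width. The mask values match PCI_EXP_LNKSTA_CLS and PCI_EXP_LNKSTA_NLW in Linux's pci_regs.h. The helper names are hypothetical.

```c
#include <stdint.h>

/* Link Status register fields, offset 0x12 in the PCI Express capability. */
#define LNKSTA_CLS       0x000fu /* bits 3:0, Current Link Speed */
#define LNKSTA_NLW       0x03f0u /* bits 9:4, Negotiated Link Width */
#define LNKSTA_NLW_SHIFT 4

/* Return the current link speed in MT/s (lspci prints it in GT/s). */
static unsigned lnksta_speed_mts(uint16_t lnksta)
{
    switch (lnksta & LNKSTA_CLS) {
    case 1:  return 2500;  /* 2.5GT/s */
    case 2:  return 5000;  /* 5GT/s */
    case 3:  return 8000;  /* 8GT/s */
    default: return 0;     /* encoding not handled by this sketch */
    }
}

/* Return the negotiated link width, e.g. 16 for x16. */
static unsigned lnksta_width(uint16_t lnksta)
{
    return (lnksta & LNKSTA_NLW) >> LNKSTA_NLW_SHIFT;
}

/* Nonzero if the link trained to a lower speed or a narrower width. */
static int link_degraded(uint16_t before, uint16_t after)
{
    return lnksta_speed_mts(after) < lnksta_speed_mts(before) ||
           lnksta_width(after) < lnksta_width(before);
}
```

The following values were hand-encoded from the two LnkSta lines in the diff and were not read from the card. With SlotClk+ (bit 12) set in both, "5GT/s, x16" encodes as 0x1102 and "2.5GT/s, x1" encodes as 0x1011. For these values, link_degraded(0x1102, 0x1011) returns nonzero.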

Below is the output from two boots (VGA, load fglrx, start X). On the 2nd
boot X got killed and an oops happened.

- 1st boot:

vfio: vfio_pci_hot_reset(0000:01:00.1) multi
vfio: 0000:01:00.1: hot reset dependent devices:
vfio: 	0000:01:00.0 group 1
vfio: 	0000:01:00.1 group 1
vfio: 0000:01:00.1 hot reset: Success
vfio: vfio_pci_hot_reset(0000:01:00.1) one
vfio: 0000:01:00.1: hot reset dependent devices:
vfio: 	0000:01:00.0 group 1
vfio: vfio: found another in-use device 0000:01:00.0
vfio: vfio_pci_hot_reset(0000:01:00.0) one
vfio: 0000:01:00.0: hot reset dependent devices:
vfio: 	0000:01:00.0 group 1
vfio: 	0000:01:00.1 group 1
vfio: vfio: found another in-use device 0000:01:00.1

- 2nd boot:

vfio: vfio_pci_hot_reset(0000:01:00.1) multi
vfio: 0000:01:00.1: hot reset dependent devices:
vfio: 	0000:01:00.0 group 1
vfio: 	0000:01:00.1 group 1
vfio: 0000:01:00.1 hot reset: Success
vfio: vfio_pci_hot_reset(0000:01:00.1) one
vfio: 0000:01:00.1: hot reset dependent devices:
vfio: 	0000:01:00.0 group 1
vfio: vfio: found another in-use device 0000:01:00.0
vfio: vfio_pci_hot_reset(0000:01:00.0) one
vfio: 0000:01:00.0: hot reset dependent devices:
vfio: 	0000:01:00.0 group 1
vfio: 	0000:01:00.1 group 1
vfio: vfio: found another in-use device 0000:01:00.1
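
Both boots log the same sequence. First, the "multi" reset of 0000:01:00.1 succeeds. After that, each "one" (single-device) reset stops as soon as it finds the card's other function in use. The C sketch below is a simplified, hypothetical model of that decision, inferred from these log lines and from the patch comment about single vs. multiple in-use devices. It is not QEMU's vfio_pci_hot_reset(). The struct, enum and function names are invented for illustration, and ownership and group checks are omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a device that shares the reset (the bus) with us. */
struct dep_dev {
    const char *addr;  /* PCI address, e.g. "0000:01:00.0" */
    bool in_use;       /* already opened by this VM */
};

enum reset_result { RESET_DONE, RESET_SKIPPED };

/*
 * single == false: reset on behalf of all in-use devices at once ("multi").
 * single == true:  reset for one device only ("one"); back off if any other
 * dependent device is in use, because a bus reset would hit it as well.
 */
static enum reset_result model_hot_reset(const char *self,
                                         const struct dep_dev *deps,
                                         size_t n, bool single)
{
    for (size_t i = 0; i < n; i++) {
        printf("\t%s\n", deps[i].addr);  /* mirrors the dependent-device list */
        if (single && deps[i].in_use && strcmp(deps[i].addr, self) != 0) {
            printf("found another in-use device %s\n", deps[i].addr);
            return RESET_SKIPPED;
        }
    }
    return RESET_DONE;
}
```

Assume both functions are marked in use. Then a multi reset of 0000:01:00.1 returns RESET_DONE, while a single reset of either function returns RESET_SKIPPED. The printed device lists also stop at the same point as the log above.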

> Alex
> 
> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> index 8db182f..7fec259 100644
> --- a/hw/misc/vfio.c
> +++ b/hw/misc/vfio.c
> @@ -2927,6 +2927,10 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *hos
>              host1->slot == host2->slot && host1->function == host2->function);
>  }
>  
> +#undef DPRINTF
> +#define DPRINTF(fmt, ...) \
> +    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
> +
>  static int vfio_pci_hot_reset(VFIODevice *vdev, bool single)
>  {
>      VFIOGroup *group;
> @@ -3104,6 +3108,15 @@ out_single:
>      return ret;
>  }
>  
> +#undef DPRINTF
> +#ifdef DEBUG_VFIO
> +#define DPRINTF(fmt, ...) \
> +    do { fprintf(stderr, "vfio: " fmt, ## __VA_ARGS__); } while (0)
> +#else
> +#define DPRINTF(fmt, ...) \
> +    do { } while (0)
> +#endif
> +
>  /*
>   * We want to differentiate hot reset of mulitple in-use devices vs hot reset
>   * of a single in-use device.  VFIO_DEVICE_RESET will already handle the case
> 
> 
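
The patch uses a common preprocessor pattern. Just before the function of interest, it does #undef on the debug macro and redefines it as an unconditional fprintf(). Right after the function, it restores the DEBUG_VFIO-dependent definition, so only that one function logs. Below is a self-contained sketch of the same pattern. It writes into a buffer instead of stderr so the output can be checked. log_buf, log_append and the three functions are hypothetical. Like the patch, it relies on the GNU ", ## __VA_ARGS__" extension (accepted by GCC and Clang).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for stderr so the output can be inspected. */
static char log_buf[256];

static void log_append(const char *s)
{
    strncat(log_buf, s, sizeof(log_buf) - strlen(log_buf) - 1);
}

/* Default: debug output compiled out (as when DEBUG_VFIO is not defined). */
#define DPRINTF(fmt, ...) \
    do { } while (0)

static void before_reset(void) { DPRINTF("not printed\n"); }

/* Force debug output on for the function(s) that follow. */
#undef DPRINTF
#define DPRINTF(fmt, ...)                                             \
    do {                                                              \
        char msg_[128];                                               \
        snprintf(msg_, sizeof(msg_), "vfio: " fmt, ## __VA_ARGS__);   \
        log_append(msg_);                                             \
    } while (0)

static void traced_reset(int n) { DPRINTF("reset %d\n", n); }

/*
 * Restore the quiet default. The patch restores the full #ifdef DEBUG_VFIO
 * block; this sketch assumes DEBUG_VFIO is not defined.
 */
#undef DPRINTF
#define DPRINTF(fmt, ...) \
    do { } while (0)

static void after_reset(void) { DPRINTF("not printed either\n"); }
```

Calling all three functions in order leaves only the message from traced_reset() in the buffer.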

--Maik


Thread overview: 16+ messages
2014-02-05 18:59 [Qemu-devel] Multi GPU passthrough via VFIO Maik Broemme
2014-02-05 20:26 ` Alex Williamson
2014-02-05 21:10   ` Maik Broemme
2014-02-05 21:27     ` Alex Williamson
2014-02-05 23:47       ` Maik Broemme
2014-02-06  0:25         ` Maik Broemme
2014-02-06  3:36           ` Alex Williamson
2014-02-07  0:22             ` Maik Broemme
2014-02-07 18:07               ` Maik Broemme
2014-02-07 19:10               ` Alex Williamson
2014-02-07 20:17                 ` Maik Broemme [this message]
2014-02-14  0:01                   ` Maik Broemme
2014-02-14  0:33                     ` Alex Williamson
2014-02-14 14:51                       ` Maik Broemme
     [not found]                         ` <20140414170306.GH724@parallels.com>
2015-01-16 12:21                           ` Maik Broemme
2015-01-19 17:43                             ` Alex Williamson
