xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
@ 2015-05-29  7:59 Ross Lagerwall
  2015-05-29  9:41 ` Wei Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Ross Lagerwall @ 2015-05-29  7:59 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Jackson, Ross Lagerwall, Wei Liu, Ian Campbell,
	Stefano Stabellini

When doing passthrough of a PCI device for an HVM guest, don't insert
the device into xenstore, otherwise pciback attempts to use it which
conflicts with QEMU.

This manifests itself such that the first time a device is passed to a
domain, it succeeds. Subsequent attempts fail unless the device is
unbound from pciback or the machine rebooted.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 tools/libxl/libxl_pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..2552889 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -994,7 +994,7 @@ out:
         }
     }
 
-    if (!starting)
+    if (!starting && type == LIBXL_DOMAIN_TYPE_PV)
         rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting);
     else
         rc = 0;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-05-29  7:59 [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests Ross Lagerwall
@ 2015-05-29  9:41 ` Wei Liu
  2015-05-29  9:43   ` Ross Lagerwall
  2015-06-01 10:12 ` George Dunlap
  2015-06-01 15:26 ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 17+ messages in thread
From: Wei Liu @ 2015-05-29  9:41 UTC (permalink / raw)
  To: Ross Lagerwall, Konrad Rzeszutek Wilk
  Cc: Ian Jackson, Stefano Stabellini, Wei Liu, Ian Campbell, xen-devel

On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote:
> When doing passthrough of a PCI device for an HVM guest, don't insert
> the device into xenstore, otherwise pciback attempts to use it which
> conflicts with QEMU.
> 
> This manifests itself such that the first time a device is passed to a
> domain, it succeeds. Subsequent attempts fail unless the device is
> unbound from pciback or the machine rebooted.
> 

The commit message looks sensible to me. However this might break
qemu-trad PCI passthrough if I'm not mistaken. What QEMU version are you
using?  Upstream or trad? Have you tested both of them?

Wei.

> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  tools/libxl/libxl_pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index e0743f8..2552889 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -994,7 +994,7 @@ out:
>          }
>      }
>  
> -    if (!starting)
> +    if (!starting && type == LIBXL_DOMAIN_TYPE_PV)
>          rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting);
>      else
>          rc = 0;
> -- 
> 2.1.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-05-29  9:41 ` Wei Liu
@ 2015-05-29  9:43   ` Ross Lagerwall
  2015-05-29  9:50     ` Wei Liu
  0 siblings, 1 reply; 17+ messages in thread
From: Ross Lagerwall @ 2015-05-29  9:43 UTC (permalink / raw)
  To: Wei Liu, Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On 05/29/2015 10:41 AM, Wei Liu wrote:
> On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote:
>> When doing passthrough of a PCI device for an HVM guest, don't insert
>> the device into xenstore, otherwise pciback attempts to use it which
>> conflicts with QEMU.
>>
>> This manifests itself such that the first time a device is passed to a
>> domain, it succeeds. Subsequent attempts fail unless the device is
>> unbound from pciback or the machine rebooted.
>>
>
> The commit message looks sensible to me. However this might break
> qemu-trad PCI passthrough if I'm not mistaken. What QEMU version are you
> using?  Upstream or trad? Have you tested both of them?
>

qemu-trad. I haven't tested with qemu-upstream.

-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-05-29  9:43   ` Ross Lagerwall
@ 2015-05-29  9:50     ` Wei Liu
  2015-05-29  9:54       ` Ross Lagerwall
  0 siblings, 1 reply; 17+ messages in thread
From: Wei Liu @ 2015-05-29  9:50 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Ian Jackson, xen-devel

On Fri, May 29, 2015 at 10:43:08AM +0100, Ross Lagerwall wrote:
> On 05/29/2015 10:41 AM, Wei Liu wrote:
> >On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote:
> >>When doing passthrough of a PCI device for an HVM guest, don't insert
> >>the device into xenstore, otherwise pciback attempts to use it which
> >>conflicts with QEMU.
> >>
> >>This manifests itself such that the first time a device is passed to a
> >>domain, it succeeds. Subsequent attempts fail unless the device is
> >>unbound from pciback or the machine rebooted.
> >>
> >
> >The commit message looks sensible to me. However this might break
> >qemu-trad PCI passthrough if I'm not mistaken. What QEMU version are you
> >using?  Upstream or trad? Have you tested both of them?
> >
> 
> qemu-trad. I haven't tested with qemu-upstream.
> 

I don't quite get this. Doesn't qemu-trad depends on those xenstore
nodes for PCI passthrough information?  What did I miss?

Wei.

> -- 
> Ross Lagerwall

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-05-29  9:50     ` Wei Liu
@ 2015-05-29  9:54       ` Ross Lagerwall
  2015-05-29 10:24         ` Wei Liu
  0 siblings, 1 reply; 17+ messages in thread
From: Ross Lagerwall @ 2015-05-29  9:54 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini

On 05/29/2015 10:50 AM, Wei Liu wrote:
> On Fri, May 29, 2015 at 10:43:08AM +0100, Ross Lagerwall wrote:
>> On 05/29/2015 10:41 AM, Wei Liu wrote:
>>> On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote:
>>>> When doing passthrough of a PCI device for an HVM guest, don't insert
>>>> the device into xenstore, otherwise pciback attempts to use it which
>>>> conflicts with QEMU.
>>>>
>>>> This manifests itself such that the first time a device is passed to a
>>>> domain, it succeeds. Subsequent attempts fail unless the device is
>>>> unbound from pciback or the machine rebooted.
>>>>
>>>
>>> The commit message looks sensible to me. However this might break
>>> qemu-trad PCI passthrough if I'm not mistaken. What QEMU version are you
>>> using?  Upstream or trad? Have you tested both of them?
>>>
>>
>> qemu-trad. I haven't tested with qemu-upstream.
>>
>
> I don't quite get this. Doesn't qemu-trad depends on those xenstore
> nodes for PCI passthrough information?  What did I miss?
>

A different set of xenstore keys are used for communication between 
libxl and QEMU. The communication between libxl and QEMU happens further 
up in the same function:
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_pci.c;h=e0743f8112689b340ba7de88bc8895b62105aaba;hb=HEAD#l901

Regards,
-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-05-29  9:54       ` Ross Lagerwall
@ 2015-05-29 10:24         ` Wei Liu
  0 siblings, 0 replies; 17+ messages in thread
From: Wei Liu @ 2015-05-29 10:24 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Ian Jackson, xen-devel

On Fri, May 29, 2015 at 10:54:09AM +0100, Ross Lagerwall wrote:
> On 05/29/2015 10:50 AM, Wei Liu wrote:
> >On Fri, May 29, 2015 at 10:43:08AM +0100, Ross Lagerwall wrote:
> >>On 05/29/2015 10:41 AM, Wei Liu wrote:
> >>>On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote:
> >>>>When doing passthrough of a PCI device for an HVM guest, don't insert
> >>>>the device into xenstore, otherwise pciback attempts to use it which
> >>>>conflicts with QEMU.
> >>>>
> >>>>This manifests itself such that the first time a device is passed to a
> >>>>domain, it succeeds. Subsequent attempts fail unless the device is
> >>>>unbound from pciback or the machine rebooted.
> >>>>
> >>>
> >>>The commit message looks sensible to me. However this might break
> >>>qemu-trad PCI passthrough if I'm not mistaken. What QEMU version are you
> >>>using?  Upstream or trad? Have you tested both of them?
> >>>
> >>
> >>qemu-trad. I haven't tested with qemu-upstream.
> >>
> >
> >I don't quite get this. Doesn't qemu-trad depends on those xenstore
> >nodes for PCI passthrough information?  What did I miss?
> >
> 
> A different set of xenstore keys are used for communication between libxl
> and QEMU. The communication between libxl and QEMU happens further up in the
> same function:
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_pci.c;h=e0743f8112689b340ba7de88bc8895b62105aaba;hb=HEAD#l901
> 

OK. Now I get the idea. IMHO this piece of code is not in a very good
state. The problem is the way it works is very fragile. Now we have
three functions, each of which has partial responsibility of writing
some xenstore nodes. This is not your fault.

Acked-by: Wei Liu <wei.liu2@citrix.com>

> Regards,
> -- 
> Ross Lagerwall

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-05-29  7:59 [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests Ross Lagerwall
  2015-05-29  9:41 ` Wei Liu
@ 2015-06-01 10:12 ` George Dunlap
  2015-06-01 15:26 ` Konrad Rzeszutek Wilk
  2 siblings, 0 replies; 17+ messages in thread
From: George Dunlap @ 2015-06-01 10:12 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell,
	xen-devel@lists.xen.org

On Fri, May 29, 2015 at 8:59 AM, Ross Lagerwall
<ross.lagerwall@citrix.com> wrote:
> When doing passthrough of a PCI device for an HVM guest, don't insert
> the device into xenstore, otherwise pciback attempts to use it which
> conflicts with QEMU.
>
> This manifests itself such that the first time a device is passed to a
> domain, it succeeds. Subsequent attempts fail unless the device is
> unbound from pciback or the machine rebooted.
>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  tools/libxl/libxl_pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index e0743f8..2552889 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -994,7 +994,7 @@ out:
>          }
>      }
>
> -    if (!starting)
> +    if (!starting && type == LIBXL_DOMAIN_TYPE_PV)
>          rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting);

It looks like device_pci_list reads backend/pci/ for the list of
assigned devices.  Does "libxl__device_pci_list()" still work after
this change?

 -George

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-05-29  7:59 [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests Ross Lagerwall
  2015-05-29  9:41 ` Wei Liu
  2015-06-01 10:12 ` George Dunlap
@ 2015-06-01 15:26 ` Konrad Rzeszutek Wilk
  2015-06-01 15:43   ` Ross Lagerwall
  2 siblings, 1 reply; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-06-01 15:26 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote:
> When doing passthrough of a PCI device for an HVM guest, don't insert
> the device into xenstore, otherwise pciback attempts to use it which
> conflicts with QEMU.

How does it conflict?
> 
> This manifests itself such that the first time a device is passed to a
> domain, it succeeds. Subsequent attempts fail unless the device is
> unbound from pciback or the machine rebooted.

Can you be more specific please? What are the issues? Why does it
fail?

There are certain things that pciback does to "prepare" an PCI device
which QEMU also does. Some of them - such as saving the configuration
registers (And then restoring them after the device has been detached) -
is something that QEMU does not do.

> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  tools/libxl/libxl_pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index e0743f8..2552889 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -994,7 +994,7 @@ out:
>          }
>      }
>  
> -    if (!starting)
> +    if (!starting && type == LIBXL_DOMAIN_TYPE_PV)
>          rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting);
>      else
>          rc = 0;
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-06-01 15:26 ` Konrad Rzeszutek Wilk
@ 2015-06-01 15:43   ` Ross Lagerwall
  2015-06-01 15:58     ` Konrad Rzeszutek Wilk
                       ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Ross Lagerwall @ 2015-06-01 15:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On 06/01/2015 04:26 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote:
>> When doing passthrough of a PCI device for an HVM guest, don't insert
>> the device into xenstore, otherwise pciback attempts to use it which
>> conflicts with QEMU.
>
> How does it conflict?

It doesn't work with repeated use. See below.

>>
>> This manifests itself such that the first time a device is passed to a
>> domain, it succeeds. Subsequent attempts fail unless the device is
>> unbound from pciback or the machine rebooted.
>
> Can you be more specific please? What are the issues? Why does it
> fail?

Without this patch, if a device (e.g. a GPU) is bound to pciback and 
then passed through to a guest using xl pci-attach, it appears in the 
guest and works fine. If the guest is rebooted, and the device is again 
passed through with xl pci-attach, it appears in the guest as before but 
does not work. In Windows, it gets something like Error Code 43 and on 
Linux, the Nouveau driver fails to initialize the device (with error -22 
or something). The only way to get the device to work again is to reboot 
the host or unbind and rebind it to pciback.

With this patch, it works as expected. The device is bound to pciback 
and works after being passed through, even after the VM is rebooted.

>
> There are certain things that pciback does to "prepare" an PCI device
> which QEMU also does. Some of them - such as saving the configuration
> registers (And then restoring them after the device has been detached) -
> is something that QEMU does not do.
>

I really have no idea what the correct thing to do is, but the current 
code with qemu-trad doesn't seem to work (for me).

Regards
-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-06-01 15:43   ` Ross Lagerwall
@ 2015-06-01 15:58     ` Konrad Rzeszutek Wilk
  2015-06-01 15:59     ` Sander Eikelenboom
  2015-06-01 16:03     ` Malcolm Crossley
  2 siblings, 0 replies; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-06-01 15:58 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel

On Mon, Jun 01, 2015 at 04:43:53PM +0100, Ross Lagerwall wrote:
> On 06/01/2015 04:26 PM, Konrad Rzeszutek Wilk wrote:
> >On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote:
> >>When doing passthrough of a PCI device for an HVM guest, don't insert
> >>the device into xenstore, otherwise pciback attempts to use it which
> >>conflicts with QEMU.
> >
> >How does it conflict?
> 
> It doesn't work with repeated use. See below.
> 
> >>
> >>This manifests itself such that the first time a device is passed to a
> >>domain, it succeeds. Subsequent attempts fail unless the device is
> >>unbound from pciback or the machine rebooted.
> >
> >Can you be more specific please? What are the issues? Why does it
> >fail?
> 
> Without this patch, if a device (e.g. a GPU) is bound to pciback and then
> passed through to a guest using xl pci-attach, it appears in the guest and
> works fine. If the guest is rebooted, and the device is again passed through
> with xl pci-attach, it appears in the guest as before but does not work. In
> Windows, it gets something like Error Code 43 and on Linux, the Nouveau
> driver fails to initialize the device (with error -22 or something). The
> only way to get the device to work again is to reboot the host or unbind and
> rebind it to pciback.
> 
> With this patch, it works as expected. The device is bound to pciback and
> works after being passed through, even after the VM is rebooted.

That would imply that the logic to 'prepare' the device is not working
as expected when the device is unplugged and replugged.

Can you provide an output from 'lspci -xxxx -vvvv <BDF>' for the device
before you do 'xl pci-attach', after (When it is running in the guest),
and after you do the second 'pci-attach' after the guest has been
rebooted?
> 
> >
> >There are certain things that pciback does to "prepare" an PCI device
> >which QEMU also does. Some of them - such as saving the configuration
> >registers (And then restoring them after the device has been detached) -
> >is something that QEMU does not do.
> >
> 
> I really have no idea what the correct thing to do is, but the current code
> with qemu-trad doesn't seem to work (for me).

> 
> Regards
> -- 
> Ross Lagerwall

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-06-01 15:43   ` Ross Lagerwall
  2015-06-01 15:58     ` Konrad Rzeszutek Wilk
@ 2015-06-01 15:59     ` Sander Eikelenboom
  2015-06-01 16:03     ` Malcolm Crossley
  2 siblings, 0 replies; 17+ messages in thread
From: Sander Eikelenboom @ 2015-06-01 15:59 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Ian Jackson, xen-devel


Monday, June 1, 2015, 5:43:53 PM, you wrote:

> On 06/01/2015 04:26 PM, Konrad Rzeszutek Wilk wrote:
>> On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote:
>>> When doing passthrough of a PCI device for an HVM guest, don't insert
>>> the device into xenstore, otherwise pciback attempts to use it which
>>> conflicts with QEMU.
>>
>> How does it conflict?

> It doesn't work with repeated use. See below.

>>>
>>> This manifests itself such that the first time a device is passed to a
>>> domain, it succeeds. Subsequent attempts fail unless the device is
>>> unbound from pciback or the machine rebooted.
>>
>> Can you be more specific please? What are the issues? Why does it
>> fail?

> Without this patch, if a device (e.g. a GPU) is bound to pciback and 
> then passed through to a guest using xl pci-attach, it appears in the 
> guest and works fine. If the guest is rebooted, and the device is again 
> passed through with xl pci-attach, it appears in the guest as before but 
> does not work. In Windows, it gets something like Error Code 43 and on 
> Linux, the Nouveau driver fails to initialize the device (with error -22 
> or something). The only way to get the device to work again is to reboot 
> the host or unbind and rebind it to pciback.

> With this patch, it works as expected. The device is bound to pciback 
> and works after being passed through, even after the VM is rebooted.

>>
>> There are certain things that pciback does to "prepare" an PCI device
>> which QEMU also does. Some of them - such as saving the configuration
>> registers (And then restoring them after the device has been detached) -
>> is something that QEMU does not do.
>>

> I really have no idea what the correct thing to do is, but the current 
> code with qemu-trad doesn't seem to work (for me).

> Regards

The Konrad's "do_flr" patch that David NACKed works fine for me. It makes it possible to
do VGA passthrough and reboot endlessly. 
But is has to be refactored into something that is acceptable upstream.
(and possibly be self contained and not rely on the do_flr sysfs trigger to work 
around some locking issue, that would make most part of the NACK (the naming of 
the option) also go away. )

--
Sander

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-06-01 15:43   ` Ross Lagerwall
  2015-06-01 15:58     ` Konrad Rzeszutek Wilk
  2015-06-01 15:59     ` Sander Eikelenboom
@ 2015-06-01 16:03     ` Malcolm Crossley
  2015-06-01 17:55       ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 17+ messages in thread
From: Malcolm Crossley @ 2015-06-01 16:03 UTC (permalink / raw)
  To: Ross Lagerwall, Konrad Rzeszutek Wilk
  Cc: Ian Jackson, xen-devel, Wei Liu, Ian Campbell, Stefano Stabellini

On 01/06/15 16:43, Ross Lagerwall wrote:
> On 06/01/2015 04:26 PM, Konrad Rzeszutek Wilk wrote:
>> On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote:
>>> When doing passthrough of a PCI device for an HVM guest, don't insert
>>> the device into xenstore, otherwise pciback attempts to use it which
>>> conflicts with QEMU.
>>
>> How does it conflict?
> 
> It doesn't work with repeated use. See below.
> 
>>>
>>> This manifests itself such that the first time a device is passed to a
>>> domain, it succeeds. Subsequent attempts fail unless the device is
>>> unbound from pciback or the machine rebooted.
>>
>> Can you be more specific please? What are the issues? Why does it
>> fail?
> 
> Without this patch, if a device (e.g. a GPU) is bound to pciback and
> then passed through to a guest using xl pci-attach, it appears in the
> guest and works fine. If the guest is rebooted, and the device is again
> passed through with xl pci-attach, it appears in the guest as before but
> does not work. In Windows, it gets something like Error Code 43 and on
> Linux, the Nouveau driver fails to initialize the device (with error -22
> or something). The only way to get the device to work again is to reboot
> the host or unbind and rebind it to pciback.
> 
> With this patch, it works as expected. The device is bound to pciback
> and works after being passed through, even after the VM is rebooted.
> 
>>
>> There are certain things that pciback does to "prepare" an PCI device
>> which QEMU also does. Some of them - such as saving the configuration
>> registers (And then restoring them after the device has been detached) -
>> is something that QEMU does not do.
>>
> 
> I really have no idea what the correct thing to do is, but the current
> code with qemu-trad doesn't seem to work (for me).

The pciback pci_stub.c implements the pciback.hide and the device reset
logic.

The rest of pciback implements the pciback xenbus device which PV guests
need in order to map/unmap MSI interrupts and access PCI config space.

QEMU emulates and handles the MSI interrupt capabilities and PCI config
space directly.

This is why a pciback xenbus device should not be created for
passthrough PCI device being handled by QEMU.

Malcolm

> 
> Regards

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-06-01 16:03     ` Malcolm Crossley
@ 2015-06-01 17:55       ` Konrad Rzeszutek Wilk
  2015-06-02 10:06         ` Malcolm Crossley
  0 siblings, 1 reply; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-06-01 17:55 UTC (permalink / raw)
  To: Malcolm Crossley
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Ian Jackson, xen-devel,
	Ross Lagerwall

On Mon, Jun 01, 2015 at 05:03:14PM +0100, Malcolm Crossley wrote:
> On 01/06/15 16:43, Ross Lagerwall wrote:
> > On 06/01/2015 04:26 PM, Konrad Rzeszutek Wilk wrote:
> >> On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote:
> >>> When doing passthrough of a PCI device for an HVM guest, don't insert
> >>> the device into xenstore, otherwise pciback attempts to use it which
> >>> conflicts with QEMU.
> >>
> >> How does it conflict?
> > 
> > It doesn't work with repeated use. See below.
> > 
> >>>
> >>> This manifests itself such that the first time a device is passed to a
> >>> domain, it succeeds. Subsequent attempts fail unless the device is
> >>> unbound from pciback or the machine rebooted.
> >>
> >> Can you be more specific please? What are the issues? Why does it
> >> fail?
> > 
> > Without this patch, if a device (e.g. a GPU) is bound to pciback and
> > then passed through to a guest using xl pci-attach, it appears in the
> > guest and works fine. If the guest is rebooted, and the device is again
> > passed through with xl pci-attach, it appears in the guest as before but
> > does not work. In Windows, it gets something like Error Code 43 and on
> > Linux, the Nouveau driver fails to initialize the device (with error -22
> > or something). The only way to get the device to work again is to reboot
> > the host or unbind and rebind it to pciback.
> > 
> > With this patch, it works as expected. The device is bound to pciback
> > and works after being passed through, even after the VM is rebooted.
> > 
> >>
> >> There are certain things that pciback does to "prepare" an PCI device
> >> which QEMU also does. Some of them - such as saving the configuration
> >> registers (And then restoring them after the device has been detached) -
> >> is something that QEMU does not do.
> >>
> > 
> > I really have no idea what the correct thing to do is, but the current
> > code with qemu-trad doesn't seem to work (for me).
> 
> The pciback pci_stub.c implements the pciback.hide and the device reset
> logic.
> 
> The rest of pciback implements the pciback xenbus device which PV guests
> need in order to map/unmap MSI interrupts and access PCI config space.
> 
> QEMU emulates and handles the MSI interrupt capabilities and PCI config
> space directly.

Right..
> 
> This is why a pciback xenbus device should not be created for
> passthrough PCI device being handled by QEMU.

To me that sounds that we should not have PV drivers because QEMU
emulates IDE or network devices.

The crux here is that none of the operations that pciback performs
should affect QEMU or guests. But it does - so there is a bug.

I would like to understand which ones do it so I can fix in
pciback - as it might be also be a problem with PV.

Unless... are you by any chance using extra patches on top of the
native pciback?

> 
> Malcolm
> 
> > 
> > Regards
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-06-01 17:55       ` Konrad Rzeszutek Wilk
@ 2015-06-02 10:06         ` Malcolm Crossley
  2015-06-02 14:34           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 17+ messages in thread
From: Malcolm Crossley @ 2015-06-02 10:06 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Ian Jackson, xen-devel,
	Ross Lagerwall

On 01/06/15 18:55, Konrad Rzeszutek Wilk wrote:
> On Mon, Jun 01, 2015 at 05:03:14PM +0100, Malcolm Crossley wrote:
>> On 01/06/15 16:43, Ross Lagerwall wrote:
>>> On 06/01/2015 04:26 PM, Konrad Rzeszutek Wilk wrote:
>>>> On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote:
>>>>> When doing passthrough of a PCI device for an HVM guest, don't insert
>>>>> the device into xenstore, otherwise pciback attempts to use it which
>>>>> conflicts with QEMU.
>>>>
>>>> How does it conflict?
>>>
>>> It doesn't work with repeated use. See below.
>>>
>>>>>
>>>>> This manifests itself such that the first time a device is passed to a
>>>>> domain, it succeeds. Subsequent attempts fail unless the device is
>>>>> unbound from pciback or the machine rebooted.
>>>>
>>>> Can you be more specific please? What are the issues? Why does it
>>>> fail?
>>>
>>> Without this patch, if a device (e.g. a GPU) is bound to pciback and
>>> then passed through to a guest using xl pci-attach, it appears in the
>>> guest and works fine. If the guest is rebooted, and the device is again
>>> passed through with xl pci-attach, it appears in the guest as before but
>>> does not work. In Windows, it gets something like Error Code 43 and on
>>> Linux, the Nouveau driver fails to initialize the device (with error -22
>>> or something). The only way to get the device to work again is to reboot
>>> the host or unbind and rebind it to pciback.
>>>
>>> With this patch, it works as expected. The device is bound to pciback
>>> and works after being passed through, even after the VM is rebooted.
>>>
>>>>
>>>> There are certain things that pciback does to "prepare" an PCI device
>>>> which QEMU also does. Some of them - such as saving the configuration
>>>> registers (And then restoring them after the device has been detached) -
>>>> is something that QEMU does not do.
>>>>
>>>
>>> I really have no idea what the correct thing to do is, but the current
>>> code with qemu-trad doesn't seem to work (for me).
>>
>> The pciback pci_stub.c implements the pciback.hide and the device reset
>> logic.
>>
>> The rest of pciback implements the pciback xenbus device which PV guests
>> need in order to map/unmap MSI interrupts and access PCI config space.
>>
>> QEMU emulates and handles the MSI interrupt capabilities and PCI config
>> space directly.
> 
> Right..
>>
>> This is why a pciback xenbus device should not be created for
>> passthrough PCI device being handled by QEMU.
> 
> To me that sounds that we should not have PV drivers because QEMU
> emulates IDE or network devices.

That is different. We first boot with QEMU handling the devices and then
we explictly unplug QEMU's handling of IDE and network devices.

That handover protocol does not currently exist for PCI passthrough
devices so we have to chose one mechanism or the other to manage the
passed through PCI device at boot time. Otherwise a HVM guest could load
pcifront and cause's all kinds of chaos with interrupt management or
outbound MMIO window management.

> 
> The crux here is that none of the operations that pciback performs
> should affect QEMU or guests. But it does - so there is a bug.

I agree there is a bug but should we try to fix it based upon my
comments above?
> 
> I would like to understand which ones do it so I can fix in
> pciback - as it might be also be a problem with PV.
> 
> Unless... are you by any chance using extra patches on top of the
> native pciback?

We do have extra patches but they only allow us to do a SBR on PCI
device's which require it. They failure listed above occurs on devices
with device specific resets (e.g. FLR,D3) as well so those extra patches
aren't being used.

> 
>>
>> Malcolm
>>
>>>
>>> Regards
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-06-02 10:06         ` Malcolm Crossley
@ 2015-06-02 14:34           ` Konrad Rzeszutek Wilk
  2015-06-02 15:48             ` Malcolm Crossley
  0 siblings, 1 reply; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-06-02 14:34 UTC (permalink / raw)
  To: Malcolm Crossley
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Ian Jackson, xen-devel,
	Ross Lagerwall

On Tue, Jun 02, 2015 at 11:06:26AM +0100, Malcolm Crossley wrote:
> On 01/06/15 18:55, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jun 01, 2015 at 05:03:14PM +0100, Malcolm Crossley wrote:
> >> On 01/06/15 16:43, Ross Lagerwall wrote:
> >>> On 06/01/2015 04:26 PM, Konrad Rzeszutek Wilk wrote:
> >>>> On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote:
> >>>>> When doing passthrough of a PCI device for an HVM guest, don't insert
> >>>>> the device into xenstore, otherwise pciback attempts to use it which
> >>>>> conflicts with QEMU.
> >>>>
> >>>> How does it conflict?
> >>>
> >>> It doesn't work with repeated use. See below.
> >>>
> >>>>>
> >>>>> This manifests itself such that the first time a device is passed to a
> >>>>> domain, it succeeds. Subsequent attempts fail unless the device is
> >>>>> unbound from pciback or the machine rebooted.
> >>>>
> >>>> Can you be more specific please? What are the issues? Why does it
> >>>> fail?
> >>>
> >>> Without this patch, if a device (e.g. a GPU) is bound to pciback and
> >>> then passed through to a guest using xl pci-attach, it appears in the
> >>> guest and works fine. If the guest is rebooted, and the device is again
> >>> passed through with xl pci-attach, it appears in the guest as before but
> >>> does not work. In Windows, it gets something like Error Code 43 and on
> >>> Linux, the Nouveau driver fails to initialize the device (with error -22
> >>> or something). The only way to get the device to work again is to reboot
> >>> the host or unbind and rebind it to pciback.
> >>>
> >>> With this patch, it works as expected. The device is bound to pciback
> >>> and works after being passed through, even after the VM is rebooted.
> >>>
> >>>>
> >>>> There are certain things that pciback does to "prepare" an PCI device
> >>>> which QEMU also does. Some of them - such as saving the configuration
> >>>> registers (And then restoring them after the device has been detached) -
> >>>> is something that QEMU does not do.
> >>>>
> >>>
> >>> I really have no idea what the correct thing to do is, but the current
> >>> code with qemu-trad doesn't seem to work (for me).

I think I know what the problem is. Do you by any chance have the XSA133-addenum
patch in? If not could you apply it and tell me if it works?

> >>
> >> The pciback pci_stub.c implements the pciback.hide and the device reset
> >> logic.
> >>
> >> The rest of pciback implements the pciback xenbus device which PV guests
> >> need in order to map/unmap MSI interrupts and access PCI config space.
> >>
> >> QEMU emulates and handles the MSI interrupt capabilities and PCI config
> >> space directly.
> > 
> > Right..
> >>
> >> This is why a pciback xenbus device should not be created for
> >> passthrough PCI device being handled by QEMU.
> > 
> > To me that sounds that we should not have PV drivers because QEMU
> > emulates IDE or network devices.
> 
> That is different. We first boot with QEMU handling the devices and then
> we explictly unplug QEMU's handling of IDE and network devices.
> 
> That handover protocol does not currently exist for PCI passthrough
> devices so we have to chose one mechanism or the other to manage the
> passed through PCI device at boot time. Otherwise a HVM guest could load
> pcifront and cause's all kinds of chaos with interrupt management or
> outbound MMIO window management.

Which would be fun! :-)
> 
> > 
> > The crux here is that none of the operations that pciback performs
> > should affect QEMU or guests. But it does - so there is a bug.
> 
> I agree there is a bug but should we try to fix it based upon my
> comments above?

I am still thinking about it. I do like certain things that pciback
does as part of it being notified that a device is to be used by
a guest and performing the configuration save/reset (see
pcistub_put_pci_dev in the pciback).

If somehow that can still be done by libxl (or QEMU) via SysFS
that would be good.

Just to clarify:
 - I concur with you that having xen-pcifront loaded in HVM
   guest and doing odd things behind QEMU is not good.
 - I like the fact that xen-pciback does a bunch of safety
   things with the PCI device to prepare it for a guest.
 - Currently these 'safety things'  are done when you
   'unbind' or 'bind' the device to pciback.
 - Or when the guest is shutdown and via XenBus we are told
   and can do the 'safety things'. This is the crux - if there
   is a way to do this via SysFS this would be super.

   Or perhaps xenpciback can figure out that the guest is HVM
   and ignore any XenBus actions?

> > 
> > I would like to understand which ones do it so I can fix in
> > pciback - as it might be also be a problem with PV.
> > 
> > Unless... are you by any chance using extra patches on top of the
> > native pciback?
> 
> We do have extra patches but they only allow us to do a SBR on PCI
> device's which require it. They failure listed above occurs on devices
> with device specific resets (e.g. FLR,D3) as well so those extra patches
> aren't being used.
> 
> > 
> >>
> >> Malcolm
> >>
> >>>
> >>> Regards
> >>
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-06-02 14:34           ` Konrad Rzeszutek Wilk
@ 2015-06-02 15:48             ` Malcolm Crossley
  2015-06-10 20:10               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 17+ messages in thread
From: Malcolm Crossley @ 2015-06-02 15:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Ian Jackson, xen-devel,
	Ross Lagerwall

On 02/06/15 15:34, Konrad Rzeszutek Wilk wrote:
> On Tue, Jun 02, 2015 at 11:06:26AM +0100, Malcolm Crossley wrote:
>> On 01/06/15 18:55, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Jun 01, 2015 at 05:03:14PM +0100, Malcolm Crossley wrote:
>>>> On 01/06/15 16:43, Ross Lagerwall wrote:
>>>>> On 06/01/2015 04:26 PM, Konrad Rzeszutek Wilk wrote:
>>>>>> On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote:
>>>>>>> When doing passthrough of a PCI device for an HVM guest, don't insert
>>>>>>> the device into xenstore, otherwise pciback attempts to use it which
>>>>>>> conflicts with QEMU.
>>>>>>
>>>>>> How does it conflict?
>>>>>
>>>>> It doesn't work with repeated use. See below.
>>>>>
>>>>>>>
>>>>>>> This manifests itself such that the first time a device is passed to a
>>>>>>> domain, it succeeds. Subsequent attempts fail unless the device is
>>>>>>> unbound from pciback or the machine rebooted.
>>>>>>
>>>>>> Can you be more specific please? What are the issues? Why does it
>>>>>> fail?
>>>>>
>>>>> Without this patch, if a device (e.g. a GPU) is bound to pciback and
>>>>> then passed through to a guest using xl pci-attach, it appears in the
>>>>> guest and works fine. If the guest is rebooted, and the device is again
>>>>> passed through with xl pci-attach, it appears in the guest as before but
>>>>> does not work. In Windows, it gets something like Error Code 43 and on
>>>>> Linux, the Nouveau driver fails to initialize the device (with error -22
>>>>> or something). The only way to get the device to work again is to reboot
>>>>> the host or unbind and rebind it to pciback.
>>>>>
>>>>> With this patch, it works as expected. The device is bound to pciback
>>>>> and works after being passed through, even after the VM is rebooted.
>>>>>
>>>>>>
>>>>>> There are certain things that pciback does to "prepare" an PCI device
>>>>>> which QEMU also does. Some of them - such as saving the configuration
>>>>>> registers (And then restoring them after the device has been detached) -
>>>>>> is something that QEMU does not do.
>>>>>>
>>>>>
>>>>> I really have no idea what the correct thing to do is, but the current
>>>>> code with qemu-trad doesn't seem to work (for me).
> 
> I think I know what the problem is. Do you by any chance have the XSA133-addenum
> patch in? If not could you apply it and tell me if it works?
> 
>>>>
>>>> The pciback pci_stub.c implements the pciback.hide and the device reset
>>>> logic.
>>>>
>>>> The rest of pciback implements the pciback xenbus device which PV guests
>>>> need in order to map/unmap MSI interrupts and access PCI config space.
>>>>
>>>> QEMU emulates and handles the MSI interrupt capabilities and PCI config
>>>> space directly.
>>>
>>> Right..
>>>>
>>>> This is why a pciback xenbus device should not be created for
>>>> passthrough PCI device being handled by QEMU.
>>>
>>> To me that sounds that we should not have PV drivers because QEMU
>>> emulates IDE or network devices.
>>
>> That is different. We first boot with QEMU handling the devices and then
>> we explictly unplug QEMU's handling of IDE and network devices.
>>
>> That handover protocol does not currently exist for PCI passthrough
>> devices so we have to chose one mechanism or the other to manage the
>> passed through PCI device at boot time. Otherwise a HVM guest could load
>> pcifront and cause's all kinds of chaos with interrupt management or
>> outbound MMIO window management.
> 
> Which would be fun! :-)
>>
>>>
>>> The crux here is that none of the operations that pciback performs
>>> should affect QEMU or guests. But it does - so there is a bug.
>>
>> I agree there is a bug but should we try to fix it based upon my
>> comments above?
> 
> I am still thinking about it. I do like certain things that pciback
> does as part of it being notified that a device is to be used by
> a guest and performing the configuration save/reset (see
> pcistub_put_pci_dev in the pciback).
> 
> If somehow that can still be done by libxl (or QEMU) via SysFS
> that would be good.
> 
> Just to clarify:
>  - I concur with you that having xen-pcifront loaded in HVM
>    guest and doing odd things behind QEMU is not good.
>  - I like the fact that xen-pciback does a bunch of safety
>    things with the PCI device to prepare it for a guest.
>  - Currently these 'safety things'  are done when you
>    'unbind' or 'bind' the device to pciback.
>  - Or when the guest is shutdown and via XenBus we are told
>    and can do the 'safety things'. This is the crux - if there
>    is a way to do this via SysFS this would be super.
> 
>    Or perhaps xenpciback can figure out that the guest is HVM
>    and ignore any XenBus actions?
> 

Xenserver toolstack currently bind/unbinds the device to pciback and
manually triggers the reset on the device. It then uses xenstore keys to
communicate with the QEMU hotplug mechanism.

A complete description is here:

"Xenopsd performs the following steps for HVM PCI passthrough (for each
PCI device):

    write /local/domain/0/backend/pci/<domid>/0/msitranslate “0” or “1”
    write /local/domain/0/backend/pci/<domid>/0/power_mgmt “0” or “1”
    bind device to pciback (write device id to the new_slot and bind
nodes in sysfs)
    write /local/domain/0/backend/device-model/<domid>/command “pci-ins”
    write /local/domain/0/backend/device-model/<domid>/parameter
“xxxx:xx:xx.x”
    wait for “pci-inserted” to appear in
/local/domain/0/backend/<domid>/device-model/state
    write /local/domain/0/backend/pci/<domid>/0/dev-<x> “xxxx:xx:xx.x”
    if /local/domain/<domid>/device/pci/0 does not yet exist, then
create /local/domain/<domid>/device/pci/0, give ownership to the guest,
and write backend=/local/domain/0/backend/pci/<domid>/0, backend-id=0,
state=1
    xc_domain_assign_device


To unplug:

    write /local/domain/0/backend/device-model/<domid>/command “pci-rem”
    write /local/domain/0/backend/device-model/<domid>/parameter
“xxxx:xx:xx.x”
    wait for “pci-removed” to appear in
/local/domain/0/backend/<domid>/device-model/state
    remove /local/domain/0/backend/pci/<domid>/0/dev-<x>
    call /usr/lib/xcp/lib/pci-flr flr-pre xxxx:xx:xx.x
    if the file /sys/bus/pci/devices/xxxx:xx:xx.x/reset exists, then
write "1" to it; otherwise write "xxxx:xx:xx.x" to
/sys/bus/pci/drivers/pciback/do_flr
    call /usr/lib/xcp/lib/pci-flr flr-post xxxx:xx:xx.x
    xc_domain_deassign_device"


I would recommend that libxl performs similar operations (except for the
QEMU hotplug part and the flr script parts).

We should probably split pciback into parts:

1. A part to capture device at boot and prevent other drivers loading
(bind/unbind)

2. A part to manage reset on device's which don't have device specific
reset's.

3. A part to manage the pciback PV device for communicating to pcifront.


I don't think we can have pciback work out if it's a HVM guest and I
think we should instead use the toolstack to prep the device correctly
before passthrough. The toolstack is donating it's PCI device to the new
domain afterall :)


>>>
>>> I would like to understand which ones do it so I can fix in
>>> pciback - as it might be also be a problem with PV.
>>>
>>> Unless... are you by any chance using extra patches on top of the
>>> native pciback?
>>
>> We do have extra patches but they only allow us to do a SBR on PCI
>> device's which require it. They failure listed above occurs on devices
>> with device specific resets (e.g. FLR,D3) as well so those extra patches
>> aren't being used.
>>
>>>
>>>>
>>>> Malcolm
>>>>
>>>>>
>>>>> Regards
>>>>
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests
  2015-06-02 15:48             ` Malcolm Crossley
@ 2015-06-10 20:10               ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-06-10 20:10 UTC (permalink / raw)
  To: Malcolm Crossley
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Ian Jackson, xen-devel,
	Ross Lagerwall

On Tue, Jun 02, 2015 at 04:48:25PM +0100, Malcolm Crossley wrote:
> On 02/06/15 15:34, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jun 02, 2015 at 11:06:26AM +0100, Malcolm Crossley wrote:
> >> On 01/06/15 18:55, Konrad Rzeszutek Wilk wrote:
> >>> On Mon, Jun 01, 2015 at 05:03:14PM +0100, Malcolm Crossley wrote:
> >>>> On 01/06/15 16:43, Ross Lagerwall wrote:
> >>>>> On 06/01/2015 04:26 PM, Konrad Rzeszutek Wilk wrote:
> >>>>>> On Fri, May 29, 2015 at 08:59:45AM +0100, Ross Lagerwall wrote:
> >>>>>>> When doing passthrough of a PCI device for an HVM guest, don't insert
> >>>>>>> the device into xenstore, otherwise pciback attempts to use it which
> >>>>>>> conflicts with QEMU.
> >>>>>>
> >>>>>> How does it conflict?
> >>>>>
> >>>>> It doesn't work with repeated use. See below.
> >>>>>
> >>>>>>>
> >>>>>>> This manifests itself such that the first time a device is passed to a
> >>>>>>> domain, it succeeds. Subsequent attempts fail unless the device is
> >>>>>>> unbound from pciback or the machine rebooted.
> >>>>>>
> >>>>>> Can you be more specific please? What are the issues? Why does it
> >>>>>> fail?
> >>>>>
> >>>>> Without this patch, if a device (e.g. a GPU) is bound to pciback and
> >>>>> then passed through to a guest using xl pci-attach, it appears in the
> >>>>> guest and works fine. If the guest is rebooted, and the device is again
> >>>>> passed through with xl pci-attach, it appears in the guest as before but
> >>>>> does not work. In Windows, it gets something like Error Code 43 and on
> >>>>> Linux, the Nouveau driver fails to initialize the device (with error -22
> >>>>> or something). The only way to get the device to work again is to reboot
> >>>>> the host or unbind and rebind it to pciback.
> >>>>>
> >>>>> With this patch, it works as expected. The device is bound to pciback
> >>>>> and works after being passed through, even after the VM is rebooted.
> >>>>>
> >>>>>>
> >>>>>> There are certain things that pciback does to "prepare" an PCI device
> >>>>>> which QEMU also does. Some of them - such as saving the configuration
> >>>>>> registers (And then restoring them after the device has been detached) -
> >>>>>> is something that QEMU does not do.
> >>>>>>
> >>>>>
> >>>>> I really have no idea what the correct thing to do is, but the current
> >>>>> code with qemu-trad doesn't seem to work (for me).
> > 
> > I think I know what the problem is. Do you by any chance have the XSA133-addenum
> > patch in? If not could you apply it and tell me if it works?

Ping?

It was XSA120-addendum.

> > 
> >>>>
> >>>> The pciback pci_stub.c implements the pciback.hide and the device reset
> >>>> logic.
> >>>>
> >>>> The rest of pciback implements the pciback xenbus device which PV guests
> >>>> need in order to map/unmap MSI interrupts and access PCI config space.
> >>>>
> >>>> QEMU emulates and handles the MSI interrupt capabilities and PCI config
> >>>> space directly.
> >>>
> >>> Right..
> >>>>
> >>>> This is why a pciback xenbus device should not be created for
> >>>> passthrough PCI device being handled by QEMU.
> >>>
> >>> To me that sounds that we should not have PV drivers because QEMU
> >>> emulates IDE or network devices.
> >>
> >> That is different. We first boot with QEMU handling the devices and then
> >> we explictly unplug QEMU's handling of IDE and network devices.
> >>
> >> That handover protocol does not currently exist for PCI passthrough
> >> devices so we have to chose one mechanism or the other to manage the
> >> passed through PCI device at boot time. Otherwise a HVM guest could load
> >> pcifront and cause's all kinds of chaos with interrupt management or
> >> outbound MMIO window management.
> > 
> > Which would be fun! :-)
> >>
> >>>
> >>> The crux here is that none of the operations that pciback performs
> >>> should affect QEMU or guests. But it does - so there is a bug.
> >>
> >> I agree there is a bug but should we try to fix it based upon my
> >> comments above?
> > 
> > I am still thinking about it. I do like certain things that pciback
> > does as part of it being notified that a device is to be used by
> > a guest and performing the configuration save/reset (see
> > pcistub_put_pci_dev in the pciback).
> > 
> > If somehow that can still be done by libxl (or QEMU) via SysFS
> > that would be good.
> > 
> > Just to clarify:
> >  - I concur with you that having xen-pcifront loaded in HVM
> >    guest and doing odd things behind QEMU is not good.
> >  - I like the fact that xen-pciback does a bunch of safety
> >    things with the PCI device to prepare it for a guest.
> >  - Currently these 'safety things'  are done when you
> >    'unbind' or 'bind' the device to pciback.
> >  - Or when the guest is shutdown and via XenBus we are told
> >    and can do the 'safety things'. This is the crux - if there
> >    is a way to do this via SysFS this would be super.
> > 
> >    Or perhaps xenpciback can figure out that the guest is HVM
> >    and ignore any XenBus actions?
> > 
> 
> Xenserver toolstack currently bind/unbinds the device to pciback and
> manually triggers the reset on the device. It then uses xenstore keys to
> communicate with the QEMU hotplug mechanism.
> 
> A complete description is here:
> 
> "Xenopsd performs the following steps for HVM PCI passthrough (for each
> PCI device):
> 
>     write /local/domain/0/backend/pci/<domid>/0/msitranslate “0” or “1”
>     write /local/domain/0/backend/pci/<domid>/0/power_mgmt “0” or “1”
>     bind device to pciback (write device id to the new_slot and bind
> nodes in sysfs)
>     write /local/domain/0/backend/device-model/<domid>/command “pci-ins”
>     write /local/domain/0/backend/device-model/<domid>/parameter
> “xxxx:xx:xx.x”
>     wait for “pci-inserted” to appear in
> /local/domain/0/backend/<domid>/device-model/state
>     write /local/domain/0/backend/pci/<domid>/0/dev-<x> “xxxx:xx:xx.x”
>     if /local/domain/<domid>/device/pci/0 does not yet exist, then
> create /local/domain/<domid>/device/pci/0, give ownership to the guest,
> and write backend=/local/domain/0/backend/pci/<domid>/0, backend-id=0,
> state=1
>     xc_domain_assign_device
> 
> 
> To unplug:
> 
>     write /local/domain/0/backend/device-model/<domid>/command “pci-rem”
>     write /local/domain/0/backend/device-model/<domid>/parameter
> “xxxx:xx:xx.x”
>     wait for “pci-removed” to appear in
> /local/domain/0/backend/<domid>/device-model/state
>     remove /local/domain/0/backend/pci/<domid>/0/dev-<x>
>     call /usr/lib/xcp/lib/pci-flr flr-pre xxxx:xx:xx.x

'pci-flr' ? User-space program to poke the configuration registers?

>     if the file /sys/bus/pci/devices/xxxx:xx:xx.x/reset exists, then
> write "1" to it; otherwise write "xxxx:xx:xx.x" to
> /sys/bus/pci/drivers/pciback/do_flr

Do you have a patch to expose this? This paramter does not exist
in the upstream Linux kernel.

>     call /usr/lib/xcp/lib/pci-flr flr-post xxxx:xx:xx.x
>     xc_domain_deassign_device"
> 
> 
> I would recommend that libxl performs similar operations (except for the
> QEMU hotplug part and the flr script parts).
> 
> We should probably split pciback into parts:
> 
> 1. A part to capture device at boot and prevent other drivers loading
> (bind/unbind)
> 
> 2. A part to manage reset on device's which don't have device specific
> reset's.

Not following you. I think you mean if the standard PCI 'reset' is not
exposed because the device has some quirks or it truly does not expose
FLR, D3/D0 switch, or any other ways to reset it?

> 
> 3. A part to manage the pciback PV device for communicating to pcifront.
> 
> 
> I don't think we can have pciback work out if it's a HVM guest and I
> think we should instead use the toolstack to prep the device correctly
> before passthrough. The toolstack is donating it's PCI device to the new

I do not know what is the right way. The kernel has a lot of knowledge
about these devices and can proper locking and deal with quirks. QEMU
also has some idea of what to do and how to reset a device.

> domain afterall :)
> 
> 
> >>>
> >>> I would like to understand which ones do it so I can fix in
> >>> pciback - as it might be also be a problem with PV.
> >>>
> >>> Unless... are you by any chance using extra patches on top of the
> >>> native pciback?
> >>
> >> We do have extra patches but they only allow us to do a SBR on PCI
> >> device's which require it. They failure listed above occurs on devices
> >> with device specific resets (e.g. FLR,D3) as well so those extra patches
> >> aren't being used.
> >>
> >>>
> >>>>
> >>>> Malcolm
> >>>>
> >>>>>
> >>>>> Regards
> >>>>
> >>
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-06-10 20:10 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-05-29  7:59 [PATCH] libxl: Don't insert PCI device into xenstore for HVM guests Ross Lagerwall
2015-05-29  9:41 ` Wei Liu
2015-05-29  9:43   ` Ross Lagerwall
2015-05-29  9:50     ` Wei Liu
2015-05-29  9:54       ` Ross Lagerwall
2015-05-29 10:24         ` Wei Liu
2015-06-01 10:12 ` George Dunlap
2015-06-01 15:26 ` Konrad Rzeszutek Wilk
2015-06-01 15:43   ` Ross Lagerwall
2015-06-01 15:58     ` Konrad Rzeszutek Wilk
2015-06-01 15:59     ` Sander Eikelenboom
2015-06-01 16:03     ` Malcolm Crossley
2015-06-01 17:55       ` Konrad Rzeszutek Wilk
2015-06-02 10:06         ` Malcolm Crossley
2015-06-02 14:34           ` Konrad Rzeszutek Wilk
2015-06-02 15:48             ` Malcolm Crossley
2015-06-10 20:10               ` Konrad Rzeszutek Wilk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).