[Qemu-devel] VFIO PCIe Extended Capabilities

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] VFIO PCIe Extended Capabilities
@ 2016-07-19 17:12 Spenser Gilliland
  2016-07-19 17:22 ` Alex Williamson
  0 siblings, 1 reply; 9+ messages in thread
From: Spenser Gilliland @ 2016-07-19 17:12 UTC (permalink / raw)
  To: chen.fan.fnst@cn.fujitsu.com, alex.williamson@redhat.com
  Cc: qemu-devel@nongnu.org

Hi,

I noticed your patches "vfio: add pcie extended capability support" and "vfio/pci: Hide SR-IOV capability" have gone into qemu mainline.  These look really good, and thanks so much for doing these.

I was wondering if there were any side effects to removing the pci_bus_is_express check on line 1776 of hw/vfio/pci.c .

    /* Only add extended caps if we have them and the guest can see them */
-   if (!pci_is_express(pdev) || !pci_bus_is_express(pdev->bus) ||
+  if (!pci_is_express(pdev) ||
        !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
        return 0;
    }

I'm asking because it looks like the defaults for libvirt/OpenStack are to create a "hostdev" stanza in the libvirt xml to define this pass through condition.  However, it also appears that the "hostdev" stanza only supports pci-bridge bus connections.  Thus, it's not easily possible to use this patch in a libvirt/OpenStack environment as the bus is technically a non-express bus.  It looks like adding PCIe bus support to libvirt/OpenStack may be a lot more effort than a simple workaround here.

I have tested this on my local system and it does work as intended for my use case.  The following is from an OpenStack VM and shows that the 0x340 extended configuration space is passed through correctly.  I've also done testing which uses this space and the results are positive.

# lspci -tvnn
-[0000:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller [8086:29c0]
           +-01.0  Cirrus Logic GD 5446 [1013:00b8]
           +-1e.0-[01-02]----01.0-[02]--+-01.0  Red Hat, Inc Virtio network device [1af4:1000]
           |                            +-02.0  Red Hat, Inc Virtio block device [1af4:1001]
           |                            +-03.0  Red Hat, Inc Virtio block device [1af4:1001]
           |                            +-04.0  Xilinx Corporation Device [10ee:8138]
           |                            \-05.0  Red Hat, Inc Virtio memory balloon [1af4:1002]
           +-1f.0  Intel Corporation 82801IB (ICH9) LPC Interface Controller [8086:2918]
           +-1f.2  Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] [8086:2922]
           \-1f.3  Intel Corporation 82801I (ICH9 Family) SMBus Controller [8086:2930]

# lspci -vvv -d 10ee:
02:04.0 Memory controller: Xilinx Corporation Device 8138
        Subsystem: Xilinx Corporation Device 0121
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 29
        Region 0: Memory at fdc00000 (32-bit, non-prefetchable) [size=4M]
        Region 1: Memory at fe000000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [80] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00000  Data: 4099
        Capabilities: [c0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM unknown, Latency L0 unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [340 v1] Vendor Specific Information: ID=0001 Rev=0 Len=02c <?>
        Kernel driver in use: xcldma
        Kernel modules: xcldma

Thanks,
Spenser


This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] VFIO PCIe Extended Capabilities
  2016-07-19 17:12 [Qemu-devel] VFIO PCIe Extended Capabilities Spenser Gilliland
@ 2016-07-19 17:22 ` Alex Williamson
  2016-07-19 17:55   ` Alex Williamson
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2016-07-19 17:22 UTC (permalink / raw)
  To: Spenser Gilliland
  Cc: chen.fan.fnst@cn.fujitsu.com, qemu-devel@nongnu.org,
	Marcel Apfelbaum

On Tue, 19 Jul 2016 17:12:45 +0000
Spenser Gilliland <spenser.gilliland@xilinx.com> wrote:

> Hi,
> 
> I noticed your patches "vfio: add pcie extended capability support" and "vfio/pci: Hide SR-IOV capability" have gone into qemu mainline.  These look really good, and thanks so much for doing these.
> 
> I was wondering if there were any side effects to removing the pci_bus_is_express check on line 1776 of hw/vfio/pci.c .
> 
>     /* Only add extended caps if we have them and the guest can see them */
> -   if (!pci_is_express(pdev) || !pci_bus_is_express(pdev->bus) ||
> +  if (!pci_is_express(pdev) ||
>         !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
>         return 0;
>     }
> 
> I'm asking because it looks like the defaults for libvirt/OpenStack are to create a "hostdev" stanza in the libvirt xml to define this pass through condition.  However, it also appears that the "hostdev" stanza only supports pci-bridge bus connections.  Thus, it's not easily possible to use this patch in a libvirt/OpenStack environment as the bus is technically a non-express bus.  It looks like adding PCIe bus support to libvirt/OpenStack may be a lot more effort than a simple workaround here.
> 
> I have tested this on my local system and it does work as intended for my use case.  The following is from an OpenStack VM and shows that the 0x340 extended configuration space is passed through correctly.  I've also done testing which uses this space and the results are positive.

If the bus is not express then extended capabilities on the device
should not be accessible, this would be a QEMU bug for allowing it.
Cc'ing Marcel for that.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] VFIO PCIe Extended Capabilities
  2016-07-19 17:22 ` Alex Williamson
@ 2016-07-19 17:55   ` Alex Williamson
  2016-07-19 18:16     ` Marcel Apfelbaum
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2016-07-19 17:55 UTC (permalink / raw)
  To: Spenser Gilliland
  Cc: chen.fan.fnst@cn.fujitsu.com, Marcel Apfelbaum,
	qemu-devel@nongnu.org, Michael S. Tsirkin

On Tue, 19 Jul 2016 11:22:29 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Tue, 19 Jul 2016 17:12:45 +0000
> Spenser Gilliland <spenser.gilliland@xilinx.com> wrote:
> 
> > Hi,
> > 
> > I noticed your patches "vfio: add pcie extended capability support" and "vfio/pci: Hide SR-IOV capability" have gone into qemu mainline.  These look really good, and thanks so much for doing these.
> > 
> > I was wondering if there were any side effects to removing the pci_bus_is_express check on line 1776 of hw/vfio/pci.c .
> > 
> >     /* Only add extended caps if we have them and the guest can see them */
> > -   if (!pci_is_express(pdev) || !pci_bus_is_express(pdev->bus) ||
> > +  if (!pci_is_express(pdev) ||
> >         !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
> >         return 0;
> >     }
> > 
> > I'm asking because it looks like the defaults for libvirt/OpenStack are to create a "hostdev" stanza in the libvirt xml to define this pass through condition.  However, it also appears that the "hostdev" stanza only supports pci-bridge bus connections.  Thus, it's not easily possible to use this patch in a libvirt/OpenStack environment as the bus is technically a non-express bus.  It looks like adding PCIe bus support to libvirt/OpenStack may be a lot more effort than a simple workaround here.
> > 
> > I have tested this on my local system and it does work as intended for my use case.  The following is from an OpenStack VM and shows that the 0x340 extended configuration space is passed through correctly.  I've also done testing which uses this space and the results are positive.  
> 
> If the bus is not express then extended capabilities on the device
> should not be accessible, this would be a QEMU bug for allowing it.
> Cc'ing Marcel for that.  Thanks,

In fact, I've tried to fix this multiple times:

https://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05384.html
https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg02422.html
https://lists.nongnu.org/archive/html/qemu-devel/2016-01/msg03259.html

Yet the patch remains unapplied :(

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] VFIO PCIe Extended Capabilities
  2016-07-19 17:55   ` Alex Williamson
@ 2016-07-19 18:16     ` Marcel Apfelbaum
  2016-07-19 18:30       ` Alex Williamson
  0 siblings, 1 reply; 9+ messages in thread
From: Marcel Apfelbaum @ 2016-07-19 18:16 UTC (permalink / raw)
  To: Alex Williamson, Spenser Gilliland
  Cc: chen.fan.fnst@cn.fujitsu.com, qemu-devel@nongnu.org,
	Michael S. Tsirkin

On 07/19/2016 08:55 PM, Alex Williamson wrote:
> On Tue, 19 Jul 2016 11:22:29 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
>
>> On Tue, 19 Jul 2016 17:12:45 +0000
>> Spenser Gilliland <spenser.gilliland@xilinx.com> wrote:
>>
>>> Hi,
>>>
>>> I noticed your patches "vfio: add pcie extended capability support" and "vfio/pci: Hide SR-IOV capability" have gone into qemu mainline.  These look really good, and thanks so much for doing these.
>>>
>>> I was wondering if there were any side effects to removing the pci_bus_is_express check on line 1776 of hw/vfio/pci.c .
>>>
>>>      /* Only add extended caps if we have them and the guest can see them */
>>> -   if (!pci_is_express(pdev) || !pci_bus_is_express(pdev->bus) ||
>>> +  if (!pci_is_express(pdev) ||
>>>          !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
>>>          return 0;
>>>      }
>>>
>>> I'm asking because it looks like the defaults for libvirt/OpenStack are to create a "hostdev" stanza in the libvirt xml to define this pass through condition.  However, it also appears that the "hostdev" stanza only supports pci-bridge bus connections.  Thus, it's not easily possible to use this patch in a libvirt/OpenStack environment as the bus is technically a non-express bus.  It looks like adding PCIe bus support to libvirt/OpenStack may be a lot more effort than a simple workaround here.
>>>
>>> I have tested this on my local system and it does work as intended for my use case.  The following is from an OpenStack VM and shows that the 0x340 extended configuration space is passed through correctly.  I've also done testing which uses this space and the results are positive.
>>
>> If the bus is not express then extended capabilities on the device
>> should not be accessible, this would be a QEMU bug for allowing it.
>> Cc'ing Marcel for that.  Thanks,

Hi Spencer,

Indeed, if a device is attached to a PCI bus it makes no sense to advertise the extended configuration space.
Can you please share the QEMU command line? Maybe is possible to make the device's bus PCIe in QEMU?

>
> In fact, I've tried to fix this multiple times:
>
> https://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05384.html
> https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg02422.html
> https://lists.nongnu.org/archive/html/qemu-devel/2016-01/msg03259.html
>
> Yet the patch remains unapplied :(

I thought is it in already. Maybe Michael can add it as part of the hard freeze.
And if the patch will be applied, the tweak above wouldn't help, right Alex?

Thanks,
Marcel

>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] VFIO PCIe Extended Capabilities
  2016-07-19 18:16     ` Marcel Apfelbaum
@ 2016-07-19 18:30       ` Alex Williamson
  2016-07-19 19:03         ` Marcel Apfelbaum
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2016-07-19 18:30 UTC (permalink / raw)
  To: Marcel Apfelbaum
  Cc: Spenser Gilliland, chen.fan.fnst@cn.fujitsu.com,
	qemu-devel@nongnu.org, Michael S. Tsirkin

On Tue, 19 Jul 2016 21:16:40 +0300
Marcel Apfelbaum <marcel@redhat.com> wrote:

> On 07/19/2016 08:55 PM, Alex Williamson wrote:
> > On Tue, 19 Jul 2016 11:22:29 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >  
> >> On Tue, 19 Jul 2016 17:12:45 +0000
> >> Spenser Gilliland <spenser.gilliland@xilinx.com> wrote:
> >>  
> >>> Hi,
> >>>
> >>> I noticed your patches "vfio: add pcie extended capability support" and "vfio/pci: Hide SR-IOV capability" have gone into qemu mainline.  These look really good, and thanks so much for doing these.
> >>>
> >>> I was wondering if there were any side effects to removing the pci_bus_is_express check on line 1776 of hw/vfio/pci.c .
> >>>
> >>>      /* Only add extended caps if we have them and the guest can see them */
> >>> -   if (!pci_is_express(pdev) || !pci_bus_is_express(pdev->bus) ||
> >>> +  if (!pci_is_express(pdev) ||
> >>>          !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
> >>>          return 0;
> >>>      }
> >>>
> >>> I'm asking because it looks like the defaults for libvirt/OpenStack are to create a "hostdev" stanza in the libvirt xml to define this pass through condition.  However, it also appears that the "hostdev" stanza only supports pci-bridge bus connections.  Thus, it's not easily possible to use this patch in a libvirt/OpenStack environment as the bus is technically a non-express bus.  It looks like adding PCIe bus support to libvirt/OpenStack may be a lot more effort than a simple workaround here.
> >>>
> >>> I have tested this on my local system and it does work as intended for my use case.  The following is from an OpenStack VM and shows that the 0x340 extended configuration space is passed through correctly.  I've also done testing which uses this space and the results are positive.  
> >>
> >> If the bus is not express then extended capabilities on the device
> >> should not be accessible, this would be a QEMU bug for allowing it.
> >> Cc'ing Marcel for that.  Thanks,  
> 
> Hi Spencer,
> 
> Indeed, if a device is attached to a PCI bus it makes no sense to advertise the extended configuration space.
> Can you please share the QEMU command line? Maybe is possible to make the device's bus PCIe in QEMU?

I think that any instance of a q35 machine where the assigned device is
placed on the conventional PCI bridge will create this scenario.  It's
the default for attaching devices to a libvirt managed q35 VM AFAIK.

> >
> > In fact, I've tried to fix this multiple times:
> >
> > https://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05384.html
> > https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg02422.html
> > https://lists.nongnu.org/archive/html/qemu-devel/2016-01/msg03259.html
> >
> > Yet the patch remains unapplied :(  
> 
> I thought is it in already. Maybe Michael can add it as part of the hard freeze.
> And if the patch will be applied, the tweak above wouldn't help, right Alex?

The tweak Spenser suggested above should be unnecessary with my proposed
patch applied.  Only now searching for that patch did I notice Michael's
comment hidden at the bottom of his reply, which I assume is why it
never got applied:

https://patchwork.kernel.org/patch/8057411/

Anyway, the current behavior is clearly a bug, so QEMU hard freeze
should be irrelevant.  If anyone wants to take over the patch, feel
free.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] VFIO PCIe Extended Capabilities
  2016-07-19 18:30       ` Alex Williamson
@ 2016-07-19 19:03         ` Marcel Apfelbaum
  2016-07-19 20:39           ` Spenser Gilliland
  0 siblings, 1 reply; 9+ messages in thread
From: Marcel Apfelbaum @ 2016-07-19 19:03 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Spenser Gilliland, chen.fan.fnst@cn.fujitsu.com,
	qemu-devel@nongnu.org, Michael S. Tsirkin, Laine Stump

On 07/19/2016 09:30 PM, Alex Williamson wrote:
> On Tue, 19 Jul 2016 21:16:40 +0300
> Marcel Apfelbaum <marcel@redhat.com> wrote:
>
>> On 07/19/2016 08:55 PM, Alex Williamson wrote:
>>> On Tue, 19 Jul 2016 11:22:29 -0600
>>> Alex Williamson <alex.williamson@redhat.com> wrote:
>>>
>>>> On Tue, 19 Jul 2016 17:12:45 +0000
>>>> Spenser Gilliland <spenser.gilliland@xilinx.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I noticed your patches "vfio: add pcie extended capability support" and "vfio/pci: Hide SR-IOV capability" have gone into qemu mainline.  These look really good, and thanks so much for doing these.
>>>>>
>>>>> I was wondering if there were any side effects to removing the pci_bus_is_express check on line 1776 of hw/vfio/pci.c .
>>>>>
>>>>>       /* Only add extended caps if we have them and the guest can see them */
>>>>> -   if (!pci_is_express(pdev) || !pci_bus_is_express(pdev->bus) ||
>>>>> +  if (!pci_is_express(pdev) ||
>>>>>           !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
>>>>>           return 0;
>>>>>       }
>>>>>
>>>>> I'm asking because it looks like the defaults for libvirt/OpenStack are to create a "hostdev" stanza in the libvirt xml to define this pass through condition.  However, it also appears that the "hostdev" stanza only supports pci-bridge bus connections.  Thus, it's not easily possible to use this patch in a libvirt/OpenStack environment as the bus is technically a non-express bus.  It looks like adding PCIe bus support to libvirt/OpenStack may be a lot more effort than a simple workaround here.
>>>>>
>>>>> I have tested this on my local system and it does work as intended for my use case.  The following is from an OpenStack VM and shows that the 0x340 extended configuration space is passed through correctly.  I've also done testing which uses this space and the results are positive.
>>>>
>>>> If the bus is not express then extended capabilities on the device
>>>> should not be accessible, this would be a QEMU bug for allowing it.
>>>> Cc'ing Marcel for that.  Thanks,
>>
>> Hi Spencer,
>>
>> Indeed, if a device is attached to a PCI bus it makes no sense to advertise the extended configuration space.
>> Can you please share the QEMU command line? Maybe is possible to make the device's bus PCIe in QEMU?
>
> I think that any instance of a q35 machine where the assigned device is
> placed on the conventional PCI bridge will create this scenario.  It's
> the default for attaching devices to a libvirt managed q35 VM AFAIK.
>

Yes, I discussed with Laine from libvirt the possibility to assign
devices to a PCIe port instead.


>>>
>>> In fact, I've tried to fix this multiple times:
>>>
>>> https://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05384.html
>>> https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg02422.html
>>> https://lists.nongnu.org/archive/html/qemu-devel/2016-01/msg03259.html
>>>
>>> Yet the patch remains unapplied :(
>>
>> I thought is it in already. Maybe Michael can add it as part of the hard freeze.
>> And if the patch will be applied, the tweak above wouldn't help, right Alex?
>
> The tweak Spenser suggested above should be unnecessary with my proposed
> patch applied.

I think I finally understand this. The bus is not pcie -> we return
from vfio_add_ext_cap without "breaking" the extended capabilities
chain and the bare metal SR-IOV capability will be visible to guest.
With your patch the PCI bridge will "interfere" and mask the extended
configuration space completely.

   Only now searching for that patch did I notice Michael's
> comment hidden at the bottom of his reply, which I assume is why it
> never got applied:
>
> https://patchwork.kernel.org/patch/8057411/
>

I just saw it too! It seems Michael wants to cache this info
in device instead of re-calculating it every time.

> Anyway, the current behavior is clearly a bug, so QEMU hard freeze
> should be irrelevant.  If anyone wants to take over the patch, feel
> free.  Thanks,
>

I suppose I can handle it, but sadly not for 2.7.
If Spencer has some time now I can help by testing it and reviewing it quickly :)

Thanks,
Marcel

> Alex
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] VFIO PCIe Extended Capabilities
  2016-07-19 19:03         ` Marcel Apfelbaum
@ 2016-07-19 20:39           ` Spenser Gilliland
  2016-07-19 20:51             ` Alex Williamson
  0 siblings, 1 reply; 9+ messages in thread
From: Spenser Gilliland @ 2016-07-19 20:39 UTC (permalink / raw)
  To: Marcel Apfelbaum, Alex Williamson
  Cc: chen.fan.fnst@cn.fujitsu.com, qemu-devel@nongnu.org,
	Michael S. Tsirkin, Laine Stump, libvirt-users@redhat.com

Hi Marcel,

>> Indeed, if a device is attached to a PCI bus it makes no sense to advertise the extended configuration space.
>> Can you please share the QEMU command line? Maybe is possible to make the device's bus PCIe in QEMU?

Changing the following should make the tweak I proposed irrelevant.

- device vfio-pci,host=03:00.0,id=hostdev0,bus=pci.2,addr=0x4
+ device vfio-pci,host=03:00.0,id=hostdev0,bus=pcie.0,addr=0x4

Here's the full command for reference.

2016-07-19 18:42:22.110+0000: starting up libvirt version: 1.2.17, package: 13.el7 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-20-16:24:10, worker1.bsys.centos.org), qemu version: 2.3.0 (qemu-kvm-ev-2.3.0-31.el7_2.10.9)
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name instance-00000019 -S -machine pc-q35-rhel7.2.0,accel=kvm,usb=off -cpu Haswell-noTSX,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -m 8192 -realtime mlock=off -smp 8,sockets=8,cores=1,threads=1 -uuid 8b6865eb-2118-430e-a7e0-2989696576b1 -smbios type=1,manufacturer=Fedora Project,product=OpenStack Nova,version=13.1.0-1.el7,serial=a1068a93-24d1-4da2-8903-f9b8307fb0d8,uuid=8b6865eb-2118-430e-a7e0-2989696576b1,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-instance-00000019/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 -drive file=/var/lib/nova/instances/8b6865eb-2118-430e-a7e0-2989696576b1/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.2,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/nova/instances/8b6865eb-2118-430e-a7e0-2989696576b1/disk.swap,if=none,id=drive-virtio-disk1,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.2,addr=0x3,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=30 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:9c:cf:60,bus=pci.2,addr=0x1 -chardev file,id=charserial0,path=/var/lib/nova/instances/8b6865eb-2118-430e-a7e0-2989696576b1/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -vnc 0.0.0.0:0 -k en-us -device cirrus-vga,id=video0,bus=pcie.0,addr=0x1 -device vfio-pci,host=03:00.0,id=hostdev0,bus=pci.2,addr=0x4 -device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x5 -msg timestamp=on
char device redirected to /dev/pts/3 (label charserial1)

It seems very reasonable to change the qemu command line.  However, I'm using OpenStack Nova to launch the instance, which then uses libvirt, which finally uses qemu.  So, the issues is very buried.

> I think that any instance of a q35 machine where the assigned device is
> placed on the conventional PCI bridge will create this scenario.  It's
> the default for attaching devices to a libvirt managed q35 VM AFAIK.

> Yes, I discussed with Laine from libvirt the possibility to assign
> devices to a PCIe port instead.

Yes, I see this as the issue as well. The VM is being created by libvirt with the following xml.  The problem is that the device is auto-assigned to the pci-bridge by default.

<domain type="kvm">
  <uuid>8b6865eb-2118-430e-a7e0-2989696576b1</uuid>
  <name>instance-00000019</name>
  <memory>8388608</memory>
  <vcpu>8</vcpu>
    ... <snip> ...
  <devices>
    ... <snip> ...
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address bus="0x03" domain="0x0000" function="0x0" slot="0x00"/>
      </source>
    </hostdev>
    ... <snip> ...
  </devices>
</domain>

If I do a dumpxml I get the following

<domain type="kvm">
  <uuid>8b6865eb-2118-430e-a7e0-2989696576b1</uuid>
  <name>instance-00000019</name>
  <memory>8388608</memory>
  <vcpu>8</vcpu>
    ... <snip> ...
  <devices>
    ... <snip> ...
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
    </controller>
    <interface type='bridge'>
    ... <snip> ...
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/>
    </hostdev>
  </devices>
</domain>

If I manually change the domain as follows

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
-     <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/>
+    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </hostdev>

I can reimport and the device and it attaches successfully to the pcie-root hub.  The problem is I need to either manually specify this in nova or update the behavior of libvirt to auto assign pcie devices to pcie busses.  This would involve reading pci configuration space to check if the device is a pcie devices.  Then attaching to the pcie-root, if possible.

>>>
>>> In fact, I've tried to fix this multiple times:
>>>
>>> https://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05384.html
>>> https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg02422.html
>>> https://lists.nongnu.org/archive/html/qemu-devel/2016-01/msg03259.html
>>>
>>> Yet the patch remains unapplied :(
>>
>> I thought is it in already. Maybe Michael can add it as part of the hard freeze.
>> And if the patch will be applied, the tweak above wouldn't help, right Alex?
>
> The tweak Spenser suggested above should be unnecessary with my proposed
> patch applied.

> I think I finally understand this. The bus is not pcie -> we return
> from vfio_add_ext_cap without "breaking" the extended capabilities
> chain and the bare metal SR-IOV capability will be visible to guest.
> With your patch the PCI bridge will "interfere" and mask the extended
> configuration space completely.

>   Only now searching for that patch did I notice Michael's
> comment hidden at the bottom of his reply, which I assume is why it
> never got applied:
>
> https://patchwork.kernel.org/patch/8057411/
>

> I just saw it too! It seems Michael wants to cache this info
> in device instead of re-calculating it every time.

>> Anyway, the current behavior is clearly a bug, so QEMU hard freeze
>> should be irrelevant.  If anyone wants to take over the patch, feel
>> free.  Thanks,

> I suppose I can handle it, but sadly not for 2.7.
> If Spencer has some time now he can help by testing it and reviewing it quickly :)

I'd be happy to help; but that patch really just more permanently breaks my current workaround ;-) .

Thanks,
Spenser


This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] VFIO PCIe Extended Capabilities
  2016-07-19 20:39           ` Spenser Gilliland
@ 2016-07-19 20:51             ` Alex Williamson
  2016-07-19 21:25               ` Spenser Gilliland
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2016-07-19 20:51 UTC (permalink / raw)
  To: Spenser Gilliland
  Cc: Marcel Apfelbaum, chen.fan.fnst@cn.fujitsu.com,
	qemu-devel@nongnu.org, Michael S. Tsirkin, Laine Stump,
	libvirt-users@redhat.com

On Tue, 19 Jul 2016 20:39:26 +0000
Spenser Gilliland <spenser.gilliland@xilinx.com> wrote:
> >> Anyway, the current behavior is clearly a bug, so QEMU hard freeze
> >> should be irrelevant.  If anyone wants to take over the patch, feel
> >> free.  Thanks,  
> 
> > I suppose I can handle it, but sadly not for 2.7.
> > If Spencer has some time now he can help by testing it and reviewing it quickly :)  
> 
> I'd be happy to help; but that patch really just more permanently breaks my current workaround ;-) .
 
How so?  The patch should make extended config space disappear when the
device is not on a PCIe bus.  This is the correct behavior per the
PCIe spec.  Are you trying to exploit this QEMU bug to see some PCIe
capabilities but not others?  That's wrong, you'll need to move the
device to a PCIe slot to get extended capabilities.

> This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately.

You're sending email to a public list, please stop including these.
Thanks,

Alex

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] VFIO PCIe Extended Capabilities
  2016-07-19 20:51             ` Alex Williamson
@ 2016-07-19 21:25               ` Spenser Gilliland
  0 siblings, 0 replies; 9+ messages in thread
From: Spenser Gilliland @ 2016-07-19 21:25 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Marcel Apfelbaum, chen.fan.fnst@cn.fujitsu.com,
	qemu-devel@nongnu.org, Michael S. Tsirkin, Laine Stump,
	libvirt-users@redhat.com

Hey Alex,

> How so?  The patch should make extended config space disappear when the
> device is not on a PCIe bus.  This is the correct behavior per the
> PCIe spec.  Are you trying to exploit this QEMU bug to see some PCIe
> capabilities but not others?  That's wrong, you'll need to move the
> device to a PCIe slot to get extended capabilities.

Yes, I am exploiting the peculiarities of QEMU to make my hardware work properly.  It may be wrong but it works (hence the ;-) ) .

>From this conversation, it seems the real problem is that libvirt does not auto assign the hardware to a pcie root port in the q35 machine type. It seems like this really needs to be fixed in the libvirt source code; so, I'm working on a patch for libvirt.

I'd be happy to test your patch for you;  I have access to hardware that will allow me to verify that your patch works correctly. Feel free to cc me on the next version of the patch and I'll add my tested-by.

> You're sending email to a public list, please stop including these.

This is not intentional.  It seems like the Xilinx email servers (not my local machine) are adding this to all messages.  I'll see what I can do and I'm sorry for the inconvenience.

Thanks,
Spenser

This email and any attachments are intended for the sole use of the named recipient(s) and contain(s) confidential information that may be proprietary, privileged or copyrighted under applicable law. If you are not the intended recipient, do not read, copy, or forward this email message or any attachments. Delete this email message and any attachments immediately.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-07-19 21:25 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-07-19 17:12 [Qemu-devel] VFIO PCIe Extended Capabilities Spenser Gilliland
2016-07-19 17:22 ` Alex Williamson
2016-07-19 17:55   ` Alex Williamson
2016-07-19 18:16     ` Marcel Apfelbaum
2016-07-19 18:30       ` Alex Williamson
2016-07-19 19:03         ` Marcel Apfelbaum
2016-07-19 20:39           ` Spenser Gilliland
2016-07-19 20:51             ` Alex Williamson
2016-07-19 21:25               ` Spenser Gilliland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).