From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:48443)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <marcel@redhat.com>) id 1bPaIg-00082g-VC
	for qemu-devel@nongnu.org; Tue, 19 Jul 2016 15:03:28 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <marcel@redhat.com>) id 1bPaId-0008V2-9O
	for qemu-devel@nongnu.org; Tue, 19 Jul 2016 15:03:25 -0400
Received: from mx1.redhat.com ([209.132.183.28]:56791)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <marcel@redhat.com>) id 1bPaId-0008Ux-1A
	for qemu-devel@nongnu.org; Tue, 19 Jul 2016 15:03:23 -0400
References: <66583216254AFC43ADCD7BD88016E00698544F11@XSJ-PSEXMBX02.xlnx.xilinx.com>
	<20160719112229.409eee63@t450s.home>
	<20160719115522.71d80873@t450s.home>
	<578E6E88.5040903@redhat.com> <20160719123035.6da9b869@t450s.home>
From: Marcel Apfelbaum <marcel@redhat.com>
Message-ID: <578E7975.406@redhat.com>
Date: Tue, 19 Jul 2016 22:03:17 +0300
MIME-Version: 1.0
In-Reply-To: <20160719123035.6da9b869@t450s.home>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] VFIO PCIe Extended Capabilities
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Spenser Gilliland <spenser.gilliland@xilinx.com>, "chen.fan.fnst@cn.fujitsu.com" <chen.fan.fnst@cn.fujitsu.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, "Michael S. Tsirkin" <mst@redhat.com>, Laine Stump <laine@redhat.com>

On 07/19/2016 09:30 PM, Alex Williamson wrote:
> On Tue, 19 Jul 2016 21:16:40 +0300
> Marcel Apfelbaum <marcel@redhat.com> wrote:
>
>> On 07/19/2016 08:55 PM, Alex Williamson wrote:
>>> On Tue, 19 Jul 2016 11:22:29 -0600
>>> Alex Williamson <alex.williamson@redhat.com> wrote:
>>>
>>>> On Tue, 19 Jul 2016 17:12:45 +0000
>>>> Spenser Gilliland <spenser.gilliland@xilinx.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I noticed your patches "vfio: add pcie extended capability support" and "vfio/pci: Hide SR-IOV capability" have gone into qemu mainline.  These look really good, and thanks so much for doing these.
>>>>>
>>>>> I was wondering if there were any side effects to removing the pci_bus_is_express check on line 1776 of hw/vfio/pci.c .
>>>>>
>>>>>       /* Only add extended caps if we have them and the guest can see them */
>>>>> -   if (!pci_is_express(pdev) || !pci_bus_is_express(pdev->bus) ||
>>>>> +  if (!pci_is_express(pdev) ||
>>>>>           !pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {
>>>>>           return 0;
>>>>>       }
>>>>>
>>>>> I'm asking because it looks like the defaults for libvirt/OpenStack are to create a "hostdev" stanza in the libvirt xml to define this pass through condition.  However, it also appears that the "hostdev" stanza only supports pci-bridge bus connections.  Thus, it's not easily possible to use this patch in a libvirt/OpenStack environment as the bus is technically a non-express bus.  It looks like adding PCIe bus support to libvirt/OpenStack may be a lot more effort than a simple workaround here.
>>>>>
>>>>> I have tested this on my local system and it does work as intended for my use case.  The following is from an OpenStack VM and shows that the 0x340 extended configuration space is passed through correctly.  I've also done testing which uses this space and the results are positive.
>>>>
>>>> If the bus is not express then extended capabilities on the device
>>>> should not be accessible, this would be a QEMU bug for allowing it.
>>>> Cc'ing Marcel for that.  Thanks,
>>
>> Hi Spencer,
>>
>> Indeed, if a device is attached to a PCI bus it makes no sense to advertise the extended configuration space.
>> Can you please share the QEMU command line? Maybe is possible to make the device's bus PCIe in QEMU?
>
> I think that any instance of a q35 machine where the assigned device is
> placed on the conventional PCI bridge will create this scenario.  It's
> the default for attaching devices to a libvirt managed q35 VM AFAIK.
>

Yes, I discussed with Laine from libvirt the possibility to assign
devices to a PCIe port instead.


>>>
>>> In fact, I've tried to fix this multiple times:
>>>
>>> https://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05384.html
>>> https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg02422.html
>>> https://lists.nongnu.org/archive/html/qemu-devel/2016-01/msg03259.html
>>>
>>> Yet the patch remains unapplied :(
>>
>> I thought is it in already. Maybe Michael can add it as part of the hard freeze.
>> And if the patch will be applied, the tweak above wouldn't help, right Alex?
>
> The tweak Spenser suggested above should be unnecessary with my proposed
> patch applied.

I think I finally understand this. The bus is not pcie -> we return
from vfio_add_ext_cap without "breaking" the extended capabilities
chain and the bare metal SR-IOV capability will be visible to guest.
With your patch the PCI bridge will "interfere" and mask the extended
configuration space completely.

   Only now searching for that patch did I notice Michael's
> comment hidden at the bottom of his reply, which I assume is why it
> never got applied:
>
> https://patchwork.kernel.org/patch/8057411/
>

I just saw it too! It seems Michael wants to cache this info
in device instead of re-calculating it every time.

> Anyway, the current behavior is clearly a bug, so QEMU hard freeze
> should be irrelevant.  If anyone wants to take over the patch, feel
> free.  Thanks,
>

I suppose I can handle it, but sadly not for 2.7.
If Spencer has some time now I can help by testing it and reviewing it quickly :)

Thanks,
Marcel

> Alex
>