From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:41535)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <marcel@redhat.com>) id 1a444p-0007zd-KU
	for qemu-devel@nongnu.org; Wed, 02 Dec 2015 04:51:56 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <marcel@redhat.com>) id 1a444k-0004Vd-Ke
	for qemu-devel@nongnu.org; Wed, 02 Dec 2015 04:51:55 -0500
Received: from mx1.redhat.com ([209.132.183.28]:39664)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <marcel@redhat.com>) id 1a444k-0004VU-C9
	for qemu-devel@nongnu.org; Wed, 02 Dec 2015 04:51:50 -0500
References: <1448987005-28335-1-git-send-email-shmulik.ladkani@ravellosystems.com>
	<565DCC97.5020504@redhat.com> <20151201213007.2b41e810@halley>
	<565E0729.9060603@redhat.com> <20151202100109.72be507a@pixies>
From: Marcel Apfelbaum <marcel@redhat.com>
Message-ID: <565EBF32.80803@redhat.com>
Date: Wed, 2 Dec 2015 11:51:46 +0200
MIME-Version: 1.0
In-Reply-To: <20151202100109.72be507a@pixies>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] virtio-pci: Set the QEMU_PCI_CAP_EXPRESS
 capability early in its DeviceClass realize method
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Cc: qemu-devel@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>

On 12/02/2015 10:01 AM, Shmulik Ladkani wrote:
> Thanks Marcel,
>
> On Tue, 1 Dec 2015 22:46:33 +0200, marcel@redhat.com wrote:
>>>> The reason is the device becomes express only if *all* the conditions
>>>> are met.
>>>
>>> I'm ok with either approaches.
>>>
>>> However it seems common practice to set QEMU_PCI_CAP_EXPRESS
>>> unconditionally for PCIE devices.
>>>
>>> The few existing PCIE devices do so by assigning their
>>> PCIDeviceClass.is_express to 1 within their 'class_init', regardless the
>>> properties of the bus their on.
>>> (e.g. xhci_class_init, megasas_class_init, vfio_pci_dev_class_init,
>>>    nvme_class_init, and more)
>>>
>>> Some devices later call pcie_endpoint_cap_init conditionally.
>>> (e.g. usb_xhci_realize).
>>>
>>> Can you please examine this and let me know the preferred approach?
>>
>> Yes, I saw that..., as always not a walk in the park.
>>
>> - So we have "is_express = true" <=> QEMU_PCI_CAP_EXPRESS on <=> "config size = PCIe"
>> - Not related to the above (!!), if (some condition) => add PCIe express capability
>>     (megasas is the exception)
>>
>> Let's take "usb_xhci":
>>    - If we put it under a PCI bus it will not be an express device, but
>>      it will have a "big" config space. Also pci_is_express(dev) will still return true!
>>    - This is probably a bug. (or I am missing something)
>
> I actually assumed this is the right behavior.
>
> A device class reports whether its instances *could* be pcie by arming
> its PCIDeviceClass.is_express.
> As such, the "big" config space is allocated for the instance. This is
> harmless.
>
> Such a device may (or may not) be connected to a pcie bus, and only if
> so, we report it is a pcie endpoint.
>
> Also, pcie_add_capability is allowed on that device, in order to setup
> whatever capabilities on its pcie config space (even if finally not on a
> pcie bus).
>
> Moreover, VMSTATE_PCIE_DEVICE (which uses vmstate_pcie_device rather
> than vmstate_pci_device) can be used for that device's
> VMStateDescription fields *without* worrying whether the actual config
> space is "big" or "small".
> Otherwise one should examine whether vmstate_pcie_device or
> vmstate_pci_device need to be used. Seems tedious.
>
> This is the reasoning I can think of, why assigning QEMU_PCI_CAP_EXPRESS
> and reporting pcie_endpoint_cap_init are not tightly coupled.

I agree it may be the reason, but that does not make it right.
I still see two *possible* problems:
1. Pci config space is guest visible. The guest can read/write to
    a place it shouldn't. I don't know if is a *real* issue, but it needs checking.
2. We still have pci_is_express returning true, this is error prone because one
    can use this function assuming the device is express. Maybe we should call it "can_be_express" ?
If the migration construct (VMSTATE) is *the only* reason for doing this, maybe is not a good
enough reason (I am not the one to decide :)  ). Is still it seems a little off to me.

If you think this is good enough, you can simply do the same:
   - Instead of replacing the realize method, just advertise it with "is_express" (meaning it can be express)
   - Leave all the conditions as they were in prev patch.
As a result, the pci config space will have the right length.
The consequences are obvious now, if virtio/pci maintainers are OK with that, so am I.

Thanks,
Marcel


>
> Indeed, no strict solution here, both approaches seem reasoanble (and
> both are used!).
>
> WDYT? Is my above interpretation makes sense?
>
> Regards,
> Shmulik
>