From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41535) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a444p-0007zd-KU for qemu-devel@nongnu.org; Wed, 02 Dec 2015 04:51:56 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a444k-0004Vd-Ke for qemu-devel@nongnu.org; Wed, 02 Dec 2015 04:51:55 -0500 Received: from mx1.redhat.com ([209.132.183.28]:39664) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a444k-0004VU-C9 for qemu-devel@nongnu.org; Wed, 02 Dec 2015 04:51:50 -0500 References: <1448987005-28335-1-git-send-email-shmulik.ladkani@ravellosystems.com> <565DCC97.5020504@redhat.com> <20151201213007.2b41e810@halley> <565E0729.9060603@redhat.com> <20151202100109.72be507a@pixies> From: Marcel Apfelbaum Message-ID: <565EBF32.80803@redhat.com> Date: Wed, 2 Dec 2015 11:51:46 +0200 MIME-Version: 1.0 In-Reply-To: <20151202100109.72be507a@pixies> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH] virtio-pci: Set the QEMU_PCI_CAP_EXPRESS capability early in its DeviceClass realize method List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Shmulik Ladkani Cc: qemu-devel@nongnu.org, "Michael S. Tsirkin" On 12/02/2015 10:01 AM, Shmulik Ladkani wrote: > Thanks Marcel, > > On Tue, 1 Dec 2015 22:46:33 +0200, marcel@redhat.com wrote: >>>> The reason is the device becomes express only if *all* the conditions >>>> are met. >>> >>> I'm ok with either approaches. >>> >>> However it seems common practice to set QEMU_PCI_CAP_EXPRESS >>> unconditionally for PCIE devices. >>> >>> The few existing PCIE devices do so by assigning their >>> PCIDeviceClass.is_express to 1 within their 'class_init', regardless the >>> properties of the bus their on. >>> (e.g. xhci_class_init, megasas_class_init, vfio_pci_dev_class_init, >>> nvme_class_init, and more) >>> >>> Some devices later call pcie_endpoint_cap_init conditionally. >>> (e.g. usb_xhci_realize). >>> >>> Can you please examine this and let me know the preferred approach? >> >> Yes, I saw that..., as always not a walk in the park. >> >> - So we have "is_express = true" <=> QEMU_PCI_CAP_EXPRESS on <=> "config size = PCIe" >> - Not related to the above (!!), if (some condition) => add PCIe express capability >> (megasas is the exception) >> >> Let's take "usb_xhci": >> - If we put it under a PCI bus it will not be an express device, but >> it will have a "big" config space. Also pci_is_express(dev) will still return true! >> - This is probably a bug. (or I am missing something) > > I actually assumed this is the right behavior. > > A device class reports whether its instances *could* be pcie by arming > its PCIDeviceClass.is_express. > As such, the "big" config space is allocated for the instance. This is > harmless. > > Such a device may (or may not) be connected to a pcie bus, and only if > so, we report it is a pcie endpoint. > > Also, pcie_add_capability is allowed on that device, in order to setup > whatever capabilities on its pcie config space (even if finally not on a > pcie bus). > > Moreover, VMSTATE_PCIE_DEVICE (which uses vmstate_pcie_device rather > than vmstate_pci_device) can be used for that device's > VMStateDescription fields *without* worrying whether the actual config > space is "big" or "small". > Otherwise one should examine whether vmstate_pcie_device or > vmstate_pci_device need to be used. Seems tedious. > > This is the reasoning I can think of, why assigning QEMU_PCI_CAP_EXPRESS > and reporting pcie_endpoint_cap_init are not tightly coupled. I agree it may be the reason, but that does not make it right. I still see two *possible* problems: 1. Pci config space is guest visible. The guest can read/write to a place it shouldn't. I don't know if is a *real* issue, but it needs checking. 2. We still have pci_is_express returning true, this is error prone because one can use this function assuming the device is express. Maybe we should call it "can_be_express" ? If the migration construct (VMSTATE) is *the only* reason for doing this, maybe is not a good enough reason (I am not the one to decide :) ). Is still it seems a little off to me. If you think this is good enough, you can simply do the same: - Instead of replacing the realize method, just advertise it with "is_express" (meaning it can be express) - Leave all the conditions as they were in prev patch. As a result, the pci config space will have the right length. The consequences are obvious now, if virtio/pci maintainers are OK with that, so am I. Thanks, Marcel > > Indeed, no strict solution here, both approaches seem reasoanble (and > both are used!). > > WDYT? Is my above interpretation makes sense? > > Regards, > Shmulik >