From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:54728) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cdDeT-0004ef-T0 for qemu-devel@nongnu.org; Mon, 13 Feb 2017 05:14:36 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cdDeO-0000Dn-RY for qemu-devel@nongnu.org; Mon, 13 Feb 2017 05:14:33 -0500 References: <20170208061602.17666-1-david@gibson.dropbear.id.au> <46d64512-1085-6d22-1e0f-660757ff7131@redhat.com> <20170209041634.GC14524@umbus> <95706652-0a80-92fc-951b-7a454d496ddf@redhat.com> <20170210003746.GP27610@umbus.fritz.box> <5ea3785c-b979-8b8c-3ab0-243d69384697@redhat.com> <20170213043307.GT25381@umbus> From: Marcel Apfelbaum Message-ID: <45446029-e404-77d4-754e-5541a506bb7c@redhat.com> Date: Mon, 13 Feb 2017 12:14:23 +0200 MIME-Version: 1.0 In-Reply-To: <20170213043307.GT25381@umbus> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: David Gibson Cc: Laszlo Ersek , abologna@redhat.com, lvivier@redhat.com, thuth@redhat.com, mst@redhat.com, qemu-devel@nongnu.org, qemu-ppc@nongnu.org On 02/13/2017 06:33 AM, David Gibson wrote: > On Sun, Feb 12, 2017 at 09:05:46PM +0200, Marcel Apfelbaum wrote: >> On 02/10/2017 02:37 AM, David Gibson wrote: >>> On Thu, Feb 09, 2017 at 10:04:47AM +0100, Laszlo Ersek wrote: >>>> On 02/09/17 05:16, David Gibson wrote: >>>>> On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrote: >>>>>> On 02/08/17 07:16, David Gibson wrote: >>>>>>> Marcel, >>>>>>> >>>>>>> Your original patch adding PCIe support to virtio-pci.c has the >>>>>>> limitation noted below that PCIe won't be enabled if the device is on >>>>>>> the root bus (rather than under a root or downstream port). As >>>>>>> reasoned below, I think removing the check is correct, even for x86 >>>>>>> (though it would rarely be useful there). But I could well have >>>>>>> missed something. Let me know if so... >>>>>>> >>>>>>> >>>>>>> >>>>>>> Virtio devices can appear as either vanilla PCI or PCI-Express devices >>>>>>> depending on the bus they're connected to. At the moment it will only >>>>>>> appear as vanilla PCI if connected to the root bus of a PCIe host bridge. >>>>>>> >>>>>>> Presumably this is to reflect the fact that PCIe devices usually need to >>>>>>> be connected to a root (or further downstream) port rather than directly >>>>>>> on the root bus. However, due to the odd requirements of the PAPR spec on the 'pseries' >>>>>>> machine type, it's normal for PCIe devices to appear on the root bus >>>>>>> without root ports. >>>>>>> >>>>>>> Further, even on x86, there's no inherent reason we couldn't present a >>>>>>> virtio device as an "integrated device" (typically used for things built >>>>>>> into the PCI chipset), and those devices *do* typically appear on the root >>>>>>> bus. >>>>>> >>>>>> I'm not personally making a counter-argument, just qouting some of >>>>>> the relevant parts of "docs/pcie.txt" ("PCI EXPRESS GUIDELINES"): >>>>> >>>>> So, an earlier discussion more or less concluded that the PCIe >>>>> guidelines don't really work with PAPR guests. That comes because >>>>> PAPR was designed with PowerVM in mind which allows PCI passthrough >>>>> but doesn't do any emulated PCI devices. So they wanted to present >>>>> passed through devices (virtual or phyical) to the guest without >>>>> inserting virtual root ports. >>>>> >>>>> Now, you can argue that this was a silly decision in PAPR, and you >>>>> could well be right, but there it is. >>>> >>>> I can totally accept this, but then we should state it as a fact near >>>> the top of "docs/pcie.txt". >>>> >>>>> >>>>>>> Place only the following kinds of devices directly on the Root Complex: >>>>>>> (1) PCI Devices (e.g. network card, graphics card, IDE controller), >>>>>>> not controllers. Place only legacy PCI devices on >>>>>>> the Root Complex. These will be considered Integrated Endpoints. >>>>>>> Note: Integrated Endpoints are not hot-pluggable. >>>>>>> >>>>>>> Although the PCI Express spec does not forbid PCI Express devices as >>>>>>> Integrated Endpoints, existing hardware mostly integrates legacy PCI >>>>>>> devices with the Root Complex. >>>>> >>>>> "Mostly".. on my laptop at least the GPU shows up as an integrated PCI >>>>> Express endpoint, so it's certainly not the case that *all* root bus >>>>> devices are legacy. >>>>> >>>>>> Guest OSes are suspected to behave >>>>>>> strangely when PCI Express devices are integrated >>>>>>> with the Root Complex. >>>>> >>>>> Clearly not that strangely, that often, since my laptop works just fine. >>>>> >>>>>>> >>>>>>> [...] >>>>>>> >>>>>>> 2.2 PCI Express only hierarchy >>>>>>> ============================== >>>>>>> Always use PCI Express Root Ports to start PCI Express hierarchies. >>>>>> >>>>>> Above you mention "it's normal for PCIe devices to appear on the root bus without root ports". >>>>> >>>>> Well "normal" perhaps wasn't the right word. Let's say precedented, >>>>> if uncommon. >>>>> >>>>>> Let me turn the question around: is it a *problem* for "pseries" if >>>>>> we require root ports? If so, why exactly? >>>>> >>>>> That's.. a complex question. At least Linux guests (and we don't >>>>> support any others yet) might cope with the addition of root ports. >>>>> Maybe. I have discussed this option with BenH and others. >>>>> >>>>> However it is gratuitously different from how PCIe devices will >>>>> typically appear for the same guest running under PowerVM. Although I >>>>> suspect Linux would cope with the "normal standard" rather than "PAPR >>>>> standard" presentation, I'm not as confident about it as I would like. >>>>> >>>>> Another consideration here is that other PCIe capable qemu emulated >>>>> devices, such as XHCI, will present fine as PCIe integrated endpoints >>>>> when attached to the root bus. Libvirt won't do that usually, of >>>>> course, and it may not be the recommended way of doing things (on PC) >>>>> but it's possible. I don't see any particular reason that virtio-pci >>>>> should enforce the root port requirement more so than any other >>>>> device. >>>>> >>>>>> On 02/08/17 07:16, David Gibson wrote: >>>>>>> >>>>>>> pcie_endpoint_cap_init() already automatically adjusts to advertise as >>>>>>> an integrated device rather than a "normal" PCIe endpoint when attached to >>>>>>> a root bus. So we can remove the check for root bus within virtio and >>>>>>> allow (at the user's discretion) a PCIe virtio bus to be attached to a >>>>>>> root bus. >>>>>> >>>>>> If Marcel thinks this is a good change, then I think we should go >>>>>> through "docs/pcie.txt" with a fine-toothed comb, and update all >>>>>> relevant spots. (If Marcel agrees, perhaps you can include such >>>>>> hunks in your patch at once.) >>>>> >>>>> Actually, I think that would be a neverending process. Maybe better >>>>> to put in a whole different spapr-pcie.txt with the assorted ways that >>>>> PAPR violates PCIe conventions. >>>> >>>> That works for me too, but I think it would be a lot more work for you >>>> and others. >>>> >>>> I plan on consulting "docs/pcie.txt" frequently; among other things, for >>>> deciding debates. Thus, improving the scope of "docs/pcie.txt" is very >>>> welcome IMO. >>>> >>>>> >>>>>> It also may have consequences for libvirt (but I see you addressed >>>>>> Andrea at once, which is great). >>>>> >>>>> Right, I've been discussed this with Andrea all along. We're working >>>>> on a proposed PAPR specific way of allocating PCI and PCIe addresses >>>>> (different from the PCIe normal way, but the same as each other). >>>>> That will simplify adding PCIe support to PAPR, and also has some >>>>> other advantages for PAPR guests (related to the platform specific >>>>> isolation, hotplug and error recovery mechanisms - also different >>>>> from the normal PCIe ones). >>>> >>>> Great, if Andrea is aware, that's a relief. >>>> >>>> Can you resubmit this patch with a small hunk for "docs/pcie.txt" that >>>> removes PAPR from the scope? >>> >> >> Hi David, >> Sorry for the delay, I just came back from PTO. >> >>> Well, first I'd like to see if Marcel knows of some reason I didn't >>> think of why this test is important for virtio particularly here. But >>> assuming the basic idea is acceptable, then yes, I'll update pcie.txt. >>> >> >> There are two reasons for keeping virtio Integrated Endpoints as PCI devices. >> 1. The first point is generic; even if having PCIe devices as Integrated Endpoints should be OK, >> is not recommended because some guests may miss-behave (*). X86 arch supports a large number >> of guests and we don't want to check and fix everything if *we don't have to*. >> Even if is not written anywhere and there are actually some PCI Express Integrated Endpoints, >> most of them are legacy PCI devices (I actually think this is why we have Integrated Endpoints >> at all, but I might be wrong). > > Hm, ok. Could we implement that restriction in the pci/pcie core > rather than in the virtio device? I am not sure if we should do that. Most of the devices are PCI or PCIe. Only some devices are "hybrid", the virtio devices, XHCI and I am not sure we have more. > That would then protect things like > XHCI as well. I don't see a reason to have XHCI as Integrated Endpoint, I think it should be always plugged into a root port. (for x86. arm and power) And for my purposes it would also make it easier to > implement aa machine type specific hook to re-allow that configuration > on pseries. I agree we need a solution for PAPR. What about a pcie_papr() function and then: diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index 5ce42af..2c646ae 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -1804,7 +1804,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp) return; } - if (pcie_port && pci_is_express(pci_dev)) { + if ((pcie_port || pcie_parp()) && pci_is_express(pci_dev)) { int pos; pos = pcie_endpoint_cap_init(pci_dev, 0); > > At the moment XHCI and virtio-pci behave differently, which seems less > than ideal. > >> 2. The second point is virtio specific. Not all the guests have virtio 1.0 support (e.g RHEL 6) and we allow them >> to use legacy virtio devices as Integrated Endpoints (following the thought that this is why we have Integrated Endpoints) >> Making the virtio devices PCI Express, but not virtio 1.0 is also problematic since now we will have too much >> types of virtio devices. We want to keep it simple: virtio legacy >> -> PCI, virtio modern -> PCIe. > > Ok.. it's not obvious to me why integrated endpoint vs. under a root > port is relevant to this. Can't we enable/disable PCIe mode based > directly on the legacy/modern settings? > Yes, we can, but we don't want to do that. Previous setups will stop working and we will need libvirt to mange the disable-* properties. As a matter of fact the code today is after some discussions with libvirt guys. >> (*) A while ago Alex Williamson found such of issue, I think is this one: >> 0282ab (vfio/pci: Hide device PCIe capability on non-express buses for PCIe VMs) > > It's also not clear to me why this fix is relevant to the question. > That change disables the PCIe capability on a bus which is > not-express, but is under an express root bridge (and is therefore > clearly *not* on the root bus). For the case I'm talking about the > *is* on an express bus and it *is* the root bus. AFAICT that patch > would be relevant only for devices under a PCIe-to-PCI bridge on a > PCIe system. > I might have selected the wrong patch. Thanks, Marcel >>>> be appreciated too, if that makes sense. (By default we aim at >>>> multi-arch / multi-target with this document; we may not have stated it >>>> explicitly, but AFAIR we intend to cover aarch64 / "virt" too.) >>> >>> Right, that was my understanding as well. >>> >> >> Indeed, we want the document to support them all. If PAPR is different, we should mention it. > > Sure. I'm trying to work out what we can/should do for pseries first; > I'll write something up for the docs when I have something I think is > ready to merge. >