From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47479) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cdofO-0002Eg-5P for qemu-devel@nongnu.org; Tue, 14 Feb 2017 20:46:01 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cdofL-0004wO-0q for qemu-devel@nongnu.org; Tue, 14 Feb 2017 20:45:58 -0500 Date: Wed, 15 Feb 2017 12:45:44 +1100 From: David Gibson Message-ID: <20170215014544.GC12369@umbus.fritz.box> References: <20170208061602.17666-1-david@gibson.dropbear.id.au> <46d64512-1085-6d22-1e0f-660757ff7131@redhat.com> <20170209041634.GC14524@umbus> <95706652-0a80-92fc-951b-7a454d496ddf@redhat.com> <20170210003746.GP27610@umbus.fritz.box> <5ea3785c-b979-8b8c-3ab0-243d69384697@redhat.com> <20170213043307.GT25381@umbus> <45446029-e404-77d4-754e-5541a506bb7c@redhat.com> <20170214041532.GF2169@umbus.fritz.box> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="WfZ7S8PLGjBY9Voh" Content-Disposition: inline In-Reply-To: Subject: Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Marcel Apfelbaum Cc: Laszlo Ersek , abologna@redhat.com, lvivier@redhat.com, thuth@redhat.com, mst@redhat.com, qemu-devel@nongnu.org, qemu-ppc@nongnu.org, kraxel@redhat.com, Eduardo Habkost --WfZ7S8PLGjBY9Voh Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Feb 14, 2017 at 02:53:08PM +0200, Marcel Apfelbaum wrote: > On 02/14/2017 06:15 AM, David Gibson wrote: > > On Mon, Feb 13, 2017 at 12:14:23PM +0200, Marcel Apfelbaum wrote: > > > On 02/13/2017 06:33 AM, David Gibson wrote: > > > > On Sun, Feb 12, 2017 at 09:05:46PM +0200, Marcel Apfelbaum wrote: > > > > > On 02/10/2017 02:37 AM, David Gibson wrote: > > > > > > On Thu, Feb 09, 2017 at 10:04:47AM +0100, Laszlo Ersek wrote: > > > > > > > On 02/09/17 05:16, David Gibson wrote: > > > > > > > > On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrot= e: > > > > > > > > > On 02/08/17 07:16, David Gibson wrote: > > > > > > > > > > Marcel, > > > > > > > > > >=20 > > > > > > > > > > Your original patch adding PCIe support to virtio-pci.c= has the > > > > > > > > > > limitation noted below that PCIe won't be enabled if th= e device is on > > > > > > > > > > the root bus (rather than under a root or downstream po= rt). As > > > > > > > > > > reasoned below, I think removing the check is correct, = even for x86 > > > > > > > > > > (though it would rarely be useful there). But I could = well have > > > > > > > > > > missed something. Let me know if so... > > > > > > > > > >=20 > > > > > > > > > >=20 > > > > > > > > > >=20 > > > > > > > > > > Virtio devices can appear as either vanilla PCI or PCI-= Express devices > > > > > > > > > > depending on the bus they're connected to. At the mome= nt it will only > > > > > > > > > > appear as vanilla PCI if connected to the root bus of a= PCIe host bridge. > > > > > > > > > >=20 > > > > > > > > > > Presumably this is to reflect the fact that PCIe device= s usually need to > > > > > > > > > > be connected to a root (or further downstream) port rat= her than directly > > > > > > > > > > on the root bus. However, due to the odd requirements = of the PAPR spec on the 'pseries' > > > > > > > > > > machine type, it's normal for PCIe devices to appear on= the root bus > > > > > > > > > > without root ports. > > > > > > > > > >=20 > > > > > > > > > > Further, even on x86, there's no inherent reason we cou= ldn't present a > > > > > > > > > > virtio device as an "integrated device" (typically used= for things built > > > > > > > > > > into the PCI chipset), and those devices *do* typically= appear on the root > > > > > > > > > > bus. > > > > > > > > >=20 > > > > > > > > > I'm not personally making a counter-argument, just qoutin= g some of > > > > > > > > > the relevant parts of "docs/pcie.txt" ("PCI EXPRESS GUIDE= LINES"): > > > > > > > >=20 > > > > > > > > So, an earlier discussion more or less concluded that the P= CIe > > > > > > > > guidelines don't really work with PAPR guests. That comes = because > > > > > > > > PAPR was designed with PowerVM in mind which allows PCI pas= sthrough > > > > > > > > but doesn't do any emulated PCI devices. So they wanted to= present > > > > > > > > passed through devices (virtual or phyical) to the guest wi= thout > > > > > > > > inserting virtual root ports. > > > > > > > >=20 > > > > > > > > Now, you can argue that this was a silly decision in PAPR, = and you > > > > > > > > could well be right, but there it is. > > > > > > >=20 > > > > > > > I can totally accept this, but then we should state it as a f= act near > > > > > > > the top of "docs/pcie.txt". > > > > > > >=20 > > > > > > > >=20 > > > > > > > > > > Place only the following kinds of devices directly on t= he Root Complex: > > > > > > > > > > (1) PCI Devices (e.g. network card, graphics card, = IDE controller), > > > > > > > > > > not controllers. Place only legacy PCI devices = on > > > > > > > > > > the Root Complex. These will be considered Inte= grated Endpoints. > > > > > > > > > > Note: Integrated Endpoints are not hot-pluggabl= e. > > > > > > > > > >=20 > > > > > > > > > > Although the PCI Express spec does not forbid P= CI Express devices as > > > > > > > > > > Integrated Endpoints, existing hardware mostly = integrates legacy PCI > > > > > > > > > > devices with the Root Complex. > > > > > > > >=20 > > > > > > > > "Mostly".. on my laptop at least the GPU shows up as an int= egrated PCI > > > > > > > > Express endpoint, so it's certainly not the case that *all*= root bus > > > > > > > > devices are legacy. > > > > > > > >=20 > > > > > > > > > Guest OSes are suspected to behave > > > > > > > > > > strangely when PCI Express devices are integrat= ed > > > > > > > > > > with the Root Complex. > > > > > > > >=20 > > > > > > > > Clearly not that strangely, that often, since my laptop wor= ks just fine. > > > > > > > >=20 > > > > > > > > > >=20 > > > > > > > > > > [...] > > > > > > > > > >=20 > > > > > > > > > > 2.2 PCI Express only hierarchy > > > > > > > > > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > > > > > > > > > Always use PCI Express Root Ports to start PCI Express = hierarchies. > > > > > > > > >=20 > > > > > > > > > Above you mention "it's normal for PCIe devices to appear= on the root bus without root ports". > > > > > > > >=20 > > > > > > > > Well "normal" perhaps wasn't the right word. Let's say pre= cedented, > > > > > > > > if uncommon. > > > > > > > >=20 > > > > > > > > > Let me turn the question around: is it a *problem* for "p= series" if > > > > > > > > > we require root ports? If so, why exactly? > > > > > > > >=20 > > > > > > > > That's.. a complex question. At least Linux guests (and we= don't > > > > > > > > support any others yet) might cope with the addition of roo= t ports. > > > > > > > > Maybe. I have discussed this option with BenH and others. > > > > > > > >=20 > > > > > > > > However it is gratuitously different from how PCIe devices = will > > > > > > > > typically appear for the same guest running under PowerVM. = Although I > > > > > > > > suspect Linux would cope with the "normal standard" rather = than "PAPR > > > > > > > > standard" presentation, I'm not as confident about it as I = would like. > > > > > > > >=20 > > > > > > > > Another consideration here is that other PCIe capable qemu = emulated > > > > > > > > devices, such as XHCI, will present fine as PCIe integrated= endpoints > > > > > > > > when attached to the root bus. Libvirt won't do that usual= ly, of > > > > > > > > course, and it may not be the recommended way of doing thin= gs (on PC) > > > > > > > > but it's possible. I don't see any particular reason that = virtio-pci > > > > > > > > should enforce the root port requirement more so than any o= ther > > > > > > > > device. > > > > > > > >=20 > > > > > > > > > On 02/08/17 07:16, David Gibson wrote: > > > > > > > > > >=20 > > > > > > > > > > pcie_endpoint_cap_init() already automatically adjusts = to advertise as > > > > > > > > > > an integrated device rather than a "normal" PCIe endpoi= nt when attached to > > > > > > > > > > a root bus. So we can remove the check for root bus wi= thin virtio and > > > > > > > > > > allow (at the user's discretion) a PCIe virtio bus to b= e attached to a > > > > > > > > > > root bus. > > > > > > > > >=20 > > > > > > > > > If Marcel thinks this is a good change, then I think we s= hould go > > > > > > > > > through "docs/pcie.txt" with a fine-toothed comb, and upd= ate all > > > > > > > > > relevant spots. (If Marcel agrees, perhaps you can includ= e such > > > > > > > > > hunks in your patch at once.) > > > > > > > >=20 > > > > > > > > Actually, I think that would be a neverending process. May= be better > > > > > > > > to put in a whole different spapr-pcie.txt with the assorte= d ways that > > > > > > > > PAPR violates PCIe conventions. > > > > > > >=20 > > > > > > > That works for me too, but I think it would be a lot more wor= k for you > > > > > > > and others. > > > > > > >=20 > > > > > > > I plan on consulting "docs/pcie.txt" frequently; among other = things, for > > > > > > > deciding debates. Thus, improving the scope of "docs/pcie.txt= " is very > > > > > > > welcome IMO. > > > > > > >=20 > > > > > > > >=20 > > > > > > > > > It also may have consequences for libvirt (but I see you = addressed > > > > > > > > > Andrea at once, which is great). > > > > > > > >=20 > > > > > > > > Right, I've been discussed this with Andrea all along. We'= re working > > > > > > > > on a proposed PAPR specific way of allocating PCI and PCIe = addresses > > > > > > > > (different from the PCIe normal way, but the same as each o= ther). > > > > > > > > That will simplify adding PCIe support to PAPR, and also ha= s some > > > > > > > > other advantages for PAPR guests (related to the platform s= pecific > > > > > > > > isolation, hotplug and error recovery mechanisms - also di= fferent > > > > > > > > from the normal PCIe ones). > > > > > > >=20 > > > > > > > Great, if Andrea is aware, that's a relief. > > > > > > >=20 > > > > > > > Can you resubmit this patch with a small hunk for "docs/pcie.= txt" that > > > > > > > removes PAPR from the scope? > > > > > >=20 > > > > >=20 > > > > > Hi David, > > > > > Sorry for the delay, I just came back from PTO. > > > > >=20 > > > > > > Well, first I'd like to see if Marcel knows of some reason I di= dn't > > > > > > think of why this test is important for virtio particularly her= e. But > > > > > > assuming the basic idea is acceptable, then yes, I'll update pc= ie.txt. > > > > > >=20 > > > > >=20 > > > > > There are two reasons for keeping virtio Integrated Endpoints as = PCI devices. > > > > > 1. The first point is generic; even if having PCIe devices as Int= egrated Endpoints should be OK, > > > > > is not recommended because some guests may miss-behave (*). X8= 6 arch supports a large number > > > > > of guests and we don't want to check and fix everything if *we= don't have to*. > > > > > Even if is not written anywhere and there are actually some PC= I Express Integrated Endpoints, > > > > > most of them are legacy PCI devices (I actually think this is = why we have Integrated Endpoints > > > > > at all, but I might be wrong). > > > >=20 > > > > Hm, ok. Could we implement that restriction in the pci/pcie core > > > > rather than in the virtio device? > > >=20 > > > I am not sure if we should do that. Most of the devices are PCI or PC= Ie. > > > Only some devices are "hybrid", the virtio devices, XHCI and I am not= sure we have more. > >=20 > > Ok, I see your point, the pcie core might not be right. But it still > > seems really weird to have it explicitly in each hybrid device, even > > if it's just the two. As the code stands right now, XHCI and virtio > > have different behaviour, without a clear reason for it. > >=20 >=20 > I suppose XHCI can behave the same as virtio if Gerd has nothing against = it > (remain PCI if plugged into the Root Complex), but I don't know how it wi= ll help your case. > > Maybe a hybrid_setup() helper function? > >=20 >=20 > How would it help PAPR scenario? It doesn't, directly, but it means there would only be a single place to fix with a PAPR hack, instead of both virtio and XHCI and any future hybrid devices. > Anyway, Eduardo is working on supplying > new query interfaces for libvirt and he touches this subject, I think > he does plan to mark the hybrid devices in some way. >=20 >=20 > > > > That would then protect things like > > > > XHCI as well. > > >=20 > > > I don't see a reason to have XHCI as Integrated Endpoint, I think it = should be always > > > plugged into a root port. (for x86. arm and power) > >=20 > > No, not for Power. Well, ok, yes for most ppc machine types, but not > > for the paravirtualized 'pseries' machine (which is the one we most > > care about). > >=20 > > Well.. I guess it doesn't need to be an Integrated Endpoint per se, > > but we should be able to have it appear on the guest root bus. > >=20 > > The thing to realize is that the paravirtualized PCI interfaces in > > PAPR mean there's very little guest visible difference between PCI and > > PCIe. In fact in most ways you could say that the paravirtualized bus > > operates like a vanilla PCI bus.. except that it does provide a way to > > access PCIe extended config space. >=20 > Maybe we can have a new bus type deriving from PCIBus and sibling of the = PCIe Bus. > Then we can have the same rules as the PCI bus and add tweaks when > necessary. Sounds possible. I guess it would need to return true for pci_bus_is_express(), so that PCIe devices will attach to it. In which case I'm not entirely sure how it would differ from a PCIe bus =66rom the qemu side. AFAICT with the exception of the virtio check and maybe a handful of others, the PCIe placement rules are handled by libvirt, rather than by qemu. > This does not solve the virtio problem since the code is actively looking= for > a PCIe Root Port and we don't have it for PAPR. Technically, it's not even looking for a root port, just excluding being on the root bus itself. I guess any kind of bridge that has something PCI-ish on the upstream side and PCIe on the downstream side more or less has to be either a root port or a PCIe switch downstream port, though. > Which means that you can use it to > > drive PCIe devices just fine. "Bus level" PCIe extensions like AER > > and PCIe standard hotplug won't work, but PAPR has its own mechanisms > > for those (common between PCI and PCIe). > >=20 > > I did float the idea of having the pseries PCI bus remain plain PCI > > but with a special flag to allow PCIe devices to be attached to it > > anyway. It wasn't greeted with much enthusiasm.. > >=20 >=20 > Can you point me to the discussion please? It seems similar to what I pro= posed above. Sorry, I was misleading. I think I just raised that idea with Andrea and a few other people internally, not on one of the lists at large. > As you properly described it, is much closer to PCI then PCIe, even the o= nly characteristic > that makes it "a little" PCIe, the Extended Configuration Space support, > is done with an alternative interface. >=20 > I agree the PAPR bus is not PCIe. Ok, so if we take that direction, the question becomes how do we let PCIe devices plug into this mostly-not-PCIe bus. Maybe introduce a "pci_bus_accepts_express()" function that will replace many, but not all current uses of "pci_bus_is_express()"? Such a helper could maybe simplify the logic in virtio-pci (and XHCI?) by returning false on an x86 root bus. >=20 > > > And for my purposes it would also make it easier to > > > > implement aa machine type specific hook to re-allow that configurat= ion > > > > on pseries. > > >=20 > > > I agree we need a solution for PAPR. > > >=20 > > > What about a pcie_papr() function and then: > > >=20 > > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c > > > index 5ce42af..2c646ae 100644 > > > --- a/hw/virtio/virtio-pci.c > > > +++ b/hw/virtio/virtio-pci.c > > > @@ -1804,7 +1804,7 @@ static void virtio_pci_realize(PCIDevice *pci_d= ev, Error **errp) > > > return; > > > } > > >=20 > > > - if (pcie_port && pci_is_express(pci_dev)) { > > > + if ((pcie_port || pcie_parp()) && pci_is_express(pci_dev)) { > > > int pos; > > >=20 > > > pos =3D pcie_endpoint_cap_init(pci_dev, 0); > >=20 > > That would be sufficient, yes, so I'll take it if we don't come to any > > other solutions. >=20 > OK >=20 > It still seems weird to me to have this logic within > > a specific device implementation though. > >=20 >=20 > I suppose you have a point, let's wait for Gerd and maybe Michael to comm= ent > on that, but anyway it will not help your case. (it will only make XHCI P= CI if plugged into Root Complex) It won't help directly, but it means we could put that pcie_papr() test in just the common code, rather than having to look at every hybrid and PCIe device. > > > >=20 > > > > At the moment XHCI and virtio-pci behave differently, which seems l= ess > > > > than ideal. > > > >=20 > > > > > 2. The second point is virtio specific. Not all the guests have v= irtio 1.0 support (e.g RHEL 6) and we allow them > > > > > to use legacy virtio devices as Integrated Endpoints (followin= g the thought that this is why we have Integrated Endpoints) > > > > > Making the virtio devices PCI Express, but not virtio 1.0 is a= lso problematic since now we will have too much > > > > > types of virtio devices. We want to keep it simple: virtio leg= acy > > > > > -> PCI, virtio modern -> PCIe. > > > >=20 > > > > Ok.. it's not obvious to me why integrated endpoint vs. under a root > > > > port is relevant to this. Can't we enable/disable PCIe mode based > > > > directly on the legacy/modern settings? > > > >=20 > > >=20 > > > Yes, we can, but we don't want to do that. Previous setups will stop = working > > > and we will need libvirt to mange the disable-* properties. > > > As a matter of fact the code today is after some discussions with lib= virt guys. > >=20 > > Uh.. I think I'm missing something.. but it's still not clear to me > > why this would break existing setups or impose more work on libvirt. > >=20 >=20 > Long story short, libvirt guys don't want to manually set the disable-* p= roperties > of virtio devices to make them comply to our goal (legacy -> PCI, modern = PCIe). > They also don't want to look at QEMU version/machine type to make a decis= ion on the above properties. >=20 > I agree the solution is not perfect, but at least it makes our testing ma= trix smaller > and makes our PCIe guidelines a little easier to understand (e.g. all sup= ported devices are PCIe, > but if want legacy PCI devices put them on the Root Complex) Ok.. I'm still missing something. Are you saying that instead of the legacy/modern status determining PCIe support, that PCIe status is determining legacy/modern support by default? That would actually seem to simplify things: whatever method we end up allowing PCIe virtio on PAPR, that should automatically enable modern mode, which is fine. Although.. I do wonder about legacy/modern for PAPR in general. Current kernels will support virtio 1.0 fine, but we have older released versions which only support legacy. Unlike PC we won't (I hope) have a whole machine type switch to handle this. At this stage I think we want virtio devices (whatever their bus type) on PAPR to default to allowing both legacy and modern for the forseeable future. How does that affect the matrix? > XHCI being PCIe on Root Complex is an unintended exception, but we want i= t connected to a > Root Port anyway, we don't have anything to gain from having it as Integr= ated Endpoint. > We only loose a slot that can be used by 8 Root Ports assembled into one = multi-function device. >=20 > PAPR bus should not be considered PCIe and should have a different set of= rules allowing PCIe > devices to be plugged into Root Complex. Alright, I can work with that. Michael, Andrea, how does this idea seem to you? --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --WfZ7S8PLGjBY9Voh Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJYo7LGAAoJEGw4ysog2bOSyA4QALk/4Ivz9VujDiej7HWQ+58O /GT11lWBFTYC6EdZRLKEs9N41AGeimf/ki3Sqc2hZWZP+1BepH5vdawjDLPJa+1G OiXkcR6kpCpeBhXJoxqRPBvj9CKu9FRQhiZp23BeIvafnkRDlbADtI0QAOD0gb9F sx//H7cmXCXdPkqPgIn8U3xZHn+czpp88RBWmC0kwgtp6amYmTvm8PWPaveqUTdt j2LfCk5MpwTrEeSU2OGIC54sKF2z4f+w/8K3HsfmjGS6BwsEGUytksZRpYnGWh0t e5DO7xSd5f3hZCZlTJ7Vc8t5opeOJmVdWq1utoMfoBv9xa04gbwYAtQFAWWRMBPE TK57jp81+F2O05oO4hh2/+qii1xjpEf1LtTIq62LRDvfFCe2lz/F5w15YW11VuQ/ bIVUf0UzoiARPz5kL9kGzRU3ONhSMC3VxaoPZopKf7tM3mBDG/T2agrQu/N4j+8h An+jl18DZCQ30TNKjuKgmUmxozo3s8EBwBBwKNMkPK/heWRjUhFh3LHFGmHRw9vx NAUQ8Y/xBjk2a+W8hPYl3UaIvywvCLydA8ratoAZBPMSvGcwTzl4KfPdXKUdTQeH UMV/X/LzpKQVuXDOimqwnL8zjsGwTfdBhpqdYEn+mbfTmHcFjdvrPoeT7Xe/ncvY WTzwptPpAA7JobMAQFuS =gc5m -----END PGP SIGNATURE----- --WfZ7S8PLGjBY9Voh--