From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:54101) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c7guw-0007XG-Jl for qemu-devel@nongnu.org; Fri, 18 Nov 2016 06:01:16 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1c7gur-0006ta-KZ for qemu-devel@nongnu.org; Fri, 18 Nov 2016 06:01:14 -0500 Date: Fri, 18 Nov 2016 17:11:06 +1100 From: David Gibson Message-ID: <20161118061106.GG31640@umbus.fritz.box> References: <1477641400-23292-1-git-send-email-aik@ozlabs.ru> <20161028120712.4a911866@bahia> <20161031025314.GJ18226@umbus.fritz.box> <20161101024634.GA14860@umbus.fritz.box> <1479218565.3319.18.camel@redhat.com> <3353ecef-2308-13e3-025d-df41b2e89945@ozlabs.ru> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="+1TulI7fc0PCHNy3" Content-Disposition: inline In-Reply-To: <3353ecef-2308-13e3-025d-df41b2e89945@ozlabs.ru> Subject: Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy Cc: Andrea Bolognani , Greg Kurz , Paolo Bonzini , Alex Williamson , qemu-ppc@nongnu.org, qemu-devel@nongnu.org, libvir-list@redhat.com, Michael Roth --+1TulI7fc0PCHNy3 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Nov 17, 2016 at 01:02:57PM +1100, Alexey Kardashevskiy wrote: > On 16/11/16 01:02, Andrea Bolognani wrote: > > On Tue, 2016-11-01 at 13:46 +1100, David Gibson wrote: > >> On Mon, Oct 31, 2016 at 03:10:23PM +1100, Alexey Kardashevskiy wrote: > >>> =20 > >>> On 31/10/16 13:53, David Gibson wrote: > >>>> =20 > >>>> On Fri, Oct 28, 2016 at 12:07:12PM +0200, Greg Kurz wrote: > >>>>> =20 > >>>>> On Fri, 28 Oct 2016 18:56:40 +1100 > >>>>> Alexey Kardashevskiy wrote: > >>>>> =20 > >>>>>> =20 > >>>>>> At the moment sPAPR PHB creates a root buf of TYPE_PCI_BUS type. > >>>>>> This means that vfio-pci devices attached to it (and this is > >>>>>> a default behaviour) hide PCIe extended capabilities as > >>>>>> the bus does not pass a pci_bus_is_express(pdev->bus) check. > >>>>>> =20 > >>>>>> This changes adds a default PCI bus type property to sPAPR PHB > >>>>>> and uses TYPE_PCIE_BUS if none passed; older machines get TYPE_PCI= _BUS > >>>>>> for backward compatibility as a bus type is used in the bus name > >>>>>> so the root bus name becomes "pcie.0" instead of "pci.0". > >>>>>> =20 > >>>>>> Signed-off-by: Alexey Kardashevskiy > >>>>>> --- > >>>>>> =20 > >>>>>> What can possibly go wrong with such change of a name? > >>>>>> From devices prospective, I cannot see any. > >>>>>> =20 > >>>>>> libvirt might get upset as "pci.0" will not be available, > >>>>>> will it make sense to create pcie.0 as a root bus and always > >>>>>> add a PCIe->PCI bridge and name its bus "pci.0"? > >>>>>> =20 > >>>>>> Or create root bus from TYPE_PCIE_BUS and force name to "pci.0"? > >>>>>> pci_register_bus() can do this. > >>>>>> =20 > >>>>>> =20 > >>>>>> --- > >>>>>> hw/ppc/spapr.c | 5 +++++ > >>>>>> hw/ppc/spapr_pci.c | 5 ++++- > >>>>>> include/hw/pci-host/spapr.h | 1 + > >>>>>> 3 files changed, 10 insertions(+), 1 deletion(-) > >>>>>> =20 > >>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > >>>>>> index 0b3820b..a268511 100644 > >>>>>> --- a/hw/ppc/spapr.c > >>>>>> +++ b/hw/ppc/spapr.c > >>>>>> @@ -2541,6 +2541,11 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true); > >>>>>> .driver =3D TYPE_SPAPR_PCI_HOST_BRIDGE, \ > >>>>>> .property =3D "mem64_win_size", \ > >>>>>> .value =3D "0", \ > >>>>>> + }, \ > >>>>>> + { \ > >>>>>> + .driver =3D TYPE_SPAPR_PCI_HOST_BRIDGE, \ > >>>>>> + .property =3D "root_bus_type", \ > >>>>>> + .value =3D TYPE_PCI_BUS, \ > >>>>>> }, > >>>>>> =20 > >>>>>> static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t= index, > >>>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > >>>>>> index 7cde30e..2fa1f22 100644 > >>>>>> --- a/hw/ppc/spapr_pci.c > >>>>>> +++ b/hw/ppc/spapr_pci.c > >>>>>> @@ -1434,7 +1434,9 @@ static void spapr_phb_realize(DeviceState *d= ev, Error **errp) > >>>>>> bus =3D pci_register_bus(dev, NULL, > >>>>>> pci_spapr_set_irq, pci_spapr_map_irq,= sphb, > >>>>>> &sphb->memspace, &sphb->iospace, > >>>>>> - PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PC= I_BUS); > >>>>>> + PCI_DEVFN(0, 0), PCI_NUM_PINS, > >>>>>> + sphb->root_bus_type ? sphb->root_bus_t= ype : > >>>>>> + TYPE_PCIE_BUS); > >>>>> =20 > >>>>> Shouldn't we ensure that sphb->root_bus_type is either TYPE_PCIE_BU= S or > >>>>> TYPE_PCI_BUS ? > >>>> =20 > >>>> Yes, I think so. In fact, I think it would be better to make the > >>>> property a boolean that just selects PCI-E, rather than this which > >>>> exposes qemu (semi-)internal type names on the comamnd line. > >>> =20 > >>> Sure, a "pcie-root" boolean property should do. > >>> =20 > >>> However this is not my main concern, I rather wonder if we have to ha= ve > >>> pci.0 when we pick PCIe for the root. > >> =20 > >> Right. > >> =20 > >> I've added Andrea Bologna to the CC list to get a libvirt perspective. > >=20 > > Thanks for doing so: changes such as this one can have quite > > an impact on the upper layers of the stack, so the earliest > > libvirt is involved in the discussion the better. > >=20 > > I'm going to go a step further and cross-post to libvir-list > > in order to give other libvirt contributors a chance to chime > > in too. > >=20 > >> Andrea, > >> =20 > >> To summarise the issue here: > >> * As I've said before the PAPR spec kinda-sorta abstracts the > >> difference between vanilla PCI and PCI-E > >> * However, because within qemu we're declaring the bus as PCI that > >> means some PCI-E devices aren't working right > >> * In particular it means that PCI-E extended config space isn't > >> available > >> =20 > >> The proposal is to change (on newer machine types) the spapr PHB code > >> to declare a PCI-E bus instead. AIUI this still won't make the root > >> complex guest visible (which it's not supposed to be under PAPR), and > >> the guest shouldn't see a difference in most cases - it will still see > >> the PAPR abstracted PCIish bus, but will now be able to get extended > >> config space. > >> =20 > >> The possible problem from a libvirt perspective is that doing this in > >> the simplest way in qemu would change the name of the default bus from > >> pci.0 to pcie.0. We have two suggested ways to mitigate this: > >> 1) Automatically create a PCI-E to PCI bridge, so that new machine > >> types will have both a pcie.0 and pci.0 bus > >> 2) Force the name of the bus to be pci.0, even though it's treated > >> as PCI-E in other ways. > >> =20 > >> We're trying to work out exactly what will and won't cause trouble for > >> libvirt. > >=20 > > Option 2) is definitely a no-no, as we don't want to be piling > > up even more hacks and architecture-specific code: the PCI > > Express Root Bus should be called pcie.0, just as it is on q35 > > and mach-virt machine types. > >=20 > > Option 1) doesn't look too bad, but devices that are added > > automatically by QEMU are an issue since we need to hardcode > > knowledge of them into libvirt if we want the rest of the PCI > > address allocation logic to handle them correctly. > >=20 > > Moreover libvirt now has the ability of building a legacy PCI > > topology without user intervention, if needed to plug in > > legacy devices, on machines that have a PCI Express Root Bus, > > which makes the additional bridge fully redundant... > >=20 > > ... or at least it would, if we actually had a proper > > PCIe-to-PCI bridge; AFAIK, though, the closest we have is the > > i82801b11-bridge that is Intel-specific despite having so far > > been abused as a generic PCIe-to-PCI bridge. I'm not even > > sure whether it would work at all on ppc64. > >=20 > > Moving from legacy PCI to PCI Express would definitely be an > > improvement, in my opinion. As mentioned, that's already the > > case for at least two other architectures, so the more we can > > standardize on that, the better. > >=20 > > That said, considering that a big part of the PCI address > > allocation logic is based off whether the specific machine > > type exposes a legay PCI Root Bus or a PCI Express Root Bus, > > libvirt will need a way to be able to tell which one is which. > >=20 > > Version checks are pretty much out of the question, as they > > fail as soon as downstream releases enter the picture. A > > few ways we could deal with the situation: > >=20 > > 1) switch to PCI Express on newer machine types, and > > expose some sort of capability through QMP so that > > libvirt can know about the switch > >=20 > > 2) switch between legacy PCI and PCI Express based on a > > machine type option. libvirt would be able to find out > > whether the option is available or not, and default to > > either > >=20 > > > >=20 > > or > >=20 > > > >=20 > > based on that. In order to support multiple PHBs > > properly, those would have to be switchable with an > > option as well > >=20 > > 3) create an entirely new machine type, eg. pseries-pcie > > or whatever someone with the ability to come up with > > decent names can suggest :) That would make ppc64 > > similar to x86, where i440fx and q35 have different > > root buses. libvirt would learn about the new machine > > type, know that it has a PCI Express Root Bus, and > > behave accordingly > >=20 > > Option 1) would break horribly with existing libvirt > > versions, and so would Option 2) if we default to using >=20 >=20 > How exactly 1) will break libvirt? Migrating from pseries-2.7 to > pseries-2.8 does not work anyway, and machines are allowed to behave > different from version to version, what distinct difference will using > "pseries-pcie-X.Y" make? I believe after we introduced the very first > pseries-pcie-X.Y, we will just stop adding new pseries-X.Y. IIUC, it's because when libvirt wants to attach a PCI device, it will just try to attach it to bus pci.0, which will no longer exist. > > PCI Express. Option 2) with default to legacy PCI and > > option 3) would work just fine with existing libvirt > > versions AFAICT, but wouldn't of course expose the new > > capabilities. > >=20 > > Option 3) is probably the one that will be less confusing > > to users; we might even decide to take the chance and fix > > other small annoyances with the current pseries machine > > type, if there's any. On the other hand, it might very well > > be considered to be too big a hammer for such a small nail. >=20 >=20 >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --+1TulI7fc0PCHNy3 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJYLpt6AAoJEGw4ysog2bOSuOUP/3Wk3f1PuOxaeGJi8GOdgrxL rIuMRnfo7nFTpJm7hWrebEaIzBoQQ0Anf20bBG+HRLm7CZv/nzBrSRsdxtdK3RLe 4yKvq0V0TKlvLYYSW+lsl5t5tHW6aRha2FaXm+PbBeqN9IhHgdl8+osxkwxCi5aq 40qhSQRopmX0sSG2yUTbvRy0fwNjv/XD3DsaOVoN9Of71peEorZ9ha0dyr2x5I3n GcjuDON+/m7/u5g2+nOSEBVuRdvwzSgmlbmmyNYjQM2eUoGoMtGz4i60TUA0pRvQ M2N2voHZ9CKY4ZnXSd9YlOWkgpCjPZG6n3Ms/rFWSIeOy3gOTRUezcYmDpD1oJaN RD/xcuqjP0MGl1dN3/b75JshApJgexLijhGbJ9/L3TVk0VyUlTFCjjUYmnnse+Sh SpI24PAChqFgpEy9LevRaoaGogsb2ky1DOq+LpCHaKOj+/dstPSOhWh83j6a4EFu ljrb8Zr8ICaoTNxUY9eAFUlVpcunu4VowKQya6J2r+zKk9+ODMjjmRXP1Zf3saKX gklIZUqpY1pLcU8mnyuf2VZk6xcSYxevDYyV4oCkoQgiG4BgWrrdk6yNpbdwceoT rYgQPEp1zCLOeW4qOgRG/0AGOfglqYWGwg/BUHfoB2VDoZNYsjr0q4bJatRlWo8I B45YEFPtkGW4GTRZNWlb =L4P2 -----END PGP SIGNATURE----- --+1TulI7fc0PCHNy3--