From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60996) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bmw3D-0002Lv-DJ for qemu-devel@nongnu.org; Thu, 22 Sep 2016 00:56:00 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bmw39-0007g2-BN for qemu-devel@nongnu.org; Thu, 22 Sep 2016 00:55:59 -0400 Date: Thu, 22 Sep 2016 14:48:26 +1000 From: David Gibson Message-ID: <20160922044826.GD2085@umbus.fritz.box> References: <1469606618-26915-1-git-send-email-aik@ozlabs.ru> <20160913232911.31012.55556@loki> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="q9KOos5vDmpwPx9o" Content-Disposition: inline In-Reply-To: Subject: Re: [Qemu-devel] [PATCH qemu] spapr_pci: Add numa node id List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy Cc: Michael Roth , qemu-devel@nongnu.org, qemu-ppc@nongnu.org, sbhat@linux.vnet.ibm.com, Bharata B Rao --q9KOos5vDmpwPx9o Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Sep 14, 2016 at 07:39:10PM +1000, Alexey Kardashevskiy wrote: > On 14/09/16 09:29, Michael Roth wrote: > > Quoting Alexey Kardashevskiy (2016-07-27 03:03:38) > >> This adds a numa id property to a PHB to allow linking passed PCI devi= ce > >> to CPU/memory. It is up to the management stack to do CPU/memory pinni= ng > >> to the node with the actual PCI device. > >=20 > > It looks like x86 relies on PCIBus->numa_node() method (via > > pci_bus_numa_node()) to encode similar PCIBus affinities > > into ACPI tables, and currently exposes it via > > -device pci-[-express]-expander-bus,numa_node=3DX. >=20 > Well, until we allow DMA windows per PCI bus (not per PHB as it is now), > this won't make much sense for us (unless I am missing something > here). I can't actually see any sane way we could have NUMA associativity of PCI at any finer granularity than the host bridge level. I suspect the only reason it works that way on x86 is due to some of the weird stuff PC does to make what are effectively different host bridges appear as different buses in a single logical domain. >=20 > > Maybe we should implement it using this existing > > PCIBus->numa_node() interface? > >=20 > > We'd still have to expose numa_node as a spapr-pci-host-bridge > > device option though. Not sure if there's a more common way > > to expose it that might be easier for libvirt to discover. As it > > stands we'd need to add spapr-pci-host-bridge to a libvirt > > whitelist that currently only covers pci-expander-bus. > >=20 > > Cc'ing Shiva who was looking into the libvirt side. I think we should change the actual name of the option to "numa_node" to match the option used on the pxb, though. > >=20 > > One comment below: > >=20 > >> > >> Signed-off-by: Alexey Kardashevskiy > >> --- > >> hw/ppc/spapr_pci.c | 13 +++++++++++++ > >> include/hw/pci-host/spapr.h | 2 ++ > >> 2 files changed, 15 insertions(+) > >> > >> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > >> index 949c44f..af5394a 100644 > >> --- a/hw/ppc/spapr_pci.c > >> +++ b/hw/ppc/spapr_pci.c > >> @@ -47,6 +47,7 @@ > >> #include "sysemu/device_tree.h" > >> #include "sysemu/kvm.h" > >> #include "sysemu/hostmem.h" > >> +#include "sysemu/numa.h" > >> > >> #include "hw/vfio/vfio.h" > >> > >> @@ -1544,6 +1545,7 @@ static Property spapr_phb_properties[] =3D { > >> DEFINE_PROP_BOOL("ddw", sPAPRPHBState, ddw_enabled, true), > >> DEFINE_PROP_UINT64("pgsz", sPAPRPHBState, page_size_mask, > >> (1ULL << 12) | (1ULL << 16)), > >> + DEFINE_PROP_UINT32("node", sPAPRPHBState, numa_node, -1), > >> DEFINE_PROP_END_OF_LIST(), > >> }; > >> > >> @@ -1805,6 +1807,11 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb, > >> cpu_to_be32(1), > >> cpu_to_be32(RTAS_IBM_RESET_PE_DMA_WINDOW) > >> }; > >> + uint32_t associativity[] =3D {cpu_to_be32(0x4), > >> + cpu_to_be32(0x0), > >> + cpu_to_be32(0x0), > >> + cpu_to_be32(0x0), > >> + cpu_to_be32(phb->numa_node)}; > >> sPAPRTCETable *tcet; > >> PCIBus *bus =3D PCI_HOST_BRIDGE(phb)->bus; > >> sPAPRFDT s_fdt; > >> @@ -1837,6 +1844,12 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb, > >> &ddw_extensions, sizeof(ddw_extensions))); > >> } > >> > >> + /* Advertise NUMA via ibm,associativity */ > >> + if (nb_numa_nodes > 1) { > >> + _FDT(fdt_setprop(fdt, bus_off, "ibm,associativity", associati= vity, > >> + sizeof(associativity))); > >> + } > >=20 > > LoPAPR 15.3 seems to suggest that ibm,associativity-reference-points is > > required alongside ibm,associativity for each DT node it appears in, > > and since we hardcode "Form 1" affinity it should be done similarly as > > the entry we place in the top-level DT node. >=20 >=20 > Hm, okay, I'll add it. There is a question to Bharata - why do we have 4s > in spapr_create_fdt_skel()'s refpoints? Just a random pick? >=20 > vec5[5]=3D=3D0x80 means we are doing "Form1" and these 4s are far/near di= stances? >=20 >=20 > >> + > >> /* Build the interrupt-map, this must matches what is done > >> * in pci_spapr_map_irq > >> */ > >> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h > >> index 5adc603..53c4b2d 100644 > >> --- a/include/hw/pci-host/spapr.h > >> +++ b/include/hw/pci-host/spapr.h > >> @@ -75,6 +75,8 @@ struct sPAPRPHBState { > >> bool ddw_enabled; > >> uint64_t page_size_mask; > >> uint64_t dma64_win_addr; > >> + > >> + uint32_t numa_node; > >> }; > >> > >> #define SPAPR_PCI_MAX_INDEX 255 > >> > >=20 >=20 >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --q9KOos5vDmpwPx9o Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJX42KZAAoJEGw4ysog2bOSrzIP/0M1aB5AsGYJ9OAJETxeMJiO DXuURLuUEnTojvVhZb+kVa8WIAlco4i9IsRpIYvnfV+pKZoWqOHQ7r8eZiZvOKxs h2F+gWnvSRKRXqdkgxBlZhBvV66gFCBax2Qr6CDFQgw6SIB2fRkrdLXr0x2oOsgk PegK3IOX5AfGuDlbG3JNbb0Y8ZN9pUHKQy7oMXGd5l5DUfhjGHvslnAswmtn4zae wjSyqmn08bnA10MgcPyTT6eW17iV+ExOitzWhOFdRUGPz+VB+Sf2rG5iVtd4bXxr UIa5cUEHFsVj+GrnXwU80ANaidoq/uZ9c65caRBDSZ6wrSqH3D3wbKVF3zzXelTR hVrMuFY2X8grirBq2I4eQZpDItiI/iTWAyDpebLRdRVCVVc9SEIme338egLijIu7 sbCPS0d2KoRYaFJj5u3uxQ89mbEWAoN44hATmTG4Omh38fkkfOG5urlEc7WkJ+xi gDIlZJa+C0x9eLBmTRut53S+7EqSK2FSh8Hnu7mx49Kw4xIFD+tf4kZLsX7z6OdO DM1+wz7wP4kMXOKVqPETBi8UVbt6UVL7PA41K4nxF93fZ/w206fqP2CgDQWDVKpk agbwR3E5EBhjGTlUyo0TzUmRyy2GNDcPYaHJUzQMuQ2J43sNA0guIxinzB70iiY8 p2h0Y8ymhux4fba1zVrf =JT6z -----END PGP SIGNATURE----- --q9KOos5vDmpwPx9o--