* Re: [PATCH v2 1/8] powerpc/xive: Use cpu_to_node() instead of ibm,chip-id property [not found] ` <3180b5c6-e61f-9c5f-3c80-f10e69dc5785@linux.ibm.com> @ 2021-03-09 17:26 ` Cédric Le Goater 2021-03-12 1:55 ` David Gibson 0 siblings, 1 reply; 6+ messages in thread From: Cédric Le Goater @ 2021-03-09 17:26 UTC (permalink / raw) To: Daniel Henrique Barboza, Greg Kurz Cc: list@suse.de:PowerPC, Michael Ellerman, linuxppc-dev, QEMU Developers, David Gibson On 3/9/21 6:08 PM, Daniel Henrique Barboza wrote: > > > On 3/9/21 12:33 PM, Cédric Le Goater wrote: >> On 3/8/21 6:13 PM, Greg Kurz wrote: >>> On Wed, 3 Mar 2021 18:48:50 +0100 >>> Cédric Le Goater <clg@kaod.org> wrote: >>> >>>> The 'chip_id' field of the XIVE CPU structure is used to choose a >>>> target for a source located on the same chip when possible. This field >>>> is assigned on the PowerNV platform using the "ibm,chip-id" property >>>> on pSeries under KVM when NUMA nodes are defined but it is undefined >>> >>> This sentence seems to have a syntax problem... like it is missing an >>> 'and' before 'on pSeries'. >> >> ah yes, or simply a comma. >> >>>> under PowerVM. The XIVE source structure has a similar field >>>> 'src_chip' which is only assigned on the PowerNV platform. >>>> >>>> cpu_to_node() returns a compatible value on all platforms, 0 being the >>>> default node. It will also give us the opportunity to set the affinity >>>> of a source on pSeries when we can localize them. >>>> >>> >>> IIUC this relies on the fact that the NUMA node id is == to chip id >>> on PowerNV, i.e. xc->chip_id which is passed to OPAL remain stable >>> with this change. >> >> Linux sets the NUMA node in numa_setup_cpu(). On pseries, the hcall >> H_HOME_NODE_ASSOCIATIVITY returns the node id if I am correct (Daniel >> in Cc:) > > That's correct. H_HOME_NODE_ASSOCIATIVITY returns not only the node_id, but > a list with the ibm,associativity domains of the CPU that "proc-no" (processor > identifier) is mapped to inside QEMU. > > node_id in this case, considering that we're working with a reference-points > of size 4, is the 4th element of the returned list. The last element is > "procno" itself. > > >> >> On PowerNV, Linux uses "ibm,associativity" property of the CPU to find >> the node id. This value is built from the chip id in OPAL, so the >> value returned by cpu_to_node(cpu) and the value of the "ibm,chip-id" >> property are unlikely to be different. >> >> cpu_to_node(cpu) is used in many places to allocate the structures >> locally to the owning node. XIVE is not an exception (see below in the >> same patch), it is better to be consistent and get the same information >> (node id) using the same routine. >> >> >> In Linux, "ibm,chip-id" is only used in low level PowerNV drivers : >> LPC, XSCOM, RNG, VAS, NX. XIVE should be in that list also but skiboot >> unifies the controllers of the system to only expose one the OS. This >> is problematic and should be changed but it's another topic. >> >> >>> On the other hand, you have the pSeries case under PowerVM that >>> doesn't xc->chip_id, which isn't passed to any hcall AFAICT. >> >> yes "ibm,chip-id" is an OPAL concept unfortunately and it has no meaning >> under PAPR. xc->chip_id on pseries (PowerVM) will contains an invalid >> chip id. >> >> QEMU/KVM exposes "ibm,chip-id" but it's not used. (its value is not >> always correct btw) > > > If you have a way to reliably reproduce this, let me know and I'll fix it > up in QEMU. with : -smp 4,cores=1,maxcpus=8 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 # dmesg | grep numa [ 0.013106] numa: Node 0 CPUs: 0-1 [ 0.013136] numa: Node 1 CPUs: 2-3 # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id ibm,chip-id = <0x01>; ibm,chip-id = <0x02>; ibm,chip-id = <0x00>; ibm,chip-id = <0x03>; with : -smp 4,cores=4,maxcpus=8,threads=1 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 # dmesg | grep numa [ 0.013106] numa: Node 0 CPUs: 0-1 [ 0.013136] numa: Node 1 CPUs: 2-3 # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id ibm,chip-id = <0x00>; ibm,chip-id = <0x00>; ibm,chip-id = <0x00>; ibm,chip-id = <0x00>; I think we should simply remove "ibm,chip-id" since it's not used and not in the PAPR spec. Thanks, C. > > Thanks, > > > DHB > > >> >>> It looks like the chip id is only used for localization purpose in >>> this case, right ? >> >> Yes and PAPR sources are not localized. So it's not used. MSI sources >> could be if we rewrote the MSI driver. >> >>> In this case, what about doing this change for pSeries only, >>> somewhere in spapr.c ? >> >> The IPI code is common to all platforms and all have the same issue. >> I rather not. >> >> Thanks, >> >> C. >> >>>> Signed-off-by: Cédric Le Goater <clg@kaod.org> >>>> --- >>>> arch/powerpc/sysdev/xive/common.c | 7 +------ >>>> 1 file changed, 1 insertion(+), 6 deletions(-) >>>> >>>> diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c >>>> index 595310e056f4..b8e456da28aa 100644 >>>> --- a/arch/powerpc/sysdev/xive/common.c >>>> +++ b/arch/powerpc/sysdev/xive/common.c >>>> @@ -1335,16 +1335,11 @@ static int xive_prepare_cpu(unsigned int cpu) >>>> xc = per_cpu(xive_cpu, cpu); >>>> if (!xc) { >>>> - struct device_node *np; >>>> - >>>> xc = kzalloc_node(sizeof(struct xive_cpu), >>>> GFP_KERNEL, cpu_to_node(cpu)); >>>> if (!xc) >>>> return -ENOMEM; >>>> - np = of_get_cpu_node(cpu, NULL); >>>> - if (np) >>>> - xc->chip_id = of_get_ibm_chip_id(np); >>>> - of_node_put(np); >>>> + xc->chip_id = cpu_to_node(cpu); >>>> xc->hw_ipi = XIVE_BAD_IRQ; >>>> per_cpu(xive_cpu, cpu) = xc; >>> >> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/8] powerpc/xive: Use cpu_to_node() instead of ibm,chip-id property 2021-03-09 17:26 ` [PATCH v2 1/8] powerpc/xive: Use cpu_to_node() instead of ibm,chip-id property Cédric Le Goater @ 2021-03-12 1:55 ` David Gibson 2021-03-12 9:53 ` Cédric Le Goater 0 siblings, 1 reply; 6+ messages in thread From: David Gibson @ 2021-03-12 1:55 UTC (permalink / raw) To: Cédric Le Goater Cc: Michael Ellerman, Daniel Henrique Barboza, Greg Kurz, QEMU Developers, list@suse.de:PowerPC, linuxppc-dev On Tue, 9 Mar 2021 18:26:35 +0100 Cédric Le Goater <clg@kaod.org> wrote: > On 3/9/21 6:08 PM, Daniel Henrique Barboza wrote: > > > > > > On 3/9/21 12:33 PM, Cédric Le Goater wrote: > >> On 3/8/21 6:13 PM, Greg Kurz wrote: > >>> On Wed, 3 Mar 2021 18:48:50 +0100 > >>> Cédric Le Goater <clg@kaod.org> wrote: > >>> > >>>> The 'chip_id' field of the XIVE CPU structure is used to choose a > >>>> target for a source located on the same chip when possible. This field > >>>> is assigned on the PowerNV platform using the "ibm,chip-id" property > >>>> on pSeries under KVM when NUMA nodes are defined but it is undefined > >>> > >>> This sentence seems to have a syntax problem... like it is missing an > >>> 'and' before 'on pSeries'. > >> > >> ah yes, or simply a comma. > >> > >>>> under PowerVM. The XIVE source structure has a similar field > >>>> 'src_chip' which is only assigned on the PowerNV platform. > >>>> > >>>> cpu_to_node() returns a compatible value on all platforms, 0 being the > >>>> default node. It will also give us the opportunity to set the affinity > >>>> of a source on pSeries when we can localize them. > >>>> > >>> > >>> IIUC this relies on the fact that the NUMA node id is == to chip id > >>> on PowerNV, i.e. xc->chip_id which is passed to OPAL remain stable > >>> with this change. > >> > >> Linux sets the NUMA node in numa_setup_cpu(). On pseries, the hcall > >> H_HOME_NODE_ASSOCIATIVITY returns the node id if I am correct (Daniel > >> in Cc:) > [...] > >> > >> On PowerNV, Linux uses "ibm,associativity" property of the CPU to find > >> the node id. This value is built from the chip id in OPAL, so the > >> value returned by cpu_to_node(cpu) and the value of the "ibm,chip-id" > >> property are unlikely to be different. > >> > >> cpu_to_node(cpu) is used in many places to allocate the structures > >> locally to the owning node. XIVE is not an exception (see below in the > >> same patch), it is better to be consistent and get the same information > >> (node id) using the same routine. > >> > >> > >> In Linux, "ibm,chip-id" is only used in low level PowerNV drivers : > >> LPC, XSCOM, RNG, VAS, NX. XIVE should be in that list also but skiboot > >> unifies the controllers of the system to only expose one the OS. This > >> is problematic and should be changed but it's another topic. > >> > >> > >>> On the other hand, you have the pSeries case under PowerVM that > >>> doesn't xc->chip_id, which isn't passed to any hcall AFAICT. > >> > >> yes "ibm,chip-id" is an OPAL concept unfortunately and it has no meaning > >> under PAPR. xc->chip_id on pseries (PowerVM) will contains an invalid > >> chip id. > >> > >> QEMU/KVM exposes "ibm,chip-id" but it's not used. (its value is not > >> always correct btw) > > > > > > If you have a way to reliably reproduce this, let me know and I'll fix it > > up in QEMU. > > with : > > -smp 4,cores=1,maxcpus=8 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 > > # dmesg | grep numa > [ 0.013106] numa: Node 0 CPUs: 0-1 > [ 0.013136] numa: Node 1 CPUs: 2-3 > > # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id > ibm,chip-id = <0x01>; > ibm,chip-id = <0x02>; > ibm,chip-id = <0x00>; > ibm,chip-id = <0x03>; > > with : > > -smp 4,cores=4,maxcpus=8,threads=1 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 > > # dmesg | grep numa > [ 0.013106] numa: Node 0 CPUs: 0-1 > [ 0.013136] numa: Node 1 CPUs: 2-3 > > # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id > ibm,chip-id = <0x00>; > ibm,chip-id = <0x00>; > ibm,chip-id = <0x00>; > ibm,chip-id = <0x00>; > > I think we should simply remove "ibm,chip-id" since it's not used and > not in the PAPR spec. As I mentioned to Daniel on our call this morning, oddly it *does* appear to be used in the RHEL kernel, even though that's 4.18 based. This patch seems to have caused a minor regression; not in the identification of NUMA nodes, but in the number of sockets shown be lscpu, etc. See https://bugzilla.redhat.com/show_bug.cgi?id=1934421 for more information. Since the value was used by some PAPR kernels - even if they shouldn't have - I think we should only remove this for newer machine types. We also need to check what we're not supplying that the guest kernel is showing a different number of sockets than specified on the qemu command line. > > Thanks, > > C. > > > > [...] > [...] > [...] > [...] > [...] > [...] > [...] > [...] > [...] > -- David Gibson <dgibson@redhat.com> Principal Software Engineer, Virtualization, Red Hat ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/8] powerpc/xive: Use cpu_to_node() instead of ibm,chip-id property 2021-03-12 1:55 ` David Gibson @ 2021-03-12 9:53 ` Cédric Le Goater 2021-03-12 12:18 ` Daniel Henrique Barboza 0 siblings, 1 reply; 6+ messages in thread From: Cédric Le Goater @ 2021-03-12 9:53 UTC (permalink / raw) To: David Gibson Cc: Michael Ellerman, Daniel Henrique Barboza, Greg Kurz, QEMU Developers, list@suse.de:PowerPC, linuxppc-dev On 3/12/21 2:55 AM, David Gibson wrote: > On Tue, 9 Mar 2021 18:26:35 +0100 > Cédric Le Goater <clg@kaod.org> wrote: > >> On 3/9/21 6:08 PM, Daniel Henrique Barboza wrote: >>> >>> >>> On 3/9/21 12:33 PM, Cédric Le Goater wrote: >>>> On 3/8/21 6:13 PM, Greg Kurz wrote: >>>>> On Wed, 3 Mar 2021 18:48:50 +0100 >>>>> Cédric Le Goater <clg@kaod.org> wrote: >>>>> >>>>>> The 'chip_id' field of the XIVE CPU structure is used to choose a >>>>>> target for a source located on the same chip when possible. This field >>>>>> is assigned on the PowerNV platform using the "ibm,chip-id" property >>>>>> on pSeries under KVM when NUMA nodes are defined but it is undefined >>>>> >>>>> This sentence seems to have a syntax problem... like it is missing an >>>>> 'and' before 'on pSeries'. >>>> >>>> ah yes, or simply a comma. >>>> >>>>>> under PowerVM. The XIVE source structure has a similar field >>>>>> 'src_chip' which is only assigned on the PowerNV platform. >>>>>> >>>>>> cpu_to_node() returns a compatible value on all platforms, 0 being the >>>>>> default node. It will also give us the opportunity to set the affinity >>>>>> of a source on pSeries when we can localize them. >>>>>> >>>>> >>>>> IIUC this relies on the fact that the NUMA node id is == to chip id >>>>> on PowerNV, i.e. xc->chip_id which is passed to OPAL remain stable >>>>> with this change. >>>> >>>> Linux sets the NUMA node in numa_setup_cpu(). On pseries, the hcall >>>> H_HOME_NODE_ASSOCIATIVITY returns the node id if I am correct (Daniel >>>> in Cc:) >> [...] >>>> >>>> On PowerNV, Linux uses "ibm,associativity" property of the CPU to find >>>> the node id. This value is built from the chip id in OPAL, so the >>>> value returned by cpu_to_node(cpu) and the value of the "ibm,chip-id" >>>> property are unlikely to be different. >>>> >>>> cpu_to_node(cpu) is used in many places to allocate the structures >>>> locally to the owning node. XIVE is not an exception (see below in the >>>> same patch), it is better to be consistent and get the same information >>>> (node id) using the same routine. >>>> >>>> >>>> In Linux, "ibm,chip-id" is only used in low level PowerNV drivers : >>>> LPC, XSCOM, RNG, VAS, NX. XIVE should be in that list also but skiboot >>>> unifies the controllers of the system to only expose one the OS. This >>>> is problematic and should be changed but it's another topic. >>>> >>>> >>>>> On the other hand, you have the pSeries case under PowerVM that >>>>> doesn't xc->chip_id, which isn't passed to any hcall AFAICT. >>>> >>>> yes "ibm,chip-id" is an OPAL concept unfortunately and it has no meaning >>>> under PAPR. xc->chip_id on pseries (PowerVM) will contains an invalid >>>> chip id. >>>> >>>> QEMU/KVM exposes "ibm,chip-id" but it's not used. (its value is not >>>> always correct btw) >>> >>> >>> If you have a way to reliably reproduce this, let me know and I'll fix it >>> up in QEMU. >> >> with : >> >> -smp 4,cores=1,maxcpus=8 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 >> >> # dmesg | grep numa >> [ 0.013106] numa: Node 0 CPUs: 0-1 >> [ 0.013136] numa: Node 1 CPUs: 2-3 >> >> # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id >> ibm,chip-id = <0x01>; >> ibm,chip-id = <0x02>; >> ibm,chip-id = <0x00>; >> ibm,chip-id = <0x03>; >> >> with : >> >> -smp 4,cores=4,maxcpus=8,threads=1 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 >> >> # dmesg | grep numa >> [ 0.013106] numa: Node 0 CPUs: 0-1 >> [ 0.013136] numa: Node 1 CPUs: 2-3 >> >> # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id >> ibm,chip-id = <0x00>; >> ibm,chip-id = <0x00>; >> ibm,chip-id = <0x00>; >> ibm,chip-id = <0x00>; >> >> I think we should simply remove "ibm,chip-id" since it's not used and >> not in the PAPR spec. > > As I mentioned to Daniel on our call this morning, oddly it *does* > appear to be used in the RHEL kernel, even though that's 4.18 based. > This patch seems to have caused a minor regression; not in the > identification of NUMA nodes, but in the number of sockets shown be > lscpu, etc. See https://bugzilla.redhat.com/show_bug.cgi?id=1934421 > for more information. Yes. The property "ibm,chip-id" is wrongly calculated in QEMU. If we remove it, we get with 4.18.0-295.el8.ppc64le or 5.12.0-rc2 : [root@localhost ~]# lscpu Architecture: ppc64le Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 4 Core(s) per socket: 16 Socket(s): 2 NUMA node(s): 2 Model: 2.2 (pvr 004e 1202) Model name: POWER9 (architected), altivec supported Hypervisor vendor: KVM Virtualization type: para L1d cache: 32K L1i cache: 32K NUMA node0 CPU(s): 0-63 NUMA node1 CPU(s): 64-127 [root@localhost ~]# grep . /sys/devices/system/cpu/*/topology/physical_package_id /sys/devices/system/cpu/cpu0/topology/physical_package_id:-1 /sys/devices/system/cpu/cpu100/topology/physical_package_id:-1 /sys/devices/system/cpu/cpu101/topology/physical_package_id:-1 /sys/devices/system/cpu/cpu102/topology/physical_package_id:-1 /sys/devices/system/cpu/cpu103/topology/physical_package_id:-1 .... "ibm,chip-id" is still being used on some occasion on pSeries machines. This is wrong :/ The problem is : #define topology_physical_package_id(cpu) (cpu_to_chip_id(cpu)) We should be using cpu_to_node(). C. > > Since the value was used by some PAPR kernels - even if they shouldn't > have - I think we should only remove this for newer machine types. We > also need to check what we're not supplying that the guest kernel is > showing a different number of sockets than specified on the qemu > command line. > >> >> Thanks, >> >> C. >> >> >> >> [...] >> [...] >> [...] >> [...] >> [...] >> [...] >> [...] >> [...] >> [...] >> > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/8] powerpc/xive: Use cpu_to_node() instead of ibm,chip-id property 2021-03-12 9:53 ` Cédric Le Goater @ 2021-03-12 12:18 ` Daniel Henrique Barboza 2021-03-12 13:03 ` Greg Kurz 2021-03-12 13:28 ` Cédric Le Goater 0 siblings, 2 replies; 6+ messages in thread From: Daniel Henrique Barboza @ 2021-03-12 12:18 UTC (permalink / raw) To: Cédric Le Goater, David Gibson Cc: list@suse.de:PowerPC, Michael Ellerman, linuxppc-dev, Greg Kurz, QEMU Developers On 3/12/21 6:53 AM, Cédric Le Goater wrote: > On 3/12/21 2:55 AM, David Gibson wrote: >> On Tue, 9 Mar 2021 18:26:35 +0100 >> Cédric Le Goater <clg@kaod.org> wrote: >> >>> On 3/9/21 6:08 PM, Daniel Henrique Barboza wrote: >>>> >>>> >>>> On 3/9/21 12:33 PM, Cédric Le Goater wrote: >>>>> On 3/8/21 6:13 PM, Greg Kurz wrote: >>>>>> On Wed, 3 Mar 2021 18:48:50 +0100 >>>>>> Cédric Le Goater <clg@kaod.org> wrote: >>>>>> >>>>>>> The 'chip_id' field of the XIVE CPU structure is used to choose a >>>>>>> target for a source located on the same chip when possible. This field >>>>>>> is assigned on the PowerNV platform using the "ibm,chip-id" property >>>>>>> on pSeries under KVM when NUMA nodes are defined but it is undefined >>>>>> >>>>>> This sentence seems to have a syntax problem... like it is missing an >>>>>> 'and' before 'on pSeries'. >>>>> >>>>> ah yes, or simply a comma. >>>>> >>>>>>> under PowerVM. The XIVE source structure has a similar field >>>>>>> 'src_chip' which is only assigned on the PowerNV platform. >>>>>>> >>>>>>> cpu_to_node() returns a compatible value on all platforms, 0 being the >>>>>>> default node. It will also give us the opportunity to set the affinity >>>>>>> of a source on pSeries when we can localize them. >>>>>>> >>>>>> >>>>>> IIUC this relies on the fact that the NUMA node id is == to chip id >>>>>> on PowerNV, i.e. xc->chip_id which is passed to OPAL remain stable >>>>>> with this change. >>>>> >>>>> Linux sets the NUMA node in numa_setup_cpu(). On pseries, the hcall >>>>> H_HOME_NODE_ASSOCIATIVITY returns the node id if I am correct (Daniel >>>>> in Cc:) >>> [...] >>>>> >>>>> On PowerNV, Linux uses "ibm,associativity" property of the CPU to find >>>>> the node id. This value is built from the chip id in OPAL, so the >>>>> value returned by cpu_to_node(cpu) and the value of the "ibm,chip-id" >>>>> property are unlikely to be different. >>>>> >>>>> cpu_to_node(cpu) is used in many places to allocate the structures >>>>> locally to the owning node. XIVE is not an exception (see below in the >>>>> same patch), it is better to be consistent and get the same information >>>>> (node id) using the same routine. >>>>> >>>>> >>>>> In Linux, "ibm,chip-id" is only used in low level PowerNV drivers : >>>>> LPC, XSCOM, RNG, VAS, NX. XIVE should be in that list also but skiboot >>>>> unifies the controllers of the system to only expose one the OS. This >>>>> is problematic and should be changed but it's another topic. >>>>> >>>>> >>>>>> On the other hand, you have the pSeries case under PowerVM that >>>>>> doesn't xc->chip_id, which isn't passed to any hcall AFAICT. >>>>> >>>>> yes "ibm,chip-id" is an OPAL concept unfortunately and it has no meaning >>>>> under PAPR. xc->chip_id on pseries (PowerVM) will contains an invalid >>>>> chip id. >>>>> >>>>> QEMU/KVM exposes "ibm,chip-id" but it's not used. (its value is not >>>>> always correct btw) >>>> >>>> >>>> If you have a way to reliably reproduce this, let me know and I'll fix it >>>> up in QEMU. >>> >>> with : >>> >>> -smp 4,cores=1,maxcpus=8 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 >>> >>> # dmesg | grep numa >>> [ 0.013106] numa: Node 0 CPUs: 0-1 >>> [ 0.013136] numa: Node 1 CPUs: 2-3 >>> >>> # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id >>> ibm,chip-id = <0x01>; >>> ibm,chip-id = <0x02>; >>> ibm,chip-id = <0x00>; >>> ibm,chip-id = <0x03>; >>> >>> with : >>> >>> -smp 4,cores=4,maxcpus=8,threads=1 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 >>> >>> # dmesg | grep numa >>> [ 0.013106] numa: Node 0 CPUs: 0-1 >>> [ 0.013136] numa: Node 1 CPUs: 2-3 >>> >>> # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id >>> ibm,chip-id = <0x00>; >>> ibm,chip-id = <0x00>; >>> ibm,chip-id = <0x00>; >>> ibm,chip-id = <0x00>; >>> >>> I think we should simply remove "ibm,chip-id" since it's not used and >>> not in the PAPR spec. >> >> As I mentioned to Daniel on our call this morning, oddly it *does* >> appear to be used in the RHEL kernel, even though that's 4.18 based. >> This patch seems to have caused a minor regression; not in the >> identification of NUMA nodes, but in the number of sockets shown be >> lscpu, etc. See https://bugzilla.redhat.com/show_bug.cgi?id=1934421 >> for more information. > > Yes. The property "ibm,chip-id" is wrongly calculated in QEMU. If we > remove it, we get with 4.18.0-295.el8.ppc64le or 5.12.0-rc2 : > > [root@localhost ~]# lscpu > Architecture: ppc64le > Byte Order: Little Endian > CPU(s): 128 > On-line CPU(s) list: 0-127 > Thread(s) per core: 4 > Core(s) per socket: 16 > Socket(s): 2 > NUMA node(s): 2 > Model: 2.2 (pvr 004e 1202) > Model name: POWER9 (architected), altivec supported > Hypervisor vendor: KVM > Virtualization type: para > L1d cache: 32K > L1i cache: 32K > NUMA node0 CPU(s): 0-63 > NUMA node1 CPU(s): 64-127 > > [root@localhost ~]# grep . /sys/devices/system/cpu/*/topology/physical_package_id > /sys/devices/system/cpu/cpu0/topology/physical_package_id:-1 > /sys/devices/system/cpu/cpu100/topology/physical_package_id:-1 > /sys/devices/system/cpu/cpu101/topology/physical_package_id:-1 > /sys/devices/system/cpu/cpu102/topology/physical_package_id:-1 > /sys/devices/system/cpu/cpu103/topology/physical_package_id:-1 > .... > > "ibm,chip-id" is still being used on some occasion on pSeries machines. > This is wrong :/ The problem is : > > #define topology_physical_package_id(cpu) (cpu_to_chip_id(cpu)) > > We should be using cpu_to_node(). IIUC the "real fix" then is this change you mentioned above, together with this xive patch as well, to stop using ibm,chip-id for good in the pserie kernel. With these changes QEMU can remove 'ibm,chip-id' from the pseries machine without impact. Is this correct? If that's the case, then I believe it's ok to go forward with the QEMU side change (just for 6.0.0 and newer machines). Or should I wait for the kernel changes to be merged upstream first? Thanks, DHB > > C. > >> >> Since the value was used by some PAPR kernels - even if they shouldn't >> have - I think we should only remove this for newer machine types. We >> also need to check what we're not supplying that the guest kernel is >> showing a different number of sockets than specified on the qemu >> command line. >> >>> >>> Thanks, >>> >>> C. >>> >>> >>> >>> [...] >>> [...] >>> [...] >>> [...] >>> [...] >>> [...] >>> [...] >>> [...] >>> [...] >>> >> >> > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/8] powerpc/xive: Use cpu_to_node() instead of ibm,chip-id property 2021-03-12 12:18 ` Daniel Henrique Barboza @ 2021-03-12 13:03 ` Greg Kurz 2021-03-12 13:28 ` Cédric Le Goater 1 sibling, 0 replies; 6+ messages in thread From: Greg Kurz @ 2021-03-12 13:03 UTC (permalink / raw) To: Daniel Henrique Barboza Cc: Michael Ellerman, QEMU Developers, list@suse.de:PowerPC, Cédric Le Goater, David Gibson, linuxppc-dev On Fri, 12 Mar 2021 09:18:39 -0300 Daniel Henrique Barboza <danielhb@linux.ibm.com> wrote: > > > On 3/12/21 6:53 AM, Cédric Le Goater wrote: > > On 3/12/21 2:55 AM, David Gibson wrote: > >> On Tue, 9 Mar 2021 18:26:35 +0100 > >> Cédric Le Goater <clg@kaod.org> wrote: > >> > >>> On 3/9/21 6:08 PM, Daniel Henrique Barboza wrote: > >>>> > >>>> > >>>> On 3/9/21 12:33 PM, Cédric Le Goater wrote: > >>>>> On 3/8/21 6:13 PM, Greg Kurz wrote: > >>>>>> On Wed, 3 Mar 2021 18:48:50 +0100 > >>>>>> Cédric Le Goater <clg@kaod.org> wrote: > >>>>>> > >>>>>>> The 'chip_id' field of the XIVE CPU structure is used to choose a > >>>>>>> target for a source located on the same chip when possible. This field > >>>>>>> is assigned on the PowerNV platform using the "ibm,chip-id" property > >>>>>>> on pSeries under KVM when NUMA nodes are defined but it is undefined > >>>>>> > >>>>>> This sentence seems to have a syntax problem... like it is missing an > >>>>>> 'and' before 'on pSeries'. > >>>>> > >>>>> ah yes, or simply a comma. > >>>>> > >>>>>>> under PowerVM. The XIVE source structure has a similar field > >>>>>>> 'src_chip' which is only assigned on the PowerNV platform. > >>>>>>> > >>>>>>> cpu_to_node() returns a compatible value on all platforms, 0 being the > >>>>>>> default node. It will also give us the opportunity to set the affinity > >>>>>>> of a source on pSeries when we can localize them. > >>>>>>> > >>>>>> > >>>>>> IIUC this relies on the fact that the NUMA node id is == to chip id > >>>>>> on PowerNV, i.e. xc->chip_id which is passed to OPAL remain stable > >>>>>> with this change. > >>>>> > >>>>> Linux sets the NUMA node in numa_setup_cpu(). On pseries, the hcall > >>>>> H_HOME_NODE_ASSOCIATIVITY returns the node id if I am correct (Daniel > >>>>> in Cc:) > >>> [...] > >>>>> > >>>>> On PowerNV, Linux uses "ibm,associativity" property of the CPU to find > >>>>> the node id. This value is built from the chip id in OPAL, so the > >>>>> value returned by cpu_to_node(cpu) and the value of the "ibm,chip-id" > >>>>> property are unlikely to be different. > >>>>> > >>>>> cpu_to_node(cpu) is used in many places to allocate the structures > >>>>> locally to the owning node. XIVE is not an exception (see below in the > >>>>> same patch), it is better to be consistent and get the same information > >>>>> (node id) using the same routine. > >>>>> > >>>>> > >>>>> In Linux, "ibm,chip-id" is only used in low level PowerNV drivers : > >>>>> LPC, XSCOM, RNG, VAS, NX. XIVE should be in that list also but skiboot > >>>>> unifies the controllers of the system to only expose one the OS. This > >>>>> is problematic and should be changed but it's another topic. > >>>>> > >>>>> > >>>>>> On the other hand, you have the pSeries case under PowerVM that > >>>>>> doesn't xc->chip_id, which isn't passed to any hcall AFAICT. > >>>>> > >>>>> yes "ibm,chip-id" is an OPAL concept unfortunately and it has no meaning > >>>>> under PAPR. xc->chip_id on pseries (PowerVM) will contains an invalid > >>>>> chip id. > >>>>> > >>>>> QEMU/KVM exposes "ibm,chip-id" but it's not used. (its value is not > >>>>> always correct btw) > >>>> > >>>> > >>>> If you have a way to reliably reproduce this, let me know and I'll fix it > >>>> up in QEMU. > >>> > >>> with : > >>> > >>> -smp 4,cores=1,maxcpus=8 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 > >>> > >>> # dmesg | grep numa > >>> [ 0.013106] numa: Node 0 CPUs: 0-1 > >>> [ 0.013136] numa: Node 1 CPUs: 2-3 > >>> > >>> # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id > >>> ibm,chip-id = <0x01>; > >>> ibm,chip-id = <0x02>; > >>> ibm,chip-id = <0x00>; > >>> ibm,chip-id = <0x03>; > >>> > >>> with : > >>> > >>> -smp 4,cores=4,maxcpus=8,threads=1 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 > >>> > >>> # dmesg | grep numa > >>> [ 0.013106] numa: Node 0 CPUs: 0-1 > >>> [ 0.013136] numa: Node 1 CPUs: 2-3 > >>> > >>> # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id > >>> ibm,chip-id = <0x00>; > >>> ibm,chip-id = <0x00>; > >>> ibm,chip-id = <0x00>; > >>> ibm,chip-id = <0x00>; > >>> > >>> I think we should simply remove "ibm,chip-id" since it's not used and > >>> not in the PAPR spec. > >> > >> As I mentioned to Daniel on our call this morning, oddly it *does* > >> appear to be used in the RHEL kernel, even though that's 4.18 based. > >> This patch seems to have caused a minor regression; not in the > >> identification of NUMA nodes, but in the number of sockets shown be > >> lscpu, etc. See https://bugzilla.redhat.com/show_bug.cgi?id=1934421 > >> for more information. > > > > Yes. The property "ibm,chip-id" is wrongly calculated in QEMU. If we > > remove it, we get with 4.18.0-295.el8.ppc64le or 5.12.0-rc2 : > > > > [root@localhost ~]# lscpu > > Architecture: ppc64le > > Byte Order: Little Endian > > CPU(s): 128 > > On-line CPU(s) list: 0-127 > > Thread(s) per core: 4 > > Core(s) per socket: 16 > > Socket(s): 2 > > NUMA node(s): 2 > > Model: 2.2 (pvr 004e 1202) > > Model name: POWER9 (architected), altivec supported > > Hypervisor vendor: KVM > > Virtualization type: para > > L1d cache: 32K > > L1i cache: 32K > > NUMA node0 CPU(s): 0-63 > > NUMA node1 CPU(s): 64-127 > > > > [root@localhost ~]# grep . /sys/devices/system/cpu/*/topology/physical_package_id > > /sys/devices/system/cpu/cpu0/topology/physical_package_id:-1 > > /sys/devices/system/cpu/cpu100/topology/physical_package_id:-1 > > /sys/devices/system/cpu/cpu101/topology/physical_package_id:-1 > > /sys/devices/system/cpu/cpu102/topology/physical_package_id:-1 > > /sys/devices/system/cpu/cpu103/topology/physical_package_id:-1 > > .... > > > > "ibm,chip-id" is still being used on some occasion on pSeries machines. > > This is wrong :/ The problem is : > > > > #define topology_physical_package_id(cpu) (cpu_to_chip_id(cpu)) > > > > We should be using cpu_to_node(). > > > IIUC the "real fix" then is this change you mentioned above, together with > this xive patch as well, to stop using ibm,chip-id for good in the pserie > kernel. With these changes QEMU can remove 'ibm,chip-id' from the pseries > machine without impact. Is this correct? > > If that's the case, then I believe it's ok to go forward with the QEMU side > change (just for 6.0.0 and newer machines). Or should I wait for the kernel > changes to be merged upstream first? > I'd say the latter since this is a breaking change and people will want to identify the upstream commits they have to backport to their kernel in order to support the disappearance of "ibm,chip-id". Cheers, -- Greg > > Thanks, > > > DHB > > > > > > C. > > > >> > >> Since the value was used by some PAPR kernels - even if they shouldn't > >> have - I think we should only remove this for newer machine types. We > >> also need to check what we're not supplying that the guest kernel is > >> showing a different number of sockets than specified on the qemu > >> command line. > >> > >>> > >>> Thanks, > >>> > >>> C. > >>> > >>> > >>> > >>> [...] > >>> [...] > >>> [...] > >>> [...] > >>> [...] > >>> [...] > >>> [...] > >>> [...] > >>> [...] > >>> > >> > >> > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/8] powerpc/xive: Use cpu_to_node() instead of ibm,chip-id property 2021-03-12 12:18 ` Daniel Henrique Barboza 2021-03-12 13:03 ` Greg Kurz @ 2021-03-12 13:28 ` Cédric Le Goater 1 sibling, 0 replies; 6+ messages in thread From: Cédric Le Goater @ 2021-03-12 13:28 UTC (permalink / raw) To: Daniel Henrique Barboza, David Gibson Cc: list@suse.de:PowerPC, Michael Ellerman, linuxppc-dev, Greg Kurz, QEMU Developers On 3/12/21 1:18 PM, Daniel Henrique Barboza wrote: > > > On 3/12/21 6:53 AM, Cédric Le Goater wrote: >> On 3/12/21 2:55 AM, David Gibson wrote: >>> On Tue, 9 Mar 2021 18:26:35 +0100 >>> Cédric Le Goater <clg@kaod.org> wrote: >>> >>>> On 3/9/21 6:08 PM, Daniel Henrique Barboza wrote: >>>>> >>>>> >>>>> On 3/9/21 12:33 PM, Cédric Le Goater wrote: >>>>>> On 3/8/21 6:13 PM, Greg Kurz wrote: >>>>>>> On Wed, 3 Mar 2021 18:48:50 +0100 >>>>>>> Cédric Le Goater <clg@kaod.org> wrote: >>>>>>> >>>>>>>> The 'chip_id' field of the XIVE CPU structure is used to choose a >>>>>>>> target for a source located on the same chip when possible. This field >>>>>>>> is assigned on the PowerNV platform using the "ibm,chip-id" property >>>>>>>> on pSeries under KVM when NUMA nodes are defined but it is undefined >>>>>>> >>>>>>> This sentence seems to have a syntax problem... like it is missing an >>>>>>> 'and' before 'on pSeries'. >>>>>> >>>>>> ah yes, or simply a comma. >>>>>> >>>>>>>> under PowerVM. The XIVE source structure has a similar field >>>>>>>> 'src_chip' which is only assigned on the PowerNV platform. >>>>>>>> >>>>>>>> cpu_to_node() returns a compatible value on all platforms, 0 being the >>>>>>>> default node. It will also give us the opportunity to set the affinity >>>>>>>> of a source on pSeries when we can localize them. >>>>>>>> >>>>>>> >>>>>>> IIUC this relies on the fact that the NUMA node id is == to chip id >>>>>>> on PowerNV, i.e. xc->chip_id which is passed to OPAL remain stable >>>>>>> with this change. >>>>>> >>>>>> Linux sets the NUMA node in numa_setup_cpu(). On pseries, the hcall >>>>>> H_HOME_NODE_ASSOCIATIVITY returns the node id if I am correct (Daniel >>>>>> in Cc:) >>>> [...] >>>>>> >>>>>> On PowerNV, Linux uses "ibm,associativity" property of the CPU to find >>>>>> the node id. This value is built from the chip id in OPAL, so the >>>>>> value returned by cpu_to_node(cpu) and the value of the "ibm,chip-id" >>>>>> property are unlikely to be different. >>>>>> >>>>>> cpu_to_node(cpu) is used in many places to allocate the structures >>>>>> locally to the owning node. XIVE is not an exception (see below in the >>>>>> same patch), it is better to be consistent and get the same information >>>>>> (node id) using the same routine. >>>>>> >>>>>> >>>>>> In Linux, "ibm,chip-id" is only used in low level PowerNV drivers : >>>>>> LPC, XSCOM, RNG, VAS, NX. XIVE should be in that list also but skiboot >>>>>> unifies the controllers of the system to only expose one the OS. This >>>>>> is problematic and should be changed but it's another topic. >>>>>> >>>>>> >>>>>>> On the other hand, you have the pSeries case under PowerVM that >>>>>>> doesn't xc->chip_id, which isn't passed to any hcall AFAICT. >>>>>> >>>>>> yes "ibm,chip-id" is an OPAL concept unfortunately and it has no meaning >>>>>> under PAPR. xc->chip_id on pseries (PowerVM) will contains an invalid >>>>>> chip id. >>>>>> >>>>>> QEMU/KVM exposes "ibm,chip-id" but it's not used. (its value is not >>>>>> always correct btw) >>>>> >>>>> >>>>> If you have a way to reliably reproduce this, let me know and I'll fix it >>>>> up in QEMU. >>>> >>>> with : >>>> >>>> -smp 4,cores=1,maxcpus=8 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 >>>> >>>> # dmesg | grep numa >>>> [ 0.013106] numa: Node 0 CPUs: 0-1 >>>> [ 0.013136] numa: Node 1 CPUs: 2-3 >>>> >>>> # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id >>>> ibm,chip-id = <0x01>; >>>> ibm,chip-id = <0x02>; >>>> ibm,chip-id = <0x00>; >>>> ibm,chip-id = <0x03>; >>>> >>>> with : >>>> >>>> -smp 4,cores=4,maxcpus=8,threads=1 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 >>>> >>>> # dmesg | grep numa >>>> [ 0.013106] numa: Node 0 CPUs: 0-1 >>>> [ 0.013136] numa: Node 1 CPUs: 2-3 >>>> >>>> # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id >>>> ibm,chip-id = <0x00>; >>>> ibm,chip-id = <0x00>; >>>> ibm,chip-id = <0x00>; >>>> ibm,chip-id = <0x00>; >>>> >>>> I think we should simply remove "ibm,chip-id" since it's not used and >>>> not in the PAPR spec. >>> >>> As I mentioned to Daniel on our call this morning, oddly it *does* >>> appear to be used in the RHEL kernel, even though that's 4.18 based. >>> This patch seems to have caused a minor regression; not in the >>> identification of NUMA nodes, but in the number of sockets shown be >>> lscpu, etc. See https://bugzilla.redhat.com/show_bug.cgi?id=1934421 >>> for more information. >> >> Yes. The property "ibm,chip-id" is wrongly calculated in QEMU. If we >> remove it, we get with 4.18.0-295.el8.ppc64le or 5.12.0-rc2 : >> >> [root@localhost ~]# lscpu >> Architecture: ppc64le >> Byte Order: Little Endian >> CPU(s): 128 >> On-line CPU(s) list: 0-127 >> Thread(s) per core: 4 >> Core(s) per socket: 16 >> Socket(s): 2 >> NUMA node(s): 2 >> Model: 2.2 (pvr 004e 1202) >> Model name: POWER9 (architected), altivec supported >> Hypervisor vendor: KVM >> Virtualization type: para >> L1d cache: 32K >> L1i cache: 32K >> NUMA node0 CPU(s): 0-63 >> NUMA node1 CPU(s): 64-127 >> >> [root@localhost ~]# grep . /sys/devices/system/cpu/*/topology/physical_package_id >> /sys/devices/system/cpu/cpu0/topology/physical_package_id:-1 >> /sys/devices/system/cpu/cpu100/topology/physical_package_id:-1 >> /sys/devices/system/cpu/cpu101/topology/physical_package_id:-1 >> /sys/devices/system/cpu/cpu102/topology/physical_package_id:-1 >> /sys/devices/system/cpu/cpu103/topology/physical_package_id:-1 >> .... >> >> "ibm,chip-id" is still being used on some occasion on pSeries machines. >> This is wrong :/ The problem is : >> >> #define topology_physical_package_id(cpu) (cpu_to_chip_id(cpu)) >> >> We should be using cpu_to_node(). > > > IIUC the "real fix" then is this change you mentioned above, together with > this xive patch as well, These are independent. The XIVE patch just raised the issue because it's another usage example of cpu_to_chip_id() or directly "ibm,chip-id" in the XIVE case, on a pseries machine. The use of cpu_to_node(cpu) for topology_physical_package_id(cpu) is a fix for the sysfs issue reported in the redhat BZ. > to stop using ibm,chip-id for good in the pserie > kernel. With these changes QEMU can remove 'ibm,chip-id' from the pseries > machine without impact. Is this correct? Linux is already "broken" on PowerVM today since we don't have the "ibm,chip-id" property. QEMU is just hiding the problem on KVM. But we have to be bug compatible :) if the QEMU fix is under the pseries-6.x machine we should be fine. > If that's the case, then I believe it's ok to go forward with the QEMU side > change (just for 6.0.0 and newer machines). Or should I wait for the kernel > changes to be merged upstream first? Once Linux is fixed, we shouldn't care if QEMU exports 'ibm,chip-id' or not. I don't think the order is very important. These are independent. C. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-03-12 13:30 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <20210303174857.1760393-1-clg@kaod.org> [not found] ` <20210303174857.1760393-2-clg@kaod.org> [not found] ` <20210308181359.789c143b@bahia.lan> [not found] ` <8dd98e22-1f10-e87b-3fe3-e786bc9a8d71@kaod.org> [not found] ` <3180b5c6-e61f-9c5f-3c80-f10e69dc5785@linux.ibm.com> 2021-03-09 17:26 ` [PATCH v2 1/8] powerpc/xive: Use cpu_to_node() instead of ibm,chip-id property Cédric Le Goater 2021-03-12 1:55 ` David Gibson 2021-03-12 9:53 ` Cédric Le Goater 2021-03-12 12:18 ` Daniel Henrique Barboza 2021-03-12 13:03 ` Greg Kurz 2021-03-12 13:28 ` Cédric Le Goater
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).