Re: [PATCH v3 00/18] APIC ID fixes for AMD EPYC CPU models

From: Igor Mammedov <imammedo@redhat.com>
To: Babu Moger <babu.moger@amd.com>
Cc: ehabkost@redhat.com, mst@redhat.com, armbru@redhat.com,
	qemu-devel@nongnu.org, pbonzini@redhat.com, rth@twiddle.net
Subject: Re: [PATCH v3 00/18] APIC ID fixes for AMD EPYC CPU models
Date: Tue, 4 Feb 2020 09:02:30 +0100
Message-ID: <20200204090230.28f31a87@redhat.com>
In-Reply-To: <b493a4f4-48de-79a7-00d5-119fbe789879@amd.com>

On Mon, 3 Feb 2020 13:31:29 -0600
Babu Moger <babu.moger@amd.com> wrote:

> On 2/3/20 8:59 AM, Igor Mammedov wrote:
> > On Tue, 03 Dec 2019 18:36:54 -0600
> > Babu Moger <babu.moger@amd.com> wrote:
> >   
> >> This series fixes APIC ID encoding problems on AMD EPYC CPUs.
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1728166
> >>
> >> Currently, the APIC ID is decoded based on the sequence
> >> sockets->dies->cores->threads. This works for most standard AMD and other
> >> vendors' configurations, but this decoding sequence does not follow that of
> >> AMD's APIC ID enumeration strictly. In some cases this can cause CPU topology
> >> inconsistency.  When booting a guest VM, the kernel tries to validate the
> >> topology, and finds it inconsistent with the enumeration of EPYC cpu models.
> >>
> >> To fix the problem we need to build the topology as per the Processor
> >> Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1
> >> Processors. It is available at https://www.amd.com/system/files/TechDocs/55570-B1_PUB.zip
> >>
> >> Here is the text from the PPR.
> >> Operating systems are expected to use Core::X86::Cpuid::SizeId[ApicIdSize], the
> >> number of least significant bits in the Initial APIC ID that indicate core ID
> >> within a processor, in constructing per-core CPUID masks.
> >> Core::X86::Cpuid::SizeId[ApicIdSize] determines the maximum number of cores
> >> (MNC) that the processor could theoretically support, not the actual number of
> >> cores that are actually implemented or enabled on the processor, as indicated
> >> by Core::X86::Cpuid::SizeId[NC].
> >> Each Core::X86::Apic::ApicId[ApicId] register is preset as follows:
> >> • ApicId[6] = Socket ID.
> >> • ApicId[5:4] = Node ID.
> >> • ApicId[3] = Logical CCX L3 complex ID
> >> • ApicId[2:0]= (SMT) ? {LogicalCoreID[1:0],ThreadId} : {1'b0,LogicalCoreID[1:0]}  
> > 
> > 
> > After checking out all the patches and some pondering, the approach
> > used here looks too intrusive to me for the task at hand, especially
> > where it comes to generic code.
> > 
> > (Skip to ==== to see the suggestion on how to simplify, without
> > reading the reasoning behind it first)
> > 
> > Let's look for a way to simplify it a little bit.
> > 
> > So the problems we are trying to solve:
> >  1: calculate APIC IDs based on cpu type (to be more specific: for EPYC based CPUs)
> >  2: it depends on knowing the total number of numa nodes.
> > 
> > Externally, the workflow looks like the following:
> >   1. user provides -smp x,sockets,cores,...,maxcpus
> >       that's used by possible_cpu_arch_ids() singleton to build list of
> >       possible CPUs (which is available to user via command 'hotpluggable-cpus')
> > 
> >       The hook could be called very early, and the possible_cpus data
> >       might not be complete. It builds a list of possible CPUs which
> >       the user could modify later.
> > 
> >   2.1 user uses "-numa cpu,node-id=x,..." or legacy "-numa node,node_id=x,cpus="
> >       options to assign cpus to nodes, which one way or another call
> >       machine_set_cpu_numa_node(). The latter updates the 'possible_cpus' list
> >       with node information. This happens early, when the total number of
> >       nodes is not yet available.
> > 
> >   2.2 user does not provide explicit node mappings for CPUs.
> >       QEMU steps in and assigns possible cpus to nodes in machine_numa_finish_cpu_init()
> >       (using the same machine_set_cpu_numa_node()) right before calling the
> >       board-specific machine init(). At that time the total number of nodes is known.
> > 
> > In cases 1 -- 2.1, 'arch_id' in the 'possible_cpus' list doesn't have to be defined
> > before the board's init() is run.
> > 
> > In case 2.2 it calls get_default_cpu_node_id() -> x86_get_default_cpu_node_id(),
> > which uses arch_id to calculate the numa node.
> > But then the question is: does it have to use the APIC ID, or could it infer the
> > 'pkg_id' it's after from the ms->possible_cpus->cpus[i].props data?  
> 
> Not sure if I got the question right. In this case because the numa
> information is not provided all the cpus are assigned to only one node.
> The apic id is used here to get the correct pkg_id.

apicid was composed from the socket/core/thread[/die] tuple, which is what
cpus[i].props is.

The question is whether we can compose just pkg_id from the same data, without
converting it to apicid and then "reverse engineering" it back into the
original data.

Or, as a more direct question: is socket-id the same as pkg_id?
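
To make the question concrete, here is a minimal sketch, assuming the default
sockets->dies->cores->threads packing with the socket in the topmost bits.
The struct and function names are hypothetical stand-ins and are not QEMU's
actual topology API. Under that assumption, extracting pkg_id from the
composed apicid just returns the socket_id that went in:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the tuple stored in cpus[i].props. */
typedef struct {
    unsigned socket_id, die_id, core_id, thread_id;
} TopoProps;

/* Hypothetical per-level counts, as they would come from -smp. */
typedef struct {
    unsigned dies, cores, threads;
} TopoCounts;

/* Number of bits needed to hold values 0..count-1 (count >= 1). */
static unsigned bits_for(unsigned count)
{
    unsigned bits = 0;
    while ((1u << bits) < count) {
        bits++;
    }
    return bits;
}

/* Bit offset of the package (socket) field in the composed ID. */
static unsigned pkg_offset(const TopoCounts *c)
{
    return bits_for(c->threads) + bits_for(c->cores) + bits_for(c->dies);
}

/* Compose an ID in socket | die | core | thread order. */
static uint32_t apicid_from_props(const TopoCounts *c, const TopoProps *p)
{
    unsigned core_off = bits_for(c->threads);
    unsigned die_off = core_off + bits_for(c->cores);

    return (p->socket_id << pkg_offset(c)) |
           (p->die_id << die_off) |
           (p->core_id << core_off) |
           p->thread_id;
}

/* "Reverse engineer" the package ID back out of the composed ID. */
static unsigned pkg_id_from_apicid(const TopoCounts *c, uint32_t apicid)
{
    return apicid >> pkg_offset(c);
}
```

With 1 die, 6 cores and 2 threads per socket, the package field starts at
bit 4, and pkg_id_from_apicid() returns whatever socket_id was passed in.
Whether the same equality holds for the EPYC-specific encoding is exactly
the open question here.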


> 
> >   
> > With that out of the way APIC ID will be used only during board's init(),
> > so board could update possible_cpus with valid APIC IDs at the start of
> > x86_cpus_init().
> > 
> > ====
> > In a nutshell, it would be much easier to do the following:
> > 
> >  1. make x86_get_default_cpu_node_id() independent of the APIC ID or,
> >     if that is impossible, as an alternative recompute APIC IDs there
> >     if the cpu type is EPYC based (since the number of nodes is already known)
> >  2. recompute APIC IDs in x86_cpus_init() if the cpu type is EPYC based
> > 
> > This way one doesn't need to touch generic numa code or introduce an
> > x86-specific init_apicid_fn() hook into generic code, and the
> > x86/EPYC nuances stay contained within x86 code only.  
> 
> I was kind of already working in a similar direction in v4.
> 1. We have already split the numa initialization in patch #12 (Split the
> numa initialization). This way we know exactly how many numa nodes there
> are beforehand.

I suggest dropping that patch. It's the one that touches generic numa
code and adds more legacy-based extensions like cpu_indexes,
which I'd like to get rid of to begin with, so that only -numa cpu is left.

I think it's not necessary to touch numa code at all for apicid generation
purposes, as I tried to explain above. We should be able to keep
this x86-only business.
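
As an illustration of how small the x86-only piece could be, here is a
minimal sketch of the PPR layout quoted at the top of this thread
(ApicId[6] = socket, [5:4] = node, [3] = CCX, [2:0] = core/thread). The
function name and signature are hypothetical and are not taken from the
series; the field widths are hardcoded to that one quoted layout only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack an APIC ID following the quoted PPR layout.
 *   bit  6    socket ID
 *   bits 5:4  node ID
 *   bit  3    logical CCX (L3 complex) ID
 *   bits 2:0  SMT ? {core[1:0], thread} : {0, core[1:0]}
 * Inputs are masked to the field widths rather than validated.
 */
static uint8_t epyc_apicid_sketch(unsigned socket, unsigned node,
                                  unsigned ccx, unsigned core,
                                  unsigned thread, bool smt)
{
    unsigned low = smt ? (((core & 0x3) << 1) | (thread & 0x1))
                       : (core & 0x3);

    return (uint8_t)(((socket & 0x1) << 6) |
                     ((node & 0x3) << 4) |
                     ((ccx & 0x1) << 3) |
                     low);
}
```

Since the node ID is part of the encoding, a helper like this can only be
called once the number of nodes is known, which is why the board's init()
is the natural place for it.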

> 2. Planning to remove init_apicid_fn
> 3. Insert the handlers inside X86CPUDefinition.
What handlers do you mean?

> 4. EPYC model will have its own apic id handlers. Everything else will be
> initialized with the default handlers (the current default handlers).
> 5. The function pc_possible_cpu_arch_ids will load the model definition
> and initialize the PCMachineState data structure with the model specific
> handlers.
I'm not sure what you mean here.
 
> Does that sound similar to what you are thinking? Thoughts?
If you have something to share and can push it to github, I can look at it
to see whether it has design issues, to spare you a round trip on the list.
(It won't be a proper review, but at least I can help pinpoint the most
problematic parts.)

> 
> >   
> >> v3:
> >>   1. Consolidated the topology information in structure X86CPUTopoInfo.
> >>   2. Changed the ccx_id to llc_id as commented by upstream.
> >>   3. Generalized the apic id decoding. It is mostly similar to current apic id
> >>      except that it adds new field llc_id when numa is configured. Removes all the
> >>      hardcoded values.
> >>   4. Removed the earlier parse_numa split. And moved the numa node initialization
> >>      inside the numa_complete_configuration. This is a bit cleaner as commented by
> >>      Eduardo.
> >>   5. Added new function init_apicid_fn inside machine_class structure. This
> >>      will be used to update the apic id handler specific to cpu model.
> >>   6. Updated the cpuid unit tests.
> >>   7. TODO : Need to figure out how to dynamically update the handlers using cpu models.
> >>      I might need some guidance on that.
> >>
> >> v2:
> >>   https://lore.kernel.org/qemu-devel/156779689013.21957.1631551572950676212.stgit@localhost.localdomain/
> >>   1. Introduced the new property epyc to enable new epyc mode.
> >>   2. Separated the epyc mode and non epyc mode function.
> >>   3. Introduced function pointers in PCMachineState to handle the
> >>      differences.
> >>   4. Mildly tested different combinations to make sure things are working as expected.
> >>   5. TODO : Setting the epyc feature bit needs to be worked out. This feature is
> >>      supported only on AMD EPYC models. I may need some guidance on that.
> >>
> >> v1:
> >>   https://lore.kernel.org/qemu-devel/20190731232032.51786-1-babu.moger@amd.com/
> >>
> >> ---
> >>
> >> Babu Moger (18):
> >>       hw/i386: Rename X86CPUTopoInfo structure to X86CPUTopoIDs
> >>       hw/i386: Introduce X86CPUTopoInfo to contain topology info
> >>       hw/i386: Consolidate topology functions
> >>       hw/i386: Introduce initialize_topo_info to initialize X86CPUTopoInfo
> >>       machine: Add SMP Sockets in CpuTopology
> >>       hw/core: Add core complex id in X86CPU topology
> >>       machine: Add a new function init_apicid_fn in MachineClass
> >>       hw/i386: Update structures for nodes_per_pkg
> >>       i386: Add CPUX86Family type in CPUX86State
> >>       hw/386: Add EPYC mode topology decoding functions
> >>       i386: Cleanup and use the EPYC mode topology functions
> >>       numa: Split the numa initialization
> >>       hw/i386: Introduce apicid_from_cpu_idx in PCMachineState
> >>       hw/i386: Introduce topo_ids_from_apicid handler PCMachineState
> >>       hw/i386: Introduce apic_id_from_topo_ids handler in PCMachineState
> >>       hw/i386: Introduce EPYC mode function handlers
> >>       i386: Fix pkg_id offset for epyc mode
> >>       tests: Update the Unit tests
> >>
> >>
> >>  hw/core/machine-hmp-cmds.c |    3 +
> >>  hw/core/machine.c          |   14 +++
> >>  hw/core/numa.c             |   62 +++++++++----
> >>  hw/i386/pc.c               |  132 +++++++++++++++++++---------
> >>  include/hw/boards.h        |    3 +
> >>  include/hw/i386/pc.h       |    9 ++
> >>  include/hw/i386/topology.h |  209 +++++++++++++++++++++++++++++++-------------
> >>  include/sysemu/numa.h      |    5 +
> >>  qapi/machine.json          |    7 +
> >>  target/i386/cpu.c          |  196 ++++++++++++-----------------------------
> >>  target/i386/cpu.h          |    9 ++
> >>  tests/test-x86-cpuid.c     |  115 ++++++++++++++----------
> >>  vl.c                       |    4 +
> >>  13 files changed, 455 insertions(+), 313 deletions(-)
> >>
> >> --
> >>  
> >   
>
