qemu-devel.nongnu.org archive mirror
From: Igor Mammedov <imammedo@redhat.com>
To: Babu Moger <babu.moger@amd.com>
Cc: ehabkost@redhat.com, mst@redhat.com, armbru@redhat.com,
	qemu-devel@nongnu.org, pbonzini@redhat.com, rth@twiddle.net
Subject: Re: [PATCH v3 00/18] APIC ID fixes for AMD EPYC CPU models
Date: Thu, 6 Feb 2020 14:08:39 +0100
Message-ID: <20200206140839.378ea544@redhat.com>
In-Reply-To: <8b97f174-dd7e-cec0-daf7-29104d663aa6@amd.com>

On Wed, 5 Feb 2020 13:07:31 -0600
Babu Moger <babu.moger@amd.com> wrote:

> On 2/5/20 10:56 AM, Igor Mammedov wrote:
> > On Wed, 5 Feb 2020 10:10:06 -0600
> > Babu Moger <babu.moger@amd.com> wrote:
> >   
> >> On 2/5/20 3:38 AM, Igor Mammedov wrote:  
> >>> On Tue, 4 Feb 2020 13:08:58 -0600
> >>> Babu Moger <babu.moger@amd.com> wrote:
> >>>     
> >>>> On 2/4/20 2:02 AM, Igor Mammedov wrote:    
> >>>>> On Mon, 3 Feb 2020 13:31:29 -0600
> >>>>> Babu Moger <babu.moger@amd.com> wrote:
> >>>>>       
> >>>>>> On 2/3/20 8:59 AM, Igor Mammedov wrote:      
> >>>>>>> On Tue, 03 Dec 2019 18:36:54 -0600
> >>>>>>> Babu Moger <babu.moger@amd.com> wrote:
> >>>>>>>         
> >>>>>>>> This series fixes APIC ID encoding problems on AMD EPYC CPUs.
> >>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1728166
> >>>>>>>>
> >>>>>>>> Currently, the APIC ID is decoded based on the sequence
> >>>>>>>> sockets->dies->cores->threads. This works for most standard AMD and other
> >>>>>>>> vendors' configurations, but this decoding sequence does not follow that of
> >>>>>>>> AMD's APIC ID enumeration strictly. In some cases this can cause CPU topology
> >>>>>>>> inconsistency.  When booting a guest VM, the kernel tries to validate the
> >>>>>>>> topology, and finds it inconsistent with the enumeration of EPYC cpu models.
> >>>>>>>>
> >>>>>>>> To fix the problem we need to build the topology as per the Processor
> >>>>>>>> Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1
> >>>>>>>> Processors. It is available at https://www.amd.com/system/files/TechDocs/55570-B1_PUB.zip
> >>>>>>>>
> >>>>>>>> Here is the text from the PPR.
> >>>>>>>> Operating systems are expected to use Core::X86::Cpuid::SizeId[ApicIdSize], the
> >>>>>>>> number of least significant bits in the Initial APIC ID that indicate core ID
> >>>>>>>> within a processor, in constructing per-core CPUID masks.
> >>>>>>>> Core::X86::Cpuid::SizeId[ApicIdSize] determines the maximum number of cores
> >>>>>>>> (MNC) that the processor could theoretically support, not the actual number of
> >>>>>>>> cores that are actually implemented or enabled on the processor, as indicated
> >>>>>>>> by Core::X86::Cpuid::SizeId[NC].
> >>>>>>>> Each Core::X86::Apic::ApicId[ApicId] register is preset as follows:
> >>>>>>>> • ApicId[6] = Socket ID.
> >>>>>>>> • ApicId[5:4] = Node ID.
> >>>>>>>> • ApicId[3] = Logical CCX L3 complex ID
> >>>>>>>> • ApicId[2:0]= (SMT) ? {LogicalCoreID[1:0],ThreadId} : {1'b0,LogicalCoreID[1:0]}        
> >>>>>>>
> >>>>>>>
> >>>>>>> After checking out all patches and some pondering, the approach used
> >>>>>>> here looks too intrusive to me for the task at hand, especially where
> >>>>>>> it comes to generic code.
> >>>>>>>
> >>>>>>> (Ignore till ==== to see suggestion how to simplify without reading
> >>>>>>> reasoning behind it first)
> >>>>>>>
> >>>>>>> Let's look for a way to simplify it a little bit.
> >>>>>>>
> >>>>>>> So the problem we are trying to solve:
> >>>>>>>  1: calculate APIC IDs based on cpu type (to be more specific: for EPYC based CPUs)
> >>>>>>>  2: it depends on knowing the total number of numa nodes.
> >>>>>>>
> >>>>>>> Externally, the workflow looks like the following:
> >>>>>>>   1. user provides -smp x,sockets,cores,...,maxcpus
> >>>>>>>       that's used by possible_cpu_arch_ids() singleton to build list of
> >>>>>>>       possible CPUs (which is available to user via command 'hotpluggable-cpus')
> >>>>>>>
> >>>>>>>       Hook could be called very early and possible_cpus data might be
> >>>>>>>       not complete. It builds a list of possible CPUs which user could
> >>>>>>>       modify later.
> >>>>>>>
> >>>>>>>   2.1 user uses "-numa cpu,node-id=x,..." or legacy "-numa node,node_id=x,cpus="
> >>>>>>>       options to assign cpus to nodes, which one way or another end up calling
> >>>>>>>       machine_set_cpu_numa_node(). The latter updates 'possible_cpus' list
> >>>>>>>       with node information. It happens early when total number of nodes
> >>>>>>>       is not available.
> >>>>>>>
> >>>>>>>   2.2 user does not provide explicit node mappings for CPUs.
> >>>>>>>       QEMU steps in and assigns possible cpus to nodes in machine_numa_finish_cpu_init()
> >>>>>>>       (using the same machine_set_cpu_numa_node()) right before calling the
> >>>>>>>       board-specific machine init(). At that time the total number of nodes is known.
> >>>>>>>
> >>>>>>> In 1 -- 2.1 cases, 'arch_id' in 'possible_cpus' list doesn't have to be defined before
> >>>>>>> the board's init() is run.
> >>>>
> >>>> In case of 2.1, we need to have the arch_id already generated. This is
> >>>> done inside possible_cpu_arch_ids. The arch_id is used by
> >>>> machine_set_cpu_numa_node to assign the cpus to correct numa node.    
> >>>
> >>> I might have missed something but I don't see arch_id itself being used in
> >>> machine_set_cpu_numa_node(). It only uses props part of possible_cpus    
> >>
> >> Before calling machine_set_cpu_numa_node, we call
> >> cpu_index_to_instance_props -> x86_cpu_index_to_props->
> >> possible_cpu_arch_ids->x86_possible_cpu_arch_ids.
> >>
> >> This sequence sets up the arch_id(in x86_cpu_apic_id_from_index) for all
> >> the available cpus. Based on the arch_id, it also sets up the props.  
> > 
> > 
> > x86_possible_cpu_arch_ids()
> >    arch_id = x86_cpu_apic_id_from_index(x86ms, i)
> >    x86_topo_ids_from_apicid(arch_id, x86ms->smp_dies, ms->smp.cores,  ms->smp.threads, &topo);
> >    // assign socket/die/core/thread from topo
> > 
> > So currently it uses an indirect way to convert an index in possible_cpus->cpus[]
> > to socket/die/core/thread ids.
> > Essentially it takes the '-smp' options and a [0..max_cpus) number as original data,
> > converts them into an intermediate apic_id and then reverse engineers that back to
> > topo info.
> > 
> > Why not use x86_topo_ids_from_idx() directly to get rid of 'props' dependency on apic_id?  
> 
> It might work. But this feels like a work-around that delays the problem
> until later. By just re-arranging the numa code a little bit we can address this.

The idea behind possible_cpus is to allow users to query the topo information
the board generates (based on -smp) at configuration time (or later), so users
know what -numa cpu,topo_options [and -device foo-cpu,topo_options]
to use. Initializing apic_id on the first access is secondary, and I did
it only because I could do it without additional data.

But the main purpose of possible_cpus is to keep topology information.
That includes the numa node mapping, which should be stored in possible_cpus
along with the rest of the cpu topology.
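
To illustrate keeping topology information independent of apic_id, here is
a rough sketch of deriving socket/die/core/thread ids straight from the
index in the sockets->dies->cores->threads order. The TopoIds type and the
topo_ids_from_idx() helper are hypothetical names made up for this example;
they are only loosely modeled on x86_topo_ids_from_idx() and do not
reproduce its real signature.

```c
#include <assert.h>

/* Hypothetical stand-in for the socket/die/core/thread part of 'props'. */
typedef struct {
    unsigned pkg_id;
    unsigned die_id;
    unsigned core_id;
    unsigned smt_id;
} TopoIds;

/*
 * Split a linear CPU index into topology ids using the
 * sockets->dies->cores->threads order. No APIC ID is involved,
 * so the result does not depend on how arch_id is encoded.
 */
static TopoIds topo_ids_from_idx(unsigned dies, unsigned cores,
                                 unsigned threads, unsigned cpu_index)
{
    TopoIds t;

    assert(dies > 0 && cores > 0 && threads > 0);

    t.smt_id  = cpu_index % threads;
    t.core_id = cpu_index / threads % cores;
    t.die_id  = cpu_index / (threads * cores) % dies;
    t.pkg_id  = cpu_index / (threads * cores * dies);
    return t;
}
```

With 1 die, 4 cores and 2 threads per socket, index 11 lands on socket 1,
core 1, thread 1, and nothing here needs to know the number of numa nodes.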

Looking at the [12/18] numa patch, it makes the legacy -numa node,cpus option
reintroduce data duplication, by storing the mapping elsewhere and
then putting that mapping into possible_cpus at numa complete time
(that's what I dislike, and I don't see a valid reason to do so).

That also won't work if the user queries hotpluggable-cpus before that time,
and it also doesn't work if the user uses the preferred -numa cpu,topo_options,
as both would initialize possible_cpus on the first access.

So if you need to do some board-specific post-processing on the topo
information once it's complete and recalculate apic_id, do it at board
init time as was suggested before (x86_cpu_new() looks like a good
place to do it).
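
As a rough illustration of what such a late recalculation could compute,
the sketch below packs the ids into an APIC ID using the fixed bit layout
from the PPR text quoted at the start of this thread (ApicId[6] socket,
ApicId[5:4] node, ApicId[3] CCX, ApicId[2:0] core/thread). The function
name epyc_apicid_sketch() and its parameters are made up for this example;
it is not code from the series, which computes field widths rather than
hardcoding them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: build an APIC ID following the layout quoted
 * from the PPR for Family 17h Model 01h:
 *   ApicId[6]   = Socket ID
 *   ApicId[5:4] = Node ID
 *   ApicId[3]   = Logical CCX L3 complex ID
 *   ApicId[2:0] = SMT ? {LogicalCoreID[1:0], ThreadId}
 *                     : {1'b0, LogicalCoreID[1:0]}
 * The node id is an input, so this can only run once the numa
 * mapping is complete.
 */
static uint32_t epyc_apicid_sketch(unsigned socket, unsigned node,
                                   unsigned ccx, unsigned core,
                                   unsigned thread, bool smt)
{
    uint32_t low;

    /* Each field must fit the width the quoted layout gives it. */
    assert(socket < 2 && node < 4 && ccx < 2 && core < 4);
    assert(thread < (smt ? 2u : 1u));

    low = smt ? ((core << 1) | thread) : core;
    return (socket << 6) | (node << 4) | (ccx << 3) | low;
}
```

The point is only that node id is one of the inputs, which is why the
final value has to wait until the node mapping in possible_cpus is complete.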

[...]


