All of lore.kernel.org
 help / color / mirror / Atom feed
From: Igor Mammedov <imammedo@redhat.com>
To: Babu Moger <babu.moger@amd.com>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"ehabkost@redhat.com" <ehabkost@redhat.com>,
	"rth@twiddle.net" <rth@twiddle.net>
Subject: Re: [PATCH v2 1/3] hw/i386: Initialize topo_ids from CpuInstanceProperties
Date: Thu, 30 Jul 2020 13:27:20 +0200	[thread overview]
Message-ID: <20200730132720.49206d44@redhat.com> (raw)
In-Reply-To: <e0b8086f-3678-37ea-aab8-ca5163ecee07@amd.com>

On Wed, 29 Jul 2020 16:22:32 -0500
Babu Moger <babu.moger@amd.com> wrote:

> Igor,
> Sorry. Few more questions.
> 
> > -----Original Message-----
> > From: Igor Mammedov <imammedo@redhat.com>
> > Sent: Wednesday, July 29, 2020 9:12 AM
> > To: Moger, Babu <Babu.Moger@amd.com>
> > Cc: pbonzini@redhat.com; rth@twiddle.net; qemu-devel@nongnu.org;
> > ehabkost@redhat.com
> > Subject: Re: [PATCH v2 1/3] hw/i386: Initialize topo_ids from
> > CpuInstanceProperties
> > 
> > On Mon, 27 Jul 2020 18:59:42 -0500
> > Babu Moger <babu.moger@amd.com> wrote:
> >   
> > > > -----Original Message-----
> > > > From: Igor Mammedov <imammedo@redhat.com>
> > > > Sent: Monday, July 27, 2020 12:14 PM
> > > > To: Moger, Babu <Babu.Moger@amd.com>
> > > > Cc: qemu-devel@nongnu.org; pbonzini@redhat.com;  
> > ehabkost@redhat.com;  
> > > > rth@twiddle.net
> > > > Subject: Re: [PATCH v2 1/3] hw/i386: Initialize topo_ids from
> > > > CpuInstanceProperties
> > > >
> > > > On Mon, 27 Jul 2020 10:49:08 -0500
> > > > Babu Moger <babu.moger@amd.com> wrote:
> > > >  
> > > > > > -----Original Message-----
> > > > > > From: Igor Mammedov <imammedo@redhat.com>
> > > > > > Sent: Friday, July 24, 2020 12:05 PM
> > > > > > To: Moger, Babu <Babu.Moger@amd.com>
> > > > > > Cc: qemu-devel@nongnu.org; pbonzini@redhat.com;  
> > > > ehabkost@redhat.com;  
> > > > > > rth@twiddle.net
> > > > > > Subject: Re: [PATCH v2 1/3] hw/i386: Initialize topo_ids from
> > > > > > CpuInstanceProperties
> > > > > >
> > > > > > On Mon, 13 Jul 2020 14:30:29 -0500 Babu Moger
> > > > > > <babu.moger@amd.com> wrote:
> > > > > >  
> > > > > > > > -----Original Message-----
> > > > > > > > From: Igor Mammedov <imammedo@redhat.com>
> > > > > > > > Sent: Monday, July 13, 2020 12:32 PM
> > > > > > > > To: Moger, Babu <Babu.Moger@amd.com>
> > > > > > > > Cc: pbonzini@redhat.com; rth@twiddle.net;
> > > > > > > > ehabkost@redhat.com;
> > > > > > > > qemu- devel@nongnu.org
> > > > > > > > Subject: Re: [PATCH v2 1/3] hw/i386: Initialize topo_ids
> > > > > > > > from CpuInstanceProperties
> > > > > > > >
> > > > > > > > On Mon, 13 Jul 2020 11:43:33 -0500 Babu Moger
> > > > > > > > <babu.moger@amd.com> wrote:
> > > > > > > >  
> > > > > > > > > On 7/13/20 11:17 AM, Igor Mammedov wrote:  
> > > > > > > > > > On Mon, 13 Jul 2020 10:02:22 -0500 Babu Moger
> > > > > > > > > > <babu.moger@amd.com> wrote:
> > > > > > > > > >  
> > > > > > > > > >>> -----Original Message-----
> > > > > > > > > >>> From: Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > >>> Sent: Monday, July 13, 2020 4:08 AM
> > > > > > > > > >>> To: Moger, Babu <Babu.Moger@amd.com>
> > > > > > > > > >>> Cc: pbonzini@redhat.com; rth@twiddle.net;
> > > > > > > > > >>> ehabkost@redhat.com;
> > > > > > > > > >>> qemu- devel@nongnu.org
> > > > > > > > > >>> Subject: Re: [PATCH v2 1/3] hw/i386: Initialize
> > > > > > > > > >>> topo_ids from CpuInstanceProperties  
[...]

> > > > > There will be complications when user configures with both die_id
> > > > > and numa_id. It will complicate things further. I will have to
> > > > > look closely at the code if it is feasible.  
> > > >
> > > > it's worth a try.
> > > > conseptionally die_id in intel/amd is the same. Most likely intel
> > > > has a dedicated memory controller on each die so it still should
> > > > form a NUMA node. But that aspect probably was ignored while
> > > > implementing it in QEMU so ping of configuring QEMU right is on
> > > > user's shoulders (there is no checks whatsoever if cpu belonging to specific  
> > die is on right NUMA node).  
> > >
> > > So you are suggesting to use die_id to build the topology for EPYC.
> > > Also initialize die_id based on the numa information. Re-arrange the
> > > numa code to make sure we have all the information before we build the
> > > topology. And then remove the node_id inside X86CPUTopoIDs. Is that the  
> > plan?
> > reusing die_id might simplify logic and at very least we won't have 2 very similar
> > fields to deal with. With die_id it should be conditional on EPYC.  
> 
> Not convinced if the using the die_id will solve the problem here. But
> going to investigate this little bit.
it allows  us to drop nodes_per_pkg calculation with its dependency on numa,
since it's provided by user with -smp dies=

We might need a sanity check that user provided value is valid in case on EPYC though.

> > > Regardless of using die_id, we should  
> > 
> > (1) error out if tolopolgy will require more than 1 numa node and no numa
> > config was provided.  
> 
> We already have node_id check in numa_cpu_pre_plug. Do you want me to
> bring this check in pc_cpu_pre_plug?

There are several checks there and that includes validating per CPU node-id
values and workarounds for broken libvirt.

Where I'm talking more about number of numa nodes required f(-smp dies,-cpu EPYC),
like:
  if (dies>1 && epyc && nb_numa_nodes != sockets * dies)
     error_steg("chosen cpu model ... and -smp ... parameters require X numa nodes being configured")
     error_append_hint("use -numa options to create requred number of numa nodes")

I'm not sure where put it in for now, we can try to put it into x86_cpus_init()
for starters and later see if there is more sutable place for it.


> > (2) for 1 NUMA node use autonuma to create 1 node implicitly, that requres
> > converting static MachineClass::auto_enable_numa into an instance specific
> > value, i.e. moving it into MachineState, so that we can change it at runtime
> > depending on CPU type.  
> 
> Isn't it already taken care in numa_complete_configuration when num_nodes
> = 0? Where does this change go if at all required?
numa_complete_configuration()
...
    if (ms->numa_state->num_nodes == 0 &&                                        
        ((ms->ram_slots && mc->auto_enable_numa_with_memhp) ||                   
         (ms->maxram_size > ms->ram_size && mc->auto_enable_numa_with_memdev) || 
         mc->auto_enable_numa)) {                                                
            NumaNodeOptions node = { };                                          
            parse_numa_node(ms, &node, &error_abort);                            
            numa_info[0].node_mem = ram_size;                                    
    }

that is a fragment that takes care of implict creation of the single numa node.
lets ignore *numa_with_* cases and look into mc->auto_enable_numa.
It is MachineClass field and we are not supposed to change it at runtime,
but we need to enable it in case options specify 1 node config, i.e.

  "-cpu epyc -smp x,sockets=1,dies=1"

so we need to trigger auto_enable_numa depending on above condition.
Looking at the current code there is no good place to put it in.

we can try to replace MachineClass::auto_enable_numa with callback
  bool MachineClass::auto_enable_numa_cb(MachineState *ms)
so we can change logic at runtime where it's needed.

> 
> > (3) use NUMA id from CPU::node-id for CPUID_8000001E and have a checks that
> > will ensure
> >     that used value is possible to fit in CPUID leaf.  
> 
> Node_id is already part of apic_id. We can easily extract it from apic_id.
> I have already sent the patch to simplify CPUID_8000001E. I will make it
> part of this series.
> https://lore.kernel.org/qemu-devel/159164753686.20543.4158548114186964547.stgit@naples-babu.amd.com/
that's where confusion in naming gets in a way:
let's set following difinitions for purpose of this discussion/QEMU
   node_id = system wide NUMA node id  
   AMD's ApicId[5:4] = die_id

what AMD encodes in APIC ID is not a node_id but reather an index of a node within package.
Even in spec in one place it's called "Node ID" but in another place it's reffered as DIE_ID.

Whith that cleared up, following CPUID defined as
CPUID_Fn8000001E_ECX[7:0] = NodeId
but it's not the same as ApicId[5:4], description says it's {5'b00000,1'b[SOCKET_ID],2'b[DIE_ID]}.
and CPUID dump from EPYC machine confirms that. It also matches with system wide NUMA node ids
encoded in SRAT table.
So above pointed patch is incorrect wrt CPUID_Fn8000001E_ECX.

Considering we allow for more nodes than existing EPYCs can have,
I'd rather it would take value of NUMA node id from CPU's "node-id" property
with a check that it fits within CPUID_Fn8000001E_ECX[7:0] space at realize time,
to ensure that NUMA node ids are consistent with what user provides and ACPI tables.

   
> > 
> > 
> > 
> >   
> > > > What AMD has implemented on top of that in CPU hw, is to expose NUMA
> > > > node id in CPUID_8000001E. I don't know why it was done as usually
> > > > it's ACPI tables that describe relations between nodes so for OS
> > > > this info almost useless (I'd guess it's faster to use CPUID instead
> > > > of fetching pre-cpu variable but that's pretty much it from OS point
> > > > of view)
> > > >  
> > > > >  
> > > > > >
> > > > > >
> > > > > >  
> > > > > > > > > >>>> +    topo_ids->pkg_id = props.has_socket_id ? props.socket_id  
> > :  
> > > > > > > > > >>>> +0; }
> > > > > > > > > >>>>  /*
> > > > > > > > > >>>>   * Make APIC ID for the CPU 'cpu_index'
> > > > > > > > > >>>>   *
> > > > > > > > > >>>>  
> > > > > > > > > >>  
> > > > > > > > > >  
> > > > > > > > >  
> > > > > > >
> > > > > > >  
> > > > >  
> > >
> > >  
> 



  reply	other threads:[~2020-07-30 11:31 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-01 17:30 [PATCH v2 0/3] Fix couple of issues with AMD topology Babu Moger
2020-07-01 17:31 ` [PATCH v2 1/3] hw/i386: Initialize topo_ids from CpuInstanceProperties Babu Moger
2020-07-13  9:08   ` Igor Mammedov
2020-07-13 15:02     ` Babu Moger
2020-07-13 16:17       ` Igor Mammedov
2020-07-13 16:43         ` Babu Moger
2020-07-13 17:32           ` Igor Mammedov
2020-07-13 19:30             ` Babu Moger
2020-07-14 16:41               ` Igor Mammedov
2020-07-14 17:26                 ` Babu Moger
2020-07-14 18:26                   ` Igor Mammedov
2020-07-24 17:05               ` Igor Mammedov
2020-07-27 15:49                 ` Babu Moger
2020-07-27 17:14                   ` Igor Mammedov
2020-07-27 23:59                     ` Babu Moger
2020-07-29 14:12                       ` Igor Mammedov
2020-07-29 21:22                         ` Babu Moger
2020-07-30 11:27                           ` Igor Mammedov [this message]
2020-07-01 17:31 ` [PATCH v2 2/3] hw/i386: Build apic_id " Babu Moger
2020-07-13  9:15   ` Igor Mammedov
2020-07-13 15:00     ` Babu Moger
2020-07-01 17:31 ` [PATCH v2 3/3] hw/386: Fix uninitialized memory with -device and CPU hotplug Babu Moger
2020-07-27 16:36   ` Igor Mammedov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200730132720.49206d44@redhat.com \
    --to=imammedo@redhat.com \
    --cc=babu.moger@amd.com \
    --cc=ehabkost@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.