All of lore.kernel.org
 help / color / mirror / Atom feed
From: Igor Mammedov <imammedo@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "ehabkost@redhat.com" <ehabkost@redhat.com>,
	"mst@redhat.com" <mst@redhat.com>,
	Michal Privoznik <mprivozn@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Babu Moger <babu.moger@amd.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"rth@twiddle.net" <rth@twiddle.net>
Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic decode
Date: Fri, 28 Aug 2020 13:24:36 +0200	[thread overview]
Message-ID: <20200828132436.63de9921@redhat.com> (raw)
In-Reply-To: <20200828085803.GD224144@redhat.com>

On Fri, 28 Aug 2020 09:58:03 +0100
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Thu, Aug 27, 2020 at 10:21:10PM +0200, Igor Mammedov wrote:
> > On Wed, 26 Aug 2020 13:45:51 -0500
> > Babu Moger <babu.moger@amd.com> wrote:
> >   
> > > 
> > >   
> > > > -----Original Message-----
> > > > From: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > Sent: Wednesday, August 26, 2020 1:34 PM
> > > > To: Moger, Babu <Babu.Moger@amd.com>
> > > > Cc: Igor Mammedov <imammedo@redhat.com>; Daniel P. Berrangé
> > > > <berrange@redhat.com>; ehabkost@redhat.com; mst@redhat.com; Michal
> > > > Privoznik <mprivozn@redhat.com>; qemu-devel@nongnu.org;
> > > > pbonzini@redhat.com; rth@twiddle.net
> > > > Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic
> > > > decode
> > > > 
> > > > * Babu Moger (babu.moger@amd.com) wrote:  
> > > > >  
> > > > > > -----Original Message-----
> > > > > > From: Igor Mammedov <imammedo@redhat.com>
> > > > > > Sent: Wednesday, August 26, 2020 8:31 AM
> > > > > > To: Daniel P. Berrangé <berrange@redhat.com>
> > > > > > Cc: Moger, Babu <Babu.Moger@amd.com>; pbonzini@redhat.com;
> > > > > > rth@twiddle.net; ehabkost@redhat.com; qemu-devel@nongnu.org;
> > > > > > mst@redhat.com; Michal Privoznik <mprivozn@redhat.com>
> > > > > > Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use
> > > > > > generic decode
> > > > > >
> > > > > > On Wed, 26 Aug 2020 13:50:59 +0100
> > > > > > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > > >  
> > > > > > > On Wed, Aug 26, 2020 at 02:38:49PM +0200, Igor Mammedov wrote:  
> > > > > > > > On Fri, 21 Aug 2020 17:12:19 -0500 Babu Moger
> > > > > > > > <babu.moger@amd.com> wrote:
> > > > > > > >  
> > > > > > > > > To support some of the complex topology, we introduced EPYC
> > > > > > > > > mode  
> > > > > > apicid decode.  
> > > > > > > > > But, EPYC mode decode is running into problems. Also it can
> > > > > > > > > become quite a maintenance problem in the future. So, it was
> > > > > > > > > decided to remove that code and use the generic decode which
> > > > > > > > > works for majority of the topology. Most of the SPECed
> > > > > > > > > configuration would work just fine. With some non-SPECed user
> > > > > > > > > inputs, it will create some sub-  
> > > > > > optimal configuration.  
> > > > > > > > > Here is the discussion thread.
> > > > > > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2
> > > > > > > > > F%2F
> > > > > > > > > lore.kernel.org%2Fqemu-devel%2Fc0bcc1a6-1d84-a6e7-e468-  
> > > > > > d5b437c1b25  
> > > > > > > > >  
> > > > > >  
> > > > 4%40amd.com%2F&amp;data=02%7C01%7Cbabu.moger%40amd.com%7C8a5c  
> > > > > > 52f92  
> > > > > > > > >  
> > > > > >  
> > > > 3f04082a40808d849c43d49%7C3dd8961fe4884e608e11a82d994e183d%7C0%7  
> > > > > > C0  
> > > > > > > > >  
> > > > > >  
> > > > %7C637340454473508873&amp;sdata=VnW28H1v4XwK3GaNGFxu%2BhwiMeA  
> > > > > > YO%2B  
> > > > > > > > > 3WAzo3DeY5Ha8%3D&amp;reserved=0
> > > > > > > > >
> > > > > > > > > This series removes all the EPYC mode specific apicid changes
> > > > > > > > > and use the generic apicid decode.  
> > > > > > > >
> > > > > > > > the main difference between EPYC and all other CPUs is that it
> > > > > > > > requires numa configuration (it's not optional) so we need an
> > > > > > > > extra  
> > > > > No, That is not true. Because of that assumption we made all these
> > > > > apicid changes. And here we are now.
> > > > >
> > > > > AMD supports varies mixed configurations. In case of EPYC-Rome, we
> > > > > have NPS1, NPS2 and NPS4(Numa Nodes per socket). In case of NPS1,
> > > > > basically we have all the cores in a socket under one numa node. This
> > > > > is non-numa configuration.
> > > > > Looking at the various configurations and also discussing internally,
> > > > > it is not advisable to have (epyc && !numa) check.  
> > > > 
> > > > Indeed on real hardware, I don't think we always see NUMA; my single socket,
> > > > 16 core/32 thread 7302P Dell box, shows the kernel printing 'No NUMA
> > > > configuration found...Faking a node.'  
> > looks like firmware bug or maybe it's feature and there is a knob in fw
> > to turn it on/off in case used OS doesn't like it for some reason.
> > 
> >   
> > > > So if real hardware hasn't got a NUMA node, what's the real problem?  
> > > 
> > > I don't see any problem once we revert all these changes(patch 1-7).
> > > We don't need if (epyc && !numa) error check or auto_enable_numa=true
> > > unconditionally.  
> > 
> > We need revert to unbreak migration from QEMU < 5.0,
> > everything else (fixes for CPUID_Fn8000001E) could go on top.
> > 
> > So what's on top (because old code also wasn't correct when
> > CPUID_Fn8000001E is taken in account, tha's why we are at this point),
> > 
> > When starting QEMU without -numa
> > Indeed we can skip "if (epyc && !numa) error check or auto_enable_numa=true",
> > in case where there is 1 die (NPS1).
> > (1) User however may set core/threads number bigger than possible by spec,
> >     in which case CPUID_Fn8000001E_EBX/CPUID_Fn8000001E_ECX will not be
> >     valid spec vise and could trigger OPPs in guest kernel.
> >     Given we allow go out of spec, perhaps we should add a warning at
> >     realize time saying that used -smp config is not supported since it
> >     doesn't match AMD EPYC spec and might not work.
> > 
> > (2) Earlier we agreed that we can reuse existing die_id instead of internal
> >     (topo_ids.node_id in current code)
> >     (It's is called DIE_ID and NODE ID in spec interchangeably)
> >     Same as (1) add a warning when '-smp dies' goes beyond spec limits.
> >     
> > (3) "-smp dies>1" ''if'' we allow to run it without -numa,
> >     then system wide NUMA node id in CPUID_Fn8000001E_ECX probably doesn't matter.
> >     could be something like in spec but taking in account die offset, to produce
> >     unique id.
> > 
> >     Same, add a warning that there are more than 1 dies but numa is not enabled,
> >     suggest to enable numa.
> > 
> >     With current code it produces invalid APIC ID for valid '-smp' combination,
> >     however if we revert it and switch to die_id than it should produce
> >     valid APIC ID once again (as in 4.2).
> >     Given it produces invalid APIC id, maybe we should just ditch the case and
> >     fold it in (4) (i.e. require -numa if "-smp dies>1")
> > 
> > (4) -numa is used (RHBZ1728166)
> >     we need to ensure that socket*dies == ms->numa_state->num_nodes
> >      and make sure that CPUID_Fn8000001E_ECX consistent with
> >     cpu mapping provided with "-numa cpu=" option.  
> 
> Why do we need to socket*dies == ms->numa_state->num_nodes ? That doesn't
> seem to be the case in bare metal EPYC nodes I've used which lets you
> configure how many NUMA nodes in firmware.

(From dumps Babu has provided earlier, it was dies == nodes and
CPUID_Fn8000001E_ECX == numa node ids in SRAT.)

dumping CPUID_Fn8000001E and SRAT table for such configs will help us
to figure out if we need socket*dies != nodes and how to compose config
were SRAT differs from CPUID_Fn8000001E_ECX.

Babu, can you provide CPUID_Fn8000001E and SRAT dumps for
above configs combinations? Or to some spec/guide how it should be.


> 
> 
> Regards,
> Daniel



  reply	other threads:[~2020-08-28 11:25 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-21 22:12 [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic decode Babu Moger
2020-08-21 22:12 ` [PATCH v5 1/8] hw/i386: Remove node_id, nr_nodes and nodes_per_pkg from topology Babu Moger
2020-08-26  9:57   ` Igor Mammedov
2020-08-26 17:31     ` Babu Moger
2020-08-21 22:12 ` [PATCH v5 2/8] Revert "i386: Fix pkg_id offset for EPYC cpu models" Babu Moger
2020-08-21 22:12 ` [PATCH v5 3/8] Revert "target/i386: Enable new apic id encoding for EPYC based cpus models" Babu Moger
2020-08-21 22:12 ` [PATCH v5 4/8] Revert "hw/i386: Move arch_id decode inside x86_cpus_init" Babu Moger
2020-08-21 22:12 ` [PATCH v5 5/8] Revert "i386: Introduce use_epyc_apic_id_encoding in X86CPUDefinition" Babu Moger
2020-08-21 22:12 ` [PATCH v5 6/8] Revert "hw/i386: Introduce apicid functions inside X86MachineState" Babu Moger
2020-08-21 22:13 ` [PATCH v5 7/8] Revert "hw/386: Add EPYC mode topology decoding functions" Babu Moger
2020-08-28 17:27   ` Eduardo Habkost
2020-08-28 18:03     ` Babu Moger
2020-08-21 22:13 ` [PATCH v5 8/8] i386: Simplify CPUID_8000_001E for AMD Babu Moger
2020-08-26 12:19   ` Igor Mammedov
2020-08-24 18:41 ` [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic decode Dr. David Alan Gilbert
2020-08-24 19:20   ` Babu Moger
2020-08-25  8:15     ` Dr. David Alan Gilbert
2020-08-25 14:38       ` Igor Mammedov
2020-08-25 15:25         ` Dr. David Alan Gilbert
2020-08-26 12:43           ` Igor Mammedov
2020-08-26 14:10             ` Dr. David Alan Gilbert
2020-08-27 21:19               ` Igor Mammedov
2020-08-27 22:58                 ` Babu Moger
2020-08-28  8:42                   ` Igor Mammedov
2020-08-28 14:22                     ` Babu Moger
2020-08-28  8:48                 ` Dr. David Alan Gilbert
2020-08-28 11:36                   ` Igor Mammedov
2020-08-26 12:38 ` Igor Mammedov
2020-08-26 12:50   ` Daniel P. Berrangé
2020-08-26 13:30     ` Igor Mammedov
2020-08-26 13:36       ` Daniel P. Berrangé
2020-08-26 14:02         ` Igor Mammedov
2020-08-26 15:03           ` Daniel P. Berrangé
2020-08-26 15:18             ` Eduardo Habkost
2020-08-27 17:03             ` Igor Mammedov
2020-08-27 19:07               ` Eduardo Habkost
2020-08-27 20:55                 ` Igor Mammedov
2020-08-28  8:55                   ` Daniel P. Berrangé
2020-08-28 16:29                     ` Eduardo Habkost
2020-08-28 16:32                       ` Daniel P. Berrangé
2020-08-28 16:45                         ` Eduardo Habkost
2020-08-28 18:00                           ` Babu Moger
2020-08-26 17:17       ` Babu Moger
2020-08-26 18:33         ` Dr. David Alan Gilbert
2020-08-26 18:45           ` Babu Moger
2020-08-27 20:21             ` Igor Mammedov
2020-08-28  8:58               ` Daniel P. Berrangé
2020-08-28 11:24                 ` Igor Mammedov [this message]
2020-08-28 14:17                   ` Babu Moger
2020-08-28 14:48                     ` Igor Mammedov
2020-08-26 14:04   ` Eduardo Habkost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200828132436.63de9921@redhat.com \
    --to=imammedo@redhat.com \
    --cc=babu.moger@amd.com \
    --cc=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=mprivozn@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.