From: Igor Mammedov <imammedo@redhat.com>
To: Babu Moger <babu.moger@amd.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
"ehabkost@redhat.com" <ehabkost@redhat.com>,
"mst@redhat.com" <mst@redhat.com>,
"Michal Privoznik" <mprivozn@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"rth@twiddle.net" <rth@twiddle.net>
Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic decode
Date: Thu, 27 Aug 2020 22:21:10 +0200 [thread overview]
Message-ID: <20200827222110.1c2ee236@imammedo-mac> (raw)
In-Reply-To: <11489e5f-2285-ddb4-9c35-c9f522d603a0@amd.com>
On Wed, 26 Aug 2020 13:45:51 -0500
Babu Moger <babu.moger@amd.com> wrote:
>
>
> > -----Original Message-----
> > From: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Sent: Wednesday, August 26, 2020 1:34 PM
> > To: Moger, Babu <Babu.Moger@amd.com>
> > Cc: Igor Mammedov <imammedo@redhat.com>; Daniel P. Berrangé
> > <berrange@redhat.com>; ehabkost@redhat.com; mst@redhat.com; Michal
> > Privoznik <mprivozn@redhat.com>; qemu-devel@nongnu.org;
> > pbonzini@redhat.com; rth@twiddle.net
> > Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic
> > decode
> >
> > * Babu Moger (babu.moger@amd.com) wrote:
> > >
> > > > -----Original Message-----
> > > > From: Igor Mammedov <imammedo@redhat.com>
> > > > Sent: Wednesday, August 26, 2020 8:31 AM
> > > > To: Daniel P. Berrangé <berrange@redhat.com>
> > > > Cc: Moger, Babu <Babu.Moger@amd.com>; pbonzini@redhat.com;
> > > > rth@twiddle.net; ehabkost@redhat.com; qemu-devel@nongnu.org;
> > > > mst@redhat.com; Michal Privoznik <mprivozn@redhat.com>
> > > > Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use
> > > > generic decode
> > > >
> > > > On Wed, 26 Aug 2020 13:50:59 +0100
> > > > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > >
> > > > > On Wed, Aug 26, 2020 at 02:38:49PM +0200, Igor Mammedov wrote:
> > > > > > On Fri, 21 Aug 2020 17:12:19 -0500 Babu Moger
> > > > > > <babu.moger@amd.com> wrote:
> > > > > >
> > > > > > > To support some of the complex topology, we introduced EPYC
> > > > > > > mode
> > > > apicid decode.
> > > > > > > But, EPYC mode decode is running into problems. Also it can
> > > > > > > become quite a maintenance problem in the future. So, it was
> > > > > > > decided to remove that code and use the generic decode which
> > > > > > > works for majority of the topology. Most of the SPECed
> > > > > > > configuration would work just fine. With some non-SPECed user
> > > > > > > inputs, it will create some sub-
> > > > optimal configuration.
> > > > > > > Here is the discussion thread.
> > > > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2
> > > > > > > F%2F
> > > > > > > lore.kernel.org%2Fqemu-devel%2Fc0bcc1a6-1d84-a6e7-e468-
> > > > d5b437c1b25
> > > > > > >
> > > >
> > 4%40amd.com%2F&data=02%7C01%7Cbabu.moger%40amd.com%7C8a5c
> > > > 52f92
> > > > > > >
> > > >
> > 3f04082a40808d849c43d49%7C3dd8961fe4884e608e11a82d994e183d%7C0%7
> > > > C0
> > > > > > >
> > > >
> > %7C637340454473508873&sdata=VnW28H1v4XwK3GaNGFxu%2BhwiMeA
> > > > YO%2B
> > > > > > > 3WAzo3DeY5Ha8%3D&reserved=0
> > > > > > >
> > > > > > > This series removes all the EPYC mode specific apicid changes
> > > > > > > and use the generic apicid decode.
> > > > > >
> > > > > > the main difference between EPYC and all other CPUs is that it
> > > > > > requires numa configuration (it's not optional) so we need an
> > > > > > extra
> > > No, That is not true. Because of that assumption we made all these
> > > apicid changes. And here we are now.
> > >
> > > AMD supports varies mixed configurations. In case of EPYC-Rome, we
> > > have NPS1, NPS2 and NPS4(Numa Nodes per socket). In case of NPS1,
> > > basically we have all the cores in a socket under one numa node. This
> > > is non-numa configuration.
> > > Looking at the various configurations and also discussing internally,
> > > it is not advisable to have (epyc && !numa) check.
> >
> > Indeed on real hardware, I don't think we always see NUMA; my single socket,
> > 16 core/32 thread 7302P Dell box, shows the kernel printing 'No NUMA
> > configuration found...Faking a node.'
looks like firmware bug or maybe it's feature and there is a knob in fw
to turn it on/off in case used OS doesn't like it for some reason.
> > So if real hardware hasn't got a NUMA node, what's the real problem?
>
> I don't see any problem once we revert all these changes(patch 1-7).
> We don't need if (epyc && !numa) error check or auto_enable_numa=true
> unconditionally.
We need revert to unbreak migration from QEMU < 5.0,
everything else (fixes for CPUID_Fn8000001E) could go on top.
So what's on top (because old code also wasn't correct when
CPUID_Fn8000001E is taken in account, tha's why we are at this point),
When starting QEMU without -numa
Indeed we can skip "if (epyc && !numa) error check or auto_enable_numa=true",
in case where there is 1 die (NPS1).
(1) User however may set core/threads number bigger than possible by spec,
in which case CPUID_Fn8000001E_EBX/CPUID_Fn8000001E_ECX will not be
valid spec vise and could trigger OPPs in guest kernel.
Given we allow go out of spec, perhaps we should add a warning at
realize time saying that used -smp config is not supported since it
doesn't match AMD EPYC spec and might not work.
(2) Earlier we agreed that we can reuse existing die_id instead of internal
(topo_ids.node_id in current code)
(It's is called DIE_ID and NODE ID in spec interchangeably)
Same as (1) add a warning when '-smp dies' goes beyond spec limits.
(3) "-smp dies>1" ''if'' we allow to run it without -numa,
then system wide NUMA node id in CPUID_Fn8000001E_ECX probably doesn't matter.
could be something like in spec but taking in account die offset, to produce
unique id.
Same, add a warning that there are more than 1 dies but numa is not enabled,
suggest to enable numa.
With current code it produces invalid APIC ID for valid '-smp' combination,
however if we revert it and switch to die_id than it should produce
valid APIC ID once again (as in 4.2).
Given it produces invalid APIC id, maybe we should just ditch the case and
fold it in (4) (i.e. require -numa if "-smp dies>1")
(4) -numa is used (RHBZ1728166)
we need to ensure that socket*dies == ms->numa_state->num_nodes
and make sure that CPUID_Fn8000001E_ECX consistent with
cpu mapping provided with "-numa cpu=" option.
Warnings won't help a lot, but at least they will point out at
possible problem when someone complains.
> >
> > Dave
> >
> > > > > > patch on top of this series to enfoce that, i.e:
> > > > > >
> > > > > > if (epyc && !numa)
> > > > > > error("EPYC cpu requires numa to be configured")
> > > > >
> > > > > Please no. This will break 90% of current usage of the EPYC CPU in
> > > > > real world QEMU deployments. That is way too user hostile to
> > > > > introduce as a requirement.
> > > > >
> > > > > Why do we need to force this ? People have been successfuly using
> > > > > EPYC CPUs without NUMA in QEMU for years now.
> > > > >
> > > > > It might not match behaviour of bare metal silicon, but that
> > > > > hasn't obviously caused the world to come crashing down.
> > > > So far it produces warning in linux kernel (RHBZ1728166), (resulting
> > > > performance might be suboptimal), but I haven't seen anyone reporting
> > crashes yet.
> > > >
> > > >
> > > > What other options do we have?
> > > > Perhaps we can turn on strict check for new machine types only, so
> > > > old configs can keep broken topology (CPUID), while new ones would
> > > > require -numa and produce correct topology.
> > > >
> > > >
> > > > >
> > > > > Regards,
> > > > > Daniel
> > >
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
next prev parent reply other threads:[~2020-08-27 20:22 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-21 22:12 [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic decode Babu Moger
2020-08-21 22:12 ` [PATCH v5 1/8] hw/i386: Remove node_id, nr_nodes and nodes_per_pkg from topology Babu Moger
2020-08-26 9:57 ` Igor Mammedov
2020-08-26 17:31 ` Babu Moger
2020-08-21 22:12 ` [PATCH v5 2/8] Revert "i386: Fix pkg_id offset for EPYC cpu models" Babu Moger
2020-08-21 22:12 ` [PATCH v5 3/8] Revert "target/i386: Enable new apic id encoding for EPYC based cpus models" Babu Moger
2020-08-21 22:12 ` [PATCH v5 4/8] Revert "hw/i386: Move arch_id decode inside x86_cpus_init" Babu Moger
2020-08-21 22:12 ` [PATCH v5 5/8] Revert "i386: Introduce use_epyc_apic_id_encoding in X86CPUDefinition" Babu Moger
2020-08-21 22:12 ` [PATCH v5 6/8] Revert "hw/i386: Introduce apicid functions inside X86MachineState" Babu Moger
2020-08-21 22:13 ` [PATCH v5 7/8] Revert "hw/386: Add EPYC mode topology decoding functions" Babu Moger
2020-08-28 17:27 ` Eduardo Habkost
2020-08-28 18:03 ` Babu Moger
2020-08-21 22:13 ` [PATCH v5 8/8] i386: Simplify CPUID_8000_001E for AMD Babu Moger
2020-08-26 12:19 ` Igor Mammedov
2020-08-24 18:41 ` [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic decode Dr. David Alan Gilbert
2020-08-24 19:20 ` Babu Moger
2020-08-25 8:15 ` Dr. David Alan Gilbert
2020-08-25 14:38 ` Igor Mammedov
2020-08-25 15:25 ` Dr. David Alan Gilbert
2020-08-26 12:43 ` Igor Mammedov
2020-08-26 14:10 ` Dr. David Alan Gilbert
2020-08-27 21:19 ` Igor Mammedov
2020-08-27 22:58 ` Babu Moger
2020-08-28 8:42 ` Igor Mammedov
2020-08-28 14:22 ` Babu Moger
2020-08-28 8:48 ` Dr. David Alan Gilbert
2020-08-28 11:36 ` Igor Mammedov
2020-08-26 12:38 ` Igor Mammedov
2020-08-26 12:50 ` Daniel P. Berrangé
2020-08-26 13:30 ` Igor Mammedov
2020-08-26 13:36 ` Daniel P. Berrangé
2020-08-26 14:02 ` Igor Mammedov
2020-08-26 15:03 ` Daniel P. Berrangé
2020-08-26 15:18 ` Eduardo Habkost
2020-08-27 17:03 ` Igor Mammedov
2020-08-27 19:07 ` Eduardo Habkost
2020-08-27 20:55 ` Igor Mammedov
2020-08-28 8:55 ` Daniel P. Berrangé
2020-08-28 16:29 ` Eduardo Habkost
2020-08-28 16:32 ` Daniel P. Berrangé
2020-08-28 16:45 ` Eduardo Habkost
2020-08-28 18:00 ` Babu Moger
2020-08-26 17:17 ` Babu Moger
2020-08-26 18:33 ` Dr. David Alan Gilbert
2020-08-26 18:45 ` Babu Moger
2020-08-27 20:21 ` Igor Mammedov [this message]
2020-08-28 8:58 ` Daniel P. Berrangé
2020-08-28 11:24 ` Igor Mammedov
2020-08-28 14:17 ` Babu Moger
2020-08-28 14:48 ` Igor Mammedov
2020-08-26 14:04 ` Eduardo Habkost
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200827222110.1c2ee236@imammedo-mac \
--to=imammedo@redhat.com \
--cc=babu.moger@amd.com \
--cc=berrange@redhat.com \
--cc=dgilbert@redhat.com \
--cc=ehabkost@redhat.com \
--cc=mprivozn@redhat.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=rth@twiddle.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).