From: Igor Mammedov <imammedo@redhat.com>
To: Andrew Jones <drjones@redhat.com>
Cc: peter.maydell@linaro.org, Gavin Shan <gshan@redhat.com>,
ehabkost@redhat.com, robh@kernel.org, qemu-devel@nongnu.org,
qemu-arm@nongnu.org, shan.gavin@gmail.com
Subject: Re: [PATCH 1/2] numa: Set default distance map if needed
Date: Tue, 12 Oct 2021 15:53:21 +0200
Message-ID: <20211012155321.256e8867@redhat.com>
In-Reply-To: <20211012131308.45j7ofd4xwk42epv@gator>
On Tue, 12 Oct 2021 15:13:08 +0200
Andrew Jones <drjones@redhat.com> wrote:
> On Tue, Oct 12, 2021 at 02:27:54PM +0200, Igor Mammedov wrote:
> > On Tue, 12 Oct 2021 12:37:54 +0200
> > Andrew Jones <drjones@redhat.com> wrote:
> >
> > > On Tue, Oct 12, 2021 at 11:40:16AM +0200, Igor Mammedov wrote:
> > > > On Wed, 6 Oct 2021 18:22:08 +0800
> > > > Gavin Shan <gshan@redhat.com> wrote:
> > > >
> > > > > > The following option is used to specify the distance map. The
> > > > > > user may not provide it, in which case the distance map isn't
> > > > > > populated and exposed to the platform. On the other hand, empty
> > > > > > NUMA nodes, where no memory resides, are allowed on the ARM64
> > > > > > virt platform. For these empty NUMA nodes, the corresponding
> > > > > > device-tree nodes aren't populated, but their NUMA IDs should
> > > > > > be included in the "/distance-map" device-tree node, so that
> > > > > > the kernel can probe them properly if device-tree is used.
> > > > >
> > > > > > -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> > > > >
> > > > > > So when the user doesn't specify a distance map, we need to
> > > > > > generate the default distance map, where the local and remote
> > > > > > distances are 10 and 20 respectively. This adds an extra
> > > > > > parameter to the existing complete_init_numa_distance() to
> > > > > > generate the default distance map for this case.
> > > > >
> > > > > Signed-off-by: Gavin Shan <gshan@redhat.com>
> > > >
> > > >
> > > > > How about erroring out if a distance map is required but
> > > > > not provided by the user explicitly, and asking the user to
> > > > > fix the command line?
> > > >
> > > > > The reasoning behind this is that defaults are hard to maintain,
> > > > > will require compat hacks, and become road blocks down
> > > > > the road.
> > > > > The approach I have been taking with generic NUMA code is
> > > > > deprecating defaults and replacing them with sanity checks, which
> > > > > bail out on incorrect configuration and ask the user to correct
> > > > > the command line. Hence I dislike the approach taken in this patch.
> > > >
> > > > > If you really wish to provide a default, push it out of
> > > > > generic code into ARM-specific code
> > > > > (then I won't oppose it that much; I think PPC does
> > > > > some magic like this).
> > > > > Also, the behavior seems to be ARM specific, so generic
> > > > > NUMA code isn't the place for it anyway.
> > >
> > > The distance-map DT node and the default 10/20 distance-map values
> > > aren't arch-specific. RISCV is using it too.
> > >
> > > I'm on the fence with this. I see erroring-out to require users
> > > to provide explicit command lines as a good thing, but I also
> > > see it as potentially an unnecessary burden for those that want
> > > the default map anyway. The optional nature of the distance-map
> > > node and the specification of the default map is here [1]
> > >
> > > [1] Linux source: Documentation/devicetree/bindings/numa.txt
> >
> > Looking at the proposed Linux patches [ https://lkml.org/lkml/2021/9/27/31 ],
> > using the optional distance table as the source for numa-node-ids
> > looks like a hack around the kernel's inability to fish them out
> > of CPU and/or PCI nodes (using those nodes as the source should
> > cover the memory-less node use-case).
> >
> > I consider including an optional node a policy decision.
> > So the user should include it explicitly on the QEMU command line
> > if necessary (that works just fine for x86), or the guest OS
> > can make up defaults on its own in the absence of data.
>
> OK, so erroring-out on configs that must provide distance-maps, rather
> than automatically generating them for all configs is better.
>
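A minimal sketch of the two options being compared (Python used as
pseudocode, this is not QEMU code). The function names, the node_mem and
distances arguments, and the rule "a map is required as soon as one node
has no memory" are assumptions made for illustration; only the 10/20
values come from the binding document.

```python
# Default distances named in the thread (Linux numa.txt binding).
LOCAL_DISTANCE = 10
REMOTE_DISTANCE = 20


def default_distance_map(num_nodes):
    """What the patch would generate: 10 on the diagonal, 20 elsewhere."""
    return [
        [LOCAL_DISTANCE if src == dst else REMOTE_DISTANCE
         for dst in range(num_nodes)]
        for src in range(num_nodes)
    ]


def check_distance_map(node_mem, distances):
    """The alternative: a sanity check that bails out instead of guessing.

    node_mem  -- hypothetical list of memory sizes, indexed by node ID
    distances -- hypothetical dict {(src, dst): val} built from -numa dist
    """
    empty_nodes = [nid for nid, size in enumerate(node_mem) if size == 0]
    # Assumed rule: memory-less nodes make an explicit map mandatory.
    if empty_nodes and not distances:
        raise ValueError(
            f"memory-less NUMA node(s) {empty_nodes} need an explicit "
            "distance map; add -numa dist,src=<id>,dst=<id>,val=<distance>"
        )
```

With the check, a config such as node_mem = [1024, 0] and no dist
options is rejected, while configs without empty nodes are untouched.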
> >
> > > So, my r-b stands for this patch, but I also wouldn't complain
> > > about respinning it to error out instead.
> >
> > > I would complain about
> > > moving the logic to Arm specific code, though, since RISCV would
> > > then need to duplicate it.
> >
> > Instead of putting workarounds in QEMU and then making them generic,
> > I'd prefer to:
> > 1. make QEMU able to generate a DT with memory-less nodes
>
> How? DT syntax doesn't allow this, because each node needs a unique
> name which is derived from its base address, which an empty numa
You are talking about memory@foo nodes, aren't you?
> node doesn't have.
Looking at Documentation/devicetree/bindings/numa.txt,
mem/cpu/pci nodes also contain the numa-node-id property,
so the idea is to collect IDs from all present sources
instead of abusing the distance map.
That would allow QEMU to skip memory@foo elements for
memory-less nodes, because they obviously do not exist
and there is no way to describe them using 'memory' nodes.
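
Roughly, the collection step could look like the sketch below (Python
used as pseudocode, not guest kernel code). The nested-dict tree, its
node names and the helper collect_numa_node_ids() are made up for
illustration; the only thing taken from the binding is that memory, cpu
and pci nodes can each carry numa-node-id.

```python
def collect_numa_node_ids(node):
    """Walk a toy device tree and gather every numa-node-id found."""
    ids = set()
    for key, value in node.items():
        if key == "numa-node-id":
            ids.add(value)
        elif isinstance(value, dict):
            # Child node: recurse, whatever its device_type is.
            ids |= collect_numa_node_ids(value)
    return ids


# Hypothetical tree: only node 0 has memory, nodes 1 and 2 are
# memory-less and therefore have no memory@... element at all.
toy_dt = {
    "memory@40000000": {"device_type": "memory", "numa-node-id": 0},
    "cpus": {
        "cpu@0": {"device_type": "cpu", "numa-node-id": 0},
        "cpu@1": {"device_type": "cpu", "numa-node-id": 1},
    },
    "pcie@10000000": {"device_type": "pci", "numa-node-id": 2},
}

node_ids = sorted(collect_numa_node_ids(toy_dt))
```

Here nodes 1 and 2 are discovered from the cpu and pci entries alone,
without a distance map.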
> > 2. fix guest to get numa-node-id from CPU/PCI nodes if
> > memory node isn't present,
>
> I'm not sure that's possible with DT. If it is, then proposing it
> upstream to Linux DT maintainers would be the next step.
Added Rob to CC.
>
> > or use ACPI tables, which can
> > describe memory-less NUMA nodes, if fixing how DT is
> > parsed is infeasible.
>
> We use ACPI already for our guests, but we also generate a DT (which
> edk2 consumes). We can't generate a valid DT when empty numa nodes
Does edk2 actually use NUMA info from QEMU?
> are put on the command line unless we follow a DT spec saying how
> to do that. The current spec says we should have a distance-map
> that contains those nodes.
Can you point to the spec and the place within it, please?
> Thanks,
> drew
>