Re: [PATCH v6 1/2] qom: new object to associate device to numa node

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Dan Williams <dan.j.williams@intel.com>
To: David Hildenbrand <david@redhat.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Alex Williamson <alex.williamson@redhat.com>
Cc: Ankit Agrawal <ankita@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	"clg@redhat.com" <clg@redhat.com>,
	"shannon.zhaosl@gmail.com" <shannon.zhaosl@gmail.com>,
	"peter.maydell@linaro.org" <peter.maydell@linaro.org>,
	"ani@anisinha.ca" <ani@anisinha.ca>,
	"berrange@redhat.com" <berrange@redhat.com>,
	"eduardo@habkost.net" <eduardo@habkost.net>,
	"imammedo@redhat.com" <imammedo@redhat.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"eblake@redhat.com" <eblake@redhat.com>,
	"armbru@redhat.com" <armbru@redhat.com>,
	"gshan@redhat.com" <gshan@redhat.com>,
	Aniket Agashe <aniketa@nvidia.com>, Neo Jia <cjia@nvidia.com>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	"Tarun Gupta (SW-GPU)" <targupta@nvidia.com>,
	Vikram Sethi <vsethi@nvidia.com>,
	"Andy Currid" <acurrid@nvidia.com>,
	Dheeraj Nigam <dnigam@nvidia.com>, Uday Dhoke <udhoke@nvidia.com>,
	"qemu-arm@nongnu.org" <qemu-arm@nongnu.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [PATCH v6 1/2] qom: new object to associate device to numa node
Date: Wed, 10 Jan 2024 15:19:05 -0800	[thread overview]
Message-ID: <659f25e98bbb_5cee2945@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <16d54fd2-9bab-46cd-a1b7-9742674453d6@redhat.com>

David Hildenbrand wrote:
> On 09.01.24 17:52, Jonathan Cameron wrote:
> > On Thu, 4 Jan 2024 10:39:41 -0700
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> > 
> >> On Thu, 4 Jan 2024 16:40:39 +0000
> >> Ankit Agrawal <ankita@nvidia.com> wrote:
> >>
> >>> Had a discussion with RH folks, summary follows:
> >>>
> >>> 1. To align with the current spec description pointed by Jonathan, we first do
> >>>       a separate object instance per GI node as suggested by Jonathan. i.e.
> >>>       a acpi-generic-initiator would only link one node to the device. To
> >>>       associate a set of nodes, those number of object instances should be
> >>>       created.
> >>> 2. In parallel, we work to get the spec updated. After the update, we switch
> >>>      to the current implementation to link a PCI device with a set of NUMA
> >>>      nodes.
> >>>
> >>> Alex/Jonathan, does this sound fine?
> >>>    
> >>
> >> Yes, as I understand Jonathan's comments, the acpi-generic-initiator
> >> object should currently define a single device:node relationship to
> >> match the ACPI definition.
> > 
> > Doesn't matter for this, but it's a many_device:single_node
> > relationship as currently defined. We should be able to support that
> > in any new interfaces for QEMU.
> > 
> >>   Separately a clarification of the spec
> >> could be pursued that could allow us to reinstate a node list option
> >> for the acpi-generic-initiator object.  In the interim, a user can
> >> define multiple 1:1 objects to create the 1:N relationship that's
> >> ultimately required here.  Thanks,
> > 
> > Yes, a spec clarification would work, probably needs some text
> > to say a GI might not be an initiator as well - my worry is
> > theoretical backwards compatibility with a (probably
> > nonexistent) OS that assumes the N:1 mapping. So you may be in
> > new SRAT entry territory.
> > 
> > Given that, an alternative proposal that I think would work
> > for you would be to add a 'placeholder' memory node definition
> > in SRAT (so allow 0 size explicitly - might need a new SRAT
> > entry to avoid backwards compat issues).
> 
> Putting all the PCI/GI/... complexity aside, I'll just raise again that 
> for virtio-mem something simple like that might be helpful as well, IIUC.
> 
> 	-numa node,nodeid=2 \
> 	...
> 	-device virtio-mem-pci,node=2,... \
> 
> All we need is the OS to prepare for an empty node that will get 
> populated with memory later.
> 
> So if that's what a "placeholder" node definition in srat could achieve 
> as well, even without all of the other acpi-generic-initiator stuff, 
> that would be great.

Please no "placeholder" definitions in SRAT. One of the main thrusts of
CXL is to move away from static ACPI tables describing vendor-specific
memory topology, towards an industry standard device enumeration.

Platform firmware enumerates the platform CXL "windows" (ACPI CEDT
CFMWS) and the relative performance of the CPU access a CXL port (ACPI
HMAT Generic Port), everything else is CXL standard enumeration.

It is strictly OS policy about how many NUMA nodes it imagines it wants
to define within that playground. The current OS policy is one node per
"window". If a solution believes Linux should be creating more than that
I submit that's a discussion with OS policy developers, not a trip to
the BIOS team to please sprinkle in more placeholders. Linux can fully
own the policy here. The painful bit is just that it never had to
before.

next prev parent reply	other threads:[~2024-01-10 23:19 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-25  4:56 [PATCH v6 0/2] acpi: report numa nodes for device memory using GI ankita
2023-12-25  4:56 ` [PATCH v6 1/2] qom: new object to associate device to numa node ankita
2024-01-02 12:58   ` Jonathan Cameron via
2024-01-04  3:36     ` Ankit Agrawal
2024-01-04 12:33       ` Ankit Agrawal
2024-01-04 16:40       ` Ankit Agrawal
2024-01-04 17:39         ` Alex Williamson
2024-01-09 16:52           ` Jonathan Cameron via
2024-01-09 17:02             ` David Hildenbrand
2024-01-09 17:10               ` Jason Gunthorpe
2024-01-09 19:36                 ` Dan Williams
2024-01-09 19:38                   ` Jason Gunthorpe
2024-01-10 23:19               ` Dan Williams [this message]
2024-01-11  7:01                 ` Michael S. Tsirkin
2024-01-16 14:02                 ` Ankit Agrawal
2024-01-04 17:23       ` Alex Williamson
2024-01-09  4:21         ` Ankit Agrawal
2024-01-09 16:38       ` Jonathan Cameron via
2024-01-08 12:09   ` Markus Armbruster
2024-01-09  4:11     ` Ankit Agrawal
2024-01-09  7:02       ` Markus Armbruster
2023-12-25  4:56 ` [PATCH v6 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
2024-01-02 12:31 ` [PATCH v6 0/2] acpi: report numa nodes for device memory using GI Jonathan Cameron via
2024-01-04  3:05   ` Ankit Agrawal
2024-02-12 16:05     ` Michael S. Tsirkin
2024-02-13  3:32       ` Ankit Agrawal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=659f25e98bbb_5cee2945@dwillia2-xfh.jf.intel.com.notmuch \
    --to=dan.j.williams@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=acurrid@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=ani@anisinha.ca \
    --cc=aniketa@nvidia.com \
    --cc=ankita@nvidia.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=cjia@nvidia.com \
    --cc=clg@redhat.com \
    --cc=david@redhat.com \
    --cc=dnigam@nvidia.com \
    --cc=eblake@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=gshan@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=kwankhede@nvidia.com \
    --cc=mst@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=shannon.zhaosl@gmail.com \
    --cc=targupta@nvidia.com \
    --cc=udhoke@nvidia.com \
    --cc=vsethi@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).