From: Jonathan Cameron via qemu development <qemu-devel@nongnu.org>
To: Jonathan Cameron via qemu development <qemu-devel@nongnu.org>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>,
	Ankit Agrawal <ankita@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Vikram Sethi <vsethi@nvidia.com>,
	Shameer Kolothum Thodi <skolothumtho@nvidia.com>,
	"alex@shazbot.org" <alex@shazbot.org>,
	"anisinha@redhat.com" <anisinha@redhat.com>,
	Aniket Agashe <aniketa@nvidia.com>, Neo Jia <cjia@nvidia.com>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	"Tarun Gupta (SW-GPU)" <targupta@nvidia.com>,
	Zhi Wang <zhiw@nvidia.com>, Matt Ochs <mochs@nvidia.com>,
	Krishnakant Jaju <kjaju@nvidia.com>
Subject: Re: [PATCH v1 1/1] hw/acpi/pci.c: preserve generic initiator insertion order
Date: Tue, 24 Feb 2026 17:13:40 +0000
Message-ID: <20260224171340.00006613@huawei.com>
In-Reply-To: <20260224164116.00003fc0@huawei.com>

On Tue, 24 Feb 2026 16:41:16 +0000
Jonathan Cameron via qemu development <qemu-devel@nongnu.org> wrote:

> On Tue, 24 Feb 2026 16:22:56 +0000
> Ankit Agrawal <ankita@nvidia.com> wrote:
> 
> > >> Now the kernel parses them in the order in which they occur. A jumbled-up
> > >> sequence thus results in a jumbled-up assignment.
> > >
> > > But what is the actual failure mode here? So the numa IDs are all in a
> > > weird order, what goes wrong from that?    
> > 
> > This interferes with the ability to replicate the host's NUMA distance
> > topology in the VM through the qemu command line.
> > 
> > E.g. consider a NUMA system with 2 sockets each with a GPU.
> > 0,1 are the node ids for the sysmem on socket 0,1 respectively and
> > 2,3 are the node ids for the GPU memory on socket 0,1 respectively
> > dist(0,2) = X
> > dist(0,3) = Y
> > 
> > If we try to replicate this for the VM by passing qemu arguments with
> > 4 numa nodes and assigning numa distances similar to the host's, and, for
> > the sake of example, qemu mixes them up by putting the GI for 3 over 2, the
> > SLIT, which sets up the distances, still does so according to the original
> > order in the qemu command line.
> > https://github.com/qemu/qemu/blob/stable-10.2/hw/acpi/aml-build.c#L2040
> > 
> > This would lead to a different numa config, in terms of distance, within
> > the VM than the one intended through the qemu command line.
> 
> This is the case where I'd like to see an example of the tables before
> and after your patch.  If the SLIT is not correctly created with respect to
> PXMs (rather than the order of the commands) then we indeed have a QEMU bug
> that needs fixing.  However, I'm confused, as the SLIT should also not be
> ordered by command line if, say, the command line was:
> 
>        -object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=3 \
>        -object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=4 \
>        -object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=6 \
>        -object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \
>        -object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=2 \
>        -object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \
>        -object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \
>        -object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \
> 
> and numa stuff was something like
>        -numa dist,src=3,dst=0,val=100
>        -numa dist,src=4,dst=0,val=200
>        -numa dist,src=5,dst=0,val=300
>        -numa dist,src=6,dst=0,val=100
>        -numa dist,src=7,dst=0,val=200
>        -numa dist,src=8,dst=0,val=300
>        -numa dist,src=9,dst=0,val=100
> 
> Then it should be matching src numbers here to node in the GIs whatever the order.

I had a mess around and it seems the SLIT is stable to the ordering of the nodes
(based on a very minimal test, so I may well be missing something!). However,
because /sys/bus/node/devices/nodeX/distance is ordered by the PXM to kernel
NUMA node mapping (which, as you've observed, is first come, first served in
parsing order for GIs in new nodes), you will see an apparent reordering that
reflects the kernel NUMA node order.
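
As a rough illustration of that effect, here is a toy Python model. It is not
QEMU or kernel code: the function names, the three PXMs and the distance
values (40, 80, 120, with 10 for local) are all made up for the example, and
the first-seen node assignment is a simplification of what the kernel does.
The table is keyed by PXM, so it does not change with parse order; only the
logical node index that each PXM lands on changes.

```python
# Toy model only: hypothetical names and values, not QEMU or kernel code.

def assign_nodes(pxm_parse_order):
    """Map each PXM to a logical node ID in first-seen order."""
    pxm_to_node = {}
    for pxm in pxm_parse_order:
        if pxm not in pxm_to_node:
            pxm_to_node[pxm] = len(pxm_to_node)
    return pxm_to_node


def make_slit(pairs, pxms, local=10):
    """Build a symmetric, PXM-keyed distance table."""
    slit = {(p, p): local for p in pxms}
    for (a, b), val in pairs.items():
        slit[(a, b)] = val
        slit[(b, a)] = val
    return slit


def node_distances(slit, pxm_to_node):
    """Re-index the PXM-keyed table by logical node ID."""
    n = len(pxm_to_node)
    dist = [[None] * n for _ in range(n)]
    for (src, dst), val in slit.items():
        dist[pxm_to_node[src]][pxm_to_node[dst]] = val
    return dist


def distance_by_pxm(dist, pxm_to_node, src_pxm, dst_pxm):
    """Look up a distance via the PXM to node translation."""
    return dist[pxm_to_node[src_pxm]][pxm_to_node[dst_pxm]]


pxms = [0, 2, 3]
slit = make_slit({(0, 2): 40, (0, 3): 80, (2, 3): 120}, pxms)

in_order = assign_nodes([0, 2, 3])   # PXM 2 is seen before PXM 3
swapped = assign_nodes([0, 3, 2])    # PXM 3 is seen before PXM 2

row_in_order = node_distances(slit, in_order)[0]
row_swapped = node_distances(slit, swapped)[0]
print(row_in_order)  # node0 row with PXM 2 parsed first
print(row_swapped)   # node0 row with PXM 3 parsed first
```

In this model the node0 row differs between the two parse orders, while
distance_by_pxm() returns the same value for a given PXM pair in both cases.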

How do you associate the resulting NUMA node with a particular resource on your
GPU?  That mapping should also be by PXM, so I would expect it to refer to the
appropriate entry after PXM to node translation in the kernel, whatever order
the entries under /sys/bus/node/devices/nodeX end up in.

For extra fun I put my CPUs and memory on different nodes and that always ends
up mapped to the first node in Linux (assuming they are all on one node) with
appropriate reordering of the nodeX/distance entries.

Jonathan

