From: He Chen <he.chen@linux.intel.com>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>,
"Michael S . Tsirkin" <mst@redhat.com>,
Markus Armbruster <armbru@redhat.com>,
qemu-devel@nongnu.org, Igor Mammedov <imammedo@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Chao Peng <chao.p.peng@linux.intel.com>,
Richard Henderson <rth@twiddle.net>,
Eric Blake <eblake@redhat.com>
Subject: Re: [Qemu-devel] [RFC] x86: Allow to set NUMA distance for different NUMA nodes
Date: Tue, 7 Mar 2017 09:50:38 +0800
Message-ID: <20170307015038.GA17728@he>
In-Reply-To: <20170303171050.GA2778@thinpad.lan.raisama.net>
On Fri, Mar 03, 2017 at 02:10:50PM -0300, Eduardo Habkost wrote:
> On Fri, Mar 03, 2017 at 04:52:18PM +0000, Daniel P. Berrange wrote:
> > On Fri, Mar 03, 2017 at 01:47:51PM -0300, Eduardo Habkost wrote:
> > > On Fri, Mar 03, 2017 at 04:26:12PM +0000, Daniel P. Berrange wrote:
> > > > On Fri, Mar 03, 2017 at 10:09:22AM -0600, Eric Blake wrote:
> > > > > On 03/03/2017 07:57 AM, Eduardo Habkost wrote:
> > > > >
> > > > > >> With this patch, when a user wants to create a guest that contains
> > > > > >> several vNUMA nodes and also wants to set distance among those nodes,
> > > > > > >> the QEMU command would look like:
> > > > > >>
> > > > > >> ```
> > > > > >> -object memory-backend-ram,size=1G,prealloc=yes,host-nodes=0,policy=bind,id=node0 \
> > > > > > >> -numa node,nodeid=0,cpus=0,memdev=node0,distance=10,distance=21,distance=31,distance=41 \
> > > > > > >> ```
> > > > >
> > > > > >
> > > > > > It would be nice to have a more intuitive syntax to represent
> > > > > > ordered lists in QemuOpts. But this is what we have today.
> > > > > >
> > > > >
> > > > > Markus has the discussion on representing arrays via the command line;
> > > > > particularly since this array is very tightly coupled to the order in
> > > > > which values are presented, it may be worth having:
> > > > >
> > > > > > -numa node,nodeid=0,cpus=0,memdev=node0,distance.0=10,distance.1=21,distance.2=31,distance.3=41
> > > > >
> > > > > with the explicit distance.0= suffixes to distance making it more
> > > > > obvious that we are dealing with an array.
> > > > >
> > > > > > I think the proposal makes sense. I would like the semantics of the new option
> > > > > > to be documented at qapi-schema.json and qemu-options.hx.
> > > > > >
> > > > > > I would call the new NumaNodeOptions field "distances", as it is
> > > > > > a list of distances.
> > > > >
> > > > > Indeed, Markus is trying (with his work on -blockdev for 2.9) to get the
> > > > > command line to a point where it is identical to the QMP code, by
> > > > > reusing qapi-schema.json, so we should very much keep that in mind with
> > > > > whatever we add to -numa in 2.10.
> > > > >
> > > > >
> > > > > > but in the future we could support something like:
> > > > > >
> > > > > > -numa node,nodeid=0,cpus=0,memdev=node0 \
> > > > > > -numa node,nodeid=1,cpus=1,memdev=node1 \
> > > > > > -numa node,nodeid=2,cpus=2,memdev=node2 \
> > > > > > -numa node,nodeid=3,cpus=3,memdev=node3 \
> > > > > > -numa distances,distances[0][0]=10,distances[0][1]=21,distances[0][2]=31,distances[0][3]=41,\
> > > > > > distances[1][0]=21,distances[1][1]=10,distances[1][2]=21,distances[1][3]=31,\
> > > > > > distances[2][0]=31,distances[2][1]=21,distances[2][2]=10,distances[2][3]=21,\
> > > > > > distances[3][0]=41,distances[3][1]=31,distances[3][2]=21,distances[3][3]=10
> > > > >
> > > > > Except that [] requires special shell quoting, so the proposal would be
> > > > > more like:
> > > > >
> > > > > -numa distances.0.0=10,distances.0.1=21
> > > > >
> > > > > Right now, QMP doesn't support 2-D arrays (although this may be a good
> > > > > reason to introduce support), so that's also something to think about
> > > > > (not insurmountable, but makes the task more complex).
> > > >
> > > > What I don't like about this syntax is that it is duplicating information
> > > > twice. IIUC the NUMA distance information is unidirectional, so specifying
> > > > > the same data for both directions (node 0 -> node 3, and node 3 -> node 0)
> > > > > looks like overkill. Also the self-node distance is defined to always be
> > > > > 10 IIUC, so specifying that is not required. IOW, we could cut down the data
> > > > > we need to provide to just
> > > >
> > > > -numa distances,nodea=0,nodeb=1,value=20
> > > > -numa distances,nodea=0,nodeb=2,value=20
> > > > -numa distances,nodea=0,nodeb=3,value=20
> > > > -numa distances,nodea=1,nodeb=2,value=20
> > > > -numa distances,nodea=1,nodeb=3,value=20
> > > > -numa distances,nodea=2,nodeb=3,value=20
> > >
> > > The ACPI spec (I'm looking at revision 5.0) explicitly mentions
> > > > that A->B distance may be different from B->A distance:
> > >
> > > "The entry value is a one-byte unsigned integer. The relative
> > > distance from System Locality i to System Locality j is the
> > > i*N + j entry in the matrix, where N is the number of System
> > > Localities. Except for the relative distance from a System
> > > Locality to itself, each relative distance is stored twice in the
> > > matrix. This provides the capability to describe the scenario
> > > where the relative distances for the two directions between
> > > System Localities is different."
> >
> > > Ah interesting, learn something new every day. I've only made
> > that unidirectional assumption for the last 10 years ;-P
> >
> > > But I agree we could figure out a more compact syntax for more
> > > common cases where self-node distance is 10 and distance is the
> > > same both ways.
> >
> > QAPI would need a specialized numeric matrix type, which we could
> > efficiently map into some CLI syntax, in order to avoid needing to
> > tickle the rather verbose general purpose list syntax. Probably
> > > not worth the hassle though, compared with just picking shorter
> > > variable names, e.g.
> >
> > -numa dist,a=0,b=1,val=3
> >
> > instead of
> >
> > -numa distances,nodea=0,nodeb=1,value=20
>
> Whatever syntax/names we choose, we could have reasonable
> defaults for omitted values:
>
> * If A->B is set and B->A is omitted, use the same value for both
> A->B and B->A
> * If A->A is omitted, use min(10, configured_distances)
>
> This way, the previous example:
>
> -numa distances,distances.0.0=10,distances.0.1=21,distances.0.2=31,distances.0.3=41,\
> distances.1.0=21,distances.1.1=10,distances.1.2=21,distances.1.3=31,\
> distances.2.0=31,distances.2.1=21,distances.2.2=10,distances.2.3=21,\
> distances.3.0=41,distances.3.1=31,distances.3.2=21,distances.3.3=10
>
> could be written as:
>
> -numa distances,distances.0.1=21,distances.0.2=31,distances.0.3=41,\
> distances.1.2=21,distances.1.3=31,\
> distances.2.3=21
>
It seems that the dotted key convention is not supported yet.
So which syntax do you think is appropriate for NUMA distances?
Maybe I could implement something like `-numa dist,a=0,b=1,val=21` first,
then switch to the dotted key convention once it gets merged?
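
To make the semantics concrete, here is a minimal sketch (Python, not QEMU
code) of how the pairwise syntax, combined with the defaults suggested above,
could expand into the full matrix. The function names, the exact
`dist,a=,b=,val=` spelling, and the fixed self-distance of 10 are assumptions
for illustration only. Pairs that are never configured in either direction
are left as None.

```python
def parse_dist_option(optarg):
    """Parse a hypothetical 'dist,a=0,b=1,val=21' option into (a, b, val)."""
    kind, _, rest = optarg.partition(",")
    if kind != "dist":
        raise ValueError(f"not a dist option: {optarg!r}")
    fields = dict(item.split("=", 1) for item in rest.split(","))
    return int(fields["a"]), int(fields["b"]), int(fields["val"])


def build_distance_matrix(num_nodes, optargs, self_distance=10):
    """Expand pairwise options into a flat row-major distance matrix.

    The distance from node i to node j is stored at index i * num_nodes + j,
    following the layout quoted from the ACPI spec above.
    """
    matrix = [None] * (num_nodes * num_nodes)
    explicit = set()

    for optarg in optargs:
        a, b, val = parse_dist_option(optarg)
        if not (0 <= a < num_nodes and 0 <= b < num_nodes):
            raise ValueError(f"node out of range in {optarg!r}")
        if not 0 <= val <= 255:
            # Each entry is a one-byte unsigned integer.
            raise ValueError(f"distance out of range in {optarg!r}")
        matrix[a * num_nodes + b] = val
        explicit.add((a, b))

    # Default 1: if A->B is set and B->A is omitted, reuse the A->B value.
    for a, b in list(explicit):
        if (b, a) not in explicit:
            matrix[b * num_nodes + a] = matrix[a * num_nodes + b]

    # Default 2 (assumption): an omitted A->A distance becomes self_distance.
    for i in range(num_nodes):
        if (i, i) not in explicit:
            matrix[i * num_nodes + i] = self_distance

    return matrix


opts = [
    "dist,a=0,b=1,val=21", "dist,a=0,b=2,val=31", "dist,a=0,b=3,val=41",
    "dist,a=1,b=2,val=21", "dist,a=1,b=3,val=31",
    "dist,a=2,b=3,val=21",
]
matrix = build_distance_matrix(4, opts)
for i in range(4):
    print(matrix[i * 4:(i + 1) * 4])
```

With the six options above, the result is the same 4x4 matrix as the fully
spelled-out 16-entry example. An asymmetric pair can still be expressed by
giving both directions explicitly.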
Thread overview: 11+ messages
2017-03-03 5:01 [Qemu-devel] [RFC] x86: Allow to set NUMA distance for different NUMA nodes He Chen
2017-03-03 5:19 ` no-reply
2017-03-03 13:57 ` Eduardo Habkost
2017-03-03 16:09 ` Eric Blake
2017-03-03 16:26 ` Daniel P. Berrange
2017-03-03 16:47 ` Eduardo Habkost
2017-03-03 16:52 ` Daniel P. Berrange
2017-03-03 17:10 ` Eduardo Habkost
2017-03-03 17:12 ` Daniel P. Berrange
2017-03-07 1:50 ` He Chen [this message]
2017-03-07 7:37 ` Markus Armbruster