Re: [PATCH 1/5] x86: allow specifying the NUMA nodes Dom0 should run on


From: Dario Faggioli <dario.faggioli@citrix.com>
To: "JBeulich@suse.com" <JBeulich@suse.com>
Cc: "Keir (Xen.org)" <keir@xen.org>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Andrew Cooper <Andrew.Cooper3@citrix.com>,
	"Tim (Xen.org)" <tim@xen.org>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Ian Jackson <Ian.Jackson@citrix.com>
Subject: Re: [PATCH 1/5] x86: allow specifying the NUMA nodes Dom0 should run on
Date: Fri, 27 Feb 2015 10:04:30 +0000
Message-ID: <1425031468.10194.43.camel@citrix.com>
In-Reply-To: <54F03CE0020000780006472F@mail.emea.novell.com>



On Fri, 2015-02-27 at 08:46 +0000, Jan Beulich wrote:
> >>> On 26.02.15 at 18:14, <dario.faggioli@citrix.com> wrote:
> > On Thu, 2015-02-26 at 13:52 +0000, Jan Beulich wrote:
> >> +### dom0\_nodes
> >> +
> >> +> `= <integer>[,...]`
> >> +
> >> +Specify the NUMA nodes to place Dom0 on. Defaults for vCPU-s created
> >> +and memory assigned to Dom0 will be adjusted to match the node
> >> +restrictions set up here. Note that the values to be specified here are
> >> +ACPI PXM ones, not Xen internal node numbers.
> >> +
> > Why use PXM ids? It might be me being much more used to working with NUMA
> > node ids, but wouldn't the other way round be more consistent (almost
> > everything the user interacts with after boot speaks node ids) and easier
> > for the user to figure things out (e.g., with tools like numactl on
> > baremetal)?
> 
> This way behavior doesn't change if internally in the hypervisor we
> need to change the mapping from PXMs to node IDs.
> 
Ok, I see the value of this. I'm still a bit concerned that everything
else "speaks" NUMA node IDs, but it's probably just me being much more
used to those than to PXMs. :-)
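To make the PXM argument concrete, here is a minimal sketch of how a `dom0_nodes=<integer>[,...]` value could be parsed and translated to internal node IDs through a PXM-to-node table at boot. The table `pxm_to_node_map`, `reset_pxm_map()` and `parse_dom0_nodes()` are hypothetical stand-ins rather than the patch's code, and node sets are limited to 64 nodes so they fit in a `uint64_t`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_PXM   256   /* ACPI PXMs can be wider; 256 keeps the toy table small */
#define MAX_NODES 64    /* nodes representable in a uint64_t mask */
#define NO_NODE   0xff  /* "PXM not mapped to any node" */

/* Hypothetical PXM -> internal node table. In the hypervisor this role is
 * played by the mapping built while parsing the ACPI SRAT. */
static uint8_t pxm_to_node_map[MAX_PXM];

static void reset_pxm_map(void)
{
    for ( unsigned int i = 0; i < MAX_PXM; i++ )
        pxm_to_node_map[i] = NO_NODE;
}

/*
 * Parse "<integer>[,...]", where each integer is an ACPI PXM, and translate
 * it into a mask of internal node IDs using whatever mapping is in effect
 * at boot. Returns false on malformed input or on an unmapped PXM.
 */
static bool parse_dom0_nodes(const char *s, uint64_t *node_mask)
{
    uint64_t mask = 0;

    for ( ;; )
    {
        char *end;
        unsigned long pxm = strtoul(s, &end, 10);
        uint8_t node;

        if ( end == s || pxm >= MAX_PXM )
            return false;

        node = pxm_to_node_map[pxm];
        if ( node == NO_NODE || node >= MAX_NODES )
            return false;
        mask |= UINT64_C(1) << node;

        if ( *end == '\0' )
            break;
        if ( *end != ',' )
            return false;
        s = end + 1;
    }

    *node_mask = mask;
    return true;
}
```

Because only the table encodes the internal numbering, renumbering nodes inside the hypervisor changes which bits end up set, while `dom0_nodes=1` keeps selecting the same firmware proximity domain.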

> >> +static struct vcpu *__init setup_vcpu(struct domain *d, unsigned int vcpu_id,
> >> +                                      unsigned int cpu)
> >> +{
> >> +    struct vcpu *v = alloc_vcpu(d, vcpu_id, cpu);
> >> +
> >> +    if ( v )
> >> +    {
> >> +        if ( !d->is_pinned )
> >> +            cpumask_copy(v->cpu_hard_affinity, &dom0_cpus);
> >> +        cpumask_copy(v->cpu_soft_affinity, &dom0_cpus);
> >> +    }
> >> +
> > About this: for DomUs, now that we have soft affinity available, what we
> > do is set only soft affinity to match the NUMA placement. I think I see,
> > and agree with, why we want to be 'more strict' with Dom0, but I felt it
> > was worth pointing out the difference in behaviour (should it be
> > documented somewhere?).
> 
> I'm simply adjusting what sched_init_vcpu() did, which is alter
> hard affinity conditionally on is_pinned and soft affinity
> unconditionally.
> 
Ok, I understand the idea behind this better now, thanks.

So, with the following boot command line (i.e., with 'dom0_vcpus_pin'):
com1=115200,8n1 dom0_vcpus_pin dom0_nodes=1 sched=credit noreboot dom0_mem=512M,max:512M dom0_max_vcpus=4 console=com1

I get this:
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    8   -b-       4.4  8 / 8-15
Domain-0                             0     1    9   -b-       4.0  9 / 8-15
Domain-0                             0     2   10   r--       4.1  10 / 8-15
Domain-0                             0     3   11   -b-       3.6  11 / 8-15

With the following boot command line (i.e., without 'dom0_vcpus_pin'):
com1=115200,8n1 dom0_nodes=1 sched=credit noreboot dom0_mem=512M,max:512M dom0_max_vcpus=4 console=com1

I get this:
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    8   r--       4.3  8-15 / 8-15
Domain-0                             0     1   12   -b-       2.9  8-15 / 8-15
Domain-0                             0     2   10   r--       3.5  8-15 / 8-15
Domain-0                             0     3   14   -b-       2.9  8-15 / 8-15
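Both tables can be reproduced with a toy model of the patch's setup_vcpu() that uses 64-bit masks instead of Xen's cpumask_t. `toy_cpumask_t`, `struct toy_vcpu` and `toy_setup_vcpu()` are hypothetical. The creation-time defaults in step 1 are an assumption read off the output above (pinned vCPUs start with a single-pCPU hard affinity, unpinned ones with all CPUs).

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_cpumask_t;  /* hypothetical: bit n = pCPU n */

struct toy_vcpu {
    unsigned int processor;
    toy_cpumask_t hard, soft;
};

static void toy_setup_vcpu(struct toy_vcpu *v, unsigned int cpu,
                           bool is_pinned, toy_cpumask_t dom0_cpus)
{
    /* Step 1 (assumed): defaults set up when the vCPU is created. */
    v->processor = cpu;
    v->hard = is_pinned ? (UINT64_C(1) << cpu) : ~UINT64_C(0);
    v->soft = ~UINT64_C(0);

    /* Step 2 (as in the patch): hard affinity only if not pinned,
     * soft affinity unconditionally. */
    if ( !is_pinned )
        v->hard = dom0_cpus;
    v->soft = dom0_cpus;
}
```

With dom0_cpus covering pCPUs 8-15, the pinned case gives hard = {cpu}, soft = 8-15, and the unpinned case gives 8-15 for both. In either case soft & hard equals hard.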

Setting soft affinity to be a superset of (in the former case) or equal
to (in the latter) hard affinity is pure overhead, as far as the
scheduler is concerned.

In fact, if the scheduler sees that soft affinity is defined, it will go
through the load balancing/vCPU placement logic twice: the first time
using the soft affinity mask, the second using the hard affinity one.
Actually, the first time it uses 'soft & hard', which in these cases is
exactly equal to hard; that's why I'm calling this pure overhead.
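The two-step logic can be sketched as a toy idle-pCPU search: step 0 scans soft & hard, step 1 falls back to hard. This is a simplified model under stated assumptions, not the credit scheduler's code, and `toy_pick_idle_cpu()` and `toy_first_cpu()` are hypothetical. When soft & hard equals hard, step 1 can only repeat the scan step 0 already did.

```c
#include <stdint.h>

typedef uint64_t toy_cpumask_t;  /* hypothetical: bit n = pCPU n */

/* Lowest-numbered pCPU in the mask, or -1 if the mask is empty. */
static int toy_first_cpu(toy_cpumask_t m)
{
    for ( int cpu = 0; cpu < 64; cpu++ )
        if ( m & (UINT64_C(1) << cpu) )
            return cpu;
    return -1;
}

/*
 * Step 0 looks for an idle pCPU in soft & hard, step 1 in hard.
 * Returns the chosen pCPU, or -1 if none is idle; *steps reports how
 * many masks were scanned.
 */
static int toy_pick_idle_cpu(toy_cpumask_t hard, toy_cpumask_t soft,
                             toy_cpumask_t idle, unsigned int *steps)
{
    const toy_cpumask_t masks[2] = { soft & hard, hard };

    *steps = 0;
    for ( unsigned int step = 0; step < 2; step++ )
    {
        toy_cpumask_t candidates = masks[step] & idle;

        (*steps)++;
        if ( candidates )
            return toy_first_cpu(candidates);
    }
    return -1;
}
```

For example, with hard = soft = 8-15 and no idle pCPU in that range, both steps run over the same mask and both fail. With soft = 8-11 and an idle pCPU 8, step 0 expresses a real preference.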

I should probably add checks in the scheduler to identify such
situations as "no need to consider soft affinity". I thought about this
before, but didn't do it because it means more cpumask_foo() fiddling in
a few hot paths... but of course I can check the relationship between
the hard and soft affinity masks upfront, cache the result in a bool_t,
and use _that_ in the hot paths. What do you think?
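The caching idea could look roughly like this: compare the masks once, whenever affinity changes, and store the result in a flag the hot paths can read cheaply. `struct toy_vcpu`, `soft_aff_effective`, `toy_set_affinity()` and `toy_has_soft_affinity()` are hypothetical names, and treating a soft mask disjoint from hard as "not effective" is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_cpumask_t;  /* hypothetical: bit n = pCPU n */

struct toy_vcpu {
    toy_cpumask_t hard, soft;
    bool soft_aff_effective;     /* hypothetical cached flag */
};

/*
 * Called whenever either mask changes (an infrequent operation), so the
 * mask comparison happens here rather than in the scheduler's hot paths.
 * Soft affinity can only influence placement if soft & hard is non-empty
 * and strictly smaller than hard.
 */
static void toy_set_affinity(struct toy_vcpu *v, toy_cpumask_t hard,
                             toy_cpumask_t soft)
{
    toy_cpumask_t both = hard & soft;

    v->hard = hard;
    v->soft = soft;
    v->soft_aff_effective = both != 0 && both != hard;
}

/* Hot path: read the cached flag instead of doing cpumask operations. */
static bool toy_has_soft_affinity(const struct toy_vcpu *v)
{
    return v->soft_aff_effective;
}
```

Both configurations from the tables (pinned, and unpinned with equal masks) yield false here, so the soft-affinity step would be skipped.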

All this being said, I would still avoid putting the system in a
configuration where soft affinity is a superset of, or equal to, hard
affinity, at least not automatically, as I think it can be confusing to
the user (the user can, of course, do that after boot, for Dom0 or DomUs,
but that's another story, I think). So I'm now wondering whether it
would be better, in this patch, to leave soft affinity alone completely.

Then, if we want to make it possible to tweak soft affinity, we could
allow something like "dom0_nodes=soft:1,3" and, in that case, alter
soft affinity only.
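A minimal sketch of how the proposed `[soft:]<integer>[,...]` syntax might be parsed and applied, assuming the plain form restricts only hard affinity (leaving soft alone, as suggested above) and the `soft:` form touches only soft affinity. The syntax is only a proposal. `struct dom0_nodes_opt`, `parse_dom0_nodes_opt()` and `toy_apply()` are hypothetical, and the PXM-to-node/CPU translation is left out (the caller passes the resulting CPU mask).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t toy_cpumask_t;  /* hypothetical: bit n = pCPU n */

struct dom0_nodes_opt {
    bool soft_only;     /* "soft:" prefix given: touch soft affinity only */
    uint64_t pxms;      /* requested PXMs 0..63 */
};

/* Parse "[soft:]<integer>[,...]"; returns false on malformed input. */
static bool parse_dom0_nodes_opt(const char *s, struct dom0_nodes_opt *opt)
{
    static const char prefix[] = "soft:";
    const size_t plen = sizeof(prefix) - 1;

    opt->soft_only = strncmp(s, prefix, plen) == 0;
    if ( opt->soft_only )
        s += plen;
    opt->pxms = 0;

    for ( ;; )
    {
        char *end;
        unsigned long pxm = strtoul(s, &end, 10);

        if ( end == s || pxm >= 64 )
            return false;
        opt->pxms |= UINT64_C(1) << pxm;
        if ( *end == '\0' )
            return true;
        if ( *end != ',' )
            return false;
        s = end + 1;
    }
}

/* Plain form: restrict hard affinity (unless pinned); "soft:" form:
 * set soft affinity only. node_cpus is the CPU set of the chosen nodes. */
static void toy_apply(toy_cpumask_t *hard, toy_cpumask_t *soft,
                      bool is_pinned, const struct dom0_nodes_opt *opt,
                      toy_cpumask_t node_cpus)
{
    if ( opt->soft_only )
        *soft = node_cpus;
    else if ( !is_pinned )
        *hard = node_cpus;
}
```

Keeping the two forms mutually exclusive means the default never produces a soft mask that merely duplicates hard affinity.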

> > BTW, mostly out of curiosity, I've had a few strange issues/conflicts in
> > applying this on top of staging, in order to test it... Was it me doing
> > something very stupid, or was this based on something different?
> 
> Apart from the one patch named in the cover letter there shouldn't
> be any other dependencies. Without you naming the issues you
> encountered, I can't tell.
> 
I see. Never mind then, maybe I messed up my various branches...
Sorry for bothering you with this. :-)

Regards,
Dario


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
