* Fwd: [PATCH RESEND 01/12] xen: numa-sched: leave node-affinity alone if not in "auto" mode
From: Jan Beulich @ 2013-11-12 8:11 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
Hi Keir,
below the one remaining patch mentioned yesterday.
Jan
>>> On 05.11.13 at 15:34, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> If the domain's NUMA node-affinity is being specified by the
> user/toolstack (instead of being automatically computed by Xen),
> we really should stick to that. This means domain_update_node_affinity()
> is wrong to filter nodes out of that mask even in "!auto" mode.
>
> This commit fixes that. Of course, this does not mean node-affinity
> is always honoured (e.g., a vcpu won't run on a pcpu of a different
> cpupool) but the necessary logic for taking into account all the
> possible situations lives in the scheduler code, where it belongs.
>
> What could happen without this change is that, under certain
> circumstances, the node-affinity of a domain may change when the
> user modifies the vcpu-affinity of the domain's vcpus. This, even
> if probably not a real bug, is at least something the user does
> not expect, so let's avoid it.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
> ---
> This has been submitted already as a single patch on its own.
> Since this series needs the change done here, just include it
> in here, instead of pinging the original submission and deferring
> posting this series.
> ---
> xen/common/domain.c | 28 +++++++++-------------------
> 1 file changed, 9 insertions(+), 19 deletions(-)
>
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 5999779..af31ab4 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -352,7 +352,6 @@ void domain_update_node_affinity(struct domain *d)
> cpumask_var_t cpumask;
> cpumask_var_t online_affinity;
> const cpumask_t *online;
> - nodemask_t nodemask = NODE_MASK_NONE;
> struct vcpu *v;
> unsigned int node;
>
> @@ -374,28 +373,19 @@ void domain_update_node_affinity(struct domain *d)
> cpumask_or(cpumask, cpumask, online_affinity);
> }
>
> + /*
> + * If d->auto_node_affinity is true, the domain's node-affinity mask
> + * (d->node_affinity) is automatically computed from all the domain's
> + * vcpus' vcpu-affinity masks (the union of which we have just built
> + * above in cpumask). OTOH, if d->auto_node_affinity is false, we
> + * must leave the node-affinity of the domain alone.
> + */
> if ( d->auto_node_affinity )
> {
> - /* Node-affinity is automaically computed from all vcpu-affinities */
> + nodes_clear(d->node_affinity);
> for_each_online_node ( node )
> if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
> - node_set(node, nodemask);
> -
> - d->node_affinity = nodemask;
> - }
> - else
> - {
> - /* Node-affinity is provided by someone else, just filter out cpus
> - * that are either offline or not in the affinity of any vcpus. */
> - nodemask = d->node_affinity;
> - for_each_node_mask ( node, d->node_affinity )
> - if ( !cpumask_intersects(&node_to_cpumask(node), cpumask) )
> - node_clear(node, nodemask);//d->node_affinity);
> -
> - /* Avoid loosing track of node-affinity because of a bad
> - * vcpu-affinity has been specified. */
> - if ( !nodes_empty(nodemask) )
> - d->node_affinity = nodemask;
> + node_set(node, d->node_affinity);
> }
>
> sched_set_node_affinity(d, &d->node_affinity);
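For reference, here is a sketch of domain_update_node_affinity() as it reads
with the hunks above applied. It is reconstructed from the diff and its
context lines; everything outside the hunks (notably the spinlock the real
function takes around the update) is a best-effort assumption, not a
verbatim copy of the tree:

    void domain_update_node_affinity(struct domain *d)
    {
        cpumask_var_t cpumask;
        cpumask_var_t online_affinity;
        const cpumask_t *online = cpupool_online_cpumask(d->cpupool);
        struct vcpu *v;
        unsigned int node;

        if ( !zalloc_cpumask_var(&cpumask) )
            return;
        if ( !alloc_cpumask_var(&online_affinity) )
        {
            free_cpumask_var(cpumask);
            return;
        }

        /* Build the union of all vcpus' vcpu-affinities, restricted
         * to the pcpus that are online in the domain's cpupool. */
        for_each_vcpu ( d, v )
        {
            cpumask_and(online_affinity, v->cpu_affinity, online);
            cpumask_or(cpumask, cpumask, online_affinity);
        }

        /* d->node_affinity is recomputed only in "auto" mode; a
         * user/toolstack-provided mask is now left completely alone. */
        if ( d->auto_node_affinity )
        {
            nodes_clear(d->node_affinity);
            for_each_online_node ( node )
                if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
                    node_set(node, d->node_affinity);
        }

        sched_set_node_affinity(d, &d->node_affinity);

        free_cpumask_var(online_affinity);
        free_cpumask_var(cpumask);
    }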
* Re: [PATCH RESEND 01/12] xen: numa-sched: leave node-affinity alone if not in "auto" mode
From: Keir Fraser @ 2013-11-12 8:35 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On 12/11/2013 08:11, "Jan Beulich" <JBeulich@suse.com> wrote:
> Hi Keir,
>
> below the one remaining patch mentioned yesterday.
>
> Jan
>
>>>> On 05.11.13 at 15:34, Dario Faggioli <dario.faggioli@citrix.com> wrote:
>> If the domain's NUMA node-affinity is being specified by the
>> user/toolstack (instead of being automatically computed by Xen),
>> we really should stick to that. This means domain_update_node_affinity()
>> is wrong to filter nodes out of that mask even in "!auto" mode.
>>
>> This commit fixes that. Of course, this does not mean node-affinity
>> is always honoured (e.g., a vcpu won't run on a pcpu of a different
>> cpupool) but the necessary logic for taking into account all the
>> possible situations lives in the scheduler code, where it belongs.
>>
>> What could happen without this change is that, under certain
>> circumstances, the node-affinity of a domain may change when the
>> user modifies the vcpu-affinity of the domain's vcpus. This, even
>> if probably not a real bug, is at least something the user does
>> not expect, so let's avoid it.
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
>> ---
>> This has been submitted already as a single patch on its own.
>> Since this series needs the change done here, just include it
>> in here, instead of pinging the original submission and deferring
>> posting this series.
>> ---
>> xen/common/domain.c | 28 +++++++++-------------------
>> 1 file changed, 9 insertions(+), 19 deletions(-)
>>
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index 5999779..af31ab4 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -352,7 +352,6 @@ void domain_update_node_affinity(struct domain *d)
>> cpumask_var_t cpumask;
>> cpumask_var_t online_affinity;
>> const cpumask_t *online;
>> - nodemask_t nodemask = NODE_MASK_NONE;
>> struct vcpu *v;
>> unsigned int node;
>>
>> @@ -374,28 +373,19 @@ void domain_update_node_affinity(struct domain *d)
>> cpumask_or(cpumask, cpumask, online_affinity);
>> }
>>
>> + /*
>> + * If d->auto_node_affinity is true, the domain's node-affinity mask
>> + * (d->node_affinity) is automatically computed from all the domain's
>> + * vcpus' vcpu-affinity masks (the union of which we have just built
>> + * above in cpumask). OTOH, if d->auto_node_affinity is false, we
>> + * must leave the node-affinity of the domain alone.
>> + */
>> if ( d->auto_node_affinity )
>> {
>> - /* Node-affinity is automaically computed from all vcpu-affinities */
>> + nodes_clear(d->node_affinity);
>> for_each_online_node ( node )
>> if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
>> - node_set(node, nodemask);
>> -
>> - d->node_affinity = nodemask;
>> - }
>> - else
>> - {
>> - /* Node-affinity is provided by someone else, just filter out cpus
>> - * that are either offline or not in the affinity of any vcpus. */
>> - nodemask = d->node_affinity;
>> - for_each_node_mask ( node, d->node_affinity )
>> - if ( !cpumask_intersects(&node_to_cpumask(node), cpumask) )
>> - node_clear(node, nodemask);//d->node_affinity);
>> -
>> - /* Avoid loosing track of node-affinity because of a bad
>> - * vcpu-affinity has been specified. */
>> - if ( !nodes_empty(nodemask) )
>> - d->node_affinity = nodemask;
>> + node_set(node, d->node_affinity);
>> }
>>
>> sched_set_node_affinity(d, &d->node_affinity);
>
>
>
* [PATCH RESEND 00/12] Implement per-vcpu NUMA node-affinity for credit1
From: Dario Faggioli @ 2013-11-05 14:33 UTC (permalink / raw)
To: xen-devel
Cc: Marcus Granado, Keir Fraser, Ian Campbell, Li Yechen,
George Dunlap, Andrew Cooper, Juergen Gross, Ian Jackson,
Jan Beulich, Justin Weaver, Daniel De Graaf, Matt Wilson,
Elena Ufimtseva
Hi,
This is basically a resend of
http://lists.xen.org/archives/html/xen-devel/2013-10/msg00164.html, as someone
requested, for easier review. Nothing has changed, apart from the fact that
I removed from the series the patches that have already been applied.
The git branch is now this one:
git://xenbits.xen.org/people/dariof/xen.git numa/per-vcpu-affinity-RESEND
The original cover letter also follows.
Thanks and Regards,
Dario
---
Hi everyone,
So, this series introduces the concept of per-vcpu NUMA node-affinity. In fact,
up to now, node-affinity has only been "per-domain". That means it was the
domain that had a node-affinity and:
- that node-affinity was used to decide where to allocate the memory for the
domain;
- that node-affinity was used to decide on what nodes _all_ the vcpus of the
domain prefer to be scheduled.
After this series, this changes as follows:
- each vcpu of a domain has (well, may have) its own node-affinity, and that
is what is used to determine (if the credit1 scheduler is used) where each
specific vcpu prefers to run;
- the node-affinity of the whole domain is the _union_ of all the
node-affinities of the domain's vcpus (see the sketch after this list);
- the memory is still allocated according to the node-affinity of the whole
domain (so, the union of vcpu node-affinities, as said above).
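To make the union computation above concrete, here is a minimal sketch in
the style of the hypervisor code; the per-vcpu node_affinity field and the
helper name are assumptions based on this description, not necessarily the
exact identifiers the series introduces:

    /* Illustrative only: compute a domain's node-affinity as the union
     * of the node-affinities of all its vcpus. */
    static void domain_node_affinity_from_vcpus(struct domain *d)
    {
        struct vcpu *v;

        nodes_clear(d->node_affinity);
        for_each_vcpu ( d, v )
            nodes_or(d->node_affinity, d->node_affinity, v->node_affinity);
    }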
In practice, it's not such a big change: I'm just extending to the per-vcpu
level what we already had at the domain level. This also makes node-affinity
a lot more similar to vcpu-pinning, both in terms of functioning and user
interface. As a side effect, it simplifies the scheduling code (at least the
NUMA-aware part) by quite a bit. Finally, and most importantly, this will
become really important when we start to support virtual NUMA topologies, as,
at that point, having the same node-affinity for all the vcpus in a domain
won't be enough any longer (we'll want the vcpus of a particular vnode to
have node-affinity with a particular pnode).
More detailed descriptions of the mechanism and of the implementation choices
are provided in the changelogs and in the documentation (docs/misc and
manpages).
One last thing is that this series relies on some other patches and series
that I already sent to xen-devel but that have not been applied yet. I'm
re-sending them here, as part of this series, so feel free to pick them up
from here if you want to apply them, or to comment on them in this thread if
you want me to change them. In particular, I already sent patches 01 and 03
as single patches, and patches 04-07 as a series. Sorry if that is a bit
clumsy, but I couldn't find a better way to do it. :-)
In the detailed list of patches below, 'x' means previously submitted, '*'
means already acked/reviewed-by.
Finally, Elena, this is not super important, but perhaps, in the next release
of your vNUMA series, you could try to integrate it with this (and of course,
ask if you need anything while trying to do that).
Matt, if/when you eventually release your HVM vNUMA series, even as an RFC
or something like that, we can try to figure out how to integrate it with
this, so as to use node-affinity instead of pinning.
The series is also available at the following git coordinates:
git://xenbits.xen.org/people/dariof/xen.git numa/per-vcpu-affinity-v1
http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/numa/per-vcpu-affinity-v1
Let me know what you think about all this.
Regards,
Dario
---
Dario Faggioli (12):
x xen: numa-sched: leave node-affinity alone if not in "auto" mode
xl: allow for node-wise specification of vcpu pinning
x xl: implement and enable dryrun mode for `xl vcpu-pin'
x xl: test script for the cpumap parser (for vCPU pinning)
xen: numa-sched: make space for per-vcpu node-affinity
xen: numa-sched: domain node-affinity always comes from vcpu node-affinity
xen: numa-sched: use per-vcpu node-affinity for actual scheduling
xen: numa-sched: enable getting/specifying per-vcpu node-affinity
libxc: numa-sched: enable getting/specifying per-vcpu node-affinity
libxl: numa-sched: enable getting/specifying per-vcpu node-affinity
xl: numa-sched: enable getting/specifying per-vcpu node-affinity
xl: numa-sched: enable specifying node-affinity in VM config file
docs/man/xl.cfg.pod.5 | 90 ++++-
docs/man/xl.pod.1 | 25 +
docs/misc/xl-numa-placement.markdown | 124 ++++--
tools/libxc/xc_domain.c | 90 ++++-
tools/libxc/xenctrl.h | 19 +
tools/libxl/check-xl-vcpupin-parse | 294 +++++++++++++++
tools/libxl/check-xl-vcpupin-parse.data-example | 53 +++
tools/libxl/libxl.c | 28 +
tools/libxl/libxl.h | 11 +
tools/libxl/libxl_dom.c | 18 +
tools/libxl/libxl_numa.c | 14 -
tools/libxl/libxl_types.idl | 1
tools/libxl/libxl_utils.h | 12 +
tools/libxl/xl.h | 1
tools/libxl/xl_cmdimpl.c | 458 +++++++++++++++++++----
tools/libxl/xl_cmdtable.c | 11 -
xen/common/domain.c | 97 ++---
xen/common/domctl.c | 47 ++
xen/common/keyhandler.c | 6
xen/common/sched_credit.c | 63 ---
xen/common/schedule.c | 55 +++
xen/include/public/domctl.h | 8
xen/include/xen/sched-if.h | 2
xen/include/xen/sched.h | 13 +
xen/xsm/flask/hooks.c | 2
25 files changed, 1254 insertions(+), 288 deletions(-)
create mode 100755 tools/libxl/check-xl-vcpupin-parse
create mode 100644 tools/libxl/check-xl-vcpupin-parse.data-example
--
Signature
* [PATCH RESEND 01/12] xen: numa-sched: leave node-affinity alone if not in "auto" mode
From: Dario Faggioli @ 2013-11-05 14:34 UTC (permalink / raw)
To: xen-devel
Cc: Marcus Granado, Keir Fraser, Ian Campbell, Li Yechen,
George Dunlap, Andrew Cooper, Juergen Gross, Ian Jackson,
Jan Beulich, Justin Weaver, Daniel De Graaf, Matt Wilson,
Elena Ufimtseva
If the domain's NUMA node-affinity is being specified by the
user/toolstack (instead of being automatically computed by Xen),
we really should stick to that. This means domain_update_node_affinity()
is wrong to filter nodes out of that mask even in "!auto" mode.
This commit fixes that. Of course, this does not mean node-affinity
is always honoured (e.g., a vcpu won't run on a pcpu of a different
cpupool) but the necessary logic for taking into account all the
possible situations lives in the scheduler code, where it belongs.
What could happen without this change is that, under certain
circumstances, the node-affinity of a domain may change when the
user modifies the vcpu-affinity of the domain's vcpus. This, even
if probably not a real bug, is at least something the user does
not expect, so let's avoid it.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
---
This has been submitted already as a single patch on its own.
Since this series needs the change done here, just include it
in here, instead of pinging the original submission and deferring
posting this series.
---
xen/common/domain.c | 28 +++++++++-------------------
1 file changed, 9 insertions(+), 19 deletions(-)
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 5999779..af31ab4 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -352,7 +352,6 @@ void domain_update_node_affinity(struct domain *d)
cpumask_var_t cpumask;
cpumask_var_t online_affinity;
const cpumask_t *online;
- nodemask_t nodemask = NODE_MASK_NONE;
struct vcpu *v;
unsigned int node;
@@ -374,28 +373,19 @@ void domain_update_node_affinity(struct domain *d)
cpumask_or(cpumask, cpumask, online_affinity);
}
+ /*
+ * If d->auto_node_affinity is true, the domain's node-affinity mask
+ * (d->node_affinity) is automatically computed from all the domain's
+ * vcpus' vcpu-affinity masks (the union of which we have just built
+ * above in cpumask). OTOH, if d->auto_node_affinity is false, we
+ * must leave the node-affinity of the domain alone.
+ */
if ( d->auto_node_affinity )
{
- /* Node-affinity is automaically computed from all vcpu-affinities */
+ nodes_clear(d->node_affinity);
for_each_online_node ( node )
if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
- node_set(node, nodemask);
-
- d->node_affinity = nodemask;
- }
- else
- {
- /* Node-affinity is provided by someone else, just filter out cpus
- * that are either offline or not in the affinity of any vcpus. */
- nodemask = d->node_affinity;
- for_each_node_mask ( node, d->node_affinity )
- if ( !cpumask_intersects(&node_to_cpumask(node), cpumask) )
- node_clear(node, nodemask);//d->node_affinity);
-
- /* Avoid loosing track of node-affinity because of a bad
- * vcpu-affinity has been specified. */
- if ( !nodes_empty(nodemask) )
- d->node_affinity = nodemask;
+ node_set(node, d->node_affinity);
}
sched_set_node_affinity(d, &d->node_affinity);
* Re: [PATCH RESEND 01/12] xen: numa-sched: leave node-affinity alone if not in "auto" mode
From: George Dunlap @ 2013-11-05 14:43 UTC (permalink / raw)
To: Dario Faggioli
Cc: Marcus Granado, Keir Fraser, Ian Campbell, Li Yechen,
Andrew Cooper, Juergen Gross, Ian Jackson, xen-devel, Jan Beulich,
Justin Weaver, Daniel De Graaf, Matt Wilson, Elena Ufimtseva
On 11/05/2013 02:34 PM, Dario Faggioli wrote:
> If the domain's NUMA node-affinity is being specified by the
> user/toolstack (instead of being automatically computed by Xen),
> we really should stick to that. This means domain_update_node_affinity()
> is wrong to filter nodes out of that mask even in "!auto" mode.
>
> This commit fixes that. Of course, this does not mean node-affinity
> is always honoured (e.g., a vcpu won't run on a pcpu of a different
> cpupool) but the necessary logic for taking into account all the
> possible situations lives in the scheduler code, where it belongs.
>
> What could happen without this change is that, under certain
> circumstances, the node-affinity of a domain may change when the
> user modifies the vcpu-affinity of the domain's vcpus. This, even
> if probably not a real bug, is at least something the user does
> not expect, so let's avoid it.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
> ---
> This has been submitted already as a single patch on its own.
> Since this series needs the change done here, just include it
> in here, instead of pinging the original submission and deferring
> posting this series.
Just reiterating what I said on the last send... this one is independent
and can be checked in whenever you're ready.
-George
> ---
> xen/common/domain.c | 28 +++++++++-------------------
> 1 file changed, 9 insertions(+), 19 deletions(-)
>
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 5999779..af31ab4 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -352,7 +352,6 @@ void domain_update_node_affinity(struct domain *d)
> cpumask_var_t cpumask;
> cpumask_var_t online_affinity;
> const cpumask_t *online;
> - nodemask_t nodemask = NODE_MASK_NONE;
> struct vcpu *v;
> unsigned int node;
>
> @@ -374,28 +373,19 @@ void domain_update_node_affinity(struct domain *d)
> cpumask_or(cpumask, cpumask, online_affinity);
> }
>
> + /*
> + * If d->auto_node_affinity is true, the domain's node-affinity mask
> + * (d->node_affinity) is automatically computed from all the domain's
> + * vcpus' vcpu-affinity masks (the union of which we have just built
> + * above in cpumask). OTOH, if d->auto_node_affinity is false, we
> + * must leave the node-affinity of the domain alone.
> + */
> if ( d->auto_node_affinity )
> {
> - /* Node-affinity is automaically computed from all vcpu-affinities */
> + nodes_clear(d->node_affinity);
> for_each_online_node ( node )
> if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
> - node_set(node, nodemask);
> -
> - d->node_affinity = nodemask;
> - }
> - else
> - {
> - /* Node-affinity is provided by someone else, just filter out cpus
> - * that are either offline or not in the affinity of any vcpus. */
> - nodemask = d->node_affinity;
> - for_each_node_mask ( node, d->node_affinity )
> - if ( !cpumask_intersects(&node_to_cpumask(node), cpumask) )
> - node_clear(node, nodemask);//d->node_affinity);
> -
> - /* Avoid loosing track of node-affinity because of a bad
> - * vcpu-affinity has been specified. */
> - if ( !nodes_empty(nodemask) )
> - d->node_affinity = nodemask;
> + node_set(node, d->node_affinity);
> }
>
> sched_set_node_affinity(d, &d->node_affinity);
>