Re: [PATCH v2] sched: credit2: respect per-vcpu hard affinity

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dario Faggioli <dario.faggioli@citrix.com>
To: George Dunlap <George.Dunlap@citrix.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	"jtweaver@hawaii.edu" <jtweaver@hawaii.edu>,
	"JBeulich@suse.com" <JBeulich@suse.com>,
	"henric@hawaii.edu" <henric@hawaii.edu>
Subject: Re: [PATCH v2] sched: credit2: respect per-vcpu hard affinity
Date: Mon, 9 Mar 2015 15:07:37 +0000	[thread overview]
Message-ID: <1425913651.2729.87.camel@citrix.com> (raw)
In-Reply-To: <54FD87BD.8000806@eu.citrix.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 6288 bytes --]

On Mon, 2015-03-09 at 11:45 +0000, George Dunlap wrote:
> On 03/09/2015 07:11 AM, Justin Weaver wrote:

> > I don't think there's any way the mask can be empty after the
> > cpumask_and with the cpu_hard_affinity and the VCPU2ONLINE. In
> > schedule.c:vcpu_set_hard_affinity, if the intersection of the new hard
> > mask and VCPU2ONLINE is empty, the function returns -EINVAL. I took
> > into account all the discussion here and modified the function for v3.
> 
> What about this:
> 
> * Create a cpu pool with cpus 0 and 1.  online_cpus is now [0,1].
> * Set a hard affinity of [1].  This succeeds.
>
So, I decided to try this:

# xl cpupool-list -c
Name               CPU list
Pool-0              0,1

# xl list -c
Name                                        ID   Mem VCPUs	State	Time(s)         Cpupool
Domain-0                                     0   507     4     r-----      19.9           Pool0
test-pv                                      1  2000     8     -b----      19.2           Pool0

# xl vcpu-list test-pv
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
test-pv                              1     0    1   -b-       5.5  1 / 1
test-pv                              1     1    1   -b-       2.4  1 / 1
test-pv                              1     2    1   -b-       2.9  1 / 1
test-pv                              1     3    1   -b-       1.9  1 / 1
test-pv                              1     4    1   -b-       1.0  1 / 1
test-pv                              1     5    1   -b-       2.0  1 / 1
test-pv                              1     6    1   -b-       1.2  1 / 1
test-pv                              1     7    1   -b-       2.4  1 / 1

# xl cpupool-cpu-remove Pool0 1
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:460
(XEN) ****************************************

i.e., surprise surprise, there's already an assert guarding this... and
it triggers!! :-(

It seems to have been added in v4 of the per-vcpu soft affinity work:
http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=commit;h=4d4e999637f38e0bbd4318ad8e0c92fd1e580430

So, we must have had this discussion before! :-)

I've done a bit of archeology and the ASSERT() in
domain_update_node_affinity() was introduced by me (upon request) while
working on implementing per-vcpu soft affinity. Then, cs 93be8285 is
what causes the assert to trigger. Time seems not to match, but that's
because soft affinity was put on hold when it was decided not to include
it in 4.4, and I probably forgot to retest with a use case similar to
the above when resubmitting it! :-(

A patch to fix things is attached to this email, for convenience (I'll
submit it properly, with changelog and everything, right away).

After seeing this, I'm even more convinced that
!empty(online && hard_affinity) is really something we want and, as we
ASSERT() it in domain.c, the same should be done in sched_credit2.c.
OTOH, if we don't want to ASSERT() in the specific scheduler code, then
I'd remove the one in domain.c too (and, perhaps, just use online as a
fallback), or things would look inconsistet.

Of course, this can also be seen as a proof of George's point, that this
is something not obvious and not easy to catch. Still, I think that if
we say that we support hard vcpu affinity (a.k.a. pinning), we should
not allow vcpus outside their hard affinity, and the code must reflect
this.

> * Move cpu 1 to a different cpupool.
>
> After a quick look I don't see anything that updates the hard affinity
> when cpus are removed from pools.
> 
cpupool_unassign_cpu() calls cpupool_unassign_cpu_helper()that calls
cpu_disable_scheduler() which, if it finds that empty(online &&
hard_affinity)==true, it resets hard_affinity to "all".

Note that this is reported in the log, as a confirmation that this is
really something exceptional:

  (XEN) Breaking affinity for d1v0
  (XEN) Breaking affinity for d1v1
  (XEN) Breaking affinity for d1v2
  (XEN) Breaking affinity for d1v3
  (XEN) Breaking affinity for d1v4
  (XEN) Breaking affinity for d1v5
  (XEN) Breaking affinity for d1v6
  (XEN) Breaking affinity for d1v7

And also note that, as a consequence of fiddling with cpupools, the
affinity has been altered by Xen (i.e., vcpus still always run within
their hard affinity masks):


# xl vcpu-pin 1 all 1 1
# xl cpupool-cpu-remove Pool-0 1
# xl cpupool-list -c
Name               CPU list
Pool-0             0,2,3,4,5,6,7,8,9,10,11,12,13,14,15

# xl vcpu-list test-pv
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
test-pv                              1     0    2   -b-       6.1  all / 1
test-pv                              1     1    6   -b-       1.6  all / 1
test-pv                              1     2    4   -b-       1.6  all / 1
test-pv                              1     3    5   -b-       3.1  all / 1
test-pv                              1     4    4   -b-       0.8  all / 1
test-pv                              1     5    7   -b-       3.0  all / 1
test-pv                              1     6    3   -b-       1.1  all / 1
test-pv                              1     7    3   -b-       0.7  all / 1

That's what drove all the reasoning when changing
domain_update_node_affinity(), and led to that ASSERT(), during the soft
affinity work. The reason why the ASSERT() triggers, as can be seen in
the patch, is that, because of the cited changeset,
domain_update_node_affinity() is called too early.

> And, even if it does *now*, it's possible that something might be
> changed in the future that would forget it; this ASSERT() isn't exactly
> next to that code.
> 
But it would help us catch the bug at some point... As proved above! :-D

BTW, I've already written and submitted an OSSTest test involving
cpupools (we don't do anything at the moment). I'll see about adding
these kind of things to it.

> So it seems to me like handling that case makes the software
> more robust for little cost.
> 
Yep, I also agree that it at least won't cost much, in terms of runtime
overhead.

Regards,
Dario

[-- Attachment #1.1.2: patch --]
[-- Type: text/plain, Size: 1469 bytes --]

commit 14430d25448aa85df7cef991b0f52de91e37440b
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date:   Mon Mar 9 15:26:30 2015 +0100

    Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index a758a8b..ad7bbb5 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -296,6 +296,8 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
 static long cpupool_unassign_cpu_helper(void *info)
 {
     int cpu = cpupool_moving_cpu;
+    struct cpupool *c = info;
+    struct domain *d;
     long ret;
 
     cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n",
@@ -318,6 +320,11 @@ static long cpupool_unassign_cpu_helper(void *info)
         cpupool_cpu_moving = NULL;
     }
 
+    rcu_read_lock(&domlist_read_lock);
+    for_each_domain_in_cpupool(d, c)
+        domain_update_node_affinity(d);
+    rcu_read_unlock(&domlist_read_lock);
+
 out:
     spin_unlock(&cpupool_lock);
     cpupool_dprintk("cpupool_unassign_cpu ret=%ld\n", ret);
@@ -379,12 +386,6 @@ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
     atomic_inc(&c->refcnt);
     cpupool_cpu_moving = c;
     cpumask_clear_cpu(cpu, c->cpu_valid);
-
-    rcu_read_lock(&domlist_read_lock);
-    for_each_domain_in_cpupool(d, c)
-        domain_update_node_affinity(d);
-    rcu_read_unlock(&domlist_read_lock);
-
     spin_unlock(&cpupool_lock);
 
     work_cpu = smp_processor_id();

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

next prev parent reply	other threads:[~2015-03-09 15:07 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-09  3:45 [PATCH v2] sched: credit2: respect per-vcpu hard affinity Justin T. Weaver
2015-03-03  3:15 ` Dario Faggioli
2015-03-03  9:12   ` Jan Beulich
2015-03-04 11:03     ` Dario Faggioli
2015-03-04 12:50       ` Jan Beulich
2015-03-04 13:08         ` Dario Faggioli
2015-03-04 13:24           ` Jan Beulich
2015-03-06 15:18   ` George Dunlap
2015-03-06 17:02     ` Dario Faggioli
2015-03-09  7:11       ` Justin Weaver
2015-03-09 11:45         ` George Dunlap
2015-03-09 15:07           ` Dario Faggioli [this message]
2015-03-10 13:23         ` Dario Faggioli
2015-03-13 17:11 ` Dario Faggioli
2015-03-14  3:48   ` Justin Weaver

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:a758a8b dfblob:ad7bbb5 )
 OR (
bs:"Re: [PATCH v2] sched: credit2: respect per-vcpu hard affinity" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1425913651.2729.87.camel@citrix.com \
    --to=dario.faggioli@citrix.com \
    --cc=George.Dunlap@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=henric@hawaii.edu \
    --cc=jtweaver@hawaii.edu \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.