From: Dario Faggioli <dario.faggioli@citrix.com>
To: suokun <suokunstar@gmail.com>
Cc: George Dunlap <George.Dunlap@citrix.com>, xen-devel@lists.xen.org
Subject: Re: [BUG] mistakenly wake in Xen's credit scheduler
Date: Wed, 28 Oct 2015 06:41:24 +0100 [thread overview]
Message-ID: <1446010884.2937.276.camel@citrix.com> (raw)
In-Reply-To: <CAG2GYXHxmD8bBPzRyW7T61bEnfCya2O2Gz+wmxnvNUY0EbavXA@mail.gmail.com>
[-- Attachment #1.1.1: Type: text/plain, Size: 3509 bytes --]
On Tue, 2015-10-27 at 14:32 -0600, suokun wrote:
> On Tue, Oct 27, 2015 at 4:44 AM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> Hi, Dario,
> Thank you for your reply.
>
Hi,
> Here are my two VMs running on the same physical CPU.
> VM-IO: 1-vCPU pinned to a pCPU, running netperf
> VM-CPU: 1-vCPU pinned the the same pCPU, running a while(1) loop
> Another machine run the netperf client to send the requests to VM-IO.
>
> My code is very simple:
> in VM-IO, as server side: $ netserver -p 12345
> in VM-CPU, just running a while(1) loop: $./loop
> in the client, send I/O request to the VM-IO: $ netperf -H
> [server_ip]
> -l 15 -t TCP_STREAM -p 12345
>
Ok, thanks.
> The setting that led to the poor IO performance is as follows:
> VM-IO: 1-vCPU pinned to a pCPU, running netperf
> VM-CPU: 1-vCPU pinned the the same pCPU, running a while(1) loop
>
> The root cause is that when an IO request comes, VM-IO’s vCPU is
> elevated to BOOST and goes through vcpu_wake —> __runq_tickle. In
> __runq_tickle, the currently running vCPU (i.e., the vCPU from VM
> -CPU) is marked as _VPF_migrating.
>
Ok.
> Then, Xen goes through schedule() to
> reschedule the current vCPU (i.e., vCPU from VM-CPU) and schedule the
> next vCPU (i.e., the vCPU from VM-IO). Due to the _VPF_migrating
> flag, the descheduled vCPU will be migrated in context_saved() and
> later woken up in cpu_wake().
>
Sure.
> Indeed, csched_vcpu_wake() will quit if the
> vCPU from VM-CPU is on run queue. But it is actually not. In
> csched_schedule(), the vCPU will not be inserted back to run queue
> because it is not runnable due to the __VPF_migrating bit in
> pause_flags. As such, the vCPU from VM-CPU will boosted and not be
> preempted by a later IO request because BOOST can not preempt BOOST.
>
Aha! Now I see what you mean. From the previous email, I couldn't
really tell which one call to schedule you where looking at, during
each phase of the analysis... Thanks for clarifying!
And, yes, I agree with you that, since the vCPU of VM-CPU fails the
vcpu_runnable() test, it's being treated as it is really waking up from
sleep, in csched_vcpu_wake(), and hence boosted.
> A simple fix would be allowing BOOST to preempt BOOST.
>
Nah, that would be an hack on top of an hack! :-P
> A better fix
> would be checking the CPU affinity before setting the __VPF_migrating
> flag.
>
Yeah, I like this better. So, can you try the patch attached to this
email?
Here at my place, without any patch, I get the following results:
idle: throughput = 806.64
with noise: throughput = 166.50
With the patch, I get this:
idle: throughput = 807.18
with noise: throughput = 731.66
The patch (if you confirm that it works) fixes the bug in this
particular situations, where vCPUs are all pinned to the same pCPUs,
but does not prevent vCPUs being migrated around the pCPUs to become
BOOSTed in Credit2.
That is something I think we should avoid, and I've got a (small) patch
series ready for that. I'll give some more testing to it before sending
it to the list, though, as I want to make sure it's not causing
regressions.
Thanks and Regards,
Dario
---
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
[-- Attachment #1.1.2: xen-sched-credit1-fix-tickle-migrate-cur.patch --]
[-- Type: text/x-patch, Size: 5341 bytes --]
commit 16381936ad320d010c7566c946a3e528f803e78a
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date: Tue Oct 27 23:22:16 2015 +0100
xen: credit1: on vCPU wakeup, kick away current only if makes sense
In fact, when waking up a vCPU, __runq_tickle() is called
to allow the new vCPU to run on a pCPU (which one, depends
on the relationship between the priority of the new vCPU,
and the ones of the vCPUs that are already running).
If there is no idle processor on which the new vCPU can
run (e.g., because of pinning/affinity), we try to migrate
away the vCPU that is currently running on the new vCPU's
processor (i.e., the processor on which the vCPU is waking
up).
Now, trying to migrate a vCPU has the effect of pushing it
through a
running --> offline --> runnable
transition, which, in turn, has the following negative
effects:
1) Credit1 counts that as a wakeup, and it BOOSTs the
vCPU, even if it is a CPU-bound one, which wouldn't
normally have deserved boosting. This can prevent
legit IO-bound vCPUs to get ahold of the processor
until such spurious boosting expires, hurting the
performance!
2) since the vCPU is fails the vcpu_runnable() test
(within the call to csched_schedule() that follows
the wakeup, as a consequence of tickling) the
scheduling rate-limiting mechanism is also fooled,
i.e., the context switch happens even if less than
the minimum execution amount of time passed.
In particular, 1) has been reported to cause the following
issue:
* VM-IO: 1-vCPU pinned to a pCPU, running netperf
* VM-CPU: 1-vCPU pinned the the same pCPU, running a busy
CPU loop
==> Only VM-I/O: throughput is 806.64 Mbps
==> VM-I/O + VM-CPU: throughput is 166.50 Mbps
This patch solves (for the above scenario) the problem
by checking whether or not it makes sense to try to
migrate away the vCPU currently running on the processor.
In fact, we shouldn't even try to do it, if there aren't
idle processors where such a vCPU can execute. In such case,
Attempting the migration is just futile (harmful, actually!).
With this patch, in the above configuration, results are:
==> Only VM-I/O: throughput is 807.18 Mbps
==> VM-I/O + VM-CPU: throughput is 731.66 Mbps
Note that, still about 1), it is _wrong_ that Credit1
treats wakeups resulting from migration of a vCPU to
another pCPU as "regular wakeups", hence granting BOOST
priority to the vCPUs experiencing that. However:
- fixing that is non-trivial, and requires being done
in its own patch;
- that is orthogonal to the fix being introduced here.
That is to say, even when Credit1 will be fixed not
to boost migrating vCPUs, this patch will be still
corect and necessary.
Reported-by: suokun <suokunstar@gmail.com>
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: suokun <suokunstar@gmail.com>
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index b8f28fe..1b30e67 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -426,9 +426,10 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
/*
* If there are no suitable idlers for new, and it's higher
- * priority than cur, ask the scheduler to migrate cur away.
- * We have to act like this (instead of just waking some of
- * the idlers suitable for cur) because cur is running.
+ * priority than cur, check whether we can migrate cur away.
+ * (We have to do it indirectly, via _VPF_migrating, instead
+ * of just tickling any idler suitable for cur) because cur
+ * is running.)
*
* If there are suitable idlers for new, no matter priorities,
* leave cur alone (as it is running and is, likely, cache-hot)
@@ -437,11 +438,18 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
*/
if ( new_idlers_empty && new->pri > cur->pri )
{
+ csched_balance_cpumask(cur->vcpu, balance_step,
+ csched_balance_mask(cpu));
+ if ( cpumask_intersects(csched_balance_mask(cpu),
+ &idle_mask) )
+ {
+ SCHED_VCPU_STAT_CRANK(cur, kicked_away);
+ SCHED_VCPU_STAT_CRANK(cur, migrate_r);
+ SCHED_STAT_CRANK(migrate_kicked_away);
+ set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
+ }
+ /* Tickle cpu anyway, to let new preempt cur. */
SCHED_STAT_CRANK(tickle_idlers_none);
- SCHED_VCPU_STAT_CRANK(cur, kicked_away);
- SCHED_VCPU_STAT_CRANK(cur, migrate_r);
- SCHED_STAT_CRANK(migrate_kicked_away);
- set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
__cpumask_set_cpu(cpu, &mask);
}
else if ( !new_idlers_empty )
[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
[-- Attachment #2: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
next prev parent reply other threads:[~2015-10-28 5:41 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-27 5:59 [BUG] mistakenly wake in Xen's credit scheduler suokun
2015-10-27 9:44 ` George Dunlap
2015-10-27 9:53 ` Dario Faggioli
2015-10-27 20:11 ` suokun
2015-10-28 5:39 ` suokun
2015-10-28 5:54 ` Dario Faggioli
2015-10-28 6:01 ` Juergen Gross
2015-10-28 6:08 ` Dario Faggioli
2015-10-28 11:03 ` George Dunlap
2015-10-27 10:44 ` Dario Faggioli
2015-10-27 20:32 ` suokun
2015-10-28 5:41 ` Dario Faggioli [this message]
2015-10-28 17:04 ` suokun
2015-10-29 10:25 ` Dario Faggioli
-- strict thread matches above, loose matches on Subject: below --
2015-10-26 22:30 Kun Suo
2015-10-27 5:48 ` Jia Rao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1446010884.2937.276.camel@citrix.com \
--to=dario.faggioli@citrix.com \
--cc=George.Dunlap@citrix.com \
--cc=suokunstar@gmail.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.