Re: [BUG] Linux process vruntime accounting in Xen

From: Dario Faggioli <dario.faggioli@citrix.com>
To: Tony S <suokunstar@gmail.com>, xen-devel@lists.xen.org
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
	Juergen Gross <jgross@suse.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	David Vrabel <david.vrabel@citrix.com>,
	Matt Fleming <matt@codeblueprint.co.uk>
Subject: Re: [BUG] Linux process vruntime accounting in Xen
Date: Mon, 16 May 2016 13:37:01 +0200
Message-ID: <1463398621.18789.55.camel@citrix.com>
In-Reply-To: <CAG2GYXEoByMMbxUCMw8-ZMsvnt3mDWND09CjPfMLkt=neCGWyA@mail.gmail.com>

[Adding George again, and a few Linux/Xen folks]

On Sat, 2016-05-14 at 18:25 -0600, Tony S wrote:
> In virtualized environments, we sometimes need to limit the CPU
> resources available to a virtual machine (VM). For example, in Xen we use
> $ xl sched-credit -d 1 -c 50
> 
> to limit the CPU resources of dom 1 to half of
> one physical CPU core. If the VM's CPU resources are capped, processes
> inside the VM suffer from a vruntime accounting problem. Here, I report
> my findings about the Linux process scheduler under the above scenario.
> 
Thanks for this other report as well. :-)

All you say makes sense to me, and I will think about it. I'm not sure
about one thing, though...

> ------------Description------------
> Linux CFS relies on delta_exec to charge the vruntime of processes.
> The variable delta_exec is the difference between the times at which a
> process starts and stops running on a CPU. This works well on a physical
> machine. However, in a virtual machine with capped resources, some
> processes might be charged an inaccurate vruntime.
> 
> For example, suppose we have a VM which has one vCPU and is capped at
> 50% of a physical CPU. When process A is running inside the VM and the
> VM uses up its CPU allocation, the VM is paused. In the next round, when
> the VM is allocated new CPU resources and starts running again, process
> A stops running and is put back on the runqueue. The delta_exec of
> process A is accounted as its "real execution time" plus the time its
> VM was paused. That makes the vruntime of process A much larger than it
> should be, and process A is not scheduled again for a long time, until
> the vruntimes of the other processes catch up with it.
> ---------------------------------------
> 
> 
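To make the over-charge concrete, here is a small C model of the scenario above. The timeline (5 ms of real execution, then 25 ms with the VM paused by the cap) is a made-up example rather than a measurement, and the helper names are hypothetical; the sketch only shows that now - exec_start cannot distinguish paused time from execution time.

```c
#include <stdint.h>

typedef uint64_t u64;   /* timestamps in nanoseconds; only differences matter */

/*
 * What CFS charges the task: the wall-clock interval between being put
 * on the vCPU (exec_start) and being taken off it (now). Any interval
 * during which the whole VM was paused by the cap is included.
 */
static u64 charged_delta_exec(u64 exec_start, u64 now)
{
    return now - exec_start;
}

/*
 * What the task really executed: the same interval minus the time the
 * vCPU was not running on any pCPU. In this hypothetical model the
 * paused time is simply passed in as a known value.
 */
static u64 real_exec(u64 exec_start, u64 now, u64 vcpu_paused)
{
    return (now - exec_start) - vcpu_paused;
}
```

With exec_start = 0 and now = 30 ms, charged_delta_exec() returns 30 ms while real_exec() with 25 ms of pause returns 5 ms, so in this example process A is charged six times what it actually ran.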
> ------------Analysis----------------
> When a process stops running and is about to be put back on the
> runqueue, update_curr() is executed.
> [src/kernel/sched/fair.c]
> 
> static void update_curr(struct cfs_rq *cfs_rq)
> {
>     ... ...
>     delta_exec = now - curr->exec_start;
>     ... ...
>     curr->exec_start = now;
>     ... ...
>     curr->sum_exec_runtime += delta_exec;
>     schedstat_add(cfs_rq, exec_clock, delta_exec);
>     curr->vruntime += calc_delta_fair(delta_exec, curr);
>     update_min_vruntime(cfs_rq);
>     ... ...
> }
> 
> "now" --> the current time
> "exec_start" --> the time when the current process was put on the CPU
> "delta_exec" --> the time elapsed between when the process starts
> and when it stops running on the CPU
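The charging step in the excerpt can be modelled in user space roughly as follows. This is a simplified sketch, not the kernel code: struct model_entity, MODEL_NICE_0_LOAD (assumed here to be 1024) and the model_* functions are hypothetical stand-ins for struct sched_entity, NICE_0_LOAD, calc_delta_fair() and update_curr(), and the real code's fixed-point __calc_delta(), negative-delta check, schedstats and min_vruntime update are left out.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Assumed nice-0 load weight (hypothetical stand-in for NICE_0_LOAD). */
#define MODEL_NICE_0_LOAD 1024ULL

/* Hypothetical, trimmed-down stand-in for struct sched_entity. */
struct model_entity {
    u64 weight;            /* load weight of the task */
    u64 exec_start;        /* when it was last put on the CPU */
    u64 sum_exec_runtime;  /* total wall-clock time charged so far */
    u64 vruntime;          /* weighted virtual runtime */
};

/* Scale a wall-clock delta by NICE_0_LOAD / weight, as calc_delta_fair() does. */
static u64 model_calc_delta_fair(u64 delta, const struct model_entity *se)
{
    if (se->weight != MODEL_NICE_0_LOAD)
        delta = delta * MODEL_NICE_0_LOAD / se->weight;
    return delta;
}

/* Charge the running entity for [exec_start, now), mirroring update_curr(). */
static void model_update_curr(struct model_entity *curr, u64 now)
{
    u64 delta_exec = now - curr->exec_start;   /* includes any paused time */

    curr->exec_start = now;
    curr->sum_exec_runtime += delta_exec;
    curr->vruntime += model_calc_delta_fair(delta_exec, curr);
}
```

A nice-0 entity (weight 1024) charged from 0 to 30 ms gains 30 ms of vruntime; an entity with a hypothetical weight of 2048 gains 15 ms. In both cases the paused interval is scaled and charged exactly like real execution, since nothing in the computation can tell the two apart.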
> 
> When a process starts running before its VM is paused and stops
> running after its VM is unpaused, delta_exec includes the time the
> VM was paused, which can be very large compared to the real
> execution time of the process.
> 
... but would that also apply to a VM that is not scheduled for a
while just because of pCPU contention, not because it was paused?

Isn't there anything in place in Xen or Linux (the latter being better
suited for something like this, IMHO) to compensate for that?

I have to admit I have never really checked myself. Maybe George or
our Linux people know more?
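On the Linux side, one relevant mechanism, as far as we can tell, is paravirtual steal-time accounting: in these kernels update_curr() takes "now" from rq_clock_task(), and with CONFIG_PARAVIRT_TIME_ACCOUNTING the scheduler subtracts hypervisor-reported steal time from that clock in update_rq_clock_task(). Whether the Xen guests in this report actually feed steal time to the scheduler is not established here. The sketch below is a hypothetical model of the idea only; struct model_task_clock and model_update_clock_task() are illustrative names, and it assumes the hypervisor reports the paused interval as steal time.

```c
#include <stdint.h>

typedef uint64_t u64;

/*
 * Hypothetical model of a per-CPU task clock that advances only by the
 * time the vCPU actually ran. Each update subtracts the steal time the
 * hypervisor reported since the previous update (assumed to be known).
 */
struct model_task_clock {
    u64 clock_task;   /* the clock update_curr() would read as "now" */
    u64 prev_clock;   /* raw clock value at the previous update */
    u64 prev_steal;   /* cumulative steal time accounted so far */
};

static void model_update_clock_task(struct model_task_clock *c,
                                    u64 raw_clock, u64 total_steal)
{
    u64 delta = raw_clock - c->prev_clock;
    u64 steal = total_steal - c->prev_steal;

    /* Never let steal time exceed the elapsed interval. */
    if (steal > delta)
        steal = delta;

    c->prev_clock = raw_clock;
    c->prev_steal += steal;
    c->clock_task += delta - steal;
}
```

In the timeline from the report (30 ms of wall-clock time, 25 ms of it with the VM paused), the modelled task clock advances by only 5 ms, so a "now" taken from it would charge process A only for its real execution. The clamp keeps an inflated steal value from making the task clock go backwards.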

> This issue can seriously harm the performance of the victim process.
> If the process runs an I/O-bound workload, its throughput and latency
> will be affected. If it runs a CPU-bound workload, this issue makes
> its vruntime "unfair" compared to other processes under CFS.
> 
> Because the CPU resources of some types of cloud VMs are capped as
> described above (like the Amazon EC2 t2.small instance), I suspect
> this also harms the performance of public cloud instances.
> ---------------------------------------
> 
> 
> My test environment is as follows: hypervisor (Xen 4.5.0), Dom 0 (Linux
> 3.18.21), Dom U (Linux 3.18.21). I also tested the longterm version
> Linux 3.18.30 and the latest longterm version, Linux 4.4.7. All of
> these kernels have this issue.
> 
> Please confirm this bug. Thanks.
> 
> 
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Thread overview: 12+ messages
2016-05-15  0:25 [BUG] Linux process vruntime accounting in Xen Tony S
2016-05-16 11:37 ` Dario Faggioli [this message]
2016-05-16 21:38   ` Tony S
2016-05-16 22:33     ` Boris Ostrovsky
2016-05-17  9:33       ` George Dunlap
2016-05-17  9:45         ` Juergen Gross
2016-05-18 12:24         ` Juergen Gross
2016-05-18 14:57           ` Dario Faggioli
2016-05-18 16:09             ` Tony S
2016-05-18 16:14               ` Juergen Gross
2016-05-20 12:50                 ` Juergen Gross
2016-05-16 22:33     ` Tony S