From: Dario Faggioli
Subject: Re: [BUG] Linux process vruntime accounting in Xen
Date: Mon, 16 May 2016 13:37:01 +0200
Message-ID: <1463398621.18789.55.camel@citrix.com>
To: Tony S, xen-devel@lists.xen.org
Cc: George Dunlap, Juergen Gross, Boris Ostrovsky, David Vrabel, Matt Fleming

[Adding George again, and a few Linux/Xen folks]

On Sat, 2016-05-14 at 18:25 -0600, Tony S wrote:
> In virtualized environments, sometimes we need to limit the CPU
> resources to a virtual machine(VM). For example in Xen, we use
> $ xl sched-credit -d 1 -c 50
>
> to limit the CPU resource of dom 1 as half of
> one physical CPU core. If the VM CPU resource is capped, the process
> inside the VM will have a vruntime accounting problem. Here, I report
> my findings about Linux process scheduler under the above scenario.
>
Thanks for this other report as well. :-)

All you say makes sense to me, and I will think about it. I'm not sure
about one thing, though...

> ------------Description------------
> Linux CFS relies on delta_exec to charge the vruntime of processes.
> The variable delta_exec is the difference of a process starts and
> stops running on a CPU. This works well in physical machine. However,
> in virtual machine under capped resources, some processes might be
> accounted with inaccurate vruntime.
>
> For example, suppose we have a VM which has one vCPU and is capped to
> have as much as 50% of a physical CPU. When process A inside the VM
> starts running and the CPU resource of that VM runs out, the VM will
> be paused. Next round when the VM is allocated new CPU resource and
> starts running again, process A stops running and is put back to the
> runqueue. The delta_exec of process A is accounted as its "real
> execution time" plus the paused time of its VM. That will make the
> vruntime of process A much larger than it should be and process A
> would not be scheduled again for a long time until the vruntimes of
> other processes catch it.
> ---------------------------------------
>
>
> ------------Analysis----------------
> When a process stops running and is going to put back to the
> runqueue, update_curr() will be executed.
> [src/kernel/sched/fair.c]
>
> static void update_curr(struct cfs_rq *cfs_rq)
> {
>     ... ...
>     delta_exec = now - curr->exec_start;
>     ... ...
>     curr->exec_start = now;
>     ... ...
>     curr->sum_exec_runtime += delta_exec;
>     schedstat_add(cfs_rq, exec_clock, delta_exec);
>     curr->vruntime += calc_delta_fair(delta_exec, curr);
>     update_min_vruntime(cfs_rq);
>     ... ...
> }
>
> "now" --> the right now time
> "exec_start" --> the time when the current process is put on the CPU
> "delta_exec" --> the time difference of a process between it starts
> and stops running on the CPU
>
> When a process starts running before its VM is paused and the process
> stops running after its VM is unpaused, the delta_exec will include
> the VM suspend time which is pretty large compared to the real
> execution time of a process.
>
...
but would that also apply to a VM that is not scheduled --just
because of pCPU contention, not because it was paused-- for some time?

Isn't there anything in place in Xen or Linux (the latter being better
suited for something like this, IMHO) to compensate for that?

I have to admit I haven't really ever checked myself, maybe either
George or our Linux people do know more?

> This issue will make a great performance harm to the victim process.
> If the process is an I/O-bound workload, its throughput and latency
> will be influenced. If the process is a CPU-bound workload, this
> issue will make its vruntime "unfair" compared to other processes
> under CFS.
>
> Because the CPU resource of some type VMs in the cloud are limited as
> the above describes(like Amazon EC2 t2.small instance), I doubt that
> will also harm the performance of public cloud instances.
> ---------------------------------------
>
>
> My test environment is as follows: Hypervisor(Xen 4.5.0), Dom 0(Linux
> 3.18.21), Dom U(Linux 3.18.21). I also test longterm version Linux
> 3.18.30 and the latest longterm version, Linux 4.4.7. Those kernels
> all have this issue.
>
> Please confirm this bug. Thanks.
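
To make the accounting described above concrete, here is a minimal
userspace sketch in C. It is not kernel code: struct toy_entity,
charge_naive() and charge_compensated() are hypothetical names, the
weight is fixed at 1:1 instead of going through calc_delta_fair(), and
the "stolen" argument assumes the guest can somehow learn how long its
vCPU was not running. Whether Linux or Xen already provide that is
exactly the open question above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a CFS scheduling entity. */
struct toy_entity {
    uint64_t exec_start; /* when the task was put on the vCPU, in ns */
    uint64_t vruntime;   /* accumulated virtual runtime, in ns */
};

/*
 * Charge the full wall-clock interval since exec_start, mirroring the
 * "delta_exec = now - curr->exec_start" step of the quoted update_curr().
 * Any time the VM spent paused inside that interval is charged too.
 */
uint64_t charge_naive(struct toy_entity *se, uint64_t now)
{
    assert(now >= se->exec_start); /* the clock must not go backwards */
    uint64_t delta_exec = now - se->exec_start;

    se->exec_start = now;
    se->vruntime += delta_exec; /* assumption: weight 1:1 */
    return delta_exec;
}

/*
 * Same as above, but first subtract the time during which the vCPU was
 * not running ("stolen"). Assumption: the caller obtains that value from
 * some hypervisor-provided source; this sketch does not say which.
 */
uint64_t charge_compensated(struct toy_entity *se, uint64_t now,
                            uint64_t stolen)
{
    assert(now >= se->exec_start);
    uint64_t delta_exec = now - se->exec_start;

    /* Clamp so that an over-reported stolen time cannot underflow. */
    delta_exec = (stolen < delta_exec) ? delta_exec - stolen : 0;

    se->exec_start = now;
    se->vruntime += delta_exec;
    return delta_exec;
}
```

With exec_start = 0, 2 ms of real execution and a 50 ms pause (so
now = 52 ms), charge_naive() charges 52 ms of vruntime, while
charge_compensated() with stolen = 50 ms charges only 2 ms.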

-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)