From: Dario Faggioli
Subject: Re: [PATCH] xen:rtds:fix bug in accounting budget
Date: Fri, 21 Oct 2016 19:36:22 +0200
Message-ID: <1477071382.24930.153.camel@citrix.com>
In-Reply-To: <1476890041-4248-1-git-send-email-mengxu@cis.upenn.edu>
To: Meng Xu, xen-devel@lists.xenproject.org
Cc: Wei Liu, George Dunlap, Haoran Li, Linh Thi Xuan Phan, Meng Xu, Dagaen Golomb

On Wed, 2016-10-19 at 11:13 -0400, Meng Xu wrote:
> The bug was introduced in Xen 4.7 when we converted the RTDS scheduler
> from a quantum-driven model to an event-driven model.
> We assumed rt_schedule() is always called for a VCPU
> before the VCPU's budget replenishment handler.
>
No, we didn't. Or at least, I've never done so, and I tried as hard as
I could to tell you guys not to make any assumptions about which one
runs first.

So, yes, I agree that, if there is code that only works under such an
assumption, it's a bug.

> This assumption does not hold when the system is overloaded, or
> when the VCPU budget is almost equal to its period.
>
> Buggy behavior:
> 1) A VCPU may get less budget than assigned in a period.
> 2) A full capacity VCPU, i.e., a VCPU whose period is equal to its
>    budget, may not get any budget in some period.
>
So, there are two bugs. And things are very subtle, as far as I can
judge from both the bug descriptions and the code.

It would therefore be a lot clearer if you could send _one_ patch per
bug.

> Bug analysis:
> 1) A VCPU's deadline can be fast-forwarded by more than one period.
>    However, the VCPU's last_start time was not updated immediately.
>    If rt_schedule() is called after rt_update_deadline(), which happens
>    when the VCPU's budget is equal to its period or when the VCPU has a
>    deadline miss, burn_budget() will burn the budget that was just
>    replenished, although the replenished budget should be used in the
>    most recent period only.
>
-EPARSE. I've looked at the code and tried to match the current
behavior, your proposed change, and this description, but failed. Can
you be a little more precise and specific about what happens when?

I'll keep looking and thinking, but any help in making all this a bit
clearer would be very welcome.

> 2) When a full capacity VCPU depletes its budget and is context
>    switching out, but has not updated the core's current running VCPU,
>
"has not updated the core's current running VCPU," I've no idea what
this sentence means. What is it that has not yet been updated?

>    the budget replenishment timer may be triggered.
>    The replenishment handler failed to re-schedule the full capacity
>    VCPU because it thought the VCPU was running.
>
>    When a VCPU's budget is replenished, we try to tickle a CPU.
>    When we find a core for a VCPU to tickle and the VCPU is context
>    switching out, we will always tickle the core where the VCPU was
>    running, if the VCPU cannot find another core to tickle.
>
Again, I can't understand much of this... I guess this is the
description of the solution to the bug?

> This bug was reported by Dagaen Golomb
>
You can give credit by using the following tag:

Reported-by: Dagaen Golomb

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)