From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [PATCH v5][RFC]xen: sched: convert RTDS from time to event driven model Date: Thu, 25 Feb 2016 18:51:23 +0100 Message-ID: <1456422683.2959.46.camel@citrix.com> References: <1454992407-5436-1-git-send-email-tiche@seas.upenn.edu> <1456365728.17312.119.camel@citrix.com> <56CE9C0D.30100@seas.upenn.edu> <1456396456.6288.58.camel@citrix.com> <56CF39F3.5040807@seas.upenn.edu> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1429984557912676508==" Return-path: Received: from mail6.bemta5.messagelabs.com ([195.245.231.135]) by lists.xen.org with esmtp (Exim 4.84) (envelope-from ) id 1aZ04Y-0005Vz-Ew for xen-devel@lists.xenproject.org; Thu, 25 Feb 2016 17:51:30 +0000 In-Reply-To: <56CF39F3.5040807@seas.upenn.edu> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: Tianyang Chen , xen-devel@lists.xenproject.org Cc: george.dunlap@citrix.com, Dagaen Golomb , Meng Xu List-Id: xen-devel@lists.xenproject.org --===============1429984557912676508== Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-3LgaUjuNetk8Yv05wrsj" --=-3LgaUjuNetk8Yv05wrsj Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, 2016-02-25 at 12:29 -0500, Tianyang Chen wrote: > On 2/25/2016 5:34 AM, Dario Faggioli wrote: > > >=20 > > Which one ASSERT() fails? > >=20 > The replq_insert() fails because it's already on the replenishment > queue=C2=A0 > when rt_vcpu_wake() is trying to insert a vcpu again. >=C2=A0 > (XEN) Xen call trace: > (XEN)=C2=A0=C2=A0=C2=A0=C2=A0[] sched_rt.c#rt_vcpu_wake= +0xf0/0x17f > (XEN)=C2=A0=C2=A0=C2=A0=C2=A0[] vcpu_wake+0x213/0x3d4 > (XEN)=C2=A0=C2=A0=C2=A0=C2=A0[] vcpu_unblock+0x4b/0x4d > (XEN)=C2=A0=C2=A0=C2=A0=C2=A0[] vcpu_kick+0x20/0x6f > (XEN)=C2=A0=C2=A0=C2=A0=C2=A0[] vcpu_mark_events_pendin= g+0x2c/0x2f > (XEN)=C2=A0=C2=A0=C2=A0=C2=A0[] > event_2l.c#evtchn_2l_set_pending+0xa9/0xb9 > (XEN)=C2=A0=C2=A0=C2=A0=C2=A0[] evtchn_send+0x158/0x183 > (XEN)=C2=A0=C2=A0=C2=A0=C2=A0[] do_event_channel_op+0xe= 21/0x147d > (XEN)=C2=A0=C2=A0=C2=A0=C2=A0[] lstar_enter+0xe2/0x13c > (XEN) > (XEN) > (XEN) **************************************** > (XEN) Panic on CPU 0: > (XEN) Assertion '!__vcpu_on_replq(svc)' failed at sched_rt.c:527 > (XEN) **************************************** > Ok, makes sense. > > > And like you > > > said, mostly spurious sleep > > > happens when a vcpu is running and it could happen in other > > > cases, > > > although rare. > > >=20 > > I think I said already there's no such thing as "spurious sleep". > > Or at > > least, I can't think of anything that I would define a spurious > > sleep, > > if you do, please, explain what situation you're referring to. > >=20 > I meant to say spurious wakeup...=20 > If, for spurious wakeup, we refer to wakeups happening when the vcpu is either running or runnable (and hence in the runqueue already), which I do, we don't even get to call __replq_insert() in those cases. I mean, we leave rt_vcpu_wake() before that, don't we? > If rt_vcpu_sleep() removes vcpus from=C2=A0 > replenishment queue, it's perfectly safe for rt_vcpu_wake() to > insert=C2=A0 > them back.=20 > It's safe against sleeps, not against blockings. That's the point I'm trying to make. > So, I'm suspecting it's the spurious wakeup that's causing=C2=A0 > troubles because vcpus are not removed prior to rt_vcpu_wake(). > However,=C2=A0 > the two RETURNs at the beginning of rt_vcpu_wake() should catch that=C2= =A0 > shouldn't it? > Exactly! > > In any case, one way of dealing with vcpus blocking/offlining/etc > > could > > be to, in context_saved(), in case we are not adding the vcpu back > > to > > the runq, cancel its replenishment event with __replq_remove(). > >=20 > > (This may make it possible to avoid doing it in rt_vcpu_sleep() > > too, > > but you'll need to check and test.) > >=20 > > Can you give this a try. > That makes sense. Doing it in context_saved() kinda implies that if > a=C2=A0 > vcpu is sleeping and taken off, its replenishment event should be=C2=A0 > removed. On the other hand, the logic is the same as removing it in=C2=A0 > rt_vcpu_sleep() but just at different times. > It is, but, if done properly, it catches more cases, or at least that's what I'm after. > Well, I have tried it and=C2=A0 > the check still needs to be there in rt_vcpu_wake(). I will send the=C2= =A0 > next version so it's easier to look at. >=20 If you're still seeing assertion failures, perhaps context_saved() is not the right place where to do that... I'll think more about this... Anyway, yes, let's see the code. Please, also send again, as a follow up email, the assertion failure log you get if you remove the check. Thanks and Regards, Dario --=20 <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) --=-3LgaUjuNetk8Yv05wrsj Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlbPPxsACgkQk4XaBE3IOsQkfwCffGdKlfcVibfQT8C5Yyy8T3Ez b3sAn3nN9/VJLbeIEQok7g3JCFtk8B+U =Hx76 -----END PGP SIGNATURE----- --=-3LgaUjuNetk8Yv05wrsj-- --===============1429984557912676508== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwOi8vbGlzdHMueGVuLm9y Zy94ZW4tZGV2ZWwK --===============1429984557912676508==--