From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [PATCH v7]xen: sched: convert RTDS from time to event driven model Date: Wed, 9 Mar 2016 16:46:04 +0100 Message-ID: <1457538364.3102.418.camel@citrix.com> References: <1457141967-3340-1-git-send-email-tiche@seas.upenn.edu> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============6769900033073948292==" Return-path: Received: from mail6.bemta5.messagelabs.com ([195.245.231.135]) by lists.xen.org with esmtp (Exim 4.84) (envelope-from ) id 1adgJQ-0001ef-Mb for xen-devel@lists.xenproject.org; Wed, 09 Mar 2016 15:46:12 +0000 In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: Meng Xu , Tianyang Chen Cc: "xen-devel@lists.xenproject.org" , George Dunlap , Dagaen Golomb , Meng Xu List-Id: xen-devel@lists.xenproject.org --===============6769900033073948292== Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-Z3HA1zXxfDTZ+ZCQ7hZe" --=-Z3HA1zXxfDTZ+ZCQ7hZe Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, 2016-03-08 at 23:33 -0500, Meng Xu wrote: > I didn't mark out all repeated style issues. I think you can correct > all of the style issues, such as the spaces in the code, in the next > version. >=20 Yes, please. At v7, style issues shouldn't definitely be there any longer. Have another look at CODING_STYLE, perhaps, especially focusing on spaces in conditional expressions and line length. > Hi Dario, > Could you help check if my two concerns in the comments below make > sense? > One is small, about if we should use () to make the code clearer. > The other is about the logic in the replenishment handler. >=20 I think tickling during replenishment does have issues, yes. See below. Anyway, Meng, can you trim your quotes when replying? By this, I mean cut the part of conversation/patches that are not relevant for what you are saying in this very email (but leave enough context in place for making what you want to point out understandable). That avoids having to scroll pages and pages of quoted text in order to find the actual content. > On Fri, Mar 4, 2016 at 8:39 PM, Tianyang Chen > wrote: > >=C2=A0 > > @@ -1033,32 +1159,40 @@ rt_vcpu_wake(const struct scheduler *ops, > > struct vcpu *vc) > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0else > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0SCHED_STAT_CRANK(= vcpu_wake_not_runnable); > >=20 > > -=C2=A0=C2=A0=C2=A0=C2=A0/* If context hasn't been saved for this vcpu = yet, we can't > > put it on > > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* the Runqueue/DepletedQ. Instead, we se= t a flag so that it > > will be > > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* put on the Runqueue/DepletedQ after th= e context has been > > saved. > > +=C2=A0=C2=A0=C2=A0=C2=A0/* > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* If a deadline passed while svc was asl= eep/blocked, we need > > new > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* scheduling parameters (a new deadline = and full budget). > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/ > > +=C2=A0=C2=A0=C2=A0=C2=A0if ( (missed =3D now >=3D svc->cur_deadline) ) > Dario, > Is it better to have a () outside of=C2=A0=C2=A0=C2=A0now >=3D svc->cur_d= eadline to > make the logic clearer, instead of relying on the operator's > priority? >=20 Yes, go ahead. You may even compute the value of missed outside of the if, and just use it (both here and below). As you wish. > > @@ -1150,6 +1276,71 @@ rt_dom_cntl( > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return rc; > > =C2=A0} > >=20 > > +/* > > + * The replenishment timer handler picks vcpus > > + * from the replq and does the actual replenishment. > > + */ > > +static void repl_handler(void *data){ > > +=C2=A0=C2=A0=C2=A0=C2=A0unsigned long flags; > > +=C2=A0=C2=A0=C2=A0=C2=A0s_time_t now =3D NOW(); > > +=C2=A0=C2=A0=C2=A0=C2=A0struct scheduler *ops =3D data; > > +=C2=A0=C2=A0=C2=A0=C2=A0struct rt_private *prv =3D rt_priv(ops); > > +=C2=A0=C2=A0=C2=A0=C2=A0struct list_head *replq =3D rt_replq(ops); > > +=C2=A0=C2=A0=C2=A0=C2=A0struct timer *repl_timer =3D prv->repl_timer; > > +=C2=A0=C2=A0=C2=A0=C2=A0struct list_head *iter, *tmp; > > +=C2=A0=C2=A0=C2=A0=C2=A0struct rt_vcpu *svc =3D NULL; > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0spin_lock_irqsave(&prv->lock, flags); > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0stop_timer(repl_timer); > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0list_for_each_safe(iter, tmp, replq) > > +=C2=A0=C2=A0=C2=A0=C2=A0{ > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0svc =3D replq_elem(ite= r); > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if ( now < svc->cur_de= adline ) > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0break; > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0rt_update_deadline(now= , svc); > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* when the reple= nishment happens > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* svc is either = on a pcpu or on > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* runq/depletedq > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/ > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if( __vcpu_on_q(svc) ) > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0{ > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0/* put back to runq */ > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0__q_remove(svc); > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0__runq_insert(ops, svc); > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* tickle regardl= ess where it's at > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* because a runn= ing vcpu could have > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* a later deadli= ne than others after > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* replenishment > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/ > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0runq_tickle(ops, svc); > Well, I'm thinking about the correctness of tickle here. > The svc is running on a core, say P_i, right now. After replenishing > the budget for svc, svc's deadline also changes. So svc=C2=A0=C2=A0should > probably be preempted by the head vcpu in the runq. If that's the > case, we should use the head of runq to tickle, instead of using svc. > Right? >=20 I agree that, in case of replenishment of a running vcpu, calling runq_tickle() like this is not correct. In fact, the whole idea of tickling in Credit1 and 2, from which it has been borrowed to here, was meant at (potentially) putting a vcpu in execution, not vice versa. :-/ I therefore agree that, if svc ends up with a later deadline than the first vcpu in the runque, we should actually call run_tickle() on the latter! > Actually, is the runq_tickle necessary here? Can we just check if the > updated svc has higher priority (smaller deadline) than the head vcpu > in the runq? If so, we just tickle the cpu P_i, where svc is > currently > running on.=20 > Well, if we are replenishing a vcpu that is depleted, whose new deadline can be earlier than any of the ones of the vcpus that are running (can't it?) and/or there can be idle vcpus on which you can run it, now that it has budget. So, in this case, I think we need what's done in runq_tickle()... The third case is that we replenish a vcpu that is in the runqueue, so it had budget, but was not running. In that case, there should be no chance that it should preempt a running vcpu, as it was waiting in the runqueue, which means --if we have M pcpus-- it was not among the M earliest deadline vcpus, and we just pushed its deadline away! It should also not be necessary to do any deadline comparison with other vcpus in the runqueue. In fact, still considering that we are moving the deadline ahead, it's going to either be in the same or "worst" position in the runqueue. > But either way, I'm thinking the logic here is incorrect. If the > updated svc has a lower priority, you will end up tickle no core > although the svc should be preempted. >=20 It seems to me that the logic here could be (sort-of): =C2=A0 for_each_to_be_replenished_vcpu(v) =C2=A0 { =C2=A0 =C2=A0=C2=A0deadline_queue_remove(replq, v); =C2=A0 =C2=A0 rt_update_deadline(v); =C2=A0 =C2=A0 if (=C2=A0curr_on_cpu(v->processor) =3D=3D v)) //running =C2=A0 =C2=A0 { =C2=A0 =C2=A0 =C2=A0 =C2=A0 if ( v->cur_deadline >=3D runq[0]->cur_deadline= ) =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 tickle(runq[0]); /* runq[0] =3D> first v= cpu in the runq */ =C2=A0 =C2=A0 } =C2=A0 =C2=A0 else if ( __vcpu_on_q(v) ) =C2=A0 =C2=A0 { =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (=C2=A0v->cur_budget =3D=3D 0 ) =C2=A0 =C2= =A0 =C2=A0 =C2=A0 //depleted =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 tickle(v); =C2=A0 =C2=A0 =C2=A0 =C2=A0 else =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0//runnable =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /* do nothing */ =C2=A0 =C2=A0 } =C2=A0 =C2=A0=C2=A0deadline_queue_insert(v, replq); =C2=A0 } Thoughts? I actually think there may be room for a nasty race, in case we do more than one replenishments. In fact, assume that, at time t: =C2=A0- we do the replenishment of v1, which is running and v2, which is =C2=A0 =C2=A0runnable, =C2=A0- we have 1 cpu, =C2=A0- v1 and v2 have, currently, the same deadline =3D=3D t, =C2=A0- v1's new deadline is going to be t+100, and v2's new deadline is =C2=A0 =C2=A0going to be t+150, =C2=A0- v2 is the only (i.e., also the first) runnable vcpu, =C2=A0- v1's replenishment event comes first in the replenishment queue. With the code above, we update v1's deadline (to t+100), go check it against v2's _not_updated_ deadline (t) and (since t+100 > t) find that a preemption is necessary, so we call tickle(v2). That will raise SCHEDULE_SOFTIRQ for the cpu, because v2 should --as far as situation is right now-- preempt v1. However, right after that, we update v2's deadline to t+150, and we do nothing. So, even if the vcpu with the earliest deadline (v1) is running, we reschedule. This should be harmless, as we do figure out during rt_schedule() that no real context switch is necessary, but I think it would better be avoided, if possible. It looks to me that we can re-arrange the code above as follows: =C2=A0 for_each_to_be_replenished_vcpu(v) =C2=A0 { =C2=A0 =C2=A0 rt_update_deadline(v); =C2=A0 } =C2=A0 for_each_to_be_replenished_vcpu(v) =C2=A0 { =C2=A0 =C2=A0=C2=A0deadline_queue_remove(replq, v); =C2=A0 =C2=A0 if (=C2=A0curr_on_cpu(v->processor) =3D=3D v)) //running =C2=A0 =C2=A0 { =C2=A0 =C2=A0 =C2=A0 =C2=A0 if ( v->cur_deadline >=3D runq[0]->cur_deadline= ) =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 tickle(runq[0]); /* runq[0] =3D> first v= cpu in the runq */ =C2=A0 =C2=A0 } =C2=A0 =C2=A0 else if ( __vcpu_on_q(v) ) =C2=A0 =C2=A0 { =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (=C2=A0v->cur_budget =3D=3D 0 ) =C2=A0 =C2= =A0 =C2=A0 =C2=A0 //depleted =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 tickle(v); =C2=A0 =C2=A0 =C2=A0 =C2=A0 else =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0//runnable =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /* do nothing */ =C2=A0 =C2=A0 } =C2=A0 =C2=A0=C2=A0deadline_queue_insert(v, replq); =C2=A0 } Basically, by doing all the replenishments (which includes updating all the deadlines) upfront, we should be able to prevent the above situation. So, again, thoughts? Regards, Dario --=20 <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) --=-Z3HA1zXxfDTZ+ZCQ7hZe Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlbgRT0ACgkQk4XaBE3IOsRlCwCfbOwV2Oca4cz6IB1YKc8gxmP7 sm0AnAk5udjn9Xq61z/zN0+KE6tJYjOJ =PCip -----END PGP SIGNATURE----- --=-Z3HA1zXxfDTZ+ZCQ7hZe-- --===============6769900033073948292== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwOi8vbGlzdHMueGVuLm9y Zy94ZW4tZGV2ZWwK --===============6769900033073948292==--