From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dario Faggioli <dario.faggioli@citrix.com>
Subject: Re: [PATCH v4] xen: sched: convert RTDS from time to
 event driven model
Date: Tue, 2 Feb 2016 16:08:25 +0100
Message-ID: <1454425705.9227.176.camel@citrix.com>
References: <1454301176-7131-1-git-send-email-tiche@seas.upenn.edu>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============4923990612523484960=="
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <prvs=8339ef7e0=dario.faggioli@citrix.com>)
	id 1aQcZH-0002aA-VU
	for xen-devel@lists.xenproject.org; Tue, 02 Feb 2016 15:08:36 +0000
In-Reply-To: <1454301176-7131-1-git-send-email-tiche@seas.upenn.edu>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Tianyang Chen <tiche@seas.upenn.edu>, xen-devel@lists.xenproject.org
Cc: george.dunlap@citrix.com, Dagaen Golomb <dgolomb@seas.upenn.edu>, Meng Xu <mengxu@cis.upenn.edu>
List-Id: xen-devel@lists.xenproject.org

--===============4923990612523484960==
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="=-E7gVb2yvKkbtjqF1oyRI"

--=-E7gVb2yvKkbtjqF1oyRI
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sun, 2016-01-31 at 23:32 -0500, Tianyang Chen wrote:
> v4 is meant for discussion on the addition of replq.
>=20
In which cases, you should always put something like RFC in the
subject, so people knows what the intent is even by just skimming their
inboxes.

> Changes since v3:
> 	removed running queue.
> 	added repl queue to keep track of repl events.
> 	timer is now per scheduler.
> 	timer is init on a valid cpu in a cpupool.
>=20
> Bugs to be fixed: Cpupool and locks. When a pcpu is removed from a
> pool and added to another, the lock equality assert in free_pdata()
> fails when Pool-0 is using rtds.
>=20
Known issue. I will fix it for both Credit2 and RTDS, so just ignore
it.

> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 2e5430f..c36e5de 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -16,6 +16,7 @@
> =C2=A0#include <xen/delay.h>
> =C2=A0#include <xen/event.h>
> =C2=A0#include <xen/time.h>
> +#include <xen/timer.h>
> =C2=A0#include <xen/perfc.h>
> =C2=A0#include <xen/sched-if.h>
> =C2=A0#include <xen/softirq.h>
> @@ -87,7 +88,7 @@
> =C2=A0#define RTDS_DEFAULT_BUDGET=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0(MICROSECS=
(4000))
> =C2=A0
> =C2=A0#define UPDATE_LIMIT_SHIFT=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A010
> -#define MAX_SCHEDULE=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0(MILLISECS(1))
> +
>
Unrelated to the topic. Can we have a dedicated patch that adds a
comment above=C2=A0UPDATE_LIMIT_SHIFT, explainig what it is and how it's
used.

This is something low priority, of course.

> @@ -160,6 +166,7 @@ struct rt_private {
> =C2=A0 */
> =C2=A0struct rt_vcpu {
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct list_head q_elem;=C2=A0=C2=A0=C2=A0=
=C2=A0/* on the runq/depletedq list */
> +=C2=A0=C2=A0=C2=A0=C2=A0struct list_head p_elem;=C2=A0=C2=A0=C2=A0=C2=A0=
/* on the repl event list */
> =C2=A0
Mmm... 'p_elem' because the previous one was 'q_elem', and 'p' follows
'q', I guess. It feels a bit obscure, though, and I'd prefer a more
'talking' name.

For now, and at this level, I guess I can live with it. But in other
places, things need to be improved (see below).

> @@ -213,8 +220,14 @@ static inline struct list_head
> *rt_depletedq(const struct scheduler *ops)
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return &rt_priv(ops)->depletedq;
> =C2=A0}
> =C2=A0
> +static inline struct list_head *rt_replq(const struct scheduler
> *ops)
> +{
> +=C2=A0=C2=A0=C2=A0=C2=A0return &rt_priv(ops)->replq;
> +}
> +
> =C2=A0/*
> - * Queue helper functions for runq and depletedq
> + * Queue helper functions for runq, depletedq
> + * and repl event q
> =C2=A0 */
>
So, in comments, can we use full words as much as possible. It makes
them a lot easier to read (so, e.g., "replenishment events queue" or
"queue of the replenishment events").

I know this very file's style is not like that, from this point of
view... and, in fact, this is something I would like to change, when we
get the chance (i.e., when touching the code for other reasons, like in
this case).

> @@ -228,6 +241,18 @@ __q_elem(struct list_head *elem)
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return list_entry(elem, struct rt_vcpu, q_e=
lem);
> =C2=A0}
> =C2=A0
> +static struct rt_vcpu *
> +__p_elem(struct list_head *elem)
> +{
> +=C2=A0=C2=A0=C2=A0=C2=A0return list_entry(elem, struct rt_vcpu, p_elem);
> +}
> +
So, while this may well still be fine...

> +static int
> +__vcpu_on_p(const struct rt_vcpu *svc)
> +{
> +=C2=A0=C2=A0=C2=A0return !list_empty(&svc->p_elem);
> +}
> +
>
...here I really would like to go for something that will make it much
more obvious, just by giving a look at the code, where we are actually
checking the vcpu to be, without having to remember that p is the
replenishment queue.

So, I don't know, maybe something like vcpu_on_replq(), or
vcpu_needs_replenish() ?

> @@ -387,6 +412,13 @@ __q_remove(struct rt_vcpu *svc)
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0list_del_init(&svc-=
>q_elem);
> =C2=A0}
> =C2=A0
> +static inline void
> +__p_remove(struct rt_vcpu *svc)
> +{
> +=C2=A0=C2=A0=C2=A0=C2=A0if ( __vcpu_on_p(svc) )
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0list_del_init(&svc->p_el=
em);
> +}
> +
replq_remove(), or vcpu_repl_cancel() (or something else) ?

> @@ -421,6 +453,32 @@ __runq_insert(const struct scheduler *ops,
> struct rt_vcpu *svc)
> =C2=A0}
> =C2=A0
> =C2=A0/*
> + * Insert svc into the repl even list:
> + * vcpus that needs to be repl earlier go first.
> + */
> +static void
> +__replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
> +{
>
This is exactly what I've been trying to say above: __replq_insert() is
ok, much better than __q_insert()! Do the same everywhere else. :-)

> +=C2=A0=C2=A0=C2=A0=C2=A0struct rt_private *prv =3D rt_priv(ops);
> +=C2=A0=C2=A0=C2=A0=C2=A0struct list_head *replq =3D rt_replq(ops);
> +=C2=A0=C2=A0=C2=A0=C2=A0struct list_head *iter;
> +
> +=C2=A0=C2=A0=C2=A0=C2=A0ASSERT( spin_is_locked(&prv->lock) );
> +
Is this really useful? Considering where this is called, I don't think
it is.

> +=C2=A0=C2=A0=C2=A0=C2=A0ASSERT( !__vcpu_on_p(svc) );
> +
> +=C2=A0=C2=A0=C2=A0=C2=A0list_for_each(iter, replq)
> +=C2=A0=C2=A0=C2=A0=C2=A0{
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct rt_vcpu * iter_sv=
c =3D __p_elem(iter);
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if ( svc->cur_deadline <=
=3D iter_svc->cur_deadline )
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0break;
> +=C2=A0=C2=A0=C2=A0=C2=A0}
> +
> +=C2=A0=C2=A0=C2=A0=C2=A0list_add_tail(&svc->p_elem, iter);
> +}
This looks ok. The list_for_each()+list_ad_tail() part would be
basically identical to what we have inside __runq_insert(), thgough,
wouldn't it?

It may make sense to define an _ordered_queue_insert() (or
_deadline_queue_insert(), since it's alwas the deadline that is
compared) utility function that we can then call in both cases.

> @@ -449,6 +507,7 @@ rt_init(struct scheduler *ops)
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0INIT_LIST_HEAD(&prv->sdom);
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0INIT_LIST_HEAD(&prv->runq);
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0INIT_LIST_HEAD(&prv->depletedq);
> +=C2=A0=C2=A0=C2=A0=C2=A0INIT_LIST_HEAD(&prv->replq);
> =C2=A0
Is there a reason why you don't do at least the allocation of the timer
here? Because, to me, this looks the best place for that.

> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0cpumask_clear(&prv->tickled);
> =C2=A0
> @@ -473,6 +532,9 @@ rt_deinit(const struct scheduler *ops)
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0xfree(_cpumask_scra=
tch);
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0_cpumask_scratch =
=3D NULL;
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0}
> +
> +=C2=A0=C2=A0=C2=A0=C2=A0kill_timer(prv->repl_timer);
> +
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0xfree(prv);
>
And here, you're freeing prv, without having freed prv->timer, which is
therefore leaked.
=C2=A0
> @@ -632,6 +699,17 @@ rt_vcpu_insert(const struct scheduler *ops,
> struct vcpu *vc)
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0vcpu_schedule_unlock_irq(lock, vc);
> =C2=A0
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0SCHED_STAT_CRANK(vcpu_insert);
> +
> +=C2=A0=C2=A0=C2=A0=C2=A0if( prv->repl_timer =3D=3D NULL )
> +=C2=A0=C2=A0=C2=A0=C2=A0{=C2=A0=C2=A0=C2=A0
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* vc->processor has bee=
n set in schedule.c */
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0cpu =3D vc->processor;
> +
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0repl_timer =3D xzalloc(s=
truct timer);
> +
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0prv->repl_timer =3D repl=
_timer;
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0init_timer(repl_timer, r=
epl_handler, (void *)ops, cpu);
> +=C2=A0=C2=A0=C2=A0=C2=A0}
> =C2=A0}
> =C2=A0
This belong to rt_init, IMO.

> @@ -841,7 +881,6 @@ rt_schedule(const struct scheduler *ops, s_time_t
> now, bool_t tasklet_work_sched
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* burn_budget would return for IDLE VCPU *=
/
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0burn_budget(ops, scurr, now);
> =C2=A0
> -=C2=A0=C2=A0=C2=A0=C2=A0__repl_update(ops, now);
> =C2=A0
__repl_update() going away is of course fine. But we need to enact the
budget enforcement logic somewhere in here, don't we?

> @@ -924,6 +967,10 @@ rt_vcpu_sleep(const struct scheduler *ops,
> struct vcpu *vc)
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0__q_remove(svc);
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0else if ( svc->flags & RTDS_delayed_runq_ad=
d )
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0clear_bit(__RTDS_de=
layed_runq_add, &svc->flags);
> +
> +=C2=A0=C2=A0=C2=A0=C2=A0/* stop tracking the repl time of this vcpu */
> +=C2=A0=C2=A0=C2=A0/* if( __vcpu_on_p(svc) )
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0__p_remove(svc); */
> =C2=A0}
> =C2=A0
Is it ok to kill the replenishment in this case?

This is a genuine question. What does the dynamic DS algorithm that
you're trying to implement here says about this? Meng, maybe you can
help here?

Is it ok to do this _because_ you then handle the situation of a
replenishment having to had happened while the vcpu was asleep in
rt_vcpu_wake (the '(now>=3Dsvc->cur_deadline)' thing)? Yes... it probably
is ok for that reason...

> @@ -1027,23 +1074,21 @@ rt_vcpu_wake(const struct scheduler *ops,
> struct vcpu *vc)
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct rt_vcpu * const svc =3D rt_vcpu(vc);
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0s_time_t now =3D NOW();
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct rt_private *prv =3D rt_priv(ops);
> -=C2=A0=C2=A0=C2=A0=C2=A0struct rt_vcpu *snext =3D NULL; /* highest prior=
ity on RunQ */
> -=C2=A0=C2=A0=C2=A0=C2=A0struct rt_dom *sdom =3D NULL;
> -=C2=A0=C2=A0=C2=A0=C2=A0cpumask_t *online;
> +=C2=A0=C2=A0=C2=A0=C2=A0struct timer *repl_timer =3D prv->repl_timer;
> =C2=A0
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0BUG_ON( is_idle_vcpu(vc) );
> =C2=A0
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if ( unlikely(curr_on_cpu(vc->processor) =
=3D=3D vc) )
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0{
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0SCHED_STAT_CRANK(vc=
pu_wake_running);
> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return;
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0goto out;
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0}
> =C2=A0
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* on RunQ/DepletedQ, just update info is o=
k */
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if ( unlikely(__vcpu_on_q(svc)) )
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0{
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0SCHED_STAT_CRANK(vc=
pu_wake_onrunq);
> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return;
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0goto out;
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0}
> =C2=A0
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if ( likely(vcpu_runnable(vc)) )
>
Mmm.. no. At least the first two cases, they should really just be
'return'-s.

As you say yourself in the comment further down, right below the 'out:'
label "a newly waken-up vcpu could xxx". If this is running on a pCPU,
or it is in a runqueue already, it is not a newly woken-up vcpu.

In fact, we need to check for this cases, but let's at least made them
act like NOPs, as all other schedulers do and as it's correct.

> @@ -1058,25 +1103,39 @@ rt_vcpu_wake(const struct scheduler *ops,
> struct vcpu *vc)
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if ( unlikely(svc->flags & RTDS_scheduled) =
)
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0{
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0set_bit(__RTDS_dela=
yed_runq_add, &svc->flags);
> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return;
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0goto out;
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0}
>
In this case, I'm less sure. I think this should just return as well,
but it may depend on how other stuff are done (like budget
enforcement), so it's hard to think at it right now, but consider this
when working on next version.

> +=C2=A0=C2=A0=C2=A0=C2=A0/* budget repl here is needed before inserting b=
ack to runq. If
> so,
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* it should be taken out of replq and put =
back. This can
> potentially
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* cause the timer handler to replenish no =
vcpu when it triggers
> if
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* the replenishment is done here already.
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/
>
Coding style for long comments wants them to have both "wings" (you're
missing the opening one, '/*').

> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if ( now >=3D svc->cur_deadline)
> +=C2=A0=C2=A0=C2=A0=C2=A0{=C2=A0=C2=A0=C2=A0
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0rt_update_deadline(=
now, svc);
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if( __vcpu_on_p(svc) )
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
__p_remove(svc);
> +=C2=A0=C2=A0=C2=A0=C2=A0}
> =C2=A0
Well, then maybe the __p_remove() function can be made smart enough to:
=C2=A0- check whether the one being removed is the first element of the
=C2=A0 =C2=A0queue;
=C2=A0- if it is, disarm the timer
=C2=A0- if there still are elements in the queue, rearm the timer to fire
at=C2=A0
=C2=A0 =C2=A0the new first's deadline

This will be useful at least in another case (rt_vcpu_sleep()).

> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* insert svc to runq/depletedq because svc=
 is not in queue now
> */
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0__runq_insert(ops, svc);
> =C2=A0
> -=C2=A0=C2=A0=C2=A0=C2=A0__repl_update(ops, now);
> -
> -=C2=A0=C2=A0=C2=A0=C2=A0ASSERT(!list_empty(&prv->sdom));
> -=C2=A0=C2=A0=C2=A0=C2=A0sdom =3D list_entry(prv->sdom.next, struct rt_do=
m, sdom_elem);
> -=C2=A0=C2=A0=C2=A0=C2=A0online =3D cpupool_domain_cpumask(sdom->dom);
> -=C2=A0=C2=A0=C2=A0=C2=A0snext =3D __runq_pick(ops, online); /* pick snex=
t from ALL valid
> cpus */
> -
> -=C2=A0=C2=A0=C2=A0=C2=A0runq_tickle(ops, snext);
> +=C2=A0=C2=A0=C2=A0=C2=A0runq_tickle(ops, svc);
> =C2=A0
> -=C2=A0=C2=A0=C2=A0=C2=A0return;
> +out:
> +=C2=A0=C2=A0=C2=A0=C2=A0/* a newly waken-up vcpu could have an earlier r=
elease time=C2=A0
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* or it could be the first to program the =
timer
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/
> +=C2=A0=C2=A0=C2=A0=C2=A0if( repl_timer->expires =3D=3D 0 || repl_timer->=
expires > svc-
> >cur_deadline )
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0set_timer(repl_timer,svc=
->cur_deadline);
> +
And again, can't this logic be part of __rqplq_insert()? I.e.:
=C2=A0- do the insertion
=C2=A0- if the inserted element is the new head of the queue, reprogram the=
=C2=A0
=C2=A0 =C2=A0timer

> +=C2=A0=C2=A0=C2=A0=C2=A0/* start tracking the repl time of this vcpu her=
e=C2=A0
> +=C2=A0=C2=A0=C2=A0=C2=A0* it stays on the replq unless it goes to sleep
> +=C2=A0=C2=A0=C2=A0=C2=A0* or marked as not-runnable
> +=C2=A0=C2=A0=C2=A0=C2=A0*/
> +=C2=A0=C2=A0=C2=A0=C2=A0if( !__vcpu_on_p(svc) )
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0__replq_insert(ops, svc);
> =C2=A0}

> @@ -1168,6 +1216,80 @@ rt_dom_cntl(
> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return rc;
> =C2=A0}
> =C2=A0
> +/* The replenishment timer handler picks vcpus
> + * from the replq and does the actual replenishment
> + * the replq only keeps track of runnable vcpus
> + */
> +static void repl_handler(void *data){
> +=C2=A0=C2=A0=C2=A0=C2=A0unsigned long flags;
> +=C2=A0=C2=A0=C2=A0=C2=A0s_time_t now =3D NOW();
> +=C2=A0=C2=A0=C2=A0=C2=A0s_time_t t_next =3D LONG_MAX; /* the next time t=
imer should be
> triggered */
> +=C2=A0=C2=A0=C2=A0=C2=A0struct scheduler *ops =3D data;=C2=A0
> +=C2=A0=C2=A0=C2=A0=C2=A0struct rt_private *prv =3D rt_priv(ops);
> +=C2=A0=C2=A0=C2=A0=C2=A0struct list_head *replq =3D rt_replq(ops);
> +=C2=A0=C2=A0=C2=A0=C2=A0struct timer *repl_timer =3D prv->repl_timer;
> +=C2=A0=C2=A0=C2=A0=C2=A0struct list_head *iter, *tmp;
> +=C2=A0=C2=A0=C2=A0=C2=A0struct rt_vcpu *svc =3D NULL;
> +
> +=C2=A0=C2=A0=C2=A0=C2=A0stop_timer(repl_timer);
> +
> +=C2=A0=C2=A0=C2=A0=C2=A0spin_lock_irqsave(&prv->lock, flags);
> +=C2=A0
> +=C2=A0=C2=A0=C2=A0=C2=A0list_for_each_safe(iter, tmp, replq)
> +=C2=A0=C2=A0=C2=A0=C2=A0{
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0svc =3D __p_elem(iter);
> +
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if ( now >=3D svc->cur_d=
eadline )
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0{
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
rt_update_deadline(now, svc);
> +
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
if( t_next > svc->cur_deadline )
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0t_next =3D svc->cur_deadline;
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0
Why do you need to do this? The next time at which the timer needs to
be reprogrammed is, after you'll have done all the replenishments that
needed to be done, here in this loop, the deadline of the new head of
the replenishment queue (if there still are elements in there).

> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
/* when the replenishment happens
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0* svc is either on a pcpu or on
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0* runq/depletedq
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0*/
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
if( __vcpu_on_q(svc) )
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
{
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0/* put back to runq */
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0__q_remove(svc);
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0__runq_insert(ops, svc);
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0runq_tickle(ops, svc);
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
}
> +
Aha! So, it looks the budget enforcement logic ended up here? Mmm...
no, I think that doing it in rt_schedule(), as we agreed, would make
things simpler and better looking, both here and there.

This is the replenishment timer handler, and it really should do one
thins: handle replenishment events. That's it! ;-P

> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
/* resort replq, keep track of this
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0* vcpu if it's still runnable. If=C2=A0
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0* at this point no vcpu are, then
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0* the timer waits for wake() to
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0* program it.
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0*/
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
__p_remove(svc);
> +
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
if( vcpu_runnable(svc->vcpu) )
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0__replq_insert(ops, svc);
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0}
> +
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0else
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0{
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
/* Break out of the loop if a vcpu's
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0* cur_deadline is bigger than now.
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0* Also when some replenishment is done
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0* in wake(), the code could end up here
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0* if the time fires earlier than it should
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
*/
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
if( t_next > svc->cur_deadline )
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0t_next =3D svc->cur_deadline;
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
break;
> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0}
> +=C2=A0=C2=A0=C2=A0=C2=A0}
> +
> +=C2=A0=C2=A0=C2=A0=C2=A0set_timer(repl_timer, t_next);
> +
Well, what if there are no more replenishments to be performed? You're
still reprogramming the timer, either at t_next or at cur_deadline,
which looks arbitrary.

If there are no replenishments, let's just keep the timer off.

So, all in all, this is a very very good first attempt at implementing
what we agreed upon in previous email... Good work, can't wait to see
v5! :-D

Let me know if there is anything unclear with any of the comments.

Thanks and Regards,
Dario
--=20
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


--=-E7gVb2yvKkbtjqF1oyRI
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iEYEABECAAYFAlawxmsACgkQk4XaBE3IOsS7xQCgm+Sf6yhViSpLGbdUxAyQTSdC
2vkAn3D/oI3FHI+m/peB+XjykawMaF3G
=Dgkf
-----END PGP SIGNATURE-----

--=-E7gVb2yvKkbtjqF1oyRI--


--===============4923990612523484960==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--===============4923990612523484960==--