Re: [PATCH v4] xen: sched: convert RTDS from time to event driven model

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Tianyang Chen <tiche@seas.upenn.edu>
To: Dario Faggioli <dario.faggioli@citrix.com>,
	xen-devel@lists.xenproject.org
Cc: george.dunlap@citrix.com, Dagaen Golomb <dgolomb@seas.upenn.edu>,
	Meng Xu <mengxu@cis.upenn.edu>
Subject: Re: [PATCH v4] xen: sched: convert RTDS from time to event driven model
Date: Fri, 5 Feb 2016 23:27:56 -0500	[thread overview]
Message-ID: <56B5764C.7090504@seas.upenn.edu> (raw)
In-Reply-To: <1454683182.9227.431.camel@citrix.com>



On 2/5/2016 9:39 AM, Dario Faggioli wrote:
> On Wed, 2016-02-03 at 21:23 -0500, Tianyang Chen wrote:
>>
>> On 2/3/2016 7:39 AM, Dario Faggioli wrote:
>>> On Tue, 2016-02-02 at 13:09 -0500, Tianyang Chen wrote:
>>>>
>>> So what do we do, we raise the *_delayed_runq_add flag and continue
>>> and
>>> leave the vcpu alone. This happens, for instance, when there is a
>>> race
>>> between scheduling out a vcpu that is going to sleep, and the vcpu
>>> in
>>> question being woken back up already! It's not frequent, but we
>>> need to
>>> take care of that. At some point (sooner rather tan later, most
>>> likely)
>>> the context switch will actually happen, and the vcpu that is
>>> waking up
>>> is put in the runqueue.
>>>
>>
>> Yeah I agree with this situation. But I meant to say when a vcpu is
>> waking up while it's still on a queue. See below.
>>
> So, if a vcpu wakes up while it is running on a pcpu already, or while
> it is already woken and has already been put in the runqueue, these are
> SPURIOUS WAKEUP, which should be dealt with by just doing nothing, as
> every scheduler is doing.... This is what I'm tring to say since a
> couple of email, let's see if I've said it clear enough this time. :-D
>

I see. So I'm just curious what can cause spurious wakeup? Does it only 
happen to a running vcpu (currently on pcpu), and a vcpu that is on 
either runq or depletedq?

>>> static void
>>> rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
>>> {
>>>       struct rt_vcpu * const svc = rt_vcpu(vc);
>>>       s_time_t now = NOW();
>>>       struct rt_private *prv = rt_priv(ops);
>>>
>>>       BUG_ON( is_idle_vcpu(vc) );
>>>
>>>       if ( unlikely(curr_on_cpu(vc->processor) == vc) )
>>>       {
>>>           SCHED_STAT_CRANK(vcpu_wake_running);
>>>           return;
>>>       }
>>>
>>
>> For this situation, a vcpu is waking up on a pcpu. It could be taken
>> off from the replq inside the timer handler at the end. So, if we
>> decide
>> to not replenish any sleeping/un-runnable vcpus any further, we
>> should
>> probably add it back to replq here right?
>>
> I'm sorry, I don't understand what you're saying here. The vcpu woke up
> a few time ago. We went through here. At the time it was not on any
> pcpu and it was not in the runqueue, so we did plan a replenishment
> event for it in the replenishment queue, at it's next deadline.
>
> Now we're here again, for some reason. We figure out that the vcpu is
> running already and, most likely (feel free to add an ASSERT() inside
> the if to that effect), it will also still have its replenishment event
> queued.
>
> So we realized that this is just a spurious wakeup, and get back to
> what we were doing.
>
> What's wrong with this I just said?
>

The reason why I wanted to add a vcpu back to replq is because it could 
be taken off from the timer handler. Now, because of the spurious 
wakeup, I think the timer shouldn't take any vcpus off of replq, in 
which case wake() should be doing nothing just like other schedulers 
when it's a spurious wakeup. Also, only sleep() should remove events 
from replq.

>>>       /* on RunQ/DepletedQ, just update info is ok */
>>>       if ( unlikely(__vcpu_on_q(svc)) )
>>>       {
>>>           SCHED_STAT_CRANK(vcpu_wake_onrunq);
>>>           return;
>>>       }
>>>
>>
>> For this situation, a vcpu could be going through the following
>> phases:
>>
>> on runq --> sleep --> (for a long time) --> wake up
>>
> Well, but then we're not here, because when it went to sleep, it's been
> taken off the runqueue, hasn't it?
>
> Actually, I do think that you need to be running to go to sleep, but
> anyway, even if a vcpu could go to sleep or block from the runqueue,
> it's not going to stay in the runqueue, and the first wakeup that
> occurs does not find it on the runqueue, and hence is not covered by
> this case... Is it?
>

uh, I missed the __q_remove() from sleep. It will not end up here in the 
case I mentioned.

>> When it wakes up, it could stay in the front of the runq because
>> cur_deadline hasn't been updated. It will, inherently have some high
>> priority-ness (because of EDF) and be picked over other vcpus right?
>> Do
>> we want to update it's deadline and budget here and re-insert it
>> accordingly?
>>
> When it wakes up and it is found not to be either running or in the
> runqueue already, yes, we certainly want to do that!
>
> But if some other (again, spurious) wakeup occurs, and find it already
> in the runqueue (i.e., this case) we actually *don't* want to update
> its deadline again.
>

Agree.

>>>       if ( likely(vcpu_runnable(vc)) )
>>>           SCHED_STAT_CRANK(vcpu_wake_runnable);
>>>       else
>>>           SCHED_STAT_CRANK(vcpu_wake_not_runnable);
>>>
>>>
>>>       if ( now >= svc->cur_deadline)
>>>       {
>>>           rt_update_deadline(now, svc);
>>>           __replq_remove(svc);
>>>       }
>>>
>>>       __replq_insert(svc);
>>>
>>
>> For here, consider this scenario:
>>
>> on pcpu (originally on replq) --> sleep --> context_saved() but
>> still
>> asleep --> wake()
>>
>> Timer handler isn't triggered before it awakes. Then this vcpu will
>> be
>> added to replq again while it's already there. I'm not 100% positive
>> if
>> this situation is possible though. But judging from
>> rt_context_saved(),
>> it checks if a vcpu is runnable before adding back to the runq.
>>
> I'm also having a bit of an hard time understanding this... especially
> what context_saved() has to do with it.
>

I guess I shouldn't have put /* */ around the removal of a vcpu on replq 
in sleep(). So the original patch won't actually take a vcpu off if it 
goes to sleep. Like Meng said, it should. And I see what you mean by 
spurious wakeup now. I will put a summary for wake() somewhere below.

> Are you saying that it is possible for a vcpu to start running with
> deadline td, and hence a replenishment is programmed to happen at td,
> and then to go to sleep at ta<td and wakeup at tb that is also <td?
>
> Well, in that case, yes, whether to update the deadline or not, and
> whether to reuse the already existing replenishment or not, it's up to
> you guys, as it's part of the algorithm. What does the algorithm say in
> this case?
>
> In this implementation, you are removing replenishment when a vcpu
> sleep, so it looks to me that you don't have much choice than re-
> instating it at wakeup (we've gone through this already, remember?).
>
> OTOH, if you wouldn't stop the timer on sleep, then yes, we ma need
> some logic here, to understand whether the replenishment happened
> during sleeping/blocking time, or if it's still pending.
>
> If that was not what you were saying... then, sorry, but I just did not
> get it...
>
>>> Thoughts?
>>>
>>>
>> So just to wrap up my thoughts for budget replenishment inside of
>> wake(), a vcpu can be in different states(on pcpu, on queue etc.)
>> when
>> it wakes up.
>>
> Yes, and only one is usually interesting: when it's not in any runqueue
> nor on any pcpu. In out case, the case where it's in _delayed_runq add
> needs some care as well (unlike, e.g., in Credit2).
>

See below.

>> 1)wake up on a pcpu:
>> Do nothing as it may have some remaining budget and we just ignore
>> it's
>> cur_deadline when it wakes up. I can't find a corresponding pattern
>> in
>> DS but this can be treated like a server being pushed back because
>> of
>> the nap.
>>
> Well, I'd say: do nothing because it's a spurious wakeup.
>

Yes.

>> 2)wake up on a queue:
>> Update the deadline(refill budget) of it and re-insert it into runq
>> for
>> the reason stated above.
>>
> Do nothing because it's a spurious wakeup.
>

Yes.

>> 3)wake up in the middle of context switching:
>> Update the deadline(refill budget) of it so it can be re-inserted
>> into
>> runq in rt_context_saved().
>>
> Ok.
>
>> 4)wake up on nowhere ( sleep on pcpu--> context_saved() --> wake()):
>> Replenish it and re-insert it back to runq.
>>
> Here it is the one we care about. And yes, I agree on what we should do
> in this case.
>
>> As for replenishment queue manipulation, we should always try and add
>> it
>> back to replq because we don't know if the timer handler has taken
>> it
>> off or not.
>>
> Mmm... no, I think we should know/figure out whether a replenishment is
> pending already and we are ok with it, or if we need a new one.
>

Just to make sure, spurious sleep/wakeup are for a vcpu that is on pcpu 
or any queue right?


If a real sleep happens, it should be taken off from replq. However, in 
spurious wakeup (which I assume corresponds to a spurious sleep?), it 
shouldn't be taken off from replq. Adding back to replq should happen 
for those vcpus which were taken off because of "real sleep".

So here is a summary to make sure everyone's on the same page :)
This is basically your code with a slight modification before 
replq_insert().

static void
rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
{
     struct rt_vcpu * const svc = rt_vcpu(vc);
     s_time_t now = NOW();

     BUG_ON( is_idle_vcpu(vc) );

     if ( unlikely(curr_on_cpu(vc->processor) == vc) )
     {
	/*
	 * spurious wakeup so do nothing as it is
	 * not removed from replq
   	 */
         SCHED_STAT_CRANK(vcpu_wake_running);
         return;
     }

     /* on RunQ/DepletedQ, just update info is ok */
     if ( unlikely(__vcpu_on_q(svc)) )
     {
	/*
	 * spurious wakeup so do nothing as it is
	 * not removed from replq
	 */
         SCHED_STAT_CRANK(vcpu_wake_onrunq);
         return;
     }

     if ( likely(vcpu_runnable(vc)) )
         SCHED_STAT_CRANK(vcpu_wake_runnable);
     else
         SCHED_STAT_CRANK(vcpu_wake_not_runnable);


     /*
      * sleep() might have taken it off from replq before
      * conext_saved(). Three cases:
      * 1) sleep on pcpu--> context_saved() --> wake()
      * 2) sleep on pcpu--> wake() --> context_saved()
      * 3) sleep before context_saved() -->wake().
      * The first two cases sleep() doesn't remove this vcpu
      * so add a check before inserting.
      */
     if ( now >= svc->cur_deadline)
     {
         rt_update_deadline(now, svc);
         __replq_remove(svc);
     }

     if( !__vcpu_on_replq(svc) )
         __replq_insert(svc);

     /* If context hasn't been saved for this vcpu yet, we can't put it on
      * the Runqueue/DepletedQ. Instead, we set a flag so that it will be
      * put on the Runqueue/DepletedQ after the context has been saved.
      */
     if ( unlikely(svc->flags & RTDS_scheduled) )
     {
         set_bit(__RTDS_delayed_runq_add, &svc->flags);

         return;
     }

     /* insert svc to runq/depletedq because svc is not in queue now */
     __runq_insert(ops, svc);
     runq_tickle(ops, snext);

     return;
}


static void
rt_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
{
     struct rt_vcpu * const svc = rt_vcpu(vc);

     BUG_ON( is_idle_vcpu(vc) );
     SCHED_STAT_CRANK(vcpu_sleep);

     /*
      * If a vcpu is not sleeping on a pcpu or a queue,
      * it should be taken off
      * from the replenishment queue
      * Case 1: sleep on a pcpu
      * Case 2: sleep on a queue
      * Case 3: sleep before context switch is finished
      */
     if ( curr_on_cpu(vc->processor) == vc )
         cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
     else if ( __vcpu_on_q(svc) )
         __q_remove(svc);
     else if ( svc->flags & RTDS_delayed_runq_add )
     {
         clear_bit(__RTDS_delayed_runq_add, &svc->flags);

         if( __vcpu_on_replq(svc) )
            __replq_remove(svc);
     }
}



>>>> Oh I might be misunderstanding what should be put in
>>>> rt_schedule().
>>>> Was
>>>> it specifically for handling a vcpu that shouldn't be taken off a
>>>> core?
>>>> Why are you referring this piece of code as budget enforcement?
>>>>
>>>> Isn't it necessary to modify the runq here after budget
>>>> replenishment
>>>> has happened? If runq goes out of order here, will rt_schedule()
>>>> be
>>>> sorting it?
>>>>
>>> Ops, yes, you are right, this is not budget enforcement. So, budget
>>> enforcement looks still to be missing from this patch (and that is
>>> why
>>> I asked whether it is actually working).
>>>
>>> And this chunk of code (and perhaps this whole function) is
>>> probably
>>> ok, it's just a bit hard to understand what it does...
>>>
>>
>> So, the burn_budget() is still there right? I'm not sure what you
>> are
>> referring to as budget enforcement. Correct me if I'm wrong, budget
>> enforcement means that each vcpu uses its assigned/remaining budget.
>> Is
>> there any specific situation I'm missing?
>>
> That's correct, and I saw that burn_budget() is still there. What I
> failed to see was logic, within rt_schedule() for stopping a vcpu that
> has exhausted its budget until its next replenishment.
>
> Before, this was done in __repl_update(), which is (and that's a good
> thing) no longer there.
>
> But maybe it's me that missed something...
>

__repl_update() only scans runq and depletedq for replenishment right? 
And right after burn_budget(), if the current vcpu runs out of budget,
snext will be picked here because of:

/* if scurr has higher priority and budget, still pick scurr */
if ( !is_idle_vcpu(current) && vcpu_runnable(current) &&
       scurr->cur_budget > 0 && ( is_idle_vcpu(snext->vcpu) ||
     scurr->cur_deadline <= snext->cur_deadline ) )
     snext = scurr;

if scurr->cur_budget == 0, snext will be picked immediately.
Is that what you looking for?

>>> So, do you mind if we take a step back again, and analyze the
>>> situation. When the timer fires, and this handler is executed and a
>>> replenishment event taken off the replenishment queue, the
>>> following
>>> situations are possible:
>>>
>>>    1) the replenishment is for a running vcpu
>>>    2) the replenishment is for a runnable vcpu in the runq
>>>    3) the replenishment is for a runnable vcpu in the depletedq
>>>    4) the replenishment is for a sleeping vcpu
>>>
>>> So, 4) is never going to happen, right?, as we're stopping the
>>> timer
>>> when a vcpu blocks. If we go for this, let's put an ASSERT() and a
>>> comment as a guard for that. If we do the optimization in this very
>>> patch. If we leave it for another patch, we need to handle this
>>> case, I
>>> guess.
>>>
>>> In all 1), 2) and 3) cases, we need to remove the replenishment
>>> event
>>> from the replenishment queue, so just (in the code) do it upfront.
>>>
>>
>> I don't think we need to take them of the replenishment queue? See
>> the
>> last couple paragraphs.
>>
> I meant removing it for reinserting it again with a different
> replenishment time, due to the fact that, as a consequence of the
> replenishment that is just happening, the vcpu is being given a new
> deadline.
>

Ok so we are talking about the same thing.

>>> What other actions need to happen in each case, 1), 2) and 3)? Can
>>> you
>>> summarize then in here, so we can decide how to handle each
>>> situation?
>>>
>>
>> Here we go.
>> 1) the replenishment is for a running vcpu
>> Replenish the vcpu. Keep it on the replenishment queue. Tickle it as
>> it
>> may have a later deadline.
>>
> What do you mean keep it in the repl queue? This particular
> replenishment just happened, which means the time at which it was
> scheduler passed, there is no reason for keeping it in the queue like
> this. Do you mean inserting a new replenishment event in the queue, for
> this same vcpu, with a new replenishment time corresponding to its
> deadline? If yes, then yes, I agree with that.
>

Exactly. It's the way I expressed it. When I said keep it on, it 
includes remove() + insert(). lol

>> 2) the replenishment is for a runnable vcpu in the runq
>> Replenish the vcpu. Keep it on the replenishment queue.
>>
> Same as above, I guess?
>

Yes.

>> (re-)insert it
>> into runq. Tickle it.
>>
> Ok.
>
>> 3) the replenishment is for a runnable vcpu in the depletedq
>> Same as 2)
>>
> Same as 2), but you insert it in the runq (not again in the depletedq),
> right?
>

Right. _q_insert() checks if a vcpu has budget. If so it's put on the 
runq. Otherwise on the depletedq.

>> 4) the replenishment is for a sleeping vcpu
>> Take it off from the replenishment queue and don't do replenishment.
>> As
>> a matter of fact, this patch does the replenishment first and then
>> take
>> it off.
>>
> Ah, so can this happen? Aren't you removing the queued replenishment
> event when a vcpu goes to sleep?
>

Now, if we all agree on removing a vcpu in sleep(), this is not going to 
happen. So, there are three functions manipulating the replq now.

>>> I'd say that, in both case 2) and case 3), the vcpu needs to be
>>> taken
>>> out of the queue where they are, and (re-)inserted in the runqueue,
>>> and
>>> then we need to tickle... Is that correct?
>>>
>>
>> correct^.
>>
> Ok. :-)
>
>>> I'd also say that, in case 1), we also need to tickle because the
>>> deadline may now be have become bigger than some other vcpu that is
>>> in
>>> the runqueue?
>>>
>>
>> correct. I missed this case^.
>>
> Ok. :-) :-)
>
>>> Also, under what conditions we need, as soon as replenishment for
>>> vcpu
>>> X happened, to queue another replenishment for the same vcpu, at
>>> its
>>> next deadline? Always, I would say (unless the vcpu is sleeping,
>>> because of the optimization you introduced)?
>>
>> So this is kinda confusing. If in case 1) 2) and 3) they are never
>> taken
>> off from the replenishment queue, a series of replenishment events
>> are
>> automatically updated after the replenishment.
>>
> This is mostly about the both of us saying the same thing differently.
>
> I'm saying that you should remove the vcpu from the replenishment runqueue and then insert it in there again with an updated replenishment time, computed out of the vcpu's new deadline.
>
> You are saying that the replenishment events are updated.
>
> Are we (talking about the same thing)? :-)
>

Yeah.

>>   There is no need, as
>> least one less reason, to keep track of when to queue the next vcpus
>> for
>> replenishment.
>>
> I don't understand what you mean here.
>
>> The side effect of this is that all vcpus could be going to sleep
>> right
>> after the timer was programmed at the end of the handler. If they
>> don't
>> wake up before the timer fires next time, the handler is called to
>> replenish nobody. If that's the case, timer will not be programmed
>> again
>> in the handler. Wake() kicks in and programs it instead. And the
>> downside of this is that we always need to check if an awaking vcpu
>> is
>> on replq or not. It not add it back(hence the notion of starting to
>> track it's replenishment events again). When adding back, check if
>> there
>> is a need to reprogram the timer.
>>
> This all sounds fine to me.
>
Great. I will put together v5 soon. If we all agree on the rt_wake() and 
rt_sleep().

Thanks,
Tianyang

next prev parent reply	other threads:[~2016-02-06  4:28 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-01  4:32 [PATCH v4] xen: sched: convert RTDS from time to event driven model Tianyang Chen
2016-02-02 15:08 ` Dario Faggioli
2016-02-02 18:09   ` Tianyang Chen
2016-02-03 12:39     ` Dario Faggioli
2016-02-04  2:23       ` Tianyang Chen
2016-02-05 14:39         ` Dario Faggioli
2016-02-06  4:27           ` Tianyang Chen [this message]
2016-02-08 11:27             ` Dario Faggioli
2016-02-08 19:04               ` Tianyang Chen
2016-02-08 21:23                 ` Dario Faggioli
2016-02-04  4:03       ` Meng Xu
2016-02-05 14:45         ` Dario Faggioli
2016-02-03  3:33   ` Meng Xu
2016-02-03 12:30     ` Dario Faggioli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56B5764C.7090504@seas.upenn.edu \
    --to=tiche@seas.upenn.edu \
    --cc=dario.faggioli@citrix.com \
    --cc=dgolomb@seas.upenn.edu \
    --cc=george.dunlap@citrix.com \
    --cc=mengxu@cis.upenn.edu \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).