From: Dario Faggioli <dario.faggioli@citrix.com>
To: Anshul Makkar <anshul.makkar@citrix.com>, xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Wei Liu <wei.liu2@citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH 1/4] xen: credit2: implement utilization cap
Date: Tue, 13 Jun 2017 23:13:43 +0200
Message-ID: <1497388423.26212.43.camel@citrix.com>
In-Reply-To: <142e982a-f07e-8e24-9a5e-7d4eed213dd1@citrix.com>


On Tue, 2017-06-13 at 17:07 +0100, Anshul Makkar wrote:
> On 12/06/2017 14:19, Dario Faggioli wrote:
> > > > @@ -92,6 +92,82 @@
> > > >   */
> > > > 
> > > >  /*
> > > > + * Utilization cap:
> > > > + *
> > > > + * Setting a pCPU utilization cap for a domain means the following:
> > > > + *
> > > > + * - a domain can have a cap, expressed in terms of % of physical
> > > > + * For implementing this, we use the following approach:
> > > > + *
> > > > + * - each domain is given a 'budget', and each domain has a timer, which
> > > > + *   replenishes the domain's budget periodically. The budget is the amount
> > > > + *   of time the vCPUs of the domain can use every 'period';
> > > > + *
> > > > + * - the period is CSCHED2_BDGT_REPL_PERIOD, and is the same for all domains
> > > > + *   (but each domain has its own timer; so they all are periodic by the same
> > > > + *   period, but replenishment of the budgets of the various domains, at
> > > > + *   periods boundaries, are not synchronous);
> > > > + *
> > > > + * - when vCPUs run, they consume budget. When they don't run, they don't
> > > > + *   consume budget. If there is no budget left for the domain, no vCPU of
> > > > + *   that domain can run. If a vCPU tries to run and finds that there is no
> > > > + *   budget, it blocks.
> > > > + *   Budget never expires, so at whatever time a vCPU wants to run, it can
> > > > + *   check the domain's budget, and if there is some, it can use it.
> > > > + *
> > > > + * - budget is replenished to the top of the capacity for the domain once
> > > > + *   per period. Even if there was some leftover budget from previous period,
> > > > + *   though, the budget after a replenishment will always be at most equal
> > > > + *   to the total capacity of the domain ('tot_budget');
> > > > + *
> > > 
> > > budget is replenished but credits not available ?
> > > 
> > 
> > Still not getting this.
> 
> what I want to ask is that if the budget of the domain is replenished,
> but credit for the vcpus of that domain is not available, then what
> happens.
>
Yes, but the point is that budget can be available or not, while
credits are always available. There's no such thing as credit not being
available at all.

The amount of credits each vcpu has decides which vcpu will run, in the
sense that it will be the one that has the highest amount of credits.
The others will indeed wait, but because they've got less credit than
the one that runs, not because they don't have credits available.

> I believe, vcpus won't be scheduled (even if they have budget_quota) 
> till they get their credit replenished.
>
Credits are not exhausted or replenished.

If you want to know what happens when there are two vcpus, both with
budget, but with different amounts of credits (and only 1 pcpu on which
to run them), the answer is: the one with more credits runs.
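
To make the distinction concrete, here is a minimal sketch of that
selection rule. It is not the actual Credit2 code: struct toy_vcpu,
its fields and pick_next() are made up for illustration. Budget only
decides whether a vcpu is eligible at all; credit only orders the
eligible ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified vcpu: NOT the real struct csched2_vcpu. */
struct toy_vcpu {
    int  credit;      /* only orders vcpus; it is never "unavailable" */
    bool has_budget;  /* false means the cap currently keeps it parked */
};

/*
 * Return the index of the vcpu that should run on the single pcpu:
 * among the vcpus that have budget, the one with the most credit.
 * Returns -1 if no vcpu has budget.
 */
static int pick_next(const struct toy_vcpu *v, size_t n)
{
    int best = -1;

    assert(v != NULL || n == 0);

    for ( size_t i = 0; i < n; i++ )
    {
        if ( !v[i].has_budget )
            continue;                 /* budget is a yes/no gate */
        if ( best < 0 || v[i].credit > v[best].credit )
            best = (int)i;            /* credit picks the winner */
    }

    return best;
}
```

With two vcpus that both have budget, the one with more credit is
returned; the other one waits because it has less credit, not because
it has none.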

> > 
> > > budget is finished but the vcpu has not reached the rate limit
> > > boundary ?
> > > 
> > 
> > Budget takes precedence over ratelimiting. This is important to keep
> > cap working "regularly", rather than in some kind of permanent
> > "trying-to-keep-up-with-overruns-in-previous-periods" state.
> > 
> > And, ideally, a vcpu cap and ratelimiting should be set in such a way
> > that they don't step on each other's toes (or do that only rarely). I can
> > see about trying to print a warning when I detect potential tricky
> > values (but it's not easy, considering budget is per-domain, so I can't
> > be sure about how much each vcpu will actually get, and whether or not
> 
> Why can't you be sure? The scheduler knows the domain budget and the
> number of vcpus per domain, and we can calculate the budget_quota and
> translate it into cpu slot duration.
>
Sure. So, let's say you give a domain 200%, which means 200ms of budget
every 100ms. It has 4 vcpus, which means each vcpu will get 50ms.

At time t, vcpu1 starts running, executes for 10ms, and then stops.
Still at time t, all the other three vcpus (vcpu2, vcpu3 and vcpu4)
start running; they run for 50ms, which means they exhaust the quota
you assigned to them, but they would like to continue to run.
What do you do?
There's still 40ms worth of budget available, for this period, in the
domain.

If you don't let (any of) them run, and use that budget, then you're
limiting the domain to 160%.

If you do let (maybe some of) them run, then they are using more than
the quota you calculated for each of them, which is fine, from the cap
point of view (and, in fact, it's what happens with this series), but
means that you can't assume to know for sure what quota of budget each
vcpu will actually use, and hence you can't...

> Similarly, the value of rate limit is also known. We can compare and
> give a warning to the user if the budget_quota is less than rate limit.
> 
...compare that with the ratelimit value (or at least, you can sort of
guess and try to come up with a sensible warning, but you can't be
sure).
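
The arithmetic of the 200% example above can be checked with a small
sketch. Everything in it is hypothetical (PERIOD_MS, the helper names,
and the idea of a strictly enforced even share, which is exactly what
this series does not do); it just shows where the 50ms, the 40ms and
the 160% come from.

```c
#include <assert.h>

#define PERIOD_MS 100  /* replenishment period used in the example */

/* Budget (ms per period) of a domain capped at cap_pct percent. */
static int tot_budget_ms(int cap_pct)
{
    return cap_pct * PERIOD_MS / 100;
}

/* Even per-vcpu share of the domain budget. */
static int quota_ms(int cap_pct, int nr_vcpus)
{
    assert(nr_vcpus > 0);
    return tot_budget_ms(cap_pct) / nr_vcpus;
}

/*
 * Utilization (in %) the domain reaches in one period if every vcpu
 * were strictly limited to its even share. demand_ms[i] is how long
 * vcpu i would like to run during the period.
 */
static int strict_quota_utilization(int cap_pct, const int *demand_ms,
                                    int nr_vcpus)
{
    int quota = quota_ms(cap_pct, nr_vcpus);
    int used = 0;

    for ( int i = 0; i < nr_vcpus; i++ )
        used += demand_ms[i] < quota ? demand_ms[i] : quota;

    return used * 100 / PERIOD_MS;
}
```

With a cap of 200, 4 vcpus, and demands of 10ms for vcpu1 and 100ms
for each of the other three, the strict variant uses 10 + 3 * 50 =
160ms, i.e., 160%, leaving 40ms of the domain's budget unused.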

> This is very important for the user to know, if wrongly chosen, it can
> adversely affect the system's performance with frequent context
> switches. (the problem we are aware of).
> 
I know. I'll think about how to better prevent (or warn about) too
small values, but there's no such thing as crystal balls or magic wands
:-(

> > > I checked the implementation below and I believe we can allow for this
> > > type of dynamic budget_quota allocation per vcpu. Not for initial
> > > version, but certainly we can consider it for future versions.
> > > 
> > 
> > But... it's already totally dynamic.
> 
> csched2_dom_cntl()
> {
> svc->budget_quota = max(sdom->tot_budget / sdom->nr_vcpus,
>                                          CSCHED2_MIN_TIMER);
> }
> If sdom->tot_budget = 200 and
> nr_vcpus is 4, then each vcpu gets 50%.
> How is this dynamic allocation? We are not considering the utilization
> of other vcpus of the domain before allocating budget_quota for some vcpu.
> 
Right. Well, what this means is that each vcpu will get budget in
chunks of tot_budget/nr_vcpus. But how much budget each vcpu will
actually be able to get and consume in each period is impossible to
know in advance, as it depends on the overall system load and on the
behavior of the various vcpus of the domain.
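
A rough sketch of what "budget in chunks" means follows. It is a
simplified model, not the code from the series: struct toy_dom and
try_to_get_budget() are hypothetical, locking is ignored, and the
CSCHED2_MIN_TIMER lower bound from the snippet quoted above is left
out.

```c
#include <assert.h>

/* Hypothetical, simplified per-domain budget pool. */
struct toy_dom {
    int budget;      /* budget still available in this period */
    int tot_budget;  /* what 'budget' is reset to at replenishment */
    int nr_vcpus;
};

/*
 * A vcpu that wants to run takes one chunk of tot_budget / nr_vcpus
 * from the domain pool, or whatever is left if that is less.
 * Returns the amount taken; 0 means the vcpu cannot run now.
 */
static int try_to_get_budget(struct toy_dom *d)
{
    int chunk;

    assert(d->nr_vcpus > 0);

    chunk = d->tot_budget / d->nr_vcpus;
    if ( chunk > d->budget )
        chunk = d->budget;
    d->budget -= chunk;

    return chunk;
}
```

Nothing in this model ties a chunk to a specific vcpu: a vcpu that
uses up its chunk can come back for another one while the other vcpus
are idle, so the same vcpu may end up consuming anything between
nothing and the whole tot_budget in a given period.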

> > > In runq candidate we have a code base
> > > /*
> > >   * Return the current vcpu if it has executed for less than ratelimit.
> > >   * Adjustment for the selected vcpu's credit and decision
> > >   * for how long it will run will be taken in csched2_runtime.
> > >   *
> > >   * Note that, if scurr is yielding, we don't let rate limiting kick in.
> > >   * In fact, it may be the case that scurr is about to spin, and there's
> > >   * no point forcing it to do so until rate limiting expires.
> > >   */
> > >   if ( !yield && prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
> > >        vcpu_runnable(scurr->vcpu) &&
> > >       (now - scurr->vcpu->runstate.state_entry_time) <
> > >         MICROSECS(prv->ratelimit_us) )
> > > In this codeblock we return scurr. Here there is no check for
> > > vcpu->budget.
> > > 
> > > Even if the scurr vcpu has executed for less than rate limit and scurr
> > > is not yielding, we need to check for its budget before returning
> > > scurr.
> > > 
> > 
> > But we check vcpu_runnable(scurr). And we've already called, in
> > csched2_schedule(), vcpu_try_to_get_budget(scurr). And if scurr could
> > not get any budget, we called park_vcpu(scurr), which sets scurr up in
> > such a way that vcpu_runnable(scurr) is false.
> 
> Yes, got your point, but then the call for vcpu_try_to_get_budget should
> move to the code block in runq_candidate that returns scurr, otherwise
> the condition looks incomplete and makes the logic ambiguous.
> 
I don't think so. I've used a new pause flag for parking vcpus
_exactly_ for taking advantage of the fact that vcpu_runnable() will
then do the right thing automatically, and I wouldn't have to spread
budget checks all around the code.

For instance, something similar happens in context_saved(). There it's
the opposite, i.e., if a vcpu had been parked, but a replenishment
arrived, clearing the _VPF_parked flag, then the vcpu_runnable() check
already present in context_saved() will do the right thing and add the
vcpu back in the runqueue.
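
As a rough illustration of why no separate budget check is needed,
here is a toy model of the pause flag idea. The TOY_* names and the
struct are made up, and the real vcpu_runnable() looks at more state
than a single flags word; the point is only that parking is one more
reason for "not runnable", alongside blocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for Xen's real pause flags. */
#define TOY_VPF_blocked (1u << 0)  /* e.g., waiting for I/O */
#define TOY_VPF_parked  (1u << 1)  /* domain is out of budget */

struct toy_vcpu {
    unsigned int pause_flags;
};

/* A vcpu is runnable only if no pause flag at all is set. */
static bool toy_vcpu_runnable(const struct toy_vcpu *v)
{
    assert(v != 0);
    return v->pause_flags == 0;
}

/* Called when the vcpu could not get any budget. */
static void toy_park(struct toy_vcpu *v)
{
    v->pause_flags |= TOY_VPF_parked;
}

/* Called when a replenishment makes budget available again. */
static void toy_unpark(struct toy_vcpu *v)
{
    v->pause_flags &= ~TOY_VPF_parked;
}
```

Any existing caller of the runnable check then gets the budget
condition for free: a parked vcpu fails the check, and an unparked one
passes it again unless it is also blocked for some other reason.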

It's a distinctive characteristic of this implementation, as opposed,
for instance, to the Credit1 one, which uses vcpu_pause() and
vcpu_unpause() for the same purpose (which is something I totally
dislike), and I don't see why not to take advantage of it.

> We call runq_candidate to find the next runnable candidate. If we want
> to return scurr as the current runnable candidate then it should have
> gone through all the checks including budget_quota and all these checks
> should be at one place.
>
Exactly! And in fact, they all are exactly there, being taken care of
by vcpu_runnable() (in the same exact way as it takes care of checking
whether the vcpu has blocked on some I/O, or has been explicitly
paused, or ...).

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
