xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Dario Faggioli <dario.faggioli@citrix.com>
To: anshul.makkar@citrix.com, xen-devel@lists.xen.org
Cc: george.dunlap@eu.citrix.com
Subject: Re: [PATCH] credi2-ratelimit: Implement rate limit for credit2 scheduler
Date: Thu, 7 Jul 2016 11:59:11 +0200	[thread overview]
Message-ID: <1467885551.23985.40.camel@citrix.com> (raw)
In-Reply-To: <1467826414-17337-1-git-send-email-anshul.makkar@citrix.com>


[-- Attachment #1.1: Type: text/plain, Size: 9516 bytes --]

On Wed, 2016-07-06 at 18:33 +0100, anshul.makkar@citrix.com wrote:
> From: Anshul Makkar <anshul.makkar@citrix.com>
> 
Hey, Anshul,

Thanks for doing this!

> Rate limit assures that a vcpu will execute for a minimum amount of
> time before
> being put at the back of a queue or being preempted by higher
> priority thread.
> 
I'd say "before a context switch occurs", or "before a context switch
is allowed".

> It introduces a minimum amount of latency to enable a VM to batch its
> work and
> it also ensures that system is not spending most of its time in
> VMEXIT/VMENTRY because of VM that is waking/sleeping at high rate.
> 
Again, something like, "ensures that context switches does not happen
too frequently" seems more accurate and useful as a description.

I also wouldn't use the word "latency", above, to describe the time a
VM has to carry on some work uninterrupted.

> ratelimit can be disabled by setting it to 0.
> 
Good.

> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -171,6 +171,11 @@ integer_param("sched_credit2_migrate_resist",
> opt_migrate_resist);
>  #define c2r(_ops, _cpu)     (CSCHED2_PRIV(_ops)->runq_map[(_cpu)])
>  /* CPU to runqueue struct macro */
>  #define RQD(_ops, _cpu)     (&CSCHED2_PRIV(_ops)->rqd[c2r(_ops,
> _cpu)])
> +/* Find the max of time slice */
> +#define MAX_TSLICE(t1, t2)  \
> +                   ({ typeof (t1) _t1 = (t1); \
> +                      typeof (t1) _t2 = (t2); \
> +                      _t1 > _t2 ? _t1 < 0 ? 0 : _t1 : _t2 < 0 ? 0 :
> _t2; })
> 
I'm bad at this things, but isn't subtracting the two values a more
efficient, effective and easier to read way to do this?

Also, I think this macro can be called something like MAX_STIME(), or
STIME_BEFORE (or something like that) and be put in time.h.

> @@ -280,6 +285,7 @@ struct csched2_private {
>      struct csched2_runqueue_data rqd[NR_CPUS];
>  
>      unsigned int load_window_shift;
> +    unsigned ratelimit_us; /* each cpupool can have its onw
> ratelimit */
>  };
>  
>  /*
> @@ -1588,6 +1594,34 @@ csched2_dom_cntl(
>      return rc;
>  }
>  
> +static int csched2_sys_cntl(const struct scheduler *ops,
> +                            struct xen_sysctl_scheduler_op *sc)
> +{
> +    int rc = -EINVAL;
> +    xen_sysctl_credit_schedule_t *params = &sc->u.sched_credit;
> +    struct csched2_private *prv = CSCHED2_PRIV(ops);
> +    unsigned long flags;
> +
> +    switch (sc->cmd )
> +    {
> +        case XEN_SYSCTL_SCHEDOP_putinfo:
> +            if ( params->ratelimit_us &&
> +                ( params->ratelimit_us < CSCHED2_MIN_TIMER ||
> +                  params->ratelimit_us >
> MICROSECS(CSCHED2_MAX_TIMER) ))
>
I see the link with CSCHED2_MIN_TIMER. Still, I think we can just
reuse XEN_SYSCTL_SCHED_RATELIMIT_MAX and _MIN from sysctl.h.

In fact, the minimum interval of time for programming a timer and the
minimum (theoretical) value usable for rate-limiting are, logically,
two different things. Not to mention, that MIN_TIMER is an internal,
implementation detail, which we may want to change at some point, for
various reasons, and it's not good to have one user facing parameter
bound to something like that, IMO.

This also will make things more consistent (between Credit1 and
Credit2).

> +                return rc;
> +            spin_lock_irqsave(&prv->lock, flags);
> +            prv->ratelimit_us = params->ratelimit_us;
> +            spin_unlock_irqrestore(&prv->lock, flags);
> +            break;
> +
In Credit1, this break is not there, and putinfo also returns back the
value that it has just written. I don't feel like this is terribly
important, but, as said, there's value in keeping the two schedulers in
sync, with respect to the interface to this.

> @@ -1680,6 +1717,14 @@ csched2_runtime(const struct scheduler *ops,
> int cpu, struct csched2_vcpu *snext
>  
>      /* 1) Basic time: Run until credit is 0. */
>      rt_credit = snext->credit;
> +    if (snext->vcpu->is_running)
> +        runtime = now - snext->vcpu->runstate.state_entry_time;
> +    if ( runtime < 0 )
> +    {
> +        runtime = 0;
> +        d2printk("%s: Time went backwards? now %"PRI_stime"
> state_entry_time %"PRI_stime"\n",
> +                  _func__, now, snext->runstate.state_entry_time);
> +    }
>  
>      /* 2) If there's someone waiting whose credit is positive,
>       * run until your credit ~= his */
> @@ -1695,11 +1740,24 @@ csched2_runtime(const struct scheduler *ops,
> int cpu, struct csched2_vcpu *snext
>      }
>  
>      /* The next guy may actually have a higher credit, if we've
> tried to
> -     * avoid migrating him from a different cpu.  DTRT.  */
> +     * avoid migrating him from a different cpu.  DTRT.
> +     * Even if the next guy has higher credit and current vcpu has
> executed
> +     * for less amount of time than rate limit, allow it to run for
> minimum
> +     * amount of time.
> +     */
>      if ( rt_credit <= 0 )
>      {
> -        time = CSCHED2_MIN_TIMER;
> -        SCHED_STAT_CRANK(runtime_min_timer);
> +        if ( snext->vcpu->is_running && prv->ratelimit_us)
> +           /* implies the current one has executed for time <
> ratelimit and thus
> +            * it has neen selcted int runq_candidate to run next.
> +            * No need to check for this condition again.
> +            */
> +            time = MAX_TSLICE(CSCHED2_MIN_TIMER,
> +                               MICROSECS(prv->ratelimit_us) -
> runtime);
> +        else
> +            time = MAX_TSLICE(CSCHED2_MIN_TIMER, MICROSECS(prv-
> >ratelimit_us));
> +
> +        SCHED_STAT_CRANK(time);
>
If you want to define a new performance counter, you need more than
just this. But I think you should just not do that for now...

>      }
>      else
>      {
> @@ -1709,17 +1767,22 @@ csched2_runtime(const struct scheduler *ops,
> int cpu, struct csched2_vcpu *snext
>           * at a different rate. */
>          time = c2t(rqd, rt_credit, snext);
>  
> -        /* Check limits */
> -        if ( time < CSCHED2_MIN_TIMER )
> -        {
> -            time = CSCHED2_MIN_TIMER;
> -            SCHED_STAT_CRANK(runtime_min_timer);
> -        }
> -        else if ( time > CSCHED2_MAX_TIMER )
> +        /* Check limits.
> +         * time > ratelimit --> time > MIN.
> +         */
> +        if ( time > CSCHED2_MAX_TIMER )
>          {
> +
>              time = CSCHED2_MAX_TIMER;
>              SCHED_STAT_CRANK(runtime_max_timer);
>          }
> +        else
> +        {
> +            time = MAX_TSLICE(MAX_TSLICE(CSCHED2_MIN_TIMER,
> +                                          MICROSECS(prv-
> >ratelimit_us)), time);
> +            SCHED_STAT_CRANK(time);
> +        }
> +
>      }
>  
So, I've been thinking... Why don't we "just" fixup, right here, the
value that we are about to return?

I.e., basically we return the max between time and
prv->runtime_us - runtime (with runtime being =0 in case snext hasn't
run yet). Or am I missing something?

>      return time;

> @@ -1762,9 +1835,13 @@ runq_candidate(struct csched2_runqueue_data
> *rqd,
>          }
>  
>          /* If the next one on the list has more credit than current
> -         * (or idle, if current is not runnable), choose it. */
> +         * (or idle, if current is not runnable) and current one has
> already
> +         * executed for more than ratelimit. choose it.
> +         * Control has reached here means that current vcpu has
> executed >
> +         * ratelimit_us or ratelimit is off, so chose the next one.
> +         */
>          if ( svc->credit > snext->credit )
> -            snext = svc;
> +                snext = svc;
>  
I don't think I see the need for these changes (with this line here
above being more wrong than pointless, actually).

> @@ -2353,6 +2431,8 @@ csched2_init(struct scheduler *ops)
>          prv->runq_map[i] = -1;
>          prv->rqd[i].id = -1;
>      }
> +    /* initialize ratelimit */
> +    prv->ratelimit_us = 1000 * CSCHED2_MIN_TIMER;
>  
>      prv->load_window_shift = opt_load_window_shift;
>
Mmm... I'd rather set this to either 0 (so, disabled)
or XEN_SYSCTL_CSCHED_TSLICE_MIN (but I indeed prefer 0) by default.
Then do some experiments, see if it helps and try to come up with a
reasonable value.

Thanks and regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

  reply	other threads:[~2016-07-07  9:59 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-06 17:33 [PATCH] credi2-ratelimit: Implement rate limit for credit2 scheduler Anshul
2016-07-07  9:59 ` Dario Faggioli [this message]
2016-07-12 16:16 ` George Dunlap
2016-07-13  8:53   ` Dario Faggioli
2016-07-13 11:08     ` George Dunlap
2016-07-18 11:54     ` anshul makkar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1467885551.23985.40.camel@citrix.com \
    --to=dario.faggioli@citrix.com \
    --cc=anshul.makkar@citrix.com \
    --cc=george.dunlap@eu.citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).