* [PATCH] xen/sched_credit: Use delay to control scheduling frequency
@ 2011-12-19 22:13 Hui Lv
2011-12-20 9:07 ` Ian Campbell
0 siblings, 1 reply; 20+ messages in thread
From: Hui Lv @ 2011-12-19 22:13 UTC (permalink / raw)
To: xen-devel; +Cc: George.Dunlap, hui.lv
1. Basically, the "delay method" can achieve 11% overall performance boost for SPECvirt than original credit scheduler.
2. We have tried 1ms delay and 10ms delay, there is no big difference between these two configurations. (1ms is enough to achieve a good performance)
3. We have compared different load level response time/latency (low, high, peak), "delay method" didn't bring very much response time increase.
4. 1ms delay can reduce 30% context switch at peak performance, where produces the benefits. (.int sched_ratelimit_us = 1000. is the recommended setting)
Signed-off-by: Hui Lv <hui.lv@intel.com>
Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com>
diff -r a4bff36780a3 -r 114495d1ff94 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/common/sched_credit.c Mon Dec 19 15:58:50 2011 -0500
@@ -110,6 +110,10 @@ boolean_param("sched_credit_default_yiel
static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms);
+/* Scheduler generic parameters
+*/
+extern int sched_ratelimit_us;
+
/*
* Physical CPU
*/
@@ -1297,10 +1301,15 @@ csched_schedule(
struct csched_private *prv = CSCHED_PRIV(ops);
struct csched_vcpu *snext;
struct task_slice ret;
+ s_time_t runtime, tslice;
CSCHED_STAT_CRANK(schedule);
CSCHED_VCPU_CHECK(current);
+ runtime = now - current->runstate.state_entry_time;
+ if ( runtime < 0 ) /* Does this ever happen? */
+ runtime = 0;
+
if ( !is_idle_vcpu(scurr->vcpu) )
{
/* Update credits of a non-idle VCPU. */
@@ -1313,6 +1322,41 @@ csched_schedule(
scurr->pri = CSCHED_PRI_IDLE;
}
+ /* Choices, choices:
+ * - If we have a tasklet, we need to run the idle vcpu no matter what.
+ * - If sched rate limiting is in effect, and the current vcpu has
+ * run for less than that amount of time, continue the current one,
+ * but with a shorter timeslice and return it immediately
+ * - Otherwise, chose the one with the highest priority (which may
+ * be the one currently running)
+ * - If the currently running one is TS_OVER, see if there
+ * is a higher priority one waiting on the runqueue of another
+ * cpu and steal it.
+ */
+
+ /* If we have schedule rate limiting enabled, check to see
+ * how long we've run for. */
+ if ( !tasklet_work_scheduled
+ && sched_ratelimit_us
+ && vcpu_runnable(current)
+ && !is_idle_vcpu(current)
+ && runtime < MICROSECS(sched_ratelimit_us) )
+ {
+ snext = scurr;
+ snext->start_time += now;
+ perfc_incr(delay_ms);
+ tslice = MICROSECS(sched_ratelimit_us);
+ ret.migrated = 0;
+ goto out;
+ }
+ else
+ {
+ /*
+ * Select next runnable local VCPU (ie top of local runq)
+ */
+ tslice = MILLISECS(prv->tslice_ms);
+ }
+
/*
* Select next runnable local VCPU (ie top of local runq)
*/
@@ -1367,11 +1411,12 @@ csched_schedule(
if ( !is_idle_vcpu(snext->vcpu) )
snext->start_time += now;
+out:
/*
* Return task to run next...
*/
ret.time = (is_idle_vcpu(snext->vcpu) ?
- -1 : MILLISECS(prv->tslice_ms));
+ -1 : tslice);
ret.task = snext->vcpu;
CSCHED_VCPU_CHECK(ret.task);
diff -r a4bff36780a3 -r 114495d1ff94 xen/common/schedule.c
--- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/common/schedule.c Mon Dec 19 15:58:50 2011 -0500
@@ -47,6 +47,12 @@ string_param("sched", opt_sched);
bool_t sched_smt_power_savings = 0;
boolean_param("sched_smt_power_savings", sched_smt_power_savings);
+/* Default scheduling rate limit: 1ms
+ * It's not recommended to set this value bigger than "sched_credit_tslice_ms"
+ * otherwise, something weired may happen
+ * */
+int sched_ratelimit_us = 1000;
+integer_param("sched_ratelimit_us", sched_ratelimit_us);
/* Various timer handlers. */
static void s_timer_fn(void *unused);
static void vcpu_periodic_timer_fn(void *data);
diff -r a4bff36780a3 -r 114495d1ff94 xen/include/xen/perfc_defn.h
--- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/include/xen/perfc_defn.h Mon Dec 19 15:58:50 2011 -0500
@@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch
PERFCOUNTER(sched_run, "sched: runs through scheduler")
PERFCOUNTER(sched_ctx, "sched: context switches")
+PERFCOUNTER(delay_ms, "csched: delay")
PERFCOUNTER(vcpu_check, "csched: vcpu_check")
PERFCOUNTER(schedule, "csched: schedule")
PERFCOUNTER(acct_run, "csched: acct_run")
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2011-12-19 22:13 Hui Lv
@ 2011-12-20 9:07 ` Ian Campbell
2011-12-20 9:44 ` Lv, Hui
2011-12-20 10:21 ` George Dunlap
0 siblings, 2 replies; 20+ messages in thread
From: Ian Campbell @ 2011-12-20 9:07 UTC (permalink / raw)
To: Hui Lv; +Cc: George Dunlap, xen-devel@lists.xensource.com
On Mon, 2011-12-19 at 22:13 +0000, Hui Lv wrote:
> 1. Basically, the "delay method" can achieve 11% overall performance
> boost for SPECvirt than original credit scheduler.
Is it possible to publish the raw numbers?
> 2. We have tried 1ms delay and 10ms delay, there is no big difference
> between these two configurations. (1ms is enough to achieve a good
> performance)
> 3. We have compared different load level response time/latency (low,
> high, peak), "delay method" didn't bring very much response time
> increase.
> 4. 1ms delay can reduce 30% context switch at peak performance, where
> produces the benefits. (.int sched_ratelimit_us = 1000. is the
> recommended setting)
>
> Signed-off-by: Hui Lv <hui.lv@intel.com>
> Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com>
>
> diff -r a4bff36780a3 -r 114495d1ff94 xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/common/sched_credit.c Mon Dec 19 15:58:50 2011 -0500
> @@ -110,6 +110,10 @@ boolean_param("sched_credit_default_yiel
> static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
> integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms);
>
> +/* Scheduler generic parameters
> +*/
> +extern int sched_ratelimit_us;
If this is generic then it seems like xen/sched.h is the right place for
it.
Is it really generic though if only the credit scheduler applies it?
> +
> /*
> * Physical CPU
> */
> @@ -1297,10 +1301,15 @@ csched_schedule(
> struct csched_private *prv = CSCHED_PRIV(ops);
> struct csched_vcpu *snext;
> struct task_slice ret;
> + s_time_t runtime, tslice;
>
> CSCHED_STAT_CRANK(schedule);
> CSCHED_VCPU_CHECK(current);
>
> + runtime = now - current->runstate.state_entry_time;
> + if ( runtime < 0 ) /* Does this ever happen? */
> + runtime = 0;
> +
> if ( !is_idle_vcpu(scurr->vcpu) )
> {
> /* Update credits of a non-idle VCPU. */
> @@ -1313,6 +1322,41 @@ csched_schedule(
> scurr->pri = CSCHED_PRI_IDLE;
> }
>
> + /* Choices, choices:
> + * - If we have a tasklet, we need to run the idle vcpu no matter what.
> + * - If sched rate limiting is in effect, and the current vcpu has
> + * run for less than that amount of time, continue the current one,
> + * but with a shorter timeslice and return it immediately
> + * - Otherwise, chose the one with the highest priority (which may
> + * be the one currently running)
> + * - If the currently running one is TS_OVER, see if there
> + * is a higher priority one waiting on the runqueue of another
> + * cpu and steal it.
> + */
> +
> + /* If we have schedule rate limiting enabled, check to see
> + * how long we've run for. */
> + if ( !tasklet_work_scheduled
> + && sched_ratelimit_us
> + && vcpu_runnable(current)
> + && !is_idle_vcpu(current)
> + && runtime < MICROSECS(sched_ratelimit_us) )
> + {
> + snext = scurr;
> + snext->start_time += now;
> + perfc_incr(delay_ms);
> + tslice = MICROSECS(sched_ratelimit_us);
> + ret.migrated = 0;
> + goto out;
> + }
> + else
> + {
This is the same comment as the next block below, does it really apply
here too?
Since the previous if block ends with a goto the else clause is a bit
redundant.
> + /*
> + * Select next runnable local VCPU (ie top of local runq)
> + */
> + tslice = MILLISECS(prv->tslice_ms);
> + }
> +
> /*
> * Select next runnable local VCPU (ie top of local runq)
> */
> @@ -1367,11 +1411,12 @@ csched_schedule(
> if ( !is_idle_vcpu(snext->vcpu) )
> snext->start_time += now;
>
> +out:
> /*
> * Return task to run next...
> */
> ret.time = (is_idle_vcpu(snext->vcpu) ?
> - -1 : MILLISECS(prv->tslice_ms));
> + -1 : tslice);
> ret.task = snext->vcpu;
>
> CSCHED_VCPU_CHECK(ret.task);
> diff -r a4bff36780a3 -r 114495d1ff94 xen/common/schedule.c
> --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/common/schedule.c Mon Dec 19 15:58:50 2011 -0500
> @@ -47,6 +47,12 @@ string_param("sched", opt_sched);
> bool_t sched_smt_power_savings = 0;
> boolean_param("sched_smt_power_savings", sched_smt_power_savings);
>
> +/* Default scheduling rate limit: 1ms
> + * It's not recommended to set this value bigger than "sched_credit_tslice_ms"
> + * otherwise, something weired may happen
weird
It that constraint enforced anywhere?
> + * */
> +int sched_ratelimit_us = 1000;
> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
> /* Various timer handlers. */
> static void s_timer_fn(void *unused);
> static void vcpu_periodic_timer_fn(void *data);
> diff -r a4bff36780a3 -r 114495d1ff94 xen/include/xen/perfc_defn.h
> --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/include/xen/perfc_defn.h Mon Dec 19 15:58:50 2011 -0500
> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch
> PERFCOUNTER(sched_run, "sched: runs through scheduler")
> PERFCOUNTER(sched_ctx, "sched: context switches")
>
> +PERFCOUNTER(delay_ms, "csched: delay")
> PERFCOUNTER(vcpu_check, "csched: vcpu_check")
> PERFCOUNTER(schedule, "csched: schedule")
> PERFCOUNTER(acct_run, "csched: acct_run")
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2011-12-20 9:07 ` Ian Campbell
@ 2011-12-20 9:44 ` Lv, Hui
2011-12-20 10:09 ` Dario Faggioli
2011-12-20 10:21 ` George Dunlap
1 sibling, 1 reply; 20+ messages in thread
From: Lv, Hui @ 2011-12-20 9:44 UTC (permalink / raw)
To: Ian Campbell; +Cc: George Dunlap, xen-devel@lists.xensource.com
[-- Attachment #1: Type: text/plain, Size: 975 bytes --]
> Is it possible to publish the raw numbers?
Yes.
Actually, I have posted the raw data before.
You can check the attachment for the details.
> If this is generic then it seems like xen/sched.h is the right place for it.
>
> Is it really generic though if only the credit scheduler applies it?
We have discussed the "generic" issue in previous mails with George, Dario Faggioli and some other people.
The conclusion is that currently, it is applied to credit scheduler. In future, it may be developed into a generic solution.
> This is the same comment as the next block below, does it really apply here
> too?
>
> Since the previous if block ends with a goto the else clause is a bit redundant.
Thanks, I'll remove the redundant comment and else clause.
> It that constraint enforced anywhere?
Currently, it's not constraint enforced.
I think George have gave the explanation in yesterday's discussion.
Best regards,
Lv, Hui
[-- Attachment #2: scheduler comparison.xlsx --]
[-- Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, Size: 12105 bytes --]
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2011-12-20 9:44 ` Lv, Hui
@ 2011-12-20 10:09 ` Dario Faggioli
2011-12-20 10:26 ` George Dunlap
0 siblings, 1 reply; 20+ messages in thread
From: Dario Faggioli @ 2011-12-20 10:09 UTC (permalink / raw)
To: Lv, Hui; +Cc: George Dunlap, xen-devel@lists.xensource.com, Ian Campbell
[-- Attachment #1.1: Type: text/plain, Size: 657 bytes --]
On Tue, 2011-12-20 at 17:44 +0800, Lv, Hui wrote:
> > It that constraint enforced anywhere?
>
> Currently, it's not constraint enforced.
> I think George have gave the explanation in yesterday's discussion.
>
I think that too, but maybe you can print some kind of warning when it
happens?
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)
[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
[-- Attachment #2: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2011-12-20 9:07 ` Ian Campbell
2011-12-20 9:44 ` Lv, Hui
@ 2011-12-20 10:21 ` George Dunlap
1 sibling, 0 replies; 20+ messages in thread
From: George Dunlap @ 2011-12-20 10:21 UTC (permalink / raw)
To: Ian Campbell; +Cc: George Dunlap, xen-devel@lists.xensource.com, Hui Lv
On Tue, 2011-12-20 at 09:07 +0000, Ian Campbell wrote:
>
> > +/* Scheduler generic parameters
> > +*/
> > +extern int sched_ratelimit_us;
>
> If this is generic then it seems like xen/sched.h is the right place for
> it.
Yes, this should be declared in xen/include/xen/sched.h.
> Is it really generic though if only the credit scheduler applies it?
The concept is generic enough that I think saying, "This is the control,
not all schedulers implement it" makes sense. credit2 *may* in the
future have use for such a control; at which point it would be a bit
asinine to have two switches, named "credit_sched_ratelimit_us" and
"credit2_sched_ratelimit_us".
> > + /* If we have schedule rate limiting enabled, check to see
> > + * how long we've run for. */
> > + if ( !tasklet_work_scheduled
> > + && sched_ratelimit_us
> > + && vcpu_runnable(current)
> > + && !is_idle_vcpu(current)
> > + && runtime < MICROSECS(sched_ratelimit_us) )
> > + {
> > + snext = scurr;
> > + snext->start_time += now;
> > + perfc_incr(delay_ms);
> > + tslice = MICROSECS(sched_ratelimit_us);
> > + ret.migrated = 0;
> > + goto out;
> > + }
> > + else
> > + {
>
> This is the same comment as the next block below, does it really apply
> here too?
>
> Since the previous if block ends with a goto the else clause is a bit
> redundant.
Yeah, probably better w/o the else.
> > +/* Default scheduling rate limit: 1ms
> > + * It's not recommended to set this value bigger than "sched_credit_tslice_ms"
> > + * otherwise, something weired may happen
Hui, "something weird may happen" is fine for informal speech, but for
comments we tend to use more formal language. I might say something
like this:
"The behavior when sched_ratelimit_us is greater than
sched_credit_tslice_ms is undefined."
-George
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2011-12-20 10:09 ` Dario Faggioli
@ 2011-12-20 10:26 ` George Dunlap
2011-12-20 10:51 ` Dario Faggioli
2011-12-20 12:14 ` Lv, Hui
0 siblings, 2 replies; 20+ messages in thread
From: George Dunlap @ 2011-12-20 10:26 UTC (permalink / raw)
To: Dario Faggioli
Cc: George Dunlap, xen-devel@lists.xensource.com, Ian Campbell,
Jan Beulich, Lv, Hui
On Tue, 2011-12-20 at 10:09 +0000, Dario Faggioli wrote:
> On Tue, 2011-12-20 at 17:44 +0800, Lv, Hui wrote:
> > > It that constraint enforced anywhere?
> >
> > Currently, it's not constraint enforced.
> > I think George have gave the explanation in yesterday's discussion.
> >
> I think that too, but maybe you can print some kind of warning when it
> happens?
I guess the consensus is that we should put in some effort to make the
interface more polished.
How about this:
First, add a ratelimit_us element to csched_priv, just like the current
tslice_ms element.
When the scheduler comes up (in csched_init), it checks to see if
MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms); if
so, it prints a warning, and sets prv->ratelimit_us to
1000*prv->tslice_ms.
In the future, when we implement the domctls to change a scheduler's
ratelimit_us, and tslice_ms, we disallow changes which would violate the
"ratelimit < tslice" rule (returning -EINVAL or something like that).
Thoughts?
-George
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2011-12-20 10:26 ` George Dunlap
@ 2011-12-20 10:51 ` Dario Faggioli
2011-12-20 12:14 ` Lv, Hui
1 sibling, 0 replies; 20+ messages in thread
From: Dario Faggioli @ 2011-12-20 10:51 UTC (permalink / raw)
To: George Dunlap
Cc: George Dunlap, xen-devel@lists.xensource.com, Ian Campbell,
Jan Beulich, Lv, Hui
[-- Attachment #1.1: Type: text/plain, Size: 1415 bytes --]
On Tue, 2011-12-20 at 10:26 +0000, George Dunlap wrote:
> > I think that too, but maybe you can print some kind of warning when it
> > happens?
>
> I guess the consensus is that we should put in some effort to make the
> interface more polished.
>
Exactly! :-)
> How about this:
>
> First, add a ratelimit_us element to csched_priv, just like the current
> tslice_ms element.
>
> When the scheduler comes up (in csched_init), it checks to see if
> MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms); if
> so, it prints a warning, and sets prv->ratelimit_us to
> 1000*prv->tslice_ms.
>
For what it counts, I'm more than fine with just printing some
"WARNING: rate-limit > time slice is nonsense", as well as doing as you
suggest... Or even with doing both! :-P
> In the future, when we implement the domctls to change a scheduler's
> ratelimit_us, and tslice_ms, we disallow changes which would violate the
> "ratelimit < tslice" rule (returning -EINVAL or something like that).
>
That makes a lot of sense.
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)
[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
[-- Attachment #2: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2011-12-20 10:26 ` George Dunlap
2011-12-20 10:51 ` Dario Faggioli
@ 2011-12-20 12:14 ` Lv, Hui
1 sibling, 0 replies; 20+ messages in thread
From: Lv, Hui @ 2011-12-20 12:14 UTC (permalink / raw)
To: George Dunlap, Dario Faggioli
Cc: George Dunlap, xen-devel@lists.xensource.com, Ian Campbell,
Jan Beulich
> How about this:
>
> First, add a ratelimit_us element to csched_priv, just like the current tslice_ms
> element.
>
> When the scheduler comes up (in csched_init), it checks to see if
> MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms); if so, it
> prints a warning, and sets prv->ratelimit_us to 1000*prv->tslice_ms.
>
I think it makes sense to do so.
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] xen/sched_credit: Use delay to control scheduling frequency
@ 2011-12-26 8:46 Hui Lv
2012-01-02 8:19 ` Jan Beulich
2012-01-06 19:56 ` George Dunlap
0 siblings, 2 replies; 20+ messages in thread
From: Hui Lv @ 2011-12-26 8:46 UTC (permalink / raw)
To: xen-devel, George.Dunlap, raistlin, JBeulich, Ian.Campbell; +Cc: hui.lv
Some modifications for this patch.
1.Based on George's proposal, added a ratelimit_us element to csched_priv to constrain prv->ratelimit_us <= 1000* prv->tslice_ms in csched_init.
2.Move the definition of sched_ratelimit_us to xen/sched.h
3.Remove the redundant "else" structure
See if you have any comment for such changes.
This patch can improve Xen performance:
1. Basically, the "delay method" can achieve 11% overall performance boost for SPECvirt than original credit scheduler.
2. We have tried 1ms delay and 10ms delay, there is no big difference between these two configurations. (1ms is enough to achieve a good performance)
3. We have compared different load level response time/latency (low, high, peak), "delay method" didn't bring very much response time increase.
4. 1ms delay can reduce 30% context switch at peak performance, where produces the benefits. (int sched_ratelimit_us = 1000 is the recommended setting)
Signed-off-by: Hui Lv <hui.lv@intel.com>
Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com>
diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/common/sched_credit.c Mon Dec 26 03:44:39 2011 -0500
@@ -172,6 +172,7 @@ struct csched_private {
uint32_t credit;
int credit_balance;
uint32_t runq_sort;
+ unsigned ratelimit_us;
/* Period of master and tick in milliseconds */
unsigned tslice_ms, tick_period_us, ticks_per_tslice;
unsigned credits_per_tslice;
@@ -1297,10 +1298,15 @@ csched_schedule(
struct csched_private *prv = CSCHED_PRIV(ops);
struct csched_vcpu *snext;
struct task_slice ret;
+ s_time_t runtime, tslice;
CSCHED_STAT_CRANK(schedule);
CSCHED_VCPU_CHECK(current);
+ runtime = now - current->runstate.state_entry_time;
+ if ( runtime < 0 ) /* Does this ever happen? */
+ runtime = 0;
+
if ( !is_idle_vcpu(scurr->vcpu) )
{
/* Update credits of a non-idle VCPU. */
@@ -1313,6 +1319,35 @@ csched_schedule(
scurr->pri = CSCHED_PRI_IDLE;
}
+ /* Choices, choices:
+ * - If we have a tasklet, we need to run the idle vcpu no matter what.
+ * - If sched rate limiting is in effect, and the current vcpu has
+ * run for less than that amount of time, continue the current one,
+ * but with a shorter timeslice and return it immediately
+ * - Otherwise, chose the one with the highest priority (which may
+ * be the one currently running)
+ * - If the currently running one is TS_OVER, see if there
+ * is a higher priority one waiting on the runqueue of another
+ * cpu and steal it.
+ */
+
+ /* If we have schedule rate limiting enabled, check to see
+ * how long we've run for. */
+ if ( !tasklet_work_scheduled
+ && prv->ratelimit_us
+ && vcpu_runnable(current)
+ && !is_idle_vcpu(current)
+ && runtime < MICROSECS(prv->ratelimit_us) )
+ {
+ snext = scurr;
+ snext->start_time += now;
+ perfc_incr(delay_ms);
+ tslice = MICROSECS(prv->ratelimit_us);
+ ret.migrated = 0;
+ goto out;
+ }
+ tslice = MILLISECS(prv->tslice_ms);
+
/*
* Select next runnable local VCPU (ie top of local runq)
*/
@@ -1367,11 +1402,12 @@ csched_schedule(
if ( !is_idle_vcpu(snext->vcpu) )
snext->start_time += now;
+out:
/*
* Return task to run next...
*/
ret.time = (is_idle_vcpu(snext->vcpu) ?
- -1 : MILLISECS(prv->tslice_ms));
+ -1 : tslice);
ret.task = snext->vcpu;
CSCHED_VCPU_CHECK(ret.task);
@@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
+ if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) )
+ {
+ printk("WARNING: sched_ratelimit_us >"
+ "sched_credit_tslice_ms is undefined\n"
+ "ratelimit_us is set to 1000 * tslice_ms forcely\n");
+ prv->ratelimit_us = 1000 * prv->tslice_ms;
+ }
+ else
+ prv->ratelimit_us = sched_ratelimit_us;
return 0;
}
diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/schedule.c
--- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/common/schedule.c Mon Dec 26 03:44:39 2011 -0500
@@ -47,6 +47,11 @@ string_param("sched", opt_sched);
bool_t sched_smt_power_savings = 0;
boolean_param("sched_smt_power_savings", sched_smt_power_savings);
+/* Default scheduling rate limit: 1ms
+ * The behavior when sched_ratelimit_us is greater than sched_credit_tslice_ms is undefined
+ * */
+int sched_ratelimit_us = 1000;
+integer_param("sched_ratelimit_us", sched_ratelimit_us);
/* Various timer handlers. */
static void s_timer_fn(void *unused);
static void vcpu_periodic_timer_fn(void *data);
diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/perfc_defn.h
--- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/include/xen/perfc_defn.h Mon Dec 26 03:44:39 2011 -0500
@@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch
PERFCOUNTER(sched_run, "sched: runs through scheduler")
PERFCOUNTER(sched_ctx, "sched: context switches")
+PERFCOUNTER(delay_ms, "csched: delay")
PERFCOUNTER(vcpu_check, "csched: vcpu_check")
PERFCOUNTER(schedule, "csched: schedule")
PERFCOUNTER(acct_run, "csched: acct_run")
diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/sched-if.h
--- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/include/xen/sched-if.h Mon Dec 26 03:44:39 2011 -0500
@@ -16,6 +16,11 @@ extern struct cpupool *cpupool0;
/* cpus currently in no cpupool */
extern cpumask_t cpupool_free_cpus;
+/* Scheduler generic parameters
+ * */
+extern int sched_ratelimit_us;
+
+
/*
* In order to allow a scheduler to remap the lock->cpu mapping,
* we have a per-cpu pointer, along with a pre-allocated set of
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2011-12-26 8:46 [PATCH] xen/sched_credit: Use delay to control scheduling frequency Hui Lv
@ 2012-01-02 8:19 ` Jan Beulich
2012-01-05 4:08 ` Lv, Hui
2012-01-06 19:50 ` George Dunlap
2012-01-06 19:56 ` George Dunlap
1 sibling, 2 replies; 20+ messages in thread
From: Jan Beulich @ 2012-01-02 8:19 UTC (permalink / raw)
To: Hui Lv; +Cc: George.Dunlap, xen-devel, raistlin, Ian.Campbell
>>> On 26.12.11 at 09:46, Hui Lv <hui.lv@intel.com> wrote:
> Some modifications for this patch.
> 1.Based on George's proposal, added a ratelimit_us element to csched_priv to
> constrain prv->ratelimit_us <= 1000* prv->tslice_ms in csched_init.
I do not see why you need a per-scheduler-instance variable for
issuing the warning and correcting the value - one warning (during
the first initialization) is entirely sufficient. (Otoh the already existing
tslice_ms field is pointless currently too, as it never gets changed
after being initialized from sched_credit_tslice_ms - George?)
Jan
> 2.Move the definition of sched_ratelimit_us to xen/sched.h
> 3.Remove the redundant "else" structure
>
> See if you have any comment for such changes.
>
>
> This patch can improve Xen performance:
> 1. Basically, the "delay method" can achieve 11% overall performance boost
> for SPECvirt than original credit scheduler.
> 2. We have tried 1ms delay and 10ms delay, there is no big difference
> between these two configurations. (1ms is enough to achieve a good
> performance)
> 3. We have compared different load level response time/latency (low, high,
> peak), "delay method" didn't bring very much response time increase.
> 4. 1ms delay can reduce 30% context switch at peak performance, where
> produces the benefits. (int sched_ratelimit_us = 1000 is the recommended
> setting)
>
> Signed-off-by: Hui Lv <hui.lv@intel.com>
> Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com>
>
> diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/common/sched_credit.c Mon Dec 26 03:44:39 2011 -0500
> @@ -172,6 +172,7 @@ struct csched_private {
> uint32_t credit;
> int credit_balance;
> uint32_t runq_sort;
> + unsigned ratelimit_us;
> /* Period of master and tick in milliseconds */
> unsigned tslice_ms, tick_period_us, ticks_per_tslice;
> unsigned credits_per_tslice;
> @@ -1297,10 +1298,15 @@ csched_schedule(
> struct csched_private *prv = CSCHED_PRIV(ops);
> struct csched_vcpu *snext;
> struct task_slice ret;
> + s_time_t runtime, tslice;
>
> CSCHED_STAT_CRANK(schedule);
> CSCHED_VCPU_CHECK(current);
>
> + runtime = now - current->runstate.state_entry_time;
> + if ( runtime < 0 ) /* Does this ever happen? */
> + runtime = 0;
> +
> if ( !is_idle_vcpu(scurr->vcpu) )
> {
> /* Update credits of a non-idle VCPU. */
> @@ -1313,6 +1319,35 @@ csched_schedule(
> scurr->pri = CSCHED_PRI_IDLE;
> }
>
> + /* Choices, choices:
> + * - If we have a tasklet, we need to run the idle vcpu no matter what.
> + * - If sched rate limiting is in effect, and the current vcpu has
> + * run for less than that amount of time, continue the current one,
> + * but with a shorter timeslice and return it immediately
> + * - Otherwise, chose the one with the highest priority (which may
> + * be the one currently running)
> + * - If the currently running one is TS_OVER, see if there
> + * is a higher priority one waiting on the runqueue of another
> + * cpu and steal it.
> + */
> +
> + /* If we have schedule rate limiting enabled, check to see
> + * how long we've run for. */
> + if ( !tasklet_work_scheduled
> + && prv->ratelimit_us
> + && vcpu_runnable(current)
> + && !is_idle_vcpu(current)
> + && runtime < MICROSECS(prv->ratelimit_us) )
> + {
> + snext = scurr;
> + snext->start_time += now;
> + perfc_incr(delay_ms);
> + tslice = MICROSECS(prv->ratelimit_us);
> + ret.migrated = 0;
> + goto out;
> + }
> + tslice = MILLISECS(prv->tslice_ms);
> +
> /*
> * Select next runnable local VCPU (ie top of local runq)
> */
> @@ -1367,11 +1402,12 @@ csched_schedule(
> if ( !is_idle_vcpu(snext->vcpu) )
> snext->start_time += now;
>
> +out:
> /*
> * Return task to run next...
> */
> ret.time = (is_idle_vcpu(snext->vcpu) ?
> - -1 : MILLISECS(prv->tslice_ms));
> + -1 : tslice);
> ret.task = snext->vcpu;
>
> CSCHED_VCPU_CHECK(ret.task);
> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
> prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
> prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
>
> + if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) )
> + {
> + printk("WARNING: sched_ratelimit_us >"
> + "sched_credit_tslice_ms is undefined\n"
> + "ratelimit_us is set to 1000 * tslice_ms forcely\n");
> + prv->ratelimit_us = 1000 * prv->tslice_ms;
> + }
> + else
> + prv->ratelimit_us = sched_ratelimit_us;
> return 0;
> }
>
> diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/schedule.c
> --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/common/schedule.c Mon Dec 26 03:44:39 2011 -0500
> @@ -47,6 +47,11 @@ string_param("sched", opt_sched);
> bool_t sched_smt_power_savings = 0;
> boolean_param("sched_smt_power_savings", sched_smt_power_savings);
>
> +/* Default scheduling rate limit: 1ms
> + * The behavior when sched_ratelimit_us is greater than
> sched_credit_tslice_ms is undefined
> + * */
> +int sched_ratelimit_us = 1000;
> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
> /* Various timer handlers. */
> static void s_timer_fn(void *unused);
> static void vcpu_periodic_timer_fn(void *data);
> diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/perfc_defn.h
> --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/include/xen/perfc_defn.h Mon Dec 26 03:44:39 2011 -0500
> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch
> PERFCOUNTER(sched_run, "sched: runs through scheduler")
> PERFCOUNTER(sched_ctx, "sched: context switches")
>
> +PERFCOUNTER(delay_ms, "csched: delay")
> PERFCOUNTER(vcpu_check, "csched: vcpu_check")
> PERFCOUNTER(schedule, "csched: schedule")
> PERFCOUNTER(acct_run, "csched: acct_run")
> diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/sched-if.h
> --- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/include/xen/sched-if.h Mon Dec 26 03:44:39 2011 -0500
> @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0;
> /* cpus currently in no cpupool */
> extern cpumask_t cpupool_free_cpus;
>
> +/* Scheduler generic parameters
> + * */
> +extern int sched_ratelimit_us;
> +
> +
> /*
> * In order to allow a scheduler to remap the lock->cpu mapping,
> * we have a per-cpu pointer, along with a pre-allocated set of
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2012-01-02 8:19 ` Jan Beulich
@ 2012-01-05 4:08 ` Lv, Hui
2012-01-06 19:50 ` George Dunlap
1 sibling, 0 replies; 20+ messages in thread
From: Lv, Hui @ 2012-01-05 4:08 UTC (permalink / raw)
To: Jan Beulich, George.Dunlap@eu.citrix.com
Cc: xen-devel@lists.xensource.com, raistlin@linux.it,
Ian.Campbell@citrix.com
George,
Any comments for this?
Is it the right "warning" way?
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, January 02, 2012 4:19 PM
> To: Lv, Hui
> Cc: Ian.Campbell@citrix.com; George.Dunlap@eu.citrix.com;
> raistlin@linux.it; xen-devel@lists.xensource.com
> Subject: Re: [PATCH] xen/sched_credit: Use delay to control scheduling
> frequency
>
> >>> On 26.12.11 at 09:46, Hui Lv <hui.lv@intel.com> wrote:
> > Some modifications for this patch.
> > 1.Based on George's proposal, added a ratelimit_us element to
> csched_priv to
> > constrain prv->ratelimit_us <= 1000* prv->tslice_ms in csched_init.
>
> I do not see why you need a per-scheduler-instance variable for
> issuing the warning and correcting the value - one warning (during
> the first initialization) is entirely sufficient. (Otoh the already
> existing
> tslice_ms field is pointless currently too, as it never gets changed
> after being initialized from sched_credit_tslice_ms - George?)
>
> Jan
>
> > 2.Move the definition of sched_ratelimit_us to xen/sched.h
> > 3.Remove the redundant "else" structure
> >
> > See if you have any comment for such changes.
> >
> >
> > This patch can improve Xen performance:
> > 1. Basically, the "delay method" can achieve 11% overall performance
> boost
> > for SPECvirt than original credit scheduler.
> > 2. We have tried 1ms delay and 10ms delay, there is no big difference
> > between these two configurations. (1ms is enough to achieve a good
> > performance)
> > 3. We have compared different load level response time/latency (low,
> high,
> > peak), "delay method" didn't bring very much response time increase.
> > 4. 1ms delay can reduce 30% context switch at peak performance, where
> > produces the benefits. (int sched_ratelimit_us = 1000 is the
> recommended
> > setting)
> >
> > Signed-off-by: Hui Lv <hui.lv@intel.com>
> > Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com>
> >
> > diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/sched_credit.c
> > --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000
> > +++ b/xen/common/sched_credit.c Mon Dec 26 03:44:39 2011 -0500
> > @@ -172,6 +172,7 @@ struct csched_private {
> > uint32_t credit;
> > int credit_balance;
> > uint32_t runq_sort;
> > + unsigned ratelimit_us;
> > /* Period of master and tick in milliseconds */
> > unsigned tslice_ms, tick_period_us, ticks_per_tslice;
> > unsigned credits_per_tslice;
> > @@ -1297,10 +1298,15 @@ csched_schedule(
> > struct csched_private *prv = CSCHED_PRIV(ops);
> > struct csched_vcpu *snext;
> > struct task_slice ret;
> > + s_time_t runtime, tslice;
> >
> > CSCHED_STAT_CRANK(schedule);
> > CSCHED_VCPU_CHECK(current);
> >
> > + runtime = now - current->runstate.state_entry_time;
> > + if ( runtime < 0 ) /* Does this ever happen? */
> > + runtime = 0;
> > +
> > if ( !is_idle_vcpu(scurr->vcpu) )
> > {
> > /* Update credits of a non-idle VCPU. */
> > @@ -1313,6 +1319,35 @@ csched_schedule(
> > scurr->pri = CSCHED_PRI_IDLE;
> > }
> >
> > + /* Choices, choices:
> > + * - If we have a tasklet, we need to run the idle vcpu no
> matter what.
> > + * - If sched rate limiting is in effect, and the current vcpu
> has
> > + * run for less than that amount of time, continue the current
> one,
> > + * but with a shorter timeslice and return it immediately
> > + * - Otherwise, chose the one with the highest priority (which
> may
> > + * be the one currently running)
> > + * - If the currently running one is TS_OVER, see if there
> > + * is a higher priority one waiting on the runqueue of another
> > + * cpu and steal it.
> > + */
> > +
> > + /* If we have schedule rate limiting enabled, check to see
> > + * how long we've run for. */
> > + if ( !tasklet_work_scheduled
> > + && prv->ratelimit_us
> > + && vcpu_runnable(current)
> > + && !is_idle_vcpu(current)
> > + && runtime < MICROSECS(prv->ratelimit_us) )
> > + {
> > + snext = scurr;
> > + snext->start_time += now;
> > + perfc_incr(delay_ms);
> > + tslice = MICROSECS(prv->ratelimit_us);
> > + ret.migrated = 0;
> > + goto out;
> > + }
> > + tslice = MILLISECS(prv->tslice_ms);
> > +
> > /*
> > * Select next runnable local VCPU (ie top of local runq)
> > */
> > @@ -1367,11 +1402,12 @@ csched_schedule(
> > if ( !is_idle_vcpu(snext->vcpu) )
> > snext->start_time += now;
> >
> > +out:
> > /*
> > * Return task to run next...
> > */
> > ret.time = (is_idle_vcpu(snext->vcpu) ?
> > - -1 : MILLISECS(prv->tslice_ms));
> > + -1 : tslice);
> > ret.task = snext->vcpu;
> >
> > CSCHED_VCPU_CHECK(ret.task);
> > @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
> > prv->tick_period_us = prv->tslice_ms * 1000 / prv-
> >ticks_per_tslice;
> > prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv-
> >tslice_ms;
> >
> > + if ( MICROSECS(sched_ratelimit_us) >
> MILLISECS(sched_credit_tslice_ms) )
> > + {
> > + printk("WARNING: sched_ratelimit_us >"
> > + "sched_credit_tslice_ms is undefined\n"
> > + "ratelimit_us is set to 1000 * tslice_ms forcely\n");
> > + prv->ratelimit_us = 1000 * prv->tslice_ms;
> > + }
> > + else
> > + prv->ratelimit_us = sched_ratelimit_us;
> > return 0;
> > }
> >
> > diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/schedule.c
> > --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000
> > +++ b/xen/common/schedule.c Mon Dec 26 03:44:39 2011 -0500
> > @@ -47,6 +47,11 @@ string_param("sched", opt_sched);
> > bool_t sched_smt_power_savings = 0;
> > boolean_param("sched_smt_power_savings", sched_smt_power_savings);
> >
> > +/* Default scheduling rate limit: 1ms
> > + * The behavior when sched_ratelimit_us is greater than
> > sched_credit_tslice_ms is undefined
> > + * */
> > +int sched_ratelimit_us = 1000;
> > +integer_param("sched_ratelimit_us", sched_ratelimit_us);
> > /* Various timer handlers. */
> > static void s_timer_fn(void *unused);
> > static void vcpu_periodic_timer_fn(void *data);
> > diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/perfc_defn.h
> > --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011
> +0000
> > +++ b/xen/include/xen/perfc_defn.h Mon Dec 26 03:44:39 2011 -
> 0500
> > @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch
> > PERFCOUNTER(sched_run, "sched: runs through scheduler")
> > PERFCOUNTER(sched_ctx, "sched: context switches")
> >
> > +PERFCOUNTER(delay_ms, "csched: delay")
> > PERFCOUNTER(vcpu_check, "csched: vcpu_check")
> > PERFCOUNTER(schedule, "csched: schedule")
> > PERFCOUNTER(acct_run, "csched: acct_run")
> > diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/sched-if.h
> > --- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000
> > +++ b/xen/include/xen/sched-if.h Mon Dec 26 03:44:39 2011 -0500
> > @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0;
> > /* cpus currently in no cpupool */
> > extern cpumask_t cpupool_free_cpus;
> >
> > +/* Scheduler generic parameters
> > + * */
> > +extern int sched_ratelimit_us;
> > +
> > +
> > /*
> > * In order to allow a scheduler to remap the lock->cpu mapping,
> > * we have a per-cpu pointer, along with a pre-allocated set of
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2012-01-02 8:19 ` Jan Beulich
2012-01-05 4:08 ` Lv, Hui
@ 2012-01-06 19:50 ` George Dunlap
1 sibling, 0 replies; 20+ messages in thread
From: George Dunlap @ 2012-01-06 19:50 UTC (permalink / raw)
To: Jan Beulich
Cc: xen-devel@lists.xensource.com, raistlin@linux.it, Ian Campbell,
Hui Lv
On 02/01/12 03:19, Jan Beulich wrote:
>>>> On 26.12.11 at 09:46, Hui Lv<hui.lv@intel.com> wrote:
>> Some modifications for this patch.
>> 1.Based on George's proposal, added a ratelimit_us element to csched_priv to
>> constrain prv->ratelimit_us<= 1000* prv->tslice_ms in csched_init.
> I do not see why you need a per-scheduler-instance variable for
> issuing the warning and correcting the value - one warning (during
> the first initialization) is entirely sufficient. (Otoh the already existing
> tslice_ms field is pointless currently too, as it never gets changed
> after being initialized from sched_credit_tslice_ms - George?)
Before the 4.2 release, I plan to add SYSCTL_scheduler_op calls for the
credit scheduler for both the timeslice and ratelimiting. But I don't
think it's necessary for this patch to be admitted.
-George
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2011-12-26 8:46 [PATCH] xen/sched_credit: Use delay to control scheduling frequency Hui Lv
2012-01-02 8:19 ` Jan Beulich
@ 2012-01-06 19:56 ` George Dunlap
2012-01-08 12:03 ` Lv, Hui
1 sibling, 1 reply; 20+ messages in thread
From: George Dunlap @ 2012-01-06 19:56 UTC (permalink / raw)
To: Hui Lv
Cc: xen-devel@lists.xensource.com, raistlin@linux.it, Ian Campbell,
JBeulich@suse.com
Sorry for the delay; just catching up after the Christmas holidays.
On 26/12/11 03:46, Hui Lv wrote:
> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
> prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
> prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
>
> + if ( MICROSECS(sched_ratelimit_us)> MILLISECS(sched_credit_tslice_ms) )
> + {
> + printk("WARNING: sched_ratelimit_us>"
> + "sched_credit_tslice_ms is undefined\n"
> + "ratelimit_us is set to 1000 * tslice_ms forcely\n")
The standard idiom for this kind of message would be:
WARNING [what's wrong]
[What you're doing about it]
So the last sentence of the warning should be:
Setting ratelimit_us to 1000 * tslice_ms
(Grammatically, you could say "Forcing ratelimit..." but I think "force"
is too strong in this case.)
Other than that, I'm happy with it, if everyone else is:
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2012-01-06 19:56 ` George Dunlap
@ 2012-01-08 12:03 ` Lv, Hui
2012-01-09 9:49 ` George Dunlap
0 siblings, 1 reply; 20+ messages in thread
From: Lv, Hui @ 2012-01-08 12:03 UTC (permalink / raw)
To: George Dunlap
Cc: xen-devel@lists.xensource.com, raistlin@linux.it, Ian Campbell,
JBeulich@suse.com
Thanks, George.
Should I send a revised version?
Can it be checked in?
-----Original Message-----
From: George Dunlap [mailto:george.dunlap@eu.citrix.com]
Sent: Saturday, January 07, 2012 3:57 AM
To: Lv, Hui
Cc: xen-devel@lists.xensource.com; raistlin@linux.it; JBeulich@suse.com; Ian Campbell
Subject: Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
Sorry for the delay; just catching up after the Christmas holidays.
On 26/12/11 03:46, Hui Lv wrote:
> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
> prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
> prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
>
> + if ( MICROSECS(sched_ratelimit_us)> MILLISECS(sched_credit_tslice_ms) )
> + {
> + printk("WARNING: sched_ratelimit_us>"
> + "sched_credit_tslice_ms is undefined\n"
> + "ratelimit_us is set to 1000 * tslice_ms forcely\n")
The standard idiom for this kind of message would be:
WARNING [what's wrong]
[What you're doing about it]
So the last sentence of the warning should be:
Setting ratelimit_us to 1000 * tslice_ms
(Grammatically, you could say "Forcing ratelimit..." but I think "force"
is too strong in this case.)
Other than that, I'm happy with it, if everyone else is:
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2012-01-08 12:03 ` Lv, Hui
@ 2012-01-09 9:49 ` George Dunlap
0 siblings, 0 replies; 20+ messages in thread
From: George Dunlap @ 2012-01-09 9:49 UTC (permalink / raw)
To: Lv, Hui
Cc: George Dunlap, xen-devel@lists.xensource.com, raistlin@linux.it,
Ian Campbell, JBeulich@suse.com
On Sun, 2012-01-08 at 12:03 +0000, Lv, Hui wrote:
> Thanks, George.
> Should I send a revised version?
> Can it be checked in?
Yes, please send a revised version.
It will be checked in if one of the maintainers thinks there's enough
consensus (probably if no one has any more comments in the next day or
two).
Thanks,
-George
>
> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@eu.citrix.com]
> Sent: Saturday, January 07, 2012 3:57 AM
> To: Lv, Hui
> Cc: xen-devel@lists.xensource.com; raistlin@linux.it; JBeulich@suse.com; Ian Campbell
> Subject: Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
>
> Sorry for the delay; just catching up after the Christmas holidays.
>
> On 26/12/11 03:46, Hui Lv wrote:
> > @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
> > prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
> > prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
> >
> > + if ( MICROSECS(sched_ratelimit_us)> MILLISECS(sched_credit_tslice_ms) )
> > + {
> > + printk("WARNING: sched_ratelimit_us>"
> > + "sched_credit_tslice_ms is undefined\n"
> > + "ratelimit_us is set to 1000 * tslice_ms forcely\n")
> The standard idiom for this kind of message would be:
> WARNING [what's wrong]
> [What you're doing about it]
>
> So the last sentence of the warning should be:
> Setting ratelimit_us to 1000 * tslice_ms
>
> (Grammatically, you could say "Forcing ratelimit..." but I think "force"
> is too strong in this case.)
>
> Other than that, I'm happy with it, if everyone else is:
>
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] xen/sched_credit: Use delay to control scheduling frequency
@ 2012-01-09 10:22 Hui Lv
2012-01-10 10:05 ` George Dunlap
0 siblings, 1 reply; 20+ messages in thread
From: Hui Lv @ 2012-01-09 10:22 UTC (permalink / raw)
To: xen-devel, George.Dunlap, raistlin, JBeulich, Ian.Campbell; +Cc: hui.lv
Updated the warning sentence for ratelimit_us.
This patch can improve Xen performance:
1. Basically, the "delay method" can achieve 11% overall performance boost for SPECvirt than original credit scheduler.
2. We have tried 1ms delay and 10ms delay, there is no big difference between these two configurations. (1ms is enough to achieve a good performance)
3. We have compared different load level response time/latency (low, high, peak), "delay method" didn't bring very much response time increase.
4. 1ms delay can reduce 30% context switch at peak performance, where produces the benefits. (int sched_ratelimit_us = 1000 is the recommended setting)
Signed-off-by: Hui Lv <hui.lv@intel.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/common/sched_credit.c Mon Jan 09 05:21:35 2012 -0500
@@ -172,6 +172,7 @@ struct csched_private {
uint32_t credit;
int credit_balance;
uint32_t runq_sort;
+ unsigned ratelimit_us;
/* Period of master and tick in milliseconds */
unsigned tslice_ms, tick_period_us, ticks_per_tslice;
unsigned credits_per_tslice;
@@ -1297,10 +1298,15 @@ csched_schedule(
struct csched_private *prv = CSCHED_PRIV(ops);
struct csched_vcpu *snext;
struct task_slice ret;
+ s_time_t runtime, tslice;
CSCHED_STAT_CRANK(schedule);
CSCHED_VCPU_CHECK(current);
+ runtime = now - current->runstate.state_entry_time;
+ if ( runtime < 0 ) /* Does this ever happen? */
+ runtime = 0;
+
if ( !is_idle_vcpu(scurr->vcpu) )
{
/* Update credits of a non-idle VCPU. */
@@ -1313,6 +1319,35 @@ csched_schedule(
scurr->pri = CSCHED_PRI_IDLE;
}
+ /* Choices, choices:
+ * - If we have a tasklet, we need to run the idle vcpu no matter what.
+ * - If sched rate limiting is in effect, and the current vcpu has
+ * run for less than that amount of time, continue the current one,
+ * but with a shorter timeslice and return it immediately
+ * - Otherwise, chose the one with the highest priority (which may
+ * be the one currently running)
+ * - If the currently running one is TS_OVER, see if there
+ * is a higher priority one waiting on the runqueue of another
+ * cpu and steal it.
+ */
+
+ /* If we have schedule rate limiting enabled, check to see
+ * how long we've run for. */
+ if ( !tasklet_work_scheduled
+ && prv->ratelimit_us
+ && vcpu_runnable(current)
+ && !is_idle_vcpu(current)
+ && runtime < MICROSECS(prv->ratelimit_us) )
+ {
+ snext = scurr;
+ snext->start_time += now;
+ perfc_incr(delay_ms);
+ tslice = MICROSECS(prv->ratelimit_us);
+ ret.migrated = 0;
+ goto out;
+ }
+ tslice = MILLISECS(prv->tslice_ms);
+
/*
* Select next runnable local VCPU (ie top of local runq)
*/
@@ -1367,11 +1402,12 @@ csched_schedule(
if ( !is_idle_vcpu(snext->vcpu) )
snext->start_time += now;
+out:
/*
* Return task to run next...
*/
ret.time = (is_idle_vcpu(snext->vcpu) ?
- -1 : MILLISECS(prv->tslice_ms));
+ -1 : tslice);
ret.task = snext->vcpu;
CSCHED_VCPU_CHECK(ret.task);
@@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
+ if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) )
+ {
+ printk("WARNING: sched_ratelimit_us >"
+ "sched_credit_tslice_ms is undefined\n"
+ "Setting ratelimit_us to 1000 * tslice_ms\n");
+ prv->ratelimit_us = 1000 * prv->tslice_ms;
+ }
+ else
+ prv->ratelimit_us = sched_ratelimit_us;
return 0;
}
diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/schedule.c
--- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/common/schedule.c Mon Jan 09 05:21:35 2012 -0500
@@ -47,6 +47,11 @@ string_param("sched", opt_sched);
bool_t sched_smt_power_savings = 0;
boolean_param("sched_smt_power_savings", sched_smt_power_savings);
+/* Default scheduling rate limit: 1ms
+ * The behavior when sched_ratelimit_us is greater than sched_credit_tslice_ms is undefined
+ * */
+int sched_ratelimit_us = 1000;
+integer_param("sched_ratelimit_us", sched_ratelimit_us);
/* Various timer handlers. */
static void s_timer_fn(void *unused);
static void vcpu_periodic_timer_fn(void *data);
diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/perfc_defn.h
--- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/include/xen/perfc_defn.h Mon Jan 09 05:21:35 2012 -0500
@@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch
PERFCOUNTER(sched_run, "sched: runs through scheduler")
PERFCOUNTER(sched_ctx, "sched: context switches")
+PERFCOUNTER(delay_ms, "csched: delay")
PERFCOUNTER(vcpu_check, "csched: vcpu_check")
PERFCOUNTER(schedule, "csched: schedule")
PERFCOUNTER(acct_run, "csched: acct_run")
diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/sched-if.h
--- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/include/xen/sched-if.h Mon Jan 09 05:21:35 2012 -0500
@@ -16,6 +16,11 @@ extern struct cpupool *cpupool0;
/* cpus currently in no cpupool */
extern cpumask_t cpupool_free_cpus;
+/* Scheduler generic parameters
+ * */
+extern int sched_ratelimit_us;
+
+
/*
* In order to allow a scheduler to remap the lock->cpu mapping,
* we have a per-cpu pointer, along with a pre-allocated set of
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2012-01-09 10:22 Hui Lv
@ 2012-01-10 10:05 ` George Dunlap
2012-01-24 12:01 ` George Dunlap
0 siblings, 1 reply; 20+ messages in thread
From: George Dunlap @ 2012-01-10 10:05 UTC (permalink / raw)
To: Hui Lv; +Cc: xen-devel, raistlin, Ian.Campbell, JBeulich
On Mon, Jan 9, 2012 at 10:22 AM, Hui Lv <hui.lv@intel.com> wrote:
> Updated the warning sentence for ratelimit_us.
>
> This patch can improve Xen performance:
> 1. Basically, the "delay method" can achieve 11% overall performance boost for SPECvirt than original credit scheduler.
> 2. We have tried 1ms delay and 10ms delay, there is no big difference between these two configurations. (1ms is enough to achieve a good performance)
> 3. We have compared different load level response time/latency (low, high, peak), "delay method" didn't bring very much response time increase.
> 4. 1ms delay can reduce 30% context switch at peak performance, where produces the benefits. (int sched_ratelimit_us = 1000 is the recommended setting)
>
> Signed-off-by: Hui Lv <hui.lv@intel.com>
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Just confirming this:
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Thanks, Hui.
-George
>
> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/common/sched_credit.c Mon Jan 09 05:21:35 2012 -0500
> @@ -172,6 +172,7 @@ struct csched_private {
> uint32_t credit;
> int credit_balance;
> uint32_t runq_sort;
> + unsigned ratelimit_us;
> /* Period of master and tick in milliseconds */
> unsigned tslice_ms, tick_period_us, ticks_per_tslice;
> unsigned credits_per_tslice;
> @@ -1297,10 +1298,15 @@ csched_schedule(
> struct csched_private *prv = CSCHED_PRIV(ops);
> struct csched_vcpu *snext;
> struct task_slice ret;
> + s_time_t runtime, tslice;
>
> CSCHED_STAT_CRANK(schedule);
> CSCHED_VCPU_CHECK(current);
>
> + runtime = now - current->runstate.state_entry_time;
> + if ( runtime < 0 ) /* Does this ever happen? */
> + runtime = 0;
> +
> if ( !is_idle_vcpu(scurr->vcpu) )
> {
> /* Update credits of a non-idle VCPU. */
> @@ -1313,6 +1319,35 @@ csched_schedule(
> scurr->pri = CSCHED_PRI_IDLE;
> }
>
> + /* Choices, choices:
> + * - If we have a tasklet, we need to run the idle vcpu no matter what.
> + * - If sched rate limiting is in effect, and the current vcpu has
> + * run for less than that amount of time, continue the current one,
> + * but with a shorter timeslice and return it immediately
> + * - Otherwise, chose the one with the highest priority (which may
> + * be the one currently running)
> + * - If the currently running one is TS_OVER, see if there
> + * is a higher priority one waiting on the runqueue of another
> + * cpu and steal it.
> + */
> +
> + /* If we have schedule rate limiting enabled, check to see
> + * how long we've run for. */
> + if ( !tasklet_work_scheduled
> + && prv->ratelimit_us
> + && vcpu_runnable(current)
> + && !is_idle_vcpu(current)
> + && runtime < MICROSECS(prv->ratelimit_us) )
> + {
> + snext = scurr;
> + snext->start_time += now;
> + perfc_incr(delay_ms);
> + tslice = MICROSECS(prv->ratelimit_us);
> + ret.migrated = 0;
> + goto out;
> + }
> + tslice = MILLISECS(prv->tslice_ms);
> +
> /*
> * Select next runnable local VCPU (ie top of local runq)
> */
> @@ -1367,11 +1402,12 @@ csched_schedule(
> if ( !is_idle_vcpu(snext->vcpu) )
> snext->start_time += now;
>
> +out:
> /*
> * Return task to run next...
> */
> ret.time = (is_idle_vcpu(snext->vcpu) ?
> - -1 : MILLISECS(prv->tslice_ms));
> + -1 : tslice);
> ret.task = snext->vcpu;
>
> CSCHED_VCPU_CHECK(ret.task);
> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
> prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
> prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
>
> + if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) )
> + {
> + printk("WARNING: sched_ratelimit_us >"
> + "sched_credit_tslice_ms is undefined\n"
> + "Setting ratelimit_us to 1000 * tslice_ms\n");
> + prv->ratelimit_us = 1000 * prv->tslice_ms;
> + }
> + else
> + prv->ratelimit_us = sched_ratelimit_us;
> return 0;
> }
>
> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/schedule.c
> --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/common/schedule.c Mon Jan 09 05:21:35 2012 -0500
> @@ -47,6 +47,11 @@ string_param("sched", opt_sched);
> bool_t sched_smt_power_savings = 0;
> boolean_param("sched_smt_power_savings", sched_smt_power_savings);
>
> +/* Default scheduling rate limit: 1ms
> + * The behavior when sched_ratelimit_us is greater than sched_credit_tslice_ms is undefined
> + * */
> +int sched_ratelimit_us = 1000;
> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
> /* Various timer handlers. */
> static void s_timer_fn(void *unused);
> static void vcpu_periodic_timer_fn(void *data);
> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/perfc_defn.h
> --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/include/xen/perfc_defn.h Mon Jan 09 05:21:35 2012 -0500
> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch
> PERFCOUNTER(sched_run, "sched: runs through scheduler")
> PERFCOUNTER(sched_ctx, "sched: context switches")
>
> +PERFCOUNTER(delay_ms, "csched: delay")
> PERFCOUNTER(vcpu_check, "csched: vcpu_check")
> PERFCOUNTER(schedule, "csched: schedule")
> PERFCOUNTER(acct_run, "csched: acct_run")
> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/sched-if.h
> --- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/include/xen/sched-if.h Mon Jan 09 05:21:35 2012 -0500
> @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0;
> /* cpus currently in no cpupool */
> extern cpumask_t cpupool_free_cpus;
>
> +/* Scheduler generic parameters
> + * */
> +extern int sched_ratelimit_us;
> +
> +
> /*
> * In order to allow a scheduler to remap the lock->cpu mapping,
> * we have a per-cpu pointer, along with a pre-allocated set of
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2012-01-10 10:05 ` George Dunlap
@ 2012-01-24 12:01 ` George Dunlap
2012-01-24 14:17 ` Keir Fraser
0 siblings, 1 reply; 20+ messages in thread
From: George Dunlap @ 2012-01-24 12:01 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel, raistlin, JBeulich
Ping?
On Tue, Jan 10, 2012 at 10:05 AM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On Mon, Jan 9, 2012 at 10:22 AM, Hui Lv <hui.lv@intel.com> wrote:
>> Updated the warning sentence for ratelimit_us.
>>
>> This patch can improve Xen performance:
>> 1. Basically, the "delay method" can achieve 11% overall performance boost for SPECvirt than original credit scheduler.
>> 2. We have tried 1ms delay and 10ms delay, there is no big difference between these two configurations. (1ms is enough to achieve a good performance)
>> 3. We have compared different load level response time/latency (low, high, peak), "delay method" didn't bring very much response time increase.
>> 4. 1ms delay can reduce 30% context switch at peak performance, where produces the benefits. (int sched_ratelimit_us = 1000 is the recommended setting)
>>
>> Signed-off-by: Hui Lv <hui.lv@intel.com>
>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>>
>> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
>
> Just confirming this:
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
>
> Thanks, Hui.
> -George
>
>>
>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/sched_credit.c
>> --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000
>> +++ b/xen/common/sched_credit.c Mon Jan 09 05:21:35 2012 -0500
>> @@ -172,6 +172,7 @@ struct csched_private {
>> uint32_t credit;
>> int credit_balance;
>> uint32_t runq_sort;
>> + unsigned ratelimit_us;
>> /* Period of master and tick in milliseconds */
>> unsigned tslice_ms, tick_period_us, ticks_per_tslice;
>> unsigned credits_per_tslice;
>> @@ -1297,10 +1298,15 @@ csched_schedule(
>> struct csched_private *prv = CSCHED_PRIV(ops);
>> struct csched_vcpu *snext;
>> struct task_slice ret;
>> + s_time_t runtime, tslice;
>>
>> CSCHED_STAT_CRANK(schedule);
>> CSCHED_VCPU_CHECK(current);
>>
>> + runtime = now - current->runstate.state_entry_time;
>> + if ( runtime < 0 ) /* Does this ever happen? */
>> + runtime = 0;
>> +
>> if ( !is_idle_vcpu(scurr->vcpu) )
>> {
>> /* Update credits of a non-idle VCPU. */
>> @@ -1313,6 +1319,35 @@ csched_schedule(
>> scurr->pri = CSCHED_PRI_IDLE;
>> }
>>
>> + /* Choices, choices:
>> + * - If we have a tasklet, we need to run the idle vcpu no matter what.
>> + * - If sched rate limiting is in effect, and the current vcpu has
>> + * run for less than that amount of time, continue the current one,
>> + * but with a shorter timeslice and return it immediately
>> + * - Otherwise, chose the one with the highest priority (which may
>> + * be the one currently running)
>> + * - If the currently running one is TS_OVER, see if there
>> + * is a higher priority one waiting on the runqueue of another
>> + * cpu and steal it.
>> + */
>> +
>> + /* If we have schedule rate limiting enabled, check to see
>> + * how long we've run for. */
>> + if ( !tasklet_work_scheduled
>> + && prv->ratelimit_us
>> + && vcpu_runnable(current)
>> + && !is_idle_vcpu(current)
>> + && runtime < MICROSECS(prv->ratelimit_us) )
>> + {
>> + snext = scurr;
>> + snext->start_time += now;
>> + perfc_incr(delay_ms);
>> + tslice = MICROSECS(prv->ratelimit_us);
>> + ret.migrated = 0;
>> + goto out;
>> + }
>> + tslice = MILLISECS(prv->tslice_ms);
>> +
>> /*
>> * Select next runnable local VCPU (ie top of local runq)
>> */
>> @@ -1367,11 +1402,12 @@ csched_schedule(
>> if ( !is_idle_vcpu(snext->vcpu) )
>> snext->start_time += now;
>>
>> +out:
>> /*
>> * Return task to run next...
>> */
>> ret.time = (is_idle_vcpu(snext->vcpu) ?
>> - -1 : MILLISECS(prv->tslice_ms));
>> + -1 : tslice);
>> ret.task = snext->vcpu;
>>
>> CSCHED_VCPU_CHECK(ret.task);
>> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
>> prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
>> prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
>>
>> + if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) )
>> + {
>> + printk("WARNING: sched_ratelimit_us >"
>> + "sched_credit_tslice_ms is undefined\n"
>> + "Setting ratelimit_us to 1000 * tslice_ms\n");
>> + prv->ratelimit_us = 1000 * prv->tslice_ms;
>> + }
>> + else
>> + prv->ratelimit_us = sched_ratelimit_us;
>> return 0;
>> }
>>
>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/schedule.c
>> --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000
>> +++ b/xen/common/schedule.c Mon Jan 09 05:21:35 2012 -0500
>> @@ -47,6 +47,11 @@ string_param("sched", opt_sched);
>> bool_t sched_smt_power_savings = 0;
>> boolean_param("sched_smt_power_savings", sched_smt_power_savings);
>>
>> +/* Default scheduling rate limit: 1ms
>> + * The behavior when sched_ratelimit_us is greater than sched_credit_tslice_ms is undefined
>> + * */
>> +int sched_ratelimit_us = 1000;
>> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
>> /* Various timer handlers. */
>> static void s_timer_fn(void *unused);
>> static void vcpu_periodic_timer_fn(void *data);
>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/perfc_defn.h
>> --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000
>> +++ b/xen/include/xen/perfc_defn.h Mon Jan 09 05:21:35 2012 -0500
>> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch
>> PERFCOUNTER(sched_run, "sched: runs through scheduler")
>> PERFCOUNTER(sched_ctx, "sched: context switches")
>>
>> +PERFCOUNTER(delay_ms, "csched: delay")
>> PERFCOUNTER(vcpu_check, "csched: vcpu_check")
>> PERFCOUNTER(schedule, "csched: schedule")
>> PERFCOUNTER(acct_run, "csched: acct_run")
>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/sched-if.h
>> --- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000
>> +++ b/xen/include/xen/sched-if.h Mon Jan 09 05:21:35 2012 -0500
>> @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0;
>> /* cpus currently in no cpupool */
>> extern cpumask_t cpupool_free_cpus;
>>
>> +/* Scheduler generic parameters
>> + * */
>> +extern int sched_ratelimit_us;
>> +
>> +
>> /*
>> * In order to allow a scheduler to remap the lock->cpu mapping,
>> * we have a per-cpu pointer, along with a pre-allocated set of
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2012-01-24 12:01 ` George Dunlap
@ 2012-01-24 14:17 ` Keir Fraser
2012-01-24 14:31 ` George Dunlap
0 siblings, 1 reply; 20+ messages in thread
From: Keir Fraser @ 2012-01-24 14:17 UTC (permalink / raw)
To: George Dunlap; +Cc: xen-devel, raistlin, JBeulich
On 24/01/2012 12:01, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
> Ping?
It's been in the tree for a week.
> On Tue, Jan 10, 2012 at 10:05 AM, George Dunlap
> <George.Dunlap@eu.citrix.com> wrote:
>> On Mon, Jan 9, 2012 at 10:22 AM, Hui Lv <hui.lv@intel.com> wrote:
>>> Updated the warning sentence for ratelimit_us.
>>>
>>> This patch can improve Xen performance:
>>> 1. Basically, the "delay method" can achieve 11% overall performance boost
>>> for SPECvirt than original credit scheduler.
>>> 2. We have tried 1ms delay and 10ms delay, there is no big difference
>>> between these two configurations. (1ms is enough to achieve a good
>>> performance)
>>> 3. We have compared different load level response time/latency (low, high,
>>> peak), "delay method" didn't bring very much response time increase.
>>> 4. 1ms delay can reduce 30% context switch at peak performance, where
>>> produces the benefits. (int sched_ratelimit_us = 1000 is the recommended
>>> setting)
>>>
>>> Signed-off-by: Hui Lv <hui.lv@intel.com>
>>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>>>
>>> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
>>
>> Just confirming this:
>> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
>>
>> Thanks, Hui.
>> -George
>>
>>>
>>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/sched_credit.c
>>> --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000
>>> +++ b/xen/common/sched_credit.c Mon Jan 09 05:21:35 2012 -0500
>>> @@ -172,6 +172,7 @@ struct csched_private {
>>> uint32_t credit;
>>> int credit_balance;
>>> uint32_t runq_sort;
>>> + unsigned ratelimit_us;
>>> /* Period of master and tick in milliseconds */
>>> unsigned tslice_ms, tick_period_us, ticks_per_tslice;
>>> unsigned credits_per_tslice;
>>> @@ -1297,10 +1298,15 @@ csched_schedule(
>>> struct csched_private *prv = CSCHED_PRIV(ops);
>>> struct csched_vcpu *snext;
>>> struct task_slice ret;
>>> + s_time_t runtime, tslice;
>>>
>>> CSCHED_STAT_CRANK(schedule);
>>> CSCHED_VCPU_CHECK(current);
>>>
>>> + runtime = now - current->runstate.state_entry_time;
>>> + if ( runtime < 0 ) /* Does this ever happen? */
>>> + runtime = 0;
>>> +
>>> if ( !is_idle_vcpu(scurr->vcpu) )
>>> {
>>> /* Update credits of a non-idle VCPU. */
>>> @@ -1313,6 +1319,35 @@ csched_schedule(
>>> scurr->pri = CSCHED_PRI_IDLE;
>>> }
>>>
>>> + /* Choices, choices:
>>> + * - If we have a tasklet, we need to run the idle vcpu no matter what.
>>> + * - If sched rate limiting is in effect, and the current vcpu has
>>> + * run for less than that amount of time, continue the current one,
>>> + * but with a shorter timeslice and return it immediately
>>> + * - Otherwise, chose the one with the highest priority (which may
>>> + * be the one currently running)
>>> + * - If the currently running one is TS_OVER, see if there
>>> + * is a higher priority one waiting on the runqueue of another
>>> + * cpu and steal it.
>>> + */
>>> +
>>> + /* If we have schedule rate limiting enabled, check to see
>>> + * how long we've run for. */
>>> + if ( !tasklet_work_scheduled
>>> + && prv->ratelimit_us
>>> + && vcpu_runnable(current)
>>> + && !is_idle_vcpu(current)
>>> + && runtime < MICROSECS(prv->ratelimit_us) )
>>> + {
>>> + snext = scurr;
>>> + snext->start_time += now;
>>> + perfc_incr(delay_ms);
>>> + tslice = MICROSECS(prv->ratelimit_us);
>>> + ret.migrated = 0;
>>> + goto out;
>>> + }
>>> + tslice = MILLISECS(prv->tslice_ms);
>>> +
>>> /*
>>> * Select next runnable local VCPU (ie top of local runq)
>>> */
>>> @@ -1367,11 +1402,12 @@ csched_schedule(
>>> if ( !is_idle_vcpu(snext->vcpu) )
>>> snext->start_time += now;
>>>
>>> +out:
>>> /*
>>> * Return task to run next...
>>> */
>>> ret.time = (is_idle_vcpu(snext->vcpu) ?
>>> - -1 : MILLISECS(prv->tslice_ms));
>>> + -1 : tslice);
>>> ret.task = snext->vcpu;
>>>
>>> CSCHED_VCPU_CHECK(ret.task);
>>> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
>>> prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
>>> prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
>>>
>>> + if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms)
>>> )
>>> + {
>>> + printk("WARNING: sched_ratelimit_us >"
>>> + "sched_credit_tslice_ms is undefined\n"
>>> + "Setting ratelimit_us to 1000 * tslice_ms\n");
>>> + prv->ratelimit_us = 1000 * prv->tslice_ms;
>>> + }
>>> + else
>>> + prv->ratelimit_us = sched_ratelimit_us;
>>> return 0;
>>> }
>>>
>>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/schedule.c
>>> --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000
>>> +++ b/xen/common/schedule.c Mon Jan 09 05:21:35 2012 -0500
>>> @@ -47,6 +47,11 @@ string_param("sched", opt_sched);
>>> bool_t sched_smt_power_savings = 0;
>>> boolean_param("sched_smt_power_savings", sched_smt_power_savings);
>>>
>>> +/* Default scheduling rate limit: 1ms
>>> + * The behavior when sched_ratelimit_us is greater than
>>> sched_credit_tslice_ms is undefined
>>> + * */
>>> +int sched_ratelimit_us = 1000;
>>> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
>>> /* Various timer handlers. */
>>> static void s_timer_fn(void *unused);
>>> static void vcpu_periodic_timer_fn(void *data);
>>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/perfc_defn.h
>>> --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000
>>> +++ b/xen/include/xen/perfc_defn.h Mon Jan 09 05:21:35 2012 -0500
>>> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch
>>> PERFCOUNTER(sched_run, "sched: runs through scheduler")
>>> PERFCOUNTER(sched_ctx, "sched: context switches")
>>>
>>> +PERFCOUNTER(delay_ms, "csched: delay")
>>> PERFCOUNTER(vcpu_check, "csched: vcpu_check")
>>> PERFCOUNTER(schedule, "csched: schedule")
>>> PERFCOUNTER(acct_run, "csched: acct_run")
>>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/sched-if.h
>>> --- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000
>>> +++ b/xen/include/xen/sched-if.h Mon Jan 09 05:21:35 2012 -0500
>>> @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0;
>>> /* cpus currently in no cpupool */
>>> extern cpumask_t cpupool_free_cpus;
>>>
>>> +/* Scheduler generic parameters
>>> + * */
>>> +extern int sched_ratelimit_us;
>>> +
>>> +
>>> /*
>>> * In order to allow a scheduler to remap the lock->cpu mapping,
>>> * we have a per-cpu pointer, along with a pre-allocated set of
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xensource.com
>>> http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
2012-01-24 14:17 ` Keir Fraser
@ 2012-01-24 14:31 ` George Dunlap
0 siblings, 0 replies; 20+ messages in thread
From: George Dunlap @ 2012-01-24 14:31 UTC (permalink / raw)
To: Keir Fraser
Cc: George Dunlap, xen-devel@lists.xensource.com, raistlin@linux.it,
JBeulich@suse.com
On Tue, 2012-01-24 at 14:17 +0000, Keir Fraser wrote:
> On 24/01/2012 12:01, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
>
> > Ping?
>
> It's been in the tree for a week.
Sorry -- I was sure I looked for it and didn't see it; but now I look
again, there it is. :-/
-George
>
> > On Tue, Jan 10, 2012 at 10:05 AM, George Dunlap
> > <George.Dunlap@eu.citrix.com> wrote:
> >> On Mon, Jan 9, 2012 at 10:22 AM, Hui Lv <hui.lv@intel.com> wrote:
> >>> Updated the warning sentence for ratelimit_us.
> >>>
> >>> This patch can improve Xen performance:
> >>> 1. Basically, the "delay method" can achieve 11% overall performance boost
> >>> for SPECvirt than original credit scheduler.
> >>> 2. We have tried 1ms delay and 10ms delay, there is no big difference
> >>> between these two configurations. (1ms is enough to achieve a good
> >>> performance)
> >>> 3. We have compared different load level response time/latency (low, high,
> >>> peak), "delay method" didn't bring very much response time increase.
> >>> 4. 1ms delay can reduce 30% context switch at peak performance, where
> >>> produces the benefits. (int sched_ratelimit_us = 1000 is the recommended
> >>> setting)
> >>>
> >>> Signed-off-by: Hui Lv <hui.lv@intel.com>
> >>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> >>>
> >>> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
> >>
> >> Just confirming this:
> >> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
> >>
> >> Thanks, Hui.
> >> -George
> >>
> >>>
> >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/sched_credit.c
> >>> --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000
> >>> +++ b/xen/common/sched_credit.c Mon Jan 09 05:21:35 2012 -0500
> >>> @@ -172,6 +172,7 @@ struct csched_private {
> >>> uint32_t credit;
> >>> int credit_balance;
> >>> uint32_t runq_sort;
> >>> + unsigned ratelimit_us;
> >>> /* Period of master and tick in milliseconds */
> >>> unsigned tslice_ms, tick_period_us, ticks_per_tslice;
> >>> unsigned credits_per_tslice;
> >>> @@ -1297,10 +1298,15 @@ csched_schedule(
> >>> struct csched_private *prv = CSCHED_PRIV(ops);
> >>> struct csched_vcpu *snext;
> >>> struct task_slice ret;
> >>> + s_time_t runtime, tslice;
> >>>
> >>> CSCHED_STAT_CRANK(schedule);
> >>> CSCHED_VCPU_CHECK(current);
> >>>
> >>> + runtime = now - current->runstate.state_entry_time;
> >>> + if ( runtime < 0 ) /* Does this ever happen? */
> >>> + runtime = 0;
> >>> +
> >>> if ( !is_idle_vcpu(scurr->vcpu) )
> >>> {
> >>> /* Update credits of a non-idle VCPU. */
> >>> @@ -1313,6 +1319,35 @@ csched_schedule(
> >>> scurr->pri = CSCHED_PRI_IDLE;
> >>> }
> >>>
> >>> + /* Choices, choices:
> >>> + * - If we have a tasklet, we need to run the idle vcpu no matter what.
> >>> + * - If sched rate limiting is in effect, and the current vcpu has
> >>> + * run for less than that amount of time, continue the current one,
> >>> + * but with a shorter timeslice and return it immediately
> >>> + * - Otherwise, chose the one with the highest priority (which may
> >>> + * be the one currently running)
> >>> + * - If the currently running one is TS_OVER, see if there
> >>> + * is a higher priority one waiting on the runqueue of another
> >>> + * cpu and steal it.
> >>> + */
> >>> +
> >>> + /* If we have schedule rate limiting enabled, check to see
> >>> + * how long we've run for. */
> >>> + if ( !tasklet_work_scheduled
> >>> + && prv->ratelimit_us
> >>> + && vcpu_runnable(current)
> >>> + && !is_idle_vcpu(current)
> >>> + && runtime < MICROSECS(prv->ratelimit_us) )
> >>> + {
> >>> + snext = scurr;
> >>> + snext->start_time += now;
> >>> + perfc_incr(delay_ms);
> >>> + tslice = MICROSECS(prv->ratelimit_us);
> >>> + ret.migrated = 0;
> >>> + goto out;
> >>> + }
> >>> + tslice = MILLISECS(prv->tslice_ms);
> >>> +
> >>> /*
> >>> * Select next runnable local VCPU (ie top of local runq)
> >>> */
> >>> @@ -1367,11 +1402,12 @@ csched_schedule(
> >>> if ( !is_idle_vcpu(snext->vcpu) )
> >>> snext->start_time += now;
> >>>
> >>> +out:
> >>> /*
> >>> * Return task to run next...
> >>> */
> >>> ret.time = (is_idle_vcpu(snext->vcpu) ?
> >>> - -1 : MILLISECS(prv->tslice_ms));
> >>> + -1 : tslice);
> >>> ret.task = snext->vcpu;
> >>>
> >>> CSCHED_VCPU_CHECK(ret.task);
> >>> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
> >>> prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
> >>> prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
> >>>
> >>> + if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms)
> >>> )
> >>> + {
> >>> + printk("WARNING: sched_ratelimit_us >"
> >>> + "sched_credit_tslice_ms is undefined\n"
> >>> + "Setting ratelimit_us to 1000 * tslice_ms\n");
> >>> + prv->ratelimit_us = 1000 * prv->tslice_ms;
> >>> + }
> >>> + else
> >>> + prv->ratelimit_us = sched_ratelimit_us;
> >>> return 0;
> >>> }
> >>>
> >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/schedule.c
> >>> --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000
> >>> +++ b/xen/common/schedule.c Mon Jan 09 05:21:35 2012 -0500
> >>> @@ -47,6 +47,11 @@ string_param("sched", opt_sched);
> >>> bool_t sched_smt_power_savings = 0;
> >>> boolean_param("sched_smt_power_savings", sched_smt_power_savings);
> >>>
> >>> +/* Default scheduling rate limit: 1ms
> >>> + * The behavior when sched_ratelimit_us is greater than
> >>> sched_credit_tslice_ms is undefined
> >>> + * */
> >>> +int sched_ratelimit_us = 1000;
> >>> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
> >>> /* Various timer handlers. */
> >>> static void s_timer_fn(void *unused);
> >>> static void vcpu_periodic_timer_fn(void *data);
> >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/perfc_defn.h
> >>> --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000
> >>> +++ b/xen/include/xen/perfc_defn.h Mon Jan 09 05:21:35 2012 -0500
> >>> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch
> >>> PERFCOUNTER(sched_run, "sched: runs through scheduler")
> >>> PERFCOUNTER(sched_ctx, "sched: context switches")
> >>>
> >>> +PERFCOUNTER(delay_ms, "csched: delay")
> >>> PERFCOUNTER(vcpu_check, "csched: vcpu_check")
> >>> PERFCOUNTER(schedule, "csched: schedule")
> >>> PERFCOUNTER(acct_run, "csched: acct_run")
> >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/sched-if.h
> >>> --- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000
> >>> +++ b/xen/include/xen/sched-if.h Mon Jan 09 05:21:35 2012 -0500
> >>> @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0;
> >>> /* cpus currently in no cpupool */
> >>> extern cpumask_t cpupool_free_cpus;
> >>>
> >>> +/* Scheduler generic parameters
> >>> + * */
> >>> +extern int sched_ratelimit_us;
> >>> +
> >>> +
> >>> /*
> >>> * In order to allow a scheduler to remap the lock->cpu mapping,
> >>> * we have a per-cpu pointer, along with a pre-allocated set of
> >>>
> >>> _______________________________________________
> >>> Xen-devel mailing list
> >>> Xen-devel@lists.xensource.com
> >>> http://lists.xensource.com/xen-devel
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2012-01-24 14:31 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-26 8:46 [PATCH] xen/sched_credit: Use delay to control scheduling frequency Hui Lv
2012-01-02 8:19 ` Jan Beulich
2012-01-05 4:08 ` Lv, Hui
2012-01-06 19:50 ` George Dunlap
2012-01-06 19:56 ` George Dunlap
2012-01-08 12:03 ` Lv, Hui
2012-01-09 9:49 ` George Dunlap
-- strict thread matches above, loose matches on Subject: below --
2012-01-09 10:22 Hui Lv
2012-01-10 10:05 ` George Dunlap
2012-01-24 12:01 ` George Dunlap
2012-01-24 14:17 ` Keir Fraser
2012-01-24 14:31 ` George Dunlap
2011-12-19 22:13 Hui Lv
2011-12-20 9:07 ` Ian Campbell
2011-12-20 9:44 ` Lv, Hui
2011-12-20 10:09 ` Dario Faggioli
2011-12-20 10:26 ` George Dunlap
2011-12-20 10:51 ` Dario Faggioli
2011-12-20 12:14 ` Lv, Hui
2011-12-20 10:21 ` George Dunlap
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).