* Re: Time slice for SCHED_BATCH (CFS)
From: Ingo Molnar @ 2009-02-11 12:40 UTC (permalink / raw)
To: J K Rai; +Cc: Peter Zijlstra, linux-kernel

* J K Rai <jk.anurag@yahoo.com> wrote:

> Thanks,
>
> I want to do profiling (e.g. on-chip cache related behavior of processes)
> from user-land and want to study the impact of time-slice and sampling
> interval on the quality of the profile. Hence I thought of finding out the
> time-slice.

btw., how do you do that profiling? How do you measure cache behavior?

	Ingo
* Re: Time slice for SCHED_BATCH (CFS)
From: Peter Zijlstra @ 2009-02-11 13:02 UTC (permalink / raw)
To: J K Rai; +Cc: linux-kernel, Ingo Molnar

On Wed, 2009-02-11 at 17:58 +0530, J K Rai wrote:
> Can we say that given n cpus and m processes the time-slice will
> remain constant under SCHED_BATCH or so?

Only if those processes remain running; if they get blocked for whatever
reason it'll change.

> Can we form some kind of relationship?

Sure,

  latency         := 20ms * (1 + log2(nr_cpus))
  min_granularity := 4ms  * (1 + log2(nr_cpus))
  nr_latency      := floor(latency / min_granularity)

           latency                        ; nr_running <= nr_latency
  period = {
           nr_running * min_granularity   ; nr_running >  nr_latency

  slice = task_weight * period / runqueue_weight

As you can see, it's a function of the number of cpus, as well as of all
other running tasks on a particular cpu. Load-balancing of course makes
this an even more interesting thing.
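As a quick illustration of the formulas above, here is a small userspace C
sketch (not kernel code). The nr_cpus, nr_running and weight values are
made-up examples, and the 20ms/4ms bases are the defaults quoted in the
mail; other kernel versions may use different constants.

/*
 * Userspace sketch of the slice arithmetic above -- illustrative only.
 * Build with: gcc slice.c -lm
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
	unsigned int nr_cpus = 4, nr_running = 10;
	double task_weight = 1024.0;              /* one nice-0 task */
	double rq_weight = nr_running * 1024.0;   /* sum of all task weights */

	double latency = 20.0 * (1.0 + log2(nr_cpus));          /* ms */
	double min_granularity = 4.0 * (1.0 + log2(nr_cpus));   /* ms */
	unsigned int nr_latency = (unsigned int)(latency / min_granularity);

	/* period: latency while there are few tasks, grows linearly beyond nr_latency */
	double period = nr_running <= nr_latency ?
			latency : nr_running * min_granularity;

	double slice = task_weight * period / rq_weight;

	printf("latency=%.0fms nr_latency=%u period=%.0fms slice=%.0fms\n",
	       latency, nr_latency, period, slice);
	return 0;
}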
* Re: Time slice for SCHED_BATCH (CFS)
From: Peter Zijlstra @ 2009-02-12 9:13 UTC (permalink / raw)
To: J K Rai; +Cc: Ingo Molnar, lkml

On Thu, 2009-02-12 at 11:17 +0530, J K Rai wrote:
> May I have a little more clarification on this:
>
>   latency         := 20ms * (1 + log2(nr_cpus))
>   min_granularity := 4ms  * (1 + log2(nr_cpus))
>   nr_latency      := floor(latency / min_granularity)
>
> 1) In the above, the 20ms and 4ms seem to be the default values of
> sched_latency_ns and sched_min_granularity_ns; that means if we change
> them through sysctl -w, we should use the changed values in the above
> relationship in place of 20ms and 4ms. Am I correct?

Yes, the sysctl setting replaces the whole expression, that is, including
the log2 cpu factor.

> 2) What exactly or tentatively do latency, min_granularity and
> nr_latency signify?

latency -- the desired scheduling latency of applications on low/medium
load machines (20ms is around the human-observable limit).

min_granularity -- since we let slices get smaller the more tasks there
are, in roughly latency/nr_running fashion, we want to avoid them getting
too small. min_granularity provides a lower bound.

nr_latency -- the cut-off point where we let go of the desired scheduling
latency and start growing the period linearly.

>            latency                        ; nr_running <= nr_latency
>   period = {
>            nr_running * min_granularity   ; nr_running >  nr_latency
>
>   slice = task_weight * period / runqueue_weight
>
> 3) Here in the above, what is meant by task_weight and runqueue_weight?

Since CFS is a proportional weight scheduler, each task is assigned a
relative weight. Two tasks with weight 1 will get similar amounts of cpu
time; with a weight ratio of 1:2 the former task will get half as much cpu
time as the latter. The runqueue weight is the sum of all task weights.
A small worked example follows below this mail.

> > Load-balancing of course makes this an even more interesting thing.
>
> 4) Can we say something more about the load-balancing effect on
> time-slice? How does load-balancing work at present? Is it by making the
> trees of equal height / number of elements?

Well, load balancing just moves tasks around trying to ensure the sum of
weights on each cpu is roughly equal; the slice calculation is done with
whatever is present on a single cpu.
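For concreteness, a tiny worked example of the 1:2 weight ratio described
above; the period value here is arbitrary, only the proportions matter.

/* slice = task_weight * period / runqueue_weight, for two tasks with a
 * 1:2 weight ratio -- illustrative values, not from any real runqueue. */
#include <stdio.h>

int main(void)
{
	double period = 60.0;               /* ms, from the period formula */
	double w_a = 1024.0, w_b = 2048.0;  /* weight ratio 1:2 */
	double rq_weight = w_a + w_b;       /* runqueue weight = sum of weights */

	printf("A: %.0f ms, B: %.0f ms\n",
	       w_a * period / rq_weight,    /* 20 ms */
	       w_b * period / rq_weight);   /* 40 ms */
	return 0;
}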
* Re: Time slice for SCHED_BATCH (CFS)
From: Peter Zijlstra @ 2009-02-12 11:04 UTC (permalink / raw)
To: J K Rai; +Cc: Ingo Molnar, lkml

On Thu, 2009-02-12 at 15:51 +0530, J K Rai wrote:
> Thanks a lot,

LKML etiquette prefers that you do not top-post, and that your email at
least contains a plain text copy -- thanks.

> Some more queries:
>
> 1) For a scenario where we can assume to have some 2*n running processes
> and n cpus, which settings should one change through sysctl -w to get
> almost constant and reasonably long (server class) slices? Should one
> change both sched_min_granularity_ns and sched_latency_ns? Is it OK to
> use SCHED_BATCH (through chrt), or will SCHED_OTHER (the default)
> suffice?

At that point each cpu ought to have 2 tasks, which is lower than the
default nr_latency, so you'll end up with 20ms*(1+log2(nr_cpus)) / 2
slices. Which is plenty long to qualify as server class imho.

> 2) May I know about a few more scheduler settings as shown below:
>
> sched_wakeup_granularity_ns

A measure of unfairness in order to achieve progress. CFS will schedule
the task that has received the least service; the wakeup granularity
governs wakeup-preemption and lets the current task fall short of being
left-most by up to that much without being preempted, so that it can make
some progress.

> sched_batch_wakeup_granularity_ns

This does not exist anymore, you must be running something ancient ;-)

> sched_features

Too much detail; it's a bitmask with each bit a 'feature'. It's basically
a set of things where we had to make a random choice in the implementation
and wanted a switch.

> sched_migration_cost

A measure of how expensive it is to move a task between cpus.

> sched_nr_migrate

A limit on the number of tasks it iterates over when load-balancing; this
is a latency thing.

> sched_rt_period_us
> sched_rt_runtime_us

The global bandwidth limit on RT tasks: they get runtime every period.

> sched_compat_yield

Some broken programs rely on implementation details of sched_yield() for
SCHED_OTHER -- POSIX doesn't define sched_yield() for anything but FIFO
(maybe RR), so any implementation is a good one :-)

> 3)
>
>   latency         := 20ms * (1 + log2(nr_cpus))
>   min_granularity := 4ms  * (1 + log2(nr_cpus))
>   nr_latency      := floor(latency / min_granularity)
>
> min_granularity -- since we let slices get smaller the more tasks there
> are, in roughly latency/nr_running fashion, we want to avoid them getting
> too small. min_granularity provides a lower bound.
>
>            latency                        ; nr_running <= nr_latency
>   period = {
>            nr_running * min_granularity   ; nr_running >  nr_latency
>
>   slice = task_weight * period / runqueue_weight
>
> 3) In the above schema, how are the task weights calculated? That
> calculation may cause the slices to get smaller, as you said, if I
> understand correctly.

Nice value is mapped to task weight:

/*
 * Nice levels are multiplicative, with a gentle 10% change for every
 * nice level changed. I.e. when a CPU-bound task goes from nice 0 to
 * nice 1, it will get ~10% less CPU time than another CPU-bound task
 * that remained on nice 0.
 *
 * The "10% effect" is relative and cumulative: from _any_ nice level,
 * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
 * it's +10% CPU usage. (To achieve that we use a multiplier of 1.25.
 * If a task goes up by ~10% and another task goes down by ~10% then
 * the relative distance between them is ~25%.)
 */
static const int prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};

Fixed point, 10 bits.
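The ~1.25 multiplier mentioned in the comment can be checked against the
table with a short userspace sketch. This is only an approximation: the
kernel uses the precomputed fixed-point table above rather than pow(), so
the values match only roughly.

/*
 * weight(nice) is roughly 1024 / 1.25^nice, per the comment above.
 * Build with: gcc weight.c -lm
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
	for (int nice = -5; nice <= 5; nice++)
		printf("nice %3d -> weight ~%.0f\n",
		       nice, 1024.0 / pow(1.25, nice));

	/* two CPU-bound tasks one nice level apart share the cpu roughly 55:45,
	 * i.e. the nicer task gets ~10% less cpu time */
	double w0 = 1024.0, w1 = 1024.0 / 1.25;
	printf("nice 0: %.1f%%  nice 1: %.1f%%\n",
	       100.0 * w0 / (w0 + w1), 100.0 * w1 / (w0 + w1));
	return 0;
}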