* Re: Time slice for SCHED_BATCH (CFS)
From: Ingo Molnar @ 2009-02-11 12:40 UTC (permalink / raw)
To: J K Rai; +Cc: Peter Zijlstra, linux-kernel

* J K Rai <jk.anurag@yahoo.com> wrote:

> Thanks,
>
> I want to do profiling (e.g. on-chip cache related behavior of processes)
> from user-land and want to study the impact of time-slice and sampling
> interval on the quality of the profile. Hence I thought of finding out the
> time-slice.

btw., how do you do that profiling? How do you measure cache behavior?

	Ingo
* Re: Time slice for SCHED_BATCH (CFS)
From: Peter Zijlstra @ 2009-02-11 13:02 UTC (permalink / raw)
To: J K Rai; +Cc: linux-kernel, Ingo Molnar

On Wed, 2009-02-11 at 17:58 +0530, J K Rai wrote:
> Can we say that given n cpus and m processes the time-slice will
> remain constant under SCHED_BATCH or so?

Only if those processes remain running; if they get blocked for whatever
reason it'll change.

> Can we form some kind of relationship?

Sure,

  latency         := 20ms * (1 + log2(nr_cpus))
  min_granularity := 4ms  * (1 + log2(nr_cpus))
  nr_latency      := floor(latency / min_granularity)

           latency                        ; nr_running <= nr_latency
  period = {
           nr_running * min_granularity   ; nr_running >  nr_latency

  slice = task_weight * period / runqueue_weight

As you can see, it's a function of the number of cpus, as well as of all
other running tasks on a particular cpu. Load-balancing of course makes
this an even more interesting thing.
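As a quick illustration of the formulas above, here is a small userspace C
sketch (not kernel code). The nr_cpus, nr_running and weight values are
made-up examples, and the 20ms/4ms bases are the defaults quoted in the
mail; other kernel versions may use different constants.

/*
 * Userspace sketch of the slice arithmetic above -- illustrative only.
 * Build with: gcc slice.c -lm
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
	unsigned int nr_cpus = 4, nr_running = 10;
	double task_weight = 1024.0;              /* one nice-0 task */
	double rq_weight = nr_running * 1024.0;   /* sum of all task weights */

	double latency = 20.0 * (1.0 + log2(nr_cpus));          /* ms */
	double min_granularity = 4.0 * (1.0 + log2(nr_cpus));   /* ms */
	unsigned int nr_latency = (unsigned int)(latency / min_granularity);

	/* period: latency while there are few tasks, grows linearly beyond nr_latency */
	double period = nr_running <= nr_latency ?
			latency : nr_running * min_granularity;

	double slice = task_weight * period / rq_weight;

	printf("latency=%.0fms nr_latency=%u period=%.0fms slice=%.0fms\n",
	       latency, nr_latency, period, slice);
	return 0;
}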
* Re: Time slice for SCHED_BATCH (CFS)
From: Peter Zijlstra @ 2009-02-12 9:13 UTC (permalink / raw)
To: J K Rai; +Cc: Ingo Molnar, lkml

On Thu, 2009-02-12 at 11:17 +0530, J K Rai wrote:
> May I have a little more clarification on this:
>
>   latency         := 20ms * (1 + log2(nr_cpus))
>   min_granularity := 4ms  * (1 + log2(nr_cpus))
>   nr_latency      := floor(latency / min_granularity)
>
> 1) In the above, the 20ms and 4ms seem to be the default values of
> sched_latency_ns and sched_min_granularity_ns; that means if we change
> them through sysctl -w, we should use the changed values in the above
> relationship in place of 20ms and 4ms. Am I correct?

Yes, the sysctl setting replaces the whole expression, that is, including
the log2 cpu factor.

> 2) What exactly or tentatively do latency, min_granularity and
> nr_latency signify?

latency -- the desired scheduling latency of applications on low/medium
load machines (20ms is around the human-observable limit).

min_granularity -- since we let slices get smaller the more tasks there
are, in roughly latency/nr_running fashion, we want to avoid them getting
too small. min_granularity provides a lower bound.

nr_latency -- the cut-off point where we let go of the desired scheduling
latency and start growing the period linearly.

>            latency                        ; nr_running <= nr_latency
>   period = {
>            nr_running * min_granularity   ; nr_running >  nr_latency
>
>   slice = task_weight * period / runqueue_weight
>
> 3) Here in the above, what is meant by task_weight and runqueue_weight?

Since CFS is a proportional weight scheduler, each task is assigned a
relative weight. Two tasks with weight 1 will get similar amounts of cpu
time; with a weight ratio of 1:2 the former task will get half as much cpu
time as the latter. The runqueue weight is the sum of all task weights.
A small worked example follows below this mail.

> > Load-balancing of course makes this an even more interesting thing.
>
> 4) Can we say something more about the load-balancing effect on
> time-slice? How does load-balancing work at present? Is it by making the
> trees of equal height / number of elements?

Well, load balancing just moves tasks around trying to ensure the sum of
weights on each cpu is roughly equal; the slice calculation is done with
whatever is present on a single cpu.
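For concreteness, a tiny worked example of the 1:2 weight ratio described
above; the period value here is arbitrary, only the proportions matter.

/* slice = task_weight * period / runqueue_weight, for two tasks with a
 * 1:2 weight ratio -- illustrative values, not from any real runqueue. */
#include <stdio.h>

int main(void)
{
	double period = 60.0;               /* ms, from the period formula */
	double w_a = 1024.0, w_b = 2048.0;  /* weight ratio 1:2 */
	double rq_weight = w_a + w_b;       /* runqueue weight = sum of weights */

	printf("A: %.0f ms, B: %.0f ms\n",
	       w_a * period / rq_weight,    /* 20 ms */
	       w_b * period / rq_weight);   /* 40 ms */
	return 0;
}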
* Re: Time slice for SCHED_BATCH (CFS)
From: Peter Zijlstra @ 2009-02-12 11:04 UTC (permalink / raw)
To: J K Rai; +Cc: Ingo Molnar, lkml

On Thu, 2009-02-12 at 15:51 +0530, J K Rai wrote:
> Thanks a lot,

LKML etiquette prefers that you do not top-post, and that your email at
least contains a plain text copy -- thanks.

> Some more queries:
>
> 1) For a scenario where we can assume to have some 2*n running processes
> and n cpus, which settings should one change through sysctl -w to get
> almost constant and reasonably long (server class) slices? Should one
> change both sched_min_granularity_ns and sched_latency_ns? Is it OK to
> use SCHED_BATCH (through chrt), or will SCHED_OTHER (the default)
> suffice?

At that point each cpu ought to have 2 tasks, which is lower than the
default nr_latency, so you'll end up with 20ms*(1+log2(nr_cpus)) / 2
slices. Which is plenty long to qualify as server class imho.

> 2) May I know about a few more scheduler settings as shown below:
>
> sched_wakeup_granularity_ns

A measure of unfairness in order to achieve progress. CFS will schedule
the task that has received the least service; the wakeup granularity
governs wakeup-preemption and lets the current task fall short of being
left-most by up to that much without being preempted, so that it can make
some progress.

> sched_batch_wakeup_granularity_ns

This does not exist anymore, you must be running something ancient ;-)

> sched_features

Too much detail; it's a bitmask with each bit a 'feature'. It's basically
a set of things where we had to make a random choice in the implementation
and wanted a switch.

> sched_migration_cost

A measure of how expensive it is to move a task between cpus.

> sched_nr_migrate

A limit on the number of tasks it iterates over when load-balancing; this
is a latency thing.

> sched_rt_period_us
> sched_rt_runtime_us

The global bandwidth limit on RT tasks: they get runtime every period.

> sched_compat_yield

Some broken programs rely on implementation details of sched_yield() for
SCHED_OTHER -- POSIX doesn't define sched_yield() for anything but FIFO
(maybe RR), so any implementation is a good one :-)

> 3)
>
>   latency         := 20ms * (1 + log2(nr_cpus))
>   min_granularity := 4ms  * (1 + log2(nr_cpus))
>   nr_latency      := floor(latency / min_granularity)
>
> min_granularity -- since we let slices get smaller the more tasks there
> are, in roughly latency/nr_running fashion, we want to avoid them getting
> too small. min_granularity provides a lower bound.
>
>            latency                        ; nr_running <= nr_latency
>   period = {
>            nr_running * min_granularity   ; nr_running >  nr_latency
>
>   slice = task_weight * period / runqueue_weight
>
> 3) In the above schema, how are the task weights calculated? That
> calculation may cause the slices to get smaller, as you said, if I
> understand correctly.

Nice value is mapped to task weight:

/*
 * Nice levels are multiplicative, with a gentle 10% change for every
 * nice level changed. I.e. when a CPU-bound task goes from nice 0 to
 * nice 1, it will get ~10% less CPU time than another CPU-bound task
 * that remained on nice 0.
 *
 * The "10% effect" is relative and cumulative: from _any_ nice level,
 * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
 * it's +10% CPU usage. (To achieve that we use a multiplier of 1.25.
 * If a task goes up by ~10% and another task goes down by ~10% then
 * the relative distance between them is ~25%.)
 */
static const int prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};

Fixed point, 10 bits.
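The ~1.25 multiplier mentioned in the comment can be checked against the
table with a short userspace sketch. This is only an approximation: the
kernel uses the precomputed fixed-point table above rather than pow(), so
the values match only roughly.

/*
 * weight(nice) is roughly 1024 / 1.25^nice, per the comment above.
 * Build with: gcc weight.c -lm
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
	for (int nice = -5; nice <= 5; nice++)
		printf("nice %3d -> weight ~%.0f\n",
		       nice, 1024.0 / pow(1.25, nice));

	/* two CPU-bound tasks one nice level apart share the cpu roughly 55:45,
	 * i.e. the nicer task gets ~10% less cpu time */
	double w0 = 1024.0, w1 = 1024.0 / 1.25;
	printf("nice 0: %.1f%%  nice 1: %.1f%%\n",
	       100.0 * w0 / (w0 + w1), 100.0 * w1 / (w0 + w1));
	return 0;
}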