All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Pranith Kumar D <pranith-kumar_d@mentor.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] CFS: sched-design-CFS.txt - ambiguity about leftmost
Date: Tue, 22 May 2007 10:30:51 +0200	[thread overview]
Message-ID: <20070522083051.GA9699@elte.hu> (raw)
In-Reply-To: <465292B0.8040604@mentorg.com>


* Pranith Kumar D <pranith-kumar_d@mentor.com> wrote:

> Hello,
> I felt the description of the leftmost task a bit ambiguous. Is it the 
> leftmost task in the rbtree?

yeah, the leftmost task in the rbtree.

> or did u mean the "most leftout task" in the task list? If it is so 
> then this patch should correct the leftmost task as "most leftout 
> task". NACK it if I'm wrong. Just trying to help. :)

feel free to make it less ambigious. FYI, i recently changed the text 
(not released yet, change attached below), if you change it then please 
send a patch against this version. Thanks,

	Ingo

Index: linux/Documentation/sched-design-CFS.txt
===================================================================
--- linux.orig/Documentation/sched-design-CFS.txt
+++ linux/Documentation/sched-design-CFS.txt
@@ -1,20 +1,59 @@
-[announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
 
-i'm pleased to announce the first release of the "Modular Scheduler Core
-and Completely Fair Scheduler [CFS]" patchset:
+this is the CFS scheduler.
 
-   http://redhat.com/~mingo/cfs-scheduler/
+80% of CFS's design can be summed up in a single sentence: CFS basically
+models an "ideal, precise multi-tasking CPU" on real hardware.
 
-This project is a complete rewrite of the Linux task scheduler. My goal
-is to address various feature requests and to fix deficiencies in the
-vanilla scheduler that were suggested/found in the past few years, both
-for desktop scheduling and for server scheduling workloads.
+"Ideal multi-tasking CPU" is a (non-existent :-) CPU that has 100%
+physical power and which can run each task at precise equal speed, in
+parallel, each at 1/nr_running speed. For example: if there are 2 tasks
+running then it runs each at 50% physical power - totally in parallel.
+
+On real hardware, we can run only a single task at once, so while that
+one task runs the other tasks that are waiting for the CPU are at a
+disadvantage - the current task gets an unfair amount of CPU time. In
+CFS this fairness imbalance is expressed and tracked via the per-task
+p->wait_runtime (nanosec-unit) value. "wait_runtime" is the amount of
+time the task should now run on the CPU for it become completely fair
+and balanced.
+
+( small detail: on 'ideal' hardware, the p->wait_runtime value would
+  always be zero - no task would ever get 'out of balance' from the
+  'ideal' share of CPU time. )
+
+CFS's task picking logic is based on this p->wait_runtime value and it
+is thus very simple: it always tries to run the task with the largest
+p->wait_runtime value. In other words, CFS tries to run the task with
+the 'gravest need' for more CPU time. So CFS always tries to split up
+CPU time between runnable tasks as close to 'ideal multitasking
+hardware' as possible.
+
+Most of the rest of CFS's design just falls out of this really simple
+concept, with a few add-on embellishments like nice levels,
+multiprocessing and various algorithm variants to recognize sleepers.
+
+In practice it works like this: the system runs a task a bit, and when
+the task schedules (or a scheduler tick happens) the task's CPU usage is
+'accounted for': the (small) time it just spent using the physical CPU
+is deducted from p->wait_runtime. [minus the 'fair share' it would have
+gotten anyway]. Once p->wait_runtime gets low enough so that another
+task becomes the 'leftmost task' (plus a small amount of 'granularity'
+distance relative to the leftmost task so that we do not over-schedule
+tasks and trash the cache) then the new leftmost task is picked and the
+current task is preempted.
+
+The rq->fair_clock value tracks the 'CPU time a runnable task would have
+fairly gotten, had it been runnable during that time'. So by using
+rq->fair_clock values we can accurately timestamp and measure the
+'expected CPU time' a task should have gotten. All runnable tasks are
+sorted in the rbtree by the "rq->fair_clock - p->wait_runtime" key, and
+CFS picks the 'leftmost' task and sticks to it. As the system progresses
+forwards, newly woken tasks are put into the tree more and more to the
+right - slowly but surely giving a chance for every task to become the
+'leftmost task' and thus get on the CPU within a deterministic amount of
+time.
 
-[ QuickStart: apply the patch, recompile, reboot. The new scheduler
-  will be active by default and all tasks will default to the
-  SCHED_NORMAL interactive scheduling class. ]
-
-Highlights are:
+Some implementation details:
 
  - the introduction of Scheduling Classes: an extensible hierarchy of
    scheduler modules. These modules encapsulate scheduling policy
@@ -78,30 +117,3 @@ Highlights are:
    iterators of the scheduling modules are used. The balancing code got
    quite a bit simpler as a result.
 
-the core scheduler got smaller by more than 700 lines:
-
- kernel/sched.c | 1454 ++++++++++++++++------------------------------------------------
- 1 file changed, 372 insertions(+), 1082 deletions(-)
-
-and even adding all the scheduling modules, the total size impact is
-relatively small:
-
- 18 files changed, 1454 insertions(+), 1133 deletions(-)
-
-most of the increase is due to extensive comments. The kernel size
-impact is in fact a small negative:
-
-   text    data     bss     dec     hex filename
-  23366    4001      24   27391    6aff kernel/sched.o.vanilla
-  24159    2705      56   26920    6928 kernel/sched.o.CFS
-
-(this is mainly due to the benefit of getting rid of the expired array
-and its data structure overhead.)
-
-thanks go to Thomas Gleixner and Arjan van de Ven for review of this
-patchset.
-
-as usual, any sort of feedback, bugreports, fixes and suggestions are
-more than welcome,
-
-	Ingo

  reply	other threads:[~2007-05-22  8:31 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-22  6:50 [PATCH] CFS: sched-design-CFS.txt - ambiguity about leftmost Pranith Kumar D
2007-05-22  8:30 ` Ingo Molnar [this message]
2007-05-22  9:28   ` [PATCH] CFS: sched-design-CFS.txt - ambiguity about leftmost and some formatting Pranith Kumar D
2007-05-22 10:14     ` Ingo Molnar
     [not found]       ` <4652C379.3050801@mentorg.com>
     [not found]         ` <20070522102430.GA2344@elte.hu>
2007-05-22 10:44           ` Pranith Kumar D
2007-05-22 10:48       ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070522083051.GA9699@elte.hu \
    --to=mingo@elte.hu \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pranith-kumar_d@mentor.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.