From: Raistlin <raistlin@linux.it>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	michael trimarchi <michael@evidence.eu.com>,
	Fabio Checconi <fabio@gandalf.sssup.it>,
	Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
	Dhaval Giani <dhaval.giani@gmail.com>,
	Johan Eker <johan.eker@ericsson.com>,
	"p.faure" <p.faure@akatech.ch>,
	Chris Friesen <cfriesen@nortel.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Henrik Austad <henrik@austad.us>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Darren Hart <darren@dvhart.com>,
	Sven-Thorsten Dietrich <sven@thebigcorporation.com>,
	Bjoern Brandenburg <bbb@cs.unc.edu>,
	Tommaso Cucinotta <tommaso.cucinotta@sssup.it>,
	"giuseppe.lipari" <giuseppe.lipari@sssup.it>,
	Juri Lelli <juri.lelli@gmail.com>
Subject: [RFC 11/12][PATCH] SCHED_DEADLINE: documentation
Date: Fri, 16 Oct 2009 17:47:23 +0200
Message-ID: <1255708043.6228.467.camel@Palantir>
In-Reply-To: <1255707324.6228.448.camel@Palantir>

This commit adds some more documentation and comments on how the new
scheduling policy works.

Signed-off-by: Raistlin <raistlin@linux.it>
---
 Documentation/scheduler/sched-deadline.txt |  174 ++++++++++++++++++++++++++++
 include/linux/sched.h                      |   45 +++++++
 init/Kconfig                               |    1 +
 3 files changed, 220 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/scheduler/sched-deadline.txt
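
As a reading aid for the group limits listed in section 2.3 of the new
document, the bandwidth check can be sketched in plain user-space C. This is
only an illustration, not the code from this series: struct dl_bw_sketch,
to_ratio_sketch() and group_bw_fits() are made-up names, and the 2^20
fixed-point scale is an assumption chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one runtime/period reservation, in microseconds. */
struct dl_bw_sketch {
	uint64_t runtime_us;
	uint64_t period_us;
};

/* Assumed fixed-point scale: bandwidth is stored as (runtime/period) * 2^20. */
#define BW_SHIFT_SKETCH 20

/* Convert a runtime/period pair into a fixed-point bandwidth (rounds down). */
static uint64_t to_ratio_sketch(uint64_t period_us, uint64_t runtime_us)
{
	assert(period_us > 0);
	return (runtime_us << BW_SHIFT_SKETCH) / period_us;
}

/*
 * Check the limit
 *     \Sum_{i} runtime_{i} / period_{i} <= runtime_{j} / period_{j}
 * for n children {i} of a parent group {j}.
 */
static bool group_bw_fits(const struct dl_bw_sketch *parent,
			  const struct dl_bw_sketch *children, size_t n)
{
	uint64_t cap = to_ratio_sketch(parent->period_us, parent->runtime_us);
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += to_ratio_sketch(children[i].period_us,
				       children[i].runtime_us);

	return sum <= cap;
}
```

With a parent reserving 500000us every 1000000us, children of 100000us and
200000us per 1000000us fit, while adding a third child of 300000us does not.
Each ratio is rounded down, so a real check would have to decide how to treat
sums that land exactly on the cap.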

diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
new file mode 100644
index 0000000..cadfa9f
--- /dev/null
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -0,0 +1,174 @@
+			Deadline Task and Group Scheduling
+			----------------------------------
+
+CONTENTS
+========
+
+0. WARNING
+1. Overview
+  1.1 Task scheduling
+  1.2 Group scheduling
+2. The interface
+  2.1 System-wide settings
+  2.2 Default behavior
+  2.3 Basis for grouping tasks
+3. Future plans
+
+
+0. WARNING
+==========
+
+ Fiddling with these settings can result in unpredictable or even unstable
+ system behavior. As for -rt (group) scheduling, it is assumed that root
+ knows what he is doing.
+
+
+1. Overview
+===========
+
+The SCHED_DEADLINE scheduling class implements the Earliest Deadline First
+(EDF) algorithm and uses the Constant Bandwidth Server (CBS) to provide
+bandwidth isolation among tasks.
+The implementation is aligned with the current mainstream kernel, and it
+relies on standard Linux mechanisms (e.g., control groups) to natively support
+multicore platforms and to provide hierarchical scheduling through a standard
+API.
+
+
+1.1 Task scheduling
+-------------------
+
+The SCHED_DEADLINE scheduling class does not make any restrictive assumption
+on the characteristics of the tasks, thus it can handle:
+ * periodic tasks, typical in real-time and control applications;
+ * sporadic tasks, typical in soft real-time and multimedia applications;
+ * aperiodic tasks.
+
+This is mainly because temporal isolation is ensured: the temporal behavior
+of each task (i.e., its ability to meet deadlines) is not affected by what
+happens in any other task in the system.
+In other words, even if a task misbehaves, it cannot consume more execution
+time than the amount that has been reserved for it.
+
+In fact, each task is assigned a ``scheduling budget'' (sched_runtime) and a
+``scheduling deadline'' (sched_deadline, also called period in this branch
+of the real-time literature).
+This means the task is guaranteed to execute for an amount of time equal to
+sched_runtime every sched_deadline, i.e., to utilize at most a CPU bandwidth
+equal to sched_runtime/sched_deadline.
+If it tries to execute more than its sched_runtime it is slowed down, by
+stopping it until the time instant of its next deadline.
+
+However, although this algorithm (i.e., the CBS) is effective for encapsulating
+aperiodic or sporadic --real-time or non real-time-- tasks in a real-time
+EDF scheduled system, it imposes some overhead on ``standard'' periodic tasks.
+Therefore, we let periodic tasks specify that they are going
+to sleep, waiting for the next activation, because a periodic instance just
+ended. This prevents them (provided they behave well!) from being disturbed by
+the CBS bandwidth management logic.
+
+
+1.2 Group scheduling
+--------------------
+
+The scheduling class is integrated with the control groups mechanism in order
+to allow the creation of groups of tasks with a cap on their total utilization.
+
+However, groups play no role in the on-line scheduling decisions. This is
+different from how group scheduling works for the -rt scheduling class, and
+the difference comes from the fact that -deadline tasks _already_ have their
+own bandwidth, which is not true for standard POSIX SCHED_FIFO or SCHED_RR
+processes and threads.
+
+Therefore, there is no need for a fully hierarchical runqueue implementation,
+hierarchical runtime accounting, etc., which results in simpler code and
+smaller overhead.
+All we do is perform bandwidth ``consistency checks'' whenever one of the
+following events occurs:
+ * a -deadline task is created or moved inside a group,
+ * the parameters of a -deadline task (if inside a group) are modified,
+ * the -deadline related parameters of a group are modified.
+
+The purpose of these checks is to ensure that the cumulative utilization of
+tasks and groups does not exceed that of their containing group (see below).
+
+
+2. The interface
+================
+
+
+2.1 System-wide settings
+------------------------
+
+The system-wide settings are configured under the /proc virtual file system:
+
+/proc/sys/kernel/sched_deadline_period_us:
+  The scheduling period that is equivalent to 100% CPU bandwidth
+
+/proc/sys/kernel/sched_deadline_runtime_us:
+  A global limit on how much time -deadline scheduling may use. Even without
+  CONFIG_DEADLINE_GROUP_SCHED enabled, this will limit time reserved to
+  -deadline processes. With CONFIG_DEADLINE_GROUP_SCHED it signifies the
+  total bandwidth available to all -deadline groups.
+
+  * Time is specified in us because the interface is s32. This gives an
+    operating range from 1us to about 35 minutes;
+  * sched_deadline_period_us takes values from 1 to INT_MAX;
+  * sched_deadline_runtime_us takes values from 1 to INT_MAX;
+  * setting runtime = period specifies 100% bandwidth exploitable by
+    -deadline tasks;
+  * setting runtime > period allows for more than 100% bandwidth
+    exploitable by -deadline tasks, which still might make sense,
+    especially in SMP systems.
+
+
+2.2 Default behavior
+--------------------
+
+The default values for sched_deadline_period_us and
+sched_deadline_runtime_us are 0.  This means no -deadline tasks or
+groups can be created!
+
+Consistently, bandwidth assigned to the root group, and to each newly created
+group, is 0 as well.
+
+
+2.3 Basis for grouping tasks
+----------------------------
+
+There are two compile-time settings for allocating CPU bandwidth. These are
+configured using the "Basis for grouping tasks" multiple choice menu under
+General setup > Group CPU Scheduler:
+
+CONFIG_USER_SCHED (aka "Basis for grouping tasks" = "user id")
+
+This, for now, is not supported for deadline group scheduling.
+
+CONFIG_CGROUP_SCHED (aka "Basis for grouping tasks" = "Control groups")
+
+This uses the /cgroup virtual file system, i.e.:
+ * /cgroup/<cgroup>/cpu.deadline_runtime_us and
+ * /cgroup/<cgroup>/cpu.deadline_period_us,
+to control the CPU time reserved for each control group.
+
+For more information on working with control groups, you should read
+Documentation/cgroups/cgroups.txt as well.
+
+Group settings are checked against the following limits:
+
+ * for the root group {r}
+     runtime_{r} / period_{r} <= global_runtime / global_period
+ * for each group {i}, subgroup of group {j}
+     \Sum_{i} runtime_{i} / period_{i} <= runtime_{j} / period_{j}
+
+
+3. Future plans
+===============
+
+Only two, but very important, pieces are missing:
+
+ * SMP/multicore global scheduling through push and pull logic (as in
+   -rt). This is not finished, but is on its way, and will come very soon!
+ * Deadline/BandWidth Inheritance and/or Proxy Execution mechanisms for the
+   rt_mutexes. This probably needs some more discussion, and also some more
+   time to be implemented!
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4de72eb..ec0324f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -95,6 +95,51 @@ struct sched_param {
 
 #include <asm/processor.h>
 
+/*
+ * Extended sched_param for SCHED_DEADLINE tasks.
+ *
+ * In fact, struct sched_param cannot be modified, for binary compatibility
+ * reasons.
+ *
+ * A SCHED_DEADLINE task has at least a scheduling deadline (sched_deadline)
+ * and a scheduling runtime (sched_runtime). Space for a scheduling
+ * period (sched_period) is reserved, but the field is not used right now.
+ *
+ * When a SCHED_DEADLINE task activates at time t, its absolute deadline is
+ * computed as:
+ *	deadline = t + sched_deadline.
+ * The SCHED_DEADLINE runqueue is ordered according to ascending tasks'
+ * deadline values, thus the task with the _earliest_ deadline is the one
+ * that will be scheduled.
+ *
+ * In order to prevent one task from interfering with the others, each
+ * task activation is allowed to run for its runtime, which is at most
+ * sched_runtime.
+ * After that, the task is stopped until its deadline, when it is reactivated
+ * with a new 'runtime quota' and a new deadline.
+ *
+ * Period (or minimum interarrival time) is not dealt with in the kernel, and
+ * it is up to the user to make the task suspend at the end of each instance.
+ * The sched_wait_interval() syscall --with clock_nanosleep-like semantics--
+ * can be used for this purpose. In this case, when the task resumes, the
+ * scheduler assumes a new instance is just starting, and provides the task
+ * with new runtime and deadline values.
+ *
+ * Scheduling flags, finally, let the user specify whether runtime overruns
+ * (which may occur, e.g., because of timing resolution issues) and/or deadline
+ * misses (e.g., because the system is oversubscribed) have to be notified by
+ * means of SIGXCPU signals.
+ *
+ * @sched_priority:	not used right now
+ *
+ * @sched_deadline:	scheduling deadline of the task
+ * @sched_runtime:	scheduling runtime of the task
+ * @sched_period:	not used right now
+ *
+ * @sched_flags:	scheduling flags of the task (runtime overrun and/or
+ *			deadline miss only, for now)
+ */
+
 #define SCHED_SIG_RORUN		0x80000000
 #define SCHED_SIG_DMISS		0x40000000
 
diff --git a/init/Kconfig b/init/Kconfig
index 17318ca..d4a52b7 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -467,6 +467,7 @@ config DEADLINE_GROUP_SCHED
 	  tasks (and other groups) can be added to it only up to such
 	  ``bandwidth cap'', which might be useful for avoiding or
 	  controlling oversubscription.
+	  See Documentation/scheduler/sched-deadline.txt for more.
 
 choice
 	depends on GROUP_SCHED
-- 
1.6.0.4
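
For readers of the sched.h comment above, here is a rough user-space sketch of
the per-instance bookkeeping it describes: absolute deadline computed at
activation, budget consumption, stop until the deadline, then reactivation
with fresh parameters. It is not taken from the patches; struct dl_task_sketch
and the dl_*() helpers are hypothetical names, times are assumed to be in
nanoseconds, and restarting the new instance from the old deadline is one
reading of the comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative per-task state; field names are assumptions, not kernel code. */
struct dl_task_sketch {
	uint64_t sched_runtime;   /* budget granted to each instance */
	uint64_t sched_deadline;  /* relative deadline */
	uint64_t runtime_left;    /* budget still available to this instance */
	uint64_t abs_deadline;    /* absolute deadline of this instance */
	bool throttled;           /* stopped, waiting for its deadline */
};

/* Activation at time now: deadline = now + sched_deadline, full budget. */
static void dl_activate(struct dl_task_sketch *t, uint64_t now)
{
	assert(t->sched_deadline > 0);  /* sanity check added for the sketch */
	t->runtime_left = t->sched_runtime;
	t->abs_deadline = now + t->sched_deadline;
	t->throttled = false;
}

/* Charge delta units of execution; stop the task once the budget is gone. */
static void dl_account(struct dl_task_sketch *t, uint64_t delta)
{
	if (delta >= t->runtime_left) {
		t->runtime_left = 0;
		t->throttled = true;
	} else {
		t->runtime_left -= delta;
	}
}

/* Once the deadline has passed, a stopped task gets a new runtime and deadline. */
static void dl_tick(struct dl_task_sketch *t, uint64_t now)
{
	if (t->throttled && now >= t->abs_deadline)
		dl_activate(t, t->abs_deadline);
}

/* EDF choice between two runnable tasks: the earliest absolute deadline wins. */
static const struct dl_task_sketch *dl_pick(const struct dl_task_sketch *a,
					    const struct dl_task_sketch *b)
{
	return a->abs_deadline <= b->abs_deadline ? a : b;
}
```

A task with a 10ms runtime and a 100ms deadline that is activated at time 0
and runs for 10ms stays stopped until time 100ms, when it gets a fresh 10ms
budget and an absolute deadline of 200ms.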


-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa  (Italy)

http://blog.linux.it/raistlin / raistlin@ekiga.net /
dario.faggioli@jabber.org
