linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Henrik Austad <henrik@austad.us>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
	Dario Faggioli <raistlin@linux.it>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	rostedt@goodmis.org, Oleg Nesterov <oleg@redhat.com>,
	fweisbec@gmail.com, darren@dvhart.com, johan.eker@ericsson.com,
	p.faure@akatech.ch, Linux Kernel <linux-kernel@vger.kernel.org>,
	claudio@evidence.eu.com, michael@amarulasolutions.com,
	fchecconi@gmail.com, tommaso.cucinotta@sssup.it,
	juri.lelli@gmail.com, nicola.manica@disi.unitn.it,
	luca.abeni@unitn.it, dhaval.giani@gmail.com, hgu1972@gmail.com,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	insop.song@gmail.com, liming.wang@windriver.com,
	jkacur@redhat.com, linux-man@vger.kernel.org
Subject: Re: sched_{set,get}attr() manpage
Date: Wed, 9 Apr 2014 17:19:11 +0200	[thread overview]
Message-ID: <20140409151911.GA4041@austad.us> (raw)
In-Reply-To: <20140409092510.GQ11096@twins.programming.kicks-ass.net>

On Wed, Apr 09, 2014 at 11:25:10AM +0200, Peter Zijlstra wrote:
> On Mon, Feb 17, 2014 at 02:20:29PM +0100, Michael Kerrisk (man-pages) wrote:
> > If your could take another pass though your existing text, to incorporate the
> > new flags stuff, and then send a page to me + linux-man@
> > that would be great.
> 
> 
> Sorry, this slipped my mind. An updated version below. Heavy borrowing
> from SCHED_SETSCHEDULER(2) as before.
> 
> ---
> 
> NAME
> 	sched_setattr, sched_getattr - set and get scheduling policy/attributes
> 
> SYNOPSIS
> 	#include <sched.h>
> 
> 	struct sched_attr {
> 		u32 size;
> 		u32 sched_policy;
> 		u64 sched_flags;
> 
> 		/* SCHED_NORMAL, SCHED_BATCH */
> 		s32 sched_nice;
> 		/* SCHED_FIFO, SCHED_RR */
> 		u32 sched_priority;
> 		/* SCHED_DEADLINE */
> 		u64 sched_runtime;
> 		u64 sched_deadline;
> 		u64 sched_period;
> 	};
> 	int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);
> 
> 	int sched_getattr(pid_t pid, const struct sched_attr *attr, unsigned int size, unsigned int flags);
> 
> DESCRIPTION
> 	sched_setattr() sets both the scheduling policy and the
> 	associated attributes for the process whose ID is specified in
> 	pid.  If pid equals zero, the scheduling policy and attributes
> 	of the calling process will be set.  The interpretation of the
> 	argument attr depends on the selected policy.  Currently, Linux
> 	supports the following "normal" (i.e., non-real-time) scheduling
> 	policies:
> 
> 	SCHED_OTHER	the standard "fair" time-sharing policy;
> 
> 	SCHED_BATCH	for "batch" style execution of processes; and
> 
> 	SCHED_IDLE	for running very low priority background jobs.
> 
> 	The following "real-time" policies are also supported, for

why the "'s?

> 	special time-critical applications that need precise control
> 	over the way in which runnable processes are selected for
> 	execution:
> 
> 	SCHED_FIFO	a first-in, first-out policy;
> 
> 	SCHED_RR	a round-robin policy; and
> 
> 	SCHED_DEADLINE	a deadline policy.
> 
> 	The semantics of each of these policies are detailed below.
> 
> 	sched_attr::size must be set to the size of the structure, as in
> 	sizeof(struct sched_attr), if the provided structure is smaller
> 	than the kernel structure, any additional fields are assumed
> 	'0'. If the provided structure is larger than the kernel
> 	structure, the kernel verifies all additional fields are '0' if
> 	not the syscall will fail with -E2BIG.
> 
> 	sched_attr::sched_policy the desired scheduling policy.
> 
> 	sched_attr::sched_flags additional flags that can influence
> 	scheduling behaviour. Currently as per Linux kernel 3.14:
> 
> 		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
> 		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
> 		on fork().
> 
> 	is the only supported flag.
> 
> 	sched_attr::sched_nice should only be set for SCHED_OTHER,
> 	SCHED_BATCH, the desired nice value [-20,19], see NICE(2).
> 
> 	sched_attr::sched_priority should only be set for SCHED_FIFO,
> 	SCHED_RR, the desired static priority [1,99].
> 
> 	sched_attr::sched_runtime
> 	sched_attr::sched_deadline
> 	sched_attr::sched_period should only be set for SCHED_DEADLINE
> 	and are the traditional sporadic task model parameters.
> 
> 	The flags argument should be 0.
> 
> 	sched_getattr() queries the scheduling policy currently applied
> 	to the process identified by pid.  If pid equals zero, the
> 	policy of the calling process will be retrieved.
> 
> 	The size argument should reflect the size of struct sched_attr
> 	as known to userspace. The kernel fills out sched_attr::size to
> 	the size of its sched_attr structure. If the user provided
> 	structure is larger, additional fields are not touched. If the
> 	user provided structure is smaller, but the kernel needs to
> 	return values outside the provided space, the syscall will fail
> 	with -E2BIG.
> 
> 	The flags argument should be 0.

What about SCHED_FLAG_RESET_ON_FOR?

> 	The other sched_attr fields are filled out as described in
> 	sched_setattr().
> 
>    Scheduling Policies
>        The  scheduler  is  the  kernel  component  that decides which runnable
>        process will be executed by the CPU next.  Each process has an  associ‐
>        ated  scheduling  policy and a static scheduling priority, sched_prior‐
>        ity; these are the settings that are modified by  sched_setscheduler().
>        The  scheduler  makes it decisions based on knowledge of the scheduling
>        policy and static priority of all processes on the system.

Isn't this last sentence redundant/sliglhtly repetitive?

>        For processes scheduled under one of  the  normal  scheduling  policies
>        (SCHED_OTHER,  SCHED_IDLE,  SCHED_BATCH), sched_priority is not used in
>        scheduling decisions (it must be specified as 0).
> 
>        Processes scheduled under one of the  real-time  policies  (SCHED_FIFO,
>        SCHED_RR)  have  a  sched_priority  value  in  the  range 1 (low) to 99
>        (high).  (As the numbers imply, real-time processes always have  higher
>        priority than normal processes.)  Note well: POSIX.1-2001 only requires
>        an implementation to support a minimum 32 distinct priority levels  for
>        the  real-time  policies,  and  some  systems supply just this minimum.
>        Portable   programs   should    use    sched_get_priority_min(2)    and
>        sched_get_priority_max(2) to find the range of priorities supported for
>        a particular policy.
> 
>        Conceptually, the scheduler maintains a list of runnable processes  for
>        each  possible  sched_priority  value.   In  order  to  determine which
>        process runs next, the scheduler looks for the nonempty list  with  the
>        highest  static  priority  and  selects the process at the head of this
>        list.
> 
>        A process's scheduling policy determines where it will be inserted into
>        the  list  of processes with equal static priority and how it will move
>        inside this list.
> 
>        All scheduling is preemptive: if a process with a higher static  prior‐
>        ity  becomes  ready  to run, the currently running process will be pre‐
>        empted and returned to the wait list for  its  static  priority  level.
>        The  scheduling  policy only determines the ordering within the list of
>        runnable processes with equal static priority.
> 
>     SCHED_DEADLINE: Sporadic task model deadline scheduling
> 	SCHED_DEADLINE is an implementation of GEDF (Global Earliest
> 	Deadline First) with additional CBS (Constant Bandwidth Server).
> 	The CBS guarantees that tasks that over-run their specified
> 	budget are throttled and do not affect the correct performance
> 	of other SCHED_DEADLINE tasks.
> 
> 	SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN
> 
> 	Setting SCHED_DEADLINE can fail with -EINVAL when admission
> 	control tests fail.

Perhaps add a note about the deadline-class having higher priority than the 
other classes; i.e. if a deadline-task is runnable, it will preempt any 
other SCHED_(RR|FIFO) regardless of priority?

>    SCHED_FIFO: First In-First Out scheduling
>        SCHED_FIFO can only be used with static priorities higher than 0, which
>        means that when a SCHED_FIFO processes becomes runnable, it will always
>        immediately preempt any currently running SCHED_OTHER, SCHED_BATCH,  or
>        SCHED_IDLE  process.  SCHED_FIFO is a simple scheduling algorithm with‐
>        out time slicing.  For processes scheduled under the SCHED_FIFO policy,
>        the following rules apply:
> 
>        *  A  SCHED_FIFO  process that has been preempted by another process of
>           higher priority will stay at the head of the list for  its  priority
>           and  will resume execution as soon as all processes of higher prior‐
>           ity are blocked again.
> 
>        *  When a SCHED_FIFO process becomes runnable, it will be  inserted  at
>           the end of the list for its priority.
> 
>        *  A  call  to  sched_setscheduler()  or sched_setparam(2) will put the
>           SCHED_FIFO (or SCHED_RR) process identified by pid at the  start  of
>           the  list  if it was runnable.  As a consequence, it may preempt the
>           currently  running  process   if   it   has   the   same   priority.
>           (POSIX.1-2001 specifies that the process should go to the end of the
>           list.)
> 
>        *  A process calling sched_yield(2) will be put at the end of the list.

How about the recent discussion regarding sched_yield(). Is this correct?

lkml.kernel.org/r/alpine.DEB.2.02.1403312333100.14882@ionos.tec.linutronix.de

Is this the correct place to add a note explaining te potentional pitfalls 
using sched_yield?

>        No other events will move a process scheduled under the SCHED_FIFO pol‐
>        icy in the wait list of runnable processes with equal static priority.
> 
>        A SCHED_FIFO process runs until either it is blocked by an I/O request,
>        it  is  preempted  by  a  higher  priority   process,   or   it   calls
>        sched_yield(2).
> 
>    SCHED_RR: Round Robin scheduling
>        SCHED_RR  is  a simple enhancement of SCHED_FIFO.  Everything described
>        above for SCHED_FIFO also applies to SCHED_RR, except that each process
>        is  only  allowed  to  run  for  a maximum time quantum.  If a SCHED_RR
>        process has been running for a time period equal to or longer than  the
>        time  quantum,  it will be put at the end of the list for its priority.
>        A SCHED_RR process that has been preempted by a higher priority process
>        and  subsequently  resumes execution as a running process will complete
>        the unexpired portion of its round robin time quantum.  The  length  of
>        the time quantum can be retrieved using sched_rr_get_interval(2).

-> Default is 0.1HZ ms

This is a question I get form time to time, having this in the manpage 
would be helpful.

>    SCHED_OTHER: Default Linux time-sharing scheduling
>        SCHED_OTHER  can only be used at static priority 0.  SCHED_OTHER is the
>        standard Linux time-sharing scheduler that is  intended  for  all  pro‐
>        cesses  that  do  not  require  the  special real-time mechanisms.  The
>        process to run is chosen from the static priority 0  list  based  on  a
>        dynamic priority that is determined only inside this list.  The dynamic
>        priority is based on the nice value (set by nice(2) or  setpriority(2))
>        and  increased  for  each time quantum the process is ready to run, but
>        denied to run by the scheduler.  This ensures fair progress  among  all
>        SCHED_OTHER processes.
> 
>    SCHED_BATCH: Scheduling batch processes
>        (Since  Linux 2.6.16.)  SCHED_BATCH can only be used at static priority
>        0.  This policy is similar to SCHED_OTHER  in  that  it  schedules  the
>        process  according  to  its dynamic priority (based on the nice value).
>        The difference is that this policy will cause the scheduler  to  always
>        assume  that the process is CPU-intensive.  Consequently, the scheduler
>        will apply a small scheduling penalty with respect to wakeup behaviour,
>        so that this process is mildly disfavored in scheduling decisions.
> 
>        This policy is useful for workloads that are noninteractive, but do not
>        want to lower their nice value, and for workloads that want a determin‐
>        istic scheduling policy without interactivity causing extra preemptions
>        (between the workload's tasks).
> 
>    SCHED_IDLE: Scheduling very low priority jobs
>        (Since Linux 2.6.23.)  SCHED_IDLE can only be used at  static  priority
>        0; the process nice value has no influence for this policy.
> 
>        This  policy  is  intended  for  running jobs at extremely low priority
>        (lower even than a +19 nice value with the SCHED_OTHER  or  SCHED_BATCH
>        policies).
> 
> RETURN VALUE
> 	On success, sched_setattr() and sched_getattr() return 0. On
> 	error, -1 is returned, and errno is set appropriately.
> 
> ERRORS
>        EINVAL The scheduling policy is not one  of  the  recognized  policies,
>               param is NULL, or param does not make sense for the policy.
> 
>        EPERM  The calling process does not have appropriate privileges.
> 
>        ESRCH  The process whose ID is pid could not be found.
> 
>        E2BIG  The provided storage for struct sched_attr is either too
>               big, see sched_setattr(), or too small, see sched_getattr().

Where's the EBUSY? It can throw this from __sched_setscheduler() when it 
checks if there's enough bandwidth to run the task.

> 
> NOTES
> 	While the text above (and in SCHED_SETSCHEDULER(2)) talks about
> 	processes, in actual fact these system calls are thread specific.


-- 
Henrik Austad

  reply	other threads:[~2014-04-09 15:22 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-17 12:27 [PATCH 00/13] sched, deadline: patches Peter Zijlstra
2013-12-17 12:27 ` [PATCH 01/13] sched: Add 3 new scheduler syscalls to support an extended scheduling parameters ABI Peter Zijlstra
2014-01-21 14:36   ` Michael Kerrisk
2014-01-21 15:38     ` Peter Zijlstra
2014-01-21 15:46       ` Peter Zijlstra
2014-01-21 16:02         ` Steven Rostedt
2014-01-21 16:06           ` Peter Zijlstra
2014-01-21 16:46             ` Juri Lelli
2014-02-14 14:13       ` Michael Kerrisk (man-pages)
2014-02-14 16:19         ` Peter Zijlstra
2014-02-15 12:52           ` Ingo Molnar
2014-02-17 13:20           ` Michael Kerrisk (man-pages)
2014-04-09  9:25             ` sched_{set,get}attr() manpage Peter Zijlstra
2014-04-09 15:19               ` Henrik Austad [this message]
2014-04-09 15:42                 ` Peter Zijlstra
2014-04-10  7:47                   ` Juri Lelli
2014-04-10  9:59                     ` Claudio Scordino
2014-04-27 15:47                   ` Michael Kerrisk (man-pages)
2014-04-27 19:34                     ` Peter Zijlstra
2014-04-27 19:45                       ` Steven Rostedt
2014-04-28  7:39                       ` Juri Lelli
2014-04-28  8:18             ` Peter Zijlstra
2014-04-29 13:08               ` Michael Kerrisk (man-pages)
2014-04-29 14:22                 ` Peter Zijlstra
2014-04-29 16:04                 ` Peter Zijlstra
2014-04-30 11:09                   ` Michael Kerrisk (man-pages)
2014-04-30 12:35                     ` Peter Zijlstra
2014-04-30 13:09                     ` Peter Zijlstra
2014-05-03 10:43                       ` Juri Lelli
2014-05-05  6:55                         ` Michael Kerrisk (man-pages)
2014-05-05  7:21                           ` Peter Zijlstra
2014-05-05  7:41                             ` Michael Kerrisk (man-pages)
2014-05-05  7:47                               ` Peter Zijlstra
2014-05-05  9:53                                 ` Michael Kerrisk (man-pages)
2014-05-06  8:16                             ` Peter Zijlstra
2014-05-09  8:23                               ` Michael Kerrisk (man-pages)
2014-05-09  8:53                                 ` Peter Zijlstra
2014-05-09  9:26                                   ` Michael Kerrisk (man-pages)
2014-05-19 13:06                                   ` [tip:sched/core] sched: Disallow sched_attr::sched_policy < 0 tip-bot for Peter Zijlstra
2014-05-22 12:25                                   ` tip-bot for Peter Zijlstra
2014-02-21 20:32           ` [tip:sched/urgent] sched: Add 'flags' argument to sched_{set, get}attr() syscalls tip-bot for Peter Zijlstra
2014-01-26  9:48   ` [PATCH 01/13] sched: Add 3 new scheduler syscalls to support an extended scheduling parameters ABI Geert Uytterhoeven
2013-12-17 12:27 ` [PATCH 02/13] sched: SCHED_DEADLINE structures & implementation Peter Zijlstra
2013-12-17 12:27 ` [PATCH 03/13] sched: SCHED_DEADLINE SMP-related data structures & logic Peter Zijlstra
2013-12-17 12:27 ` [PATCH 04/13] [PATCH 05/13] sched: SCHED_DEADLINE avg_update accounting Peter Zijlstra
2013-12-17 12:27 ` [PATCH 05/13] sched: Add period support for -deadline tasks Peter Zijlstra
2013-12-17 12:27 ` [PATCH 06/13] [PATCH 07/13] sched: Add latency tracing " Peter Zijlstra
2013-12-17 12:27 ` [PATCH 07/13] rtmutex: Turn the plist into an rb-tree Peter Zijlstra
2013-12-17 12:27 ` [PATCH 08/13] sched: Drafted deadline inheritance logic Peter Zijlstra
2013-12-17 12:27 ` [PATCH 09/13] sched: Add bandwidth management for sched_dl Peter Zijlstra
2013-12-18 16:55   ` Peter Zijlstra
2013-12-20 17:13     ` Peter Zijlstra
2013-12-20 17:37       ` Steven Rostedt
2013-12-20 17:42         ` Peter Zijlstra
2013-12-20 18:23           ` Steven Rostedt
2013-12-20 18:26             ` Steven Rostedt
2013-12-20 21:44             ` Peter Zijlstra
2013-12-20 23:29               ` Steven Rostedt
2013-12-21 10:05                 ` Peter Zijlstra
2013-12-21 17:26                   ` Peter Zijlstra
2014-01-13 15:55       ` [tip:sched/core] sched/deadline: Fix hotplug admission control tip-bot for Peter Zijlstra
2013-12-17 12:27 ` [PATCH 10/13] sched: speed up -dl pushes with a push-heap Peter Zijlstra
2013-12-17 12:27 ` [PATCH 11/13] sched: Remove sched_setscheduler2() Peter Zijlstra
2013-12-17 12:27 ` [PATCH 12/13] sched, deadline: Fixup the smp-affinity mask tests Peter Zijlstra
2013-12-17 12:27 ` [PATCH 13/13] sched, deadline: Remove the sysctl_sched_dl knobs Peter Zijlstra
2013-12-17 20:17 ` [PATCH] sched, deadline: Properly initialize def_dl_bandwidth lock Steven Rostedt
2013-12-18 10:01   ` Peter Zijlstra
2013-12-20 13:51 ` [PATCH 00/13] sched, deadline: patches Juri Lelli
2013-12-20 14:28   ` Steven Rostedt
2013-12-20 14:51   ` Peter Zijlstra
2013-12-20 15:19     ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140409151911.GA4041@austad.us \
    --to=henrik@austad.us \
    --cc=claudio@evidence.eu.com \
    --cc=darren@dvhart.com \
    --cc=dhaval.giani@gmail.com \
    --cc=fchecconi@gmail.com \
    --cc=fweisbec@gmail.com \
    --cc=hgu1972@gmail.com \
    --cc=insop.song@gmail.com \
    --cc=jkacur@redhat.com \
    --cc=johan.eker@ericsson.com \
    --cc=juri.lelli@gmail.com \
    --cc=liming.wang@windriver.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=luca.abeni@unitn.it \
    --cc=michael@amarulasolutions.com \
    --cc=mingo@redhat.com \
    --cc=mtk.manpages@gmail.com \
    --cc=nicola.manica@disi.unitn.it \
    --cc=oleg@redhat.com \
    --cc=p.faure@akatech.ch \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=raistlin@linux.it \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=tommaso.cucinotta@sssup.it \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).