linux-man.vger.kernel.org archive mirror
* sched_{set,get}attr() manpage
       [not found]             ` <53020C9D.1050208-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-04-09  9:25               ` Peter Zijlstra
       [not found]                 ` <20140409092510.GQ11096-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
  2014-04-28  8:18               ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-04-09  9:25 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Dario Faggioli, Thomas Gleixner, Ingo Molnar,
	rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA,
	johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
	Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
	juri.lelli-Re5JQEeQqe8AvxtiuMwx3w,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w,
	hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	insop.song-Re5JQEeQqe8AvxtiuMwx3w,
	liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
	linux-man-u79uwXL29TY76Z2rM5mHXA

On Mon, Feb 17, 2014 at 02:20:29PM +0100, Michael Kerrisk (man-pages) wrote:
> If you could take another pass through your existing text, to incorporate the
> new flags stuff, and then send a page to me + linux-man@
> that would be great.


Sorry, this slipped my mind. An updated version below. Heavy borrowing
from SCHED_SETSCHEDULER(2) as before.

---

NAME
	sched_setattr, sched_getattr - set and get scheduling policy/attributes

SYNOPSIS
	#include <sched.h>

	struct sched_attr {
		u32 size;
		u32 sched_policy;
		u64 sched_flags;

		/* SCHED_OTHER, SCHED_BATCH */
		s32 sched_nice;
		/* SCHED_FIFO, SCHED_RR */
		u32 sched_priority;
		/* SCHED_DEADLINE */
		u64 sched_runtime;
		u64 sched_deadline;
		u64 sched_period;
	};
	int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);

	int sched_getattr(pid_t pid, struct sched_attr *attr, unsigned int size, unsigned int flags);

DESCRIPTION
	sched_setattr() sets both the scheduling policy and the
	associated attributes for the process whose ID is specified in
	pid.  If pid equals zero, the scheduling policy and attributes
	of the calling process will be set.  The interpretation of the
	argument attr depends on the selected policy.  Currently, Linux
	supports the following "normal" (i.e., non-real-time) scheduling
	policies:

	SCHED_OTHER	the standard "fair" time-sharing policy;

	SCHED_BATCH	for "batch" style execution of processes; and

	SCHED_IDLE	for running very low priority background jobs.

	The following "real-time" policies are also supported, for
	special time-critical applications that need precise control
	over the way in which runnable processes are selected for
	execution:

	SCHED_FIFO	a first-in, first-out policy;

	SCHED_RR	a round-robin policy; and

	SCHED_DEADLINE	a deadline policy.

	The semantics of each of these policies are detailed below.

	sched_attr::size must be set to the size of the structure, as in
	sizeof(struct sched_attr).  If the provided structure is smaller
	than the kernel structure, any additional fields are assumed to
	be '0'.  If the provided structure is larger than the kernel
	structure, the kernel verifies that all additional fields are
	'0'; if they are not, the syscall will fail with -E2BIG.
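	As a rough illustration of the size handling, here is a minimal
	sketch of how userspace might prepare the structure and issue the
	call through syscall(2), assuming no libc wrapper is available.
	The names demo_sched_attr, demo_attr_init() and
	demo_sched_setattr() are hypothetical; the layout simply mirrors
	the SYNOPSIS above.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical local mirror of the SYNOPSIS layout; renamed so it cannot
 * clash with a definition supplied by libc or kernel headers. */
struct demo_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

/* Zero every field, then record the structure size userspace was built with. */
void demo_attr_init(struct demo_sched_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
}

/* Raw syscall wrapper (hypothetical name): returns 0, or -1 with errno set. */
int demo_sched_setattr(pid_t pid, const struct demo_sched_attr *attr,
		       unsigned int flags)
{
	return (int)syscall(SYS_sched_setattr, pid, attr, flags);
}
```

	Zeroing the whole structure first means that any field this
	build of userspace does not set is passed to the kernel as '0'.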

	sched_attr::sched_policy is the desired scheduling policy.

	sched_attr::sched_flags holds additional flags that can
	influence scheduling behaviour.  As of Linux kernel 3.14, the
	only supported flag is:

		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
		on fork().

	sched_attr::sched_nice should only be set for SCHED_OTHER and
	SCHED_BATCH; it is the desired nice value [-20,19], see NICE(2).

	sched_attr::sched_priority should only be set for SCHED_FIFO and
	SCHED_RR; it is the desired static priority [1,99].

	sched_attr::sched_runtime,
	sched_attr::sched_deadline and
	sched_attr::sched_period should only be set for SCHED_DEADLINE
	and are the traditional sporadic task model parameters.
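	To make the three SCHED_DEADLINE values a little more concrete,
	the sketch below checks the ordering the sporadic task model
	assumes (runtime <= deadline <= period) and computes the share
	of one CPU that a reservation asks for.  The nanosecond unit and
	the ordering come from the kernel's SCHED_DEADLINE documentation
	rather than from this draft, and both helper names are
	hypothetical; the kernel's own checks may well be stricter.

```c
#include <stdint.h>

/* Returns 1 if the three values (in nanoseconds) satisfy
 * runtime <= deadline <= period with a nonzero runtime, else 0.
 * Illustrative only; not the kernel's actual validation code. */
int demo_dl_params_ok(uint64_t runtime, uint64_t deadline, uint64_t period)
{
	if (runtime == 0)
		return 0;
	return runtime <= deadline && deadline <= period;
}

/* Share of one CPU requested, in parts per million: runtime / period. */
uint64_t demo_dl_bandwidth_ppm(uint64_t runtime, uint64_t period)
{
	return period ? (runtime * 1000000ULL) / period : 0;
}
```

	For instance, 10 ms of runtime in every 100 ms period asks for
	one tenth of a CPU.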

	The flags argument should be 0.

	sched_getattr() queries the scheduling policy currently applied
	to the process identified by pid.  If pid equals zero, the
	policy of the calling process will be retrieved.

	The size argument should reflect the size of struct sched_attr
	as known to userspace. The kernel sets sched_attr::size to the
	size of its sched_attr structure. If the user-provided
	structure is larger, additional fields are not touched. If the
	user-provided structure is smaller, but the kernel needs to
	return values outside the provided space, the syscall will fail
	with -E2BIG.

	The flags argument should be 0.

	The other sched_attr fields are filled out as described in
	sched_setattr().
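	A minimal sketch of a query for the calling thread, again
	assuming the call is made through syscall(2).  The structure
	and function names are hypothetical and the layout mirrors the
	SYNOPSIS above.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical local mirror of the SYNOPSIS layout. */
struct demo_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

/* Raw syscall wrapper (hypothetical name): returns 0, or -1 with errno set. */
int demo_sched_getattr(pid_t pid, struct demo_sched_attr *attr,
		       unsigned int size, unsigned int flags)
{
	return (int)syscall(SYS_sched_getattr, pid, attr, size, flags);
}

/* Query the caller (pid 0); returns the policy number, or -1 on error. */
int demo_current_policy(void)
{
	struct demo_sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	if (demo_sched_getattr(0, &attr, sizeof(attr), 0) == -1)
		return -1;
	return (int)attr.sched_policy;
}
```

	Passing sizeof(attr) as the size argument tells the kernel how
	much space this build of userspace has set aside.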

   Scheduling Policies
       The  scheduler  is  the  kernel  component  that decides which runnable
       process will be executed by the CPU next.  Each process has an  associ‐
       ated  scheduling  policy and a static scheduling priority, sched_prior‐
       ity; these are the settings that are modified by  sched_setscheduler().
       The  scheduler makes its decisions based on knowledge of the scheduling
       policy and static priority of all processes on the system.

       For processes scheduled under one of  the  normal  scheduling  policies
       (SCHED_OTHER,  SCHED_IDLE,  SCHED_BATCH), sched_priority is not used in
       scheduling decisions (it must be specified as 0).

       Processes scheduled under one of the  real-time  policies  (SCHED_FIFO,
       SCHED_RR)  have  a  sched_priority  value  in  the  range 1 (low) to 99
       (high).  (As the numbers imply, real-time processes always have  higher
       priority than normal processes.)  Note well: POSIX.1-2001 only requires
       an implementation to support a minimum 32 distinct priority levels  for
       the  real-time  policies,  and  some  systems supply just this minimum.
       Portable   programs   should    use    sched_get_priority_min(2)    and
       sched_get_priority_max(2) to find the range of priorities supported for
       a particular policy.
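
       A small sketch of the portable approach described above: ask the system
       for the range instead of assuming 1 to 99.  The helper name is
       hypothetical; sched_get_priority_min(2) and sched_get_priority_max(2)
       are the calls named in the text.

```c
#include <sched.h>

/* Returns 1 if prio lies within the range the running system reports
 * for policy, 0 if it does not, and -1 if the range cannot be queried. */
int demo_priority_in_range(int policy, int prio)
{
	int lo = sched_get_priority_min(policy);
	int hi = sched_get_priority_max(policy);

	if (lo == -1 || hi == -1)
		return -1;
	return prio >= lo && prio <= hi;
}
```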

       Conceptually, the scheduler maintains a list of runnable processes  for
       each  possible  sched_priority  value.   In  order  to  determine which
       process runs next, the scheduler looks for the nonempty list  with  the
       highest  static  priority  and  selects the process at the head of this
       list.

       A process's scheduling policy determines where it will be inserted into
       the  list  of processes with equal static priority and how it will move
       inside this list.

       All scheduling is preemptive: if a process with a higher static  prior‐
       ity  becomes  ready  to run, the currently running process will be pre‐
       empted and returned to the wait list for  its  static  priority  level.
       The  scheduling  policy only determines the ordering within the list of
       runnable processes with equal static priority.
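
       The conceptual model above can be sketched as a toy selection routine.
       This is only an illustration of "highest nonempty list wins", not how
       the kernel stores its run queues; the array representation and all
       names are assumptions made for the example.

```c
#define DEMO_NR_PRIO 100	/* static priorities 0..99, as in the text */
#define DEMO_EMPTY   (-1)	/* marks a priority level with no runnable task */

/* head[p] holds the task id at the head of the list for static
 * priority p, or DEMO_EMPTY.  Scan from the highest priority
 * downwards and return the first task found. */
int demo_pick_next(const int head[DEMO_NR_PRIO])
{
	for (int p = DEMO_NR_PRIO - 1; p >= 0; p--)
		if (head[p] != DEMO_EMPTY)
			return head[p];
	return DEMO_EMPTY;
}
```

       A task at priority 0 is therefore only chosen when every list for
       priorities 1 to 99 is empty.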

    SCHED_DEADLINE: Sporadic task model deadline scheduling
	SCHED_DEADLINE is an implementation of GEDF (Global Earliest
	Deadline First) with additional CBS (Constant Bandwidth Server).
	The CBS guarantees that tasks that over-run their specified
	budget are throttled and do not affect the correct performance
	of other SCHED_DEADLINE tasks.

	SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN.

	Setting SCHED_DEADLINE can fail with -EINVAL when admission
	control tests fail.

   SCHED_FIFO: First In-First Out scheduling
       SCHED_FIFO can only be used with static priorities higher than 0, which
       means that when a SCHED_FIFO process becomes runnable, it  will  always
       immediately preempt any currently running SCHED_OTHER, SCHED_BATCH,  or
       SCHED_IDLE  process.  SCHED_FIFO is a simple scheduling algorithm with‐
       out time slicing.  For processes scheduled under the SCHED_FIFO policy,
       the following rules apply:

       *  A  SCHED_FIFO  process that has been preempted by another process of
          higher priority will stay at the head of the list for  its  priority
          and  will resume execution as soon as all processes of higher prior‐
          ity are blocked again.

       *  When a SCHED_FIFO process becomes runnable, it will be  inserted  at
          the end of the list for its priority.

       *  A  call  to  sched_setscheduler()  or sched_setparam(2) will put the
          SCHED_FIFO (or SCHED_RR) process identified by pid at the  start  of
          the  list  if it was runnable.  As a consequence, it may preempt the
          currently  running  process   if   it   has   the   same   priority.
          (POSIX.1-2001 specifies that the process should go to the end of the
          list.)

       *  A process calling sched_yield(2) will be put at the end of the list.

       No other events will move a process scheduled under the SCHED_FIFO pol‐
       icy in the wait list of runnable processes with equal static priority.

       A SCHED_FIFO process runs until either it is blocked by an I/O request,
       it  is  preempted  by  a  higher  priority   process,   or   it   calls
       sched_yield(2).

   SCHED_RR: Round Robin scheduling
       SCHED_RR  is  a simple enhancement of SCHED_FIFO.  Everything described
       above for SCHED_FIFO also applies to SCHED_RR, except that each process
       is  only  allowed  to  run  for  a maximum time quantum.  If a SCHED_RR
       process has been running for a time period equal to or longer than  the
       time  quantum,  it will be put at the end of the list for its priority.
       A SCHED_RR process that has been preempted by a higher priority process
       and  subsequently  resumes execution as a running process will complete
       the unexpired portion of its round robin time quantum.  The  length  of
       the time quantum can be retrieved using sched_rr_get_interval(2).
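
       A short sketch of reading the quantum with sched_rr_get_interval(2).
       The helper name is hypothetical, and the value reported for a process
       that is not running under SCHED_RR depends on the kernel.

```c
#include <sched.h>
#include <sys/types.h>
#include <time.h>

/* Returns the interval reported for pid in milliseconds (pid 0 means
 * the caller), or -1 if the call fails. */
long demo_rr_interval_ms(pid_t pid)
{
	struct timespec ts;

	if (sched_rr_get_interval(pid, &ts) == -1)
		return -1;
	return (long)ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}
```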

   SCHED_OTHER: Default Linux time-sharing scheduling
       SCHED_OTHER  can only be used at static priority 0.  SCHED_OTHER is the
       standard Linux time-sharing scheduler that is  intended  for  all  pro‐
       cesses  that  do  not  require  the  special real-time mechanisms.  The
       process to run is chosen from the static priority 0  list  based  on  a
       dynamic priority that is determined only inside this list.  The dynamic
       priority is based on the nice value (set by nice(2) or  setpriority(2))
       and  increased  for  each time quantum the process is ready to run, but
       denied to run by the scheduler.  This ensures fair progress  among  all
       SCHED_OTHER processes.

   SCHED_BATCH: Scheduling batch processes
       (Since  Linux 2.6.16.)  SCHED_BATCH can only be used at static priority
       0.  This policy is similar to SCHED_OTHER  in  that  it  schedules  the
       process  according  to  its dynamic priority (based on the nice value).
       The difference is that this policy will cause the scheduler  to  always
       assume  that the process is CPU-intensive.  Consequently, the scheduler
       will apply a small scheduling penalty with respect to wakeup behaviour,
       so that this process is mildly disfavored in scheduling decisions.

       This policy is useful for workloads that are noninteractive, but do not
       want to lower their nice value, and for workloads that want a determin‐
       istic scheduling policy without interactivity causing extra preemptions
       (between the workload's tasks).

   SCHED_IDLE: Scheduling very low priority jobs
       (Since Linux 2.6.23.)  SCHED_IDLE can only be used at  static  priority
       0; the process nice value has no influence for this policy.

       This  policy  is  intended  for  running jobs at extremely low priority
       (lower even than a +19 nice value with the SCHED_OTHER  or  SCHED_BATCH
       policies).

RETURN VALUE
	On success, sched_setattr() and sched_getattr() return 0. On
	error, -1 is returned, and errno is set appropriately.

ERRORS
       EINVAL The scheduling policy is not one  of  the  recognized  policies,
              attr is NULL, or attr does not make sense for the policy.

       EPERM  The calling process does not have appropriate privileges.

       ESRCH  The process whose ID is pid could not be found.

       E2BIG  The provided storage for struct sched_attr is either too
              big, see sched_setattr(), or too small, see sched_getattr().

NOTES
	While the text above (and in SCHED_SETSCHEDULER(2)) talks about
	processes, in actual fact these system calls are thread specific.
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: sched_{set,get}attr() manpage
       [not found]                 ` <20140409092510.GQ11096-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
@ 2014-04-09 15:19                   ` Henrik Austad
       [not found]                     ` <20140409151911.GA4041-RT+80VE2nyv1P9xLtpHBDw@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Henrik Austad @ 2014-04-09 15:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michael Kerrisk (man-pages), Dario Faggioli, Thomas Gleixner,
	Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA,
	johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
	Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
	juri.lelli-Re5JQEeQqe8AvxtiuMwx3w,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w,
	hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	insop.song-Re5JQEeQqe8AvxtiuMwx3w,
	liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
	linux-man-u79uwXL29TY76Z2rM5mHXA

On Wed, Apr 09, 2014 at 11:25:10AM +0200, Peter Zijlstra wrote:
> On Mon, Feb 17, 2014 at 02:20:29PM +0100, Michael Kerrisk (man-pages) wrote:
> > If you could take another pass through your existing text, to incorporate the
> > new flags stuff, and then send a page to me + linux-man@
> > that would be great.
> 
> 
> Sorry, this slipped my mind. An updated version below. Heavy borrowing
> from SCHED_SETSCHEDULER(2) as before.
> 
> ---
> 
> NAME
> 	sched_setattr, sched_getattr - set and get scheduling policy/attributes
> 
> SYNOPSIS
> 	#include <sched.h>
> 
> 	struct sched_attr {
> 		u32 size;
> 		u32 sched_policy;
> 		u64 sched_flags;
> 
> 		/* SCHED_NORMAL, SCHED_BATCH */
> 		s32 sched_nice;
> 		/* SCHED_FIFO, SCHED_RR */
> 		u32 sched_priority;
> 		/* SCHED_DEADLINE */
> 		u64 sched_runtime;
> 		u64 sched_deadline;
> 		u64 sched_period;
> 	};
> 	int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);
> 
> 	int sched_getattr(pid_t pid, struct sched_attr *attr, unsigned int size, unsigned int flags);
> 
> DESCRIPTION
> 	sched_setattr() sets both the scheduling policy and the
> 	associated attributes for the process whose ID is specified in
> 	pid.  If pid equals zero, the scheduling policy and attributes
> 	of the calling process will be set.  The interpretation of the
> 	argument attr depends on the selected policy.  Currently, Linux
> 	supports the following "normal" (i.e., non-real-time) scheduling
> 	policies:
> 
> 	SCHED_OTHER	the standard "fair" time-sharing policy;
> 
> 	SCHED_BATCH	for "batch" style execution of processes; and
> 
> 	SCHED_IDLE	for running very low priority background jobs.
> 
> 	The following "real-time" policies are also supported, for

why the "'s?

> 	special time-critical applications that need precise control
> 	over the way in which runnable processes are selected for
> 	execution:
> 
> 	SCHED_FIFO	a first-in, first-out policy;
> 
> 	SCHED_RR	a round-robin policy; and
> 
> 	SCHED_DEADLINE	a deadline policy.
> 
> 	The semantics of each of these policies are detailed below.
> 
> 	sched_attr::size must be set to the size of the structure, as in
> 	sizeof(struct sched_attr), if the provided structure is smaller
> 	than the kernel structure, any additional fields are assumed
> 	'0'. If the provided structure is larger than the kernel
> 	structure, the kernel verifies all additional fields are '0' if
> 	not the syscall will fail with -E2BIG.
> 
> 	sched_attr::sched_policy the desired scheduling policy.
> 
> 	sched_attr::sched_flags additional flags that can influence
> 	scheduling behaviour. Currently as per Linux kernel 3.14:
> 
> 		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
> 		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
> 		on fork().
> 
> 	is the only supported flag.
> 
> 	sched_attr::sched_nice should only be set for SCHED_OTHER,
> 	SCHED_BATCH, the desired nice value [-20,19], see NICE(2).
> 
> 	sched_attr::sched_priority should only be set for SCHED_FIFO,
> 	SCHED_RR, the desired static priority [1,99].
> 
> 	sched_attr::sched_runtime
> 	sched_attr::sched_deadline
> 	sched_attr::sched_period should only be set for SCHED_DEADLINE
> 	and are the traditional sporadic task model parameters.
> 
> 	The flags argument should be 0.
> 
> 	sched_getattr() queries the scheduling policy currently applied
> 	to the process identified by pid.  If pid equals zero, the
> 	policy of the calling process will be retrieved.
> 
> 	The size argument should reflect the size of struct sched_attr
> 	as known to userspace. The kernel fills out sched_attr::size to
> 	the size of its sched_attr structure. If the user provided
> 	structure is larger, additional fields are not touched. If the
> 	user provided structure is smaller, but the kernel needs to
> 	return values outside the provided space, the syscall will fail
> 	with -E2BIG.
> 
> 	The flags argument should be 0.

What about SCHED_FLAG_RESET_ON_FORK?

> 	The other sched_attr fields are filled out as described in
> 	sched_setattr().
> 
>    Scheduling Policies
>        The  scheduler  is  the  kernel  component  that decides which runnable
>        process will be executed by the CPU next.  Each process has an  associ‐
>        ated  scheduling  policy and a static scheduling priority, sched_prior‐
>        ity; these are the settings that are modified by  sched_setscheduler().
>        The  scheduler makes its decisions based on knowledge of the scheduling
>        policy and static priority of all processes on the system.

Isn't this last sentence redundant/slightly repetitive?

>        For processes scheduled under one of  the  normal  scheduling  policies
>        (SCHED_OTHER,  SCHED_IDLE,  SCHED_BATCH), sched_priority is not used in
>        scheduling decisions (it must be specified as 0).
> 
>        Processes scheduled under one of the  real-time  policies  (SCHED_FIFO,
>        SCHED_RR)  have  a  sched_priority  value  in  the  range 1 (low) to 99
>        (high).  (As the numbers imply, real-time processes always have  higher
>        priority than normal processes.)  Note well: POSIX.1-2001 only requires
>        an implementation to support a minimum 32 distinct priority levels  for
>        the  real-time  policies,  and  some  systems supply just this minimum.
>        Portable   programs   should    use    sched_get_priority_min(2)    and
>        sched_get_priority_max(2) to find the range of priorities supported for
>        a particular policy.
> 
>        Conceptually, the scheduler maintains a list of runnable processes  for
>        each  possible  sched_priority  value.   In  order  to  determine which
>        process runs next, the scheduler looks for the nonempty list  with  the
>        highest  static  priority  and  selects the process at the head of this
>        list.
> 
>        A process's scheduling policy determines where it will be inserted into
>        the  list  of processes with equal static priority and how it will move
>        inside this list.
> 
>        All scheduling is preemptive: if a process with a higher static  prior‐
>        ity  becomes  ready  to run, the currently running process will be pre‐
>        empted and returned to the wait list for  its  static  priority  level.
>        The  scheduling  policy only determines the ordering within the list of
>        runnable processes with equal static priority.
> 
>     SCHED_DEADLINE: Sporadic task model deadline scheduling
> 	SCHED_DEADLINE is an implementation of GEDF (Global Earliest
> 	Deadline First) with additional CBS (Constant Bandwidth Server).
> 	The CBS guarantees that tasks that over-run their specified
> 	budget are throttled and do not affect the correct performance
> 	of other SCHED_DEADLINE tasks.
> 
> 	SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN
> 
> 	Setting SCHED_DEADLINE can fail with -EINVAL when admission
> 	control tests fail.

Perhaps add a note about the deadline-class having higher priority than the 
other classes; i.e. if a deadline-task is runnable, it will preempt any 
other SCHED_(RR|FIFO) regardless of priority?

>    SCHED_FIFO: First In-First Out scheduling
>        SCHED_FIFO can only be used with static priorities higher than 0, which
>        means that when a SCHED_FIFO process becomes runnable, it  will  always
>        immediately preempt any currently running SCHED_OTHER, SCHED_BATCH,  or
>        SCHED_IDLE  process.  SCHED_FIFO is a simple scheduling algorithm with‐
>        out time slicing.  For processes scheduled under the SCHED_FIFO policy,
>        the following rules apply:
> 
>        *  A  SCHED_FIFO  process that has been preempted by another process of
>           higher priority will stay at the head of the list for  its  priority
>           and  will resume execution as soon as all processes of higher prior‐
>           ity are blocked again.
> 
>        *  When a SCHED_FIFO process becomes runnable, it will be  inserted  at
>           the end of the list for its priority.
> 
>        *  A  call  to  sched_setscheduler()  or sched_setparam(2) will put the
>           SCHED_FIFO (or SCHED_RR) process identified by pid at the  start  of
>           the  list  if it was runnable.  As a consequence, it may preempt the
>           currently  running  process   if   it   has   the   same   priority.
>           (POSIX.1-2001 specifies that the process should go to the end of the
>           list.)
> 
>        *  A process calling sched_yield(2) will be put at the end of the list.

How about the recent discussion regarding sched_yield(). Is this correct?

lkml.kernel.org/r/alpine.DEB.2.02.1403312333100.14882-3cz04HxQygjZikZi3RtOZ2GXanvQGlWp@public.gmane.orgnix.de

Is this the correct place to add a note explaining the potential pitfalls
of using sched_yield()?

>        No other events will move a process scheduled under the SCHED_FIFO pol‐
>        icy in the wait list of runnable processes with equal static priority.
> 
>        A SCHED_FIFO process runs until either it is blocked by an I/O request,
>        it  is  preempted  by  a  higher  priority   process,   or   it   calls
>        sched_yield(2).
> 
>    SCHED_RR: Round Robin scheduling
>        SCHED_RR  is  a simple enhancement of SCHED_FIFO.  Everything described
>        above for SCHED_FIFO also applies to SCHED_RR, except that each process
>        is  only  allowed  to  run  for  a maximum time quantum.  If a SCHED_RR
>        process has been running for a time period equal to or longer than  the
>        time  quantum,  it will be put at the end of the list for its priority.
>        A SCHED_RR process that has been preempted by a higher priority process
>        and  subsequently  resumes execution as a running process will complete
>        the unexpired portion of its round robin time quantum.  The  length  of
>        the time quantum can be retrieved using sched_rr_get_interval(2).

-> Default is 0.1 * HZ jiffies, i.e. 100 ms

This is a question I get from time to time; having this in the manpage
would be helpful.

>    SCHED_OTHER: Default Linux time-sharing scheduling
>        SCHED_OTHER  can only be used at static priority 0.  SCHED_OTHER is the
>        standard Linux time-sharing scheduler that is  intended  for  all  pro‐
>        cesses  that  do  not  require  the  special real-time mechanisms.  The
>        process to run is chosen from the static priority 0  list  based  on  a
>        dynamic priority that is determined only inside this list.  The dynamic
>        priority is based on the nice value (set by nice(2) or  setpriority(2))
>        and  increased  for  each time quantum the process is ready to run, but
>        denied to run by the scheduler.  This ensures fair progress  among  all
>        SCHED_OTHER processes.
> 
>    SCHED_BATCH: Scheduling batch processes
>        (Since  Linux 2.6.16.)  SCHED_BATCH can only be used at static priority
>        0.  This policy is similar to SCHED_OTHER  in  that  it  schedules  the
>        process  according  to  its dynamic priority (based on the nice value).
>        The difference is that this policy will cause the scheduler  to  always
>        assume  that the process is CPU-intensive.  Consequently, the scheduler
>        will apply a small scheduling penalty with respect to wakeup behaviour,
>        so that this process is mildly disfavored in scheduling decisions.
> 
>        This policy is useful for workloads that are noninteractive, but do not
>        want to lower their nice value, and for workloads that want a determin‐
>        istic scheduling policy without interactivity causing extra preemptions
>        (between the workload's tasks).
> 
>    SCHED_IDLE: Scheduling very low priority jobs
>        (Since Linux 2.6.23.)  SCHED_IDLE can only be used at  static  priority
>        0; the process nice value has no influence for this policy.
> 
>        This  policy  is  intended  for  running jobs at extremely low priority
>        (lower even than a +19 nice value with the SCHED_OTHER  or  SCHED_BATCH
>        policies).
> 
> RETURN VALUE
> 	On success, sched_setattr() and sched_getattr() return 0. On
> 	error, -1 is returned, and errno is set appropriately.
> 
> ERRORS
>        EINVAL The scheduling policy is not one  of  the  recognized  policies,
>               attr is NULL, or attr does not make sense for the policy.
> 
>        EPERM  The calling process does not have appropriate privileges.
> 
>        ESRCH  The process whose ID is pid could not be found.
> 
>        E2BIG  The provided storage for struct sched_attr is either too
>               big, see sched_setattr(), or too small, see sched_getattr().

Where's the EBUSY? It can throw this from __sched_setscheduler() when it 
checks if there's enough bandwidth to run the task.

> 
> NOTES
> 	While the text above (and in SCHED_SETSCHEDULER(2)) talks about
> 	processes, in actual fact these system calls are thread specific.


-- 
Henrik Austad


* Re: sched_{set,get}attr() manpage
       [not found]                     ` <20140409151911.GA4041-RT+80VE2nyv1P9xLtpHBDw@public.gmane.org>
@ 2014-04-09 15:42                       ` Peter Zijlstra
       [not found]                         ` <20140409154204.GD10526-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-04-09 15:42 UTC (permalink / raw)
  To: Henrik Austad
  Cc: Michael Kerrisk (man-pages), Dario Faggioli, Thomas Gleixner,
	Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA,
	johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
	Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
	juri.lelli-Re5JQEeQqe8AvxtiuMwx3w,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w,
	hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	insop.song-Re5JQEeQqe8AvxtiuMwx3w,
	liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
	linux-man-u79uwXL29TY76Z2rM5mHXA

On Wed, Apr 09, 2014 at 05:19:11PM +0200, Henrik Austad wrote:
> > 	The following "real-time" policies are also supported, for
> 
> why the "'s?

I borrowed those from SCHED_SETSCHEDULER(2).

> > 	sched_attr::sched_flags additional flags that can influence
> > 	scheduling behaviour. Currently as per Linux kernel 3.14:
> > 
> > 		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
> > 		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
> > 		on fork().
> > 
> > 	is the only supported flag.

...

> > 	The flags argument should be 0.
> 
> > What about SCHED_FLAG_RESET_ON_FORK?

Different flags. One is sched_attr::sched_flags, the other is the flags
argument of sched_setattr().
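
To illustrate the distinction, here is a minimal sketch that prepares a
SCHED_FIFO request with reset-on-fork. The structure name, the helper
name and DEMO_SYSCALL_FLAGS are hypothetical; the 0x01 fallback for
SCHED_FLAG_RESET_ON_FORK is taken from the kernel's uapi headers and is
only used if the included headers do not already define it.

```c
#include <sched.h>
#include <stdint.h>
#include <string.h>

#ifndef SCHED_FLAG_RESET_ON_FORK
#define SCHED_FLAG_RESET_ON_FORK 0x01	/* fallback, value from kernel uapi */
#endif

/* Hypothetical local mirror of the SYNOPSIS layout. */
struct demo_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

/* The reset-on-fork bit travels inside the structure... */
void demo_fill_fifo_reset_on_fork(struct demo_sched_attr *attr, uint32_t prio)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->sched_policy = SCHED_FIFO;
	attr->sched_priority = prio;
	attr->sched_flags = SCHED_FLAG_RESET_ON_FORK;
}

/* ...while the separate flags argument of the system call stays 0. */
enum { DEMO_SYSCALL_FLAGS = 0 };
```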

> > 	The other sched_attr fields are filled out as described in
> > 	sched_setattr().
> > 
> >    Scheduling Policies
> >        The  scheduler  is  the  kernel  component  that decides which runnable
> >        process will be executed by the CPU next.  Each process has an  associ‐
> >        ated  scheduling  policy and a static scheduling priority, sched_prior‐
> >        ity; these are the settings that are modified by  sched_setscheduler().
> >        The  scheduler makes its decisions based on knowledge of the scheduling
> >        policy and static priority of all processes on the system.
> 
> > Isn't this last sentence redundant/slightly repetitive?

I borrowed that from SCHED_SETSCHEDULER(2) again.

> >     SCHED_DEADLINE: Sporadic task model deadline scheduling
> > 	SCHED_DEADLINE is an implementation of GEDF (Global Earliest
> > 	Deadline First) with additional CBS (Constant Bandwidth Server).
> > 	The CBS guarantees that tasks that over-run their specified
> > 	budget are throttled and do not affect the correct performance
> > 	of other SCHED_DEADLINE tasks.
> > 
> > 	SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN
> > 
> > 	Setting SCHED_DEADLINE can fail with -EINVAL when admission
> > 	control tests fail.
> 
> Perhaps add a note about the deadline-class having higher priority than the 
> other classes; i.e. if a deadline-task is runnable, it will preempt any 
> other SCHED_(RR|FIFO) regardless of priority?

Yes, good point, will do.

> >    SCHED_FIFO: First In-First Out scheduling
> >        SCHED_FIFO can only be used with static priorities higher than 0, which
> >        means that when a SCHED_FIFO process becomes runnable, it  will  always
> >        immediately preempt any currently running SCHED_OTHER, SCHED_BATCH,  or
> >        SCHED_IDLE  process.  SCHED_FIFO is a simple scheduling algorithm with‐
> >        out time slicing.  For processes scheduled under the SCHED_FIFO policy,
> >        the following rules apply:
> > 
> >        *  A  SCHED_FIFO  process that has been preempted by another process of
> >           higher priority will stay at the head of the list for  its  priority
> >           and  will resume execution as soon as all processes of higher prior‐
> >           ity are blocked again.
> > 
> >        *  When a SCHED_FIFO process becomes runnable, it will be  inserted  at
> >           the end of the list for its priority.
> > 
> >        *  A  call  to  sched_setscheduler()  or sched_setparam(2) will put the
> >           SCHED_FIFO (or SCHED_RR) process identified by pid at the  start  of
> >           the  list  if it was runnable.  As a consequence, it may preempt the
> >           currently  running  process   if   it   has   the   same   priority.
> >           (POSIX.1-2001 specifies that the process should go to the end of the
> >           list.)
> > 
> >        *  A process calling sched_yield(2) will be put at the end of the list.
> 
> How about the recent discussion regarding sched_yield(). Is this correct?
> 
> lkml.kernel.org/r/alpine.DEB.2.02.1403312333100.14882@ionos.tec.linutronix.de
> 
> Is this the correct place to add a note explaining the potential pitfalls
> of using sched_yield?

I'm not sure; there's a SCHED_YIELD(2) manpage to fill with that
nonsense.

Also; I realized I have not described the DEADLINE sched_yield()
behaviour.

> >        No other events will move a process scheduled under the SCHED_FIFO pol‐
> >        icy in the wait list of runnable processes with equal static priority.
> > 
> >        A SCHED_FIFO process runs until either it is blocked by an I/O request,
> >        it  is  preempted  by  a  higher  priority   process,   or   it   calls
> >        sched_yield(2).
> > 
> >    SCHED_RR: Round Robin scheduling
> >        SCHED_RR  is  a simple enhancement of SCHED_FIFO.  Everything described
> >        above for SCHED_FIFO also applies to SCHED_RR, except that each process
> >        is  only  allowed  to  run  for  a maximum time quantum.  If a SCHED_RR
> >        process has been running for a time period equal to or longer than  the
> >        time  quantum,  it will be put at the end of the list for its priority.
> >        A SCHED_RR process that has been preempted by a higher priority process
> >        and  subsequently  resumes execution as a running process will complete
> >        the unexpired portion of its round robin time quantum.  The  length  of
> >        the time quantum can be retrieved using sched_rr_get_interval(2).
> 
> -> Default is 0.1HZ ms
> 
> This is a question I get from time to time; having this in the manpage
> would be helpful.

Again, brazenly stolen from SCHED_SETSCHEDULER(2); but yes. Also I'm not
sure I'd call RR an enhancement of anything much at all ;-)
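
For anyone who wants to check the quantum on a given system rather than
rely on the default, a minimal C sketch around sched_rr_get_interval(2)
could look like this. The helper name rr_quantum_ms() is made up for
illustration and is not part of any API.

```c
#include <sched.h>
#include <sys/types.h>
#include <time.h>

/*
 * Return the round-robin quantum of process `pid` (0 = caller) in
 * milliseconds, or -1.0 if sched_rr_get_interval(2) fails.
 * rr_quantum_ms() is a hypothetical helper, not an existing interface.
 */
double rr_quantum_ms(pid_t pid)
{
	struct timespec ts;

	if (sched_rr_get_interval(pid, &ts) == -1)
		return -1.0;
	return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}
```

rr_quantum_ms(0) queries the calling process. POSIX only defines the
result for SCHED_RR processes, so the value is best read from a task
that actually runs under that policy.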

> > ERRORS
> >        EINVAL The scheduling policy is not one  of  the  recognized  policies,
> >               param is NULL, or param does not make sense for the policy.
> > 
> >        EPERM  The calling process does not have appropriate privileges.
> > 
> >        ESRCH  The process whose ID is pid could not be found.
> > 
> >        E2BIG  The provided storage for struct sched_attr is either too
> >               big, see sched_setattr(), or too small, see sched_getattr().
> 
> Where's the EBUSY? It can throw this from __sched_setscheduler() when it 
> checks if there's enough bandwidth to run the task.

Uhhm.. it got lost :-) /me quickly adds.
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sched_{set,get}attr() manpage
       [not found]                         ` <20140409154204.GD10526-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
@ 2014-04-10  7:47                           ` Juri Lelli
  2014-04-10  9:59                             ` Claudio Scordino
  2014-04-27 15:47                           ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 26+ messages in thread
From: Juri Lelli @ 2014-04-10  7:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Henrik Austad, Michael Kerrisk (man-pages), Dario Faggioli,
	Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w,
	Oleg Nesterov, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w,
	p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel,
	claudio-YOzL5CV4y4YG1A2ADO40+w,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w,
	hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	insop.song-Re5JQEeQqe8AvxtiuMwx3w,
	liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
	linux-man-u79uwXL29TY76Z2rM5mHXA

Hi all,

On Wed, 9 Apr 2014 17:42:04 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Apr 09, 2014 at 05:19:11PM +0200, Henrik Austad wrote:
> > > 	The following "real-time" policies are also supported, for
> > 
> > why the "'s?
> 
> I borrowed those from SCHED_SETSCHEDULER(2).
> 
> > > 	sched_attr::sched_flags additional flags that can influence
> > > 	scheduling behaviour. Currently as per Linux kernel 3.14:
> > > 
> > > 		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
> > > 		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
> > > 		on fork().
> > > 
> > > 	is the only supported flag.
> 
> ...
> 
> > > 	The flags argument should be 0.
> > 
> > What about SCHED_FLAG_RESET_ON_FORK?
> 
> Different flags. The one is sched_attr::flags the other is
> sched_setattr(.flags).
> 
> > > 	The other sched_attr fields are filled out as described in
> > > 	sched_setattr().
> > > 
> > >    Scheduling Policies
> > >        The  scheduler  is  the  kernel  component  that decides which runnable
> > >        process will be executed by the CPU next.  Each process has an  associ‐
> > >        ated  scheduling  policy and a static scheduling priority, sched_prior‐
> > >        ity; these are the settings that are modified by  sched_setscheduler().
> > >        The  scheduler  makes it decisions based on knowledge of the scheduling
> > >        policy and static priority of all processes on the system.
> > 
> > Isn't this last sentence redundant/slightly repetitive?
> 
> I borrowed that from SCHED_SETSCHEDULER(2) again.
> 
> > >     SCHED_DEADLINE: Sporadic task model deadline scheduling
> > > 	SCHED_DEADLINE is an implementation of GEDF (Global Earliest
> > > 	Deadline First) with additional CBS (Constant Bandwidth Server).
> > > 	The CBS guarantees that tasks that over-run their specified
> > > 	budget are throttled and do not affect the correct performance
> > > 	of other SCHED_DEADLINE tasks.
> > > 
> > > 	SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN
> > > 
> > > 	Setting SCHED_DEADLINE can fail with -EINVAL when admission
> > > 	control tests fail.
> > 
> > Perhaps add a note about the deadline-class having higher priority than the 
> > other classes; i.e. if a deadline-task is runnable, it will preempt any 
> > other SCHED_(RR|FIFO) regardless of priority?
> 
> Yes, good point, will do.
> 
> > >    SCHED_FIFO: First In-First Out scheduling
> > >        SCHED_FIFO can only be used with static priorities higher than 0, which
> > >        means that when a SCHED_FIFO processes becomes runnable, it will always
> > >        immediately preempt any currently running SCHED_OTHER, SCHED_BATCH,  or
> > >        SCHED_IDLE  process.  SCHED_FIFO is a simple scheduling algorithm with‐
> > >        out time slicing.  For processes scheduled under the SCHED_FIFO policy,
> > >        the following rules apply:
> > > 
> > >        *  A  SCHED_FIFO  process that has been preempted by another process of
> > >           higher priority will stay at the head of the list for  its  priority
> > >           and  will resume execution as soon as all processes of higher prior‐
> > >           ity are blocked again.
> > > 
> > >        *  When a SCHED_FIFO process becomes runnable, it will be  inserted  at
> > >           the end of the list for its priority.
> > > 
> > >        *  A  call  to  sched_setscheduler()  or sched_setparam(2) will put the
> > >           SCHED_FIFO (or SCHED_RR) process identified by pid at the  start  of
> > >           the  list  if it was runnable.  As a consequence, it may preempt the
> > >           currently  running  process   if   it   has   the   same   priority.
> > >           (POSIX.1-2001 specifies that the process should go to the end of the
> > >           list.)
> > > 
> > >        *  A process calling sched_yield(2) will be put at the end of the list.
> > 
> > How about the recent discussion regarding sched_yield(). Is this correct?
> > 
> > lkml.kernel.org/r/alpine.DEB.2.02.1403312333100.14882@ionos.tec.linutronix.de
> > 
> > Is this the correct place to add a note explaining the potential pitfalls
> > of using sched_yield?
> 
> I'm not sure; there's a SCHED_YIELD(2) manpage to fill with that
> nonsense.
> 
> Also; I realized I have not described the DEADLINE sched_yield()
> behaviour.
> 

So, for SCHED_DEADLINE we currently have this behaviour:

/*
 * Yield task semantic for -deadline tasks is:
 *
 *   get off from the CPU until our next instance, with
 *   a new runtime. This is of little use now, since we
 *   don't have a bandwidth reclaiming mechanism. Anyway,
 *   bandwidth reclaiming is planned for the future, and
 *   yield_task_dl will indicate that some spare budget
 *   is available for other task instances to use it.
 */

But, considering also the discussion above, I'm less sure now that's
what we want. Still, I think we will want some way in the future to be
able to say "I'm finished with my current job, give this remaining
runtime to someone else", like another syscall or something.

Thanks,

- Juri

> > >        No other events will move a process scheduled under the SCHED_FIFO pol‐
> > >        icy in the wait list of runnable processes with equal static priority.
> > > 
> > >        A SCHED_FIFO process runs until either it is blocked by an I/O request,
> > >        it  is  preempted  by  a  higher  priority   process,   or   it   calls
> > >        sched_yield(2).
> > > 
> > >    SCHED_RR: Round Robin scheduling
> > >        SCHED_RR  is  a simple enhancement of SCHED_FIFO.  Everything described
> > >        above for SCHED_FIFO also applies to SCHED_RR, except that each process
> > >        is  only  allowed  to  run  for  a maximum time quantum.  If a SCHED_RR
> > >        process has been running for a time period equal to or longer than  the
> > >        time  quantum,  it will be put at the end of the list for its priority.
> > >        A SCHED_RR process that has been preempted by a higher priority process
> > >        and  subsequently  resumes execution as a running process will complete
> > >        the unexpired portion of its round robin time quantum.  The  length  of
> > >        the time quantum can be retrieved using sched_rr_get_interval(2).
> > 
> > -> Default is 0.1HZ ms
> > 
> > This is a question I get from time to time; having this in the manpage
> > would be helpful.
> 
> Again, brazenly stolen from SCHED_SETSCHEDULER(2); but yes. Also I'm not
> sure I'd call RR an enhancement of anything much at all ;-)
> 
> > > ERRORS
> > >        EINVAL The scheduling policy is not one  of  the  recognized  policies,
> > >               param is NULL, or param does not make sense for the policy.
> > > 
> > >        EPERM  The calling process does not have appropriate privileges.
> > > 
> > >        ESRCH  The process whose ID is pid could not be found.
> > > 
> > >        E2BIG  The provided storage for struct sched_attr is either too
> > >               big, see sched_setattr(), or too small, see sched_getattr().
> > 
> > Where's the EBUSY? It can throw this from __sched_setscheduler() when it 
> > checks if there's enough bandwidth to run the task.
> 
> Uhhm.. it got lost :-) /me quickly adds.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sched_{set,get}attr() manpage
  2014-04-10  7:47                           ` Juri Lelli
@ 2014-04-10  9:59                             ` Claudio Scordino
  0 siblings, 0 replies; 26+ messages in thread
From: Claudio Scordino @ 2014-04-10  9:59 UTC (permalink / raw)
  To: Juri Lelli, Peter Zijlstra
  Cc: Henrik Austad, Michael Kerrisk (man-pages), Dario Faggioli,
	Thomas Gleixner, Ingo Molnar, rostedt, Oleg Nesterov, fweisbec,
	darren, johan.eker, p.faure, Linux Kernel, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, Paul McKenney, insop.song, liming.wang, jkacur,
	linux-man

Il 10/04/2014 09:47, Juri Lelli ha scritto:
> Hi all,
>
> On Wed, 9 Apr 2014 17:42:04 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Wed, Apr 09, 2014 at 05:19:11PM +0200, Henrik Austad wrote:
>>>> 	The following "real-time" policies are also supported, for
>>> why the "'s?
>> I borrowed those from SCHED_SETSCHEDULER(2).
>>
>>>> 	sched_attr::sched_flags additional flags that can influence
>>>> 	scheduling behaviour. Currently as per Linux kernel 3.14:
>>>>
>>>> 		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
>>>> 		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
>>>> 		on fork().
>>>>
>>>> 	is the only supported flag.
>> ...
>>
>>>> 	The flags argument should be 0.
>>> What about SCHED_FLAG_RESET_ON_FORK?
>> Different flags. The one is sched_attr::flags the other is
>> sched_setattr(.flags).
>>
>>>> 	The other sched_attr fields are filled out as described in
>>>> 	sched_setattr().
>>>>
>>>>     Scheduling Policies
>>>>         The  scheduler  is  the  kernel  component  that decides which runnable
>>>>         process will be executed by the CPU next.  Each process has an  associ‐
>>>>         ated  scheduling  policy and a static scheduling priority, sched_prior‐
>>>>         ity; these are the settings that are modified by  sched_setscheduler().
>>>>         The  scheduler  makes it decisions based on knowledge of the scheduling
>>>>         policy and static priority of all processes on the system.
>>> Isn't this last sentence redundant/slightly repetitive?
>> I borrowed that from SCHED_SETSCHEDULER(2) again.
>>
>>>>      SCHED_DEADLINE: Sporadic task model deadline scheduling
>>>> 	SCHED_DEADLINE is an implementation of GEDF (Global Earliest
>>>> 	Deadline First) with additional CBS (Constant Bandwidth Server).
>>>> 	The CBS guarantees that tasks that over-run their specified
>>>> 	budget are throttled and do not affect the correct performance
>>>> 	of other SCHED_DEADLINE tasks.
>>>>
>>>> 	SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN
>>>>
>>>> 	Setting SCHED_DEADLINE can fail with -EINVAL when admission
>>>> 	control tests fail.
>>> Perhaps add a note about the deadline-class having higher priority than the
>>> other classes; i.e. if a deadline-task is runnable, it will preempt any
>>> other SCHED_(RR|FIFO) regardless of priority?
>> Yes, good point, will do.
>>
>>>>     SCHED_FIFO: First In-First Out scheduling
>>>>         SCHED_FIFO can only be used with static priorities higher than 0, which
>>>>         means that when a SCHED_FIFO processes becomes runnable, it will always
>>>>         immediately preempt any currently running SCHED_OTHER, SCHED_BATCH,  or
>>>>         SCHED_IDLE  process.  SCHED_FIFO is a simple scheduling algorithm with‐
>>>>         out time slicing.  For processes scheduled under the SCHED_FIFO policy,
>>>>         the following rules apply:
>>>>
>>>>         *  A  SCHED_FIFO  process that has been preempted by another process of
>>>>            higher priority will stay at the head of the list for  its  priority
>>>>            and  will resume execution as soon as all processes of higher prior‐
>>>>            ity are blocked again.
>>>>
>>>>         *  When a SCHED_FIFO process becomes runnable, it will be  inserted  at
>>>>            the end of the list for its priority.
>>>>
>>>>         *  A  call  to  sched_setscheduler()  or sched_setparam(2) will put the
>>>>            SCHED_FIFO (or SCHED_RR) process identified by pid at the  start  of
>>>>            the  list  if it was runnable.  As a consequence, it may preempt the
>>>>            currently  running  process   if   it   has   the   same   priority.
>>>>            (POSIX.1-2001 specifies that the process should go to the end of the
>>>>            list.)
>>>>
>>>>         *  A process calling sched_yield(2) will be put at the end of the list.
>>> How about the recent discussion regarding sched_yield(). Is this correct?
>>>
>>> lkml.kernel.org/r/alpine.DEB.2.02.1403312333100.14882@ionos.tec.linutronix.de
>>>
>>> Is this the correct place to add a note explaining the potential pitfalls
>>> of using sched_yield?
>> I'm not sure; there's a SCHED_YIELD(2) manpage to fill with that
>> nonsense.
>>
>> Also; I realized I have not described the DEADLINE sched_yield()
>> behaviour.
>>
> So, for SCHED_DEADLINE we currently have this behaviour:
>
> /*
>   * Yield task semantic for -deadline tasks is:
>   *
>   *   get off from the CPU until our next instance, with
>   *   a new runtime. This is of little use now, since we
>   *   don't have a bandwidth reclaiming mechanism. Anyway,
>   *   bandwidth reclaiming is planned for the future, and
>   *   yield_task_dl will indicate that some spare budget
>   *   is available for other task instances to use it.
>   */
>
> But, considering also the discussion above, I'm less sure now that's
> what we want. Still, I think we will want some way in the future to be
> able to say "I'm finished with my current job, give this remaining
> runtime to someone else", like another syscall or something.

Hi Juri, hi Peter,

my two cents:

A syscall to block the task until its next instance is definitely useful.
This way, a periodic task doesn't have to sleep anymore: the kernel 
takes care of unblocking the task at the right moment.
This would be easier (for user-level) and more efficient too.
I don't know if using sched_yield() to get this behavior is a good 
choice or not. You have way more experience than me :)

Best,

         Claudio

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sched_{set,get}attr() manpage
       [not found]                         ` <20140409154204.GD10526-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
  2014-04-10  7:47                           ` Juri Lelli
@ 2014-04-27 15:47                           ` Michael Kerrisk (man-pages)
       [not found]                             ` <CAKgNAki5BkOyckf1zxJCRs2tq-eG9bWW_yRGi3hDynz12wz+QQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 26+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-04-27 15:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Henrik Austad, Dario Faggioli, Thomas Gleixner, Ingo Molnar,
	rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
	Frédéric Weisbecker, darren-P76s1CtE8BHQT0dZR+AlfA,
	johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
	Linux Kernel, Claudio Scordino, Michael Trimarchi, Fabio Checconi,
	Tommaso Cucinotta, Juri Lelli,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	Dhaval Giani, hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	Insop Song, liming.wang-CWA4WttNNZF54TAoqtyWWQ,
	jkacur-H+wXaHxf7aLQT0dZR+AlfA, linux-man

Hi Peter,

Following the review comments that one or two people sent, are you
planning to send in a revised version of this page? Also, is there any
test code lying about somewhere that I could play with?

Thanks,

Michael


On Wed, Apr 9, 2014 at 5:42 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Apr 09, 2014 at 05:19:11PM +0200, Henrik Austad wrote:
>> >     The following "real-time" policies are also supported, for
>>
>> why the "'s?
>
> I borrowed those from SCHED_SETSCHEDULER(2).
>
>> >     sched_attr::sched_flags additional flags that can influence
>> >     scheduling behaviour. Currently as per Linux kernel 3.14:
>> >
>> >             SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
>> >             to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
>> >             on fork().
>> >
>> >     is the only supported flag.
>
> ...
>
>> >     The flags argument should be 0.
>>
>> What about SCHED_FLAG_RESET_ON_FORK?
>
> Different flags. The one is sched_attr::flags the other is
> sched_setattr(.flags).
>
>> >     The other sched_attr fields are filled out as described in
>> >     sched_setattr().
>> >
>> >    Scheduling Policies
>> >        The  scheduler  is  the  kernel  component  that decides which runnable
>> >        process will be executed by the CPU next.  Each process has an  associ‐
>> >        ated  scheduling  policy and a static scheduling priority, sched_prior‐
>> >        ity; these are the settings that are modified by  sched_setscheduler().
>> >        The  scheduler  makes it decisions based on knowledge of the scheduling
>> >        policy and static priority of all processes on the system.
>>
>> Isn't this last sentence redundant/slightly repetitive?
>
> I borrowed that from SCHED_SETSCHEDULER(2) again.
>
>> >     SCHED_DEADLINE: Sporadic task model deadline scheduling
>> >     SCHED_DEADLINE is an implementation of GEDF (Global Earliest
>> >     Deadline First) with additional CBS (Constant Bandwidth Server).
>> >     The CBS guarantees that tasks that over-run their specified
>> >     budget are throttled and do not affect the correct performance
>> >     of other SCHED_DEADLINE tasks.
>> >
>> >     SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN
>> >
>> >     Setting SCHED_DEADLINE can fail with -EINVAL when admission
>> >     control tests fail.
>>
>> Perhaps add a note about the deadline-class having higher priority than the
>> other classes; i.e. if a deadline-task is runnable, it will preempt any
>> other SCHED_(RR|FIFO) regardless of priority?
>
> Yes, good point, will do.
>
>> >    SCHED_FIFO: First In-First Out scheduling
>> >        SCHED_FIFO can only be used with static priorities higher than 0, which
>> >        means that when a SCHED_FIFO processes becomes runnable, it will always
>> >        immediately preempt any currently running SCHED_OTHER, SCHED_BATCH,  or
>> >        SCHED_IDLE  process.  SCHED_FIFO is a simple scheduling algorithm with‐
>> >        out time slicing.  For processes scheduled under the SCHED_FIFO policy,
>> >        the following rules apply:
>> >
>> >        *  A  SCHED_FIFO  process that has been preempted by another process of
>> >           higher priority will stay at the head of the list for  its  priority
>> >           and  will resume execution as soon as all processes of higher prior‐
>> >           ity are blocked again.
>> >
>> >        *  When a SCHED_FIFO process becomes runnable, it will be  inserted  at
>> >           the end of the list for its priority.
>> >
>> >        *  A  call  to  sched_setscheduler()  or sched_setparam(2) will put the
>> >           SCHED_FIFO (or SCHED_RR) process identified by pid at the  start  of
>> >           the  list  if it was runnable.  As a consequence, it may preempt the
>> >           currently  running  process   if   it   has   the   same   priority.
>> >           (POSIX.1-2001 specifies that the process should go to the end of the
>> >           list.)
>> >
>> >        *  A process calling sched_yield(2) will be put at the end of the list.
>>
>> How about the recent discussion regarding sched_yield(). Is this correct?
>>
>> lkml.kernel.org/r/alpine.DEB.2.02.1403312333100.14882@ionos.tec.linutronix.de
>>
>> Is this the correct place to add a note explaining the potential pitfalls
>> of using sched_yield?
>
> I'm not sure; there's a SCHED_YIELD(2) manpage to fill with that
> nonsense.
>
> Also; I realized I have not described the DEADLINE sched_yield()
> behaviour.
>
>> >        No other events will move a process scheduled under the SCHED_FIFO pol‐
>> >        icy in the wait list of runnable processes with equal static priority.
>> >
>> >        A SCHED_FIFO process runs until either it is blocked by an I/O request,
>> >        it  is  preempted  by  a  higher  priority   process,   or   it   calls
>> >        sched_yield(2).
>> >
>> >    SCHED_RR: Round Robin scheduling
>> >        SCHED_RR  is  a simple enhancement of SCHED_FIFO.  Everything described
>> >        above for SCHED_FIFO also applies to SCHED_RR, except that each process
>> >        is  only  allowed  to  run  for  a maximum time quantum.  If a SCHED_RR
>> >        process has been running for a time period equal to or longer than  the
>> >        time  quantum,  it will be put at the end of the list for its priority.
>> >        A SCHED_RR process that has been preempted by a higher priority process
>> >        and  subsequently  resumes execution as a running process will complete
>> >        the unexpired portion of its round robin time quantum.  The  length  of
>> >        the time quantum can be retrieved using sched_rr_get_interval(2).
>>
>> -> Default is 0.1HZ ms
>>
>> This is a question I get from time to time; having this in the manpage
>> would be helpful.
>
> Again, brazenly stolen from SCHED_SETSCHEDULER(2); but yes. Also I'm not
> sure I'd call RR an enhancement of anything much at all ;-)
>
>> > ERRORS
>> >        EINVAL The scheduling policy is not one  of  the  recognized  policies,
>> >               param is NULL, or param does not make sense for the policy.
>> >
>> >        EPERM  The calling process does not have appropriate privileges.
>> >
>> >        ESRCH  The process whose ID is pid could not be found.
>> >
>> >        E2BIG  The provided storage for struct sched_attr is either too
>> >               big, see sched_setattr(), or too small, see sched_getattr().
>>
>> Where's the EBUSY? It can throw this from __sched_setscheduler() when it
>> checks if there's enough bandwidth to run the task.
>
> Uhhm.. it got lost :-) /me quickly adds.



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sched_{set,get}attr() manpage
       [not found]                             ` <CAKgNAki5BkOyckf1zxJCRs2tq-eG9bWW_yRGi3hDynz12wz+QQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-04-27 19:34                               ` Peter Zijlstra
  2014-04-27 19:45                                 ` Steven Rostedt
       [not found]                                 ` <20140427193449.GB17778-RM5+C6weyIYnLiPH7yDmwOa11wxjtiyuLtmvbW2Dspo@public.gmane.org>
  0 siblings, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2014-04-27 19:34 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Henrik Austad, Dario Faggioli, Thomas Gleixner, Ingo Molnar,
	rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
	Frédéric Weisbecker, darren-P76s1CtE8BHQT0dZR+AlfA,
	johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
	Linux Kernel, Claudio Scordino, Michael Trimarchi, Fabio Checconi,
	Tommaso Cucinotta, Juri Lelli,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	Dhaval Giani, hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	Insop Song, liming.wang-CWA4WttNNZF54TAoqtyWWQ,
	jkacur-H+wXaHxf7aLQT0dZR+AlfA, linux-man

On Sun, Apr 27, 2014 at 05:47:25PM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Peter,
> 
> Following the review comments that one or two people sent, are you
> planning to send in a revised version of this page?

Yes, I just suck at getting around to it :-(, I'll do it first thing
tomorrow.

> Also, is there any test code lying about somewhere that I could play with?

Juri?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sched_{set,get}attr() manpage
  2014-04-27 19:34                               ` Peter Zijlstra
@ 2014-04-27 19:45                                 ` Steven Rostedt
       [not found]                                 ` <20140427193449.GB17778-RM5+C6weyIYnLiPH7yDmwOa11wxjtiyuLtmvbW2Dspo@public.gmane.org>
  1 sibling, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2014-04-27 19:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michael Kerrisk (man-pages), Henrik Austad, Dario Faggioli,
	Thomas Gleixner, Ingo Molnar, Oleg Nesterov,
	Frédéric Weisbecker, darren, johan.eker, p.faure,
	Linux Kernel, Claudio Scordino, Michael Trimarchi, Fabio Checconi,
	Tommaso Cucinotta, Juri Lelli, nicola.manica, luca.abeni,
	Dhaval Giani, hgu1972, Paul McKenney, Insop Song, liming.wang,
	jkacur, linux-man

On Sun, 27 Apr 2014 21:34:49 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> > Also, is there any test code lying about somewhere that I could play with?

I have a deadline program you can play with too:

http://rostedt.homelinux.com/private/deadline.c

-- Steve

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sched_{set,get}attr() manpage
       [not found]                                 ` <20140427193449.GB17778-RM5+C6weyIYnLiPH7yDmwOa11wxjtiyuLtmvbW2Dspo@public.gmane.org>
@ 2014-04-28  7:39                                   ` Juri Lelli
  0 siblings, 0 replies; 26+ messages in thread
From: Juri Lelli @ 2014-04-28  7:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michael Kerrisk (man-pages), Henrik Austad, Dario Faggioli,
	Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w,
	Oleg Nesterov, Frédéric Weisbecker,
	darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w,
	p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel, Claudio Scordino,
	Michael Trimarchi, Fabio Checconi, Tommaso Cucinotta,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	Dhaval Giani, hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	Insop Song, liming.wang-CWA4WttNNZF54TAoqtyWWQ,
	jkacur-H+wXaHxf7aLQT0dZR+AlfA, linux-man

On Sun, 27 Apr 2014 21:34:49 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Sun, Apr 27, 2014 at 05:47:25PM +0200, Michael Kerrisk (man-pages) wrote:
> > Hi Peter,
> > 
> > Following the review comments that one or two people sent, are you
> > planning to send in a revised version of this page?
> 
> Yes, I just suck at getting around to it :-(, I'll do it first thing
> tomorrow.
> 
> > Also, is there any test code lying about somewhere that I could play with?
> 
> Juri?

Yes. I use these two tools:

- rt-app (creates periodic workloads, including non-RT/DL ones)
  https://github.com/gbagnoli/rt-app

- schedtool-dl (patched version of schedtool)
  https://github.com/jlelli/schedtool-dl

Both are aligned with the latest interface.
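
In case it helps while neither tool is at hand: a bare-bones C sketch
that invokes the two syscalls through syscall(2). It assumes the libc
provides no wrappers and that the kernel headers define
SYS_sched_setattr and SYS_sched_getattr. The struct is called
dl_sched_attr (a name made up here) to avoid clashing with any
definition a libc header may provide; its layout follows the SYNOPSIS
posted in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Same layout as struct sched_attr in the SYNOPSIS, under a private name. */
struct dl_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;		/* SCHED_NORMAL, SCHED_BATCH */
	uint32_t sched_priority;	/* SCHED_FIFO, SCHED_RR */
	uint64_t sched_runtime;		/* SCHED_DEADLINE */
	uint64_t sched_deadline;
	uint64_t sched_period;
};

/* Catch accidental padding: the fields above add up to 48 bytes. */
static_assert(sizeof(struct dl_sched_attr) == 48,
	      "unexpected dl_sched_attr layout");

/* Hypothetical thin wrapper; returns 0 on success, -1 with errno set. */
int dl_sched_setattr(pid_t pid, const struct dl_sched_attr *attr,
		     unsigned int flags)
{
	return (int)syscall(SYS_sched_setattr, pid, attr, flags);
}

/* Zeroes *attr, then asks the kernel to fill it in for pid (0 = caller). */
int dl_sched_getattr(pid_t pid, struct dl_sched_attr *attr,
		     unsigned int flags)
{
	memset(attr, 0, sizeof(*attr));
	return (int)syscall(SYS_sched_getattr, pid, attr,
			    (unsigned int)sizeof(*attr), flags);
}
```

Reading back the caller's own attributes with dl_sched_getattr(0, &attr, 0)
needs no privileges. Switching to SCHED_DEADLINE means filling in size,
sched_policy and the three time fields before calling
dl_sched_setattr(0, &attr, 0), and that can be expected to fail with
EPERM for an unprivileged caller.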

Best,

- Juri

^ permalink raw reply	[flat|nested] 26+ messages in thread

* sched_{set,get}attr() manpage
       [not found]             ` <53020C9D.1050208-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2014-04-09  9:25               ` sched_{set,get}attr() manpage Peter Zijlstra
@ 2014-04-28  8:18               ` Peter Zijlstra
       [not found]                 ` <20140428081858.GX13658-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-04-28  8:18 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Dario Faggioli, Thomas Gleixner, Ingo Molnar,
	rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA,
	johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
	Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
	juri.lelli-Re5JQEeQqe8AvxtiuMwx3w,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w,
	hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	insop.song-Re5JQEeQqe8AvxtiuMwx3w,
	liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
	linux-man-u79uwXL29TY76Z2rM5mHXA

Hi Michael,

find below an updated manpage, I did not apply the comments on parts
that are identical to SCHED_SETSCHEDULER(2) in order to keep these texts
in alignment. I feel that if we change one we should also change the
other, and such a 'patch' is best done separate from the new manpage
itself.

I did add the missing EBUSY error, and amended the text where it said
we'd return EINVAL in that case.

I added a paragraph stating that SCHED_DEADLINE preempted anything else
userspace can do (with the explicit mention of userspace to leave me
wriggle room for the kernel's stop task :-).

I also did a short paragraph on the deadline sched_yield(). For further
deadline yield details we should maybe add to the SCHED_YIELD(2)
manpage.

Re juri/claudio; no I think sched_yield() as implemented for deadline
makes sense, no other yield semantics other than NOP makes sense for it,
and since we have the syscall already might as well make it do something
useful.


---

NAME
	sched_setattr, sched_getattr - set and get scheduling policy/attributes

SYNOPSIS
	#include <sched.h>

	struct sched_attr {
		u32 size;
		u32 sched_policy;
		u64 sched_flags;

		/* SCHED_NORMAL, SCHED_BATCH */
		s32 sched_nice;
		/* SCHED_FIFO, SCHED_RR */
		u32 sched_priority;
		/* SCHED_DEADLINE */
		u64 sched_runtime;
		u64 sched_deadline;
		u64 sched_period;
	};
	int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);

	int sched_getattr(pid_t pid, struct sched_attr *attr, unsigned int size, unsigned int flags);

DESCRIPTION
	sched_setattr() sets both the scheduling policy and the
	associated attributes for the process whose ID is specified in
	pid.  If pid equals zero, the scheduling policy and attributes
	of the calling process will be set.  The interpretation of the
	argument attr depends on the selected policy.  Currently, Linux
	supports the following "normal" (i.e., non-real-time) scheduling
	policies:

	SCHED_OTHER	the standard "fair" time-sharing policy;

	SCHED_BATCH	for "batch" style execution of processes; and

	SCHED_IDLE	for running very low priority background jobs.

	The following "real-time" policies are also supported, for
	special time-critical applications that need precise control
	over the way in which runnable processes are selected for
	execution:

	SCHED_FIFO	a first-in, first-out policy;

	SCHED_RR	a round-robin policy; and

	SCHED_DEADLINE	a deadline policy.

	The semantics of each of these policies are detailed below.

	sched_attr::size must be set to the size of the structure, as in
	sizeof(struct sched_attr). If the provided structure is smaller
	than the kernel structure, any additional fields are assumed to
	be '0'. If the provided structure is larger than the kernel
	structure, the kernel verifies that all additional fields are
	'0'; if not, the syscall will fail with -E2BIG.
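
	The size rule above can be illustrated with a small user-space
	model. This is only a sketch of the behaviour as described in
	this page, not the kernel's code; model_copy_attr() and its
	parameters are hypothetical names, and 'ksize' merely stands in
	for the kernel's idea of sizeof(struct sched_attr).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * User-space model of the size handling described in the text.
 * kbuf/ksize: the "kernel" copy and its size (hypothetical).
 * ubuf/usize: the caller's structure and the size it declared.
 * Returns 0 on success or -E2BIG if the caller set a field the
 * "kernel" does not know about.
 */
int model_copy_attr(void *kbuf, size_t ksize, const void *ubuf, size_t usize)
{
	const unsigned char *u = ubuf;
	size_t i;

	assert(kbuf != NULL && ubuf != NULL);

	/* Start from zeroes so that fields the caller omitted read as 0. */
	memset(kbuf, 0, ksize);

	/* Caller's structure is larger: the unknown tail must be all zero. */
	for (i = ksize; i < usize; i++)
		if (u[i] != 0)
			return -E2BIG;

	/* Copy only the part both sides know about. */
	memcpy(kbuf, ubuf, usize < ksize ? usize : ksize);
	return 0;
}
```

	With this scheme an old binary keeps working on a newer kernel
	(missing fields default to 0), and a new binary works on an older
	kernel as long as it leaves the newer fields at 0.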

	sched_attr::sched_policy is the desired scheduling policy.

	sched_attr::sched_flags holds additional flags that can influence
	scheduling behaviour. As of Linux kernel 3.14, the only
	supported flag is:

		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
		on fork().

	sched_attr::sched_nice should only be set for SCHED_OTHER and
	SCHED_BATCH; it is the desired nice value [-20,19], see NICE(2).

	sched_attr::sched_priority should only be set for SCHED_FIFO and
	SCHED_RR; it is the desired static priority [1,99].

	sched_attr::sched_runtime
	sched_attr::sched_deadline
	sched_attr::sched_period should only be set for SCHED_DEADLINE
	and are the traditional sporadic task model parameters.
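
	As a rough illustration of how the three SCHED_DEADLINE values
	relate, the sketch below assumes the usual reading of the
	sporadic task model: in every period the task may run for up to
	sched_runtime, and that runtime must be delivered within
	sched_deadline of the start of the period, so that
	runtime <= deadline <= period. That ordering, the treatment of a
	zero period as "same as the deadline", and both helper names are
	assumptions made for this example; they are not stated in this
	page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All values are in nanoseconds. Hypothetical helper, not a kernel API. */
bool dl_params_plausible(uint64_t runtime, uint64_t deadline, uint64_t period)
{
	/* Assumption: a period of 0 means "use the deadline as the period". */
	if (period == 0)
		period = deadline;

	return runtime > 0 && runtime <= deadline && deadline <= period;
}

/*
 * Share of one CPU requested by the task, in parts per million.
 * The multiplication can overflow for runtimes of several hours;
 * that is acceptable for a sketch but not for production code.
 */
uint64_t dl_bandwidth_ppm(uint64_t runtime, uint64_t period)
{
	assert(period > 0);
	return runtime * 1000000ULL / period;
}
```

	For example, a task asking for 10 ms of runtime every 100 ms
	with a 30 ms deadline passes the check and requests 10% of one
	CPU (100000 ppm).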

	The flags argument should be 0.
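
	A minimal sketch of calling sched_setattr() from C follows. It
	assumes the C library provides no wrapper, so the call goes
	through syscall(2); the structure is re-declared locally under a
	different name (demo_sched_attr) so that it cannot clash with a
	definition a libc header might supply. The demo_* names and the
	DEMO_SCHED_DEADLINE macro are made up for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for syscall() */
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_SCHED_DEADLINE 6	/* numeric value of SCHED_DEADLINE */

/* Local mirror of the structure shown in the SYNOPSIS. */
struct demo_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

/* Build an attribute block for SCHED_DEADLINE; times in nanoseconds. */
struct demo_sched_attr demo_deadline_attr(uint64_t runtime_ns,
					  uint64_t deadline_ns,
					  uint64_t period_ns)
{
	struct demo_sched_attr attr;

	/* Zero everything first: unused fields must be 0. */
	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_policy = DEMO_SCHED_DEADLINE;
	attr.sched_runtime = runtime_ns;
	attr.sched_deadline = deadline_ns;
	attr.sched_period = period_ns;
	return attr;
}

/* Thin wrapper around the raw system call; returns 0 or -1 with errno set. */
int demo_sched_setattr(pid_t pid, const struct demo_sched_attr *attr,
		       unsigned int flags)
{
	assert(attr != NULL);
	return (int)syscall(SYS_sched_setattr, pid, attr, flags);
}
```

	A caller would build the block with, say,
	demo_deadline_attr(10000000, 30000000, 100000000) and pass it to
	demo_sched_setattr(0, &attr, 0). Expect the call to need
	suitable privileges, and to be able to fail with EPERM or EBUSY
	as listed under ERRORS.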

	sched_getattr() queries the scheduling policy currently applied
	to the process identified by pid.  If pid equals zero, the
	policy of the calling process will be retrieved.

	The size argument should reflect the size of struct sched_attr
	as known to userspace. The kernel fills out sched_attr::size to
	the size of its sched_attr structure. If the user provided
	structure is larger, additional fields are not touched. If the
	user provided structure is smaller, but the kernel needs to
	return values outside the provided space, the syscall will fail
	with -E2BIG.

	The flags argument should be 0.

	The other sched_attr fields are filled out as described in
	sched_setattr().
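
	Reading the attributes back can be sketched the same way. As
	before, this assumes the raw system call is used through
	syscall(2), and the demo_* names are hypothetical; the local
	structure mirrors the one in the SYNOPSIS.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for syscall() */
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Local mirror of the structure shown in the SYNOPSIS. */
struct demo_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

/*
 * Query the attributes of 'pid' (0 means the caller).
 * The size passed to the kernel is the size of the structure as
 * this program knows it. Returns 0 or -1 with errno set.
 */
int demo_sched_getattr(pid_t pid, struct demo_sched_attr *attr,
		       unsigned int flags)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	return (int)syscall(SYS_sched_getattr, pid, attr,
			    (unsigned int)sizeof(*attr), flags);
}
```

	On success, attr->sched_policy holds the current policy and the
	policy-specific fields are filled in accordingly.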

   Scheduling Policies
       The  scheduler  is  the  kernel  component  that decides which runnable
       process will be executed by the CPU next.  Each process has an  associ‐
       ated  scheduling  policy and a static scheduling priority, sched_prior‐
       ity; these are the settings that are modified by  sched_setscheduler().
       The  scheduler  makes its decisions based on knowledge of the scheduling
       policy and static priority of all processes on the system.

       For processes scheduled under one of  the  normal  scheduling  policies
       (SCHED_OTHER,  SCHED_IDLE,  SCHED_BATCH), sched_priority is not used in
       scheduling decisions (it must be specified as 0).

       Processes scheduled under one of the  real-time  policies  (SCHED_FIFO,
       SCHED_RR)  have  a  sched_priority  value  in  the  range 1 (low) to 99
       (high).  (As the numbers imply, real-time processes always have  higher
       priority than normal processes.)  Note well: POSIX.1-2001 only requires
       an implementation to support a minimum 32 distinct priority levels  for
       the  real-time  policies,  and  some  systems supply just this minimum.
       Portable   programs   should    use    sched_get_priority_min(2)    and
       sched_get_priority_max(2) to find the range of priorities supported for
       a particular policy.
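
       The advice about portable programs can be followed with a small
       helper such as the one below. It is only a sketch; the function
       name clamp_rt_priority() is made up for this example.

```c
#include <assert.h>
#include <sched.h>

/*
 * Clamp 'wanted' into the priority range the running system
 * reports for 'policy'. Returns -1 if the range cannot be queried.
 */
int clamp_rt_priority(int policy, int wanted)
{
	int lo = sched_get_priority_min(policy);
	int hi = sched_get_priority_max(policy);

	if (lo == -1 || hi == -1)
		return -1;
	assert(lo <= hi);

	if (wanted < lo)
		return lo;
	if (wanted > hi)
		return hi;
	return wanted;
}
```

       The result can then be used as the sched_priority value for
       SCHED_FIFO or SCHED_RR instead of a hard-coded number.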

       Conceptually, the scheduler maintains a list of runnable processes  for
       each  possible  sched_priority  value.   In  order  to  determine which
       process runs next, the scheduler looks for the nonempty list  with  the
       highest  static  priority  and  selects the process at the head of this
       list.

       A process's scheduling policy determines where it will be inserted into
       the  list  of processes with equal static priority and how it will move
       inside this list.

       All scheduling is preemptive: if a process with a higher static  prior‐
       ity  becomes  ready  to run, the currently running process will be pre‐
       empted and returned to the wait list for  its  static  priority  level.
       The  scheduling  policy only determines the ordering within the list of
       runnable processes with equal static priority.

    SCHED_DEADLINE: Sporadic task model deadline scheduling
       SCHED_DEADLINE is an implementation of GEDF (Global Earliest
       Deadline First) with additional CBS (Constant Bandwidth Server).
       The CBS guarantees that tasks that over-run their specified
       budget are throttled and do not affect the correct performance
       of other SCHED_DEADLINE tasks.

       A SCHED_DEADLINE task calling FORK(2) will fail with -EAGAIN.

       Setting SCHED_DEADLINE can fail with -EBUSY when admission
       control tests fail.
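
       This page does not spell out how the admission control test is
       computed. The sketch below shows one plausible form, a
       utilization-sum test of the kind commonly described for global
       EDF: the sum of runtime/period over all deadline tasks must not
       exceed the number of CPUs times a bandwidth cap. The formula,
       the cap value used in the example (95%) and all names are
       assumptions for illustration, not a description of the kernel's
       actual test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One deadline task, times in nanoseconds (hypothetical type). */
struct dl_task {
	uint64_t runtime_ns;
	uint64_t period_ns;
};

/*
 * Returns true if sum(runtime/period) <= ncpus * cap.
 * Utilization and cap are expressed in parts per million.
 */
bool dl_admit(const struct dl_task *tasks, size_t n,
	      unsigned int ncpus, uint64_t cap_ppm)
{
	uint64_t total_ppm = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(tasks[i].period_ns > 0);
		total_ppm += tasks[i].runtime_ns * 1000000ULL /
			     tasks[i].period_ns;
	}
	return total_ppm <= (uint64_t)ncpus * cap_ppm;
}
```

       Under these assumptions, tasks using 10%, 50% and 40% of a CPU
       would be refused together on one CPU with a 95% cap, but
       accepted on two CPUs.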

       Because of the nature of (G)EDF, SCHED_DEADLINE tasks are the
       highest priority (user controllable) tasks in the system; if any
       SCHED_DEADLINE task is runnable it will preempt any
       FIFO/RR/OTHER/BATCH/IDLE task out there.

       A SCHED_DEADLINE task calling sched_yield() will 'yield' the
       current job and wait for a new period to begin.
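
       The yield behaviour suggests a simple structure for a periodic
       task: do one job, then call sched_yield() to wait for the next
       period. The loop below is a sketch of that pattern; run_jobs(),
       job_fn and count_job() are hypothetical names, and the code does
       not itself switch the thread to SCHED_DEADLINE.

```c
#include <assert.h>
#include <sched.h>

typedef void (*job_fn)(void *arg);

/* Example job: increments the int that 'arg' points to. */
void count_job(void *arg)
{
	++*(int *)arg;
}

/*
 * Run 'njobs' jobs, yielding after each one.
 * Under SCHED_DEADLINE the yield ends the current job and the
 * thread waits for its next period; under other policies it is
 * an ordinary sched_yield(). Returns the number of jobs run.
 */
int run_jobs(int njobs, job_fn job, void *arg)
{
	int done = 0;

	assert(job != NULL);
	while (done < njobs) {
		job(arg);
		done++;
		if (sched_yield() != 0)
			break;
	}
	return done;
}
```

       A job that finishes early therefore hands back the rest of its
       budget instead of spinning until the period ends.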

   SCHED_FIFO: First In-First Out scheduling
       SCHED_FIFO can only be used with static priorities higher than 0, which
       means that when a SCHED_FIFO process becomes runnable, it will always
       immediately preempt any currently running SCHED_OTHER, SCHED_BATCH,  or
       SCHED_IDLE  process.  SCHED_FIFO is a simple scheduling algorithm with‐
       out time slicing.  For processes scheduled under the SCHED_FIFO policy,
       the following rules apply:

       *  A  SCHED_FIFO  process that has been preempted by another process of
          higher priority will stay at the head of the list for  its  priority
          and  will resume execution as soon as all processes of higher prior‐
          ity are blocked again.

       *  When a SCHED_FIFO process becomes runnable, it will be  inserted  at
          the end of the list for its priority.

       *  A  call  to  sched_setscheduler()  or sched_setparam(2) will put the
          SCHED_FIFO (or SCHED_RR) process identified by pid at the  start  of
          the  list  if it was runnable.  As a consequence, it may preempt the
          currently  running  process   if   it   has   the   same   priority.
          (POSIX.1-2001 specifies that the process should go to the end of the
          list.)

       *  A process calling sched_yield(2) will be put at the end of the list.

       No other events will move a process scheduled under the SCHED_FIFO pol‐
       icy in the wait list of runnable processes with equal static priority.

       A SCHED_FIFO process runs until either it is blocked by an I/O request,
       it  is  preempted  by  a  higher  priority   process,   or   it   calls
       sched_yield(2).

   SCHED_RR: Round Robin scheduling
       SCHED_RR  is  a simple enhancement of SCHED_FIFO.  Everything described
       above for SCHED_FIFO also applies to SCHED_RR, except that each process
       is  only  allowed  to  run  for  a maximum time quantum.  If a SCHED_RR
       process has been running for a time period equal to or longer than  the
       time  quantum,  it will be put at the end of the list for its priority.
       A SCHED_RR process that has been preempted by a higher priority process
       and  subsequently  resumes execution as a running process will complete
       the unexpired portion of its round robin time quantum.  The  length  of
       the time quantum can be retrieved using sched_rr_get_interval(2).

   SCHED_OTHER: Default Linux time-sharing scheduling
       SCHED_OTHER  can only be used at static priority 0.  SCHED_OTHER is the
       standard Linux time-sharing scheduler that is  intended  for  all  pro‐
       cesses  that  do  not  require  the  special real-time mechanisms.  The
       process to run is chosen from the static priority 0  list  based  on  a
       dynamic priority that is determined only inside this list.  The dynamic
       priority is based on the nice value (set by nice(2) or  setpriority(2))
       and  increased  for  each time quantum the process is ready to run, but
       denied to run by the scheduler.  This ensures fair progress  among  all
       SCHED_OTHER processes.

   SCHED_BATCH: Scheduling batch processes
       (Since  Linux 2.6.16.)  SCHED_BATCH can only be used at static priority
       0.  This policy is similar to SCHED_OTHER  in  that  it  schedules  the
       process  according  to  its dynamic priority (based on the nice value).
       The difference is that this policy will cause the scheduler  to  always
       assume  that the process is CPU-intensive.  Consequently, the scheduler
       will apply a small scheduling penalty with respect to wakeup behaviour,
       so that this process is mildly disfavored in scheduling decisions.

       This policy is useful for workloads that are noninteractive, but do not
       want to lower their nice value, and for workloads that want a determin‐
       istic scheduling policy without interactivity causing extra preemptions
       (between the workload's tasks).

   SCHED_IDLE: Scheduling very low priority jobs
       (Since Linux 2.6.23.)  SCHED_IDLE can only be used at  static  priority
       0; the process nice value has no influence for this policy.

       This  policy  is  intended  for  running jobs at extremely low priority
       (lower even than a +19 nice value with the SCHED_OTHER  or  SCHED_BATCH
       policies).

RETURN VALUE
	On success, sched_setattr() and sched_getattr() return 0. On
	error, -1 is returned, and errno is set appropriately.

ERRORS
       EINVAL The scheduling policy is not one  of  the  recognized  policies,
              attr is NULL, or attr does not make sense for the policy.

       EPERM  The calling process does not have appropriate privileges.

       ESRCH  The process whose ID is pid could not be found.

       E2BIG  The provided storage for struct sched_attr is either too
              big, see sched_setattr(), or too small, see sched_getattr().

       EBUSY  SCHED_DEADLINE admission control failure.

NOTES
	While the text above (and in SCHED_SETSCHEDULER(2)) talks about
	processes, in actual fact these system calls are thread specific.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sched_{set,get}attr() manpage
       [not found]                 ` <20140428081858.GX13658-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
@ 2014-04-29 13:08                   ` Michael Kerrisk (man-pages)
       [not found]                     ` <535FA467.2070403-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2014-04-29 16:04                     ` Peter Zijlstra
  0 siblings, 2 replies; 26+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-04-29 13:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Dario Faggioli,
	Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w,
	Oleg Nesterov, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w,
	p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel,
	claudio-YOzL5CV4y4YG1A2ADO40+w,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
	juri.lelli-Re5JQEeQqe8AvxtiuMwx3w,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w,
	hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	insop.song-Re5JQEeQqe8AvxtiuMwx3w,
	liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
	linux-man-u79uwXL29TY76Z2rM5mHXA

Hi Peter,

On 04/28/2014 10:18 AM, Peter Zijlstra wrote:
> Hi Michael,
> 
> find below an updated manpage, I did not apply the comments on parts
> that are identical to SCHED_SETSCHEDULER(2) in order to keep these texts
> in alignment. I feel that if we change one we should also change the
> other, and such a 'patch' is best done separate from the new manpage
> itself.
> 
> I did add the missing EBUSY error, and amended the text where it said
> we'd return EINVAL in that case.
> 
> I added a paragraph stating that SCHED_DEADLINE preempted anything else
> userspace can do (with the explicit mention of userspace to leave me
> wriggle room for the kernel's stop task :-).
> 
> I also did a short paragraph on the deadline sched_yield(). For further
> deadline yield details we should maybe add to the SCHED_YIELD(2)
> manpage.
> 
> Re juri/claudio; no I think sched_yield() as implemented for deadline
> makes sense, no other yield semantics other than NOP makes sense for it,
> and since we have the syscall already might as well make it do something
> useful.

Thanks for the updated page. Would you be willing
to revise as per the comments below?


> NAME
> 	sched_setattr, sched_getattr - set and get scheduling policy/attributes
> 
> SYNOPSIS
> 	#include <sched.h>
> 
> 	struct sched_attr {
> 		u32 size;
> 		u32 sched_policy;
> 		u64 sched_flags;
> 
> 		/* SCHED_NORMAL, SCHED_BATCH */
> 		s32 sched_nice;
> 		/* SCHED_FIFO, SCHED_RR */
> 		u32 sched_priority;
> 		/* SCHED_DEADLINE */
> 		u64 sched_runtime;
> 		u64 sched_deadline;
> 		u64 sched_period;
> 	};
> 	int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);
> 
> 	int sched_getattr(pid_t pid, const struct sched_attr *attr, unsigned int size, unsigned int flags);
> 
> DESCRIPTION
> 	sched_setattr() sets both the scheduling policy and the
> 	associated attributes for the process whose ID is specified in
> 	pid.  

Around about here, I think there needs to be a sentence explaining
that sched_setattr() provides a superset of the functionality of
sched_setscheduler(2) and setpriority(2). I mean, it can do all that
those two calls can do, right?

> If pid equals zero, the scheduling policy and attributes
> 	of the calling process will be set.  The interpretation of the
> 	argument attr depends on the selected policy.  Currently, Linux
> 	supports the following "normal" (i.e., non-real-time) scheduling
> 	policies:
> 
> 	SCHED_OTHER	the standard "fair" time-sharing policy;
> 
> 	SCHED_BATCH	for "batch" style execution of processes; and
> 
> 	SCHED_IDLE	for running very low priority background jobs.
> 
> 	The following "real-time" policies are also supported, for
> 	special time-critical applications that need precise control
> 	over the way in which runnable processes are selected for
> 	execution:
> 
> 	SCHED_FIFO	a first-in, first-out policy;
> 
> 	SCHED_RR	a round-robin policy; and
> 
> 	SCHED_DEADLINE	a deadline policy.
> 
> 	The semantics of each of these policies are detailed below.

The semantics of each of these policies are detailed in sched(7).

[See my comments below]

> 
> 	sched_attr::size must be set to the size of the structure, as in
> 	sizeof(struct sched_attr), if the provided structure is smaller
> 	than the kernel structure, any additional fields are assumed
> 	'0'. If the provided structure is larger than the kernel
> 	structure, the kernel verifies all additional fields are '0' if
> 	not the syscall will fail with -E2BIG.
> 
> 	sched_attr::sched_policy the desired scheduling policy.
> 
> 	sched_attr::sched_flags additional flags that can influence
> 	scheduling behaviour. Currently as per Linux kernel 3.14:
> 
> 		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
> 		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
> 		on fork().
> 
> 	is the only supported flag.
> 
> 	sched_attr::sched_nice should only be set for SCHED_OTHER,
> 	SCHED_BATCH, the desired nice value [-20,19], see NICE(2).
> 
> 	sched_attr::sched_priority should only be set for SCHED_FIFO,
> 	SCHED_RR, the desired static priority [1,99].
> 
> 	sched_attr::sched_runtime
> 	sched_attr::sched_deadline
> 	sched_attr::sched_period should only be set for SCHED_DEADLINE
> 	and are the traditional sporadic task model parameters.

Could you add (a lot ;-)) more detail on these three fields? Assume the
reader does not know about this traditional sporadic task model, and 
then give some explanation of what these three fields do. Probably, at
this point you can work in some statement  about the admission control
test.

[But see my comment below. It may be that sched(7) is a better
place for this detail.]

> 	The flags argument should be 0.
> 
> 	sched_getattr() queries the scheduling policy currently applied
> 	to the process identified by pid.  If pid equals zero, the
> 	policy of the calling process will be retrieved.
> 
> 	The size argument should reflect the size of struct sched_attr
> 	as known to userspace. The kernel fills out sched_attr::size to
> 	the size of its sched_attr structure. If the user provided
> 	structure is larger, additional fields are not touched. If the
> 	user provided structure is smaller, but the kernel needs to
> 	return values outside the provided space, the syscall will fail
> 	with -E2BIG.
> 
> 	The flags argument should be 0.
> 
> 	The other sched_attr fields are filled out as described in
> 	sched_setattr().

I assume that everything between my [[[ and ]]] blocks below is taken straight 
from sched_setscheduler(2). (If that is not true, please let me know.)
This reminds me that there is a structural fault in this part of man-pages ;-).
The problem is that sched_setscheduler(2) currently tries to do two things:

[a] Document the sched_setscheduler() and sched_getscheduler() system calls
[b] Provide an overview of scheduling policies and parameters.

It should really only do the former. I have now gone through the task of
separating [b] out into a separate page, sched(7), which other pages,
such as sched_setscheduler(2) and sched_setattr(2) can refer to. You
can see the current versions of sched_setscheduler.2 and sched.7 in Git
(https://www.kernel.org/doc/man-pages/download.html )

So, what I would ideally like to see is:

[1] A page describing the sched_setattr() and sched_getattr() APIs
[2] A piece of text describing the SCHED_DEADLINE policy, which I can
drop into sched(7).

Could you revise like that?

[[[[
>    Scheduling Policies
>        The  scheduler  is  the  kernel  component  that decides which runnable
>        process will be executed by the CPU next.  Each process has an  associ‐
>        ated  scheduling  policy and a static scheduling priority, sched_prior‐
>        ity; these are the settings that are modified by  sched_setscheduler().
>        The  scheduler  makes it decisions based on knowledge of the scheduling
>        policy and static priority of all processes on the system.
> 
>        For processes scheduled under one of  the  normal  scheduling  policies
>        (SCHED_OTHER,  SCHED_IDLE,  SCHED_BATCH), sched_priority is not used in
>        scheduling decisions (it must be specified as 0).
> 
>        Processes scheduled under one of the  real-time  policies  (SCHED_FIFO,
>        SCHED_RR)  have  a  sched_priority  value  in  the  range 1 (low) to 99
>        (high).  (As the numbers imply, real-time processes always have  higher
>        priority than normal processes.)  Note well: POSIX.1-2001 only requires
>        an implementation to support a minimum 32 distinct priority levels  for
>        the  real-time  policies,  and  some  systems supply just this minimum.
>        Portable   programs   should    use    sched_get_priority_min(2)    and
>        sched_get_priority_max(2) to find the range of priorities supported for
>        a particular policy.
> 
>        Conceptually, the scheduler maintains a list of runnable processes  for
>        each  possible  sched_priority  value.   In  order  to  determine which
>        process runs next, the scheduler looks for the nonempty list  with  the
>        highest  static  priority  and  selects the process at the head of this
>        list.
> 
>        A process's scheduling policy determines where it will be inserted into
>        the  list  of processes with equal static priority and how it will move
>        inside this list.
> 
>        All scheduling is preemptive: if a process with a higher static  prior‐
>        ity  becomes  ready  to run, the currently running process will be pre‐
>        empted and returned to the wait list for  its  static  priority  level.
>        The  scheduling  policy only determines the ordering within the list of
>        runnable processes with equal static priority.
]]]]

>     SCHED_DEADLINE: Sporadic task model deadline scheduling
>        SCHED_DEADLINE is an implementation of GEDF (Global Earliest
>        Deadline First) with additional CBS (Constant Bandwidth Server).
>        The CBS guarantees that tasks that over-run their specified
>        budget are throttled and do not affect the correct performance
>        of other SCHED_DEADLINE tasks.
> 
>        SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN
> 
>        Setting SCHED_DEADLINE can fail with -EBUSY when admission
>        control tests fail.
> 
>        Because of the nature of (G)EDF, SCHED_DEADLINE tasks are the
>        highest priority (user controllable) tasks in the system, if any
>        SCHED_DEADLINE task is runnable it will preempt anything
>        FIFO/RR/OTHER/BATCH/IDLE task out there.
> 
>        A SCHED_DEADLINE task calling sched_yield() will 'yield' the
>        current job and wait for a new period to begin.

This is the piece that could go into sched(7), but I'd like it to include
a discussion of deadline, period, and runtime.

[[[[
 
>    SCHED_FIFO: First In-First Out scheduling
>        SCHED_FIFO can only be used with static priorities higher than 0, which
>        means that when a SCHED_FIFO processes becomes runnable, it will always
>        immediately preempt any currently running SCHED_OTHER, SCHED_BATCH,  or
>        SCHED_IDLE  process.  SCHED_FIFO is a simple scheduling algorithm with‐
>        out time slicing.  For processes scheduled under the SCHED_FIFO policy,
>        the following rules apply:
> 
>        *  A  SCHED_FIFO  process that has been preempted by another process of
>           higher priority will stay at the head of the list for  its  priority
>           and  will resume execution as soon as all processes of higher prior‐
>           ity are blocked again.
> 
>        *  When a SCHED_FIFO process becomes runnable, it will be  inserted  at
>           the end of the list for its priority.
> 
>        *  A  call  to  sched_setscheduler()  or sched_setparam(2) will put the
>           SCHED_FIFO (or SCHED_RR) process identified by pid at the  start  of
>           the  list  if it was runnable.  As a consequence, it may preempt the
>           currently  running  process   if   it   has   the   same   priority.
>           (POSIX.1-2001 specifies that the process should go to the end of the
>           list.)
> 
>        *  A process calling sched_yield(2) will be put at the end of the list.
> 
>        No other events will move a process scheduled under the SCHED_FIFO pol‐
>        icy in the wait list of runnable processes with equal static priority.
> 
>        A SCHED_FIFO process runs until either it is blocked by an I/O request,
>        it  is  preempted  by  a  higher  priority   process,   or   it   calls
>        sched_yield(2).
> 
>    SCHED_RR: Round Robin scheduling
>        SCHED_RR  is  a simple enhancement of SCHED_FIFO.  Everything described
>        above for SCHED_FIFO also applies to SCHED_RR, except that each process
>        is  only  allowed  to  run  for  a maximum time quantum.  If a SCHED_RR
>        process has been running for a time period equal to or longer than  the
>        time  quantum,  it will be put at the end of the list for its priority.
>        A SCHED_RR process that has been preempted by a higher priority process
>        and  subsequently  resumes execution as a running process will complete
>        the unexpired portion of its round robin time quantum.  The  length  of
>        the time quantum can be retrieved using sched_rr_get_interval(2).
> 
>    SCHED_OTHER: Default Linux time-sharing scheduling
>        SCHED_OTHER  can only be used at static priority 0.  SCHED_OTHER is the
>        standard Linux time-sharing scheduler that is  intended  for  all  pro‐
>        cesses  that  do  not  require  the  special real-time mechanisms.  The
>        process to run is chosen from the static priority 0  list  based  on  a
>        dynamic priority that is determined only inside this list.  The dynamic
>        priority is based on the nice value (set by nice(2) or  setpriority(2))
>        and  increased  for  each time quantum the process is ready to run, but
>        denied to run by the scheduler.  This ensures fair progress  among  all
>        SCHED_OTHER processes.
> 
>    SCHED_BATCH: Scheduling batch processes
>        (Since  Linux 2.6.16.)  SCHED_BATCH can only be used at static priority
>        0.  This policy is similar to SCHED_OTHER  in  that  it  schedules  the
>        process  according  to  its dynamic priority (based on the nice value).
>        The difference is that this policy will cause the scheduler  to  always
>        assume  that the process is CPU-intensive.  Consequently, the scheduler
>        will apply a small scheduling penalty with respect to wakeup behaviour,
>        so that this process is mildly disfavored in scheduling decisions.
> 
>        This policy is useful for workloads that are noninteractive, but do not
>        want to lower their nice value, and for workloads that want a determin‐
>        istic scheduling policy without interactivity causing extra preemptions
>        (between the workload's tasks).
> 
>    SCHED_IDLE: Scheduling very low priority jobs
>        (Since Linux 2.6.23.)  SCHED_IDLE can only be used at  static  priority
>        0; the process nice value has no influence for this policy.
> 
>        This  policy  is  intended  for  running jobs at extremely low priority
>        (lower even than a +19 nice value with the SCHED_OTHER  or  SCHED_BATCH
>        policies).
]]]]

> RETURN VALUE
> 	On success, sched_setattr() and sched_getattr() return 0. On
> 	error, -1 is returned, and errno is set appropriately.
> 
> ERRORS
>        EINVAL The scheduling policy is not one  of  the  recognized  policies,
>               param is NULL, or param does not make sense for the policy.
> 
>        EPERM  The calling process does not have appropriate privileges.
> 
>        ESRCH  The process whose ID is pid could not be found.
> 
>        E2BIG  The provided storage for struct sched_attr is either too
>               big, see sched_setattr(), or too small, see sched_getattr().
> 
>        EBUSY  SCHED_DEADLINE admission control failure

The above is the only place on the page that mentions admission control.
As well as the suggestions above, it would be nice to have somewhere a
summary of how admission control is calculated.

> NOTES
> 	While the text above (and in SCHED_SETSCHEDULER(2)) talks about
> 	processes, in actual fact these system calls are thread specific.
> 

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sched_{set,get}attr() manpage
       [not found]                     ` <535FA467.2070403-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-04-29 14:22                       ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2014-04-29 14:22 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Dario Faggioli, Thomas Gleixner, Ingo Molnar,
	rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA,
	johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
	Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
	juri.lelli-Re5JQEeQqe8AvxtiuMwx3w,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w,
	hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	insop.song-Re5JQEeQqe8AvxtiuMwx3w,
	liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
	linux-man-u79uwXL29TY76Z2rM5mHXA

On Tue, Apr 29, 2014 at 03:08:55PM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Peter,
> 
> On 04/28/2014 10:18 AM, Peter Zijlstra wrote:
> > Hi Michael,
> > 
> > find below an updated manpage, I did not apply the comments on parts
> > that are identical to SCHED_SETSCHEDULER(2) in order to keep these texts
> > in alignment. I feel that if we change one we should also change the
> > other, and such a 'patch' is best done separate from the new manpage
> > itself.
> > 
> > I did add the missing EBUSY error, and amended the text where it said
> > we'd return EINVAL in that case.
> > 
> > I added a paragraph stating that SCHED_DEADLINE preempted anything else
> > userspace can do (with the explicit mention of userspace to leave me
> > wriggle room for the kernel's stop task :-).
> > 
> > I also did a short paragraph on the deadline sched_yield(). For further
> > deadline yield details we should maybe add to the SCHED_YIELD(2)
> > manpage.
> > 
> > Re juri/claudio; no I think sched_yield() as implemented for deadline
> > makes sense, no other yield semantics other than NOP makes sense for it,
> > and since we have the syscall already might as well make it do something
> > useful.
> 
> Thanks for the updated page. Would you be willing
> to revise as per the comments below.

Ok.

> 
> > NAME
> > 	sched_setattr, sched_getattr - set and get scheduling policy/attributes
> > 
> > SYNOPSIS
> > 	#include <sched.h>
> > 
> > 	struct sched_attr {
> > 		u32 size;
> > 		u32 sched_policy;
> > 		u64 sched_flags;
> > 
> > 		/* SCHED_NORMAL, SCHED_BATCH */
> > 		s32 sched_nice;
> > 		/* SCHED_FIFO, SCHED_RR */
> > 		u32 sched_priority;
> > 		/* SCHED_DEADLINE */
> > 		u64 sched_runtime;
> > 		u64 sched_deadline;
> > 		u64 sched_period;
> > 	};
> > 	int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);
> > 
> > 	int sched_getattr(pid_t pid, const struct sched_attr *attr, unsigned int size, unsigned int flags);
> > 
> > DESCRIPTION
> > 	sched_setattr() sets both the scheduling policy and the
> > 	associated attributes for the process whose ID is specified in
> > 	pid.  
> 
> Around about here, I think there needs to be a sentence explaining
> that sched_setattr() provides a superset of the functionality of 
> sched_setscheduler(2) and setpriority(2). I mean, it can do all that 
> those two calls can do, right?

Almost; setpriority() has the .which argument which we don't have. So
while that syscall can change the nice value for an entire process group
or user, sched_setattr() can only change the nice value for 1 task.

But yes, I can mention something along those lines.

> > If pid equals zero, the scheduling policy and attributes
> > 	of the calling process will be set.  The interpretation of the
> > 	argument attr depends on the selected policy.  Currently, Linux
> > 	supports the following "normal" (i.e., non-real-time) scheduling
> > 	policies:
> > 
> > 	SCHED_OTHER	the standard "fair" time-sharing policy;
> > 
> > 	SCHED_BATCH	for "batch" style execution of processes; and
> > 
> > 	SCHED_IDLE	for running very low priority background jobs.
> > 
> > 	The following "real-time" policies are also supported, for
> > 	special time-critical applications that need precise control
> > 	over the way in which runnable processes are selected for
> > 	execution:
> > 
> > 	SCHED_FIFO	a first-in, first-out policy;
> > 
> > 	SCHED_RR	a round-robin policy; and
> > 
> > 	SCHED_DEADLINE	a deadline policy.
> > 
> > 	The semantics of each of these policies are detailed below.
> 
> The semantics of each of these policies are detailed in sched(7).

I don't appear to have SCHED(7), how new is that?

> [See my comments below]
> 
> > 
> > 	sched_attr::size must be set to the size of the structure, as in
> > 	sizeof(struct sched_attr), if the provided structure is smaller
> > 	than the kernel structure, any additional fields are assumed
> > 	'0'. If the provided structure is larger than the kernel
> > 	structure, the kernel verifies all additional fields are '0' if
> > 	not the syscall will fail with -E2BIG.
> > 
> > 	sched_attr::sched_policy the desired scheduling policy.
> > 
> > 	sched_attr::sched_flags additional flags that can influence
> > 	scheduling behaviour. Currently as per Linux kernel 3.14:
> > 
> > 		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
> > 		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
> > 		on fork().
> > 
> > 	is the only supported flag.
> > 
> > 	sched_attr::sched_nice should only be set for SCHED_OTHER,
> > 	SCHED_BATCH, the desired nice value [-20,19], see NICE(2).
> > 
> > 	sched_attr::sched_priority should only be set for SCHED_FIFO,
> > 	SCHED_RR, the desired static priority [1,99].
> > 
> > 	sched_attr::sched_runtime
> > 	sched_attr::sched_deadline
> > 	sched_attr::sched_period should only be set for SCHED_DEADLINE
> > 	and are the traditional sporadic task model parameters.
> 
> Could you add (a lot ;-)) more detail on these three fields? Assume the
> reader does not know about this traditional sporadic task model, and 
> then give some explanation of what these three fields do. Probably, at
> this point you can work in some statement  about the admission control
> test.
> 
> [but, see my comment below. It may be that sched(7) is a better
> place for this detail.

Yes, I think SCHED(7) would be a better place; also I think I forgot to
put a reference in to Documentation/scheduler/sched-deadline.txt

I'll try and write something concise. This is the stuff of books, not
paragraphs :/

> > 	The flags argument should be 0.
> > 
> > 	sched_getattr() queries the scheduling policy currently applied
> > 	to the process identified by pid.  If pid equals zero, the
> > 	policy of the calling process will be retrieved.
> > 
> > 	The size argument should reflect the size of struct sched_attr
> > 	as known to userspace. The kernel fills out sched_attr::size to
> > 	the size of its sched_attr structure. If the user provided
> > 	structure is larger, additional fields are not touched. If the
> > 	user provided structure is smaller, but the kernel needs to
> > 	return values outside the provided space, the syscall will fail
> > 	with -E2BIG.
> > 
> > 	The flags argument should be 0.
> > 
> > 	The other sched_attr fields are filled out as described in
> > 	sched_setattr().
> 
> I assume that everything between my [[[ and ]]] blocks below is taken straight 
> from sched_setscheduler(2). (If that is not true, please let me know.)

That did indeed look about right.

> This reminds me that there is a structural fault in this part of man-pages ;-).
> The problem is sched_setscheduler(2) currently tries to do two things:
> 
> [a] Document the sched_setscheduler() and sched_getscheduler() system calls
> [b] Provide an overview of scheduling policies and parameters.
> 
> It should really only do the former. I have now gone through the task of
> separating [b] out into a separate page, sched(7), which other pages,
> such as sched_setscheduler(2) and sched_setattr(2) can refer to. You
> can see the current versions of sched_setscheduler.2 and sched.7 in Git
> (https://www.kernel.org/doc/man-pages/download.html )
> 
> So, what I would ideally like to see
> 
> [1] A page describing the sched_setattr() and sched_getattr() APIs
> [2] A piece of text describing the SCHED_DEADLINE policy, which I can
> drop into sched(7).
> 
> Could you revise like that?

ACK.

> [[[[

> ]]]]
> 
> >     SCHED_DEADLINE: Sporadic task model deadline scheduling
> >        SCHED_DEADLINE is an implementation of GEDF (Global Earliest
> >        Deadline First) with additional CBS (Constant Bandwidth Server).
> >        The CBS guarantees that tasks that over-run their specified
> >        budget are throttled and do not affect the correct performance
> >        of other SCHED_DEADLINE tasks.
> > 
> >        SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN
> > 
> >        Setting SCHED_DEADLINE can fail with -EBUSY when admission
> >        control tests fail.
> > 
> >        Because of the nature of (G)EDF, SCHED_DEADLINE tasks are the
> >        highest priority (user controllable) tasks in the system, if any
> >        SCHED_DEADLINE task is runnable it will preempt anything
> >        FIFO/RR/OTHER/BATCH/IDLE task out there.
> > 
> >        A SCHED_DEADLINE task calling sched_yield() will 'yield' the
> >        current job and wait for a new period to begin.
> 
> This is the piece that could go into sched(7), but I'd like it to include
> a discussion of deadline, period, and runtime.
> 
> [[[[

> ]]]]
> 
> > RETURN VALUE
> > 	On success, sched_setattr() and sched_getattr() return 0. On
> > 	error, -1 is returned, and errno is set appropriately.
> > 
> > ERRORS
> >        EINVAL The scheduling policy is not one  of  the  recognized  policies,
> >               param is NULL, or param does not make sense for the policy.
> > 
> >        EPERM  The calling process does not have appropriate privileges.
> > 
> >        ESRCH  The process whose ID is pid could not be found.
> > 
> >        E2BIG  The provided storage for struct sched_attr is either too
> >               big, see sched_setattr(), or too small, see sched_getattr().
> > 
> >        EBUSY  SCHED_DEADLINE admission control failure
> 
> The above is the only place on the page that mentions admission control.
> As well as the suggestions above, it would be nice to have somewhere a
> summary of how admission control is calculated.

I think I'll write down what admission control is without specifics.
Giving specifics pins you down on the implementation. In general
admission control enforces a bound on the schedulability of the task
set. New and interesting ways of computing schedulability are the
subject of papers each year.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sched_{set,get}attr() manpage
  2014-04-29 13:08                   ` Michael Kerrisk (man-pages)
       [not found]                     ` <535FA467.2070403-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-04-29 16:04                     ` Peter Zijlstra
  2014-04-30 11:09                       ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-04-29 16:04 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt,
	Oleg Nesterov, fweisbec, darren, johan.eker, p.faure,
	Linux Kernel, claudio, michael, fchecconi, tommaso.cucinotta,
	juri.lelli, nicola.manica, luca.abeni, dhaval.giani, hgu1972,
	Paul McKenney, insop.song, liming.wang, jkacur, linux-man

On Tue, Apr 29, 2014 at 03:08:55PM +0200, Michael Kerrisk (man-pages) wrote:

Juri, Dario, Can you have a look at the 2nd part; I'm not at all sure I
got the activate/release the right way around.

My current thinking was that we activate first, and then release it to
go run. But googling the terms only confused me more. I suppose it's one
of those things that's not actually _that_ well defined. And I hope the
ASCII art actually clarifies things better than the terms used.

> [1] A page describing the sched_setattr() and sched_getattr() APIs

NAME
	sched_setattr, sched_getattr - set and get scheduling policy/attributes

SYNOPSIS
	#include <sched.h>

	struct sched_attr {
		u32 size;
		u32 sched_policy;
		u64 sched_flags;

		/* SCHED_NORMAL, SCHED_BATCH */
		s32 sched_nice;

		/* SCHED_FIFO, SCHED_RR */
		u32 sched_priority;

		/* SCHED_DEADLINE */
		u64 sched_runtime;
		u64 sched_deadline;
		u64 sched_period;
	};

	int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);

	int sched_getattr(pid_t pid, struct sched_attr *attr, unsigned int size, unsigned int flags);

DESCRIPTION
	sched_setattr() sets both the scheduling policy and the
	associated attributes for the process whose ID is specified in
	pid.

	sched_setattr() replaces sched_setscheduler(), sched_setparam(),
	nice() and some of setpriority().

	If pid equals zero, the scheduling policy and attributes
	of the calling process will be set.  The interpretation of the
	argument attr depends on the selected policy.  Currently, Linux
	supports the following "normal" (i.e., non-real-time) scheduling
	policies:

	SCHED_OTHER	the standard "fair" time-sharing policy;

	SCHED_BATCH	for "batch" style execution of processes; and

	SCHED_IDLE	for running very low priority background jobs.

	The following "real-time" policies are also supported, for
	special time-critical applications that need precise control
	over the way in which runnable processes are selected for
	execution:

	SCHED_FIFO	a static priority first-in, first-out policy;

	SCHED_RR	a static priority round-robin policy; and

	SCHED_DEADLINE	a dynamic priority deadline policy.

	The semantics of each of these policies are detailed in
	sched(7).

	sched_attr::size must be set to the size of the structure, as in
	sizeof(struct sched_attr). If the provided structure is smaller
	than the kernel structure, any additional fields are assumed
	'0'. If the provided structure is larger than the kernel
	structure, the kernel verifies that all additional fields are
	'0'; if not, the syscall will fail with -E2BIG.
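
	The size negotiation above can be modelled in a few lines. This
	is only an illustration of the rule as described here, not
	kernel code; model_copy_attr() and its parameters are made-up
	names.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Model of the size rule for sched_setattr() (illustration only).
 * `user` is the caller's structure of `usize` bytes, `kern` is the
 * kernel's structure of `ksize` bytes.
 * Returns 0 on success or -E2BIG.
 */
int model_copy_attr(void *kern, size_t ksize, const void *user, size_t usize)
{
	const unsigned char *u = user;

	/* Fields the caller does not know about are assumed to be 0. */
	memset(kern, 0, ksize);

	if (usize > ksize) {
		/* Fields the kernel does not know about must all be 0. */
		for (size_t i = ksize; i < usize; i++)
			if (u[i] != 0)
				return -E2BIG;
		usize = ksize;
	}

	memcpy(kern, user, usize);
	return 0;
}
```

	An older binary with a smaller structure keeps working, and a
	newer binary only fails on an older kernel when it actually
	uses a field that kernel does not have.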

	sched_attr::sched_policy the desired scheduling policy.

	sched_attr::sched_flags additional flags that can influence
	scheduling behaviour. As of Linux kernel 3.14 the only
	supported flag is:

		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
		on fork().

	sched_attr::sched_nice should only be set for SCHED_OTHER and
	SCHED_BATCH; it is the desired nice value [-20,19], see sched(7).

	sched_attr::sched_priority should only be set for SCHED_FIFO and
	SCHED_RR; it is the desired static priority [1,99], see sched(7).

	sched_attr::sched_runtime
	sched_attr::sched_deadline
	sched_attr::sched_period should only be set for SCHED_DEADLINE
	and are the traditional sporadic task model parameters, see
	sched(7).

	The flags argument should be 0.

	sched_getattr() queries the scheduling policy currently applied
	to the process identified by pid.

	Similar to sched_setattr(), sched_getattr() replaces
	sched_getscheduler(), sched_getparam() and some of
	getpriority().

	If pid equals zero, the policy of the calling process will be
	retrieved.

	The size argument should reflect the size of struct sched_attr
	as known to userspace. The kernel fills out sched_attr::size to
	the size of its sched_attr structure. If the user provided
	structure is larger, additional fields are not touched. If the
	user provided structure is smaller, but the kernel needs to
	return values outside the provided space, the syscall will fail
	with -E2BIG.

	The flags argument should be 0.

	The other sched_attr fields are filled out as described in
	sched_setattr().

RETURN VALUE
	On success, sched_setattr() and sched_getattr() return 0. On
	error, -1 is returned, and errno is set appropriately.

ERRORS
       EINVAL The scheduling policy is not one of the recognized policies,
              attr is NULL, or attr does not make sense for the selected
              policy.

       EPERM  The calling process does not have appropriate privileges.

       ESRCH  The process whose ID is pid could not be found.

       E2BIG  The provided storage for struct sched_attr is either too
              big, see sched_setattr(), or too small, see sched_getattr().

       EBUSY  SCHED_DEADLINE admission control failure, see sched(7).

NOTES
       While the text above (and in sched_setscheduler(2)) talks about
       processes, in actual fact these system calls are thread specific.
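
A minimal sketch of how a caller might use this, assuming no libc wrapper
is available and the raw syscall(2) interface is used instead. The struct
and function names with the sketch_ prefix are made up for illustration,
the struct mirrors the SYNOPSIS above, and the time values are taken to be
in nanoseconds (as clarified later in this thread).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6	/* policy number used by Linux */
#endif

/* Local mirror of struct sched_attr; renamed to avoid header clashes. */
struct sketch_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

/* Fill attr for SCHED_DEADLINE; -1 unless 0 < runtime <= deadline <= period. */
int sketch_fill_deadline(struct sketch_sched_attr *attr, uint64_t runtime_ns,
			 uint64_t deadline_ns, uint64_t period_ns)
{
	if (runtime_ns == 0 || runtime_ns > deadline_ns ||
	    deadline_ns > period_ns)
		return -1;

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->sched_policy = SCHED_DEADLINE;
	attr->sched_runtime = runtime_ns;
	attr->sched_deadline = deadline_ns;
	attr->sched_period = period_ns;
	return 0;
}

/* Apply attr to thread `pid` (0 = caller); 0 on success, -1 with errno set. */
int sketch_sched_setattr(pid_t pid, const struct sketch_sched_attr *attr)
{
#ifdef SYS_sched_setattr
	return (int)syscall(SYS_sched_setattr, pid, attr, 0U);
#else
	(void)pid;
	(void)attr;
	errno = ENOSYS;		/* headers too old to know the syscall */
	return -1;
#endif
}
```

A caller would fill the structure with, say, 10ms runtime, 30ms deadline
and 100ms period, then call sketch_sched_setattr(0, &attr) and check errno:
EPERM without sufficient privileges, EBUSY when admission control refuses
the task.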

> [2] A piece of text describing the SCHED_DEADLINE policy, which I can
> drop into sched(7).

    SCHED_DEADLINE: Sporadic task model deadline scheduling
       SCHED_DEADLINE is an implementation of GEDF (Global Earliest
       Deadline First) with additional CBS (Constant Bandwidth Server).

       A sporadic task is one that has a sequence of jobs, where each job
       is activated at most once per period [us]. Each job will have an
       absolute deadline relative to its activation before which it must
       finish its execution, and it shall at no time run longer
       than runtime [us] after its release.

              activation/wakeup       absolute deadline
              |        release        |
              v        v              v
       -------x--------x--------------x--------x-------
                       |<- Runtime -->|
              |<---------- Deadline ->|
              |<---------- Period  ----------->|

       This gives: runtime <= (rel) deadline <= period.

       The CBS guarantees that tasks that over-run their specified
       runtime are throttled and do not affect the correct performance
       of other SCHED_DEADLINE tasks.

       In general a task set of such tasks it not feasible/schedulable
       within the given constraints. Therefore we must do an admittance
       test on setting/changing SCHED_DEADLINE policy/attributes.

       This admission test checks whether the task set is
       feasible/schedulable; if it is not, sched_setattr() will fail
       with -EBUSY.

       For example, it is required (but not sufficient) for the total
       utilization to be less than or equal to the total amount of CPU
       time available. That is, since each job can run for at most
       runtime [us] per period [us], that task's utilization is
       runtime/period. The sum of these utilizations over all tasks
       must not exceed the number of CPUs present.
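
       The necessary condition above can be sketched as follows. This
       is not the kernel's admission test, only the utilization bound
       just described; struct dl_params and dl_utilization_ok() are
       made-up names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task parameters, both in the same time unit. */
struct dl_params {
	uint64_t runtime;
	uint64_t period;
};

/*
 * Necessary (but not sufficient) schedulability condition:
 * the sum of runtime/period over all tasks must not exceed ncpus.
 * Returns 1 if the bound holds, 0 otherwise.
 */
int dl_utilization_ok(const struct dl_params *t, size_t n, unsigned ncpus)
{
	double total = 0.0;

	for (size_t i = 0; i < n; i++) {
		if (t[i].period == 0)
			return 0;	/* utilization undefined */
		total += (double)t[i].runtime / (double)t[i].period;
	}
	return total <= (double)ncpus;
}
```

       Two tasks with runtime/period of 5/10 and 10/20 add up to a
       utilization of exactly 1 and so pass on one CPU; adding a third
       task of 1/100 pushes the sum to 1.01, which only passes with
       two or more CPUs.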

       SCHED_DEADLINE tasks will fail fork(2) with -EAGAIN.

       Because of the nature of (G)EDF, SCHED_DEADLINE tasks are the
       highest priority (user controllable) tasks in the system; if any
       SCHED_DEADLINE task is runnable it will preempt any
       FIFO/RR/OTHER/BATCH/IDLE task.

       A SCHED_DEADLINE task calling sched_yield() will 'yield' the
       current job and wait for a new period to begin.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sched_{set,get}attr() manpage
  2014-04-29 16:04                     ` Peter Zijlstra
@ 2014-04-30 11:09                       ` Michael Kerrisk (man-pages)
       [not found]                         ` <5360D9E5.9080206-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-04-30 11:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mtk.manpages, Dario Faggioli, Thomas Gleixner, Ingo Molnar,
	rostedt, Oleg Nesterov, fweisbec, darren, johan.eker, p.faure,
	Linux Kernel, claudio, michael, fchecconi, tommaso.cucinotta,
	juri.lelli, nicola.manica, luca.abeni, dhaval.giani, hgu1972,
	Paul McKenney, insop.song, liming.wang, jkacur, linux-man

Hi Peter,

Thanks for the revision. More comments below. Could you revise in 
the light of those comments, and hopefully also after feedback from 
Juri and Dario?

On 04/29/2014 06:04 PM, Peter Zijlstra wrote:
> On Tue, Apr 29, 2014 at 03:08:55PM +0200, Michael Kerrisk (man-pages) wrote:
> 
> Juri, Dario, Can you have a look at the 2nd part; I'm not at all sure I
> got the activate/release the right way around.
> 
> My current thinking was that we activate first, and then release it to
> go run. But googling the terms only confused me more. I suppose its one
> of those things that's not actually _that_ well defined. And I hope the
> ASCII art actually clarifies things better than the terms used.
> 
>> [1] A page describing the sched_setattr() and sched_getattr() APIs
> 
> NAME
> 	sched_setattr, sched_getattr - set and get scheduling policy/attributes
> 
> SYNOPSIS
> 	#include <sched.h>
> 
> 	struct sched_attr {
> 		u32 size;
> 		u32 sched_policy;
> 		u64 sched_flags;
> 
> 		/* SCHED_NORMAL, SCHED_BATCH */
> 		s32 sched_nice;
> 
> 		/* SCHED_FIFO, SCHED_RR */
> 		u32 sched_priority;
> 
> 		/* SCHED_DEADLINE */
> 		u64 sched_runtime;
> 		u64 sched_deadline;
> 		u64 sched_period;
> 	};
> 
> 	int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);
> 
> 	int sched_getattr(pid_t pid, const struct sched_attr *attr, unsigned int size, unsigned int flags);
> 
> DESCRIPTION
> 	sched_setattr() sets both the scheduling policy and the
> 	associated attributes for the process whose ID is specified in
> 	pid.
> 
> 	sched_setattr() replaces sched_setscheduler(), sched_setparam(),
> 	nice() and some of setpriority().
> 
> 	If pid equals zero, the scheduling policy and attributes
> 	of the calling process will be set.  The interpretation of the
> 	argument attr depends on the selected policy.  Currently, Linux
> 	supports the following "normal" (i.e., non-real-time) scheduling
> 	policies:
> 
> 	SCHED_OTHER	the standard "fair" time-sharing policy;
> 
> 	SCHED_BATCH	for "batch" style execution of processes; and
> 
> 	SCHED_IDLE	for running very low priority background jobs.
> 
> 	The following "real-time" policies are also supported, for
> 	special time-critical applications that need precise control
> 	over the way in which runnable processes are selected for
> 	execution:
> 
> 	SCHED_FIFO	a static priority first-in, first-out policy;
> 
> 	SCHED_RR	a static priority round-robin policy; and
> 
> 	SCHED_DEADLINE	a dynamic priority deadline policy.
> 
> 	The semantics of each of these policies are detailed in
> 	sched(7).
> 
> 	sched_attr::size must be set to the size of the structure, as in
> 	sizeof(struct sched_attr), if the provided structure is smaller
> 	than the kernel structure, any additional fields are assumed
> 	'0'. If the provided structure is larger than the kernel
> 	structure, the kernel verifies all additional fields are '0' if
> 	not the syscall will fail with -E2BIG.
> 
> 	sched_attr::sched_policy the desired scheduling policy.
> 
> 	sched_attr::sched_flags additional flags that can influence
> 	scheduling behaviour. Currently as per Linux kernel 3.14:
> 
> 		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
> 		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
> 		on fork().
> 
> 	is the only supported flag.
> 
> 	sched_attr::sched_nice should only be set for SCHED_OTHER,
> 	SCHED_BATCH, the desired nice value [-20,19], see sched(7).
> 
> 	sched_attr::sched_priority should only be set for SCHED_FIFO,
> 	SCHED_RR, the desired static priority [1,99], see sched(7).
> 
> 	sched_attr::sched_runtime
> 	sched_attr::sched_deadline
> 	sched_attr::sched_period should only be set for SCHED_DEADLINE
> 	and are the traditional sporadic task model parameters, see
> 	sched(7).

So, are there fields expressed in some unit (presumably microseconds)?
Best to mention that here.

> 	The flags argument should be 0.
> 
> 	sched_getattr() queries the scheduling policy currently applied
> 	to the process identified by pid.
> 
> 	Similar to sched_setattr(), sched_getattr() replaces
> 	sched_getscheduler(), sched_getparam() and some of
> 	getpriority().
> 
> 	If pid equals zero, the policy of the calling process will be
> 	retrieved.
> 
> 	The size argument should reflect the size of struct sched_attr
> 	as known to userspace. The kernel fills out sched_attr::size to
> 	the size of its sched_attr structure. If the user provided
> 	structure is larger, additional fields are not touched. If the
> 	user provided structure is smaller, but the kernel needs to
> 	return values outside the provided space, the syscall will fail
> 	with -E2BIG.
> 
> 	The flags argument should be 0.
> 
> 	The other sched_attr fields are filled out as described in
> 	sched_setattr().
> 
> RETURN VALUE
> 	On success, sched_setattr() and sched_getattr() return 0. On
> 	error, -1 is returned, and errno is set appropriately.
> 
> ERRORS
>        EINVAL The scheduling policy is not one  of  the  recognized  policies,
>               param is NULL, or param does not make sense for the selected
> 	      policy.
> 
>        EPERM  The calling process does not have appropriate privileges.
> 
>        ESRCH  The process whose ID is pid could not be found.
> 
>        E2BIG  The provided storage for struct sched_attr is either too
>               big, see sched_setattr(), or too small, see sched_getattr().
> 
>        EBUSY  SCHED_DEADLINE admission control failure, see sched(7).
> 
> NOTES
>        While the text above (and in sched_setscheduler(2)) talks about
>        processes, in actual fact these system calls are thread specific.
> 
>> [2] A piece of text describing the SCHED_DEADLINE policy, which I can
>> drop into sched(7).
> 
>     SCHED_DEADLINE: Sporadic task model deadline scheduling
>        SCHED_DEADLINE is an implementation of GEDF (Global Earliest
>        Deadline First) with additional CBS (Constant Bandwidth Server).
> 
>        A sporadic task is on that has a sequence of jobs, where each job
>        is activated at most once per period [us]. Each job will have an
>        absolute deadline relative to its activation before which it must
>        finish its execution, and it shall at no time run longer
>        than runtime [us] after its release.
> 
>               activation/wakeup       absolute deadline
>               |        release        |
>               v        v              v
>        -------x--------x--------------x--------x-------
>                        |<- Runtime -->|
>               |<---------- Deadline ->|
>               |<---------- Period  ----------->|
> 
>        This gives: runtime <= (rel) deadline <= period.

So, the 'sched_deadline' field in the 'sched_attr' expresses the release
deadline? (I had initially thought it was the "absolute deadline".
Could you make this clearer in the text please.

>        The CBS guarantees that tasks that over-run their specified
>        runtime are throttled and do not affect the correct performance
>        of other SCHED_DEADLINE tasks.
> 
>        In general a task set of such tasks it not feasible/schedulable

That last line is garbled. Could you fix, please.

Also, could you add some words to explain what you mean by 'task set'.

>        within the given constraints. Therefore we must do an admittance
>        test on setting/changing SCHED_DEADLINE policy/attributes.
> 
>        This admission test calculates that the task set is
>        feasible/schedulable, failing this, sched_setattr() will return
>        -EBUSY.
> 
>        For example, it is required (but not sufficient) for the total
>        utilization to be less or equal to the total amount of cpu time
>        available. That is, since each job can maximally run for runtime
>        [us] per period [us], that task's utilization is runtime/period.
>        Summing this over all tasks must be less than the total amount of
>        CPUs present.
> 
>        SCHED_DEADLINE tasks will fail fork(2) with -EAGAIN.

Except if SCHED_RESET_ON_FORK was set, right? If yes, that should be
mentioned here.

>        Because of the nature of (G)EDF, SCHED_DEADLINE tasks are the
>        highest priority (user controllable) tasks in the system, if any
>        SCHED_DEADLINE task is runnable it will preempt anything
>        FIFO/RR/OTHER/BATCH/IDLE task out there.
> 
>        A SCHED_DEADLINE task calling sched_yield() will 'yield' the
>        current job and wait for a new period to begin.

So, I'm trying to naively understand how this all works. If different 
processes specify different deadline periods, how does the kernel deal
with that? Is it worth adding some detail on this point?

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sched_{set,get}attr() manpage
       [not found]                         ` <5360D9E5.9080206-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-04-30 12:35                           ` Peter Zijlstra
  2014-04-30 13:09                           ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2014-04-30 12:35 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Dario Faggioli, Thomas Gleixner, Ingo Molnar,
	rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA,
	johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
	Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
	juri.lelli-Re5JQEeQqe8AvxtiuMwx3w,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w,
	hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	insop.song-Re5JQEeQqe8AvxtiuMwx3w,
	liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
	linux-man-u79uwXL29TY76Z2rM5mHXA

On Wed, Apr 30, 2014 at 01:09:25PM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Peter,
> 
> Thanks for the revision. More comments below. Could you revise in 
> the light of those comments, and hopefully also after feedback from 
> Juri and Dario?
> 
> > 
> > 	sched_attr::sched_runtime
> > 	sched_attr::sched_deadline
> > 	sched_attr::sched_period should only be set for SCHED_DEADLINE
> > 	and are the traditional sporadic task model parameters, see
> > 	sched(7).
> 
> So, are there fields expressed in some unit (presumably microseconds)?
> Best to mention that here.

Oh wait, no it's nanoseconds. Which means I should amend the text below.

> >> [2] A piece of text describing the SCHED_DEADLINE policy, which I can
> >> drop into sched(7).
> > 
> >     SCHED_DEADLINE: Sporadic task model deadline scheduling
> >        SCHED_DEADLINE is an implementation of GEDF (Global Earliest
> >        Deadline First) with additional CBS (Constant Bandwidth Server).
> > 
> >        A sporadic task is on that has a sequence of jobs, where each job
> >        is activated at most once per period [us]. Each job will have an
> >        absolute deadline relative to its activation before which it must

(A)

> >        finish its execution, and it shall at no time run longer
> >        than runtime [us] after its release.
> > 
> >               activation/wakeup       absolute deadline
> >               |        release        |
> >               v        v              v
> >        -------x--------x--------------x--------x-------
> >                        |<- Runtime -->|
> >               |<---------- Deadline ->|
> >               |<---------- Period  ----------->|
> > 
> >        This gives: runtime <= (rel) deadline <= period.
> 
> So, the 'sched_deadline' field in the 'sched_attr' expresses the release
> deadline? (I had initially thought it was the "absolute deadline".
> Could you make this clearer in the text please.

No, and yes, sched_attr::sched_deadline is a relative deadline with
respect to the activation, as said at (A).

So we get: absolute deadline = activation + relative deadline.

And we must be done running at that point, so the very last possible
release moment is: absolute deadline - runtime.

And therefore, it too is a release deadline, since we must not release
later than that.
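
The arithmetic above, written out as a small sketch. The struct and
function names are made up for illustration, and the code assumes
runtime <= relative deadline so the subtraction cannot underflow.

```c
#include <stdint.h>

/* Derived times for one job, in the same unit as the inputs. */
struct dl_job_times {
	uint64_t abs_deadline;		/* activation + relative deadline */
	uint64_t latest_release;	/* abs_deadline - runtime */
};

/* Assumes runtime <= rel_deadline. */
struct dl_job_times dl_compute_times(uint64_t activation,
				     uint64_t rel_deadline,
				     uint64_t runtime)
{
	struct dl_job_times jt;

	jt.abs_deadline = activation + rel_deadline;
	jt.latest_release = jt.abs_deadline - runtime;
	return jt;
}
```

For an activation at t=100 with a relative deadline of 10 and a runtime
of 5, the absolute deadline is 110 and the job must be released no later
than 105.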

> >        The CBS guarantees that tasks that over-run their specified
> >        runtime are throttled and do not affect the correct performance
> >        of other SCHED_DEADLINE tasks.
> > 
> >        In general a task set of such tasks it not feasible/schedulable
> 
> That last line is garbled. Could you fix, please.

s/it/is/

> Also, could you add some words to explain what you mean by 'task set'.

A set of tasks? :-) In particular all tasks in the system of
SCHED_DEADLINE, indicated by 'of such'.

> >        within the given constraints. Therefore we must do an admittance
> >        test on setting/changing SCHED_DEADLINE policy/attributes.
> > 
> >        This admission test calculates that the task set is
> >        feasible/schedulable, failing this, sched_setattr() will return
> >        -EBUSY.
> > 
> >        For example, it is required (but not sufficient) for the total
> >        utilization to be less or equal to the total amount of cpu time
> >        available. That is, since each job can maximally run for runtime
> >        [us] per period [us], that task's utilization is runtime/period.
> >        Summing this over all tasks must be less than the total amount of
> >        CPUs present.
> > 
> >        SCHED_DEADLINE tasks will fail fork(2) with -EAGAIN.
> 
> Except if SCHED_RESET_ON_FORK was set, right? If yes, that should be
> mentioned here.

Ah, indeed.

> >        Because of the nature of (G)EDF, SCHED_DEADLINE tasks are the
> >        highest priority (user controllable) tasks in the system, if any
> >        SCHED_DEADLINE task is runnable it will preempt anything
> >        FIFO/RR/OTHER/BATCH/IDLE task out there.
> > 
> >        A SCHED_DEADLINE task calling sched_yield() will 'yield' the
> >        current job and wait for a new period to begin.
> 
> So, I'm trying to naively understand how this all works. If different 
> processes specify different deadline periods, how does the kernel deal
> with that? Is it worth adding some detail on this point?

Userspace should not rely on any implementation details there. Saying
it's a (G)EDF scheduler is maybe already too much. All userspace should
really care about is that its tasks _should_ be scheduled such that they
meet the specified requirements.

There are multiple scheduling algorithms that can be employed to make it
so, and I don't want to pin us to whatever we chose to implement this
time.

That said, the current (G)EDF is a soft realtime scheduler in that it
guarantees a bounded tardiness (which is the time we can miss the
deadline by), but it is not hard realtime, since the bound is not 0.

Anyway, for your elucidation: assume no overhead, a UP system (SMP is
a right head-ache), and further that deadline == period. It is then
reasonably straightforward to see that scheduling the task with the
earliest deadline will satisfy the constraints IFF the total
utilization (\Sum runtime_i / deadline_i) <= 1.
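
The utilization bound can be sketched in C as below. This is only an
illustration of the sum, generalized to the "total utilization <=
number of CPUs" necessary condition from the draft text; it is not the
kernel's admission test, which may well be stricter. The type and
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task parameters; deadline == period is assumed. */
struct dl_params {
	uint64_t runtime_ns;
	uint64_t period_ns;
};

/* Sum of runtime_i / period_i over all tasks. */
double total_utilization(const struct dl_params *tasks, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++) {
		assert(tasks[i].period_ns > 0);
		sum += (double)tasks[i].runtime_ns / (double)tasks[i].period_ns;
	}
	return sum;
}

/* Necessary (not always sufficient) condition: utilization <= CPU count. */
int utilization_fits(const struct dl_params *tasks, size_t n,
		     unsigned int ncpus)
{
	return total_utilization(tasks, n) <= (double)ncpus;
}
```

With tasks { 5, 10 } and { 10, 20 } the total is exactly 1, so the set
passes on one CPU; bumping the first runtime to 6 pushes it to 1.1 and
the check fails unless a second CPU is assumed.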

Suppose two tasks: A := { 5, 10 } and B := { 10, 20 } with strict
periodic activation:

    A1,B1     A2        Ad2
    |         Ad1       Bd1
    v         v         v
  --AAAAABBBBBAAAAABBBBBx--
  --AAAAABBBBBBBBBBAAAAAx--

Where A# is the #th activation, Ad# is the corresponding #th deadline
before which we must have sufficient time.

Since we're perfectly synced up there is a tie and we get two possible
outcomes. But note that in either case A has gotten 2x its 5 As and B
has gotten its 10 Bs.
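
The timeline above can be reproduced with a toy simulation. The sketch
below assumes strictly periodic tasks with deadline == period on a
single CPU, one scheduling decision per time unit, and breaks ties in
favour of the task listed first (which yields the first of the two
timelines). All names are made up for the example; it says nothing
about how the kernel implements (G)EDF.

```c
#include <assert.h>
#include <stddef.h>

#define EDF_MAX_TASKS 8		/* arbitrary limit for this sketch */

struct periodic_task {
	char label;	/* character used in the timeline */
	int runtime;	/* time units of CPU needed per job */
	int period;	/* time units between activations; deadline == period */
};

/*
 * Simulate preemptive EDF on one CPU for 'horizon' time units.
 * Writes one character per unit into 'timeline' (horizon + 1 bytes,
 * NUL terminated; '.' means idle) and returns the number of jobs
 * that had not finished by their deadline.
 */
int edf_simulate(const struct periodic_task *tasks, size_t n,
		 int horizon, char *timeline)
{
	int remaining[EDF_MAX_TASKS] = { 0 };
	int deadline[EDF_MAX_TASKS] = { 0 };
	int missed = 0;

	assert(n <= EDF_MAX_TASKS);

	for (int t = 0; t < horizon; t++) {
		/* Activate a new job for every task whose period starts now. */
		for (size_t i = 0; i < n; i++) {
			if (t % tasks[i].period != 0)
				continue;
			if (remaining[i] > 0)
				missed++;	/* previous job overran */
			remaining[i] = tasks[i].runtime;
			deadline[i] = t + tasks[i].period;
		}

		/* Pick the pending job with the earliest absolute deadline. */
		int pick = -1;
		for (size_t i = 0; i < n; i++) {
			if (remaining[i] > 0 &&
			    (pick < 0 || deadline[i] < deadline[pick]))
				pick = (int)i;
		}

		if (pick >= 0) {
			timeline[t] = tasks[pick].label;
			remaining[pick]--;
		} else {
			timeline[t] = '.';
		}
	}

	/* Jobs still pending whose deadline falls within the horizon. */
	for (size_t i = 0; i < n; i++) {
		if (remaining[i] > 0 && deadline[i] <= horizon)
			missed++;
	}

	timeline[horizon] = '\0';
	return missed;
}
```

Running it with A = { 5, 10 } and B = { 10, 20 } over 20 time units
produces AAAAABBBBBAAAAABBBBB with no missed deadlines.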

Non-periodic activation, and deadline != period make the thing more
interesting, but at that point I would ask Juri (or others) to refer you
to a paper/book.

Now, let me go update the texts yet again :-)
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sched_{set,get}attr() manpage
       [not found]                         ` <5360D9E5.9080206-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2014-04-30 12:35                           ` Peter Zijlstra
@ 2014-04-30 13:09                           ` Peter Zijlstra
       [not found]                             ` <20140430130937.GH30445-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-04-30 13:09 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Dario Faggioli, Thomas Gleixner, Ingo Molnar,
	rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA,
	johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
	Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
	juri.lelli-Re5JQEeQqe8AvxtiuMwx3w,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w,
	hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	insop.song-Re5JQEeQqe8AvxtiuMwx3w,
	liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
	linux-man-u79uwXL29TY76Z2rM5mHXA

On Wed, Apr 30, 2014 at 01:09:25PM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Peter,
> 
> Thanks for the revision. More comments below. Could you revise in 
> the light of those comments, and hopefully also after feedback from 
> Juri and Dario?

New text below; hopefully a little clearer. If not, do holler.

---
> [1] A page describing the sched_setattr() and sched_getattr() APIs

NAME
	sched_setattr, sched_getattr - set and get scheduling policy/attributes

SYNOPSIS
	#include <sched.h>

	struct sched_attr {
		u32 size;
		u32 sched_policy;
		u64 sched_flags;

		/* SCHED_NORMAL, SCHED_BATCH */
		s32 sched_nice;

		/* SCHED_FIFO, SCHED_RR */
		u32 sched_priority;

		/* SCHED_DEADLINE */
		u64 sched_runtime;
		u64 sched_deadline;
		u64 sched_period;
	};

	int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);

	int sched_getattr(pid_t pid, struct sched_attr *attr, unsigned int size, unsigned int flags);

DESCRIPTION
	sched_setattr() sets both the scheduling policy and the
	associated attributes for the process whose ID is specified in
	pid.

	sched_setattr() replaces sched_setscheduler(), sched_setparam(),
	nice() and some of setpriority().

	If pid equals zero, the scheduling policy and attributes
	of the calling process will be set.  The interpretation of the
	argument attr depends on the selected policy.  Currently, Linux
	supports the following "normal" (i.e., non-real-time) scheduling
	policies:

	SCHED_OTHER	the standard "fair" time-sharing policy;

	SCHED_BATCH	for "batch" style execution of processes; and

	SCHED_IDLE	for running very low priority background jobs.

	The following "real-time" policies are also supported, for
	special time-critical applications that need precise control
	over the way in which runnable processes are selected for
	execution:

	SCHED_FIFO	a static priority first-in, first-out policy;

	SCHED_RR	a static priority round-robin policy; and

	SCHED_DEADLINE	a dynamic priority deadline policy.

	The semantics of each of these policies are detailed in
	sched(7).

	sched_attr::size must be set to the size of the structure, as in
	sizeof(struct sched_attr). If the provided structure is smaller
	than the kernel structure, any additional fields are assumed
	'0'. If the provided structure is larger than the kernel
	structure, the kernel verifies that all additional fields are
	'0'; if not, the syscall will fail with -E2BIG.

	sched_attr::sched_policy is the desired scheduling policy.

	sched_attr::sched_flags holds additional flags that can influence
	scheduling behaviour. As of Linux kernel 3.14 the only supported
	flag is:

		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
		on fork().

	sched_attr::sched_nice is the desired nice value [-20,19] and
	should only be set for SCHED_OTHER and SCHED_BATCH, see sched(7).

	sched_attr::sched_priority is the desired static priority [1,99]
	and should only be set for SCHED_FIFO and SCHED_RR, see sched(7).

	sched_attr::sched_runtime, sched_attr::sched_deadline and
	sched_attr::sched_period, all in nanoseconds, should only be set
	for SCHED_DEADLINE and are the traditional sporadic task model
	parameters, see sched(7).

	The flags argument should be 0.

	sched_getattr() queries the scheduling policy currently applied
	to the process identified by pid.

	Similar to sched_setattr(), sched_getattr() replaces
	sched_getscheduler(), sched_getparam() and some of
	getpriority().

	If pid equals zero, the policy of the calling process will be
	retrieved.

	The size argument should reflect the size of struct sched_attr
	as known to userspace. The kernel fills out sched_attr::size to
	the size of its sched_attr structure. If the user provided
	structure is larger, additional fields are not touched. If the
	user provided structure is smaller, but the kernel needs to
	return values outside the provided space, the syscall will fail
	with -E2BIG.

	The flags argument should be 0.

	The other sched_attr fields are filled out as described in
	sched_setattr().
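
As a rough usage sketch for the calls described above: assuming the C
library provides no wrappers, they can be reached through syscall(2).
The structure is redeclared here under a different name
(demo_sched_attr) with the layout from the SYNOPSIS so that it cannot
clash with a definition from system headers; that name, the helper
function names and the DEMO_SCHED_DEADLINE macro are assumptions made
for this example only. Setting SCHED_DEADLINE normally requires
privileges, so the setter is expected to fail with EPERM for ordinary
users.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_SCHED_DEADLINE 6	/* policy number of SCHED_DEADLINE */

/* Local copy of the layout shown in the SYNOPSIS (hypothetical name). */
struct demo_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;		/* SCHED_NORMAL, SCHED_BATCH */
	uint32_t sched_priority;	/* SCHED_FIFO, SCHED_RR */
	uint64_t sched_runtime;		/* SCHED_DEADLINE, nanoseconds */
	uint64_t sched_deadline;
	uint64_t sched_period;
};

/* Query the attributes of 'pid' (0 = caller). Returns 0, or -1 with errno set. */
int demo_sched_getattr(pid_t pid, struct demo_sched_attr *attr)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	return (int)syscall(SYS_sched_getattr, pid, attr,
			    (unsigned int)sizeof(*attr), 0U);
}

/* Request SCHED_DEADLINE for 'pid'. Returns 0, or -1 with errno set. */
int demo_set_deadline(pid_t pid, uint64_t runtime_ns,
		      uint64_t deadline_ns, uint64_t period_ns)
{
	struct demo_sched_attr attr;

	memset(&attr, 0, sizeof(attr));	/* unused fields must be 0 */
	attr.size = sizeof(attr);
	attr.sched_policy = DEMO_SCHED_DEADLINE;
	attr.sched_runtime = runtime_ns;
	attr.sched_deadline = deadline_ns;
	attr.sched_period = period_ns;

	return (int)syscall(SYS_sched_setattr, pid, &attr, 0U);
}
```

A caller would typically check for -1 and inspect errno, for example
EBUSY when admission control rejects the requested parameters.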

RETURN VALUE
	On success, sched_setattr() and sched_getattr() return 0. On
	error, -1 is returned, and errno is set appropriately.

ERRORS
       EINVAL The scheduling policy is not one of the recognized policies,
              attr is NULL, or attr does not make sense for the selected
              policy.

       EPERM  The calling process does not have appropriate privileges.

       ESRCH  The process whose ID is pid could not be found.

       E2BIG  The provided storage for struct sched_attr is either too
              big, see sched_setattr(), or too small, see sched_getattr().

       EBUSY  SCHED_DEADLINE admission control failure, see sched(7).

NOTES
       While the text above (and in sched_setscheduler(2)) talks about
       processes, in actual fact these system calls are thread specific.

       While the SCHED_DEADLINE parameters are in nanoseconds, current
       kernels truncate the lower 10 bits and we get an effective
       microsecond resolution.
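
The effect of dropping the lower 10 bits can be illustrated with a
couple of lines of C. This is a sketch of the arithmetic only, based on
the note above; the macro and function names are hypothetical and the
kernel's own code may differ.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DL_MASK ((UINT64_C(1) << 10) - 1)	/* lower 10 bits */

/* Value that remains once the lower 10 bits are truncated. */
uint64_t effective_dl_ns(uint64_t ns)
{
	uint64_t kept = ns & ~DEMO_DL_MASK;

	/* Sanity check: truncation loses fewer than 1024 ns. */
	assert(kept <= ns && ns - kept <= DEMO_DL_MASK);
	return kept;
}
```

So the granularity is 1024 ns, roughly a microsecond: a request of
2047 ns is treated as 1024 ns, and anything below 1024 ns becomes 0.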

> [2] A piece of text describing the SCHED_DEADLINE policy, which I can
> drop into sched(7).

    SCHED_DEADLINE: Sporadic task model deadline scheduling
       SCHED_DEADLINE is currently implemented using GEDF (Global
       Earliest Deadline First) with additional CBS (Constant Bandwidth
       Server).

       A sporadic task is one that has a sequence of jobs, where each job
       is activated at most once per period [ns]. Each job will have an
       absolute deadline relative to its activation before which it must
       finish its execution, and it shall at no time run longer
       than runtime [ns] after its release.

              activation/wakeup       absolute deadline
              |        release        |
              v        v              v
       -------x--------x--------------x--------x-------
                       |<- Runtime -->|
              |<---------- Deadline ->|
              |<---------- Period  ----------->|

       This gives: runtime <= (rel) deadline <= period.

       The CBS guarantees non-interference between tasks, by throttling
       tasks that attempt to over-run their specified runtime.

       In general the set of all SCHED_DEADLINE tasks is not
       feasible/schedulable within the given constraints. Therefore we
       must do an admittance test on setting/changing SCHED_DEADLINE
       policy/attributes.

       This admission test calculates that the task set is
       feasible/schedulable, failing this, sched_setattr() will return
       -EBUSY.

       For example, it is required (but not necessarily sufficient) for
       the total utilization to be less or equal to the total amount of
       CPUs available, where, since each task can maximally run for
       runtime [ns] per period [ns], that task's utilization is its
       runtime/period.

       Because we must be able to calculate admittance, SCHED_DEADLINE
       tasks are the highest priority (user controllable) tasks in the
       system; if any SCHED_DEADLINE task is runnable it will preempt
       any FIFO/RR/OTHER/BATCH/IDLE task.

       SCHED_DEADLINE tasks will fail fork(2) with -EAGAIN, except when
       the forking task has SCHED_FLAG_RESET_ON_FORK set.

       A SCHED_DEADLINE task calling sched_yield() will 'yield' the
       current job and wait for a new period to begin.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sched_{set,get}attr() manpage
       [not found]                             ` <20140430130937.GH30445-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
@ 2014-05-03 10:43                               ` Juri Lelli
       [not found]                                 ` <20140503124355.5d927080518051ca507bc381-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Juri Lelli @ 2014-05-03 10:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michael Kerrisk (man-pages), Dario Faggioli, Thomas Gleixner,
	Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA,
	johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
	Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w,
	hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	insop.song-Re5JQEeQqe8AvxtiuMwx3w,
	liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
	linux-man-u79uwXL29TY76Z2rM5mHXA

Hi,

sorry for the late reply, but I was travelling for work.

On Wed, 30 Apr 2014 15:09:37 +0200
Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:

> On Wed, Apr 30, 2014 at 01:09:25PM +0200, Michael Kerrisk (man-pages) wrote:
> > Hi Peter,
> > 
> > Thanks for the revision. More comments below. Could you revise in 
> > the light of those comments, and hopefully also after feedback from 
> > Juri and Dario?
> 
> New text below; hopefully a little clearer. If not, do holler.
> 
> ---
> > [1] A page describing the sched_setattr() and sched_getattr() APIs
> 
> NAME
> 	sched_setattr, sched_getattr - set and get scheduling policy/attributes
> 
> SYNOPSIS
> 	#include <sched.h>
> 
> 	struct sched_attr {
> 		u32 size;
> 		u32 sched_policy;
> 		u64 sched_flags;
> 
> 		/* SCHED_NORMAL, SCHED_BATCH */
> 		s32 sched_nice;
> 
> 		/* SCHED_FIFO, SCHED_RR */
> 		u32 sched_priority;
> 
> 		/* SCHED_DEADLINE */
> 		u64 sched_runtime;
> 		u64 sched_deadline;
> 		u64 sched_period;
> 	};
> 
> 	int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);
> 
> 	int sched_getattr(pid_t pid, const struct sched_attr *attr, unsigned int size, unsigned int flags);
> 
> DESCRIPTION
> 	sched_setattr() sets both the scheduling policy and the
> 	associated attributes for the process whose ID is specified in
> 	pid.
> 
> 	sched_setattr() replaces sched_setscheduler(), sched_setparam(),
> 	nice() and some of setpriority().
> 
> 	If pid equals zero, the scheduling policy and attributes
> 	of the calling process will be set.  The interpretation of the
> 	argument attr depends on the selected policy.  Currently, Linux
> 	supports the following "normal" (i.e., non-real-time) scheduling
> 	policies:
> 
> 	SCHED_OTHER	the standard "fair" time-sharing policy;
> 
> 	SCHED_BATCH	for "batch" style execution of processes; and
> 
> 	SCHED_IDLE	for running very low priority background jobs.
> 
> 	The following "real-time" policies are also supported, for
> 	special time-critical applications that need precise control
> 	over the way in which runnable processes are selected for
> 	execution:
> 
> 	SCHED_FIFO	a static priority first-in, first-out policy;
> 
> 	SCHED_RR	a static priority round-robin policy; and
> 
> 	SCHED_DEADLINE	a dynamic priority deadline policy.
> 
> 	The semantics of each of these policies are detailed in
> 	sched(7).
> 
> 	sched_attr::size must be set to the size of the structure, as in
> 	sizeof(struct sched_attr), if the provided structure is smaller
> 	than the kernel structure, any additional fields are assumed
> 	'0'. If the provided structure is larger than the kernel
> 	structure, the kernel verifies all additional fields are '0' if
> 	not the syscall will fail with -E2BIG.
> 
> 	sched_attr::sched_policy the desired scheduling policy.
> 
> 	sched_attr::sched_flags additional flags that can influence
> 	scheduling behaviour. Currently as per Linux kernel 3.14:
> 
> 		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
> 		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
> 		on fork().
> 
> 	is the only supported flag.
> 
> 	sched_attr::sched_nice should only be set for SCHED_OTHER,
> 	SCHED_BATCH, the desired nice value [-20,19], see sched(7).
> 
> 	sched_attr::sched_priority should only be set for SCHED_FIFO,
> 	SCHED_RR, the desired static priority [1,99], see sched(7).
> 
> 	sched_attr::sched_runtime in nanoseconds,
> 	sched_attr::sched_deadline in nanoseconds,
> 	sched_attr::sched_period in nanoseconds, should only be set for
> 	SCHED_DEADLINE and are the traditional sporadic task model
> 	parameters, see sched(7).
> 
> 	The flags argument should be 0.
> 
> 	sched_getattr() queries the scheduling policy currently applied
> 	to the process identified by pid.
> 
> 	Similar to sched_setattr(), sched_getattr() replaces
> 	sched_getscheduler(), sched_getparam() and some of
> 	getpriority().
> 
> 	If pid equals zero, the policy of the calling process will be
> 	retrieved.
> 
> 	The size argument should reflect the size of struct sched_attr
> 	as known to userspace. The kernel fills out sched_attr::size to
> 	the size of its sched_attr structure. If the user provided
> 	structure is larger, additional fields are not touched. If the
> 	user provided structure is smaller, but the kernel needs to
> 	return values outside the provided space, the syscall will fail
> 	with -E2BIG.
> 
> 	The flags argument should be 0.
> 
> 	The other sched_attr fields are filled out as described in
> 	sched_setattr().
> 
> RETURN VALUE
> 	On success, sched_setattr() and sched_getattr() return 0. On
> 	error, -1 is returned, and errno is set appropriately.
> 
> ERRORS
>        EINVAL The scheduling policy is not one  of  the  recognized  policies,
>               param is NULL, or param does not make sense for the selected
> 	      policy.
> 
>        EPERM  The calling process does not have appropriate privileges.
> 
>        ESRCH  The process whose ID is pid could not be found.
> 
>        E2BIG  The provided storage for struct sched_attr is either too
>               big, see sched_setattr(), or too small, see sched_getattr().
> 
>        EBUSY  SCHED_DEADLINE admission control failure, see sched(7).
> 
> NOTES
>        While the text above (and in sched_setscheduler(2)) talks about
>        processes, in actual fact these system calls are thread specific.
> 
>        While the SCHED_DEADLINE parameters are in nanoseconds, current
>        kernels truncate the lower 10 bits and we get an effective
>        microsecond resolution.
> 
> > [2] A piece of text describing the SCHED_DEADLINE policy, which I can
> > drop into sched(7).
> 

I'd tweak the following a bit, just to be sure that users understand
that the model of task behavior is one thing and what you can set using
SCHED_DEADLINE is another. The two are obviously closely related, but
different settings can in principle be used to schedule the same task
set (there is a lot of literature about optimal settings and so on).

>     SCHED_DEADLINE: Sporadic task model deadline scheduling
>        SCHED_DEADLINE is currently implemented using GEDF (Global
>        Earliest Deadline First) with additional CBS (Constant Bandwidth
>        Server).
> 
>        A sporadic task is on that has a sequence of jobs, where each job
>        is activated at most once per period [ns]. Each job will have an
>        absolute deadline relative to its activation before which it must
>        finish its execution, and it shall at no time run longer
>        than runtime [ns] after its release.
> 

A sporadic task is one that has a sequence of jobs, where each job is
activated at most once per period. Each job also has a relative
deadline, before which it should finish execution, and a computation
time, that is the time necessary for executing the job without
interruption. The instant of time when a task wakes up, because a new
job has to be executed, is called arrival time (it is also referred
to as request time or release time). Start time is instead the time at
which a task starts its execution. The absolute deadline is thus
obtained by adding the relative deadline to the arrival time. The
following diagram clarifies these terms:

>               activation/wakeup       absolute deadline
>               |        release        |
>               v        v              v
>        -------x--------x--------------x--------x-------
>                        |<- Runtime -->|
>               |<---------- Deadline ->|
>               |<---------- Period  ----------->|
> 

               arrival/wakeup           absolute deadline
               |        start time          |
               v        v                   v
        -------x--------xoooooooooooo-------x--------x-----
                        |<- comp. ->|
               |<---------- rel. deadline ->|
               |<---------- period   --------------->|

SCHED_DEADLINE allows the user to specify three parameters (see
sched_setattr(2)): Runtime [ns], Deadline [ns] and Period [ns]. These
parameters do not necessarily have to correspond to the aforementioned
terms, but the usual practice is to set Runtime to something bigger than
the average computation time (or worst-case execution time for hard
real-time tasks), Deadline to the relative deadline and Period to the
period of the task. With such a setting we would have:

               arrival/wakeup           absolute deadline
               |        start time          |
               v        v                   v
        -------x--------xoooooooooooo-------x--------x-----
                        |<- Runtime  ->|
               |<---------- Deadline ------>|
               |<---------- Period   --------------->|
 


>        This gives: runtime <= (rel) deadline <= period.
> 

It is checked that: Runtime <= Deadline <= Period.

>        The CBS guarantees non-interference between tasks, by throttling
>        tasks that attempt to over-run their specified runtime.
> 

s/runtime/Runtime to be consistent.

>        In general the set of all SCHED_DEADLINE tasks is not
>        feasible/schedulable within the given constraints. Therefore we
>        must do an admittance test on setting/changing SCHED_DEADLINE
>        policy/attributes.
> 

To guarantee some degree of timeliness we must do an admission test on
setting/changing SCHED_DEADLINE policy/attributes.


>        This admission test calculates that the task set is
>        feasible/schedulable, failing this, sched_setattr() will return
>        -EBUSY.
> 
>        For example, it is required (but not necessarily sufficient) for
>        the total utilization to be less or equal to the total amount of
>        CPUs available, where, since each task can maximally run for
>        runtime [us] per period [us], that task's utilization is its
>        runtime/period.
> 

CPUs available, where, since each task can maximally run for Runtime
per Period, that task's utilization is its Runtime/Period.

>        Because we must be able to calculate admittance SCHED_DEADLINE
>        tasks are the highest priority (user controllable) tasks in the
>        system, if any SCHED_DEADLINE task is runnable it will preempt
>        any FIFO/RR/OTHER/BATCH/IDLE task.
> 
>        SCHED_DEADLINE tasks will fail fork(2) with -EAGAIN, except when
>        the forking task has SCHED_FLAG_RESET_ON_FORK set.
> 
>        A SCHED_DEADLINE task calling sched_yield() will 'yield' the
>        current job and wait for a new period to begin.
> 

Does it look any better?

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sched_{set,get}attr() manpage
       [not found]                                 ` <20140503124355.5d927080518051ca507bc381-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-05-05  6:55                                   ` Michael Kerrisk (man-pages)
  2014-05-05  7:21                                     ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-05  6:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Dario Faggioli,
	Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w,
	Oleg Nesterov, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w,
	p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel,
	claudio-YOzL5CV4y4YG1A2ADO40+w,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w,
	hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	insop.song-Re5JQEeQqe8AvxtiuMwx3w,
	liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
	linux-man-u79uwXL29TY76Z2rM5mHXA

Hi Peter,

Looks like a good set of comments from Juri. Could you revise and 
resubmit?

By the way, I assume you are just writing this page as raw text.
While I'd prefer to get proper man markup source, I'll add that
if you don't :-/. But, in that case, I need to know the
copyright and license you want to use. Please see
https://www.kernel.org/doc/man-pages/licenses.html

Cheers,

Michael


On 05/03/2014 12:43 PM, Juri Lelli wrote:
> Hi,
> 
> sorry for the late reply, but I was travelling for work.
> 
> On Wed, 30 Apr 2014 15:09:37 +0200
> Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> 
>> On Wed, Apr 30, 2014 at 01:09:25PM +0200, Michael Kerrisk (man-pages) wrote:
>>> Hi Peter,
>>>
>>> Thanks for the revision. More comments below. Could you revise in 
>>> the light of those comments, and hopefully also after feedback from 
>>> Juri and Dario?
>>
>> New text below; hopefully a little clearer. If not, do holler.
>>
>> ---
>>> [1] A page describing the sched_setattr() and sched_getattr() APIs
>>
>> NAME
>> 	sched_setattr, sched_getattr - set and get scheduling policy/attributes
>>
>> SYNOPSIS
>> 	#include <sched.h>
>>
>> 	struct sched_attr {
>> 		u32 size;
>> 		u32 sched_policy;
>> 		u64 sched_flags;
>>
>> 		/* SCHED_NORMAL, SCHED_BATCH */
>> 		s32 sched_nice;
>>
>> 		/* SCHED_FIFO, SCHED_RR */
>> 		u32 sched_priority;
>>
>> 		/* SCHED_DEADLINE */
>> 		u64 sched_runtime;
>> 		u64 sched_deadline;
>> 		u64 sched_period;
>> 	};
>>
>> 	int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);
>>
>> 	int sched_getattr(pid_t pid, const struct sched_attr *attr, unsigned int size, unsigned int flags);
>>
>> DESCRIPTION
>> 	sched_setattr() sets both the scheduling policy and the
>> 	associated attributes for the process whose ID is specified in
>> 	pid.
>>
>> 	sched_setattr() replaces sched_setscheduler(), sched_setparam(),
>> 	nice() and some of setpriority().
>>
>> 	If pid equals zero, the scheduling policy and attributes
>> 	of the calling process will be set.  The interpretation of the
>> 	argument attr depends on the selected policy.  Currently, Linux
>> 	supports the following "normal" (i.e., non-real-time) scheduling
>> 	policies:
>>
>> 	SCHED_OTHER	the standard "fair" time-sharing policy;
>>
>> 	SCHED_BATCH	for "batch" style execution of processes; and
>>
>> 	SCHED_IDLE	for running very low priority background jobs.
>>
>> 	The following "real-time" policies are also supported, for
>> 	special time-critical applications that need precise control
>> 	over the way in which runnable processes are selected for
>> 	execution:
>>
>> 	SCHED_FIFO	a static priority first-in, first-out policy;
>>
>> 	SCHED_RR	a static priority round-robin policy; and
>>
>> 	SCHED_DEADLINE	a dynamic priority deadline policy.
>>
>> 	The semantics of each of these policies are detailed in
>> 	sched(7).
>>
>> 	sched_attr::size must be set to the size of the structure, as in
>> 	sizeof(struct sched_attr), if the provided structure is smaller
>> 	than the kernel structure, any additional fields are assumed
>> 	'0'. If the provided structure is larger than the kernel
>> 	structure, the kernel verifies all additional fields are '0' if
>> 	not the syscall will fail with -E2BIG.
>>
>> 	sched_attr::sched_policy the desired scheduling policy.
>>
>> 	sched_attr::sched_flags additional flags that can influence
>> 	scheduling behaviour. Currently as per Linux kernel 3.14:
>>
>> 		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
>> 		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
>> 		on fork().
>>
>> 	is the only supported flag.
>>
>> 	sched_attr::sched_nice should only be set for SCHED_OTHER,
>> 	SCHED_BATCH, the desired nice value [-20,19], see sched(7).
>>
>> 	sched_attr::sched_priority should only be set for SCHED_FIFO,
>> 	SCHED_RR, the desired static priority [1,99], see sched(7).
>>
>> 	sched_attr::sched_runtime in nanoseconds,
>> 	sched_attr::sched_deadline in nanoseconds,
>> 	sched_attr::sched_period in nanoseconds, should only be set for
>> 	SCHED_DEADLINE and are the traditional sporadic task model
>> 	parameters, see sched(7).
>>
>> 	The flags argument should be 0.
>>
>> 	sched_getattr() queries the scheduling policy currently applied
>> 	to the process identified by pid.
>>
>> 	Similar to sched_setattr(), sched_getattr() replaces
>> 	sched_getscheduler(), sched_getparam() and some of
>> 	getpriority().
>>
>> 	If pid equals zero, the policy of the calling process will be
>> 	retrieved.
>>
>> 	The size argument should reflect the size of struct sched_attr
>> 	as known to userspace. The kernel fills out sched_attr::size to
>> 	the size of its sched_attr structure. If the user provided
>> 	structure is larger, additional fields are not touched. If the
>> 	user provided structure is smaller, but the kernel needs to
>> 	return values outside the provided space, the syscall will fail
>> 	with -E2BIG.
>>
>> 	The flags argument should be 0.
>>
>> 	The other sched_attr fields are filled out as described in
>> 	sched_setattr().
>>
>> RETURN VALUE
>> 	On success, sched_setattr() and sched_getattr() return 0. On
>> 	error, -1 is returned, and errno is set appropriately.
>>
>> ERRORS
>>        EINVAL The scheduling policy is not one  of  the  recognized  policies,
>>               param is NULL, or param does not make sense for the selected
>> 	      policy.
>>
>>        EPERM  The calling process does not have appropriate privileges.
>>
>>        ESRCH  The process whose ID is pid could not be found.
>>
>>        E2BIG  The provided storage for struct sched_attr is either too
>>               big, see sched_setattr(), or too small, see sched_getattr().
>>
>>        EBUSY  SCHED_DEADLINE admission control failure, see sched(7).
>>
>> NOTES
>>        While the text above (and in sched_setscheduler(2)) talks about
>>        processes, in actual fact these system calls are thread specific.
>>
>>        While the SCHED_DEADLINE parameters are in nanoseconds, current
>>        kernels truncate the lower 10 bits and we get an effective
>>        microsecond resolution.
>>
>>> [2] A piece of text describing the SCHED_DEADLINE policy, which I can
>>> drop into sched(7).
>>
> 
> I'd tweak the following a bit, just to be sure that users understand
> that one thing is the model of tasks behavior and another thing is what
> you can set using SCHED_DEADLINE. Then the two things are obviously
> closely related, but different settings can be in principle used to
> schedule the same task set (with lot of literature about optimal
> settings and so on).
> 
>>     SCHED_DEADLINE: Sporadic task model deadline scheduling
>>        SCHED_DEADLINE is currently implemented using GEDF (Global
>>        Earliest Deadline First) with additional CBS (Constant Bandwidth
>>        Server).
>>
>>        A sporadic task is on that has a sequence of jobs, where each job
>>        is activated at most once per period [ns]. Each job will have an
>>        absolute deadline relative to its activation before which it must
>>        finish its execution, and it shall at no time run longer
>>        than runtime [ns] after its release.
>>
> 
> A sporadic task is one that has a sequence of jobs, where each job is
> activated at most once per period. Each job has also a relative
> deadline, before which it should finish execution, and a computation
> time, that is the time necessary for executing the job without
> interruption. The instant of time when a task wakes up, because a new
> job has to be executed, is called arrival time (and it is also referred
> to as request time or release time). Start time is instead the time at
> which a task starts its execution. The absolute deadline is thus
> obtained adding the relative deadline to the arrival time. The
> following diagram clarifies these terms:
> 
>>               activation/wakeup       absolute deadline
>>               |        release        |
>>               v        v              v
>>        -------x--------x--------------x--------x-------
>>                        |<- Runtime -->|
>>               |<---------- Deadline ->|
>>               |<---------- Period  ----------->|
>>
> 
>                arrival/wakeup           absolute deadline
>                |        start time          |
>                v        v                   v
>         -------x--------xoooooooooooo-------x--------x-----
>                         |<- comp. ->|
>                |<---------- rel. deadline ->|
>                |<---------- period   --------------->|
> 
> SCHED_DEADLINE allows the user to specify three parameters (see
> sched_setattr(2)): Runtime [ns], Deadline [ns] and Period [ns]. Such
> parameters has not necessarily to correspond to the aforementioned
> terms, while usual practise is to set Runtime to something bigger than
> the average computation time (or worst-case execution time for hard
> real-time tasks), Deadline to the relative deadline and Period to the
> period of the task. With such a setting we would have:
> 
>                arrival/wakeup           absolute deadline
>                |        start time          |
>                v        v                   v
>         -------x--------xoooooooooooo-------x--------x-----
>                         |<- Runtime  ->|
>                |<---------- Deadline ------>|
>                |<---------- Period   --------------->|
>  
> 
> 
>>        This gives: runtime <= (rel) deadline <= period.
>>
> 
> It is checked that: Runtime <= Deadline <= Period.
> 
>>        The CBS guarantees non-interference between tasks, by throttling
>>        tasks that attempt to over-run their specified runtime.
>>
> 
> s/runtime/Runtime to be consistent.
> 
>>        In general the set of all SCHED_DEADLINE tasks is not
>>        feasible/schedulable within the given constraints. Therefore we
>>        must do an admittance test on setting/changing SCHED_DEADLINE
>>        policy/attributes.
>>
> 
> To guarantee some degree of timeliness we must do an admission test on
> setting/changing SCHED_DEADLINE policy/attributes.
> 
> 
>>        This admission test calculates that the task set is
>>        feasible/schedulable, failing this, sched_setattr() will return
>>        -EBUSY.
>>
>>        For example, it is required (but not necessarily sufficient) for
>>        the total utilization to be less or equal to the total amount of
>>        CPUs available, where, since each task can maximally run for
>>        runtime [us] per period [us], that task's utilization is its
>>        runtime/period.
>>
> 
> CPUs available, where, since each task can maximally run for Runtime
> per Period, that task's utilization is its Runtime/Period.
> 
>>        Because we must be able to calculate admittance SCHED_DEADLINE
>>        tasks are the highest priority (user controllable) tasks in the
>>        system, if any SCHED_DEADLINE task is runnable it will preempt
>>        any FIFO/RR/OTHER/BATCH/IDLE task.
>>
>>        SCHED_DEADLINE tasks will fail fork(2) with -EAGAIN, except when
>>        the forking task has SCHED_FLAG_RESET_ON_FORK set.
>>
>>        A SCHED_DEADLINE task calling sched_yield() will 'yield' the
>>        current job and wait for a new period to begin.
>>
> 
> Does it look any better?
> 
> Thanks,
> 
> - Juri
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: sched_{set,get}attr() manpage
  2014-05-05  6:55                                   ` Michael Kerrisk (man-pages)
@ 2014-05-05  7:21                                     ` Peter Zijlstra
       [not found]                                       ` <20140505072114.GY11096-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
  2014-05-06  8:16                                       ` Peter Zijlstra
  0 siblings, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2014-05-05  7:21 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Juri Lelli, Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt,
	Oleg Nesterov, fweisbec, darren, johan.eker, p.faure,
	Linux Kernel, claudio, michael, fchecconi, tommaso.cucinotta,
	nicola.manica, luca.abeni, dhaval.giani, hgu1972, Paul McKenney,
	insop.song, liming.wang, jkacur, linux-man


On Mon, May 05, 2014 at 08:55:28AM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Peter,
> 
> Looks like a good set of comments from Juri. Could you revise and 
> resubmit?

Yeah, I'll try and get it done today, but there's a few icky bugs
waiting for my attention as well, I'll do me bestest :-)

> By the way, I assume you are just writing this page as raw text.
> While I'd prefer to get proper man markup source, I'll add that
> if you don't :-/. 

Well, learning *roff will likely take me more time than writing this
text + all revisions so far :/ But yeah, I appreciate the grief.

Is there a TeX variant one could use to generate the *roff muck? While
my TeX isn't entirely fresh, it's at least something I've done lots of.

> But, in that case, I need to know the
> copyright and license you want to use. Please see
> https://www.kernel.org/doc/man-pages/licenses.html

GPLv2 + DOC (not v2+) sounds good.


* Re: sched_{set,get}attr() manpage
       [not found]                                       ` <20140505072114.GY11096-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
@ 2014-05-05  7:41                                         ` Michael Kerrisk (man-pages)
       [not found]                                           ` <53674094.2020307-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-05  7:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Juri Lelli, Dario Faggioli,
	Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w,
	Oleg Nesterov, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w,
	p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel,
	claudio-YOzL5CV4y4YG1A2ADO40+w,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w,
	hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	insop.song-Re5JQEeQqe8AvxtiuMwx3w,
	liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
	linux-man-u79uwXL29TY76Z2rM5mHXA

On 05/05/2014 09:21 AM, Peter Zijlstra wrote:
> On Mon, May 05, 2014 at 08:55:28AM +0200, Michael Kerrisk (man-pages) wrote:
>> Hi Peter,
>>
>> Looks like a good set of comments from Juri. Could you revise and 
>> resubmit?
> 
> Yeah, I'll try and get it done today, but there's a few icky bugs
> waiting for my attention as well, I'll do me bestest :-)
> 
>> By the way, I assume you are just writing this page as raw text.
>> While I'd prefer to get proper man markup source, I'll add that
>> if you don't :-/. 
> 
> Well, learning *roff will likely take me more time than writing this
> text + all revisions so far :/ But yeah, I appreciate the grief.
> 
> Is there a TeX variant one could use to generate the *roff muck? While
> my TeX isn't entirely fresh its at least something I've done lots of.

Don't worry -- just send me the plain text; I'll do it. I appreciate 
you writing the text in the first place; I'll handle the rest--it won't
take me too long, and probably I'll find things to fix/check on the way.

>> But, in that case, I need to know the
>> copyright and license you want to use. Please see
>> https://www.kernel.org/doc/man-pages/licenses.html
> 
> GPLv2 + DOC (not v2+) sounds good.

I'm a little unclear here. Do you or don't you mean
https://www.kernel.org/doc/man-pages/licenses.html#gpl
?

(Note, I'd really prefer to stick to one of those licenses
(without variants). My personal preference is the "verbatim"
license, which is the most widely used one. There are already
so many licenses in man-pages...)

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

* Re: sched_{set,get}attr() manpage
       [not found]                                           ` <53674094.2020307-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-05-05  7:47                                             ` Peter Zijlstra
  2014-05-05  9:53                                               ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-05-05  7:47 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Juri Lelli, Dario Faggioli, Thomas Gleixner, Ingo Molnar,
	rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA,
	johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
	Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w,
	hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	insop.song-Re5JQEeQqe8AvxtiuMwx3w,
	liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
	linux-man-u79uwXL29TY76Z2rM5mHXA


On Mon, May 05, 2014 at 09:41:08AM +0200, Michael Kerrisk (man-pages) wrote:
> >> But, in that case, I need to know the
> >> copyright and license you want to use. Please see
> >> https://www.kernel.org/doc/man-pages/licenses.html
> > 
> > GPLv2 + DOC (not v2+) sounds good.
> 
> I'm a little unclear here. Do you or don't you mean
> https://www.kernel.org/doc/man-pages/licenses.html#gpl
> ?

A variant without the +, just like I do my kernel code, no greater gpl
versions. However, 

> (Note, I'd really prefer to stick to one of those licenses
> (without variants). (My personal preference is the "verbatim"
> license, which is the most widely used one.) There's already
> do many licenses in in man-pages...

Verbatim is OK with me I suppose. It's only text after all, who cares
about that :-)

/me runs for the hills.


* Re: sched_{set,get}attr() manpage
  2014-05-05  7:47                                             ` Peter Zijlstra
@ 2014-05-05  9:53                                               ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-05  9:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mtk.manpages, Juri Lelli, Dario Faggioli, Thomas Gleixner,
	Ingo Molnar, rostedt, Oleg Nesterov, fweisbec, darren, johan.eker,
	p.faure, Linux Kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, Paul McKenney, insop.song, liming.wang, jkacur,
	linux-man

On 05/05/2014 09:47 AM, Peter Zijlstra wrote:
> On Mon, May 05, 2014 at 09:41:08AM +0200, Michael Kerrisk (man-pages) wrote:
>>>> But, in that case, I need to know the
>>>> copyright and license you want to use. Please see
>>>> https://www.kernel.org/doc/man-pages/licenses.html
>>>
>>> GPLv2 + DOC (not v2+) sounds good.
>>
>> I'm a little unclear here. Do you or don't you mean
>> https://www.kernel.org/doc/man-pages/licenses.html#gpl
>> ?
> 
> A variant with out the +, just like I do my kernel code, no greater gpl
> versions. However, 
> 
>> (Note, I'd really prefer to stick to one of those licenses
>> (without variants). (My personal preference is the "verbatim"
>> license, which is the most widely used one.) There's already
>> do many licenses in in man-pages...
> 
> Verbatim is OK with me I suppose. 

And don't neglect to mention who the copyright is to please.

> Its only text after all, who cares
> about that :-)
> /me runs for the hills.

Well, apparently you care, so thanks ;-)

 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: sched_{set,get}attr() manpage
  2014-05-05  7:21                                     ` Peter Zijlstra
       [not found]                                       ` <20140505072114.GY11096-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
@ 2014-05-06  8:16                                       ` Peter Zijlstra
  2014-05-09  8:23                                         ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-05-06  8:16 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Juri Lelli, Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt,
	Oleg Nesterov, fweisbec, darren, johan.eker, p.faure,
	Linux Kernel, claudio, michael, fchecconi, tommaso.cucinotta,
	nicola.manica, luca.abeni, dhaval.giani, hgu1972, Paul McKenney,
	insop.song, liming.wang, jkacur, linux-man


On Mon, May 05, 2014 at 09:21:14AM +0200, Peter Zijlstra wrote:
> On Mon, May 05, 2014 at 08:55:28AM +0200, Michael Kerrisk (man-pages) wrote:
> > Hi Peter,
> > 
> > Looks like a good set of comments from Juri. Could you revise and 
> > resubmit?
> 
> Yeah, I'll try and get it done today, but there's a few icky bugs
> waiting for my attention as well, I'll do me bestest :-)

OK, not quite managed it yesterday, but here goes.

So Verbatim license, for the first part to me and whoever I borrowed
sched_setscheduler() bits from.

For the second part to me and Juri.

---

> [1] A page describing the sched_setattr() and sched_getattr() APIs

NAME
	sched_setattr, sched_getattr - set and get scheduling policy/attributes

SYNOPSIS
	#include <sched.h>

	struct sched_attr {
		u32 size;
		u32 sched_policy;
		u64 sched_flags;

		/* SCHED_NORMAL, SCHED_BATCH */
		s32 sched_nice;

		/* SCHED_FIFO, SCHED_RR */
		u32 sched_priority;

		/* SCHED_DEADLINE */
		u64 sched_runtime;
		u64 sched_deadline;
		u64 sched_period;
	};

	int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);

	int sched_getattr(pid_t pid, struct sched_attr *attr, unsigned int size, unsigned int flags);

DESCRIPTION
	sched_setattr() sets both the scheduling policy and the
	associated attributes for the process whose ID is specified in
	pid.

	sched_setattr() replaces sched_setscheduler(), sched_setparam(),
	nice() and some of setpriority().

	If pid equals zero, the scheduling policy and attributes
	of the calling process will be set.  The interpretation of the
	argument attr depends on the selected policy.  Currently, Linux
	supports the following "normal" (i.e., non-real-time) scheduling
	policies:

	SCHED_OTHER	the standard "fair" time-sharing policy;

	SCHED_BATCH	for "batch" style execution of processes; and

	SCHED_IDLE	for running very low priority background jobs.

	The following "real-time" policies are also supported, for
	special time-critical applications that need precise control
	over the way in which runnable processes are selected for
	execution:

	SCHED_FIFO	a static priority first-in, first-out policy;

	SCHED_RR	a static priority round-robin policy; and

	SCHED_DEADLINE	a dynamic priority deadline policy.

	The semantics of each of these policies are detailed in
	sched(7).

	sched_attr::size must be set to the size of the structure, as in
	sizeof(struct sched_attr). If the provided structure is smaller
	than the kernel structure, any additional fields are assumed to
	be '0'. If the provided structure is larger than the kernel
	structure, the kernel verifies that all additional fields are
	'0'; if they are not, the syscall will fail with -E2BIG.

	sched_attr::sched_policy is the desired scheduling policy.

	sched_attr::sched_flags holds additional flags that can
	influence scheduling behaviour. As of Linux kernel 3.14, the
	only supported flag is:

		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
		on fork().

	sched_attr::sched_nice is the desired nice value [-20,19]; it
	should only be set for SCHED_OTHER and SCHED_BATCH, see
	sched(7).

	sched_attr::sched_priority is the desired static priority
	[1,99]; it should only be set for SCHED_FIFO and SCHED_RR, see
	sched(7).

	sched_attr::sched_runtime, sched_attr::sched_deadline and
	sched_attr::sched_period, all in nanoseconds, should only be
	set for SCHED_DEADLINE and are the traditional sporadic task
	model parameters, see sched(7).

	The flags argument should be 0.
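
	A minimal sketch of building a SCHED_DEADLINE request from the
	fields above, assuming the C library offers no wrapper so that
	the call goes through syscall(2). The struct dl_sched_attr
	mirror and the helpers dl_attr_init() and dl_attr_apply_self()
	are hypothetical names used only for illustration; the policy
	number 6 for SCHED_DEADLINE and the availability of
	SYS_sched_setattr in <sys/syscall.h> are assumptions about the
	build environment.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6	/* assumed policy number, as in Linux 3.14 */
#endif

/* Local mirror of the structure in the SYNOPSIS, renamed so that it
 * cannot clash with a definition supplied by system headers. */
struct dl_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

/* Hypothetical helper: fill in a SCHED_DEADLINE request, times in ns.
 * Fields that do not apply to the policy are left at zero. */
void dl_attr_init(struct dl_sched_attr *attr, uint64_t runtime,
		  uint64_t deadline, uint64_t period)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->sched_policy = SCHED_DEADLINE;
	attr->sched_runtime = runtime;
	attr->sched_deadline = deadline;
	attr->sched_period = period;
}

/* Hypothetical wrapper: apply attr to the caller (pid 0), flags 0.
 * Returns 0 on success, or -1 with errno set. */
int dl_attr_apply_self(const struct dl_sched_attr *attr)
{
	return (int)syscall(SYS_sched_setattr, 0, attr, 0);
}
```

	A caller would typically run dl_attr_init() with, say, a 10 ms
	runtime, a 30 ms deadline and a 100 ms period, then check
	dl_attr_apply_self() for -1 and inspect errno against the
	values listed under ERRORS.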

	sched_getattr() queries the scheduling policy currently applied
	to the process identified by pid.

	Similar to sched_setattr(), sched_getattr() replaces
	sched_getscheduler(), sched_getparam() and some of
	getpriority().

	If pid equals zero, the policy of the calling process will be
	retrieved.

	The size argument should reflect the size of struct sched_attr
	as known to userspace. The kernel sets sched_attr::size to
	the size of its sched_attr structure. If the user-provided
	structure is larger, additional fields are not touched. If the
	user-provided structure is smaller, but the kernel needs to
	return values outside the provided space, the syscall will fail
	with -E2BIG.

	The flags argument should be 0.

	The other sched_attr fields are filled out as described in
	sched_setattr().

RETURN VALUE
	On success, sched_setattr() and sched_getattr() return 0. On
	error, -1 is returned, and errno is set appropriately.

ERRORS
       EINVAL The scheduling policy is not one of the recognized policies,
              attr is NULL, or attr does not make sense for the selected
              policy.

       EPERM  The calling process does not have appropriate privileges.

       ESRCH  The process whose ID is pid could not be found.

       E2BIG  The provided storage for struct sched_attr is either too
              big, see sched_setattr(), or too small, see sched_getattr().

       EBUSY  SCHED_DEADLINE admission control failure, see sched(7).

NOTES
       While the text above (and in sched_setscheduler(2)) talks about
       processes, in actual fact these system calls are thread specific.

       While the SCHED_DEADLINE parameters are in nanoseconds, current
       kernels truncate the lower 10 bits and we get an effective
       microsecond resolution.
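
       As a worked example of that truncation, the sketch below clears
       the lower 10 bits of a nanosecond value, which rounds it down to
       a multiple of 1024 ns. The function name dl_effective_ns() and
       the macro DL_TRUNC_BITS are hypothetical; how a given kernel
       applies the truncation internally is not shown here.

```c
#include <stdint.h>

/* The note above says the lower 10 bits are dropped: 2^10 = 1024 ns. */
#define DL_TRUNC_BITS 10

/* Hypothetical helper: the value that remains once the lower
 * DL_TRUNC_BITS bits of a nanosecond parameter are cleared. */
uint64_t dl_effective_ns(uint64_t ns)
{
	return (ns >> DL_TRUNC_BITS) << DL_TRUNC_BITS;
}
```

       For instance, a requested 1000000 ns (1 ms) becomes 999424 ns,
       and anything below 1024 ns becomes 0.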

> [2] A piece of text describing the SCHED_DEADLINE policy, which I can
> drop into sched(7).

    SCHED_DEADLINE: Sporadic task model deadline scheduling
       SCHED_DEADLINE is currently implemented using GEDF (Global
       Earliest Deadline First) with additional CBS (Constant Bandwidth
       Server).

       A sporadic task is one that has a sequence of jobs, where each
       job is activated at most once per period. Each job also has a
       relative deadline, before which it should finish execution, and a
       computation time, which is the time necessary for executing the
       job without interruption. The instant at which a task wakes
       up, because a new job has to be executed, is called the arrival
       time (also referred to as the request time or release time).
       The start time is instead the time at which a task starts its
       execution. The absolute deadline is thus obtained by adding the
       relative deadline to the arrival time.

       The following diagram clarifies these terms:

               arrival/wakeup           absolute deadline
               |        start time          |
               v        v                   v
        -------x--------xoooooooooooo-------x--------x-----
                        |<- comp. ->|
               |<---------- rel. deadline ->|
               |<---------- period ----------------->|

       SCHED_DEADLINE allows the user to specify three parameters (see
       sched_setattr(2)): Runtime [ns], Deadline [ns] and Period [ns].
       These parameters do not necessarily have to correspond to the
       aforementioned terms, but the usual practice is to set Runtime to
       something bigger than the average computation time (or the
       worst-case execution time for hard real-time tasks), Deadline to
       the relative deadline and Period to the period of the task. With
       such a setting we would have:

               arrival/wakeup           absolute deadline
               |        start time          |
               v        v                   v
        -------x--------xoooooooooooo-------x--------x-----
                        |<- Runtime -->|
               |<---------- Deadline ------>|
               |<---------- Period ----------------->|

       It is checked that: Runtime <= Deadline <= Period.
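
       The ordering constraint can be checked in user space before the
       request is made. This is only a sketch of the relation stated
       above; dl_params_ordered() is a hypothetical name, and the kernel
       may apply further checks that this text does not list.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when Runtime <= Deadline <= Period.
 * All three values are in nanoseconds. */
bool dl_params_ordered(uint64_t runtime, uint64_t deadline, uint64_t period)
{
	return runtime <= deadline && deadline <= period;
}
```

       A 10 ms Runtime with a 30 ms Deadline and a 100 ms Period passes
       this check; swapping Deadline and Period does not.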

       The CBS guarantees non-interference between tasks, by throttling
       tasks that attempt to over-run their specified Runtime.

       In general, a set of SCHED_DEADLINE tasks is not necessarily
       feasible/schedulable within the given constraints. To guarantee
       some degree of timeliness we must do an admission test on
       setting/changing SCHED_DEADLINE policy/attributes.

       This admission test checks whether the task set is
       feasible/schedulable; failing this, sched_setattr() will return
       -EBUSY.

       For example, it is required (but not necessarily sufficient) for
       the total utilization to be less than or equal to the total
       number of CPUs available, where, since each task can run for at
       most Runtime per Period, that task's utilization is its
       Runtime/Period.
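
       The necessary condition on total utilization can be sketched as
       follows. This is not the kernel's admission test, only the bound
       described in the paragraph above; struct dl_task and both
       function names are hypothetical, and every Period is assumed to
       be greater than zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task parameters; both values in the same unit. */
struct dl_task {
	uint64_t runtime;
	uint64_t period;	/* assumed to be non-zero */
};

/* Sum of Runtime/Period over all tasks in the set. */
double dl_total_utilization(const struct dl_task *tasks, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += (double)tasks[i].runtime / (double)tasks[i].period;
	return sum;
}

/* Necessary (but not sufficient) condition: the total utilization
 * must not exceed the number of CPUs available. */
bool dl_necessary_condition(const struct dl_task *tasks, size_t n,
			    unsigned int ncpus)
{
	return dl_total_utilization(tasks, n) <= (double)ncpus;
}
```

       Three tasks with utilizations 1/2, 1/4 and 1/4 add up to exactly
       1 and so meet the bound on a single CPU; adding a fourth task
       with utilization 1/4 exceeds it unless a second CPU is available.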

       Because we must be able to calculate admission, SCHED_DEADLINE
       tasks are the highest priority (user controllable) tasks in the
       system; if any SCHED_DEADLINE task is runnable, it will preempt
       any FIFO/RR/OTHER/BATCH/IDLE task.

       SCHED_DEADLINE tasks will fail fork(2) with -EAGAIN, except when
       the forking task has SCHED_FLAG_RESET_ON_FORK set.

       A SCHED_DEADLINE task calling sched_yield() will 'yield' the
       current job and wait for a new period to begin.
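
       A job loop built on that behaviour might look like the sketch
       below: each iteration does one job's work and then calls
       sched_yield() to give up the rest of the current job. The names
       dl_run_jobs() and dl_count_job() are hypothetical, and the
       "wait for a new period" effect is assumed to apply only when the
       caller actually runs under SCHED_DEADLINE; under other policies
       sched_yield() merely relinquishes the CPU.

```c
#include <sched.h>

/* Placeholder job body for illustration: counts how often it ran. */
void dl_count_job(void *arg)
{
	unsigned int *counter = arg;

	(*counter)++;
}

/* Hypothetical helper: run njobs jobs, yielding after each one.
 * Returns 0 on success, or -1 if sched_yield() reports an error. */
int dl_run_jobs(unsigned int njobs, void (*job)(void *), void *arg)
{
	for (unsigned int i = 0; i < njobs; i++) {
		job(arg);		/* work for the current job */
		if (sched_yield() != 0)	/* done with this job */
			return -1;
	}
	return 0;
}
```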



* Re: sched_{set,get}attr() manpage
  2014-05-06  8:16                                       ` Peter Zijlstra
@ 2014-05-09  8:23                                         ` Michael Kerrisk (man-pages)
       [not found]                                           ` <536C907A.1040205-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-09  8:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mtk.manpages, Juri Lelli, Dario Faggioli, Thomas Gleixner,
	Ingo Molnar, rostedt, Oleg Nesterov, fweisbec, darren, johan.eker,
	p.faure, Linux Kernel, claudio, michael, fchecconi,
	tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
	hgu1972, Paul McKenney, insop.song, liming.wang, jkacur,
	linux-man

Hi Peter,

I'm working on this text. I see the following in kernel/sched/core.c:

[[
static int __sched_setscheduler(struct task_struct *p,
                                const struct sched_attr *attr,
                                bool user)
{
        ...

        int policy = attr->sched_policy;
        ...
        if (policy < 0) {
                reset_on_fork = p->sched_reset_on_fork;
                policy = oldpolicy = p->policy;
]]

What's a negative policy about? Is this something that should 
be documented?

Cheers,

Michael

On 05/06/2014 10:16 AM, Peter Zijlstra wrote:
> On Mon, May 05, 2014 at 09:21:14AM +0200, Peter Zijlstra wrote:
>> On Mon, May 05, 2014 at 08:55:28AM +0200, Michael Kerrisk (man-pages) wrote:
>>> Hi Peter,
>>>
>>> Looks like a good set of comments from Juri. Could you revise and 
>>> resubmit?
>>
>> Yeah, I'll try and get it done today, but there's a few icky bugs
>> waiting for my attention as well, I'll do me bestest :-)
> 
> OK, not quite managed it yesterday, but here goes.
> 
> So Verbatim license, for the first part to me and whoever I borrowed
> sched_setscheduler() bits from.
> 
> For the second part to me and Juri.
> 
> ---
> 
>> [1] A page describing the sched_setattr() and sched_getattr() APIs
> 
> NAME
> 	sched_setattr, sched_getattr - set and get scheduling policy/attributes
> 
> SYNOPSIS
> 	#include <sched.h>
> 
> 	struct sched_attr {
> 		u32 size;
> 		u32 sched_policy;
> 		u64 sched_flags;
> 
> 		/* SCHED_NORMAL, SCHED_BATCH */
> 		s32 sched_nice;
> 
> 		/* SCHED_FIFO, SCHED_RR */
> 		u32 sched_priority;
> 
> 		/* SCHED_DEADLINE */
> 		u64 sched_runtime;
> 		u64 sched_deadline;
> 		u64 sched_period;
> 	};
> 
> 	int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);
> 
> 	int sched_getattr(pid_t pid, const struct sched_attr *attr, unsigned int size, unsigned int flags);
> 
> DESCRIPTION
> 	sched_setattr() sets both the scheduling policy and the
> 	associated attributes for the process whose ID is specified in
> 	pid.
> 
> 	sched_setattr() replaces sched_setscheduler(), sched_setparam(),
> 	nice() and some of setpriority().
> 
> 	If pid equals zero, the scheduling policy and attributes
> 	of the calling process will be set.  The interpretation of the
> 	argument attr depends on the selected policy.  Currently, Linux
> 	supports the following "normal" (i.e., non-real-time) scheduling
> 	policies:
> 
> 	SCHED_OTHER	the standard "fair" time-sharing policy;
> 
> 	SCHED_BATCH	for "batch" style execution of processes; and
> 
> 	SCHED_IDLE	for running very low priority background jobs.
> 
> 	The following "real-time" policies are also supported, for
> 	special time-critical applications that need precise control
> 	over the way in which runnable processes are selected for
> 	execution:
> 
> 	SCHED_FIFO	a static priority first-in, first-out policy;
> 
> 	SCHED_RR	a static priority round-robin policy; and
> 
> 	SCHED_DEADLINE	a dynamic priority deadline policy.
> 
> 	The semantics of each of these policies are detailed in
> 	sched(7).
> 
> 	sched_attr::size must be set to the size of the structure, as in
> 	sizeof(struct sched_attr), if the provided structure is smaller
> 	than the kernel structure, any additional fields are assumed
> 	'0'. If the provided structure is larger than the kernel
> 	structure, the kernel verifies all additional fields are '0' if
> 	not the syscall will fail with -E2BIG.
> 
> 	sched_attr::sched_policy the desired scheduling policy.
> 
> 	sched_attr::sched_flags additional flags that can influence
> 	scheduling behaviour. Currently as per Linux kernel 3.14:
> 
> 		SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
> 		to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
> 		on fork().
> 
> 	is the only supported flag.
> 
> 	sched_attr::sched_nice should only be set for SCHED_OTHER,
> 	SCHED_BATCH, the desired nice value [-20,19], see sched(7).
> 
> 	sched_attr::sched_priority should only be set for SCHED_FIFO,
> 	SCHED_RR, the desired static priority [1,99], see sched(7).
> 
> 	sched_attr::sched_runtime in nanoseconds,
> 	sched_attr::sched_deadline in nanoseconds,
> 	sched_attr::sched_period in nanoseconds, should only be set for
> 	SCHED_DEADLINE and are the traditional sporadic task model
> 	parameters, see sched(7).
> 
> 	The flags argument should be 0.
> 
> 	sched_getattr() queries the scheduling policy currently applied
> 	to the process identified by pid.
> 
> 	Similar to sched_setattr(), sched_getattr() replaces
> 	sched_getscheduler(), sched_getparam() and some of
> 	getpriority().
> 
> 	If pid equals zero, the policy of the calling process will be
> 	retrieved.
> 
> 	The size argument should reflect the size of struct sched_attr
> 	as known to userspace. The kernel fills out sched_attr::size to
> 	the size of its sched_attr structure. If the user provided
> 	structure is larger, additional fields are not touched. If the
> 	user provided structure is smaller, but the kernel needs to
> 	return values outside the provided space, the syscall will fail
> 	with -E2BIG.
> 
> 	The flags argument should be 0.
> 
> 	The other sched_attr fields are filled out as described in
> 	sched_setattr().
> 
> RETURN VALUE
> 	On success, sched_setattr() and sched_getattr() return 0. On
> 	error, -1 is returned, and errno is set appropriately.
> 
> ERRORS
>        EINVAL The scheduling policy is not one  of  the  recognized  policies,
>               param is NULL, or param does not make sense for the selected
> 	      policy.
> 
>        EPERM  The calling process does not have appropriate privileges.
> 
>        ESRCH  The process whose ID is pid could not be found.
> 
>        E2BIG  The provided storage for struct sched_attr is either too
>               big, see sched_setattr(), or too small, see sched_getattr().
> 
>        EBUSY  SCHED_DEADLINE admission control failure, see sched(7).
> 
> NOTES
>        While the text above (and in sched_setscheduler(2)) talks about
>        processes, in actual fact these system calls are thread specific.
> 
>        While the SCHED_DEADLINE parameters are in nanoseconds, current
>        kernels truncate the lower 10 bits and we get an effective
>        microsecond resolution.
> 
>> [2] A piece of text describing the SCHED_DEADLINE policy, which I can
>> drop into sched(7).
> 
>     SCHED_DEADLINE: Sporadic task model deadline scheduling
>        SCHED_DEADLINE is currently implemented using GEDF (Global
>        Earliest Deadline First) with additional CBS (Constant Bandwidth
>        Server).
> 
>        A sporadic task is one that has a sequence of jobs, where each
>        job is activated at most once per period. Each job has also a
>        relative deadline, before which it should finish execution, and a
>        computation time, that is the time necessary for executing the
>        job without interruption. The instant of time when a task wakes
>        up, because a new job has to be executed, is called arrival time
>        (and it is also referred to as request time or release time).
>        Start time is instead the time at which a task starts its
>        execution. The absolute deadline is thus obtained adding the
>        relative deadline to the arrival time.
> 
>        The following diagram clarifies these terms:
> 
>                arrival/wakeup           absolute deadline
>                |        start time          |
>                v        v                   v
>         -------x--------xoooooooooooo-------x--------x-----
>                         |<- comp. ->|
>                |<---------- rel. deadline ->|
>                |<---------- period ----------------->|
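
The relations in the diagram can be written down as a small sketch. The struct and function names are hypothetical, chosen only for illustration, and all times are assumed to be in nanoseconds on a common clock:

```c
#include <stdint.h>

/* Hypothetical description of one job of a sporadic task. */
struct sporadic_job {
	uint64_t arrival;	/* arrival (wakeup) time */
	uint64_t rel_deadline;	/* relative deadline */
	uint64_t period;	/* minimum separation between two arrivals */
};

/* Absolute deadline = arrival time + relative deadline. */
static inline uint64_t job_abs_deadline(const struct sporadic_job *j)
{
	return j->arrival + j->rel_deadline;
}

/* A job is activated at most once per period, so the next one cannot
 * arrive before arrival + period. */
static inline uint64_t job_earliest_next_arrival(const struct sporadic_job *j)
{
	return j->arrival + j->period;
}
```

For a job arriving at t=100 with a relative deadline of 30 and a period of 50, this gives an absolute deadline of 130 and an earliest next arrival of 150.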
> 
>        SCHED_DEADLINE allows the user to specify three parameters (see
>        sched_setattr(2)): Runtime [ns], Deadline [ns] and Period [ns].
>        These parameters do not necessarily have to correspond to the
>        aforementioned terms, but the usual practice is to set Runtime to
>        something bigger than the average computation time (or worst-case
>        execution time for hard real-time tasks), Deadline to the
>        relative deadline and Period to the period of the task. With such
>        a setting we would have:
> 
>                arrival/wakeup           absolute deadline
>                |        start time          |
>                v        v                   v
>         -------x--------xoooooooooooo-------x--------x-----
>                         |<- Runtime -->|
>                |<---------- Deadline ------>|
>                |<---------- Period ----------------->|
> 
>        It is checked that: Runtime <= Deadline <= Period.
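
A user-space pre-check of that ordering might look like the sketch below. It only models the relation stated above; the kernel may apply further checks that are not reproduced here, and the function name is an assumption:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if Runtime <= Deadline <= Period holds.
 * All three values are in nanoseconds.
 */
static inline bool dl_params_ordered(uint64_t runtime, uint64_t deadline,
				     uint64_t period)
{
	return runtime <= deadline && deadline <= period;
}
```

For example, 10 ms / 30 ms / 100 ms passes, while a Runtime larger than the Deadline does not.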
> 
>        The CBS guarantees non-interference between tasks, by throttling
>        tasks that attempt to over-run their specified Runtime.
> 
>        In general the set of all SCHED_DEADLINE tasks is not
>        necessarily feasible/schedulable within the given constraints.
>        To guarantee some degree of timeliness we must do an admission
>        test when setting/changing SCHED_DEADLINE policy/attributes.
> 
>        This admission test checks whether the task set is
>        feasible/schedulable; if it is not, sched_setattr() will return
>        -EBUSY.
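
From user space that would look roughly like the sketch below. It assumes there is no libc wrapper and goes through syscall(2); the struct is declared locally under the made-up name dl_sched_attr, with the field layout from the SYNOPSIS, so that it cannot clash with a definition from system headers. The helper names are hypothetical as well. Note that syscall(2) reports failure as -1 with errno set, which the helper converts back to a negative error value:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

/* Local copy of the layout from the SYNOPSIS (hypothetical name). */
struct dl_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

/* Fill in a SCHED_DEADLINE request; all times in nanoseconds. */
static inline void dl_attr_init(struct dl_sched_attr *attr, uint64_t runtime,
				uint64_t deadline, uint64_t period)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->sched_policy = SCHED_DEADLINE;
	attr->sched_runtime = runtime;
	attr->sched_deadline = deadline;
	attr->sched_period = period;
}

/*
 * Returns 0 on success or a negative errno value, e.g. -EBUSY when the
 * admission test refuses the task, -EPERM without sufficient privileges.
 */
static inline int dl_attr_apply(pid_t pid, const struct dl_sched_attr *attr)
{
	if (syscall(SYS_sched_setattr, pid, attr, 0U) == -1)
		return -errno;
	return 0;
}
```

A caller would do dl_attr_init(&attr, 10000000, 30000000, 100000000) followed by dl_attr_apply(0, &attr), and treat -EBUSY as "the task set would no longer be schedulable" rather than as a malformed request.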
> 
>        For example, it is required (but not necessarily sufficient) for
>        the total utilization to be less than or equal to the total
>        number of CPUs available, where, since each task can maximally
>        run for Runtime per Period, that task's utilization is its
>        Runtime/Period.
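
As a sketch of that necessary condition only (not the kernel's actual admission code), the sum of Runtime/Period over all tasks can be compared against the CPU count in fixed point. The names and the choice of 20 fraction bits are assumptions made for this example; it also assumes Period is non-zero and Runtime is small enough that the shift does not overflow:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UTIL_SHIFT 20	/* fraction bits; arbitrary choice for this sketch */

/* Hypothetical per-task parameters, in nanoseconds. */
struct dl_task {
	uint64_t runtime;
	uint64_t period;
};

/* Runtime/Period as a fixed-point fraction, rounded down. */
static inline uint64_t dl_task_util(const struct dl_task *t)
{
	return (t->runtime << UTIL_SHIFT) / t->period;
}

/* Necessary (not sufficient) condition: total utilization <= ncpus. */
static inline bool dl_util_fits(const struct dl_task *tasks, size_t n,
				unsigned int ncpus)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += dl_task_util(&tasks[i]);

	return total <= ((uint64_t)ncpus << UTIL_SHIFT);
}
```

Two tasks that each need 60 ms every 100 ms have a total utilization of 1.2, so they fail this test on one CPU and pass it on two.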
> 
>        Because we must be able to calculate admittance, SCHED_DEADLINE
>        tasks are the highest priority (user controllable) tasks in the
>        system; if any SCHED_DEADLINE task is runnable it will preempt
>        any FIFO/RR/OTHER/BATCH/IDLE task.
> 
>        SCHED_DEADLINE tasks will fail fork(2) with -EAGAIN, except when
>        the forking task has SCHED_FLAG_RESET_ON_FORK set.
> 
>        A SCHED_DEADLINE task calling sched_yield() will 'yield' the
>        current job and wait for a new period to begin.
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* Re: sched_{set,get}attr() manpage
       [not found]                                           ` <536C907A.1040205-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-05-09  8:53                                             ` Peter Zijlstra
  2014-05-09  9:26                                               ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-05-09  8:53 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Juri Lelli, Dario Faggioli, Thomas Gleixner, Ingo Molnar,
	rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA,
	johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
	Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
	nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
	dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w,
	hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney,
	insop.song-Re5JQEeQqe8AvxtiuMwx3w,
	liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
	linux-man-u79uwXL29TY76Z2rM5mHXA


On Fri, May 09, 2014 at 10:23:22AM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Peter,
> 
> I'm working on this text. I see the following in kernel/sched/core.c:
> 
> [[
> static int __sched_setscheduler(struct task_struct *p,
>                                 const struct sched_attr *attr,
>                                 bool user)
> {
>         ...
> 
>         int policy = attr->sched_policy;
>         ...
>         if (policy < 0) {
>                 reset_on_fork = p->sched_reset_on_fork;
>                 policy = oldpolicy = p->policy;
> ]]
> 
> What's a negative policy about? Is this something that should 
> be documented?

That's for sched_setparam(), which internally passes policy = -1, it
wasn't meant to be user visible, lemme double check that.

sys_sched_setscheduler() -- explicit check for policy < 0
sys_sched_setparam() -- explicitly passes policy=-1, not user visible
sys_sched_setattr() -- hmm, it looks like fail
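
The situation can be modelled in user space as below. All names are hypothetical and this is only a sketch of the checks, not kernel code. One detail worth keeping in mind: sched_policy is declared u32 in struct sched_attr, so the sketch compares it as an int, since an unsigned value is never less than zero:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of struct sched_attr. */
struct model_attr {
	uint32_t sched_policy;
};

/* Models the explicit check in sys_sched_setscheduler(). */
static inline int model_setscheduler_check(int policy)
{
	return policy < 0 ? -EINVAL : 0;
}

/*
 * Models the check sys_sched_setattr() needs so that the internal
 * "keep the current policy" value (-1) cannot be passed in from
 * user space. The cast makes the comparison signed.
 */
static inline int model_setattr_check(const struct model_attr *attr)
{
	if ((int)attr->sched_policy < 0)
		return -EINVAL;
	return 0;
}
```

With sched_policy set to (u32)-1 the second check returns -EINVAL; without the cast it could never trigger.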


---
Subject: sched: Disallow sched_attr::sched_policy < 0
From: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Date: Fri May  9 10:49:03 CEST 2014

The scheduler uses policy=-1 to preserve the current policy state to
implement sys_sched_setparam(), this got exposed to userspace by
accident through sys_sched_setattr(), cure this.

Reported-by: Michael Kerrisk <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Signed-off-by: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Link: http://lkml.kernel.org/n/tip-b4kbwz2qh21xlngdzje00t55-Ckxz5ZWcFp/9qxiX1TGQuw@public.gmane.org
---
 kernel/sched/core.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3711,6 +3711,9 @@ SYSCALL_DEFINE3(sched_setattr, pid_t, pi
 	if (sched_copy_attr(uattr, &attr))
 		return -EFAULT;
 
+	if ((int)attr.sched_policy < 0)
+		return -EINVAL;
+
 	rcu_read_lock();
 	retval = -ESRCH;
 	p = find_process_by_pid(pid);


* Re: sched_{set,get}attr() manpage
  2014-05-09  8:53                                             ` Peter Zijlstra
@ 2014-05-09  9:26                                               ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-09  9:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Dario Faggioli, Thomas Gleixner, Ingo Molnar,
	Steven Rostedt, Oleg Nesterov, Frédéric Weisbecker,
	Darren Hart, johan.eker, p.faure, Linux Kernel, Claudio Scordino,
	Michael Trimarchi, Fabio Checconi, Tommaso Cucinotta,
	nicola.manica, luca.abeni, Dhaval Giani, hgu1972, Paul McKenney,
	Insop Song, liming.wang, jkacur, linux-man

Hi Peter,

On Fri, May 9, 2014 at 10:53 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, May 09, 2014 at 10:23:22AM +0200, Michael Kerrisk (man-pages) wrote:
>> Hi Peter,
>>
>> I'm working on this text. I see the following in kernel/sched/core.c:
>>
>> [[
>> static int __sched_setscheduler(struct task_struct *p,
>>                                 const struct sched_attr *attr,
>>                                 bool user)
>> {
>>         ...
>>
>>         int policy = attr->sched_policy;
>>         ...
>>         if (policy < 0) {
>>                 reset_on_fork = p->sched_reset_on_fork;
>>                 policy = oldpolicy = p->policy;
>> ]]
>>
>> What's a negative policy about? Is this something that should
>> be documented?
>
> That's for sched_setparam(), which internally passes policy = -1, it
> wasn't meant to be user visible, lemme double check that.
>
> sys_sched_setscheduler() -- explicit check for policy < 0
> sys_sched_setparam() -- explicitly passes policy=-1, not user visible

(Ahh -- I missed that piece in sys_sched_setparam())

> sys_sched_setattr() -- hmm, it looks like fail

Yep, I was seeing that there was no check in sched_setattr().

As I recently said, when it comes to writing a man page, show me a new
interface, and I'll show you a bug ;-).

Thanks for the clarification.

Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>

Cheers,

Michael


> ---
> Subject: sched: Disallow sched_attr::sched_policy < 0
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Fri May  9 10:49:03 CEST 2014
>
> The scheduler uses policy=-1 to preserve the current policy state to
> implement sys_sched_setparam(), this got exposed to userspace by
> accident through sys_sched_setattr(), cure this.
>
> Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Link: http://lkml.kernel.org/n/tip-b4kbwz2qh21xlngdzje00t55@git.kernel.org
> ---
>  kernel/sched/core.c |    3 +++
>  1 file changed, 3 insertions(+)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3711,6 +3711,9 @@ SYSCALL_DEFINE3(sched_setattr, pid_t, pi
>         if (sched_copy_attr(uattr, &attr))
>                 return -EFAULT;
>
> +       if ((int)attr.sched_policy < 0)
> +               return -EINVAL;
> +
>         rcu_read_lock();
>         retval = -ESRCH;
>         p = find_process_by_pid(pid);



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


