* sched_{set,get}attr() manpage
From: Peter Zijlstra @ 2014-04-09 9:25 UTC
To: Michael Kerrisk (man-pages)
Cc: Dario Faggioli, Thomas Gleixner, Ingo Molnar,
    rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
    fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA,
    johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
    Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w,
    michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
    fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
    juri.lelli-Re5JQEeQqe8AvxtiuMwx3w,
    nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
    dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w, hgu1972-Re5JQEeQqe8AvxtiuMwx3w,
    Paul McKenney, insop.song-Re5JQEeQqe8AvxtiuMwx3w,
    liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
    linux-man-u79uwXL29TY76Z2rM5mHXA

On Mon, Feb 17, 2014 at 02:20:29PM +0100, Michael Kerrisk (man-pages) wrote:
> If you could take another pass through your existing text, to incorporate the
> new flags stuff, and then send a page to me + linux-man@
> that would be great.

Sorry, this slipped my mind. An updated version below. Heavy borrowing
from SCHED_SETSCHEDULER(2) as before.
---

NAME
       sched_setattr, sched_getattr - set and get scheduling policy/attributes

SYNOPSIS
       #include <sched.h>

       struct sched_attr {
               u32 size;
               u32 sched_policy;
               u64 sched_flags;

               /* SCHED_NORMAL, SCHED_BATCH */
               s32 sched_nice;
               /* SCHED_FIFO, SCHED_RR */
               u32 sched_priority;
               /* SCHED_DEADLINE */
               u64 sched_runtime;
               u64 sched_deadline;
               u64 sched_period;
       };

       int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);

       int sched_getattr(pid_t pid, struct sched_attr *attr, unsigned int size, unsigned int flags);

DESCRIPTION
       sched_setattr() sets both the scheduling policy and the
       associated attributes for the process whose ID is specified in
       pid.  If pid equals zero, the scheduling policy and attributes
       of the calling process will be set.  The interpretation of the
       argument attr depends on the selected policy.  Currently, Linux
       supports the following "normal" (i.e., non-real-time) scheduling
       policies:

       SCHED_OTHER   the standard "fair" time-sharing policy;

       SCHED_BATCH   for "batch" style execution of processes; and

       SCHED_IDLE    for running very low priority background jobs.

       The following "real-time" policies are also supported, for
       special time-critical applications that need precise control
       over the way in which runnable processes are selected for
       execution:

       SCHED_FIFO    a first-in, first-out policy;

       SCHED_RR      a round-robin policy; and

       SCHED_DEADLINE a deadline policy.

       The semantics of each of these policies are detailed below.

       sched_attr::size must be set to the size of the structure, as in
       sizeof(struct sched_attr).  If the provided structure is smaller
       than the kernel structure, any additional fields are assumed to
       be '0'.  If the provided structure is larger than the kernel
       structure, the kernel verifies that all additional fields are
       '0'; if not, the syscall will fail with -E2BIG.

       sched_attr::sched_policy the desired scheduling policy.

       sched_attr::sched_flags additional flags that can influence
       scheduling behaviour.
       Currently as per Linux kernel 3.14:

               SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
               to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
               on fork().

       is the only supported flag.

       sched_attr::sched_nice should only be set for SCHED_OTHER,
       SCHED_BATCH, the desired nice value [-20,19], see NICE(2).

       sched_attr::sched_priority should only be set for SCHED_FIFO,
       SCHED_RR, the desired static priority [1,99].

       sched_attr::sched_runtime
       sched_attr::sched_deadline
       sched_attr::sched_period should only be set for SCHED_DEADLINE
       and are the traditional sporadic task model parameters.

       The flags argument should be 0.

       sched_getattr() queries the scheduling policy currently applied
       to the process identified by pid.  If pid equals zero, the
       policy of the calling process will be retrieved.

       The size argument should reflect the size of struct sched_attr
       as known to userspace.  The kernel fills out sched_attr::size to
       the size of its sched_attr structure.  If the user provided
       structure is larger, additional fields are not touched.  If the
       user provided structure is smaller, but the kernel needs to
       return values outside the provided space, the syscall will fail
       with -E2BIG.

       The flags argument should be 0.

       The other sched_attr fields are filled out as described in
       sched_setattr().

   Scheduling Policies
       The scheduler is the kernel component that decides which runnable
       process will be executed by the CPU next.  Each process has an
       associated scheduling policy and a static scheduling priority,
       sched_priority; these are the settings that are modified by
       sched_setscheduler().  The scheduler makes its decisions based on
       knowledge of the scheduling policy and static priority of all
       processes on the system.

       For processes scheduled under one of the normal scheduling policies
       (SCHED_OTHER, SCHED_IDLE, SCHED_BATCH), sched_priority is not used in
       scheduling decisions (it must be specified as 0).

       Processes scheduled under one of the real-time policies (SCHED_FIFO,
       SCHED_RR) have a sched_priority value in the range 1 (low) to 99
       (high).  (As the numbers imply, real-time processes always have higher
       priority than normal processes.)  Note well: POSIX.1-2001 only requires
       an implementation to support a minimum of 32 distinct priority levels
       for the real-time policies, and some systems supply just this minimum.
       Portable programs should use sched_get_priority_min(2) and
       sched_get_priority_max(2) to find the range of priorities supported for
       a particular policy.

       Conceptually, the scheduler maintains a list of runnable processes for
       each possible sched_priority value.  In order to determine which
       process runs next, the scheduler looks for the nonempty list with the
       highest static priority and selects the process at the head of this
       list.

       A process's scheduling policy determines where it will be inserted into
       the list of processes with equal static priority and how it will move
       inside this list.

       All scheduling is preemptive: if a process with a higher static
       priority becomes ready to run, the currently running process will be
       preempted and returned to the wait list for its static priority level.
       The scheduling policy only determines the ordering within the list of
       runnable processes with equal static priority.

   SCHED_DEADLINE: Sporadic task model deadline scheduling
       SCHED_DEADLINE is an implementation of GEDF (Global Earliest
       Deadline First) with additional CBS (Constant Bandwidth Server).
       The CBS guarantees that tasks that over-run their specified
       budget are throttled and do not affect the correct performance
       of other SCHED_DEADLINE tasks.

       SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN.

       Setting SCHED_DEADLINE can fail with -EINVAL when admission
       control tests fail.
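
A minimal sketch of requesting SCHED_DEADLINE for a task, assuming the call
is issued through syscall(2) because the C library may not ship a wrapper.
The struct name sketch_sched_attr, both helper names and the
SKETCH_SCHED_DEADLINE constant are hypothetical, chosen so that they cannot
clash with libc headers. The nanosecond units and the
runtime <= deadline <= period ordering are taken from the kernel's
SCHED_DEADLINE documentation, not from the text above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical local mirror of struct sched_attr from the SYNOPSIS. */
struct sketch_sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;        /* SCHED_OTHER, SCHED_BATCH */
    uint32_t sched_priority;    /* SCHED_FIFO, SCHED_RR */
    uint64_t sched_runtime;     /* SCHED_DEADLINE */
    uint64_t sched_deadline;
    uint64_t sched_period;
};

#define SKETCH_SCHED_DEADLINE 6 /* numeric value of SCHED_DEADLINE on Linux */

/*
 * Fill attr for a sporadic task; all three times are in nanoseconds.
 * Returns 0, or -1 if runtime is 0 or runtime <= deadline <= period
 * does not hold (a local sanity check, not the kernel's admission test).
 */
int sketch_fill_deadline(struct sketch_sched_attr *attr, uint64_t runtime_ns,
                         uint64_t deadline_ns, uint64_t period_ns)
{
    assert(attr != NULL);
    if (runtime_ns == 0 || runtime_ns > deadline_ns || deadline_ns > period_ns)
        return -1;

    memset(attr, 0, sizeof(*attr));     /* nice, priority and flags stay 0 */
    attr->size = sizeof(*attr);         /* size as known to userspace */
    attr->sched_policy = SKETCH_SCHED_DEADLINE;
    attr->sched_runtime = runtime_ns;
    attr->sched_deadline = deadline_ns;
    attr->sched_period = period_ns;
    return 0;
}

/* Thin wrapper: returns 0 on success, -1 with errno set on failure. */
int sketch_sched_setattr(pid_t pid, const struct sketch_sched_attr *attr,
                         unsigned int flags)
{
    return (int)syscall(SYS_sched_setattr, pid, attr, flags);
}
```

A caller would fill the structure with, say, 10 ms of runtime every 30 ms and
then call sketch_sched_setattr(0, &attr, 0); without the appropriate
privileges the call is expected to fail with EPERM.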

   SCHED_FIFO: First In-First Out scheduling
       SCHED_FIFO can only be used with static priorities higher than 0, which
       means that when a SCHED_FIFO process becomes runnable, it will always
       immediately preempt any currently running SCHED_OTHER, SCHED_BATCH, or
       SCHED_IDLE process.  SCHED_FIFO is a simple scheduling algorithm
       without time slicing.  For processes scheduled under the SCHED_FIFO
       policy, the following rules apply:

       *  A SCHED_FIFO process that has been preempted by another process of
          higher priority will stay at the head of the list for its priority
          and will resume execution as soon as all processes of higher
          priority are blocked again.

       *  When a SCHED_FIFO process becomes runnable, it will be inserted at
          the end of the list for its priority.

       *  A call to sched_setscheduler() or sched_setparam(2) will put the
          SCHED_FIFO (or SCHED_RR) process identified by pid at the start of
          the list if it was runnable.  As a consequence, it may preempt the
          currently running process if it has the same priority.
          (POSIX.1-2001 specifies that the process should go to the end of the
          list.)

       *  A process calling sched_yield(2) will be put at the end of the list.

       No other events will move a process scheduled under the SCHED_FIFO
       policy in the wait list of runnable processes with equal static
       priority.

       A SCHED_FIFO process runs until either it is blocked by an I/O request,
       it is preempted by a higher priority process, or it calls
       sched_yield(2).

   SCHED_RR: Round Robin scheduling
       SCHED_RR is a simple enhancement of SCHED_FIFO.  Everything described
       above for SCHED_FIFO also applies to SCHED_RR, except that each process
       is only allowed to run for a maximum time quantum.  If a SCHED_RR
       process has been running for a time period equal to or longer than the
       time quantum, it will be put at the end of the list for its priority.
       A SCHED_RR process that has been preempted by a higher priority process
       and subsequently resumes execution as a running process will complete
       the unexpired portion of its round robin time quantum.  The length of
       the time quantum can be retrieved using sched_rr_get_interval(2).

   SCHED_OTHER: Default Linux time-sharing scheduling
       SCHED_OTHER can only be used at static priority 0.  SCHED_OTHER is the
       standard Linux time-sharing scheduler that is intended for all
       processes that do not require the special real-time mechanisms.  The
       process to run is chosen from the static priority 0 list based on a
       dynamic priority that is determined only inside this list.  The dynamic
       priority is based on the nice value (set by nice(2) or setpriority(2))
       and increased for each time quantum the process is ready to run, but
       denied to run by the scheduler.  This ensures fair progress among all
       SCHED_OTHER processes.

   SCHED_BATCH: Scheduling batch processes
       (Since Linux 2.6.16.)  SCHED_BATCH can only be used at static priority
       0.  This policy is similar to SCHED_OTHER in that it schedules the
       process according to its dynamic priority (based on the nice value).
       The difference is that this policy will cause the scheduler to always
       assume that the process is CPU-intensive.  Consequently, the scheduler
       will apply a small scheduling penalty with respect to wakeup behaviour,
       so that this process is mildly disfavored in scheduling decisions.

       This policy is useful for workloads that are noninteractive, but do not
       want to lower their nice value, and for workloads that want a
       deterministic scheduling policy without interactivity causing extra
       preemptions (between the workload's tasks).

   SCHED_IDLE: Scheduling very low priority jobs
       (Since Linux 2.6.23.)  SCHED_IDLE can only be used at static priority
       0; the process nice value has no influence for this policy.

       This policy is intended for running jobs at extremely low priority
       (lower even than a +19 nice value with the SCHED_OTHER or SCHED_BATCH
       policies).
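
The advice above to query the priority range instead of hard-coding 1..99,
and to read the round-robin quantum with sched_rr_get_interval(2), can be
sketched as follows. The helper names are hypothetical; the library calls
are the standard <sched.h> ones named in the text. Returning -1 on error
relies on the assumption that valid static priorities are never negative.

```c
#include <sched.h>
#include <stdint.h>
#include <sys/types.h>
#include <time.h>

/* Store the static priority range of policy; 0 on success, -1 on error. */
int sketch_priority_range(int policy, int *lo, int *hi)
{
    int min = sched_get_priority_min(policy);
    int max = sched_get_priority_max(policy);

    if (min == -1 || max == -1)
        return -1;
    *lo = min;
    *hi = max;
    return 0;
}

/* Clamp a wanted priority into the range the system reports for policy. */
int sketch_clamp_priority(int policy, int wanted)
{
    int lo, hi;

    if (sketch_priority_range(policy, &lo, &hi) != 0)
        return -1;
    if (wanted < lo)
        return lo;
    if (wanted > hi)
        return hi;
    return wanted;
}

/* Round-robin quantum of pid (0 = caller) in nanoseconds, -1 on error. */
int64_t sketch_rr_quantum_ns(pid_t pid)
{
    struct timespec ts;

    if (sched_rr_get_interval(pid, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Clamping keeps a program portable to systems that supply only the POSIX
minimum of 32 real-time priority levels.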

RETURN VALUE
       On success, sched_setattr() and sched_getattr() return 0.  On
       error, -1 is returned, and errno is set appropriately.

ERRORS
       EINVAL The scheduling policy is not one of the recognized policies,
              attr is NULL, or attr does not make sense for the policy.

       EPERM  The calling process does not have appropriate privileges.

       ESRCH  The process whose ID is pid could not be found.

       E2BIG  The provided storage for struct sched_attr is either too
              big, see sched_setattr(), or too small, see sched_getattr().

NOTES
       While the text above (and in SCHED_SETSCHEDULER(2)) talks about
       processes, in actual fact these system calls are thread specific.
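
A minimal sketch of the query side, assuming sched_getattr() is reached
through syscall(2) and that the caller passes its own idea of the structure
size, as the DESCRIPTION asks. The struct name, the helper names and the
numeric policy values written out in sketch_policy_name() are assumptions of
this sketch; the numbers are the values the Linux headers use for these
policies.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical local mirror of struct sched_attr from the SYNOPSIS. */
struct sketch_sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;
    uint64_t sched_deadline;
    uint64_t sched_period;
};

/*
 * Query the attributes of pid (0 = calling thread).
 * Returns 0 on success, -1 with errno set (EINVAL, ESRCH, E2BIG, ...).
 */
int sketch_sched_getattr(pid_t pid, struct sketch_sched_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    /* size = sizeof the userspace structure, flags = 0 */
    return (int)syscall(SYS_sched_getattr, pid, attr,
                        (unsigned int)sizeof(*attr), 0U);
}

/* Map a numeric sched_policy value to the name used in the text. */
const char *sketch_policy_name(uint32_t policy)
{
    switch (policy) {
    case 0:  return "SCHED_OTHER";
    case 1:  return "SCHED_FIFO";
    case 2:  return "SCHED_RR";
    case 3:  return "SCHED_BATCH";
    case 5:  return "SCHED_IDLE";
    case 6:  return "SCHED_DEADLINE";
    default: return "unknown";
    }
}
```

On success a caller can print sketch_policy_name(attr.sched_policy); on
failure errno distinguishes the cases listed under ERRORS.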
* Re: sched_{set,get}attr() manpage
From: Henrik Austad @ 2014-04-09 15:19 UTC
To: Peter Zijlstra
Cc: Michael Kerrisk (man-pages), Dario Faggioli, Thomas Gleixner,
    Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
    fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA,
    johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
    Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w,
    michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
    fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
    juri.lelli-Re5JQEeQqe8AvxtiuMwx3w,
    nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
    dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w, hgu1972-Re5JQEeQqe8AvxtiuMwx3w,
    Paul McKenney, insop.song-Re5JQEeQqe8AvxtiuMwx3w,
    liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
    linux-man-u79uwXL29TY76Z2rM5mHXA

On Wed, Apr 09, 2014 at 11:25:10AM +0200, Peter Zijlstra wrote:
> On Mon, Feb 17, 2014 at 02:20:29PM +0100, Michael Kerrisk (man-pages) wrote:
> > If you could take another pass through your existing text, to incorporate the
> > new flags stuff, and then send a page to me + linux-man@
> > that would be great.
>
> Sorry, this slipped my mind. An updated version below. Heavy borrowing
> from SCHED_SETSCHEDULER(2) as before.
>
> ---
>
> NAME
>        sched_setattr, sched_getattr - set and get scheduling policy/attributes
>
> SYNOPSIS
>        #include <sched.h>
>
>        struct sched_attr {
>                u32 size;
>                u32 sched_policy;
>                u64 sched_flags;
>
>                /* SCHED_NORMAL, SCHED_BATCH */
>                s32 sched_nice;
>                /* SCHED_FIFO, SCHED_RR */
>                u32 sched_priority;
>                /* SCHED_DEADLINE */
>                u64 sched_runtime;
>                u64 sched_deadline;
>                u64 sched_period;
>        };
>        int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);
>
>        int sched_getattr(pid_t pid, struct sched_attr *attr, unsigned int size, unsigned int flags);
>
> DESCRIPTION
>        sched_setattr() sets both the scheduling policy and the
>        associated attributes for the process whose ID is specified in
>        pid.  If pid equals zero, the scheduling policy and attributes
>        of the calling process will be set.  The interpretation of the
>        argument attr depends on the selected policy.  Currently, Linux
>        supports the following "normal" (i.e., non-real-time) scheduling
>        policies:
>
>        SCHED_OTHER   the standard "fair" time-sharing policy;
>
>        SCHED_BATCH   for "batch" style execution of processes; and
>
>        SCHED_IDLE    for running very low priority background jobs.
>
>        The following "real-time" policies are also supported, for

why the "'s?

>        special time-critical applications that need precise control
>        over the way in which runnable processes are selected for
>        execution:
>
>        SCHED_FIFO    a first-in, first-out policy;
>
>        SCHED_RR      a round-robin policy; and
>
>        SCHED_DEADLINE a deadline policy.
>
>        The semantics of each of these policies are detailed below.
>
>        sched_attr::size must be set to the size of the structure, as in
>        sizeof(struct sched_attr).  If the provided structure is smaller
>        than the kernel structure, any additional fields are assumed to
>        be '0'.  If the provided structure is larger than the kernel
>        structure, the kernel verifies that all additional fields are
>        '0'; if not, the syscall will fail with -E2BIG.
>
>        sched_attr::sched_policy the desired scheduling policy.
>
>        sched_attr::sched_flags additional flags that can influence
>        scheduling behaviour.  Currently as per Linux kernel 3.14:
>
>                SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
>                to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
>                on fork().
>
>        is the only supported flag.
>
>        sched_attr::sched_nice should only be set for SCHED_OTHER,
>        SCHED_BATCH, the desired nice value [-20,19], see NICE(2).
>
>        sched_attr::sched_priority should only be set for SCHED_FIFO,
>        SCHED_RR, the desired static priority [1,99].
>
>        sched_attr::sched_runtime
>        sched_attr::sched_deadline
>        sched_attr::sched_period should only be set for SCHED_DEADLINE
>        and are the traditional sporadic task model parameters.
>
>        The flags argument should be 0.
>
>        sched_getattr() queries the scheduling policy currently applied
>        to the process identified by pid.  If pid equals zero, the
>        policy of the calling process will be retrieved.
>
>        The size argument should reflect the size of struct sched_attr
>        as known to userspace.  The kernel fills out sched_attr::size to
>        the size of its sched_attr structure.  If the user provided
>        structure is larger, additional fields are not touched.  If the
>        user provided structure is smaller, but the kernel needs to
>        return values outside the provided space, the syscall will fail
>        with -E2BIG.
>
>        The flags argument should be 0.

What about SCHED_FLAG_RESET_ON_FORK?

>        The other sched_attr fields are filled out as described in
>        sched_setattr().
>
>    Scheduling Policies
>        The scheduler is the kernel component that decides which runnable
>        process will be executed by the CPU next.  Each process has an
>        associated scheduling policy and a static scheduling priority,
>        sched_priority; these are the settings that are modified by
>        sched_setscheduler().  The scheduler makes its decisions based on
>        knowledge of the scheduling policy and static priority of all
>        processes on the system.

Isn't this last sentence redundant/slightly repetitive?

>        For processes scheduled under one of the normal scheduling policies
>        (SCHED_OTHER, SCHED_IDLE, SCHED_BATCH), sched_priority is not used in
>        scheduling decisions (it must be specified as 0).
>
>        Processes scheduled under one of the real-time policies (SCHED_FIFO,
>        SCHED_RR) have a sched_priority value in the range 1 (low) to 99
>        (high).  (As the numbers imply, real-time processes always have higher
>        priority than normal processes.)  Note well: POSIX.1-2001 only requires
>        an implementation to support a minimum of 32 distinct priority levels
>        for the real-time policies, and some systems supply just this minimum.
>        Portable programs should use sched_get_priority_min(2) and
>        sched_get_priority_max(2) to find the range of priorities supported for
>        a particular policy.
>
>        Conceptually, the scheduler maintains a list of runnable processes for
>        each possible sched_priority value.  In order to determine which
>        process runs next, the scheduler looks for the nonempty list with the
>        highest static priority and selects the process at the head of this
>        list.
>
>        A process's scheduling policy determines where it will be inserted into
>        the list of processes with equal static priority and how it will move
>        inside this list.
>
>        All scheduling is preemptive: if a process with a higher static
>        priority becomes ready to run, the currently running process will be
>        preempted and returned to the wait list for its static priority level.
>        The scheduling policy only determines the ordering within the list of
>        runnable processes with equal static priority.
>
>    SCHED_DEADLINE: Sporadic task model deadline scheduling
>        SCHED_DEADLINE is an implementation of GEDF (Global Earliest
>        Deadline First) with additional CBS (Constant Bandwidth Server).
>        The CBS guarantees that tasks that over-run their specified
>        budget are throttled and do not affect the correct performance
>        of other SCHED_DEADLINE tasks.
>
>        SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN.
>
>        Setting SCHED_DEADLINE can fail with -EINVAL when admission
>        control tests fail.

Perhaps add a note about the deadline-class having higher priority than the
other classes; i.e. if a deadline-task is runnable, it will preempt any
other SCHED_(RR|FIFO) regardless of priority?

>    SCHED_FIFO: First In-First Out scheduling
>        SCHED_FIFO can only be used with static priorities higher than 0, which
>        means that when a SCHED_FIFO process becomes runnable, it will always
>        immediately preempt any currently running SCHED_OTHER, SCHED_BATCH, or
>        SCHED_IDLE process.  SCHED_FIFO is a simple scheduling algorithm
>        without time slicing.  For processes scheduled under the SCHED_FIFO
>        policy, the following rules apply:
>
>        *  A SCHED_FIFO process that has been preempted by another process of
>           higher priority will stay at the head of the list for its priority
>           and will resume execution as soon as all processes of higher
>           priority are blocked again.
>
>        *  When a SCHED_FIFO process becomes runnable, it will be inserted at
>           the end of the list for its priority.
>
>        *  A call to sched_setscheduler() or sched_setparam(2) will put the
>           SCHED_FIFO (or SCHED_RR) process identified by pid at the start of
>           the list if it was runnable.  As a consequence, it may preempt the
>           currently running process if it has the same priority.
>           (POSIX.1-2001 specifies that the process should go to the end of the
>           list.)
>
>        *  A process calling sched_yield(2) will be put at the end of the list.

How about the recent discussion regarding sched_yield(). Is this correct?

lkml.kernel.org/r/alpine.DEB.2.02.1403312333100.14882-3cz04HxQygjZikZi3RtOZ2GXanvQGlWp@public.gmane.orgnix.de

Is this the correct place to add a note explaining the potential pitfalls
of using sched_yield?

>        No other events will move a process scheduled under the SCHED_FIFO
>        policy in the wait list of runnable processes with equal static
>        priority.
>
>        A SCHED_FIFO process runs until either it is blocked by an I/O request,
>        it is preempted by a higher priority process, or it calls
>        sched_yield(2).
>
>    SCHED_RR: Round Robin scheduling
>        SCHED_RR is a simple enhancement of SCHED_FIFO.  Everything described
>        above for SCHED_FIFO also applies to SCHED_RR, except that each process
>        is only allowed to run for a maximum time quantum.  If a SCHED_RR
>        process has been running for a time period equal to or longer than the
>        time quantum, it will be put at the end of the list for its priority.
>        A SCHED_RR process that has been preempted by a higher priority process
>        and subsequently resumes execution as a running process will complete
>        the unexpired portion of its round robin time quantum.  The length of
>        the time quantum can be retrieved using sched_rr_get_interval(2).

-> Default is 0.1HZ ms

This is a question I get from time to time, having this in the manpage
would be helpful.

>    SCHED_OTHER: Default Linux time-sharing scheduling
>        SCHED_OTHER can only be used at static priority 0.  SCHED_OTHER is the
>        standard Linux time-sharing scheduler that is intended for all
>        processes that do not require the special real-time mechanisms.  The
>        process to run is chosen from the static priority 0 list based on a
>        dynamic priority that is determined only inside this list.  The dynamic
>        priority is based on the nice value (set by nice(2) or setpriority(2))
>        and increased for each time quantum the process is ready to run, but
>        denied to run by the scheduler.  This ensures fair progress among all
>        SCHED_OTHER processes.
>
>    SCHED_BATCH: Scheduling batch processes
>        (Since Linux 2.6.16.)  SCHED_BATCH can only be used at static priority
>        0.  This policy is similar to SCHED_OTHER in that it schedules the
>        process according to its dynamic priority (based on the nice value).
>        The difference is that this policy will cause the scheduler to always
>        assume that the process is CPU-intensive.
>        Consequently, the scheduler
>        will apply a small scheduling penalty with respect to wakeup behaviour,
>        so that this process is mildly disfavored in scheduling decisions.
>
>        This policy is useful for workloads that are noninteractive, but do not
>        want to lower their nice value, and for workloads that want a
>        deterministic scheduling policy without interactivity causing extra
>        preemptions (between the workload's tasks).
>
>    SCHED_IDLE: Scheduling very low priority jobs
>        (Since Linux 2.6.23.)  SCHED_IDLE can only be used at static priority
>        0; the process nice value has no influence for this policy.
>
>        This policy is intended for running jobs at extremely low priority
>        (lower even than a +19 nice value with the SCHED_OTHER or SCHED_BATCH
>        policies).
>
> RETURN VALUE
>        On success, sched_setattr() and sched_getattr() return 0.  On
>        error, -1 is returned, and errno is set appropriately.
>
> ERRORS
>        EINVAL The scheduling policy is not one of the recognized policies,
>               attr is NULL, or attr does not make sense for the policy.
>
>        EPERM  The calling process does not have appropriate privileges.
>
>        ESRCH  The process whose ID is pid could not be found.
>
>        E2BIG  The provided storage for struct sched_attr is either too
>               big, see sched_setattr(), or too small, see sched_getattr().

Where's the EBUSY? It can throw this from __sched_setscheduler() when it
checks if there's enough bandwidth to run the task.

>
> NOTES
>        While the text above (and in SCHED_SETSCHEDULER(2)) talks about
>        processes, in actual fact these system calls are thread specific.

-- 
Henrik Austad
* Re: sched_{set,get}attr() manpage
From: Peter Zijlstra @ 2014-04-09 15:42 UTC
To: Henrik Austad
Cc: Michael Kerrisk (man-pages), Dario Faggioli, Thomas Gleixner,
    Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
    fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA,
    johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
    Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w,
    michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
    fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
    juri.lelli-Re5JQEeQqe8AvxtiuMwx3w,
    nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
    dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w, hgu1972-Re5JQEeQqe8AvxtiuMwx3w,
    Paul McKenney, insop.song-Re5JQEeQqe8AvxtiuMwx3w,
    liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
    linux-man-u79uwXL29TY76Z2rM5mHXA

On Wed, Apr 09, 2014 at 05:19:11PM +0200, Henrik Austad wrote:
> >        The following "real-time" policies are also supported, for
>
> why the "'s?

I borrowed those from SCHED_SETSCHEDULER(2).

> >        sched_attr::sched_flags additional flags that can influence
> >        scheduling behaviour.  Currently as per Linux kernel 3.14:
> >
> >                SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
> >                to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
> >                on fork().
> >
> >        is the only supported flag.

...

> >        The flags argument should be 0.
>
> What about SCHED_FLAG_RESET_ON_FORK?

Different flags. The one is sched_attr::sched_flags, the other is
sched_setattr(.flags).

> >        The other sched_attr fields are filled out as described in
> >        sched_setattr().
> >
> >    Scheduling Policies
> >        The scheduler is the kernel component that decides which runnable
> >        process will be executed by the CPU next.
> >        Each process has an
> >        associated scheduling policy and a static scheduling priority,
> >        sched_priority; these are the settings that are modified by
> >        sched_setscheduler().  The scheduler makes its decisions based on
> >        knowledge of the scheduling policy and static priority of all
> >        processes on the system.
>
> Isn't this last sentence redundant/slightly repetitive?

I borrowed that from SCHED_SETSCHEDULER(2) again.

> >    SCHED_DEADLINE: Sporadic task model deadline scheduling
> >        SCHED_DEADLINE is an implementation of GEDF (Global Earliest
> >        Deadline First) with additional CBS (Constant Bandwidth Server).
> >        The CBS guarantees that tasks that over-run their specified
> >        budget are throttled and do not affect the correct performance
> >        of other SCHED_DEADLINE tasks.
> >
> >        SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN.
> >
> >        Setting SCHED_DEADLINE can fail with -EINVAL when admission
> >        control tests fail.
>
> Perhaps add a note about the deadline-class having higher priority than the
> other classes; i.e. if a deadline-task is runnable, it will preempt any
> other SCHED_(RR|FIFO) regardless of priority?

Yes, good point, will do.

> >    SCHED_FIFO: First In-First Out scheduling
> >        SCHED_FIFO can only be used with static priorities higher than 0, which
> >        means that when a SCHED_FIFO process becomes runnable, it will always
> >        immediately preempt any currently running SCHED_OTHER, SCHED_BATCH, or
> >        SCHED_IDLE process.  SCHED_FIFO is a simple scheduling algorithm
> >        without time slicing.  For processes scheduled under the SCHED_FIFO
> >        policy, the following rules apply:
> >
> >        *  A SCHED_FIFO process that has been preempted by another process of
> >           higher priority will stay at the head of the list for its priority
> >           and will resume execution as soon as all processes of higher
> >           priority are blocked again.
> >
> >        *  When a SCHED_FIFO process becomes runnable, it will be inserted at
> >           the end of the list for its priority.
> >
> >        *  A call to sched_setscheduler() or sched_setparam(2) will put the
> >           SCHED_FIFO (or SCHED_RR) process identified by pid at the start of
> >           the list if it was runnable.  As a consequence, it may preempt the
> >           currently running process if it has the same priority.
> >           (POSIX.1-2001 specifies that the process should go to the end of the
> >           list.)
> >
> >        *  A process calling sched_yield(2) will be put at the end of the list.
>
> How about the recent discussion regarding sched_yield(). Is this correct?
>
> lkml.kernel.org/r/alpine.DEB.2.02.1403312333100.14882-3cz04HxQyghirQl2xIGP3A@public.gmane.orgronix.de
>
> Is this the correct place to add a note explaining the potential pitfalls
> of using sched_yield?

I'm not sure; there's a SCHED_YIELD(2) manpage to fill with that
nonsense.

Also; I realized I have not described the DEADLINE sched_yield()
behaviour.

> >        No other events will move a process scheduled under the SCHED_FIFO
> >        policy in the wait list of runnable processes with equal static
> >        priority.
> >
> >        A SCHED_FIFO process runs until either it is blocked by an I/O request,
> >        it is preempted by a higher priority process, or it calls
> >        sched_yield(2).
> >
> >    SCHED_RR: Round Robin scheduling
> >        SCHED_RR is a simple enhancement of SCHED_FIFO.  Everything described
> >        above for SCHED_FIFO also applies to SCHED_RR, except that each process
> >        is only allowed to run for a maximum time quantum.  If a SCHED_RR
> >        process has been running for a time period equal to or longer than the
> >        time quantum, it will be put at the end of the list for its priority.
> >        A SCHED_RR process that has been preempted by a higher priority process
> >        and subsequently resumes execution as a running process will complete
> >        the unexpired portion of its round robin time quantum.  The length of
> >        the time quantum can be retrieved using sched_rr_get_interval(2).
>
> -> Default is 0.1HZ ms
>
> This is a question I get from time to time, having this in the manpage
> would be helpful.

Again, brazenly stolen from SCHED_SETSCHEDULER(2); but yes. Also I'm not
sure I'd call RR an enhancement of anything much at all ;-)

> > ERRORS
> >        EINVAL The scheduling policy is not one of the recognized policies,
> >               attr is NULL, or attr does not make sense for the policy.
> >
> >        EPERM  The calling process does not have appropriate privileges.
> >
> >        ESRCH  The process whose ID is pid could not be found.
> >
> >        E2BIG  The provided storage for struct sched_attr is either too
> >               big, see sched_setattr(), or too small, see sched_getattr().
>
> Where's the EBUSY? It can throw this from __sched_setscheduler() when it
> checks if there's enough bandwidth to run the task.

Uhhm.. it got lost :-) /me quickly adds.
* Re: sched_{set,get}attr() manpage [not found] ` <20140409154204.GD10526-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org> @ 2014-04-10 7:47 ` Juri Lelli 2014-04-10 9:59 ` Claudio Scordino 2014-04-27 15:47 ` Michael Kerrisk (man-pages) 1 sibling, 1 reply; 26+ messages in thread From: Juri Lelli @ 2014-04-10 7:47 UTC (permalink / raw) To: Peter Zijlstra Cc: Henrik Austad, Michael Kerrisk (man-pages), Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov, fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w, michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/, fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps, nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4, dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w, hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney, insop.song-Re5JQEeQqe8AvxtiuMwx3w, liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA, linux-man-u79uwXL29TY76Z2rM5mHXA Hi all, On Wed, 9 Apr 2014 17:42:04 +0200 Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote: > On Wed, Apr 09, 2014 at 05:19:11PM +0200, Henrik Austad wrote: > > > The following "real-time" policies are also supported, for > > > > why the "'s? > > I borrowed those from SCHED_SETSCHEDULER(2). > > > > sched_attr::sched_flags additional flags that can influence > > > scheduling behaviour. Currently as per Linux kernel 3.14: > > > > > > SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy > > > to: (struct sched_attr){ .sched_policy = SCHED_OTHER, } > > > on fork(). > > > > > > is the only supported flag. > > ... > > > > The flags argument should be 0. > > > > What about SCHED_FLAG_RESET_ON_FOR? > > Different flags. The one is sched_attr::flags the other is > sched_setattr(.flags). > > > > The other sched_attr fields are filled out as described in > > > sched_setattr(). 
> > > > > > Scheduling Policies > > > The scheduler is the kernel component that decides which runnable > > > process will be executed by the CPU next. Each process has an associ‐ > > > ated scheduling policy and a static scheduling priority, sched_prior‐ > > > ity; these are the settings that are modified by sched_setscheduler(). > > > The scheduler makes it decisions based on knowledge of the scheduling > > > policy and static priority of all processes on the system. > > > > Isn't this last sentence redundant/sliglhtly repetitive? > > I borrowed that from SCHED_SETSCHEDULER(2) again. > > > > SCHED_DEADLINE: Sporadic task model deadline scheduling > > > SCHED_DEADLINE is an implementation of GEDF (Global Earliest > > > Deadline First) with additional CBS (Constant Bandwidth Server). > > > The CBS guarantees that tasks that over-run their specified > > > budget are throttled and do not affect the correct performance > > > of other SCHED_DEADLINE tasks. > > > > > > SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN > > > > > > Setting SCHED_DEADLINE can fail with -EINVAL when admission > > > control tests fail. > > > > Perhaps add a note about the deadline-class having higher priority than the > > other classes; i.e. if a deadline-task is runnable, it will preempt any > > other SCHED_(RR|FIFO) regardless of priority? > > Yes, good point, will do. > > > > SCHED_FIFO: First In-First Out scheduling > > > SCHED_FIFO can only be used with static priorities higher than 0, which > > > means that when a SCHED_FIFO processes becomes runnable, it will always > > > immediately preempt any currently running SCHED_OTHER, SCHED_BATCH, or > > > SCHED_IDLE process. SCHED_FIFO is a simple scheduling algorithm with‐ > > > out time slicing. 
> > > For processes scheduled under the SCHED_FIFO policy,
> > > the following rules apply:
> > >
> > > * A SCHED_FIFO process that has been preempted by another process of
> > >   higher priority will stay at the head of the list for its priority
> > >   and will resume execution as soon as all processes of higher
> > >   priority are blocked again.
> > >
> > > * When a SCHED_FIFO process becomes runnable, it will be inserted at
> > >   the end of the list for its priority.
> > >
> > > * A call to sched_setscheduler() or sched_setparam(2) will put the
> > >   SCHED_FIFO (or SCHED_RR) process identified by pid at the start of
> > >   the list if it was runnable. As a consequence, it may preempt the
> > >   currently running process if it has the same priority.
> > >   (POSIX.1-2001 specifies that the process should go to the end of the
> > >   list.)
> > >
> > > * A process calling sched_yield(2) will be put at the end of the list.
> >
> > How about the recent discussion regarding sched_yield(). Is this correct?
> >
> > lkml.kernel.org/r/alpine.DEB.2.02.1403312333100.14882@ionos.tec.linutronix.de
> >
> > Is this the correct place to add a note explaining te potentional pitfalls
> > using sched_yield?
>
> I'm not sure; there's a SCHED_YIELD(2) manpage to fill with that
> nonsense.
>
> Also; I realized I have not described the DEADLINE sched_yield()
> behaviour.
>

So, for SCHED_DEADLINE we currently have this behaviour:

/*
 * Yield task semantic for -deadline tasks is:
 *
 *   get off from the CPU until our next instance, with
 *   a new runtime. This is of little use now, since we
 *   don't have a bandwidth reclaiming mechanism. Anyway,
 *   bandwidth reclaiming is planned for the future, and
 *   yield_task_dl will indicate that some spare budget
 *   is available for other task instances to use it.
 */

But, considering also the discussion above, I'm less sure now that's what we want.
Still, I think we will want some way in the future to be able to say "I'm finished with my current job, give this remaining runtime to someone else", like another syscall or something. Thanks, - Juri > > > No other events will move a process scheduled under the SCHED_FIFO pol‐ > > > icy in the wait list of runnable processes with equal static priority. > > > > > > A SCHED_FIFO process runs until either it is blocked by an I/O request, > > > it is preempted by a higher priority process, or it calls > > > sched_yield(2). > > > > > > SCHED_RR: Round Robin scheduling > > > SCHED_RR is a simple enhancement of SCHED_FIFO. Everything described > > > above for SCHED_FIFO also applies to SCHED_RR, except that each process > > > is only allowed to run for a maximum time quantum. If a SCHED_RR > > > process has been running for a time period equal to or longer than the > > > time quantum, it will be put at the end of the list for its priority. > > > A SCHED_RR process that has been preempted by a higher priority process > > > and subsequently resumes execution as a running process will complete > > > the unexpired portion of its round robin time quantum. The length of > > > the time quantum can be retrieved using sched_rr_get_interval(2). > > > > -> Default is 0.1HZ ms > > > > This is a question I get form time to time, having this in the manpage > > would be helpful. > > Again, brazenly stolen from SCHED_SETSCHEDULER(2); but yes. Also I'm not > sure I'd call RR an enhancement of anything much at all ;-) > > > > ERRORS > > > EINVAL The scheduling policy is not one of the recognized policies, > > > param is NULL, or param does not make sense for the policy. > > > > > > EPERM The calling process does not have appropriate privileges. > > > > > > ESRCH The process whose ID is pid could not be found. > > > > > > E2BIG The provided storage for struct sched_attr is either too > > > big, see sched_setattr(), or too small, see sched_getattr(). > > > > Where's the EBUSY? 
It can throw this from __sched_setscheduler() when it > > checks if there's enough bandwidth to run the task. > > Uhhm.. it got lost :-) /me quickly adds. -- To unsubscribe from this list: send the line "unsubscribe linux-man" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: sched_{set,get}attr() manpage 2014-04-10 7:47 ` Juri Lelli @ 2014-04-10 9:59 ` Claudio Scordino 0 siblings, 0 replies; 26+ messages in thread From: Claudio Scordino @ 2014-04-10 9:59 UTC (permalink / raw) To: Juri Lelli, Peter Zijlstra Cc: Henrik Austad, Michael Kerrisk (man-pages), Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt, Oleg Nesterov, fweisbec, darren, johan.eker, p.faure, Linux Kernel, michael, fchecconi, tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani, hgu1972, Paul McKenney, insop.song, liming.wang, jkacur, linux-man Il 10/04/2014 09:47, Juri Lelli ha scritto: > Hi all, > > On Wed, 9 Apr 2014 17:42:04 +0200 > Peter Zijlstra <peterz@infradead.org> wrote: > >> On Wed, Apr 09, 2014 at 05:19:11PM +0200, Henrik Austad wrote: >>>> The following "real-time" policies are also supported, for >>> why the "'s? >> I borrowed those from SCHED_SETSCHEDULER(2). >> >>>> sched_attr::sched_flags additional flags that can influence >>>> scheduling behaviour. Currently as per Linux kernel 3.14: >>>> >>>> SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy >>>> to: (struct sched_attr){ .sched_policy = SCHED_OTHER, } >>>> on fork(). >>>> >>>> is the only supported flag. >> ... >> >>>> The flags argument should be 0. >>> What about SCHED_FLAG_RESET_ON_FOR? >> Different flags. The one is sched_attr::flags the other is >> sched_setattr(.flags). >> >>>> The other sched_attr fields are filled out as described in >>>> sched_setattr(). >>>> >>>> Scheduling Policies >>>> The scheduler is the kernel component that decides which runnable >>>> process will be executed by the CPU next. Each process has an associ‐ >>>> ated scheduling policy and a static scheduling priority, sched_prior‐ >>>> ity; these are the settings that are modified by sched_setscheduler(). >>>> The scheduler makes it decisions based on knowledge of the scheduling >>>> policy and static priority of all processes on the system. >>> Isn't this last sentence redundant/sliglhtly repetitive? 
>> I borrowed that from SCHED_SETSCHEDULER(2) again. >> >>>> SCHED_DEADLINE: Sporadic task model deadline scheduling >>>> SCHED_DEADLINE is an implementation of GEDF (Global Earliest >>>> Deadline First) with additional CBS (Constant Bandwidth Server). >>>> The CBS guarantees that tasks that over-run their specified >>>> budget are throttled and do not affect the correct performance >>>> of other SCHED_DEADLINE tasks. >>>> >>>> SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN >>>> >>>> Setting SCHED_DEADLINE can fail with -EINVAL when admission >>>> control tests fail. >>> Perhaps add a note about the deadline-class having higher priority than the >>> other classes; i.e. if a deadline-task is runnable, it will preempt any >>> other SCHED_(RR|FIFO) regardless of priority? >> Yes, good point, will do. >> >>>> SCHED_FIFO: First In-First Out scheduling >>>> SCHED_FIFO can only be used with static priorities higher than 0, which >>>> means that when a SCHED_FIFO processes becomes runnable, it will always >>>> immediately preempt any currently running SCHED_OTHER, SCHED_BATCH, or >>>> SCHED_IDLE process. SCHED_FIFO is a simple scheduling algorithm with‐ >>>> out time slicing. For processes scheduled under the SCHED_FIFO policy, >>>> the following rules apply: >>>> >>>> * A SCHED_FIFO process that has been preempted by another process of >>>> higher priority will stay at the head of the list for its priority >>>> and will resume execution as soon as all processes of higher prior‐ >>>> ity are blocked again. >>>> >>>> * When a SCHED_FIFO process becomes runnable, it will be inserted at >>>> the end of the list for its priority. >>>> >>>> * A call to sched_setscheduler() or sched_setparam(2) will put the >>>> SCHED_FIFO (or SCHED_RR) process identified by pid at the start of >>>> the list if it was runnable. As a consequence, it may preempt the >>>> currently running process if it has the same priority. 
>>>> (POSIX.1-2001 specifies that the process should go to the end of the >>>> list.) >>>> >>>> * A process calling sched_yield(2) will be put at the end of the list. >>> How about the recent discussion regarding sched_yield(). Is this correct? >>> >>> lkml.kernel.org/r/alpine.DEB.2.02.1403312333100.14882@ionos.tec.linutronix.de >>> >>> Is this the correct place to add a note explaining te potentional pitfalls >>> using sched_yield? >> I'm not sure; there's a SCHED_YIELD(2) manpage to fill with that >> nonsense. >> >> Also; I realized I have not described the DEADLINE sched_yield() >> behaviour. >> > So, for SCHED_DEADLINE we currently have this behaviour: > > /* > * Yield task semantic for -deadline tasks is: > * > * get off from the CPU until our next instance, with > * a new runtime. This is of little use now, since we > * don't have a bandwidth reclaiming mechanism. Anyway, > * bandwidth reclaiming is planned for the future, and > * yield_task_dl will indicate that some spare budget > * is available for other task instances to use it. > */ > > But, considering also the discussion above, I'm less sure now that's > what we want. Still, I think we will want some way in the future to be > able to say "I'm finished with my current job, give this remaining > runtime to someone else", like another syscall or something. Hi Juri, hi Peter, my two cents: A syscall to block the task until its next instance is definitely useful. This way, a periodic task doesn't have to sleep anymore: the kernel takes care of unblocking the task at the right moment. This would be easier (for user-level) and more efficient too. I don't know if using sched_yield() to get this behavior is a good choice or not. You have ways more experience than me :) Best, Claudio ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: sched_{set,get}attr() manpage [not found] ` <20140409154204.GD10526-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org> 2014-04-10 7:47 ` Juri Lelli @ 2014-04-27 15:47 ` Michael Kerrisk (man-pages) [not found] ` <CAKgNAki5BkOyckf1zxJCRs2tq-eG9bWW_yRGi3hDynz12wz+QQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 1 sibling, 1 reply; 26+ messages in thread From: Michael Kerrisk (man-pages) @ 2014-04-27 15:47 UTC (permalink / raw) To: Peter Zijlstra Cc: Henrik Austad, Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov, Frédéric Weisbecker, darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel, Claudio Scordino, Michael Trimarchi, Fabio Checconi, Tommaso Cucinotta, Juri Lelli, nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4, Dhaval Giani, hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney, Insop Song, liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA, linux-man Hi Peter, Following the review comments that one or two people sent, are you planning to send in a revised version of this page? Also, is there any test code lying about somewhere that I could play with? Thanks, Michael On Wed, Apr 9, 2014 at 5:42 PM, Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote: > On Wed, Apr 09, 2014 at 05:19:11PM +0200, Henrik Austad wrote: >> > The following "real-time" policies are also supported, for >> >> why the "'s? > > I borrowed those from SCHED_SETSCHEDULER(2). > >> > sched_attr::sched_flags additional flags that can influence >> > scheduling behaviour. Currently as per Linux kernel 3.14: >> > >> > SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy >> > to: (struct sched_attr){ .sched_policy = SCHED_OTHER, } >> > on fork(). >> > >> > is the only supported flag. > > ... > >> > The flags argument should be 0. >> >> What about SCHED_FLAG_RESET_ON_FOR? > > Different flags. 
The one is sched_attr::flags the other is > sched_setattr(.flags). > >> > The other sched_attr fields are filled out as described in >> > sched_setattr(). >> > >> > Scheduling Policies >> > The scheduler is the kernel component that decides which runnable >> > process will be executed by the CPU next. Each process has an associ‐ >> > ated scheduling policy and a static scheduling priority, sched_prior‐ >> > ity; these are the settings that are modified by sched_setscheduler(). >> > The scheduler makes it decisions based on knowledge of the scheduling >> > policy and static priority of all processes on the system. >> >> Isn't this last sentence redundant/sliglhtly repetitive? > > I borrowed that from SCHED_SETSCHEDULER(2) again. > >> > SCHED_DEADLINE: Sporadic task model deadline scheduling >> > SCHED_DEADLINE is an implementation of GEDF (Global Earliest >> > Deadline First) with additional CBS (Constant Bandwidth Server). >> > The CBS guarantees that tasks that over-run their specified >> > budget are throttled and do not affect the correct performance >> > of other SCHED_DEADLINE tasks. >> > >> > SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN >> > >> > Setting SCHED_DEADLINE can fail with -EINVAL when admission >> > control tests fail. >> >> Perhaps add a note about the deadline-class having higher priority than the >> other classes; i.e. if a deadline-task is runnable, it will preempt any >> other SCHED_(RR|FIFO) regardless of priority? > > Yes, good point, will do. > >> > SCHED_FIFO: First In-First Out scheduling >> > SCHED_FIFO can only be used with static priorities higher than 0, which >> > means that when a SCHED_FIFO processes becomes runnable, it will always >> > immediately preempt any currently running SCHED_OTHER, SCHED_BATCH, or >> > SCHED_IDLE process. SCHED_FIFO is a simple scheduling algorithm with‐ >> > out time slicing. 
>> > For processes scheduled under the SCHED_FIFO policy,
>> > the following rules apply:
>> >
>> > * A SCHED_FIFO process that has been preempted by another process of
>> >   higher priority will stay at the head of the list for its priority
>> >   and will resume execution as soon as all processes of higher
>> >   priority are blocked again.
>> >
>> > * When a SCHED_FIFO process becomes runnable, it will be inserted at
>> >   the end of the list for its priority.
>> >
>> > * A call to sched_setscheduler() or sched_setparam(2) will put the
>> >   SCHED_FIFO (or SCHED_RR) process identified by pid at the start of
>> >   the list if it was runnable. As a consequence, it may preempt the
>> >   currently running process if it has the same priority.
>> >   (POSIX.1-2001 specifies that the process should go to the end of the
>> >   list.)
>> >
>> > * A process calling sched_yield(2) will be put at the end of the list.
>>
>> How about the recent discussion regarding sched_yield(). Is this correct?
>>
>> lkml.kernel.org/r/alpine.DEB.2.02.1403312333100.14882@ionos.tec.linutronix.de
>>
>> Is this the correct place to add a note explaining te potentional pitfalls
>> using sched_yield?
>
> I'm not sure; there's a SCHED_YIELD(2) manpage to fill with that
> nonsense.
>
> Also; I realized I have not described the DEADLINE sched_yield()
> behaviour.
>
>> > No other events will move a process scheduled under the SCHED_FIFO
>> > policy in the wait list of runnable processes with equal static priority.
>> >
>> > A SCHED_FIFO process runs until either it is blocked by an I/O request,
>> > it is preempted by a higher priority process, or it calls
>> > sched_yield(2).
>> >
>> > SCHED_RR: Round Robin scheduling
>> > SCHED_RR is a simple enhancement of SCHED_FIFO. Everything described
>> > above for SCHED_FIFO also applies to SCHED_RR, except that each process
>> > is only allowed to run for a maximum time quantum.
If a SCHED_RR >> > process has been running for a time period equal to or longer than the >> > time quantum, it will be put at the end of the list for its priority. >> > A SCHED_RR process that has been preempted by a higher priority process >> > and subsequently resumes execution as a running process will complete >> > the unexpired portion of its round robin time quantum. The length of >> > the time quantum can be retrieved using sched_rr_get_interval(2). >> >> -> Default is 0.1HZ ms >> >> This is a question I get form time to time, having this in the manpage >> would be helpful. > > Again, brazenly stolen from SCHED_SETSCHEDULER(2); but yes. Also I'm not > sure I'd call RR an enhancement of anything much at all ;-) > >> > ERRORS >> > EINVAL The scheduling policy is not one of the recognized policies, >> > param is NULL, or param does not make sense for the policy. >> > >> > EPERM The calling process does not have appropriate privileges. >> > >> > ESRCH The process whose ID is pid could not be found. >> > >> > E2BIG The provided storage for struct sched_attr is either too >> > big, see sched_setattr(), or too small, see sched_getattr(). >> >> Where's the EBUSY? It can throw this from __sched_setscheduler() when it >> checks if there's enough bandwidth to run the task. > > Uhhm.. it got lost :-) /me quickly adds. -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line "unsubscribe linux-man" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: sched_{set,get}attr() manpage [not found] ` <CAKgNAki5BkOyckf1zxJCRs2tq-eG9bWW_yRGi3hDynz12wz+QQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2014-04-27 19:34 ` Peter Zijlstra 2014-04-27 19:45 ` Steven Rostedt [not found] ` <20140427193449.GB17778-RM5+C6weyIYnLiPH7yDmwOa11wxjtiyuLtmvbW2Dspo@public.gmane.org> 0 siblings, 2 replies; 26+ messages in thread From: Peter Zijlstra @ 2014-04-27 19:34 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Henrik Austad, Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov, Frédéric Weisbecker, darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel, Claudio Scordino, Michael Trimarchi, Fabio Checconi, Tommaso Cucinotta, Juri Lelli, nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4, Dhaval Giani, hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney, Insop Song, liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA, linux-man On Sun, Apr 27, 2014 at 05:47:25PM +0200, Michael Kerrisk (man-pages) wrote: > Hi Peter, > > Following the review comments that one or two people sent, are you > planning to send in a revised version of this page? Yes, I just suck at getting around to it :-(, I'll do it first thing tomorrow. > Also, is there any test code lying about somewhere that I could play with? Juri? -- To unsubscribe from this list: send the line "unsubscribe linux-man" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: sched_{set,get}attr() manpage 2014-04-27 19:34 ` Peter Zijlstra @ 2014-04-27 19:45 ` Steven Rostedt [not found] ` <20140427193449.GB17778-RM5+C6weyIYnLiPH7yDmwOa11wxjtiyuLtmvbW2Dspo@public.gmane.org> 1 sibling, 0 replies; 26+ messages in thread From: Steven Rostedt @ 2014-04-27 19:45 UTC (permalink / raw) To: Peter Zijlstra Cc: Michael Kerrisk (man-pages), Henrik Austad, Dario Faggioli, Thomas Gleixner, Ingo Molnar, Oleg Nesterov, Frédéric Weisbecker, darren, johan.eker, p.faure, Linux Kernel, Claudio Scordino, Michael Trimarchi, Fabio Checconi, Tommaso Cucinotta, Juri Lelli, nicola.manica, luca.abeni, Dhaval Giani, hgu1972, Paul McKenney, Insop Song, liming.wang, jkacur, linux-man On Sun, 27 Apr 2014 21:34:49 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > > Also, is there any test code lying about somewhere that I could play with? I have a deadline program you can play with too: http://rostedt.homelinux.com/private/deadline.c -- Steve ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: sched_{set,get}attr() manpage
       [not found] ` <20140427193449.GB17778-RM5+C6weyIYnLiPH7yDmwOa11wxjtiyuLtmvbW2Dspo@public.gmane.org>
@ 2014-04-28  7:39 ` Juri Lelli
  0 siblings, 0 replies; 26+ messages in thread
From: Juri Lelli @ 2014-04-28  7:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michael Kerrisk (man-pages), Henrik Austad, Dario Faggioli,
      Thomas Gleixner, Ingo Molnar, rostedt, Oleg Nesterov,
      Frédéric Weisbecker, darren, johan.eker, p.faure, Linux Kernel,
      Claudio Scordino, Michael Trimarchi, Fabio Checconi,
      Tommaso Cucinotta, nicola.manica, luca.abeni, Dhaval Giani,
      hgu1972, Paul McKenney, Insop Song, liming.wang, jkacur, linux-man

On Sun, 27 Apr 2014 21:34:49 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Sun, Apr 27, 2014 at 05:47:25PM +0200, Michael Kerrisk (man-pages) wrote:
> > Hi Peter,
> >
> > Following the review comments that one or two people sent, are you
> > planning to send in a revised version of this page?
>
> Yes, I just suck at getting around to it :-(, I'll do it first thing
> tomorrow.
>
> > Also, is there any test code lying about somewhere that I could play with?
>
> Juri?

Yes. I use these two tools:

 - rt-app (to create periodic workloads, also non-RT/DL ones)
   https://github.com/gbagnoli/rt-app

 - schedtool-dl (patched version of schedtool)
   https://github.com/jlelli/schedtool-dl

Both are aligned with the latest interface.

Best,

- Juri

^ permalink raw reply [flat|nested] 26+ messages in thread
* sched_{set,get}attr() manpage
       [not found] ` <53020C9D.1050208-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2014-04-09  9:25 ` sched_{set,get}attr() manpage Peter Zijlstra
@ 2014-04-28  8:18 ` Peter Zijlstra
       [not found] ` <20140428081858.GX13658-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-04-28  8:18 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt,
      Oleg Nesterov, fweisbec, darren, johan.eker, p.faure, Linux Kernel,
      claudio, michael, fchecconi, tommaso.cucinotta, juri.lelli,
      nicola.manica, luca.abeni, dhaval.giani, hgu1972, Paul McKenney,
      insop.song, liming.wang, jkacur, linux-man

Hi Michael,

find below an updated manpage, I did not apply the comments on parts
that are identical to SCHED_SETSCHEDULER(2) in order to keep these texts
in alignment. I feel that if we change one we should also change the
other, and such a 'patch' is best done separate from the new manpage
itself.

I did add the missing EBUSY error, and amended the text where it said
we'd return EINVAL in that case.

I added a paragraph stating that SCHED_DEADLINE preempted anything else
userspace can do (with the explicit mention of userspace to leave me
wriggle room for the kernel's stop task :-).

I also did a short paragraph on the deadline sched_yield(). For further
deadline yield details we should maybe add to the SCHED_YIELD(2)
manpage.
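For readers following the yield discussion, here is a rough C sketch of what "wait for the next period" looks like from userspace. It assumes, as described in this thread, that sched_yield() on a SCHED_DEADLINE task ends the current job until the next period, and that any other task has to sleep on an absolute timer itself. The sketch_* names and the is_deadline_task parameter are made up for illustration and are not part of any proposed interface.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Advance an absolute timestamp by ns, keeping tv_nsec in [0, 1e9). */
void sketch_timespec_add_ns(struct timespec *t, uint64_t ns)
{
    t->tv_sec += (time_t)(ns / SKETCH_NSEC_PER_SEC);
    t->tv_nsec += (long)(ns % SKETCH_NSEC_PER_SEC);
    if (t->tv_nsec >= (long)SKETCH_NSEC_PER_SEC) {
        t->tv_nsec -= (long)SKETCH_NSEC_PER_SEC;
        t->tv_sec += 1;
    }
}

/*
 * Block until the next activation of a periodic job.
 * Returns 0 on success, otherwise a nonzero error indication.
 */
int sketch_wait_next_period(bool is_deadline_task, struct timespec *next,
                            uint64_t period_ns)
{
    if (is_deadline_task) {
        /* Assumption from the thread: the kernel parks the task
         * until its next period begins. */
        return sched_yield();
    }

    /* Otherwise sleep until the next absolute activation time. */
    sketch_timespec_add_ns(next, period_ns);
    int err;
    do {
        err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, next, NULL);
    } while (err == EINTR);
    return err;
}
```

A periodic loop would read CLOCK_MONOTONIC once into next, then alternate between running the job and calling sketch_wait_next_period(). Sleeping on an absolute time rather than a relative one keeps the period from drifting when a job runs long.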
Re juri/claudio; no I think sched_yield() as implemented for deadline
makes sense, no other yield semantics other than NOP makes sense for it,
and since we have the syscall already might as well make it do something
useful.

---

NAME
       sched_setattr, sched_getattr - set and get scheduling policy/attributes

SYNOPSIS
       #include <sched.h>

       struct sched_attr {
               u32 size;
               u32 sched_policy;
               u64 sched_flags;

               /* SCHED_NORMAL, SCHED_BATCH */
               s32 sched_nice;

               /* SCHED_FIFO, SCHED_RR */
               u32 sched_priority;

               /* SCHED_DEADLINE */
               u64 sched_runtime;
               u64 sched_deadline;
               u64 sched_period;
       };

       int sched_setattr(pid_t pid, const struct sched_attr *attr,
                         unsigned int flags);

       int sched_getattr(pid_t pid, struct sched_attr *attr,
                         unsigned int size, unsigned int flags);

DESCRIPTION
       sched_setattr() sets both the scheduling policy and the
       associated attributes for the process whose ID is specified in
       pid. If pid equals zero, the scheduling policy and attributes of
       the calling process will be set. The interpretation of the
       argument attr depends on the selected policy. Currently, Linux
       supports the following "normal" (i.e., non-real-time) scheduling
       policies:

       SCHED_OTHER   the standard "fair" time-sharing policy;

       SCHED_BATCH   for "batch" style execution of processes; and

       SCHED_IDLE    for running very low priority background jobs.

       The following "real-time" policies are also supported, for
       special time-critical applications that need precise control
       over the way in which runnable processes are selected for
       execution:

       SCHED_FIFO    a first-in, first-out policy;

       SCHED_RR      a round-robin policy; and

       SCHED_DEADLINE a deadline policy.

       The semantics of each of these policies are detailed below.

       sched_attr::size must be set to the size of the structure, as in
       sizeof(struct sched_attr). If the provided structure is smaller
       than the kernel structure, any additional fields are assumed
       '0'.
       If the provided structure is larger than the kernel structure,
       the kernel verifies all additional fields are '0'; if not, the
       syscall will fail with -E2BIG.

       sched_attr::sched_policy the desired scheduling policy.

       sched_attr::sched_flags additional flags that can influence
       scheduling behaviour. Currently as per Linux kernel 3.14:

               SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
               to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
               on fork().

       is the only supported flag.

       sched_attr::sched_nice should only be set for SCHED_OTHER and
       SCHED_BATCH; it is the desired nice value [-20,19], see NICE(2).

       sched_attr::sched_priority should only be set for SCHED_FIFO and
       SCHED_RR; it is the desired static priority [1,99].

       sched_attr::sched_runtime
       sched_attr::sched_deadline
       sched_attr::sched_period should only be set for SCHED_DEADLINE
       and are the traditional sporadic task model parameters.

       The flags argument should be 0.

       sched_getattr() queries the scheduling policy currently applied
       to the process identified by pid. If pid equals zero, the policy
       of the calling process will be retrieved.

       The size argument should reflect the size of struct sched_attr
       as known to userspace. The kernel fills out sched_attr::size to
       the size of its sched_attr structure. If the user provided
       structure is larger, additional fields are not touched. If the
       user provided structure is smaller, but the kernel needs to
       return values outside the provided space, the syscall will fail
       with -E2BIG.

       The flags argument should be 0.

       The other sched_attr fields are filled out as described in
       sched_setattr().

   Scheduling Policies
       The scheduler is the kernel component that decides which runnable
       process will be executed by the CPU next. Each process has an
       associated scheduling policy and a static scheduling priority,
       sched_priority; these are the settings that are modified by
       sched_setscheduler().
       The scheduler makes its decisions based on knowledge of the
       scheduling policy and static priority of all processes on the
       system.

       For processes scheduled under one of the normal scheduling
       policies (SCHED_OTHER, SCHED_IDLE, SCHED_BATCH), sched_priority
       is not used in scheduling decisions (it must be specified as 0).

       Processes scheduled under one of the real-time policies
       (SCHED_FIFO, SCHED_RR) have a sched_priority value in the range 1
       (low) to 99 (high). (As the numbers imply, real-time processes
       always have higher priority than normal processes.) Note well:
       POSIX.1-2001 only requires an implementation to support a minimum
       of 32 distinct priority levels for the real-time policies, and
       some systems supply just this minimum. Portable programs should
       use sched_get_priority_min(2) and sched_get_priority_max(2) to
       find the range of priorities supported for a particular policy.

       Conceptually, the scheduler maintains a list of runnable
       processes for each possible sched_priority value. In order to
       determine which process runs next, the scheduler looks for the
       nonempty list with the highest static priority and selects the
       process at the head of this list.

       A process's scheduling policy determines where it will be
       inserted into the list of processes with equal static priority
       and how it will move inside this list.

       All scheduling is preemptive: if a process with a higher static
       priority becomes ready to run, the currently running process will
       be preempted and returned to the wait list for its static
       priority level. The scheduling policy only determines the
       ordering within the list of runnable processes with equal static
       priority.

   SCHED_DEADLINE: Sporadic task model deadline scheduling
       SCHED_DEADLINE is an implementation of GEDF (Global Earliest
       Deadline First) with additional CBS (Constant Bandwidth Server).
       The CBS guarantees that tasks that over-run their specified
       budget are throttled and do not affect the correct performance
       of other SCHED_DEADLINE tasks.
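The throttling idea can be pictured with a toy budget model in C. This is only a sketch under simplifying assumptions, not the kernel's CBS code: it ignores deadlines and EDF ordering entirely and only tracks how much of the per-period runtime is left. All sketch_* names are invented for the illustration, and period_ns is assumed to be nonzero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-task budget state for the toy model. */
struct sketch_cbs {
    uint64_t runtime_ns;        /* budget granted per period */
    uint64_t period_ns;         /* replenishment interval, must be > 0 */
    uint64_t remaining_ns;      /* budget left in the current period */
    uint64_t next_replenish_ns; /* absolute time of the next refill */
};

void sketch_cbs_init(struct sketch_cbs *s, uint64_t runtime_ns,
                     uint64_t period_ns, uint64_t now_ns)
{
    s->runtime_ns = runtime_ns;
    s->period_ns = period_ns;
    s->remaining_ns = runtime_ns;
    s->next_replenish_ns = now_ns + period_ns;
}

/* Refill the budget once for every period boundary that has passed. */
void sketch_cbs_replenish(struct sketch_cbs *s, uint64_t now_ns)
{
    while (now_ns >= s->next_replenish_ns) {
        s->remaining_ns = s->runtime_ns;
        s->next_replenish_ns += s->period_ns;
    }
}

/*
 * Charge ran_ns of execution against the budget.
 * Returns true when the budget is exhausted, i.e. the task would be
 * throttled until the next replenishment.
 */
bool sketch_cbs_charge(struct sketch_cbs *s, uint64_t ran_ns)
{
    s->remaining_ns = (ran_ns >= s->remaining_ns)
                          ? 0
                          : s->remaining_ns - ran_ns;
    return s->remaining_ns == 0;
}
```

In this model a task that overruns never borrows from a later period; it simply stops being eligible until the refill, which is the property that protects the other tasks.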
SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN. Setting SCHED_DEADLINE can fail with -EBUSY when admission control tests fail. Because of the nature of (G)EDF, SCHED_DEADLINE tasks are the highest priority (user controllable) tasks in the system; if any SCHED_DEADLINE task is runnable it will preempt any FIFO/RR/OTHER/BATCH/IDLE task out there. A SCHED_DEADLINE task calling sched_yield() will 'yield' the current job and wait for a new period to begin. SCHED_FIFO: First In-First Out scheduling SCHED_FIFO can only be used with static priorities higher than 0, which means that when a SCHED_FIFO process becomes runnable, it will always immediately preempt any currently running SCHED_OTHER, SCHED_BATCH, or SCHED_IDLE process. SCHED_FIFO is a simple scheduling algorithm without time slicing. For processes scheduled under the SCHED_FIFO policy, the following rules apply: * A SCHED_FIFO process that has been preempted by another process of higher priority will stay at the head of the list for its priority and will resume execution as soon as all processes of higher priority are blocked again. * When a SCHED_FIFO process becomes runnable, it will be inserted at the end of the list for its priority. * A call to sched_setscheduler() or sched_setparam(2) will put the SCHED_FIFO (or SCHED_RR) process identified by pid at the start of the list if it was runnable. As a consequence, it may preempt the currently running process if it has the same priority. (POSIX.1-2001 specifies that the process should go to the end of the list.) * A process calling sched_yield(2) will be put at the end of the list. No other events will move a process scheduled under the SCHED_FIFO policy in the wait list of runnable processes with equal static priority. A SCHED_FIFO process runs until either it is blocked by an I/O request, it is preempted by a higher priority process, or it calls sched_yield(2). SCHED_RR: Round Robin scheduling SCHED_RR is a simple enhancement of SCHED_FIFO.
Everything described above for SCHED_FIFO also applies to SCHED_RR, except that each process is only allowed to run for a maximum time quantum. If a SCHED_RR process has been running for a time period equal to or longer than the time quantum, it will be put at the end of the list for its priority. A SCHED_RR process that has been preempted by a higher priority process and subsequently resumes execution as a running process will complete the unexpired portion of its round robin time quantum. The length of the time quantum can be retrieved using sched_rr_get_interval(2). SCHED_OTHER: Default Linux time-sharing scheduling SCHED_OTHER can only be used at static priority 0. SCHED_OTHER is the standard Linux time-sharing scheduler that is intended for all processes that do not require the special real-time mechanisms. The process to run is chosen from the static priority 0 list based on a dynamic priority that is determined only inside this list. The dynamic priority is based on the nice value (set by nice(2) or setpriority(2)) and increased for each time quantum the process is ready to run, but denied to run by the scheduler. This ensures fair progress among all SCHED_OTHER processes. SCHED_BATCH: Scheduling batch processes (Since Linux 2.6.16.) SCHED_BATCH can only be used at static priority 0. This policy is similar to SCHED_OTHER in that it schedules the process according to its dynamic priority (based on the nice value). The difference is that this policy will cause the scheduler to always assume that the process is CPU-intensive. Consequently, the scheduler will apply a small scheduling penalty with respect to wakeup behaviour, so that this process is mildly disfavored in scheduling decisions. This policy is useful for workloads that are noninteractive, but do not want to lower their nice value, and for workloads that want a deterministic scheduling policy without interactivity causing extra preemptions (between the workload's tasks).
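The SCHED_RR text above says the time quantum can be retrieved with sched_rr_get_interval(2). A minimal sketch of such a query follows; the helper names timespec_to_ns() and rr_quantum_ns() are hypothetical and exist only for this illustration, and the value returned for a process that is not SCHED_RR is whatever the running kernel reports.

```c
#include <assert.h>
#include <sched.h>
#include <stdint.h>
#include <sys/types.h>
#include <time.h>

/* Convert a struct timespec to a nanosecond count (hypothetical helper). */
int64_t timespec_to_ns(const struct timespec *ts)
{
    assert(ts != NULL);
    return (int64_t)ts->tv_sec * 1000000000LL + (int64_t)ts->tv_nsec;
}

/*
 * Ask the kernel for the round-robin quantum of the given pid
 * (0 means the caller). Returns the quantum in nanoseconds, or -1
 * with errno set by sched_rr_get_interval() on failure.
 */
int64_t rr_quantum_ns(pid_t pid)
{
    struct timespec ts;

    if (sched_rr_get_interval(pid, &ts) == -1)
        return -1;
    return timespec_to_ns(&ts);
}
```

A caller would typically invoke rr_quantum_ns(0) and treat a return of -1 as an error to be reported through errno.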
SCHED_IDLE: Scheduling very low priority jobs (Since Linux 2.6.23.) SCHED_IDLE can only be used at static priority 0; the process nice value has no influence for this policy. This policy is intended for running jobs at extremely low priority (lower even than a +19 nice value with the SCHED_OTHER or SCHED_BATCH policies). RETURN VALUE On success, sched_setattr() and sched_getattr() return 0. On error, -1 is returned, and errno is set appropriately. ERRORS EINVAL The scheduling policy is not one of the recognized policies, attr is NULL, or attr does not make sense for the policy. EPERM The calling process does not have appropriate privileges. ESRCH The process whose ID is pid could not be found. E2BIG The provided storage for struct sched_attr is either too big, see sched_setattr(), or too small, see sched_getattr(). EBUSY SCHED_DEADLINE admission control failure NOTES While the text above (and in SCHED_SETSCHEDULER(2)) talks about processes, in actual fact these system calls are thread specific. -- To unsubscribe from this list: send the line "unsubscribe linux-man" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: sched_{set,get}attr() manpage [not found] ` <20140428081858.GX13658-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org> @ 2014-04-29 13:08 ` Michael Kerrisk (man-pages) [not found] ` <535FA467.2070403-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 2014-04-29 16:04 ` Peter Zijlstra 0 siblings, 2 replies; 26+ messages in thread From: Michael Kerrisk (man-pages) @ 2014-04-29 13:08 UTC (permalink / raw) To: Peter Zijlstra Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov, fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w, michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/, fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps, juri.lelli-Re5JQEeQqe8AvxtiuMwx3w, nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4, dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w, hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney, insop.song-Re5JQEeQqe8AvxtiuMwx3w, liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA, linux-man-u79uwXL29TY76Z2rM5mHXA Hi Peter, On 04/28/2014 10:18 AM, Peter Zijlstra wrote: > Hi Michael, > > find below an updated manpage, I did not apply the comments on parts > that are identical to SCHED_SETSCHEDULER(2) in order to keep these texts > in alignment. I feel that if we change one we should also change the > other, and such a 'patch' is best done separate from the new manpage > itself. > > I did add the missing EBUSY error, and amended the text where it said > we'd return EINVAL in that case. > > I added a paragraph stating that SCHED_DEADLINE preempted anything else > userspace can do (with the explicit mention of userspace to leave me > wriggle room for the kernel's stop task :-). > > I also did a short paragraph on the deadline sched_yield(). For further > deadline yield details we should maybe add to the SCHED_YIELD(2) > manpage. 
> > Re juri/claudio; no I think sched_yield() as implemented for deadline > makes sense, no other yield semantics other than NOP makes sense for it, > and since we have the syscall already might as well make it do something > useful. Thanks for the updated page. Would you be willing to revise as per the comments below? > NAME > sched_setattr, sched_getattr - set and get scheduling policy/attributes > > SYNOPSIS > #include <sched.h> > > struct sched_attr { > u32 size; > u32 sched_policy; > u64 sched_flags; > > /* SCHED_NORMAL, SCHED_BATCH */ > s32 sched_nice; > /* SCHED_FIFO, SCHED_RR */ > u32 sched_priority; > /* SCHED_DEADLINE */ > u64 sched_runtime; > u64 sched_deadline; > u64 sched_period; > }; > int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags); > > int sched_getattr(pid_t pid, const struct sched_attr *attr, unsigned int size, unsigned int flags); > > DESCRIPTION > sched_setattr() sets both the scheduling policy and the > associated attributes for the process whose ID is specified in > pid. Around about here, I think there needs to be a sentence explaining that sched_setattr() provides a superset of the functionality of sched_setscheduler(2) and setpriority(2). I mean, it can do all that those two calls can do, right? > If pid equals zero, the scheduling policy and attributes > of the calling process will be set. The interpretation of the > argument attr depends on the selected policy. Currently, Linux > supports the following "normal" (i.e., non-real-time) scheduling > policies: > > SCHED_OTHER the standard "fair" time-sharing policy; > > SCHED_BATCH for "batch" style execution of processes; and > > SCHED_IDLE for running very low priority background jobs.
> > The following "real-time" policies are also supported, for > special time-critical applications that need precise control > over the way in which runnable processes are selected for > execution: > > SCHED_FIFO a first-in, first-out policy; > > SCHED_RR a round-robin policy; and > > SCHED_DEADLINE a deadline policy. > > The semantics of each of these policies are detailed below. The semantics of each of these policies are detailed in sched(7). [See my comments below] > > sched_attr::size must be set to the size of the structure, as in > sizeof(struct sched_attr), if the provided structure is smaller > than the kernel structure, any additional fields are assumed > '0'. If the provided structure is larger than the kernel > structure, the kernel verifies all additional fields are '0' if > not the syscall will fail with -E2BIG. > > sched_attr::sched_policy the desired scheduling policy. > > sched_attr::sched_flags additional flags that can influence > scheduling behaviour. Currently as per Linux kernel 3.14: > > SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy > to: (struct sched_attr){ .sched_policy = SCHED_OTHER, } > on fork(). > > is the only supported flag. > > sched_attr::sched_nice should only be set for SCHED_OTHER, > SCHED_BATCH, the desired nice value [-20,19], see NICE(2). > > sched_attr::sched_priority should only be set for SCHED_FIFO, > SCHED_RR, the desired static priority [1,99]. > > sched_attr::sched_runtime > sched_attr::sched_deadline > sched_attr::sched_period should only be set for SCHED_DEADLINE > and are the traditional sporadic task model parameters. Could you add (a lot ;-)) more detail on these three fields? Assume the reader does not know about this traditional sporadic task model, and then give some explanation of what these three fields do. Probably, at this point you can work in some statement about the admission control test. [but, see my comment below. It may be that sched(7) is a better place for this detail. 
> The flags argument should be 0. > > sched_getattr() queries the scheduling policy currently applied > to the process identified by pid. If pid equals zero, the > policy of the calling process will be retrieved. > > The size argument should reflect the size of struct sched_attr > as known to userspace. The kernel fills out sched_attr::size to > the size of its sched_attr structure. If the user provided > structure is larger, additional fields are not touched. If the > user provided structure is smaller, but the kernel needs to > return values outside the provided space, the syscall will fail > with -E2BIG. > > The flags argument should be 0. > > The other sched_attr fields are filled out as described in > sched_setattr(). I assume that everything between my [[[ and ]]] blocks below is taken straight from sched_setscheduler(2). (If that is not true, please let me know.) This reminds me that there is a structural fault in this part of man-pages ;-). The problem is sched_setscheduler(2) currently tries to do two things: [a] Document the sched_setscheduler() and sched_getscheduler() system calls [b] Provide an overview of scheduling policies and parameters. It should really only do the former. I have now gone through the task of separating [b] out into a separate page, sched(7), which other pages, such as sched_setscheduler(2) and sched_setattr(2) can refer to. You can see the current versions of sched_setscheduler.2 and sched.7 in Git (https://www.kernel.org/doc/man-pages/download.html ) So, what I would ideally like to see [1] A page describing the sched_setattr() and sched_getattr() APIs [2] A piece of text describing the SCHED_DEADLINE policy, which I can drop into sched(7). Could you revise like that? [[[[ > Scheduling Policies > The scheduler is the kernel component that decides which runnable > process will be executed by the CPU next.
Each process has an associ‐ > ated scheduling policy and a static scheduling priority, sched_prior‐ > ity; these are the settings that are modified by sched_setscheduler(). > The scheduler makes it decisions based on knowledge of the scheduling > policy and static priority of all processes on the system. > > For processes scheduled under one of the normal scheduling policies > (SCHED_OTHER, SCHED_IDLE, SCHED_BATCH), sched_priority is not used in > scheduling decisions (it must be specified as 0). > > Processes scheduled under one of the real-time policies (SCHED_FIFO, > SCHED_RR) have a sched_priority value in the range 1 (low) to 99 > (high). (As the numbers imply, real-time processes always have higher > priority than normal processes.) Note well: POSIX.1-2001 only requires > an implementation to support a minimum 32 distinct priority levels for > the real-time policies, and some systems supply just this minimum. > Portable programs should use sched_get_priority_min(2) and > sched_get_priority_max(2) to find the range of priorities supported for > a particular policy. > > Conceptually, the scheduler maintains a list of runnable processes for > each possible sched_priority value. In order to determine which > process runs next, the scheduler looks for the nonempty list with the > highest static priority and selects the process at the head of this > list. > > A process's scheduling policy determines where it will be inserted into > the list of processes with equal static priority and how it will move > inside this list. > > All scheduling is preemptive: if a process with a higher static prior‐ > ity becomes ready to run, the currently running process will be pre‐ > empted and returned to the wait list for its static priority level. > The scheduling policy only determines the ordering within the list of > runnable processes with equal static priority. 
]]]] > SCHED_DEADLINE: Sporadic task model deadline scheduling > SCHED_DEADLINE is an implementation of GEDF (Global Earliest > Deadline First) with additional CBS (Constant Bandwidth Server). > The CBS guarantees that tasks that over-run their specified > budget are throttled and do not affect the correct performance > of other SCHED_DEADLINE tasks. > > SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN > > Setting SCHED_DEADLINE can fail with -EBUSY when admission > control tests fail. > > Because of the nature of (G)EDF, SCHED_DEADLINE tasks are the > highest priority (user controllable) tasks in the system, if any > SCHED_DEADLINE task is runnable it will preempt anything > FIFO/RR/OTHER/BATCH/IDLE task out there. > > A SCHED_DEADLINE task calling sched_yield() will 'yield' the > current job and wait for a new period to begin. This is the piece that could go into sched(7), but I'd like it to include a discussion of deadline, period, and runtime. [[[[ > SCHED_FIFO: First In-First Out scheduling > SCHED_FIFO can only be used with static priorities higher than 0, which > means that when a SCHED_FIFO processes becomes runnable, it will always > immediately preempt any currently running SCHED_OTHER, SCHED_BATCH, or > SCHED_IDLE process. SCHED_FIFO is a simple scheduling algorithm with‐ > out time slicing. For processes scheduled under the SCHED_FIFO policy, > the following rules apply: > > * A SCHED_FIFO process that has been preempted by another process of > higher priority will stay at the head of the list for its priority > and will resume execution as soon as all processes of higher prior‐ > ity are blocked again. > > * When a SCHED_FIFO process becomes runnable, it will be inserted at > the end of the list for its priority. > > * A call to sched_setscheduler() or sched_setparam(2) will put the > SCHED_FIFO (or SCHED_RR) process identified by pid at the start of > the list if it was runnable. 
As a consequence, it may preempt the > currently running process if it has the same priority. > (POSIX.1-2001 specifies that the process should go to the end of the > list.) > > * A process calling sched_yield(2) will be put at the end of the list. > > No other events will move a process scheduled under the SCHED_FIFO pol‐ > icy in the wait list of runnable processes with equal static priority. > > A SCHED_FIFO process runs until either it is blocked by an I/O request, > it is preempted by a higher priority process, or it calls > sched_yield(2). > > SCHED_RR: Round Robin scheduling > SCHED_RR is a simple enhancement of SCHED_FIFO. Everything described > above for SCHED_FIFO also applies to SCHED_RR, except that each process > is only allowed to run for a maximum time quantum. If a SCHED_RR > process has been running for a time period equal to or longer than the > time quantum, it will be put at the end of the list for its priority. > A SCHED_RR process that has been preempted by a higher priority process > and subsequently resumes execution as a running process will complete > the unexpired portion of its round robin time quantum. The length of > the time quantum can be retrieved using sched_rr_get_interval(2). > > SCHED_OTHER: Default Linux time-sharing scheduling > SCHED_OTHER can only be used at static priority 0. SCHED_OTHER is the > standard Linux time-sharing scheduler that is intended for all pro‐ > cesses that do not require the special real-time mechanisms. The > process to run is chosen from the static priority 0 list based on a > dynamic priority that is determined only inside this list. The dynamic > priority is based on the nice value (set by nice(2) or setpriority(2)) > and increased for each time quantum the process is ready to run, but > denied to run by the scheduler. This ensures fair progress among all > SCHED_OTHER processes. > > SCHED_BATCH: Scheduling batch processes > (Since Linux 2.6.16.) SCHED_BATCH can only be used at static priority > 0. 
This policy is similar to SCHED_OTHER in that it schedules the > process according to its dynamic priority (based on the nice value). > The difference is that this policy will cause the scheduler to always > assume that the process is CPU-intensive. Consequently, the scheduler > will apply a small scheduling penalty with respect to wakeup behaviour, > so that this process is mildly disfavored in scheduling decisions. > > This policy is useful for workloads that are noninteractive, but do not > want to lower their nice value, and for workloads that want a determin‐ > istic scheduling policy without interactivity causing extra preemptions > (between the workload's tasks). > > SCHED_IDLE: Scheduling very low priority jobs > (Since Linux 2.6.23.) SCHED_IDLE can only be used at static priority > 0; the process nice value has no influence for this policy. > > This policy is intended for running jobs at extremely low priority > (lower even than a +19 nice value with the SCHED_OTHER or SCHED_BATCH > policies). ]]]] > RETURN VALUE > On success, sched_setattr() and sched_getattr() return 0. On > error, -1 is returned, and errno is set appropriately. > > ERRORS > EINVAL The scheduling policy is not one of the recognized policies, > param is NULL, or param does not make sense for the policy. > > EPERM The calling process does not have appropriate privileges. > > ESRCH The process whose ID is pid could not be found. > > E2BIG The provided storage for struct sched_attr is either too > big, see sched_setattr(), or too small, see sched_getattr(). > > EBUSY SCHED_DEADLINE admission control failure The above is the only place on the page that mentions admission control. As well as the suggestions above, it would be nice to have somewhere a summary of how admission control is calculated. > NOTES > While the text above (and in SCHED_SETSCHEDULER(2)) talks about > processes, in actual fact these system calls are thread specific. 
>

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
* Re: sched_{set,get}attr() manpage [not found] ` <535FA467.2070403-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2014-04-29 14:22 ` Peter Zijlstra 0 siblings, 0 replies; 26+ messages in thread From: Peter Zijlstra @ 2014-04-29 14:22 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov, fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w, michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/, fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps, juri.lelli-Re5JQEeQqe8AvxtiuMwx3w, nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4, dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w, hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney, insop.song-Re5JQEeQqe8AvxtiuMwx3w, liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA, linux-man-u79uwXL29TY76Z2rM5mHXA On Tue, Apr 29, 2014 at 03:08:55PM +0200, Michael Kerrisk (man-pages) wrote: > Hi Peter, > > On 04/28/2014 10:18 AM, Peter Zijlstra wrote: > > Hi Michael, > > > > find below an updated manpage, I did not apply the comments on parts > > that are identical to SCHED_SETSCHEDULER(2) in order to keep these texts > > in alignment. I feel that if we change one we should also change the > > other, and such a 'patch' is best done separate from the new manpage > > itself. > > > > I did add the missing EBUSY error, and amended the text where it said > > we'd return EINVAL in that case. > > > > I added a paragraph stating that SCHED_DEADLINE preempted anything else > > userspace can do (with the explicit mention of userspace to leave me > > wriggle room for the kernel's stop task :-). > > > > I also did a short paragraph on the deadline sched_yield(). For further > > deadline yield details we should maybe add to the SCHED_YIELD(2) > > manpage. 
> > > > Re juri/claudio; no I think sched_yield() as implemented for deadline > > makes sense, no other yield semantics other than NOP makes sense for it, > > and since we have the syscall already might as well make it do something > > useful. > > Thanks for the updated page. Would you be willing > to revise as per the comments below. Ok. > > > NAME > > sched_setattr, sched_getattr - set and get scheduling policy/attributes > > > > SYNOPSIS > > #include <sched.h> > > > > struct sched_attr { > > u32 size; > > u32 sched_policy; > > u64 sched_flags; > > > > /* SCHED_NORMAL, SCHED_BATCH */ > > s32 sched_nice; > > /* SCHED_FIFO, SCHED_RR */ > > u32 sched_priority; > > /* SCHED_DEADLINE */ > > u64 sched_runtime; > > u64 sched_deadline; > > u64 sched_period; > > }; > > int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags); > > > > int sched_getattr(pid_t pid, const struct sched_attr *attr, unsigned int size, unsigned int flags); > > > > DESCRIPTION > > sched_setattr() sets both the scheduling policy and the > > associated attributes for the process whose ID is specified in > > pid. > > Around about here, I think there needs to be a sentence explaining > that sched_setattr() provides a superset of the functionality of > sched_setscheduler(2) and setpritority(2). I mean, it can do all that > those two calls can do, right? Almost; setpriority() has the .which argument which we don't have. So while that syscall can change the nice value for an entire process group or user, sched_setattr() can only change the nice value for 1 task. But yes, I can mention something along those lines. > > If pid equals zero, the scheduling policy and attributes > > of the calling process will be set. The interpretation of the > > argument attr depends on the selected policy. 
Currently, Linux > > supports the following "normal" (i.e., non-real-time) scheduling > > policies: > > > > SCHED_OTHER the standard "fair" time-sharing policy; > > > > SCHED_BATCH for "batch" style execution of processes; and > > > > SCHED_IDLE for running very low priority background jobs. > > > > The following "real-time" policies are also supported, for > > special time-critical applications that need precise control > > over the way in which runnable processes are selected for > > execution: > > > > SCHED_FIFO a first-in, first-out policy; > > > > SCHED_RR a round-robin policy; and > > > > SCHED_DEADLINE a deadline policy. > > > > The semantics of each of these policies are detailed below. > > The semantics of each of these policies are detailed in sched(7). I don't appear to have SCHED(7), how new is that? > [See my comments below] > > > > > sched_attr::size must be set to the size of the structure, as in > > sizeof(struct sched_attr), if the provided structure is smaller > > than the kernel structure, any additional fields are assumed > > '0'. If the provided structure is larger than the kernel > > structure, the kernel verifies all additional fields are '0' if > > not the syscall will fail with -E2BIG. > > > > sched_attr::sched_policy the desired scheduling policy. > > > > sched_attr::sched_flags additional flags that can influence > > scheduling behaviour. Currently as per Linux kernel 3.14: > > > > SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy > > to: (struct sched_attr){ .sched_policy = SCHED_OTHER, } > > on fork(). > > > > is the only supported flag. > > > > sched_attr::sched_nice should only be set for SCHED_OTHER, > > SCHED_BATCH, the desired nice value [-20,19], see NICE(2). > > > > sched_attr::sched_priority should only be set for SCHED_FIFO, > > SCHED_RR, the desired static priority [1,99]. 
> > > > sched_attr::sched_runtime > > sched_attr::sched_deadline > > sched_attr::sched_period should only be set for SCHED_DEADLINE > > and are the traditional sporadic task model parameters. > > Could you add (a lot ;-)) more detail on these three fields? Assume the > reader does not know about this traditional sporadic task model, and > then give some explanation of what these three fields do. Probably, at > this point you can work in some statement about the admission control > test. > > [but, see my comment below. It may be that sched(7) is a better > place for this detail. Yes, I think SCHED(7) would be a better place; also I think I forgot to put a reference in to Documentation/scheduler/sched-deadline.txt I'll try and write something concise. This is the stuff of books, not paragraphs :/ > > The flags argument should be 0. > > > > sched_getattr() queries the scheduling policy currently applied > > to the process identified by pid. If pid equals zero, the > > policy of the calling process will be retrieved. > > > > The size argument should reflect the size of struct sched_attr > > as known to userspace. The kernel fills out sched_attr::size to > > the size of its sched_attr structure. If the user provided > > structure is larger, additional fields are not touched. If the > > user provided structure is smaller, but the kernel needs to > > return values outside the provided space, the syscall will fail > > with -E2BIG. > > > > The flags argument should be 0. > > > > The other sched_attr fields are filled out as described in > > sched_setattr(). > > I assume that everything between my [[[ and ]]] blocks below is taken straight > from sched_setscheduler(2). (If that is not true, please let me know.) That did indeed look about right. > This reminds me that there is a structural fault in this part of man-pages ;-). 
> The problem is sched_setscheduler(2) currently tries to do two things: > > [a] Document the sched_setscheduler() and sched_scheduler system calls > [b] Provide and overview od scheduling policies and parameters. > > It should really only do the former. I have now gone through the task of > separating [b] out into a separate page, sched(7), which other pages, > such as sched_setscheduler(2) and sched_setattr(2) can refer to. You > can see the current versions of sched_setscheduelr.2 and sched.7 in Git > (https://www.kernel.org/doc/man-pages/download.html ) > > So, what I would ideally like to see > > [1] A page describing the sched_setattr() and sched_getattr() APIs > [2] A piece of text describing the SCHED_DEADLINE policy, which I can > drop into sched(7). > > Could you revise like that? ACK. > [[[[ > ]]]] > > > SCHED_DEADLINE: Sporadic task model deadline scheduling > > SCHED_DEADLINE is an implementation of GEDF (Global Earliest > > Deadline First) with additional CBS (Constant Bandwidth Server). > > The CBS guarantees that tasks that over-run their specified > > budget are throttled and do not affect the correct performance > > of other SCHED_DEADLINE tasks. > > > > SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN > > > > Setting SCHED_DEADLINE can fail with -EBUSY when admission > > control tests fail. > > > > Because of the nature of (G)EDF, SCHED_DEADLINE tasks are the > > highest priority (user controllable) tasks in the system, if any > > SCHED_DEADLINE task is runnable it will preempt anything > > FIFO/RR/OTHER/BATCH/IDLE task out there. > > > > A SCHED_DEADLINE task calling sched_yield() will 'yield' the > > current job and wait for a new period to begin. > > This is the piece that could go into sched(7), but I'd like it to include > a discussion of deadline, period, and runtime. > > [[[[ > ]]]] > > > RETURN VALUE > > On success, sched_setattr() and sched_getattr() return 0. On > > error, -1 is returned, and errno is set appropriately. 
> > > > ERRORS > > EINVAL The scheduling policy is not one of the recognized policies, > > param is NULL, or param does not make sense for the policy. > > > > EPERM The calling process does not have appropriate privileges. > > > > ESRCH The process whose ID is pid could not be found. > > > > E2BIG The provided storage for struct sched_attr is either too > > big, see sched_setattr(), or too small, see sched_getattr(). > > > > EBUSY SCHED_DEADLINE admission control failure > > The above is the only place on the page that mentions admission control. > As well as the suggestions above, it would be nice to have somewhere a > summary of how admission control is calculated. I think I'll write down what admission control is without specifics. Giving specifics pins you down on the implementation. In general admission control enforces a bound on the schedulability of the task set. New and interesting ways of computing schedulability are the subject of papers each year.
* Re: sched_{set,get}attr() manpage 2014-04-29 13:08 ` Michael Kerrisk (man-pages) [not found] ` <535FA467.2070403-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2014-04-29 16:04 ` Peter Zijlstra 2014-04-30 11:09 ` Michael Kerrisk (man-pages) 1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-04-29 16:04 UTC (permalink / raw)
To: Michael Kerrisk (man-pages)
Cc: Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt, Oleg Nesterov, fweisbec, darren, johan.eker, p.faure, Linux Kernel, claudio, michael, fchecconi, tommaso.cucinotta, juri.lelli, nicola.manica, luca.abeni, dhaval.giani, hgu1972, Paul McKenney, insop.song, liming.wang, jkacur, linux-man

On Tue, Apr 29, 2014 at 03:08:55PM +0200, Michael Kerrisk (man-pages) wrote:

Juri, Dario,

Can you have a look at the 2nd part; I'm not at all sure I got the
activate/release the right way around.

My current thinking was that we activate first, and then release it to
go run. But googling the terms only confused me more. I suppose it's one
of those things that's not actually _that_ well defined.

And I hope the ASCII art actually clarifies things better than the terms
used.

> [1] A page describing the sched_setattr() and sched_getattr() APIs

NAME
       sched_setattr, sched_getattr - set and get scheduling policy/attributes

SYNOPSIS
       #include <sched.h>

       struct sched_attr {
               u32 size;
               u32 sched_policy;
               u64 sched_flags;

               /* SCHED_NORMAL, SCHED_BATCH */
               s32 sched_nice;
               /* SCHED_FIFO, SCHED_RR */
               u32 sched_priority;
               /* SCHED_DEADLINE */
               u64 sched_runtime;
               u64 sched_deadline;
               u64 sched_period;
       };

       int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);

       int sched_getattr(pid_t pid, struct sched_attr *attr, unsigned int size, unsigned int flags);

DESCRIPTION
       sched_setattr() sets both the scheduling policy and the
       associated attributes for the process whose ID is specified in
       pid.

       sched_setattr() replaces sched_setscheduler(), sched_setparam(),
       nice() and some of setpriority().
If pid equals zero, the scheduling policy and attributes of the calling process will be set. The interpretation of the argument attr depends on the selected policy. Currently, Linux supports the following "normal" (i.e., non-real-time) scheduling policies: SCHED_OTHER the standard "fair" time-sharing policy; SCHED_BATCH for "batch" style execution of processes; and SCHED_IDLE for running very low priority background jobs. The following "real-time" policies are also supported, for special time-critical applications that need precise control over the way in which runnable processes are selected for execution: SCHED_FIFO a static priority first-in, first-out policy; SCHED_RR a static priority round-robin policy; and SCHED_DEADLINE a dynamic priority deadline policy. The semantics of each of these policies are detailed in sched(7). sched_attr::size must be set to the size of the structure, as in sizeof(struct sched_attr), if the provided structure is smaller than the kernel structure, any additional fields are assumed '0'. If the provided structure is larger than the kernel structure, the kernel verifies all additional fields are '0' if not the syscall will fail with -E2BIG. sched_attr::sched_policy the desired scheduling policy. sched_attr::sched_flags additional flags that can influence scheduling behaviour. Currently as per Linux kernel 3.14: SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy to: (struct sched_attr){ .sched_policy = SCHED_OTHER, } on fork(). is the only supported flag. sched_attr::sched_nice should only be set for SCHED_OTHER, SCHED_BATCH, the desired nice value [-20,19], see sched(7). sched_attr::sched_priority should only be set for SCHED_FIFO, SCHED_RR, the desired static priority [1,99], see sched(7). sched_attr::sched_runtime sched_attr::sched_deadline sched_attr::sched_period should only be set for SCHED_DEADLINE and are the traditional sporadic task model parameters, see sched(7). The flags argument should be 0. 
sched_getattr() queries the scheduling policy currently applied to the process identified by pid. Similar to sched_setattr(), sched_getattr() replaces sched_getscheduler(), sched_getparam() and some of getpriority(). If pid equals zero, the policy of the calling process will be retrieved. The size argument should reflect the size of struct sched_attr as known to userspace. The kernel fills out sched_attr::size to the size of its sched_attr structure. If the user provided structure is larger, additional fields are not touched. If the user provided structure is smaller, but the kernel needs to return values outside the provided space, the syscall will fail with -E2BIG. The flags argument should be 0. The other sched_attr fields are filled out as described in sched_setattr(). RETURN VALUE On success, sched_setattr() and sched_getattr() return 0. On error, -1 is returned, and errno is set appropriately. ERRORS EINVAL The scheduling policy is not one of the recognized policies, param is NULL, or param does not make sense for the selected policy. EPERM The calling process does not have appropriate privileges. ESRCH The process whose ID is pid could not be found. E2BIG The provided storage for struct sched_attr is either too big, see sched_setattr(), or too small, see sched_getattr(). EBUSY SCHED_DEADLINE admission control failure, see sched(7). NOTES While the text above (and in sched_setscheduler(2)) talks about processes, in actual fact these system calls are thread specific. > [2] A piece of text describing the SCHED_DEADLINE policy, which I can > drop into sched(7). SCHED_DEADLINE: Sporadic task model deadline scheduling SCHED_DEADLINE is an implementation of GEDF (Global Earliest Deadline First) with additional CBS (Constant Bandwidth Server). A sporadic task is on that has a sequence of jobs, where each job is activated at most once per period [us]. 
       Each job will have an absolute deadline relative to its activation
       before which it must finish its execution, and it shall at no time
       run longer than runtime [us] after its release.

              activation/wakeup       absolute deadline
              |        release        |
              v        v              v
       -------x--------x--------------x--------x-------
                       |<- Runtime -->|
              |<---------- Deadline ->|
              |<---------- Period ----------->|

       This gives: runtime <= (rel) deadline <= period.

       The CBS guarantees that tasks that over-run their specified
       runtime are throttled and do not affect the correct performance
       of other SCHED_DEADLINE tasks.

       In general a task set of such tasks it not feasible/schedulable
       within the given constraints. Therefore we must do an admittance
       test on setting/changing SCHED_DEADLINE policy/attributes.

       This admission test calculates that the task set is
       feasible/schedulable, failing this, sched_setattr() will return
       -EBUSY.

       For example, it is required (but not sufficient) for the total
       utilization to be less or equal to the total amount of cpu time
       available. That is, since each job can maximally run for runtime
       [us] per period [us], that task's utilization is runtime/period.
       Summing this over all tasks must be less than the total amount of
       CPUs present.

       SCHED_DEADLINE tasks will fail fork(2) with -EAGAIN.

       Because of the nature of (G)EDF, SCHED_DEADLINE tasks are the
       highest priority (user controllable) tasks in the system, if any
       SCHED_DEADLINE task is runnable it will preempt anything
       FIFO/RR/OTHER/BATCH/IDLE task out there.

       A SCHED_DEADLINE task calling sched_yield() will 'yield' the
       current job and wait for a new period to begin.
* Re: sched_{set,get}attr() manpage 2014-04-29 16:04 ` Peter Zijlstra @ 2014-04-30 11:09 ` Michael Kerrisk (man-pages) [not found] ` <5360D9E5.9080206-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 26+ messages in thread From: Michael Kerrisk (man-pages) @ 2014-04-30 11:09 UTC (permalink / raw) To: Peter Zijlstra Cc: mtk.manpages, Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt, Oleg Nesterov, fweisbec, darren, johan.eker, p.faure, Linux Kernel, claudio, michael, fchecconi, tommaso.cucinotta, juri.lelli, nicola.manica, luca.abeni, dhaval.giani, hgu1972, Paul McKenney, insop.song, liming.wang, jkacur, linux-man Hi Peter, Thanks for the revision. More comments below. Could you revise in the light of those comments, and hopefully also after feedback from Juri and Dario? On 04/29/2014 06:04 PM, Peter Zijlstra wrote: > On Tue, Apr 29, 2014 at 03:08:55PM +0200, Michael Kerrisk (man-pages) wrote: > > Juri, Dario, Can you have a look at the 2nd part; I'm not at all sure I > got the activate/release the right way around. > > My current thinking was that we activate first, and then release it to > go run. But googling the terms only confused me more. I suppose its one > of those things that's not actually _that_ well defined. And I hope the > ASCII art actually clarifies things better than the terms used. 
> >> [1] A page describing the sched_setattr() and sched_getattr() APIs > > NAME > sched_setattr, sched_getattr - set and get scheduling policy/attributes > > SYNOPSIS > #include <sched.h> > > struct sched_attr { > u32 size; > u32 sched_policy; > u64 sched_flags; > > /* SCHED_NORMAL, SCHED_BATCH */ > s32 sched_nice; > > /* SCHED_FIFO, SCHED_RR */ > u32 sched_priority; > > /* SCHED_DEADLINE */ > u64 sched_runtime; > u64 sched_deadline; > u64 sched_period; > }; > > int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags); > > int sched_getattr(pid_t pid, const struct sched_attr *attr, unsigned int size, unsigned int flags); > > DESCRIPTION > sched_setattr() sets both the scheduling policy and the > associated attributes for the process whose ID is specified in > pid. > > sched_setattr() replaces sched_setscheduler(), sched_setparam(), > nice() and some of setpriority(). > > If pid equals zero, the scheduling policy and attributes > of the calling process will be set. The interpretation of the > argument attr depends on the selected policy. Currently, Linux > supports the following "normal" (i.e., non-real-time) scheduling > policies: > > SCHED_OTHER the standard "fair" time-sharing policy; > > SCHED_BATCH for "batch" style execution of processes; and > > SCHED_IDLE for running very low priority background jobs. > > The following "real-time" policies are also supported, for > special time-critical applications that need precise control > over the way in which runnable processes are selected for > execution: > > SCHED_FIFO a static priority first-in, first-out policy; > > SCHED_RR a static priority round-robin policy; and > > SCHED_DEADLINE a dynamic priority deadline policy. > > The semantics of each of these policies are detailed in > sched(7). 
> > sched_attr::size must be set to the size of the structure, as in > sizeof(struct sched_attr), if the provided structure is smaller > than the kernel structure, any additional fields are assumed > '0'. If the provided structure is larger than the kernel > structure, the kernel verifies all additional fields are '0' if > not the syscall will fail with -E2BIG. > > sched_attr::sched_policy the desired scheduling policy. > > sched_attr::sched_flags additional flags that can influence > scheduling behaviour. Currently as per Linux kernel 3.14: > > SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy > to: (struct sched_attr){ .sched_policy = SCHED_OTHER, } > on fork(). > > is the only supported flag. > > sched_attr::sched_nice should only be set for SCHED_OTHER, > SCHED_BATCH, the desired nice value [-20,19], see sched(7). > > sched_attr::sched_priority should only be set for SCHED_FIFO, > SCHED_RR, the desired static priority [1,99], see sched(7). > > sched_attr::sched_runtime > sched_attr::sched_deadline > sched_attr::sched_period should only be set for SCHED_DEADLINE > and are the traditional sporadic task model parameters, see > sched(7). So, are there fields expressed in some unit (presumably microseconds)? Best to mention that here. > The flags argument should be 0. > > sched_getattr() queries the scheduling policy currently applied > to the process identified by pid. > > Similar to sched_setattr(), sched_getattr() replaces > sched_getscheduler(), sched_getparam() and some of > getpriority(). > > If pid equals zero, the policy of the calling process will be > retrieved. > > The size argument should reflect the size of struct sched_attr > as known to userspace. The kernel fills out sched_attr::size to > the size of its sched_attr structure. If the user provided > structure is larger, additional fields are not touched. 
If the > user provided structure is smaller, but the kernel needs to > return values outside the provided space, the syscall will fail > with -E2BIG. > > The flags argument should be 0. > > The other sched_attr fields are filled out as described in > sched_setattr(). > > RETURN VALUE > On success, sched_setattr() and sched_getattr() return 0. On > error, -1 is returned, and errno is set appropriately. > > ERRORS > EINVAL The scheduling policy is not one of the recognized policies, > param is NULL, or param does not make sense for the selected > policy. > > EPERM The calling process does not have appropriate privileges. > > ESRCH The process whose ID is pid could not be found. > > E2BIG The provided storage for struct sched_attr is either too > big, see sched_setattr(), or too small, see sched_getattr(). > > EBUSY SCHED_DEADLINE admission control failure, see sched(7). > > NOTES > While the text above (and in sched_setscheduler(2)) talks about > processes, in actual fact these system calls are thread specific. > >> [2] A piece of text describing the SCHED_DEADLINE policy, which I can >> drop into sched(7). > > SCHED_DEADLINE: Sporadic task model deadline scheduling > SCHED_DEADLINE is an implementation of GEDF (Global Earliest > Deadline First) with additional CBS (Constant Bandwidth Server). > > A sporadic task is on that has a sequence of jobs, where each job > is activated at most once per period [us]. Each job will have an > absolute deadline relative to its activation before which it must > finish its execution, and it shall at no time run longer > than runtime [us] after its release. > > activation/wakeup absolute deadline > | release | > v v v > -------x--------x--------------x--------x------- > |<- Runtime -->| > |<---------- Deadline ->| > |<---------- Period ----------->| > > This gives: runtime <= (rel) deadline <= period. So, the 'sched_deadline' field in the 'sched_attr' expresses the release deadline? 
(I had initially thought it was the "absolute deadline". Could you make this clearer in the text please. > The CBS guarantees that tasks that over-run their specified > runtime are throttled and do not affect the correct performance > of other SCHED_DEADLINE tasks. > > In general a task set of such tasks it not feasible/schedulable That last line is garbled. Could you fix, please. Also, could you add some words to explain what you mean by 'task set'. > within the given constraints. Therefore we must do an admittance > test on setting/changing SCHED_DEADLINE policy/attributes. > > This admission test calculates that the task set is > feasible/schedulable, failing this, sched_setattr() will return > -EBUSY. > > For example, it is required (but not sufficient) for the total > utilization to be less or equal to the total amount of cpu time > available. That is, since each job can maximally run for runtime > [us] per period [us], that task's utilization is runtime/period. > Summing this over all tasks must be less than the total amount of > CPUs present. > > SCHED_DEADLINE tasks will fail fork(2) with -EAGAIN. Except if SCHED_RESET_ON_FORK was set, right? If yes, that should be mentioned here. > Because of the nature of (G)EDF, SCHED_DEADLINE tasks are the > highest priority (user controllable) tasks in the system, if any > SCHED_DEADLINE task is runnable it will preempt anything > FIFO/RR/OTHER/BATCH/IDLE task out there. > > A SCHED_DEADLINE task calling sched_yield() will 'yield' the > current job and wait for a new period to begin. So, I'm trying to naively understand how this all works. If different processes specify different deadline periods, how does the kernel deal with that? Is it worth adding some detail on this point? Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: sched_{set,get}attr() manpage [not found] ` <5360D9E5.9080206-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2014-04-30 12:35 ` Peter Zijlstra 2014-04-30 13:09 ` Peter Zijlstra 1 sibling, 0 replies; 26+ messages in thread From: Peter Zijlstra @ 2014-04-30 12:35 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov, fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w, michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/, fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps, juri.lelli-Re5JQEeQqe8AvxtiuMwx3w, nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4, dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w, hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney, insop.song-Re5JQEeQqe8AvxtiuMwx3w, liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA, linux-man-u79uwXL29TY76Z2rM5mHXA On Wed, Apr 30, 2014 at 01:09:25PM +0200, Michael Kerrisk (man-pages) wrote: > Hi Peter, > > Thanks for the revision. More comments below. Could you revise in > the light of those comments, and hopefully also after feedback from > Juri and Dario? > > > > > sched_attr::sched_runtime > > sched_attr::sched_deadline > > sched_attr::sched_period should only be set for SCHED_DEADLINE > > and are the traditional sporadic task model parameters, see > > sched(7). > > So, are there fields expressed in some unit (presumably microseconds)? > Best to mention that here. Oh wait, no its nanoseconds. Which means I should amend the text below. > >> [2] A piece of text describing the SCHED_DEADLINE policy, which I can > >> drop into sched(7). > > > > SCHED_DEADLINE: Sporadic task model deadline scheduling > > SCHED_DEADLINE is an implementation of GEDF (Global Earliest > > Deadline First) with additional CBS (Constant Bandwidth Server). 
> > > > A sporadic task is on that has a sequence of jobs, where each job > > is activated at most once per period [us]. Each job will have an > > absolute deadline relative to its activation before which it must (A) > > finish its execution, and it shall at no time run longer > > than runtime [us] after its release. > > > > activation/wakeup absolute deadline > > | release | > > v v v > > -------x--------x--------------x--------x------- > > |<- Runtime -->| > > |<---------- Deadline ->| > > |<---------- Period ----------->| > > > > This gives: runtime <= (rel) deadline <= period. > > So, the 'sched_deadline' field in the 'sched_attr' expresses the release > deadline? (I had initially thought it was the "absolute deadline". > Could you make this clearer in the text please. No, and yes, sched_attr::sched_deadline is a relative deadline wrt to the activation. Like said at (A). So we get: absolute deadline = activation + relative deadline. And we must be done running at that point, so the very last possible release moment is: absolute deadline - runtime. And therefore, it too is a release deadline, since we must not release later than that. > > The CBS guarantees that tasks that over-run their specified > > runtime are throttled and do not affect the correct performance > > of other SCHED_DEADLINE tasks. > > > > In general a task set of such tasks it not feasible/schedulable > > That last line is garbled. Could you fix, please. s/it/is/ > Also, could you add some words to explain what you mean by 'task set'. A set of tasks? :-) In particular all tasks in the system of SCHED_DEADLINE, indicated by 'of such'. > > within the given constraints. Therefore we must do an admittance > > test on setting/changing SCHED_DEADLINE policy/attributes. > > > > This admission test calculates that the task set is > > feasible/schedulable, failing this, sched_setattr() will return > > -EBUSY. 
> > > > For example, it is required (but not sufficient) for the total > > utilization to be less or equal to the total amount of cpu time > > available. That is, since each job can maximally run for runtime > > [us] per period [us], that task's utilization is runtime/period. > > Summing this over all tasks must be less than the total amount of > > CPUs present. > > > > SCHED_DEADLINE tasks will fail fork(2) with -EAGAIN. > > Except if SCHED_RESET_ON_FORK was set, right? If yes, that should be > mentioned here. Ah, indeed. > > Because of the nature of (G)EDF, SCHED_DEADLINE tasks are the > > highest priority (user controllable) tasks in the system, if any > > SCHED_DEADLINE task is runnable it will preempt anything > > FIFO/RR/OTHER/BATCH/IDLE task out there. > > > > A SCHED_DEADLINE task calling sched_yield() will 'yield' the > > current job and wait for a new period to begin. > > So, I'm trying to naively understand how this all works. If different > processes specify different deadline periods, how does the kernel deal > with that? Is it worth adding some detail on this point? Userspace should not rely on any implementation details there. Saying its a (G)EDF scheduler is maybe already too much. All userspace should really care about is that its tasks _should_ be scheduled such that it meets the specified requirements. There are multiple scheduling algorithms that can be employed to make it so, and I don't want to pin us to whatever we chose to implement this time. That said, the current (G)EDF is a soft realtime scheduler in that it guarantees a bounded tardiness (which is the time we can miss the deadline by) but not a hard realtime, since the bound is not 0. Anyway, for your elucidation; assuming no overhead and a UP system (SMP is a right head-ache), and a further assumption that deadline == period. 
It is reasonably straightforward to see that scheduling the task with
the earliest deadline will satisfy the constraints IFF the total
utilization (\Sum runtime_i / deadline_i) <= 1.

Suppose two tasks: A := { 5, 10 } and B := { 10, 20 } with strict
periodic activation:

    A1,B1     A2        Ad2
    |         Ad1       Bd1
    v         v         v
  --AAAAABBBBBAAAAABBBBBx--
  --AAAAABBBBBBBBBBAAAAAx--

Where A# is the #th activation, Ad# is the corresponding #th deadline
before which we must have sufficient time.

Since we're perfectly synced up there is a tie and we get two possible
outcomes. But note that in either case A has gotten 2x its 5 As and B
has gotten its 10 Bs.

Non-periodic activation, and deadline != period make the thing more
interesting, but at that point I would ask Juri (or others) to refer you
to a paper/book.

Now, let me go update the texts yet again :-)
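The two-task example can be checked mechanically. What follows is a minimal sketch in C, not kernel code: struct demo_task, demo_utilization_ok() and demo_edf_simulate() are names made up for illustration. It assumes whole time units, strictly periodic activation, deadline == period, a horizon that is a multiple of every period, and that ties go to the lower-indexed task (which gives the first of the two outcomes drawn in the example).

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_TASKS 8

/* Hypothetical task description; deadline == period is assumed. */
struct demo_task {
	unsigned runtime;	/* time units needed per job */
	unsigned period;	/* activation period, must be > 0 */
};

/*
 * sum(runtime_i / period_i) <= 1, evaluated in integers: the total
 * demand over the horizon must not exceed the horizon itself.
 */
int demo_utilization_ok(const struct demo_task *t, size_t n, unsigned horizon)
{
	unsigned long demand = 0;

	for (size_t i = 0; i < n; i++)
		demand += (unsigned long)t[i].runtime * (horizon / t[i].period);
	return demand <= horizon;
}

/*
 * Simulate earliest-deadline-first on one CPU, one time unit per step.
 * trace receives one letter per unit ('A' + task index, '.' when idle)
 * and needs horizon + 1 bytes. Returns the number of missed deadlines;
 * a job that misses its deadline is simply dropped.
 */
int demo_edf_simulate(const struct demo_task *t, size_t n,
		      unsigned horizon, char *trace)
{
	unsigned left[DEMO_MAX_TASKS] = {0};
	unsigned deadline[DEMO_MAX_TASKS] = {0};
	int missed = 0;

	assert(n <= DEMO_MAX_TASKS);

	for (unsigned now = 0; now < horizon; now++) {
		size_t pick = n;	/* n means "nothing runnable" */

		for (size_t i = 0; i < n; i++) {
			if (now % t[i].period == 0) {	/* new activation */
				if (left[i] > 0)
					missed++;
				left[i] = t[i].runtime;
				deadline[i] = now + t[i].period;
			}
			/* strict '<' keeps the lower index on a tie */
			if (left[i] > 0 &&
			    (pick == n || deadline[i] < deadline[pick]))
				pick = i;
		}
		if (pick < n)
			left[pick]--;
		trace[now] = pick < n ? (char)('A' + pick) : '.';
	}

	/* every deadline falls on or before the horizon */
	for (size_t i = 0; i < n; i++)
		if (left[i] > 0)
			missed++;
	trace[horizon] = '\0';
	return missed;
}
```

With { 5, 10 } and { 10, 20 } over a horizon of 20 units, demo_utilization_ok() reports that the set fits and the simulation misses no deadline. Raising A's runtime to 6 pushes the utilization past 1, and the simulation then reports a miss.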
* Re: sched_{set,get}attr() manpage [not found] ` <5360D9E5.9080206-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 2014-04-30 12:35 ` Peter Zijlstra @ 2014-04-30 13:09 ` Peter Zijlstra [not found] ` <20140430130937.GH30445-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org> 1 sibling, 1 reply; 26+ messages in thread From: Peter Zijlstra @ 2014-04-30 13:09 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov, fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w, michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/, fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps, juri.lelli-Re5JQEeQqe8AvxtiuMwx3w, nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4, dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w, hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney, insop.song-Re5JQEeQqe8AvxtiuMwx3w, liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA, linux-man-u79uwXL29TY76Z2rM5mHXA On Wed, Apr 30, 2014 at 01:09:25PM +0200, Michael Kerrisk (man-pages) wrote: > Hi Peter, > > Thanks for the revision. More comments below. Could you revise in > the light of those comments, and hopefully also after feedback from > Juri and Dario? New text below; hopefully a little clearer. If not, do holler. 
--- > [1] A page describing the sched_setattr() and sched_getattr() APIs NAME sched_setattr, sched_getattr - set and get scheduling policy/attributes SYNOPSIS #include <sched.h> struct sched_attr { u32 size; u32 sched_policy; u64 sched_flags; /* SCHED_NORMAL, SCHED_BATCH */ s32 sched_nice; /* SCHED_FIFO, SCHED_RR */ u32 sched_priority; /* SCHED_DEADLINE */ u64 sched_runtime; u64 sched_deadline; u64 sched_period; }; int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags); int sched_getattr(pid_t pid, const struct sched_attr *attr, unsigned int size, unsigned int flags); DESCRIPTION sched_setattr() sets both the scheduling policy and the associated attributes for the process whose ID is specified in pid. sched_setattr() replaces sched_setscheduler(), sched_setparam(), nice() and some of setpriority(). If pid equals zero, the scheduling policy and attributes of the calling process will be set. The interpretation of the argument attr depends on the selected policy. Currently, Linux supports the following "normal" (i.e., non-real-time) scheduling policies: SCHED_OTHER the standard "fair" time-sharing policy; SCHED_BATCH for "batch" style execution of processes; and SCHED_IDLE for running very low priority background jobs. The following "real-time" policies are also supported, for special time-critical applications that need precise control over the way in which runnable processes are selected for execution: SCHED_FIFO a static priority first-in, first-out policy; SCHED_RR a static priority round-robin policy; and SCHED_DEADLINE a dynamic priority deadline policy. The semantics of each of these policies are detailed in sched(7). sched_attr::size must be set to the size of the structure, as in sizeof(struct sched_attr), if the provided structure is smaller than the kernel structure, any additional fields are assumed '0'. 
If the provided structure is larger than the kernel structure, the kernel verifies all additional fields are '0' if not the syscall will fail with -E2BIG. sched_attr::sched_policy the desired scheduling policy. sched_attr::sched_flags additional flags that can influence scheduling behaviour. Currently as per Linux kernel 3.14: SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy to: (struct sched_attr){ .sched_policy = SCHED_OTHER, } on fork(). is the only supported flag. sched_attr::sched_nice should only be set for SCHED_OTHER, SCHED_BATCH, the desired nice value [-20,19], see sched(7). sched_attr::sched_priority should only be set for SCHED_FIFO, SCHED_RR, the desired static priority [1,99], see sched(7). sched_attr::sched_runtime in nanoseconds, sched_attr::sched_deadline in nanoseconds, sched_attr::sched_period in nanoseconds, should only be set for SCHED_DEADLINE and are the traditional sporadic task model parameters, see sched(7). The flags argument should be 0. sched_getattr() queries the scheduling policy currently applied to the process identified by pid. Similar to sched_setattr(), sched_getattr() replaces sched_getscheduler(), sched_getparam() and some of getpriority(). If pid equals zero, the policy of the calling process will be retrieved. The size argument should reflect the size of struct sched_attr as known to userspace. The kernel fills out sched_attr::size to the size of its sched_attr structure. If the user provided structure is larger, additional fields are not touched. If the user provided structure is smaller, but the kernel needs to return values outside the provided space, the syscall will fail with -E2BIG. The flags argument should be 0. The other sched_attr fields are filled out as described in sched_setattr(). RETURN VALUE On success, sched_setattr() and sched_getattr() return 0. On error, -1 is returned, and errno is set appropriately. 
ERRORS
       EINVAL The scheduling policy is not one of the recognized policies,
              attr is NULL, or attr does not make sense for the selected
              policy.

       EPERM  The calling process does not have appropriate privileges.

       ESRCH  The process whose ID is pid could not be found.

       E2BIG  The provided storage for struct sched_attr is either too
              big, see sched_setattr(), or too small, see sched_getattr().

       EBUSY  SCHED_DEADLINE admission control failure, see sched(7).

NOTES
       While the text above (and in sched_setscheduler(2)) talks about
       processes, in actual fact these system calls are thread specific.

       While the SCHED_DEADLINE parameters are in nanoseconds, current
       kernels truncate the lower 10 bits and we get an effective
       microsecond resolution.

> [2] A piece of text describing the SCHED_DEADLINE policy, which I can
>     drop into sched(7).

SCHED_DEADLINE: Sporadic task model deadline scheduling
       SCHED_DEADLINE is currently implemented using GEDF (Global
       Earliest Deadline First) with additional CBS (Constant Bandwidth
       Server).

       A sporadic task is one that has a sequence of jobs, where each job
       is activated at most once per period [ns]. Each job will have an
       absolute deadline relative to its activation before which it must
       finish its execution, and it shall at no time run longer
       than runtime [ns] after its release.

              activation/wakeup       absolute deadline
              |        release        |
              v        v              v
       -------x--------x--------------x--------x-------
                       |<- Runtime -->|
              |<---------- Deadline ->|
              |<---------- Period ----------->|

       This gives: runtime <= (rel) deadline <= period.

       The CBS guarantees non-interference between tasks, by throttling
       tasks that attempt to over-run their specified runtime.

       In general the set of all SCHED_DEADLINE tasks is not
       feasible/schedulable within the given constraints. Therefore we
       must do an admittance test on setting/changing SCHED_DEADLINE
       policy/attributes.

       This admission test calculates that the task set is
       feasible/schedulable, failing this, sched_setattr() will return
       -EBUSY.
       For example, it is required (but not necessarily sufficient) for
       the total utilization to be less than or equal to the total number
       of CPUs available, where, since each task can maximally run for
       runtime [ns] per period [ns], that task's utilization is its
       runtime/period.

       Because we must be able to calculate admittance, SCHED_DEADLINE
       tasks are the highest priority (user controllable) tasks in the
       system; if any SCHED_DEADLINE task is runnable it will preempt any
       FIFO/RR/OTHER/BATCH/IDLE task.

       SCHED_DEADLINE tasks will fail fork(2) with -EAGAIN, except when
       the forking task has SCHED_FLAG_RESET_ON_FORK set.

       A SCHED_DEADLINE task calling sched_yield() will 'yield' the
       current job and wait for a new period to begin.
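As a hedged illustration of how the parameters described above fit together, the sketch below fills in a structure laid out like the one in the SYNOPSIS and hands it to the raw system call. struct demo_sched_attr, DEMO_SCHED_DEADLINE, demo_fill_deadline() and demo_sched_setattr() are hypothetical names chosen so they cannot clash with anything a C library may define. The code assumes the C library offers no wrapper and therefore goes through syscall(2). It checks only the ordering runtime <= deadline <= period given in the text, not everything the kernel verifies.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for syscall() under strict -std modes */
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Numeric value of SCHED_DEADLINE in the Linux UAPI headers. */
#define DEMO_SCHED_DEADLINE 6

/* Local mirror of the layout shown in the SYNOPSIS (hypothetical name). */
struct demo_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;		/* SCHED_OTHER, SCHED_BATCH */
	uint32_t sched_priority;	/* SCHED_FIFO, SCHED_RR */
	uint64_t sched_runtime;		/* SCHED_DEADLINE, nanoseconds */
	uint64_t sched_deadline;	/* relative deadline, nanoseconds */
	uint64_t sched_period;		/* nanoseconds */
};

/*
 * Prepare attr for SCHED_DEADLINE. Returns 0 on success, or -1 when
 * runtime <= deadline <= period does not hold.
 */
int demo_fill_deadline(struct demo_sched_attr *attr, uint64_t runtime_ns,
		       uint64_t deadline_ns, uint64_t period_ns)
{
	assert(attr != NULL);

	if (runtime_ns > deadline_ns || deadline_ns > period_ns)
		return -1;

	memset(attr, 0, sizeof(*attr));	/* unused fields must stay zero */
	attr->size = (uint32_t)sizeof(*attr);
	attr->sched_policy = DEMO_SCHED_DEADLINE;
	attr->sched_runtime = runtime_ns;
	attr->sched_deadline = deadline_ns;
	attr->sched_period = period_ns;
	return 0;
}

/* Thin wrapper around the raw system call; flags should be 0. */
int demo_sched_setattr(pid_t pid, const struct demo_sched_attr *attr,
		       unsigned int flags)
{
#ifdef SYS_sched_setattr
	return (int)syscall(SYS_sched_setattr, pid, attr, flags);
#else
	(void)pid;
	(void)attr;
	(void)flags;
	errno = ENOSYS;		/* headers predate the system call */
	return -1;
#endif
}
```

A caller might request 10 ms of runtime every 100 ms with a 30 ms relative deadline via demo_fill_deadline(&attr, 10000000, 30000000, 100000000) followed by demo_sched_setattr(0, &attr, 0). A return of -1 with errno set to EBUSY would then indicate that admission control refused the task.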
* Re: sched_{set,get}attr() manpage [not found] ` <20140430130937.GH30445-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org> @ 2014-05-03 10:43 ` Juri Lelli [not found] ` <20140503124355.5d927080518051ca507bc381-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 26+ messages in thread From: Juri Lelli @ 2014-05-03 10:43 UTC (permalink / raw) To: Peter Zijlstra Cc: Michael Kerrisk (man-pages), Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov, fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w, michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/, fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps, nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4, dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w, hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney, insop.song-Re5JQEeQqe8AvxtiuMwx3w, liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA, linux-man-u79uwXL29TY76Z2rM5mHXA Hi, sorry for the late reply, but I was travelling for work. On Wed, 30 Apr 2014 15:09:37 +0200 Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote: > On Wed, Apr 30, 2014 at 01:09:25PM +0200, Michael Kerrisk (man-pages) wrote: > > Hi Peter, > > > > Thanks for the revision. More comments below. Could you revise in > > the light of those comments, and hopefully also after feedback from > > Juri and Dario? > > New text below; hopefully a little clearer. If not, do holler. 
> > --- > > [1] A page describing the sched_setattr() and sched_getattr() APIs > > NAME > sched_setattr, sched_getattr - set and get scheduling policy/attributes > > SYNOPSIS > #include <sched.h> > > struct sched_attr { > u32 size; > u32 sched_policy; > u64 sched_flags; > > /* SCHED_NORMAL, SCHED_BATCH */ > s32 sched_nice; > > /* SCHED_FIFO, SCHED_RR */ > u32 sched_priority; > > /* SCHED_DEADLINE */ > u64 sched_runtime; > u64 sched_deadline; > u64 sched_period; > }; > > int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags); > > int sched_getattr(pid_t pid, const struct sched_attr *attr, unsigned int size, unsigned int flags); > > DESCRIPTION > sched_setattr() sets both the scheduling policy and the > associated attributes for the process whose ID is specified in > pid. > > sched_setattr() replaces sched_setscheduler(), sched_setparam(), > nice() and some of setpriority(). > > If pid equals zero, the scheduling policy and attributes > of the calling process will be set. The interpretation of the > argument attr depends on the selected policy. Currently, Linux > supports the following "normal" (i.e., non-real-time) scheduling > policies: > > SCHED_OTHER the standard "fair" time-sharing policy; > > SCHED_BATCH for "batch" style execution of processes; and > > SCHED_IDLE for running very low priority background jobs. > > The following "real-time" policies are also supported, for > special time-critical applications that need precise control > over the way in which runnable processes are selected for > execution: > > SCHED_FIFO a static priority first-in, first-out policy; > > SCHED_RR a static priority round-robin policy; and > > SCHED_DEADLINE a dynamic priority deadline policy. > > The semantics of each of these policies are detailed in > sched(7). 
> > sched_attr::size must be set to the size of the structure, as in > sizeof(struct sched_attr), if the provided structure is smaller > than the kernel structure, any additional fields are assumed > '0'. If the provided structure is larger than the kernel > structure, the kernel verifies all additional fields are '0' if > not the syscall will fail with -E2BIG. > > sched_attr::sched_policy the desired scheduling policy. > > sched_attr::sched_flags additional flags that can influence > scheduling behaviour. Currently as per Linux kernel 3.14: > > SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy > to: (struct sched_attr){ .sched_policy = SCHED_OTHER, } > on fork(). > > is the only supported flag. > > sched_attr::sched_nice should only be set for SCHED_OTHER, > SCHED_BATCH, the desired nice value [-20,19], see sched(7). > > sched_attr::sched_priority should only be set for SCHED_FIFO, > SCHED_RR, the desired static priority [1,99], see sched(7). > > sched_attr::sched_runtime in nanoseconds, > sched_attr::sched_deadline in nanoseconds, > sched_attr::sched_period in nanoseconds, should only be set for > SCHED_DEADLINE and are the traditional sporadic task model > parameters, see sched(7). > > The flags argument should be 0. > > sched_getattr() queries the scheduling policy currently applied > to the process identified by pid. > > Similar to sched_setattr(), sched_getattr() replaces > sched_getscheduler(), sched_getparam() and some of > getpriority(). > > If pid equals zero, the policy of the calling process will be > retrieved. > > The size argument should reflect the size of struct sched_attr > as known to userspace. The kernel fills out sched_attr::size to > the size of its sched_attr structure. If the user provided > structure is larger, additional fields are not touched. If the > user provided structure is smaller, but the kernel needs to > return values outside the provided space, the syscall will fail > with -E2BIG. > > The flags argument should be 0. 
> The other sched_attr fields are filled out as described in
> sched_setattr().
>
> RETURN VALUE
>     On success, sched_setattr() and sched_getattr() return 0. On
>     error, -1 is returned, and errno is set appropriately.
>
> ERRORS
>     EINVAL The scheduling policy is not one of the recognized policies,
>            param is NULL, or param does not make sense for the selected
>            policy.
>
>     EPERM  The calling process does not have appropriate privileges.
>
>     ESRCH  The process whose ID is pid could not be found.
>
>     E2BIG  The provided storage for struct sched_attr is either too
>            big, see sched_setattr(), or too small, see sched_getattr().
>
>     EBUSY  SCHED_DEADLINE admission control failure, see sched(7).
>
> NOTES
>     While the text above (and in sched_setscheduler(2)) talks about
>     processes, in actual fact these system calls are thread specific.
>
>     While the SCHED_DEADLINE parameters are in nanoseconds, current
>     kernels truncate the lower 10 bits and we get an effective
>     microsecond resolution.
>
> > [2] A piece of text describing the SCHED_DEADLINE policy, which I can
> > drop into sched(7).
>

I'd tweak the following a bit, just to be sure that users understand that
the model of task behavior is one thing and what you can set using
SCHED_DEADLINE is another. The two are obviously closely related, but
different settings can in principle be used to schedule the same task set
(there is a lot of literature about optimal settings and so on).

> SCHED_DEADLINE: Sporadic task model deadline scheduling
> SCHED_DEADLINE is currently implemented using GEDF (Global
> Earliest Deadline First) with additional CBS (Constant Bandwidth
> Server).
>
> A sporadic task is on that has a sequence of jobs, where each job
> is activated at most once per period [ns]. Each job will have an
> absolute deadline relative to its activation before which it must
> finish its execution, and it shall at no time run longer
> than runtime [ns] after its release.
>

A sporadic task is one that has a sequence of jobs, where each job is
activated at most once per period. Each job also has a relative
deadline, before which it should finish execution, and a computation
time, which is the time necessary for executing the job without
interruption. The instant of time when a task wakes up, because a new
job has to be executed, is called arrival time (and it is also referred
to as request time or release time). Start time is instead the time at
which a task starts its execution. The absolute deadline is thus
obtained by adding the relative deadline to the arrival time. The
following diagram clarifies these terms:

>    activation/wakeup      absolute deadline
>           |     release           |
>           v        v              v
>    -------x--------x--------------x--------x-------
>                    |<- Runtime -->|
>           |<---------- Deadline ->|
>           |<---------- Period ----------->|
>

    arrival/wakeup              absolute deadline
           |   start time               |
           v        v                   v
    -------x--------xoooooooooooo-------x--------x-----
                    |<- comp. ->|
           |<---------- rel. deadline ->|
           |<---------- period --------------->|

SCHED_DEADLINE allows the user to specify three parameters (see
sched_setattr(2)): Runtime [ns], Deadline [ns] and Period [ns]. These
parameters do not necessarily have to correspond to the aforementioned
terms; usual practice is to set Runtime to something bigger than
the average computation time (or worst-case execution time for hard
real-time tasks), Deadline to the relative deadline and Period to the
period of the task. With such a setting we would have:

    arrival/wakeup              absolute deadline
           |   start time               |
           v        v                   v
    -------x--------xoooooooooooo-------x--------x-----
                    |<- Runtime ->|
           |<---------- Deadline ------>|
           |<---------- Period --------------->|

> This gives: runtime <= (rel) deadline <= period.
>

It is checked that: Runtime <= Deadline <= Period.

> The CBS guarantees non-interference between tasks, by throttling
> tasks that attempt to over-run their specified runtime.
>

s/runtime/Runtime to be consistent.

> In general the set of all SCHED_DEADLINE tasks is not
> feasible/schedulable within the given constraints. Therefore we
> must do an admittance test on setting/changing SCHED_DEADLINE
> policy/attributes.
>

To guarantee some degree of timeliness we must do an admission test on
setting/changing SCHED_DEADLINE policy/attributes.

> This admission test calculates that the task set is
> feasible/schedulable, failing this, sched_setattr() will return
> -EBUSY.
>
> For example, it is required (but not necessarily sufficient) for
> the total utilization to be less or equal to the total amount of
> CPUs available, where, since each task can maximally run for
> runtime [us] per period [us], that task's utilization is its
> runtime/period.
>

CPUs available, where, since each task can maximally run for Runtime
per Period, that task's utilization is its Runtime/Period.

> Because we must be able to calculate admittance SCHED_DEADLINE
> tasks are the highest priority (user controllable) tasks in the
> system, if any SCHED_DEADLINE task is runnable it will preempt
> any FIFO/RR/OTHER/BATCH/IDLE task.
>
> SCHED_DEADLINE tasks will fail fork(2) with -EAGAIN, except when
> the forking task has SCHED_FLAG_RESET_ON_FORK set.
>
> A SCHED_DEADLINE task calling sched_yield() will 'yield' the
> current job and wait for a new period to begin.
>

Does it look any better?

Thanks,

- Juri
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 26+ messages in thread
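The task-model terms discussed in this message (arrival time, start time, computation time, relative and absolute deadline) can be illustrated with a small sketch. This is only an illustration of the definitions, not kernel code; the struct and function names are made up for the example, and the job is assumed to run without interruption once it starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one job of a sporadic task; times in ns. */
struct job {
	uint64_t arrival_ns;	/* arrival/wakeup (release) time */
	uint64_t start_ns;	/* time at which execution starts */
	uint64_t comp_ns;	/* computation time, uninterrupted */
};

/* Absolute deadline = arrival time + relative deadline. */
static inline uint64_t absolute_deadline(uint64_t arrival_ns,
					 uint64_t rel_deadline_ns)
{
	return arrival_ns + rel_deadline_ns;
}

/* Finishing time, assuming the job is never preempted (sketch only). */
static inline uint64_t finish_time(const struct job *j)
{
	assert(j != NULL);
	return j->start_ns + j->comp_ns;
}

/* A job meets its deadline if it finishes no later than the
 * absolute deadline derived from its arrival time. */
static inline bool meets_deadline(const struct job *j,
				  uint64_t rel_deadline_ns)
{
	return finish_time(j) <=
	       absolute_deadline(j->arrival_ns, rel_deadline_ns);
}
```

For instance, a job arriving at t=1000, starting at t=1200 and computing for 500 finishes at 1700, which is before the absolute deadline of 2000 obtained from a relative deadline of 1000.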
* Re: sched_{set,get}attr() manpage [not found] ` <20140503124355.5d927080518051ca507bc381-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2014-05-05 6:55 ` Michael Kerrisk (man-pages) 2014-05-05 7:21 ` Peter Zijlstra 0 siblings, 1 reply; 26+ messages in thread From: Michael Kerrisk (man-pages) @ 2014-05-05 6:55 UTC (permalink / raw) To: Peter Zijlstra Cc: Juri Lelli, mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov, fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w, michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/, fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps, nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4, dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w, hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney, insop.song-Re5JQEeQqe8AvxtiuMwx3w, liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA, linux-man-u79uwXL29TY76Z2rM5mHXA Hi Peter, Looks like a good set of comments from Juri. Could you revise and resubmit? By the way, I assume you are just writing this page as raw text. While I'd prefer to get proper man markup source, I'll add that if you don't :-/. But, in that case, I need to know the copyright and license you want to use. Please see https://www.kernel.org/doc/man-pages/licenses.html Cheers, Michael On 05/03/2014 12:43 PM, Juri Lelli wrote: > Hi, > > sorry for the late reply, but I was travelling for work. > > On Wed, 30 Apr 2014 15:09:37 +0200 > Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote: > >> On Wed, Apr 30, 2014 at 01:09:25PM +0200, Michael Kerrisk (man-pages) wrote: >>> Hi Peter, >>> >>> Thanks for the revision. More comments below. Could you revise in >>> the light of those comments, and hopefully also after feedback from >>> Juri and Dario? >> >> New text below; hopefully a little clearer. 
If not, do holler. -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: sched_{set,get}attr() manpage 2014-05-05 6:55 ` Michael Kerrisk (man-pages) @ 2014-05-05 7:21 ` Peter Zijlstra [not found] ` <20140505072114.GY11096-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org> 2014-05-06 8:16 ` Peter Zijlstra 0 siblings, 2 replies; 26+ messages in thread From: Peter Zijlstra @ 2014-05-05 7:21 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Juri Lelli, Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt, Oleg Nesterov, fweisbec, darren, johan.eker, p.faure, Linux Kernel, claudio, michael, fchecconi, tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani, hgu1972, Paul McKenney, insop.song, liming.wang, jkacur, linux-man [-- Attachment #1: Type: text/plain, Size: 942 bytes --] On Mon, May 05, 2014 at 08:55:28AM +0200, Michael Kerrisk (man-pages) wrote: > Hi Peter, > > Looks like a good set of comments from Juri. Could you revise and > resubmit? Yeah, I'll try and get it done today, but there's a few icky bugs waiting for my attention as well, I'll do me bestest :-) > By the way, I assume you are just writing this page as raw text. > While I'd prefer to get proper man markup source, I'll add that > if you if you don't :-/. Well, learning *roff will likely take me more time than writing this text + all revisions so far :/ But yeah, I appreciate the grief. Is there a TeX variant one could use to generate the *roff muck? While my TeX isn't entirely fresh its at least something I've done lots of. > But, in that case, I need to know the > copyright and license you want to use. Please see > https://www.kernel.org/doc/man-pages/licenses.html GPLv2 + DOC (not v2+) sounds good. [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: sched_{set,get}attr() manpage [not found] ` <20140505072114.GY11096-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org> @ 2014-05-05 7:41 ` Michael Kerrisk (man-pages) [not found] ` <53674094.2020307-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 26+ messages in thread From: Michael Kerrisk (man-pages) @ 2014-05-05 7:41 UTC (permalink / raw) To: Peter Zijlstra Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Juri Lelli, Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov, fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w, michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/, fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps, nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4, dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w, hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney, insop.song-Re5JQEeQqe8AvxtiuMwx3w, liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA, linux-man-u79uwXL29TY76Z2rM5mHXA On 05/05/2014 09:21 AM, Peter Zijlstra wrote: > On Mon, May 05, 2014 at 08:55:28AM +0200, Michael Kerrisk (man-pages) wrote: >> Hi Peter, >> >> Looks like a good set of comments from Juri. Could you revise and >> resubmit? > > Yeah, I'll try and get it done today, but there's a few icky bugs > waiting for my attention as well, I'll do me bestest :-) > >> By the way, I assume you are just writing this page as raw text. >> While I'd prefer to get proper man markup source, I'll add that >> if you if you don't :-/. > > Well, learning *roff will likely take me more time than writing this > text + all revisions so far :/ But yeah, I appreciate the grief. > > Is there a TeX variant one could use to generate the *roff muck? While > my TeX isn't entirely fresh its at least something I've done lots of. Don't worry -- just send me the plain text; I'll do it. 
I appreciate you writing the text in the first place; I'll handle the rest--it won't take me too long, and probably I'll find things to fix/check on the way. >> But, in that case, I need to know the >> copyright and license you want to use. Please see >> https://www.kernel.org/doc/man-pages/licenses.html > > GPLv2 + DOC (not v2+) sounds good. I'm a little unclear here. Do you or don't you mean https://www.kernel.org/doc/man-pages/licenses.html#gpl ? Note, I'd really prefer to stick to one of those licenses (without variants). (My personal preference is the "verbatim" license, which is the most widely used one.) There are already so many licenses in man-pages... Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ -- To unsubscribe from this list: send the line "unsubscribe linux-man" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: sched_{set,get}attr() manpage [not found] ` <53674094.2020307-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2014-05-05 7:47 ` Peter Zijlstra 2014-05-05 9:53 ` Michael Kerrisk (man-pages) 0 siblings, 1 reply; 26+ messages in thread From: Peter Zijlstra @ 2014-05-05 7:47 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Juri Lelli, Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov, fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA, johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ, Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w, michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/, fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps, nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4, dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w, hgu1972-Re5JQEeQqe8AvxtiuMwx3w, Paul McKenney, insop.song-Re5JQEeQqe8AvxtiuMwx3w, liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA, linux-man-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1: Type: text/plain, Size: 846 bytes --] On Mon, May 05, 2014 at 09:41:08AM +0200, Michael Kerrisk (man-pages) wrote: > >> But, in that case, I need to know the > >> copyright and license you want to use. Please see > >> https://www.kernel.org/doc/man-pages/licenses.html > > > > GPLv2 + DOC (not v2+) sounds good. > > I'm a little unclear here. Do you or don't you mean > https://www.kernel.org/doc/man-pages/licenses.html#gpl > ? A variant with out the +, just like I do my kernel code, no greater gpl versions. However, > (Note, I'd really prefer to stick to one of those licenses > (without variants). (My personal preference is the "verbatim" > license, which is the most widely used one.) There's already > do many licenses in in man-pages... Verbatim is OK with me I suppose. Its only text after all, who cares about that :-) /me runs for the hills. 
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: sched_{set,get}attr() manpage 2014-05-05 7:47 ` Peter Zijlstra @ 2014-05-05 9:53 ` Michael Kerrisk (man-pages) 0 siblings, 0 replies; 26+ messages in thread From: Michael Kerrisk (man-pages) @ 2014-05-05 9:53 UTC (permalink / raw) To: Peter Zijlstra Cc: mtk.manpages, Juri Lelli, Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt, Oleg Nesterov, fweisbec, darren, johan.eker, p.faure, Linux Kernel, claudio, michael, fchecconi, tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani, hgu1972, Paul McKenney, insop.song, liming.wang, jkacur, linux-man On 05/05/2014 09:47 AM, Peter Zijlstra wrote: > On Mon, May 05, 2014 at 09:41:08AM +0200, Michael Kerrisk (man-pages) wrote: >>>> But, in that case, I need to know the >>>> copyright and license you want to use. Please see >>>> https://www.kernel.org/doc/man-pages/licenses.html >>> >>> GPLv2 + DOC (not v2+) sounds good. >> >> I'm a little unclear here. Do you or don't you mean >> https://www.kernel.org/doc/man-pages/licenses.html#gpl >> ? > > A variant with out the +, just like I do my kernel code, no greater gpl > versions. However, > >> (Note, I'd really prefer to stick to one of those licenses >> (without variants). (My personal preference is the "verbatim" >> license, which is the most widely used one.) There's already >> do many licenses in in man-pages... > > Verbatim is OK with me I suppose. And don't neglect to mention who the copyright is to please. > Its only text after all, who cares > about that :-) > /me runs for the hills. Well, apparently you care, so thanks ;-) -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: sched_{set,get}attr() manpage 2014-05-05 7:21 ` Peter Zijlstra [not found] ` <20140505072114.GY11096-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org> @ 2014-05-06 8:16 ` Peter Zijlstra 2014-05-09 8:23 ` Michael Kerrisk (man-pages) 1 sibling, 1 reply; 26+ messages in thread From: Peter Zijlstra @ 2014-05-06 8:16 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Juri Lelli, Dario Faggioli, Thomas Gleixner, Ingo Molnar, rostedt, Oleg Nesterov, fweisbec, darren, johan.eker, p.faure, Linux Kernel, claudio, michael, fchecconi, tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani, hgu1972, Paul McKenney, insop.song, liming.wang, jkacur, linux-man [-- Attachment #1: Type: text/plain, Size: 9226 bytes --] On Mon, May 05, 2014 at 09:21:14AM +0200, Peter Zijlstra wrote: > On Mon, May 05, 2014 at 08:55:28AM +0200, Michael Kerrisk (man-pages) wrote: > > Hi Peter, > > > > Looks like a good set of comments from Juri. Could you revise and > > resubmit? > > Yeah, I'll try and get it done today, but there's a few icky bugs > waiting for my attention as well, I'll do me bestest :-) OK, not quite managed it yesterday, but here goes. So Verbatim license, for the first part to me and whoever I borrowed sched_setscheduler() bits from. For the second part to me and Juri. 
---
> [1] A page describing the sched_setattr() and sched_getattr() APIs

NAME
    sched_setattr, sched_getattr - set and get scheduling policy/attributes

SYNOPSIS
    #include <sched.h>

    struct sched_attr {
        u32 size;
        u32 sched_policy;
        u64 sched_flags;

        /* SCHED_NORMAL, SCHED_BATCH */
        s32 sched_nice;

        /* SCHED_FIFO, SCHED_RR */
        u32 sched_priority;

        /* SCHED_DEADLINE */
        u64 sched_runtime;
        u64 sched_deadline;
        u64 sched_period;
    };

    int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags);

    int sched_getattr(pid_t pid, struct sched_attr *attr, unsigned int size, unsigned int flags);

DESCRIPTION
    sched_setattr() sets both the scheduling policy and the
    associated attributes for the process whose ID is specified in
    pid.

    sched_setattr() replaces sched_setscheduler(), sched_setparam(),
    nice() and some of setpriority().

    If pid equals zero, the scheduling policy and attributes
    of the calling process will be set. The interpretation of the
    argument attr depends on the selected policy. Currently, Linux
    supports the following "normal" (i.e., non-real-time) scheduling
    policies:

    SCHED_OTHER     the standard "fair" time-sharing policy;

    SCHED_BATCH     for "batch" style execution of processes; and

    SCHED_IDLE      for running very low priority background jobs.

    The following "real-time" policies are also supported, for
    special time-critical applications that need precise control
    over the way in which runnable processes are selected for
    execution:

    SCHED_FIFO      a static priority first-in, first-out policy;

    SCHED_RR        a static priority round-robin policy; and

    SCHED_DEADLINE  a dynamic priority deadline policy.

    The semantics of each of these policies are detailed in
    sched(7).

    sched_attr::size must be set to the size of the structure, as in
    sizeof(struct sched_attr). If the provided structure is smaller
    than the kernel structure, any additional fields are assumed to
    be '0'. If the provided structure is larger than the kernel
    structure, the kernel verifies that all additional fields are
    '0'; if not, the syscall will fail with -E2BIG.

    sched_attr::sched_policy the desired scheduling policy.

    sched_attr::sched_flags additional flags that can influence
    scheduling behaviour. As of Linux kernel 3.14 the only
    supported flag is:

        SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
        to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
        on fork().

    sched_attr::sched_nice should only be set for SCHED_OTHER and
    SCHED_BATCH; the desired nice value [-20,19], see sched(7).

    sched_attr::sched_priority should only be set for SCHED_FIFO and
    SCHED_RR; the desired static priority [1,99], see sched(7).

    sched_attr::sched_runtime in nanoseconds,
    sched_attr::sched_deadline in nanoseconds,
    sched_attr::sched_period in nanoseconds, should only be set for
    SCHED_DEADLINE and are the traditional sporadic task model
    parameters, see sched(7).

    The flags argument should be 0.

    sched_getattr() queries the scheduling policy currently applied
    to the process identified by pid.

    Similar to sched_setattr(), sched_getattr() replaces
    sched_getscheduler(), sched_getparam() and some of
    getpriority().

    If pid equals zero, the policy of the calling process will be
    retrieved.

    The size argument should reflect the size of struct sched_attr
    as known to userspace. The kernel fills out sched_attr::size to
    the size of its sched_attr structure. If the user provided
    structure is larger, additional fields are not touched. If the
    user provided structure is smaller, but the kernel needs to
    return values outside the provided space, the syscall will fail
    with -E2BIG.

    The flags argument should be 0.

    The other sched_attr fields are filled out as described in
    sched_setattr().

RETURN VALUE
    On success, sched_setattr() and sched_getattr() return 0. On
    error, -1 is returned, and errno is set appropriately.
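A minimal sketch of how userspace might invoke the two calls, assuming the C library offers no wrapper and syscall(2) has to be used directly. The structure name sched_attr_v0 and the helper names are invented for this example (the layout is copied from the SYNOPSIS above), and SYS_sched_setattr / SYS_sched_getattr are assumed to be provided by <sys/syscall.h>, which requires kernel headers that know about these system calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Local copy of the structure layout; the name is hypothetical and
 * chosen so that it cannot clash with a definition in system headers. */
struct sched_attr_v0 {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;		/* SCHED_OTHER, SCHED_BATCH */
	uint32_t sched_priority;	/* SCHED_FIFO, SCHED_RR */
	uint64_t sched_runtime;		/* SCHED_DEADLINE, in ns */
	uint64_t sched_deadline;
	uint64_t sched_period;
};

/* Zero every field and record the structure size, as the text requires. */
static inline void attr_init(struct sched_attr_v0 *attr)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
}

/* Thin wrappers around the raw system calls; return 0 or -1 with errno set. */
static inline int my_sched_setattr(pid_t pid, const struct sched_attr_v0 *attr,
				   unsigned int flags)
{
	return (int)syscall(SYS_sched_setattr, pid, attr, flags);
}

static inline int my_sched_getattr(pid_t pid, struct sched_attr_v0 *attr,
				   unsigned int size, unsigned int flags)
{
	return (int)syscall(SYS_sched_getattr, pid, attr, size, flags);
}
```

A caller would typically run attr_init() on the structure, fill in sched_policy and the fields relevant to that policy, pass 0 for flags, and check for a -1 return together with errno.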

ERRORS
    EINVAL The scheduling policy is not one of the recognized policies,
           attr is NULL, or attr does not make sense for the selected
           policy.

    EPERM  The calling process does not have appropriate privileges.

    ESRCH  The process whose ID is pid could not be found.

    E2BIG  The provided storage for struct sched_attr is either too
           big, see sched_setattr(), or too small, see sched_getattr().

    EBUSY  SCHED_DEADLINE admission control failure, see sched(7).

NOTES
    While the text above (and in sched_setscheduler(2)) talks about
    processes, in actual fact these system calls are thread specific.

    While the SCHED_DEADLINE parameters are in nanoseconds, current
    kernels truncate the lower 10 bits and we get an effective
    microsecond resolution.

> [2] A piece of text describing the SCHED_DEADLINE policy, which I can
> drop into sched(7).

SCHED_DEADLINE: Sporadic task model deadline scheduling
    SCHED_DEADLINE is currently implemented using GEDF (Global
    Earliest Deadline First) with additional CBS (Constant Bandwidth
    Server).

    A sporadic task is one that has a sequence of jobs, where each
    job is activated at most once per period. Each job also has a
    relative deadline, before which it should finish execution, and a
    computation time, that is, the time necessary for executing the
    job without interruption. The instant of time when a task wakes
    up, because a new job has to be executed, is called the arrival
    time (also referred to as the request time or release time).
    The start time is instead the time at which a task starts its
    execution. The absolute deadline is thus obtained by adding the
    relative deadline to the arrival time.

    The following diagram clarifies these terms:

    arrival/wakeup              absolute deadline
           |    start time              |
           v        v                   v
    -------x--------xoooooooooooo-------x--------x-----
                    |<- comp. ->|
           |<---------- rel. deadline ->|
           |<---------- period ----------------->|

    SCHED_DEADLINE allows the user to specify three parameters (see
    sched_setattr(2)): Runtime [ns], Deadline [ns] and Period [ns].
    These parameters do not necessarily have to correspond to the
    aforementioned terms; the usual practice is to set Runtime to
    something bigger than the average computation time (or the
    worst-case execution time for hard real-time tasks), Deadline to
    the relative deadline and Period to the period of the task. With
    such a setting we would have:

    arrival/wakeup              absolute deadline
           |    start time              |
           v        v                   v
    -------x--------xoooooooooooo-------x--------x-----
                    |<- Runtime -->|
           |<---------- Deadline ------>|
           |<---------- Period ----------------->|

    It is checked that: Runtime <= Deadline <= Period.

    The CBS guarantees non-interference between tasks, by throttling
    tasks that attempt to over-run their specified Runtime.

    In general, an arbitrary set of SCHED_DEADLINE tasks is not
    guaranteed to be feasible/schedulable within the given
    constraints. To guarantee some degree of timeliness we must do an
    admission test on setting/changing SCHED_DEADLINE
    policy/attributes.

    This admission test checks whether the task set is
    feasible/schedulable; failing this, sched_setattr() will return
    -EBUSY.

    For example, it is required (but not necessarily sufficient) for
    the total utilization to be less than or equal to the total
    number of CPUs available, where, since each task can maximally
    run for Runtime per Period, that task's utilization is its
    Runtime/Period.

    Because we must be able to calculate admission, SCHED_DEADLINE
    tasks are the highest priority (user controllable) tasks in the
    system; if any SCHED_DEADLINE task is runnable it will preempt
    any FIFO/RR/OTHER/BATCH/IDLE task.

    SCHED_DEADLINE tasks will fail fork(2) with -EAGAIN, except when
    the forking task has SCHED_FLAG_RESET_ON_FORK set.

    A SCHED_DEADLINE task calling sched_yield() will 'yield' the
    current job and wait for a new period to begin.
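As an illustration of the necessary condition stated in the SCHED_DEADLINE
text above (total utilization no greater than the number of CPUs, with each
task contributing Runtime/Period), here is a small userspace sketch. It is
not the kernel's admission control; the names dl_params, dl_params_valid(),
dl_total_utilization() and dl_passes_necessary_test() are hypothetical, and
rejecting a zero Period is an assumption added only to avoid dividing by
zero.

```c
#include <stddef.h>
#include <stdint.h>

/* The three SCHED_DEADLINE parameters, all in nanoseconds. */
struct dl_params {
    uint64_t runtime_ns;
    uint64_t deadline_ns;
    uint64_t period_ns;
};

/* Runtime <= Deadline <= Period, as required by the text.
 * Assumption: a zero period is rejected so that the utilization
 * below is well defined. */
static int dl_params_valid(const struct dl_params *p)
{
    return p->period_ns > 0 &&
           p->runtime_ns <= p->deadline_ns &&
           p->deadline_ns <= p->period_ns;
}

/* Sum of Runtime/Period over all tasks. */
static double dl_total_utilization(const struct dl_params *tasks, size_t n)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++)
        total += (double)tasks[i].runtime_ns / (double)tasks[i].period_ns;
    return total;
}

/* Necessary (but not necessarily sufficient) condition:
 * every task is well formed and total utilization <= number of CPUs.
 * Returns 1 if the condition holds, 0 otherwise. */
static int dl_passes_necessary_test(const struct dl_params *tasks, size_t n,
                                    unsigned int ncpus)
{
    for (size_t i = 0; i < n; i++)
        if (!dl_params_valid(&tasks[i]))
            return 0;
    return dl_total_utilization(tasks, n) <= (double)ncpus;
}
```

For instance, two tasks with Runtime/Period of 10ms/100ms and 50ms/100ms
have a total utilization of 0.6 and pass on a single CPU; adding a third
task with 60ms/100ms raises the total to 1.2, which fails on one CPU but
passes on two. A task set that passes this check may still be rejected by
the kernel, since the condition is only necessary.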

* Re: sched_{set,get}attr() manpage
  2014-05-06  8:16 ` Peter Zijlstra
@ 2014-05-09  8:23 ` Michael Kerrisk (man-pages)
       [not found] ` <536C907A.1040205-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-09 8:23 UTC
To: Peter Zijlstra
Cc: mtk.manpages, Juri Lelli, Dario Faggioli, Thomas Gleixner,
    Ingo Molnar, rostedt, Oleg Nesterov, fweisbec, darren, johan.eker,
    p.faure, Linux Kernel, claudio, michael, fchecconi,
    tommaso.cucinotta, nicola.manica, luca.abeni, dhaval.giani,
    hgu1972, Paul McKenney, insop.song, liming.wang, jkacur, linux-man

Hi Peter,

I'm working on this text. I see the following in kernel/sched/core.c:

[[
static int __sched_setscheduler(struct task_struct *p,
                                const struct sched_attr *attr,
                                bool user)
{
...

        int policy = attr->sched_policy;
...
        if (policy < 0) {
                reset_on_fork = p->sched_reset_on_fork;
                policy = oldpolicy = p->policy;
]]

What's a negative policy about? Is this something that should
be documented?

Cheers,

Michael

On 05/06/2014 10:16 AM, Peter Zijlstra wrote:
> On Mon, May 05, 2014 at 09:21:14AM +0200, Peter Zijlstra wrote:
>> On Mon, May 05, 2014 at 08:55:28AM +0200, Michael Kerrisk (man-pages) wrote:
>>> Hi Peter,
>>>
>>> Looks like a good set of comments from Juri. Could you revise and
>>> resubmit?
>>
>> Yeah, I'll try and get it done today, but there's a few icky bugs
>> waiting for my attention as well, I'll do me bestest :-)
>
> OK, not quite managed it yesterday, but here goes.
>
> So Verbatim license, for the first part to me and whoever I borrowed
> sched_setscheduler() bits from.
>
> For the second part to me and Juri.

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
[parent not found: <536C907A.1040205-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]

* Re: sched_{set,get}attr() manpage
       [not found] ` <536C907A.1040205-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-05-09  8:53 ` Peter Zijlstra
  2014-05-09  9:26 ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-05-09 8:53 UTC
To: Michael Kerrisk (man-pages)
Cc: Juri Lelli, Dario Faggioli, Thomas Gleixner, Ingo Molnar,
    rostedt-nx8X9YLhiw1AfugRpC6u6w, Oleg Nesterov,
    fweisbec-Re5JQEeQqe8AvxtiuMwx3w, darren-P76s1CtE8BHQT0dZR+AlfA,
    johan.eker-IzeFyvvaP7pWk0Htik3J/w, p.faure-et3tyl94nDNyDzI6CaY1VQ,
    Linux Kernel, claudio-YOzL5CV4y4YG1A2ADO40+w,
    michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
    fchecconi-Re5JQEeQqe8AvxtiuMwx3w, tommaso.cucinotta-gAmJrWFzCps,
    nicola.manica-+cHZLFJ93xAO91npARCAeA, luca.abeni-3IIOeSMMxS4,
    dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w, hgu1972-Re5JQEeQqe8AvxtiuMwx3w,
    Paul McKenney, insop.song-Re5JQEeQqe8AvxtiuMwx3w,
    liming.wang-CWA4WttNNZF54TAoqtyWWQ, jkacur-H+wXaHxf7aLQT0dZR+AlfA,
    linux-man-u79uwXL29TY76Z2rM5mHXA

On Fri, May 09, 2014 at 10:23:22AM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Peter,
>
> I'm working on this text. I see the following in kernel/sched/core.c:
>
> [[
> static int __sched_setscheduler(struct task_struct *p,
>                                 const struct sched_attr *attr,
>                                 bool user)
> {
> ...
>
>         int policy = attr->sched_policy;
> ...
>         if (policy < 0) {
>                 reset_on_fork = p->sched_reset_on_fork;
>                 policy = oldpolicy = p->policy;
> ]]
>
> What's a negative policy about? Is this something that should
> be documented?

That's for sched_setparam(), which internally passes policy = -1, it
wasn't meant to be user visible, lemme double check that.

sys_sched_setscheduler() -- explicit check for policy < 0
sys_sched_setparam() -- explicitly passes policy=-1, not user visible
sys_sched_setattr() -- hmm, it looks like fail

---
Subject: sched: Disallow sched_attr::sched_policy < 0
From: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Date: Fri May 9 10:49:03 CEST 2014

The scheduler uses policy=-1 to preserve the current policy state to
implement sys_sched_setparam(), this got exposed to userspace by
accident through sys_sched_setattr(), cure this.

Reported-by: Michael Kerrisk <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Signed-off-by: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Link: http://lkml.kernel.org/n/tip-b4kbwz2qh21xlngdzje00t55-Ckxz5ZWcFp/9qxiX1TGQuw@public.gmane.org
---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3711,6 +3711,9 @@ SYSCALL_DEFINE3(sched_setattr, pid_t, pi
 	if (sched_copy_attr(uattr, &attr))
 		return -EFAULT;
 
+	if (attr.sched_policy < 0)
+		return -EINVAL;
+
 	rcu_read_lock();
 	retval = -ESRCH;
 	p = find_process_by_pid(pid);

* Re: sched_{set,get}attr() manpage
  2014-05-09  8:53 ` Peter Zijlstra
@ 2014-05-09  9:26 ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-09 9:26 UTC
To: Peter Zijlstra
Cc: Juri Lelli, Dario Faggioli, Thomas Gleixner, Ingo Molnar,
    Steven Rostedt, Oleg Nesterov, Frédéric Weisbecker, Darren Hart,
    johan.eker, p.faure, Linux Kernel, Claudio Scordino,
    Michael Trimarchi, Fabio Checconi, Tommaso Cucinotta,
    nicola.manica, luca.abeni, Dhaval Giani, hgu1972, Paul McKenney,
    Insop Song, liming.wang, jkacur, linux-man

Hi Peter,

On Fri, May 9, 2014 at 10:53 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, May 09, 2014 at 10:23:22AM +0200, Michael Kerrisk (man-pages) wrote:
>> Hi Peter,
>>
>> I'm working on this text. I see the following in kernel/sched/core.c:
>>
>> [[
>> static int __sched_setscheduler(struct task_struct *p,
>>                                 const struct sched_attr *attr,
>>                                 bool user)
>> {
>> ...
>>
>>         int policy = attr->sched_policy;
>> ...
>>         if (policy < 0) {
>>                 reset_on_fork = p->sched_reset_on_fork;
>>                 policy = oldpolicy = p->policy;
>> ]]
>>
>> What's a negative policy about? Is this something that should
>> be documented?
>
> That's for sched_setparam(), which internally passes policy = -1, it
> wasn't meant to be user visible, lemme double check that.
>
> sys_sched_setscheduler() -- explicit check for policy < 0
> sys_sched_setparam() -- explicitly passes policy=-1, not user visible

(Ahh -- I missed that piece in sys_sched_setparam())

> sys_sched_setattr() -- hmm, it looks like fail

Yep, I was seeing that there was no check in sched_setattr(). As I
recently said, when it comes to writing a man page, show me a new
interface, and I'll show you a bug ;-).

Thanks for the clarification.

Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>

Cheers,

Michael

> ---
> Subject: sched: Disallow sched_attr::sched_policy < 0
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Fri May 9 10:49:03 CEST 2014
>
> The scheduler uses policy=-1 to preserve the current policy state to
> implement sys_sched_setparam(), this got exposed to userspace by
> accident through sys_sched_setattr(), cure this.
>
> Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Link: http://lkml.kernel.org/n/tip-b4kbwz2qh21xlngdzje00t55@git.kernel.org
> ---
>  kernel/sched/core.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3711,6 +3711,9 @@ SYSCALL_DEFINE3(sched_setattr, pid_t, pi
>  	if (sched_copy_attr(uattr, &attr))
>  		return -EFAULT;
>
> +	if (attr.sched_policy < 0)
> +		return -EINVAL;
> +
>  	rcu_read_lock();
>  	retval = -ESRCH;
>  	p = find_process_by_pid(pid);

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

Thread overview: 26+ messages
[not found] <20131217122720.950475833@infradead.org>
[not found] ` <20131217123352.692059839@infradead.org>
[not found] ` <CAHO5Pa3=+Zhg72tVfddSUvgirUyObir6atJVo4_16bVWB2Osgw@mail.gmail.com>
[not found] ` <20140121153851.GZ31570@twins.programming.kicks-ass.net>
[not found] ` <CAKgNAkgw+U44SH0wd_06ZMXaCC9nCX4NZxZHkMKUdC7E7YxBhQ@mail.gmail.com>
[not found] ` <20140214161929.GL27965@twins.programming.kicks-ass.net>
[not found] ` <53020C9D.1050208@gmail.com>
[not found] ` <53020C9D.1050208-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-04-09 9:25 ` sched_{set,get}attr() manpage Peter Zijlstra
[not found] ` <20140409092510.GQ11096-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2014-04-09 15:19 ` Henrik Austad
[not found] ` <20140409151911.GA4041-RT+80VE2nyv1P9xLtpHBDw@public.gmane.org>
2014-04-09 15:42 ` Peter Zijlstra
[not found] ` <20140409154204.GD10526-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2014-04-10 7:47 ` Juri Lelli
2014-04-10 9:59 ` Claudio Scordino
2014-04-27 15:47 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAki5BkOyckf1zxJCRs2tq-eG9bWW_yRGi3hDynz12wz+QQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-04-27 19:34 ` Peter Zijlstra
2014-04-27 19:45 ` Steven Rostedt
[not found] ` <20140427193449.GB17778-RM5+C6weyIYnLiPH7yDmwOa11wxjtiyuLtmvbW2Dspo@public.gmane.org>
2014-04-28 7:39 ` Juri Lelli
2014-04-28 8:18 ` Peter Zijlstra
[not found] ` <20140428081858.GX13658-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2014-04-29 13:08 ` Michael Kerrisk (man-pages)
[not found] ` <535FA467.2070403-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-04-29 14:22 ` Peter Zijlstra
2014-04-29 16:04 ` Peter Zijlstra
2014-04-30 11:09 ` Michael Kerrisk (man-pages)
[not found] ` <5360D9E5.9080206-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-04-30 12:35 ` Peter Zijlstra
2014-04-30 13:09 ` Peter Zijlstra
[not found] ` <20140430130937.GH30445-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2014-05-03 10:43 ` Juri Lelli
[not found] ` <20140503124355.5d927080518051ca507bc381-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-05-05 6:55 ` Michael Kerrisk (man-pages)
2014-05-05 7:21 ` Peter Zijlstra
[not found] ` <20140505072114.GY11096-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2014-05-05 7:41 ` Michael Kerrisk (man-pages)
[not found] ` <53674094.2020307-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-05-05 7:47 ` Peter Zijlstra
2014-05-05 9:53 ` Michael Kerrisk (man-pages)
2014-05-06 8:16 ` Peter Zijlstra
2014-05-09 8:23 ` Michael Kerrisk (man-pages)
[not found] ` <536C907A.1040205-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-05-09 8:53 ` Peter Zijlstra
2014-05-09 9:26 ` Michael Kerrisk (man-pages)