From mboxrd@z Thu Jan 1 00:00:00 1970
From: Henrik Austad
Subject: Re: sched_{set,get}attr() manpage
Date: Wed, 9 Apr 2014 17:19:11 +0200
Message-ID: <20140409151911.GA4041@austad.us>
References: <20131217122720.950475833@infradead.org>
 <20131217123352.692059839@infradead.org>
 <20140121153851.GZ31570@twins.programming.kicks-ass.net>
 <20140214161929.GL27965@twins.programming.kicks-ass.net>
 <53020C9D.1050208@gmail.com>
 <20140409092510.GQ11096@twins.programming.kicks-ass.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
Content-Disposition: inline
In-Reply-To: <20140409092510.GQ11096-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
Sender: linux-man-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Peter Zijlstra
Cc: "Michael Kerrisk (man-pages)", Dario Faggioli, Thomas Gleixner,
 Ingo Molnar, rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org,
 Oleg Nesterov, fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
 darren-P76s1CtE8BHQT0dZR+AlfA@public.gmane.org,
 johan.eker-IzeFyvvaP7pWk0Htik3J/w@public.gmane.org,
 p.faure-et3tyl94nDNyDzI6CaY1VQ@public.gmane.org, Linux Kernel,
 claudio-YOzL5CV4y4YG1A2ADO40+w@public.gmane.org,
 michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/@public.gmane.org,
 fchecconi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
 tommaso.cucinotta-gAmJrWFzCps@public.gmane.org,
 juri.lelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
 nicola.manica-+cHZLFJ93xAO91npARCAeA@public.gmane.org,
 luca.abeni-3IIOeSMMxS4@public.gmane.org,
 dhaval.giani-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
 hgu1972-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, Paul McKenney,
 insop.song-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
 liming.wang-CWA4WttNNZF54TAoqtyWWQ@public.gmane.org,
 jkacur-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
 linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-Id: linux-man@vger.kernel.org

On Wed, Apr 09, 2014 at 11:25:10AM +0200, Peter Zijlstra wrote:
> On Mon, Feb 17, 2014 at 02:20:29PM +0100, Michael
> Kerrisk (man-pages) wrote:
> > If you could take another pass through your existing text, to
> > incorporate the new flags stuff, and then send a page to me +
> > linux-man@ that would be great.
>
>
> Sorry, this slipped my mind. An updated version below. Heavy borrowing
> from SCHED_SETSCHEDULER(2) as before.
>
> ---
>
> NAME
>        sched_setattr, sched_getattr - set and get scheduling
>        policy/attributes
>
> SYNOPSIS
>        #include
>
>        struct sched_attr {
>                u32 size;
>                u32 sched_policy;
>                u64 sched_flags;
>
>                /* SCHED_NORMAL, SCHED_BATCH */
>                s32 sched_nice;
>                /* SCHED_FIFO, SCHED_RR */
>                u32 sched_priority;
>                /* SCHED_DEADLINE */
>                u64 sched_runtime;
>                u64 sched_deadline;
>                u64 sched_period;
>        };
>        int sched_setattr(pid_t pid, const struct sched_attr *attr,
>                          unsigned int flags);
>
>        int sched_getattr(pid_t pid, const struct sched_attr *attr,
>                          unsigned int size, unsigned int flags);
>
> DESCRIPTION
>        sched_setattr() sets both the scheduling policy and the
>        associated attributes for the process whose ID is specified in
>        pid. If pid equals zero, the scheduling policy and attributes
>        of the calling process will be set. The interpretation of the
>        argument attr depends on the selected policy. Currently, Linux
>        supports the following "normal" (i.e., non-real-time) scheduling
>        policies:
>
>        SCHED_OTHER     the standard "fair" time-sharing policy;
>
>        SCHED_BATCH     for "batch" style execution of processes; and
>
>        SCHED_IDLE      for running very low priority background jobs.
>
>        The following "real-time" policies are also supported, for

why the "'s?

>        special time-critical applications that need precise control
>        over the way in which runnable processes are selected for
>        execution:
>
>        SCHED_FIFO      a first-in, first-out policy;
>
>        SCHED_RR        a round-robin policy; and
>
>        SCHED_DEADLINE  a deadline policy.
>
>        The semantics of each of these policies are detailed below.
>
>        sched_attr::size must be set to the size of the structure, as in
>        sizeof(struct sched_attr). If the provided structure is smaller
>        than the kernel structure, any additional fields are assumed
>        '0'. If the provided structure is larger than the kernel
>        structure, the kernel verifies all additional fields are '0';
>        if not, the syscall will fail with -E2BIG.
>
>        sched_attr::sched_policy the desired scheduling policy.
>
>        sched_attr::sched_flags additional flags that can influence
>        scheduling behaviour. Currently as per Linux kernel 3.14:
>
>                SCHED_FLAG_RESET_ON_FORK - resets the scheduling policy
>                to: (struct sched_attr){ .sched_policy = SCHED_OTHER, }
>                on fork().
>
>        is the only supported flag.
>
>        sched_attr::sched_nice should only be set for SCHED_OTHER,
>        SCHED_BATCH, the desired nice value [-20,19], see NICE(2).
>
>        sched_attr::sched_priority should only be set for SCHED_FIFO,
>        SCHED_RR, the desired static priority [1,99].
>
>        sched_attr::sched_runtime
>        sched_attr::sched_deadline
>        sched_attr::sched_period should only be set for SCHED_DEADLINE
>        and are the traditional sporadic task model parameters.
>
>        The flags argument should be 0.
>
>        sched_getattr() queries the scheduling policy currently applied
>        to the process identified by pid. If pid equals zero, the
>        policy of the calling process will be retrieved.
>
>        The size argument should reflect the size of struct sched_attr
>        as known to userspace. The kernel fills out sched_attr::size to
>        the size of its sched_attr structure. If the user provided
>        structure is larger, additional fields are not touched. If the
>        user provided structure is smaller, but the kernel needs to
>        return values outside the provided space, the syscall will fail
>        with -E2BIG.
>
>        The flags argument should be 0.

What about SCHED_FLAG_RESET_ON_FORK?

>        The other sched_attr fields are filled out as described in
>        sched_setattr().
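To make the sched_attr::size rule quoted above concrete, here is a minimal user-space sketch of how I read it. The function name copy_attr_model() and the byte-wise comparison are assumptions made for illustration only; this is not the kernel's code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the size rule from the draft text:
 *  - caller structure smaller than the kernel's: missing fields read as 0;
 *  - caller structure larger: all extra bytes must be 0, else -E2BIG.
 * ksize is the size the "kernel" knows, usize the size the caller passed.
 */
static int copy_attr_model(void *kdst, size_t ksize,
                           const void *usrc, size_t usize)
{
    const unsigned char *src = usrc;
    size_t i;

    /* Larger caller structure: reject any non-zero unknown byte. */
    for (i = ksize; i < usize; i++)
        if (src[i] != 0)
            return -E2BIG;

    /* Zero-fill first so that fields the caller did not supply are 0. */
    memset(kdst, 0, ksize);
    memcpy(kdst, usrc, usize < ksize ? usize : ksize);
    return 0;
}
```

With an 8-byte "kernel" structure, a 4-byte caller structure is accepted and padded with zeroes, while a 12-byte one is accepted only if its last 4 bytes are all zero.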
>
>    Scheduling Policies
>        The scheduler is the kernel component that decides which
>        runnable process will be executed by the CPU next. Each process
>        has an associated scheduling policy and a static scheduling
>        priority, sched_priority; these are the settings that are
>        modified by sched_setscheduler(). The scheduler makes its
>        decisions based on knowledge of the scheduling policy and static
>        priority of all processes on the system.

Isn't this last sentence redundant/slightly repetitive?

>        For processes scheduled under one of the normal scheduling
>        policies (SCHED_OTHER, SCHED_IDLE, SCHED_BATCH), sched_priority
>        is not used in scheduling decisions (it must be specified as 0).
>
>        Processes scheduled under one of the real-time policies
>        (SCHED_FIFO, SCHED_RR) have a sched_priority value in the range
>        1 (low) to 99 (high). (As the numbers imply, real-time processes
>        always have higher priority than normal processes.) Note well:
>        POSIX.1-2001 only requires an implementation to support a
>        minimum 32 distinct priority levels for the real-time policies,
>        and some systems supply just this minimum. Portable programs
>        should use sched_get_priority_min(2) and
>        sched_get_priority_max(2) to find the range of priorities
>        supported for a particular policy.
>
>        Conceptually, the scheduler maintains a list of runnable
>        processes for each possible sched_priority value. In order to
>        determine which process runs next, the scheduler looks for the
>        nonempty list with the highest static priority and selects the
>        process at the head of this list.
>
>        A process's scheduling policy determines where it will be
>        inserted into the list of processes with equal static priority
>        and how it will move inside this list.
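The conceptual model in the paragraphs above (one list of runnable processes per static priority, pick the head of the highest nonempty list) can be sketched in a few lines. Everything here is illustrative: struct runqueue_model, rq_enqueue_tail() and rq_pick_next() are made-up names, the fixed capacity is arbitrary, and the real scheduler is not implemented this way.

```c
#include <stddef.h>

#define MAX_PRIO  99   /* static priorities 0..99, as in the quoted text */
#define QUEUE_CAP 8    /* arbitrary per-list capacity for this sketch */

/* One FIFO list of runnable pids for a single static priority. */
struct prio_list {
    int pids[QUEUE_CAP];
    size_t len;
};

/* Hypothetical model: one list per possible sched_priority value. */
struct runqueue_model {
    struct prio_list lists[MAX_PRIO + 1];
};

/* Put a newly runnable pid at the end of the list for its priority. */
static int rq_enqueue_tail(struct runqueue_model *rq, int prio, int pid)
{
    struct prio_list *l;

    if (prio < 0 || prio > MAX_PRIO)
        return -1;
    l = &rq->lists[prio];
    if (l->len == QUEUE_CAP)
        return -1;
    l->pids[l->len++] = pid;
    return 0;
}

/* Head of the nonempty list with the highest priority, or -1 if none. */
static int rq_pick_next(const struct runqueue_model *rq)
{
    int prio;

    for (prio = MAX_PRIO; prio >= 0; prio--)
        if (rq->lists[prio].len > 0)
            return rq->lists[prio].pids[0];
    return -1;
}
```

In this model a priority-0 process is only ever picked when every list for priorities 1 to 99 is empty, which is the "real-time processes always have higher priority" statement from the draft.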
>
>        All scheduling is preemptive: if a process with a higher static
>        priority becomes ready to run, the currently running process
>        will be preempted and returned to the wait list for its static
>        priority level. The scheduling policy only determines the
>        ordering within the list of runnable processes with equal static
>        priority.
>
>    SCHED_DEADLINE: Sporadic task model deadline scheduling
>        SCHED_DEADLINE is an implementation of GEDF (Global Earliest
>        Deadline First) with additional CBS (Constant Bandwidth Server).
>        The CBS guarantees that tasks that over-run their specified
>        budget are throttled and do not affect the correct performance
>        of other SCHED_DEADLINE tasks.
>
>        SCHED_DEADLINE tasks will fail FORK(2) with -EAGAIN
>
>        Setting SCHED_DEADLINE can fail with -EINVAL when admission
>        control tests fail.

Perhaps add a note about the deadline-class having higher priority than
the other classes; i.e. if a deadline-task is runnable, it will preempt
any other SCHED_(RR|FIFO) regardless of priority?

>    SCHED_FIFO: First In-First Out scheduling
>        SCHED_FIFO can only be used with static priorities higher than
>        0, which means that when a SCHED_FIFO process becomes runnable,
>        it will always immediately preempt any currently running
>        SCHED_OTHER, SCHED_BATCH, or SCHED_IDLE process. SCHED_FIFO is a
>        simple scheduling algorithm without time slicing. For processes
>        scheduled under the SCHED_FIFO policy, the following rules
>        apply:
>
>        *  A SCHED_FIFO process that has been preempted by another
>           process of higher priority will stay at the head of the list
>           for its priority and will resume execution as soon as all
>           processes of higher priority are blocked again.
>
>        *  When a SCHED_FIFO process becomes runnable, it will be
>           inserted at the end of the list for its priority.
>
>        *  A call to sched_setscheduler() or sched_setparam(2) will put
>           the SCHED_FIFO (or SCHED_RR) process identified by pid at the
>           start of the list if it was runnable. As a consequence, it
>           may preempt the currently running process if it has the same
>           priority. (POSIX.1-2001 specifies that the process should go
>           to the end of the list.)
>
>        *  A process calling sched_yield(2) will be put at the end of
>           the list.

How about the recent discussion regarding sched_yield()? Is this correct?

lkml.kernel.org/r/alpine.DEB.2.02.1403312333100.14882-3cz04HxQygjZikZi3RtOZ2GXanvQGlWp@public.gmane.orgnix.de

Is this the correct place to add a note explaining the potential
pitfalls of using sched_yield?

>        No other events will move a process scheduled under the
>        SCHED_FIFO policy in the wait list of runnable processes with
>        equal static priority.
>
>        A SCHED_FIFO process runs until either it is blocked by an I/O
>        request, it is preempted by a higher priority process, or it
>        calls sched_yield(2).
>
>    SCHED_RR: Round Robin scheduling
>        SCHED_RR is a simple enhancement of SCHED_FIFO. Everything
>        described above for SCHED_FIFO also applies to SCHED_RR, except
>        that each process is only allowed to run for a maximum time
>        quantum. If a SCHED_RR process has been running for a time
>        period equal to or longer than the time quantum, it will be put
>        at the end of the list for its priority. A SCHED_RR process that
>        has been preempted by a higher priority process and subsequently
>        resumes execution as a running process will complete the
>        unexpired portion of its round robin time quantum. The length of
>        the time quantum can be retrieved using
>        sched_rr_get_interval(2).

-> Default is 0.1 * HZ jiffies (100 ms)

This is a question I get from time to time; having this in the manpage
would be helpful.

>    SCHED_OTHER: Default Linux time-sharing scheduling
>        SCHED_OTHER can only be used at static priority 0.
>        SCHED_OTHER is the standard Linux time-sharing scheduler that is
>        intended for all processes that do not require the special
>        real-time mechanisms. The process to run is chosen from the
>        static priority 0 list based on a dynamic priority that is
>        determined only inside this list. The dynamic priority is based
>        on the nice value (set by nice(2) or setpriority(2)) and
>        increased for each time quantum the process is ready to run, but
>        denied to run by the scheduler. This ensures fair progress among
>        all SCHED_OTHER processes.
>
>    SCHED_BATCH: Scheduling batch processes
>        (Since Linux 2.6.16.) SCHED_BATCH can only be used at static
>        priority 0. This policy is similar to SCHED_OTHER in that it
>        schedules the process according to its dynamic priority (based
>        on the nice value). The difference is that this policy will
>        cause the scheduler to always assume that the process is
>        CPU-intensive. Consequently, the scheduler will apply a small
>        scheduling penalty with respect to wakeup behaviour, so that
>        this process is mildly disfavored in scheduling decisions.
>
>        This policy is useful for workloads that are noninteractive, but
>        do not want to lower their nice value, and for workloads that
>        want a deterministic scheduling policy without interactivity
>        causing extra preemptions (between the workload's tasks).
>
>    SCHED_IDLE: Scheduling very low priority jobs
>        (Since Linux 2.6.23.) SCHED_IDLE can only be used at static
>        priority 0; the process nice value has no influence for this
>        policy.
>
>        This policy is intended for running jobs at extremely low
>        priority (lower even than a +19 nice value with the SCHED_OTHER
>        or SCHED_BATCH policies).
>
> RETURN VALUE
>        On success, sched_setattr() and sched_getattr() return 0. On
>        error, -1 is returned, and errno is set appropriately.
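As a compact summary of the per-policy parameter ranges stated in the draft (nice in [-20,19] for SCHED_OTHER and SCHED_BATCH, static priority 0 for the normal policies, 1 to 99 for SCHED_FIFO and SCHED_RR), here is a small checker. It is only a sketch: params_fit_policy() and the MODEL_ names are invented for this example, the numeric values mirror the kernel's policy numbers, and the function encodes the documented ranges rather than whatever the kernel actually enforces.

```c
#include <stdbool.h>

/* Policy numbers as in the Linux ABI; the MODEL_ prefix avoids clashing
 * with the macros from <sched.h>. */
enum model_policy {
    MODEL_SCHED_OTHER    = 0,
    MODEL_SCHED_FIFO     = 1,
    MODEL_SCHED_RR       = 2,
    MODEL_SCHED_BATCH    = 3,
    MODEL_SCHED_IDLE     = 5,
    MODEL_SCHED_DEADLINE = 6
};

/* Hypothetical helper: do nice and priority match the documented ranges? */
static bool params_fit_policy(int policy, int nice, unsigned int priority)
{
    switch (policy) {
    case MODEL_SCHED_OTHER:
    case MODEL_SCHED_BATCH:
        /* Normal policies: static priority 0, nice in [-20,19]. */
        return priority == 0 && nice >= -20 && nice <= 19;
    case MODEL_SCHED_IDLE:
        /* Static priority 0; the nice value has no influence. */
        return priority == 0;
    case MODEL_SCHED_FIFO:
    case MODEL_SCHED_RR:
        /* Real-time policies: static priority 1 (low) to 99 (high). */
        return priority >= 1 && priority <= 99;
    default:
        /* SCHED_DEADLINE uses runtime/deadline/period; not modeled here. */
        return false;
    }
}
```

For example, params_fit_policy(MODEL_SCHED_FIFO, 0, 0) is false because SCHED_FIFO needs a static priority above 0, while params_fit_policy(MODEL_SCHED_BATCH, 19, 0) is true.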
>
> ERRORS
>        EINVAL The scheduling policy is not one of the recognized
>               policies, param is NULL, or param does not make sense for
>               the policy.
>
>        EPERM  The calling process does not have appropriate privileges.
>
>        ESRCH  The process whose ID is pid could not be found.
>
>        E2BIG  The provided storage for struct sched_attr is either too
>               big, see sched_setattr(), or too small, see
>               sched_getattr().

Where's the EBUSY? It can throw this from __sched_setscheduler() when it
checks if there's enough bandwidth to run the task.

>
> NOTES
>        While the text above (and in SCHED_SETSCHEDULER(2)) talks about
>        processes, in actual fact these system calls are thread specific.

--
Henrik Austad