* [RFC] CPU controllers?
@ 2006-06-15 13:46 Srivatsa Vaddagiri
0 siblings, 3 replies; 36+ messages in thread
From: Srivatsa Vaddagiri @ 2006-06-15 13:46 UTC (permalink / raw)
To: Sam Vilain, Kirill Korotaev, Mike Galbraith, Ingo Molnar,
Nick Piggin, Peter Williams, Andrew Morton, sekharan,
Balbir Singh
Cc: linux-kernel
Hello,
There have been several proposals so far on this subject and no
consensus seems to have been reached on what an acceptable CPU controller
for Linux needs to provide. I am hoping this mail will trigger some
discussions in that regard. In particular I am keen to know what the
various maintainers think about this subject.
The various approaches proposed so far are:
- CPU rate-cap (limit CPU execution rate per-task)
http://lkml.org/lkml/2006/5/26/7
- f-series CKRM controller (CPU usage guarantee for a task-group)
http://lkml.org/lkml/2006/4/27/399
- e-series CKRM controller (CPU usage guarantee/limit for a task-group)
http://prdownloads.sourceforge.net/ckrm/cpu.ckrm-e18.v10.patch.gz?download
- OpenVZ controller (CPU usage guarantee/hard-limit for a task-group)
http://openvz.org/
- vserver controller (CPU usage guarantee(?)/limit for a task-group)
http://linux-vserver.org/
(I apologize if I have missed any other significant proposal for Linux)
Their salient features and limitations/drawbacks, as far as I could gather, are
summarized later below. Note that each controller varies in degree of
complexity and addresses its own set of requirements.
In working toward an acceptable controller in mainline, it would help, IMHO,
if we put together the set of requirements which a Linux CPU controller
should support. Some questions that arise in this regard are:
- Do we need mechanisms to control CPU usage of tasks, beyond what
already exists (like nice)? IMO yes.
- What are the requirements of such a CPU controller? Some of them to
consider are:
- Should it operate on a per-task basis or on a per-task-group
basis?
- Should it support more than one level of task-groups?
- If we want to operate on a per-task-group basis, which mechanism
do we use for grouping tasks (Resource Groups, PAGG,
uid/session id, ...)?
- Should it support both limits and guarantees? In the case of
limits, should it support both soft and hard limits?
- What interface do we choose for the user to specify a
limit/guarantee: system call or filesystem based (e.g. /proc or
Resource Groups' rcfs)?
- Over what interval should guarantee/limit be monitored and
controlled?
- With what accuracy should we allow the limit/guarantee to be
expressed?
- Co-existence with CPUset - should guarantee/limit be
enforced only on the set of CPUs attached to the cpuset?
- Should real-time tasks be outside the purview of this control?
- Should load balancing be made aware of the guarantee/limit of
tasks (or task-groups)? Of course yes!
One possibility is to add a basic controller that addresses some minimal
requirements to begin with, and progressively enhance its capabilities. From
this pov, both the f-series resource group controller and cpu rate-cap seem
to be good candidates for a minimal controller to begin with.
Thoughts?
The salient features of the various CPU controllers proposed so far are
summarized below. I have not captured the OpenVZ and VServer controller
aspects well - I request the maintainers to fill in!
1. CPU Rate Cap (by Peter Williams)
Features:
* Limit CPU execution rate on a per-task basis.
* Limit specified in terms of parts-per-thousand. Limit is set through
the /proc interface.
* Supports hard limit and soft limit
* Introduces new task priorities where tasks that have exceeded their
soft limit can be "parked" until the O(1) scheduler picks them for
execution
* Load balancing on SMP systems made aware of tasks whose execution
rate is limited by this feature
* Patch is simple
Limitations:
* Does not support guarantee
Drawbacks:
* Limiting the CPU execution rate of a group of tasks has to be tackled
from an external module (user or kernel space), which may make this
approach somewhat inconvenient to implement for task-groups.
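For illustration, the accounting test at the heart of such a per-task cap can be sketched as follows. This is not code from the rate-cap patch itself; the struct and function names are hypothetical, showing only the parts-per-thousand comparison a scheduler would make when deciding whether to "park" a task that has exceeded its soft limit:

```c
#include <assert.h>

/* Hypothetical per-task cap state: limits are expressed in
 * parts-per-thousand of elapsed time, as in the rate-cap proposal. */
struct task_cap {
	unsigned long cpu_used;   /* CPU time consumed (ticks) */
	unsigned long wall_time;  /* wall-clock ticks since tracking began */
	unsigned int  soft_cap;   /* parts-per-thousand, 0..1000 */
};

/* Return nonzero if the task has exceeded its soft cap and should be
 * "parked" at a low priority until it falls back under the limit. */
int over_soft_cap(const struct task_cap *tc)
{
	if (tc->wall_time == 0)
		return 0;
	/* used/wall > cap/1000, rearranged to avoid division */
	return tc->cpu_used * 1000 >
	       (unsigned long)tc->soft_cap * tc->wall_time;
}
```

A hard cap would use the same comparison against a second threshold, with exceeding tasks removed from the runqueue entirely rather than merely demoted.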
2. Timeslice scaling (Maeda Naoaki and Kurosawa Takahiro)
Features:
* Provides a guaranteed CPU execution rate on a per-task-group basis.
The guarantee is provided over an interval of 5 seconds.
* Hooked into the Resource Group infrastructure currently, and hence
the guarantee/limit is set through Resource Groups' RCFS interface.
* Achieves guaranteed execution by scaling down the timeslice of tasks
that are above their guaranteed execution rate. The timeslice can be
scaled down only to a minimum of 1 slice.
* Does not scale down timeslice of interactive tasks (even if their
CPU usage is beyond what is guaranteed) and does not avoid requeue
of interactive tasks.
* Patch is quite simple
Limitations:
* Does not support limiting task-group CPU execution rate
Drawbacks:
(Some of the drawbacks listed are probably being addressed currently
with a redesign - which we are yet to see)
* Interactive tasks (and their requeuing) can come in the way of
providing guaranteed execution rate to other tasks
* SMP load balancing does not take into account guarantee provided to
task groups.
* It may not be possible to restrict the CPU usage of a task group to
only its guaranteed usage if the task-group has a large number of tasks
(each task runs for a minimum of 1 timeslice)
* May not handle bursty loads
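The timeslice-scaling idea can be sketched as below. This is not the actual Maeda/Kurosawa patch - the function and parameter names are invented for illustration - but it shows the core rule: a group over its guarantee has its tasks' timeslices shrunk in proportion to the overrun, with a floor of one tick (which is exactly why a group with very many tasks cannot be held to its guarantee):

```c
#include <assert.h>

/* Hypothetical helper: `base_slice` is the task's normal timeslice,
 * `used` of `total` recent ticks were consumed by the task's group,
 * and `guar` is the group's guarantee in parts-per-thousand. */
unsigned int scaled_timeslice(unsigned int base_slice,
			      unsigned long used,
			      unsigned long total,
			      unsigned int guar /* 0..1000 */)
{
	unsigned long entitled = (unsigned long)guar * total / 1000;
	unsigned long slice;

	if (total == 0 || used <= entitled)
		return base_slice;       /* within guarantee: untouched */

	/* over guarantee: shrink in proportion entitled/used */
	slice = (unsigned long)base_slice * entitled / used;
	return slice ? (unsigned int)slice : 1;  /* floor of 1 tick */
}
```

Note how the floor guarantees forward progress for every runnable task but caps how much a large group can be throttled, matching the drawback listed above.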
3. Resource Group e-series CPU controller
Features:
* Provides both guarantee and limit for CPU execution rate of task
groups (classes)
* Two-level scheduling. Pick a task-group (class) to execute first and
then a task within the task-group. Both are of O(1) complexity.
* Classes are given priorities based on their guaranteed CPU usage,
accumulated CPU execution, and the highest-priority task present
within the group. The class with the highest priority is picked
for execution next.
* Guarantee/Limit specified in terms of shares
Drawbacks:
* Complexity
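The two-level pick could be sketched roughly as follows. This illustrates only the general shape - the real e-series controller uses O(1) priority arrays rather than a scan, and its priority formula is more involved; all names here are hypothetical. A class that has consumed little CPU relative to its guaranteed shares gets a better (lower) key, and the best runnable task's priority breaks ties:

```c
#include <assert.h>

struct cpu_class {
	int shares;          /* guaranteed shares */
	long cpu_consumed;   /* accumulated CPU time */
	int best_task_prio;  /* best runnable task's prio (lower = better) */
};

/* Lower key = more deserving of the CPU next. */
long class_key(const struct cpu_class *c)
{
	long served = c->shares ? c->cpu_consumed / c->shares
				: c->cpu_consumed;
	return served * 100 + c->best_task_prio;
}

/* Level 1: pick a class; level 2 (not shown) picks a task within it
 * using the normal per-class runqueue. */
int pick_class(const struct cpu_class *cls, int n)
{
	int i, best = 0;

	for (i = 1; i < n; i++)
		if (class_key(&cls[i]) < class_key(&cls[best]))
			best = i;
	return best;
}
```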
3. OpenVZ CPU controller
Features:
- Provides both guarantee [1] and (hard) limit for CPU execution rate
of task group (containers)
- Multi-level scheduler (Pick a task-group to run first, then pick a
virtual-cpu and then a task)
- Virtual cpu concept makes group-aware SMP load balancing easy
- Uses cycles (rather than ticks) consumed for accounting (?)
[1] - http://download.openvz.org/doc/OpenVZ-Users-Guide.pdf
Limitations:
- ?
Drawbacks:
- ?
4. VServer CPU controller
Features:
- Token-bucket based
Drawbacks:
- ?
Limitations:
- ?
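For readers unfamiliar with the mechanism, a generic token bucket of the kind the VServer scheduler is based on can be sketched as follows (the field names are illustrative, not VServer's actual ones): each group earns tokens at a fixed rate up to a ceiling and spends one token per tick its tasks run; an empty bucket means the group's tasks are held back until it refills:

```c
#include <assert.h>

struct tbucket {
	int tokens;       /* current tokens */
	int bucket_size;  /* maximum tokens the bucket can hold */
	int fill_rate;    /* tokens added per interval */
	int interval;     /* ticks per refill */
};

/* Credit the bucket for elapsed time, clamped to the bucket size. */
void tb_refill(struct tbucket *tb, int ticks_elapsed)
{
	tb->tokens += (ticks_elapsed / tb->interval) * tb->fill_rate;
	if (tb->tokens > tb->bucket_size)
		tb->tokens = tb->bucket_size;
}

/* Called for each tick a task of this group runs; returns nonzero
 * if the group may keep running. */
int tb_consume(struct tbucket *tb)
{
	if (tb->tokens <= 0)
		return 0;
	tb->tokens--;
	return 1;
}
```

The ratio fill_rate/interval gives the group's long-term CPU share, while bucket_size bounds how large a burst it can accumulate - which is why a plain token bucket yields limits and soft fairness rather than hard guarantees.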
--
Regards,
vatsa
* Re: [RFC] CPU controllers?
From: Sam Vilain @ 2006-06-15 21:52 UTC (permalink / raw)
To: vatsa
Cc: Kirill Korotaev, Mike Galbraith, Ingo Molnar, Nick Piggin,
Peter Williams, Andrew Morton, sekharan, Balbir Singh, linux-kernel

Srivatsa Vaddagiri wrote:
> One possibility is to add a basic controller that addresses some minimal
> requirements to begin with, and progressively enhance its capabilities.
> From this pov, both the f-series resource group controller and cpu
> rate-cap seem to be good candidates for a minimal controller to begin
> with.
>
> Thoughts?

Sounds like you're on the right track, but I don't know whether we can
truly be happy making the performance/guarantee trade-off decision for
the user.

You could grossly put the solutions into several camps:

1. solutions which have very low impact and provide soft assurances only
2. solutions which provide hard limits
3. solutions which provide guarantees

I think it's almost invariant that the latter solutions have more of a
performance impact, and that it's quite important that normal system
throughput does not suffer from the "scheduling namespace" solution that
we come up with.

> [...]
> 2. Timeslice scaling (Maeda Naoaki and Kurosawa Takahiro)
> [...]
> 4. VServer CPU controller
>
> Features:
> - Token-bucket based

The VServer scheduler is also timeslice scaling - it just uses the token
bucket to know how much to scale the timeslices. It doesn't care about
interactive bonuses, although it does lessen the interactivity bonus a
notch or two (to -5..+5). This means that it's performance neutral in
the general case.

> Drawbacks:
> - ?

It fits into category 1 (or, using Herbert Poetzl's enhancements, 2), so
it does not provide guarantees.

> Limitations:
> - ?

It doesn't deal with huge numbers of processes; but with task group
ulimits that problem goes away in practice.

Sam.
* Re: [RFC] CPU controllers?
From: Peter Williams @ 2006-06-15 23:30 UTC (permalink / raw)
To: vatsa
Cc: Sam Vilain, Kirill Korotaev, Mike Galbraith, Ingo Molnar,
Nick Piggin, Andrew Morton, sekharan, Balbir Singh, linux-kernel

Srivatsa Vaddagiri wrote:
> [...]
> One possibility is to add a basic controller that addresses some minimal
> requirements to begin with, and progressively enhance its capabilities.

I would amend this to say "provide the basic controllers and let more
complex management mechanisms use them (from outside the scheduler) to
provide higher level control". An essential part of this will be the
provision of statistics for these external controllers to use.

> From this pov, both the f-series resource group controller and cpu
> rate-cap seem to be good candidates for a minimal controller to begin
> with.
>
> Thoughts?
>
> [...]
> 1. CPU Rate Cap (by Peter Williams)
>
> Features:
>
> * Limit CPU execution rate on a per-task basis.
> * Limit specified in terms of parts-per-thousand. Limit is set through
> the /proc interface.

The /proc interface is not an essential part of this patch; the reason
it was implemented is that it was simple, easy and useful for testing.
The patch "proper" provides four functions for setting/getting the
soft/hard caps and exports these so that they can be used from modules.
I.e. it would be very easy to replace the /proc interface with another
one (or more), or to keep it and make another interface as well. All the
essential testing/processing required for setting the caps properly is
inside the functions, NOT the /proc interface.

> * Supports hard limit and soft limit
> [...]
>
> Limitations:
> * Does not support guarantee

Why would a capping mechanism support guarantees? The two mechanisms can
be implemented separately. The only interaction between them that is
required is a statement about which has precedence, i.e. if a cap is
less than a guarantee, is it enforced? I would opine that it should be.

BTW if "nice" works properly, guarantees can be implemented by suitable
fiddling of task "nice" values.

> Drawbacks:
> * Limiting CPU execution rate of a group of tasks has to be tackled
> from an external module (user or kernel space) which may make this
> approach somewhat inconvenient to implement for task-groups.

Nevertheless it can be done, and it has the advantage that the cost is
only borne by those who wish to use such high level controls. The caps
provided by this (simple) patch provide functionality that ordinary
users can find useful. In particular, the use of a soft cap of zero to
effectively put a task (and all of its children) in the background is
very useful for doing software builds on a work station. Con Kolivas's
SCHED_IDLE scheduling class in his staircase scheduler provides the same
functionality and is (from all reports) very popular. The key difference
between soft caps and the SCHED_IDLE mechanism is that soft caps are
more general, in that limits other than zero can be specified. This
provides more flexibility.

Peter
--
Peter Williams pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
* Re: [RFC] CPU controllers?
From: Matt Helsley @ 2006-06-16 0:42 UTC (permalink / raw)
To: Peter Williams
Cc: vatsa, Sam Vilain, Kirill Korotaev, Mike Galbraith, Ingo Molnar,
Nick Piggin, Andrew Morton, Chandra S. Seetharaman, Balbir Singh, LKML

On Fri, 2006-06-16 at 09:30 +1000, Peter Williams wrote:
> Srivatsa Vaddagiri wrote:
> <snip>
> > Limitations:
> > * Does not support guarantee
>
> Why would a capping mechanism support guarantees? The two mechanisms
> can be implemented separately. The only interaction between them that
> is required is a statement about which has precedence. I.e. if a cap is
> less than a guarantee is it enforced? I would opine that it should be.

When this combination occurs userspace is crazy/uncoordinated/dumb and
can't be "satisfied". Perhaps the better approach is to ignore both
guarantee and limit (cap) in this case -- treat it as if userspace
hasn't specified either. Alternatively the kernel can refuse to allow
configuring such a combination in the first place. This is one reason
tying guarantees and limits (caps) into the same framework would be
useful.

<snip>

Cheers,
-Matt Helsley
* Re: [RFC] CPU controllers?
From: Nick Piggin @ 2006-06-17 8:48 UTC (permalink / raw)
To: vatsa
Cc: Sam Vilain, Kirill Korotaev, Mike Galbraith, Ingo Molnar,
Peter Williams, Andrew Morton, sekharan, Balbir Singh, linux-kernel

Srivatsa Vaddagiri wrote:
> [...]
> Some questions that arise in this regard are:
>
> - Do we need mechanisms to control CPU usage of tasks, further to what
> already exists (like nice)? IMO yes.

Can we get back to the question of need? And from there, work out what
features are wanted.

IMHO, having containers try to virtualise all resources (memory,
pagecache, slab cache, CPU, disk/network IO...) seems insane: we may
just as well use virtualisation.

So, from my POV, I would like to be convinced of the need for this
first. I would really love to be able to keep the core kernel simple and
fast even if it means edge cases might need to use a slightly different
solution.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
* Re: [RFC] CPU controllers?
From: Balbir Singh @ 2006-06-17 15:55 UTC (permalink / raw)
To: Nick Piggin
Cc: vatsa, Sam Vilain, Kirill Korotaev, Mike Galbraith, Ingo Molnar,
Peter Williams, Andrew Morton, sekharan, linux-kernel

On 6/17/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> [...]
> Can we get back to the question of need? And from there, work out what
> features are wanted.
>
> IMHO, having containers try to virtualise all resources (memory,
> pagecache, slab cache, CPU, disk/network IO...) seems insane: we may
> just as well use virtualisation.
>
> So, from my POV, I would like to be convinced of the need for this
> first. I would really love to be able to keep core kernel simple and
> fast even if it means edge cases might need to use a slightly
> different solution.

The simplest example that comes to my mind to explain the need is
through quality of service. Consider a single system running two
instances of an application (let's say a web portal or a database
server). If one of the instances is production and the other is
development, and the development instance is being stress tested - how
do I provide reliable quality of service to the users of the production
instance?

I am sure other people will probably have better examples.

Warm Regards,
Balbir
Linux Technology Center
IBM, ISL
* Re: [RFC] CPU controllers?
From: Srivatsa Vaddagiri @ 2006-06-17 16:48 UTC (permalink / raw)
To: Nick Piggin
Cc: Sam Vilain, Kirill Korotaev, Mike Galbraith, Ingo Molnar,
Peter Williams, Andrew Morton, sekharan, Balbir Singh, linux-kernel,
maeda.naoaki, kurosawa

On Sat, Jun 17, 2006 at 06:48:17PM +1000, Nick Piggin wrote:
> Srivatsa Vaddagiri wrote:
> > - Do we need mechanisms to control CPU usage of tasks, further to
> > what already exists (like nice)? IMO yes.
>
> Can we get back to the question of need? And from there, work out what
> features are wanted.
>
> IMHO, having containers try to virtualise all resources (memory,
> pagecache, slab cache, CPU, disk/network IO...) seems insane: we may
> just as well use virtualisation.
>
> So, from my POV, I would like to be convinced of the need for this
> first. I would really love to be able to keep core kernel simple and
> fast even if it means edge cases might need to use a slightly
> different solution.

I think a proportional-share scheduler (which is what a CPU controller
may provide) has non-container uses also. Do you think nice (or sched
policy) is enough to, say, provide guaranteed CPU usage for applications
or to limit their CPU usage? Moreover, it is more flexible if a
guarantee/limit can be specified for a group of tasks rather than for
individual tasks, even in non-container scenarios (like limiting the CPU
usage of all web-server tasks together, or limiting the CPU usage of a
make -j command).

--
Regards,
vatsa
* Re: [RFC] CPU controllers?
From: Nick Piggin @ 2006-06-18 5:06 UTC (permalink / raw)
To: vatsa
Cc: Sam Vilain, Kirill Korotaev, Mike Galbraith, Ingo Molnar,
Peter Williams, Andrew Morton, sekharan, Balbir Singh, linux-kernel,
maeda.naoaki, kurosawa

Srivatsa Vaddagiri wrote:
> On Sat, Jun 17, 2006 at 06:48:17PM +1000, Nick Piggin wrote:
>> [...]
>> So, from my POV, I would like to be convinced of the need for this
>> first. I would really love to be able to keep core kernel simple and
>> fast even if it means edge cases might need to use a slightly
>> different solution.
>
> I think a proportional-share scheduler (which is what a CPU controller
> may provide) has non-container uses also. Do you think nice (or sched
> policy) is enough to, say, provide guaranteed CPU usage for
> applications or limit their CPU usage? Moreover it is more flexible
> if guarantee/limit can be specified for a group of tasks, rather than
> individual tasks even in non-container scenarios (like limiting CPU
> usage of all web-server tasks together or for limiting CPU usage of
> make -j command).

Oh, I'm sure there are lots of things we *could* do that we currently
can't.

What I want to establish first is: what exact functionality is required,
why, and by whom. Only then can we sanely discuss the fitness of
solutions and propose alternatives, and decide whether to merge.

--
SUSE Labs, Novell Inc.
* Re: [RFC] CPU controllers?
From: Sam Vilain @ 2006-06-18 5:53 UTC (permalink / raw)
To: Nick Piggin
Cc: vatsa, Kirill Korotaev, Mike Galbraith, Ingo Molnar, Peter Williams,
Andrew Morton, sekharan, Balbir Singh, linux-kernel, maeda.naoaki,
kurosawa

Nick Piggin wrote:
>> I think a proportional-share scheduler (which is what a CPU controller
>> may provide) has non-container uses also. [...]
>
> Oh, I'm sure there are lots of things we *could* do that we currently
> can't.
>
> What I want to establish first is: what exact functionality is
> required, why, and by whom.

You make it sound like users should feel sorry for wanting features
already commonly available on other high performance unix kernels.

The answer is quite simple: people who are consolidating systems and
working with fewer, larger systems want to mark processes, groups of
processes or entire containers into CPU scheduling classes, then either
fair balance between them, limit them or reserve them a portion of the
CPU - depending on the user and what their requirements are. What is
unclear about that?

Yes, this does get somewhat simpler if you strap yourself into a
complete virtualisation straightjacket, but the current thread is not
about that approach - and the continual suggestions that we are all just
being stupid and going about it the wrong way are locally off-topic.

Bear in mind that we have on the table at least one group of scheduling
solutions (timeslice scaling based ones, such as the VServer one) which
has virtually no overhead and could potentially provide the "jumpers"
necessary for implementing more complex scheduling policies.

Sam.

> Only then can we sanely discuss the fitness of solutions and propose
> alternatives, and decide whether to merge.
* Re: [RFC] CPU controllers?
From: Nick Piggin @ 2006-06-18 6:11 UTC (permalink / raw)
To: Sam Vilain
Cc: vatsa, Kirill Korotaev, Mike Galbraith, Ingo Molnar, Peter Williams,
Andrew Morton, sekharan, Balbir Singh, linux-kernel, maeda.naoaki,
kurosawa

Sam Vilain wrote:
> Nick Piggin wrote:
>> What I want to establish first is: what exact functionality is
>> required, why, and by whom.
>
> You make it sound like users should feel sorry for wanting features
> already commonly available on other high performance unix kernels.

If telling me what exact functionality they want is going to cause them
so much pain, I suppose they should feel sorry for themselves.

And I don't care about any other kernels, unix or not. I care about what
Linux users want.

> The answer is quite simple: people who are consolidating systems and
> working with fewer, larger systems want to mark processes, groups of
> processes or entire containers into CPU scheduling classes, then
> either fair balance between them, limit them or reserve them a portion
> of the CPU - depending on the user and what their requirements are.
> What is unclear about that?

It is unclear whether we should have hard limits, or just nice-like
priority levels. Whether virtualisation (+/- containers) could be a good
solution, etc.

If you want to *completely* isolate N groups of users, surely you have
to use virtualisation, unless you are willing to isolate memory
management, pagecache, slab caches, network and disk IO, etc.

> Yes, this does get somewhat simpler if you strap yourself into a
> complete virtualisation straightjacket, but the current thread is not
> about that approach - and the continual suggestions that we are all
> just being stupid and going about it the wrong way are locally
> off-topic.

I'm sorry you cannot come up with a statement of the functionality you
require without badmouthing "complete" virtualisation or implying that
I'm saying you're stupid.

I think the containers people might also recognise that it may not be
the best solution to make containers the be-all and end-all of
consolidating systems, and virtualisation is a very relevant topic when
discussing pros and cons and alternate solutions. But at this point I'm
yet to be shown what the *problem* is. I'm not trying to deny that one
might exist.

> Bear in mind that we have on the table at least one group of
> scheduling solutions (timeslice scaling based ones, such as the
> VServer one) which has virtually no overhead and could potentially
> provide the "jumpers" necessary for implementing more complex
> scheduling policies.

Again, I don't care about the solutions at this stage. I want to know
what the problem is. Please?

--
SUSE Labs, Novell Inc.
* Re: [RFC] CPU controllers? 2006-06-18 6:11 ` Nick Piggin @ 2006-06-18 6:40 ` Sam Vilain 2006-06-18 7:17 ` Nick Piggin 2006-06-18 6:42 ` Andrew Morton 1 sibling, 1 reply; 36+ messages in thread From: Sam Vilain @ 2006-06-18 6:40 UTC (permalink / raw) To: Nick Piggin Cc: vatsa, Kirill Korotaev, Mike Galbraith, Ingo Molnar, Peter Williams, Andrew Morton, sekharan, Balbir Singh, linux-kernel, maeda.naoaki, kurosawa Nick Piggin wrote: >> The answer is quite simple, people who are consolidating systems and >> working with fewer, larger systems, want to mark processes, groups of >> processes or entire containers into CPU scheduling classes, then >> either fair balance between them, limit them or reserve them a >> portion of the CPU - depending on the user and what their >> requirements are. What is unclear about that? >> > > It is unclear whether we should have hard limits, or just nice like > priority levels. Whether virtualisation (+/- containers) could be a > good solution, etc. Look, that was actually answered in the paragraph you're responding to. Once again, give me a set of possible requirements and I'll find you a set of users that have them. I am finding this sub-thread quite redundant. > If you want to *completely* isolate N groups of users, surely you > have to use virtualisation, unless you are willing to isolate memory > management, pagecache, slab caches, network and disk IO, etc. No, you have to use separate hardware. Try to claim otherwise and you're glossing over the corner cases. Sam. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-18 6:40 ` Sam Vilain @ 2006-06-18 7:17 ` Nick Piggin 0 siblings, 0 replies; 36+ messages in thread From: Nick Piggin @ 2006-06-18 7:17 UTC (permalink / raw) To: Sam Vilain Cc: vatsa, Kirill Korotaev, Mike Galbraith, Ingo Molnar, Peter Williams, Andrew Morton, sekharan, Balbir Singh, linux-kernel, maeda.naoaki, kurosawa Sam Vilain wrote: > Nick Piggin wrote: > >>> The answer is quite simple, people who are consolidating systems and >>> working with fewer, larger systems, want to mark processes, groups of >>> processes or entire containers into CPU scheduling classes, then >>> either fair balance between them, limit them or reserve them a >>> portion of the CPU - depending on the user and what their >>> requirements are. What is unclear about that? >>> >> >> It is unclear whether we should have hard limits, or just nice like >> priority levels. Whether virtualisation (+/- containers) could be a >> good solution, etc. > > > Look, that was actually answered in the paragraph you're responding to. > Once again, give me a set of possible requirements and I'll find you a > set of users that have them. I am finding this sub-thread quite redundant. Clearly we can't stuff everything into the kernel. What I'm asking is what the important functionality is that people want to cover. I don't know how you could possibly interpret it as anything else. > >> If you want to *completely* isolate N groups of users, surely you >> have to use virtualisation, unless you are willing to isolate memory >> management, pagecache, slab caches, network and disk IO, etc. > > > No, you have to use separate hardware. Try to claim otherwise and you're > glossing over the corner cases. Well, virtualisation seems like it would get you a lot further than containers for the same amount of work. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-18 6:11 ` Nick Piggin 2006-06-18 6:40 ` Sam Vilain @ 2006-06-18 6:42 ` Andrew Morton 2006-06-18 7:28 ` Nick Piggin 2006-06-18 7:36 ` [RFC] CPU controllers? Mike Galbraith 1 sibling, 2 replies; 36+ messages in thread From: Andrew Morton @ 2006-06-18 6:42 UTC (permalink / raw) To: Nick Piggin Cc: sam, vatsa, dev, efault, mingo, pwil3058, sekharan, balbir, linux-kernel, maeda.naoaki, kurosawa On Sun, 18 Jun 2006 16:11:18 +1000 Nick Piggin <nickpiggin@yahoo.com.au> wrote: > If you want to *completely* isolate N groups of users, surely you > have to use virtualisation, I'd view this as a kludge. If one group of tasks is trashing the performance of another group of tasks the user is forced to use hardware virtualisation to work around it. I mean, is this our answer to the updatedb problem? Instantiate a separate copy of the kernel just to run updatedb? > unless you are willing to isolate memory > management, pagecache, slab caches, network and disk IO, etc. Well yes. Ideally and ultimately. People have done this, and it's in production. We need to see (and work upon) the patches before we can judge whether we want to do this, and how far we want to go. > Again, I don't care about the solutions at this stage. I want to know > what the problem is. Please? Isolation. To prevent one group of processes from damaging the performance of other groups, by providing manageability of the resource consumption of each group. There are plenty of applications of this, not just server-consolidation-via-server-virtualisation. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-18 6:42 ` Andrew Morton @ 2006-06-18 7:28 ` Nick Piggin 2006-06-19 19:03 ` Resource Management Requirements (was "[RFC] CPU controllers?") Chandra Seetharaman 2006-06-18 7:36 ` [RFC] CPU controllers? Mike Galbraith 1 sibling, 1 reply; 36+ messages in thread From: Nick Piggin @ 2006-06-18 7:28 UTC (permalink / raw) To: Andrew Morton Cc: sam, vatsa, dev, efault, mingo, pwil3058, sekharan, balbir, linux-kernel, maeda.naoaki, kurosawa Andrew Morton wrote: > On Sun, 18 Jun 2006 16:11:18 +1000 > Nick Piggin <nickpiggin@yahoo.com.au> wrote: > > >>If you want to *completely* isolate N groups of users, surely you >>have to use virtualisation, > > > I'd view this as a kludge. If one group of tasks is trashing the > performance of another group of tasks the user is forced to use hardware > virtualisation to work around it. > > I mean, is this our answer to the updatedb problem? Instantiate a separate > copy of the kernel just to run updatedb? Well even before that, I'd view the fact that working around the VM's poor behaviour by putting updatedb into a container or memory control as a kludge anyway. CPU and IO control (ie. nice & ioprio) is reasonable. updatedb is pretty simple and the VM should easily be able to recognise its use-once nature. However I don't doubt that people would like to be able to manage memory better. Whether that is best served by having resource control heirarchies or virtualisation or something else completely is still on the table IMO. > > >>unless you are willing to isolate memory >>management, pagecache, slab caches, network and disk IO, etc. > > > Well yes. Ideally and ultimately. People have done this, and it's in > production. We need to see (and work upon) the patches before we can judge > whether we want to do this, and how far we want to go. Definitely. > > >>Again, I don't care about the solutions at this stage. I want to know >>what the problem is. Please? > > > Isolation. 
To prevent one group of processes from damaging the performance > of other groups, by providing manageability of the resource consumption of > each group. There are plenty of applications of this, not just > server-consolidation-via-server-virtualisation. OK... let me put it more clearly. What are the requirements? I don't like that apparently virtualisation can't be discussed in a general thread about resource control. Nothing is going to be a 100% solution for everybody. If, for a *specific* application, virtualisation can be discounted... then great, that is the kind of discussion I would like to see. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* Resource Management Requirements (was "[RFC] CPU controllers?") 2006-06-18 7:28 ` Nick Piggin @ 2006-06-19 19:03 ` Chandra Seetharaman 2006-06-20 5:40 ` Srivatsa Vaddagiri 0 siblings, 1 reply; 36+ messages in thread From: Chandra Seetharaman @ 2006-06-19 19:03 UTC (permalink / raw) To: Nick Piggin Cc: Andrew Morton, sam, vatsa, dev, efault, mingo, pwil3058, balbir, linux-kernel, maeda.naoaki, kurosawa, ckrm-tech

On Sun, 2006-06-18 at 17:28 +1000, Nick Piggin wrote:

> OK... let me put it more clearly. What are the requirements?

Nick,

Here are some requirements we (Resource Groups, aka CKRM) are working towards (note that this is not limited to CPU alone):

In an enterprise environment:
- Ability to group applications by their importance levels and assign an appropriate amount of resources to each.
- In case of server consolidation, ability to allocate and control resources for a specific group of applications, and to account/charge according to their usage.
- Ability to manage multiple departments in a single OS instance, allocating and controlling resources department-wise (similar to the above requirement :)
- Ability to guarantee "time to complete" for a specific user request (by controlling resource usage all the way from the web server to the database server).
- In case of ISPs and ASPs, ability to guarantee/limit usage for independent clients (in a single OS instance).
- Ability to keep runaway processes from bringing down system response (DoS attacks, fork bombs, etc.)

In a university environment (can be treated as a subset of the enterprise requirements above):
- Ability to limit resource consumption at the individual user level.
- Ability to control runaway processes.
- Ability for a user to manage the resources allocated to them (as explained in the desktop environment below).

In a desktop environment:
- Ability to control the resource usage of a set of applications (ex: the infamous updatedb issue).
- Ability to run different loads and get the expected result (like checking email or browsing the Internet while a compilation is in progress).

Generic:
Provide these resource management capabilities with low overhead on overall system performance.

regards,

chandra
--
----------------------------------------------------------------------
Chandra Seetharaman   | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Resource Management Requirements (was "[RFC] CPU controllers?") 2006-06-19 19:03 ` Resource Management Requirements (was "[RFC] CPU controllers?") Chandra Seetharaman @ 2006-06-20 5:40 ` Srivatsa Vaddagiri 0 siblings, 0 replies; 36+ messages in thread From: Srivatsa Vaddagiri @ 2006-06-20 5:40 UTC (permalink / raw) To: Chandra Seetharaman Cc: Nick Piggin, Andrew Morton, sam, dev, efault, mingo, pwil3058, balbir, linux-kernel, maeda.naoaki, kurosawa, ckrm-tech

On Mon, Jun 19, 2006 at 12:03:23PM -0700, Chandra Seetharaman wrote:
> On Sun, 2006-06-18 at 17:28 +1000, Nick Piggin wrote:
>
> > OK... let me put it more clearly. What are the requirements?

At a broad level, all the requirements Chandra lists below boil down to providing guaranteed CPU usage for one group of tasks and the ability to limit (hard or soft) the CPU usage of other groups of tasks. At a finer level, this broad requirement could be interpreted and implemented in a number of ways (ex: by having the kernel support only task-level limits and implementing group-level control in user-space), and that's what this RFC was about - to discuss what minimal kernel support would be needed to meet the above broad requirement!

> Nick,
>
> Here are some requirements we (Resource Groups, aka CKRM) are working towards (note that this is not limited to CPU alone):
>
> In an enterprise environment:
> - Ability to group applications by their importance levels and assign an appropriate amount of resources to each.
> - In case of server consolidation, ability to allocate and control resources for a specific group of applications, and to account/charge according to their usage.
> - Ability to manage multiple departments in a single OS instance, allocating and controlling resources department-wise (similar to the above requirement :)
> - Ability to guarantee "time to complete" for a specific user request (by controlling resource usage all the way from the web server to the database server).
> - In case of ISPs and ASPs, ability to guarantee/limit usage for independent clients (in a single OS instance).
> - Ability to keep runaway processes from bringing down system response (DoS attacks, fork bombs, etc.)
>
> In a university environment (can be treated as a subset of the enterprise requirements above):
> - Ability to limit resource consumption at the individual user level.
> - Ability to control runaway processes.
> - Ability for a user to manage the resources allocated to them (as explained in the desktop environment below).
>
> In a desktop environment:
> - Ability to control the resource usage of a set of applications (ex: the infamous updatedb issue).
> - Ability to run different loads and get the expected result (like checking email or browsing the Internet while a compilation is in progress).
>
> Generic:
> Provide these resource management capabilities with low overhead on overall system performance.

--
Regards,
vatsa
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-18 6:42 ` Andrew Morton @ 2006-06-18 7:36 ` Mike Galbraith 2006-06-18 7:49 ` Nick Piggin ` (3 more replies) 1 sibling, 4 replies; 36+ messages in thread From: Mike Galbraith @ 2006-06-18 7:36 UTC (permalink / raw) To: Andrew Morton Cc: Nick Piggin, sam, vatsa, dev, mingo, pwil3058, sekharan, balbir, linux-kernel, maeda.naoaki, kurosawa On Sat, 2006-06-17 at 23:42 -0700, Andrew Morton wrote: > On Sun, 18 Jun 2006 16:11:18 +1000 > Nick Piggin <nickpiggin@yahoo.com.au> wrote: > > Again, I don't care about the solutions at this stage. I want to know > > what the problem is. Please? > > Isolation. To prevent one group of processes from damaging the performance > of other groups, by providing manageability of the resource consumption of > each group. There are plenty of applications of this, not just > server-consolidation-via-server-virtualisation. Scheduling contexts do sound useful. They're easily defeated though, as evolution mail demonstrates to me every time its GUI hangs and I see that a nice 19 find is running, eating very little CPU, but effectively DoSing evolution nonetheless (journal). I wonder how often people who tried to distribute CPU would likewise be stymied by other resources. -Mike ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-18 7:36 ` [RFC] CPU controllers? Mike Galbraith @ 2006-06-18 7:49 ` Nick Piggin 2006-06-18 7:49 ` Nick Piggin ` (2 subsequent siblings) 3 siblings, 0 replies; 36+ messages in thread From: Nick Piggin @ 2006-06-18 7:49 UTC (permalink / raw) To: Mike Galbraith Cc: Andrew Morton, sam, vatsa, dev, mingo, pwil3058, sekharan, balbir, linux-kernel, maeda.naoaki, kurosawa Mike Galbraith wrote: > On Sat, 2006-06-17 at 23:42 -0700, Andrew Morton wrote: > >>On Sun, 18 Jun 2006 16:11:18 +1000 >>Nick Piggin <nickpiggin@yahoo.com.au> wrote: > > >>>Again, I don't care about the solutions at this stage. I want to know >>>what the problem is. Please? >> >>Isolation. To prevent one group of processes from damaging the performance >>of other groups, by providing manageability of the resource consumption of >>each group. There are plenty of applications of this, not just >>server-consolidation-via-server-virtualisation. > > > Scheduling contexts do sound useful. They're easily defeated though, as > evolution mail demonstrates to me every time it's GUI hangs and I see > that a nice 19 find is running, eating very little CPU, but effectively > DoSing evolution nonetheless (journal). I wonder how often people who > tried to distribute CPU would likewise be stymied by other resources. Not entirely infrequently. Which is why it really doesn't seem like it could be useful from a security point of view without a *huge* amount of work and complexity... and even from a guaranteed-service point of view, it still seems (to me) like a pretty big and complex problem. As a check box for marketing it sounds pretty cool though, I admit ;) -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-18 7:36 ` [RFC] CPU controllers? Mike Galbraith 2006-06-18 7:49 ` Nick Piggin @ 2006-06-18 7:49 ` Nick Piggin 2006-06-18 9:09 ` Andrew Morton 2006-06-19 18:21 ` Chris Friesen 3 siblings, 0 replies; 36+ messages in thread From: Nick Piggin @ 2006-06-18 7:49 UTC (permalink / raw) To: Mike Galbraith Cc: Andrew Morton, sam, vatsa, dev, mingo, pwil3058, sekharan, balbir, linux-kernel, maeda.naoaki, kurosawa Mike Galbraith wrote: > On Sat, 2006-06-17 at 23:42 -0700, Andrew Morton wrote: > >>On Sun, 18 Jun 2006 16:11:18 +1000 >>Nick Piggin <nickpiggin@yahoo.com.au> wrote: > > >>>Again, I don't care about the solutions at this stage. I want to know >>>what the problem is. Please? >> >>Isolation. To prevent one group of processes from damaging the performance >>of other groups, by providing manageability of the resource consumption of >>each group. There are plenty of applications of this, not just >>server-consolidation-via-server-virtualisation. > > > Scheduling contexts do sound useful. They're easily defeated though, as > evolution mail demonstrates to me every time it's GUI hangs and I see > that a nice 19 find is running, eating very little CPU, but effectively > DoSing evolution nonetheless (journal). I wonder how often people who > tried to distribute CPU would likewise be stymied by other resources. Not entirely infrequently. Which is why it really doesn't seem like it could be useful from a security point of view without a *huge* amount of work and complexity... and even from a guaranteed-service point of view, it still seems (to me) like a pretty big and complex problem. As a check box for marketing it sounds pretty cool though, I admit ;) -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-18 7:36 ` [RFC] CPU controllers? Mike Galbraith 2006-06-18 7:49 ` Nick Piggin 2006-06-18 7:49 ` Nick Piggin @ 2006-06-18 9:09 ` Andrew Morton 2006-06-18 9:49 ` Mike Galbraith 2006-06-19 18:21 ` Chris Friesen 3 siblings, 1 reply; 36+ messages in thread From: Andrew Morton @ 2006-06-18 9:09 UTC (permalink / raw) To: Mike Galbraith Cc: nickpiggin, sam, vatsa, dev, mingo, pwil3058, sekharan, balbir, linux-kernel, maeda.naoaki, kurosawa On Sun, 18 Jun 2006 09:36:16 +0200 Mike Galbraith <efault@gmx.de> wrote: > as > evolution mail demonstrates to me every time it's GUI hangs and I see > that a nice 19 find is running, eating very little CPU, but effectively > DoSing evolution nonetheless (journal). eh? That would be an io scheduler bug, wouldn't it? Tell us more. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-18 9:09 ` Andrew Morton @ 2006-06-18 9:49 ` Mike Galbraith 2006-06-19 6:28 ` Mike Galbraith 0 siblings, 1 reply; 36+ messages in thread From: Mike Galbraith @ 2006-06-18 9:49 UTC (permalink / raw) To: Andrew Morton Cc: nickpiggin, sam, vatsa, dev, mingo, pwil3058, sekharan, balbir, linux-kernel, maeda.naoaki, kurosawa

On Sun, 2006-06-18 at 02:09 -0700, Andrew Morton wrote:
> On Sun, 18 Jun 2006 09:36:16 +0200
> Mike Galbraith <efault@gmx.de> wrote:
>
> > as
> > evolution mail demonstrates to me every time its GUI hangs and I see
> > that a nice 19 find is running, eating very little CPU, but effectively
> > DoSing evolution nonetheless (journal).
>
> eh? That would be an io scheduler bug, wouldn't it?
>
> Tell us more.

The trace below was done with a nice -n 19 bonnie -s 2047 running, but the same happens with the find that SuSE starts at annoying times. Scheduler is cfq, but changing schedulers doesn't help. Place a shell window over the evolution window, start io, then click on the evolution window, and see how long it takes to be able to read mail. Here, it's a couple forevers.

evolution D 00000001 0 9324 6938 9333 7851 (NOTLB)
ef322dec 00000000 00000000 00000001 00000003 93a3f580 000f44c2 ef322000
ef322000 ef314030 93a3f580 000f44c2 ef322000 001fb058 ef24d980 ef322000
ef322e50 b10bcb57 00000000 b1399998 ef322e3c 00000001 ef24d9c0 ef24d9d0
Call Trace:
[<b10bcb57>] log_wait_commit+0x139/0x1f1
[<b10b6000>] journal_stop+0x239/0x350
[<b10b6dc8>] journal_force_commit+0x1d/0x1f
[<b10ae32a>] ext3_force_commit+0x24/0x26
[<b10a83a0>] ext3_write_inode+0x34/0x7b
[<b107fa79>] __writeback_single_inode+0x2e8/0x3c9
[<b10803f1>] sync_inode+0x15/0x2f
[<b10a426b>] ext3_sync_file+0xc3/0xc8
[<b10600fc>] do_fsync+0x68/0xb3
[<b1060167>] __do_fsync+0x20/0x2f
[<b1060195>] sys_fsync+0xd/0xf
[<b1002e1b>] syscall_call+0x7/0xb
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-18 9:49 ` Mike Galbraith @ 2006-06-19 6:28 ` Mike Galbraith 2006-06-19 6:35 ` Andrew Morton 0 siblings, 1 reply; 36+ messages in thread From: Mike Galbraith @ 2006-06-19 6:28 UTC (permalink / raw) To: Andrew Morton Cc: nickpiggin, sam, vatsa, dev, mingo, pwil3058, sekharan, balbir, linux-kernel, maeda.naoaki, kurosawa

This is kinda OT for this thread, but here's another example of where the IO can easily foil CPU distribution plans. I wonder how many folks get nailed by /proc being mounted without noatime,nodiratime like I just apparently did.

top D E29B4928 0 10174 8510 (NOTLB)
d2f63c4c 00100100 00200200 e29b4928 ea07f3c0 f1510c40 000f6e66 d2f63000
d2f63000 ed88c550 f1510c40 000f6e66 d2f63000 d2f63000 ed062220 ed88c550
d2f63c70 b139a97b ed062224 efef8df8 ed062224 ed88c550 d2f63000 0000385a
Call Trace:
[<b139a97b>] __mutex_lock_slowpath+0x59/0xb0
[<b139a9d7>] .text.lock.mutex+0x5/0x14
[<b10bb24f>] __log_wait_for_space+0x53/0xb4
[<b10b67b4>] start_this_handle+0x100/0x617
[<b10b6d86>] journal_start+0xbb/0xe0
[<b10ae10e>] ext3_journal_start_sb+0x29/0x4a
[<b10a8d9f>] ext3_dirty_inode+0x2a/0xaf
[<b1080171>] __mark_inode_dirty+0x2a/0x19e
[<b107784a>] touch_atime+0x79/0x9f
[<b103fda5>] do_generic_mapping_read+0x370/0x480
[<b1040747>] __generic_file_aio_read+0xf0/0x205
[<b1040896>] generic_file_aio_read+0x3a/0x46
[<b105d919>] do_sync_read+0xbb/0xf1
[<b105e2c1>] vfs_read+0xa4/0x166
[<b105e6c1>] sys_read+0x3d/0x64
[<b1002e1b>] syscall_call+0x7/0xb

netdaemon D EC84ED04 0 7696 1 7711 7695 (NOTLB)
efef8dec 00000000 efef8000 ec84ed04 efef8e00 402cecc0 000f6e68 efef8000
efef8000 ed617030 402cecc0 000f6e68 efef8000 efef8000 ed062220 ed617030
efef8e10 b139a97b ed062224 ed062224 d2f63c58 ed617030 efef8000 0000385a
Call Trace:
[<b139a97b>] __mutex_lock_slowpath+0x59/0xb0
[<b139a9d7>] .text.lock.mutex+0x5/0x14
[<b10bb24f>] __log_wait_for_space+0x53/0xb4
[<b10b67b4>] start_this_handle+0x100/0x617
[<b10b6d86>] journal_start+0xbb/0xe0
[<b10ae10e>] ext3_journal_start_sb+0x29/0x4a
[<b10a8d9f>] ext3_dirty_inode+0x2a/0xaf
[<b1080171>] __mark_inode_dirty+0x2a/0x19e
[<b107784a>] touch_atime+0x79/0x9f
[<b106fc08>] vfs_readdir+0x91/0x93
[<b106fc6a>] sys_getdents64+0x60/0xa7
[<b1002e1b>] syscall_call+0x7/0xb
^ permalink raw reply [flat|nested] 36+ messages in thread
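[Editorial note: the touch_atime() frames in both traces above show every read dirtying an inode and dragging in the ext3 journal; the noatime,nodiratime mount options Mike mentions are the usual mitigation. A sketch, with a hypothetical device and mount point:

```shell
# Illustrative /etc/fstab entry (device and mount point are placeholders):
# suppress access-time updates so pure readers (find, updatedb, bonnie)
# stop dirtying inodes and generating journal traffic on every read.
#
#   /dev/hda2  /home  ext3  defaults,noatime,nodiratime  0  2

# The same options can be applied to an already-mounted filesystem:
#   mount -o remount,noatime,nodiratime /home
```

This trades POSIX-accurate access times for quiet reads; software that relies on atime (some mail readers, tmpwatch-style cleaners) will misbehave under it.]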
* Re: [RFC] CPU controllers? 2006-06-19 6:28 ` Mike Galbraith @ 2006-06-19 6:35 ` Andrew Morton 2006-06-19 6:46 ` Mike Galbraith 0 siblings, 1 reply; 36+ messages in thread From: Andrew Morton @ 2006-06-19 6:35 UTC (permalink / raw) To: Mike Galbraith Cc: nickpiggin, sam, vatsa, dev, mingo, pwil3058, sekharan, balbir, linux-kernel, maeda.naoaki, kurosawa On Mon, 19 Jun 2006 08:28:45 +0200 Mike Galbraith <efault@gmx.de> wrote: > This is kinda OT for this thread, but here's another example of where > the IO can easily foil CPU distribution plans. I wonder how many folks > get nailed by /proc being mounted without noatime,nodiratime like I just > apparently did. > > top D E29B4928 0 10174 8510 (NOTLB) > d2f63c4c 00100100 00200200 e29b4928 ea07f3c0 f1510c40 000f6e66 d2f63000 > d2f63000 ed88c550 f1510c40 000f6e66 d2f63000 d2f63000 ed062220 ed88c550 > d2f63c70 b139a97b ed062224 efef8df8 ed062224 ed88c550 d2f63000 0000385a > Call Trace: > [<b139a97b>] __mutex_lock_slowpath+0x59/0xb0 > [<b139a9d7>] .text.lock.mutex+0x5/0x14 > [<b10bb24f>] __log_wait_for_space+0x53/0xb4 > [<b10b67b4>] start_this_handle+0x100/0x617 > [<b10b6d86>] journal_start+0xbb/0xe0 > [<b10ae10e>] ext3_journal_start_sb+0x29/0x4a > [<b10a8d9f>] ext3_dirty_inode+0x2a/0xaf > [<b1080171>] __mark_inode_dirty+0x2a/0x19e > [<b107784a>] touch_atime+0x79/0x9f > [<b103fda5>] do_generic_mapping_read+0x370/0x480 > [<b1040747>] __generic_file_aio_read+0xf0/0x205 > [<b1040896>] generic_file_aio_read+0x3a/0x46 > [<b105d919>] do_sync_read+0xbb/0xf1 > [<b105e2c1>] vfs_read+0xa4/0x166 > [<b105e6c1>] sys_read+0x3d/0x64 > [<b1002e1b>] syscall_call+0x7/0xb Confused. What has this to do with /proc? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-19 6:35 ` Andrew Morton @ 2006-06-19 6:46 ` Mike Galbraith 0 siblings, 0 replies; 36+ messages in thread From: Mike Galbraith @ 2006-06-19 6:46 UTC (permalink / raw) To: Andrew Morton Cc: nickpiggin, sam, vatsa, dev, mingo, pwil3058, sekharan, balbir, linux-kernel, maeda.naoaki, kurosawa On Sun, 2006-06-18 at 23:35 -0700, Andrew Morton wrote: > On Mon, 19 Jun 2006 08:28:45 +0200 > Mike Galbraith <efault@gmx.de> wrote: > > > This is kinda OT for this thread, but here's another example of where > > the IO can easily foil CPU distribution plans. I wonder how many folks > > get nailed by /proc being mounted without noatime,nodiratime like I just > > apparently did. > > > > top D E29B4928 0 10174 8510 (NOTLB) > > d2f63c4c 00100100 00200200 e29b4928 ea07f3c0 f1510c40 000f6e66 d2f63000 > > d2f63000 ed88c550 f1510c40 000f6e66 d2f63000 d2f63000 ed062220 ed88c550 > > d2f63c70 b139a97b ed062224 efef8df8 ed062224 ed88c550 d2f63000 0000385a > > Call Trace: > > [<b139a97b>] __mutex_lock_slowpath+0x59/0xb0 > > [<b139a9d7>] .text.lock.mutex+0x5/0x14 > > [<b10bb24f>] __log_wait_for_space+0x53/0xb4 > > [<b10b67b4>] start_this_handle+0x100/0x617 > > [<b10b6d86>] journal_start+0xbb/0xe0 > > [<b10ae10e>] ext3_journal_start_sb+0x29/0x4a > > [<b10a8d9f>] ext3_dirty_inode+0x2a/0xaf > > [<b1080171>] __mark_inode_dirty+0x2a/0x19e > > [<b107784a>] touch_atime+0x79/0x9f > > [<b103fda5>] do_generic_mapping_read+0x370/0x480 > > [<b1040747>] __generic_file_aio_read+0xf0/0x205 > > [<b1040896>] generic_file_aio_read+0x3a/0x46 > > [<b105d919>] do_sync_read+0xbb/0xf1 > > [<b105e2c1>] vfs_read+0xa4/0x166 > > [<b105e6c1>] sys_read+0x3d/0x64 > > [<b1002e1b>] syscall_call+0x7/0xb > > Confused. What has this to do with /proc? /me assumed... with usual result. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-18 7:36 ` [RFC] CPU controllers? Mike Galbraith ` (2 preceding siblings ...) 2006-06-18 9:09 ` Andrew Morton @ 2006-06-19 18:21 ` Chris Friesen 2006-06-20 6:20 ` Mike Galbraith 3 siblings, 1 reply; 36+ messages in thread From: Chris Friesen @ 2006-06-19 18:21 UTC (permalink / raw) To: Mike Galbraith Cc: Andrew Morton, Nick Piggin, sam, vatsa, dev, mingo, pwil3058, sekharan, balbir, linux-kernel, maeda.naoaki, kurosawa Mike Galbraith wrote: > Scheduling contexts do sound useful. They're easily defeated though, as > evolution mail demonstrates to me every time it's GUI hangs and I see > that a nice 19 find is running, eating very little CPU, but effectively > DoSing evolution nonetheless (journal). I wonder how often people who > tried to distribute CPU would likewise be stymied by other resources. We do a lot with diskless blades. Basically cpu(s), memory, and network ports. For this case, cpu, memory, and network controllers are sufficient. Even just cpu gets you a long way, since mostly we're not IO-intensive and we generally have a pretty good idea of memory consumption. Chris ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-19 18:21 ` Chris Friesen @ 2006-06-20 6:20 ` Mike Galbraith 0 siblings, 0 replies; 36+ messages in thread From: Mike Galbraith @ 2006-06-20 6:20 UTC (permalink / raw) To: Chris Friesen Cc: Andrew Morton, Nick Piggin, sam, vatsa, dev, mingo, pwil3058, sekharan, balbir, linux-kernel, maeda.naoaki, kurosawa On Mon, 2006-06-19 at 12:21 -0600, Chris Friesen wrote: > Mike Galbraith wrote: > > > Scheduling contexts do sound useful. They're easily defeated though, as > > evolution mail demonstrates to me every time it's GUI hangs and I see > > that a nice 19 find is running, eating very little CPU, but effectively > > DoSing evolution nonetheless (journal). I wonder how often people who > > tried to distribute CPU would likewise be stymied by other resources. > > We do a lot with diskless blades. Basically cpu(s), memory, and network > ports. > > For this case, cpu, memory, and network controllers are sufficient. > Even just cpu gets you a long way, since mostly we're not IO-intensive > and we generally have a pretty good idea of memory consumption. Sure. Some conflicts can be avoided with foreknowledge, and those conflicts that do occur don't necessarily make limits worthless or unmanageable. Nonetheless, I can imagine them becoming problematic. -Mike ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-18 5:53 ` Sam Vilain 2006-06-18 6:11 ` Nick Piggin @ 2006-06-18 7:18 ` Srivatsa Vaddagiri 2006-06-19 2:07 ` Sam Vilain 1 sibling, 1 reply; 36+ messages in thread From: Srivatsa Vaddagiri @ 2006-06-18 7:18 UTC (permalink / raw) To: Sam Vilain Cc: Nick Piggin, Kirill Korotaev, Mike Galbraith, Ingo Molnar, Peter Williams, Andrew Morton, sekharan, Balbir Singh, linux-kernel, maeda.naoaki, kurosawa On Sun, Jun 18, 2006 at 05:53:42PM +1200, Sam Vilain wrote: > Bear in mind that we have on the table at least one group of scheduling > solutions (timeslice scaling based ones, such as the VServer one) which > is virtually no overhead and could potentially provide the "jumpers" > necessary for implementing more complex scheduling policies. Sam, Do you have any plans to post the vserver CPU control implementation hooked against maybe Resource Groups (for grouping tasks)? Seeing several different implementation against current kernel may perhaps help maintainers decide what they like and what they don't? -- Regards, vatsa ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-18 7:18 ` Srivatsa Vaddagiri @ 2006-06-19 2:07 ` Sam Vilain 2006-06-19 7:04 ` MAEDA Naoaki 0 siblings, 1 reply; 36+ messages in thread From: Sam Vilain @ 2006-06-19 2:07 UTC (permalink / raw) To: vatsa Cc: Nick Piggin, Kirill Korotaev, Mike Galbraith, Ingo Molnar, Peter Williams, Andrew Morton, sekharan, Balbir Singh, linux-kernel, maeda.naoaki, kurosawa, ckrm-tech Srivatsa Vaddagiri wrote: > On Sun, Jun 18, 2006 at 05:53:42PM +1200, Sam Vilain wrote: > >> Bear in mind that we have on the table at least one group of scheduling >> solutions (timeslice scaling based ones, such as the VServer one) which >> is virtually no overhead and could potentially provide the "jumpers" >> necessary for implementing more complex scheduling policies. >> > Do you have any plans to post the vserver CPU control > implementation hooked up against, say, Resource Groups (for grouping > tasks)? Seeing several different implementations against the current > kernel may help maintainers decide what they like and what they > don't? That sounds like a good idea, I like the Resource Groups concept in general and it would be good to be able to fit this into a more generic and comprehensive framework. I'll try it against Chandra and Maeda's Apr 27 submission (a shame I missed it the first time around), and see how far I get. [goes away a bit] ok, so basically the bit in cpu_rc_load() where for_each_cpu_mask() is called, in Maeda Naoaki's patch "CPU controller - Add class load estimation support", is where O(N) creeps in that could be remedied with a token bucket algorithm. You don't want this because if you have 10,000 processes on a system in two resource groups, the aggregate performance will suffer due to the large number of cacheline misses during the 5,000 size loop that runs every resched. 
To apply the token bucket here, you would first change the per-CPU struct cpu_rc to have the TBF fields; minimally:

    int tokens;              /* current number of CPU tokens */

    int fill_rate[2];        /* Fill rate: add X tokens... */
    int interval[2];         /* Divisor: per Y jiffies */
    int tokens_max;          /* Limit: no more than N tokens */

    unsigned long last_time; /* last time accounted */

(note: the VServer implementation has several other fields for various reasons; the above are the important ones). Then, in cpu_rc_record_allocation(), you'd take the length of the slice out of the bucket (subtract from tokens). In cpu_rc_account(), you would then "refund" unused CPU tokens back. The approach in Linux-VServer is to remove tokens every scheduler_tick(), but perhaps there are advantages to doing it the way you are in the CPU controller Resource Groups patch. That part should obviate the need for cpu_rc_load() altogether. Then, in cpu_rc_scale_timeslice(), you would make it add a bonus depending on (tokens / tokens_max); I found a quadratic back-off, scaling 0% full to a +15 penalty, 100% full to a -5 bonus and 50% full to no bonus, worked well - in my simple purely CPU-bound process tests using tight-loop processes. Note that when the bucket reaches 0, there is a choice to keep allocating short timeslices anyway, under the presumption that the system has CPU to burn (sched_soft), or to put all processes in that RC on hold (sched_hard). This could potentially be controlled by flags on the bucket - as well as the size of the boost. 
Hence, the "jumpers" I refer to are the bucket parameters - for instance, if you set the tokens_max to ~HZ, and have a suitably high priority/RT task monitoring the buckets, then that process should be able to:

- get a complete record of how many tokens were used by a RC since it last checked,
- influence subsequent scheduling priority of the RC, by adjusting the fill rate, current tokens value, the size of the boost, or the "sched_hard" flag

...and it could probably do that with very occasional timeslices, such as one slice per N*HZ (where N ~ the number of resource groups). So that makes it a candidate for moving to userland. The current VServer implementation fails to schedule fairly when the CPU allocations do not add up correctly; if you only allocated 25% of CPU to one vserver, then 40% to another, and they are both busy, they might end up both with empty buckets and an equal +15 penalty - effectively using 50/50 CPU and allocating very short timeslices, yielding poor batch performance. So, with (possibly userland) policy monitoring for this sort of condition and adjusting bucket sizes and levels appropriately, that old "problem" that leads people to conclude that the VServer scheduler does not work could be solved - all without incurring major overhead even on very busy systems. I think that the characteristics of these two approaches are subtly different. Both scale timeslices, but in a different way - instead of estimating the load and scaling back timeslices up front, busy Resource Groups are relied on to deplete their tokens in a timely manner, and get shorter slices allocated because of that. No doubt from 10,000 feet they both look the same. There is probably enough information here for an implementation, but I'll wait for feedback on this post before going any further with it. Sam. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-19 2:07 ` Sam Vilain @ 2006-06-19 7:04 ` MAEDA Naoaki 2006-06-19 8:19 ` Sam Vilain 0 siblings, 1 reply; 36+ messages in thread From: MAEDA Naoaki @ 2006-06-19 7:04 UTC (permalink / raw) To: Sam Vilain Cc: vatsa, Nick Piggin, Kirill Korotaev, Mike Galbraith, Ingo Molnar, Peter Williams, Andrew Morton, sekharan, Balbir Singh, linux-kernel, kurosawa, ckrm-tech, MAEDA Naoaki Sam Vilain wrote: > Srivatsa Vaddagiri wrote: >> On Sun, Jun 18, 2006 at 05:53:42PM +1200, Sam Vilain wrote: >> >>> Bear in mind that we have on the table at least one group of scheduling >>> solutions (timeslice scaling based ones, such as the VServer one) which >>> is virtually no overhead and could potentially provide the "jumpers" >>> necessary for implementing more complex scheduling policies. >>> >> Do you have any plans to post the vserver CPU control >> implementation hooked up against, say, Resource Groups (for grouping >> tasks)? Seeing several different implementations against the current >> kernel may help maintainers decide what they like and what they >> don't? > > That sounds like a good idea, I like the Resource Groups concept in > general and it would be good to be able to fit this into a more generic > and comprehensive framework. That sounds nice. > I'll try it against Chandra and Maeda's Apr 27 submission (a shame I > missed it the first time around), and see how far I get. > > [goes away a bit] > > ok, so basically the bit in cpu_rc_load() where for_each_cpu_mask() is > called, in Maeda Naoaki's patch "CPU controller - Add class load > estimation support", is where O(N) creeps in that could be remedied with > a token bucket algorithm. You don't want this because if you have 10,000 > processes on a system in two resource groups, the aggregate performance > will suffer due to the large number of cacheline misses during the 5,000 > size loop that runs every resched. Thank you for looking at the code. 
cpu_rc_load() is never called unless sysadm tries to access the load information via configfs from userland. In addition, it sums up per-CPU group stats, so the size of the loop is the number of CPUs, not the number of processes in the group. However, there is a similar loop in cpu_rc_recalc_tsfactor(), which runs every CPU_RC_RECALC_INTERVAL (defined as HZ). I don't think it will cause a big performance penalty.

> To apply the token bucket here, you would first change the per-CPU > struct cpu_rc to have the TBF fields; minimally:
>
>     int tokens;              /* current number of CPU tokens */
>
>     int fill_rate[2];        /* Fill rate: add X tokens... */
>     int interval[2];         /* Divisor: per Y jiffies */
>     int tokens_max;          /* Limit: no more than N tokens */
>
>     unsigned long last_time; /* last time accounted */
>
> (note: the VServer implementation has several other fields for various > reasons; the above are the important ones). > > Then, in cpu_rc_record_allocation(), you'd take the length of the slice > out of the bucket (subtract from tokens). In cpu_rc_account(), you would > then "refund" unused CPU tokens back. The approach in Linux-VServer is > to remove tokens every scheduler_tick(), but perhaps there are > advantages to doing it the way you are in the CPU controller Resource > Groups patch. > > That part should obviate the need for cpu_rc_load() altogether. > > Then, in cpu_rc_scale_timeslice(), you would make it add a bonus > depending on (tokens / tokens_max); I found a quadratic back-off, > scaling 0% full to a +15 penalty, 100% full to a -5 bonus and 50% full > to no bonus, worked well - in my simple purely CPU-bound process tests > using tight-loop processes. > > Note that when the bucket reaches 0, there is a choice to keep > allocating short timeslices anyway, under the presumption that the > system has CPU to burn (sched_soft), or to put all processes in that RC > on hold (sched_hard). This could potentially be controlled by flags on > the bucket - as well as the size of the boost. 
> > Hence, the "jumpers" I refer to are the bucket parameters - for > instance, if you set the tokens_max to ~HZ, and have a suitably high > priority/RT task monitoring the buckets, then that process should be > able to: > > - get a complete record of how many tokens were used by a RC since it > last checked, > - influence subsequent scheduling priority of the RC, by adjusting the > fill rate, current tokens value, the size of the boost, or the > "sched_hard" flag > > ...and it could probably do that with very occasional timeslices, such > as one slice per N*HZ (where N ~ the number of resource groups). So that > makes it a candidate for moving to userland. > > The current VServer implementation fails to schedule fairly when the CPU > allocations do not add up correctly; if you only allocated 25% of CPU to > one vserver, then 40% to another, and they are both busy, they might end > up both with empty buckets and an equal +15 penalty - effectively using > 50/50 CPU and allocating very short timeslices, yielding poor batch > performance. > > So, with (possibly userland) policy monitoring for this sort of > condition and adjusting bucket sizes and levels appropriately, that old > "problem" that leads people to conclude that the VServer scheduler does > not work could be solved - all without incurring major overhead even on > very busy systems. > > I think that the characteristics of these two approaches are subtly > different. Both scale timeslices, but in a different way - instead of > estimating the load and scaling back timeslices up front, busy Resource > Groups are relied on to deplete their tokens in a timely manner, and get > shorter slices allocated because of that. No doubt from 10,000 feet they > both look the same. The current O(1) scheduler gives an extra bonus to interactive tasks by requeuing them to the active array for a while. It would break the controller's efforts. So, I'm planning to stop the interactive task requeuing if the target share isn't met. 
Is there a similar issue on the vserver scheduler? > There is probably enough information here for an implementation, but > I'll wait for feedback on this post before going any further with it. > > Sam. Thanks, MAEDA Naoaki ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-19 7:04 ` MAEDA Naoaki @ 2006-06-19 8:19 ` Sam Vilain 2006-06-19 8:41 ` MAEDA Naoaki 0 siblings, 1 reply; 36+ messages in thread From: Sam Vilain @ 2006-06-19 8:19 UTC (permalink / raw) To: MAEDA Naoaki Cc: vatsa, Nick Piggin, Kirill Korotaev, Mike Galbraith, Ingo Molnar, Peter Williams, Andrew Morton, sekharan, Balbir Singh, linux-kernel, kurosawa, ckrm-tech MAEDA Naoaki wrote: >> ok, so basically the bit in cpu_rc_load() where for_each_cpu_mask() is >> called, in Maeda Naoaki's patch "CPU controller - Add class load >> estimation support", is where O(N) creeps in that could be remedied with >> a token bucket algorithm. You don't want this because if you have 10,000 >> processes on a system in two resource groups, the aggregate performance >> will suffer due to the large number of cacheline misses during the 5,000 >> size loop that runs every resched. >> > > Thank you for looking at the code. > > cpu_rc_load() is never called unless sysadm tries to access the load > information via configfs from userland. In addition, it sums up per-CPU > group stats, so the size of the loop is the number of CPUs, not the > number of processes in the group. > > However, there is a similar loop in cpu_rc_recalc_tsfactor(), which runs > every CPU_RC_RECALC_INTERVAL (defined as HZ). I don't think it > will cause a big performance penalty. > Ok, so that's not as bad as it looked. So, while it is still O(N), the fact that it is O(N/HZ) makes this not a problem until you get to possibly impractical levels of runqueue length. I'm thinking it's probably worth doing anyway, just so that it can be performance-tested to see if this performance guesstimate is accurate. >> To apply the token bucket here, you would first change the per-CPU >> struct cpu_rc to have the TBF fields; minimally: >> >> [...] >> I think that the characteristics of these two approaches are subtly >> different. 
Both scale timeslices, but in a different way - instead of >> estimating the load and scaling back timeslices up front, busy Resource >> Groups are relied on to deplete their tokens in a timely manner, and get >> shorter slices allocated because of that. No doubt from 10,000 feet they >> both look the same. >> > > The current O(1) scheduler gives an extra bonus to interactive tasks by > requeuing them to the active array for a while. It would break > the controller's efforts. So, I'm planning to stop the interactive > task requeuing if the target share isn't met. > > Is there a similar issue on the vserver scheduler? > Not an issue - those extra requeued timeslices are accounted for normally. Sam. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-19 8:19 ` Sam Vilain @ 2006-06-19 8:41 ` MAEDA Naoaki 2006-06-19 8:53 ` Sam Vilain 0 siblings, 1 reply; 36+ messages in thread From: MAEDA Naoaki @ 2006-06-19 8:41 UTC (permalink / raw) To: Sam Vilain Cc: vatsa, Nick Piggin, Kirill Korotaev, Mike Galbraith, Ingo Molnar, Peter Williams, Andrew Morton, sekharan, Balbir Singh, linux-kernel, kurosawa, ckrm-tech, MAEDA Naoaki Sam Vilain wrote: > MAEDA Naoaki wrote: >>> ok, so basically the bit in cpu_rc_load() where for_each_cpu_mask() is >>> called, in Maeda Naoaki's patch "CPU controller - Add class load >>> estimation support", is where O(N) creeps in that could be remedied with >>> a token bucket algorithm. You don't want this because if you have 10,000 >>> processes on a system in two resource groups, the aggregate performance >>> will suffer due to the large number of cacheline misses during the 5,000 >>> size loop that runs every resched. >>> >> Thank you for looking at the code. >> >> cpu_rc_load() is never called unless sysadm tries to access the load >> information via configfs from userland. In addition, it sums up per-CPU >> group stats, so the size of the loop is the number of CPUs, not the >> number of processes in the group. >> >> However, there is a similar loop in cpu_rc_recalc_tsfactor(), which runs >> every CPU_RC_RECALC_INTERVAL (defined as HZ). I don't think it >> will cause a big performance penalty. >> > > Ok, so that's not as bad as it looked. So, while it is still O(N), the > fact that it is O(N/HZ) makes this not a problem until you get to > possibly impractical levels of runqueue length. Do you mean N is the size of the loop? for_each_cpu_mask() loops once per CPU. It is not directly related to runqueue length. > I'm thinking it's probably worth doing anyway, just so that it can be > performance-tested to see if this performance guesstimate is accurate. 
> >>> To apply the token bucket here, you would first change the per-CPU >>> struct cpu_rc to have the TBF fields; minimally: >>> >>> [...] >>> I think that the characteristics of these two approaches are subtly >>> different. Both scale timeslices, but in a different way - instead of >>> estimating the load and scaling back timeslices up front, busy Resource >>> Groups are relied on to deplete their tokens in a timely manner, and get >>> shorter slices allocated because of that. No doubt from 10,000 feet they >>> both look the same. >>> >> The current O(1) scheduler gives an extra bonus to interactive tasks by >> requeuing them to the active array for a while. It would break >> the controller's efforts. So, I'm planning to stop the interactive >> task requeuing if the target share isn't met. >> >> Is there a similar issue on the vserver scheduler? >> > > Not an issue - those extra requeued timeslices are accounted for normally. It's great. Thanks, MAEDA Naoaki ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-19 8:41 ` MAEDA Naoaki @ 2006-06-19 8:53 ` Sam Vilain 2006-06-19 21:44 ` MAEDA Naoaki 0 siblings, 1 reply; 36+ messages in thread From: Sam Vilain @ 2006-06-19 8:53 UTC (permalink / raw) To: MAEDA Naoaki Cc: vatsa, Nick Piggin, Kirill Korotaev, Mike Galbraith, Ingo Molnar, Peter Williams, Andrew Morton, sekharan, Balbir Singh, linux-kernel, kurosawa, ckrm-tech MAEDA Naoaki wrote: >> Ok, so that's not as bad as it looked. So, while it is still O(N), the >> fact that it is O(N/HZ) makes this not a problem until you get to >> possibly impractical levels of runqueue length. >> > > Do you mean N is the size of the loop? for_each_cpu_mask() loops > once per CPU. It is not directly related to runqueue length. > Ok, I mistook it for a per-task loop. Well, let me know if you think it's worth trying it out anyway. Sam. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-19 8:53 ` Sam Vilain @ 2006-06-19 21:44 ` MAEDA Naoaki 0 siblings, 0 replies; 36+ messages in thread From: MAEDA Naoaki @ 2006-06-19 21:44 UTC (permalink / raw) To: Sam Vilain Cc: vatsa, Nick Piggin, Kirill Korotaev, Mike Galbraith, Ingo Molnar, Peter Williams, Andrew Morton, sekharan, Balbir Singh, linux-kernel, kurosawa, ckrm-tech, MAEDA Naoaki Sam Vilain wrote: > MAEDA Naoaki wrote: >>> Ok, so that's not as bad as it looked. So, while it is still O(N), the >>> fact that it is O(N/HZ) makes this not a problem until you get to >>> possibly impractical levels of runqueue length. >>> >> Do you mean N is the size of the loop? for_each_cpu_mask() loops >> once per CPU. It is not directly related to runqueue length. >> > > Ok, I mistook it for a per-task loop. > > Well, let me know if you think it's worth trying it out anyway. I don't think this loop would be a bottleneck, but testing by a third person is always valuable. Thanks, MAEDA Naoaki ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-17 8:48 ` Nick Piggin 2006-06-17 15:55 ` Balbir Singh 2006-06-17 16:48 ` Srivatsa Vaddagiri @ 2006-06-19 18:14 ` Chris Friesen 2006-06-19 19:11 ` Chandra Seetharaman 2 siblings, 1 reply; 36+ messages in thread From: Chris Friesen @ 2006-06-19 18:14 UTC (permalink / raw) To: Nick Piggin Cc: vatsa, Sam Vilain, Kirill Korotaev, Mike Galbraith, Ingo Molnar, Peter Williams, Andrew Morton, sekharan, Balbir Singh, linux-kernel Nick Piggin wrote: > So, from my POV, I would like to be convinced of the need for this first. > I would really love to be able to keep core kernel simple and fast even if > it means edge cases might need to use a slightly different solution. We currently use a heavily modified CKRM version "e". The "resource groups" (formerly known as CKRM) cpu controls express what we'd like to do, but they aren't nearly accurate enough. We don't make use of the limits, but we do use per-cpu guarantees, along with the hierarchy concept. Our engineering guys need to be able to make cpu guarantees for the various types of processes. "main server app gets 90%, these fault handling guys normally get 2% but should be able to burst to 100% for up to 100ms, that other group gets 5% in total, but a subset of them should get priority over the others, and this little guy here should only be guaranteed .5% but it should take priority over everything else on the system as long as it hasn't used all its allocation". Ideally they'd really like sub-percentage (.1% would be nice, but .5% is probably more realistic) accuracy over the divisions. This should be expressed per-cpu, and tasks should be migrated as necessary to maintain fairness. (I.e., a task belonging to a group with 50% on each cpu should be able to run essentially continuously, bouncing back and forth between cpus.) In our case, predictability/fairness comes first, then performance. 
If a method is accepted into mainline, it would be nice to have NPTL support it as a thread attribute so that different threads can be in different groups. Chris ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-19 18:14 ` Chris Friesen @ 2006-06-19 19:11 ` Chandra Seetharaman 2006-06-19 20:28 ` Chris Friesen 0 siblings, 1 reply; 36+ messages in thread From: Chandra Seetharaman @ 2006-06-19 19:11 UTC (permalink / raw) To: Chris Friesen Cc: Nick Piggin, vatsa, Sam Vilain, Kirill Korotaev, Mike Galbraith, Ingo Molnar, Peter Williams, Andrew Morton, Balbir Singh, linux-kernel On Mon, 2006-06-19 at 12:14 -0600, Chris Friesen wrote: > Nick Piggin wrote: > > > So, from my POV, I would like to be convinced of the need for this first. > > I would really love to be able to keep core kernel simple and fast even if > > it means edge cases might need to use a slightly different solution. > > We currently use a heavily modified CKRM version "e". > > The "resource groups" (formerly known as CKRM) cpu controls express what > we'd like to do, but they aren't nearly accurate enough. We don't make > use of the limits, but we do use per-cpu guarantees, along with the > hierarchy concept. > > Our engineering guys need to be able to make cpu guarantees for the > various types of processes. "main server app gets 90%, these fault > handling guys normally get 2% but should be able to burst to 100% for up > to 100ms, that other group gets 5% in total, but a subset of them should > get priority over the others, and this little guy here should only be > guaranteed .5% but it should take priority over everything else on the > system as long as it hasn't used all its allocation". > > Ideally they'd really like sub-percentage (.1% would be nice, but .5% is > probably more realistic) accuracy over the divisions. This should be > expressed per-cpu, and tasks should be migrated as necessary to maintain > fairness. (I.e., a task belonging to a group with 50% on each cpu should > be able to run essentially continuously, bouncing back and forth between > cpus.) In our case, predictability/fairness comes first, then performance. 
> > If a method is accepted into mainline, it would be nice to have NPTL > support it as a thread attribute so that different threads can be in > different groups. > Chris, Resource Groups (CKRM) does allow threads to be in different Resource Groups (and since Resource Group assignment is dynamic, a thread can move to a high-priority resource group for a specific operation and get back to its original resource group after the operation is complete). Just wondering if that is sufficient or you _would_ need support from NPTL. chandra > Chris -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sekharan@us.ibm.com | .......you may get it. ---------------------------------------------------------------------- ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC] CPU controllers? 2006-06-19 19:11 ` Chandra Seetharaman @ 2006-06-19 20:28 ` Chris Friesen 0 siblings, 0 replies; 36+ messages in thread From: Chris Friesen @ 2006-06-19 20:28 UTC (permalink / raw) To: sekharan Cc: Nick Piggin, vatsa, Sam Vilain, Kirill Korotaev, Mike Galbraith, Ingo Molnar, Peter Williams, Andrew Morton, Balbir Singh, linux-kernel Chandra Seetharaman wrote: > Resource Groups (CKRM) does allow threads to be in different Resource > Groups (and since Resource Group assignment is dynamic, a thread can > move to a high-priority resource group for a specific operation and get > back to its original resource group after the operation is complete). > > Just wondering if that is sufficient or you _would_ need support from > NPTL. The main issue is that the mapping between pthread_t and PID is only known by NPTL. A thread can find its own PID (for purposes of resource groups) but it can't find the PID of other threads given their pthread_t. Essentially I'm looking for cpu-group equivalents to:

    pthread_setschedparam()
    pthread_getschedparam()
    pthread_attr_setschedpolicy()
    pthread_attr_getschedpolicy()

It's not absolutely critical, but we did add it to our current NPTL. Chris ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2006-06-20 6:17 UTC | newest] Thread overview: 36+ messages -- 2006-06-15 13:46 [RFC] CPU controllers? Srivatsa Vaddagiri 2006-06-15 21:52 ` Sam Vilain 2006-06-15 23:30 ` Peter Williams 2006-06-16 0:42 ` Matt Helsley 2006-06-17 8:48 ` Nick Piggin 2006-06-17 15:55 ` Balbir Singh 2006-06-17 16:48 ` Srivatsa Vaddagiri 2006-06-18 5:06 ` Nick Piggin 2006-06-18 5:53 ` Sam Vilain 2006-06-18 6:11 ` Nick Piggin 2006-06-18 6:40 ` Sam Vilain 2006-06-18 7:17 ` Nick Piggin 2006-06-18 6:42 ` Andrew Morton 2006-06-18 7:28 ` Nick Piggin 2006-06-19 19:03 ` Resource Management Requirements (was "[RFC] CPU controllers?") Chandra Seetharaman 2006-06-20 5:40 ` Srivatsa Vaddagiri 2006-06-18 7:36 ` [RFC] CPU controllers? Mike Galbraith 2006-06-18 7:49 ` Nick Piggin 2006-06-18 7:49 ` Nick Piggin 2006-06-18 9:09 ` Andrew Morton 2006-06-18 9:49 ` Mike Galbraith 2006-06-19 6:28 ` Mike Galbraith 2006-06-19 6:35 ` Andrew Morton 2006-06-19 6:46 ` Mike Galbraith 2006-06-19 18:21 ` Chris Friesen 2006-06-20 6:20 ` Mike Galbraith 2006-06-18 7:18 ` Srivatsa Vaddagiri 2006-06-19 2:07 ` Sam Vilain 2006-06-19 7:04 ` MAEDA Naoaki 2006-06-19 8:19 ` Sam Vilain 2006-06-19 8:41 ` MAEDA Naoaki 2006-06-19 8:53 ` Sam Vilain 2006-06-19 21:44 ` MAEDA Naoaki 2006-06-19 18:14 ` Chris Friesen 2006-06-19 19:11 ` Chandra Seetharaman 2006-06-19 20:28 ` Chris Friesen