Re: [PATCH 00/10] cgroups: Task counter subsystem v6

Linux Container Development

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]     ` <CAAAKZwu67VMiZgdpp=i5p7zyGbOHGHXwF_iprufGPzTLkkUF2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-10-28 23:30       ` Andrew Morton
       [not found]         ` <20111028163021.1ce61f8a.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2011-10-28 23:30 UTC (permalink / raw)
  To: Tim Hockin
  Cc: Aditya Kali, Frederic Weisbecker, Paul Menage, Kay Sievers, LKML,
	Oleg Nesterov, Johannes Weiner, Tejun Heo, Containers

On Tue, 25 Oct 2011 13:06:35 -0700
Tim Hockin <thockin-Rl2oBbRerpQdnm+yROfE0A@public.gmane.org> wrote:

> On Tue, Oct 4, 2011 at 3:01 PM, Andrew Morton <akpm00-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > On Mon, 3 Oct 2011 21:07:02 +0200
> > Frederic Weisbecker <fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >
> >> Hi Andrew,
> >>
> >> This contains minor changes, mostly documentation and changelog
> >> updates, off-case build fix, and a code optimization in
> >> res_counter_common_ancestor().
> >
> > I'd normally duck a patch series like this when we're at -rc8 and ask
> > for it to be resent late in -rc1.  But I was feeling frisky so I
> > grabbed this lot for a bit of testing and will sit on it until -rc1.
> >
> > I'm still not convinced that the kernel has a burning need for a "task
> > counter subsystem".  Someone convince me that we should merge this!
> 
> We have real (accidental) DoS situations which happen because we don't
> have this.  It usually takes the form of some library not re-joining
> threads.  We end up deploying a few apps linked against this library,
> and suddenly we're in trouble on a machine.  Except, this being
> Google, we're in trouble on a lot of machines.

This is a bit foggy.  I think you mean that machines are experiencing
accidental forkbombs?

> There may be other ways to cobble this sort of safety together, but
> they are less appealing for various reasons.  cgroups are how we
> control groups of related pids.
> 
> I'd really love to be able to use this.

Has it been confirmed that this implementation actually solves the
problem?  ie: tested a bit?

btw, Frederic told me that this version of the patchset had some
serious problem so it's on hold pending an upgrade, regardless of other
matters.

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <20111028163021.1ce61f8a.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>]

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]         ` <20111028163021.1ce61f8a.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2011-10-29  9:38           ` Glauber Costa
       [not found]             ` <CAA6-i6o0SPfZJDx4SRR1hY-He0L6zHuv0saH6EaE7Mrc2HF6PA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2011-11-03 17:00           ` Frederic Weisbecker
  1 sibling, 1 reply; 17+ messages in thread
From: Glauber Costa @ 2011-10-29  9:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Aditya Kali, Tim Hockin, Frederic Weisbecker, Paul Menage,
	Kay Sievers, LKML, Oleg Nesterov, Johannes Weiner, Tejun Heo,
	Containers

On Sat, Oct 29, 2011 at 1:30 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Tue, 25 Oct 2011 13:06:35 -0700
> Tim Hockin <thockin@hockin.org> wrote:
>
>> On Tue, Oct 4, 2011 at 3:01 PM, Andrew Morton <akpm00@gmail.com> wrote:
>> > On Mon, 3 Oct 2011 21:07:02 +0200
>> > Frederic Weisbecker <fweisbec@gmail.com> wrote:
>> >
>> >> Hi Andrew,
>> >>
>> >> This contains minor changes, mostly documentation and changelog
>> >> updates, off-case build fix, and a code optimization in
>> >> res_counter_common_ancestor().
>> >
>> > I'd normally duck a patch series like this when we're at -rc8 and ask
>> > for it to be resent late in -rc1.  But I was feeling frisky so I
>> > grabbed this lot for a bit of testing and will sit on it until -rc1.
>> >
>> > I'm still not convinced that the kernel has a burning need for a "task
>> > counter subsystem".  Someone convince me that we should merge this!
>>
>> We have real (accidental) DoS situations which happen because we don't
>> have this.  It usually takes the form of some library not re-joining
>> threads.  We end up deploying a few apps linked against this library,
>> and suddenly we're in trouble on a machine.  Except, this being
>> Google, we're in trouble on a lot of machines.
>
> This is a bit foggy.  I think you mean that machines are experiencing
> accidental forkbombs?
>
>> There may be other ways to cobble this sort of safety together, but
>> they are less appealing for various reasons.  cgroups are how we
>> control groups of related pids.
>>

At the end of the day, all cgroups are just a group of tasks. So I don't really
get the need to have a cgroup to control the number of tasks in the system.

Why don't we just allow every cgroup to have a limit on the number of
tasks it can hold?




-- 
Sent from my Atari.

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <CAA6-i6o0SPfZJDx4SRR1hY-He0L6zHuv0saH6EaE7Mrc2HF6PA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]             ` <CAA6-i6o0SPfZJDx4SRR1hY-He0L6zHuv0saH6EaE7Mrc2HF6PA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-11-03 16:49               ` Frederic Weisbecker
       [not found]                 ` <20111103164917.GF8198-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Frederic Weisbecker @ 2011-11-03 16:49 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Aditya Kali, Tim Hockin, Paul Menage, Kay Sievers, LKML,
	Oleg Nesterov, Johannes Weiner, Tejun Heo, Andrew Morton,
	Containers

On Sat, Oct 29, 2011 at 11:38:25AM +0200, Glauber Costa wrote:
> On Sat, Oct 29, 2011 at 1:30 AM, Andrew Morton
> <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
> > On Tue, 25 Oct 2011 13:06:35 -0700
> > Tim Hockin <thockin-Rl2oBbRerpQdnm+yROfE0A@public.gmane.org> wrote:
> >
> >> On Tue, Oct 4, 2011 at 3:01 PM, Andrew Morton <akpm00-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >> > On Mon, 3 Oct 2011 21:07:02 +0200
> >> > Frederic Weisbecker <fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >> >
> >> >> Hi Andrew,
> >> >>
> >> >> This contains minor changes, mostly documentation and changelog
> >> >> updates, off-case build fix, and a code optimization in
> >> >> res_counter_common_ancestor().
> >> >
> >> > I'd normally duck a patch series like this when we're at -rc8 and ask
> >> > for it to be resent late in -rc1.  But I was feeling frisky so I
> >> > grabbed this lot for a bit of testing and will sit on it until -rc1.
> >> >
> >> > I'm still not convinced that the kernel has a burning need for a "task
> >> > counter subsystem".  Someone convince me that we should merge this!
> >>
> >> We have real (accidental) DoS situations which happen because we don't
> >> have this.  It usually takes the form of some library not re-joining
> >> threads.  We end up deploying a few apps linked against this library,
> >> and suddenly we're in trouble on a machine.  Except, this being
> >> Google, we're in trouble on a lot of machines.
> >
> > This is a bit foggy.  I think you mean that machines are experiencing
> > accidental forkbombs?
> >
> >> There may be other ways to cobble this sort of safety together, but
> >> they are less appealing for various reasons.  cgroups are how we
> >> control groups of related pids.
> >>
> 
> At the end of the day, all cgroups are just a group of tasks. So I don't really
> get the need to have a cgroup to control the number of tasks in the system.
> 
> Why don't we just allow every cgroup to have a limit on the number of
> tasks it can hold?

Not sure what you mean. You would prefer to have this as a core feature in
cgroups rather than a subsystem?

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <20111103164917.GF8198-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>]

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]                 ` <20111103164917.GF8198-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
@ 2011-11-03 16:58                   ` Glauber Costa
       [not found]                     ` <4EB2C852.6020706-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Glauber Costa @ 2011-11-03 16:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Aditya Kali, Tim Hockin, Glauber Costa, Paul Menage, Kay Sievers,
	LKML, Oleg Nesterov, Johannes Weiner, Tejun Heo, Andrew Morton,
	Paul Turner, Containers

On 11/03/2011 02:49 PM, Frederic Weisbecker wrote:
> On Sat, Oct 29, 2011 at 11:38:25AM +0200, Glauber Costa wrote:
>> On Sat, Oct 29, 2011 at 1:30 AM, Andrew Morton
>> <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>  wrote:
>>> On Tue, 25 Oct 2011 13:06:35 -0700
>>> Tim Hockin<thockin-Rl2oBbRerpQdnm+yROfE0A@public.gmane.org>  wrote:
>>>
>>>> On Tue, Oct 4, 2011 at 3:01 PM, Andrew Morton<akpm00-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>  wrote:
>>>>> On Mon, 3 Oct 2011 21:07:02 +0200
>>>>> Frederic Weisbecker<fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>  wrote:
>>>>>
>>>>>> Hi Andrew,
>>>>>>
>>>>>> This contains minor changes, mostly documentation and changelog
>>>>>> updates, off-case build fix, and a code optimization in
>>>>>> res_counter_common_ancestor().
>>>>>
>>>>> I'd normally duck a patch series like this when we're at -rc8 and ask
>>>>> for it to be resent late in -rc1.  But I was feeling frisky so I
>>>>> grabbed this lot for a bit of testing and will sit on it until -rc1.
>>>>>
>>>>> I'm still not convinced that the kernel has a burning need for a "task
>>>>> counter subsystem".  Someone convince me that we should merge this!
>>>>
>>>> We have real (accidental) DoS situations which happen because we don't
>>>> have this.  It usually takes the form of some library not re-joining
>>>> threads.  We end up deploying a few apps linked against this library,
>>>> and suddenly we're in trouble on a machine.  Except, this being
>>>> Google, we're in trouble on a lot of machines.
>>>
>>> This is a bit foggy.  I think you mean that machines are experiencing
>>> accidental forkbombs?
>>>
>>>> There may be other ways to cobble this sort of safety together, but
>>>> they are less appealing for various reasons.  cgroups are how we
>>>> control groups of related pids.
>>>>
>>
>> At the end of the day, all cgroups are just a group of tasks. So I don't really
>> get the need to have a cgroup to control the number of tasks in the system.
>>
>> Why don't we just allow every cgroup to have a limit on the number of
>> tasks it can hold?
>
> Not sure what you mean. You would prefer to have this as a core feature in
> cgroups rather than a subsystem?
Well, ideally, I think we should put some effort into trying to reduce the
number of different possible cgroup subsystems.

I do see how keeping a different cgroup here adds flexibility. However,
this flexibility very easily translates into performance losses. The
reason is that when more than one cgroup needs to control and update
some piece of data, because we can't assume anything about the set of
processes they have, we have to walk hierarchies upwards multiple times,
since the hierarchies are potentially different.

See, for instance, what happens with cpu vs cpuacct, which I am trying to
get rid of.

Because you are controlling tasks, and tasks are the main building block
of all cgroups, I think you should at least consider either using
a cgroup property, or bundling this into some other cgroup, like cpu,
where there is already some need, albeit minor, to keep track of the
number of processes in a group.

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <4EB2C852.6020706-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>]

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]                     ` <4EB2C852.6020706-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2011-11-03 17:02                       ` Paul Menage
       [not found]                         ` <CALdu-PDY8zpXYM3V9KRk4f2NyGevfNnuaWVdoT-qzSHOK--K3A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Menage @ 2011-11-03 17:02 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Aditya Kali, Kay Sievers, Tim Hockin, Frederic Weisbecker,
	Containers, Johannes Weiner, LKML, Oleg Nesterov, Glauber Costa,
	Tejun Heo, Andrew Morton, Paul Turner

On Thu, Nov 3, 2011 at 9:58 AM, Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> wrote:
>
> Because you are controlling tasks, and tasks are the main building block of
> all cgroups, I think you should at least consider either using
> a cgroup property,

I don't see how making it a core cgroup property would remove the need
to walk the hierarchy.

Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <CALdu-PDY8zpXYM3V9KRk4f2NyGevfNnuaWVdoT-qzSHOK--K3A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]                         ` <CALdu-PDY8zpXYM3V9KRk4f2NyGevfNnuaWVdoT-qzSHOK--K3A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-11-03 17:06                           ` Glauber Costa
       [not found]                             ` <4EB2CA03.7030601-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Glauber Costa @ 2011-11-03 17:06 UTC (permalink / raw)
  To: Paul Menage
  Cc: Aditya Kali, Kay Sievers, Tim Hockin, Frederic Weisbecker,
	Containers, Johannes Weiner, LKML, Oleg Nesterov, Glauber Costa,
	Tejun Heo, Andrew Morton, Paul Turner

On 11/03/2011 03:02 PM, Paul Menage wrote:
> On Thu, Nov 3, 2011 at 9:58 AM, Glauber Costa<glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>  wrote:
>>
>> Because you are controlling tasks, and tasks are the main building block of
>> all cgroups, I think you should at least consider either using
>> a cgroup property,
>
> I don't see how making it a core cgroup property would remove the need
> to walk the hierarchy.
>
Sorry if I wasn't clear: It removes the need to walk multiple 
independent hierarchies. The walk is done only once.

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <4EB2CA03.7030601-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>]

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]                             ` <4EB2CA03.7030601-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2011-11-03 17:28                               ` Paul Menage
       [not found]                                 ` <CALdu-PA2CDoeUMoNd1y44p_QzphX8J4s6NDcSyVC-rP1HGYwkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Menage @ 2011-11-03 17:28 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Aditya Kali, Kay Sievers, Tim Hockin, Frederic Weisbecker,
	Containers, Johannes Weiner, LKML, Oleg Nesterov, Glauber Costa,
	Tejun Heo, Andrew Morton, Paul Turner

On Thu, Nov 3, 2011 at 10:06 AM, Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> wrote:
> Sorry if I wasn't clear: It removes the need to walk multiple independent
> hierarchies. The walk is done only once.

You're talking about at fork time, and the concern is the cache
footprint involved in walking up the parent pointer chain?

Isn't that an argument against multiple hierarchies (which is a
decision for the admin), rather than against more subsystem
flexibility? If multiple subsystems on the same hierarchy each need to
walk up the pointer chain on the same event, then after the first
subsystem has done so the chain will be in cache for any subsequent
walks from other subsystems.

Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <CALdu-PA2CDoeUMoNd1y44p_QzphX8J4s6NDcSyVC-rP1HGYwkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]                                 ` <CALdu-PA2CDoeUMoNd1y44p_QzphX8J4s6NDcSyVC-rP1HGYwkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-11-03 17:35                                   ` Glauber Costa
       [not found]                                     ` <4EB2D0F2.40309-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Glauber Costa @ 2011-11-03 17:35 UTC (permalink / raw)
  To: Paul Menage
  Cc: Aditya Kali, Kay Sievers, Tim Hockin, Frederic Weisbecker,
	Containers, Johannes Weiner, LKML, Oleg Nesterov, Glauber Costa,
	Tejun Heo, Andrew Morton, Paul Turner

On 11/03/2011 03:28 PM, Paul Menage wrote:
> On Thu, Nov 3, 2011 at 10:06 AM, Glauber Costa<glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>  wrote:
>> Sorry if I wasn't clear: It removes the need to walk multiple independent
>> hierarchies. The walk is done only once.
>
> You're talking about at fork time, and the concern is the cache
> footprint involved in walking up the parent pointer chain?

Yes, we can say this is my main concern.

> Isn't that an argument against multiple hierarchies (which is a
> decision for the admin), rather than against more subsystem
> flexibility?

It is not always a decision for the admin. In most cases, it is a
constraint of the problem. For containers (take lxc as an example),
the most reasonable thing to do is to grab all the cgroup subsystems
available, and contain them.

> If multiple subsystems on the same hierarchy each need to
> walk up the pointer chain on the same event, then after the first
> subsystem has done so the chain will be in cache for any subsequent
> walks from other subsystems.
No, it won't. Precisely because different subsystems have completely
independent pointer chains.

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <4EB2D0F2.40309-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>]

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]                                     ` <4EB2D0F2.40309-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2011-11-03 17:56                                       ` Paul Menage
       [not found]                                         ` <CALdu-PDbJ69FayXSd-kjAMX8AKEroZytPapxsUn8GFsz-z1omQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Menage @ 2011-11-03 17:56 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Aditya Kali, Kay Sievers, Tim Hockin, Frederic Weisbecker,
	Containers, Johannes Weiner, LKML, Oleg Nesterov, Glauber Costa,
	Tejun Heo, Andrew Morton, Paul Turner

On Thu, Nov 3, 2011 at 10:35 AM, Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> wrote:
>
>> If multiple subsystems on the same hierarchy each need to
>> walk up the pointer chain on the same event, then after the first
>> subsystem has done so the chain will be in cache for any subsequent
>> walks from other subsystems.
>
> No, it won't. Precisely because different subsystems have completely
> independent pointer chains.

Because they're following res_counter parent pointers, etc, rather
than using the single cgroups parent pointer chain?

So if that's the problem, rather than artificially constrain
flexibility in order to improve micro-benchmarks, why not come up with
approaches that keep both the flexibility and the performance?

- make res_counter hierarchies be explicitly defined via the cgroup
parent pointers, rather than a parent pointer hidden inside the
res_counter. So the cgroup parent chain traversal would all be along
the common parent pointers (and res_counter would be one pointer
smaller).

- allow subsystems to specify that they need a small amount of data
that can be accessed efficiently up the cgroup chain. (Many subsystems
wouldn't need this, and those that do would likely only need it for a
subset of their per-cgroup data). Pack this data into as few
cachelines as possible, allocated as a single lump of memory per
cgroup. Each subsystem would know where in that allocation its private
data lay (it would be the same offset for every cgroup, although
dynamically determined at runtime based on the number of subsystems
mounted on that hierarchy)
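
A minimal user-space sketch of the second idea, as I read it: every
cgroup on a hierarchy gets one lump of "hot" data, and each subsystem
gets an offset into that lump that is fixed per hierarchy but computed
at runtime. All names here (struct hierarchy, hierarchy_add_subsys(),
subsys_hot(), MAX_SUBSYS) are hypothetical and are not existing cgroup
interfaces; real code would also have to deal with locking and
cacheline alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SUBSYS 8   /* hypothetical cap, for the sketch only */

/* Per-hierarchy layout: where each subsystem's hot data lives in the lump. */
struct hierarchy {
    size_t nr_subsys;
    size_t offset[MAX_SUBSYS];
    size_t total;               /* size of the lump allocated per cgroup */
};

/*
 * Register a subsystem that needs `size` bytes of hot data per cgroup.
 * Must be called before any cgroup is allocated on this hierarchy.
 * Returns the subsystem id, or -1 if the table is full.
 */
static int hierarchy_add_subsys(struct hierarchy *h, size_t size)
{
    const size_t align = sizeof(long);

    if (h->nr_subsys == MAX_SUBSYS)
        return -1;
    h->offset[h->nr_subsys] = h->total;
    /* Round up so the next subsystem's data stays long-aligned. */
    h->total += (size + align - 1) / align * align;
    return (int)h->nr_subsys++;
}

/* One cgroup: the common parent pointer plus a single packed allocation. */
struct cgroup_node {
    struct cgroup_node *parent;
    unsigned char *hot;
};

static struct cgroup_node *cgroup_node_new(const struct hierarchy *h,
                                           struct cgroup_node *parent)
{
    struct cgroup_node *cg = malloc(sizeof(*cg));

    if (!cg)
        return NULL;
    cg->hot = calloc(1, h->total ? h->total : 1);
    if (!cg->hot) {
        free(cg);
        return NULL;
    }
    cg->parent = parent;
    return cg;
}

static void cgroup_node_free(struct cgroup_node *cg)
{
    free(cg->hot);
    free(cg);
}

/* Same offset for every cgroup on the hierarchy; only the base differs. */
static void *subsys_hot(const struct hierarchy *h, struct cgroup_node *cg,
                        int id)
{
    assert(id >= 0 && (size_t)id < h->nr_subsys);
    return cg->hot + h->offset[id];
}
```

With this layout a walk up the chain touches one parent pointer and one
packed allocation per level, whichever subsystem is doing the walking.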

Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <CALdu-PDbJ69FayXSd-kjAMX8AKEroZytPapxsUn8GFsz-z1omQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]                                         ` <CALdu-PDbJ69FayXSd-kjAMX8AKEroZytPapxsUn8GFsz-z1omQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-11-04 13:17                                           ` Glauber Costa
  0 siblings, 0 replies; 17+ messages in thread
From: Glauber Costa @ 2011-11-04 13:17 UTC (permalink / raw)
  To: Paul Menage
  Cc: Aditya Kali, Kay Sievers, Tim Hockin, Frederic Weisbecker,
	Containers, Johannes Weiner, LKML, Oleg Nesterov,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Glauber Costa, Tejun Heo,
	Andrew Morton, Paul Turner

On 11/03/2011 03:56 PM, Paul Menage wrote:
> On Thu, Nov 3, 2011 at 10:35 AM, Glauber Costa<glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>  wrote:
>>
>>> If multiple subsystems on the same hierarchy each need to
>>> walk up the pointer chain on the same event, then after the first
>>> subsystem has done so the chain will be in cache for any subsequent
>>> walks from other subsystems.
>>
>> No, it won't. Precisely because different subsystems have completely
>> independent pointer chains.
>
> Because they're following res_counter parent pointers, etc, rather
> than using the single cgroups parent pointer chain?

No. Because:

/sys/fs/cgroup/my_subsys/
/sys/fs/cgroup/my_subsys/foo1
/sys/fs/cgroup/my_subsys/foo2
/sys/fs/cgroup/my_subsys/foo1/bar1

and:

/sys/fs/cgroup/my_subsys2/
/sys/fs/cgroup/my_subsys2/foo1
/sys/fs/cgroup/my_subsys2/foo1/bar1
/sys/fs/cgroup/my_subsys2/foo1/bar2

are completely independent pointer chains. The only thing they share is
the pointer to the root, and that's irrelevant in the pointer dance.
Also note that I used cpu and cpuacct as an example, and they don't use
res_counters.
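
To make the "independent pointer chains" point concrete, here is a
small user-space sketch, not kernel code. struct group and the helper
names are made up for illustration, and the sketch assumes each mounted
hierarchy has its own top-level node, which is one reading of the
layout above. The same task sits in foo1/bar1 on both hierarchies, yet
the two upward walks touch entirely different nodes.

```c
#include <assert.h>
#include <stddef.h>

/* One node in a hierarchy; each mounted hierarchy has its own nodes. */
struct group {
    const char *name;
    struct group *parent;   /* NULL at the top of the hierarchy */
};

/* Walk from a group to the top of its hierarchy, counting nodes touched. */
static size_t walk_up(const struct group *g)
{
    size_t touched = 0;

    for (; g != NULL; g = g->parent)
        touched++;
    return touched;
}

/* Count the nodes that appear on both upward walks (pointer identity). */
static size_t shared_nodes(const struct group *a, const struct group *b)
{
    size_t shared = 0;

    for (const struct group *x = a; x != NULL; x = x->parent)
        for (const struct group *y = b; y != NULL; y = y->parent)
            if (x == y)
                shared++;
    return shared;
}

/* Hierarchy 1: my_subsys/foo1/bar1 */
static struct group s1_root = { "my_subsys", NULL };
static struct group s1_foo1 = { "foo1", &s1_root };
static struct group s1_bar1 = { "bar1", &s1_foo1 };

/* Hierarchy 2: my_subsys2/foo1/bar1, same names but different nodes */
static struct group s2_root = { "my_subsys2", NULL };
static struct group s2_foo1 = { "foo1", &s2_root };
static struct group s2_bar1 = { "bar1", &s2_foo1 };

/* Nodes touched by one event (say, a fork) that both subsystems account. */
static size_t nodes_touched_per_event(void)
{
    return walk_up(&s1_bar1) + walk_up(&s2_bar1);
}
```

Since shared_nodes(&s1_bar1, &s2_bar1) finds nothing in common, the
second walk gets no cache benefit from the first one in this model.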

> So if that's the problem, rather than artificially constrain
> flexibility in order to improve micro-benchmarks, why not come up with
> approaches that keep both the flexibility and the performance?

Well, I am not opposed to that, even if you happen to agree with what I
said above. But at the end of the day, with many cgroups appearing, it
may not be just about micro-benchmarks.

It is hard to draw the line, but I believe that avoiding creating new 
cgroups subsystems when possible plays in our favor.

Specifically for this one, my arguments are:

* cgroups are a task-grouping entity
* therefore, all cgroups already do some task manipulation in attach/detach
* every cgroup subsystem can already register a fork handler

Adding a fork limit as a cgroup property seems a logical step to me 
based on that.
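
Roughly what a fork limit has to do, as a user-space sketch and not the
code from the patchset: at fork time, charge the task's group and every
ancestor, and undo the partial charge if any level is already at its
limit. struct task_group, charge_fork() and uncharge_exit() are
hypothetical names; locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-group task accounting. */
struct task_group {
    long usage;                 /* tasks currently charged here */
    long limit;                 /* maximum tasks allowed here */
    struct task_group *parent;  /* NULL at the top */
};

/*
 * Account one new task in g and in every ancestor.
 * If some level is already at its limit, roll back the levels charged
 * so far and refuse the fork.
 */
static bool charge_fork(struct task_group *g)
{
    struct task_group *cur, *undo;

    for (cur = g; cur != NULL; cur = cur->parent) {
        if (cur->usage >= cur->limit)
            goto rollback;
        cur->usage++;
    }
    return true;

rollback:
    /* Undo only the levels below the one that refused the charge. */
    for (undo = g; undo != cur; undo = undo->parent)
        undo->usage--;
    return false;
}

/* Drop the charge when a task exits. */
static void uncharge_exit(struct task_group *g)
{
    for (; g != NULL; g = g->parent) {
        assert(g->usage > 0);
        g->usage--;
    }
}
```

The walk here is the same upward walk discussed earlier in the thread,
so where the counters live decides how many such walks a fork costs.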

If, however, we are really creating this, I think we'd be better off
referring to this as a "Task Controller" rather than a "Task Counter".

Then at least in the near future, when people start trying to limit other
task-related resources, this can serve as a natural home for them.
(See the syscall limiting that Lukasz is trying to achieve.)

>
> - make res_counter hierarchies be explicitly defined via the cgroup
> parent pointers, rather than a parent pointer hidden inside the
> res_counter. So the cgroup parent chain traversal would all be along
> the common parent pointers (and res_counter would be one pointer
> smaller).
>
> - allow subsystems to specify that they need a small amount of data
> that can be accessed efficiently up the cgroup chain. (Many subsystems
> wouldn't need this, and those that do would likely only need it for a
> subset of their per-cgroup data). Pack this data into as few
> cachelines as possible, allocated as a single lump of memory per
> cgroup. Each subsystem would know where in that allocation its private
> data lay (it would be the same offset for every cgroup, although
> dynamically determined at runtime based on the number of subsystems
> mounted on that hierarchy)
I thought about this second one myself.
I am not yet convinced this would be a win, but I believe there is a chance.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]         ` <20111028163021.1ce61f8a.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  2011-10-29  9:38           ` Glauber Costa
@ 2011-11-03 17:00           ` Frederic Weisbecker
       [not found]             ` <20111103170038.GG8198-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 17+ messages in thread
From: Frederic Weisbecker @ 2011-11-03 17:00 UTC (permalink / raw)
  To: Andrew Morton, Tim Hockin
  Cc: Aditya Kali, Paul Menage, Kay Sievers, LKML, Oleg Nesterov,
	Johannes Weiner, Tejun Heo, Containers

On Fri, Oct 28, 2011 at 04:30:21PM -0700, Andrew Morton wrote:
> On Tue, 25 Oct 2011 13:06:35 -0700
> Tim Hockin <thockin-Rl2oBbRerpQdnm+yROfE0A@public.gmane.org> wrote:
> 
> > On Tue, Oct 4, 2011 at 3:01 PM, Andrew Morton <akpm00-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > > On Mon, 3 Oct 2011 21:07:02 +0200
> > > Frederic Weisbecker <fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > >
> > >> Hi Andrew,
> > >>
> > >> This contains minor changes, mostly documentation and changelog
> > >> updates, off-case build fix, and a code optimization in
> > >> res_counter_common_ancestor().
> > >
> > > I'd normally duck a patch series like this when we're at -rc8 and ask
> > > for it to be resent late in -rc1.  But I was feeling frisky so I
> > > grabbed this lot for a bit of testing and will sit on it until -rc1.
> > >
> > > I'm still not convinced that the kernel has a burning need for a "task
> > > counter subsystem".  Someone convince me that we should merge this!
> > 
> > We have real (accidental) DoS situations which happen because we don't
> > have this.  It usually takes the form of some library not re-joining
> > threads.  We end up deploying a few apps linked against this library,
> > and suddenly we're in trouble on a machine.  Except, this being
> > Google, we're in trouble on a lot of machines.
> 
> This is a bit foggy.  I think you mean that machines are experiencing
> accidental forkbombs?

I'd like to hear about more details as well.

> 
> > There may be other ways to cobble this sort of safety together, but
> > they are less appealing for various reasons.  cgroups are how we
> > control groups of related pids.
> > 
> > I'd really love to be able to use this.
> 
> Has it been confirmed that this implementation actually solves the
> problem?  ie: tested a bit?
> 
> btw, Frederic told me that this version of the patchset had some
> serious problem so it's on hold pending an upgrade, regardless of other
> matters.

Yep. The particular issue is https://lkml.org/lkml/2011/10/13/532

Li Zefan proposed a fix (https://lkml.org/lkml/2011/10/17/26) which I'm
currently reworking.

But then I'd love it if you can test this subsystem to see if it really matches
your needs, Tim.

Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <20111103170038.GG8198-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>]

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]             ` <20111103170038.GG8198-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
@ 2011-11-04  2:57               ` Li Zefan
       [not found]                 ` <4EB3549D.5090404-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Li Zefan @ 2011-11-04  2:57 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Aditya Kali, Tim Hockin, Paul Menage, Kay Sievers, LKML,
	Oleg Nesterov, Johannes Weiner, Tejun Heo, Andrew Morton,
	Containers

>>> There may be other ways to cobble this sort of safety together, but
>>> they are less appealing for various reasons.  cgroups are how we
>>> control groups of related pids.
>>>
>>> I'd really love to be able to use this.
>>
>> Has it been confirmed that this implementation actually solves the
>> problem?  ie: tested a bit?
>>
>> btw, Frederic told me that this version of the patchset had some
>> serious problem so it's on hold pending an upgrade, regardless of other
>> matters.
> 
> Yep. The particular issue is https://lkml.org/lkml/2011/10/13/532
> 
> Li Zefan proposed a fix (https://lkml.org/lkml/2011/10/17/26) which I'm
> currently reworking.
> 

We really need to coordinate cgroup patches. I mean, the patchset+fix conflict
with Tejun's work, and the conflict is not trivial.

> But then I'd love it if you can test this subsystem to see if it really matches
> your needs, Tim.
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <4EB3549D.5090404-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>]

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]                 ` <4EB3549D.5090404-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
@ 2011-11-04 12:37                   ` Frederic Weisbecker
  0 siblings, 0 replies; 17+ messages in thread
From: Frederic Weisbecker @ 2011-11-04 12:37 UTC (permalink / raw)
  To: Li Zefan
  Cc: Aditya Kali, Tim Hockin, Paul Menage, Kay Sievers, LKML,
	Oleg Nesterov, Johannes Weiner, Tejun Heo, Andrew Morton,
	Containers

On Fri, Nov 04, 2011 at 10:57:33AM +0800, Li Zefan wrote:
> >>> There may be other ways to cobble this sort of safety together, but
> >>> they are less appealing for various reasons.  cgroups are how we
> >>> control groups of related pids.
> >>>
> >>> I'd really love to be able to use this.
> >>
> >> Has it been confirmed that this implementation actually solves the
> >> problem?  ie: tested a bit?
> >>
> >> btw, Frederic told me that this version of the patchset had some
> >> serious problem so it's on hold pending an upgrade, regardless of other
> >> matters.
> > 
> > Yep. The particular issue is https://lkml.org/lkml/2011/10/13/532
> > 
> > Li Zefan proposed a fix (https://lkml.org/lkml/2011/10/17/26) which I'm
> > currently reworking.
> > 
> 
> We really need to coordinate cgroup patches. I mean, the patchset+fix conflict
> with Tejun's work, and the conflict is not trivial.

Either Tejun targets -mm, or I try to get my patches into the pm
tree where Tejun's patches are aimed. I would just like to keep Andrew
involved in the process for my patches somehow.

Also it might be time for you and/or Paul Menage to run a cgroup git tree, what do
you think :)

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <1317668832-10784-1-git-send-email-fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found] ` <1317668832-10784-1-git-send-email-fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2011-12-13 15:58   ` Tejun Heo
       [not found]     ` <20111213155848.GI25802-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2011-12-13 15:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Aditya Kali, Tim Hockin, Paul Menage, Kay Sievers, LKML,
	Oleg Nesterov, Johannes Weiner, Andrew Morton, Containers

Hello, Frederic.

Can you please rebase the patchset on top of cgroup/for-3.3?  I
primarily like the idea of being able to track process usage w/ cgroup
and enforce limits on it but hope that it could somehow integrate w/
cgroup freezer, i.e. trigger the freezer if it goes over the limit and let the
userland tool / administrator deal with the frozen cgroup.  I'm
planning on extending cgroup freezer such that it supports recursive
freezing and killing of frozen tasks.  If we can fit task counters
into that, we'll have a general method of handling problematic cgroups:
freeze, notify userland and let it deal with it.

Thank you.

-- 
tejun


[parent not found: <20111213155848.GI25802-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]     ` <20111213155848.GI25802-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2011-12-13 19:06       ` Frederic Weisbecker
       [not found]         ` <20111213190642.GB2421-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Frederic Weisbecker @ 2011-12-13 19:06 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Aditya Kali, Tim Hockin, Paul Menage, Kay Sievers, LKML,
	Oleg Nesterov, Johannes Weiner, Andrew Morton, Containers

On Tue, Dec 13, 2011 at 07:58:48AM -0800, Tejun Heo wrote:
> Hello, Frederic.
> 
> Can you please rebase the patchset on top of cgroup/for-3.3?

Sure. But please note its fate is still under discussion. Whether
we want it upstream is still a running debate. But I certainly
need to rebase against your tree.

> I primarily like the idea of being able to track process usage w/ cgroup
> and enforce limits on it but hope that it could somehow integrate w/
> cgroup freezer, i.e. trigger the freezer if it goes over the limit and let the
> userland tool / administrator deal with the frozen cgroup.  I'm
> planning on extending cgroup freezer such that it supports recursive
> freezing and killing of frozen tasks.  If we can fit task counters
> into that, we'll have a general method of handling problematic cgroups:
> freeze, notify userland and let it deal with it.

Hmm, so you suggest a kernel trigger that freezes the cgroup when the
task limit is reached?

What about implementing register_event() for tasks.usage instead, so
that the user can be notified through an eventfd when the limit is reached?
Then it would be up to the user to decide whether to freeze or do anything
else. That sounds like a more generic solution.

Hm?

Thanks.
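
To make the notification idea above concrete, here is a minimal userland
model in Python. It is only a sketch under stated assumptions, not the
patchset's code: the TaskCounter class, its try_fork() method and the
freeze_on_limit() policy are hypothetical names, and a plain callback
stands in for the eventfd that register_event() would signal in the kernel.

```python
from typing import Callable, List


class TaskCounter:
    """Toy model of a per-cgroup task counter with a hard limit.

    Hypothetical illustration only; it does not mirror kernel code.
    """

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.usage = 0
        self.frozen = False
        self._listeners: List[Callable[["TaskCounter"], None]] = []

    def register_event(self, listener: Callable[["TaskCounter"], None]) -> None:
        # Stand-in for registering an eventfd on tasks.usage.
        self._listeners.append(listener)

    def try_fork(self) -> bool:
        # A frozen group cannot run, so it cannot fork either.
        if self.frozen:
            return False
        if self.usage >= self.limit:
            # Limit reached: reject the fork and notify userland,
            # which then decides what to do with the group.
            for listener in self._listeners:
                listener(self)
            return False
        self.usage += 1
        return True


def freeze_on_limit(counter: TaskCounter) -> None:
    # One possible userland policy: freeze the offending group.
    counter.frozen = True


counter = TaskCounter(limit=3)
events: List[int] = []
counter.register_event(lambda c: events.append(c.usage))
counter.register_event(freeze_on_limit)

results = [counter.try_fork() for _ in range(5)]
print(results, events, counter.frozen)
```

The point of the design is that the counter only reports the event; the
freeze is one listener among others, so a different manager could choose
to kill tasks or merely log instead.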


[parent not found: <20111213190642.GB2421-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>]

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]         ` <20111213190642.GB2421-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
@ 2011-12-13 20:49           ` Tejun Heo
       [not found]             ` <20111213204918.GK25802-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2011-12-13 20:49 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Aditya Kali, Tim Hockin, Paul Menage, Kay Sievers, LKML,
	Oleg Nesterov, Johannes Weiner, Andrew Morton, Containers

Hello,

On Tue, Dec 13, 2011 at 08:06:46PM +0100, Frederic Weisbecker wrote:
> On Tue, Dec 13, 2011 at 07:58:48AM -0800, Tejun Heo wrote:
> > Can you please rebase the patchset on top of cgroup/for-3.3?
> 
> Sure. But please note its fate is still under discussion. Whether
> we want it upstream is still a running debate. But I certainly
> need to rebase against your tree.

I see.

> > I primarily like the idea of being able to track process usage w/ cgroup
> > and enforce limits on it but hope that it could somehow integrate w/
> > cgroup freezer, i.e. trigger the freezer if it goes over the limit and let the
> > userland tool / administrator deal with the frozen cgroup.  I'm
> > planning on extending cgroup freezer such that it supports recursive
> > freezing and killing of frozen tasks.  If we can fit task counters
> > into that, we'll have a general method of handling problematic cgroups:
> > freeze, notify userland and let it deal with it.
> 
> Hmm, so you suggest a kernel trigger that freezes the cgroup when the
> task limit is reached?

Yeah, something like that.  I'm not really sure how it would
actually work, though.

> What about implementing register_event() for tasks.usage instead, so
> that the user can be notified through an eventfd when the limit is reached?
> Then it would be up to the user to decide whether to freeze or do anything
> else. That sounds like a more generic solution.

Maybe, the problem would be how to ensure that the userland manager
can respond fast enough (whatever that means...).

Thanks.

-- 
tejun


[parent not found: <20111213204918.GK25802-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]

* Re: [PATCH 00/10] cgroups: Task counter subsystem v6
       [not found]             ` <20111213204918.GK25802-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2011-12-14 15:07               ` Frederic Weisbecker
  0 siblings, 0 replies; 17+ messages in thread
From: Frederic Weisbecker @ 2011-12-14 15:07 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Aditya Kali, Tim Hockin, Paul Menage, Kay Sievers, LKML,
	Oleg Nesterov, Johannes Weiner, Andrew Morton, Containers

On Tue, Dec 13, 2011 at 12:49:18PM -0800, Tejun Heo wrote:
> Hello,
> 
> On Tue, Dec 13, 2011 at 08:06:46PM +0100, Frederic Weisbecker wrote:
> > On Tue, Dec 13, 2011 at 07:58:48AM -0800, Tejun Heo wrote:
> > > Can you please rebase the patchset on top of cgroup/for-3.3?
> > 
> > Sure. But please note its fate is still under discussion. Whether
> > we want it upstream is still a running debate. But I certainly
> > need to rebase against your tree.
> 
> I see.
> 
> > > I primarily like the idea of being able to track process usage w/ cgroup
> > > and enforce limits on it but hope that it could somehow integrate w/
> > > cgroup freezer, i.e. trigger the freezer if it goes over the limit and let the
> > > userland tool / administrator deal with the frozen cgroup.  I'm
> > > planning on extending cgroup freezer such that it supports recursive
> > > freezing and killing of frozen tasks.  If we can fit task counters
> > > into that, we'll have a general method of handling problematic cgroups:
> > > freeze, notify userland and let it deal with it.
> > 
> > Hmm, so you suggest a kernel trigger that freezes the cgroup when the
> > task limit is reached?
> 
> Yeah, something like that.  I'm not really sure how it would
> actually work, though.
> 
> > What about implementing register_event() for tasks.usage instead, so
> > that the user can be notified through an eventfd when the limit is reached?
> > Then it would be up to the user to decide whether to freeze or do anything
> > else. That sounds like a more generic solution.
> 
> Maybe, the problem would be how to ensure that the userland manager
> can respond fast enough (whatever that means...).

Yeah, that's part of the goal of the task counter: limit the spread
of the forkbomb soon enough that the machine stays responsive and
the admin can react accordingly.


end of thread, other threads:[~2011-12-14 15:07 UTC | newest]

Thread overview: 17+ messages
     [not found] <1317668832-10784-1-git-send-email-fweisbec@gmail.com>
     [not found] ` <20111004150111.e9337268.akpm00@gmail.com>
     [not found]   ` <CAAAKZwu67VMiZgdpp=i5p7zyGbOHGHXwF_iprufGPzTLkkUF2A@mail.gmail.com>
     [not found]     ` <CAAAKZwu67VMiZgdpp=i5p7zyGbOHGHXwF_iprufGPzTLkkUF2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-10-28 23:30       ` [PATCH 00/10] cgroups: Task counter subsystem v6 Andrew Morton
     [not found]         ` <20111028163021.1ce61f8a.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2011-10-29  9:38           ` Glauber Costa
     [not found]             ` <CAA6-i6o0SPfZJDx4SRR1hY-He0L6zHuv0saH6EaE7Mrc2HF6PA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-11-03 16:49               ` Frederic Weisbecker
     [not found]                 ` <20111103164917.GF8198-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
2011-11-03 16:58                   ` Glauber Costa
     [not found]                     ` <4EB2C852.6020706-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2011-11-03 17:02                       ` Paul Menage
     [not found]                         ` <CALdu-PDY8zpXYM3V9KRk4f2NyGevfNnuaWVdoT-qzSHOK--K3A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-11-03 17:06                           ` Glauber Costa
     [not found]                             ` <4EB2CA03.7030601-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2011-11-03 17:28                               ` Paul Menage
     [not found]                                 ` <CALdu-PA2CDoeUMoNd1y44p_QzphX8J4s6NDcSyVC-rP1HGYwkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-11-03 17:35                                   ` Glauber Costa
     [not found]                                     ` <4EB2D0F2.40309-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2011-11-03 17:56                                       ` Paul Menage
     [not found]                                         ` <CALdu-PDbJ69FayXSd-kjAMX8AKEroZytPapxsUn8GFsz-z1omQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-11-04 13:17                                           ` Glauber Costa
2011-11-03 17:00           ` Frederic Weisbecker
     [not found]             ` <20111103170038.GG8198-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
2011-11-04  2:57               ` Li Zefan
     [not found]                 ` <4EB3549D.5090404-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2011-11-04 12:37                   ` Frederic Weisbecker
     [not found] ` <1317668832-10784-1-git-send-email-fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2011-12-13 15:58   ` Tejun Heo
     [not found]     ` <20111213155848.GI25802-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2011-12-13 19:06       ` Frederic Weisbecker
     [not found]         ` <20111213190642.GB2421-oHC15RC7JGTpAmv0O++HtFaTQe2KTcn/@public.gmane.org>
2011-12-13 20:49           ` Tejun Heo
     [not found]             ` <20111213204918.GK25802-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2011-12-14 15:07               ` Frederic Weisbecker
