Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]

Linux Container Development
 help / color / mirror / Atom feed

* Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]     ` <5351679F.5040908-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2014-04-20 14:28       ` Richard Davies
  0 siblings, 0 replies; 33+ messages in thread
From: Richard Davies @ 2014-04-20 14:28 UTC (permalink / raw)
  To: Vladimir Davydov, Frederic Weisbecker, David Rientjes,
	Glauber Costa, Tejun Heo, Max Kellermann, Johannes Weiner,
	William Dauchy, Tim Hockin, Michal Hocko, Daniel Walsh,
	Daniel Berrange
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg

Vladimir Davydov wrote:
> Richard Davies wrote:
> > I have a simple reproducible test case in which untar in a memcg with a
> > kmem limit gets into trouble during heavy disk i/o (on ext3) and never
> > properly recovers. This is simplified from real world problems with
> > heavy disk i/o inside containers.
>
> Unfortunately, work on per cgroup kmem limits is not completed yet.
> Currently it lacks kmem reclaim on per cgroup memory pressure, which is
> vital for using kmem limits in real life.
...
> In short, kmem limiting for memory cgroups is currently broken. Do not
> use it. We are working on making it usable though.

Thanks for explaining the strange errors I got.


My motivation is to prevent a fork bomb in a container from affecting other
processes outside that container.

kmem limits were the preferred mechanism in several previous discussions
about two years ago (I'm copying in participants from those previous
discussions and give links below). So I tried kmem first but found bugs.


What is the best mechanism available today, until kmem limits mature?

RLIMIT_NPROC exists but is per-user, not per-container.

Perhaps there is an up-to-date task counter patchset or similar?


Thank you all,

Richard.



Some references to previous discussions:

Fork bomb limitation in memcg WAS: Re: [PATCH 00/11] kmem controller for memcg: stripped down version
http://thread.gmane.org/gmane.linux.kernel/1318266/focus=1319372

Re: [PATCH 00/10] cgroups: Task counter subsystem v8
http://thread.gmane.org/gmane.linux.kernel/1246704/focus=1467310

[RFD] Merge task counter into memcg
http://thread.gmane.org/gmane.linux.kernel/1280302

Re: [PATCH -mm] cgroup: Fix task counter common ancestor logic
http://thread.gmane.org/gmane.linux.kernel/1212650/focus=1220186

[PATCH] new cgroup controller "fork"
http://thread.gmane.org/gmane.linux.kernel/1210878

Re: Process Limit cgroups
http://thread.gmane.org/gmane.linux.kernel.cgroups/9368/focus=9369

Re: [lxc-devel] process number limit
https://www.mail-archive.com/lxc-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org/msg03309.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]       ` <20140420142830.GC22077-2oeHp4OYwSjPZs67QiJbJtBPR1lH4CV8@public.gmane.org>
@ 2014-04-20 18:35         ` Tim Hockin
  2014-04-22 18:39         ` Dwight Engen
  1 sibling, 0 replies; 33+ messages in thread
From: Tim Hockin @ 2014-04-20 18:35 UTC (permalink / raw)
  To: Richard Davies
  Cc: Vladimir Davydov, Daniel Walsh, Max Kellermann,
	Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	David Rientjes, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, Johannes Weiner,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

I would still be in strong support of a cgroup replacement for NPROC rlimit.
On Apr 20, 2014 7:29 AM, "Richard Davies" <richard-li8END47hbdWk0Htik3J/w@public.gmane.org> wrote:

> Vladimir Davydov wrote:
> > Richard Davies wrote:
> > > I have a simple reproducible test case in which untar in a memcg with a
> > > kmem limit gets into trouble during heavy disk i/o (on ext3) and never
> > > properly recovers. This is simplified from real world problems with
> > > heavy disk i/o inside containers.
> >
> > Unfortunately, work on per cgroup kmem limits is not completed yet.
> > Currently it lacks kmem reclaim on per cgroup memory pressure, which is
> > vital for using kmem limits in real life.
> ...
> > In short, kmem limiting for memory cgroups is currently broken. Do not
> > use it. We are working on making it usable though.
>
> Thanks for explaining the strange errors I got.
>
>
> My motivation is to prevent a fork bomb in a container from affecting other
> processes outside that container.
>
> kmem limits were the preferred mechanism in several previous discussions
> about two years ago (I'm copying in participants from those previous
> discussions and give links below). So I tried kmem first but found bugs.
>
>
> What is the best mechanism available today, until kmem limits mature?
>
> RLIMIT_NPROC exists but is per-user, not per-container.
>
> Perhaps there is an up-to-date task counter patchset or similar?
>
>
> Thank you all,
>
> Richard.
>
>
>
> Some references to previous discussions:
>
> Fork bomb limitation in memcg WAS: Re: [PATCH 00/11] kmem controller for
> memcg: stripped down version
> http://thread.gmane.org/gmane.linux.kernel/1318266/focus=1319372
>
> Re: [PATCH 00/10] cgroups: Task counter subsystem v8
> http://thread.gmane.org/gmane.linux.kernel/1246704/focus=1467310
>
> [RFD] Merge task counter into memcg
> http://thread.gmane.org/gmane.linux.kernel/1280302
>
> Re: [PATCH -mm] cgroup: Fix task counter common ancestor logic
> http://thread.gmane.org/gmane.linux.kernel/1212650/focus=1220186
>
> [PATCH] new cgroup controller "fork"
> http://thread.gmane.org/gmane.linux.kernel/1210878
>
> Re: Process Limit cgroups
> http://thread.gmane.org/gmane.linux.kernel.cgroups/9368/focus=9369
>
> Re: [lxc-devel] process number limit
> https://www.mail-archive.com/lxc-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org/msg03309.html
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]       ` <20140420142830.GC22077-2oeHp4OYwSjPZs67QiJbJtBPR1lH4CV8@public.gmane.org>
  2014-04-20 18:35         ` Tim Hockin
@ 2014-04-22 18:39         ` Dwight Engen
  1 sibling, 0 replies; 33+ messages in thread
From: Dwight Engen @ 2014-04-22 18:39 UTC (permalink / raw)
  To: Richard Davies
  Cc: Vladimir Davydov, Daniel Walsh, Max Kellermann, Tim Hockin,
	Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Johannes Weiner, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, David Rientjes,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

On Sun, 20 Apr 2014 15:28:30 +0100
Richard Davies <richard-li8END47hbdWk0Htik3J/w@public.gmane.org> wrote:

> Vladimir Davydov wrote:
> > Richard Davies wrote:
> > > I have a simple reproducible test case in which untar in a memcg
> > > with a kmem limit gets into trouble during heavy disk i/o (on
> > > ext3) and never properly recovers. This is simplified from real
> > > world problems with heavy disk i/o inside containers.
> >
> > Unfortunately, work on per cgroup kmem limits is not completed yet.
> > Currently it lacks kmem reclaim on per cgroup memory pressure,
> > which is vital for using kmem limits in real life.
> ...
> > In short, kmem limiting for memory cgroups is currently broken. Do
> > not use it. We are working on making it usable though.
> 
> Thanks for explaining the strange errors I got.
> 
> 
> My motivation is to prevent a fork bomb in a container from affecting
> other processes outside that container.
> 
> kmem limits were the preferred mechanism in several previous
> discussions about two years ago (I'm copying in participants from
> those previous discussions and give links below). So I tried kmem
> first but found bugs.
> 
> 
> What is the best mechanism available today, until kmem limits mature?
> 
> RLIMIT_NPROC exists but is per-user, not per-container.
> 
> Perhaps there is an up-to-date task counter patchset or similar?

I updated Frederic's task counter patches and included Max Kellermann's
fork limiter here:

http://thread.gmane.org/gmane.linux.kernel.containers/27212

I can send you a more recent patchset (against 3.13.10) if you would
find it useful.

> Thank you all,
> 
> Richard.
> 
> 
> 
> Some references to previous discussions:
> 
> Fork bomb limitation in memcg WAS: Re: [PATCH 00/11] kmem controller
> for memcg: stripped down version
> http://thread.gmane.org/gmane.linux.kernel/1318266/focus=1319372
> 
> Re: [PATCH 00/10] cgroups: Task counter subsystem v8
> http://thread.gmane.org/gmane.linux.kernel/1246704/focus=1467310
> 
> [RFD] Merge task counter into memcg
> http://thread.gmane.org/gmane.linux.kernel/1280302
> 
> Re: [PATCH -mm] cgroup: Fix task counter common ancestor logic
> http://thread.gmane.org/gmane.linux.kernel/1212650/focus=1220186
> 
> [PATCH] new cgroup controller "fork"
> http://thread.gmane.org/gmane.linux.kernel/1210878
> 
> Re: Process Limit cgroups
> http://thread.gmane.org/gmane.linux.kernel.cgroups/9368/focus=9369
> 
> Re: [lxc-devel] process number limit
> https://www.mail-archive.com/lxc-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org/msg03309.html
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]         ` <20140422143943.20609800-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-04-22 20:05           ` Richard Davies
  0 siblings, 0 replies; 33+ messages in thread
From: Richard Davies @ 2014-04-22 20:05 UTC (permalink / raw)
  To: Dwight Engen
  Cc: Vladimir Davydov, Daniel Walsh, Max Kellermann, Tim Hockin,
	Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Johannes Weiner, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, David Rientjes,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

Dwight Engen wrote:
> Richard Davies wrote:
> > Vladimir Davydov wrote:
> > > In short, kmem limiting for memory cgroups is currently broken. Do
> > > not use it. We are working on making it usable though.
...
> > What is the best mechanism available today, until kmem limits mature?
> >
> > RLIMIT_NPROC exists but is per-user, not per-container.
> >
> > Perhaps there is an up-to-date task counter patchset or similar?
>
> I updated Frederic's task counter patches and included Max Kellermann's
> fork limiter here:
>
> http://thread.gmane.org/gmane.linux.kernel.containers/27212
>
> I can send you a more recent patchset (against 3.13.10) if you would
> find it useful.

Yes please, I would be interested in that. Ideally even against 3.14.1 if
you have that too.

Thanks,

Richard.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]           ` <20140422200531.GA19334-2oeHp4OYwSjPZs67QiJbJtBPR1lH4CV8@public.gmane.org>
@ 2014-04-22 20:13             ` Tim Hockin
  2014-04-23  6:07             ` Marian Marinov
  2014-06-10 12:18             ` Alin Dobre
  2 siblings, 0 replies; 33+ messages in thread
From: Tim Hockin @ 2014-04-22 20:13 UTC (permalink / raw)
  To: Richard Davies
  Cc: Vladimir Davydov, Daniel Walsh, Max Kellermann, Tim Hockin,
	Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Johannes Weiner, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, William Dauchy,
	David Rientjes, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

Who in kernel-land still needs to be convinced of the utility of this idea?

On Tue, Apr 22, 2014 at 1:05 PM, Richard Davies <richard-li8END47hbdWk0Htik3J/w@public.gmane.org> wrote:
> Dwight Engen wrote:
>> Richard Davies wrote:
>> > Vladimir Davydov wrote:
>> > > In short, kmem limiting for memory cgroups is currently broken. Do
>> > > not use it. We are working on making it usable though.
> ...
>> > What is the best mechanism available today, until kmem limits mature?
>> >
>> > RLIMIT_NPROC exists but is per-user, not per-container.
>> >
>> > Perhaps there is an up-to-date task counter patchset or similar?
>>
>> I updated Frederic's task counter patches and included Max Kellermann's
>> fork limiter here:
>>
>> http://thread.gmane.org/gmane.linux.kernel.containers/27212
>>
>> I can send you a more recent patchset (against 3.13.10) if you would
>> find it useful.
>
> Yes please, I would be interested in that. Ideally even against 3.14.1 if
> you have that too.
>
> Thanks,
>
> Richard.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]           ` <20140422200531.GA19334-2oeHp4OYwSjPZs67QiJbJtBPR1lH4CV8@public.gmane.org>
  2014-04-22 20:13             ` Tim Hockin
@ 2014-04-23  6:07             ` Marian Marinov
  2014-06-10 12:18             ` Alin Dobre
  2 siblings, 0 replies; 33+ messages in thread
From: Marian Marinov @ 2014-04-23  6:07 UTC (permalink / raw)
  To: Richard Davies, Dwight Engen
  Cc: Vladimir Davydov, Daniel Walsh, Max Kellermann, Tim Hockin,
	Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Johannes Weiner, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, David Rientjes,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

On 04/22/2014 11:05 PM, Richard Davies wrote:
> Dwight Engen wrote:
>> Richard Davies wrote:
>>> Vladimir Davydov wrote:
>>>> In short, kmem limiting for memory cgroups is currently broken. Do
>>>> not use it. We are working on making it usable though.
> ...
>>> What is the best mechanism available today, until kmem limits mature?
>>>
>>> RLIMIT_NPROC exists but is per-user, not per-container.
>>>
>>> Perhaps there is an up-to-date task counter patchset or similar?
>>
>> I updated Frederic's task counter patches and included Max Kellermann's
>> fork limiter here:
>>
>> http://thread.gmane.org/gmane.linux.kernel.containers/27212
>>
>> I can send you a more recent patchset (against 3.13.10) if you would
>> find it useful.
>
> Yes please, I would be interested in that. Ideally even against 3.14.1 if
> you have that too.

Dwight, do you have these patches in any public repo?

I would like to test them also.

Marian

>
> Thanks,
>
> Richard.
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]             ` <535758A0.5000500-NV7Lj0SOnH0@public.gmane.org>
@ 2014-04-23 12:49               ` Dwight Engen
  0 siblings, 0 replies; 33+ messages in thread
From: Dwight Engen @ 2014-04-23 12:49 UTC (permalink / raw)
  To: Marian Marinov
  Cc: Richard Davies, Vladimir Davydov, Daniel Walsh, Max Kellermann,
	Tim Hockin, Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Johannes Weiner, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, David Rientjes,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed, 23 Apr 2014 09:07:28 +0300
Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:

> On 04/22/2014 11:05 PM, Richard Davies wrote:
> > Dwight Engen wrote:
> >> Richard Davies wrote:
> >>> Vladimir Davydov wrote:
> >>>> In short, kmem limiting for memory cgroups is currently broken.
> >>>> Do not use it. We are working on making it usable though.
> > ...
> >>> What is the best mechanism available today, until kmem limits
> >>> mature?
> >>>
> >>> RLIMIT_NPROC exists but is per-user, not per-container.
> >>>
> >>> Perhaps there is an up-to-date task counter patchset or similar?
> >>
> >> I updated Frederic's task counter patches and included Max
> >> Kellermann's fork limiter here:
> >>
> >> http://thread.gmane.org/gmane.linux.kernel.containers/27212
> >>
> >> I can send you a more recent patchset (against 3.13.10) if you
> >> would find it useful.
> >
> > Yes please, I would be interested in that. Ideally even against
> > 3.14.1 if you have that too.
> 
> Dwight, do you have these patches in any public repo?
> 
> I would like to test them also.

Hi Marian, I put the patches against 3.13.11 and 3.14.1 up at:

git://github.com/dwengen/linux.git cpuacct-task-limit-3.13
git://github.com/dwengen/linux.git cpuacct-task-limit-3.14
 
> Marian
> 
> >
> > Thanks,
> >
> > Richard.
> > --
> > To unsubscribe from this list: send the line "unsubscribe cgroups"
> > in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> >
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]               ` <20140423084942.560ae837-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-04-28 18:00                 ` Serge Hallyn
  2014-04-29  7:25                   ` Michal Hocko
  2014-05-06 11:40                 ` Marian Marinov
  2014-06-10 14:50                 ` Marian Marinov
  2 siblings, 1 reply; 33+ messages in thread
From: Serge Hallyn @ 2014-04-28 18:00 UTC (permalink / raw)
  To: Dwight Engen
  Cc: Richard Davies, Vladimir Davydov, Marian Marinov, Max Kellermann,
	Tim Hockin, Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Daniel Walsh, cgroups-u79uwXL29TY76Z2rM5mHXA, Glauber Costa,
	Michal Hocko, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy,
	Johannes Weiner, Tejun Heo, David Rientjes

Quoting Dwight Engen (dwight.engen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org):
> On Wed, 23 Apr 2014 09:07:28 +0300
> Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:
> 
> > On 04/22/2014 11:05 PM, Richard Davies wrote:
> > > Dwight Engen wrote:
> > >> Richard Davies wrote:
> > >>> Vladimir Davydov wrote:
> > >>>> In short, kmem limiting for memory cgroups is currently broken.
> > >>>> Do not use it. We are working on making it usable though.
> > > ...
> > >>> What is the best mechanism available today, until kmem limits
> > >>> mature?
> > >>>
> > >>> RLIMIT_NPROC exists but is per-user, not per-container.
> > >>>
> > >>> Perhaps there is an up-to-date task counter patchset or similar?
> > >>
> > >> I updated Frederic's task counter patches and included Max
> > >> Kellermann's fork limiter here:
> > >>
> > >> http://thread.gmane.org/gmane.linux.kernel.containers/27212
> > >>
> > >> I can send you a more recent patchset (against 3.13.10) if you
> > >> would find it useful.
> > >
> > > Yes please, I would be interested in that. Ideally even against
> > > 3.14.1 if you have that too.
> > 
> > Dwight, do you have these patches in any public repo?
> > 
> > I would like to test them also.
> 
> Hi Marian, I put the patches against 3.13.11 and 3.14.1 up at:
> 
> git://github.com/dwengen/linux.git cpuacct-task-limit-3.13
> git://github.com/dwengen/linux.git cpuacct-task-limit-3.14

Thanks, Dwight.  FWIW I'm agreed with Tim, Dwight, Richard, and Marian
that a task limit would be a proper cgroup extension, and specifically
that approximating that with a kmem limit is not a reasonable substitute.

-serge

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
  2014-04-28 18:00                 ` Serge Hallyn
@ 2014-04-29  7:25                   ` Michal Hocko
       [not found]                     ` <20140429072515.GB15058-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
       [not found]                     ` <20140429130353.GA27354@ubuntumail>
  0 siblings, 2 replies; 33+ messages in thread
From: Michal Hocko @ 2014-04-29  7:25 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Richard Davies, Vladimir Davydov, Marian Marinov, Max Kellermann,
	Tim Hockin, Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Daniel Walsh, cgroups-u79uwXL29TY76Z2rM5mHXA, Glauber Costa,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, Johannes Weiner,
	Tejun Heo, David Rientjes

On Mon 28-04-14 18:00:25, Serge Hallyn wrote:
> Quoting Dwight Engen (dwight.engen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org):
> > On Wed, 23 Apr 2014 09:07:28 +0300
> > Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:
> > 
> > > On 04/22/2014 11:05 PM, Richard Davies wrote:
> > > > Dwight Engen wrote:
> > > >> Richard Davies wrote:
> > > >>> Vladimir Davydov wrote:
> > > >>>> In short, kmem limiting for memory cgroups is currently broken.
> > > >>>> Do not use it. We are working on making it usable though.
> > > > ...
> > > >>> What is the best mechanism available today, until kmem limits
> > > >>> mature?
> > > >>>
> > > >>> RLIMIT_NPROC exists but is per-user, not per-container.
> > > >>>
> > > >>> Perhaps there is an up-to-date task counter patchset or similar?
> > > >>
> > > >> I updated Frederic's task counter patches and included Max
> > > >> Kellermann's fork limiter here:
> > > >>
> > > >> http://thread.gmane.org/gmane.linux.kernel.containers/27212
> > > >>
> > > >> I can send you a more recent patchset (against 3.13.10) if you
> > > >> would find it useful.
> > > >
> > > > Yes please, I would be interested in that. Ideally even against
> > > > 3.14.1 if you have that too.
> > > 
> > > Dwight, do you have these patches in any public repo?
> > > 
> > > I would like to test them also.
> > 
> > Hi Marian, I put the patches against 3.13.11 and 3.14.1 up at:
> > 
> > git://github.com/dwengen/linux.git cpuacct-task-limit-3.13
> > git://github.com/dwengen/linux.git cpuacct-task-limit-3.14
> 
> Thanks, Dwight.  FWIW I'm agreed with Tim, Dwight, Richard, and Marian
> that a task limit would be a proper cgroup extension, and specifically
> that approximating that with a kmem limit is not a reasonable substitute.

The current state of the kmem limit, which is improving a lot thanks to
Vladimir, is not a reason for a new extension/controller. We are just
not yet there.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                     ` <20140429072515.GB15058-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2014-04-29 13:03                       ` Serge Hallyn
  0 siblings, 0 replies; 33+ messages in thread
From: Serge Hallyn @ 2014-04-29 13:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Richard Davies, Vladimir Davydov, David Rientjes, Marian Marinov,
	Max Kellermann, Tim Hockin, Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Daniel Walsh, Glauber Costa, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	William Dauchy, Johannes Weiner, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Quoting Michal Hocko (mhocko-AlSwsSmVLrQ@public.gmane.org):
> On Mon 28-04-14 18:00:25, Serge Hallyn wrote:
> > Quoting Dwight Engen (dwight.engen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org):
> > > On Wed, 23 Apr 2014 09:07:28 +0300
> > > Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:
> > > 
> > > > On 04/22/2014 11:05 PM, Richard Davies wrote:
> > > > > Dwight Engen wrote:
> > > > >> Richard Davies wrote:
> > > > >>> Vladimir Davydov wrote:
> > > > >>>> In short, kmem limiting for memory cgroups is currently broken.
> > > > >>>> Do not use it. We are working on making it usable though.
> > > > > ...
> > > > >>> What is the best mechanism available today, until kmem limits
> > > > >>> mature?
> > > > >>>
> > > > >>> RLIMIT_NPROC exists but is per-user, not per-container.
> > > > >>>
> > > > >>> Perhaps there is an up-to-date task counter patchset or similar?
> > > > >>
> > > > >> I updated Frederic's task counter patches and included Max
> > > > >> Kellermann's fork limiter here:
> > > > >>
> > > > >> http://thread.gmane.org/gmane.linux.kernel.containers/27212
> > > > >>
> > > > >> I can send you a more recent patchset (against 3.13.10) if you
> > > > >> would find it useful.
> > > > >
> > > > > Yes please, I would be interested in that. Ideally even against
> > > > > 3.14.1 if you have that too.
> > > > 
> > > > Dwight, do you have these patches in any public repo?
> > > > 
> > > > I would like to test them also.
> > > 
> > > Hi Marian, I put the patches against 3.13.11 and 3.14.1 up at:
> > > 
> > > git://github.com/dwengen/linux.git cpuacct-task-limit-3.13
> > > git://github.com/dwengen/linux.git cpuacct-task-limit-3.14
> > 
> > Thanks, Dwight.  FWIW I'm agreed with Tim, Dwight, Richard, and Marian
> > that a task limit would be a proper cgroup extension, and specifically
> > that approximating that with a kmem limit is not a reasonable substitute.
> 
> The current state of the kmem limit, which is improving a lot thanks to
> Vladimir, is not a reason for a new extension/controller. We are just
> not yet there.

It has nothing to do with the state of the limit.  I simply don't
believe that emulating RLIMIT_NPROC by controlling stack size is a
good idea.

-serge

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                     ` <20140429130353.GA27354@ubuntumail>
@ 2014-04-29 13:57                       ` Marian Marinov
  2014-04-29 14:04                       ` Tim Hockin
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: Marian Marinov @ 2014-04-29 13:57 UTC (permalink / raw)
  To: Serge Hallyn, Michal Hocko
  Cc: Richard Davies, Vladimir Davydov, David Rientjes, Max Kellermann,
	Tim Hockin, Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Daniel Walsh, Glauber Costa, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	William Dauchy, Johannes Weiner, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On 04/29/2014 04:03 PM, Serge Hallyn wrote:
> Quoting Michal Hocko (mhocko-AlSwsSmVLrQ@public.gmane.org):
>> On Mon 28-04-14 18:00:25, Serge Hallyn wrote:
>>> Quoting Dwight Engen (dwight.engen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org):
>>>> On Wed, 23 Apr 2014 09:07:28 +0300
>>>> Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:
>>>>
>>>>> On 04/22/2014 11:05 PM, Richard Davies wrote:
>>>>>> Dwight Engen wrote:
>>>>>>> Richard Davies wrote:
>>>>>>>> Vladimir Davydov wrote:
>>>>>>>>> In short, kmem limiting for memory cgroups is currently broken.
>>>>>>>>> Do not use it. We are working on making it usable though.
>>>>>> ...
>>>>>>>> What is the best mechanism available today, until kmem limits
>>>>>>>> mature?
>>>>>>>>
>>>>>>>> RLIMIT_NPROC exists but is per-user, not per-container.
>>>>>>>>
>>>>>>>> Perhaps there is an up-to-date task counter patchset or similar?
>>>>>>>
>>>>>>> I updated Frederic's task counter patches and included Max
>>>>>>> Kellermann's fork limiter here:
>>>>>>>
>>>>>>> http://thread.gmane.org/gmane.linux.kernel.containers/27212
>>>>>>>
>>>>>>> I can send you a more recent patchset (against 3.13.10) if you
>>>>>>> would find it useful.
>>>>>>
>>>>>> Yes please, I would be interested in that. Ideally even against
>>>>>> 3.14.1 if you have that too.
>>>>>
>>>>> Dwight, do you have these patches in any public repo?
>>>>>
>>>>> I would like to test them also.
>>>>
>>>> Hi Marian, I put the patches against 3.13.11 and 3.14.1 up at:
>>>>
>>>> git://github.com/dwengen/linux.git cpuacct-task-limit-3.13
>>>> git://github.com/dwengen/linux.git cpuacct-task-limit-3.14
>>>
>>> Thanks, Dwight.  FWIW I'm agreed with Tim, Dwight, Richard, and Marian
>>> that a task limit would be a proper cgroup extension, and specifically
>>> that approximating that with a kmem limit is not a reasonable substitute.
>>
>> The current state of the kmem limit, which is improving a lot thanks to
>> Vladimir, is not a reason for a new extension/controller. We are just
>> not yet there.
>
> It has nothing to do with the state of the limit.  I simply don't
> believe that emulating RLIMIT_NPROC by controlling stack size is a
> good idea.
>
> -serge

I think that having a limit on the number of processes allowed in a cgroup is a lot better then relaying on the kmem limit.
The problem that task-limit tries to solve is degradation of system performance caused by too many processes in a 
certain cgroup. I'm currently testing the patches with 3.12.16.

-hackman

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                     ` <20140429130353.GA27354@ubuntumail>
  2014-04-29 13:57                       ` Marian Marinov
@ 2014-04-29 14:04                       ` Tim Hockin
  2014-04-29 15:43                       ` Michal Hocko
       [not found]                       ` <20140429154345.GH15058@dhcp22.suse.cz>
  3 siblings, 0 replies; 33+ messages in thread
From: Tim Hockin @ 2014-04-29 14:04 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Vladimir Davydov, Richard Davies, Marian Marinov, Max Kellermann,
	Tim Hockin, Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	David Rientjes, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, Johannes Weiner,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel Walsh

Thank you.  These are two different things.  They may have a relationship
but they ate not the same, and pretending they are is a bad experience.
On Apr 29, 2014 6:04 AM, "Serge Hallyn" <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote:

> Quoting Michal Hocko (mhocko-AlSwsSmVLrQ@public.gmane.org):
> > On Mon 28-04-14 18:00:25, Serge Hallyn wrote:
> > > Quoting Dwight Engen (dwight.engen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org):
> > > > On Wed, 23 Apr 2014 09:07:28 +0300
> > > > Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:
> > > >
> > > > > On 04/22/2014 11:05 PM, Richard Davies wrote:
> > > > > > Dwight Engen wrote:
> > > > > >> Richard Davies wrote:
> > > > > >>> Vladimir Davydov wrote:
> > > > > >>>> In short, kmem limiting for memory cgroups is currently
> broken.
> > > > > >>>> Do not use it. We are working on making it usable though.
> > > > > > ...
> > > > > >>> What is the best mechanism available today, until kmem limits
> > > > > >>> mature?
> > > > > >>>
> > > > > >>> RLIMIT_NPROC exists but is per-user, not per-container.
> > > > > >>>
> > > > > >>> Perhaps there is an up-to-date task counter patchset or
> similar?
> > > > > >>
> > > > > >> I updated Frederic's task counter patches and included Max
> > > > > >> Kellermann's fork limiter here:
> > > > > >>
> > > > > >> http://thread.gmane.org/gmane.linux.kernel.containers/27212
> > > > > >>
> > > > > >> I can send you a more recent patchset (against 3.13.10) if you
> > > > > >> would find it useful.
> > > > > >
> > > > > > Yes please, I would be interested in that. Ideally even against
> > > > > > 3.14.1 if you have that too.
> > > > >
> > > > > Dwight, do you have these patches in any public repo?
> > > > >
> > > > > I would like to test them also.
> > > >
> > > > Hi Marian, I put the patches against 3.13.11 and 3.14.1 up at:
> > > >
> > > > git://github.com/dwengen/linux.git cpuacct-task-limit-3.13
> > > > git://github.com/dwengen/linux.git cpuacct-task-limit-3.14
> > >
> > > Thanks, Dwight.  FWIW I'm agreed with Tim, Dwight, Richard, and Marian
> > > that a task limit would be a proper cgroup extension, and specifically
> > > that approximating that with a kmem limit is not a reasonable
> substitute.
> >
> > The current state of the kmem limit, which is improving a lot thanks to
> > Vladimir, is not a reason for a new extension/controller. We are just
> > not yet there.
>
> It has nothing to do with the state of the limit.  I simply don't
> believe that emulating RLIMIT_NPROC by controlling stack size is a
> good idea.
>
> -serge
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                     ` <20140429130353.GA27354@ubuntumail>
  2014-04-29 13:57                       ` Marian Marinov
  2014-04-29 14:04                       ` Tim Hockin
@ 2014-04-29 15:43                       ` Michal Hocko
       [not found]                       ` <20140429154345.GH15058@dhcp22.suse.cz>
  3 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2014-04-29 15:43 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Richard Davies, Vladimir Davydov, David Rientjes, Marian Marinov,
	Max Kellermann, Tim Hockin, Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Daniel Walsh, Glauber Costa, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	William Dauchy, Johannes Weiner, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Tue 29-04-14 13:03:53, Serge Hallyn wrote:
> Quoting Michal Hocko (mhocko-AlSwsSmVLrQ@public.gmane.org):
> > On Mon 28-04-14 18:00:25, Serge Hallyn wrote:
> > > Quoting Dwight Engen (dwight.engen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org):
> > > > On Wed, 23 Apr 2014 09:07:28 +0300
> > > > Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:
> > > > 
> > > > > On 04/22/2014 11:05 PM, Richard Davies wrote:
> > > > > > Dwight Engen wrote:
> > > > > >> Richard Davies wrote:
> > > > > >>> Vladimir Davydov wrote:
> > > > > >>>> In short, kmem limiting for memory cgroups is currently broken.
> > > > > >>>> Do not use it. We are working on making it usable though.
> > > > > > ...
> > > > > >>> What is the best mechanism available today, until kmem limits
> > > > > >>> mature?
> > > > > >>>
> > > > > >>> RLIMIT_NPROC exists but is per-user, not per-container.
> > > > > >>>
> > > > > >>> Perhaps there is an up-to-date task counter patchset or similar?
> > > > > >>
> > > > > >> I updated Frederic's task counter patches and included Max
> > > > > >> Kellermann's fork limiter here:
> > > > > >>
> > > > > >> http://thread.gmane.org/gmane.linux.kernel.containers/27212
> > > > > >>
> > > > > >> I can send you a more recent patchset (against 3.13.10) if you
> > > > > >> would find it useful.
> > > > > >
> > > > > > Yes please, I would be interested in that. Ideally even against
> > > > > > 3.14.1 if you have that too.
> > > > > 
> > > > > Dwight, do you have these patches in any public repo?
> > > > > 
> > > > > I would like to test them also.
> > > > 
> > > > Hi Marian, I put the patches against 3.13.11 and 3.14.1 up at:
> > > > 
> > > > git://github.com/dwengen/linux.git cpuacct-task-limit-3.13
> > > > git://github.com/dwengen/linux.git cpuacct-task-limit-3.14
> > > 
> > > Thanks, Dwight.  FWIW I'm agreed with Tim, Dwight, Richard, and Marian
> > > that a task limit would be a proper cgroup extension, and specifically
> > > that approximating that with a kmem limit is not a reasonable substitute.
> > 
> > The current state of the kmem limit, which is improving a lot thanks to
> > Vladimir, is not a reason for a new extension/controller. We are just
> > not yet there.
> 
> It has nothing to do with the state of the limit.  I simply don't
> believe that emulating RLIMIT_NPROC by controlling stack size is a
> good idea.

I was not the one who decided that the kmem extension of memory
controller should cover also the task number as a side effect but still
the decision sounds plausible to me because the kmem approach is more
generic.

Btw. if this is a problem them please go ahead and continue the original
discussion (http://marc.info/?l=linux-kernel&m=133417075309923) with the
other people involved.

I do not see any new arguments here, except that the kmem implementation
is not ready yet.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                         ` <20140429154345.GH15058-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2014-04-29 16:06                           ` Tim Hockin
  0 siblings, 0 replies; 33+ messages in thread
From: Tim Hockin @ 2014-04-29 16:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Richard Davies, Vladimir Davydov, David Rientjes, Marian Marinov,
	Max Kellermann, Tim Hockin, Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Glauber Costa,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, William Dauchy,
	Johannes Weiner, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Daniel Walsh

Why the insistence that we manage something that REALLY IS a
first-class concept (hey, it has it's own RLIMIT) as a side effect of
something that doesn't quite capture what we want to achieve?

Is there some specific technical reason why you think this is a bad
idea?  I would think, especially in a more unified hierarchy world,
that more cgroup controllers with smaller sets of responsibility would
make for more manageable code (within limits, obviously).

On Tue, Apr 29, 2014 at 8:43 AM, Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org> wrote:
> On Tue 29-04-14 13:03:53, Serge Hallyn wrote:
>> Quoting Michal Hocko (mhocko-AlSwsSmVLrQ@public.gmane.org):
>> > On Mon 28-04-14 18:00:25, Serge Hallyn wrote:
>> > > Quoting Dwight Engen (dwight.engen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org):
>> > > > On Wed, 23 Apr 2014 09:07:28 +0300
>> > > > Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:
>> > > >
>> > > > > On 04/22/2014 11:05 PM, Richard Davies wrote:
>> > > > > > Dwight Engen wrote:
>> > > > > >> Richard Davies wrote:
>> > > > > >>> Vladimir Davydov wrote:
>> > > > > >>>> In short, kmem limiting for memory cgroups is currently broken.
>> > > > > >>>> Do not use it. We are working on making it usable though.
>> > > > > > ...
>> > > > > >>> What is the best mechanism available today, until kmem limits
>> > > > > >>> mature?
>> > > > > >>>
>> > > > > >>> RLIMIT_NPROC exists but is per-user, not per-container.
>> > > > > >>>
>> > > > > >>> Perhaps there is an up-to-date task counter patchset or similar?
>> > > > > >>
>> > > > > >> I updated Frederic's task counter patches and included Max
>> > > > > >> Kellermann's fork limiter here:
>> > > > > >>
>> > > > > >> http://thread.gmane.org/gmane.linux.kernel.containers/27212
>> > > > > >>
>> > > > > >> I can send you a more recent patchset (against 3.13.10) if you
>> > > > > >> would find it useful.
>> > > > > >
>> > > > > > Yes please, I would be interested in that. Ideally even against
>> > > > > > 3.14.1 if you have that too.
>> > > > >
>> > > > > Dwight, do you have these patches in any public repo?
>> > > > >
>> > > > > I would like to test them also.
>> > > >
>> > > > Hi Marian, I put the patches against 3.13.11 and 3.14.1 up at:
>> > > >
>> > > > git://github.com/dwengen/linux.git cpuacct-task-limit-3.13
>> > > > git://github.com/dwengen/linux.git cpuacct-task-limit-3.14
>> > >
>> > > Thanks, Dwight.  FWIW I'm agreed with Tim, Dwight, Richard, and Marian
>> > > that a task limit would be a proper cgroup extension, and specifically
>> > > that approximating that with a kmem limit is not a reasonable substitute.
>> >
>> > The current state of the kmem limit, which is improving a lot thanks to
>> > Vladimir, is not a reason for a new extension/controller. We are just
>> > not yet there.
>>
>> It has nothing to do with the state of the limit.  I simply don't
>> believe that emulating RLIMIT_NPROC by controlling stack size is a
>> good idea.
>
> I was not the one who decided that the kmem extension of memory
> controller should cover also the task number as a side effect but still
> the decision sounds plausible to me because the kmem approach is more
> generic.
>
> Btw. if this is a problem them please go ahead and continue the original
> discussion (http://marc.info/?l=linux-kernel&m=133417075309923) with the
> other people involved.
>
> I do not see any new arguments here, except that the kmem implementation
> is not ready yet.
> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                           ` <CAO_RewYZDGLBAKit4CudTbqVk+zfDRX8kP0W6Zz90xJh7abM9Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-04-29 16:51                             ` Frederic Weisbecker
  0 siblings, 0 replies; 33+ messages in thread
From: Frederic Weisbecker @ 2014-04-29 16:51 UTC (permalink / raw)
  To: Tim Hockin
  Cc: Richard Davies, Vladimir Davydov, David Rientjes, Marian Marinov,
	Max Kellermann, Tim Hockin,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, William Dauchy,
	Johannes Weiner, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Daniel Walsh

On Tue, Apr 29, 2014 at 09:06:22AM -0700, Tim Hockin wrote:
> Why the insistence that we manage something that REALLY IS a
> first-class concept (hey, it has it's own RLIMIT) as a side effect of
> something that doesn't quite capture what we want to achieve?

It's not a side effect, the kmem task stack control was partly
motivated to solve forkbomb issues in containers.

Also in general if we can reuse existing features and code to solve
a problem without disturbing side issues, we just do it.

Now if kmem doesn't solve the issue for you for any reason, or it does
but it brings other problems that aren't fixable in kmem itself, we can
certainly reconsider this cgroup subsystem. But I haven't yet seen
argument of this kind yet.

> 
> Is there some specific technical reason why you think this is a bad
> idea?
> I would think, especially in a more unified hierarchy world,
> that more cgroup controllers with smaller sets of responsibility would
> make for more manageable code (within limits, obviously).

Because it's core code and it adds complications and overhead in the
fork/exit path. We just don't add new core code just for the sake of
slightly prettier interfaces.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                             ` <20140429165114.GE6129-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2014-04-29 16:59                               ` Tim Hockin
  0 siblings, 0 replies; 33+ messages in thread
From: Tim Hockin @ 2014-04-29 16:59 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Richard Davies, Vladimir Davydov, David Rientjes, Marian Marinov,
	Max Kellermann, Tim Hockin,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, William Dauchy,
	Johannes Weiner, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Daniel Walsh

Here's the reason it doesn't work for us: It doesn't work.  It was
something like 2 YEARS since we first wanted this, and it STILL does
not work.  You're postponing a pretty simple request indefinitely in
favor of a much more complex feature, which still doesn't really give
me what I want.  What I want is an API that works like rlimit but
per-cgroup, rather than per-UID.

On Tue, Apr 29, 2014 at 9:51 AM, Frederic Weisbecker <fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Tue, Apr 29, 2014 at 09:06:22AM -0700, Tim Hockin wrote:
>> Why the insistence that we manage something that REALLY IS a
>> first-class concept (hey, it has it's own RLIMIT) as a side effect of
>> something that doesn't quite capture what we want to achieve?
>
> It's not a side effect, the kmem task stack control was partly
> motivated to solve forkbomb issues in containers.
>
> Also in general if we can reuse existing features and code to solve
> a problem without disturbing side issues, we just do it.
>
> Now if kmem doesn't solve the issue for you for any reason, or it does
> but it brings other problems that aren't fixable in kmem itself, we can
> certainly reconsider this cgroup subsystem. But I haven't yet seen
> argument of this kind yet.
>
>>
>> Is there some specific technical reason why you think this is a bad
>> idea?
>> I would think, especially in a more unified hierarchy world,
>> that more cgroup controllers with smaller sets of responsibility would
>> make for more manageable code (within limits, obviously).
>
> Because it's core code and it adds complications and overhead in the
> fork/exit path. We just don't add new core code just for the sake of
> slightly prettier interfaces.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                               ` <CAO_Rewa20dneL8e3T4UPnu2Dkv28KTgFJR9_YSmRBKp-_yqewg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-04-29 17:06                                 ` Michal Hocko
  2014-04-29 21:44                                 ` Frederic Weisbecker
  1 sibling, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2014-04-29 17:06 UTC (permalink / raw)
  To: Tim Hockin
  Cc: Richard Davies, Vladimir Davydov, David Rientjes, Marian Marinov,
	Max Kellermann, Tim Hockin, Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Glauber Costa,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, William Dauchy,
	Johannes Weiner, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Daniel Walsh

On Tue 29-04-14 09:59:30, Tim Hockin wrote:
> Here's the reason it doesn't work for us: It doesn't work. 

There is a "simple" solution for that. Help us to fix it.

> It was something like 2 YEARS since we first wanted this, and it STILL
> does not work.

My recollection is that it was primarily Parallels and Google asking for
the kmem accounting. The reason why I didn't fight against inclusion
although the implementation at the time didn't have a proper slab
shrinking implemented was that that would happen later. Well, that later
hasn't happened yet and we are slowly getting there.

> You're postponing a pretty simple request indefinitely in
> favor of a much more complex feature, which still doesn't really give
> me what I want. 

But we cannot simply add a new interface that will have to be maintained
for ever just because something else that is supposed to workaround bugs.

> What I want is an API that works like rlimit but per-cgroup, rather
> than per-UID.

You can use an out-of-tree patchset for the time being or help to get
kmem into shape. If there are principal reasons why kmem cannot be used
then you better articulate them.

> On Tue, Apr 29, 2014 at 9:51 AM, Frederic Weisbecker <fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > On Tue, Apr 29, 2014 at 09:06:22AM -0700, Tim Hockin wrote:
> >> Why the insistence that we manage something that REALLY IS a
> >> first-class concept (hey, it has it's own RLIMIT) as a side effect of
> >> something that doesn't quite capture what we want to achieve?
> >
> > It's not a side effect, the kmem task stack control was partly
> > motivated to solve forkbomb issues in containers.
> >
> > Also in general if we can reuse existing features and code to solve
> > a problem without disturbing side issues, we just do it.
> >
> > Now if kmem doesn't solve the issue for you for any reason, or it does
> > but it brings other problems that aren't fixable in kmem itself, we can
> > certainly reconsider this cgroup subsystem. But I haven't yet seen
> > argument of this kind yet.
> >
> >>
> >> Is there some specific technical reason why you think this is a bad
> >> idea?
> >> I would think, especially in a more unified hierarchy world,
> >> that more cgroup controllers with smaller sets of responsibility would
> >> make for more manageable code (within limits, obviously).
> >
> > Because it's core code and it adds complications and overhead in the
> > fork/exit path. We just don't add new core code just for the sake of
> > slightly prettier interfaces.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                                 ` <20140429170639.GA25609-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2014-04-29 17:30                                   ` Dwight Engen
  0 siblings, 0 replies; 33+ messages in thread
From: Dwight Engen @ 2014-04-29 17:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Richard Davies, Vladimir Davydov, Marian Marinov, Max Kellermann,
	Tim Hockin, Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Tim Hockin, Glauber Costa, Johannes Weiner,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, William Dauchy,
	David Rientjes, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Daniel Walsh

On Tue, 29 Apr 2014 19:06:39 +0200
Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org> wrote:

> On Tue 29-04-14 09:59:30, Tim Hockin wrote:
> > Here's the reason it doesn't work for us: It doesn't work. 
> 
> There is a "simple" solution for that. Help us to fix it.
> 
> > It was something like 2 YEARS since we first wanted this, and it
> > STILL does not work.
> 
> My recollection is that it was primarily Parallels and Google asking
> for the kmem accounting. The reason why I didn't fight against
> inclusion although the implementation at the time didn't have a
> proper slab shrinking implemented was that that would happen later.
> Well, that later hasn't happened yet and we are slowly getting there.
> 
> > You're postponing a pretty simple request indefinitely in
> > favor of a much more complex feature, which still doesn't really
> > give me what I want. 
> 
> But we cannot simply add a new interface that will have to be
> maintained for ever just because something else that is supposed to
> workaround bugs.
> 
> > What I want is an API that works like rlimit but per-cgroup, rather
> > than per-UID.
> 
> You can use an out-of-tree patchset for the time being or help to get
> kmem into shape. If there are principal reasons why kmem cannot be
> used then you better articulate them.

Is there a plan to separately account/limit stack pages vs kmem in
general? Richard would have to verify, but I suspect kmem is not currently
viable as a process limiter for him because icache/dcache/stack is all
accounted together.

> > On Tue, Apr 29, 2014 at 9:51 AM, Frederic Weisbecker
> > <fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > > On Tue, Apr 29, 2014 at 09:06:22AM -0700, Tim Hockin wrote:
> > >> Why the insistence that we manage something that REALLY IS a
> > >> first-class concept (hey, it has it's own RLIMIT) as a side
> > >> effect of something that doesn't quite capture what we want to
> > >> achieve?
> > >
> > > It's not a side effect, the kmem task stack control was partly
> > > motivated to solve forkbomb issues in containers.
> > >
> > > Also in general if we can reuse existing features and code to
> > > solve a problem without disturbing side issues, we just do it.
> > >
> > > Now if kmem doesn't solve the issue for you for any reason, or it
> > > does but it brings other problems that aren't fixable in kmem
> > > itself, we can certainly reconsider this cgroup subsystem. But I
> > > haven't yet seen argument of this kind yet.
> > >
> > >>
> > >> Is there some specific technical reason why you think this is a
> > >> bad idea?
> > >> I would think, especially in a more unified hierarchy world,
> > >> that more cgroup controllers with smaller sets of responsibility
> > >> would make for more manageable code (within limits, obviously).
> > >
> > > Because it's core code and it adds complications and overhead in
> > > the fork/exit path. We just don't add new core code just for the
> > > sake of slightly prettier interfaces.
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                                   ` <20140429133039.162d9dd7-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-04-29 18:09                                     ` Richard Davies
  0 siblings, 0 replies; 33+ messages in thread
From: Richard Davies @ 2014-04-29 18:09 UTC (permalink / raw)
  To: Dwight Engen
  Cc: Vladimir Davydov, Marian Marinov, Max Kellermann, Tim Hockin,
	Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Johannes Weiner, Tim Hockin, Glauber Costa,
	Michal Hocko, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy,
	David Rientjes, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Daniel Walsh

Dwight Engen wrote:
> Michal Hocko wrote:
> > Tim Hockin wrote:
> > > Here's the reason it doesn't work for us: It doesn't work.
> >
> > There is a "simple" solution for that. Help us to fix it.
> >
> > > It was something like 2 YEARS since we first wanted this, and it
> > > STILL does not work.
> >
> > My recollection is that it was primarily Parallels and Google asking
> > for the kmem accounting. The reason why I didn't fight against
> > inclusion although the implementation at the time didn't have a
> > proper slab shrinking implemented was that that would happen later.
> > Well, that later hasn't happened yet and we are slowly getting there.
> >
> > > You're postponing a pretty simple request indefinitely in
> > > favor of a much more complex feature, which still doesn't really
> > > give me what I want.
> >
> > But we cannot simply add a new interface that will have to be
> > maintained for ever just because something else that is supposed to
> > workaround bugs.
> >
> > > What I want is an API that works like rlimit but per-cgroup, rather
> > > than per-UID.
> >
> > You can use an out-of-tree patchset for the time being or help to get
> > kmem into shape. If there are principal reasons why kmem cannot be
> > used then you better articulate them.
>
> Is there a plan to separately account/limit stack pages vs kmem in
> general? Richard would have to verify, but I suspect kmem is not currently
> viable as a process limiter for him because icache/dcache/stack is all
> accounted together.

Certainly I would like to be able to limit container fork-bombs without
limiting the amount of disk IO caching for processes in those containers.

In my testing with of kmem limits, I needed a limit of 256MB or lower to
catch fork bombs early enough. I would definitely like more than 256MB of
disk caching.

So if we go the "working kmem" route, I would like to be able to specify a
limit excluding disk cache.


I am also somewhat worried that normal software use could legitimately go
above 256MB of kmem (even excluding disk cache) - I got to 50MB in testing
just by booting a distro with a few daemons in a container.

Richard.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                                     ` <20140429180927.GB29606-2oeHp4OYwSjPZs67QiJbJtBPR1lH4CV8@public.gmane.org>
@ 2014-04-29 18:27                                       ` Michal Hocko
  0 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2014-04-29 18:27 UTC (permalink / raw)
  To: Richard Davies
  Cc: Vladimir Davydov, Marian Marinov, Max Kellermann, Tim Hockin,
	Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Tim Hockin, Glauber Costa, Johannes Weiner,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, David Rientjes,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel Walsh

On Tue 29-04-14 19:09:27, Richard Davies wrote:
> Dwight Engen wrote:
> > Michal Hocko wrote:
> > > Tim Hockin wrote:
> > > > Here's the reason it doesn't work for us: It doesn't work.
> > >
> > > There is a "simple" solution for that. Help us to fix it.
> > >
> > > > It was something like 2 YEARS since we first wanted this, and it
> > > > STILL does not work.
> > >
> > > My recollection is that it was primarily Parallels and Google asking
> > > for the kmem accounting. The reason why I didn't fight against
> > > inclusion although the implementation at the time didn't have a
> > > proper slab shrinking implemented was that that would happen later.
> > > Well, that later hasn't happened yet and we are slowly getting there.
> > >
> > > > You're postponing a pretty simple request indefinitely in
> > > > favor of a much more complex feature, which still doesn't really
> > > > give me what I want.
> > >
> > > But we cannot simply add a new interface that will have to be
> > > maintained for ever just because something else that is supposed to
> > > workaround bugs.
> > >
> > > > What I want is an API that works like rlimit but per-cgroup, rather
> > > > than per-UID.
> > >
> > > You can use an out-of-tree patchset for the time being or help to get
> > > kmem into shape. If there are principal reasons why kmem cannot be
> > > used then you better articulate them.
> >
> > Is there a plan to separately account/limit stack pages vs kmem in
> > general? Richard would have to verify, but I suspect kmem is not currently
> > viable as a process limiter for him because icache/dcache/stack is all
> > accounted together.
> 
> Certainly I would like to be able to limit container fork-bombs without
> limiting the amount of disk IO caching for processes in those containers.
> 
> In my testing with of kmem limits, I needed a limit of 256MB or lower to
> catch fork bombs early enough. I would definitely like more than 256MB of
> disk caching.
> 
> So if we go the "working kmem" route, I would like to be able to specify a
> limit excluding disk cache.

Page cache (which is what you mean by disk cache probably) is a
userspace accounted memory with the memory cgroup controller. And you
do not have to limit that one. Kmem accounting refers to kernel internal
allocations - slab memory and per process kernel stack. You can see how
much memory is allocated per container by memory.kmem.usage_in_bytes or
have a look at /proc/slabinfo to see what kind of memory kernel
allocates globally and might be accounted for a container as well.

The primary problem with the kmem accounting right now is that such a
memory is not "reclaimed" and so if the kmem limit is reached all the
further kmem allocations fail. The biggest user of the kmem allocations
on many systems is dentry and inode chache which is reclaimable easily.
When this is implemented the kmem limit will be usable to both prevent
forkbombs but also other DOS scenarios when the kernel is pushed to
allocate a huge amount of memory.

HTH

> I am also somewhat worried that normal software use could legitimately go
> above 256MB of kmem (even excluding disk cache) - I got to 50MB in testing
> just by booting a distro with a few daemons in a container.
> 
> Richard.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                                       ` <20140429182742.GB25609-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2014-04-29 18:39                                         ` Richard Davies
       [not found]                                           ` <20140429183928.GF29606-2oeHp4OYwSjPZs67QiJbJtBPR1lH4CV8@public.gmane.org>
  2014-04-29 21:36                                         ` Marian Marinov
  1 sibling, 1 reply; 33+ messages in thread
From: Richard Davies @ 2014-04-29 18:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vladimir Davydov, Marian Marinov, Max Kellermann, Tim Hockin,
	Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Tim Hockin, Glauber Costa, Johannes Weiner,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, David Rientjes,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel Walsh

Michal Hocko wrote:
> Richard Davies wrote:
> > Dwight Engen wrote:
> > > Is there a plan to separately account/limit stack pages vs kmem in
> > > general? Richard would have to verify, but I suspect kmem is not
> > > currently viable as a process limiter for him because
> > > icache/dcache/stack is all accounted together.
> >
> > Certainly I would like to be able to limit container fork-bombs without
> > limiting the amount of disk IO caching for processes in those containers.
> >
> > In my testing with of kmem limits, I needed a limit of 256MB or lower to
> > catch fork bombs early enough. I would definitely like more than 256MB of
> > disk caching.
> >
> > So if we go the "working kmem" route, I would like to be able to specify a
> > limit excluding disk cache.
>
> Page cache (which is what you mean by disk cache probably) is a
> userspace accounted memory with the memory cgroup controller. And you
> do not have to limit that one.

OK, that's helpful - thanks.

As an aside, with the normal (non-kmem) cgroup controller, is there a way
for me to exclude page cache and only limit the equivalent of the rss line
in memory.stat?

e.g. say I have a 256GB physical machine, running 200 containers, each with
1GB normal-mem limit (for running software) and 256MB kmem limit (to stop
fork-bombs).

The physical disk IO bandwidth is a shared resource between all the
containers, so ideally I would like the kernel to used the 56GB of RAM as
shared page cache however it best reduces physical IOPs, rather than having
a per-container limit.

Thanks,

Richard.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                                           ` <20140429183928.GF29606-2oeHp4OYwSjPZs67QiJbJtBPR1lH4CV8@public.gmane.org>
@ 2014-04-29 19:03                                             ` Michal Hocko
  0 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2014-04-29 19:03 UTC (permalink / raw)
  To: Richard Davies
  Cc: Vladimir Davydov, Marian Marinov, Max Kellermann, Tim Hockin,
	Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Tim Hockin, Glauber Costa, Johannes Weiner,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, David Rientjes,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel Walsh

On Tue 29-04-14 19:39:28, Richard Davies wrote:
> Michal Hocko wrote:
> > Richard Davies wrote:
> > > Dwight Engen wrote:
> > > > Is there a plan to separately account/limit stack pages vs kmem in
> > > > general? Richard would have to verify, but I suspect kmem is not
> > > > currently viable as a process limiter for him because
> > > > icache/dcache/stack is all accounted together.
> > >
> > > Certainly I would like to be able to limit container fork-bombs without
> > > limiting the amount of disk IO caching for processes in those containers.
> > >
> > > In my testing with of kmem limits, I needed a limit of 256MB or lower to
> > > catch fork bombs early enough. I would definitely like more than 256MB of
> > > disk caching.
> > >
> > > So if we go the "working kmem" route, I would like to be able to specify a
> > > limit excluding disk cache.
> >
> > Page cache (which is what you mean by disk cache probably) is a
> > userspace accounted memory with the memory cgroup controller. And you
> > do not have to limit that one.
> 
> OK, that's helpful - thanks.
> 
> As an aside, with the normal (non-kmem) cgroup controller, is there a way
> for me to exclude page cache and only limit the equivalent of the rss line
> in memory.stat?

No

> e.g. say I have a 256GB physical machine, running 200 containers, each with
> 1GB normal-mem limit (for running software) and 256MB kmem limit (to stop
> fork-bombs).
> 
> The physical disk IO bandwidth is a shared resource between all the
> containers, so ideally I would like the kernel to used the 56GB of RAM as
> shared page cache however it best reduces physical IOPs, rather than having
> a per-container limit.

Then do not use any memory.limit_in_bytes and if there is a memory
pressure then the global reclaim will shrink all the containers
proportionally and the page cache will be the #1 target for the
reclaim (but we are getting off-topic here I am afraid).
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                                       ` <20140429182742.GB25609-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
  2014-04-29 18:39                                         ` Richard Davies
@ 2014-04-29 21:36                                         ` Marian Marinov
       [not found]                                           ` <53601B68.60906-NV7Lj0SOnH0@public.gmane.org>
  1 sibling, 1 reply; 33+ messages in thread
From: Marian Marinov @ 2014-04-29 21:36 UTC (permalink / raw)
  To: Michal Hocko, Richard Davies
  Cc: Vladimir Davydov, Daniel Walsh, Max Kellermann, Tim Hockin,
	Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Tim Hockin, Glauber Costa, Johannes Weiner,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, David Rientjes,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

On 04/29/2014 09:27 PM, Michal Hocko wrote:
> On Tue 29-04-14 19:09:27, Richard Davies wrote:
>> Dwight Engen wrote:
>>> Michal Hocko wrote:
>>>> Tim Hockin wrote:
>>>>> Here's the reason it doesn't work for us: It doesn't work.
>>>>
>>>> There is a "simple" solution for that. Help us to fix it.
>>>>
>>>>> It was something like 2 YEARS since we first wanted this, and it
>>>>> STILL does not work.
>>>>
>>>> My recollection is that it was primarily Parallels and Google asking
>>>> for the kmem accounting. The reason why I didn't fight against
>>>> inclusion although the implementation at the time didn't have a
>>>> proper slab shrinking implemented was that that would happen later.
>>>> Well, that later hasn't happened yet and we are slowly getting there.
>>>>
>>>>> You're postponing a pretty simple request indefinitely in
>>>>> favor of a much more complex feature, which still doesn't really
>>>>> give me what I want.
>>>>
>>>> But we cannot simply add a new interface that will have to be
>>>> maintained for ever just because something else that is supposed to
>>>> workaround bugs.
>>>>
>>>>> What I want is an API that works like rlimit but per-cgroup, rather
>>>>> than per-UID.
>>>>
>>>> You can use an out-of-tree patchset for the time being or help to get
>>>> kmem into shape. If there are principal reasons why kmem cannot be
>>>> used then you better articulate them.
>>>
>>> Is there a plan to separately account/limit stack pages vs kmem in
>>> general? Richard would have to verify, but I suspect kmem is not currently
>>> viable as a process limiter for him because icache/dcache/stack is all
>>> accounted together.
>>
>> Certainly I would like to be able to limit container fork-bombs without
>> limiting the amount of disk IO caching for processes in those containers.
>>
>> In my testing with of kmem limits, I needed a limit of 256MB or lower to
>> catch fork bombs early enough. I would definitely like more than 256MB of
>> disk caching.
>>
>> So if we go the "working kmem" route, I would like to be able to specify a
>> limit excluding disk cache.
>
> Page cache (which is what you mean by disk cache probably) is a
> userspace accounted memory with the memory cgroup controller. And you
> do not have to limit that one. Kmem accounting refers to kernel internal
> allocations - slab memory and per process kernel stack. You can see how
> much memory is allocated per container by memory.kmem.usage_in_bytes or
> have a look at /proc/slabinfo to see what kind of memory kernel
> allocates globally and might be accounted for a container as well.
>
> The primary problem with the kmem accounting right now is that such a
> memory is not "reclaimed" and so if the kmem limit is reached all the
> further kmem allocations fail. The biggest user of the kmem allocations
> on many systems is dentry and inode chache which is reclaimable easily.
> When this is implemented the kmem limit will be usable to both prevent
> forkbombs but also other DOS scenarios when the kernel is pushed to
> allocate a huge amount of memory.

I would have to disagree here.
If a container starts to create many processes it will use kmem, however my use cases, the memory is not the problem.
The simple scheduling of so many processes generates have load on the machine.
Even if I have the memory to handle this... the problem becomes the scheduling of all of these processes.

Typical rsync of 2-3TB of small files(1-100k) will generate heavy pressure on the kmem, but will would not produce many 
processes.
On the other hand, forking thousands of processes with low memory footprint will hit the scheduler a lot faster then 
hitting the kmem limit.

Kmem limit is something that we need! But firmly believe that we need a simple NPROC limit for cgroups.

-hackman

>
> HTH
>
>> I am also somewhat worried that normal software use could legitimately go
>> above 256MB of kmem (even excluding disk cache) - I got to 50MB in testing
>> just by booting a distro with a few daemons in a container.
>>
>> Richard.
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                               ` <CAO_Rewa20dneL8e3T4UPnu2Dkv28KTgFJR9_YSmRBKp-_yqewg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2014-04-29 17:06                                 ` Michal Hocko
@ 2014-04-29 21:44                                 ` Frederic Weisbecker
  1 sibling, 0 replies; 33+ messages in thread
From: Frederic Weisbecker @ 2014-04-29 21:44 UTC (permalink / raw)
  To: Tim Hockin
  Cc: Richard Davies, Vladimir Davydov, David Rientjes, Marian Marinov,
	Max Kellermann, Tim Hockin,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, William Dauchy,
	Johannes Weiner, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Daniel Walsh

On Tue, Apr 29, 2014 at 09:59:30AM -0700, Tim Hockin wrote:
> Here's the reason it doesn't work for us: It doesn't work.  It was
> something like 2 YEARS since we first wanted this, and it STILL does
> not work.

When I was working on the task counter cgroup subsystem 2 years
ago, the patches were actually pushed back by google people, in favour
of task stack kmem cgroup subsystem.

The reason was that expressing the forkbomb issue in terms of
number of tasks as a resource is awkward and that the real resource
in the game comes from kernel memory exhaustion due to task stack being
allocated over and over, swap ping-pong and stuffs...

And that was a pretty good argument. I still agree with that. Especially
since that could solve others people issues at the same time. kmem
cgroup has a quite large domain of application.

> You're postponing a pretty simple request indefinitely in
> favor of a much more complex feature, which still doesn't really give
> me what I want.  What I want is an API that works like rlimit but
> per-cgroup, rather than per-UID.

The request is simple but I don't think that adding the task counter
cgroup subsystem is simpler than extending the kmem code to apply limits
to only task stack. Especially in terms of maintainance.

Also you guys have very good mm kernel developers who are already
familiar with this.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                                 ` <20140429214454.GF6129-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2014-04-30 13:12                                   ` Daniel J Walsh
  0 siblings, 0 replies; 33+ messages in thread
From: Daniel J Walsh @ 2014-04-30 13:12 UTC (permalink / raw)
  To: Frederic Weisbecker, Tim Hockin
  Cc: Richard Davies, Vladimir Davydov, David Rientjes, Marian Marinov,
	Max Kellermann, Tim Hockin,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, William Dauchy,
	Johannes Weiner, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA


On 04/29/2014 05:44 PM, Frederic Weisbecker wrote:
> On Tue, Apr 29, 2014 at 09:59:30AM -0700, Tim Hockin wrote:
>> Here's the reason it doesn't work for us: It doesn't work.  It was
>> something like 2 YEARS since we first wanted this, and it STILL does
>> not work.
> When I was working on the task counter cgroup subsystem 2 years
> ago, the patches were actually pushed back by google people, in favour
> of task stack kmem cgroup subsystem.
>
> The reason was that expressing the forkbomb issue in terms of
> number of tasks as a resource is awkward and that the real resource
> in the game comes from kernel memory exhaustion due to task stack being
> allocated over and over, swap ping-pong and stuffs...
>
> And that was a pretty good argument. I still agree with that. Especially
> since that could solve others people issues at the same time. kmem
> cgroup has a quite large domain of application.
>
>> You're postponing a pretty simple request indefinitely in
>> favor of a much more complex feature, which still doesn't really give
>> me what I want.  What I want is an API that works like rlimit but
>> per-cgroup, rather than per-UID.
> The request is simple but I don't think that adding the task counter
> cgroup subsystem is simpler than extending the kmem code to apply limits
> to only task stack. Especially in terms of maintainance.
>
> Also you guys have very good mm kernel developers who are already
> familiar with this.
I would look at this from a Usability point of view.  It is a lot easier
to understand number of processes then the mount of KMEM those processes
will need.  Setting something like
ProcessLimit=1000 in a systemd unit file is easy to explain.  Now if
systemd has the ability to translate this into something that makes
sense in terms of kmem cgroup, then my argument goes away.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                                   ` <5360F6B4.9010308-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-04-30 13:28                                     ` Frederic Weisbecker
  0 siblings, 0 replies; 33+ messages in thread
From: Frederic Weisbecker @ 2014-04-30 13:28 UTC (permalink / raw)
  To: Daniel J Walsh
  Cc: Richard Davies, Vladimir Davydov, David Rientjes, Marian Marinov,
	Max Kellermann, Tim Hockin,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Tim Hockin, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, William Dauchy,
	Johannes Weiner, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed, Apr 30, 2014 at 09:12:20AM -0400, Daniel J Walsh wrote:
> 
> On 04/29/2014 05:44 PM, Frederic Weisbecker wrote:
> > On Tue, Apr 29, 2014 at 09:59:30AM -0700, Tim Hockin wrote:
> >> Here's the reason it doesn't work for us: It doesn't work.  It was
> >> something like 2 YEARS since we first wanted this, and it STILL does
> >> not work.
> > When I was working on the task counter cgroup subsystem 2 years
> > ago, the patches were actually pushed back by google people, in favour
> > of task stack kmem cgroup subsystem.
> >
> > The reason was that expressing the forkbomb issue in terms of
> > number of tasks as a resource is awkward and that the real resource
> > in the game comes from kernel memory exhaustion due to task stack being
> > allocated over and over, swap ping-pong and stuffs...
> >
> > And that was a pretty good argument. I still agree with that. Especially
> > since that could solve others people issues at the same time. kmem
> > cgroup has a quite large domain of application.
> >
> >> You're postponing a pretty simple request indefinitely in
> >> favor of a much more complex feature, which still doesn't really give
> >> me what I want.  What I want is an API that works like rlimit but
> >> per-cgroup, rather than per-UID.
> > The request is simple but I don't think that adding the task counter
> > cgroup subsystem is simpler than extending the kmem code to apply limits
> > to only task stack. Especially in terms of maintainance.
> >
> > Also you guys have very good mm kernel developers who are already
> > familiar with this.
> I would look at this from a Usability point of view.  It is a lot easier
> to understand number of processes then the mount of KMEM those processes
> will need.  Setting something like
> ProcessLimit=1000 in a systemd unit file is easy to explain.

Yeah that's a fair point.

> Now if systemd has the ability to translate this into something that makes
> sense in terms of kmem cgroup, then my argument goes away.

Yeah if we keep the kmem direction, this can be a place where we do the mapping.
Now I just hope the amount of stack memory allocated doesn't differ too much per arch.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                                           ` <53601B68.60906-NV7Lj0SOnH0@public.gmane.org>
@ 2014-04-30 13:31                                             ` Michal Hocko
  0 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2014-04-30 13:31 UTC (permalink / raw)
  To: Marian Marinov
  Cc: Richard Davies, Vladimir Davydov, Daniel Walsh, Max Kellermann,
	Tim Hockin, Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Serge Hallyn, Tim Hockin, Glauber Costa, Johannes Weiner,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, David Rientjes,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed 30-04-14 00:36:40, Marian Marinov wrote:
> On 04/29/2014 09:27 PM, Michal Hocko wrote:
> >On Tue 29-04-14 19:09:27, Richard Davies wrote:
> >>Dwight Engen wrote:
> >>>Michal Hocko wrote:
> >>>>Tim Hockin wrote:
> >>>>>Here's the reason it doesn't work for us: It doesn't work.
> >>>>
> >>>>There is a "simple" solution for that. Help us to fix it.
> >>>>
> >>>>>It was something like 2 YEARS since we first wanted this, and it
> >>>>>STILL does not work.
> >>>>
> >>>>My recollection is that it was primarily Parallels and Google asking
> >>>>for the kmem accounting. The reason why I didn't fight against
> >>>>inclusion although the implementation at the time didn't have a
> >>>>proper slab shrinking implemented was that that would happen later.
> >>>>Well, that later hasn't happened yet and we are slowly getting there.
> >>>>
> >>>>>You're postponing a pretty simple request indefinitely in
> >>>>>favor of a much more complex feature, which still doesn't really
> >>>>>give me what I want.
> >>>>
> >>>>But we cannot simply add a new interface that will have to be
> >>>>maintained for ever just because something else that is supposed to
> >>>>workaround bugs.
> >>>>
> >>>>>What I want is an API that works like rlimit but per-cgroup, rather
> >>>>>than per-UID.
> >>>>
> >>>>You can use an out-of-tree patchset for the time being or help to get
> >>>>kmem into shape. If there are principal reasons why kmem cannot be
> >>>>used then you better articulate them.
> >>>
> >>>Is there a plan to separately account/limit stack pages vs kmem in
> >>>general? Richard would have to verify, but I suspect kmem is not currently
> >>>viable as a process limiter for him because icache/dcache/stack is all
> >>>accounted together.
> >>
> >>Certainly I would like to be able to limit container fork-bombs without
> >>limiting the amount of disk IO caching for processes in those containers.
> >>
> >>In my testing with of kmem limits, I needed a limit of 256MB or lower to
> >>catch fork bombs early enough. I would definitely like more than 256MB of
> >>disk caching.
> >>
> >>So if we go the "working kmem" route, I would like to be able to specify a
> >>limit excluding disk cache.
> >
> >Page cache (which is what you mean by disk cache probably) is a
> >userspace accounted memory with the memory cgroup controller. And you
> >do not have to limit that one. Kmem accounting refers to kernel internal
> >allocations - slab memory and per process kernel stack. You can see how
> >much memory is allocated per container by memory.kmem.usage_in_bytes or
> >have a look at /proc/slabinfo to see what kind of memory kernel
> >allocates globally and might be accounted for a container as well.
> >
> >The primary problem with the kmem accounting right now is that such a
> >memory is not "reclaimed" and so if the kmem limit is reached all the
> >further kmem allocations fail. The biggest user of the kmem allocations
> >on many systems is dentry and inode chache which is reclaimable easily.
> >When this is implemented the kmem limit will be usable to both prevent
> >forkbombs but also other DOS scenarios when the kernel is pushed to
> >allocate a huge amount of memory.
> 
> I would have to disagree here.
> If a container starts to create many processes it will use kmem, however my use cases, the memory is not the problem.
> The simple scheduling of so many processes generates have load on the machine.
> Even if I have the memory to handle this... the problem becomes the scheduling of all of these processes.

What prevents you from setting the kmem limit to NR_PROC * 8K + slab_pillow?

> Typical rsync of 2-3TB of small files(1-100k) will generate heavy pressure
> on the kmem, but will would not produce many processes.

Once we have a proper slab reclaim implementation this shouldn't be a
problem.

> On the other hand, forking thousands of processes with low memory footprint
> will hit the scheduler a lot faster then hitting the kmem limit.
>
> Kmem limit is something that we need! But firmly believe that we need
> a simple NPROC limit for cgroups.

Once again. If you feel that your usecase is not covered by the kmem
limit follow up on the original email thread I have referenced earlier
in the thread. Splitting up the discussion doesn't help at all.

> -hackman
> 
> >
> >HTH
> >
> >>I am also somewhat worried that normal software use could legitimately go
> >>above 256MB of kmem (even excluding disk cache) - I got to 50MB in testing
> >>just by booting a distro with a few daemons in a container.
> >>
> >>Richard.
> >
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]               ` <20140423084942.560ae837-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2014-04-28 18:00                 ` Serge Hallyn
@ 2014-05-06 11:40                 ` Marian Marinov
  2014-06-10 14:50                 ` Marian Marinov
  2 siblings, 0 replies; 33+ messages in thread
From: Marian Marinov @ 2014-05-06 11:40 UTC (permalink / raw)
  To: Dwight Engen
  Cc: Richard Davies, Vladimir Davydov, Daniel Walsh, Max Kellermann,
	Tim Hockin, Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Johannes Weiner, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, David Rientjes,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

On 04/23/2014 03:49 PM, Dwight Engen wrote:
> On Wed, 23 Apr 2014 09:07:28 +0300
> Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:
>
>> On 04/22/2014 11:05 PM, Richard Davies wrote:
>>> Dwight Engen wrote:
>>>> Richard Davies wrote:
>>>>> Vladimir Davydov wrote:
>>>>>> In short, kmem limiting for memory cgroups is currently broken.
>>>>>> Do not use it. We are working on making it usable though.
>>> ...
>>>>> What is the best mechanism available today, until kmem limits
>>>>> mature?
>>>>>
>>>>> RLIMIT_NPROC exists but is per-user, not per-container.
>>>>>
>>>>> Perhaps there is an up-to-date task counter patchset or similar?
>>>>
>>>> I updated Frederic's task counter patches and included Max
>>>> Kellermann's fork limiter here:
>>>>
>>>> http://thread.gmane.org/gmane.linux.kernel.containers/27212
>>>>
>>>> I can send you a more recent patchset (against 3.13.10) if you
>>>> would find it useful.
>>>
>>> Yes please, I would be interested in that. Ideally even against
>>> 3.14.1 if you have that too.
>>
>> Dwight, do you have these patches in any public repo?
>>
>> I would like to test them also.
>
> Hi Marian, I put the patches against 3.13.11 and 3.14.1 up at:
>
> git://github.com/dwengen/linux.git cpuacct-task-limit-3.13
> git://github.com/dwengen/linux.git cpuacct-task-limit-3.14
>
Guys I tested the patches with 3.12.16. However I see a problem with them.

Trying to set the limit to a cgroup which already have processes in it does not work:

[root@sp2 lxc]# echo 50 > cpuacct.task_limit
-bash: echo: write error: Device or resource busy
[root@sp2 lxc]# echo 0 > cpuacct.task_limit
-bash: echo: write error: Device or resource busy
[root@sp2 lxc]#

I have even tried to remove this check:
+               if (cgroup_task_count(cgrp) || !list_empty(&cgrp->children))
+                       return -EBUSY;
But still give me 'Device or resource busy'.

Any pointers of why is this happening ?

Marian

>> Marian
>>
>>>
>>> Thanks,
>>>
>>> Richard.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe cgroups"
>>> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                 ` <5368CA47.7030007-NV7Lj0SOnH0@public.gmane.org>
@ 2014-05-07 17:15                   ` Dwight Engen
  0 siblings, 0 replies; 33+ messages in thread
From: Dwight Engen @ 2014-05-07 17:15 UTC (permalink / raw)
  To: Marian Marinov
  Cc: Richard Davies, Vladimir Davydov, Daniel Walsh, Max Kellermann,
	Tim Hockin, Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Johannes Weiner, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, David Rientjes,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

On Tue, 06 May 2014 14:40:55 +0300
Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:

> On 04/23/2014 03:49 PM, Dwight Engen wrote:
> > On Wed, 23 Apr 2014 09:07:28 +0300
> > Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:
> >
> >> On 04/22/2014 11:05 PM, Richard Davies wrote:
> >>> Dwight Engen wrote:
> >>>> Richard Davies wrote:
> >>>>> Vladimir Davydov wrote:
> >>>>>> In short, kmem limiting for memory cgroups is currently broken.
> >>>>>> Do not use it. We are working on making it usable though.
> >>> ...
> >>>>> What is the best mechanism available today, until kmem limits
> >>>>> mature?
> >>>>>
> >>>>> RLIMIT_NPROC exists but is per-user, not per-container.
> >>>>>
> >>>>> Perhaps there is an up-to-date task counter patchset or similar?
> >>>>
> >>>> I updated Frederic's task counter patches and included Max
> >>>> Kellermann's fork limiter here:
> >>>>
> >>>> http://thread.gmane.org/gmane.linux.kernel.containers/27212
> >>>>
> >>>> I can send you a more recent patchset (against 3.13.10) if you
> >>>> would find it useful.
> >>>
> >>> Yes please, I would be interested in that. Ideally even against
> >>> 3.14.1 if you have that too.
> >>
> >> Dwight, do you have these patches in any public repo?
> >>
> >> I would like to test them also.
> >
> > Hi Marian, I put the patches against 3.13.11 and 3.14.1 up at:
> >
> > git://github.com/dwengen/linux.git cpuacct-task-limit-3.13
> > git://github.com/dwengen/linux.git cpuacct-task-limit-3.14
> >
> Guys I tested the patches with 3.12.16. However I see a problem with
> them.
> 
> Trying to set the limit to a cgroup which already have processes in
> it does not work:

This is a similar check/limitation to the one for kmem in memcg, and is
done here to keep the res_counters consistent and from going negative.
It could probably be relaxed slightly by using res_counter_set_limit()
instead, but you would still need to initially set a limit before
adding tasks to the group.

> [root@sp2 lxc]# echo 50 > cpuacct.task_limit
> -bash: echo: write error: Device or resource busy
> [root@sp2 lxc]# echo 0 > cpuacct.task_limit
> -bash: echo: write error: Device or resource busy
> [root@sp2 lxc]#
> 
> I have even tried to remove this check:
> +               if (cgroup_task_count(cgrp)
> || !list_empty(&cgrp->children))
> +                       return -EBUSY;
> But still give me 'Device or resource busy'.
> 
> Any pointers of why is this happening ?
> 
> Marian
> 
> >> Marian
> >>
> >>>
> >>> Thanks,
> >>>
> >>> Richard.
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe cgroups"
> >>> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>
> >>>
> >>
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe cgroups"
> > in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> >
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                   ` <20140507131514.43716518-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-05-07 22:39                     ` Marian Marinov
       [not found]                       ` <536AB626.9070005-108MBtLGafw@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Marian Marinov @ 2014-05-07 22:39 UTC (permalink / raw)
  To: Dwight Engen, Marian Marinov
  Cc: Richard Davies, Vladimir Davydov, Max Kellermann, Tim Hockin,
	Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Daniel Walsh, cgroups-u79uwXL29TY76Z2rM5mHXA, Glauber Costa,
	Michal Hocko, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy,
	Johannes Weiner, Tejun Heo, David Rientjes

On 05/07/2014 08:15 PM, Dwight Engen wrote:
> On Tue, 06 May 2014 14:40:55 +0300
> Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:
>
>> On 04/23/2014 03:49 PM, Dwight Engen wrote:
>>> On Wed, 23 Apr 2014 09:07:28 +0300
>>> Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:
>>>
>>>> On 04/22/2014 11:05 PM, Richard Davies wrote:
>>>>> Dwight Engen wrote:
>>>>>> Richard Davies wrote:
>>>>>>> Vladimir Davydov wrote:
>>>>>>>> In short, kmem limiting for memory cgroups is currently broken.
>>>>>>>> Do not use it. We are working on making it usable though.
>>>>> ...
>>>>>>> What is the best mechanism available today, until kmem limits
>>>>>>> mature?
>>>>>>>
>>>>>>> RLIMIT_NPROC exists but is per-user, not per-container.
>>>>>>>
>>>>>>> Perhaps there is an up-to-date task counter patchset or similar?
>>>>>>
>>>>>> I updated Frederic's task counter patches and included Max
>>>>>> Kellermann's fork limiter here:
>>>>>>
>>>>>> http://thread.gmane.org/gmane.linux.kernel.containers/27212
>>>>>>
>>>>>> I can send you a more recent patchset (against 3.13.10) if you
>>>>>> would find it useful.
>>>>>
>>>>> Yes please, I would be interested in that. Ideally even against
>>>>> 3.14.1 if you have that too.
>>>>
>>>> Dwight, do you have these patches in any public repo?
>>>>
>>>> I would like to test them also.
>>>
>>> Hi Marian, I put the patches against 3.13.11 and 3.14.1 up at:
>>>
>>> git://github.com/dwengen/linux.git cpuacct-task-limit-3.13
>>> git://github.com/dwengen/linux.git cpuacct-task-limit-3.14
>>>
>> Guys I tested the patches with 3.12.16. However I see a problem with
>> them.
>>
>> Trying to set the limit to a cgroup which already have processes in
>> it does not work:
>
> This is a similar check/limitation to the one for kmem in memcg, and is
> done here to keep the res_counters consistent and from going negative.
> It could probably be relaxed slightly by using res_counter_set_limit()
> instead, but you would still need to initially set a limit before
> adding tasks to the group.

I have removed the check entirely and still receive the EBUSY... I just don't understand what is returning it. If you 
have any pointers, I would be happy to take a look.

I'll look at set_limit(), thanks for pointing that one.

What I'm proposing is the following checks:

     if (val > RES_COUNTER_MAX || val < 0)
         return -EBUSY;
     if (val != 0 && val <= cgroup_task_count(cgrp))
         return -EBUSY;

     res_counter_write_u64(&ca->task_limit, type, val);

This way we ensure that val is within the limits > 0 and < RES_COUNTER_MAX. And also allow only values of 0 or greater 
then the current task count.

Marian
>
>> [root@sp2 lxc]# echo 50 > cpuacct.task_limit
>> -bash: echo: write error: Device or resource busy
>> [root@sp2 lxc]# echo 0 > cpuacct.task_limit
>> -bash: echo: write error: Device or resource busy
>> [root@sp2 lxc]#
>>
>> I have even tried to remove this check:
>> +               if (cgroup_task_count(cgrp)
>> || !list_empty(&cgrp->children))
>> +                       return -EBUSY;
>> But still give me 'Device or resource busy'.
>>
>> Any pointers of why is this happening ?
>>
>> Marian
>>
>>>> Marian
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Richard.
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe cgroups"
>>>>> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe cgroups"
>>> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>
>
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
>


-- 
Marian Marinov
Founder & CEO of 1H Ltd.
Jabber/GTalk: hackman-/eSpBmjxGS4dnm+yROfE0A@public.gmane.org
ICQ: 7556201
Mobile: +359 886 660 270

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]                       ` <536AB626.9070005-108MBtLGafw@public.gmane.org>
@ 2014-05-08 15:25                         ` Richard Davies
  0 siblings, 0 replies; 33+ messages in thread
From: Richard Davies @ 2014-05-08 15:25 UTC (permalink / raw)
  To: Marian Marinov
  Cc: Vladimir Davydov, Marian Marinov, Max Kellermann, Tim Hockin,
	Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Daniel Walsh, cgroups-u79uwXL29TY76Z2rM5mHXA, Glauber Costa,
	Michal Hocko, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy,
	Johannes Weiner, Tejun Heo, David Rientjes

Marian Marinov wrote:
> On 05/07/2014 08:15 PM, Dwight Engen wrote:
> >On Tue, 06 May 2014 14:40:55 +0300
> >Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:
> >
> >>On 04/23/2014 03:49 PM, Dwight Engen wrote:
> >>>On Wed, 23 Apr 2014 09:07:28 +0300
> >>>Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:
> >>>
> >>>>On 04/22/2014 11:05 PM, Richard Davies wrote:
> >>>>>Dwight Engen wrote:
> >>>>>>Richard Davies wrote:
> >>>>>>>Vladimir Davydov wrote:
> >>>>>>>>In short, kmem limiting for memory cgroups is currently broken.
> >>>>>>>>Do not use it. We are working on making it usable though.
> >>>>>...
> >>>>>>>What is the best mechanism available today, until kmem limits
> >>>>>>>mature?
> >>>>>>>
> >>>>>>>RLIMIT_NPROC exists but is per-user, not per-container.
> >>>>>>>
> >>>>>>>Perhaps there is an up-to-date task counter patchset or similar?
> >>>>>>
> >>>>>>I updated Frederic's task counter patches and included Max
> >>>>>>Kellermann's fork limiter here:
> >>>>>>
> >>>>>>http://thread.gmane.org/gmane.linux.kernel.containers/27212
> >>>>>>
> >>>>>>I can send you a more recent patchset (against 3.13.10) if you
> >>>>>>would find it useful.
> >>>>>
> >>>>>Yes please, I would be interested in that. Ideally even against
> >>>>>3.14.1 if you have that too.
> >>>>
> >>>>Dwight, do you have these patches in any public repo?
> >>>>
> >>>>I would like to test them also.
> >>>
> >>>Hi Marian, I put the patches against 3.13.11 and 3.14.1 up at:
> >>>
> >>>git://github.com/dwengen/linux.git cpuacct-task-limit-3.13
> >>>git://github.com/dwengen/linux.git cpuacct-task-limit-3.14
> >>>
> >>Guys I tested the patches with 3.12.16. However I see a problem with
> >>them.
> >>
> >>Trying to set the limit to a cgroup which already have processes in
> >>it does not work:
> >
> >This is a similar check/limitation to the one for kmem in memcg, and is
> >done here to keep the res_counters consistent and from going negative.
> >It could probably be relaxed slightly by using res_counter_set_limit()
> >instead, but you would still need to initially set a limit before
> >adding tasks to the group.
> 
> I have removed the check entirely and still receive the EBUSY... I
> just don't understand what is returning it. If you have any
> pointers, I would be happy to take a look.
> 
> I'll look at set_limit(), thanks for pointing that one.
> 
> What I'm proposing is the following checks:
> 
>     if (val > RES_COUNTER_MAX || val < 0)
>         return -EBUSY;
>     if (val != 0 && val <= cgroup_task_count(cgrp))
>         return -EBUSY;
> 
>     res_counter_write_u64(&ca->task_limit, type, val);
> 
> This way we ensure that val is within the limits > 0 and <
> RES_COUNTER_MAX. And also allow only values of 0 or greater then the
> current task count.

I have also noticed that I can't change many different cgroup limits while
there are tasks running in the cgroup - not just cpuacct.task_limit, but
also kmem and even normal memory.limit_in_bytes

I would like to be able to change all of these limits, as long as the new
limit is greater than the actual current use.

Could a method like this be used for all of the others too?

Richard.

> >>[root@sp2 lxc]# echo 50 > cpuacct.task_limit
> >>-bash: echo: write error: Device or resource busy
> >>[root@sp2 lxc]# echo 0 > cpuacct.task_limit
> >>-bash: echo: write error: Device or resource busy
> >>[root@sp2 lxc]#
> >>
> >>I have even tried to remove this check:
> >>+               if (cgroup_task_count(cgrp)
> >>|| !list_empty(&cgrp->children))
> >>+                       return -EBUSY;
> >>But still give me 'Device or resource busy'.
> >>
> >>Any pointers of why is this happening ?
> >>
> >>Marian

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]           ` <20140422200531.GA19334-2oeHp4OYwSjPZs67QiJbJtBPR1lH4CV8@public.gmane.org>
  2014-04-22 20:13             ` Tim Hockin
  2014-04-23  6:07             ` Marian Marinov
@ 2014-06-10 12:18             ` Alin Dobre
  2 siblings, 0 replies; 33+ messages in thread
From: Alin Dobre @ 2014-06-10 12:18 UTC (permalink / raw)
  To: Richard Davies, Dwight Engen
  Cc: Vladimir Davydov, Daniel Walsh, Max Kellermann, Tim Hockin,
	Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Johannes Weiner, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, David Rientjes,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

On 22/04/14 21:05, Richard Davies wrote:
> Dwight Engen wrote:
>> Richard Davies wrote:
>>> Vladimir Davydov wrote:
>>>> In short, kmem limiting for memory cgroups is currently broken. Do
>>>> not use it. We are working on making it usable though.
> ...
>>> What is the best mechanism available today, until kmem limits mature?
>>>
>>> RLIMIT_NPROC exists but is per-user, not per-container.
>>>
>>> Perhaps there is an up-to-date task counter patchset or similar?
>>
>> I updated Frederic's task counter patches and included Max Kellermann's
>> fork limiter here:
>>
>> http://thread.gmane.org/gmane.linux.kernel.containers/27212
>>
>> I can send you a more recent patchset (against 3.13.10) if you would
>> find it useful.
> 
> Yes please, I would be interested in that. Ideally even against 3.14.1 if
> you have that too.

Any chance for a 3.15 rebase, since the changes from cgroup_fork() makes
the operation no longer trivial.

Cheers,
Alin.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
       [not found]               ` <20140423084942.560ae837-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2014-04-28 18:00                 ` Serge Hallyn
  2014-05-06 11:40                 ` Marian Marinov
@ 2014-06-10 14:50                 ` Marian Marinov
  2 siblings, 0 replies; 33+ messages in thread
From: Marian Marinov @ 2014-06-10 14:50 UTC (permalink / raw)
  To: Dwight Engen
  Cc: Richard Davies, Vladimir Davydov, Daniel Walsh, Max Kellermann,
	Tim Hockin, Frederic Weisbecker,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Johannes Weiner, Glauber Costa, Michal Hocko,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, William Dauchy, David Rientjes,
	Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA


[-- Attachment #1.1: Type: text/plain, Size: 2306 bytes --]

On 04/23/2014 03:49 PM, Dwight Engen wrote:
> On Wed, 23 Apr 2014 09:07:28 +0300
> Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org> wrote:
> 
>> On 04/22/2014 11:05 PM, Richard Davies wrote:
>>> Dwight Engen wrote:
>>>> Richard Davies wrote:
>>>>> Vladimir Davydov wrote:
>>>>>> In short, kmem limiting for memory cgroups is currently broken.
>>>>>> Do not use it. We are working on making it usable though.
>>> ...
>>>>> What is the best mechanism available today, until kmem limits
>>>>> mature?
>>>>>
>>>>> RLIMIT_NPROC exists but is per-user, not per-container.
>>>>>
>>>>> Perhaps there is an up-to-date task counter patchset or similar?
>>>>
>>>> I updated Frederic's task counter patches and included Max
>>>> Kellermann's fork limiter here:
>>>>
>>>> http://thread.gmane.org/gmane.linux.kernel.containers/27212
>>>>
>>>> I can send you a more recent patchset (against 3.13.10) if you
>>>> would find it useful.
>>>
>>> Yes please, I would be interested in that. Ideally even against
>>> 3.14.1 if you have that too.
>>
>> Dwight, do you have these patches in any public repo?
>>
>> I would like to test them also.
> 
> Hi Marian, I put the patches against 3.13.11 and 3.14.1 up at:
> 
> git://github.com/dwengen/linux.git cpuacct-task-limit-3.13
> git://github.com/dwengen/linux.git cpuacct-task-limit-3.14

I did a backport of the patches to 3.12.16 and forward ported them to 3.12.20.

I'm very happy with how they work.

I used the patches on machines with 10-20k processes and it worked perfectly when some of the containers spawned 100s of
processes. It really saved us when one of the containers was attacked :)

The only thing that I'm going to add is on the fly change of the limit.

Marian

>  
>> Marian
>>
>>>
>>> Thanks,
>>>
>>> Richard.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe cgroups"
>>> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 



[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 205 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2014-06-10 14:50 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20140416154650.GA3034@alpha.arachsys.com>
     [not found] ` <20140418155939.GE4523@dhcp22.suse.cz>
     [not found]   ` <5351679F.5040908@parallels.com>
     [not found]     ` <5351679F.5040908-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2014-04-20 14:28       ` Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit] Richard Davies
     [not found]     ` <20140420142830.GC22077@alpha.arachsys.com>
     [not found]       ` <20140420142830.GC22077-2oeHp4OYwSjPZs67QiJbJtBPR1lH4CV8@public.gmane.org>
2014-04-20 18:35         ` Tim Hockin
2014-04-22 18:39         ` Dwight Engen
     [not found]       ` <20140422143943.20609800@oracle.com>
     [not found]         ` <20140422143943.20609800-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-04-22 20:05           ` Richard Davies
     [not found]         ` <20140422200531.GA19334@alpha.arachsys.com>
     [not found]           ` <20140422200531.GA19334-2oeHp4OYwSjPZs67QiJbJtBPR1lH4CV8@public.gmane.org>
2014-04-22 20:13             ` Tim Hockin
2014-04-23  6:07             ` Marian Marinov
2014-06-10 12:18             ` Alin Dobre
     [not found]           ` <535758A0.5000500@yuhu.biz>
     [not found]             ` <535758A0.5000500-NV7Lj0SOnH0@public.gmane.org>
2014-04-23 12:49               ` Dwight Engen
     [not found]             ` <20140423084942.560ae837@oracle.com>
     [not found]               ` <20140423084942.560ae837-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-04-28 18:00                 ` Serge Hallyn
2014-04-29  7:25                   ` Michal Hocko
     [not found]                     ` <20140429072515.GB15058-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2014-04-29 13:03                       ` Serge Hallyn
     [not found]                     ` <20140429130353.GA27354@ubuntumail>
2014-04-29 13:57                       ` Marian Marinov
2014-04-29 14:04                       ` Tim Hockin
2014-04-29 15:43                       ` Michal Hocko
     [not found]                       ` <20140429154345.GH15058@dhcp22.suse.cz>
     [not found]                         ` <20140429154345.GH15058-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2014-04-29 16:06                           ` Tim Hockin
     [not found]                         ` <CAO_RewYZDGLBAKit4CudTbqVk+zfDRX8kP0W6Zz90xJh7abM9Q@mail.gmail.com>
     [not found]                           ` <CAO_RewYZDGLBAKit4CudTbqVk+zfDRX8kP0W6Zz90xJh7abM9Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-04-29 16:51                             ` Frederic Weisbecker
     [not found]                           ` <20140429165114.GE6129@localhost.localdomain>
     [not found]                             ` <20140429165114.GE6129-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2014-04-29 16:59                               ` Tim Hockin
     [not found]                             ` <CAO_Rewa20dneL8e3T4UPnu2Dkv28KTgFJR9_YSmRBKp-_yqewg@mail.gmail.com>
     [not found]                               ` <CAO_Rewa20dneL8e3T4UPnu2Dkv28KTgFJR9_YSmRBKp-_yqewg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-04-29 17:06                                 ` Michal Hocko
2014-04-29 21:44                                 ` Frederic Weisbecker
     [not found]                               ` <20140429170639.GA25609@dhcp22.suse.cz>
     [not found]                                 ` <20140429170639.GA25609-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2014-04-29 17:30                                   ` Dwight Engen
     [not found]                                 ` <20140429133039.162d9dd7@oracle.com>
     [not found]                                   ` <20140429133039.162d9dd7-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-04-29 18:09                                     ` Richard Davies
     [not found]                                   ` <20140429180927.GB29606@alpha.arachsys.com>
     [not found]                                     ` <20140429180927.GB29606-2oeHp4OYwSjPZs67QiJbJtBPR1lH4CV8@public.gmane.org>
2014-04-29 18:27                                       ` Michal Hocko
     [not found]                                     ` <20140429182742.GB25609@dhcp22.suse.cz>
     [not found]                                       ` <20140429182742.GB25609-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2014-04-29 18:39                                         ` Richard Davies
     [not found]                                           ` <20140429183928.GF29606-2oeHp4OYwSjPZs67QiJbJtBPR1lH4CV8@public.gmane.org>
2014-04-29 19:03                                             ` Michal Hocko
2014-04-29 21:36                                         ` Marian Marinov
     [not found]                                           ` <53601B68.60906-NV7Lj0SOnH0@public.gmane.org>
2014-04-30 13:31                                             ` Michal Hocko
     [not found]                               ` <20140429214454.GF6129@localhost.localdomain>
     [not found]                                 ` <20140429214454.GF6129-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2014-04-30 13:12                                   ` Daniel J Walsh
     [not found]                                 ` <5360F6B4.9010308@redhat.com>
     [not found]                                   ` <5360F6B4.9010308-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-04-30 13:28                                     ` Frederic Weisbecker
2014-05-06 11:40                 ` Marian Marinov
2014-06-10 14:50                 ` Marian Marinov
     [not found]               ` <5368CA47.7030007@yuhu.biz>
     [not found]                 ` <5368CA47.7030007-NV7Lj0SOnH0@public.gmane.org>
2014-05-07 17:15                   ` Dwight Engen
     [not found]                 ` <20140507131514.43716518@oracle.com>
     [not found]                   ` <20140507131514.43716518-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-05-07 22:39                     ` Marian Marinov
     [not found]                       ` <536AB626.9070005-108MBtLGafw@public.gmane.org>
2014-05-08 15:25                         ` Richard Davies

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox