From mboxrd@z Thu Jan 1 00:00:00 1970
From: Frederic Weisbecker
Subject: Re: Protection against container fork bombs [WAS: Re: memcg with kmem limit doesn't recover after disk i/o causes limit to be hit]
Date: Wed, 30 Apr 2014 15:28:49 +0200
Message-ID: <20140430132846.GA17745@localhost.localdomain>
References: <20140423084942.560ae837@oracle.com>
 <20140428180025.GC25689@ubuntumail>
 <20140429072515.GB15058@dhcp22.suse.cz>
 <20140429130353.GA27354@ubuntumail>
 <20140429154345.GH15058@dhcp22.suse.cz>
 <20140429165114.GE6129@localhost.localdomain>
 <20140429214454.GF6129@localhost.localdomain>
 <5360F6B4.9010308@redhat.com>
Mime-Version: 1.0
Content-Disposition: inline
In-Reply-To: <5360F6B4.9010308-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Daniel J Walsh
Cc: Tim Hockin, Michal Hocko, Serge Hallyn, Richard Davies,
 Vladimir Davydov, Marian Marinov, Max Kellermann, Tim Hockin,
 containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
 cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Glauber Costa,
 "linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org", William Dauchy,
 Johannes Weiner, Tejun Heo, David Rientjes

On Wed, Apr 30, 2014 at 09:12:20AM -0400, Daniel J Walsh wrote:
>
> On 04/29/2014 05:44 PM, Frederic Weisbecker wrote:
> > On Tue, Apr 29, 2014 at 09:59:30AM -0700, Tim Hockin wrote:
> >> Here's the reason it doesn't work for us: It doesn't work. It was
> >> something like 2 YEARS since we first wanted this, and it STILL does
> >> not work.
> > When I was working on the task counter cgroup subsystem 2 years
> > ago, the patches were actually pushed back by google people, in favour
> > of task stack kmem cgroup subsystem.
> >
> > The reason was that expressing the forkbomb issue in terms of
> > number of tasks as a resource is awkward and that the real resource
> > in the game comes from kernel memory exhaustion due to task stack being
> > allocated over and over, swap ping-pong and stuffs...
> >
> > And that was a pretty good argument. I still agree with that. Especially
> > since that could solve others people issues at the same time. kmem
> > cgroup has a quite large domain of application.
> >
> >> You're postponing a pretty simple request indefinitely in
> >> favor of a much more complex feature, which still doesn't really give
> >> me what I want. What I want is an API that works like rlimit but
> >> per-cgroup, rather than per-UID.
> > The request is simple but I don't think that adding the task counter
> > cgroup subsystem is simpler than extending the kmem code to apply limits
> > to only task stack. Especially in terms of maintainance.
> >
> > Also you guys have very good mm kernel developers who are already
> > familiar with this.
> I would look at this from a Usability point of view. It is a lot easier
> to understand number of processes then the mount of KMEM those processes
> will need. Setting something like
> ProcessLimit=1000 in a systemd unit file is easy to explain.

Yeah that's a fair point.

> Now if systemd has the ability to translate this into something that makes
> sense in terms of kmem cgroup, then my argument goes away.

Yeah if we keep the kmem direction, this can be a place where we do the mapping.
Now I just hope the amount of stack memory allocated doesn't differ too much per arch.
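
To make that mapping concrete, here is a rough sketch of what a userspace manager could do when it sees something like ProcessLimit=1000. Everything in it is an assumption made for illustration: the function name, the table, and the two stack sizes are hypothetical and not taken from any kernel or systemd source. It also only counts task stacks, so the result is a lower bound; each task needs other kernel allocations too.

```python
# Hypothetical per-arch kernel stack sizes in bytes. These are illustrative
# placeholders, not values read from any particular kernel version.
ASSUMED_STACK_BYTES = {
    "arch-8k": 8 * 1024,
    "arch-16k": 16 * 1024,
}


def stack_kmem_limit(process_limit, arch):
    """Return the stack-only kmem budget (in bytes) for process_limit tasks.

    This is a lower bound: it ignores every per-task kernel allocation
    other than the stack itself.
    """
    if process_limit <= 0:
        raise ValueError("process_limit must be positive")
    return process_limit * ASSUMED_STACK_BYTES[arch]


# The same ProcessLimit=1000 maps to a different byte limit on each arch.
for arch in sorted(ASSUMED_STACK_BYTES):
    print(arch, stack_kmem_limit(1000, arch))
```

The point of the sketch is only that the same process limit translates into a different byte value per architecture, so the translation has to know the stack size of the running kernel rather than hardcode a number.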