From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tejun Heo
Subject: Re: [patch 7/8] mm, memcg: allow processes handling oom notifications to access reserves
Date: Wed, 11 Dec 2013 07:42:40 -0500
Message-ID: <20131211124240.GA24557@htj.dyndns.org>
References: <20131204054533.GZ3556@cmpxchg.org>
 <20131205025026.GA26777@htj.dyndns.org>
 <20131206190105.GE13373@htj.dyndns.org>
 <20131210215037.GB9143@htj.dyndns.org>
Mime-Version: 1.0
Return-path:
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20120113;
 h=sender:date:from:to:cc:subject:message-id:references:mime-version
 :content-type:content-disposition:in-reply-to:user-agent;
 bh=hcWtwoa1Uw6PnftUXUxc7tT1TOAgpcN/tDuFfNWxQNE=;
 b=mjf5tImUPilvUgC6n7IXxZTha0GyrGYMhMAf09EpMXClItDvsgQjXqNXaJoNhp4Mjg
 pzYWIkzon5a2mNAegtzDk8EUwoUqhJnVmKSvQwPiMMBuU9r4l9ja5M1OQ5W3Nrhj2ovL
 2u9AwywH1OMETJhav6k1Mi5ppdqzOU6uLhJc4IRapJKig53OmwB7gK2EIa31GNsNqYZD
 nqlbAHRmItPNiNAy9nMSH4AoarVpv5cu5LRy8wkLYhBPspMStC9teAQab5xuODVvdp9U
 ckziwO8zAdY/odGieQQmQ3Zdo2hbLJPjFtU1YVOI30GkE47W7XfYStIaeSHpe86gOwH6
 wYbg==
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: David Rientjes
Cc: Johannes Weiner, Andrew Morton, Michal Hocko, KAMEZAWA Hiroyuki,
 Mel Gorman, Rik van Riel, Pekka Enberg, Christoph Lameter, Li Zefan,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 cgroups@vger.kernel.org

Yo,

On Tue, Dec 10, 2013 at 03:55:48PM -0800, David Rientjes wrote:
> > Well, the gotcha there is that you won't be able to do that with
> > system level OOM handler either unless you create a separately
> > reserved memory, which, again, can be achieved using hierarchical
> > memcg setup already.  Am I missing something here?
>
> System oom conditions would only arise when the usage of memcgs A + B
> above cause the page allocator to not be able to allocate memory without
> oom killing something even though the limits of both A and B may not have
> been reached yet.  No userspace oom handler can allocate memory with
> access to memory reserves in the page allocator in such a context; it's
> vital that if we are to handle system oom conditions in userspace that we
> given them access to memory that other processes can't allocate.  You
> could attach a userspace system oom handler to any memcg in this scenario
> with memory.oom_reserve_in_bytes and since it has PF_OOM_HANDLER it would
> be able to allocate in reserves in the page allocator and overcharge in
> its memcg to handle it.  This isn't possible only with a hierarchical
> memcg setup unless you ensure the sum of the limits of the top level
> memcgs do not equal or exceed the sum of the min watermarks of all memory
> zones, and we exceed that.

Yes, exactly.  If system memory is 128M, create top level memcgs w/
120M and 8M each (well, with some slack of course) and then overcommit
the descendants of the 120M one while putting OOM handlers and friends
under the 8M one without overcommitting.

...
> The stronger rationale is that you can't handle system oom in userspace
> without this functionality and we need to do so.

You're giving yourself an unreasonable precondition - overcommitting
at root level and handling system OOM from userland - and then trying
to contort everything to fit that.  How can "overcommitting at root
level" possibly be a goal in and of itself?  Please take a step back
and look at and explain the *problem* you're trying to solve.  You
haven't explained why that *need*s to be the case at all.

I wrote this at the start of the thread but you're still doing the
same thing.  You're trying to create a hidden memcg level inside a
memcg.
At the beginning of this thread, you were trying to do that for !root
memcgs and now you're arguing that you *need* that for root memcg.
Because there's no other limit we can make use of, you're suggesting
the use of kernel reserve memory for that purpose.  It seems like an
absurd thing to do to me.

It could be that you might not be able to achieve exactly the same
thing with a hierarchical memcg setup, but the right thing to do would
be improving memcg in general so that it can, instead of adding yet
another layer of half-baked complexity, right?

Even if there are some inherent advantages of system userland OOM
handling with a separate physical memory reserve, which AFAICS you
haven't succeeded at showing yet, this is a very invasive change and,
as you said before, something with an *extremely* narrow use case.
Wouldn't it be a better idea to improve the existing mechanisms - be
that memcg in general or kernel OOM handling - to fit the niche use
case better?

I mean, just think about all the corner cases.  How are you gonna
handle priority inversion through locked pages or allocations given
out to other tasks through slab?  You're suggesting opening a giant
can of worms for an extremely narrow benefit which doesn't even seem
to require opening said can.

Thanks.

-- 
tejun