Date: Wed, 29 Aug 2018 14:24:25 -0700
From: Roman Gushchin
Subject: Re: [PATCH v2 1/3] mm: rework memcg kernel stack accounting
Message-ID: <20180829212422.GA13097@castle>
References: <20180821213559.14694-1-guro@fb.com>
To: Shakeel Butt
Cc: Linux MM, LKML, kernel-team@fb.com, Johannes Weiner, Michal Hocko, luto@kernel.org, Konstantin Khlebnikov, Tejun Heo

On Tue, Aug 21, 2018 at 03:10:52PM -0700, Shakeel Butt wrote:
> On Tue, Aug 21, 2018 at 2:36 PM Roman Gushchin wrote:
> >
> > If CONFIG_VMAP_STACK is set, kernel stacks are allocated
> > using __vmalloc_node_range() with __GFP_ACCOUNT. So kernel
> > stack pages are charged against corresponding memory cgroups
> > on allocation and uncharged on releasing them.
> >
> > The problem is that we do cache kernel stacks in small
> > per-cpu caches and do reuse them for new tasks, which can
> > belong to different memory cgroups.
> >
> > Each stack page still holds a reference to the original cgroup,
> > so the cgroup can't be released until the vmap area is released.
> >
> > To make this happen we need more than two subsequent exits
> > without forks in between on the current cpu, which makes it
> > very unlikely to happen. As a result, I saw a significant number
> > of dying cgroups (in theory, up to 2 * number_of_cpu +
> > number_of_tasks), which can't be released even by significant
> > memory pressure.
> >
> > As a cgroup structure can take a significant amount of memory
> > (first of all, per-cpu data like memcg statistics), it leads
> > to a noticeable waste of memory.
> >
> > Signed-off-by: Roman Gushchin
>
> Reviewed-by: Shakeel Butt
>
> BTW this makes a very good use-case for optimizing kmem uncharging
> similar to what you did for skmem uncharging.

The only thing I'm slightly worried about here is that it can make
reclaiming of memory cgroups harder. Probably it's still ok, but let
me first finish the work I'm doing on optimizing the whole memcg
reclaim process, and then return to this case.

Thanks!
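
P.S. For readers less familiar with the CONFIG_VMAP_STACK path, the
pinning effect described in the commit message can be shown with a
deliberately simplified user-space sketch. This is not the kernel code:
the names fake_cgroup, cached_stack, stack_alloc and stack_free are
invented for the example, and the only thing taken from the real code
is the per-cpu cache size of 2 (NR_CACHED_STACKS), which is where the
"2 * number_of_cpu" bound above comes from.

#include <stdio.h>
#include <stdlib.h>

struct fake_cgroup {
	const char *name;
	int refcount;			/* pinned while any charged stack exists */
};

struct cached_stack {
	struct fake_cgroup *charged_to;	/* cgroup charged at allocation time */
};

#define NR_CACHED_STACKS 2		/* mirrors the real per-cpu cache size */
static struct cached_stack *cache[NR_CACHED_STACKS];

/* "Allocate" a stack and charge it to @cg, as __GFP_ACCOUNT would. */
static struct cached_stack *stack_alloc(struct fake_cgroup *cg)
{
	struct cached_stack *s = malloc(sizeof(*s));

	s->charged_to = cg;
	cg->refcount++;
	return s;
}

/* On task exit the stack goes into the cache; the charge stays. */
static void stack_free(struct cached_stack *s)
{
	int i;

	for (i = 0; i < NR_CACHED_STACKS; i++) {
		if (!cache[i]) {
			cache[i] = s;	/* original cgroup stays pinned */
			return;
		}
	}
	s->charged_to->refcount--;	/* only now is the cgroup released */
	free(s);
}

int main(void)
{
	struct fake_cgroup cg = { "dying-cgroup", 0 };

	/*
	 * Two exits without forks in between on the same "cpu":
	 * both stacks end up cached, still charged to cg.
	 */
	stack_free(stack_alloc(&cg));
	stack_free(stack_alloc(&cg));

	/* Prints 2: the (possibly already deleted) cgroup can't be freed. */
	printf("%s refcount after both tasks exited: %d\n",
	       cg.name, cg.refcount);
	return 0;
}

The sketch only illustrates why cached, never-reused stacks keep the
original cgroup alive; it says nothing about how the patch itself
reworks the charging.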