From: Tejun Heo
Date: Tue, 12 Jul 2022 06:24:20 -1000
To: Michal Hocko
Cc: Yafang Shao, Alexei Starovoitov, Shakeel Butt, Matthew Wilcox,
    Christoph Hellwig, "David S. Miller", Daniel Borkmann, Andrii Nakryiko,
    Martin KaFai Lau, bpf, Kernel Team, linux-mm, Christoph Lameter,
    Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka
Subject: Re: [PATCH bpf-next 0/5] bpf: BPF specific memory allocator.
References: <20220708174858.6gl2ag3asmoimpoe@macbook-pro-3.dhcp.thefacebook.com>
    <20220708215536.pqclxdqvtrfll2y4@google.com>
    <20220710073213.bkkdweiqrlnr35sv@google.com>
    <20220712043914.pxmbm7vockuvpmmh@macbook-pro-3.dhcp.thefacebook.com>
X-Mailing-List: bpf@vger.kernel.org

Hello, Michal.

On Tue, Jul 12, 2022 at 11:52:11AM +0200, Michal Hocko wrote:
> > Agreed. That's why I don't like reparenting.
> > Reparenting just reparents the charged pages and then redirects the new
> > charge, but can't reparent the 'limit' of the original memcg.
> > So it is a risk if the original memcg is still being charged. We have
> > to forbid the destruction of the original memcg.
>
> yes, I was toying with an idea like that. I guess we really want a
> measure to keep cgroups around if they are bound to a resource which is
> sticky itself. I am not sure how many other resources like BPF (aka
> module like) we already do charge for memcg but considering the
> potential memory consumption just reparenting will not help in the
> general case I am afraid.

I think the solution here is an extra cgroup layer to represent
persistent resource tracking. In systemd-speak, a service should have a
cgroup representing a persistent service type and a cgroup representing
the current running instance. This way, the user (or system agent) can
clearly distinguish all resources that have ever been attributed to the
service from the resources that are accounted to the current instance,
while also getting visibility into residual resources for services that
are no longer running.

This gives userspace control over what to track for how long and also
fits what the kernel can do in terms of resource tracking. If we try to
do something smart from the kernel side, there are cases which are
inherently unsolvable. e.g. if a service instance creates tmpfs / shmem /
whatever and leaves it pinned one way or another and then exits, and no
one actively accesses it afterwards, there is no userland-visible entity
we can reasonably attribute that memory to other than the parent cgroup.

Thanks.

--
tejun
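
The two-layer arrangement described above can be sketched as a small toy
model. This is only an illustration under stated assumptions, not kernel
code and not a cgroup interface: the Cgroup class, the names "myservice",
"instance-1" and "instance-2", and the byte counts are all hypothetical.
The rule that leftover charges fall to the parent when a child is removed
follows the description in this mail rather than the actual memcg
implementation.

```python
class Cgroup:
    """Toy stand-in for a cgroup node (hypothetical, not a kernel API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        self.local = 0  # bytes charged directly to this node
        if parent is not None:
            parent.children[name] = self

    def charge(self, nbytes):
        self.local += nbytes

    def uncharge(self, nbytes):
        self.local -= nbytes

    def usage(self):
        """Hierarchical usage: own charges plus those of all descendants."""
        return self.local + sum(c.usage() for c in self.children.values())

    def remove(self):
        """Remove this node; whatever is still charged falls to the parent."""
        if self.parent is None:
            raise ValueError("cannot remove the root")
        self.parent.local += self.usage()
        del self.parent.children[self.name]


root = Cgroup("root")
service = Cgroup("myservice", root)    # persistent service-type layer
run1 = Cgroup("instance-1", service)   # current running instance

run1.charge(100)    # ordinary memory, released when the instance exits
run1.charge(40)     # pinned tmpfs/shmem that outlives the instance
run1.uncharge(100)  # instance exits and frees its ordinary memory
run1.remove()       # the residual 40 is now held by the service layer

run2 = Cgroup("instance-2", service)   # next instance of the same service
run2.charge(70)

residual = service.local   # left behind by instances that are gone
current = run2.usage()     # accounted to the running instance
total = service.usage()    # everything ever attributed and still held
```

In this model the service layer keeps the residual charge visible after
the instance is gone, while the new instance starts from zero, which is
the distinction the extra layer is meant to give userspace.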