From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB18EC433EF for ; Thu, 23 Jun 2022 15:03:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 15DF68E015C; Thu, 23 Jun 2022 11:03:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 10E628E0144; Thu, 23 Jun 2022 11:03:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EF0418E015C; Thu, 23 Jun 2022 11:03:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id DD0F28E0144 for ; Thu, 23 Jun 2022 11:03:39 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id A622420E91 for ; Thu, 23 Jun 2022 15:03:39 +0000 (UTC) X-FDA: 79609819758.07.7812BC0 Received: from mail-lj1-f177.google.com (mail-lj1-f177.google.com [209.85.208.177]) by imf23.hostedemail.com (Postfix) with ESMTP id 3846B1400B3 for ; Thu, 23 Jun 2022 15:03:36 +0000 (UTC) Received: by mail-lj1-f177.google.com with SMTP id c30so23632876ljr.9 for ; Thu, 23 Jun 2022 08:03:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=openvz-org.20210112.gappssmtp.com; s=20210112; h=message-id:date:mime-version:user-agent:subject:content-language :from:to:cc:references:in-reply-to:content-transfer-encoding; bh=4H3ceOwo0QWDBTksiBY1nTfnYze/Q04xjv0/hjoR1QE=; b=jAZaSHl/WL950DRTIHi6bVKfTCkAeNlBKZFbqzBO3gOEk7LKaX32kIlX07WpEZCiz0 8G+aI1JqTLmysh+J300moZbLIcNCv++WFB0ojr1jQsBUBMcSFVHWLZWJz7XghvqcPpSm iNov6Zi/0JW8qHxs8svvnsceLLskWXq9yZlKEMKeMWxdAblWG1UKyzunLZqOjNIZaUxi KLnxlf2Te8z+3XiuCJloWU7WTwQmHw8F0mIwM/4Qf9tghtbT9kcOWc2ILKATdGpxsGVN MbxUrDuXfxc63NXnh0g98qScyfobiftrceqVId/NAayYbGtGgVwH6cWIzYNd2RgAbg9g eGIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:from:to:cc:references:in-reply-to :content-transfer-encoding; bh=4H3ceOwo0QWDBTksiBY1nTfnYze/Q04xjv0/hjoR1QE=; b=3Gg3mlNyOBvSBYCoyHSXMJv+P7LRgFoKVtGYfXsf1DkuFcyj3Wa/gf1anowoziFNZI S5qaonR1i25po7O0JHZ6ZKWStRSA7AfI9GkQUIs9yYSFksH93VL4zBorV7svwacmZjhS KIQ92sXXisq4dUrboC3Rfd+RXRawQvQ3j4cR+UtVSV3jYfhhbxg6wT2tqF4MPOD4efvM 6D+BcAB7bdWaRqkpoqVrxyNhGydaSMiT6awtjm1NofAL2sJpF6WzwM/B5y2wUqlXHhvi fsKWXTQVwlw7UF6M4VsiIznTXXfhNv76sIRh23Vkf8mBe6ol3WEcWf+vgpICijnsJ/CL yZdw== X-Gm-Message-State: AJIora+Kf7G0Raeu3CRPHavckLuCp2hCoFFGTRi+CugB1TbMVjmA7GEW xT4vhCCPtHH7pPd0zjpht1P/XA== X-Google-Smtp-Source: AGRyM1tCSSWNE2nvbVxfjOHpauS9XcbDxDCn4l90o154YEvxcYZQPjkTEm9kW5yyxn8qV7cFd9CvsQ== X-Received: by 2002:a2e:1411:0:b0:25a:6b41:feec with SMTP id u17-20020a2e1411000000b0025a6b41feecmr4993477ljd.270.1655996612863; Thu, 23 Jun 2022 08:03:32 -0700 (PDT) Received: from [192.168.1.65] ([46.188.121.129]) by smtp.gmail.com with ESMTPSA id e6-20020a195006000000b0047f749df807sm1543373lfb.61.2022.06.23.08.03.32 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 23 Jun 2022 08:03:32 -0700 (PDT) Message-ID: Date: Thu, 23 Jun 2022 18:03:31 +0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1 Subject: Re: [PATCH mm v5 0/9] memcg: accounting for objects allocated by mkdir, cgroup Content-Language: en-US From: Vasily Averin To: Michal Hocko Cc: kernel@openvz.org, Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Shakeel Butt , Roman Gushchin , =?UTF-8?Q?Michal_Koutn=c3=bd?= , Vlastimil Babka , Muchun Song , cgroups@vger.kernel.org References: <4e685057-b07d-745d-fdaa-1a6a5a681060@openvz.org> <0fe836b4-5c0f-0e32-d511-db816d359748@openvz.org> In-Reply-To: <0fe836b4-5c0f-0e32-d511-db816d359748@openvz.org> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1655996618; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=4H3ceOwo0QWDBTksiBY1nTfnYze/Q04xjv0/hjoR1QE=; b=YVXnwMnRnp/byQBhahR2j7aVSIVa6V/E0LRtPw+3KM6+dCxNbjnWkTU7yPdubZTtLccPMd mx1+94n7eOxsd2wl9cXIb7kutRK2A7NdbmgoDYAeWyrxRD3KMgMeT4fGUH8jCe+rY4R5k2 +QizqbG+WXf+Cn8iH8AFDJ3nERY9gIE= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=openvz-org.20210112.gappssmtp.com header.s=20210112 header.b="jAZaSHl/"; spf=pass (imf23.hostedemail.com: domain of vvs@openvz.org designates 209.85.208.177 as permitted sender) smtp.mailfrom=vvs@openvz.org; dmarc=pass (policy=none) header.from=openvz.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1655996618; a=rsa-sha256; cv=none; b=H6HAafuK0TnckVE+xOqZalv8ESia+YPOp55+2X2/OU2659yLoOAniKSSGtdSVozOtqBBFs Di78Zb8VgZxw74l1DjzIDzk/I7EVZxn/RxXmtPybCGtrK5CKMZGZBCBnNrPfJ7CwxmPeGX 3DKl1w4A9FrYOWeQ+xUpZiydyOOFTgE= X-Rspamd-Queue-Id: 3846B1400B3 Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=openvz-org.20210112.gappssmtp.com header.s=20210112 header.b="jAZaSHl/"; spf=pass (imf23.hostedemail.com: domain of vvs@openvz.org designates 209.85.208.177 as permitted sender) smtp.mailfrom=vvs@openvz.org; dmarc=pass (policy=none) header.from=openvz.org X-Rspam-User: X-Stat-Signature: fr445mnpd4zbg6reodwzrk65apwpeq7t X-Rspamd-Server: rspam04 X-HE-Tag: 1655996616-271018 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Dear Michal, do you still have any concerns about this patch set? Thank you, Vasily Averin On 6/23/22 17:50, Vasily Averin wrote: > In some cases, creating a cgroup allocates a noticeable amount of memory. > This operation can be executed from inside memory-limited container, > but currently this memory is not accounted to memcg and can be misused. > This allow container to exceed the assigned memory limit and avoid > memcg OOM. Moreover, in case of global memory shortage on the host, > the OOM-killer may not find a real memory eater and start killing > random processes on the host. > > This is especially important for OpenVZ and LXC used on hosting, > where containers are used by untrusted end users. > > Below is tracing results of mkdir /sys/fs/cgroup/vvs.test on > 4cpu VM with Fedora and self-complied upstream kernel. The calculations > are not precise, it depends on kernel config options, number of cpus, > enabled controllers, ignores possible page allocations etc. > However this is enough to clarify the general situation. > All allocations are splitted into: > - common part, always called for each cgroup type > - per-cgroup allocations > > In each group we consider 2 corner cases: > - usual allocations, important for 1-2 CPU nodes/Vms > - percpu allocations, important for 'big irons' > > common part: ~11Kb + 318 bytes percpu > memcg: ~17Kb + 4692 bytes percpu > cpu: ~2.5Kb + 1036 bytes percpu > cpuset: ~3Kb + 12 bytes percpu > blkcg: ~3Kb + 12 bytes percpu > pid: ~1.5Kb + 12 bytes percpu > perf: ~320b + 60 bytes percpu > ------------------------------------------- > total: ~38Kb + 6142 bytes percpu > currently accounted: 4668 bytes percpu > > - it's important to account usual allocations called > in common part, because almost all of cgroup-specific allocations > are small. One exception here is memory cgroup, it allocates a few > huge objects that should be accounted. > - Percpu allocation called in common part, in memcg and cpu cgroups > should be accounted, rest ones are small an can be ignored. > - KERNFS objects are allocated both in common part and in most of > cgroups > > Details can be found here: > https://lore.kernel.org/all/d28233ee-bccb-7bc3-c2ec-461fd7f95e6a@openvz.org/ > > I checked other cgroups types was found that they all can be ignored. > Additionally I found allocation of struct rt_rq called in cpu cgroup > if CONFIG_RT_GROUP_SCHED was enabled, it allocates huge (~1700 bytes) > percpu structure and should be accounted too. > > v5: > 1) re-based to linux-mm (mm-everything-2022-06-22-20-36) > > v4: > 1) re-based to linux-next (next-20220610) > now psi_group is not a part of struct cgroup and is allocated on demand > 2) added received approval from Muchun Song > 3) improved cover letter description according to akpm@ request > > v3: > 1) re-based to current upstream (v5.18-11267-gb00ed48bb0a7) > 2) fixed few typos > 3) added received approvals > > v2: > 1) re-split to simplify possible bisect, re-ordered > 2) added accounting for percpu psi_group_cpu and cgroup_rstat_cpu, > allocated in common part > 3) added accounting for percpu allocation of struct rt_rq > (actual if CONFIG_RT_GROUP_SCHED is enabled) > 4) improved patches descriptions > > Vasily Averin (9): > memcg: enable accounting for struct cgroup > memcg: enable accounting for kernfs nodes > memcg: enable accounting for kernfs iattrs > memcg: enable accounting for struct simple_xattr > memcg: enable accounting for percpu allocation of struct psi_group_cpu > memcg: enable accounting for percpu allocation of struct > cgroup_rstat_cpu > memcg: enable accounting for large allocations in mem_cgroup_css_alloc > memcg: enable accounting for allocations in alloc_fair_sched_group > memcg: enable accounting for perpu allocation of struct rt_rq > > fs/kernfs/mount.c | 6 ++++-- > fs/xattr.c | 2 +- > kernel/cgroup/cgroup.c | 2 +- > kernel/cgroup/rstat.c | 3 ++- > kernel/sched/fair.c | 4 ++-- > kernel/sched/psi.c | 2 +- > kernel/sched/rt.c | 2 +- > mm/memcontrol.c | 4 ++-- > 8 files changed, 14 insertions(+), 11 deletions(-) >