From: Vasily Averin
Date: Thu, 23 Jun 2022 17:50:41 +0300
Subject: [PATCH mm v5 0/9] memcg: accounting for objects allocated by mkdir, cgroup
To: Andrew Morton
Cc: kernel@openvz.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Shakeel Butt, Roman Gushchin, Michal Koutný, Vlastimil Babka,
    Michal Hocko, Muchun Song, cgroups@vger.kernel.org
Message-ID: <0fe836b4-5c0f-0e32-d511-db816d359748@openvz.org>
In-Reply-To: <4e685057-b07d-745d-fdaa-1a6a5a681060@openvz.org>
References: <4e685057-b07d-745d-fdaa-1a6a5a681060@openvz.org>

In some cases, creating a cgroup allocates a noticeable amount
of memory. This operation can be executed from inside a memory-limited
container, but currently this memory is not accounted to memcg and can
be misused. This allows a container to exceed its assigned memory limit
and avoid the memcg OOM. Moreover, in case of a global memory shortage
on the host, the OOM killer may not find the real memory eater and may
start killing random processes on the host.

This is especially important for OpenVZ and LXC used on hosting,
where containers are used by untrusted end users.

Below are tracing results of mkdir /sys/fs/cgroup/vvs.test on
a 4-CPU VM with Fedora and a self-compiled upstream kernel. The
calculations are not precise: they depend on kernel config options,
number of CPUs and enabled controllers, and they ignore possible page
allocations etc. However, this is enough to clarify the general situation.

All allocations are split into:
- common part, always called for each cgroup type
- per-cgroup allocations

In each group we consider 2 corner cases:
- usual allocations, important for 1-2 CPU nodes/VMs
- percpu allocations, important for 'big irons'

common part:    ~11Kb   +  318 bytes percpu
memcg:          ~17Kb   + 4692 bytes percpu
cpu:            ~2.5Kb  + 1036 bytes percpu
cpuset:         ~3Kb    +   12 bytes percpu
blkcg:          ~3Kb    +   12 bytes percpu
pid:            ~1.5Kb  +   12 bytes percpu
perf:           ~320b   +   60 bytes percpu
-------------------------------------------
total:          ~38Kb   + 6142 bytes percpu
currently accounted:      4668 bytes percpu

- It's important to account usual allocations called
in the common part, because almost all cgroup-specific allocations
are small. One exception here is the memory cgroup, which allocates
a few huge objects that should be accounted.
- Percpu allocations called in the common part, in memcg and cpu cgroups
should be accounted; the rest are small and can be ignored.
- KERNFS objects are allocated both in the common part and in most
cgroups.

Details can be found here:
https://lore.kernel.org/all/d28233ee-bccb-7bc3-c2ec-461fd7f95e6a@openvz.org/

I checked the other cgroup types and found that they can all be ignored.
Additionally, I found an allocation of struct rt_rq called in the cpu
cgroup if CONFIG_RT_GROUP_SCHED is enabled; it allocates a huge
(~1700 bytes) percpu structure and should be accounted too.

v5:
 1) re-based to linux-mm (mm-everything-2022-06-22-20-36)

v4:
 1) re-based to linux-next (next-20220610)
    now psi_group is not a part of struct cgroup and is allocated on demand
 2) added received approval from Muchun Song
 3) improved cover letter description according to akpm@ request

v3:
 1) re-based to current upstream (v5.18-11267-gb00ed48bb0a7)
 2) fixed a few typos
 3) added received approvals

v2:
 1) re-split to simplify possible bisect, re-ordered
 2) added accounting for percpu psi_group_cpu and cgroup_rstat_cpu,
    allocated in the common part
 3) added accounting for percpu allocation of struct rt_rq
    (relevant if CONFIG_RT_GROUP_SCHED is enabled)
 4) improved patch descriptions

Vasily Averin (9):
  memcg: enable accounting for struct cgroup
  memcg: enable accounting for kernfs nodes
  memcg: enable accounting for kernfs iattrs
  memcg: enable accounting for struct simple_xattr
  memcg: enable accounting for percpu allocation of struct psi_group_cpu
  memcg: enable accounting for percpu allocation of struct cgroup_rstat_cpu
  memcg: enable accounting for large allocations in mem_cgroup_css_alloc
  memcg: enable accounting for allocations in alloc_fair_sched_group
  memcg: enable accounting for perpu allocation of struct rt_rq

 fs/kernfs/mount.c      | 6 ++++--
 fs/xattr.c             | 2 +-
 kernel/cgroup/cgroup.c | 2 +-
 kernel/cgroup/rstat.c  | 3 ++-
 kernel/sched/fair.c    | 4 ++--
 kernel/sched/psi.c     | 2 +-
 kernel/sched/rt.c      | 2 +-
 mm/memcontrol.c        | 4 ++--
 8 files changed, 14 insertions(+), 11 deletions(-)

-- 
2.36.1