From: Yutian Yang
To: mhocko@kernel.org
Cc:
 hannes@cmpxchg.org, vdavydov.dev@gmail.com, shenwenbo@zju.edu.cn,
 cgroups@vger.kernel.org, linux-mm@kvack.org, ytyang@zju.edu.cn,
 mhocko@suse.com, Yutian Yang
Subject: [PATCH] mm: fix unaccounted time namespace objects
Date: Thu, 20 May 2021 17:08:58 +0900
Message-Id: <20210520080858.25450-1-nglaive@gmail.com>

This patch adds memcg accounting for time namespace objects, as we have
confirmed that unaccounted namespace objects can break the memcg limit.

For common concerns on this patch, we have the following response:

Regarding the practicality of our concerns, we have confirmed that
repeatedly creating new namespaces can break the memcg limit. Although
the number of namespaces can be limited by a per-user quota (e.g.,
max_time_namespaces), depending on a per-user quota to limit memory usage
is unsafe and impractical, as users may have their own considerations
when setting these limits. In fact, a limit on memory usage is more
fundamental than limits on various kernel objects. I believe this is also
the reason why the fd tables and pipe buffers are accounted by memcg even
though they are also subject to per-user quotas. The same reasoning
applies to the limits imposed by the pid cgroup controller.
Moreover, both net and uts namespaces are properly accounted while the
others are not, which is inconsistent.

For other unaccounted allocations (proc_alloc_inum, vvar_page and likely
others), we have not reached them yet, as our detection tool reported many
results that require considerable manual effort to go through. To me, it
seems that vvar_page also needs patching.

Lastly, our work is based on a detection tool, and we only report
missing-charging sites that are manually confirmed to be triggerable from
syscalls. Results that are obviously unexploitable, such as the uncharged
ldt_struct, which is allocated per process, are also filtered out. We
would like to keep contributing to memcg and plan to submit more patches
in the future.

I reported this patch earlier but did not add the public mailing list at
that time. Consequently, I am switching to a new thread and copying our
previous discussion below:

> -----Original Messages-----
> From: "Michal Hocko"
> Sent Time: 2021-04-16 14:29:52 (Friday)
> To: "Yutian Yang"
> Cc: tglx@linutronix.de, "shenwenbo@zju.edu.cn", "vdavydov.dev@gmail.com"
> Subject: Re: User-controllable memcg-unaccounted objects of time namespace
>
> Thank you for this and other reports which are trying to track memcg
> unaccounted objects. I have few remarks/questions.
>
>
> On Thu 15-04-21 21:29:57, Yutian Yang wrote:
> > Hi, our team has found bugs in time namespace module on Linux kernel v5.10.19, which leads to user-controllable memcg-unaccounted objects.
> > They are caused by the code snippets listed below:
> >
> > /*--------------- kernel/time/namespace.c --------------------*/
> > ......
> > 91      ns = kmalloc(sizeof(*ns), GFP_KERNEL);
> > 92      if (!ns)
> > 93              goto fail_dec;
> > ......
> > /*----------------------------- end -------------------------------*/
> >
> >
> > The code at line 91 could be triggered by syscall clone if
> > CLONE_NEWTIME flag is set in the parameter.
> > A user could repeatedly
> > make the clone syscall and trigger the bugs to occupy more and
> > more unaccounted memory. In fact, time namespaces objects could be
> > allocated by users and are also controllable by users. As a result,
> > they need to be accounted and we suggest the following patch:
>
> Is this a practical concern? I am not really deeply familiar with
> namespaces but isn't there any cap on how many of them can be created by
> user? If not, isn't that contained by the pid cgroup controller? If even
> that is not the case, care to explain why?
>
> You are referring to struct time_namespace above (that is 88B) but I can
> see there are other unaccounted allocations (proc_alloc_inum, vvar_page
> and likely others) so why the above is more important than those?
>
> Btw. a similar feedback applies to other reports similar to this one. I
> assume you have some sort of tool to explore those potential run aways
> and that is really great but it would be really helpful and highly
> appreciated to analyze those reports and try to provide some sort of
> risk assessment.
>
> Thanks!
> --
> Michal Hocko
> SUSE Labs

Thanks!

Yutian Yang,
Zhejiang University

Signed-off-by: Yutian Yang
---
 kernel/time/namespace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/namespace.c b/kernel/time/namespace.c
index afc65e6be..00c20f7fd 100644
--- a/kernel/time/namespace.c
+++ b/kernel/time/namespace.c
@@ -88,13 +88,13 @@ static struct time_namespace *clone_time_ns(struct user_namespace *user_ns,
 		goto fail;
 
 	err = -ENOMEM;
-	ns = kmalloc(sizeof(*ns), GFP_KERNEL);
+	ns = kmalloc(sizeof(*ns), GFP_KERNEL_ACCOUNT);
 	if (!ns)
 		goto fail_dec;
 
 	kref_init(&ns->kref);
 
-	ns->vvar_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	ns->vvar_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 	if (!ns->vvar_page)
 		goto fail_free;
 
-- 
2.25.1