From: Alexey Gladkov
To: LKML, io-uring@vger.kernel.org, Kernel Hardening, Linux Containers,
    linux-mm@kvack.org
Cc: Alexey Gladkov, Andrew Morton, Christian Brauner, Eric W. Biederman,
    Jann Horn, Jens Axboe, Kees Cook, Linus Torvalds, Oleg Nesterov
Subject: [PATCH v5 0/7] Count rlimits in each user namespace
Date: Mon, 1 Feb 2021 15:18:28 +0100

Preface
-------
These patches bind the rlimit counters to a user in a user namespace.

This patch set can be applied on top of:

git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.11-rc2

Problem
-------
The RLIMIT_NPROC, RLIMIT_MEMLOCK, RLIMIT_SIGPENDING, RLIMIT_MSGQUEUE rlimits
implementation places the counters in user_struct [1]. These limits are
global between processes and persist for the lifetime of the process, even
if processes are in different user namespaces.

To illustrate the impact of rlimits, let's say there is a program that does
not fork.
Some service-A wants to run this program as user X in multiple
containers. Since the program never forks, the service wants to set
RLIMIT_NPROC=1.

service-A
 \- program (uid=1000, container1, rlimit_nproc=1)
 \- program (uid=1000, container2, rlimit_nproc=1)

The service-A sets RLIMIT_NPROC=1 and runs the program in container1. When
the service-A tries to run a program with RLIMIT_NPROC=1 in container2, it
fails since user X already has one running process.

The problem is not that the limit from container1 affects container2. The
problem is that the limit is verified against the global counter that
reflects the number of processes in all containers.

This problem can be worked around by using different users for each
container, but in this case we face a different problem of uid mapping when
transferring files from one container to another.

Eric W. Biederman mentioned this issue [2][3].

Introduced changes
------------------
To address the problem, we bind rlimit counters to the user namespace. Each
counter reflects the number of processes for a given uid in a given user
namespace. The result is a tree of rlimit counters with the biggest value at
the root (aka init_user_ns). The limit is considered exceeded if it is
exceeded anywhere up the tree.

[1] https://lore.kernel.org/containers/87imd2incs.fsf@x220.int.ebiederm.org/
[2] https://lists.linuxfoundation.org/pipermail/containers/2020-August/042096.html
[3] https://lists.linuxfoundation.org/pipermail/containers/2020-October/042524.html

Changelog
---------
v5:
* Split the first commit into two commits: change ucounts.count type to
  atomic_long_t and add ucounts to cred. These commits were merged by
  mistake during the rebase.
* Renamed __get_ucounts() to alloc_ucounts().
* Moved the cred.ucounts update out of commit_creds(), because it did not
  allow errors to be handled there.
* Added error handling of set_cred_ucounts().

v4:
* Reverted the type change of ucounts.count to refcount_t.
* Fixed a typo in kernel/cred.c.

v3:
* Added get_ucounts() function to increase the reference count. The
  existing get_counts() function renamed to __get_ucounts().
* The type of ucounts.count changed from atomic_t to refcount_t.
* Dropped 'const' from set_cred_ucounts() arguments.
* Fixed a bug with freeing the cred structure after calling
  cred_alloc_blank().
* Commit messages have been updated.
* Added selftest.

v2:
* RLIMIT_MEMLOCK, RLIMIT_SIGPENDING and RLIMIT_MSGQUEUE are migrated to
  ucounts.
* Added ucounts for pair uid and user namespace into cred.
* Added the ability to increase ucount by more than 1.

v1:
* After discussion with Eric W. Biederman, I increased the size of ucounts
  to atomic_long_t.
* Added ucount_max to avoid the fork bomb.

--

Alexey Gladkov (7):
  Increase size of ucounts to atomic_long_t
  Add a reference to ucounts for each cred
  Reimplement RLIMIT_NPROC on top of ucounts
  Reimplement RLIMIT_MSGQUEUE on top of ucounts
  Reimplement RLIMIT_SIGPENDING on top of ucounts
  Reimplement RLIMIT_MEMLOCK on top of ucounts
  kselftests: Add test to check for rlimit changes in different user
    namespaces

 fs/exec.c                                  |   6 +-
 fs/hugetlbfs/inode.c                       |  17 +-
 fs/io-wq.c                                 |  22 ++-
 fs/io-wq.h                                 |   2 +-
 fs/io_uring.c                              |   2 +-
 fs/proc/array.c                            |   2 +-
 include/linux/cred.h                       |   4 +
 include/linux/hugetlb.h                    |   3 +-
 include/linux/mm.h                         |   4 +-
 include/linux/sched/user.h                 |   7 -
 include/linux/shmem_fs.h                   |   2 +-
 include/linux/signal_types.h               |   4 +-
 include/linux/user_namespace.h             |  23 ++-
 ipc/mqueue.c                               |  29 ++--
 ipc/shm.c                                  |  31 ++--
 kernel/cred.c                              |  56 +++++-
 kernel/exit.c                              |   2 +-
 kernel/fork.c                              |  18 +-
 kernel/signal.c                            |  53 +++---
 kernel/sys.c                               |  14 +-
 kernel/ucount.c                            | 105 ++++++++++--
 kernel/user.c                              |   3 -
 kernel/user_namespace.c                    |   9 +-
 mm/memfd.c                                 |   4 +-
 mm/mlock.c                                 |  35 ++--
 mm/mmap.c                                  |   3 +-
 mm/shmem.c                                 |   8 +-
 tools/testing/selftests/Makefile           |   1 +
 tools/testing/selftests/rlimits/.gitignore |   2 +
 tools/testing/selftests/rlimits/Makefile   |   6 +
 tools/testing/selftests/rlimits/config     |   1 +
 .../selftests/rlimits/rlimits-per-userns.c | 161 ++++++++++++++++++
 32 files changed, 483 insertions(+), 156 deletions(-)
 create mode 100644 tools/testing/selftests/rlimits/.gitignore
 create mode 100644 tools/testing/selftests/rlimits/Makefile
 create mode 100644 tools/testing/selftests/rlimits/config
 create mode 100644 tools/testing/selftests/rlimits/rlimits-per-userns.c

-- 
2.29.2
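
For readers who want a concrete picture of the counter tree described under
"Introduced changes", here is a minimal userspace sketch. It is not the
code from this series: struct ns_counter, counter_charge() and
counter_uncharge() are hypothetical names made up for illustration, and the
model ignores locking, atomics and reference counting. It only shows the
idea that a charge is applied to a namespace's counter and to every
ancestor up to the root, and is rolled back if any level is already at its
maximum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one per-(uid, user namespace) counter. */
struct ns_counter {
	struct ns_counter *parent; /* NULL for the root (init_user_ns analogue) */
	long count;                /* current usage at this level */
	long max;                  /* limit enforced at this level */
};

/*
 * Charge one unit to c and to every ancestor. If any level is already
 * at its maximum, undo the levels charged so far and report failure.
 */
bool counter_charge(struct ns_counter *c)
{
	struct ns_counter *iter, *undo;

	for (iter = c; iter; iter = iter->parent) {
		if (iter->count >= iter->max)
			goto fail;
		iter->count++;
	}
	return true;

fail:
	/* Roll back only the levels below the one that refused. */
	for (undo = c; undo != iter; undo = undo->parent)
		undo->count--;
	return false;
}

/* Release one unit previously charged through c. */
void counter_uncharge(struct ns_counter *c)
{
	for (; c; c = c->parent) {
		assert(c->count > 0);
		c->count--;
	}
}
```

In this model, two child counters with max = 1 under a common root can each
be charged once, which corresponds to the container1/container2 example
above, while the root counter holds the sum of both. A charge fails when the
child itself or any ancestor has reached its maximum.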