From: ebiederm@xmission.com (Eric W. Biederman)
To: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>, Kees Cook <keescook@chromium.org>,
Kernel Hardening <kernel-hardening@lists.openwall.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Linux Containers <containers@lists.linux-foundation.org>,
Jann Horn <jannh@google.com>, LKML <linux-kernel@vger.kernel.org>,
Oleg Nesterov <oleg@redhat.com>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Alexey Gladkov <legion@kernel.org>
Subject: Re: [PATCH v9 0/8] Count rlimits in each user namespace
Date: Mon, 05 Apr 2021 12:03:05 -0500 [thread overview]
Message-ID: <m17dlgll4m.fsf@fess.ebiederm.org> (raw)
In-Reply-To: <cover.1616533074.git.gladkov.alexey@gmail.com> (Alexey Gladkov's message of "Tue, 23 Mar 2021 21:59:09 +0100")
Alexey Gladkov <gladkov.alexey@gmail.com> writes:
> Preface
> -------
> These patches are for binding the rlimit counters to a user in user namespace.
> This patch set can be applied on top of:
>
> git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.12-rc4
>
> Problem
> -------
> The RLIMIT_NPROC, RLIMIT_MEMLOCK, RLIMIT_SIGPENDING, RLIMIT_MSGQUEUE rlimits
> implementation places the counters in user_struct [1]. These limits are global
> between processes and persists for the lifetime of the process, even if
> processes are in different user namespaces.
>
> To illustrate the impact of rlimits, let's say there is a program that does not
> fork. Some service-A wants to run this program as user X in multiple containers.
> Since the program never fork the service wants to set RLIMIT_NPROC=1.
>
> service-A
> \- program (uid=1000, container1, rlimit_nproc=1)
> \- program (uid=1000, container2, rlimit_nproc=1)
>
> The service-A sets RLIMIT_NPROC=1 and runs the program in container1. When the
> service-A tries to run a program with RLIMIT_NPROC=1 in container2 it fails
> since user X already has one running process.
>
> The problem is not that the limit from container1 affects container2. The
> problem is that limit is verified against the global counter that reflects
> the number of processes in all containers.
>
> This problem can be worked around by using different users for each container
> but in this case we face a different problem of uid mapping when transferring
> files from one container to another.
>
> Eric W. Biederman mentioned this issue [2][3].
>
> Introduced changes
> ------------------
> To address the problem, we bind rlimit counters to user namespace. Each counter
> reflects the number of processes in a given uid in a given user namespace. The
> result is a tree of rlimit counters with the biggest value at the root (aka
> init_user_ns). The limit is considered exceeded if it's exceeded up in the tree.
>
> [1]: https://lore.kernel.org/containers/87imd2incs.fsf@x220.int.ebiederm.org/
> [2]: https://lists.linuxfoundation.org/pipermail/containers/2020-August/042096.html
> [3]: https://lists.linuxfoundation.org/pipermail/containers/2020-October/042524.html
>
> Changelog
> ---------
> v9:
> * Used a negative value to check that the ucounts->count is close to overflow.
> * Rebased onto v5.12-rc4.
>
> v8:
> * Used atomic_t for ucounts reference counting. Also added counter overflow
> check (thanks to Linus Torvalds for the idea).
> * Fixed other issues found by lkp-tests project in the patch that Reimplements
> RLIMIT_MEMLOCK on top of ucounts.
>
> v7:
> * Fixed issues found by lkp-tests project in the patch that Reimplements
> RLIMIT_MEMLOCK on top of ucounts.
>
> v6:
> * Fixed issues found by lkp-tests project.
> * Rebased onto v5.11.
>
> v5:
> * Split the first commit into two commits: change ucounts.count type to atomic_long_t
> and add ucounts to cred. These commits were merged by mistake during the rebase.
> * The __get_ucounts() renamed to alloc_ucounts().
> * The cred.ucounts update has been moved from commit_creds() as it did not allow
> to handle errors.
> * Added error handling of set_cred_ucounts().
>
> v4:
> * Reverted the type change of ucounts.count to refcount_t.
> * Fixed typo in the kernel/cred.c
>
> v3:
> * Added get_ucounts() function to increase the reference count. The existing
> get_counts() function renamed to __get_ucounts().
> * The type of ucounts.count changed from atomic_t to refcount_t.
> * Dropped 'const' from set_cred_ucounts() arguments.
> * Fixed a bug with freeing the cred structure after calling cred_alloc_blank().
> * Commit messages have been updated.
> * Added selftest.
>
> v2:
> * RLIMIT_MEMLOCK, RLIMIT_SIGPENDING and RLIMIT_MSGQUEUE are migrated to ucounts.
> * Added ucounts for pair uid and user namespace into cred.
> * Added the ability to increase ucount by more than 1.
>
> v1:
> * After discussion with Eric W. Biederman, I increased the size of ucounts to
> atomic_long_t.
> * Added ucount_max to avoid the fork bomb.
>
> --
>
> Alexey Gladkov (8):
> Increase size of ucounts to atomic_long_t
> Add a reference to ucounts for each cred
> Use atomic_t for ucounts reference counting
> Reimplement RLIMIT_NPROC on top of ucounts
> Reimplement RLIMIT_MSGQUEUE on top of ucounts
> Reimplement RLIMIT_SIGPENDING on top of ucounts
> Reimplement RLIMIT_MEMLOCK on top of ucounts
> kselftests: Add test to check for rlimit changes in different user namespaces
>
Overall this looks good, and it is a very good sign that the automatic
testing bots have not found anything. I found a few little nits.
But thing are looking very good.
Eric
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers
WARNING: multiple messages have this Message-ID (diff)
From: ebiederm@xmission.com (Eric W. Biederman)
To: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Kernel Hardening <kernel-hardening@lists.openwall.com>,
Linux Containers <containers@lists.linux-foundation.org>,
linux-mm@kvack.org, Alexey Gladkov <legion@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Christian Brauner <christian.brauner@ubuntu.com>,
Jann Horn <jannh@google.com>, Jens Axboe <axboe@kernel.dk>,
Kees Cook <keescook@chromium.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Oleg Nesterov <oleg@redhat.com>
Subject: Re: [PATCH v9 0/8] Count rlimits in each user namespace
Date: Mon, 05 Apr 2021 12:03:05 -0500 [thread overview]
Message-ID: <m17dlgll4m.fsf@fess.ebiederm.org> (raw)
In-Reply-To: <cover.1616533074.git.gladkov.alexey@gmail.com> (Alexey Gladkov's message of "Tue, 23 Mar 2021 21:59:09 +0100")
Alexey Gladkov <gladkov.alexey@gmail.com> writes:
> Preface
> -------
> These patches are for binding the rlimit counters to a user in user namespace.
> This patch set can be applied on top of:
>
> git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.12-rc4
>
> Problem
> -------
> The RLIMIT_NPROC, RLIMIT_MEMLOCK, RLIMIT_SIGPENDING, RLIMIT_MSGQUEUE rlimits
> implementation places the counters in user_struct [1]. These limits are global
> between processes and persists for the lifetime of the process, even if
> processes are in different user namespaces.
>
> To illustrate the impact of rlimits, let's say there is a program that does not
> fork. Some service-A wants to run this program as user X in multiple containers.
> Since the program never fork the service wants to set RLIMIT_NPROC=1.
>
> service-A
> \- program (uid=1000, container1, rlimit_nproc=1)
> \- program (uid=1000, container2, rlimit_nproc=1)
>
> The service-A sets RLIMIT_NPROC=1 and runs the program in container1. When the
> service-A tries to run a program with RLIMIT_NPROC=1 in container2 it fails
> since user X already has one running process.
>
> The problem is not that the limit from container1 affects container2. The
> problem is that limit is verified against the global counter that reflects
> the number of processes in all containers.
>
> This problem can be worked around by using different users for each container
> but in this case we face a different problem of uid mapping when transferring
> files from one container to another.
>
> Eric W. Biederman mentioned this issue [2][3].
>
> Introduced changes
> ------------------
> To address the problem, we bind rlimit counters to user namespace. Each counter
> reflects the number of processes in a given uid in a given user namespace. The
> result is a tree of rlimit counters with the biggest value at the root (aka
> init_user_ns). The limit is considered exceeded if it's exceeded up in the tree.
>
> [1]: https://lore.kernel.org/containers/87imd2incs.fsf@x220.int.ebiederm.org/
> [2]: https://lists.linuxfoundation.org/pipermail/containers/2020-August/042096.html
> [3]: https://lists.linuxfoundation.org/pipermail/containers/2020-October/042524.html
>
> Changelog
> ---------
> v9:
> * Used a negative value to check that the ucounts->count is close to overflow.
> * Rebased onto v5.12-rc4.
>
> v8:
> * Used atomic_t for ucounts reference counting. Also added counter overflow
> check (thanks to Linus Torvalds for the idea).
> * Fixed other issues found by lkp-tests project in the patch that Reimplements
> RLIMIT_MEMLOCK on top of ucounts.
>
> v7:
> * Fixed issues found by lkp-tests project in the patch that Reimplements
> RLIMIT_MEMLOCK on top of ucounts.
>
> v6:
> * Fixed issues found by lkp-tests project.
> * Rebased onto v5.11.
>
> v5:
> * Split the first commit into two commits: change ucounts.count type to atomic_long_t
> and add ucounts to cred. These commits were merged by mistake during the rebase.
> * The __get_ucounts() renamed to alloc_ucounts().
> * The cred.ucounts update has been moved from commit_creds() as it did not allow
> to handle errors.
> * Added error handling of set_cred_ucounts().
>
> v4:
> * Reverted the type change of ucounts.count to refcount_t.
> * Fixed typo in the kernel/cred.c
>
> v3:
> * Added get_ucounts() function to increase the reference count. The existing
> get_counts() function renamed to __get_ucounts().
> * The type of ucounts.count changed from atomic_t to refcount_t.
> * Dropped 'const' from set_cred_ucounts() arguments.
> * Fixed a bug with freeing the cred structure after calling cred_alloc_blank().
> * Commit messages have been updated.
> * Added selftest.
>
> v2:
> * RLIMIT_MEMLOCK, RLIMIT_SIGPENDING and RLIMIT_MSGQUEUE are migrated to ucounts.
> * Added ucounts for pair uid and user namespace into cred.
> * Added the ability to increase ucount by more than 1.
>
> v1:
> * After discussion with Eric W. Biederman, I increased the size of ucounts to
> atomic_long_t.
> * Added ucount_max to avoid the fork bomb.
>
> --
>
> Alexey Gladkov (8):
> Increase size of ucounts to atomic_long_t
> Add a reference to ucounts for each cred
> Use atomic_t for ucounts reference counting
> Reimplement RLIMIT_NPROC on top of ucounts
> Reimplement RLIMIT_MSGQUEUE on top of ucounts
> Reimplement RLIMIT_SIGPENDING on top of ucounts
> Reimplement RLIMIT_MEMLOCK on top of ucounts
> kselftests: Add test to check for rlimit changes in different user namespaces
>
Overall this looks good, and it is a very good sign that the automatic
testing bots have not found anything. I found a few little nits.
But thing are looking very good.
Eric
next prev parent reply other threads:[~2021-04-05 17:03 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-23 20:59 [PATCH v9 0/8] Count rlimits in each user namespace Alexey Gladkov
2021-03-23 20:59 ` Alexey Gladkov
2021-03-23 20:59 ` [PATCH v9 1/8] Increase size of ucounts to atomic_long_t Alexey Gladkov
2021-03-23 20:59 ` Alexey Gladkov
2021-03-23 20:59 ` [PATCH v9 2/8] Add a reference to ucounts for each cred Alexey Gladkov
2021-03-23 20:59 ` Alexey Gladkov
2021-03-23 20:59 ` [PATCH v9 3/8] Use atomic_t for ucounts reference counting Alexey Gladkov
2021-03-23 20:59 ` Alexey Gladkov
2021-04-05 17:01 ` Eric W. Biederman
2021-04-05 17:01 ` Eric W. Biederman
2021-03-23 20:59 ` [PATCH v9 4/8] Reimplement RLIMIT_NPROC on top of ucounts Alexey Gladkov
2021-03-23 20:59 ` Alexey Gladkov
2021-04-05 16:56 ` Eric W. Biederman
2021-04-05 16:56 ` Eric W. Biederman
2021-04-06 15:44 ` Alexey Gladkov
2021-04-06 15:44 ` Alexey Gladkov
2021-04-07 16:56 ` Eric W. Biederman
2021-04-07 16:56 ` Eric W. Biederman
2021-03-23 20:59 ` [PATCH v9 5/8] Reimplement RLIMIT_MSGQUEUE " Alexey Gladkov
2021-03-23 20:59 ` Alexey Gladkov
2021-03-23 20:59 ` [PATCH v9 6/8] Reimplement RLIMIT_SIGPENDING " Alexey Gladkov
2021-03-23 20:59 ` Alexey Gladkov
2021-04-05 16:58 ` Eric W. Biederman
2021-04-05 16:58 ` Eric W. Biederman
2021-03-23 20:59 ` [PATCH v9 7/8] Reimplement RLIMIT_MEMLOCK " Alexey Gladkov
2021-03-23 20:59 ` Alexey Gladkov
2021-03-23 20:59 ` [PATCH v9 8/8] kselftests: Add test to check for rlimit changes in different user namespaces Alexey Gladkov
2021-03-23 20:59 ` Alexey Gladkov
2021-04-05 17:03 ` Eric W. Biederman [this message]
2021-04-05 17:03 ` [PATCH v9 0/8] Count rlimits in each user namespace Eric W. Biederman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=m17dlgll4m.fsf@fess.ebiederm.org \
--to=ebiederm@xmission.com \
--cc=akpm@linux-foundation.org \
--cc=axboe@kernel.dk \
--cc=containers@lists.linux-foundation.org \
--cc=gladkov.alexey@gmail.com \
--cc=jannh@google.com \
--cc=keescook@chromium.org \
--cc=kernel-hardening@lists.openwall.com \
--cc=legion@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=oleg@redhat.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.