public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Giuseppe Scrivano <gscrivan@redhat.com>
To: Christian Brauner <brauner@kernel.org>
Cc: Colin Walters <walters@verbum.org>,
	Aleksa Sarai <cyphar@cyphar.com>,
	linux-kernel@vger.kernel.org, Kees Cook <keescook@chromium.org>,
	bristot@redhat.com, "Eric W. Biederman" <ebiederm@xmission.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Alexander Larsson <alexl@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	bmasney@redhat.com
Subject: Re: [PATCH v3 1/2] exec: add PR_HIDE_SELF_EXE prctl
Date: Tue, 31 Jan 2023 15:17:16 +0100	[thread overview]
Message-ID: <875ycmg55f.fsf@redhat.com> (raw)
In-Reply-To: <20230130100602.elyvs6oorfzukjwh@wittgenstein> (Christian Brauner's message of "Mon, 30 Jan 2023 11:06:02 +0100")

Christian Brauner <brauner@kernel.org> writes:

> On Mon, Jan 30, 2023 at 10:53:31AM +0100, Christian Brauner wrote:
>> On Sun, Jan 29, 2023 at 01:12:45PM -0500, Colin Walters wrote:
>> > 
>> > 
>> > On Sun, Jan 29, 2023, at 11:58 AM, Christian Brauner wrote:
>> > > On Sun, Jan 29, 2023 at 08:59:32AM -0500, Colin Walters wrote:
>> > >> 
>> > >> 
>> > >> On Wed, Jan 25, 2023, at 11:30 AM, Giuseppe Scrivano wrote:
>> > >> > 
>> > >> > After reading some comments on the LWN.net article, I wonder if
>> > >> > PR_HIDE_SELF_EXE should apply to CAP_SYS_ADMIN in the initial user
>> > >> > namespace or if in this case root should keep the privilege to inspect
>> > >> > the binary of a process.  If a container runs with that many privileges
>> > >> > then it has already other ways to damage the host anyway.
>> > >> 
>> > >> Right, that's what I was trying to express with the "make it
>> > >> work the same as map_files".  Hiding the entry entirely even
>> > >> for initial-namespace-root (real root) seems like it's going to
>> > >> potentially confuse profiling/tracing/debugging tools for no
>> > >> good reason.
>> > >
>> > > If this can be circumvented via CAP_SYS_ADMIN 
>> > 
>> > To be clear, I'm proposing CAP_SYS_ADMIN in the current user
>> > namespace at the time of the prctl().  (Or if keeping around a
>> > reference just for this is too problematic, perhaps hardcoding to
>> > the init ns)
>> 
>> Oh no, I fully understand. The point was that the userspace fix protects
>> even against attackers with CAP_SYS_ADMIN in init_user_ns. And that was
>> important back then and is still relevant today for some workloads.
>> 
>> For unprivileged containers where host and container are separate by a
>> meaningful user namespace boundary this whole mitigation is irrelevant
>> as the binary can't be overwritten.
>> 
>> > 
>> > A process with CAP_SYS_ADMIN in a child namespace would still not be able to read the binary.
>> > 
>> > > then this mitigation
>> > > becomes immediately way less interesting because the userspace
>> > > mitigation we came up with protects against CAP_SYS_ADMIN as well
>> > > without any regression risk. 
>> > 
>> > The userspace mitigation here being "clone self to memfd"?  But that's a sufficiently ugly workaround that it's created new problems; see https://lwn.net/Articles/918106/
>> 
>> But this is a problem with the memfd api not with the fix. Following the
>> thread the ability to create executable memfds will stay around. As it
>> should be given how long this has been supported. And they have backward
>> compatibility in mind which is great.
>
> Following up from yesterday's promise to check with the criu org I'm
> part of: this is going to break criu unforunately as it dumps (and
> restores) /proc/self/exe. Even with an escape hatch we'd still risk
> breaking it. Whereas again, the memfd solution doesn't cause those
> issues.
>
> Don't get me wrong it's pretty obvious that I was pretty supportive of
> this fix especially because it looked rather simple but this is turning
> out to be less simple than we tought. I don't think that this is worth
> it given the functioning fixes we already have.
>
> The good thing is that - even if it will take a longer - that Aleksa's
> patchset will provide a more general solution by making it possible for
> runc/crun/lxc to open the target binary with a restricted upgrade mask
> making it impossible to open the binary read-write again. This won't
> break criu and will fix this issue and is generally useful.

I was not aware that running with CAP_SYS_ADMIN in the initial userns
was considered as a use case, but in this case don't we need to protect
/proc/$PID/map_files as well or do we rely only on randomize_va_space? 
It is a more difficult to guess the name but we can still exec these
files and grab a reference to them.

The current patch I've proposed is probably a too big hammer for the
small issue we really have:

other processes from the container are already blocked by PR_SET_DUMPABLE unless
CAP_SYS_PTRACE is granted; but if CAP_SYS_PTRACE is granted then it seems
already vulnerable today since processes from the container can just
read the /proc/PID/map_files files without even requiring the exec trick.

So the only hole left, that I can see, is that the container runtime
can be tricked to exec /proc/self/exe (or /proc/self/map_files/*) and
from there open a reference to the binary.

Could we just restrict the usage to the current thread group?  That
won't affect in any way other processes.

The patch won't be too much more complicated, we just need to amend the
following fix:

diff --git a/fs/proc/base.c b/fs/proc/base.c
index e9127084b82a..2f5c5ed2dae8 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1723,6 +1723,7 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
 {
        struct task_struct *task;
        struct file *exe_file;
+       bool is_same_tgroup;
        long hide_self_exe;
 
        task = get_proc_task(d_inode(dentry));
@@ -1730,8 +1731,9 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
                return -ENOENT;
        exe_file = get_task_exe_file(task);
        hide_self_exe = task_hide_self_exe(task);
+       is_same_tgroup = same_thread_group(current, task);
        put_task_struct(task);
-       if (hide_self_exe)
+       if (hide_self_exe && is_same_tgroup)
                return -EPERM;
        if (exe_file) {
                *exe_path = exe_file->f_path;

Would that be sufficient or are there other ways to attack it?

Given the premise about CAP_SYS_ADMIN (and even more loosen
CAP_CHECKPOINT_RESTORE in the *current* user namespace), I think we
probably need a similar protection fo /proc/PID/map_files.

Regards,
Giuseppe


  parent reply	other threads:[~2023-01-31 14:18 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-20 10:25 [PATCH v3 1/2] exec: add PR_HIDE_SELF_EXE prctl Giuseppe Scrivano
2023-01-20 10:25 ` [PATCH v3 2/2] selftests: add tests for prctl(SET_HIDE_SELF_EXE) Giuseppe Scrivano
2023-01-20 16:05   ` Brian Masney
2023-01-23 18:41 ` [PATCH v3 1/2] exec: add PR_HIDE_SELF_EXE prctl Colin Walters
2023-01-23 19:21   ` Giuseppe Scrivano
2023-01-23 22:07     ` Colin Walters
2023-01-23 22:54       ` Giuseppe Scrivano
2023-01-23 23:14         ` Colin Walters
2023-01-24  1:53 ` Aleksa Sarai
2023-01-24  7:29   ` Giuseppe Scrivano
2023-01-25 15:28     ` Aleksa Sarai
2023-01-25 16:30       ` Giuseppe Scrivano
2023-01-29 13:59         ` Colin Walters
2023-01-29 16:58           ` Christian Brauner
2023-01-29 18:12             ` Colin Walters
2023-01-30  9:53               ` Christian Brauner
2023-01-30 10:06                 ` Christian Brauner
2023-01-30 21:52                   ` Colin Walters
2023-01-31 14:17                   ` Giuseppe Scrivano [this message]
2023-02-25  0:27                   ` Andrei Vagin
2023-02-28 14:19                     ` Giuseppe Scrivano
2023-01-26  8:25       ` Christian Brauner
2023-01-24 19:17   ` Andrei Vagin
2023-01-27 12:31 ` Christian Brauner
2023-01-27 20:34   ` Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875ycmg55f.fsf@redhat.com \
    --to=gscrivan@redhat.com \
    --cc=alexl@redhat.com \
    --cc=bmasney@redhat.com \
    --cc=brauner@kernel.org \
    --cc=bristot@redhat.com \
    --cc=cyphar@cyphar.com \
    --cc=ebiederm@xmission.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=walters@verbum.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox