From: Giuseppe Scrivano <gscrivan@redhat.com>
To: Aleksa Sarai <cyphar@cyphar.com>
Cc: linux-kernel@vger.kernel.org, keescook@chromium.org,
bristot@redhat.com, ebiederm@xmission.com, brauner@kernel.org,
viro@zeniv.linux.org.uk, alexl@redhat.com, peterz@infradead.org,
bmasney@redhat.com
Subject: Re: [PATCH v3 1/2] exec: add PR_HIDE_SELF_EXE prctl
Date: Wed, 25 Jan 2023 17:30:37 +0100 [thread overview]
Message-ID: <871qnibmqa.fsf@redhat.com> (raw)
In-Reply-To: <20230125152847.wr443tggzb3no6mg@senku> (Aleksa Sarai's message of "Thu, 26 Jan 2023 02:28:47 +1100")
Aleksa Sarai <cyphar@cyphar.com> writes:
> On 2023-01-24, Giuseppe Scrivano <gscrivan@redhat.com> wrote:
>> Aleksa Sarai <cyphar@cyphar.com> writes:
>>
>> > On 2023-01-20, Giuseppe Scrivano <gscrivan@redhat.com> wrote:
>> >> This patch adds a new prctl called PR_HIDE_SELF_EXE which allows
>> >> processes to hide their own /proc/*/exe file. When this prctl is
>> >> used, every access to /proc/*/exe for the calling process will
>> >> fail with ENOENT.
>> >>
>> >> This is useful for preventing issues like CVE-2019-5736, where an
>> >> attacker can gain host root access by overwriting the binary
>> >> in OCI runtimes through file-descriptor mishandling in containers.
>> >>
>> >> The current fix for CVE-2019-5736 is to create a read-only copy or
>> >> a bind-mount of the current executable, and then re-exec the current
>> >> process. With the new prctl, the read-only copy or bind-mount copy is
>> >> not needed anymore.
>> >>
>> >> While map_files/ also might contain symlinks to files in host,
>> >> proc_map_files_get_link() permissions checks are already sufficient.
>> >
>> > I suspect this doesn't protect against the execve("/proc/self/exe")
>> > tactic (because it clears the bit on execve), so I'm not sure this is
>> > much safer than PR_SET_DUMPABLE (yeah, it stops root in the source
>> > userns from accessing /proc/$pid/exe but the above attack makes that no
>> > longer that important).
>>
>> it protects against that attack too. It clears the bit _after_ the
>> execve() syscall is done.
>>
>> If you attempt execve("/proc/self/exe") you still get ENOENT:
>>
>> ```
>> #include <stdlib.h>
>> #include <stdio.h>
>> #include <sys/prctl.h>
>> #include <unistd.h>
>>
>> int main(void)
>> {
>> int ret;
>>
>> ret = prctl(65, 1, 0, 0, 0);
>> if (ret != 0)
>> exit(1);
>>
>> execl("/proc/self/exe", "foo", NULL);
>> exit(2);
>> }
>> ```
>>
>> # strace -e prctl,execve ./hide-self-exe
>> execve("./hide-self-exe", ["./hide-self-exe"], 0x7fff975a3690 /* 39 vars */) = 0
>> prctl(0x41 /* PR_??? */, 0x1, 0, 0, 0) = 0
>> execve("/proc/self/exe", ["foo"], 0x7ffcf51868b8 /* 39 vars */) = -1 ENOENT (No such file or directory)
>> +++ exited with 2 +++
>>
>> I've also tried execv'ing with a script that uses "#!/proc/self/exe" and
>> I get the same ENOENT.
>
> Ah, you're right. As you mentioned, you could still do the attack
> through /proc/self/map_files but that would require you to know where
> the binary will be located (and being non-dumpable blocks container
> processes from doing tricks to get the right path).
>
> I wonder if we should somehow require (or auto-apply) SUID_DUMP_NONE
> when setting this prctl, since it does currently depend on it to be
> properly secure...
from what I can see, access to /proc/*/map_files is already protected
by proc_map_files_get_link() that requires either CAP_SYS_ADMIN in the
initial user namespace or CAP_CHECKPOINT_RESTORE in the user namespace.
Setting SUID_DUMP_NONE wouldn't hurt though :-)
After reading some comments on the LWN.net article, I wonder if
PR_HIDE_SELF_EXE should apply to CAP_SYS_ADMIN in the initial user
namespace or if in this case root should keep the privilege to inspect
the binary of a process. If a container runs with that many privileges
then it has already other ways to damage the host anyway.
>> > I think the only way to fix this properly is by blocking re-opens of
>> > magic links that have more permissions than they originally did. I just
>> > got back from vacation, but I'm working on fixing up [1] so it's ready
>> > to be an RFC so we can close this hole once and for all.
>>
>> so that relies on the fact opening /proc/self/exe with O_WRONLY fails
>> with ETXTBSY?
>
> Not quite, it relies on the fact that /proc/self/exe (and any other
> magiclink to /proc/self/exe) does not have a write mode (semantically,
> because of -ETXTBSY) and thus blocks any attempt to open it (or re-open
> it) with a write mode. It also fixes some other possible issues and lets
> you have upgrade masks (a-la capabilities) to file descriptors.
>
> Ultimately I think having a complete "no really, nobody can touch this"
> knob is also a good idea, and as this is is much simpler we can it in
> much quicker than the magiclink stuff (which I still think is necessary
> in general).
>
>> > [1]: https://github.com/cyphar/linux/tree/magiclink/open_how-reopen
>> >
>> >>
>> >> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
>> >> ---
>> >> v2: https://lkml.org/lkml/2023/1/19/849
>> >>
>> >> Differences from v2:
>> >>
>> >> - fixed the test to check PR_SET_HIDE_SELF_EXE after fork
>> >>
>> >> v1: https://lkml.org/lkml/2023/1/4/334
>> >>
>> >> Differences from v1:
>> >>
>> >> - amended more information in the commit message wrt map_files not
>> >> requiring the same protection.
>> >> - changed the test to verify PR_HIDE_SELF_EXE cannot be unset after
>> >> a fork.
>> >>
>> >> fs/exec.c | 1 +
>> >> fs/proc/base.c | 8 +++++---
>> >> include/linux/sched.h | 5 +++++
>> >> include/uapi/linux/prctl.h | 3 +++
>> >> kernel/sys.c | 9 +++++++++
>> >> tools/include/uapi/linux/prctl.h | 3 +++
>> >> 6 files changed, 26 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/fs/exec.c b/fs/exec.c
>> >> index ab913243a367..5a5dd964c3a3 100644
>> >> --- a/fs/exec.c
>> >> +++ b/fs/exec.c
>> >> @@ -1855,6 +1855,7 @@ static int bprm_execve(struct linux_binprm *bprm,
>> >> /* execve succeeded */
>> >> current->fs->in_exec = 0;
>> >> current->in_execve = 0;
>> >> + task_clear_hide_self_exe(current);
>> >> rseq_execve(current);
>> >> acct_update_integrals(current);
>> >> task_numa_free(current, false);
>> >> diff --git a/fs/proc/base.c b/fs/proc/base.c
>> >> index 9e479d7d202b..959968e2da0d 100644
>> >> --- a/fs/proc/base.c
>> >> +++ b/fs/proc/base.c
>> >> @@ -1723,19 +1723,21 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
>> >> {
>> >> struct task_struct *task;
>> >> struct file *exe_file;
>> >> + long hide_self_exe;
>> >>
>> >> task = get_proc_task(d_inode(dentry));
>> >> if (!task)
>> >> return -ENOENT;
>> >> exe_file = get_task_exe_file(task);
>> >> + hide_self_exe = task_hide_self_exe(task);
>> >> put_task_struct(task);
>> >> - if (exe_file) {
>> >> + if (exe_file && !hide_self_exe) {
>> >> *exe_path = exe_file->f_path;
>> >> path_get(&exe_file->f_path);
>> >> fput(exe_file);
>> >> return 0;
>> >> - } else
>> >> - return -ENOENT;
>> >> + }
>> >> + return -ENOENT;
>> >> }
>> >>
>> >> static const char *proc_pid_get_link(struct dentry *dentry,
>> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> >> index 853d08f7562b..8db32d5fc285 100644
>> >> --- a/include/linux/sched.h
>> >> +++ b/include/linux/sched.h
>> >> @@ -1790,6 +1790,7 @@ static __always_inline bool is_percpu_thread(void)
>> >> #define PFA_SPEC_IB_DISABLE 5 /* Indirect branch speculation restricted */
>> >> #define PFA_SPEC_IB_FORCE_DISABLE 6 /* Indirect branch speculation permanently restricted */
>> >> #define PFA_SPEC_SSB_NOEXEC 7 /* Speculative Store Bypass clear on execve() */
>> >> +#define PFA_HIDE_SELF_EXE 8 /* Hide /proc/self/exe for the process */
>> >>
>> >> #define TASK_PFA_TEST(name, func) \
>> >> static inline bool task_##func(struct task_struct *p) \
>> >> @@ -1832,6 +1833,10 @@ TASK_PFA_CLEAR(SPEC_IB_DISABLE, spec_ib_disable)
>> >> TASK_PFA_TEST(SPEC_IB_FORCE_DISABLE, spec_ib_force_disable)
>> >> TASK_PFA_SET(SPEC_IB_FORCE_DISABLE, spec_ib_force_disable)
>> >>
>> >> +TASK_PFA_TEST(HIDE_SELF_EXE, hide_self_exe)
>> >> +TASK_PFA_SET(HIDE_SELF_EXE, hide_self_exe)
>> >> +TASK_PFA_CLEAR(HIDE_SELF_EXE, hide_self_exe)
>> >> +
>> >> static inline void
>> >> current_restore_flags(unsigned long orig_flags, unsigned long flags)
>> >> {
>> >> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
>> >> index a5e06dcbba13..f12f3df12468 100644
>> >> --- a/include/uapi/linux/prctl.h
>> >> +++ b/include/uapi/linux/prctl.h
>> >> @@ -284,4 +284,7 @@ struct prctl_mm_map {
>> >> #define PR_SET_VMA 0x53564d41
>> >> # define PR_SET_VMA_ANON_NAME 0
>> >>
>> >> +#define PR_SET_HIDE_SELF_EXE 65
>> >> +#define PR_GET_HIDE_SELF_EXE 66
>> >> +
>> >> #endif /* _LINUX_PRCTL_H */
>> >> diff --git a/kernel/sys.c b/kernel/sys.c
>> >> index 5fd54bf0e886..e992f1b72973 100644
>> >> --- a/kernel/sys.c
>> >> +++ b/kernel/sys.c
>> >> @@ -2626,6 +2626,15 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>> >> case PR_SET_VMA:
>> >> error = prctl_set_vma(arg2, arg3, arg4, arg5);
>> >> break;
>> >> + case PR_SET_HIDE_SELF_EXE:
>> >> + if (arg2 != 1 || arg3 || arg4 || arg5)
>> >> + return -EINVAL;
>> >> + task_set_hide_self_exe(current);
>> >> + break;
>> >> + case PR_GET_HIDE_SELF_EXE:
>> >> + if (arg2 || arg3 || arg4 || arg5)
>> >> + return -EINVAL;
>> >> + return task_hide_self_exe(current) ? 1 : 0;
>> >> default:
>> >> error = -EINVAL;
>> >> break;
>> >> diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
>> >> index a5e06dcbba13..f12f3df12468 100644
>> >> --- a/tools/include/uapi/linux/prctl.h
>> >> +++ b/tools/include/uapi/linux/prctl.h
>> >> @@ -284,4 +284,7 @@ struct prctl_mm_map {
>> >> #define PR_SET_VMA 0x53564d41
>> >> # define PR_SET_VMA_ANON_NAME 0
>> >>
>> >> +#define PR_SET_HIDE_SELF_EXE 65
>> >> +#define PR_GET_HIDE_SELF_EXE 66
>> >> +
>> >> #endif /* _LINUX_PRCTL_H */
>> >> --
>> >> 2.38.1
>> >>
>>
next prev parent reply other threads:[~2023-01-25 16:31 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-20 10:25 [PATCH v3 1/2] exec: add PR_HIDE_SELF_EXE prctl Giuseppe Scrivano
2023-01-20 10:25 ` [PATCH v3 2/2] selftests: add tests for prctl(SET_HIDE_SELF_EXE) Giuseppe Scrivano
2023-01-20 16:05 ` Brian Masney
2023-01-23 18:41 ` [PATCH v3 1/2] exec: add PR_HIDE_SELF_EXE prctl Colin Walters
2023-01-23 19:21 ` Giuseppe Scrivano
2023-01-23 22:07 ` Colin Walters
2023-01-23 22:54 ` Giuseppe Scrivano
2023-01-23 23:14 ` Colin Walters
2023-01-24 1:53 ` Aleksa Sarai
2023-01-24 7:29 ` Giuseppe Scrivano
2023-01-25 15:28 ` Aleksa Sarai
2023-01-25 16:30 ` Giuseppe Scrivano [this message]
2023-01-29 13:59 ` Colin Walters
2023-01-29 16:58 ` Christian Brauner
2023-01-29 18:12 ` Colin Walters
2023-01-30 9:53 ` Christian Brauner
2023-01-30 10:06 ` Christian Brauner
2023-01-30 21:52 ` Colin Walters
2023-01-31 14:17 ` Giuseppe Scrivano
2023-02-25 0:27 ` Andrei Vagin
2023-02-28 14:19 ` Giuseppe Scrivano
2023-01-26 8:25 ` Christian Brauner
2023-01-24 19:17 ` Andrei Vagin
2023-01-27 12:31 ` Christian Brauner
2023-01-27 20:34 ` Kees Cook
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=871qnibmqa.fsf@redhat.com \
--to=gscrivan@redhat.com \
--cc=alexl@redhat.com \
--cc=bmasney@redhat.com \
--cc=brauner@kernel.org \
--cc=bristot@redhat.com \
--cc=cyphar@cyphar.com \
--cc=ebiederm@xmission.com \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.