From: "David Hildenbrand (Arm)" <david@kernel.org>
To: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>,
Qi Tang <tpluszz77@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Cyrill Gorcunov <gorcunov@openvz.org>,
Oleg Nesterov <oleg@redhat.com>,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] prctl: require checkpoint_restore_ns_capable for PR_SET_MM_MAP
Date: Thu, 2 Apr 2026 15:55:27 +0200
Message-ID: <389887c2-ddae-4456-b9d2-417aaaa2b340@kernel.org>
In-Reply-To: <686134c9-c2e3-444f-b83a-dd229c7b0102@lucifer.local>
On 4/2/26 15:06, Lorenzo Stoakes (Oracle) wrote:
> On Thu, Apr 02, 2026 at 07:13:32PM +0800, Qi Tang wrote:
>> prctl_set_mm_map() allows modifying all mm_struct boundaries and
>> the saved auxv vector. The individual field path (PR_SET_MM_START_CODE
>> etc.) correctly requires CAP_SYS_RESOURCE, but the PR_SET_MM_MAP path
>> dispatches before this check and has no capability requirement of its
>> own when exe_fd is -1.
>>
>> This means any unprivileged user on a CONFIG_CHECKPOINT_RESTORE kernel
>> (nearly all distros) can rewrite mm boundaries including start_brk, brk,
>> arg_start/end, env_start/end and saved_auxv. Consequences include:
>>
>> - SELinux PROCESS__EXECHEAP bypass via start_brk manipulation
>> - procfs info disclosure by pointing arg/env ranges at other memory
>> - auxv poisoning (AT_SYSINFO_EHDR, AT_BASE, AT_ENTRY)
>>
>> The original commit f606b77f1a9e ("prctl: PR_SET_MM -- introduce
>> PR_SET_MM_MAP operation") states "we require the caller to be at least
>> user-namespace root user", but this was never enforced in the code.
>>
>> Add a checkpoint_restore_ns_capable() check at the top of
>> prctl_set_mm_map(), after the PR_SET_MM_MAP_SIZE early return. This
>> requires CAP_CHECKPOINT_RESTORE or CAP_SYS_ADMIN in the caller's
>> user namespace, matching the stated design intent and the existing
>> check for exe_fd changes.
>>
>> Fixes: f606b77f1a9e ("prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation")
>
> We've had a gaping security hole since 2014 and nobody noticed? I find it
> hard to believe.
>
>> Cc: stable@vger.kernel.org
>> Cc: Cyrill Gorcunov <gorcunov@openvz.org>
>> Signed-off-by: Qi Tang <tpluszz77@gmail.com>
>> ---
>> kernel/sys.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/kernel/sys.c b/kernel/sys.c
>> index c86eba9aa7e9..2b8c57f23a35 100644
>> --- a/kernel/sys.c
>> +++ b/kernel/sys.c
>> @@ -2071,6 +2071,9 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>>  		return put_user((unsigned int)sizeof(prctl_map),
>>  				(unsigned int __user *)addr);
>>
>> +	if (!checkpoint_restore_ns_capable(current_user_ns()))
>> +		return -EPERM;
>
> Hmm there is already:
>
> 	if (prctl_map.exe_fd != (u32)-1) {
> 		/*
> 		 * Check if the current user is checkpoint/restore capable.
> 		 * At the time of this writing, it checks for CAP_SYS_ADMIN
> 		 * or CAP_CHECKPOINT_RESTORE.
> 		 * Note that a user with access to ptrace can masquerade an
> 		 * arbitrary program as any executable, even setuid ones.
> 		 * This may have implications in the tomoyo subsystem.
> 		 */
> 		if (!checkpoint_restore_ns_capable(current_user_ns()))
> 			return -EPERM;
>
> And you're proposing _adding_ this check on top of that? Seems super
> redundant.
Yes, should be moved.
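
To spell out what moving it would change, here is a rough userspace model
in C. It is not kernel code: check_today() and check_moved() are made-up
names, and cr_capable stands in for the result of
checkpoint_restore_ns_capable(current_user_ns()).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the existing behaviour: the capability is only
 * consulted when the caller asks to change exe_fd.
 */
static int check_today(uint32_t exe_fd, bool cr_capable)
{
	if (exe_fd != (uint32_t)-1 && !cr_capable)
		return -EPERM;
	return 0;
}

/*
 * Hypothetical model with the check moved to the top: the capability is
 * required regardless of exe_fd.
 */
static int check_moved(uint32_t exe_fd, bool cr_capable)
{
	(void)exe_fd;	/* no longer affects the permission decision */
	if (!cr_capable)
		return -EPERM;
	return 0;
}
```

The two only disagree for exe_fd == -1 without the capability:
check_today() returns 0 there, check_moved() returns -EPERM.
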
>
> but also, this seems super-specific buuut... Then again #ifdef
> CONFIG_CHECKPOINT_RESTORE around this. Ugh.
>
> I _hate_ this interface. HATE HATE HATE it.
>
> Anyway, does updating _your own_ auxv really require elevated permissions
> like this?
>
> I don't think so? Couldn't you go and manipulate that anyway without
> elevated anything?

Hard to believe ...

I was wondering whether this could break some users. At least the CRIU doc
states:

  This option tells *criu* to accept the limitations when running
  as non-root. Running as non-root requires *criu* at least to have
  *CAP_SYS_ADMIN* or *CAP_CHECKPOINT_RESTORE*. For details about
  running *criu* as non-root please consult the *NON-ROOT* section.

I mean, the check makes sense given that prctl_set_mm() rejects all
these operations without CAP_SYS_RESOURCE.

CAP_CHECKPOINT_RESTORE was not introduced before

  commit 124ea650d3072b005457faed69909221c2905a1f
  Author: Adrian Reber <areber@redhat.com>
  Date:   Sun Jul 19 12:04:11 2020 +0200

      capabilities: Introduce CAP_CHECKPOINT_RESTORE

So at the time PR_SET_MM_MAP was added there simply was no such capability.

Likely, now that we have it, we should indeed use it.
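
For anyone wanting to check from userspace whether a process would
plausibly pass such a check, a minimal sketch using capget(2) follows. It
only looks at the effective set of the calling process, which approximates
but is not the same as the kernel's checkpoint_restore_ns_capable().
have_cr_caps() and cap_is_set() are hypothetical helper names, and the
fallback define is an assumption for older headers that lack
CAP_CHECKPOINT_RESTORE.

```c
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/capability.h>

/* Assumption: older uapi headers may not define this; the value is 40. */
#ifndef CAP_CHECKPOINT_RESTORE
#define CAP_CHECKPOINT_RESTORE 40
#endif

/* Test whether 'cap' is set in an effective set as filled in by capget(2). */
static bool cap_is_set(const struct __user_cap_data_struct *data, int cap)
{
	if (cap < 0 || cap >= 32 * _LINUX_CAPABILITY_U32S_3)
		return false;
	return (data[CAP_TO_INDEX(cap)].effective & CAP_TO_MASK(cap)) != 0;
}

/*
 * Returns 1 if the calling process has CAP_CHECKPOINT_RESTORE or
 * CAP_SYS_ADMIN in its effective set, 0 if not, -1 if capget() failed.
 */
static int have_cr_caps(void)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,	/* 0 means the calling thread */
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3] = {{0}};

	if (syscall(SYS_capget, &hdr, data) != 0)
		return -1;

	return cap_is_set(data, CAP_CHECKPOINT_RESTORE) ||
	       cap_is_set(data, CAP_SYS_ADMIN);
}
```

An ordinary unprivileged process would normally get 0 from have_cr_caps().
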
--
Cheers,
David