[PATCH] security/commoncap: don't assume "setid" if all ids are identical

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] security/commoncap: don't assume "setid" if all ids are identical
@ 2025-03-06  8:26 Max Kellermann
  2025-03-07 10:32 ` kernel test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Max Kellermann @ 2025-03-06  8:26 UTC (permalink / raw)
  To: serge, paul, jmorris, linux-security-module, linux-kernel; +Cc: Max Kellermann

If a program enables `NO_NEW_PRIVS` and sets up
differing real/effective/saved/fs ids, the effective ids are
downgraded during exec because the kernel believes it should "get no
more than they had, and maybe less".

I believe it is safe to keep differing ids even if `NO_NEW_PRIVS` is
set.  The newly executed program doesn't get any more, but there's no
reason to give it less.

This is different from "set[ug]id/setpcap" execution where privileges
may be raised; here, the assumption that it's "set[ug]id" if
effective!=real is too broad.

If we verify that all user/group ids remain as they were, we can
safely allow the new program to keep them.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 security/commoncap.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index 58a0c1c3e409..057a7400ef7d 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -861,6 +861,26 @@ static inline bool __is_setuid(struct cred *new, const struct cred *old)
 static inline bool __is_setgid(struct cred *new, const struct cred *old)
 { return !gid_eq(new->egid, old->gid); }
 
+/**
+ * Are all user/group ids in both cred instances identical?
+ *
+ * It can be used after __is_setuid() / __is_setgid() to check whether
+ * this is really a set*id operation or whether both processes just
+ * have differing real/effective ids.  It is safe to keep differing
+ * real/effective ids in "unsafe" program execution.
+ */
+static bool has_identical_uids_gids(const struct cred *a, const struct cred *b)
+{
+	return uid_eq(a->uid, b->uid) &&
+		gid_eq(a->gid, b->gid) &&
+		uid_eq(a->suid, b->suid) &&
+		gid_eq(a->sgid, b->sgid) &&
+		uid_eq(a->euid, b->euid) &&
+		gid_eq(a->egid, b->egid) &&
+		uid_eq(a->fsuid, b->fsuid) &&
+		gid_eq(a->fsgid, b->fsgid);
+}
+
 /*
  * 1) Audit candidate if current->cap_effective is set
  *
@@ -940,7 +960,8 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
 	 *
 	 * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
 	 */
-	is_setid = __is_setuid(new, old) || __is_setgid(new, old);
+	is_setid = (__is_setuid(new, old) || __is_setgid(new, old)) &&
+		!has_identical_uids_gids(new, old);
 
 	if ((is_setid || __cap_gained(permitted, new, old)) &&
 	    ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) ||
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-03-06  8:26 [PATCH] security/commoncap: don't assume "setid" if all ids are identical Max Kellermann
@ 2025-03-07 10:32 ` kernel test robot
  2025-03-09 15:19 ` Serge E. Hallyn
  2025-05-15 16:24 ` [PATCH] exec: Correct the permission check for unsafe exec Eric W. Biederman
  2 siblings, 0 replies; 44+ messages in thread
From: kernel test robot @ 2025-03-07 10:32 UTC (permalink / raw)
  To: Max Kellermann, serge, paul, jmorris, linux-security-module,
	linux-kernel
  Cc: oe-kbuild-all, Max Kellermann

Hi Max,

kernel test robot noticed the following build warnings:

[auto build test WARNING on pcmoore-selinux/next]
[also build test WARNING on linus/master v6.14-rc5 next-20250306]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Max-Kellermann/security-commoncap-don-t-assume-setid-if-all-ids-are-identical/20250306-162826
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git next
patch link:    https://lore.kernel.org/r/20250306082615.174777-1-max.kellermann%40ionos.com
patch subject: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
config: arc-allnoconfig (https://download.01.org/0day-ci/archive/20250307/202503071808.FE4vwUc4-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250307/202503071808.FE4vwUc4-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202503071808.FE4vwUc4-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> security/commoncap.c:865: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * Are all user/group ids in both cred instances identical?


vim +865 security/commoncap.c

   863	
   864	/**
 > 865	 * Are all user/group ids in both cred instances identical?
   866	 *
   867	 * It can be used after __is_setuid() / __is_setgid() to check whether
   868	 * this is really a set*id operation or whether both processes just
   869	 * have differing real/effective ids.  It is safe to keep differing
   870	 * real/effective ids in "unsafe" program execution.
   871	 */
   872	static bool has_identical_uids_gids(const struct cred *a, const struct cred *b)
   873	{
   874		return uid_eq(a->uid, b->uid) &&
   875			gid_eq(a->gid, b->gid) &&
   876			uid_eq(a->suid, b->suid) &&
   877			gid_eq(a->sgid, b->sgid) &&
   878			uid_eq(a->euid, b->euid) &&
   879			gid_eq(a->egid, b->egid) &&
   880			uid_eq(a->fsuid, b->fsuid) &&
   881			gid_eq(a->fsgid, b->fsgid);
   882	}
   883	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-03-06  8:26 [PATCH] security/commoncap: don't assume "setid" if all ids are identical Max Kellermann
  2025-03-07 10:32 ` kernel test robot
@ 2025-03-09 15:19 ` Serge E. Hallyn
  2025-04-28 11:43   ` Max Kellermann
  2025-05-15 16:24 ` [PATCH] exec: Correct the permission check for unsafe exec Eric W. Biederman
  2 siblings, 1 reply; 44+ messages in thread
From: Serge E. Hallyn @ 2025-03-09 15:19 UTC (permalink / raw)
  To: Max Kellermann, Andy Lutomirski
  Cc: serge, paul, jmorris, linux-security-module, linux-kernel

On Thu, Mar 06, 2025 at 09:26:15AM +0100, Max Kellermann wrote:
> If a program enables `NO_NEW_PRIVS` and sets up
> differing real/effective/saved/fs ids, the effective ids are
> downgraded during exec because the kernel believes it should "get no
> more than they had, and maybe less".
> 
> I believe it is safe to keep differing ids even if `NO_NEW_PRIVS` is
> set.  The newly executed program doesn't get any more, but there's no
> reason to give it less.
> 
> This is different from "set[ug]id/setpcap" execution where privileges
> may be raised; here, the assumption that it's "set[ug]id" if
> effective!=real is too broad.
> 
> If we verify that all user/group ids remain as they were, we can
> safely allow the new program to keep them.

Thanks, it's an interesting point.  Seems to mainly depend on what users
of the feature have come to expect.

Andy, what do you think?

> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> ---
>  security/commoncap.c | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 58a0c1c3e409..057a7400ef7d 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -861,6 +861,26 @@ static inline bool __is_setuid(struct cred *new, const struct cred *old)
>  static inline bool __is_setgid(struct cred *new, const struct cred *old)
>  { return !gid_eq(new->egid, old->gid); }
>  
> +/**
> + * Are all user/group ids in both cred instances identical?
> + *
> + * It can be used after __is_setuid() / __is_setgid() to check whether
> + * this is really a set*id operation or whether both processes just
> + * have differing real/effective ids.  It is safe to keep differing
> + * real/effective ids in "unsafe" program execution.
> + */
> +static bool has_identical_uids_gids(const struct cred *a, const struct cred *b)
> +{
> +	return uid_eq(a->uid, b->uid) &&
> +		gid_eq(a->gid, b->gid) &&
> +		uid_eq(a->suid, b->suid) &&
> +		gid_eq(a->sgid, b->sgid) &&
> +		uid_eq(a->euid, b->euid) &&
> +		gid_eq(a->egid, b->egid) &&
> +		uid_eq(a->fsuid, b->fsuid) &&
> +		gid_eq(a->fsgid, b->fsgid);
> +}
> +
>  /*
>   * 1) Audit candidate if current->cap_effective is set
>   *
> @@ -940,7 +960,8 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>  	 *
>  	 * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
>  	 */
> -	is_setid = __is_setuid(new, old) || __is_setgid(new, old);
> +	is_setid = (__is_setuid(new, old) || __is_setgid(new, old)) &&
> +		!has_identical_uids_gids(new, old);
>  
>  	if ((is_setid || __cap_gained(permitted, new, old)) &&
>  	    ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) ||
> -- 
> 2.47.2

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-03-09 15:19 ` Serge E. Hallyn
@ 2025-04-28 11:43   ` Max Kellermann
  2025-05-06 13:21     ` Serge E. Hallyn
  0 siblings, 1 reply; 44+ messages in thread
From: Max Kellermann @ 2025-04-28 11:43 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Andy Lutomirski, paul, jmorris, linux-security-module,
	linux-kernel

On Sun, Mar 9, 2025 at 4:19 PM Serge E. Hallyn <serge@hallyn.com> wrote:
>
> On Thu, Mar 06, 2025 at 09:26:15AM +0100, Max Kellermann wrote:
> > If a program enables `NO_NEW_PRIVS` and sets up
> > differing real/effective/saved/fs ids, the effective ids are
> > downgraded during exec because the kernel believes it should "get no
> > more than they had, and maybe less".
> >
> > I believe it is safe to keep differing ids even if `NO_NEW_PRIVS` is
> > set.  The newly executed program doesn't get any more, but there's no
> > reason to give it less.
> >
> > This is different from "set[ug]id/setpcap" execution where privileges
> > may be raised; here, the assumption that it's "set[ug]id" if
> > effective!=real is too broad.
> >
> > If we verify that all user/group ids remain as they were, we can
> > safely allow the new program to keep them.
>
> Thanks, it's an interesting point.  Seems to mainly depend on what users
> of the feature have come to expect.
>
> Andy, what do you think?

Serge & Andy, what's your opinion on my patch?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-04-28 11:43   ` Max Kellermann
@ 2025-05-06 13:21     ` Serge E. Hallyn
  2025-05-06 14:51       ` Max Kellermann
  0 siblings, 1 reply; 44+ messages in thread
From: Serge E. Hallyn @ 2025-05-06 13:21 UTC (permalink / raw)
  To: Max Kellermann
  Cc: Serge E. Hallyn, Andy Lutomirski, paul, jmorris, kees, morgan,
	linux-security-module, linux-kernel

On Mon, Apr 28, 2025 at 01:43:43PM +0200, Max Kellermann wrote:
> On Sun, Mar 9, 2025 at 4:19 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> >

Adding Kees and Andrew Morgan for their opinions.

Sorry, I had snipped the actual patch below.  Pre-b4 I would have appended it
here, but as you can just
   b4 mbox CAKPOu+_vTuZqsBLfRH+kyphiWAtRfWq=nKAcAYu=Wn2JBAkkYg@mail.gmail.com
I won't do so unless you ask me to.

> > On Thu, Mar 06, 2025 at 09:26:15AM +0100, Max Kellermann wrote:
> > > If a program enables `NO_NEW_PRIVS` and sets up
> > > differing real/effective/saved/fs ids, the effective ids are
> > > downgraded during exec because the kernel believes it should "get no
> > > more than they had, and maybe less".

Just to quibble here: I don't use NO_NEW_PRIVS, but it seems to me quite
likely that your claim is wrong here.  The whole SECBIT_KEEP_CAPS etc
dance is based on the idea that you understand that once you exec, you
lose some of your existing privilege.  Similarly, it seems quite
likely to me that people using NO_NEW_PRIVS understand, expect, and
count on the fact that their effective ids will be cleared on exec.

Note also that so far I'm only asking about the intent of the patch.
Apart from that, I do think the implementation is wrong, because you
are impacting non-NO_NEW_PRIVS behavior as well, such as calculation
of cap_permitted and the clearing of ambient capabilities.
And, I'm not sure the has_identical_uids_gids() is quite right, as I'm
not sure what the bprm->cred->fsuid and suid make sense, though the
process's fsuid and suid of course need to be checked.

> > > I believe it is safe to keep differing ids even if `NO_NEW_PRIVS` is
> > > set.  The newly executed program doesn't get any more, but there's no
> > > reason to give it less.
> > >
> > > This is different from "set[ug]id/setpcap" execution where privileges
> > > may be raised; here, the assumption that it's "set[ug]id" if
> > > effective!=real is too broad.
> > >
> > > If we verify that all user/group ids remain as they were, we can
> > > safely allow the new program to keep them.
> >
> > Thanks, it's an interesting point.  Seems to mainly depend on what users
> > of the feature have come to expect.
> >
> > Andy, what do you think?
> 
> Serge & Andy, what's your opinion on my patch?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-05-06 13:21     ` Serge E. Hallyn
@ 2025-05-06 14:51       ` Max Kellermann
  2025-05-07  3:16         ` Andrew G. Morgan
  2025-05-08 22:12         ` sergeh
  0 siblings, 2 replies; 44+ messages in thread
From: Max Kellermann @ 2025-05-06 14:51 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Andy Lutomirski, paul, jmorris, kees, morgan,
	linux-security-module, linux-kernel

On Tue, May 6, 2025 at 3:22 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> Just to quibble here: I don't use NO_NEW_PRIVS, but it seems to me quite
> likely that your claim is wrong here.  The whole SECBIT_KEEP_CAPS etc
> dance is based on the idea that you understand that once you exec, you
> lose some of your existing privilege.  Similarly, it seems quite
> likely to me that people using NO_NEW_PRIVS understand, expect, and
> count on the fact that their effective ids will be cleared on exec.

One could define NO_NEW_PRIVS that way, but that's not how it is documented.
Of course, we can't rule out that somewhere, somebody exists who
relies on the current behavior, and that we must preserve it for ABI
stability (I think this was your point). If you desire ABI stability,
then this behavior should really be documented.

To me, the current implementation looks weird and buggy (but that's
just my opinion). The code figures that that it's a set-id exec when
effective!=real, which is indeed how set-id execution looks like, but
still that check is slightly off:

1. it's really only set-id when new!=old; checking real!=effective is
conceptually the wrong angle
2. there may be other reasons why real!=effective

My patch is an attempt to fix this in an unintrusive way, by not
rewriting it but adding another check to rule out some special case.
If I were to rewrite this from scratch, I'd do it differently (only
compare new!=old), but I don't want to mess too much with security
code that I'm not very familiar with. I believe the guy who initially
wrote it made wrong assumptions, but maybe I'm just wrong, I'm not the
expert here.

> Note also that so far I'm only asking about the intent of the patch.

In a shared webhosting environment, we want to run an Apache (or
nginx) in each website's container. If the website owner does "chmod
600", the Apache should not be able to read the file; but PHP
processes spawned by the Apache should have full access. Therefore, we
run Apache with a different fsuid; when Apache executes PHP, the fsuid
is reverted.

But how to spawn Apache with a different fsuid? Not possible directly
(fsuid is always reverted on exec), but by giving it a different euid
(and ruid = website uid), granting it access to that secondary uid.
After exec, Apache swaps uids, sets effective=real=apache_uid, and
fsuid=website_uid.
That works fine, until we enable NO_NEW_PRIVS - which is surprising,
because we indeed don't want any new privs - just keep the existing
ones.
The documentation doesn't explain this behavior, and we don't want to
omit NO_NEW_PRIVS as a workaround.

> Apart from that, I do think the implementation is wrong, because you
> are impacting non-NO_NEW_PRIVS behavior as well, such as calculation
> of cap_permitted and the clearing of ambient capabilities.

You are right, it affects all three code blocks that are checking
"is_setid", but why do you believe it's wrong?
I can move the new check to the bottom, covering only the
"secureexec=1" line, if that worries you.

What sure is flawed is that my patch description fails to mention the
other two changes. Sorry for that, I'll amend the description (if/when
we agree that my patch is ok).

Though I do believe that all 3 changes are correct. Why would you want
to clear ambient capabilities just because real!=effective? The
manpage says: "Executing a program that changes UID or GID due to the
set-user-ID or set-group-ID bits or executing a program that has  any
file  capabilities set will clear the ambient set."

Documentation and code disagree! Currently, the kernel does not check
for "changes UID/GID", but whether effective!=real. These two are
orthogonal, the kernel is buggy, and my patch makes it a little bit
more correct (but does not remove the wrong real!=effective check, see
above).

> And, I'm not sure the has_identical_uids_gids() is quite right, as I'm
> not sure what the bprm->cred->fsuid and suid make sense, though the
> process's fsuid and suid of course need to be checked.

Sorry, I don't get that. What do you mean?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-05-06 14:51       ` Max Kellermann
@ 2025-05-07  3:16         ` Andrew G. Morgan
  2025-05-07  6:33           ` Max Kellermann
  2025-05-08 22:12         ` sergeh
  1 sibling, 1 reply; 44+ messages in thread
From: Andrew G. Morgan @ 2025-05-07  3:16 UTC (permalink / raw)
  To: Max Kellermann
  Cc: Serge E. Hallyn, Andy Lutomirski, paul, jmorris, kees,
	linux-security-module, linux-kernel

On Tue, May 6, 2025 at 7:51 AM Max Kellermann <max.kellermann@ionos.com> wrote:
> To me, the current implementation looks weird and buggy (but that's
> just my opinion). The code figures that that it's a set-id exec when
> effective!=real, which is indeed how set-id execution looks like, but
> still that check is slightly off:
>
> 1. it's really only set-id when new!=old; checking real!=effective is
> conceptually the wrong angle
> 2. there may be other reasons why real!=effective
>
> My patch is an attempt to fix this in an unintrusive way, by not
> rewriting it but adding another check to rule out some special case.
> If I were to rewrite this from scratch, I'd do it differently (only
> compare new!=old), but I don't want to mess too much with security
> code that I'm not very familiar with. I believe the guy who initially
> wrote it made wrong assumptions, but maybe I'm just wrong, I'm not the
> expert here.
>
> > Note also that so far I'm only asking about the intent of the patch.
>
> In a shared webhosting environment, we want to run an Apache (or
> nginx) in each website's container. If the website owner does "chmod
> 600", the Apache should not be able to read the file; but PHP
> processes spawned by the Apache should have full access. Therefore, we
> run Apache with a different fsuid; when Apache executes PHP, the fsuid
> is reverted.

[...]

> > Apart from that, I do think the implementation is wrong, because you
> > are impacting non-NO_NEW_PRIVS behavior as well, such as calculation
> > of cap_permitted and the clearing of ambient capabilities.
>
> You are right, it affects all three code blocks that are checking
> "is_setid", but why do you believe it's wrong?
> I can move the new check to the bottom, covering only the
> "secureexec=1" line, if that worries you.

If a setuid program execs itself, does the presence of this code undo
any protection the kernel afforded it on its first invocation? It
feels like changing that could easily turn out to be exploitable in
some context. Programs re-exec themselves all the time.

> > And, I'm not sure the has_identical_uids_gids() is quite right, as I'm
> > not sure what the bprm->cred->fsuid and suid make sense, though the
> > process's fsuid and suid of course need to be checked.
>
> Sorry, I don't get that. What do you mean?

I suspect Serge is noting that s*id and f*uid are forced to be the
corresponding e*id later in the code (indeed you mention this feature
above where I've put [...]), so comparing the values before that takes
effect isn't really comparing the right values in your added code.

FWIW I ran the libcap quicktest.sh script against your change and it
doesn't break any capability thing I test for when making libcap
releases. That being said, in general this whole zoo of *[ug]id values
and their subtle behavior is not casually obvious. I'd recommend using
a userspace workaround for your use case, and not changing the legacy
behavior of the kernel.

Cheers

Andrew

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-05-07  3:16         ` Andrew G. Morgan
@ 2025-05-07  6:33           ` Max Kellermann
  2025-05-08  3:32             ` Andrew G. Morgan
  2025-05-09 17:50             ` Max Kellermann
  0 siblings, 2 replies; 44+ messages in thread
From: Max Kellermann @ 2025-05-07  6:33 UTC (permalink / raw)
  To: Andrew G. Morgan
  Cc: Serge E. Hallyn, Andy Lutomirski, paul, jmorris, kees,
	linux-security-module, linux-kernel

On Wed, May 7, 2025 at 5:16 AM Andrew G. Morgan <morgan@kernel.org> wrote:
> If a setuid program execs itself, does the presence of this code undo
> any protection the kernel afforded it on its first invocation?

What protection do you mean, and what behavior do you expect when
setid execs itself? I see this affects:

1. reset effective ids to real ids (only affects NO_NEW_PRIVS)
2. new cap_permitted cannot be higher than old cap_permitted
3. clear cap_ambient
4. clear pdeath_signal (in begin_new_exec)
5. reset stack limits (in begin_new_exec)

About these (from my very limited knowledge of this part of the kernel):

1. is my primary goal, and really no new privs gained by allowing the
process to keep existing ids
2. only ever changes anything if new cap_permitted is higher, but if
that's the case, the is_setid check is irrelevant because __cap_gained
is true, therefore no change with my patch
3. as I already described, the kernel is wrong (or the documentation
is wrong), and my patch adjusts kernel to behave as documented
4. I don't see how this is dangerous for anything regarding re-exec;
if pdeath_signal wasn't reset on the first exec, it's safe to keep it
after the re-exec, too
5. same as 4, I think

Did I miss anything?

> FWIW I ran the libcap quicktest.sh script against your change and it
> doesn't break any capability thing I test for when making libcap
> releases.

Thanks for taking the time to run these tests. I'm glad the existing
tests didn't find any obvious bugs. If we identify an actual problem
with my patch, let's write a new test that fails with my patch!

> That being said, in general this whole zoo of *[ug]id values
> and their subtle behavior is not casually obvious. I'd recommend using
> a userspace workaround for your use case, and not changing the legacy
> behavior of the kernel.

What userspace workaround would be possible? The only thing that comes
to my mind is to apply NO_NEW_PRIVS in the child process after exec -
but that's too late, because arbitrary untrusted code has already been
executed at this point. This means I lose NO_NEW_PRIVS completely,
because the attacker can simply skip the call.

Max

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-05-07  6:33           ` Max Kellermann
@ 2025-05-08  3:32             ` Andrew G. Morgan
  2025-05-08  6:38               ` Max Kellermann
  2025-05-08  8:37               ` Max Kellermann
  2025-05-09 17:50             ` Max Kellermann
  1 sibling, 2 replies; 44+ messages in thread
From: Andrew G. Morgan @ 2025-05-08  3:32 UTC (permalink / raw)
  To: Max Kellermann
  Cc: Serge E. Hallyn, Andy Lutomirski, paul, jmorris, kees,
	linux-security-module, linux-kernel

I'm looking at this code:

        /* Check for privilege-elevated exec. */
        if (is_setid ||
            (!__is_real(root_uid, new) &&
             (effective ||
              __cap_grew(permitted, ambient, new))))
                bprm->secureexec = 1;

If a luser runs a setuid program, then the kernel will set this
bprm->secureexec bit. Indeed, every time this program re-exec's
itself, that bit will consistently be set. Today.

However, with your change, that behavior will change. The first time
the program is exec'd by luser this bit will be set. However, it will
"surprisingly" not occur should the program exec itself again. If you
are unaware of this bit's purpose there is a nice writeup here:

https://www.kernel.org/doc/html/v5.1/security/LSM.html

See the "bprm_set_creds" paragraph. My concern is that there is an
exploit vector associated with an abuser setting LD_LIBRARY_PATH= to
something nefarious, and then invoking a setuid program that happens
to re-exec itself for some reason. The first invocation will be as
before, but when the binary re-exec's itself, I am concerned that this
could cause the privileged binary to load an exploit.

This has nothing to do with your interest in NO_NEW_PRIV but more to
do with the legacy behavior changes like this are exposed to.

Cheers

Andrew

On Tue, May 6, 2025 at 11:33 PM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> On Wed, May 7, 2025 at 5:16 AM Andrew G. Morgan <morgan@kernel.org> wrote:
> > If a setuid program execs itself, does the presence of this code undo
> > any protection the kernel afforded it on its first invocation?
>
> What protection do you mean, and what behavior do you expect when
> setid execs itself? I see this affects:
>
> 1. reset effective ids to real ids (only affects NO_NEW_PRIVS)
> 2. new cap_permitted cannot be higher than old cap_permitted
> 3. clear cap_ambient
> 4. clear pdeath_signal (in begin_new_exec)
> 5. reset stack limits (in begin_new_exec)
>
> About these (from my very limited knowledge of this part of the kernel):
>
> 1. is my primary goal, and really no new privs gained by allowing the
> process to keep existing ids
> 2. only ever changes anything if new cap_permitted is higher, but if
> that's the case, the is_setid check is irrelevant because __cap_gained
> is true, therefore no change with my patch
> 3. as I already described, the kernel is wrong (or the documentation
> is wrong), and my patch adjusts kernel to behave as documented
> 4. I don't see how this is dangerous for anything regarding re-exec;
> if pdeath_signal wasn't reset on the first exec, it's safe to keep it
> after the re-exec, too
> 5. same as 4, I think
>
> Did I miss anything?
>
> > FWIW I ran the libcap quicktest.sh script against your change and it
> > doesn't break any capability thing I test for when making libcap
> > releases.
>
> Thanks for taking the time to run these tests. I'm glad the existing
> tests didn't find any obvious bugs. If we identify an actual problem
> with my patch, let's write a new test that fails with my patch!
>
> > That being said, in general this whole zoo of *[ug]id values
> > and their subtle behavior is not casually obvious. I'd recommend using
> > a userspace workaround for your use case, and not changing the legacy
> > behavior of the kernel.
>
> What userspace workaround would be possible? The only thing that comes
> to my mind is to apply NO_NEW_PRIVS in the child process after exec -
> but that's too late, because arbitrary untrusted code has already been
> executed at this point. This means I lose NO_NEW_PRIVS completely,
> because the attacker can simply skip the call.
>
> Max

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-05-08  3:32             ` Andrew G. Morgan
@ 2025-05-08  6:38               ` Max Kellermann
  2025-05-08  8:37               ` Max Kellermann
  1 sibling, 0 replies; 44+ messages in thread
From: Max Kellermann @ 2025-05-08  6:38 UTC (permalink / raw)
  To: Andrew G. Morgan
  Cc: Serge E. Hallyn, Andy Lutomirski, paul, jmorris, kees,
	linux-security-module, linux-kernel

On Thu, May 8, 2025 at 5:32 AM Andrew G. Morgan <morgan@kernel.org> wrote:
> If a luser runs a setuid program, then the kernel will set this
> bprm->secureexec bit. Indeed, every time this program re-exec's
> itself, that bit will consistently be set. Today.
>
> However, with your change, that behavior will change. The first time
> the program is exec'd by luser this bit will be set. However, it will
> "surprisingly" not occur should the program exec itself again.

I had covered this case in my previous email already. This flag is
only used by begin_new_exec(), and the only consequence is resetting
pdeath_signal (point 4) and stack limit reset (point 5). I thought
both are no deal at all for the second exec - or are they? I don't
know.

> If you are unaware of this bit's purpose there is a nice writeup here:
>
> https://www.kernel.org/doc/html/v5.1/security/LSM.html

I don't see this document describing the purpose anywhere. It only
states that bprm_set_creds should set it "if a secure exec has
happened" but doesn't say what a "secure exec" is, and doesn't say
what is supposed to happen when it's set.

Is it really a "secureexec" if executing a non-suid program while
having effective!=real? (Currently: true, my patch: false)
Is it really a "secureexec" if executing a suid program but no actual
uid/gid changes occurred (no-op) because they're the same as before
(and no capabilities raised)? (Currently: true, my patch: false)

> See the "bprm_set_creds" paragraph. My concern is that there is an
> exploit vector associated with an abuser setting LD_LIBRARY_PATH= to
> something nefarious, and then invoking a setuid program that happens
> to re-exec itself for some reason. The first invocation will be as
> before, but when the binary re-exec's itself, I am concerned that this
> could cause the privileged binary to load an exploit.

How would resetting pdeath_signal and stack limit affect this problem?

LD_LIBRARY_PATH is an environment variable that's used by the
userspace linker, not by the kernel. Userspace doesn't have access to
the "secureexec" flag. The usual protection against
LD_LIBRARY_PATH/LD_PRELOAD attacks is that the glibc ld.so ignores (or
removes?) these when it observes real!=effective. That check still
works with my patch, and has nothing to do with the "secureexec" flag,
does it?

> This has nothing to do with your interest in NO_NEW_PRIV but more to
> do with the legacy behavior changes like this are exposed to.

Yes, I understand your worries, and I can limit my patch to the
NO_NEW_PRIVS case, if you prefer that. But I think it is worth
exploring the legacy behavior and see if there is really a problem
(i.e. if there is really a userspace-visible effect), and if that
legacy behavior really makes sense. To me, it seems like the legacy
code doesn't make much sense; the checks for "secureexec" are wrong
currently, and maybe we can fix this without adding more complexity,
but by cutting unnecessary complexity away. Simpler code that is
easier to understand is less likely to have vulnerabilities.

Max

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-05-08  3:32             ` Andrew G. Morgan
  2025-05-08  6:38               ` Max Kellermann
@ 2025-05-08  8:37               ` Max Kellermann
  1 sibling, 0 replies; 44+ messages in thread
From: Max Kellermann @ 2025-05-08  8:37 UTC (permalink / raw)
  To: Andrew G. Morgan
  Cc: Serge E. Hallyn, Andy Lutomirski, paul, jmorris, kees,
	linux-security-module, linux-kernel

On Thu, May 8, 2025 at 5:32 AM Andrew G. Morgan <morgan@kernel.org> wrote:
> See the "bprm_set_creds" paragraph. My concern is that there is an
> exploit vector associated with an abuser setting LD_LIBRARY_PATH= to
> something nefarious, and then invoking a setuid program that happens
> to re-exec itself for some reason. The first invocation will be as
> before, but when the binary re-exec's itself, I am concerned that this
> could cause the privileged binary to load an exploit.

Let's talk about this potential vulnerability again. Consider the
following code:

 #include <stdio.h>
 #include <sys/prctl.h>
 #include <unistd.h>
 int main(int argc, char **argv) {
   printf("ruid=%d euid=%d\n", getuid(), geteuid());
   if (argc > 1) {
     printf("setting NO_NEW_PRIVS\n");
     prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
   }
   if (**argv) {
     printf("re-exec\n");
     execl(*argv, "", NULL);
     perror("exec");
   }
 }

Without my patch:

 $ /reexec
 ruid=1000 euid=0
 re-exec
 ruid=1000 euid=0
 $ /reexec 1
 ruid=1000 euid=0
 setting NO_NEW_PRIVS
 re-exec
 ruid=1000 euid=1000

Without NO_NEW_PRIVS, the re-exec keeps the real/effective, but also
gains setuid again, but the suid is no-op; the euid is already 0.
With NO_NEW_PRIVS, the re-exec drops the euid=0, and doesn't regain it
- the setuid bit is ignored.

With my patch:

 $ /reexec
 ruid=1000 euid=0
 re-exec
 ruid=1000 euid=0
 $ /reexec 1
 ruid=1000 euid=0
 setting NO_NEW_PRIVS
 re-exec
 ruid=1000 euid=0

Same without NO_NEW_PRIVS (but internally, the re-exec is not
considered "secureexec" because the setuid bit is effectively a
no-op).
With NO_NEW_PRIVS, the setuid bit is ignored, but the process is
allowed to keep the euid=0 - which is the whole point of my patch.
Indeed, no new privs are gained - just like NO_NEW_PRIVS is
documented!

Back to your LD_LIBRARY_PATH example: with my patch, glibc ignores
LD_LIBRARY_PATH because effective!==real (still).

But without my patch, with the vanilla kernel, glibc does use
LD_LIBRARY_PATH (effective==real because effective was reset) and
you're actually vulnerable! It seems to be quite opposite of what you
have been assuming?
(Don't get confused that the kernel has reverted the euid to 1000;
this is a privileged superuser process with a full set of
capabilities!)

Am I missing something obvious, or is NO_NEW_PRIVS+reexec is a serious
Linux kernel vulnerability? (And it is fixed by my patch)

Max

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-05-06 14:51       ` Max Kellermann
  2025-05-07  3:16         ` Andrew G. Morgan
@ 2025-05-08 22:12         ` sergeh
  2025-05-09  6:15           ` Max Kellermann
  1 sibling, 1 reply; 44+ messages in thread
From: sergeh @ 2025-05-08 22:12 UTC (permalink / raw)
  To: Max Kellermann
  Cc: Serge E. Hallyn, Andy Lutomirski, paul, jmorris, kees, morgan,
	linux-security-module, linux-kernel, Eric Biederman

On Tue, May 06, 2025 at 04:51:39PM +0200, Max Kellermann wrote:
> On Tue, May 6, 2025 at 3:22 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> > Just to quibble here: I don't use NO_NEW_PRIVS, but it seems to me quite
> > likely that your claim is wrong here.  The whole SECBIT_KEEP_CAPS etc
> > dance is based on the idea that you understand that once you exec, you
> > lose some of your existing privilege.  Similarly, it seems quite
> > likely to me that people using NO_NEW_PRIVS understand, expect, and
> > count on the fact that their effective ids will be cleared on exec.
> 
> One could define NO_NEW_PRIVS that way, but that's not how it is documented.
> Of course, we can't rule out that somewhere, somebody exists who
> relies on the current behavior, and that we must preserve it for ABI
> stability (I think this was your point). If you desire ABI stability,
> then this behavior should really be documented.

ABI stability is about the most important thing to Linus, so yes, if
documentation and code disagree, then we should fix the documentation,
except in the case where the current behavior just really is wrong
or insecure.

> To me, the current implementation looks weird and buggy (but that's
> just my opinion). The code figures that that it's a set-id exec when
> effective!=real, which is indeed how set-id execution looks like, but
> still that check is slightly off:

Here's my current reading:

The bprm->cred is initially set from current's, with suid and fsuid set
to euid.  So new->euid will be same as old->euid.  Then
bprm_creds_from_file() will call bprm_fill_uid() right before
security_bprm_creds_from_file().  bprm_fill_uid() will set new->euid to
the file's i_uid - only if the setuid bit is set AND only if not
task_no_new_privs(current).  Meaning that for NNP tasks it never sets
new->euid to the file's i_uid.

I think the summary is that I don't object to your patch per se (except
the ambient creds part, which I'll reply to later - oh and possibly the
potential for capability dropping, also for later), but your
terminology.  setuid and setgid mean something very specific: a file
which, when any user executes it, causes execution to happen under the
file's owner credentials.  And the behavior you are changing has nothing
to do with that.  What you are changing is that a NNP process with
different ruid and euid will continue to run, after exec, with those
previous ruid and euid, whether or not the file is setXid.

> 1. it's really only set-id when new!=old; checking real!=effective is
> conceptually the wrong angle
> 2. there may be other reasons why real!=effective
> 
> My patch is an attempt to fix this in an unintrusive way, by not
> rewriting it but adding another check to rule out some special case.
> If I were to rewrite this from scratch, I'd do it differently (only
> compare new!=old), but I don't want to mess too much with security
> code that I'm not very familiar with. I believe the guy who initially
> wrote it made wrong assumptions, but maybe I'm just wrong, I'm not the
> expert here.
> 
> > Note also that so far I'm only asking about the intent of the patch.
> 
> In a shared webhosting environment, we want to run an Apache (or
> nginx) in each website's container. If the website owner does "chmod
> 600", the Apache should not be able to read the file; but PHP
> processes spawned by the Apache should have full access. Therefore, we
> run Apache with a different fsuid; when Apache executes PHP, the fsuid
> is reverted.
> 
> But how to spawn Apache with a different fsuid? Not possible directly
> (fsuid is always reverted on exec), but by giving it a different euid
> (and ruid = website uid), granting it access to that secondary uid.
> After exec, Apache swaps uids, sets effective=real=apache_uid, and
> fsuid=website_uid.
> That works fine, until we enable NO_NEW_PRIVS - which is surprising,
> because we indeed don't want any new privs - just keep the existing
> ones.
> The documentation doesn't explain this behavior, and we don't want to
> omit NO_NEW_PRIVS as a workaround.
> 
> > Apart from that, I do think the implementation is wrong, because you
> > are impacting non-NO_NEW_PRIVS behavior as well, such as calculation
> > of cap_permitted and the clearing of ambient capabilities.
> 
> You are right, it affects all three code blocks that are checking
> "is_setid", but why do you believe it's wrong?
> I can move the new check to the bottom, covering only the
> "secureexec=1" line, if that worries you.
> 
> What sure is flawed is that my patch description fails to mention the
> other two changes. Sorry for that, I'll amend the description (if/when
> we agree that my patch is ok).
> 
> Though I do believe that all 3 changes are correct. Why would you want
> to clear ambient capabilities just because real!=effective? The
> manpage says: "Executing a program that changes UID or GID due to the
> set-user-ID or set-group-ID bits or executing a program that has  any
> file  capabilities set will clear the ambient set."
> 
> Documentation and code disagree! Currently, the kernel does not check
> for "changes UID/GID", but whether effective!=real. These two are
> orthogonal, the kernel is buggy, and my patch makes it a little bit
> more correct (but does not remove the wrong real!=effective check, see
> above).
> 
> > And, I'm not sure the has_identical_uids_gids() is quite right, as I'm
> > not sure what the bprm->cred->fsuid and suid make sense, though the
> > process's fsuid and suid of course need to be checked.
> 
> Sorry, I don't get that. What do you mean?

I meant I hadn't yet delved back into the location of bprm_fill_uid()
etc, and I know that code has moved around a bit in the last few years.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-05-08 22:12         ` sergeh
@ 2025-05-09  6:15           ` Max Kellermann
  2025-05-09 14:44             ` Eric W. Biederman
  0 siblings, 1 reply; 44+ messages in thread
From: Max Kellermann @ 2025-05-09  6:15 UTC (permalink / raw)
  To: sergeh
  Cc: Serge E. Hallyn, Andy Lutomirski, paul, jmorris, kees, morgan,
	linux-security-module, linux-kernel, Eric Biederman

On Fri, May 9, 2025 at 12:12 AM <sergeh@kernel.org> wrote:
> ABI stability is about the most important thing to Linus, so yes, if
> documentation and code disagree, then we should fix the documentation,
> except in the case where the current behavior just really is wrong
> or insecure.

It is insecure indeed (can be abused for LD_PRELOAD
attacks):https://lore.kernel.org/lkml/CAKPOu+8+1uVrDJHwmHJd2d46-N6AwjR4_bbtoSJS+sx6J=rkjg@mail.gmail.com/

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-05-09  6:15           ` Max Kellermann
@ 2025-05-09 14:44             ` Eric W. Biederman
  2025-05-09 16:53               ` Max Kellermann
  2025-05-09 18:41               ` [PATCH] Documentation/no_new_privs.rst: document dropping effective ids Max Kellermann
  0 siblings, 2 replies; 44+ messages in thread
From: Eric W. Biederman @ 2025-05-09 14:44 UTC (permalink / raw)
  To: Max Kellermann
  Cc: sergeh, Serge E. Hallyn, Andy Lutomirski, paul, jmorris, kees,
	morgan, linux-security-module, linux-kernel

Max Kellermann <max.kellermann@ionos.com> writes:

> On Fri, May 9, 2025 at 12:12 AM <sergeh@kernel.org> wrote:
>> ABI stability is about the most important thing to Linus, so yes, if
>> documentation and code disagree, then we should fix the documentation,
>> except in the case where the current behavior just really is wrong
>> or insecure.
>
> It is insecure indeed (can be abused for LD_PRELOAD
> attacks):https://lore.kernel.org/lkml/CAKPOu+8+1uVrDJHwmHJd2d46-N6AwjR4_bbtoSJS+sx6J=rkjg@mail.gmail.com/

I don't understand what you are trying to solve,
but the patch at the top of the thread introduces a
has_identical_uids_gids and is pure nonsense.

In particular __is_setuid or __is_setgid being true guarantees
that has_identical_uids_gids will be false.

Which means has_identical_uids_gids adds nothing, and the patch is
pointless.

If your concern is LD_PRELOAD and the like please don't play with
the uids/gids and instead just make certain bprm->secureexec gets
set.

At this point I am pretty certain that changing the logic and leaving
extra uids/gids set will result in security vulnerabilities for someone
who actually depends upon how the code works today.  I see no evidence
in this conversation that anyone has surveyed the users of NO_NEW_PRIVS
and verified how anyone actually uses it.  Without such evidence we
have to assume that userspace depends upon the current behavior.

Eric

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-05-09 14:44             ` Eric W. Biederman
@ 2025-05-09 16:53               ` Max Kellermann
  2025-05-09 20:17                 ` Serge E. Hallyn
  2025-05-09 18:41               ` [PATCH] Documentation/no_new_privs.rst: document dropping effective ids Max Kellermann
  1 sibling, 1 reply; 44+ messages in thread
From: Max Kellermann @ 2025-05-09 16:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: sergeh, Serge E. Hallyn, Andy Lutomirski, paul, jmorris, kees,
	morgan, linux-security-module, linux-kernel

On Fri, May 9, 2025 at 4:45 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> In particular __is_setuid or __is_setgid being true guarantees
> that has_identical_uids_gids will be false.

Sorry, no, that's completely wrong!

__is_setXid() compares effective with real.
has_identical_uids_gids() compares effective with effective, real with real etc.

See the difference?

> Which means has_identical_uids_gids adds nothing, and the patch is
> pointless.

Also wrong. If that were correct, then my patch would not have an
observable effect. But it does. Try it, try the small program I
posted!

It seems your whole email is based on this misunderstanding. Please reconsider.

> If your concern is LD_PRELOAD and the like please don't play with
> the uids/gids and instead just make certain bprm->secureexec gets
> set.

LD_PRELOAD is not my concern at all. I just observed that the current
kernel behavior can annul the LD_PRELOAD/suid protection as
implemented in glibc.

> I see no evidence
> in this conversation that anyone has surveyed the users of NO_NEW_PRIVS
> and verified how anyone actually uses it.  Without such evidence we
> have to assume that userspace depends upon the current behavior.

That's fine for me. But this behavior should be documented, because it
is rather surprising.

(In any case, we will keep the patch in our kernel fork because we
need this part of the kernel to work properly. Our machines don't run
any code that depends on the buggy behavior.)

Max

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-05-07  6:33           ` Max Kellermann
  2025-05-08  3:32             ` Andrew G. Morgan
@ 2025-05-09 17:50             ` Max Kellermann
  1 sibling, 0 replies; 44+ messages in thread
From: Max Kellermann @ 2025-05-09 17:50 UTC (permalink / raw)
  To: Andrew G. Morgan
  Cc: Serge E. Hallyn, Andy Lutomirski, paul, jmorris, kees,
	linux-security-module, linux-kernel

On Wed, May 7, 2025 at 8:33 AM Max Kellermann <max.kellermann@ionos.com> wrote:
> What protection do you mean, and what behavior do you expect when
> setid execs itself? I see this affects:
>
> 1. reset effective ids to real ids (only affects NO_NEW_PRIVS)
> 2. new cap_permitted cannot be higher than old cap_permitted
> 3. clear cap_ambient
> 4. clear pdeath_signal (in begin_new_exec)
> 5. reset stack limits (in begin_new_exec)
>
[...]
>
> Did I miss anything?

Indeed I missed something (but this was apparently so hard to find
that nobody could answer my question, until I found out myself).
The "secureexec" flag is not just used for resetting pdeath_signal and
stack limits; its primary purpose is to set the AT_SECURE ELF flag.

So yes, my patch is wrong. The "secureexec" setting must be excluded
from my patch.

Max

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH] Documentation/no_new_privs.rst: document dropping effective ids
  2025-05-09 14:44             ` Eric W. Biederman
  2025-05-09 16:53               ` Max Kellermann
@ 2025-05-09 18:41               ` Max Kellermann
  1 sibling, 0 replies; 44+ messages in thread
From: Max Kellermann @ 2025-05-09 18:41 UTC (permalink / raw)
  To: ebiederm, serge, paul, jmorris, linux-security-module,
	linux-kernel, linux-doc
  Cc: Max Kellermann

Usually, execve() preserves the effective ids.  Many programs rely on
this to detect setuid/setgid execution and will disable certain
features (such as rejecting certain user input / environment
variables).

However, if NO_NEW_PRIVS is set, effective ids are always reset by
cap_bprm_creds_from_file(), but capabilities are not revoked.  That
means the process looks like it's not setuid/setgid, but has full
capabilities, and is effectively a superuser process.  This breaks
userspace assumptions.

It was argued [1] that this surprising behavior must not change
because programs might rely on it:

Of course, this leaves many programs vulnerable, but if we decide the
behavior must remain, we should at least document it with a warning.

[1] https://lore.kernel.org/lkml/87h61t7siv.fsf@email.froward.int.ebiederm.org/

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 Documentation/userspace-api/no_new_privs.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/userspace-api/no_new_privs.rst b/Documentation/userspace-api/no_new_privs.rst
index d060ea217ea1..89b0884991e9 100644
--- a/Documentation/userspace-api/no_new_privs.rst
+++ b/Documentation/userspace-api/no_new_privs.rst
@@ -29,6 +29,12 @@ bits will no longer change the uid or gid; file capabilities will not
 add to the permitted set, and LSMs will not relax constraints after
 execve.

+A successful execve call with ``no_new_privs`` will reset the
+effective uid/gid to the real uid/gid, but does not drop capabilities.
+This means that comparing effective and real ids is not a valid method
+to detect setuid/setgid execution; the proper way to do that is
+getauxval(AT_SECURE).
+
 To set ``no_new_privs``, use::

     prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
-- 
2.47.2

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
  2025-05-09 16:53               ` Max Kellermann
@ 2025-05-09 20:17                 ` Serge E. Hallyn
  0 siblings, 0 replies; 44+ messages in thread
From: Serge E. Hallyn @ 2025-05-09 20:17 UTC (permalink / raw)
  To: Max Kellermann
  Cc: Eric W. Biederman, sergeh, Serge E. Hallyn, Andy Lutomirski, paul,
	jmorris, kees, morgan, linux-security-module, linux-kernel

On Fri, May 09, 2025 at 06:53:11PM +0200, Max Kellermann wrote:
> On Fri, May 9, 2025 at 4:45 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > In particular __is_setuid or __is_setgid being true guarantees
> > that has_identical_uids_gids will be false.
> 
> Sorry, no, that's completely wrong!
> 
> __is_setXid() compares effective with real.
> has_identical_uids_gids() compares effective with effective, real with real etc.
> 
> See the difference?
> 
> > Which means has_identical_uids_gids adds nothing, and the patch is
> > pointless.
> 
> Also wrong. If that were correct, then my patch would not have an
> observable effect. But it does. Try it, try the small program I
> posted!
> 
> It seems your whole email is based on this misunderstanding. Please reconsider.
> 
> > If your concern is LD_PRELOAD and the like please don't play with
> > the uids/gids and instead just make certain bprm->secureexec gets
> > set.
> 
> LD_PRELOAD is not my concern at all. I just observed that the current

Right, it is an aside, though an important one.

> kernel behavior can annul the LD_PRELOAD/suid protection as
> implemented in glibc.

Hm, but no, it doesn't annul glibc's protection, right?

The concern is that:
a. musl doesn't implement LD_PRELOAD clearing
b. with NNP, setuid-exec followed by setting NNP followed by exec,
   will lead to different behavior from non-NNP.  In non-NNP,
   you'll continue to have euid=0, ruid=1000, and caps.  With NNP,
   you'll have euid=ruid=1000, and still full caps.

So, if someone is using NNP believing that it is a safe way to
execute untrusted code with privilege, because they see they
are now uid 1000, they will be confused.  Worse, they are
subject with musl to LD_PRELOAD from the user before setuid-root.

So two things we can do are

1. have NNP drop privilege.
2. have NNP not force euid to be ruid.

(1) sort of makes sense since you've bothered to use NNP, but
as Max, who is a user of NNP, says, that is not the behavior
that would be useful to him.  It also leaves the non-NNP and
NNP exec behavior - which is already - obviously - far too
complicated - with yet more cases.

(2) is concerning because it is a change in behavior for NNP
users, but on the other hand, it leaves us with fewer special
cases.

At this point I'm kind of leaning towards (2), though with the
obvious modification Max has already found should be added (for
secureexec).

> > I see no evidence
> > in this conversation that anyone has surveyed the users of NO_NEW_PRIVS
> > and verified how anyone actually uses it.  Without such evidence we

Max is such a user.  I don't know what we can do to get input from
more users.  Perhaps scan the debian codebase results at

  https://codesearch.debian.net/search?q=NO_NEW_PRIVS&literal=1

I'll take a look through those in a bit.

> > have to assume that userspace depends upon the current behavior.
> 
> That's fine for me. But this behavior should be documented, because it
> is rather surprising.
> 
> (In any case, we will keep the patch in our kernel fork because we
> need this part of the kernel to work properly. Our machines don't run
> any code that depends on the buggy behavior.)
> 
> Max

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] security/commoncap: don't assume "setid" if all ids are identical
       [not found]                   ` <202505121725.34FC22D2D@keescook>
@ 2025-05-13  6:25                     ` Max Kellermann
  0 siblings, 0 replies; 44+ messages in thread
From: Max Kellermann @ 2025-05-13  6:25 UTC (permalink / raw)
  To: Kees Cook
  Cc: Serge E. Hallyn, Andy Lutomirski, paul, jmorris, morgan,
	Eric Biederman, LKML

On Tue, May 13, 2025 at 3:24 AM Kees Cook <kees@kernel.org> wrote:
>
> On Sat, May 10, 2025 at 02:29:19PM -0500, Serge E. Hallyn wrote:
> > Kees, do you have any input?
>
> I'd like to move this back to the list. I don't think there is anything
> secret here?

I don't think there ever was anything secret, and I don't understand
why it was ever taken off-list. My initial post was public and it
already contained everything.
(My reply will go back to LKML and a full quote of your email to give
context to those who were excluded from this private thread. I assume
you're fine with that?)

>
> Regardless, I apologize for being so late to the discussion. Last week
> was weird. Anyway, random thoughts from the thread and refreshing myself
> on the various code flows:
>
> - musl doesn't clear LD_PRELOAD under AT_SECURE: Ew. That's a bug, IMO.

Why would it be a bug? If a program doesn't use or care for
LD_PRELOAD, why would it need to clear environment variables that may
or may be used by other arbitrary programs? Is every program
responsible for clearing sensitive environment variables for every
other program on the planet? (I'm stretching this for the sake of the
argument, but I hope you get my basic idea.)

> - We care about non-ELF, yes, but currently all realistic paths lead back
>   to ELF (e.g. creds are calculated from the interpreter and not
>   the script for binfmt_script).
>
> - "classic elevated privilege check" in userspace is just wrong. It's
>   been insufficient to check for euid!=uid for decades. Looking at
>   AT_SECURE is, afaict, sufficient, though.

We agree that AT_SECURE has been the canonical Linux-specific way for
this check for at least 20 years.

Yes, everybody should have an #ifdef __linux__ with this check instead
of geteuid()!=getuid(). But that's not the reality we're living in.
Did you know that not even util-linux and e2fsprogs get this right?
Look at:

- https://github.com/util-linux/util-linux/pull/3566
- https://github.com/tytso/e2fsprogs/pull/226

(I posted patches to a bunch of projects to get userspace fixed.
Others that don't get it right include: cups, libevent, Firefox, Qt,
Chromium, GnuPG, krb5, Postfix, Mono, iptables and many many others.)

The reality is that knowledge of the existence of AT_SECURE has
propagated only to a tiny bubble, and the majority of people have
never heard of it.

That alone would not justify fixing the NO_NEW_PRIVS behavior; but it
looks like the NO_NEW_PRIVS behavior was initially an undesired side
effect of a broken setuid check in the kernel. What the kernel does
here is not documented, and neither does it make sense. This is really
why I argue that the kernel should be fixed. If it doesn't make sense
AND breaks userspace, there is really only one way to fix it. But
that's just my opinion.

> - Linux has a kind of messy separation of logic:
>         - fs/exec.c is looking at DAC, but sets up checks for ptrace,
>           nnp, shared fs
>         - security/commoncap.c is evaluating DAC, caps, and ptrace.
>
> - in the kernel, having euid != uid is _defined_ as being setuid.
>   Whether or not this is "more privileged" isn't part of the definition.
>
> - NNP applies only in two place:
>         - stopping gain of euid/egid from setuid/setgid of a file (same
>           as nosuid mount option) in bprm_fill_uid().
>         - unconditionally removing euid when lacking CAP_SETUID,
>           in cap_bprm_creds_from_file() (same as shared fs, ptracing).
>           (The explicit check for NNP is not redundant here: it's forcing
>           NNP even with CAP_SETUID, where as the other LSM_UNSAFE flags
>           allow CAP_SETUID to bypass them.)
>
> I've read through the thread and I'm convinced the original patch should
> not be applied (it changes the definition of setuid and setgid). Having
> the _purpose_ of the patch explain in the commit log next time would be
> good, but that came out in the thread later: there is a removal of euid
> under the described construction of apache/php/etc.
>
> In CAKPOu+8+1uVrDJHwmHJd2d46-N6AwjR4_bbtoSJS+sx6J=rkjg@mail.gmail.com:
> > Without my patch:
> >
> >  $ /reexec
> >  ruid=1000 euid=0
>
> I'm so confused. Is this "reexec" marked setuid=0?
>
> >  re-exec
> >  ruid=1000 euid=0
> >  $ /reexec 1
> >  ruid=1000 euid=0
> >  setting NO_NEW_PRIVS
> >  re-exec
> >  ruid=1000 euid=1000
>
> I assume so, since this is the behavior I'd expect.
>
> > Without NO_NEW_PRIVS, the re-exec keeps the real/effective, but also
> > gains setuid again, but the suid is no-op; the euid is already 0.
> > With NO_NEW_PRIVS, the re-exec drops the euid=0, and doesn't regain it
> > - the setuid bit is ignored.
> >
> > With my patch:
> >
> >  $ /reexec
> >  ruid=1000 euid=0
> >  re-exec
> >  ruid=1000 euid=0
> >  $ /reexec 1
> >  ruid=1000 euid=0
> >  setting NO_NEW_PRIVS
> >  re-exec
> >  ruid=1000 euid=0
>
> I see how this looks inconsistent, but it is correct: any LSM_UNSAFE_*
> flag will trigger the stripping of euid. It could be argued it should
> strip _more_, IMO.

One can define NO_NEW_PRIVS this way (though I don't agree that it
should be), but that is not how it is documented currently. And the
name NO_NEW_PRIVS doesn't even remotely imply this behavior; you argue
for taking away privileges, but the name says no new privileges shall
be granted; keeping existing privileges is that.

For better NO_NEW_PRIVS documentation, please consider merging my
patch: https://lore.kernel.org/lkml/20250509184105.840928-1-max.kellermann@ionos.com/

In any case, I will keep my "ids identical" patch (with "secureexec"
bug fix) in our private kernel fork, because we need this behavior,
one way or another, but there is only this way. Thanks to everybody
who participated in the discussion - I really learned something here,
and that helped me understand the "secureexec" mistake in my patch.

Max

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH] exec: Correct the permission check for unsafe exec
  2025-03-06  8:26 [PATCH] security/commoncap: don't assume "setid" if all ids are identical Max Kellermann
  2025-03-07 10:32 ` kernel test robot
  2025-03-09 15:19 ` Serge E. Hallyn
@ 2025-05-15 16:24 ` Eric W. Biederman
  2025-05-15 22:09   ` Kees Cook
  2025-05-16 21:49   ` sergeh
  2 siblings, 2 replies; 44+ messages in thread
From: Eric W. Biederman @ 2025-05-15 16:24 UTC (permalink / raw)
  To: Kees Cook; +Cc: Max Kellermann, Serge E. Hallyn


Max Kellerman recently experienced a problem[1] when calling exec with
differing uid and euid's and he triggered the logic that is supposed
to only handle setuid executables.

When exec isn't changing anything in struct cred it doesn't make sense
to go into the code that is there to handle the case when the
credentials change.

When looking into the history of the code I discovered that this issue
was not present in Linux-2.4.0-test12 and was introduced in
Linux-2.4.0-prerelease when the logic for handling this case was moved
from prepare_binprm to compute_creds in fs/exec.c.

The bug introduced was to comparing euid in the new credentials with
uid instead of euid in the old credentials, when testing if setuid
had changed the euid.

Since triggering the keep ptrace limping along case for setuid
executables makes no sense when it was not a setuid exec revert back
to the logic present in Linux-2.4.0-test12.

This removes the confusingly named and subtlety incorrect helpers
is_setuid and is_setgid, that helped this bug to persist.

The variable is_setid is renamed to id_changed (it's Linux-2.4.0-test12
name) as the old name describes matters rather than it's cause.

The code removed in Linux-2.4.0-prerelease was:
-       /* Set-uid? */
-       if (mode & S_ISUID) {
-               bprm->e_uid = inode->i_uid;
-               if (bprm->e_uid != current->euid)
-                       id_change = 1;
-       }
-
-       /* Set-gid? */
-       /*
-        * If setgid is set but no group execute bit then this
-        * is a candidate for mandatory locking, not a setgid
-        * executable.
-        */
-       if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
-               bprm->e_gid = inode->i_gid;
-               if (!in_group_p(bprm->e_gid))
-                       id_change = 1;

Linux-2.4.0-prerelease added the current logic as:
+       if (bprm->e_uid != current->uid || bprm->e_gid != current->gid ||
+           !cap_issubset(new_permitted, current->cap_permitted)) {
+                current->dumpable = 0;
+
+               lock_kernel();
+               if (must_not_trace_exec(current)
+                   || atomic_read(&current->fs->count) > 1
+                   || atomic_read(&current->files->count) > 1
+                   || atomic_read(&current->sig->count) > 1) {
+                       if(!capable(CAP_SETUID)) {
+                               bprm->e_uid = current->uid;
+                               bprm->e_gid = current->gid;
+                       }
+                       if(!capable(CAP_SETPCAP)) {
+                               new_permitted = cap_intersect(new_permitted,
+                                                       current->cap_permitted);
+                       }
+               }
+               do_unlock = 1;
+       }

I have condensed the logic from Linux-2.4.0-test12 to just:
	id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);


This change is userspace visible, but I don't expect anyone to care.

For the bug that is being fixed to trigger bprm->unsafe has to be set.
The variable bprm->unsafe is set when ptracing an executable, when
sharing a working directory, or when no_new_privs is set.  Properly
testing for cases that are safe even in those conditions and doing
nothing special should not affect anyone.  Especially if they were
previously ok with their credentials getting munged

Reported-by: Max Kellermann <max.kellermann@ionos.com>
Fixes: 64444d3d0d7f ("Linux version 2.4.0-prerelease")
[1] https://lkml.kernel.org/r/20250306082615.174777-1-max.kellermann@ionos.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 security/commoncap.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index 28d4248bf001..96c7654f2012 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -856,12 +856,6 @@ static void handle_privileged_root(struct linux_binprm *bprm, bool has_fcap,
 #define __cap_full(field, cred) \
 	cap_issubset(CAP_FULL_SET, cred->cap_##field)
 
-static inline bool __is_setuid(struct cred *new, const struct cred *old)
-{ return !uid_eq(new->euid, old->uid); }
-
-static inline bool __is_setgid(struct cred *new, const struct cred *old)
-{ return !gid_eq(new->egid, old->gid); }
-
 /*
  * 1) Audit candidate if current->cap_effective is set
  *
@@ -891,7 +885,7 @@ static inline bool nonroot_raised_pE(struct cred *new, const struct cred *old,
 	    (root_privileged() &&
 	     __is_suid(root, new) &&
 	     !__cap_full(effective, new)) ||
-	    (!__is_setuid(new, old) &&
+	    (uid_eq(new->euid, old->euid) &&
 	     ((has_fcap &&
 	       __cap_gained(permitted, new, old)) ||
 	      __cap_gained(ambient, new, old))))
@@ -917,7 +911,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
 	/* Process setpcap binaries and capabilities for uid 0 */
 	const struct cred *old = current_cred();
 	struct cred *new = bprm->cred;
-	bool effective = false, has_fcap = false, is_setid;
+	bool effective = false, has_fcap = false, id_changed;
 	int ret;
 	kuid_t root_uid;
 
@@ -941,9 +935,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
 	 *
 	 * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
 	 */
-	is_setid = __is_setuid(new, old) || __is_setgid(new, old);
+	id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
 
-	if ((is_setid || __cap_gained(permitted, new, old)) &&
+	if ((id_changed || __cap_gained(permitted, new, old)) &&
 	    ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) ||
 	     !ptracer_capable(current, new->user_ns))) {
 		/* downgrade; they get no more than they had, and maybe less */
@@ -960,7 +954,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
 	new->sgid = new->fsgid = new->egid;
 
 	/* File caps or setid cancels ambient. */
-	if (has_fcap || is_setid)
+	if (has_fcap || id_changed)
 		cap_clear(new->cap_ambient);
 
 	/*
@@ -993,7 +987,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
 		return -EPERM;
 
 	/* Check for privilege-elevated exec. */
-	if (is_setid ||
+	if (id_changed ||
 	    (!__is_real(root_uid, new) &&
 	     (effective ||
 	      __cap_grew(permitted, ambient, new))))
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH] exec: Correct the permission check for unsafe exec
  2025-05-15 16:24 ` [PATCH] exec: Correct the permission check for unsafe exec Eric W. Biederman
@ 2025-05-15 22:09   ` Kees Cook
  2025-05-16 15:26     ` Eric W. Biederman
  2025-05-16 21:48     ` [PATCH] " sergeh
  2025-05-16 21:49   ` sergeh
  1 sibling, 2 replies; 44+ messages in thread
From: Kees Cook @ 2025-05-15 22:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Max Kellermann, Serge E. Hallyn, paul, jmorris, Andy Lutomirski,
	morgan, Christian Brauner, Jann Horn, linux-security-module,
	linux-kernel

On Thu, May 15, 2025 at 11:24:47AM -0500, Eric W. Biederman wrote:
> I have condensed the logic from Linux-2.4.0-test12 to just:
> 	id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> 
> This change is userspace visible, but I don't expect anyone to care.
> [...]
> -static inline bool __is_setuid(struct cred *new, const struct cred *old)
> -{ return !uid_eq(new->euid, old->uid); }
> -
> -static inline bool __is_setgid(struct cred *new, const struct cred *old)
> -{ return !gid_eq(new->egid, old->gid); }
> -
> [...]
> -	is_setid = __is_setuid(new, old) || __is_setgid(new, old);
> +	id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);

The core change here is testing for differing euid rather than
mismatched uid/euid. (And checking for egid in the set of all groups.)

Imagined situations:

- setuid process is sharing fs. We already believe this is a non-issue,
  as Jann pointed out about Chrome's order of operations, so so changes
  here are likely fine.

- somehow ptracing a process with uid!=euid, and it tries to exec a
  different setuid==euid ELF. Is switching ELF images a security
  boundary? Probably not realistically.

- setuid process sets NNP and execs a setuid==euid ELF, expecting to
  have euid stripped. That doesn't happen any more. This is the most
  worrisome case, but a program like that should _really_ have dropped
  euid first if it is depending on that behavior. Hmmm. Probably some
  code searching is needed...

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] exec: Correct the permission check for unsafe exec
  2025-05-15 22:09   ` Kees Cook
@ 2025-05-16 15:26     ` Eric W. Biederman
  2025-05-16 18:06       ` Jann Horn
  2025-05-16 21:48     ` [PATCH] " sergeh
  1 sibling, 1 reply; 44+ messages in thread
From: Eric W. Biederman @ 2025-05-16 15:26 UTC (permalink / raw)
  To: Kees Cook
  Cc: Max Kellermann, Serge E. Hallyn, paul, jmorris, Andy Lutomirski,
	morgan, Christian Brauner, Jann Horn, linux-security-module,
	linux-kernel

Kees Cook <kees@kernel.org> writes:

> On Thu, May 15, 2025 at 11:24:47AM -0500, Eric W. Biederman wrote:
>> I have condensed the logic from Linux-2.4.0-test12 to just:
>> 	id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
>> 
>> This change is userspace visible, but I don't expect anyone to care.
>> [...]
>> -static inline bool __is_setuid(struct cred *new, const struct cred *old)
>> -{ return !uid_eq(new->euid, old->uid); }
>> -
>> -static inline bool __is_setgid(struct cred *new, const struct cred *old)
>> -{ return !gid_eq(new->egid, old->gid); }
>> -
>> [...]
>> -	is_setid = __is_setuid(new, old) || __is_setgid(new, old);
>> +	id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
>
> The core change here is testing for differing euid rather than
> mismatched uid/euid. (And checking for egid in the set of all groups.)

Yes.

For what the code is trying to do I can't fathom what was trying to
be accomplished by the "mismatched" uid/euid check.

> Imagined situations:
>
> - setuid process is sharing fs. We already believe this is a non-issue,
>   as Jann pointed out about Chrome's order of operations, so so changes
>   here are likely fine.

Yes, nothing has changed from a security standpoint.

> - somehow ptracing a process with uid!=euid, and it tries to exec a
>   different setuid==euid ELF. Is switching ELF images a security
>   boundary? Probably not realistically.

The concern with tracing is can the tracer gain more privileges from
the traced application.  If there is no switch of euid or egid the
answer is no.

In fact what we do is actively bad in the ptrace case as it makes
debugging unnecessarily change the behavior of an application.

> - setuid process sets NNP and execs a setuid==euid ELF, expecting to
>   have euid stripped. That doesn't happen any more. This is the most
>   worrisome case, but a program like that should _really_ have dropped
>   euid first if it is depending on that behavior. Hmmm. Probably some
>   code searching is needed...

That is a fair question.  Has some no-new-privs using application
that has uid != euid when it calls exec developed a dependency on
how we the code sets euid == uid when we call exec today.

That is exactly the case that Max Kellermann has a problem with,
so we have at least one no-new-privs user that wants the fixed behavior.

In addition to code searching we could first change the behavior for
everything except ptrace to just return -EPERM.  That should flush out
most of the users of this case if we miss some one.

Given that I have only seen a justification for limping along during
ptrace from Linus (and it was a long time ago) I think we would be
better off just making it fail, and then we don't have to worry
about userspace depending upon this weird case in future maintenance.

Eric

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] exec: Correct the permission check for unsafe exec
  2025-05-16 15:26     ` Eric W. Biederman
@ 2025-05-16 18:06       ` Jann Horn
  2025-05-16 18:08         ` Jann Horn
                           ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Jann Horn @ 2025-05-16 18:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Kees Cook, Max Kellermann, Serge E. Hallyn, paul, jmorris,
	Andy Lutomirski, morgan, Christian Brauner, linux-security-module,
	linux-kernel

On Fri, May 16, 2025 at 5:26 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> Kees Cook <kees@kernel.org> writes:
>
> > On Thu, May 15, 2025 at 11:24:47AM -0500, Eric W. Biederman wrote:
> >> I have condensed the logic from Linux-2.4.0-test12 to just:
> >>      id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> >>
> >> This change is userspace visible, but I don't expect anyone to care.
> >> [...]
> >> -static inline bool __is_setuid(struct cred *new, const struct cred *old)
> >> -{ return !uid_eq(new->euid, old->uid); }
> >> -
> >> -static inline bool __is_setgid(struct cred *new, const struct cred *old)
> >> -{ return !gid_eq(new->egid, old->gid); }
> >> -
> >> [...]
> >> -    is_setid = __is_setuid(new, old) || __is_setgid(new, old);
> >> +    id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> >
> > The core change here is testing for differing euid rather than
> > mismatched uid/euid. (And checking for egid in the set of all groups.)
>
> Yes.
>
> For what the code is trying to do I can't fathom what was trying to
> be accomplished by the "mismatched" uid/euid check.

I remember that when I was looking at this code years ago, one case I
was interested in was what happens when a setuid process (running with
something like euid=1000,ruid=0) execve()'s a normal binary. Clearly
the LSM_UNSAFE_* stuff is not so interesting there, because we're
already coming from a privileged context; but the behavior of
bprm->secureexec could be important.

Like, I think currently a setuid binary like this is probably (?) not
exploitable:

int main(void) {
  execl("/bin/echo", "echo", "hello world");
}

but after your proposed change, I think it might (?) become
exploitable because "echo" would not have AT_SECURE set (I think?) and
would therefore load libraries based on environment variables?

To be clear, I think this would be a stupid thing for userspace to do
- a setuid binary just should not be running other binaries with the
caller-provided environment while having elevated privileges. But if
userspace was doing something like that, this change might make it
more exploitable, and I imagine that the check for mismatched uid/euid
was intended to catch cases like this?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] exec: Correct the permission check for unsafe exec
  2025-05-16 18:06       ` Jann Horn
@ 2025-05-16 18:08         ` Jann Horn
  2025-05-16 21:46         ` sergeh
  2025-05-16 23:29         ` Eric W. Biederman
  2 siblings, 0 replies; 44+ messages in thread
From: Jann Horn @ 2025-05-16 18:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Kees Cook, Max Kellermann, Serge E. Hallyn, paul, jmorris,
	Andy Lutomirski, morgan, Christian Brauner, linux-security-module,
	linux-kernel

On Fri, May 16, 2025 at 8:06 PM Jann Horn <jannh@google.com> wrote:
> Like, I think currently a setuid binary like this is probably (?) not
> exploitable:
>
> int main(void) {
>   execl("/bin/echo", "echo", "hello world");
> }

(bleh, of course what I meant to write here was
`execl("/bin/echo", "echo", "hello world", NULL);`)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] exec: Correct the permission check for unsafe exec
  2025-05-16 18:06       ` Jann Horn
  2025-05-16 18:08         ` Jann Horn
@ 2025-05-16 21:46         ` sergeh
  2025-05-20 22:38           ` Jann Horn
  2025-05-16 23:29         ` Eric W. Biederman
  2 siblings, 1 reply; 44+ messages in thread
From: sergeh @ 2025-05-16 21:46 UTC (permalink / raw)
  To: Jann Horn
  Cc: Eric W. Biederman, Kees Cook, Max Kellermann, Serge E. Hallyn,
	paul, jmorris, Andy Lutomirski, morgan, Christian Brauner,
	linux-security-module, linux-kernel

On Fri, May 16, 2025 at 08:06:15PM +0200, Jann Horn wrote:
> On Fri, May 16, 2025 at 5:26 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > Kees Cook <kees@kernel.org> writes:
> >
> > > On Thu, May 15, 2025 at 11:24:47AM -0500, Eric W. Biederman wrote:
> > >> I have condensed the logic from Linux-2.4.0-test12 to just:
> > >>      id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> > >>
> > >> This change is userspace visible, but I don't expect anyone to care.
> > >> [...]
> > >> -static inline bool __is_setuid(struct cred *new, const struct cred *old)
> > >> -{ return !uid_eq(new->euid, old->uid); }
> > >> -
> > >> -static inline bool __is_setgid(struct cred *new, const struct cred *old)
> > >> -{ return !gid_eq(new->egid, old->gid); }
> > >> -
> > >> [...]
> > >> -    is_setid = __is_setuid(new, old) || __is_setgid(new, old);
> > >> +    id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> > >
> > > The core change here is testing for differing euid rather than
> > > mismatched uid/euid. (And checking for egid in the set of all groups.)
> >
> > Yes.
> >
> > For what the code is trying to do I can't fathom what was trying to
> > be accomplished by the "mismatched" uid/euid check.
> 
> I remember that when I was looking at this code years ago, one case I
> was interested in was what happens when a setuid process (running with
> something like euid=1000,ruid=0) execve()'s a normal binary. Clearly
> the LSM_UNSAFE_* stuff is not so interesting there, because we're
> already coming from a privileged context; but the behavior of
> bprm->secureexec could be important.
> 
> Like, I think currently a setuid binary like this is probably (?) not
> exploitable:
> 
> int main(void) {
>   execl("/bin/echo", "echo", "hello world");
> }
> 
> but after your proposed change, I think it might (?) become
> exploitable because "echo" would not have AT_SECURE set (I think?) and
> would therefore load libraries based on environment variables?
> 
> To be clear, I think this would be a stupid thing for userspace to do
> - a setuid binary just should not be running other binaries with the
> caller-provided environment while having elevated privileges. But if
> userspace was doing something like that, this change might make it
> more exploitable, and I imagine that the check for mismatched uid/euid
> was intended to catch cases like this?

If the original process became privileged by executing a setuid-root
file (and uses glibc), then LD_PRELOAD will already have been cleared,
right?  So it would either have to add the unsafe entries back to
LD_PRELOAD again, or it has to have been root all along, not a
setuid-root program.  I think at that point we have to say this is what
it intended, and possibly with good reason.

-serge

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] exec: Correct the permission check for unsafe exec
  2025-05-15 22:09   ` Kees Cook
  2025-05-16 15:26     ` Eric W. Biederman
@ 2025-05-16 21:48     ` sergeh
  1 sibling, 0 replies; 44+ messages in thread
From: sergeh @ 2025-05-16 21:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: Eric W. Biederman, Max Kellermann, Serge E. Hallyn, paul, jmorris,
	Andy Lutomirski, morgan, Christian Brauner, Jann Horn,
	linux-security-module, linux-kernel

On Thu, May 15, 2025 at 03:09:48PM -0700, Kees Cook wrote:
> On Thu, May 15, 2025 at 11:24:47AM -0500, Eric W. Biederman wrote:
> > I have condensed the logic from Linux-2.4.0-test12 to just:
> > 	id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> > 
> > This change is userspace visible, but I don't expect anyone to care.
> > [...]
> > -static inline bool __is_setuid(struct cred *new, const struct cred *old)
> > -{ return !uid_eq(new->euid, old->uid); }
> > -
> > -static inline bool __is_setgid(struct cred *new, const struct cred *old)
> > -{ return !gid_eq(new->egid, old->gid); }
> > -
> > [...]
> > -	is_setid = __is_setuid(new, old) || __is_setgid(new, old);
> > +	id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> 
> The core change here is testing for differing euid rather than
> mismatched uid/euid. (And checking for egid in the set of all groups.)
> 
> Imagined situations:
> 
> - setuid process is sharing fs. We already believe this is a non-issue,
>   as Jann pointed out about Chrome's order of operations, so so changes
>   here are likely fine.
> 
> - somehow ptracing a process with uid!=euid, and it tries to exec a
>   different setuid==euid ELF. Is switching ELF images a security
>   boundary? Probably not realistically.
> 
> - setuid process sets NNP and execs a setuid==euid ELF, expecting to
>   have euid stripped. That doesn't happen any more. This is the most

Yeah, that had been my first concern: that nnp users will have learned
about and therefore started depending upon this behavior.  But right
now while euid gets stripped, capabilities do not, so the current
situation is actually far more unsafe.

>   worrisome case, but a program like that should _really_ have dropped
>   euid first if it is depending on that behavior. Hmmm. Probably some
>   code searching is needed...

I had started looking through the debian code search, but didn't finish.
Mostly saw util-linux and systemd...

> -Kees
> 
> -- 
> Kees Cook

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] exec: Correct the permission check for unsafe exec
  2025-05-15 16:24 ` [PATCH] exec: Correct the permission check for unsafe exec Eric W. Biederman
  2025-05-15 22:09   ` Kees Cook
@ 2025-05-16 21:49   ` sergeh
  1 sibling, 0 replies; 44+ messages in thread
From: sergeh @ 2025-05-16 21:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Kees Cook, Max Kellermann, Serge E. Hallyn, paul, jmorris,
	Andy Lutomirski, morgan, Christian Brauner, linux-security-module,
	linux-kernel

On Thu, May 15, 2025 at 11:24:47AM -0500, Eric W. Biederman wrote:
> 
> Max Kellerman recently experienced a problem[1] when calling exec with
> differing uid and euid's and he triggered the logic that is supposed
> to only handle setuid executables.
> 
> When exec isn't changing anything in struct cred it doesn't make sense
> to go into the code that is there to handle the case when the
> credentials change.
> 
> When looking into the history of the code I discovered that this issue
> was not present in Linux-2.4.0-test12 and was introduced in
> Linux-2.4.0-prerelease when the logic for handling this case was moved
> from prepare_binprm to compute_creds in fs/exec.c.
> 
> The bug introduced was to comparing euid in the new credentials with
> uid instead of euid in the old credentials, when testing if setuid
> had changed the euid.
> 
> Since triggering the keep ptrace limping along case for setuid
> executables makes no sense when it was not a setuid exec revert back
> to the logic present in Linux-2.4.0-test12.
> 
> This removes the confusingly named and subtlety incorrect helpers
> is_setuid and is_setgid, that helped this bug to persist.
> 
> The variable is_setid is renamed to id_changed (it's Linux-2.4.0-test12
> name) as the old name describes matters rather than it's cause.
> 
> The code removed in Linux-2.4.0-prerelease was:
> -       /* Set-uid? */
> -       if (mode & S_ISUID) {
> -               bprm->e_uid = inode->i_uid;
> -               if (bprm->e_uid != current->euid)
> -                       id_change = 1;
> -       }
> -
> -       /* Set-gid? */
> -       /*
> -        * If setgid is set but no group execute bit then this
> -        * is a candidate for mandatory locking, not a setgid
> -        * executable.
> -        */
> -       if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> -               bprm->e_gid = inode->i_gid;
> -               if (!in_group_p(bprm->e_gid))
> -                       id_change = 1;
> 
> Linux-2.4.0-prerelease added the current logic as:
> +       if (bprm->e_uid != current->uid || bprm->e_gid != current->gid ||
> +           !cap_issubset(new_permitted, current->cap_permitted)) {
> +                current->dumpable = 0;
> +
> +               lock_kernel();
> +               if (must_not_trace_exec(current)
> +                   || atomic_read(&current->fs->count) > 1
> +                   || atomic_read(&current->files->count) > 1
> +                   || atomic_read(&current->sig->count) > 1) {
> +                       if(!capable(CAP_SETUID)) {
> +                               bprm->e_uid = current->uid;
> +                               bprm->e_gid = current->gid;
> +                       }
> +                       if(!capable(CAP_SETPCAP)) {
> +                               new_permitted = cap_intersect(new_permitted,
> +                                                       current->cap_permitted);
> +                       }
> +               }
> +               do_unlock = 1;
> +       }
> 
> I have condensed the logic from Linux-2.4.0-test12 to just:
> 	id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> 
> 
> This change is userspace visible, but I don't expect anyone to care.
> 
> For the bug that is being fixed to trigger bprm->unsafe has to be set.
> The variable bprm->unsafe is set when ptracing an executable, when
> sharing a working directory, or when no_new_privs is set.  Properly
> testing for cases that are safe even in those conditions and doing
> nothing special should not affect anyone.  Especially if they were
> previously ok with their credentials getting munged
> 
> Reported-by: Max Kellermann <max.kellermann@ionos.com>
> Fixes: 64444d3d0d7f ("Linux version 2.4.0-prerelease")
> [1] https://lkml.kernel.org/r/20250306082615.174777-1-max.kellermann@ionos.com
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Thanks, Eric.

Obviously we should continue to discuss Jann's concern and the
potential change in behavior, but I'm hugely in favor of this patch.

Reviewed-by: Serge Hallyn <serge@hallyn.com>

> ---
>  security/commoncap.c | 18 ++++++------------
>  1 file changed, 6 insertions(+), 12 deletions(-)
> 
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 28d4248bf001..96c7654f2012 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -856,12 +856,6 @@ static void handle_privileged_root(struct linux_binprm *bprm, bool has_fcap,
>  #define __cap_full(field, cred) \
>  	cap_issubset(CAP_FULL_SET, cred->cap_##field)
>  
> -static inline bool __is_setuid(struct cred *new, const struct cred *old)
> -{ return !uid_eq(new->euid, old->uid); }
> -
> -static inline bool __is_setgid(struct cred *new, const struct cred *old)
> -{ return !gid_eq(new->egid, old->gid); }
> -
>  /*
>   * 1) Audit candidate if current->cap_effective is set
>   *
> @@ -891,7 +885,7 @@ static inline bool nonroot_raised_pE(struct cred *new, const struct cred *old,
>  	    (root_privileged() &&
>  	     __is_suid(root, new) &&
>  	     !__cap_full(effective, new)) ||
> -	    (!__is_setuid(new, old) &&
> +	    (uid_eq(new->euid, old->euid) &&
>  	     ((has_fcap &&
>  	       __cap_gained(permitted, new, old)) ||
>  	      __cap_gained(ambient, new, old))))
> @@ -917,7 +911,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>  	/* Process setpcap binaries and capabilities for uid 0 */
>  	const struct cred *old = current_cred();
>  	struct cred *new = bprm->cred;
> -	bool effective = false, has_fcap = false, is_setid;
> +	bool effective = false, has_fcap = false, id_changed;
>  	int ret;
>  	kuid_t root_uid;
>  
> @@ -941,9 +935,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>  	 *
>  	 * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
>  	 */
> -	is_setid = __is_setuid(new, old) || __is_setgid(new, old);
> +	id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
>  
> -	if ((is_setid || __cap_gained(permitted, new, old)) &&
> +	if ((id_changed || __cap_gained(permitted, new, old)) &&
>  	    ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) ||
>  	     !ptracer_capable(current, new->user_ns))) {
>  		/* downgrade; they get no more than they had, and maybe less */
> @@ -960,7 +954,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>  	new->sgid = new->fsgid = new->egid;
>  
>  	/* File caps or setid cancels ambient. */
> -	if (has_fcap || is_setid)
> +	if (has_fcap || id_changed)
>  		cap_clear(new->cap_ambient);
>  
>  	/*
> @@ -993,7 +987,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>  		return -EPERM;
>  
>  	/* Check for privilege-elevated exec. */
> -	if (is_setid ||
> +	if (id_changed ||
>  	    (!__is_real(root_uid, new) &&
>  	     (effective ||
>  	      __cap_grew(permitted, ambient, new))))
> -- 
> 2.41.0
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] exec: Correct the permission check for unsafe exec
  2025-05-16 18:06       ` Jann Horn
  2025-05-16 18:08         ` Jann Horn
  2025-05-16 21:46         ` sergeh
@ 2025-05-16 23:29         ` Eric W. Biederman
  2025-05-20 20:20           ` Kees Cook
  2 siblings, 1 reply; 44+ messages in thread
From: Eric W. Biederman @ 2025-05-16 23:29 UTC (permalink / raw)
  To: Jann Horn
  Cc: Kees Cook, Max Kellermann, Serge E. Hallyn, paul, jmorris,
	Andy Lutomirski, morgan, Christian Brauner, linux-security-module,
	linux-kernel

Jann Horn <jannh@google.com> writes:

> On Fri, May 16, 2025 at 5:26 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Kees Cook <kees@kernel.org> writes:
>>
>> > On Thu, May 15, 2025 at 11:24:47AM -0500, Eric W. Biederman wrote:
>> >> I have condensed the logic from Linux-2.4.0-test12 to just:
>> >>      id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
>> >>
>> >> This change is userspace visible, but I don't expect anyone to care.
>> >> [...]
>> >> -static inline bool __is_setuid(struct cred *new, const struct cred *old)
>> >> -{ return !uid_eq(new->euid, old->uid); }
>> >> -
>> >> -static inline bool __is_setgid(struct cred *new, const struct cred *old)
>> >> -{ return !gid_eq(new->egid, old->gid); }
>> >> -
>> >> [...]
>> >> -    is_setid = __is_setuid(new, old) || __is_setgid(new, old);
>> >> +    id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
>> >
>> > The core change here is testing for differing euid rather than
>> > mismatched uid/euid. (And checking for egid in the set of all groups.)
>>
>> Yes.
>>
>> For what the code is trying to do I can't fathom what was trying to
>> be accomplished by the "mismatched" uid/euid check.
>
> I remember that when I was looking at this code years ago, one case I
> was interested in was what happens when a setuid process (running with
> something like euid=1000,ruid=0) execve()'s a normal binary. Clearly
> the LSM_UNSAFE_* stuff is not so interesting there, because we're
> already coming from a privileged context; but the behavior of
> bprm->secureexec could be important.
>
> Like, I think currently a setuid binary like this is probably (?) not
> exploitable:
>
> int main(void) {
>   execl("/bin/echo", "echo", "hello world");
> }
>
> but after your proposed change, I think it might (?) become
> exploitable because "echo" would not have AT_SECURE set (I think?) and
> would therefore load libraries based on environment variables?

Yes.  bprm->secureexec controls AT_SECURE.

I am fine if we want to set secureexec and AT_SECURE in this situation.
It is a bit odd, but I don't see a problem with that.

> To be clear, I think this would be a stupid thing for userspace to do
> - a setuid binary just should not be running other binaries with the
> caller-provided environment while having elevated privileges. But if
> userspace was doing something like that, this change might make it
> more exploitable, and I imagine that the check for mismatched uid/euid
> was intended to catch cases like this?

The patch that made the change doesn't show up on lore.kernel.org so I
believe any record of the rational is lost.

To me it looks like someone was right up against the deadline to get
their code change into 2.4.0, was used to uid != euid in userspace, and
talked Linus into a last minute merge before 2.4.0 was released.

Eric

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] exec: Correct the permission check for unsafe exec
  2025-05-16 23:29         ` Eric W. Biederman
@ 2025-05-20 20:20           ` Kees Cook
  2025-05-20 22:13             ` [PATCH v2] " Eric W. Biederman
  0 siblings, 1 reply; 44+ messages in thread
From: Kees Cook @ 2025-05-20 20:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jann Horn, Max Kellermann, Serge E. Hallyn, paul, jmorris,
	Andy Lutomirski, morgan, Christian Brauner, linux-security-module,
	linux-kernel

On Fri, May 16, 2025 at 06:29:21PM -0500, Eric W. Biederman wrote:
> Jann Horn <jannh@google.com> writes:
> 
> > On Fri, May 16, 2025 at 5:26 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> Kees Cook <kees@kernel.org> writes:
> >>
> >> > On Thu, May 15, 2025 at 11:24:47AM -0500, Eric W. Biederman wrote:
> >> >> I have condensed the logic from Linux-2.4.0-test12 to just:
> >> >>      id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> >> >>
> >> >> This change is userspace visible, but I don't expect anyone to care.
> >> >> [...]
> >> >> -static inline bool __is_setuid(struct cred *new, const struct cred *old)
> >> >> -{ return !uid_eq(new->euid, old->uid); }
> >> >> -
> >> >> -static inline bool __is_setgid(struct cred *new, const struct cred *old)
> >> >> -{ return !gid_eq(new->egid, old->gid); }
> >> >> -
> >> >> [...]
> >> >> -    is_setid = __is_setuid(new, old) || __is_setgid(new, old);
> >> >> +    id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> >> >
> >> > The core change here is testing for differing euid rather than
> >> > mismatched uid/euid. (And checking for egid in the set of all groups.)
> >>
> >> Yes.
> >>
> >> For what the code is trying to do I can't fathom what was trying to
> >> be accomplished by the "mismatched" uid/euid check.
> >
> > I remember that when I was looking at this code years ago, one case I
> > was interested in was what happens when a setuid process (running with
> > something like euid=1000,ruid=0) execve()'s a normal binary. Clearly
> > the LSM_UNSAFE_* stuff is not so interesting there, because we're
> > already coming from a privileged context; but the behavior of
> > bprm->secureexec could be important.
> >
> > Like, I think currently a setuid binary like this is probably (?) not
> > exploitable:
> >
> > int main(void) {
> >   execl("/bin/echo", "echo", "hello world");
> > }
> >
> > but after your proposed change, I think it might (?) become
> > exploitable because "echo" would not have AT_SECURE set (I think?) and
> > would therefore load libraries based on environment variables?
> 
> Yes.  bprm->secureexec controls AT_SECURE.
> 
> I am fine if we want to set secureexec and AT_SECURE in this situation.
> It is a bit odd, but I don't see a problem with that.

So the idea would be that uid/euid mismatch would still induce AT_SECURE?

That seems reasonable.

I was already waiting for the after the coming merge window to put this
into -next, so if you cant update it for the AT_SECURE logic, we can
give that a try and see what we get.

Thanks!

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2] exec: Correct the permission check for unsafe exec
  2025-05-20 20:20           ` Kees Cook
@ 2025-05-20 22:13             ` Eric W. Biederman
  2025-05-20 22:35               ` Kees Cook
  2025-05-20 23:53               ` Jann Horn
  0 siblings, 2 replies; 44+ messages in thread
From: Eric W. Biederman @ 2025-05-20 22:13 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jann Horn, Max Kellermann, Serge E. Hallyn, paul, jmorris,
	Andy Lutomirski, morgan, Christian Brauner, linux-security-module,
	linux-kernel


Max Kellerman recently experienced a problem[1] when calling exec with
differing uid and euid's and he triggered the logic that is supposed
to only handle setuid executables.

When exec isn't changing anything in struct cred it doesn't make sense
to go into the code that is there to handle the case when the
credentials change.

When looking into the history of the code I discovered that this issue
was not present in Linux-2.4.0-test12 and was introduced in
Linux-2.4.0-prerelease when the logic for handling this case was moved
from prepare_binprm to compute_creds in fs/exec.c.

The bug introdused was to comparing euid in the new credentials with
uid instead of euid in the old credentials, when testing if setuid
had changed the euid.

Since triggering the keep ptrace limping along case for setuid
executables makes no sense when it was not a setuid exec revert back
to the logic present in Linux-2.4.0-test12.

This removes the confusingly named and subtlety incorrect helpers
is_setuid and is_setgid, that helped this bug to persist.

The varaiable is_setid is renamed to id_changed (it's Linux-2.4.0-test12)
as the old name describes what matters rather than it's cause.

The code removed in Linux-2.4.0-prerelease was:
-       /* Set-uid? */
-       if (mode & S_ISUID) {
-               bprm->e_uid = inode->i_uid;
-               if (bprm->e_uid != current->euid)
-                       id_change = 1;
-       }
-
-       /* Set-gid? */
-       /*
-        * If setgid is set but no group execute bit then this
-        * is a candidate for mandatory locking, not a setgid
-        * executable.
-        */
-       if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
-               bprm->e_gid = inode->i_gid;
-               if (!in_group_p(bprm->e_gid))
-                       id_change = 1;

Linux-2.4.0-prerelease added the current logic as:
+       if (bprm->e_uid != current->uid || bprm->e_gid != current->gid ||
+           !cap_issubset(new_permitted, current->cap_permitted)) {
+                current->dumpable = 0;
+
+               lock_kernel();
+               if (must_not_trace_exec(current)
+                   || atomic_read(&current->fs->count) > 1
+                   || atomic_read(&current->files->count) > 1
+                   || atomic_read(&current->sig->count) > 1) {
+                       if(!capable(CAP_SETUID)) {
+                               bprm->e_uid = current->uid;
+                               bprm->e_gid = current->gid;
+                       }
+                       if(!capable(CAP_SETPCAP)) {
+                               new_permitted = cap_intersect(new_permitted,
+                                                       current->cap_permitted);
+                       }
+               }
+               do_unlock = 1;
+       }

I have condenced the logic from Linux-2.4.0-test12 to just:
	id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);

This change is userspace visible, but I don't expect anyone to care.

For the bug that is being fixed to trigger bprm->unsafe has to be set.
The variable bprm->unsafe is set when ptracing an executable, when
sharing a working directory, or when no_new_privs is set.  Properly
testing for cases that are safe even in those conditions and doing
nothing special should not affect anyone.  Especially if they were
previously ok with their credentials getting munged

To minimize behavioural changes the code continues to set secureexec
when euid != uid or when egid != gid.

Reported-by: Max Kellermann <max.kellermann@ionos.com>
Fixes: 64444d3d0d7f ("Linux version 2.4.0-prerelease")
[1] https://lkml.kernel.org/r/20250306082615.174777-1-max.kellermann@ionos.com
v1: https://lkml.kernel.org/r/878qmxsuy8.fsf@email.froward.int.ebiederm.org
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 security/commoncap.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index 28d4248bf001..6bd4adeb4795 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -856,12 +856,6 @@ static void handle_privileged_root(struct linux_binprm *bprm, bool has_fcap,
 #define __cap_full(field, cred) \
 	cap_issubset(CAP_FULL_SET, cred->cap_##field)
 
-static inline bool __is_setuid(struct cred *new, const struct cred *old)
-{ return !uid_eq(new->euid, old->uid); }
-
-static inline bool __is_setgid(struct cred *new, const struct cred *old)
-{ return !gid_eq(new->egid, old->gid); }
-
 /*
  * 1) Audit candidate if current->cap_effective is set
  *
@@ -891,7 +885,7 @@ static inline bool nonroot_raised_pE(struct cred *new, const struct cred *old,
 	    (root_privileged() &&
 	     __is_suid(root, new) &&
 	     !__cap_full(effective, new)) ||
-	    (!__is_setuid(new, old) &&
+	    (uid_eq(new->euid, old->euid) &&
 	     ((has_fcap &&
 	       __cap_gained(permitted, new, old)) ||
 	      __cap_gained(ambient, new, old))))
@@ -917,7 +911,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
 	/* Process setpcap binaries and capabilities for uid 0 */
 	const struct cred *old = current_cred();
 	struct cred *new = bprm->cred;
-	bool effective = false, has_fcap = false, is_setid;
+	bool effective = false, has_fcap = false, id_changed;
 	int ret;
 	kuid_t root_uid;
 
@@ -941,9 +935,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
 	 *
 	 * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
 	 */
-	is_setid = __is_setuid(new, old) || __is_setgid(new, old);
+	id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
 
-	if ((is_setid || __cap_gained(permitted, new, old)) &&
+	if ((id_changed || __cap_gained(permitted, new, old)) &&
 	    ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) ||
 	     !ptracer_capable(current, new->user_ns))) {
 		/* downgrade; they get no more than they had, and maybe less */
@@ -960,7 +954,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
 	new->sgid = new->fsgid = new->egid;
 
 	/* File caps or setid cancels ambient. */
-	if (has_fcap || is_setid)
+	if (has_fcap || id_changed)
 		cap_clear(new->cap_ambient);
 
 	/*
@@ -993,7 +987,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
 		return -EPERM;
 
 	/* Check for privilege-elevated exec. */
-	if (is_setid ||
+	if (id_changed ||
+	    !uid_eq(new->euid, old->uid) ||
+	    !gid_eq(new->egid, old->gid) ||
 	    (!__is_real(root_uid, new) &&
 	     (effective ||
 	      __cap_grew(permitted, ambient, new))))
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] exec: Correct the permission check for unsafe exec
  2025-05-20 22:13             ` [PATCH v2] " Eric W. Biederman
@ 2025-05-20 22:35               ` Kees Cook
  2025-05-20 23:53               ` Jann Horn
  1 sibling, 0 replies; 44+ messages in thread
From: Kees Cook @ 2025-05-20 22:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jann Horn, Max Kellermann, Serge E. Hallyn, paul, jmorris,
	Andy Lutomirski, morgan, Christian Brauner, linux-security-module,
	linux-kernel

On Tue, May 20, 2025 at 05:13:03PM -0500, Eric W. Biederman wrote:
> Max Kellerman recently experienced a problem[1] when calling exec with
> differing uid and euid's and he triggered the logic that is supposed
> to only handle setuid executables.

Max, can you verify this patch solves your use case?

> [...]
> To minimize behavioural changes the code continues to set secureexec
> when euid != uid or when egid != gid.
> [...]
> @@ -993,7 +987,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>  		return -EPERM;
>  
>  	/* Check for privilege-elevated exec. */
> -	if (is_setid ||
> +	if (id_changed ||
> +	    !uid_eq(new->euid, old->uid) ||
> +	    !gid_eq(new->egid, old->gid) ||
>  	    (!__is_real(root_uid, new) &&
>  	     (effective ||
>  	      __cap_grew(permitted, ambient, new))))

Great! Thanks for the secureexec tweak here. Jann, does this look
reasonable to you?

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] exec: Correct the permission check for unsafe exec
  2025-05-16 21:46         ` sergeh
@ 2025-05-20 22:38           ` Jann Horn
  2025-05-20 22:43             ` Kees Cook
  0 siblings, 1 reply; 44+ messages in thread
From: Jann Horn @ 2025-05-20 22:38 UTC (permalink / raw)
  To: sergeh
  Cc: Eric W. Biederman, Kees Cook, Max Kellermann, Serge E. Hallyn,
	paul, jmorris, Andy Lutomirski, morgan, Christian Brauner,
	linux-security-module, linux-kernel

On Fri, May 16, 2025 at 11:46 PM <sergeh@kernel.org> wrote:
> On Fri, May 16, 2025 at 08:06:15PM +0200, Jann Horn wrote:
> > On Fri, May 16, 2025 at 5:26 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > Kees Cook <kees@kernel.org> writes:
> > >
> > > > On Thu, May 15, 2025 at 11:24:47AM -0500, Eric W. Biederman wrote:
> > > >> I have condensed the logic from Linux-2.4.0-test12 to just:
> > > >>      id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> > > >>
> > > >> This change is userspace visible, but I don't expect anyone to care.
> > > >> [...]
> > > >> -static inline bool __is_setuid(struct cred *new, const struct cred *old)
> > > >> -{ return !uid_eq(new->euid, old->uid); }
> > > >> -
> > > >> -static inline bool __is_setgid(struct cred *new, const struct cred *old)
> > > >> -{ return !gid_eq(new->egid, old->gid); }
> > > >> -
> > > >> [...]
> > > >> -    is_setid = __is_setuid(new, old) || __is_setgid(new, old);
> > > >> +    id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> > > >
> > > > The core change here is testing for differing euid rather than
> > > > mismatched uid/euid. (And checking for egid in the set of all groups.)
> > >
> > > Yes.
> > >
> > > For what the code is trying to do I can't fathom what was trying to
> > > be accomplished by the "mismatched" uid/euid check.
> >
> > I remember that when I was looking at this code years ago, one case I
> > was interested in was what happens when a setuid process (running with
> > something like euid=1000,ruid=0) execve()'s a normal binary. Clearly
> > the LSM_UNSAFE_* stuff is not so interesting there, because we're
> > already coming from a privileged context; but the behavior of
> > bprm->secureexec could be important.
> >
> > Like, I think currently a setuid binary like this is probably (?) not
> > exploitable:
> >
> > int main(void) {
> >   execl("/bin/echo", "echo", "hello world");
> > }
> >
> > but after your proposed change, I think it might (?) become
> > exploitable because "echo" would not have AT_SECURE set (I think?) and
> > would therefore load libraries based on environment variables?
> >
> > To be clear, I think this would be a stupid thing for userspace to do
> > - a setuid binary just should not be running other binaries with the
> > caller-provided environment while having elevated privileges. But if
> > userspace was doing something like that, this change might make it
> > more exploitable, and I imagine that the check for mismatched uid/euid
> > was intended to catch cases like this?
>
> If the original process became privileged by executing a setuid-root
> file (and uses glibc), then LD_PRELOAD will already have been cleared,
> right?  So it would either have to add the unsafe entries back to
> LD_PRELOAD again, or it has to have been root all along, not a
> setuid-root program.  I think at that point we have to say this is what
> it intended, and possibly with good reason.

Oh, I see what you mean, glibc's loader code zaps that environment
variable on secureexec for additional safety, I didn't know that.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] exec: Correct the permission check for unsafe exec
  2025-05-20 22:38           ` Jann Horn
@ 2025-05-20 22:43             ` Kees Cook
  0 siblings, 0 replies; 44+ messages in thread
From: Kees Cook @ 2025-05-20 22:43 UTC (permalink / raw)
  To: Jann Horn
  Cc: sergeh, Eric W. Biederman, Max Kellermann, Serge E. Hallyn, paul,
	jmorris, Andy Lutomirski, morgan, Christian Brauner,
	linux-security-module, linux-kernel

On Wed, May 21, 2025 at 12:38:33AM +0200, Jann Horn wrote:
> On Fri, May 16, 2025 at 11:46 PM <sergeh@kernel.org> wrote:
> > On Fri, May 16, 2025 at 08:06:15PM +0200, Jann Horn wrote:
> > > On Fri, May 16, 2025 at 5:26 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > > Kees Cook <kees@kernel.org> writes:
> > > >
> > > > > On Thu, May 15, 2025 at 11:24:47AM -0500, Eric W. Biederman wrote:
> > > > >> I have condensed the logic from Linux-2.4.0-test12 to just:
> > > > >>      id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> > > > >>
> > > > >> This change is userspace visible, but I don't expect anyone to care.
> > > > >> [...]
> > > > >> -static inline bool __is_setuid(struct cred *new, const struct cred *old)
> > > > >> -{ return !uid_eq(new->euid, old->uid); }
> > > > >> -
> > > > >> -static inline bool __is_setgid(struct cred *new, const struct cred *old)
> > > > >> -{ return !gid_eq(new->egid, old->gid); }
> > > > >> -
> > > > >> [...]
> > > > >> -    is_setid = __is_setuid(new, old) || __is_setgid(new, old);
> > > > >> +    id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> > > > >
> > > > > The core change here is testing for differing euid rather than
> > > > > mismatched uid/euid. (And checking for egid in the set of all groups.)
> > > >
> > > > Yes.
> > > >
> > > > For what the code is trying to do I can't fathom what was trying to
> > > > be accomplished by the "mismatched" uid/euid check.
> > >
> > > I remember that when I was looking at this code years ago, one case I
> > > was interested in was what happens when a setuid process (running with
> > > something like euid=1000,ruid=0) execve()'s a normal binary. Clearly
> > > the LSM_UNSAFE_* stuff is not so interesting there, because we're
> > > already coming from a privileged context; but the behavior of
> > > bprm->secureexec could be important.
> > >
> > > Like, I think currently a setuid binary like this is probably (?) not
> > > exploitable:
> > >
> > > int main(void) {
> > >   execl("/bin/echo", "echo", "hello world");
> > > }
> > >
> > > but after your proposed change, I think it might (?) become
> > > exploitable because "echo" would not have AT_SECURE set (I think?) and
> > > would therefore load libraries based on environment variables?
> > >
> > > To be clear, I think this would be a stupid thing for userspace to do
> > > - a setuid binary just should not be running other binaries with the
> > > caller-provided environment while having elevated privileges. But if
> > > userspace was doing something like that, this change might make it
> > > more exploitable, and I imagine that the check for mismatched uid/euid
> > > was intended to catch cases like this?
> >
> > If the original process became privileged by executing a setuid-root
> > file (and uses glibc), then LD_PRELOAD will already have been cleared,
> > right?  So it would either have to add the unsafe entries back to
> > LD_PRELOAD again, or it has to have been root all along, not a
> > setuid-root program.  I think at that point we have to say this is what
> > it intended, and possibly with good reason.
> 
> Oh, I see what you mean, glibc's loader code zaps that environment
> variable on secureexec for additional safety, I didn't know that.

It was pointed out that musl does _not_ zap it; it just ignores it but
leaves it set. (I have not verified this myself...)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] exec: Correct the permission check for unsafe exec
  2025-05-20 22:13             ` [PATCH v2] " Eric W. Biederman
  2025-05-20 22:35               ` Kees Cook
@ 2025-05-20 23:53               ` Jann Horn
  2025-05-21 15:27                 ` Eric W. Biederman
  1 sibling, 1 reply; 44+ messages in thread
From: Jann Horn @ 2025-05-20 23:53 UTC (permalink / raw)
  To: Eric W. Biederman, Richard Guy Briggs, Serge E. Hallyn
  Cc: Kees Cook, Max Kellermann, paul, jmorris, Andy Lutomirski, morgan,
	Christian Brauner, linux-security-module, linux-kernel

On Wed, May 21, 2025 at 12:13 AM Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Max Kellerman recently experienced a problem[1] when calling exec with
> differing uid and euid's and he triggered the logic that is supposed
> to only handle setuid executables.
>
> When exec isn't changing anything in struct cred it doesn't make sense
> to go into the code that is there to handle the case when the
> credentials change.

Maybe you could summarize the change here, something like:

"To fix it, when executing with RUID != EUID or RGID != EGID on a
non-set[ug]id transition, do not strip privileges such as ambient
capabilities and (if the execution is marked as unsafe) EUID/EGID
anymore; but keep marking such executions as secureexec."

> When looking into the history of the code I discovered that this issue
> was not present in Linux-2.4.0-test12 and was introduced in
> Linux-2.4.0-prerelease when the logic for handling this case was moved
> from prepare_binprm to compute_creds in fs/exec.c.
>
> The bug introdused was to comparing euid in the new credentials with
> uid instead of euid in the old credentials, when testing if setuid
> had changed the euid.
>
> Since triggering the keep ptrace limping along case for setuid
> executables makes no sense when it was not a setuid exec revert back
> to the logic present in Linux-2.4.0-test12.
>
> This removes the confusingly named and subtlety incorrect helpers
> is_setuid and is_setgid, that helped this bug to persist.
>
> The varaiable is_setid is renamed to id_changed (it's Linux-2.4.0-test12)
> as the old name describes what matters rather than it's cause.
>
> The code removed in Linux-2.4.0-prerelease was:
> -       /* Set-uid? */
> -       if (mode & S_ISUID) {
> -               bprm->e_uid = inode->i_uid;
> -               if (bprm->e_uid != current->euid)
> -                       id_change = 1;
> -       }
> -
> -       /* Set-gid? */
> -       /*
> -        * If setgid is set but no group execute bit then this
> -        * is a candidate for mandatory locking, not a setgid
> -        * executable.
> -        */
> -       if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> -               bprm->e_gid = inode->i_gid;
> -               if (!in_group_p(bprm->e_gid))
> -                       id_change = 1;
>
> Linux-2.4.0-prerelease added the current logic as:
> +       if (bprm->e_uid != current->uid || bprm->e_gid != current->gid ||
> +           !cap_issubset(new_permitted, current->cap_permitted)) {
> +                current->dumpable = 0;
> +
> +               lock_kernel();
> +               if (must_not_trace_exec(current)
> +                   || atomic_read(&current->fs->count) > 1
> +                   || atomic_read(&current->files->count) > 1
> +                   || atomic_read(&current->sig->count) > 1) {
> +                       if(!capable(CAP_SETUID)) {
> +                               bprm->e_uid = current->uid;
> +                               bprm->e_gid = current->gid;
> +                       }
> +                       if(!capable(CAP_SETPCAP)) {
> +                               new_permitted = cap_intersect(new_permitted,
> +                                                       current->cap_permitted);
> +                       }
> +               }
> +               do_unlock = 1;
> +       }
>
> I have condenced the logic from Linux-2.4.0-test12 to just:
>         id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
>
> This change is userspace visible, but I don't expect anyone to care.
>
> For the bug that is being fixed to trigger bprm->unsafe has to be set.
> The variable bprm->unsafe is set when ptracing an executable, when
> sharing a working directory, or when no_new_privs is set.  Properly
> testing for cases that are safe even in those conditions and doing
> nothing special should not affect anyone.  Especially if they were
> previously ok with their credentials getting munged
>
> To minimize behavioural changes the code continues to set secureexec
> when euid != uid or when egid != gid.
>
> Reported-by: Max Kellermann <max.kellermann@ionos.com>
> Fixes: 64444d3d0d7f ("Linux version 2.4.0-prerelease")
> [1] https://lkml.kernel.org/r/20250306082615.174777-1-max.kellermann@ionos.com
> v1: https://lkml.kernel.org/r/878qmxsuy8.fsf@email.froward.int.ebiederm.org
> Reviewed-by: Serge Hallyn <serge@hallyn.com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Looks good to me overall, thanks for figuring out the history of this
not-particularly-easy-to-understand code and figuring out the right
fix.

Reviewed-by: Jann Horn <jannh@google.com>

> ---
>  security/commoncap.c | 20 ++++++++------------
>  1 file changed, 8 insertions(+), 12 deletions(-)
>
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 28d4248bf001..6bd4adeb4795 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -856,12 +856,6 @@ static void handle_privileged_root(struct linux_binprm *bprm, bool has_fcap,
>  #define __cap_full(field, cred) \
>         cap_issubset(CAP_FULL_SET, cred->cap_##field)
>
> -static inline bool __is_setuid(struct cred *new, const struct cred *old)
> -{ return !uid_eq(new->euid, old->uid); }
> -
> -static inline bool __is_setgid(struct cred *new, const struct cred *old)
> -{ return !gid_eq(new->egid, old->gid); }
> -
>  /*
>   * 1) Audit candidate if current->cap_effective is set
>   *
> @@ -891,7 +885,7 @@ static inline bool nonroot_raised_pE(struct cred *new, const struct cred *old,
>             (root_privileged() &&
>              __is_suid(root, new) &&
>              !__cap_full(effective, new)) ||
> -           (!__is_setuid(new, old) &&
> +           (uid_eq(new->euid, old->euid) &&
>              ((has_fcap &&
>                __cap_gained(permitted, new, old)) ||
>               __cap_gained(ambient, new, old))))

Just a comment unrelated to this specific change:

Wow, nonroot_raised_pE() is hard to read, but I guess luckily it's
only used for auditing, so it's not that important...

The diff of the whole expression that decides whether to audit the
execution, reindented for clarity, looks like this:

(
  __cap_grew(effective, ambient, new) &&
  !(
    __cap_full(effective, new)
    &&
    (__is_eff(root, new) || __is_real(root, new))
    &&
    root_privileged()
  )
)
||
(
  root_privileged()
  &&
  __is_suid(root, new)
  &&
  !__cap_full(effective, new)
)
||
(
- !__is_setuid(new, old)
+ uid_eq(new->euid, old->euid)
  &&
  (
    (has_fcap && __cap_gained(permitted, new, old))
    ||
    __cap_gained(ambient, new, old)
  )
)

And we do auditing in three scenarios (situation with the patch applied):

1. We have effective caps that are not ambient caps, and we don't have
a full capability set based on having a root-privileged EUID or RUID.
2. We are in a suid-like execution but are not getting a full capability set.
3. [changed part] We are not changing UID through a suid execution,
but either we gained ambient capabilities through the execution (???)
or we gained permitted capabilities while executing a file with fcaps.

I am highly confused by the __cap_gained(ambient, new, old) check
because I have no idea why ambient capabilities would ever increase on
exec. I thought they can only *decrease* on exec? Apparently that was
introduced in commit dbbbe1105ea6a ("capabilities: audit log other
surprising conditions").

Anyway, yeah, your change looks fine, just the preexisting code looks
dodgy to me.

> @@ -917,7 +911,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>         /* Process setpcap binaries and capabilities for uid 0 */
>         const struct cred *old = current_cred();
>         struct cred *new = bprm->cred;
> -       bool effective = false, has_fcap = false, is_setid;
> +       bool effective = false, has_fcap = false, id_changed;
>         int ret;
>         kuid_t root_uid;
>
> @@ -941,9 +935,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>          *
>          * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
>          */
> -       is_setid = __is_setuid(new, old) || __is_setgid(new, old);
> +       id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);

Hm, so when we change from one EGID to another EGID which was already
in our groups list, we don't treat it as a privileged exec? Which is
okay because, while an unprivileged user would not just be allowed to
change their EGID to a GID from their groups list themselves through
__sys_setregid(), they would be allowed to create a new setgid binary
owned by a group from their groups list and then execute that?

That's fine with me, though it seems a little weird to me. setgid exec
is changing our creds and yet we're not treating it as a "real" setgid
execution because the execution is only granting privileges that
userspace could have gotten anyway.

> -       if ((is_setid || __cap_gained(permitted, new, old)) &&
> +       if ((id_changed || __cap_gained(permitted, new, old)) &&
>             ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) ||
>              !ptracer_capable(current, new->user_ns))) {
>                 /* downgrade; they get no more than they had, and maybe less */
> @@ -960,7 +954,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>         new->sgid = new->fsgid = new->egid;
>
>         /* File caps or setid cancels ambient. */
> -       if (has_fcap || is_setid)
> +       if (has_fcap || id_changed)
>                 cap_clear(new->cap_ambient);
>
>         /*
> @@ -993,7 +987,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>                 return -EPERM;
>
>         /* Check for privilege-elevated exec. */
> -       if (is_setid ||
> +       if (id_changed ||
> +           !uid_eq(new->euid, old->uid) ||
> +           !gid_eq(new->egid, old->gid) ||
>             (!__is_real(root_uid, new) &&
>              (effective ||
>               __cap_grew(permitted, ambient, new))))
> --
> 2.41.0
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] exec: Correct the permission check for unsafe exec
  2025-05-20 23:53               ` Jann Horn
@ 2025-05-21 15:27                 ` Eric W. Biederman
  2025-05-21 15:36                   ` Jann Horn
  0 siblings, 1 reply; 44+ messages in thread
From: Eric W. Biederman @ 2025-05-21 15:27 UTC (permalink / raw)
  To: Jann Horn
  Cc: Richard Guy Briggs, Serge E. Hallyn, Kees Cook, Max Kellermann,
	paul, jmorris, Andy Lutomirski, morgan, Christian Brauner,
	linux-security-module, linux-kernel

Jann Horn <jannh@google.com> writes:

> On Wed, May 21, 2025 at 12:13 AM Eric W. Biederman
> <ebiederm@xmission.com> wrote:

> Looks good to me overall, thanks for figuring out the history of this
> not-particularly-easy-to-understand code and figuring out the right
> fix.
>
> Reviewed-by: Jann Horn <jannh@google.com>
>
>> @@ -917,7 +911,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>>         /* Process setpcap binaries and capabilities for uid 0 */
>>         const struct cred *old = current_cred();
>>         struct cred *new = bprm->cred;
>> -       bool effective = false, has_fcap = false, is_setid;
>> +       bool effective = false, has_fcap = false, id_changed;
>>         int ret;
>>         kuid_t root_uid;
>>
>> @@ -941,9 +935,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>>          *
>>          * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
>>          */
>> -       is_setid = __is_setuid(new, old) || __is_setgid(new, old);
>> +       id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
>
> Hm, so when we change from one EGID to another EGID which was already
> in our groups list, we don't treat it as a privileged exec? Which is
> okay because, while an unprivileged user would not just be allowed to
> change their EGID to a GID from their groups list themselves through
> __sys_setregid(), they would be allowed to create a new setgid binary
> owned by a group from their groups list and then execute that?
>
> That's fine with me, though it seems a little weird to me. setgid exec
> is changing our creds and yet we're not treating it as a "real" setgid
> execution because the execution is only granting privileges that
> userspace could have gotten anyway.

More than could have gotten.  From permission checking point of view
permission that the application already had.  In general group based
permission checks just check in_group_p, which looks at cred->fsgid and
the group.

The logic is since the effective permissions of the running executable
have not changed, there is nothing to special case.

Arguably a setgid exec can drop what was egid, and if people have
configured their permissions to deny people access based upon a group
they are in that could change the result of the permission checks.  If
changing egid winds up dropping a group from the list of the process's
groups, the process could also have dropped that group with setresgid.
So I don't think we need to be concerned about the combination of
dropping egid and brpm->unsafe.

If anyone sees a hole in that logic I am happy to change the check
to !gid_eq(new->egid, old->egid), but I just can't see a way changing
egid/fsgid to a group the process already has is a problem.

>> -       if ((is_setid || __cap_gained(permitted, new, old)) &&
>> +       if ((id_changed || __cap_gained(permitted, new, old)) &&
>>             ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) ||
>>              !ptracer_capable(current, new->user_ns))) {
>>                 /* downgrade; they get no more than they had, and maybe less */

Eric

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] exec: Correct the permission check for unsafe exec
  2025-05-21 15:27                 ` Eric W. Biederman
@ 2025-05-21 15:36                   ` Jann Horn
  2025-06-11  0:18                     ` Paul Moore
  0 siblings, 1 reply; 44+ messages in thread
From: Jann Horn @ 2025-05-21 15:36 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Richard Guy Briggs, Serge E. Hallyn, Kees Cook, Max Kellermann,
	paul, jmorris, Andy Lutomirski, morgan, Christian Brauner,
	linux-security-module, linux-kernel

On Wed, May 21, 2025 at 5:27 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> Jann Horn <jannh@google.com> writes:
>
> > On Wed, May 21, 2025 at 12:13 AM Eric W. Biederman
> > <ebiederm@xmission.com> wrote:
>
> > Looks good to me overall, thanks for figuring out the history of this
> > not-particularly-easy-to-understand code and figuring out the right
> > fix.
> >
> > Reviewed-by: Jann Horn <jannh@google.com>
> >
> >> @@ -917,7 +911,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
> >>         /* Process setpcap binaries and capabilities for uid 0 */
> >>         const struct cred *old = current_cred();
> >>         struct cred *new = bprm->cred;
> >> -       bool effective = false, has_fcap = false, is_setid;
> >> +       bool effective = false, has_fcap = false, id_changed;
> >>         int ret;
> >>         kuid_t root_uid;
> >>
> >> @@ -941,9 +935,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
> >>          *
> >>          * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
> >>          */
> >> -       is_setid = __is_setuid(new, old) || __is_setgid(new, old);
> >> +       id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> >
> > Hm, so when we change from one EGID to another EGID which was already
> > in our groups list, we don't treat it as a privileged exec? Which is
> > okay because, while an unprivileged user would not just be allowed to
> > change their EGID to a GID from their groups list themselves through
> > __sys_setregid(), they would be allowed to create a new setgid binary
> > owned by a group from their groups list and then execute that?
> >
> > That's fine with me, though it seems a little weird to me. setgid exec
> > is changing our creds and yet we're not treating it as a "real" setgid
> > execution because the execution is only granting privileges that
> > userspace could have gotten anyway.
>
> More than could have gotten.  From permission checking point of view
> permission that the application already had.  In general group based
> permission checks just check in_group_p, which looks at cred->fsgid and
> the group.
>
> The logic is since the effective permissions of the running executable
> have not changed, there is nothing to special case.
>
> Arguably a setgid exec can drop what was egid, and if people have
> configured their permissions to deny people access based upon a group
> they are in that could change the result of the permission checks.  If
> changing egid winds up dropping a group from the list of the process's
> groups, the process could also have dropped that group with setresgid.
> So I don't think we need to be concerned about the combination of
> dropping egid and brpm->unsafe.
>
> If anyone sees a hole in that logic I am happy to change the check
> to !gid_eq(new->egid, old->egid), but I just can't see a way changing
> egid/fsgid to a group the process already has is a problem.

I'm fine with leaving your patch as-is.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] exec: Correct the permission check for unsafe exec
  2025-05-21 15:36                   ` Jann Horn
@ 2025-06-11  0:18                     ` Paul Moore
  2025-06-11 14:23                       ` Max Kellermann
  2025-06-12 21:26                       ` Serge E. Hallyn
  0 siblings, 2 replies; 44+ messages in thread
From: Paul Moore @ 2025-06-11  0:18 UTC (permalink / raw)
  To: Jann Horn
  Cc: Eric W. Biederman, Richard Guy Briggs, Serge E. Hallyn, Kees Cook,
	Max Kellermann, jmorris, Andy Lutomirski, morgan,
	Christian Brauner, linux-security-module, linux-kernel

On Wed, May 21, 2025 at 11:36 AM Jann Horn <jannh@google.com> wrote:
> On Wed, May 21, 2025 at 5:27 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > Jann Horn <jannh@google.com> writes:
> >
> > > On Wed, May 21, 2025 at 12:13 AM Eric W. Biederman
> > > <ebiederm@xmission.com> wrote:
> >
> > > Looks good to me overall, thanks for figuring out the history of this
> > > not-particularly-easy-to-understand code and figuring out the right
> > > fix.
> > >
> > > Reviewed-by: Jann Horn <jannh@google.com>
> > >
> > >> @@ -917,7 +911,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
> > >>         /* Process setpcap binaries and capabilities for uid 0 */
> > >>         const struct cred *old = current_cred();
> > >>         struct cred *new = bprm->cred;
> > >> -       bool effective = false, has_fcap = false, is_setid;
> > >> +       bool effective = false, has_fcap = false, id_changed;
> > >>         int ret;
> > >>         kuid_t root_uid;
> > >>
> > >> @@ -941,9 +935,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
> > >>          *
> > >>          * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
> > >>          */
> > >> -       is_setid = __is_setuid(new, old) || __is_setgid(new, old);
> > >> +       id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> > >
> > > Hm, so when we change from one EGID to another EGID which was already
> > > in our groups list, we don't treat it as a privileged exec? Which is
> > > okay because, while an unprivileged user would not just be allowed to
> > > change their EGID to a GID from their groups list themselves through
> > > __sys_setregid(), they would be allowed to create a new setgid binary
> > > owned by a group from their groups list and then execute that?
> > >
> > > That's fine with me, though it seems a little weird to me. setgid exec
> > > is changing our creds and yet we're not treating it as a "real" setgid
> > > execution because the execution is only granting privileges that
> > > userspace could have gotten anyway.
> >
> > More than could have gotten.  From permission checking point of view
> > permission that the application already had.  In general group based
> > permission checks just check in_group_p, which looks at cred->fsgid and
> > the group.
> >
> > The logic is since the effective permissions of the running executable
> > have not changed, there is nothing to special case.
> >
> > Arguably a setgid exec can drop what was egid, and if people have
> > configured their permissions to deny people access based upon a group
> > they are in that could change the result of the permission checks.  If
> > changing egid winds up dropping a group from the list of the process's
> > groups, the process could also have dropped that group with setresgid.
> > So I don't think we need to be concerned about the combination of
> > dropping egid and brpm->unsafe.
> >
> > If anyone sees a hole in that logic I am happy to change the check
> > to !gid_eq(new->egid, old->egid), but I just can't see a way changing
> > egid/fsgid to a group the process already has is a problem.
>
> I'm fine with leaving your patch as-is.

Aside from a tested-by verification from Max, it looks like everyone
is satisfied with the v2 patch, yes?

Serge, I see you've reviewed this patch, can I assume that now you
have a capabilities tree up and running you'll take this patch?

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] exec: Correct the permission check for unsafe exec
  2025-06-11  0:18                     ` Paul Moore
@ 2025-06-11 14:23                       ` Max Kellermann
  2025-06-13 15:07                         ` Eric W. Biederman
  2025-06-12 21:26                       ` Serge E. Hallyn
  1 sibling, 1 reply; 44+ messages in thread
From: Max Kellermann @ 2025-06-11 14:23 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jann Horn, Eric W. Biederman, Richard Guy Briggs, Serge E. Hallyn,
	Kees Cook, jmorris, Andy Lutomirski, morgan, Christian Brauner,
	linux-security-module, linux-kernel

On Wed, Jun 11, 2025 at 2:19 AM Paul Moore <paul@paul-moore.com> wrote:
> Aside from a tested-by verification from Max, it looks like everyone
> is satisfied with the v2 patch, yes?

Sorry for the delay. I tested Eric's v2 patch and it solves my
problem. His patch is nearly identical to mine, it's only a bit more
intrusive by removing the weird __is_setXid functions that never made
sense. I welcome that; I wasn't confident enough to do that and tried
to make the least intrusive patch.

Eric, I'm glad you changed your mind and no longer consider my work
"pure nonsense" and "pointless".

But one problem remains: in the same email, you demanded evidence that
userspace doesn't depend on the current behavior. However, in your
patch description, you hand-waved that away by "I don't expect anyone
to care". What happened to that?

Max

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] exec: Correct the permission check for unsafe exec
  2025-06-11  0:18                     ` Paul Moore
  2025-06-11 14:23                       ` Max Kellermann
@ 2025-06-12 21:26                       ` Serge E. Hallyn
  2025-06-13  1:48                         ` Kees Cook
  1 sibling, 1 reply; 44+ messages in thread
From: Serge E. Hallyn @ 2025-06-12 21:26 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jann Horn, Eric W. Biederman, Richard Guy Briggs, Serge E. Hallyn,
	Kees Cook, Max Kellermann, jmorris, Andy Lutomirski, morgan,
	Christian Brauner, linux-security-module, linux-kernel

On Tue, Jun 10, 2025 at 08:18:56PM -0400, Paul Moore wrote:
> On Wed, May 21, 2025 at 11:36 AM Jann Horn <jannh@google.com> wrote:
> > On Wed, May 21, 2025 at 5:27 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > Jann Horn <jannh@google.com> writes:
> > >
> > > > On Wed, May 21, 2025 at 12:13 AM Eric W. Biederman
> > > > <ebiederm@xmission.com> wrote:
> > >
> > > > Looks good to me overall, thanks for figuring out the history of this
> > > > not-particularly-easy-to-understand code and figuring out the right
> > > > fix.
> > > >
> > > > Reviewed-by: Jann Horn <jannh@google.com>
> > > >
> > > >> @@ -917,7 +911,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
> > > >>         /* Process setpcap binaries and capabilities for uid 0 */
> > > >>         const struct cred *old = current_cred();
> > > >>         struct cred *new = bprm->cred;
> > > >> -       bool effective = false, has_fcap = false, is_setid;
> > > >> +       bool effective = false, has_fcap = false, id_changed;
> > > >>         int ret;
> > > >>         kuid_t root_uid;
> > > >>
> > > >> @@ -941,9 +935,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
> > > >>          *
> > > >>          * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
> > > >>          */
> > > >> -       is_setid = __is_setuid(new, old) || __is_setgid(new, old);
> > > >> +       id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> > > >
> > > > Hm, so when we change from one EGID to another EGID which was already
> > > > in our groups list, we don't treat it as a privileged exec? Which is
> > > > okay because, while an unprivileged user would not just be allowed to
> > > > change their EGID to a GID from their groups list themselves through
> > > > __sys_setregid(), they would be allowed to create a new setgid binary
> > > > owned by a group from their groups list and then execute that?
> > > >
> > > > That's fine with me, though it seems a little weird to me. setgid exec
> > > > is changing our creds and yet we're not treating it as a "real" setgid
> > > > execution because the execution is only granting privileges that
> > > > userspace could have gotten anyway.
> > >
> > > More than could have gotten.  From permission checking point of view
> > > permission that the application already had.  In general group based
> > > permission checks just check in_group_p, which looks at cred->fsgid and
> > > the group.
> > >
> > > The logic is since the effective permissions of the running executable
> > > have not changed, there is nothing to special case.
> > >
> > > Arguably a setgid exec can drop what was egid, and if people have
> > > configured their permissions to deny people access based upon a group
> > > they are in that could change the result of the permission checks.  If
> > > changing egid winds up dropping a group from the list of the process's
> > > groups, the process could also have dropped that group with setresgid.
> > > So I don't think we need to be concerned about the combination of
> > > dropping egid and brpm->unsafe.
> > >
> > > If anyone sees a hole in that logic I am happy to change the check
> > > to !gid_eq(new->egid, old->egid), but I just can't see a way changing
> > > egid/fsgid to a group the process already has is a problem.
> >
> > I'm fine with leaving your patch as-is.
> 
> Aside from a tested-by verification from Max, it looks like everyone
> is satisfied with the v2 patch, yes?
> 
> Serge, I see you've reviewed this patch, can I assume that now you
> have a capabilities tree up and running you'll take this patch?

I can take another look and consider taking it on Monday, but until
then I'm effectively afk.

-serge

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] exec: Correct the permission check for unsafe exec
  2025-06-12 21:26                       ` Serge E. Hallyn
@ 2025-06-13  1:48                         ` Kees Cook
  2025-06-13 15:28                           ` Paul Moore
  0 siblings, 1 reply; 44+ messages in thread
From: Kees Cook @ 2025-06-13  1:48 UTC (permalink / raw)
  To: Serge E. Hallyn, Paul Moore
  Cc: Jann Horn, Eric W. Biederman, Richard Guy Briggs, Max Kellermann,
	jmorris, Andy Lutomirski, morgan, Christian Brauner,
	linux-security-module, linux-kernel



On June 12, 2025 2:26:26 PM PDT, "Serge E. Hallyn" <serge@hallyn.com> wrote:
>On Tue, Jun 10, 2025 at 08:18:56PM -0400, Paul Moore wrote:
>> On Wed, May 21, 2025 at 11:36 AM Jann Horn <jannh@google.com> wrote:
>> > On Wed, May 21, 2025 at 5:27 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> > > Jann Horn <jannh@google.com> writes:
>> > >
>> > > > On Wed, May 21, 2025 at 12:13 AM Eric W. Biederman
>> > > > <ebiederm@xmission.com> wrote:
>> > >
>> > > > Looks good to me overall, thanks for figuring out the history of this
>> > > > not-particularly-easy-to-understand code and figuring out the right
>> > > > fix.
>> > > >
>> > > > Reviewed-by: Jann Horn <jannh@google.com>
>> > > >
>> > > >> @@ -917,7 +911,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>> > > >>         /* Process setpcap binaries and capabilities for uid 0 */
>> > > >>         const struct cred *old = current_cred();
>> > > >>         struct cred *new = bprm->cred;
>> > > >> -       bool effective = false, has_fcap = false, is_setid;
>> > > >> +       bool effective = false, has_fcap = false, id_changed;
>> > > >>         int ret;
>> > > >>         kuid_t root_uid;
>> > > >>
>> > > >> @@ -941,9 +935,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>> > > >>          *
>> > > >>          * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
>> > > >>          */
>> > > >> -       is_setid = __is_setuid(new, old) || __is_setgid(new, old);
>> > > >> +       id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
>> > > >
>> > > > Hm, so when we change from one EGID to another EGID which was already
>> > > > in our groups list, we don't treat it as a privileged exec? Which is
>> > > > okay because, while an unprivileged user would not just be allowed to
>> > > > change their EGID to a GID from their groups list themselves through
>> > > > __sys_setregid(), they would be allowed to create a new setgid binary
>> > > > owned by a group from their groups list and then execute that?
>> > > >
>> > > > That's fine with me, though it seems a little weird to me. setgid exec
>> > > > is changing our creds and yet we're not treating it as a "real" setgid
>> > > > execution because the execution is only granting privileges that
>> > > > userspace could have gotten anyway.
>> > >
>> > > More than could have gotten.  From permission checking point of view
>> > > permission that the application already had.  In general group based
>> > > permission checks just check in_group_p, which looks at cred->fsgid and
>> > > the group.
>> > >
>> > > The logic is since the effective permissions of the running executable
>> > > have not changed, there is nothing to special case.
>> > >
>> > > Arguably a setgid exec can drop what was egid, and if people have
>> > > configured their permissions to deny people access based upon a group
>> > > they are in that could change the result of the permission checks.  If
>> > > changing egid winds up dropping a group from the list of the process's
>> > > groups, the process could also have dropped that group with setresgid.
>> > > So I don't think we need to be concerned about the combination of
>> > > dropping egid and brpm->unsafe.
>> > >
>> > > If anyone sees a hole in that logic I am happy to change the check
>> > > to !gid_eq(new->egid, old->egid), but I just can't see a way changing
>> > > egid/fsgid to a group the process already has is a problem.
>> >
>> > I'm fine with leaving your patch as-is.
>> 
>> Aside from a tested-by verification from Max, it looks like everyone
>> is satisfied with the v2 patch, yes?
>> 
>> Serge, I see you've reviewed this patch, can I assume that now you
>> have a capabilities tree up and running you'll take this patch?
>
>I can take another look and consider taking it on Monday, but until
>then I'm effectively afk.

I'd rather this go via the execve/binfmt tree. I was waiting for -rc2 before putting it into -next. I can do Sunday night after it's out. :)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] exec: Correct the permission check for unsafe exec
  2025-06-11 14:23                       ` Max Kellermann
@ 2025-06-13 15:07                         ` Eric W. Biederman
  0 siblings, 0 replies; 44+ messages in thread
From: Eric W. Biederman @ 2025-06-13 15:07 UTC (permalink / raw)
  To: Max Kellermann
  Cc: Paul Moore, Jann Horn, Richard Guy Briggs, Serge E. Hallyn,
	Kees Cook, jmorris, Andy Lutomirski, morgan, Christian Brauner,
	linux-security-module, linux-kernel

Max Kellermann <max.kellermann@ionos.com> writes:

> On Wed, Jun 11, 2025 at 2:19 AM Paul Moore <paul@paul-moore.com> wrote:
>> Aside from a tested-by verification from Max, it looks like everyone
>> is satisfied with the v2 patch, yes?
>
> Sorry for the delay. I tested Eric's v2 patch and it solves my
> problem. His patch is nearly identical to mine, it's only a bit more
> intrusive by removing the weird __is_setXid functions that never made
> sense. I welcome that; I wasn't confident enough to do that and tried
> to make the least intrusive patch.
>
> Eric, I'm glad you changed your mind and no longer consider my work
> "pure nonsense" and "pointless".

As you pointed out in that case my analysis of your code was incorrect.

Further I wrote this patch when I finally realized what is going on and
that the case you are dealing with is an actual bug in the current
code and not some kind of enhancement or extension.

> But one problem remains: in the same email, you demanded evidence that
> userspace doesn't depend on the current behavior. However, in your
> patch description, you hand-waved that away by "I don't expect anyone
> to care". What happened to that?

The analysis of __is_setuid and __is_setgid that allowed me to remove
them helped quite a lot.

The analysis makes it clear that the code change is semantically safe
so we don't loose anything by not mucking with permissions.

The analysis shows the code is good comprehension and maintenance and
not just for your case.  It also makes it clear why not supporting your
case is a bug, and frankly a regression in the current code. (A 20 year
old regression so that doesn't carry much weight but still a
regression).

A related analysis in another parallel thread mostly concluded that for
brpm->unsafe in general it is a better user experience to terminate the
exec with a permission error instead of continuing the exec.  Exception
ptrace.

Part of my resistance was the initial reading that your change was
trying to escape the unsafe downgrading of permissions, instead of
showing that it was safe to keep the permissions.

With the reminder that no one should even be exercising the permission
downgrading case, it became clear that changing the little bit that is
safe should not affect may users at all.

Failing the exec rather than downgrading permissions will also make
a good test to see if anyone cares about this functionality.  I do still
believe we should tread carefully.

Hopefully that makes things clear.

Eric

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] exec: Correct the permission check for unsafe exec
  2025-06-13  1:48                         ` Kees Cook
@ 2025-06-13 15:28                           ` Paul Moore
  2025-06-16 19:57                             ` Kees Cook
  0 siblings, 1 reply; 44+ messages in thread
From: Paul Moore @ 2025-06-13 15:28 UTC (permalink / raw)
  To: Kees Cook
  Cc: Serge E. Hallyn, Jann Horn, Eric W. Biederman, Richard Guy Briggs,
	Max Kellermann, jmorris, Andy Lutomirski, morgan,
	Christian Brauner, linux-security-module, linux-kernel

On Thu, Jun 12, 2025 at 9:48 PM Kees Cook <kees@kernel.org> wrote:
> On June 12, 2025 2:26:26 PM PDT, "Serge E. Hallyn" <serge@hallyn.com> wrote:
> >On Tue, Jun 10, 2025 at 08:18:56PM -0400, Paul Moore wrote:
> >> On Wed, May 21, 2025 at 11:36 AM Jann Horn <jannh@google.com> wrote:
> >> > On Wed, May 21, 2025 at 5:27 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> > > Jann Horn <jannh@google.com> writes:
> >> > >
> >> > > > On Wed, May 21, 2025 at 12:13 AM Eric W. Biederman
> >> > > > <ebiederm@xmission.com> wrote:
> >> > >
> >> > > > Looks good to me overall, thanks for figuring out the history of this
> >> > > > not-particularly-easy-to-understand code and figuring out the right
> >> > > > fix.
> >> > > >
> >> > > > Reviewed-by: Jann Horn <jannh@google.com>
> >> > > >
> >> > > >> @@ -917,7 +911,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
> >> > > >>         /* Process setpcap binaries and capabilities for uid 0 */
> >> > > >>         const struct cred *old = current_cred();
> >> > > >>         struct cred *new = bprm->cred;
> >> > > >> -       bool effective = false, has_fcap = false, is_setid;
> >> > > >> +       bool effective = false, has_fcap = false, id_changed;
> >> > > >>         int ret;
> >> > > >>         kuid_t root_uid;
> >> > > >>
> >> > > >> @@ -941,9 +935,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
> >> > > >>          *
> >> > > >>          * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
> >> > > >>          */
> >> > > >> -       is_setid = __is_setuid(new, old) || __is_setgid(new, old);
> >> > > >> +       id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> >> > > >
> >> > > > Hm, so when we change from one EGID to another EGID which was already
> >> > > > in our groups list, we don't treat it as a privileged exec? Which is
> >> > > > okay because, while an unprivileged user would not just be allowed to
> >> > > > change their EGID to a GID from their groups list themselves through
> >> > > > __sys_setregid(), they would be allowed to create a new setgid binary
> >> > > > owned by a group from their groups list and then execute that?
> >> > > >
> >> > > > That's fine with me, though it seems a little weird to me. setgid exec
> >> > > > is changing our creds and yet we're not treating it as a "real" setgid
> >> > > > execution because the execution is only granting privileges that
> >> > > > userspace could have gotten anyway.
> >> > >
> >> > > More than could have gotten.  From permission checking point of view
> >> > > permission that the application already had.  In general group based
> >> > > permission checks just check in_group_p, which looks at cred->fsgid and
> >> > > the group.
> >> > >
> >> > > The logic is since the effective permissions of the running executable
> >> > > have not changed, there is nothing to special case.
> >> > >
> >> > > Arguably a setgid exec can drop what was egid, and if people have
> >> > > configured their permissions to deny people access based upon a group
> >> > > they are in that could change the result of the permission checks.  If
> >> > > changing egid winds up dropping a group from the list of the process's
> >> > > groups, the process could also have dropped that group with setresgid.
> >> > > So I don't think we need to be concerned about the combination of
> >> > > dropping egid and brpm->unsafe.
> >> > >
> >> > > If anyone sees a hole in that logic I am happy to change the check
> >> > > to !gid_eq(new->egid, old->egid), but I just can't see a way changing
> >> > > egid/fsgid to a group the process already has is a problem.
> >> >
> >> > I'm fine with leaving your patch as-is.
> >>
> >> Aside from a tested-by verification from Max, it looks like everyone
> >> is satisfied with the v2 patch, yes?
> >>
> >> Serge, I see you've reviewed this patch, can I assume that now you
> >> have a capabilities tree up and running you'll take this patch?
> >
> >I can take another look and consider taking it on Monday, but until
> >then I'm effectively afk.
>
> I'd rather this go via the execve/binfmt tree. I was waiting for -rc2 before putting it into -next. I can do Sunday night after it's out. :)

I'm not going to argue either way on this, that's between you and
Serge, but as the entire patch is located within commoncap.c and that
is part of the capabilities code which Serge maintains, can you
explain why this should go via the execve/binfmt tree and not the
capabilities tree?

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] exec: Correct the permission check for unsafe exec
  2025-06-13 15:28                           ` Paul Moore
@ 2025-06-16 19:57                             ` Kees Cook
  2025-06-16 20:16                               ` Paul Moore
  0 siblings, 1 reply; 44+ messages in thread
From: Kees Cook @ 2025-06-16 19:57 UTC (permalink / raw)
  To: Paul Moore
  Cc: Serge E. Hallyn, Jann Horn, Eric W. Biederman, Richard Guy Briggs,
	Max Kellermann, jmorris, Andy Lutomirski, morgan,
	Christian Brauner, linux-security-module, linux-kernel



On June 13, 2025 8:28:46 AM PDT, Paul Moore <paul@paul-moore.com> wrote:
>On Thu, Jun 12, 2025 at 9:48 PM Kees Cook <kees@kernel.org> wrote:
>> On June 12, 2025 2:26:26 PM PDT, "Serge E. Hallyn" <serge@hallyn.com> wrote:
>> >On Tue, Jun 10, 2025 at 08:18:56PM -0400, Paul Moore wrote:
>> >> On Wed, May 21, 2025 at 11:36 AM Jann Horn <jannh@google.com> wrote:
>> >> > On Wed, May 21, 2025 at 5:27 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> >> > > Jann Horn <jannh@google.com> writes:
>> >> > >
>> >> > > > On Wed, May 21, 2025 at 12:13 AM Eric W. Biederman
>> >> > > > <ebiederm@xmission.com> wrote:
>> >> > >
>> >> > > > Looks good to me overall, thanks for figuring out the history of this
>> >> > > > not-particularly-easy-to-understand code and figuring out the right
>> >> > > > fix.
>> >> > > >
>> >> > > > Reviewed-by: Jann Horn <jannh@google.com>
>> >> > > >
>> >> > > >> @@ -917,7 +911,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>> >> > > >>         /* Process setpcap binaries and capabilities for uid 0 */
>> >> > > >>         const struct cred *old = current_cred();
>> >> > > >>         struct cred *new = bprm->cred;
>> >> > > >> -       bool effective = false, has_fcap = false, is_setid;
>> >> > > >> +       bool effective = false, has_fcap = false, id_changed;
>> >> > > >>         int ret;
>> >> > > >>         kuid_t root_uid;
>> >> > > >>
>> >> > > >> @@ -941,9 +935,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
>> >> > > >>          *
>> >> > > >>          * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
>> >> > > >>          */
>> >> > > >> -       is_setid = __is_setuid(new, old) || __is_setgid(new, old);
>> >> > > >> +       id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
>> >> > > >
>> >> > > > Hm, so when we change from one EGID to another EGID which was already
>> >> > > > in our groups list, we don't treat it as a privileged exec? Which is
>> >> > > > okay because, while an unprivileged user would not just be allowed to
>> >> > > > change their EGID to a GID from their groups list themselves through
>> >> > > > __sys_setregid(), they would be allowed to create a new setgid binary
>> >> > > > owned by a group from their groups list and then execute that?
>> >> > > >
>> >> > > > That's fine with me, though it seems a little weird to me. setgid exec
>> >> > > > is changing our creds and yet we're not treating it as a "real" setgid
>> >> > > > execution because the execution is only granting privileges that
>> >> > > > userspace could have gotten anyway.
>> >> > >
>> >> > > More than could have gotten.  From permission checking point of view
>> >> > > permission that the application already had.  In general group based
>> >> > > permission checks just check in_group_p, which looks at cred->fsgid and
>> >> > > the group.
>> >> > >
>> >> > > The logic is since the effective permissions of the running executable
>> >> > > have not changed, there is nothing to special case.
>> >> > >
>> >> > > Arguably a setgid exec can drop what was egid, and if people have
>> >> > > configured their permissions to deny people access based upon a group
>> >> > > they are in that could change the result of the permission checks.  If
>> >> > > changing egid winds up dropping a group from the list of the process's
>> >> > > groups, the process could also have dropped that group with setresgid.
>> >> > > So I don't think we need to be concerned about the combination of
>> >> > > dropping egid and brpm->unsafe.
>> >> > >
>> >> > > If anyone sees a hole in that logic I am happy to change the check
>> >> > > to !gid_eq(new->egid, old->egid), but I just can't see a way changing
>> >> > > egid/fsgid to a group the process already has is a problem.
>> >> >
>> >> > I'm fine with leaving your patch as-is.
>> >>
>> >> Aside from a tested-by verification from Max, it looks like everyone
>> >> is satisfied with the v2 patch, yes?
>> >>
>> >> Serge, I see you've reviewed this patch, can I assume that now you
>> >> have a capabilities tree up and running you'll take this patch?
>> >
>> >I can take another look and consider taking it on Monday, but until
>> >then I'm effectively afk.
>>
>> I'd rather this go via the execve/binfmt tree. I was waiting for -rc2 before putting it into -next. I can do Sunday night after it's out. :)
>
>I'm not going to argue either way on this, that's between you and
>Serge, but as the entire patch is located within commoncap.c and that
>is part of the capabilities code which Serge maintains, can you
>explain why this should go via the execve/binfmt tree and not the
>capabilities tree?


Well, it's code exercised exclusively by the execve syscall (even the subject prefix says "exec"), but really I just want to be aware of changes in this area. How about we do what we're doing in vma, which is to list specific files in both entries in the MAINTAINERS file?

Anyway, sure, Serge, go ahead and take it. :)

Acked-by: Kees Cook <kees@kernel.org>


-- 
Kees Cook

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] exec: Correct the permission check for unsafe exec
  2025-06-16 19:57                             ` Kees Cook
@ 2025-06-16 20:16                               ` Paul Moore
  0 siblings, 0 replies; 44+ messages in thread
From: Paul Moore @ 2025-06-16 20:16 UTC (permalink / raw)
  To: Kees Cook
  Cc: Serge E. Hallyn, Jann Horn, Eric W. Biederman, Richard Guy Briggs,
	Max Kellermann, jmorris, Andy Lutomirski, morgan,
	Christian Brauner, linux-security-module, linux-kernel

On Mon, Jun 16, 2025 at 3:57 PM Kees Cook <kees@kernel.org> wrote:
> On June 13, 2025 8:28:46 AM PDT, Paul Moore <paul@paul-moore.com> wrote:
> >On Thu, Jun 12, 2025 at 9:48 PM Kees Cook <kees@kernel.org> wrote:
> >> On June 12, 2025 2:26:26 PM PDT, "Serge E. Hallyn" <serge@hallyn.com> wrote:
> >> >On Tue, Jun 10, 2025 at 08:18:56PM -0400, Paul Moore wrote:
> >> >> On Wed, May 21, 2025 at 11:36 AM Jann Horn <jannh@google.com> wrote:
> >> >> > On Wed, May 21, 2025 at 5:27 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> >> > > Jann Horn <jannh@google.com> writes:
> >> >> > >
> >> >> > > > On Wed, May 21, 2025 at 12:13 AM Eric W. Biederman
> >> >> > > > <ebiederm@xmission.com> wrote:
> >> >> > >
> >> >> > > > Looks good to me overall, thanks for figuring out the history of this
> >> >> > > > not-particularly-easy-to-understand code and figuring out the right
> >> >> > > > fix.
> >> >> > > >
> >> >> > > > Reviewed-by: Jann Horn <jannh@google.com>
> >> >> > > >
> >> >> > > >> @@ -917,7 +911,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
> >> >> > > >>         /* Process setpcap binaries and capabilities for uid 0 */
> >> >> > > >>         const struct cred *old = current_cred();
> >> >> > > >>         struct cred *new = bprm->cred;
> >> >> > > >> -       bool effective = false, has_fcap = false, is_setid;
> >> >> > > >> +       bool effective = false, has_fcap = false, id_changed;
> >> >> > > >>         int ret;
> >> >> > > >>         kuid_t root_uid;
> >> >> > > >>
> >> >> > > >> @@ -941,9 +935,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm, const struct file *file)
> >> >> > > >>          *
> >> >> > > >>          * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
> >> >> > > >>          */
> >> >> > > >> -       is_setid = __is_setuid(new, old) || __is_setgid(new, old);
> >> >> > > >> +       id_changed = !uid_eq(new->euid, old->euid) || !in_group_p(new->egid);
> >> >> > > >
> >> >> > > > Hm, so when we change from one EGID to another EGID which was already
> >> >> > > > in our groups list, we don't treat it as a privileged exec? Which is
> >> >> > > > okay because, while an unprivileged user would not just be allowed to
> >> >> > > > change their EGID to a GID from their groups list themselves through
> >> >> > > > __sys_setregid(), they would be allowed to create a new setgid binary
> >> >> > > > owned by a group from their groups list and then execute that?
> >> >> > > >
> >> >> > > > That's fine with me, though it seems a little weird to me. setgid exec
> >> >> > > > is changing our creds and yet we're not treating it as a "real" setgid
> >> >> > > > execution because the execution is only granting privileges that
> >> >> > > > userspace could have gotten anyway.
> >> >> > >
> >> >> > > More than could have gotten.  From permission checking point of view
> >> >> > > permission that the application already had.  In general group based
> >> >> > > permission checks just check in_group_p, which looks at cred->fsgid and
> >> >> > > the group.
> >> >> > >
> >> >> > > The logic is since the effective permissions of the running executable
> >> >> > > have not changed, there is nothing to special case.
> >> >> > >
> >> >> > > Arguably a setgid exec can drop what was egid, and if people have
> >> >> > > configured their permissions to deny people access based upon a group
> >> >> > > they are in that could change the result of the permission checks.  If
> >> >> > > changing egid winds up dropping a group from the list of the process's
> >> >> > > groups, the process could also have dropped that group with setresgid.
> >> >> > > So I don't think we need to be concerned about the combination of
> >> >> > > dropping egid and brpm->unsafe.
> >> >> > >
> >> >> > > If anyone sees a hole in that logic I am happy to change the check
> >> >> > > to !gid_eq(new->egid, old->egid), but I just can't see a way changing
> >> >> > > egid/fsgid to a group the process already has is a problem.
> >> >> >
> >> >> > I'm fine with leaving your patch as-is.
> >> >>
> >> >> Aside from a tested-by verification from Max, it looks like everyone
> >> >> is satisfied with the v2 patch, yes?
> >> >>
> >> >> Serge, I see you've reviewed this patch, can I assume that now you
> >> >> have a capabilities tree up and running you'll take this patch?
> >> >
> >> >I can take another look and consider taking it on Monday, but until
> >> >then I'm effectively afk.
> >>
> >> I'd rather this go via the execve/binfmt tree. I was waiting for -rc2 before putting it into -next. I can do Sunday night after it's out. :)
> >
> >I'm not going to argue either way on this, that's between you and
> >Serge, but as the entire patch is located within commoncap.c and that
> >is part of the capabilities code which Serge maintains, can you
> >explain why this should go via the execve/binfmt tree and not the
> >capabilities tree?
>
>
> Well, it's code exercised exclusively by the execve syscall (even the subject prefix says "exec"), but really I just want to be aware of changes in this area. How about we do what we're doing in vma, which is to list specific files in both entries in the MAINTAINERS file?

Double listing commoncap.c seems a bit heavyweight here and will
likely catch a lot of things unrelated to exec, but maybe some
additional K/regex fields in the exec section would make sense?

> Anyway, sure, Serge, go ahead and take it. :)
>
> Acked-by: Kees Cook <kees@kernel.org>

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2025-06-16 20:16 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-06  8:26 [PATCH] security/commoncap: don't assume "setid" if all ids are identical Max Kellermann
2025-03-07 10:32 ` kernel test robot
2025-03-09 15:19 ` Serge E. Hallyn
2025-04-28 11:43   ` Max Kellermann
2025-05-06 13:21     ` Serge E. Hallyn
2025-05-06 14:51       ` Max Kellermann
2025-05-07  3:16         ` Andrew G. Morgan
2025-05-07  6:33           ` Max Kellermann
2025-05-08  3:32             ` Andrew G. Morgan
2025-05-08  6:38               ` Max Kellermann
2025-05-08  8:37               ` Max Kellermann
2025-05-09 17:50             ` Max Kellermann
2025-05-08 22:12         ` sergeh
2025-05-09  6:15           ` Max Kellermann
2025-05-09 14:44             ` Eric W. Biederman
2025-05-09 16:53               ` Max Kellermann
2025-05-09 20:17                 ` Serge E. Hallyn
2025-05-09 18:41               ` [PATCH] Documentation/no_new_privs.rst: document dropping effective ids Max Kellermann
2025-05-15 16:24 ` [PATCH] exec: Correct the permission check for unsafe exec Eric W. Biederman
2025-05-15 22:09   ` Kees Cook
2025-05-16 15:26     ` Eric W. Biederman
2025-05-16 18:06       ` Jann Horn
2025-05-16 18:08         ` Jann Horn
2025-05-16 21:46         ` sergeh
2025-05-20 22:38           ` Jann Horn
2025-05-20 22:43             ` Kees Cook
2025-05-16 23:29         ` Eric W. Biederman
2025-05-20 20:20           ` Kees Cook
2025-05-20 22:13             ` [PATCH v2] " Eric W. Biederman
2025-05-20 22:35               ` Kees Cook
2025-05-20 23:53               ` Jann Horn
2025-05-21 15:27                 ` Eric W. Biederman
2025-05-21 15:36                   ` Jann Horn
2025-06-11  0:18                     ` Paul Moore
2025-06-11 14:23                       ` Max Kellermann
2025-06-13 15:07                         ` Eric W. Biederman
2025-06-12 21:26                       ` Serge E. Hallyn
2025-06-13  1:48                         ` Kees Cook
2025-06-13 15:28                           ` Paul Moore
2025-06-16 19:57                             ` Kees Cook
2025-06-16 20:16                               ` Paul Moore
2025-05-16 21:48     ` [PATCH] " sergeh
2025-05-16 21:49   ` sergeh
     [not found] <CAKPOu+_p6D-s9k_rokup+7JC-GDJMCxweBrR2JthLGtSufrUCQ@mail.gmail.com>
     [not found] ` <20250509180150.GA707254@mail.hallyn.com>
     [not found]   ` <CAKPOu+_MEX-KC-Mp+Pn0=mKRoNQcrAs0nEUmyU84EZC=2CMkXA@mail.gmail.com>
     [not found]     ` <20250509192642.GA707808@mail.hallyn.com>
     [not found]       ` <CAKPOu+-iV8daKn_TVR5QStoBNWD-MDpG6Bmj4zoe4QQL_PLJtw@mail.gmail.com>
     [not found]         ` <20250510130938.GA712334@mail.hallyn.com>
     [not found]           ` <CAKPOu+_U6AAf5MTSi8mmUTbLdGEw0w=5mqnaCZ3YjvxgCN6WeA@mail.gmail.com>
     [not found]             ` <20250510160809.GA713084@mail.hallyn.com>
     [not found]               ` <CAKPOu+9ooP6yH6qHgAEaX_3W+kGxYjJX9UuTeGS_BPc3BhTDyA@mail.gmail.com>
     [not found]                 ` <20250510192919.GA714187@mail.hallyn.com>
     [not found]                   ` <202505121725.34FC22D2D@keescook>
2025-05-13  6:25                     ` [PATCH] security/commoncap: don't assume "setid" if all ids are identical Max Kellermann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).