* [GIT PULL] execve updates for v6.13-rc1 (take 2)
@ 2024-11-21 14:53 Kees Cook
2024-11-25 23:40 ` Linus Torvalds
0 siblings, 1 reply; 21+ messages in thread
From: Kees Cook @ 2024-11-21 14:53 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, Alexander Viro, Christophe JAILLET,
Eric W. Biederman, Kees Cook, Nir Lichtman, Tycho Andersen,
Vegard Nossum
Hi Linus,
Please pull these execve updates for v6.13-rc1 (take 2). I've dropped
the argv[0] vs "comm" setting patches. We'll work on the better solution
for the next merge window.
Thanks!
-Kees
The following changes since commit 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b:
Linux 6.12-rc2 (2024-10-06 15:32:27 -0700)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git tags/execve-v6.13-rc1-take2
for you to fetch changes up to 45bf05a51842c4c274f94514195c2e99cc1c200c:
exec: remove legacy custom binfmt modules autoloading (2024-11-21 06:44:02 -0800)
----------------------------------------------------------------
execve updates for v6.13-rc1 (take2)
- binfmt_misc: Fix comment typos (Christophe JAILLET)
- exec: move empty argv[0] warning closer to actual logic (Nir Lichtman)
- exec: remove legacy custom binfmt modules autoloading (Nir Lichtman)
- coredump: Do not lock when copying "comm"
- MAINTAINERS: add auxvec.h and set myself as maintainer
----------------------------------------------------------------
Christophe JAILLET (1):
fs: binfmt: Fix a typo
Kees Cook (3):
coredump: Do not lock during 'comm' reporting
MAINTAINERS: exec: Add auxvec.h UAPI
MAINTAINERS: exec: Mark Kees as maintainer
Nir Lichtman (1):
exec: remove legacy custom binfmt modules autoloading
Tycho Andersen (1):
selftests/exec: add a test for execveat()'s comm
nir@lichtman.org (1):
exec: move warning of null argv to be next to the relevant code
MAINTAINERS | 3 +-
fs/binfmt_misc.c | 2 +-
fs/exec.c | 22 ++--------
include/linux/coredump.h | 4 +-
tools/testing/selftests/exec/execveat.c | 77 +++++++++++++++++++++++++++++++--
5 files changed, 83 insertions(+), 25 deletions(-)
--
Kees Cook
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-21 14:53 [GIT PULL] execve updates for v6.13-rc1 (take 2) Kees Cook @ 2024-11-25 23:40 ` Linus Torvalds 2024-11-26 5:09 ` Kees Cook 0 siblings, 1 reply; 21+ messages in thread From: Linus Torvalds @ 2024-11-25 23:40 UTC (permalink / raw) To: Kees Cook Cc: linux-kernel, Alexander Viro, Christophe JAILLET, Eric W. Biederman, Nir Lichtman, Tycho Andersen, Vegard Nossum On Thu, 21 Nov 2024 at 06:53, Kees Cook <kees@kernel.org> wrote: > > Please pull these execve updates for v6.13-rc1 (take 2). I've dropped > the argv[0] vs "comm" setting patches. We'll work on the better solution > for the next merge window. Yeah, I was pulling this, and then noted that the selftest is now documented to be that garbage. So I unpulled again. Linus ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-25 23:40 ` Linus Torvalds @ 2024-11-26 5:09 ` Kees Cook 2024-11-26 20:11 ` Linus Torvalds 0 siblings, 1 reply; 21+ messages in thread From: Kees Cook @ 2024-11-26 5:09 UTC (permalink / raw) To: Linus Torvalds Cc: linux-kernel, Alexander Viro, Christophe JAILLET, Eric W. Biederman, Nir Lichtman, Tycho Andersen, Vegard Nossum On November 26, 2024 9:40:22 AM GMT+10:00, Linus Torvalds <torvalds@linux-foundation.org> wrote: >On Thu, 21 Nov 2024 at 06:53, Kees Cook <kees@kernel.org> wrote: >> >> Please pull these execve updates for v6.13-rc1 (take 2). I've dropped >> the argv[0] vs "comm" setting patches. We'll work on the better solution >> for the next merge window. > >Yeah, I was pulling this, and then noted that the selftest is now >documented to be that garbage. > >So I unpulled again. Okay. I'd left it because I figured we'd be tweaking it for the new implementation, but yeah I can just rework it all then. For the new implementation, do you want to wait a full dev cycle for it to bake in -next or should I send what I proposed based on your and Al's suggestions for this merge window? -Kees -- Kees Cook -- Kees Cook ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-26 5:09 ` Kees Cook @ 2024-11-26 20:11 ` Linus Torvalds 2024-11-26 20:12 ` Linus Torvalds 2024-11-28 0:53 ` Kees Cook 0 siblings, 2 replies; 21+ messages in thread From: Linus Torvalds @ 2024-11-26 20:11 UTC (permalink / raw) To: Kees Cook Cc: linux-kernel, Alexander Viro, Christophe JAILLET, Eric W. Biederman, Nir Lichtman, Tycho Andersen, Vegard Nossum On Mon, 25 Nov 2024 at 21:10, Kees Cook <kees@kernel.org> wrote: > > For the new implementation, do you want to wait a full dev cycle for > it to bake in -next or should I send what I proposed based on your and > Al's suggestions for this merge window? So honestly, the more I look at our current implementation, the more I dislike this code. And it looks like __set_task_comm() is actually buggy right now, because while we have a comment in linux/sched.h that says * The strscpy_pad() in __set_task_comm() can ensure that the task comm is * always NUL-terminated and zero-padded. that isn't actually true, because it looks like sized_strscpy() actually adds the final NUL at the end. I think that's because Andrew only merged a partial patch series. The task_lock() doesn't help that issue, because readers don't take it (and never really did: the '%s'+tsk->comm pattern has always existed). End result: I think we should get rid of the pointless task_lock, explicitly make sure it's NUL-terminated, and do the actual comm[] setup in alloc_bprm() where we make all these decisions anyway. So something like the attached. But it's *ENTIRELY* untested. It looks ObviouslyCorrect(tm), but hey, things always do until somebody finds some obvious bug. If this tests ok, I think it could make 6.13, but ... And I'm looking at the other uses of bprm->filename for the fdpath case, and it's all horrible. Yes, it's the scripting, but it's also bprm->exec + AT_EXECFN. So we're passing in those fake pathnames that won't actually work if the fd is used for something else, and that I think could be used as an attack vector if any user space actually trusts them. That all looks horrid. I *really* wish we generated the real pathname instead. Oh well. That's a separate issue. Linus ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-26 20:11 ` Linus Torvalds @ 2024-11-26 20:12 ` Linus Torvalds 2024-11-28 0:53 ` Kees Cook 1 sibling, 0 replies; 21+ messages in thread From: Linus Torvalds @ 2024-11-26 20:12 UTC (permalink / raw) To: Kees Cook Cc: linux-kernel, Alexander Viro, Christophe JAILLET, Eric W. Biederman, Nir Lichtman, Tycho Andersen, Vegard Nossum [-- Attachment #1: Type: text/plain, Size: 238 bytes --] On Tue, 26 Nov 2024 at 12:11, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > So something like the attached. But it's *ENTIRELY* untested. [..] It was also entirely un-attached. Here's the actual attachment. Linus [-- Attachment #2: patch.diff --] [-- Type: text/x-patch, Size: 2623 bytes --] fs/exec.c | 27 +++++++++++++++++++++++---- include/linux/binfmts.h | 1 + 2 files changed, 24 insertions(+), 4 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index da51ca70489a..850421445ccb 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1194,12 +1194,22 @@ static int unshare_sighand(struct task_struct *me) * so that a new one can be started */ +/* + * No locking, instead this just makes sure that comm[] is always + * NUL-terminated. Note that 'strscpy_pad()' does not guarantee + * that at least right now - it may do the copy without a NUL and + * then overwrite the last character. + * + * One of the few cases where 'strncpy()' does the right thing. + * + * If you do concurrent set_task_comm() calls, the name may be + * a bit random, but it's going to be NUL-terminated. + */ void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec) { - task_lock(tsk); trace_task_rename(tsk, buf); - strscpy_pad(tsk->comm, buf, sizeof(tsk->comm)); - task_unlock(tsk); + tsk->comm[sizeof(tsk->comm)-1] = 0; + strncpy(tsk->comm, buf, sizeof(tsk->comm)-1); perf_event_comm(tsk, exec); } @@ -1337,7 +1347,7 @@ int begin_new_exec(struct linux_binprm * bprm) set_dumpable(current->mm, SUID_DUMP_USER); perf_event_exec(); - __set_task_comm(me, kbasename(bprm->filename), true); + __set_task_comm(me, bprm->comm, true); /* An exec changes our domain. We are no longer part of the thread group */ @@ -1510,6 +1520,11 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl if (fd == AT_FDCWD || filename->name[0] == '/') { bprm->filename = filename->name; + /* + * FIXME(?): this could take advantage of 'filename->len' + * to do better than kbasename() + */ + strscpy(bprm->comm, kbasename(filename->name)); } else { if (filename->name[0] == '\0') bprm->fdpath = kasprintf(GFP_KERNEL, "/dev/fd/%d", fd); @@ -1532,6 +1547,10 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE; bprm->filename = bprm->fdpath; + + rcu_read_lock(); + strscpy(bprm->comm, smp_load_acquire(&file->f_path.dentry->d_name.name)); + rcu_read_unlock(); } bprm->interp = bprm->filename; diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index e6c00e860951..3c8bc84ca9e4 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -61,6 +61,7 @@ struct linux_binprm { struct rlimit rlim_stack; /* Saved RLIMIT_STACK used during exec. */ + char comm[TASK_COMM_LEN]; char buf[BINPRM_BUF_SIZE]; } __randomize_layout; ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-26 20:11 ` Linus Torvalds 2024-11-26 20:12 ` Linus Torvalds @ 2024-11-28 0:53 ` Kees Cook 2024-11-28 1:59 ` Linus Torvalds 1 sibling, 1 reply; 21+ messages in thread From: Kees Cook @ 2024-11-28 0:53 UTC (permalink / raw) To: Linus Torvalds Cc: linux-kernel, Alexander Viro, Christophe JAILLET, Eric W. Biederman, Nir Lichtman, Tycho Andersen, Vegard Nossum On Tue, Nov 26, 2024 at 12:11:06PM -0800, Linus Torvalds wrote: > And it looks like __set_task_comm() is actually buggy right now, > because while we have a comment in linux/sched.h that says > > * The strscpy_pad() in __set_task_comm() can ensure that the task comm is > * always NUL-terminated and zero-padded. > > that isn't actually true, because it looks like sized_strscpy() > actually adds the final NUL at the end. I think that's because Andrew > only merged a partial patch series. Oh ew, yes, it looks like it copies the last byte, checks if it was NUL, and then NULs it if it ran out of space. Ugh. Before for comm we used strlen + memcpy, and so the trailing NUL was always present. Grumble grumble. I'll look at your patch and will dig around and see what works best. I'm on vacation currently, so there will be some latency... On a related note, what do you think of using execveat's "pathname" argument as "comm" if AT_EMPTY_PATH is set? That'll give process launchers control over comm (which is what they want), and we can keep the dentry name fallback as proposed too? -Kees -- Kees Cook ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-28 0:53 ` Kees Cook @ 2024-11-28 1:59 ` Linus Torvalds 2024-11-28 2:05 ` Al Viro 0 siblings, 1 reply; 21+ messages in thread From: Linus Torvalds @ 2024-11-28 1:59 UTC (permalink / raw) To: Kees Cook Cc: linux-kernel, Alexander Viro, Christophe JAILLET, Eric W. Biederman, Nir Lichtman, Tycho Andersen, Vegard Nossum On Wed, 27 Nov 2024 at 16:53, Kees Cook <kees@kernel.org> wrote: > > On a related note, what do you think of using execveat's "pathname" > argument as "comm" if AT_EMPTY_PATH is set? That'll give process > launchers control over comm (which is what they want), and we can keep > the dentry name fallback as proposed too? That's not actually how AT_EMPTY_PATH works. Yes, it's how AT_EMPTY_PATH *should* work, but despite the name, AT_EMPTYH_PATH does not mean "path is empty". It means "path *may* be empty - but if path isn't empty, it's a regular path". IOW, what is going on is that POSIX required that an empty path be an error. And AT_EMPTY_PATH is basically a "don't error out on an empty path" flag, not a "path *is* empty" flag. So if pathname exists and isn't empty, AT_EMPTY_PATH does nothing. Linus ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-28 1:59 ` Linus Torvalds @ 2024-11-28 2:05 ` Al Viro 2024-11-28 2:24 ` Linus Torvalds 0 siblings, 1 reply; 21+ messages in thread From: Al Viro @ 2024-11-28 2:05 UTC (permalink / raw) To: Linus Torvalds Cc: Kees Cook, linux-kernel, Christophe JAILLET, Eric W. Biederman, Nir Lichtman, Tycho Andersen, Vegard Nossum On Wed, Nov 27, 2024 at 05:59:53PM -0800, Linus Torvalds wrote: > On Wed, 27 Nov 2024 at 16:53, Kees Cook <kees@kernel.org> wrote: > > > > On a related note, what do you think of using execveat's "pathname" > > argument as "comm" if AT_EMPTY_PATH is set? That'll give process > > launchers control over comm (which is what they want), and we can keep > > the dentry name fallback as proposed too? > > That's not actually how AT_EMPTY_PATH works. > > Yes, it's how AT_EMPTY_PATH *should* work, but despite the name, > AT_EMPTYH_PATH does not mean "path is empty". > > It means "path *may* be empty - but if path isn't empty, it's a regular path". > > IOW, what is going on is that POSIX required that an empty path be an > error. And AT_EMPTY_PATH is basically a "don't error out on an empty > path" flag, not a "path *is* empty" flag. > > So if pathname exists and isn't empty, AT_EMPTY_PATH does nothing. ... so let's tie that to pathname _being_ empty - it's not as if it had been hard to check. What's more, let's allow userland pointer to be NULL - use getname_maybe_null() and treat NULL returned by it as "we have an empty pathname". ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-28 2:05 ` Al Viro @ 2024-11-28 2:24 ` Linus Torvalds 2024-11-29 2:08 ` Kees Cook 0 siblings, 1 reply; 21+ messages in thread From: Linus Torvalds @ 2024-11-28 2:24 UTC (permalink / raw) To: Al Viro Cc: Kees Cook, linux-kernel, Christophe JAILLET, Eric W. Biederman, Nir Lichtman, Tycho Andersen, Vegard Nossum On Wed, 27 Nov 2024 at 18:06, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > So if pathname exists and isn't empty, AT_EMPTY_PATH does nothing. > > ... so let's tie that to pathname _being_ empty - it's not as if it > had been hard to check. This is not some kind of new system call, and AT_EMPTY_PATH isn't some Linux-only thing. It has well-defined and documented semantics: AT_EMPTY_PATH If this flag is specified, oldname can be an empty string. Note the "can be". Not "will/must be". > What's more, let's allow userland pointer to be NULL - use getname_maybe_null() > and treat NULL returned by it as "we have an empty pathname". Now, that's separate, and I agree with that extension. That just suppresses another "empty string" error case. But no, I do not accept changing well-documented behaviour of AT_EMPTY_PATH, much less the insanity of making "execveat()" have completely different semantics for AT_EMPTY_PATH than a plain openat. Linus ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-28 2:24 ` Linus Torvalds @ 2024-11-29 2:08 ` Kees Cook 2024-11-29 2:43 ` Linus Torvalds 0 siblings, 1 reply; 21+ messages in thread From: Kees Cook @ 2024-11-29 2:08 UTC (permalink / raw) To: Linus Torvalds, Al Viro Cc: linux-kernel, Christophe JAILLET, Eric W. Biederman, Nir Lichtman, Tycho Andersen, Vegard Nossum On November 28, 2024 12:24:27 PM GMT+10:00, Linus Torvalds <torvalds@linux-foundation.org> wrote: >But no, I do not accept changing well-documented behaviour of >AT_EMPTY_PATH, much less the insanity of making "execveat()" have >completely different semantics for AT_EMPTY_PATH than a plain openat. Yeah, that was going to be the next question. ;) Okay, so, that last thing I can see here as an option is to add an exec-specific AT flag, and if that isn't workable I don't see a way to make this happen with execveat. -Kees -- Kees Cook ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-29 2:08 ` Kees Cook @ 2024-11-29 2:43 ` Linus Torvalds 2024-11-29 3:34 ` Al Viro 0 siblings, 1 reply; 21+ messages in thread From: Linus Torvalds @ 2024-11-29 2:43 UTC (permalink / raw) To: Kees Cook Cc: Al Viro, linux-kernel, Christophe JAILLET, Eric W. Biederman, Nir Lichtman, Tycho Andersen, Vegard Nossum On Thu, 28 Nov 2024 at 18:08, Kees Cook <kees@kernel.org> wrote: > > Okay, so, that last thing I can see here as an option is to add an exec-specific AT flag, and if that isn't workable I don't see a way to make this happen with execveat. So Kees, ask yourself WHY you want to do this idiotic thing in the first place. You say "isn't workable" and "I don't see a way to make this happen with execveat". But be HONEST now. Really - take a step back, and ask yourself what are actually sane semantics. And do NOT think about the insane semantics that you have been spoon-fed by some insane and broken argument where somebody told you what they wanted and said "this is what systemd needs". A project that is *famous* for doing things wrong? Instead, ask yourself that the *sane* semantics are, when you DON'T come from them from "somebody crazy told me what they want". Look at regular "execve()". Really. What is "comm[]" when you do an execveat? It's THE NAME OF THE EXECUTABLE. It's not "some different thing that was passed in separately". It wasn't argv[0] for regular execve(), was it? Even though regular execve() very much also has an argv[0]. So if it wasn't argv[0] for regular execve(), what *is* it, really? And what is the equivalent for execveat(fd, "", AT_EMPTY_PATH)? Can you ask yourself that WITHOUT the spoon-fed answer - hint, it was wrong - that you have been arguing for. Here's a hint: it's the name of the executable you are running. That's it. Really. It clearly was NEVER "argv[0]". That has nothing to do with the name of the executable, has nothing to do with what regular execve() does, and clearly has no relationship with what you are executing. IO tried to tell you. Then I tried to *show* you how even in the kernel sources we had very real examples of "execveat()" that had a argv[0] that was entirely unrelated to the executable. You seem to not have figured it out. Here's another hint: it's not "", and it's not "some other string I pass in with a separate flag to give that string meaning". It's also not the name of the symlink that was dereferenced in an earlier system call and that resulted in that 'fd' file descriptor. That symlink is gone. It might literally not exist any more by the time you do the execveat(), but even if it did, the very act of opening it made it irrelevant. The open followed the symlink, the symlink is done with. That's how symlinks work, and it's literally the difference between a symlink and a hardlink. And I claim that if you are actually honest about it, you'll come up with the name that I told you about originally. Because that dentry->d_name really *IS* the name of the file that is associated with the 'fd'. So here's the deal: stop making excuses. Stop ignoring reality. Stop this idiocy of trying to impose some new meaning on 'comm[]' that makes no sense, and has *NOTHING* to do with a regular 'execve()'. And furthermore, start looking at your own motivations. Why are you taking at face value the word of somebody who claims that "this is what systemd wants, and the symlink name is magical"? Face the fact that maybe systemd is famous for bad decisions, and this is just another example of it. A sane setup has lots of options - just use execve() with the actual name of the executable - use hardlinks and use execveat() - use open() on a symlink and then execveat() of that file, and get the actual name of the executable behind the symlink - disagree about comm[] being relevant at all, and don't use it, and don't use tools that do and none of those are wrong decisions. But what *is* wrong is to argue for something insane because you didn't accept the sane end result. Spend your time trying to convince the systemd people to do the sane thing instead of trying to make the kernel do something stupid, ok? Because I'm done with this argument. If I get more stupid emails on this, I will just block you. Because my time isn't worth arguing with flat-earthers. Learn to accept reality. In the kernel we have some taste. Linus ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-29 2:43 ` Linus Torvalds @ 2024-11-29 3:34 ` Al Viro 2024-11-29 4:23 ` Eric W. Biederman 2024-11-29 4:44 ` Linus Torvalds 0 siblings, 2 replies; 21+ messages in thread From: Al Viro @ 2024-11-29 3:34 UTC (permalink / raw) To: Linus Torvalds Cc: Kees Cook, linux-kernel, Christophe JAILLET, Eric W. Biederman, Nir Lichtman, Tycho Andersen, Vegard Nossum On Thu, Nov 28, 2024 at 06:43:31PM -0800, Linus Torvalds wrote: > A sane setup has lots of options > > - just use execve() with the actual name of the executable > > - use hardlinks and use execveat() > > - use open() on a symlink and then execveat() of that file, and get > the actual name of the executable behind the symlink > > - disagree about comm[] being relevant at all, and don't use it, and > don't use tools that do > > and none of those are wrong decisions. Just one thing - IMO we want to use the relative pathname when it's not empty. Even in execveat(). Because some wanker *will* decide that newer is better and make libc use execveat() to implement execve(). Which won't be spotted for about a year, and when it does we'll get seriously stuck. I agree that for fexecve() the only sane approach is to go by whatever that opened file refers to; I'm not sold on the _usefulness_ of fexecve() to start with, but if we want that thing, that's the way to go. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-29 3:34 ` Al Viro @ 2024-11-29 4:23 ` Eric W. Biederman 2024-11-29 4:48 ` Al Viro ` (2 more replies) 2024-11-29 4:44 ` Linus Torvalds 1 sibling, 3 replies; 21+ messages in thread From: Eric W. Biederman @ 2024-11-29 4:23 UTC (permalink / raw) To: Casey Schaufler Cc: Linus Torvalds, Kees Cook, linux-kernel, Christophe JAILLET, Nir Lichtman, Tycho Andersen, Vegard Nossum, linux-security-module, Al Viro Al Viro <viro@zeniv.linux.org.uk> writes: > On Thu, Nov 28, 2024 at 06:43:31PM -0800, Linus Torvalds wrote: >> A sane setup has lots of options >> >> - just use execve() with the actual name of the executable >> >> - use hardlinks and use execveat() >> >> - use open() on a symlink and then execveat() of that file, and get >> the actual name of the executable behind the symlink >> >> - disagree about comm[] being relevant at all, and don't use it, and >> don't use tools that do >> >> and none of those are wrong decisions. > > Just one thing - IMO we want to use the relative pathname when it's > not empty. Even in execveat(). Because some wanker *will* decide > that newer is better and make libc use execveat() to implement execve(). > Which won't be spotted for about a year, and when it does we'll get > seriously stuck. For clarity the only patches I have seen so far have been about the fexecve subset of execveat. AKA when execveat is has not been supplied a path. > I agree that for fexecve() the only sane approach is to go by whatever > that opened file refers to; I'm not sold on the _usefulness_ of > fexecve() to start with, but if we want that thing, that's the way > to go. The craziness is that apparently systemd wants to implement execve in terms of fexecve, not execveat. ... Wanting to see what is going on I cloned the most recent version of systemd. I see some calls that are generally redundant. That is systemd opens the executable and fstat's the executable to make certain the executable is a regular file and not a directory symlink. Which seems harmless but pointless unless something is interesting is going to happen with the executable_fd before it is passed to the kernel to execute. The only case in systemd that does something interesting with the executable_fd (that I could see) has to do with smack. Please see the code below. Casey do you have any idea why systemd would want to read the label from the executable and apply it to the current process? Is the systemd smack support sensible? As it is written this seems like the kind of logic every process that calls execve will want to repeat. So I am wondering isn't it easier just to get the kernel to do the right thing and not have deeply special smack code in systemd? Does the kernel already do the right thing and systemd is just confused? Right now it looks like the sane path is to sort out the systemd's smack support and then systemd should be able to continue using execve like any sane program. #if ENABLE_SMACK static int setup_smack( const ExecParameters *params, const ExecContext *context, int executable_fd) { int r; assert(params); assert(executable_fd >= 0); if (context->smack_process_label) { r = mac_smack_apply_pid(0, context->smack_process_label); if (r < 0) return r; } else if (params->fallback_smack_process_label) { _cleanup_free_ char *exec_label = NULL; r = mac_smack_read_fd(executable_fd, SMACK_ATTR_EXEC, &exec_label); if (r < 0 && !ERRNO_IS_XATTR_ABSENT(r)) return r; r = mac_smack_apply_pid(0, exec_label ?: params->fallback_smack_process_label); if (r < 0) return r; } return 0; } #endif Eric ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-29 4:23 ` Eric W. Biederman @ 2024-11-29 4:48 ` Al Viro 2024-11-29 17:00 ` Casey Schaufler 2024-11-29 19:33 ` Eric W. Biederman 2 siblings, 0 replies; 21+ messages in thread From: Al Viro @ 2024-11-29 4:48 UTC (permalink / raw) To: Eric W. Biederman Cc: Casey Schaufler, Linus Torvalds, Kees Cook, linux-kernel, Christophe JAILLET, Nir Lichtman, Tycho Andersen, Vegard Nossum, linux-security-module On Thu, Nov 28, 2024 at 10:23:18PM -0600, Eric W. Biederman wrote: > > I agree that for fexecve() the only sane approach is to go by whatever > > that opened file refers to; I'm not sold on the _usefulness_ of > > fexecve() to start with, but if we want that thing, that's the way > > to go. > > The craziness is that apparently systemd wants to implement execve in > terms of fexecve, not execveat. ... presumably because the pathname might have changed its meaning just as we called execve(). Which is why we want it to show up in comm, got it. </sarcasm> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-29 4:23 ` Eric W. Biederman 2024-11-29 4:48 ` Al Viro @ 2024-11-29 17:00 ` Casey Schaufler 2024-11-29 19:33 ` Eric W. Biederman 2 siblings, 0 replies; 21+ messages in thread From: Casey Schaufler @ 2024-11-29 17:00 UTC (permalink / raw) To: Eric W. Biederman Cc: Linus Torvalds, Kees Cook, linux-kernel, Christophe JAILLET, Nir Lichtman, Tycho Andersen, Vegard Nossum, linux-security-module, Al Viro, Casey Schaufler On 11/28/2024 8:23 PM, Eric W. Biederman wrote: > Al Viro <viro@zeniv.linux.org.uk> writes: > >> On Thu, Nov 28, 2024 at 06:43:31PM -0800, Linus Torvalds wrote: >>> A sane setup has lots of options >>> >>> - just use execve() with the actual name of the executable >>> >>> - use hardlinks and use execveat() >>> >>> - use open() on a symlink and then execveat() of that file, and get >>> the actual name of the executable behind the symlink >>> >>> - disagree about comm[] being relevant at all, and don't use it, and >>> don't use tools that do >>> >>> and none of those are wrong decisions. >> Just one thing - IMO we want to use the relative pathname when it's >> not empty. Even in execveat(). Because some wanker *will* decide >> that newer is better and make libc use execveat() to implement execve(). >> Which won't be spotted for about a year, and when it does we'll get >> seriously stuck. > For clarity the only patches I have seen so far have been > about the fexecve subset of execveat. AKA when execveat is has > not been supplied a path. > >> I agree that for fexecve() the only sane approach is to go by whatever >> that opened file refers to; I'm not sold on the _usefulness_ of >> fexecve() to start with, but if we want that thing, that's the way >> to go. > The craziness is that apparently systemd wants to implement execve in > terms of fexecve, not execveat. > > .. > > Wanting to see what is going on I cloned the most recent version of > systemd. I see some calls that are generally redundant. That is > systemd opens the executable and fstat's the executable to make > certain the executable is a regular file and not a directory symlink. > > Which seems harmless but pointless unless something is interesting > is going to happen with the executable_fd before it is passed to > the kernel to execute. > > The only case in systemd that does something interesting with the > executable_fd (that I could see) has to do with smack. Please see > the code below. > > Casey do you have any idea why systemd would want to read the label from > the executable and apply it to the current process? Is the systemd > smack support sensible? Smack supports an attribute SMACK64EXEC, which identifies the label the program should run with. I haven't looked at how fexecve() treats this. I would think that fexecve() would have to respect this behavior just as it would support setuid and setgid bits. I am looking at the systemd code and it emulates the exec() behavior, so regardless of the fexecve() behavior the program will run with the correct label. > As it is written this seems like the kind of logic every process that > calls execve will want to repeat. > > So I am wondering isn't it easier just to get the kernel to do the right > thing and not have deeply special smack code in systemd? Does the > kernel already do the right thing and systemd is just confused? The reason systemd emulates the exec() behavior is to allow for the case where systemd starts all processes with a particular label. I don't know why it uses fexecve(). > Right now it looks like the sane path is to sort out the systemd's > smack support and then systemd should be able to continue using > execve like any sane program. > > #if ENABLE_SMACK > static int setup_smack( > const ExecParameters *params, > const ExecContext *context, > int executable_fd) { > int r; > > assert(params); > assert(executable_fd >= 0); > > if (context->smack_process_label) { > r = mac_smack_apply_pid(0, context->smack_process_label); > if (r < 0) > return r; > } else if (params->fallback_smack_process_label) { > _cleanup_free_ char *exec_label = NULL; > > r = mac_smack_read_fd(executable_fd, SMACK_ATTR_EXEC, &exec_label); > if (r < 0 && !ERRNO_IS_XATTR_ABSENT(r)) > return r; > > r = mac_smack_apply_pid(0, exec_label ?: params->fallback_smack_process_label); > if (r < 0) > return r; > } > > return 0; > } > #endif > > Eric > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-29 4:23 ` Eric W. Biederman 2024-11-29 4:48 ` Al Viro 2024-11-29 17:00 ` Casey Schaufler @ 2024-11-29 19:33 ` Eric W. Biederman 2 siblings, 0 replies; 21+ messages in thread From: Eric W. Biederman @ 2024-11-29 19:33 UTC (permalink / raw) To: Casey Schaufler Cc: Linus Torvalds, Kees Cook, linux-kernel, Christophe JAILLET, Nir Lichtman, Tycho Andersen, Vegard Nossum, linux-security-module, Al Viro Casey and the smack folks my apologies for copying you in. I just read the code below a little more carefully and it is definitely a systemd bug. mac_smack_read_fd reads the xattr that smack will apply as a label if it is present. So there is no reason for systemd to apply the label itself. Worse smack_bprm_creds_for_exec applies checks before applying the label (aka is the superblock trusted) that systemd doesn't. Which means systemd might apply a label from a smack xattr when smack wouldn't. > static int setup_smack( > const ExecParameters *params, > const ExecContext *context, > int executable_fd) { > int r; > > assert(params); > assert(executable_fd >= 0); > > if (context->smack_process_label) { > r = mac_smack_apply_pid(0, context->smack_process_label); > if (r < 0) > return r; > } else if (params->fallback_smack_process_label) { > _cleanup_free_ char *exec_label = NULL; > > r = mac_smack_read_fd(executable_fd, SMACK_ATTR_EXEC, &exec_label); > if (r < 0 && !ERRNO_IS_XATTR_ABSENT(r)) > return r; > > r = mac_smack_apply_pid(0, exec_label ?: params->fallback_smack_process_label); > if (r < 0) > return r; > } > > return 0; > } Which means the systemd code should really be: > static int setup_smack( > const ExecParameters *params, > const ExecContext *context) { > int r; > > assert(params); > if (context->smack_process_label) { > r = mac_smack_apply_pid(0, context->smack_process_label); > if (r < 0) > return r; > } else if (params->fallback_smack_process_label) { > r = mac_smack_apply_pid(0, params->fallback_smack_process_label); > if (r < 0) > return r; > } > > return 0; > } At which point systemd has no need to open the executable file descriptor and thus no need to play with fexecve. Eric ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-29 3:34 ` Al Viro 2024-11-29 4:23 ` Eric W. Biederman @ 2024-11-29 4:44 ` Linus Torvalds 2024-11-29 12:41 ` Eric W. Biederman 2024-11-29 21:42 ` Vegard Nossum 1 sibling, 2 replies; 21+ messages in thread From: Linus Torvalds @ 2024-11-29 4:44 UTC (permalink / raw) To: Al Viro Cc: Kees Cook, linux-kernel, Christophe JAILLET, Eric W. Biederman, Nir Lichtman, Tycho Andersen, Vegard Nossum On Thu, 28 Nov 2024 at 19:34, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Just one thing - IMO we want to use the relative pathname when it's > not empty. Even in execveat() Oh, absolutely agreed. Good catch, because yes, I messed that part up in my suggested patch at https://lore.kernel.org/all/CAHk-=wjF_09Z6vu7f8UAbQVDDoHnd-j391YpUxmBPLD=SKbKtQ@mail.gmail.com/ which did this dentry name thing for anything that used a base fd, but yes, as you say, it should only do it when there is no name at all. So instead of basing it (incorrectly) on that existing if (fd == AT_FDCWD || filename->name[0] == '/') { test, the logic should probably look something like if (!filename->name[0]) { rcu_read_lock(); strscpy(bprm->comm, smp_load_acquire(&file->f_path.dentry->d_name.name)); rcu_read_unlock(); } else strscpy(bprm->comm, kbasename(filename->name)); and it probably wouldn't be a bad idea to separate this out to be a helper function that just does this one thing. Linus ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-29 4:44 ` Linus Torvalds @ 2024-11-29 12:41 ` Eric W. Biederman 2024-11-29 21:42 ` Vegard Nossum 1 sibling, 0 replies; 21+ messages in thread From: Eric W. Biederman @ 2024-11-29 12:41 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Kees Cook, linux-kernel, Christophe JAILLET, Nir Lichtman, Tycho Andersen, Vegard Nossum Linus Torvalds <torvalds@linux-foundation.org> writes: > On Thu, 28 Nov 2024 at 19:34, Al Viro <viro@zeniv.linux.org.uk> wrote: >> >> Just one thing - IMO we want to use the relative pathname when it's >> not empty. Even in execveat() > > Oh, absolutely agreed. > > Good catch, because yes, I messed that part up in my suggested patch at > > https://lore.kernel.org/all/CAHk-=wjF_09Z6vu7f8UAbQVDDoHnd-j391YpUxmBPLD=SKbKtQ@mail.gmail.com/ > > which did this dentry name thing for anything that used a base fd, but > yes, as you say, it should only do it when there is no name at all. > > So instead of basing it (incorrectly) on that existing > > if (fd == AT_FDCWD || filename->name[0] == '/') { > > test, the logic should probably look something like > > if (!filename->name[0]) { > rcu_read_lock(); > strscpy(bprm->comm, > smp_load_acquire(&file->f_path.dentry->d_name.name)); > rcu_read_unlock(); > } else > strscpy(bprm->comm, kbasename(filename->name)); > > and it probably wouldn't be a bad idea to separate this out to be a > helper function that just does this one thing. Yes. It probably makes sense to change __set_task_comm to something simple like: void __set_task_comm(struct task_struct *tsk, const char *buf[TASK_COMM_LEN], bool exec) { trace_task_rename(tsk, buf); memcpy(tsk->comm, buf, TASK_COMM_LEN); perf_event_comm(tsk, exec); } There are only 6 callers of set_task_comm and __set_task_comm and they are straight forward to verify that they pass in a TASK_COMM_LEN bytes that are already initialized. With the bytes already properly initialized just copying them looks like it is as safe as anything else when it comes to races. The callers need to be tweaked a little to meet that condition something like: diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c index a38f36b68060..5d0928f37471 100644 --- a/io_uring/io-wq.c +++ b/io_uring/io-wq.c @@ -634,7 +634,7 @@ static int io_wq_worker(void *data) struct io_wq_acct *acct = io_wq_get_acct(worker); struct io_wq *wq = worker->wq; bool exit_mask = false, last_timeout = false; - char buf[TASK_COMM_LEN]; + char buf[TASK_COMM_LEN] = {}; set_mask_bits(&worker->flags, 0, BIT(IO_WORKER_F_UP) | BIT(IO_WORKER_F_RUNNING)); diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c index 6df5e649c413..701390bbb303 100644 --- a/io_uring/sqpoll.c +++ b/io_uring/sqpoll.c @@ -264,7 +264,7 @@ static int io_sq_thread(void *data) struct io_ring_ctx *ctx; struct rusage start; unsigned long timeout = 0; - char buf[TASK_COMM_LEN]; + char buf[TASK_COMM_LEN] = {}; DEFINE_WAIT(wait); /* offload context creation failed, just exit */ diff --git a/kernel/kthread.c b/kernel/kthread.c index a5ac612b1609..2b126835d728 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -738,10 +738,11 @@ EXPORT_SYMBOL(kthread_stop_put); int kthreadd(void *unused) { + static const char comm[TASK_COMM_LEN] = "kthreadd"; struct task_struct *tsk = current; /* Setup a clean context for our children to inherit. */ - set_task_comm(tsk, "kthreadd"); + set_task_comm(tsk, comm); ignore_signals(tsk); set_cpus_allowed_ptr(tsk, housekeeping_cpumask(HK_TYPE_KTHREAD)); set_mems_allowed(node_states[N_MEMORY]); Eric ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-29 4:44 ` Linus Torvalds 2024-11-29 12:41 ` Eric W. Biederman @ 2024-11-29 21:42 ` Vegard Nossum 2024-11-29 22:54 ` Al Viro 2024-11-30 4:24 ` Linus Torvalds 1 sibling, 2 replies; 21+ messages in thread From: Vegard Nossum @ 2024-11-29 21:42 UTC (permalink / raw) To: Linus Torvalds, Al Viro Cc: Kees Cook, linux-kernel, Christophe JAILLET, Eric W. Biederman, Nir Lichtman, Tycho Andersen On 29/11/2024 05:44, Linus Torvalds wrote: > On Thu, 28 Nov 2024 at 19:34, Al Viro <viro@zeniv.linux.org.uk> wrote: >> >> Just one thing - IMO we want to use the relative pathname when it's >> not empty. Even in execveat() > > Oh, absolutely agreed. > > Good catch, because yes, I messed that part up in my suggested patch at > > https://lore.kernel.org/all/CAHk-=wjF_09Z6vu7f8UAbQVDDoHnd-j391YpUxmBPLD=SKbKtQ@mail.gmail.com/ > > which did this dentry name thing for anything that used a base fd, but > yes, as you say, it should only do it when there is no name at all. > > So instead of basing it (incorrectly) on that existing > > if (fd == AT_FDCWD || filename->name[0] == '/') { > > test, the logic should probably look something like > > if (!filename->name[0]) { > rcu_read_lock(); > strscpy(bprm->comm, > smp_load_acquire(&file->f_path.dentry->d_name.name)); > rcu_read_unlock(); > } else > strscpy(bprm->comm, kbasename(filename->name)); > > and it probably wouldn't be a bad idea to separate this out to be a > helper function that just does this one thing. Probably a silly question, but why not do the same thing in all cases? file->f_path.dentry->d_name.name _is_ the basename of whatever the normal path resolution machinery ended up with; why not use just it unconditionally? It handled empty paths already. This also seems to match what you wrote in https://lore.kernel.org/all/CAHk-=wgKgi5eqo6oW0bBS2-Cr+d4jraoKfVq6wbmdiWWsZbMLw@mail.gmail.com/ (no argv[0], no magic strings, no symlinks): > dentry->d_name really *IS* the name of the file that is associated > with the 'fd'. Vegard ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-29 21:42 ` Vegard Nossum @ 2024-11-29 22:54 ` Al Viro 2024-11-30 4:24 ` Linus Torvalds 1 sibling, 0 replies; 21+ messages in thread From: Al Viro @ 2024-11-29 22:54 UTC (permalink / raw) To: Vegard Nossum Cc: Linus Torvalds, Kees Cook, linux-kernel, Christophe JAILLET, Eric W. Biederman, Nir Lichtman, Tycho Andersen On Fri, Nov 29, 2024 at 10:42:47PM +0100, Vegard Nossum wrote: > Probably a silly question, but why not do the same thing in all cases? > file->f_path.dentry->d_name.name _is_ the basename of whatever the > normal path resolution machinery ended up with; why not use just it > unconditionally? It handled empty paths already. This also seems to > match what you wrote in https://lore.kernel.org/all/CAHk-=wgKgi5eqo6oW0bBS2-Cr+d4jraoKfVq6wbmdiWWsZbMLw@mail.gmail.com/ > (no argv[0], no magic strings, no symlinks): User-visible change of behaviour, quite possibly confusing existing scripts... Let's not go there, it's a silly place. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [GIT PULL] execve updates for v6.13-rc1 (take 2) 2024-11-29 21:42 ` Vegard Nossum 2024-11-29 22:54 ` Al Viro @ 2024-11-30 4:24 ` Linus Torvalds 1 sibling, 0 replies; 21+ messages in thread From: Linus Torvalds @ 2024-11-30 4:24 UTC (permalink / raw) To: Vegard Nossum Cc: Al Viro, Kees Cook, linux-kernel, Christophe JAILLET, Eric W. Biederman, Nir Lichtman, Tycho Andersen On Fri, 29 Nov 2024 at 13:43, Vegard Nossum <vegard.nossum@oracle.com> wrote: > > Probably a silly question, but why not do the same thing in all cases? Because it would actually make a difference for the symlink case. And unlike the open() -> fd -> execveat() case, the symlink is actually active at the time of the execve(), so at that time it's a real part of the name. Now, do I believe it would actually matter? No, I doubt anybody would really notice. But let's not change user-visible behavior "just because". Linus ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2024-11-30 4:25 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-11-21 14:53 [GIT PULL] execve updates for v6.13-rc1 (take 2) Kees Cook 2024-11-25 23:40 ` Linus Torvalds 2024-11-26 5:09 ` Kees Cook 2024-11-26 20:11 ` Linus Torvalds 2024-11-26 20:12 ` Linus Torvalds 2024-11-28 0:53 ` Kees Cook 2024-11-28 1:59 ` Linus Torvalds 2024-11-28 2:05 ` Al Viro 2024-11-28 2:24 ` Linus Torvalds 2024-11-29 2:08 ` Kees Cook 2024-11-29 2:43 ` Linus Torvalds 2024-11-29 3:34 ` Al Viro 2024-11-29 4:23 ` Eric W. Biederman 2024-11-29 4:48 ` Al Viro 2024-11-29 17:00 ` Casey Schaufler 2024-11-29 19:33 ` Eric W. Biederman 2024-11-29 4:44 ` Linus Torvalds 2024-11-29 12:41 ` Eric W. Biederman 2024-11-29 21:42 ` Vegard Nossum 2024-11-29 22:54 ` Al Viro 2024-11-30 4:24 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox