Re: [PATCH] fs/exec.c: Add fast path for ENOENT on PATH search before allocating mm

linux-fsdevel.vger.kernel.org archive mirror

From: Kees Cook <keescook@chromium.org>
To: Josh Triplett <josh@joshtriplett.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Eric Biederman <ebiederm@xmission.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] fs/exec.c: Add fast path for ENOENT on PATH search before allocating mm
Date: Fri, 16 Sep 2022 17:11:18 -0700
Message-ID: <202209161637.9EDAF6B18@keescook>
In-Reply-To: <YyTY+OaClK+JHCOw@localhost>

[Hi Peter, apologies for dumping you into the middle of this thread.
I've got a question about sched_exec() below...]

On Fri, Sep 16, 2022 at 09:13:44PM +0100, Josh Triplett wrote:
> musl does the same thing, as do python and perl (likely via execvp or
> posix_spawnp). As does gcc when it executes `as`. And I've seen more
> than a few programs hand-implement a PATH search the same way. Seems
> worth optimizing for.

Yeah, it does seem like a simple way to eliminate needless work, though
I'd really like to see some kind of perf count of "in a given kernel
build, how many execve() system calls fail due to path search vs succeed",
just to get a better sense of the scale of the problem.
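To make the pattern concrete, here is a minimal userspace sketch of the kind of hand-rolled PATH search described above (not musl's, glibc's, or gcc's actual code). Each PATH entry is tried in turn with execve(), and every directory that does not contain the program costs one failed system call, typically ENOENT, before the right one is reached. The helper name path_search_exec() and its attempts counter are hypothetical, added only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: try execve() on "<dir>/<file>" for each entry of
 * a colon-separated search path. On success execve() never returns.
 * On failure, returns -1 with errno set; *attempts holds the number of
 * execve() calls that failed along the way.
 */
static int path_search_exec(const char *file, const char *search_path,
			    char *const argv[], char *const envp[],
			    int *attempts)
{
	char candidate[4096];	/* assumed large enough for this sketch */
	const char *p = search_path;
	int saw_eacces = 0;

	*attempts = 0;
	for (;;) {
		const char *colon = strchr(p, ':');
		size_t len = colon ? (size_t)(colon - p) : strlen(p);
		/* An empty PATH element means the current directory. */
		int n = snprintf(candidate, sizeof(candidate), "%.*s%s%s",
				 (int)len, p, len ? "/" : "", file);

		if (n > 0 && (size_t)n < sizeof(candidate)) {
			execve(candidate, argv, envp);
			(*attempts)++;	/* only reached if execve() failed */
			if (errno == EACCES)
				saw_eacces = 1;	/* remember, keep searching */
			else if (errno != ENOENT && errno != ENOTDIR)
				return -1;	/* a "real" error: stop */
		}
		if (!colon)
			break;
		p = colon + 1;
	}
	errno = saw_eacces ? EACCES : ENOENT;
	return -1;
}
```

With a PATH such as /usr/local/bin:/usr/bin:/bin, a program living in /bin costs two failed execve() calls before the successful one, which is the failing work the fast path targets.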

I don't like the idea of penalizing the _succeeding_ case, though, which
is what happens if we do the path walk twice. So I went and refactored the
setup order, moving the do_open_execat() call up into alloc_bprm() from
where it was in bprm_execve(). As a result, the open now happens, as you
observed, before the mm creation and the generally expensive argument
copying. The only difference from your patch seems to be the allocation
of the file table entry, but this avoids the double lookup, so I'm hoping
the result is actually even faster.

This cleanup is actually quite satisfying organizationally too -- the
fd and filename were passed around rather oddly.

The interaction with sched_exec() should be no worse (the file is opened
before it in either case), but reading that function, I see it talks about
taking the opportunity to move the process to another CPU (IIUC) since,
paraphrasing, "it is at its lowest memory/cache size." But I wonder if
there is an existing accidental pessimization here, in that the process
stack has already been allocated by that point. I am only passingly
familiar with how tasks get moved around under NUMA: is the scheduler
going to move this process onto a different NUMA node, so that it ends up
with the userspace process stack on one node and the program text and
heap on another? Or is that totally lost in the noise?

More specifically, would processes benefit from having sched_exec()
moved before the mm creation?
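One way to observe where sched_exec() actually lands a task is to have the freshly exec'd program report the CPU and NUMA node it starts on. A minimal sketch using the raw getcpu(2) system call through syscall(2) (glibc 2.29 and later also provide a getcpu() wrapper); report_placement() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill in the CPU and NUMA node the calling thread
 * is running on. Returns 0 on success, -1 with errno set on failure.
 * The answer may be stale as soon as it is returned, because the
 * scheduler can migrate the task at any time.
 */
static int report_placement(unsigned int *cpu, unsigned int *node)
{
	/* The third argument (tcache) is ignored by modern kernels. */
	return (int)syscall(SYS_getcpu, cpu, node, NULL);
}
```

Calling this first thing in main() and comparing the node with the per-node page counts of the [stack] mapping in /proc/self/numa_maps (on a CONFIG_NUMA kernel) would be one rough way to check whether the split asked about above happens in practice.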

Regardless, here's a very lightly tested patch. Can you take this for a
spin and check your benchmark? Thanks!

-Kees

diff --git a/fs/exec.c b/fs/exec.c
index 9a5ca7b82bfc..5534301d67ca 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -898,6 +898,10 @@ EXPORT_SYMBOL(transfer_args_to_stack);
 
 #endif /* CONFIG_MMU */
 
+/*
+ * On success, callers must call do_close_execat() on the returned
+ * struct file.
+ */
 static struct file *do_open_execat(int fd, struct filename *name, int flags)
 {
 	struct file *file;
@@ -945,6 +949,16 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
 	return ERR_PTR(err);
 }
 
+/**
+ * open_exec - Open a path name for execution
+ *
+ * @name: path name to open with the intent of executing it.
+ *
+ * Returns ERR_PTR on failure or allocated struct file on success.
+ *
+ * As this is a wrapper for the internal do_open_execat(), callers
+ * must call allow_write_access() before fput() on release.
+ */
 struct file *open_exec(const char *name)
 {
 	struct filename *filename = getname_kernel(name);
@@ -1485,6 +1499,15 @@ static int prepare_bprm_creds(struct linux_binprm *bprm)
 	return -ENOMEM;
 }
 
+/* Matches do_open_execat() */
+static void do_close_execat(struct file *file)
+{
+	if (!file)
+		return;
+	allow_write_access(file);
+	fput(file);
+}
+
 static void free_bprm(struct linux_binprm *bprm)
 {
 	if (bprm->mm) {
@@ -1496,10 +1519,7 @@ static void free_bprm(struct linux_binprm *bprm)
 		mutex_unlock(&current->signal->cred_guard_mutex);
 		abort_creds(bprm->cred);
 	}
-	if (bprm->file) {
-		allow_write_access(bprm->file);
-		fput(bprm->file);
-	}
+	do_close_execat(bprm->file);
 	if (bprm->executable)
 		fput(bprm->executable);
 	/* If a binfmt changed the interp, free it. */
@@ -1509,12 +1529,26 @@ static void free_bprm(struct linux_binprm *bprm)
 	kfree(bprm);
 }
 
-static struct linux_binprm *alloc_bprm(int fd, struct filename *filename)
+static struct linux_binprm *alloc_bprm(int fd, struct filename *filename,
+				       int flags)
 {
-	struct linux_binprm *bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
-	int retval = -ENOMEM;
-	if (!bprm)
+	struct linux_binprm *bprm;
+	struct file *file;
+	int retval;
+
+	file = do_open_execat(fd, filename, flags);
+	if (IS_ERR(file)) {
+		retval = PTR_ERR(file);
 		goto out;
+	}
+
+	retval = -ENOMEM;
+	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
+	if (!bprm) {
+		do_close_execat(file);
+		goto out;
+	}
+	bprm->file = file;
 
 	if (fd == AT_FDCWD || filename->name[0] == '/') {
 		bprm->filename = filename->name;
@@ -1531,6 +1565,18 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename)
 	}
 	bprm->interp = bprm->filename;
 
+	/*
+	 * Record that a name derived from an O_CLOEXEC fd will be
+	 * inaccessible after exec.  This allows the code in exec to
+	 * choose to fail when the executable is not mmaped into the
+	 * interpreter and an open file descriptor is not passed to
+	 * the interpreter.  This makes for a better user experience
+	 * than having the interpreter start and then immediately fail
+	 * when it finds the executable is inaccessible.
+	 */
+	if (bprm->fdpath && get_close_on_exec(fd))
+		bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
+
 	retval = bprm_mm_init(bprm);
 	if (retval)
 		goto out_free;
@@ -1803,10 +1849,8 @@ static int exec_binprm(struct linux_binprm *bprm)
 /*
  * sys_execve() executes a new program.
  */
-static int bprm_execve(struct linux_binprm *bprm,
-		       int fd, struct filename *filename, int flags)
+static int bprm_execve(struct linux_binprm *bprm)
 {
-	struct file *file;
 	int retval;
 
 	retval = prepare_bprm_creds(bprm);
@@ -1816,26 +1860,8 @@ static int bprm_execve(struct linux_binprm *bprm,
 	check_unsafe_exec(bprm);
 	current->in_execve = 1;
 
-	file = do_open_execat(fd, filename, flags);
-	retval = PTR_ERR(file);
-	if (IS_ERR(file))
-		goto out_unmark;
-
 	sched_exec();
 
-	bprm->file = file;
-	/*
-	 * Record that a name derived from an O_CLOEXEC fd will be
-	 * inaccessible after exec.  This allows the code in exec to
-	 * choose to fail when the executable is not mmaped into the
-	 * interpreter and an open file descriptor is not passed to
-	 * the interpreter.  This makes for a better user experience
-	 * than having the interpreter start and then immediately fail
-	 * when it finds the executable is inaccessible.
-	 */
-	if (bprm->fdpath && get_close_on_exec(fd))
-		bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
-
 	/* Set the unchanging part of bprm->cred */
 	retval = security_bprm_creds_for_exec(bprm);
 	if (retval)
@@ -1863,7 +1889,6 @@ static int bprm_execve(struct linux_binprm *bprm,
 	if (bprm->point_of_no_return && !fatal_signal_pending(current))
 		force_fatal_sig(SIGSEGV);
 
-out_unmark:
 	current->fs->in_exec = 0;
 	current->in_execve = 0;
 
@@ -1897,7 +1922,7 @@ static int do_execveat_common(int fd, struct filename *filename,
 	 * further execve() calls fail. */
 	current->flags &= ~PF_NPROC_EXCEEDED;
 
-	bprm = alloc_bprm(fd, filename);
+	bprm = alloc_bprm(fd, filename, flags);
 	if (IS_ERR(bprm)) {
 		retval = PTR_ERR(bprm);
 		goto out_ret;
@@ -1946,7 +1971,7 @@ static int do_execveat_common(int fd, struct filename *filename,
 		bprm->argc = 1;
 	}
 
-	retval = bprm_execve(bprm, fd, filename, flags);
+	retval = bprm_execve(bprm);
 out_free:
 	free_bprm(bprm);
 
@@ -1971,7 +1996,7 @@ int kernel_execve(const char *kernel_filename,
 	if (IS_ERR(filename))
 		return PTR_ERR(filename);
 
-	bprm = alloc_bprm(fd, filename);
+	bprm = alloc_bprm(fd, filename, 0);
 	if (IS_ERR(bprm)) {
 		retval = PTR_ERR(bprm);
 		goto out_ret;
@@ -2006,7 +2031,7 @@ int kernel_execve(const char *kernel_filename,
 	if (retval < 0)
 		goto out_free;
 
-	retval = bprm_execve(bprm, fd, filename, 0);
+	retval = bprm_execve(bprm);
 out_free:
 	free_bprm(bprm);
 out_ret:


-- 
Kees Cook

Thread overview: 23+ messages
2022-09-16 13:41 [PATCH] fs/exec.c: Add fast path for ENOENT on PATH search before allocating mm Josh Triplett
2022-09-16 14:38 ` Kees Cook
2022-09-16 20:13   ` Josh Triplett
2022-09-17  0:11     ` Kees Cook [this message]
2022-09-17  0:50       ` Josh Triplett
2022-09-19 20:02         ` Kees Cook
2022-10-01 16:01           ` Josh Triplett
2022-09-19 14:34       ` Peter Zijlstra
2022-09-22  7:27 ` [fs/exec.c] 0a276ae2d2: BUG:workqueue_lockup-pool kernel test robot
2023-11-07 20:30 ` [PATCH] fs/exec.c: Add fast path for ENOENT on PATH search before allocating mm Kees Cook
2023-11-07 20:51   ` Mateusz Guzik
2023-11-07 21:23     ` Mateusz Guzik
2023-11-07 22:50       ` Kees Cook
2023-11-07 23:08         ` Mateusz Guzik
2023-11-07 23:39           ` Kees Cook
2023-11-08  0:03             ` Mateusz Guzik
2023-11-08 19:25               ` Kees Cook
2023-11-08 19:31               ` Kees Cook
2023-11-08 19:35                 ` Mateusz Guzik
2023-11-09  0:17                   ` Eric W. Biederman
2023-11-09 12:21                     ` Mateusz Guzik
2023-11-10  5:26                       ` Eric W. Biederman
2023-11-07 20:37 ` Kees Cook
