linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 00/13] io_uring, struct filename and audit
@ 2025-11-09  6:37 Al Viro
  2025-11-09  6:37 ` [RFC][PATCH 01/13] do_faccessat(): import pathname only once Al Viro
                   ` (12 more replies)
  0 siblings, 13 replies; 51+ messages in thread
From: Al Viro @ 2025-11-09  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, axboe, audit, io-uring

[The last time that was discussed a year ago; rather than posting it
as a followup into the old thread, I'm starting a new one.  Old thread
can be found at https://lore.kernel.org/all/20240922004901.GA3413968@ZenIV/]

There are two filename-related problems in io_uring and its interplay
with audit.

Filenames are imported when request is submitted and used when it is
processed.  Unfortunately, with io_uring the latter may very well happen
in a different thread.  In that case the reference to filename is put into
the wrong audit_context - that of submitting thread, not the processing
one.  Audit logics is called by the latter, and it really wants to be
able to find the names in audit_context of current (== processing) thread.

Another related problem is the headache with refcounts - normally all
references to given struct filename are visible only to one thread (the
one that uses that struct filename).  io_uring violates that - an extra
reference is stashed in audit_context of submitter.  It gets dropped
when submitter returns to userland, which can happen simultaneously with
processing thread deciding to drop the reference it got.

We paper over that by making refcount atomic, but that means pointless
overhead for everyone.

One more headache with refcount comes from retry loops - handling of
-ESTALE while resolving a parent is dealt with transparently, but
server might get around to telling you that things got stale only
when you e.g. ask it to remove a link.  In that case we have to repeat
the pathwalk in "trust no one" mode and see if it helps.

So far, so good, but depending upon the helpers we are using we might
end up re-importing the pathname from userland.  Had that been merely
duplicated work on an already slow path, it wouldn't matter much, but
with audit in the mix it becomes seriously confusing.  Currently
getname() and its ilk try to cope with that (only if audit is enabled)
by stashing the userland address in struct filename and checking if
an instance for the same userland address has already been made visible
to audit, in which case we just grab an extra reference.

That's bogus for several reasons.  In particular, having the same
pointer passed in different pathname arguments of a syscall should not be
different from having separate strings with identical context given to it.
Compiler might turn the latter into the former, after all - merging
identical string literals may happen.  As the result, audit might produce
significantly different outputs on the same program, depending upon the
compiler flags used when building it.  This is obviously not a good thing.
The fact that this logics is dependent upon CONFIG_AUDIT also doesn't help.

The right solution is to have the pathname imported once, before the
retry loop; fortunately, most of those loops are already done that way
these days - only 9 exceptions in the entire kernel.

With those exceptions taken care of, we can get rid of the entire "stash
the userland address in struct filename" thing.  That helps to solve both
io_uring problems - import of pathname in there is already separated from
making use of it (the former happens in submitting thread, the latter -
in processing one).  If we explicitly mark the places where we start
making use of those suckers (in io_mkdirat(), etc.), we can have the
submitter do "incomplete" imports (just copying the name from userland
and stashing the result in opaque object).  Then processing thread
would explicitly ask to complete the import and use the resulting struct
filename *, same as a normal syscall would - all in the thread that does
actual work, so that situation looks normal for audit - the damn thing
goes into the right audit_context, all uses are thread-synchronous and
from the same thread, etc.  What's more, refcount can become non-atomic
(and grabbed only inside the kernel/auditsc.c, at that).

The series below attempts to do that.  It does need a serious review,
including that from audit and io_uring folks.

It lives in git://git.kernel.org:/pub/scm/linux/kernel/git/viro/vfs.git #work.filename-refcnt
Individual patches in followups.

Al Viro (12):
  do_faccessat(): import pathname only once
  do_fchmodat(): import pathname only once
  do_fchownat(): import pathname only once
  do_utimes_path(): import pathname only once
  chdir(2): import pathname only once
  chroot(2): import pathname only once
  user_statfs(): import pathname only once
  do_sys_truncate(): import pathname only once
  do_readlinkat(): import pathname only once
  get rid of audit_reusename()
  allow incomplete imports of filenames
  struct filename ->refcnt doesn't need to be atomic

Mateusz Guzik (1):
  fs: touch up predicts in putname()

 fs/namei.c            |  68 +++++++++++++++++++++-------
 fs/open.c             |  39 +++++++++-------
 fs/stat.c             |   6 +--
 fs/statfs.c           |   4 +-
 fs/utimes.c           |  13 +++---
 include/linux/audit.h |  11 -----
 include/linux/fs.h    |  17 ++++---
 io_uring/fs.c         | 101 ++++++++++++++++++++++--------------------
 io_uring/openclose.c  |  16 +++----
 io_uring/statx.c      |  17 +++----
 io_uring/xattr.c      |  30 +++++--------
 kernel/auditsc.c      |  23 ++--------
 12 files changed, 178 insertions(+), 167 deletions(-)

-- 
2.47.3


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [RFC][PATCH 01/13] do_faccessat(): import pathname only once
  2025-11-09  6:37 [RFC][PATCH 00/13] io_uring, struct filename and audit Al Viro
@ 2025-11-09  6:37 ` Al Viro
  2025-11-13 10:11   ` Jan Kara
  2025-11-09  6:37 ` [RFC][PATCH 02/13] do_fchmodat(): " Al Viro
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2025-11-09  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, axboe, audit, io-uring

Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

Since we have the default logics for use of LOOKUP_EMPTY (passed iff
AT_EMPTY_PATH is present in flags), just use getname_uflags() and
don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
will pass the right thing to getname_flags() and filename_lookup()
doesn't care about LOOKUP_EMPTY at all.

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/open.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 3d64372ecc67..db8fe2b5463d 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -471,6 +471,7 @@ static int do_faccessat(int dfd, const char __user *filename, int mode, int flag
 	int res;
 	unsigned int lookup_flags = LOOKUP_FOLLOW;
 	const struct cred *old_cred = NULL;
+	struct filename *name;
 
 	if (mode & ~S_IRWXO)	/* where's F_OK, X_OK, W_OK, R_OK? */
 		return -EINVAL;
@@ -480,8 +481,6 @@ static int do_faccessat(int dfd, const char __user *filename, int mode, int flag
 
 	if (flags & AT_SYMLINK_NOFOLLOW)
 		lookup_flags &= ~LOOKUP_FOLLOW;
-	if (flags & AT_EMPTY_PATH)
-		lookup_flags |= LOOKUP_EMPTY;
 
 	if (access_need_override_creds(flags)) {
 		old_cred = access_override_creds();
@@ -489,8 +488,9 @@ static int do_faccessat(int dfd, const char __user *filename, int mode, int flag
 			return -ENOMEM;
 	}
 
+	name = getname_uflags(filename, flags);
 retry:
-	res = user_path_at(dfd, filename, lookup_flags, &path);
+	res = filename_lookup(dfd, name, lookup_flags, &path, NULL);
 	if (res)
 		goto out;
 
@@ -530,6 +530,7 @@ static int do_faccessat(int dfd, const char __user *filename, int mode, int flag
 		goto retry;
 	}
 out:
+	putname(name);
 	if (old_cred)
 		put_cred(revert_creds(old_cred));
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCH 02/13] do_fchmodat(): import pathname only once
  2025-11-09  6:37 [RFC][PATCH 00/13] io_uring, struct filename and audit Al Viro
  2025-11-09  6:37 ` [RFC][PATCH 01/13] do_faccessat(): import pathname only once Al Viro
@ 2025-11-09  6:37 ` Al Viro
  2025-11-13 10:12   ` Jan Kara
  2025-11-09  6:37 ` [RFC][PATCH 03/13] do_fchownat(): " Al Viro
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2025-11-09  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, axboe, audit, io-uring

Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

Since we have the default logics for use of LOOKUP_EMPTY (passed iff
AT_EMPTY_PATH is present in flags), just use getname_uflags() and
don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
will pass the right thing to getname_flags() and filename_lookup()
doesn't care about LOOKUP_EMPTY at all.

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/open.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index db8fe2b5463d..e9a08a820e49 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -682,6 +682,7 @@ static int do_fchmodat(int dfd, const char __user *filename, umode_t mode,
 		       unsigned int flags)
 {
 	struct path path;
+	struct filename *name;
 	int error;
 	unsigned int lookup_flags;
 
@@ -689,11 +690,9 @@ static int do_fchmodat(int dfd, const char __user *filename, umode_t mode,
 		return -EINVAL;
 
 	lookup_flags = (flags & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW;
-	if (flags & AT_EMPTY_PATH)
-		lookup_flags |= LOOKUP_EMPTY;
-
+	name = getname_uflags(filename, flags);
 retry:
-	error = user_path_at(dfd, filename, lookup_flags, &path);
+	error = filename_lookup(dfd, name, lookup_flags, &path, NULL);
 	if (!error) {
 		error = chmod_common(&path, mode);
 		path_put(&path);
@@ -702,6 +701,7 @@ static int do_fchmodat(int dfd, const char __user *filename, umode_t mode,
 			goto retry;
 		}
 	}
+	putname(name);
 	return error;
 }
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCH 03/13] do_fchownat(): import pathname only once
  2025-11-09  6:37 [RFC][PATCH 00/13] io_uring, struct filename and audit Al Viro
  2025-11-09  6:37 ` [RFC][PATCH 01/13] do_faccessat(): import pathname only once Al Viro
  2025-11-09  6:37 ` [RFC][PATCH 02/13] do_fchmodat(): " Al Viro
@ 2025-11-09  6:37 ` Al Viro
  2025-11-13 10:13   ` Jan Kara
  2025-11-09  6:37 ` [RFC][PATCH 04/13] do_utimes_path(): " Al Viro
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2025-11-09  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, axboe, audit, io-uring

Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

Since we have the default logics for use of LOOKUP_EMPTY (passed iff
AT_EMPTY_PATH is present in flags), just use getname_uflags() and
don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
will pass the right thing to getname_flags() and filename_lookup()
doesn't care about LOOKUP_EMPTY at all.

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/open.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index e9a08a820e49..e5110f5e80c7 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -804,17 +804,17 @@ int do_fchownat(int dfd, const char __user *filename, uid_t user, gid_t group,
 		int flag)
 {
 	struct path path;
-	int error = -EINVAL;
+	int error;
 	int lookup_flags;
+	struct filename *name;
 
 	if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
-		goto out;
+		return -EINVAL;
 
 	lookup_flags = (flag & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW;
-	if (flag & AT_EMPTY_PATH)
-		lookup_flags |= LOOKUP_EMPTY;
+	name = getname_uflags(filename, flag);
 retry:
-	error = user_path_at(dfd, filename, lookup_flags, &path);
+	error = filename_lookup(dfd, name, lookup_flags, &path, NULL);
 	if (error)
 		goto out;
 	error = mnt_want_write(path.mnt);
@@ -829,6 +829,7 @@ int do_fchownat(int dfd, const char __user *filename, uid_t user, gid_t group,
 		goto retry;
 	}
 out:
+	putname(name);
 	return error;
 }
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCH 04/13] do_utimes_path(): import pathname only once
  2025-11-09  6:37 [RFC][PATCH 00/13] io_uring, struct filename and audit Al Viro
                   ` (2 preceding siblings ...)
  2025-11-09  6:37 ` [RFC][PATCH 03/13] do_fchownat(): " Al Viro
@ 2025-11-09  6:37 ` Al Viro
  2025-11-13 10:15   ` Jan Kara
  2025-11-09  6:37 ` [RFC][PATCH 05/13] chdir(2): " Al Viro
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2025-11-09  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, axboe, audit, io-uring

Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

Since we have the default logics for use of LOOKUP_EMPTY (passed iff
AT_EMPTY_PATH is present in flags), just use getname_uflags() and
don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
will pass the right thing to getname_flags() and filename_lookup()
doesn't care about LOOKUP_EMPTY at all.

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/utimes.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/utimes.c b/fs/utimes.c
index c7c7958e57b2..262a4ddeb9cc 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -8,6 +8,7 @@
 #include <linux/compat.h>
 #include <asm/unistd.h>
 #include <linux/filelock.h>
+#include "internal.h"
 
 static bool nsec_valid(long nsec)
 {
@@ -82,27 +83,27 @@ static int do_utimes_path(int dfd, const char __user *filename,
 {
 	struct path path;
 	int lookup_flags = 0, error;
+	struct filename *name;
 
 	if (flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH))
 		return -EINVAL;
 
 	if (!(flags & AT_SYMLINK_NOFOLLOW))
 		lookup_flags |= LOOKUP_FOLLOW;
-	if (flags & AT_EMPTY_PATH)
-		lookup_flags |= LOOKUP_EMPTY;
+	name = getname_uflags(filename, flags);
 
 retry:
-	error = user_path_at(dfd, filename, lookup_flags, &path);
+	error = filename_lookup(dfd, name, lookup_flags, &path, NULL);
 	if (error)
-		return error;
-
+		goto out;
 	error = vfs_utimes(&path, times);
 	path_put(&path);
 	if (retry_estale(error, lookup_flags)) {
 		lookup_flags |= LOOKUP_REVAL;
 		goto retry;
 	}
-
+out:
+	putname(name);
 	return error;
 }
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCH 05/13] chdir(2): import pathname only once
  2025-11-09  6:37 [RFC][PATCH 00/13] io_uring, struct filename and audit Al Viro
                   ` (3 preceding siblings ...)
  2025-11-09  6:37 ` [RFC][PATCH 04/13] do_utimes_path(): " Al Viro
@ 2025-11-09  6:37 ` Al Viro
  2025-11-13 10:16   ` Jan Kara
  2025-11-09  6:37 ` [RFC][PATCH 06/13] chroot(2): " Al Viro
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2025-11-09  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, axboe, audit, io-uring

Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
to plain getname().

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/open.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/open.c b/fs/open.c
index e5110f5e80c7..8bc2f313f4a9 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -558,8 +558,9 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename)
 	struct path path;
 	int error;
 	unsigned int lookup_flags = LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
+	struct filename *name = getname(filename);
 retry:
-	error = user_path_at(AT_FDCWD, filename, lookup_flags, &path);
+	error = filename_lookup(AT_FDCWD, name, lookup_flags, &path, NULL);
 	if (error)
 		goto out;
 
@@ -576,6 +577,7 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename)
 		goto retry;
 	}
 out:
+	putname(name);
 	return error;
 }
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCH 06/13] chroot(2): import pathname only once
  2025-11-09  6:37 [RFC][PATCH 00/13] io_uring, struct filename and audit Al Viro
                   ` (4 preceding siblings ...)
  2025-11-09  6:37 ` [RFC][PATCH 05/13] chdir(2): " Al Viro
@ 2025-11-09  6:37 ` Al Viro
  2025-11-13 10:18   ` Jan Kara
  2025-11-09  6:37 ` [RFC][PATCH 07/13] user_statfs(): " Al Viro
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2025-11-09  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, axboe, audit, io-uring

Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
to plain getname().

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/open.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/open.c b/fs/open.c
index 8bc2f313f4a9..e67baae339fc 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -603,8 +603,9 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename)
 	struct path path;
 	int error;
 	unsigned int lookup_flags = LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
+	struct filename *name = getname(filename);
 retry:
-	error = user_path_at(AT_FDCWD, filename, lookup_flags, &path);
+	error = filename_lookup(AT_FDCWD, name, lookup_flags, &path, NULL);
 	if (error)
 		goto out;
 
@@ -628,6 +629,7 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename)
 		goto retry;
 	}
 out:
+	putname(name);
 	return error;
 }
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCH 07/13] user_statfs(): import pathname only once
  2025-11-09  6:37 [RFC][PATCH 00/13] io_uring, struct filename and audit Al Viro
                   ` (5 preceding siblings ...)
  2025-11-09  6:37 ` [RFC][PATCH 06/13] chroot(2): " Al Viro
@ 2025-11-09  6:37 ` Al Viro
  2025-11-13 10:18   ` Jan Kara
  2025-11-09  6:37 ` [RFC][PATCH 08/13] do_sys_truncate(): " Al Viro
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2025-11-09  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, axboe, audit, io-uring

Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
to plain getname().

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/statfs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/statfs.c b/fs/statfs.c
index a45ac85e6048..a5671bf6c7f0 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -99,8 +99,9 @@ int user_statfs(const char __user *pathname, struct kstatfs *st)
 	struct path path;
 	int error;
 	unsigned int lookup_flags = LOOKUP_FOLLOW|LOOKUP_AUTOMOUNT;
+	struct filename *name = getname(pathname);
 retry:
-	error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path);
+	error = filename_lookup(AT_FDCWD, name, lookup_flags, &path, NULL);
 	if (!error) {
 		error = vfs_statfs(&path, st);
 		path_put(&path);
@@ -109,6 +110,7 @@ int user_statfs(const char __user *pathname, struct kstatfs *st)
 			goto retry;
 		}
 	}
+	putname(name);
 	return error;
 }
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCH 08/13] do_sys_truncate(): import pathname only once
  2025-11-09  6:37 [RFC][PATCH 00/13] io_uring, struct filename and audit Al Viro
                   ` (6 preceding siblings ...)
  2025-11-09  6:37 ` [RFC][PATCH 07/13] user_statfs(): " Al Viro
@ 2025-11-09  6:37 ` Al Viro
  2025-11-13 10:18   ` Jan Kara
  2025-11-09  6:37 ` [RFC][PATCH 09/13] do_readlinkat(): " Al Viro
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2025-11-09  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, axboe, audit, io-uring

Convert the user_path_at() call inside a retry loop into getname_flags() +
filename_lookup() + putname() and leave only filename_lookup() inside
the loop.

In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
to plain getname().

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/open.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/open.c b/fs/open.c
index e67baae339fc..eb2ff940393d 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -129,14 +129,16 @@ EXPORT_SYMBOL_GPL(vfs_truncate);
 int do_sys_truncate(const char __user *pathname, loff_t length)
 {
 	unsigned int lookup_flags = LOOKUP_FOLLOW;
+	struct filename *name;
 	struct path path;
 	int error;
 
 	if (length < 0)	/* sorry, but loff_t says... */
 		return -EINVAL;
 
+	name = getname(pathname);
 retry:
-	error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path);
+	error = filename_lookup(AT_FDCWD, name, lookup_flags, &path, NULL);
 	if (!error) {
 		error = vfs_truncate(&path, length);
 		path_put(&path);
@@ -145,6 +147,7 @@ int do_sys_truncate(const char __user *pathname, loff_t length)
 		lookup_flags |= LOOKUP_REVAL;
 		goto retry;
 	}
+	putname(name);
 	return error;
 }
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCH 09/13] do_readlinkat(): import pathname only once
  2025-11-09  6:37 [RFC][PATCH 00/13] io_uring, struct filename and audit Al Viro
                   ` (7 preceding siblings ...)
  2025-11-09  6:37 ` [RFC][PATCH 08/13] do_sys_truncate(): " Al Viro
@ 2025-11-09  6:37 ` Al Viro
  2025-11-13 10:20   ` Jan Kara
  2025-11-09  6:37 ` [RFC][PATCH 10/13] get rid of audit_reusename() Al Viro
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2025-11-09  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, axboe, audit, io-uring

Take getname_flags() and putname() outside of retry loop.

Since getname_flags() is the only thing that cares about LOOKUP_EMPTY,
don't bother with setting LOOKUP_EMPTY in lookup_flags - just pass it
to getname_flags() and be done with that.

The things could be further simplified by use of cleanup.h stuff, but
let's not clutter the patch with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/stat.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/stat.c b/fs/stat.c
index 6c79661e1b96..ee9ae2c3273a 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -566,13 +566,13 @@ static int do_readlinkat(int dfd, const char __user *pathname,
 	struct path path;
 	struct filename *name;
 	int error;
-	unsigned int lookup_flags = LOOKUP_EMPTY;
+	unsigned int lookup_flags = 0;
 
 	if (bufsiz <= 0)
 		return -EINVAL;
 
+	name = getname_flags(pathname, LOOKUP_EMPTY);
 retry:
-	name = getname_flags(pathname, lookup_flags);
 	error = filename_lookup(dfd, name, lookup_flags, &path, NULL);
 	if (unlikely(error)) {
 		putname(name);
@@ -593,11 +593,11 @@ static int do_readlinkat(int dfd, const char __user *pathname,
 		error = (name->name[0] == '\0') ? -ENOENT : -EINVAL;
 	}
 	path_put(&path);
-	putname(name);
 	if (retry_estale(error, lookup_flags)) {
 		lookup_flags |= LOOKUP_REVAL;
 		goto retry;
 	}
+	putname(name);
 	return error;
 }
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09  6:37 [RFC][PATCH 00/13] io_uring, struct filename and audit Al Viro
                   ` (8 preceding siblings ...)
  2025-11-09  6:37 ` [RFC][PATCH 09/13] do_readlinkat(): " Al Viro
@ 2025-11-09  6:37 ` Al Viro
  2025-11-09 19:18   ` Linus Torvalds
                     ` (2 more replies)
  2025-11-09  6:37 ` [RFC][PATCH 11/13] allow incomplete imports of filenames Al Viro
                   ` (2 subsequent siblings)
  12 siblings, 3 replies; 51+ messages in thread
From: Al Viro @ 2025-11-09  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, axboe, audit, io-uring

Originally we tried to avoid multiple insertions into audit names array
during retry loop by a cute hack - memorize the userland pointer and
if there already is a match, just grab an extra reference to it.

Cute as it had been, it had problems - two identical pointers had
audit aux entries merged, two identical strings did not.  Having
different behaviour for syscalls that differ only by addresses of
otherwise identical string arguments is obviously wrong - if nothing
else, compiler can decide to merge identical string literals.

Besides, this hack does nothing for non-audited processes - they get
a fresh copy for retry.  It's not time-critical, but having behaviour
subtly differ that way is bogus.

These days we have very few places that import filename more than once
(9 functions total) and it's easy to massage them so we get rid of all
re-imports.  With that done, we don't need audit_reusename() anymore.
There's no need to memorize userland pointer either.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c            | 11 +++--------
 include/linux/audit.h | 11 -----------
 include/linux/fs.h    |  1 -
 kernel/auditsc.c      | 23 -----------------------
 4 files changed, 3 insertions(+), 43 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 7377020a2cba..dd86e41deeeb 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -125,9 +125,8 @@
 
 #define EMBEDDED_NAME_MAX	(PATH_MAX - offsetof(struct filename, iname))
 
-static inline void initname(struct filename *name, const char __user *uptr)
+static inline void initname(struct filename *name)
 {
-	name->uptr = uptr;
 	name->aname = NULL;
 	atomic_set(&name->refcnt, 1);
 }
@@ -139,10 +138,6 @@ getname_flags(const char __user *filename, int flags)
 	char *kname;
 	int len;
 
-	result = audit_reusename(filename);
-	if (result)
-		return result;
-
 	result = __getname();
 	if (unlikely(!result))
 		return ERR_PTR(-ENOMEM);
@@ -210,7 +205,7 @@ getname_flags(const char __user *filename, int flags)
 			return ERR_PTR(-ENAMETOOLONG);
 		}
 	}
-	initname(result, filename);
+	initname(result);
 	audit_getname(result);
 	return result;
 }
@@ -268,7 +263,7 @@ struct filename *getname_kernel(const char * filename)
 		return ERR_PTR(-ENAMETOOLONG);
 	}
 	memcpy((char *)result->name, filename, len);
-	initname(result, NULL);
+	initname(result);
 	audit_getname(result);
 	return result;
 }
diff --git a/include/linux/audit.h b/include/linux/audit.h
index 536f8ee8da81..d936a604d056 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -316,7 +316,6 @@ extern void __audit_uring_exit(int success, long code);
 extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
 				  unsigned long a2, unsigned long a3);
 extern void __audit_syscall_exit(int ret_success, long ret_value);
-extern struct filename *__audit_reusename(const __user char *uptr);
 extern void __audit_getname(struct filename *name);
 extern void __audit_inode(struct filename *name, const struct dentry *dentry,
 				unsigned int flags);
@@ -380,12 +379,6 @@ static inline void audit_syscall_exit(void *pt_regs)
 		__audit_syscall_exit(success, return_code);
 	}
 }
-static inline struct filename *audit_reusename(const __user char *name)
-{
-	if (unlikely(!audit_dummy_context()))
-		return __audit_reusename(name);
-	return NULL;
-}
 static inline void audit_getname(struct filename *name)
 {
 	if (unlikely(!audit_dummy_context()))
@@ -624,10 +617,6 @@ static inline struct audit_context *audit_context(void)
 {
 	return NULL;
 }
-static inline struct filename *audit_reusename(const __user char *name)
-{
-	return NULL;
-}
 static inline void audit_getname(struct filename *name)
 { }
 static inline void audit_inode(struct filename *name,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c895146c1444..bbae3cfdc338 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2835,7 +2835,6 @@ extern struct kobject *fs_kobj;
 struct audit_names;
 struct filename {
 	const char		*name;	/* pointer to actual string */
-	const __user char	*uptr;	/* original userland pointer */
 	atomic_t		refcnt;
 	struct audit_names	*aname;
 	const char		iname[];
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index d1966144bdfe..e59a094bb9f7 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -2169,29 +2169,6 @@ static struct audit_names *audit_alloc_name(struct audit_context *context,
 	return aname;
 }
 
-/**
- * __audit_reusename - fill out filename with info from existing entry
- * @uptr: userland ptr to pathname
- *
- * Search the audit_names list for the current audit context. If there is an
- * existing entry with a matching "uptr" then return the filename
- * associated with that audit_name. If not, return NULL.
- */
-struct filename *
-__audit_reusename(const __user char *uptr)
-{
-	struct audit_context *context = audit_context();
-	struct audit_names *n;
-
-	list_for_each_entry(n, &context->names_list, list) {
-		if (!n->name)
-			continue;
-		if (n->name->uptr == uptr)
-			return refname(n->name);
-	}
-	return NULL;
-}
-
 /**
  * __audit_getname - add a name to the list
  * @name: name to add
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCH 11/13] allow incomplete imports of filenames
  2025-11-09  6:37 [RFC][PATCH 00/13] io_uring, struct filename and audit Al Viro
                   ` (9 preceding siblings ...)
  2025-11-09  6:37 ` [RFC][PATCH 10/13] get rid of audit_reusename() Al Viro
@ 2025-11-09  6:37 ` Al Viro
  2025-11-11  0:45   ` Paul Moore
  2025-11-11 14:41   ` Jens Axboe
  2025-11-09  6:37 ` [RFC][PATCH 12/13] fs: touch up predicts in putname() Al Viro
  2025-11-09  6:37 ` [RFC][PATCH 13/13] struct filename ->refcnt doesn't need to be atomic Al Viro
  12 siblings, 2 replies; 51+ messages in thread
From: Al Viro @ 2025-11-09  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, axboe, audit, io-uring

There are two filename-related problems in io_uring and its
interplay with audit.

Filenames are imported when request is submitted and used when
it is processed.  Unfortunately, the latter may very well
happen in a different thread.  In that case the reference to
filename is put into the wrong audit_context - that of submitting
thread, not the processing one.  Audit logics is called by
the latter, and it really wants to be able to find the names
in audit_context current (== processing) thread.

Another related problem is the headache with refcounts -
normally all references to given struct filename are visible
only to one thread (the one that uses that struct filename).
io_uring violates that - an extra reference is stashed in
audit_context of submitter.  It gets dropped when submitter
returns to userland, which can happen simultaneously with
processing thread deciding to drop the reference it got.

We paper over that by making refcount atomic, but that means
pointless headache for everyone.

Solution: the notion of partially imported filenames.  Namely,
already copied from userland, but *not* exposed to audit yet.

io_uring can create that in submitter thread, and complete the
import (obtaining the usual reference to struct filename) in
processing thread.

Object: struct delayed_filename.

Primitives for working with it:

delayed_getname(&delayed_filename, user_string) - copies the name
from userland, returning 0 and stashing the address of (still incomplete)
struct filename in delayed_filename on success and returning -E... on
error.

delayed_getname_uflags(&delayed_filename, user_string, atflags) - similar,
in the same relation to delayed_getname() as getname_uflags() is to getname()

complete_getname(&delayed_getname) - completes the import of filename stashed
in delayed_getname and returns struct filename to caller, emptying delayed_getname.

dismiss_delayed_getname(&delayed_getname) - destructor; drops whatever
might be stashed in delayed_getname, emptying it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c           |  45 +++++++++++++++++--
 include/linux/fs.h   |  10 +++++
 io_uring/fs.c        | 101 +++++++++++++++++++++++--------------------
 io_uring/openclose.c |  16 +++----
 io_uring/statx.c     |  17 +++-----
 io_uring/xattr.c     |  30 +++++--------
 6 files changed, 129 insertions(+), 90 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index dd86e41deeeb..37beb524b362 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -131,8 +131,8 @@ static inline void initname(struct filename *name)
 	atomic_set(&name->refcnt, 1);
 }
 
-struct filename *
-getname_flags(const char __user *filename, int flags)
+static struct filename *
+do_getname(const char __user *filename, int flags, bool incomplete)
 {
 	struct filename *result;
 	char *kname;
@@ -206,10 +206,17 @@ getname_flags(const char __user *filename, int flags)
 		}
 	}
 	initname(result);
-	audit_getname(result);
+	if (likely(!incomplete))
+		audit_getname(result);
 	return result;
 }
 
+struct filename *
+getname_flags(const char __user *filename, int flags)
+{
+	return do_getname(filename, flags, false);
+}
+
 struct filename *getname_uflags(const char __user *filename, int uflags)
 {
 	int flags = (uflags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
@@ -293,6 +300,38 @@ void putname(struct filename *name)
 }
 EXPORT_SYMBOL(putname);
 
+static inline int __delayed_getname(struct delayed_filename *v,
+			   const char __user *string, int flags)
+{
+	v->__incomplete_filename = do_getname(string, flags, true);
+	return PTR_ERR_OR_ZERO(v->__incomplete_filename);
+}
+
+int delayed_getname(struct delayed_filename *v, const char __user *string)
+{
+	return __delayed_getname(v, string, 0);
+}
+
+int delayed_getname_uflags(struct delayed_filename *v, const char __user *string,
+			 int uflags)
+{
+	int flags = (uflags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+	return __delayed_getname(v, string, flags);
+}
+
+void dismiss_delayed_filename(struct delayed_filename *v)
+{
+	putname(no_free_ptr(v->__incomplete_filename));
+}
+
+struct filename *complete_getname(struct delayed_filename *v)
+{
+	struct filename *res = no_free_ptr(v->__incomplete_filename);
+	if (!IS_ERR(res))
+		audit_getname(res);
+	return res;
+}
+
 /**
  * check_acl - perform ACL permission checking
  * @idmap:	idmap of the mount the inode was found from
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bbae3cfdc338..d57c8e21be13 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2934,6 +2934,16 @@ static inline struct filename *getname_maybe_null(const char __user *name, int f
 extern void putname(struct filename *name);
 DEFINE_FREE(putname, struct filename *, if (!IS_ERR_OR_NULL(_T)) putname(_T))
 
+struct delayed_filename {
+	struct filename *__incomplete_filename;	// don't touch
+};
+#define INIT_DELAYED_FILENAME(ptr) \
+	((void)(*(ptr) = (struct delayed_filename){}))
+int delayed_getname(struct delayed_filename *, const char __user *);
+int delayed_getname_uflags(struct delayed_filename *v, const char __user *, int);
+void dismiss_delayed_filename(struct delayed_filename *);
+struct filename *complete_getname(struct delayed_filename *);
+
 static inline struct filename *refname(struct filename *name)
 {
 	atomic_inc(&name->refcnt);
diff --git a/io_uring/fs.c b/io_uring/fs.c
index 37079a414eab..c04c6282210a 100644
--- a/io_uring/fs.c
+++ b/io_uring/fs.c
@@ -19,8 +19,8 @@ struct io_rename {
 	struct file			*file;
 	int				old_dfd;
 	int				new_dfd;
-	struct filename			*oldpath;
-	struct filename			*newpath;
+	struct delayed_filename		oldpath;
+	struct delayed_filename		newpath;
 	int				flags;
 };
 
@@ -28,22 +28,22 @@ struct io_unlink {
 	struct file			*file;
 	int				dfd;
 	int				flags;
-	struct filename			*filename;
+	struct delayed_filename		filename;
 };
 
 struct io_mkdir {
 	struct file			*file;
 	int				dfd;
 	umode_t				mode;
-	struct filename			*filename;
+	struct delayed_filename		filename;
 };
 
 struct io_link {
 	struct file			*file;
 	int				old_dfd;
 	int				new_dfd;
-	struct filename			*oldpath;
-	struct filename			*newpath;
+	struct delayed_filename		oldpath;
+	struct delayed_filename		newpath;
 	int				flags;
 };
 
@@ -51,6 +51,7 @@ int io_renameat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_rename *ren = io_kiocb_to_cmd(req, struct io_rename);
 	const char __user *oldf, *newf;
+	int err;
 
 	if (sqe->buf_index || sqe->splice_fd_in)
 		return -EINVAL;
@@ -63,14 +64,14 @@ int io_renameat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	ren->new_dfd = READ_ONCE(sqe->len);
 	ren->flags = READ_ONCE(sqe->rename_flags);
 
-	ren->oldpath = getname(oldf);
-	if (IS_ERR(ren->oldpath))
-		return PTR_ERR(ren->oldpath);
+	err = delayed_getname(&ren->oldpath, oldf);
+	if (unlikely(err))
+		return err;
 
-	ren->newpath = getname(newf);
-	if (IS_ERR(ren->newpath)) {
-		putname(ren->oldpath);
-		return PTR_ERR(ren->newpath);
+	err = delayed_getname(&ren->newpath, newf);
+	if (unlikely(err)) {
+		dismiss_delayed_filename(&ren->oldpath);
+		return err;
 	}
 
 	req->flags |= REQ_F_NEED_CLEANUP;
@@ -85,8 +86,9 @@ int io_renameat(struct io_kiocb *req, unsigned int issue_flags)
 
 	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
 
-	ret = do_renameat2(ren->old_dfd, ren->oldpath, ren->new_dfd,
-				ren->newpath, ren->flags);
+	ret = do_renameat2(ren->old_dfd, complete_getname(&ren->oldpath),
+			   ren->new_dfd, complete_getname(&ren->newpath),
+			   ren->flags);
 
 	req->flags &= ~REQ_F_NEED_CLEANUP;
 	io_req_set_res(req, ret, 0);
@@ -97,14 +99,15 @@ void io_renameat_cleanup(struct io_kiocb *req)
 {
 	struct io_rename *ren = io_kiocb_to_cmd(req, struct io_rename);
 
-	putname(ren->oldpath);
-	putname(ren->newpath);
+	dismiss_delayed_filename(&ren->oldpath);
+	dismiss_delayed_filename(&ren->newpath);
 }
 
 int io_unlinkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_unlink *un = io_kiocb_to_cmd(req, struct io_unlink);
 	const char __user *fname;
+	int err;
 
 	if (sqe->off || sqe->len || sqe->buf_index || sqe->splice_fd_in)
 		return -EINVAL;
@@ -118,9 +121,9 @@ int io_unlinkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return -EINVAL;
 
 	fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
-	un->filename = getname(fname);
-	if (IS_ERR(un->filename))
-		return PTR_ERR(un->filename);
+	err = delayed_getname(&un->filename, fname);
+	if (unlikely(err))
+		return err;
 
 	req->flags |= REQ_F_NEED_CLEANUP;
 	req->flags |= REQ_F_FORCE_ASYNC;
@@ -135,9 +138,9 @@ int io_unlinkat(struct io_kiocb *req, unsigned int issue_flags)
 	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
 
 	if (un->flags & AT_REMOVEDIR)
-		ret = do_rmdir(un->dfd, un->filename);
+		ret = do_rmdir(un->dfd, complete_getname(&un->filename));
 	else
-		ret = do_unlinkat(un->dfd, un->filename);
+		ret = do_unlinkat(un->dfd, complete_getname(&un->filename));
 
 	req->flags &= ~REQ_F_NEED_CLEANUP;
 	io_req_set_res(req, ret, 0);
@@ -148,13 +151,14 @@ void io_unlinkat_cleanup(struct io_kiocb *req)
 {
 	struct io_unlink *ul = io_kiocb_to_cmd(req, struct io_unlink);
 
-	putname(ul->filename);
+	dismiss_delayed_filename(&ul->filename);
 }
 
 int io_mkdirat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_mkdir *mkd = io_kiocb_to_cmd(req, struct io_mkdir);
 	const char __user *fname;
+	int err;
 
 	if (sqe->off || sqe->rw_flags || sqe->buf_index || sqe->splice_fd_in)
 		return -EINVAL;
@@ -165,9 +169,9 @@ int io_mkdirat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	mkd->mode = READ_ONCE(sqe->len);
 
 	fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
-	mkd->filename = getname(fname);
-	if (IS_ERR(mkd->filename))
-		return PTR_ERR(mkd->filename);
+	err = delayed_getname(&mkd->filename, fname);
+	if (unlikely(err))
+		return err;
 
 	req->flags |= REQ_F_NEED_CLEANUP;
 	req->flags |= REQ_F_FORCE_ASYNC;
@@ -181,7 +185,7 @@ int io_mkdirat(struct io_kiocb *req, unsigned int issue_flags)
 
 	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
 
-	ret = do_mkdirat(mkd->dfd, mkd->filename, mkd->mode);
+	ret = do_mkdirat(mkd->dfd, complete_getname(&mkd->filename), mkd->mode);
 
 	req->flags &= ~REQ_F_NEED_CLEANUP;
 	io_req_set_res(req, ret, 0);
@@ -192,13 +196,14 @@ void io_mkdirat_cleanup(struct io_kiocb *req)
 {
 	struct io_mkdir *md = io_kiocb_to_cmd(req, struct io_mkdir);
 
-	putname(md->filename);
+	dismiss_delayed_filename(&md->filename);
 }
 
 int io_symlinkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_link *sl = io_kiocb_to_cmd(req, struct io_link);
 	const char __user *oldpath, *newpath;
+	int err;
 
 	if (sqe->len || sqe->rw_flags || sqe->buf_index || sqe->splice_fd_in)
 		return -EINVAL;
@@ -209,14 +214,14 @@ int io_symlinkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	oldpath = u64_to_user_ptr(READ_ONCE(sqe->addr));
 	newpath = u64_to_user_ptr(READ_ONCE(sqe->addr2));
 
-	sl->oldpath = getname(oldpath);
-	if (IS_ERR(sl->oldpath))
-		return PTR_ERR(sl->oldpath);
+	err = delayed_getname(&sl->oldpath, oldpath);
+	if (unlikely(err))
+		return err;
 
-	sl->newpath = getname(newpath);
-	if (IS_ERR(sl->newpath)) {
-		putname(sl->oldpath);
-		return PTR_ERR(sl->newpath);
+	err = delayed_getname(&sl->newpath, newpath);
+	if (unlikely(err)) {
+		dismiss_delayed_filename(&sl->oldpath);
+		return err;
 	}
 
 	req->flags |= REQ_F_NEED_CLEANUP;
@@ -231,7 +236,8 @@ int io_symlinkat(struct io_kiocb *req, unsigned int issue_flags)
 
 	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
 
-	ret = do_symlinkat(sl->oldpath, sl->new_dfd, sl->newpath);
+	ret = do_symlinkat(complete_getname(&sl->oldpath), sl->new_dfd,
+			   complete_getname(&sl->newpath));
 
 	req->flags &= ~REQ_F_NEED_CLEANUP;
 	io_req_set_res(req, ret, 0);
@@ -242,6 +248,7 @@ int io_linkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_link *lnk = io_kiocb_to_cmd(req, struct io_link);
 	const char __user *oldf, *newf;
+	int err;
 
 	if (sqe->buf_index || sqe->splice_fd_in)
 		return -EINVAL;
@@ -254,14 +261,14 @@ int io_linkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	newf = u64_to_user_ptr(READ_ONCE(sqe->addr2));
 	lnk->flags = READ_ONCE(sqe->hardlink_flags);
 
-	lnk->oldpath = getname_uflags(oldf, lnk->flags);
-	if (IS_ERR(lnk->oldpath))
-		return PTR_ERR(lnk->oldpath);
+	err = delayed_getname_uflags(&lnk->oldpath, oldf, lnk->flags);
+	if (unlikely(err))
+		return err;
 
-	lnk->newpath = getname(newf);
-	if (IS_ERR(lnk->newpath)) {
-		putname(lnk->oldpath);
-		return PTR_ERR(lnk->newpath);
+	err = delayed_getname(&lnk->newpath, newf);
+	if (unlikely(err)) {
+		dismiss_delayed_filename(&lnk->oldpath);
+		return err;
 	}
 
 	req->flags |= REQ_F_NEED_CLEANUP;
@@ -276,8 +283,8 @@ int io_linkat(struct io_kiocb *req, unsigned int issue_flags)
 
 	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
 
-	ret = do_linkat(lnk->old_dfd, lnk->oldpath, lnk->new_dfd,
-				lnk->newpath, lnk->flags);
+	ret = do_linkat(lnk->old_dfd, complete_getname(&lnk->oldpath),
+			lnk->new_dfd, complete_getname(&lnk->newpath), lnk->flags);
 
 	req->flags &= ~REQ_F_NEED_CLEANUP;
 	io_req_set_res(req, ret, 0);
@@ -288,6 +295,6 @@ void io_link_cleanup(struct io_kiocb *req)
 {
 	struct io_link *sl = io_kiocb_to_cmd(req, struct io_link);
 
-	putname(sl->oldpath);
-	putname(sl->newpath);
+	dismiss_delayed_filename(&sl->oldpath);
+	dismiss_delayed_filename(&sl->newpath);
 }
diff --git a/io_uring/openclose.c b/io_uring/openclose.c
index bfeb91b31bba..6bc14f626923 100644
--- a/io_uring/openclose.c
+++ b/io_uring/openclose.c
@@ -23,7 +23,7 @@ struct io_open {
 	struct file			*file;
 	int				dfd;
 	u32				file_slot;
-	struct filename			*filename;
+	struct delayed_filename		filename;
 	struct open_how			how;
 	unsigned long			nofile;
 };
@@ -67,12 +67,9 @@ static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
 
 	open->dfd = READ_ONCE(sqe->fd);
 	fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
-	open->filename = getname(fname);
-	if (IS_ERR(open->filename)) {
-		ret = PTR_ERR(open->filename);
-		open->filename = NULL;
+	ret = delayed_getname(&open->filename, fname);
+	if (unlikely(ret))
 		return ret;
-	}
 
 	open->file_slot = READ_ONCE(sqe->file_index);
 	if (open->file_slot && (open->how.flags & O_CLOEXEC))
@@ -121,6 +118,7 @@ int io_openat2(struct io_kiocb *req, unsigned int issue_flags)
 	struct file *file;
 	bool resolve_nonblock, nonblock_set;
 	bool fixed = !!open->file_slot;
+	struct filename *name __free(putname) = complete_getname(&open->filename);
 	int ret;
 
 	ret = build_open_flags(&open->how, &op);
@@ -140,7 +138,7 @@ int io_openat2(struct io_kiocb *req, unsigned int issue_flags)
 			goto err;
 	}
 
-	file = do_filp_open(open->dfd, open->filename, &op);
+	file = do_filp_open(open->dfd, name, &op);
 	if (IS_ERR(file)) {
 		/*
 		 * We could hang on to this 'fd' on retrying, but seems like
@@ -167,7 +165,6 @@ int io_openat2(struct io_kiocb *req, unsigned int issue_flags)
 		ret = io_fixed_fd_install(req, issue_flags, file,
 						open->file_slot);
 err:
-	putname(open->filename);
 	req->flags &= ~REQ_F_NEED_CLEANUP;
 	if (ret < 0)
 		req_set_fail(req);
@@ -184,8 +181,7 @@ void io_open_cleanup(struct io_kiocb *req)
 {
 	struct io_open *open = io_kiocb_to_cmd(req, struct io_open);
 
-	if (open->filename)
-		putname(open->filename);
+	dismiss_delayed_filename(&open->filename);
 }
 
 int __io_close_fixed(struct io_ring_ctx *ctx, unsigned int issue_flags,
diff --git a/io_uring/statx.c b/io_uring/statx.c
index 5111e9befbfe..dc10b48bcde6 100644
--- a/io_uring/statx.c
+++ b/io_uring/statx.c
@@ -16,7 +16,7 @@ struct io_statx {
 	int				dfd;
 	unsigned int			mask;
 	unsigned int			flags;
-	struct filename			*filename;
+	struct delayed_filename		filename;
 	struct statx __user		*buffer;
 };
 
@@ -24,6 +24,7 @@ int io_statx_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_statx *sx = io_kiocb_to_cmd(req, struct io_statx);
 	const char __user *path;
+	int ret;
 
 	if (sqe->buf_index || sqe->splice_fd_in)
 		return -EINVAL;
@@ -36,14 +37,10 @@ int io_statx_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	sx->buffer = u64_to_user_ptr(READ_ONCE(sqe->addr2));
 	sx->flags = READ_ONCE(sqe->statx_flags);
 
-	sx->filename = getname_uflags(path, sx->flags);
-
-	if (IS_ERR(sx->filename)) {
-		int ret = PTR_ERR(sx->filename);
+	ret = delayed_getname_uflags(&sx->filename, path, sx->flags);
 
-		sx->filename = NULL;
+	if (unlikely(ret))
 		return ret;
-	}
 
 	req->flags |= REQ_F_NEED_CLEANUP;
 	req->flags |= REQ_F_FORCE_ASYNC;
@@ -53,11 +50,12 @@ int io_statx_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 int io_statx(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_statx *sx = io_kiocb_to_cmd(req, struct io_statx);
+	struct filename *name __free(putname) = complete_getname(&sx->filename);
 	int ret;
 
 	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
 
-	ret = do_statx(sx->dfd, sx->filename, sx->flags, sx->mask, sx->buffer);
+	ret = do_statx(sx->dfd, name, sx->flags, sx->mask, sx->buffer);
 	io_req_set_res(req, ret, 0);
 	return IOU_COMPLETE;
 }
@@ -66,6 +64,5 @@ void io_statx_cleanup(struct io_kiocb *req)
 {
 	struct io_statx *sx = io_kiocb_to_cmd(req, struct io_statx);
 
-	if (sx->filename)
-		putname(sx->filename);
+	dismiss_delayed_filename(&sx->filename);
 }
diff --git a/io_uring/xattr.c b/io_uring/xattr.c
index 322b94ff9e4b..0fb4e5303500 100644
--- a/io_uring/xattr.c
+++ b/io_uring/xattr.c
@@ -19,16 +19,14 @@
 struct io_xattr {
 	struct file			*file;
 	struct kernel_xattr_ctx		ctx;
-	struct filename			*filename;
+	struct delayed_filename		filename;
 };
 
 void io_xattr_cleanup(struct io_kiocb *req)
 {
 	struct io_xattr *ix = io_kiocb_to_cmd(req, struct io_xattr);
 
-	if (ix->filename)
-		putname(ix->filename);
-
+	dismiss_delayed_filename(&ix->filename);
 	kfree(ix->ctx.kname);
 	kvfree(ix->ctx.kvalue);
 }
@@ -48,7 +46,7 @@ static int __io_getxattr_prep(struct io_kiocb *req,
 	const char __user *name;
 	int ret;
 
-	ix->filename = NULL;
+	INIT_DELAYED_FILENAME(&ix->filename);
 	ix->ctx.kvalue = NULL;
 	name = u64_to_user_ptr(READ_ONCE(sqe->addr));
 	ix->ctx.value = u64_to_user_ptr(READ_ONCE(sqe->addr2));
@@ -93,11 +91,7 @@ int io_getxattr_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 	path = u64_to_user_ptr(READ_ONCE(sqe->addr3));
 
-	ix->filename = getname(path);
-	if (IS_ERR(ix->filename))
-		return PTR_ERR(ix->filename);
-
-	return 0;
+	return delayed_getname(&ix->filename, path);
 }
 
 int io_fgetxattr(struct io_kiocb *req, unsigned int issue_flags)
@@ -119,8 +113,8 @@ int io_getxattr(struct io_kiocb *req, unsigned int issue_flags)
 
 	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
 
-	ret = filename_getxattr(AT_FDCWD, ix->filename, LOOKUP_FOLLOW, &ix->ctx);
-	ix->filename = NULL;
+	ret = filename_getxattr(AT_FDCWD, complete_getname(&ix->filename),
+				LOOKUP_FOLLOW, &ix->ctx);
 	io_xattr_finish(req, ret);
 	return IOU_COMPLETE;
 }
@@ -132,7 +126,7 @@ static int __io_setxattr_prep(struct io_kiocb *req,
 	const char __user *name;
 	int ret;
 
-	ix->filename = NULL;
+	INIT_DELAYED_FILENAME(&ix->filename);
 	name = u64_to_user_ptr(READ_ONCE(sqe->addr));
 	ix->ctx.cvalue = u64_to_user_ptr(READ_ONCE(sqe->addr2));
 	ix->ctx.kvalue = NULL;
@@ -169,11 +163,7 @@ int io_setxattr_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 	path = u64_to_user_ptr(READ_ONCE(sqe->addr3));
 
-	ix->filename = getname(path);
-	if (IS_ERR(ix->filename))
-		return PTR_ERR(ix->filename);
-
-	return 0;
+	return delayed_getname(&ix->filename, path);
 }
 
 int io_fsetxattr_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
@@ -200,8 +190,8 @@ int io_setxattr(struct io_kiocb *req, unsigned int issue_flags)
 
 	WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
 
-	ret = filename_setxattr(AT_FDCWD, ix->filename, LOOKUP_FOLLOW, &ix->ctx);
-	ix->filename = NULL;
+	ret = filename_setxattr(AT_FDCWD, complete_getname(&ix->filename),
+				LOOKUP_FOLLOW, &ix->ctx);
 	io_xattr_finish(req, ret);
 	return IOU_COMPLETE;
 }
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCH 12/13] fs: touch up predicts in putname()
  2025-11-09  6:37 [RFC][PATCH 00/13] io_uring, struct filename and audit Al Viro
                   ` (10 preceding siblings ...)
  2025-11-09  6:37 ` [RFC][PATCH 11/13] allow incomplete imports of filenames Al Viro
@ 2025-11-09  6:37 ` Al Viro
  2025-11-09  6:37 ` [RFC][PATCH 13/13] struct filename ->refcnt doesn't need to be atomic Al Viro
  12 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2025-11-09  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, axboe, audit, io-uring

From: Mateusz Guzik <mjguzik@gmail.com>

1. we already expect the refcount is 1.
2. path creation predicts name == iname

I verified this straightens out the asm, no functional changes.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251029134952.658450-1-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/namei.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 37beb524b362..bb306177b8a3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -284,7 +284,7 @@ void putname(struct filename *name)
 		return;
 
 	refcnt = atomic_read(&name->refcnt);
-	if (refcnt != 1) {
+	if (unlikely(refcnt != 1)) {
 		if (WARN_ON_ONCE(!refcnt))
 			return;
 
@@ -292,7 +292,7 @@ void putname(struct filename *name)
 			return;
 	}
 
-	if (name->name != name->iname) {
+	if (unlikely(name->name != name->iname)) {
 		__putname(name->name);
 		kfree(name);
 	} else
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [RFC][PATCH 13/13] struct filename ->refcnt doesn't need to be atomic
  2025-11-09  6:37 [RFC][PATCH 00/13] io_uring, struct filename and audit Al Viro
                   ` (11 preceding siblings ...)
  2025-11-09  6:37 ` [RFC][PATCH 12/13] fs: touch up predicts in putname() Al Viro
@ 2025-11-09  6:37 ` Al Viro
  12 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2025-11-09  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, axboe, audit, io-uring

... or visible outside of audit, really.  Note that references
held in delayed_filename always have refcount 1, and from the
moment of complete_getname() or equivalent point in getname...()
there is no references to struct filename instance anywhere
that would be visible to other threads.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c         | 8 ++++----
 include/linux/fs.h | 8 +-------
 kernel/auditsc.c   | 6 ++++++
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index bb306177b8a3..9863cc319181 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -128,7 +128,7 @@
 static inline void initname(struct filename *name)
 {
 	name->aname = NULL;
-	atomic_set(&name->refcnt, 1);
+	name->refcnt = 1;
 }
 
 static struct filename *
@@ -283,13 +283,13 @@ void putname(struct filename *name)
 	if (IS_ERR_OR_NULL(name))
 		return;
 
-	refcnt = atomic_read(&name->refcnt);
+	refcnt = name->refcnt;
 	if (unlikely(refcnt != 1)) {
 		if (WARN_ON_ONCE(!refcnt))
 			return;
 
-		if (!atomic_dec_and_test(&name->refcnt))
-			return;
+		name->refcnt--;
+		return;
 	}
 
 	if (unlikely(name->name != name->iname)) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d57c8e21be13..4ea32aba847b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2835,7 +2835,7 @@ extern struct kobject *fs_kobj;
 struct audit_names;
 struct filename {
 	const char		*name;	/* pointer to actual string */
-	atomic_t		refcnt;
+	int			refcnt;
 	struct audit_names	*aname;
 	const char		iname[];
 };
@@ -2944,12 +2944,6 @@ int delayed_getname_uflags(struct delayed_filename *v, const char __user *, int)
 void dismiss_delayed_filename(struct delayed_filename *);
 struct filename *complete_getname(struct delayed_filename *);
 
-static inline struct filename *refname(struct filename *name)
-{
-	atomic_inc(&name->refcnt);
-	return name;
-}
-
 extern int finish_open(struct file *file, struct dentry *dentry,
 			int (*open)(struct inode *, struct file *));
 extern int finish_no_open(struct file *file, struct dentry *dentry);
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index e59a094bb9f7..d71fc73455b2 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -2169,6 +2169,12 @@ static struct audit_names *audit_alloc_name(struct audit_context *context,
 	return aname;
 }
 
+static inline struct filename *refname(struct filename *name)
+{
+	name->refcnt++;
+	return name;
+}
+
 /**
  * __audit_getname - add a name to the list
  * @name: name to add
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09  6:37 ` [RFC][PATCH 10/13] get rid of audit_reusename() Al Viro
@ 2025-11-09 19:18   ` Linus Torvalds
  2025-11-09 19:55     ` Mateusz Guzik
  2025-11-09 22:18     ` Linus Torvalds
  2025-11-10 23:13   ` Paul Moore
  2025-11-13 10:29   ` Jan Kara
  2 siblings, 2 replies; 51+ messages in thread
From: Linus Torvalds @ 2025-11-09 19:18 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, brauner, jack, mjguzik, paul, axboe, audit,
	io-uring

[-- Attachment #1: Type: text/plain, Size: 3429 bytes --]

On Sat, 8 Nov 2025 at 22:38, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> These days we have very few places that import filename more than once
> (9 functions total) and it's easy to massage them so we get rid of all
> re-imports.  With that done, we don't need audit_reusename() anymore.
> There's no need to memorize userland pointer either.

Lovely. Ack on the whole series.

I do wonder if we could go one step further, and try to make the
"struct filename" allocation rather much smaller, so that we could fit
it on the stack,m and avoid the whole __getname() call *entirely* for
shorter pathnames.

That __getname() allocation is fairly costly, and 99% of the time we
really don't need it because audit doesn't even get a ref to it so
it's all entirely thread-local.

Right now the allocation is a full page, which is almost entirely for
historical reasons ("__getname()" long long ago used to be
"__get_free_page()"m and then when it was made a kmemc_cache_alloc()
it just stayed page-sized, and we did replaced the size to PATH_MAX
and limited it to 4k regardless of page size - and then with the
embedded 'struct filename' we now have that odd

    #define EMBEDDED_NAME_MAX       (PATH_MAX - offsetof(struct
filename, iname))

and that PATH_MAX thing really is a random value these days, because
the size of the __getname() allocation has *NOTHING* to do with the
maximum pathname, and we actually have to do a *separate* allocation
if we have a long path that needs the whole PATH_MAX.

Now, that separate allocation we do oddly - in that we actually
continue to use that '__getname() allocation for the pathname, and the
new allocation is just for the initial part of 'struct filename'. But
it's *odd* and purely due to those historical oddities. We could make
the new allocation be the actual PATH_MAX size, and continue to use
the smaller original allocation for 'struct filename', and the code in
getname_flags() would be a lot more logical.

Now, for all the same historical reasons there are a few users that
mis-use "__getname()" and "__putname()" to *not* allocate an actual
'struct filename', but really just do a "kmalloc(PATH_MAX)".  The fix
for that is to just leave "__getname()/__putname()" as that odd legacy
"allocate a pathname", and just make the actual real "struct filename"
use proper allocators with proper type checking.

The attached patch is ENTIRELY UNTESTED, so please see it as a
"something like this". But wouldn't it be really nice to not play
those odd games with "struct filename" that getname_flags() currently
plays? And with this, 'struct filename' is small enough that we
*could* just allocate it on the stack if we then also add code to deal
with the audit case (which this does *not* do, just to clarify: this
is literally just the "prep for a smaller structure" part).

Also note that this assumes that short pathname (smaller than that new

   #define EMBEDDED_NAME_MAX      64

size) are actually the common case. With longer paths, and without the
"allocate on stack", this patch will cause two allocations, because it
then does that

                kname = kmalloc(PATH_MAX, GFP_KERNEL);

to allocate the separate name when it didn't fit in the smaller
embedded path buffer. So in this form, this is actually a
pessimization, and again, none of this makes sense *unless* we then go
on to allocate the smaller filename struct on the stack.

Hmm? Comments?

           Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 5738 bytes --]

 fs/dcache.c        |  8 ++++---
 fs/namei.c         | 61 +++++++++++++++++++++---------------------------------
 include/linux/fs.h | 18 +++++++++++++---
 3 files changed, 44 insertions(+), 43 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 035cccbc9276..9deaa26d0f46 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3246,7 +3246,7 @@ static void __init dcache_init(void)
 	runtime_const_init(ptr, dentry_hashtable);
 }
 
-/* SLAB cache for __getname() consumers */
+/* SLAB cache for alloc_filename() consumers */
 struct kmem_cache *names_cachep __ro_after_init;
 EXPORT_SYMBOL(names_cachep);
 
@@ -3263,8 +3263,10 @@ void __init vfs_caches_init_early(void)
 
 void __init vfs_caches_init(void)
 {
-	names_cachep = kmem_cache_create_usercopy("names_cache", PATH_MAX, 0,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC, 0, PATH_MAX, NULL);
+	names_cachep = kmem_cache_create_usercopy("names_cache",
+			sizeof(struct filename), 0, SLAB_PANIC,
+			offsetof(struct filename, iname), EMBEDDED_NAME_MAX,
+			NULL);
 
 	dcache_init();
 	inode_init();
diff --git a/fs/namei.c b/fs/namei.c
index 7377020a2cba..739f70372d92 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -123,8 +123,6 @@
  * PATH_MAX includes the nul terminator --RR.
  */
 
-#define EMBEDDED_NAME_MAX	(PATH_MAX - offsetof(struct filename, iname))
-
 static inline void initname(struct filename *name, const char __user *uptr)
 {
 	name->uptr = uptr;
@@ -143,7 +141,7 @@ getname_flags(const char __user *filename, int flags)
 	if (result)
 		return result;
 
-	result = __getname();
+	result = alloc_filename();
 	if (unlikely(!result))
 		return ERR_PTR(-ENOMEM);
 
@@ -160,13 +158,13 @@ getname_flags(const char __user *filename, int flags)
 	 */
 	if (unlikely(len <= 0)) {
 		if (unlikely(len < 0)) {
-			__putname(result);
+			free_filename(result);
 			return ERR_PTR(len);
 		}
 
 		/* The empty path is special. */
 		if (!(flags & LOOKUP_EMPTY)) {
-			__putname(result);
+			free_filename(result);
 			return ERR_PTR(-ENOENT);
 		}
 	}
@@ -178,35 +176,26 @@ getname_flags(const char __user *filename, int flags)
 	 * userland.
 	 */
 	if (unlikely(len == EMBEDDED_NAME_MAX)) {
-		const size_t size = offsetof(struct filename, iname[1]);
-		kname = (char *)result;
-
-		/*
-		 * size is chosen that way we to guarantee that
-		 * result->iname[0] is within the same object and that
-		 * kname can't be equal to result->iname, no matter what.
-		 */
-		result = kzalloc(size, GFP_KERNEL);
-		if (unlikely(!result)) {
-			__putname(kname);
+		kname = kmalloc(PATH_MAX, GFP_KERNEL);
+		if (unlikely(!kname)) {
+			free_filename(result);
 			return ERR_PTR(-ENOMEM);
 		}
-		result->name = kname;
-		len = strncpy_from_user(kname, filename, PATH_MAX);
+		memcpy(kname, result->iname, EMBEDDED_NAME_MAX);
+
+		// Copy remaining part of the name
+		len = strncpy_from_user(kname + EMBEDDED_NAME_MAX,
+			filename + EMBEDDED_NAME_MAX,
+			PATH_MAX-EMBEDDED_NAME_MAX);
 		if (unlikely(len < 0)) {
-			__putname(kname);
-			kfree(result);
+			free_filename(result);
+			kfree(kname);
 			return ERR_PTR(len);
 		}
-		/* The empty path is special. */
-		if (unlikely(!len) && !(flags & LOOKUP_EMPTY)) {
-			__putname(kname);
-			kfree(result);
-			return ERR_PTR(-ENOENT);
-		}
-		if (unlikely(len == PATH_MAX)) {
-			__putname(kname);
-			kfree(result);
+		result->name = kname;
+		if (unlikely(len == PATH_MAX-EMBEDDED_NAME_MAX)) {
+			free_filename(result);
+			kfree(kname);
 			return ERR_PTR(-ENAMETOOLONG);
 		}
 	}
@@ -246,7 +235,7 @@ struct filename *getname_kernel(const char * filename)
 	struct filename *result;
 	int len = strlen(filename) + 1;
 
-	result = __getname();
+	result = alloc_filename();
 	if (unlikely(!result))
 		return ERR_PTR(-ENOMEM);
 
@@ -258,13 +247,13 @@ struct filename *getname_kernel(const char * filename)
 
 		tmp = kmalloc(size, GFP_KERNEL);
 		if (unlikely(!tmp)) {
-			__putname(result);
+			free_filename(result);
 			return ERR_PTR(-ENOMEM);
 		}
 		tmp->name = (char *)result;
 		result = tmp;
 	} else {
-		__putname(result);
+		free_filename(result);
 		return ERR_PTR(-ENAMETOOLONG);
 	}
 	memcpy((char *)result->name, filename, len);
@@ -290,11 +279,9 @@ void putname(struct filename *name)
 			return;
 	}
 
-	if (name->name != name->iname) {
-		__putname(name->name);
-		kfree(name);
-	} else
-		__putname(name);
+	if (name->name != name->iname)
+		kfree(name->name);
+	free_filename(name);
 }
 EXPORT_SYMBOL(putname);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c895146c1444..197a21897af2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2833,12 +2833,13 @@ extern struct kobject *fs_kobj;
 
 /* fs/open.c */
 struct audit_names;
+#define EMBEDDED_NAME_MAX	64
 struct filename {
 	const char		*name;	/* pointer to actual string */
 	const __user char	*uptr;	/* original userland pointer */
 	atomic_t		refcnt;
 	struct audit_names	*aname;
-	const char		iname[];
+	const char		iname[EMBEDDED_NAME_MAX];
 };
 static_assert(offsetof(struct filename, iname) % sizeof(long) == 0);
 
@@ -2960,8 +2961,19 @@ extern void __init vfs_caches_init(void);
 
 extern struct kmem_cache *names_cachep;
 
-#define __getname()		kmem_cache_alloc(names_cachep, GFP_KERNEL)
-#define __putname(name)		kmem_cache_free(names_cachep, (void *)(name))
+static inline struct filename *alloc_filename(void)
+{
+	return kmem_cache_alloc(names_cachep, GFP_KERNEL);
+}
+
+static inline void free_filename(struct filename *name)
+{
+	kmem_cache_free(names_cachep, name);
+}
+
+// Crazy old legacy uses for pathname allocations
+#define __getname() kmalloc(PATH_MAX, GFP_KERNEL)
+#define __putname(name) kfree((void *)(name))
 
 extern struct super_block *blockdev_superblock;
 static inline bool sb_is_blkdev_sb(struct super_block *sb)

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09 19:18   ` Linus Torvalds
@ 2025-11-09 19:55     ` Mateusz Guzik
  2025-11-09 20:22       ` Linus Torvalds
  2025-11-09 22:18     ` Linus Torvalds
  1 sibling, 1 reply; 51+ messages in thread
From: Mateusz Guzik @ 2025-11-09 19:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, linux-fsdevel, brauner, jack, paul, axboe, audit,
	io-uring

On Sun, Nov 9, 2025 at 8:18 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sat, 8 Nov 2025 at 22:38, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > These days we have very few places that import filename more than once
> > (9 functions total) and it's easy to massage them so we get rid of all
> > re-imports.  With that done, we don't need audit_reusename() anymore.
> > There's no need to memorize userland pointer either.
>
> Lovely. Ack on the whole series.
>
> I do wonder if we could go one step further, and try to make the
> "struct filename" allocation rather much smaller, so that we could fit
> it on the stack,m and avoid the whole __getname() call *entirely* for
> shorter pathnames.
>
> That __getname() allocation is fairly costly, and 99% of the time we
> really don't need it because audit doesn't even get a ref to it so
> it's all entirely thread-local.
>

I looked into this in the past, 64 definitely does not cut it. For
example take a look at these paths from gcc:
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib/x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib/../lib/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/12/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/Scrt1.o

Anyhow, given that the intent is to damage-control allocation cost, I
have to point out there is a patchset to replace the current kmem
alloc/free code with sheaves for everyone which promises better
performance:
https://lore.kernel.org/linux-mm/20251023-sheaves-for-all-v1-0-6ffa2c9941c0@suse.cz/

I tried it and there is some improvement, but the allocator still
remains as a problem. Best case scenario sheaves code just gets better
and everyone benefits.

However, so happens I was looking at this very issue recently and I
suspect the way forward is to handroll a small per-cpu cache from
kmalloced memory. Items would be put there and removed protected by
preemption only, so it should be rather cheap without any of the
allocator boiler-plate. The bufs could be -- say -- 512 bytes in size
and would still be perfectly legal to hand off to audit as they come.
The question is how many paths need to be cached to avoid going to the
real allocator in practice -- too many would invalidate the idea.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09 19:55     ` Mateusz Guzik
@ 2025-11-09 20:22       ` Linus Torvalds
  2025-11-09 22:18         ` Mateusz Guzik
  0 siblings, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2025-11-09 20:22 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Al Viro, linux-fsdevel, brauner, jack, paul, axboe, audit,
	io-uring

On Sun, 9 Nov 2025 at 11:55, Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> I looked into this in the past, 64 definitely does not cut it.

We could easily make it be 128 bytes, I just picked 64 at random.

> Anyhow, given that the intent is to damage-control allocation cost, I
> have to point out there is a patchset to replace the current kmem
> alloc/free code with sheaves for everyone which promises better
> performance:

Oh, I'm sure sheaves will improve on the allocation path, but it's not
going to be even remotely near what a simple stack allocation will be.
Not just from an allocation cost standpoint, but just from D$ density.

But numbers talk, BS walks.

That said, I partly like my patch just because the current code in
getname_flags() is simply disgusting for all those historical reasons.
So even if we kept the allocation big - and didn't put it on the stack
- I think actually using a proper 'struct filename' allocation would
be a good change.

(That patch also avoids the double strncpy_from_user() for the long
path case, because once you don't play those odd allocation games with
"sometimes it's a 'struct filename', sometimes it's just the path
string", it's just simpler to do a fixed-size memcpy followed by a
"copy the rest of the filename from user space". Admittedly you can do
that with an overlapping 'memmove()' even with the current code, but
the code is just conceptually unnecessarily complicated for those
legacy reasons).

           Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09 19:18   ` Linus Torvalds
  2025-11-09 19:55     ` Mateusz Guzik
@ 2025-11-09 22:18     ` Linus Torvalds
  2025-11-10  5:17       ` Al Viro
                         ` (2 more replies)
  1 sibling, 3 replies; 51+ messages in thread
From: Linus Torvalds @ 2025-11-09 22:18 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, brauner, jack, mjguzik, paul, axboe, audit,
	io-uring

[-- Attachment #1: Type: text/plain, Size: 1504 bytes --]

On Sun, 9 Nov 2025 at 11:18, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Hmm? Comments?

Oh, and while double-checking bad users, I note that ntfs3 does

        uni = kmem_cache_alloc(names_cachep, GFP_NOWAIT);
...
        kmem_cache_free(names_cachep, uni);

which is all complete and utter bogosity. I have no idea why anybody
ever thought that was acceptable. It's garbage.

That "uni" isn't even a filename - of either kind. It's neither a
"struct filename" nor a PATH_MAX buffer. It's a "struct cpu_str *"
which is a UTF16 thing that has absolutely nothing to do with
names_cachep, and should never have been allocated that way. It's pure
random insanity.

It should just do a "kmalloc/kfree", with the size being 512 (255
UTF16 characters plus two bytes for len/unused).

Anyway, slightly updated patch that makes "names_cachep" local to
fs/namei.c just because there is absolutely _no_ reason for anybody
else to ever use it. Except for that insane legacy one of __getname(),
that is now just a kmalloc.

I also made EMBEDDED_NAME_MAX be 128 as per Mateusz' comment, although
to avoid double allocations it should probably be even bigger. A
"small" value is good for testing that the new logic works, though.

I haven't actually dared trying to boot into this, so it's still
entirely untested. But I've at least looked through that patch a bit
more and tried to search for other insane patterns, and so far that
oddity in ntfs3 was the only related thing I've found.

        Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 7162 bytes --]

 fs/dcache.c        |  8 +----
 fs/namei.c         | 86 ++++++++++++++++++++++++++++--------------------------
 fs/ntfs3/namei.c   |  4 +--
 include/linux/fs.h | 11 +++----
 4 files changed, 54 insertions(+), 55 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 035cccbc9276..bd4432f46d15 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3246,10 +3246,6 @@ static void __init dcache_init(void)
 	runtime_const_init(ptr, dentry_hashtable);
 }
 
-/* SLAB cache for __getname() consumers */
-struct kmem_cache *names_cachep __ro_after_init;
-EXPORT_SYMBOL(names_cachep);
-
 void __init vfs_caches_init_early(void)
 {
 	int i;
@@ -3263,9 +3259,7 @@ void __init vfs_caches_init_early(void)
 
 void __init vfs_caches_init(void)
 {
-	names_cachep = kmem_cache_create_usercopy("names_cache", PATH_MAX, 0,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC, 0, PATH_MAX, NULL);
-
+	namei_init();
 	dcache_init();
 	inode_init();
 	files_init();
diff --git a/fs/namei.c b/fs/namei.c
index 7377020a2cba..aaede1892133 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -123,7 +123,26 @@
  * PATH_MAX includes the nul terminator --RR.
  */
 
-#define EMBEDDED_NAME_MAX	(PATH_MAX - offsetof(struct filename, iname))
+/* SLAB cache for alloc_filename() consumers */
+static struct kmem_cache *names_cachep __ro_after_init;
+
+void __init namei_init(void)
+{
+	names_cachep = kmem_cache_create_usercopy("names_cache",
+			sizeof(struct filename), 0, SLAB_PANIC,
+			offsetof(struct filename, iname), EMBEDDED_NAME_MAX,
+			NULL);
+}
+
+static inline struct filename *alloc_filename(void)
+{
+	return kmem_cache_alloc(names_cachep, GFP_KERNEL);
+}
+
+static inline void free_filename(struct filename *name)
+{
+	kmem_cache_free(names_cachep, name);
+}
 
 static inline void initname(struct filename *name, const char __user *uptr)
 {
@@ -143,7 +162,7 @@ getname_flags(const char __user *filename, int flags)
 	if (result)
 		return result;
 
-	result = __getname();
+	result = alloc_filename();
 	if (unlikely(!result))
 		return ERR_PTR(-ENOMEM);
 
@@ -160,55 +179,42 @@ getname_flags(const char __user *filename, int flags)
 	 */
 	if (unlikely(len <= 0)) {
 		if (unlikely(len < 0)) {
-			__putname(result);
+			free_filename(result);
 			return ERR_PTR(len);
 		}
 
 		/* The empty path is special. */
 		if (!(flags & LOOKUP_EMPTY)) {
-			__putname(result);
+			free_filename(result);
 			return ERR_PTR(-ENOENT);
 		}
 	}
 
 	/*
 	 * Uh-oh. We have a name that's approaching PATH_MAX. Allocate a
-	 * separate struct filename so we can dedicate the entire
-	 * names_cache allocation for the pathname, and re-do the copy from
-	 * userland.
+	 * separate pathname, copy the partial result we already did, and
+	 * then copy the rest of the pathname from user space.
 	 */
 	if (unlikely(len == EMBEDDED_NAME_MAX)) {
-		const size_t size = offsetof(struct filename, iname[1]);
-		kname = (char *)result;
-
-		/*
-		 * size is chosen that way we to guarantee that
-		 * result->iname[0] is within the same object and that
-		 * kname can't be equal to result->iname, no matter what.
-		 */
-		result = kzalloc(size, GFP_KERNEL);
-		if (unlikely(!result)) {
-			__putname(kname);
+		kname = kmalloc(PATH_MAX, GFP_KERNEL);
+		if (unlikely(!kname)) {
+			free_filename(result);
 			return ERR_PTR(-ENOMEM);
 		}
-		result->name = kname;
-		len = strncpy_from_user(kname, filename, PATH_MAX);
+		memcpy(kname, result->iname, EMBEDDED_NAME_MAX);
+
+		// Copy remaining part of the name
+		len = strncpy_from_user(kname + EMBEDDED_NAME_MAX,
+			filename + EMBEDDED_NAME_MAX,
+			PATH_MAX-EMBEDDED_NAME_MAX);
+		if (unlikely(len == PATH_MAX-EMBEDDED_NAME_MAX))
+			len = -ENAMETOOLONG;
 		if (unlikely(len < 0)) {
-			__putname(kname);
-			kfree(result);
+			free_filename(result);
+			kfree(kname);
 			return ERR_PTR(len);
 		}
-		/* The empty path is special. */
-		if (unlikely(!len) && !(flags & LOOKUP_EMPTY)) {
-			__putname(kname);
-			kfree(result);
-			return ERR_PTR(-ENOENT);
-		}
-		if (unlikely(len == PATH_MAX)) {
-			__putname(kname);
-			kfree(result);
-			return ERR_PTR(-ENAMETOOLONG);
-		}
+		result->name = kname;
 	}
 	initname(result, filename);
 	audit_getname(result);
@@ -246,7 +252,7 @@ struct filename *getname_kernel(const char * filename)
 	struct filename *result;
 	int len = strlen(filename) + 1;
 
-	result = __getname();
+	result = alloc_filename();
 	if (unlikely(!result))
 		return ERR_PTR(-ENOMEM);
 
@@ -258,13 +264,13 @@ struct filename *getname_kernel(const char * filename)
 
 		tmp = kmalloc(size, GFP_KERNEL);
 		if (unlikely(!tmp)) {
-			__putname(result);
+			free_filename(result);
 			return ERR_PTR(-ENOMEM);
 		}
 		tmp->name = (char *)result;
 		result = tmp;
 	} else {
-		__putname(result);
+		free_filename(result);
 		return ERR_PTR(-ENAMETOOLONG);
 	}
 	memcpy((char *)result->name, filename, len);
@@ -290,11 +296,9 @@ void putname(struct filename *name)
 			return;
 	}
 
-	if (name->name != name->iname) {
-		__putname(name->name);
-		kfree(name);
-	} else
-		__putname(name);
+	if (name->name != name->iname)
+		kfree(name->name);
+	free_filename(name);
 }
 EXPORT_SYMBOL(putname);
 
diff --git a/fs/ntfs3/namei.c b/fs/ntfs3/namei.c
index 82c8ae56beee..5ddbfe17d8e3 100644
--- a/fs/ntfs3/namei.c
+++ b/fs/ntfs3/namei.c
@@ -407,7 +407,7 @@ static int ntfs_d_hash(const struct dentry *dentry, struct qstr *name)
 	/*
 	 * Try slow way with current upcase table
 	 */
-	uni = kmem_cache_alloc(names_cachep, GFP_NOWAIT);
+	uni = kmalloc(2*(NTFS_NAME_LEN + 1), GFP_NOWAIT);
 	if (!uni)
 		return -ENOMEM;
 
@@ -429,7 +429,7 @@ static int ntfs_d_hash(const struct dentry *dentry, struct qstr *name)
 	err = 0;
 
 out:
-	kmem_cache_free(names_cachep, uni);
+	kfree(uni);
 	return err;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c9588d555f73..9d4707bbc83a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -81,6 +81,7 @@ struct fs_parameter_spec;
 struct file_kattr;
 struct iomap_ops;
 
+extern void __init namei_init(void);
 extern void __init inode_init(void);
 extern void __init inode_init_early(void);
 extern void __init files_init(void);
@@ -2834,12 +2835,13 @@ extern struct kobject *fs_kobj;
 
 /* fs/open.c */
 struct audit_names;
+#define EMBEDDED_NAME_MAX	128
 struct filename {
 	const char		*name;	/* pointer to actual string */
 	const __user char	*uptr;	/* original userland pointer */
 	atomic_t		refcnt;
 	struct audit_names	*aname;
-	const char		iname[];
+	const char		iname[EMBEDDED_NAME_MAX];
 };
 static_assert(offsetof(struct filename, iname) % sizeof(long) == 0);
 
@@ -2959,10 +2961,9 @@ static inline int finish_open_simple(struct file *file, int error)
 extern void __init vfs_caches_init_early(void);
 extern void __init vfs_caches_init(void);
 
-extern struct kmem_cache *names_cachep;
-
-#define __getname()		kmem_cache_alloc(names_cachep, GFP_KERNEL)
-#define __putname(name)		kmem_cache_free(names_cachep, (void *)(name))
+// Crazy old legacy uses for pathname allocations
+#define __getname() kmalloc(PATH_MAX, GFP_KERNEL)
+#define __putname(name) kfree((void *)(name))
 
 extern struct super_block *blockdev_superblock;
 static inline bool sb_is_blkdev_sb(struct super_block *sb)

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09 20:22       ` Linus Torvalds
@ 2025-11-09 22:18         ` Mateusz Guzik
  2025-11-09 22:29           ` Linus Torvalds
  0 siblings, 1 reply; 51+ messages in thread
From: Mateusz Guzik @ 2025-11-09 22:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, linux-fsdevel, brauner, jack, paul, axboe, audit,
	io-uring

On Sun, Nov 9, 2025 at 9:22 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, 9 Nov 2025 at 11:55, Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > I looked into this in the past, 64 definitely does not cut it.
>
> We could easily make it be 128 bytes, I just picked 64 at random.
>

I see I neglected to mention, the lengths I had seen were untenable
for a stack-based allocation. Going with a smaller on-stack buf means
having to retry with an extra SMAP trip which probably makes it a
no-go.

While I can't easily redo the survey on Linux, here is a taste from 10
minutes of package building on FreeBSD. A histogram of lengths with a
step of 8, rounded down.

You would need 256 bytes to cover almost all of this. Maybe 192-ish is
a bare minimum where the idea is likely a win? But even then the
people who want 8K stacks probably wont be able to use the feature to
begin with.

dtrace -n 'vfs:namei:lookup:entry { @ =
lquantize(strlen(stringof(arg1)), 0, 384, 8); }'

 value  ------------- Distribution ------------- count
             < 0 |                                         0
               0 |@@@@@@@@                                 18105105
               8 |@@@@@@@                                  16360012
              16 |@@@@@@@@@                                21313430
              24 |@@@@@@                                   15000426
              32 |@@@                                      6450202
              40 |@@                                       4209166
              48 |@                                        2533298
              56 |@                                        1611506
              64 |@                                        1203825
              72 |                                         1068207
              80 |                                         877158
              88 |                                         592192
              96 |                                         489958
             104 |                                         709757
             112 |                                         925775
             120 |                                         1041627
             128 |@                                        1315123
             136 |                                         664687
             144 |                                         276673
             152 |                                         150870
             160 |                                         82661
             168 |                                         40630
             176 |                                         26693
             184 |                                         15112
             192 |                                         7276
             200 |                                         5773
             208 |                                         2462
             216 |                                         1679
             224 |                                         1150
             232 |                                         1301
             240 |                                         1652
             248 |                                         659
             256 |                                         464
             264 |                                         0


> > Anyhow, given that the intent is to damage-control allocation cost, I
> > have to point out there is a patchset to replace the current kmem
> > alloc/free code with sheaves for everyone which promises better
> > performance:
>
> Oh, I'm sure sheaves will improve on the allocation path, but it's not
> going to be even remotely near what a simple stack allocation will be.
> Not just from an allocation cost standpoint, but just from D$ density.
>

I completely agree, but per the above the sizes look unwieldy for the
stack. This is something I tried to do years back and backed off due
to that reason.

> That said, I partly like my patch just because the current code in
> getname_flags() is simply disgusting for all those historical reasons.
> So even if we kept the allocation big - and didn't put it on the stack
> - I think actually using a proper 'struct filename' allocation would
> be a good change.
>

I don't know of anyone is fond of the current code. ;)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09 22:18         ` Mateusz Guzik
@ 2025-11-09 22:29           ` Linus Torvalds
  2025-11-09 22:33             ` Mateusz Guzik
  0 siblings, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2025-11-09 22:29 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Al Viro, linux-fsdevel, brauner, jack, paul, axboe, audit,
	io-uring

On Sun, 9 Nov 2025 at 14:18, Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> You would need 256 bytes to cover almost all of this.

Why would you care to cover all of that?

Your very numbers show that 128 bytes covers 97+% of all cases (and
160 bytes is at 99.8%)

The other cases need to be *correct*, of course, but not necessarily
optimized for.

If we can do 97% of all filenames with a simple on-stack allocation,
that would be a huge win.

(In fact, 64 bytes covers 90% of the cases according to your numbers).

              Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09 22:29           ` Linus Torvalds
@ 2025-11-09 22:33             ` Mateusz Guzik
  2025-11-09 22:39               ` Mateusz Guzik
  2025-11-09 22:41               ` Linus Torvalds
  0 siblings, 2 replies; 51+ messages in thread
From: Mateusz Guzik @ 2025-11-09 22:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, linux-fsdevel, brauner, jack, paul, axboe, audit,
	io-uring

On Sun, Nov 9, 2025 at 11:29 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, 9 Nov 2025 at 14:18, Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > You would need 256 bytes to cover almost all of this.
>
> Why would you care to cover all of that?
>
> Your very numbers show that 128 bytes covers 97+% of all cases (and
> 160 bytes is at 99.8%)
>
> The other cases need to be *correct*, of course, but not necessarily
> optimized for.
>
> If we can do 97% of all filenames with a simple on-stack allocation,
> that would be a huge win.
>
> (In fact, 64 bytes covers 90% of the cases according to your numbers).
>

The programs which pass in these "too long" names just keep doing it,
meaning with a stack-based scheme which forces an extra SMAP trip
means they are permanently shafted. It's not that only a small % of
their lookups is penalized.

However, now that I wrote, I figured one could create a trivial
heuristic: if a given process had too many long names in a row, switch
to go directly to kmem going forward? Reset the flag on exec.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09 22:33             ` Mateusz Guzik
@ 2025-11-09 22:39               ` Mateusz Guzik
  2025-11-09 22:41               ` Linus Torvalds
  1 sibling, 0 replies; 51+ messages in thread
From: Mateusz Guzik @ 2025-11-09 22:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, linux-fsdevel, brauner, jack, paul, axboe, audit,
	io-uring

On Sun, Nov 9, 2025 at 11:33 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Sun, Nov 9, 2025 at 11:29 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Sun, 9 Nov 2025 at 14:18, Mateusz Guzik <mjguzik@gmail.com> wrote:
> > >
> > > You would need 256 bytes to cover almost all of this.
> >
> > Why would you care to cover all of that?
> >
> > Your very numbers show that 128 bytes covers 97+% of all cases (and
> > 160 bytes is at 99.8%)
> >
> > The other cases need to be *correct*, of course, but not necessarily
> > optimized for.
> >
> > If we can do 97% of all filenames with a simple on-stack allocation,
> > that would be a huge win.
> >
> > (In fact, 64 bytes covers 90% of the cases according to your numbers).
> >
>
> The programs which pass in these "too long" names just keep doing it,
> meaning with a stack-based scheme which forces an extra SMAP trip
> means they are permanently shafted. It's not that only a small % of
> their lookups is penalized.
>
> However, now that I wrote, I figured one could create a trivial
> heuristic: if a given process had too many long names in a row, switch
> to go directly to kmem going forward? Reset the flag on exec.

Geez, that was rather poorly stated. Let me try again:

1. The programs which pass long names just keep doing for majority of
their lookups, meaning the extra overhead from failing to fit on the
stack will be there for most of their syscalls.

2. I noted a heuristic could be added to detect these wankers and go
straight to kmem in their case. One trivial idea is to bump a counter
task_struct for every long name and dec for every short name, keep it
bounded. If it goes past a threshold for long names, skip stack
allocs. Then indeed a smaller on-stack buffer would be a great win
overall.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09 22:33             ` Mateusz Guzik
  2025-11-09 22:39               ` Mateusz Guzik
@ 2025-11-09 22:41               ` Linus Torvalds
  2025-11-09 22:44                 ` Linus Torvalds
  1 sibling, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2025-11-09 22:41 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Al Viro, linux-fsdevel, brauner, jack, paul, axboe, audit,
	io-uring

On Sun, 9 Nov 2025 at 14:33, Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> The programs which pass in these "too long" names just keep doing it,

Sure. So what?

We optimize for the common case.

Sure, there's an extra SMAP sequence for people using longer names,
but while those SMAP things are costly compared to individual
instructions, they aren't costly in the *big* picture.

They are a pipeline stall, not some kind of horrendous thing.

It would be *more* expensive to try to keep statistics than it is to
just say "long pathnames are more expensive than short ones".

                Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09 22:41               ` Linus Torvalds
@ 2025-11-09 22:44                 ` Linus Torvalds
  2025-11-09 23:07                   ` Linus Torvalds
  0 siblings, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2025-11-09 22:44 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Al Viro, linux-fsdevel, brauner, jack, paul, axboe, audit,
	io-uring

On Sun, 9 Nov 2025 at 14:41, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> We optimize for the common case.

Side note: I'm now running that second version of the patch I posted.
It boots. It seems to work.

And again: that patch will slow things down, because it doesn't
actually take advantage of any stack allocation model. So I'm not
claiming that it's an optimization in *that* form. It might _allow_
optimizing things, but as-is is only adds more allocations (but not
particularly many more)

               Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09 22:44                 ` Linus Torvalds
@ 2025-11-09 23:07                   ` Linus Torvalds
  0 siblings, 0 replies; 51+ messages in thread
From: Linus Torvalds @ 2025-11-09 23:07 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Al Viro, linux-fsdevel, brauner, jack, paul, axboe, audit,
	io-uring

On Sun, 9 Nov 2025 at 14:44, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And again: that patch will slow things down [..]

Having done a profile just to see, the regular allocation path
(getname_flags -> kmem_cache_alloc_noprof) certainly shows up.

But on that test set (kernel build), I never hit the 128-byte limit,
and interestingly putname() shows up a tiny bit more than
getname_flags().

At least one reason seems to be that the

        if (refcnt != 1) {

thing in putname() is mispredicted, and without the auditing code, the
"refcnt == 1" case is obviously the common case.

Anyway. Not a great test, and this is all the "good behavior", and all
the *real* costs in that particular path are in strncpy_from_user(),
kmem_cache_free() and kmem_cache_alloc_noprof().

And yeah, STAC/CLAC is expensive on my old Zen 2 machine, as are the
cmpxchg16b in the slab alloc/free paths.

And in the end, the actual path walking is more expensive than all of
this, as is selinux_inode_permission(). So the actual filename copying
isn't *really* all that noticeable - you have to look for it.

              Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09 22:18     ` Linus Torvalds
@ 2025-11-10  5:17       ` Al Viro
  2025-11-10 16:41         ` Linus Torvalds
  2025-11-10  6:05       ` Al Viro
  2025-11-10  6:36       ` Al Viro
  2 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2025-11-10  5:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, brauner, jack, mjguzik, paul, axboe, audit,
	io-uring

On Sun, Nov 09, 2025 at 02:18:04PM -0800, Linus Torvalds wrote:

> Anyway, slightly updated patch that makes "names_cachep" local to
> fs/namei.c just because there is absolutely _no_ reason for anybody
> else to ever use it. Except for that insane legacy one of __getname(),
> that is now just a kmalloc.
> 
> I also made EMBEDDED_NAME_MAX be 128 as per Mateusz' comment, although
> to avoid double allocations it should probably be even bigger. A
> "small" value is good for testing that the new logic works, though.
> 
> I haven't actually dared trying to boot into this, so it's still
> entirely untested. But I've at least looked through that patch a bit
> more and tried to search for other insane patterns, and so far that
> oddity in ntfs3 was the only related thing I've found.

*snort*
That's more about weird callers of getname(), but...

#ifdef CONFIG_SYSFS_SYSCALL
static int fs_index(const char __user * __name)
{
	struct file_system_type * tmp;
	struct filename *name;
	int err, index;

	name = getname(__name);
	err = PTR_ERR(name);
	if (IS_ERR(name))   
		return err;

	err = -EINVAL;
	read_lock(&file_systems_lock);
	for (tmp=file_systems, index=0 ; tmp ; tmp=tmp->next, index++) {
		if (strcmp(tmp->name, name->name) == 0) {
			err = index;
			break;
		}
	}
	read_unlock(&file_systems_lock);
	putname(name);
	return err;
}

in fs/filesystems.c

Yes, really - echo $((`sed -ne "/.\<$1$/=" </proc/filesystems` - 1))
apparently does deserve a syscall.  Multiplexor, as well (other
subfunctions are about as hard to implement in userland)...

IMO things like "xfs" or "ceph" don't look like pathnames - if
anything, we ought to use copy_mount_string() for consistency with
mount(2)...

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09 22:18     ` Linus Torvalds
  2025-11-10  5:17       ` Al Viro
@ 2025-11-10  6:05       ` Al Viro
  2025-11-10  6:36       ` Al Viro
  2 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2025-11-10  6:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, brauner, jack, mjguzik, paul, axboe, audit,
	io-uring

On Sun, Nov 09, 2025 at 02:18:04PM -0800, Linus Torvalds wrote:
> On Sun, 9 Nov 2025 at 11:18, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Hmm? Comments?
> 
> Oh, and while double-checking bad users, I note that ntfs3 does
> 
>         uni = kmem_cache_alloc(names_cachep, GFP_NOWAIT);
> ...
>         kmem_cache_free(names_cachep, uni);
> 
> which is all complete and utter bogosity. I have no idea why anybody
> ever thought that was acceptable. It's garbage.

	So's ntfs_d_compare() right next to it - and there they *still*
hadn't bothered with non-blocking allocation while under ->d_lock and
rcu_read_lock()...

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09 22:18     ` Linus Torvalds
  2025-11-10  5:17       ` Al Viro
  2025-11-10  6:05       ` Al Viro
@ 2025-11-10  6:36       ` Al Viro
  2025-11-10 16:50         ` Linus Torvalds
  2 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2025-11-10  6:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, brauner, jack, mjguzik, paul, axboe, audit,
	io-uring

On Sun, Nov 09, 2025 at 02:18:04PM -0800, Linus Torvalds wrote:

> @@ -143,7 +162,7 @@ getname_flags(const char __user *filename, int flags)

> -		const size_t size = offsetof(struct filename, iname[1]);
> -		kname = (char *)result;
> -
> -		/*
> -		 * size is chosen that way we to guarantee that
> -		 * result->iname[0] is within the same object and that
> -		 * kname can't be equal to result->iname, no matter what.
> -		 */
> -		result = kzalloc(size, GFP_KERNEL);
> -		if (unlikely(!result)) {
> -			__putname(kname);
> +		kname = kmalloc(PATH_MAX, GFP_KERNEL);
> +		if (unlikely(!kname)) {
> +			free_filename(result);
>  			return ERR_PTR(-ENOMEM);
>  		}
> -		result->name = kname;
> -		len = strncpy_from_user(kname, filename, PATH_MAX);
> +		memcpy(kname, result->iname, EMBEDDED_NAME_MAX);
> +
> +		// Copy remaining part of the name
> +		len = strncpy_from_user(kname + EMBEDDED_NAME_MAX,
> +			filename + EMBEDDED_NAME_MAX,
> +			PATH_MAX-EMBEDDED_NAME_MAX);
> +		if (unlikely(len == PATH_MAX-EMBEDDED_NAME_MAX))
> +			len = -ENAMETOOLONG;
>  		if (unlikely(len < 0)) {
> -			__putname(kname);
> -			kfree(result);
> +			free_filename(result);
> +			kfree(kname);
>  			return ERR_PTR(len);
>  		}
> -		/* The empty path is special. */
> -		if (unlikely(!len) && !(flags & LOOKUP_EMPTY)) {
> -			__putname(kname);
> -			kfree(result);
> -			return ERR_PTR(-ENOENT);
> -		}
> -		if (unlikely(len == PATH_MAX)) {
> -			__putname(kname);
> -			kfree(result);
> -			return ERR_PTR(-ENAMETOOLONG);
> -		}
> +		result->name = kname;


FWIW, I would take that part into an out-of-line helper - it does
not need flags and it doesn't really need to bother with
free_filename() on failure - that can be left to the caller,
where we would have
	len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
	if (unlikely(len == 0) && !(flags & LOOKUP_EMPTY))
		len = -ENOENT;
	if (unlikely(len == EMBEDDED_NAME_MAX))
		len = getname_long_name(result, filename);
	if (unlikely(len < 0)) {
		free_filename(result);
		return ERR_PTR(len);
	}
	// we are fine, finish setting it up
	...
	return result;


> @@ -246,7 +252,7 @@ struct filename *getname_kernel(const char * filename)
>  	struct filename *result;
>  	int len = strlen(filename) + 1;
>  
> -	result = __getname();
> +	result = alloc_filename();
>  	if (unlikely(!result))
>  		return ERR_PTR(-ENOMEM);
>  
> @@ -258,13 +264,13 @@ struct filename *getname_kernel(const char * filename)
>  
>  		tmp = kmalloc(size, GFP_KERNEL);
>  		if (unlikely(!tmp)) {
> -			__putname(result);
> +			free_filename(result);
>  			return ERR_PTR(-ENOMEM);
>  		}
>  		tmp->name = (char *)result;
>  		result = tmp;

That's wrong - putname() will choke on that (free_filename() on result of
kmalloc()).  Something like
        if (len <= EMBEDDED_NAME_MAX) {
		result->name = (char *)result->iname;
		memcpy(result->name, filename, len);
	} else if (len <= PATH_MAX) {
		result->name = kmemdup(filename, len);
		if (unlikely(!result->name))
			len = -ENOMEM;
	} else {
		len = -ENAMETOOLONG;
	}
	if (unlikely(len < 0) {
		free_filename(result);
		return ERR_PTR(len);
	}
	// finish setting up
	...
	return result;
perhaps?

And if we are going for that kind of changes, I would rather take
that stuff to fs/filename.c, TBH...

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-10  5:17       ` Al Viro
@ 2025-11-10 16:41         ` Linus Torvalds
  2025-11-10 19:58           ` Al Viro
  2025-11-12  9:26           ` Christian Brauner
  0 siblings, 2 replies; 51+ messages in thread
From: Linus Torvalds @ 2025-11-10 16:41 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, brauner, jack, mjguzik, paul, axboe, audit,
	io-uring

On Sun, 9 Nov 2025 at 21:17, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> That's more about weird callers of getname(), but...
>
> #ifdef CONFIG_SYSFS_SYSCALL
> static int fs_index(const char __user * __name)
> {
>         struct file_system_type * tmp;
>         struct filename *name;
>         int err, index;
>
>         name = getname(__name);

Yeah, ok, this is certainly a somewhat unusual pattern in that "name"
here is not a pathname, but at the same time I can't fault this code
for using a convenient function for "allocate and copy a string from
user space".

> Yes, really - echo $((`sed -ne "/.\<$1$/=" </proc/filesystems` - 1))
> apparently does deserve a syscall.  Multiplexor, as well (other
> subfunctions are about as hard to implement in userland)...

I think those are all "crazy legacy from back in the dark ages when we
thought iBCS2 was a goal".

I doubt anybody uses that 'sysfs()' system call, and it's behind the
SYSFS_SYSCALL config variable that was finally made "default n" this
year, but has actually had a help-message that called it obsolete
since at least 2014.

The code predates not just git, but the bitkeeper history too - and
we've long since removed all the actual iBCS2 code (see for example
commit 612a95b4e053: "x86: remove iBCS support", which removed some
binfmt left-overs - back in 2008).

> IMO things like "xfs" or "ceph" don't look like pathnames - if
> anything, we ought to use copy_mount_string() for consistency with
> mount(2)...

Oh, absolutely not.

But that code certainly could just do strndup_user(). That's the
normal thing for "get a string from user space" these days, but it
didn't historically exist..

That said, I think that code will  just get removed, so it's not even
worth worrying about. I don't think anybody even *noticed* that we
made it "default n" after all these years.

             Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-10  6:36       ` Al Viro
@ 2025-11-10 16:50         ` Linus Torvalds
  0 siblings, 0 replies; 51+ messages in thread
From: Linus Torvalds @ 2025-11-10 16:50 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, brauner, jack, mjguzik, paul, axboe, audit,
	io-uring

On Sun, 9 Nov 2025 at 22:37, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> > @@ -258,13 +264,13 @@ struct filename *getname_kernel(const char * filename)
> >
> >               tmp = kmalloc(size, GFP_KERNEL);
> >               if (unlikely(!tmp)) {
> > -                     __putname(result);
> > +                     free_filename(result);
> >                       return ERR_PTR(-ENOMEM);
> >               }
> >               tmp->name = (char *)result;
> >               result = tmp;
>
> That's wrong - putname() will choke on that (free_filename() on result of
> kmalloc()).

Yeah, that's me not doing the right conversion from the old crazy
"turn allocations around".

It should just do

                char *tmp = kmalloc(len, GFP_KERNEL);
                .... NULL check ..
                result->name = tmp;

without any odd games with types. And yeah, that code could be
re-organized to be clearer.

             Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-10 16:41         ` Linus Torvalds
@ 2025-11-10 19:58           ` Al Viro
  2025-11-10 20:52             ` Linus Torvalds
  2025-11-12  9:26           ` Christian Brauner
  1 sibling, 1 reply; 51+ messages in thread
From: Al Viro @ 2025-11-10 19:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, brauner, jack, mjguzik, paul, axboe, audit,
	io-uring

On Mon, Nov 10, 2025 at 08:41:45AM -0800, Linus Torvalds wrote:

> > IMO things like "xfs" or "ceph" don't look like pathnames - if
> > anything, we ought to use copy_mount_string() for consistency with
> > mount(2)...
> 
> Oh, absolutely not.
> 
> But that code certainly could just do strndup_user(). That's the
> normal thing for "get a string from user space" these days, but it
> didn't historically exist..

There would be a minor change in errors (s/ENAMETOOLONG/EINVAL/
and s/ENOENT/EINVAL/, AFAICS), but nobody really gives a damn,
so...

If we go that way, do you see any problems with treating
osf_{ufs,cdfs}_mount() in the same manner?  Yes, these are pathnames,
but the strings copied there are going to be fed (after another round
of copying) to lookup_bdev(), i.e. getname_kernel().  At least that
way it would be consistent with what mount(2) does...

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index a08e8edef1a4..21a8f985999b 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -454,42 +454,34 @@ static int
 osf_ufs_mount(const char __user *dirname,
 	      struct ufs_args __user *args, int flags)
 {
-	int retval;
+	char *devname __free(kfree) = NULL;
 	struct cdfs_args tmp;
-	struct filename *devname;
 
-	retval = -EFAULT;
 	if (copy_from_user(&tmp, args, sizeof(tmp)))
-		goto out;
-	devname = getname(tmp.devname);
-	retval = PTR_ERR(devname);
+		return -EFAULT;
+
+	devname = strndup_user(tmp.devname, PATH_MAX);
 	if (IS_ERR(devname))
-		goto out;
-	retval = do_mount(devname->name, dirname, "ext2", flags, NULL);
-	putname(devname);
- out:
-	return retval;
+		return PTR_ERR(devname);
+
+	return do_mount(devname, dirname, "ext2", flags, NULL);
 }
 
 static int
 osf_cdfs_mount(const char __user *dirname,
 	       struct cdfs_args __user *args, int flags)
 {
-	int retval;
+	char *devname __free(kfree) = NULL;
 	struct cdfs_args tmp;
-	struct filename *devname;
 
-	retval = -EFAULT;
 	if (copy_from_user(&tmp, args, sizeof(tmp)))
-		goto out;
-	devname = getname(tmp.devname);
-	retval = PTR_ERR(devname);
+		return -EFAULT;
+
+	devname = strndup_user(tmp.devname, PATH_MAX);
 	if (IS_ERR(devname))
-		goto out;
-	retval = do_mount(devname->name, dirname, "iso9660", flags, NULL);
-	putname(devname);
- out:
-	return retval;
+		return PTR_ERR(devname);
+
+	return do_mount(devname, dirname, "iso9660", flags, NULL);
 }
 
 static int


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-10 19:58           ` Al Viro
@ 2025-11-10 20:52             ` Linus Torvalds
  2025-11-11  1:16               ` Al Viro
  0 siblings, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2025-11-10 20:52 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, brauner, jack, mjguzik, paul, axboe, audit,
	io-uring

On Mon, 10 Nov 2025 at 11:58, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> If we go that way, do you see any problems with treating
> osf_{ufs,cdfs}_mount() in the same manner?  Yes, these are pathnames,

Hmm. In those cases, the ENAMETOOLONG thing actually does make sense -
exactly because they are pathnames.

So I think that in those two places using getname() is fairly natural
and gets us the natural error handling too. No?

              Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09  6:37 ` [RFC][PATCH 10/13] get rid of audit_reusename() Al Viro
  2025-11-09 19:18   ` Linus Torvalds
@ 2025-11-10 23:13   ` Paul Moore
  2025-11-11  0:23     ` Paul Moore
  2025-11-13 10:29   ` Jan Kara
  2 siblings, 1 reply; 51+ messages in thread
From: Paul Moore @ 2025-11-10 23:13 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, axboe, audit,
	io-uring

On Sun, Nov 9, 2025 at 1:37 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Originally we tried to avoid multiple insertions into audit names array
> during retry loop by a cute hack - memorize the userland pointer and
> if there already is a match, just grab an extra reference to it.
>
> Cute as it had been, it had problems - two identical pointers had
> audit aux entries merged, two identical strings did not.  Having
> different behaviour for syscalls that differ only by addresses of
> otherwise identical string arguments is obviously wrong - if nothing
> else, compiler can decide to merge identical string literals.
>
> Besides, this hack does nothing for non-audited processes - they get
> a fresh copy for retry.  It's not time-critical, but having behaviour
> subtly differ that way is bogus.
>
> These days we have very few places that import filename more than once
> (9 functions total) and it's easy to massage them so we get rid of all
> re-imports.  With that done, we don't need audit_reusename() anymore.
> There's no need to memorize userland pointer either.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/namei.c            | 11 +++--------
>  include/linux/audit.h | 11 -----------
>  include/linux/fs.h    |  1 -
>  kernel/auditsc.c      | 23 -----------------------
>  4 files changed, 3 insertions(+), 43 deletions(-)

Looks reasonable to me.  Not sure if you've run it through the
audit-testsuite yet, but I'm building a test kernel as I write this,
I'll let you know how it goes.

Acked-by: Paul Moore <paul@paul-moore.com>

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-10 23:13   ` Paul Moore
@ 2025-11-11  0:23     ` Paul Moore
  0 siblings, 0 replies; 51+ messages in thread
From: Paul Moore @ 2025-11-11  0:23 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, axboe, audit,
	io-uring

On Mon, Nov 10, 2025 at 6:13 PM Paul Moore <paul@paul-moore.com> wrote:
> On Sun, Nov 9, 2025 at 1:37 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Originally we tried to avoid multiple insertions into audit names array
> > during retry loop by a cute hack - memorize the userland pointer and
> > if there already is a match, just grab an extra reference to it.
> >
> > Cute as it had been, it had problems - two identical pointers had
> > audit aux entries merged, two identical strings did not.  Having
> > different behaviour for syscalls that differ only by addresses of
> > otherwise identical string arguments is obviously wrong - if nothing
> > else, compiler can decide to merge identical string literals.
> >
> > Besides, this hack does nothing for non-audited processes - they get
> > a fresh copy for retry.  It's not time-critical, but having behaviour
> > subtly differ that way is bogus.
> >
> > These days we have very few places that import filename more than once
> > (9 functions total) and it's easy to massage them so we get rid of all
> > re-imports.  With that done, we don't need audit_reusename() anymore.
> > There's no need to memorize userland pointer either.
> >
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > ---
> >  fs/namei.c            | 11 +++--------
> >  include/linux/audit.h | 11 -----------
> >  include/linux/fs.h    |  1 -
> >  kernel/auditsc.c      | 23 -----------------------
> >  4 files changed, 3 insertions(+), 43 deletions(-)
>
> Looks reasonable to me.  Not sure if you've run it through the
> audit-testsuite yet, but I'm building a test kernel as I write this,
> I'll let you know how it goes.
>
> Acked-by: Paul Moore <paul@paul-moore.com>

FWIW, it passes the audit-testsuite.

Tested-by: Paul Moore <paul@paul-moore.com>

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 11/13] allow incomplete imports of filenames
  2025-11-09  6:37 ` [RFC][PATCH 11/13] allow incomplete imports of filenames Al Viro
@ 2025-11-11  0:45   ` Paul Moore
  2025-11-11 14:41   ` Jens Axboe
  1 sibling, 0 replies; 51+ messages in thread
From: Paul Moore @ 2025-11-11  0:45 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, axboe, audit,
	io-uring

On Sun, Nov 9, 2025 at 1:38 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> There are two filename-related problems in io_uring and its
> interplay with audit.
>
> Filenames are imported when request is submitted and used when
> it is processed.  Unfortunately, the latter may very well
> happen in a different thread.  In that case the reference to
> filename is put into the wrong audit_context - that of submitting
> thread, not the processing one.  Audit logics is called by
> the latter, and it really wants to be able to find the names
> in audit_context current (== processing) thread.
>
> Another related problem is the headache with refcounts -
> normally all references to given struct filename are visible
> only to one thread (the one that uses that struct filename).
> io_uring violates that - an extra reference is stashed in
> audit_context of submitter.  It gets dropped when submitter
> returns to userland, which can happen simultaneously with
> processing thread deciding to drop the reference it got.
>
> We paper over that by making refcount atomic, but that means
> pointless headache for everyone.
>
> Solution: the notion of partially imported filenames.  Namely,
> already copied from userland, but *not* exposed to audit yet.
>
> io_uring can create that in submitter thread, and complete the
> import (obtaining the usual reference to struct filename) in
> processing thread.
>
> Object: struct delayed_filename.
>
> Primitives for working with it:
>
> delayed_getname(&delayed_filename, user_string) - copies the name
> from userland, returning 0 and stashing the address of (still incomplete)
> struct filename in delayed_filename on success and returning -E... on
> error.
>
> delayed_getname_uflags(&delayed_filename, user_string, atflags) - similar,
> in the same relation to delayed_getname() as getname_uflags() is to getname()
>
> complete_getname(&delayed_getname) - completes the import of filename stashed
> in delayed_getname and returns struct filename to caller, emptying delayed_getname.
>
> dismiss_delayed_getname(&delayed_getname) - destructor; drops whatever
> might be stashed in delayed_getname, emptying it.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/namei.c           |  45 +++++++++++++++++--
>  include/linux/fs.h   |  10 +++++
>  io_uring/fs.c        | 101 +++++++++++++++++++++++--------------------
>  io_uring/openclose.c |  16 +++----
>  io_uring/statx.c     |  17 +++-----
>  io_uring/xattr.c     |  30 +++++--------
>  6 files changed, 129 insertions(+), 90 deletions(-)

I don't have any patches to share for this yet, but as a FYI, I've
started working on some audit patches to deal with this issue in a
more general way; the io_uring approach of splitting processing
between a prep and work stage causes audit problems beyond just
filenames.  My current thought is to setup an audit_context in the
io_uring prep stage for those ops that need auditing, and then carry
around the context in the io_kiocb for later use, swapping it to
current->audit_context when the work is performed.  Still plenty of
hand wavy stuff to sort out, and it's entirely possible that it
doesn't work out, or is too ugly to see posted, but it seemed relevant
to the patch.

Regardless, I think this approach is reasonable; best case it is a
stopgap until we have something more general, worst case it becomes a
more permanent fix.  I can live with either outcome.

Acked-by: Paul Moore <paul@paul-moore.com>

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-10 20:52             ` Linus Torvalds
@ 2025-11-11  1:16               ` Al Viro
  0 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2025-11-11  1:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, brauner, jack, mjguzik, paul, axboe, audit,
	io-uring

On Mon, Nov 10, 2025 at 12:52:57PM -0800, Linus Torvalds wrote:
> On Mon, 10 Nov 2025 at 11:58, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > If we go that way, do you see any problems with treating
> > osf_{ufs,cdfs}_mount() in the same manner?  Yes, these are pathnames,
> 
> Hmm. In those cases, the ENAMETOOLONG thing actually does make sense -
> exactly because they are pathnames.
> 
> So I think that in those two places using getname() is fairly natural
> and gets us the natural error handling too. No?

Seeing that we don't do that for native mount(2)...  Hell knows -
the thing is, import is done early in syscall, before we know what
does that particular filesystem type expect.  It's not always a pathname
of any sort.

Note that empty string for the first argument of mount(2) is perfectly
fine - for some values of the 5th (flags) and 3rd (type) ones.
So plain getname() is not an option and we do, unfortunately, need the
sucker imported before we start parsing the flags, etc.

I'd love to take security_sb_mount() - it's an utter shitshow of API,
inherently racy, etc., but currently it takes a kernelspace string for
the first argument (dev_name) and is called before we'd even started
to look at flags (so the instances have to essentially duplicate flags
parsing, among other things).

But that's a lot of massage and it will require the LSM crowd acceptance ;-/
If earlier attempts are anything to go by, there will be a lot of
resistance.

BTW, take a look at apparmor_sb_mount() and aa_bind_mount().  Compare
the calls of kern_path() in aa_bind_mount() and in do_loopback() and
note that the damn thing relies upon both pathwalks resolving to the
same location...

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 11/13] allow incomplete imports of filenames
  2025-11-09  6:37 ` [RFC][PATCH 11/13] allow incomplete imports of filenames Al Viro
  2025-11-11  0:45   ` Paul Moore
@ 2025-11-11 14:41   ` Jens Axboe
  2025-11-19  1:12     ` Al Viro
  1 sibling, 1 reply; 51+ messages in thread
From: Jens Axboe @ 2025-11-11 14:41 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, audit, io-uring

On 11/8/25 11:37 PM, Al Viro wrote:
> There are two filename-related problems in io_uring and its
> interplay with audit.
> 
> Filenames are imported when request is submitted and used when
> it is processed.  Unfortunately, the latter may very well
> happen in a different thread.  In that case the reference to
> filename is put into the wrong audit_context - that of submitting
> thread, not the processing one.  Audit logics is called by
> the latter, and it really wants to be able to find the names
> in audit_context current (== processing) thread.
> 
> Another related problem is the headache with refcounts -
> normally all references to given struct filename are visible
> only to one thread (the one that uses that struct filename).
> io_uring violates that - an extra reference is stashed in
> audit_context of submitter.  It gets dropped when submitter
> returns to userland, which can happen simultaneously with
> processing thread deciding to drop the reference it got.
> 
> We paper over that by making refcount atomic, but that means
> pointless headache for everyone.
> 
> Solution: the notion of partially imported filenames.  Namely,
> already copied from userland, but *not* exposed to audit yet.
> 
> io_uring can create that in submitter thread, and complete the
> import (obtaining the usual reference to struct filename) in
> processing thread.
> 
> Object: struct delayed_filename.
> 
> Primitives for working with it:
> 
> delayed_getname(&delayed_filename, user_string) - copies the name
> from userland, returning 0 and stashing the address of (still incomplete)
> struct filename in delayed_filename on success and returning -E... on
> error.
> 
> delayed_getname_uflags(&delayed_filename, user_string, atflags) - similar,
> in the same relation to delayed_getname() as getname_uflags() is to getname()
> 
> complete_getname(&delayed_getname) - completes the import of filename stashed
> in delayed_getname and returns struct filename to caller, emptying delayed_getname.

dismiss_delayed_filename()

> diff --git a/io_uring/openclose.c b/io_uring/openclose.c
> index bfeb91b31bba..6bc14f626923 100644
> --- a/io_uring/openclose.c
> +++ b/io_uring/openclose.c
> @@ -121,6 +118,7 @@ int io_openat2(struct io_kiocb *req, unsigned int issue_flags)
>  	struct file *file;
>  	bool resolve_nonblock, nonblock_set;
>  	bool fixed = !!open->file_slot;
> +	struct filename *name __free(putname) = complete_getname(&open->filename);
>  	int ret;
>  
>  	ret = build_open_flags(&open->how, &op);

I don't think this will work as-is - the prep has been done on the
request, but we could be retrying io_openat2(). That will happen if this
function returns -EAGAIN. That will then end up with a cleared out
filename for the second (blocking) invocation.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-10 16:41         ` Linus Torvalds
  2025-11-10 19:58           ` Al Viro
@ 2025-11-12  9:26           ` Christian Brauner
  1 sibling, 0 replies; 51+ messages in thread
From: Christian Brauner @ 2025-11-12  9:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, linux-fsdevel, jack, mjguzik, paul, axboe, audit,
	io-uring

On Mon, Nov 10, 2025 at 08:41:45AM -0800, Linus Torvalds wrote:
> On Sun, 9 Nov 2025 at 21:17, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > That's more about weird callers of getname(), but...
> >
> > #ifdef CONFIG_SYSFS_SYSCALL
> > static int fs_index(const char __user * __name)
> > {
> >         struct file_system_type * tmp;
> >         struct filename *name;
> >         int err, index;
> >
> >         name = getname(__name);
> 
> Yeah, ok, this is certainly a somewhat unusual pattern in that "name"
> here is not a pathname, but at the same time I can't fault this code
> for using a convenient function for "allocate and copy a string from
> user space".
> 
> > Yes, really - echo $((`sed -ne "/.\<$1$/=" </proc/filesystems` - 1))
> > apparently does deserve a syscall.  Multiplexor, as well (other
> > subfunctions are about as hard to implement in userland)...
> 
> I think those are all "crazy legacy from back in the dark ages when we
> thought iBCS2 was a goal".
> 
> I doubt anybody uses that 'sysfs()' system call, and it's behind the
> SYSFS_SYSCALL config variable that was finally made "default n" this
> year, but has actually had a help-message that called it obsolete
> since at least 2014.
> 
> The code predates not just git, but the bitkeeper history too - and
> we've long since removed all the actual iBCS2 code (see for example
> commit 612a95b4e053: "x86: remove iBCS support", which removed some
> binfmt left-overs - back in 2008).
> 
> > IMO things like "xfs" or "ceph" don't look like pathnames - if
> > anything, we ought to use copy_mount_string() for consistency with
> > mount(2)...
> 
> Oh, absolutely not.
> 
> But that code certainly could just do strndup_user(). That's the
> normal thing for "get a string from user space" these days, but it
> didn't historically exist..
> 
> That said, I think that code will  just get removed, so it's not even
> worth worrying about. I don't think anybody even *noticed* that we
> made it "default n" after all these years.

Nobody noticed at all when I flipped the switch earlier this year. I
explicitly dit it to prepare for removal of the sysfs() system call so
I'm happy to pull the trigger any time!

https://lore.kernel.org/all/20250415-dezimieren-wertpapier-9fd18a211a41@brauner

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 01/13] do_faccessat(): import pathname only once
  2025-11-09  6:37 ` [RFC][PATCH 01/13] do_faccessat(): import pathname only once Al Viro
@ 2025-11-13 10:11   ` Jan Kara
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Kara @ 2025-11-13 10:11 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, paul, axboe,
	audit, io-uring

On Sun 09-11-25 06:37:33, Al Viro wrote:
> Convert the user_path_at() call inside a retry loop into getname_flags() +
> filename_lookup() + putname() and leave only filename_lookup() inside
> the loop.
> 
> Since we have the default logics for use of LOOKUP_EMPTY (passed iff
> AT_EMPTY_PATH is present in flags), just use getname_uflags() and
> don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
> will pass the right thing to getname_flags() and filename_lookup()
> doesn't care about LOOKUP_EMPTY at all.
> 
> The things could be further simplified by use of cleanup.h stuff, but
> let's not clutter the patch with that.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/open.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index 3d64372ecc67..db8fe2b5463d 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -471,6 +471,7 @@ static int do_faccessat(int dfd, const char __user *filename, int mode, int flag
>  	int res;
>  	unsigned int lookup_flags = LOOKUP_FOLLOW;
>  	const struct cred *old_cred = NULL;
> +	struct filename *name;
>  
>  	if (mode & ~S_IRWXO)	/* where's F_OK, X_OK, W_OK, R_OK? */
>  		return -EINVAL;
> @@ -480,8 +481,6 @@ static int do_faccessat(int dfd, const char __user *filename, int mode, int flag
>  
>  	if (flags & AT_SYMLINK_NOFOLLOW)
>  		lookup_flags &= ~LOOKUP_FOLLOW;
> -	if (flags & AT_EMPTY_PATH)
> -		lookup_flags |= LOOKUP_EMPTY;
>  
>  	if (access_need_override_creds(flags)) {
>  		old_cred = access_override_creds();
> @@ -489,8 +488,9 @@ static int do_faccessat(int dfd, const char __user *filename, int mode, int flag
>  			return -ENOMEM;
>  	}
>  
> +	name = getname_uflags(filename, flags);
>  retry:
> -	res = user_path_at(dfd, filename, lookup_flags, &path);
> +	res = filename_lookup(dfd, name, lookup_flags, &path, NULL);
>  	if (res)
>  		goto out;
>  
> @@ -530,6 +530,7 @@ static int do_faccessat(int dfd, const char __user *filename, int mode, int flag
>  		goto retry;
>  	}
>  out:
> +	putname(name);
>  	if (old_cred)
>  		put_cred(revert_creds(old_cred));
>  
> -- 
> 2.47.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 02/13] do_fchmodat(): import pathname only once
  2025-11-09  6:37 ` [RFC][PATCH 02/13] do_fchmodat(): " Al Viro
@ 2025-11-13 10:12   ` Jan Kara
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Kara @ 2025-11-13 10:12 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, paul, axboe,
	audit, io-uring

On Sun 09-11-25 06:37:34, Al Viro wrote:
> Convert the user_path_at() call inside a retry loop into getname_flags() +
> filename_lookup() + putname() and leave only filename_lookup() inside
> the loop.
> 
> Since we have the default logics for use of LOOKUP_EMPTY (passed iff
> AT_EMPTY_PATH is present in flags), just use getname_uflags() and
> don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
> will pass the right thing to getname_flags() and filename_lookup()
> doesn't care about LOOKUP_EMPTY at all.
> 
> The things could be further simplified by use of cleanup.h stuff, but
> let's not clutter the patch with that.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/open.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index db8fe2b5463d..e9a08a820e49 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -682,6 +682,7 @@ static int do_fchmodat(int dfd, const char __user *filename, umode_t mode,
>  		       unsigned int flags)
>  {
>  	struct path path;
> +	struct filename *name;
>  	int error;
>  	unsigned int lookup_flags;
>  
> @@ -689,11 +690,9 @@ static int do_fchmodat(int dfd, const char __user *filename, umode_t mode,
>  		return -EINVAL;
>  
>  	lookup_flags = (flags & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW;
> -	if (flags & AT_EMPTY_PATH)
> -		lookup_flags |= LOOKUP_EMPTY;
> -
> +	name = getname_uflags(filename, flags);
>  retry:
> -	error = user_path_at(dfd, filename, lookup_flags, &path);
> +	error = filename_lookup(dfd, name, lookup_flags, &path, NULL);
>  	if (!error) {
>  		error = chmod_common(&path, mode);
>  		path_put(&path);
> @@ -702,6 +701,7 @@ static int do_fchmodat(int dfd, const char __user *filename, umode_t mode,
>  			goto retry;
>  		}
>  	}
> +	putname(name);
>  	return error;
>  }
>  
> -- 
> 2.47.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 03/13] do_fchownat(): import pathname only once
  2025-11-09  6:37 ` [RFC][PATCH 03/13] do_fchownat(): " Al Viro
@ 2025-11-13 10:13   ` Jan Kara
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Kara @ 2025-11-13 10:13 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, paul, axboe,
	audit, io-uring

On Sun 09-11-25 06:37:35, Al Viro wrote:
> Convert the user_path_at() call inside a retry loop into getname_flags() +
> filename_lookup() + putname() and leave only filename_lookup() inside
> the loop.
> 
> Since we have the default logics for use of LOOKUP_EMPTY (passed iff
> AT_EMPTY_PATH is present in flags), just use getname_uflags() and
> don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
> will pass the right thing to getname_flags() and filename_lookup()
> doesn't care about LOOKUP_EMPTY at all.
> 
> The things could be further simplified by use of cleanup.h stuff, but
> let's not clutter the patch with that.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/open.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index e9a08a820e49..e5110f5e80c7 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -804,17 +804,17 @@ int do_fchownat(int dfd, const char __user *filename, uid_t user, gid_t group,
>  		int flag)
>  {
>  	struct path path;
> -	int error = -EINVAL;
> +	int error;
>  	int lookup_flags;
> +	struct filename *name;
>  
>  	if ((flag & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> -		goto out;
> +		return -EINVAL;
>  
>  	lookup_flags = (flag & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW;
> -	if (flag & AT_EMPTY_PATH)
> -		lookup_flags |= LOOKUP_EMPTY;
> +	name = getname_uflags(filename, flag);
>  retry:
> -	error = user_path_at(dfd, filename, lookup_flags, &path);
> +	error = filename_lookup(dfd, name, lookup_flags, &path, NULL);
>  	if (error)
>  		goto out;
>  	error = mnt_want_write(path.mnt);
> @@ -829,6 +829,7 @@ int do_fchownat(int dfd, const char __user *filename, uid_t user, gid_t group,
>  		goto retry;
>  	}
>  out:
> +	putname(name);
>  	return error;
>  }
>  
> -- 
> 2.47.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 04/13] do_utimes_path(): import pathname only once
  2025-11-09  6:37 ` [RFC][PATCH 04/13] do_utimes_path(): " Al Viro
@ 2025-11-13 10:15   ` Jan Kara
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Kara @ 2025-11-13 10:15 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, paul, axboe,
	audit, io-uring

On Sun 09-11-25 06:37:36, Al Viro wrote:
> Convert the user_path_at() call inside a retry loop into getname_flags() +
> filename_lookup() + putname() and leave only filename_lookup() inside
> the loop.
> 
> Since we have the default logics for use of LOOKUP_EMPTY (passed iff
> AT_EMPTY_PATH is present in flags), just use getname_uflags() and
> don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags()
> will pass the right thing to getname_flags() and filename_lookup()
> doesn't care about LOOKUP_EMPTY at all.
> 
> The things could be further simplified by use of cleanup.h stuff, but
> let's not clutter the patch with that.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/utimes.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/utimes.c b/fs/utimes.c
> index c7c7958e57b2..262a4ddeb9cc 100644
> --- a/fs/utimes.c
> +++ b/fs/utimes.c
> @@ -8,6 +8,7 @@
>  #include <linux/compat.h>
>  #include <asm/unistd.h>
>  #include <linux/filelock.h>
> +#include "internal.h"
>  
>  static bool nsec_valid(long nsec)
>  {
> @@ -82,27 +83,27 @@ static int do_utimes_path(int dfd, const char __user *filename,
>  {
>  	struct path path;
>  	int lookup_flags = 0, error;
> +	struct filename *name;
>  
>  	if (flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH))
>  		return -EINVAL;
>  
>  	if (!(flags & AT_SYMLINK_NOFOLLOW))
>  		lookup_flags |= LOOKUP_FOLLOW;
> -	if (flags & AT_EMPTY_PATH)
> -		lookup_flags |= LOOKUP_EMPTY;
> +	name = getname_uflags(filename, flags);
>  
>  retry:
> -	error = user_path_at(dfd, filename, lookup_flags, &path);
> +	error = filename_lookup(dfd, name, lookup_flags, &path, NULL);
>  	if (error)
> -		return error;
> -
> +		goto out;
>  	error = vfs_utimes(&path, times);
>  	path_put(&path);
>  	if (retry_estale(error, lookup_flags)) {
>  		lookup_flags |= LOOKUP_REVAL;
>  		goto retry;
>  	}
> -
> +out:
> +	putname(name);
>  	return error;
>  }
>  
> -- 
> 2.47.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 05/13] chdir(2): import pathname only once
  2025-11-09  6:37 ` [RFC][PATCH 05/13] chdir(2): " Al Viro
@ 2025-11-13 10:16   ` Jan Kara
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Kara @ 2025-11-13 10:16 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, paul, axboe,
	audit, io-uring

On Sun 09-11-25 06:37:37, Al Viro wrote:
> Convert the user_path_at() call inside a retry loop into getname_flags() +
> filename_lookup() + putname() and leave only filename_lookup() inside
> the loop.
> 
> In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
> to plain getname().
> 
> The things could be further simplified by use of cleanup.h stuff, but
> let's not clutter the patch with that.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/open.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index e5110f5e80c7..8bc2f313f4a9 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -558,8 +558,9 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename)
>  	struct path path;
>  	int error;
>  	unsigned int lookup_flags = LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
> +	struct filename *name = getname(filename);
>  retry:
> -	error = user_path_at(AT_FDCWD, filename, lookup_flags, &path);
> +	error = filename_lookup(AT_FDCWD, name, lookup_flags, &path, NULL);
>  	if (error)
>  		goto out;
>  
> @@ -576,6 +577,7 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename)
>  		goto retry;
>  	}
>  out:
> +	putname(name);
>  	return error;
>  }
>  
> -- 
> 2.47.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 06/13] chroot(2): import pathname only once
  2025-11-09  6:37 ` [RFC][PATCH 06/13] chroot(2): " Al Viro
@ 2025-11-13 10:18   ` Jan Kara
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Kara @ 2025-11-13 10:18 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, paul, axboe,
	audit, io-uring

On Sun 09-11-25 06:37:38, Al Viro wrote:
> Convert the user_path_at() call inside a retry loop into getname_flags() +
> filename_lookup() + putname() and leave only filename_lookup() inside
> the loop.
> 
> In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
> to plain getname().
> 
> The things could be further simplified by use of cleanup.h stuff, but
> let's not clutter the patch with that.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/open.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index 8bc2f313f4a9..e67baae339fc 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -603,8 +603,9 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename)
>  	struct path path;
>  	int error;
>  	unsigned int lookup_flags = LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
> +	struct filename *name = getname(filename);
>  retry:
> -	error = user_path_at(AT_FDCWD, filename, lookup_flags, &path);
> +	error = filename_lookup(AT_FDCWD, name, lookup_flags, &path, NULL);
>  	if (error)
>  		goto out;
>  
> @@ -628,6 +629,7 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename)
>  		goto retry;
>  	}
>  out:
> +	putname(name);
>  	return error;
>  }
>  
> -- 
> 2.47.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 07/13] user_statfs(): import pathname only once
  2025-11-09  6:37 ` [RFC][PATCH 07/13] user_statfs(): " Al Viro
@ 2025-11-13 10:18   ` Jan Kara
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Kara @ 2025-11-13 10:18 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, paul, axboe,
	audit, io-uring

On Sun 09-11-25 06:37:39, Al Viro wrote:
> Convert the user_path_at() call inside a retry loop into getname_flags() +
> filename_lookup() + putname() and leave only filename_lookup() inside
> the loop.
> 
> In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
> to plain getname().
> 
> The things could be further simplified by use of cleanup.h stuff, but
> let's not clutter the patch with that.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/statfs.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/statfs.c b/fs/statfs.c
> index a45ac85e6048..a5671bf6c7f0 100644
> --- a/fs/statfs.c
> +++ b/fs/statfs.c
> @@ -99,8 +99,9 @@ int user_statfs(const char __user *pathname, struct kstatfs *st)
>  	struct path path;
>  	int error;
>  	unsigned int lookup_flags = LOOKUP_FOLLOW|LOOKUP_AUTOMOUNT;
> +	struct filename *name = getname(pathname);
>  retry:
> -	error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path);
> +	error = filename_lookup(AT_FDCWD, name, lookup_flags, &path, NULL);
>  	if (!error) {
>  		error = vfs_statfs(&path, st);
>  		path_put(&path);
> @@ -109,6 +110,7 @@ int user_statfs(const char __user *pathname, struct kstatfs *st)
>  			goto retry;
>  		}
>  	}
> +	putname(name);
>  	return error;
>  }
>  
> -- 
> 2.47.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 08/13] do_sys_truncate(): import pathname only once
  2025-11-09  6:37 ` [RFC][PATCH 08/13] do_sys_truncate(): " Al Viro
@ 2025-11-13 10:18   ` Jan Kara
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Kara @ 2025-11-13 10:18 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, paul, axboe,
	audit, io-uring

On Sun 09-11-25 06:37:40, Al Viro wrote:
> Convert the user_path_at() call inside a retry loop into getname_flags() +
> filename_lookup() + putname() and leave only filename_lookup() inside
> the loop.
> 
> In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent
> to plain getname().
> 
> The things could be further simplified by use of cleanup.h stuff, but
> let's not clutter the patch with that.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/open.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index e67baae339fc..eb2ff940393d 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -129,14 +129,16 @@ EXPORT_SYMBOL_GPL(vfs_truncate);
>  int do_sys_truncate(const char __user *pathname, loff_t length)
>  {
>  	unsigned int lookup_flags = LOOKUP_FOLLOW;
> +	struct filename *name;
>  	struct path path;
>  	int error;
>  
>  	if (length < 0)	/* sorry, but loff_t says... */
>  		return -EINVAL;
>  
> +	name = getname(pathname);
>  retry:
> -	error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path);
> +	error = filename_lookup(AT_FDCWD, name, lookup_flags, &path, NULL);
>  	if (!error) {
>  		error = vfs_truncate(&path, length);
>  		path_put(&path);
> @@ -145,6 +147,7 @@ int do_sys_truncate(const char __user *pathname, loff_t length)
>  		lookup_flags |= LOOKUP_REVAL;
>  		goto retry;
>  	}
> +	putname(name);
>  	return error;
>  }
>  
> -- 
> 2.47.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 09/13] do_readlinkat(): import pathname only once
  2025-11-09  6:37 ` [RFC][PATCH 09/13] do_readlinkat(): " Al Viro
@ 2025-11-13 10:20   ` Jan Kara
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Kara @ 2025-11-13 10:20 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, paul, axboe,
	audit, io-uring

On Sun 09-11-25 06:37:41, Al Viro wrote:
> Take getname_flags() and putname() outside of retry loop.
> 
> Since getname_flags() is the only thing that cares about LOOKUP_EMPTY,
> don't bother with setting LOOKUP_EMPTY in lookup_flags - just pass it
> to getname_flags() and be done with that.
> 
> The things could be further simplified by use of cleanup.h stuff, but
> let's not clutter the patch with that.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/stat.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/stat.c b/fs/stat.c
> index 6c79661e1b96..ee9ae2c3273a 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -566,13 +566,13 @@ static int do_readlinkat(int dfd, const char __user *pathname,
>  	struct path path;
>  	struct filename *name;
>  	int error;
> -	unsigned int lookup_flags = LOOKUP_EMPTY;
> +	unsigned int lookup_flags = 0;
>  
>  	if (bufsiz <= 0)
>  		return -EINVAL;
>  
> +	name = getname_flags(pathname, LOOKUP_EMPTY);
>  retry:
> -	name = getname_flags(pathname, lookup_flags);
>  	error = filename_lookup(dfd, name, lookup_flags, &path, NULL);
>  	if (unlikely(error)) {
>  		putname(name);
> @@ -593,11 +593,11 @@ static int do_readlinkat(int dfd, const char __user *pathname,
>  		error = (name->name[0] == '\0') ? -ENOENT : -EINVAL;
>  	}
>  	path_put(&path);
> -	putname(name);
>  	if (retry_estale(error, lookup_flags)) {
>  		lookup_flags |= LOOKUP_REVAL;
>  		goto retry;
>  	}
> +	putname(name);
>  	return error;
>  }
>  
> -- 
> 2.47.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 10/13] get rid of audit_reusename()
  2025-11-09  6:37 ` [RFC][PATCH 10/13] get rid of audit_reusename() Al Viro
  2025-11-09 19:18   ` Linus Torvalds
  2025-11-10 23:13   ` Paul Moore
@ 2025-11-13 10:29   ` Jan Kara
  2 siblings, 0 replies; 51+ messages in thread
From: Jan Kara @ 2025-11-13 10:29 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, paul, axboe,
	audit, io-uring

On Sun 09-11-25 06:37:42, Al Viro wrote:
> Originally we tried to avoid multiple insertions into audit names array
> during retry loop by a cute hack - memorize the userland pointer and
> if there already is a match, just grab an extra reference to it.
> 
> Cute as it had been, it had problems - two identical pointers had
> audit aux entries merged, two identical strings did not.  Having
> different behaviour for syscalls that differ only by addresses of
> otherwise identical string arguments is obviously wrong - if nothing
> else, compiler can decide to merge identical string literals.
> 
> Besides, this hack does nothing for non-audited processes - they get
> a fresh copy for retry.  It's not time-critical, but having behaviour
> subtly differ that way is bogus.
> 
> These days we have very few places that import filename more than once
> (9 functions total) and it's easy to massage them so we get rid of all
> re-imports.  With that done, we don't need audit_reusename() anymore.
> There's no need to memorize userland pointer either.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/namei.c            | 11 +++--------
>  include/linux/audit.h | 11 -----------
>  include/linux/fs.h    |  1 -
>  kernel/auditsc.c      | 23 -----------------------
>  4 files changed, 3 insertions(+), 43 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 7377020a2cba..dd86e41deeeb 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -125,9 +125,8 @@
>  
>  #define EMBEDDED_NAME_MAX	(PATH_MAX - offsetof(struct filename, iname))
>  
> -static inline void initname(struct filename *name, const char __user *uptr)
> +static inline void initname(struct filename *name)
>  {
> -	name->uptr = uptr;
>  	name->aname = NULL;
>  	atomic_set(&name->refcnt, 1);
>  }
> @@ -139,10 +138,6 @@ getname_flags(const char __user *filename, int flags)
>  	char *kname;
>  	int len;
>  
> -	result = audit_reusename(filename);
> -	if (result)
> -		return result;
> -
>  	result = __getname();
>  	if (unlikely(!result))
>  		return ERR_PTR(-ENOMEM);
> @@ -210,7 +205,7 @@ getname_flags(const char __user *filename, int flags)
>  			return ERR_PTR(-ENAMETOOLONG);
>  		}
>  	}
> -	initname(result, filename);
> +	initname(result);
>  	audit_getname(result);
>  	return result;
>  }
> @@ -268,7 +263,7 @@ struct filename *getname_kernel(const char * filename)
>  		return ERR_PTR(-ENAMETOOLONG);
>  	}
>  	memcpy((char *)result->name, filename, len);
> -	initname(result, NULL);
> +	initname(result);
>  	audit_getname(result);
>  	return result;
>  }
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 536f8ee8da81..d936a604d056 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -316,7 +316,6 @@ extern void __audit_uring_exit(int success, long code);
>  extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
>  				  unsigned long a2, unsigned long a3);
>  extern void __audit_syscall_exit(int ret_success, long ret_value);
> -extern struct filename *__audit_reusename(const __user char *uptr);
>  extern void __audit_getname(struct filename *name);
>  extern void __audit_inode(struct filename *name, const struct dentry *dentry,
>  				unsigned int flags);
> @@ -380,12 +379,6 @@ static inline void audit_syscall_exit(void *pt_regs)
>  		__audit_syscall_exit(success, return_code);
>  	}
>  }
> -static inline struct filename *audit_reusename(const __user char *name)
> -{
> -	if (unlikely(!audit_dummy_context()))
> -		return __audit_reusename(name);
> -	return NULL;
> -}
>  static inline void audit_getname(struct filename *name)
>  {
>  	if (unlikely(!audit_dummy_context()))
> @@ -624,10 +617,6 @@ static inline struct audit_context *audit_context(void)
>  {
>  	return NULL;
>  }
> -static inline struct filename *audit_reusename(const __user char *name)
> -{
> -	return NULL;
> -}
>  static inline void audit_getname(struct filename *name)
>  { }
>  static inline void audit_inode(struct filename *name,
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c895146c1444..bbae3cfdc338 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2835,7 +2835,6 @@ extern struct kobject *fs_kobj;
>  struct audit_names;
>  struct filename {
>  	const char		*name;	/* pointer to actual string */
> -	const __user char	*uptr;	/* original userland pointer */
>  	atomic_t		refcnt;
>  	struct audit_names	*aname;
>  	const char		iname[];
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index d1966144bdfe..e59a094bb9f7 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2169,29 +2169,6 @@ static struct audit_names *audit_alloc_name(struct audit_context *context,
>  	return aname;
>  }
>  
> -/**
> - * __audit_reusename - fill out filename with info from existing entry
> - * @uptr: userland ptr to pathname
> - *
> - * Search the audit_names list for the current audit context. If there is an
> - * existing entry with a matching "uptr" then return the filename
> - * associated with that audit_name. If not, return NULL.
> - */
> -struct filename *
> -__audit_reusename(const __user char *uptr)
> -{
> -	struct audit_context *context = audit_context();
> -	struct audit_names *n;
> -
> -	list_for_each_entry(n, &context->names_list, list) {
> -		if (!n->name)
> -			continue;
> -		if (n->name->uptr == uptr)
> -			return refname(n->name);
> -	}
> -	return NULL;
> -}
> -
>  /**
>   * __audit_getname - add a name to the list
>   * @name: name to add
> -- 
> 2.47.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 11/13] allow incomplete imports of filenames
  2025-11-11 14:41   ` Jens Axboe
@ 2025-11-19  1:12     ` Al Viro
  2025-11-19  1:14       ` Al Viro
  0 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2025-11-19  1:12 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, paul, audit,
	io-uring

On Tue, Nov 11, 2025 at 07:41:24AM -0700, Jens Axboe wrote:

> > diff --git a/io_uring/openclose.c b/io_uring/openclose.c
> > index bfeb91b31bba..6bc14f626923 100644
> > --- a/io_uring/openclose.c
> > +++ b/io_uring/openclose.c
> > @@ -121,6 +118,7 @@ int io_openat2(struct io_kiocb *req, unsigned int issue_flags)
> >  	struct file *file;
> >  	bool resolve_nonblock, nonblock_set;
> >  	bool fixed = !!open->file_slot;
> > +	struct filename *name __free(putname) = complete_getname(&open->filename);
> >  	int ret;
> >  
> >  	ret = build_open_flags(&open->how, &op);
> 
> I don't think this will work as-is - the prep has been done on the
> request, but we could be retrying io_openat2(). That will happen if this
> function returns -EAGAIN. That will then end up with a cleared out
> filename for the second (blocking) invocation.

If retry happens in a different thread, we do have a problem ;-/
This -EAGAIN might've come from ->open() itself (io_openat2() sets
O_NONBLOCK on the same calls), and by that point we have already
shoved that filename in direction of audit...

IMO the first 10 commits (up to and including audit_reusename() removal)
are useful on their own, but io_openat2() part does look broken.

Hmm...  FWIW, we could do a primitive like

int putname_to_incomplete(struct incomplete_name *v, struct filename *name)
{
	if (likely(name->refcnt == 1)) {
		v->__incomplete_filename = name;
		return 0;
	}
	v->__incomplete_filename = <duplicate name>;
	putname(name);
	if (unlikely(!v->__incomplete_filename))
		return -ENOMEM;
	return 0;
}

and have
                if (ret == -EAGAIN &&
		    (!resolve_nonblock && (issue_flags & IO_URING_F_NONBLOCK))) {
			ret = putname_to_incomplete(&open->filename,
						    no_free_ptr(name));
			if (unlikely(ret))
				goto err;
			return -EAGAIN;
		}

in io_openat2() (in addition to what's already done in 11/13).  Workable or
too disgusting?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 11/13] allow incomplete imports of filenames
  2025-11-19  1:12     ` Al Viro
@ 2025-11-19  1:14       ` Al Viro
  2025-11-19  5:41         ` Al Viro
  0 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2025-11-19  1:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, paul, audit,
	io-uring

On Wed, Nov 19, 2025 at 01:12:23AM +0000, Al Viro wrote:

> int putname_to_incomplete(struct incomplete_name *v, struct filename *name)
> {
> 	if (likely(name->refcnt == 1)) {
> 		v->__incomplete_filename = name;
> 		return 0;
> 	}
> 	v->__incomplete_filename = <duplicate name>;
> 	putname(name);
> 	if (unlikely(!v->__incomplete_filename))
> 		return -ENOMEM;
> 	return 0;
> }
> 
> and have
>                 if (ret == -EAGAIN &&
> 		    (!resolve_nonblock && (issue_flags & IO_URING_F_NONBLOCK))) {
> 			ret = putname_to_incomplete(&open->filename,
> 						    no_free_ptr(name));
> 			if (unlikely(ret))
> 				goto err;
> 			return -EAGAIN;
> 		}
> 
> in io_openat2() (in addition to what's already done in 11/13).  Workable or
> too disgusting?

Note that copying would happen only if extra references had been grabbed
and are still held; that's already a slow path.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 11/13] allow incomplete imports of filenames
  2025-11-19  1:14       ` Al Viro
@ 2025-11-19  5:41         ` Al Viro
  0 siblings, 0 replies; 51+ messages in thread
From: Al Viro @ 2025-11-19  5:41 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-fsdevel, torvalds, brauner, jack, mjguzik, paul, audit,
	io-uring

On Wed, Nov 19, 2025 at 01:14:47AM +0000, Al Viro wrote:
> On Wed, Nov 19, 2025 at 01:12:23AM +0000, Al Viro wrote:
> 
> > int putname_to_incomplete(struct incomplete_name *v, struct filename *name)
> > {
> > 	if (likely(name->refcnt == 1)) {
> > 		v->__incomplete_filename = name;
> > 		return 0;
> > 	}
> > 	v->__incomplete_filename = <duplicate name>;
> > 	putname(name);
> > 	if (unlikely(!v->__incomplete_filename))
> > 		return -ENOMEM;
> > 	return 0;
> > }
> > 
> > and have
> >                 if (ret == -EAGAIN &&
> > 		    (!resolve_nonblock && (issue_flags & IO_URING_F_NONBLOCK))) {
> > 			ret = putname_to_incomplete(&open->filename,
> > 						    no_free_ptr(name));
> > 			if (unlikely(ret))
> > 				goto err;
> > 			return -EAGAIN;
> > 		}
> > 
> > in io_openat2() (in addition to what's already done in 11/13).  Workable or
> > too disgusting?
> 
> Note that copying would happen only if extra references had been grabbed
> and are still held; that's already a slow path.

... and writing the "duplicate name" part has been a very convincing argument
in favour of a scheme Linus suggested upthread (shorter embedded name,
struct filename *always* coming from names_cachep, long name or short).
Current layout is too unpleasant to work with.

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2025-11-19  5:41 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-09  6:37 [RFC][PATCH 00/13] io_uring, struct filename and audit Al Viro
2025-11-09  6:37 ` [RFC][PATCH 01/13] do_faccessat(): import pathname only once Al Viro
2025-11-13 10:11   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 02/13] do_fchmodat(): " Al Viro
2025-11-13 10:12   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 03/13] do_fchownat(): " Al Viro
2025-11-13 10:13   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 04/13] do_utimes_path(): " Al Viro
2025-11-13 10:15   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 05/13] chdir(2): " Al Viro
2025-11-13 10:16   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 06/13] chroot(2): " Al Viro
2025-11-13 10:18   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 07/13] user_statfs(): " Al Viro
2025-11-13 10:18   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 08/13] do_sys_truncate(): " Al Viro
2025-11-13 10:18   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 09/13] do_readlinkat(): " Al Viro
2025-11-13 10:20   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 10/13] get rid of audit_reusename() Al Viro
2025-11-09 19:18   ` Linus Torvalds
2025-11-09 19:55     ` Mateusz Guzik
2025-11-09 20:22       ` Linus Torvalds
2025-11-09 22:18         ` Mateusz Guzik
2025-11-09 22:29           ` Linus Torvalds
2025-11-09 22:33             ` Mateusz Guzik
2025-11-09 22:39               ` Mateusz Guzik
2025-11-09 22:41               ` Linus Torvalds
2025-11-09 22:44                 ` Linus Torvalds
2025-11-09 23:07                   ` Linus Torvalds
2025-11-09 22:18     ` Linus Torvalds
2025-11-10  5:17       ` Al Viro
2025-11-10 16:41         ` Linus Torvalds
2025-11-10 19:58           ` Al Viro
2025-11-10 20:52             ` Linus Torvalds
2025-11-11  1:16               ` Al Viro
2025-11-12  9:26           ` Christian Brauner
2025-11-10  6:05       ` Al Viro
2025-11-10  6:36       ` Al Viro
2025-11-10 16:50         ` Linus Torvalds
2025-11-10 23:13   ` Paul Moore
2025-11-11  0:23     ` Paul Moore
2025-11-13 10:29   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 11/13] allow incomplete imports of filenames Al Viro
2025-11-11  0:45   ` Paul Moore
2025-11-11 14:41   ` Jens Axboe
2025-11-19  1:12     ` Al Viro
2025-11-19  1:14       ` Al Viro
2025-11-19  5:41         ` Al Viro
2025-11-09  6:37 ` [RFC][PATCH 12/13] fs: touch up predicts in putname() Al Viro
2025-11-09  6:37 ` [RFC][PATCH 13/13] struct filename ->refcnt doesn't need to be atomic Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).