linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 00/13] io_uring, struct filename and audit
@ 2025-11-09  6:37 Al Viro
  2025-11-09  6:37 ` [RFC][PATCH 01/13] do_faccessat(): import pathname only once Al Viro
                   ` (12 more replies)
  0 siblings, 13 replies; 51+ messages in thread
From: Al Viro @ 2025-11-09  6:37 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: torvalds, brauner, jack, mjguzik, paul, axboe, audit, io-uring

[The last time that was discussed a year ago; rather than posting it
as a followup into the old thread, I'm starting a new one.  Old thread
can be found at https://lore.kernel.org/all/20240922004901.GA3413968@ZenIV/]

There are two filename-related problems in io_uring and its interplay
with audit.

Filenames are imported when request is submitted and used when it is
processed.  Unfortunately, with io_uring the latter may very well happen
in a different thread.  In that case the reference to filename is put into
the wrong audit_context - that of submitting thread, not the processing
one.  Audit logics is called by the latter, and it really wants to be
able to find the names in audit_context of current (== processing) thread.

Another related problem is the headache with refcounts - normally all
references to given struct filename are visible only to one thread (the
one that uses that struct filename).  io_uring violates that - an extra
reference is stashed in audit_context of submitter.  It gets dropped
when submitter returns to userland, which can happen simultaneously with
processing thread deciding to drop the reference it got.

We paper over that by making refcount atomic, but that means pointless
overhead for everyone.

One more headache with refcount comes from retry loops - handling of
-ESTALE while resolving a parent is dealt with transparently, but
server might get around to telling you that things got stale only
when you e.g. ask it to remove a link.  In that case we have to repeat
the pathwalk in "trust no one" mode and see if it helps.

So far, so good, but depending upon the helpers we are using we might
end up re-importing the pathname from userland.  Had that been merely
duplicated work on an already slow path, it wouldn't matter much, but
with audit in the mix it becomes seriously confusing.  Currently
getname() and its ilk try to cope with that (only if audit is enabled)
by stashing the userland address in struct filename and checking if
an instance for the same userland address has already been made visible
to audit, in which case we just grab an extra reference.

That's bogus for several reasons.  In particular, having the same
pointer passed in different pathname arguments of a syscall should not be
different from having separate strings with identical context given to it.
Compiler might turn the latter into the former, after all - merging
identical string literals may happen.  As the result, audit might produce
significantly different outputs on the same program, depending upon the
compiler flags used when building it.  This is obviously not a good thing.
The fact that this logics is dependent upon CONFIG_AUDIT also doesn't help.

The right solution is to have the pathname imported once, before the
retry loop; fortunately, most of those loops are already done that way
these days - only 9 exceptions in the entire kernel.

With those exceptions taken care of, we can get rid of the entire "stash
the userland address in struct filename" thing.  That helps to solve both
io_uring problems - import of pathname in there is already separated from
making use of it (the former happens in submitting thread, the latter -
in processing one).  If we explicitly mark the places where we start
making use of those suckers (in io_mkdirat(), etc.), we can have the
submitter do "incomplete" imports (just copying the name from userland
and stashing the result in opaque object).  Then processing thread
would explicitly ask to complete the import and use the resulting struct
filename *, same as a normal syscall would - all in the thread that does
actual work, so that situation looks normal for audit - the damn thing
goes into the right audit_context, all uses are thread-synchronous and
from the same thread, etc.  What's more, refcount can become non-atomic
(and grabbed only inside the kernel/auditsc.c, at that).

The series below attempts to do that.  It does need a serious review,
including that from audit and io_uring folks.

It lives in git://git.kernel.org:/pub/scm/linux/kernel/git/viro/vfs.git #work.filename-refcnt
Individual patches in followups.

Al Viro (12):
  do_faccessat(): import pathname only once
  do_fchmodat(): import pathname only once
  do_fchownat(): import pathname only once
  do_utimes_path(): import pathname only once
  chdir(2): import pathname only once
  chroot(2): import pathname only once
  user_statfs(): import pathname only once
  do_sys_truncate(): import pathname only once
  do_readlinkat(): import pathname only once
  get rid of audit_reusename()
  allow incomplete imports of filenames
  struct filename ->refcnt doesn't need to be atomic

Mateusz Guzik (1):
  fs: touch up predicts in putname()

 fs/namei.c            |  68 +++++++++++++++++++++-------
 fs/open.c             |  39 +++++++++-------
 fs/stat.c             |   6 +--
 fs/statfs.c           |   4 +-
 fs/utimes.c           |  13 +++---
 include/linux/audit.h |  11 -----
 include/linux/fs.h    |  17 ++++---
 io_uring/fs.c         | 101 ++++++++++++++++++++++--------------------
 io_uring/openclose.c  |  16 +++----
 io_uring/statx.c      |  17 +++----
 io_uring/xattr.c      |  30 +++++--------
 kernel/auditsc.c      |  23 ++--------
 12 files changed, 178 insertions(+), 167 deletions(-)

-- 
2.47.3


^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2025-11-19  5:41 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-09  6:37 [RFC][PATCH 00/13] io_uring, struct filename and audit Al Viro
2025-11-09  6:37 ` [RFC][PATCH 01/13] do_faccessat(): import pathname only once Al Viro
2025-11-13 10:11   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 02/13] do_fchmodat(): " Al Viro
2025-11-13 10:12   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 03/13] do_fchownat(): " Al Viro
2025-11-13 10:13   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 04/13] do_utimes_path(): " Al Viro
2025-11-13 10:15   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 05/13] chdir(2): " Al Viro
2025-11-13 10:16   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 06/13] chroot(2): " Al Viro
2025-11-13 10:18   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 07/13] user_statfs(): " Al Viro
2025-11-13 10:18   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 08/13] do_sys_truncate(): " Al Viro
2025-11-13 10:18   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 09/13] do_readlinkat(): " Al Viro
2025-11-13 10:20   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 10/13] get rid of audit_reusename() Al Viro
2025-11-09 19:18   ` Linus Torvalds
2025-11-09 19:55     ` Mateusz Guzik
2025-11-09 20:22       ` Linus Torvalds
2025-11-09 22:18         ` Mateusz Guzik
2025-11-09 22:29           ` Linus Torvalds
2025-11-09 22:33             ` Mateusz Guzik
2025-11-09 22:39               ` Mateusz Guzik
2025-11-09 22:41               ` Linus Torvalds
2025-11-09 22:44                 ` Linus Torvalds
2025-11-09 23:07                   ` Linus Torvalds
2025-11-09 22:18     ` Linus Torvalds
2025-11-10  5:17       ` Al Viro
2025-11-10 16:41         ` Linus Torvalds
2025-11-10 19:58           ` Al Viro
2025-11-10 20:52             ` Linus Torvalds
2025-11-11  1:16               ` Al Viro
2025-11-12  9:26           ` Christian Brauner
2025-11-10  6:05       ` Al Viro
2025-11-10  6:36       ` Al Viro
2025-11-10 16:50         ` Linus Torvalds
2025-11-10 23:13   ` Paul Moore
2025-11-11  0:23     ` Paul Moore
2025-11-13 10:29   ` Jan Kara
2025-11-09  6:37 ` [RFC][PATCH 11/13] allow incomplete imports of filenames Al Viro
2025-11-11  0:45   ` Paul Moore
2025-11-11 14:41   ` Jens Axboe
2025-11-19  1:12     ` Al Viro
2025-11-19  1:14       ` Al Viro
2025-11-19  5:41         ` Al Viro
2025-11-09  6:37 ` [RFC][PATCH 12/13] fs: touch up predicts in putname() Al Viro
2025-11-09  6:37 ` [RFC][PATCH 13/13] struct filename ->refcnt doesn't need to be atomic Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).