[PATCHSET][RFC] struct fd and memory safety

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCHSET][RFC] struct fd and memory safety
@ 2024-07-30  5:09 Al Viro
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                   ` (6 more replies)
  0 siblings, 7 replies; 134+ messages in thread
From: Al Viro @ 2024-07-30  5:09 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, bpf, Amir Goldstein, kvm,
	cgroups, netdev

		Background
		==========

	Most of the system calls use file descriptors to refer to opened
files (struct file references) currently stored in the given slot(s) of
caller's descriptor table.  There are obvious exceptions (e.g. close(),
dup2(), etc.) where a file descriptor argument is used for more than
that, but those are few.  The common case is something like read(), write(),
etc. - the syscalls that just want to do something with the file specified
by descriptor given by the caller.

	Such system calls need a primitive that would find the struct
file reference by descriptor.  However, finding the reference is not
enough - several threads might share the same descriptor table, so there's
a possibility that descriptor passed to a syscall in one such thread
will be closed by another while the first one is using the file.

	Lifetime of struct file is controlled by refcount; each
reference in a descriptor table contributes to it and struct file
is not shut down until its refcount drops to zero.

	There is a primitive (fget()) that takes a file descriptor and
returns either an extra reference to corresponding struct file (incrementing
its refcount) or NULL (if descriptor is not currently opened).  Decrement
of refcount is done by fput().  We could use that pair of primitives for
all such system calls -
	struct file *file = fget(fd);
	if (file) {	// fd is opened, file contains a cloned reference
		do something
		fput(file);	// release the reference we'd cloned
	} else {	// fd is not opened
		do something else
	}
That way we are guaranteed that file won't be shut down while we are
using it.  That works, but incrmenting and decrementing refcount has
an overhead and for some applications it's non-negligible.

	There is a fairly common case when we don't need to bump the
reference count of struct file: if descriptor table is not shared with
any other threads, we know that nobody else is going to modify it.
In that case we could borrow the struct file reference rather than
cloning it - the reference we'd found in descriptor table is not going
anywhere, making sure that file refcount will remain positive.

	That, of course, does not extend to keeping the file reference
past the return from syscall, passing it to other threads, etc. -
for those uses we have to bump the file refcount, no matter what.
The borrowed reference is not going to remain valid forever; the things
that may terminate its validity are
	* return to userland (close(2) might follow immediately)
	* removing and dropping some reference from descriptor table
(some ioctls have to be careful because of that; they can't outright
close a descriptor - removing a file reference from descriptor table
is fine, but dropping it must be delayed)
	* spawning a thread that would share descriptor table.  Note
that this is the only thing that could turn a previously unshared
table into a shared one.
As long as the borrowing thread does not do any of the above, it is
safe.

	We do need to remember whether we decided to clone or to borrow -
while an unshared descriptor table can't become shared (as long as we
abide by the rules above), the opposite can easily happen.  So when
it comes to deciding whether we need to drop the reference once we
are done with it, we can't just repeat the check - its outcome might
be different by that point.

		struct fd
		=========

	struct fd is the object we use to represent those borrowed or
cloned struct file references.
Valid values are:
	* a cloned file reference; contributes to file refcount,
needs to be dropped when we are done.
	* a borrowed file reference; does *not* contribute to
file refcount.  No need to drop refcount when we are done, but
we need to be guaranteed that original reference will not be
dropped until we are done.
	* empty.
[Note: the current representation is going to change; in the following
I'm going to use a couple of helpers that are currently open-coded
in mainline]

Equivalent of the fget()-based snippet above would be

	struct fd f = fdget(fd);
	if (!fd_empty(f)) { // fd is opened, fd_file(f) is a live file reference
		do something, using fd_file(f)
		fdput(f);
	} else {	// fd is not opened
		do something else
	}

Rules:
	* fdget(n) yields an empty value iff descriptor #n is not open;
otherwise it either borrows or clones the struct file reference that
descriptor refers to.
	* fd_empty(f) is a predicate checking if f is empty.
	* fd_file(f) is NULL if f is empty; otherwise it returns
a pointer to struct file instance that is guaranteed to have a positive
refcount at least until fdput(f).
	* any non-empty instance *MUST* be passed to fdput() exactly once.
Forgetting to do it is a leak; doing it twice is a potential UAF.
[not entirely true - at the moment we have one ugly wart that manages to
get away with violating that rule; sockfd_lookup_light() is really
special that way ;-/]
	* fdput(empty) is a no-op.
	* things that would invalidate the borrowed file references
are not allowed between fdget() and matching fdput().

There are several additional twists.

First of all, most of the syscalls refuse to act upon O_PATH-opened files -
if given a descriptor that corresponds to such file, they fail in the
same way as if it hadn't been opened.  fdget() (as well as fget()) is
where that is handled - we treat finding a reference to O_PATH-opened
file as "no file reference for that descriptor".

However, there are syscalls that do allow O_PATH files - ...at(2)
family is the chief example.  E.g. mkdirat(fd, "foo", 0777) (create
a subdirectory named foo in whatever directory fd refers to) has
no problem with fd being an O_PATH-opened file - that's what it
is usually passed, actually.  There are other syscalls like that
(fchdir(2), etc.); for those we have fget_raw() and fdget_raw() -
same as fget()/fdget(), except that O_PATH-opened files are not
rejected.  Other than that, the rules for fdget_raw() are the same
as for fdget() - it needs to be paired with fdput(), etc.

Next, there are syscalls that want a serialization for their use
of current IO position of file they are acting upon.  That serialization
is on per-struct file basis (as is the IO position itself) and it's
done on file->f_pos_mutex.  The rules are rather convoluted,
but the bottom line is that ->f_pos_mutex needs to be taken in
some, but not all cases.  What's more, we can't just repeat the
checks when it comes to releasing that mutex - we need to
remember whether we'd takeng it or not.  In other words, for those
users we have an additional bit of information to be stored in
struct fd - "->f_pos_mutex needs to be released when we are done".

fdget_pos() handles that; it starts with doing what fdget() would,
then if ->f_pos_mutex needs to be taken, takes it and sets the
"needs to be released in the end" flag.

We could have fdput() handle that, but since fdget() and fdget_raw()
callers outnumber fdget_pos() ones by more than an order of magnitude,
it's better to have a separate destructor (fdput_pos()) to be used
with fdget_pos().  That way we don't get the pointless checks in
syscalls that don't need that serialization.

fdput_pos() must be paired with fdget_pos() - using it with fdget() or
fdget_raw() is a bug, and so's using fdput() with fdget_pos().

The absolute majority of instances comes from fdget(), fdget_raw(),
fdget_pos() (134, 14 and 12 callers resp., as of 6.10-rc1) or explicit
initializers for an empty struct fd (4 instances).

The rest (11 instances, all in overlayfs) arguably should be using
a separate type, similar to struct fd, but not identical to it.
These instances do not come from descriptor table (which means different
rules for what's allowed while using them) and they would be better off
with "error such and such" instead of "empty".  We could try to do
something similar to the way ERR_PTR() abuses pointer types, but
shoehorning that into struct fd ends up with too much headache both
for those instances in overlayfs and for the normal struct fd users;
it's better to use a separate type.


		Code Audit
		==========

Looking through the users of struct fd early in 6.10 cycle has caught
several leaks, as well as a bunch of assorted UAF.  That kind of stuff
keeps cropping up; it would be nice to have as much as possible done
automatically.

The way audit went:
	1.  struct fd is never a member of struct or union.
	2.  no global variables of that type.
	
	3.  Pointers to struct fd:
	3a.  very few such pointers anywhere - they only occur as
arguments to 4 functions (ovl_real_fdget_meta(), ovl_real_fdget(),
timerfd_fget() and perf_fget_light()).	All calls of these functions
are direct ones.  The values passed in such arguments are either
addresses of local struct fd variables in the callers or, in case of
ovl_real_fdget_meta() call in ovl_real_fdget() equal to the corresponding
argument of the latter (i.e. an address of local variable in caller's
caller).  These functions are basically constructors with lousy calling
conventions.
	3b.  none of those 4 functions ever convert the struct fd *
argument to anything, explicitly or implicitly.
	3c.  nothing is ever cast to such pointers.  Implicit conversions
are excluded by the above.
	3d.  the only places where we take address of struct fd are in
arguments of calls of those 4 functions.
	3e.  In other words, all pointers to struct fd are addresses of
local variables of such type.  No dynamically allocated instances exist
- only local variables, formal arguments of functions and return values
of functions.

	4.  the only functions that return struct fd are fdget(),
fdget_raw(), fdget_pos() and their common helper (__to_fd(), not called
by anything else).  All of those are called only directly, return value
is always stored into a local struct fd variable in the caller.  Return
value is formed as a compound literal in __to_fd().  With 4 exceptions
the local variable in question is uninitialized.  Exceptions are
	4a.  xfs_find_handle() and ib_uverbs_open_xrcd() - assignment
is conditional, variable has been initialized empty and later there's
an unconditional fdput().
	4b.  two places in do_mq_notify() - both store to the variable
that is either uninitialized or reused after prior fdput().
	5.  the variable passed to disguised constructors (see above)
is almost always uninitialized.  The only exception is the call of
perf_fget_light() from perf_event_open() - there the variable has been
initialized empty, with conditional call of perf_fget_light() assigning
to it and an unconditional fdput() done downstream.

	6.  Stores to struct fd happen in
	6a.  explicit initializers with fdget{,_raw,_pos}().
	6b.  assignments of the value returned by fdget{,_raw,_pos}()
into uninitialized variables.
	6c.  assignments of the value returned by fdget{,_raw,_pos}()
into empty-initialized variables (see above).
	6d.  assignments of the value returned by fdget{,_raw,_pos}()
into a variable that had been passed to fdput() (see above).
	6e.  explicit initializers with empty (in 3 cases mentioned above)
	6f.  field-by-field assignments to previously uninitialized
variable done by ovl_real_fdget{,_meta}()
	6g.  whole-structure move from local variable to caller's local
variable in perf_fget_light() and timerfd_fget().  The target is either
uninitialized or, in one case, previously initialized empty (see above).
	6h.  In other words, with very few exceptions the variables
are assign-once.

	7.  few functions take struct fd as an argument.  There are two
destructors (fdput() and fdput_pos(), the latter calling the former
unconditionally) and 4 irregulars.
	7a.  vmsplice_type(); only one caller, returns 0 or -EBADF,
equivalent to fdput() if -EBADF has been returned.  struct fd argument
comes from caller's local variable (initialized by prior fdget()), caller
buggers off if it got negative.  IMO we'd be better off expanding it in
its sole call site.
	7b.  ____bpf_prog_get().  Almost the same story.  The difference
is, return value is not an integer - it's either ERR_PTR(...) or the
struct bpf_prog * value found in ->private_data of bpf-prog opened file.
The caller buggers off immediately if IS_ERR() is true.  Not a leak,
but only because ->private_data for those files is never an ERR_PTR(...)
value.  Not that hard to prove, though.
	7c.  map_get_sys_perms(); might as well have been given fd.file
as an argument, neutral from the lifetime POV (immutable borrow, for
rust crowd)
	7d.  __bpf_map_get().  Similar to ____bpf_prog_get(), except
that it wants the file to be bpf-map one.  Again, correctness depends
upon the proof that ->private_data is never ERR_PTR(...) for any of those.
Always immediately downstream of fdget()...  Frankly, I would rather
lift fdput() into the handling of failure exit in the caller - that
would turn what's left into imm borrow argument...

	8.  The only places where we touch a variable downstream of
fdput() are two reuses in do_mq_notify().  In those two cases the
next access after fdput() is assignment of fdget() result.

	9.  The lack of access to uninitialized instances:
	9a.  The first access to any instance is either storing
fdget{,_raw,_pos}() return value in it, or the call of one of the
disguised initializers.  The former initializes the entire struct fd.
	9b.  ovl_read_fdget() returns 0 or -E..., and in the latter
case all callers bugger off without looking at the junk left in
struct fd (and junk it is).  In case when it returns 0, it does
store a valid value in struct fd it's been asked to initialize.
	9c.  ovl_real_fdget_meta() - same story; the only twist is
that it might be tail-called by ovl_real_fdget().
	9d.  timerfd_fget() leaves the target uninitialized when
it returns a negative value.  Callers bugger off immediately if
they see an error.  If it has returned zero, the value left in
the target is something fdget() returned.
	9e.  perf_fget_light() - exact same story.

	10.  Lack of leaks:
	10a.  All calls of fdput_pos() that return a non-empty
value are followed by exactly one call of fdput_pos().
	10b.  Most of the calls of fdget{,_raw}() that return non-empty
value are followed by exactly one call of fdput().  There are several
exceptions.
	10c.  There are two outright leaks (ppc kvm and lirc).
	10d.  timerfd_fget() and perf_fget_light() might move the result
into caller-supplied variable and return 0.  No fdput() in that case -
it's going to be done by caller to the moved value.
	10e.  sockfd_lookup_light() is really, really special.
On success, it forgets the struct fd instance it has done fdget() into,
and returns the "borrowed or cloned" flag via *fput_needed and struct
socket pointer associated with fd.file via return value.  File pointer
is lost; the caller eventually recovers it from ->file pointer in the
struct socket it got and does conditional fput().  That works, but proof
of correctness is highly unpleasant.  The worst part is, it does not
really buy us anything.  We could as well have lifted that struct fd
(along with fdget() and fdput() on failures) into the callers, and used
fdput() instead of this fput_light() song and dance.  Better code
generation and much simpler proof of correctness that way...
	10f.  All successful calls of ovl_real_fdget{,_meta}(),
timerfd_fget() and perf_fget_light() are followed by exactly one call
of fdput().

	That pretty much concludes the analysis as far as memory
safety of struct fd uses goes.  There had been several places where
UAF got caught in the same functions (e.g. fetching ->private_data
and accessing the object it points to after we'd done fdput(),
without anything to guarantee that the object will live past
the file shutdown that becomes possible at that point), plus the
usual scattering of bugs one catches when looking through random
places in the tree.

	It wasn't impossibly hard, but it would be nice to reduce the
amount of PITA next time around.

		How to make it less painful?
		============================

Quite a bit of headache in the audit above had been caused by
irregularities that are relatively easy to deal with:

1.  One low-hanging fruit is sockfd_lookup_light() - getting rid of
that is as simple as turning
	int fput_needed;
	sock = sockfd_lookup_light(fd, &err, &fput_needed);
	if (!sock)
		return err;
	...
	fput_light(sock->file, fput_needed);
into
	struct fd f = fdget(fd);
	if (!f.file)
		return -EBADF;
        sock = sock_from_file(f.file);
	if (!sock) {
		fdput(f);
		return -ENOTSOCK;
	}
	...
	fdput(f);
or, possibly,
	CLASS(fd, f)(fd);
	if (fd_empty(f))
		return -EBADF;
        sock = sock_from_file(fd_file(f));
	if (!sock)
		return -ENOTSOCK;
	...
sockfd_lookup_light() and fput_light() go away, along with the very
unpleasant side trip in the proof of correctness.  At least that
way we have "every non-empty instance must get to destructor" as
a hard rule.

2.  Getting rid of passing struct fd by reference.  There are very
few places where we do that - a couple in overlayfs, timerfd_fget()
and perf_fget_light().

2a.  overlayfs stuff is not quite a match for struct fd.  There we want
not "file reference or nothing" - it's "file reference or an error".
For pointers we can do that by use of ERR_PTR(); strictly speaking, that's
an abuse of C pointers, but the kernel does make sure that the uppermost
page worth of virtual addresses never gets mapped and no architecture
we care about has those as trap representations.  It might be possible
to use the same trick for struct fd; however, for most of the regular
struct fd users that would be a pointless headache - it would pessimize
the code generation for no real benefit.  I considered a possibility of
using represenation of ERR_PTR(-EBADF) for empty struct fd, but that's
far from being consistent among the struct fd users and that ends up
with bad code generation even for the users that want to treat empty as
"fail with -EBADF".

It's better to introduce a separate type, say, struct fderr.
Representable values:
	* borrowed file reference (address of struct file instance)
	* cloned file reference (address of struct file instance)
	* error (a small negative integer).
With sane constructors we get rid of the field-by-field mess in
ovl_real_fdget{,_meta}() and have them serve as initializers,
always returning a valid struct fderr value.

2b.  timerfd_fget() trivially folds into its callers.  perf_fget_light(),
OTOH, is better off converted to is_perf_file(struct fd), with fdget()
and fdput() on failures lifted into the callers.

3.  It would be nice to have all instances assign-once.  We are not
far away from that; with overlayfs stuff taken care of we only have to
deal with reuses in do_mq_notify() and several assignments of fdget()
result into a variable that is currently empty.

Reuses in do_mq_notify() are trivial to get rid of - it's the only user
of netlink_getsockbyfilp() and passing it a descriptor instead of struct
file * (and shifting the fdget()/fdput() pair around the call into the
function itself) is all it takes.

That leaves xfs_find_handle(), ib_uverbs_open_xrcd(), perf_event_open(2).

xfs_find_handle() is basically
	if asked to work by fd
		fdget
		use file_inode() of resulting file
	else
		user_path_at
		use ->dentry->d_inode of resulting path
	do work, using inode
	if asked to work by fd
		fdput
	else
		path_put
IMO it's easier to convert that to
	if asked to work by fd
		fdget
		pin and copy ->f_path of resulting file
		fdput
	else
		user_path_at
	do work, using path.dentry->d_inode
	path_put
Simpler control flow that way, if nothing else...

perf_event_open() and ib_uverbs_open_xrcd() are both basically
	if (fd != -1) {
		fdget
		if empty
			fail
		...
	}
	do some work
	fdput
and seeing that fdget(-1) will always yield empty, we could turn
that into
	fdget	// empty from fdget(-1) is not an error here
	if (fd != -1)
		if empty
			fail
		...
	...
	fdput

I'm not certain if it's the best way to deal with that.  However,
it's very local and with that done we get assign-once for all struct
fd instances.


4.  We are very close to having fdput() done in the same function that
has called fdget().  Exceptions: vmsplice_type(), ____bpf_prog_get() and
__bpf_map_get().  The former two are easier to fold into their respective
sole callers, the last one...  just lift the fdput() on failure into the
callers.  Among other things, the proof of memory safety no longer has to
rely upon file->private_data never being ERR_PTR(...) for bpffs files.
Original calling conventions made it impossible for the caller to tell
whether __bpf_map_get() has returned ERR_PTR(-EINVAL) because it has found
the file not be a bpf map one (in which case it would've done fdput())
or because it found a bpf map file whose ->private_data happened to be
ERR_PTR(-EINVAL) (in which case fdput() would _not_ have been done).
In fact, the latter would never happen, but proving that added a needless
distraction.


This takes care of the irregularities in the audit above.
What remains is verifying that all instances are
	* initialized in the function where they are declared
	* never reassigned
	* never used uninitialized
	* destructor is called in the same function, always
exactly once unless the constructor has yielded an empty
	* never touched after the call of destructor
	* never passed anywhere by reference


		Making compiler do the rest of the work
		=======================================

That does make for a more straightforward audit.  However, it looks like
we could get most of that work done by compiler, if we could make use of
__cleanup without too much PITA.  We even have convenient helpers for
that - CLASS(fd_{,raw}, f)(fd); will have f declared and initialized,
with automatic fdput() whenever we leave the scope.


There are several issues with that approach, though.  First of all, we
want the compiler to see that fdput(empty) is a no-op.  Otherwise we'll
get pointless junk on the EBADF failure exits. Currently it ends up
generating a pointless test + skipped call of fput() there; the reason
is that compiler doesn't know that fdget() et.al.  never return a value
with NULL pointer and non-zero flags.

It's not that hard to deal with - the real primitives behind fdget()
et.al. are returning an unsigned long value, unpacked by (inlined)
__to_fd() into the current struct file * + int.  Linus suggested that
keeping that unsigned long around with the extractions done by inlined
accessors should generate a sane code and that turns out to be the case.
Turning struct fd into a struct-wrapped unsinged long, with
	fd_empty(f) => unlikely(f.word == 0)
	fd_file(f) => (struct file *)(f.word & ~3)
	fdput(f) => if (f.word & 1) fput(fd_file(f))
ends up with compiler doing the right thing.  The cost is the patch
footprint, of course - we need to switch f.file to fd_file(f) all over
the tree, and it's not doable with simple search and replace; there are
false positives, etc.  Might become a PITA at merge time; however,
actual update of that from 6.10-rc1 to 6.11-rc1 had brought surprisingly
few conflicts.


Unfortunately, there are more subtle issues.

The main reason why struct fd might be amenable to __cleanup-based
approaches is that delaying the calls of fdput() is fairly harmless.
If descriptor table had not been shared in the first place, fdput()
is a no-op; if it had been shared, another thread could have bumped
the refcount of our file for a while, so we had no warranty that
fdput() would've driven the refcount to zero.  Delaying fdput_pos()
is potentially more damaging, since it involves releasing a mutex,
but it's always done to local variable defined in the same function
and the only things ever done between it and return are sys_inc{r,w}()
and add_{r,w}char().  Those never call anything else, so even if we delay
the calls of fdput_pos() all the way to the end of function where their
argument is defined, we won't get any deadlocks.

So the lifetime of struct fd instance could be stretched to the end of
whatever scope we are in.  However, there is another kind of trouble -
if fdget() is not in the very beginning of the scope, we might have
	do something
	if (failed)
		goto out;
	f = fdget(...);
	....
	fdput(f);
out:
	do something else
which is *not* going to tolerate shifting the fdput() downstream.
If that "do something else" is simply a return - not a problem, we
can simply replace the goto with return.  If it's something trickier,
we have a problem.


The really bad problem (and one of the reasons why I'm cautious about
__cleanup()-based approaches in general) is that gcc does *NOT* notice
a problem in the pattern that comes from the "goto around fdget()/fdput()"
trouble case above:
	if (foo)
		goto l;
	some_type var __cleanup(destructor) = expr;
	...
l:	// no uses of of var from that point on
It quietly produces garbage, at least up to gcc 12.  clang does catch
that kind of bugs, but we can neither make it mandatory (not all
targets are supported, to start with) nor bump the minimal gcc version
from 5.1 all the way to 13.  For this series most of the affected code
is covered by LLVM=1 builds and the few places not covered were not
hard to examine manually.  Note that this "goto around" thing is not
a pure theory; I've run into it several times in this series.  Any
patches that convert to CLASS(...) use need to watch out for that
kind of trouble.


Let's take a look at the prep cleanups from that POV:

	* sockfd_lookup_light() elimination.  sockfd_lookup_light() has
13 callers; 7 of those (__sys_bind(), __sys_listen(), __sys_getsockname(),
___sys_getpeername(), __sys_setsockopt(), __sys_getsockopt(),
__sys_shutdown()) are as simple as it gets; sockfd_lookup_light()
call is the first thing done in the scope, fput_light() is immediately
followed by return.  Those clearly can be converted to CLASS(fd) without
any trouble for code generation, provided that representation change for
struct fd is already done.  The rest (__sys_sendto(), __sys_recvfrom(),
__sys_sendmsg(), __sys_sendmmsg(), __sys_recvmsg(), do_recvmmsg()) have
some work (and failure exits) prior to the call of sockfd_lookup_light(),
but all of those failure exits are plain returns and fput_light() is
still immediately followed by return.  In other words, those can also
be converted to CLASS(fd) without any problems.
	Doing that ends up with cleaner control flow in the callers.
Looks like CLASS(fd) conversion is the thing to do for those.

	* overlayfs struct fderr stuff.  There the things get trickier;
we obviously need a separate DEFINE_CLASS there (different constructor,
if nothing else), but that's not all.  Some callers of ovl_real_fdget()
are simple (ovl_splie_read(), ovl_fadvise(), ovl_flush()) - nothing
before ovl_real_fdget(), fdput() at the very end.  Those can be converted
easily.  ovl_fsync() would be the same way, except for a different
constructor.  Two more (ovl_llseek() and ovl_read_iter()) have
work (and exits) prior to the call of ovl_real_fdget(), but those
exits are plain returns.  Also converts just fine.
	Unfortunately, there are 4 callers that are done under
inode_lock() on the overlayfs file.  For two of them (ovl_write_iter()
and ovl_splice_write()) we would end up with fdput() reordered past
the inode_unlock(), but that would be it; however, for ovl_fallocate()
and ovl_copyfile() we really have a problem - the pattern in those is
	inode_lock()
	...
	file_remove_privs() and goto out_unlock if that fails
	ovl_real_fdget()
	work
	fdput()
	out_unlock: inode_unlock()
There pushing fdput() downstream would be a real bug.

	Not sure what would be the best way to deal with that.
Theoretically, we could have that inode_unlock() done via __cleanup;
that would've solved the immediate problem, but I really hate that idea.
That's the kind of attractive nuisance that is pretty much guaranteed to
spread and it _will_ end up a source of deadlocks.  For now I went with
the minimally invasive approach and added a scope from ovl_real_fdget()
to fdput() in each of those 4 functions; that makes all of them trivial
to convert.  Another option would be to take the same areas into helper
functions...

	Unlike the sockfd_lookup_light() removal, I think it's better to
keep the conversion to CLASS(...) separate from the prep cleanup per se.


	* timerfd_fget().  Two callers, do_timerfd_gettime() calls
timerfd_fget() at the very beginning, do_timerfd_settime() - after
a check with immediate return on failure.  In both cases fdput()
is immediately followed by return.  Trivial conversion to CLASS(fd).

	* perf_fget_light().  Same picture in both callers - if descriptor
is not -1, we want it to be an opened perf file (with -EBADF otherwise)
and the object we are really interested in is the ->private_data of the
perf file in question.  That object is valid as long as the file is
open.  In _perf_ioctl() we have
		if (arg != -1) {
			struct perf_event *output_event;
			struct fd output;
			ret = perf_fget_light(arg, &output);
			if (ret)
				return ret;
			output_event = fd_file(output)->private_data;
			ret = perf_event_set_output(event, output_event);
			fdput(output);
		} else {
			ret = perf_event_set_output(event, NULL);
		}
and naive conversion would be
		if (arg != -1) {
			CLASS(fd, output)(arg);
			struct perf_event *output_event;
			if (!is_perf_file(output))
				return -EBADF;
			output_event = fd_file(output)->private_data;
			return perf_event_set_output(event, output_event);
		} else {
			return perf_event_set_output(event, NULL);
		}
which, while correct, is really asking for trouble - it would tempt
a reader to lift the declaration of output_event outside of that
if, turning it into
		struct perf_event *output_event = NULL;
		if (arg != -1) {
			CLASS(fd, output)(arg);
			if (!is_perf_file(output))
				return -EBADF;
			output_event = fd_file(output)->private_data;
		}
		return perf_event_set_output(event, output_event);
but that would be an instant UAF - we can't use output_event after
the (now implicit) fdput(output).  We need output in the same scope
as output_event, so IMO the variant least likely to invite trouble
down the road would be this:
		struct perf_event *output_event = NULL;
		CLASS(fd, output)(arg);	// arg == -1 => empty
		if (arg != -1) {
			if (!is_perf_file(output))
				return -EBADF;
			output_event = fd_file(output)->private_data;
		}
		return perf_event_set_output(event, output_event);
The situation in the other caller (perf_event_open(2)) is similar;
there we have a conditional assignment replacing empty with the result
of fdget() when group_fd is not equal to -1.  Again, lifting fdget()
outside of the check for descriptor being -1 avoids the headache.
This case is also easy to convert to CLASS(fd) - all failure exits prior
to fdget() are plain returns and the only thing downstream of fdput()
(put_unused_fd()) can be safely transposed with it.

	NB: transposition can be avoided if put_unused_fd() itself
is done via __cleanup(); I was (and still am) sceptical about that,
but Christian added such DEFINE_CLASS lately.  And yes, blind conversion
to that would be a problem ;-/
	
	* do_mq_notify() reuses: netlink_getsockbyfd() converts to
CLASS(fd, ...) trivially.  The call of fdget() remaining in do_mq_notify()
is _not_ trivial to convert, thanks to the weirdness past the matching
fdput():
	fdput(f);
out:
	if (sock)
		netlink_detachskb(sock, nc);
	else
free_skb:
		dev_kfree_skb(nc);
	return ret;
Unfortunately, there are gotos from before the fdget() both to 'out'
and to 'free_skb'.  Those gotos are bollocks, though - not only because
microoptimizing such failure exits is a bad idea; all gotos to 'free_skb'
should really have been calls of kfree_skb(nc) followed by return.
With that done, we are left with no way to reach 'out' with !sock &&
nc, so that dev_kfree_skb() becomes a dead code.
	After that do_mq_notify() can be switched to CLASS(fd) - all
failure exits prior to fdget() are returns and the only thing downstream
of fdput() is a call of netlink_detachskb(), which can be safely done
before the fdput().

	* getting rid of reassignment in xfs_find_handle().  Result is
trivial to convert to CLASS(fd) - we have fdget() in the very beginning
of scope and fdput() at the very end, several lines later...

	* ib_uverbs_open_xrcd().  FWIW, a closer look shows that the
damn thing is buggy - it accepts _any_ descriptor and pins the associated
inode.  mount tmpfs, open a file there, feed it to that, unmount and
watch the show...  AFAICS, that's done for the sake of libibverbs and
I've no idea how it's actually used - all examples I'd been able to
find use -1 for descriptor here.  Needs to be discussed with infiniband
folks (Sean Hefty?).  For now, leave that as-is.

	* Expanding the vmsplice_type() call.  Result is trivial to convert
to CLASS(fd) - the only failure exit prior to fdget() is a return, all
fdput() are immediately followed by returns.

	* Expanding ____bpf_prog_get() call.  Result is trivial to convert -
fdget() is the very first thing done, all fdput() are immediately followed
by returns.

	* Lifting fdput() from __bpf_map_get().  There are 15
callers.  bpf_map_meta_alloc(), bpf_map_fd_get_ptr(), bpf_map_get(),
bpf_map_get_with_uref() are trivially converted to CLASS(fd); fdget()
is the very first thing done, all fdput() are immediately followed
by returns.  map_lookup_elem(), map_update_elem(), map_delete_elem(),
map_get_next_key(), map_lookup_and_delete_elem(), map_freeze(),
bpf_map_do_batch(), sock_map_get_from_fd(), sock_map_prog_detach()
and sock_map_bpf_prog_query() are also trivial - the only failure
exits prior to fdget() are plain returns and all fdput() are immediately
followed by returns.
	However, the caller in resolve_pseudo_ldimm64() is not trivial.
What happens there is a pass through the array of insns, decoding and
massaging some of two-slot insns.  Massage involves getting a bpf_map by
descriptor we extract from instruction, attaching a reference to that
map to program we are rewriting and replacing the insn with equivalent
that uses that map.  There's a bunch of sanity checks, each with fdput()
on failure, so a conversion to CLASS(fd) would be very nice to have;
unfortunately, there's both a bit of work done after fdput() *and* a goto
to that work from before the fdget().  The former wouldn't be a problem
(the work in question is transposable with fdput()), but the latter is
more serious.
	Lifting this "process one insn" into a helper resolves that,
and result is actually easier to read.
	With that done as a preparatory step, this caller of fdget()
(now in the helper) converts to CLASS(fd) trivially - all failure
exits prior to it are returns, all fdput() are immediately followed
by returns.


	In other words, prep cleanups are playing nice wrt CLASS(fd)
conversions.  There are several places in overlayfs where we need an
explicit extra scope, reordering fdput() with put_unused_fd()
in perf_event_open(2) and possibly something in ib_uverbs_open_xrcd() -
the last one left alone for now.


	With the prep done, most of the remaining values come from
fdget(); fdget_raw() and fdget_pos() are in minority.  fdget_raw()
already has CLASS equivalent (fd_raw); it turns out that _all_
existing callers of fdget_raw() are trivial to convert - all of
them are the very first thing in their scopes and corresponding
fdput() are immediately followed by leaving the scope (either
by return or, in one case, by being the last statement in the scope).
So fdget_raw() converts trivially.


	fdget_pos() doesn't have a matching CLASS() defined, so we'd need
to add one (with fdput_pos() for destructor).  All calls of fdget_pos()
are the very first things done in their scopes; in 10 cases out of 12
(osf_getdirentries(2), ksys_lseek(), llseek(2), ksys_read(), ksys_write(),
old_readdir(2), getdents(2), getdents64(2), old_readdir(2, compat),
getdents(2, compat)) the matching fdput_pos() is immediately followed
by return, making them trivial to convert.  In the remaining two
(do_readv() and do_writev()) there's add_{r,w}char() and inc_sysc{r,w}()
calls downstream of fdput(); fdput_pos() is transposable with those,
so these callers also can be converted.
	Incidentally, there's an inconsistency in rules for calling
inc_sysc{r,w}() - read(2) calls inc_syscr() only when descriptor is
actually opened, but readv(2) does it on every call, opened descriptor
or not.  Had been that way since 2005, when these counters had been
introduced...


	With that taken care of, all we have left is (a plenty of)
fdget() callers.  That falls into several classes:

1) Simple - fdget() done at the beginning of block, fdput() is the last
thing in the block and happens on all paths that leave the block.
Those are trivial to convert to CLASS(fd.....), with no behaviour
changes.
	kvm_spapr_tce_attach_iommu_group()
	kvm_vcpu_ioctl_enable_cap() [3 cases in it]
	spu_create(2)
	sgx_set_attribute()
	__sev_issue_cmd()
	sev_vm_move_enc_context_from()
	amdgpu_sched_process_priority_override()
	amdgpu_sched_context_priority_override()
	drm_syncobj_fd_to_handle()
	rc_dev_get_from_fd()
	__btrfs_ioctl_snap_create()
	eventfd_ctx_fdget()
	do_epoll_ctl() [two nested]
	ioctl_file_clone()
	ioctl(2)
	ioctl(2, compat)
	kernel_read_file_from_fd()
	fanotify_find_path()
	ksys_fallocate()
	fchmod(2)
	ksys_fchown()
	copy_file_range(2) [two nested]
	do_signalfd4()
	syncfs(2)
	do_fsync()
	ksys_sync_file_range()
	fsetxattr(2)
	fgetxattr(2)
	flistxattr(2)
	fremovexattr(2)
	io_attach_sq_data()
	io_sq_offload_create()
	btf_get_by_fd()
	bpf_link_get_from_fd()
	bpf_token_get_from_fd()
	perf_cgroup_connect()
	setns(2)
	pidfd_get_pid()
	prctl_set_mm_exe_file()
	get_watch_queue()
	ksys_fadvise64_64()
	ksys_readahead()
	get_net_ns_by_fd()
	get_ruleset_from_fd()
	kvm_vfio_file_del()
	fini_module()

2) same, but some work (and failure exits) might precede fdget();
fdput() is still immediately followed by leaving the scope.
	ucma_migrate_id()
	do_epoll_wait()
	__ext4_ioctl()
	__f2fs_ioc_move_range()
	fsconfig(2)
	fuse_dev_ioctl_clone()
	flock(2)
	fsmount(2)
	build_mount_idmapped()
	do_fanotify_mark()
	inotify_add_watch(2)
	do_sys_ftruncate()
	ksys_pread64()
	ksys_pwrite64()
	do_sendfile() [two nested]
	splice(2) [two nested]
	tee(2) [two nested]
	xfs_ioc_exchange_range()
	xfs_ioc_swapext() [two nested]
	do_mq_timedsend()
	do_mq_timedreceive()
	do_mq_getsetattr()
	bpf_obj_get_info_by_fd()
	pidfd_getfd(2)
	pidfd_send_signal(2)
	cgroupstats_user_cmd()
	ima_kexec_cmdline()
	read_trusted_verity_root_digests()
	kvm_vfio_file_set_spapr_tce()

That leaves relatively few callers.  Further look at those shows
	* an fdget() call in ib_uverbs_open_xrcd() - to be left alone for
now.
	* do_p{read,write}v().  fdput() is followed by inc_sysc{r,w}() and
possibly add_{r,w}char() there; it's the same story as in do_{read,write}v(),
except that here we use fdget()/fdput() instead of fdget_pos()/fdput_pos(),
and we can transpose those with fdput() just as we could with fdput_pos().
	* cachestat(2).  fdput() is followed by copy_to_user() there,
and it obviously can be transposed with it.  Again, a harmless reorder
is all it takes.
	* spu_run(2).  spufs_calls_put() is transposable with fdput();
might or might not be worth converting spufs_calls_put() to __cleanup - doing
that would get rid of transposition.  Actually, such conversion does look
good...
	* media_request_get_by_fd().  Debugging printk after fdput() on
failure exit path.  Transposable with fdput().
	* coda_parse_fd().  Nearly the same, with invalf() in role of
debugging printk.
	* cifs_ioctl_copychunk().  fdput() is followed by mnt_drop_file_write(); 
all failure exits before fdget() are returns and fdput() _can_ be moved past
mnt_drop_file_write().  Alternatively, we could add a scope there (from
fdget() to fdput()) _OR_ make mnt_{want,drop}_file_write() usable via
CLASS; either would avoid reordering.  The latter is surprisingly plausible,
but that's a separate series.
	* vfs_dedupe_file_range().  fdput() is followed by checking
fatal_signal_pending() (and aborting the loop in such case).  Transposable
with that check.  Yes, it'll probably end up with slightly fatter code
(call after the check has returned false + call on the almost never taken
out-of-line path instead of one call before the check), but it's not worth
bothering with explicit extra scope there (or dragging the check into
the loop condition, for that matter).
	* do_select().  IMO the simplest solution is to take the logics
from fdget() to fdput() into an inlined helper - with existing wait_key_set()
subsumed into that.  No reordering.
	* do_pollfd().  Setting pollfd->revents is downstream of fdput() and
has a goto from before fdget() leading to it.  Solution: lift it into the
only caller of do_pollfd().  No reordering.
	* bpf_token_create().  Hell knows - it copies ->f_path, grabs
a reference to it and does fdput().  struct path is dropped at the very
end.  My preference would be to have file reference kept thourough the
entire thing instead and just use file->f_path instead of its copy.
However, security_bpf_token_create(token, attr, &path) in there takes
struct path *, rather than const struct path.  Hell knows why - the
only LSM that has the corresponding hook ignores the path argument
completely.  Need to discuss with LSM folks.
	* o2hb_region_dev_store() - would've been in the class
above, if not for pointless gotos to return statement just
past the fdput().  Prep commit before the conversions to regularize
that caller.
	* privcmd_ioeventfd_assign() that shouldn't be using fdget() at
all - it actually open-codes eventfd_ctx_fdget() and it ought to just
call it, same as privcmd_ioeventfd_deassign() does.
	* irqfd and friends.  Failure exits prior to fdget() are plain
returns, with one exception (privcmd, where a failure in copy_from_user()
results in goto to the place where we kfree() the thing we'd just
allocated), the only thing downstream of fdput() is (on failure exit)
a kfree() and fdput() can be moved past that.  For privcmd we also need
to move fdget() up - I'd taken it to just before the allocation, leaving
the emptiness check where it was, so that the error values wouldn't
change.  Not pretty, to put it mildly.	That's vfio_virqfd_enable(),
acrn_irqfd_assign(), privcmd_irqfd_assign() and kvm_irqfd_assign().
	* memcg_write_event_control().  Several issues here - for one thing,
its string handling is fucked; accepted inputs are
	<spaces><number> <number><spaces>
or
	<spaces><number> <number> <arguments><spaces>
The numbers represent two file descriptors; <arguments> is a string that
gets interpreted in a way that depends upon the sysfs file we are using.
memcg_write_event_control() starts with parsing the string it had been
given; it decodes the numbers and leaves 'buf' pointing to the location
2 bytes past the end of the second number.  In the second form (with
arguments present) that would point to arguments string.  Unfortunately,
if there'd been neither arguments nor trailing spaces, that pointer
goes 1 byte past the NUL.  And that pointer gets used for further
parsing of arguments.  The original array is what the userland had
passed to write(2), with NUL appended to it.  So a write() of 4095-byte
array, filled with a bunch of spaces followed by a descriptor, space and
another descriptor (and no NUL anywhere) will end up trying to pass
an unmapped address to strcmp().
	Another issue is that the cleanup logics is really messy -
it gradually builds an object that will either end up shoved
into memcg->event_list (on success) or taken apart (on failure).
Theoretically, the latter could be turned into __cleanup-based
destruction, but that's way too much for this series.  The minimal massage
that would suffice for CLASS(fd) conversion is to move both fdget()
to the point before the kmalloc(), with matching fdput() moved to the
very end of cleanup sequence.  After that conversion becomes trivial,
but the whole thing is definitely not pretty.  It is similar to irqfd
stuff, actually; any sanitizing of cleanups in those would probably best
go together (possibly along with other eventfd users).


Overall, looks like conversion to CLASS(fd, ...) is doable with a bit of
preliminary work.  Problematic cases exist, but they are few.  I'd
done such branch (6.10-rc1-based).  The interesting part had been to
see how hard would it be to rebase to 6.11-rc1; that turned out to be
easier than I expected.

The current branch lives in 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git work.fd

Individual patches in followups; it definitely needs review and testing.

Shortlog:
Al Viro (39):
      memcg_write_event_control(): fix a user-triggerable oops
      introduce fd_file(), convert all accessors to it.
      struct fd: representation change
      add struct fd constructors, get rid of __to_fd()
      regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range()
      net/socket.c: switch to CLASS(fd)
      introduce struct fderr, convert overlayfs uses to that
      experimental: convert fs/overlayfs/file.c to CLASS(...)
      timerfd: switch to CLASS(fd, ...)
      get rid of perf_fget_light(), convert kernel/events/core.c to CLASS(fd)
      switch netlink_getsockbyfilp() to taking descriptor
      do_mq_notify(): saner skb freeing on failures
      do_mq_notify(): switch to CLASS(fd, ...)
      simplify xfs_find_handle() a bit
      convert vmsplice() to CLASS(fd, ...)
      convert __bpf_prog_get() to CLASS(fd, ...)
      bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
      bpf maps: switch to CLASS(fd, ...)
      fdget_raw() users: switch to CLASS(fd_raw, ...)
      introduce "fd_pos" class, convert fdget_pos() users to it.
      o2hb_region_dev_store(): avoid goto around fdget()/fdput()
      privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget()
      fdget(), trivial conversions
      fdget(), more trivial conversions
      convert do_preadv()/do_pwritev()
      convert cachestat(2)
      switch spufs_calls_{get,put}() to CLASS() use
      convert spu_run(2)
      convert media_request_get_by_fd()
      convert coda_parse_fd()
      convert cifs_ioctl_copychunk()
      convert vfs_dedupe_file_range().
      convert do_select()
      do_pollfd(): convert to CLASS(fd)
      convert bpf_token_create()
      assorted variants of irqfd setup: convert to CLASS(fd)
      memcg_write_event_control(): switch to CLASS(fd)
      css_set_fork(): switch to CLASS(fd_raw, ...)
      deal with the last remaing boolean uses of fd_file()

Diffstat:
 arch/alpha/kernel/osf_sys.c                |   7 +-
 arch/arm/kernel/sys_oabi-compat.c          |  18 +-
 arch/powerpc/kvm/book3s_64_vio.c           |  23 +-
 arch/powerpc/kvm/powerpc.c                 |  30 +--
 arch/powerpc/platforms/cell/spu_syscalls.c |  68 ++----
 arch/x86/kernel/cpu/sgx/main.c             |  10 +-
 arch/x86/kvm/svm/sev.c                     |  43 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c  |  27 +--
 drivers/gpu/drm/drm_syncobj.c              |  11 +-
 drivers/infiniband/core/ucma.c             |  21 +-
 drivers/infiniband/core/uverbs_cmd.c       |  12 +-
 drivers/media/mc/mc-request.c              |  22 +-
 drivers/media/rc/lirc_dev.c                |  15 +-
 drivers/vfio/group.c                       |  10 +-
 drivers/vfio/virqfd.c                      |  20 +-
 drivers/virt/acrn/irqfd.c                  |  17 +-
 drivers/xen/privcmd.c                      |  32 +--
 fs/btrfs/ioctl.c                           |   7 +-
 fs/coda/inode.c                            |  13 +-
 fs/eventfd.c                               |   9 +-
 fs/eventpoll.c                             |  62 ++----
 fs/ext4/ioctl.c                            |  23 +-
 fs/f2fs/file.c                             |  17 +-
 fs/fcntl.c                                 |  74 +++----
 fs/fhandle.c                               |   7 +-
 fs/file.c                                  |  26 +--
 fs/fsopen.c                                |  23 +-
 fs/fuse/dev.c                              |  10 +-
 fs/ioctl.c                                 |  47 ++---
 fs/kernel_read_file.c                      |  12 +-
 fs/locks.c                                 |  27 +--
 fs/namei.c                                 |  19 +-
 fs/namespace.c                             |  53 ++---
 fs/notify/fanotify/fanotify_user.c         |  50 ++---
 fs/notify/inotify/inotify_user.c           |  44 ++--
 fs/ocfs2/cluster/heartbeat.c               |  28 ++-
 fs/open.c                                  |  67 +++---
 fs/overlayfs/file.c                        | 195 ++++++++---------
 fs/quota/quota.c                           |  18 +-
 fs/read_write.c                            | 227 ++++++++------------
 fs/readdir.c                               |  38 ++--
 fs/remap_range.c                           |  11 +-
 fs/select.c                                |  50 ++---
 fs/signalfd.c                              |  11 +-
 fs/smb/client/ioctl.c                      |  17 +-
 fs/splice.c                                |  82 +++-----
 fs/stat.c                                  |  14 +-
 fs/statfs.c                                |  12 +-
 fs/sync.c                                  |  33 ++-
 fs/timerfd.c                               |  44 ++--
 fs/utimes.c                                |  11 +-
 fs/xattr.c                                 |  59 +++---
 fs/xfs/xfs_exchrange.c                     |  12 +-
 fs/xfs/xfs_handle.c                        |  16 +-
 fs/xfs/xfs_ioctl.c                         |  85 +++-----
 include/linux/bpf.h                        |   9 +-
 include/linux/cleanup.h                    |   2 +-
 include/linux/file.h                       |  86 +++++---
 include/linux/netlink.h                    |   2 +-
 io_uring/sqpoll.c                          |  31 +--
 ipc/mqueue.c                               | 137 ++++--------
 kernel/bpf/bpf_inode_storage.c             |  29 +--
 kernel/bpf/btf.c                           |  13 +-
 kernel/bpf/map_in_map.c                    |  38 +---
 kernel/bpf/syscall.c                       | 197 ++++++-----------
 kernel/bpf/token.c                         |  82 +++-----
 kernel/bpf/verifier.c                      | 327 ++++++++++++++---------------
 kernel/cgroup/cgroup.c                     |  21 +-
 kernel/events/core.c                       |  69 +++---
 kernel/module/main.c                       |  15 +-
 kernel/nsproxy.c                           |  15 +-
 kernel/pid.c                               |  26 +--
 kernel/signal.c                            |  33 ++-
 kernel/sys.c                               |  21 +-
 kernel/taskstats.c                         |  20 +-
 kernel/watch_queue.c                       |   8 +-
 mm/fadvise.c                               |  10 +-
 mm/filemap.c                               |  19 +-
 mm/memcontrol-v1.c                         |  59 +++---
 mm/readahead.c                             |  25 +--
 net/core/net_namespace.c                   |  14 +-
 net/core/sock_map.c                        |  23 +-
 net/netlink/af_netlink.c                   |   9 +-
 net/socket.c                               | 303 ++++++++++++--------------
 security/integrity/ima/ima_main.c          |   9 +-
 security/landlock/syscalls.c               |  57 ++---
 security/loadpin/loadpin.c                 |  10 +-
 sound/core/pcm_native.c                    |   6 +-
 virt/kvm/eventfd.c                         |  19 +-
 virt/kvm/vfio.c                            |  18 +-
 90 files changed, 1462 insertions(+), 2239 deletions(-)

Outline of the series:

01/39) memcg_write_event_control(): fix a user-triggerable oops
	will go into #fixes (or via cgroup tree); broken string
handling, caught during that audit, independent from the rest.

Representation change for struct fd:
02/39) introduce fd_file(), convert all accessors to it.
03/39) struct fd: representation change
04/39) add struct fd constructors, get rid of __to_fd()
05/39) regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range()

By this point we have struct fd switched to new representation, accessors added,
all emptiness checks done directly on struct fd instances.

06/39) net/socket.c: switch to CLASS(fd)
	Get rid of the sockfd_lookup_light() and associated irregularities;
fput_light() gone, old users of sockfd_lookup_light() switched to CLASS(fd) +
sock_from_file().

Overlayfs stuff:
07/39) introduce struct fderr, convert overlayfs uses to that
08/39) experimental: convert fs/overlayfs/file.c to CLASS(...)
	
This introduces struct fderr and switches overlayfs to using that
instead of struct fd.

Getting rid of passing struct fd by reference:
09/39) timerfd: switch to CLASS(fd, ...)
10/39) get rid of perf_fget_light(), convert kernel/events/core.c to CLASS(fd)

do_mq_notify() regularization:
11/39) switch netlink_getsockbyfilp() to taking descriptor
12/39) do_mq_notify(): saner skb freeing on failures
13/39) do_mq_notify(): switch to CLASS(fd, ...)

After that the weirdness with reassignments in do_mq_notify() is gone
(and, IMO, result is easier to follow).

14/39) simplify xfs_find_handle() a bit
	Massage to get rid of reassignment there; simplifies control
flow...

Making sure that fdget() and fdput() are done in the same function:
15/39) convert vmsplice() to CLASS(fd, ...)
16/39) convert __bpf_prog_get() to CLASS(fd, ...)
17/39) bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
18/39) bpf maps: switch to CLASS(fd, ...)

Deal with fdget_raw() and fdget_pos() users - all trivial to convert.
19/39) fdget_raw() users: switch to CLASS(fd_raw, ...)
20/39) introduce "fd_pos" class, convert fdget_pos() users to it.

Prep for fdget() conversions:
21/39) o2hb_region_dev_store(): avoid goto around fdget()/fdput()
22/39) privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget()

23/39) fdget(), trivial conversions.
	Big one: all callers that have fdget() done the first thing in
scope, with all matching fdput() immediately followed by leaving the
scope.  All of those are trivial to convert.
24/39) fdget(), more trivial conversions
	Same, except that fdget() is preceded by some work.  All fdput()
are still immediately followed by leaving the scope.  These are also
trivial to convert, and along with the previous commit that takes care
of the majority of fdget() calls.

25/39) convert do_preadv()/do_pwritev()
	fdput() is transposable with everything done after it (inc_syscw()
et.al.)
26/39) convert cachestat(2)
	fdput() is transposable with copy_to_user() downstream of it.

27/39) switch spufs_calls_{get,put}() to CLASS() use
28/39) convert spu_run(2)
	fdput() used to be followed by spufs_calls_put(); we could transpose
those two, but spufs_calls_get()/spufd_calls_put() itself can be converted
to CLASS() use and it's cleaner that way.

29/39) convert media_request_get_by_fd()
	fdput() is transposable with debugging printk
30/39) convert coda_parse_fd()
	fdput() is transposable with invalf()

31/39) convert cifs_ioctl_copychunk()
	fdput() moved past mnt_drop_file_write(); harmless, if somewhat
cringeworthy.  Reordering could be avoided either by adding an explicit
scope or by making mnt_drop_file_write() called via __cleanup...

32/39) convert vfs_dedupe_file_range()
	fdput() is followed by checking fatal_signal_pending() (and aborting
the loop in such case).  fdput() is transposable with that check.
Yes, it'll probably end up with slightly fatter code (call after the
check has returned false + call on the almost never taken out-of-line path
instead of one call before the check), but it's not worth bothering with
explicit extra scope there (or dragging the check into the loop condition,
for that matter).

33/39) convert do_select()
	take the logics from fdget() to fdput() into an inlined helper -
with existing wait_key_set() subsumed into that.
34/39) do_pollfd(): convert to CLASS(fd)
	lift setting ->revents into the caller, so that failure exits
(including the early one) would be plain returns.

35/39) convert bpf_token_create()
	keep file reference through the entire thing, don't bother with
grabbing struct path reference (except, for now, around the LSM call
and that only until it gets constified) and while we are at it, don't
confuse the hell out of readers by random mix of path.dentry->d_sb and
path.mnt->mnt_sb uses - these two are equal, so just put one of those
into a local variable and use that.

36/39) assorted variants of irqfd setup: convert to CLASS(fd)
	fdput() is transposable with kfree(); some reordering
is required in one of those (we do fdget() a bit earlier there).
37/39) memcg_write_event_control(): switch to CLASS(fd)
	similar to the previous.  As the matter of fact, there
might be a missing common helper or two hiding in both...

38/39) css_set_fork(): switch to CLASS(fd_raw, ...)
	could be separated from the series; its use of fget_raw()
could be converted to fdget_raw(), with the result convertible to
CLASS(fd_raw)

39/39) deal with the last remaing boolean uses of fd_file()
	most of them had been converted to fd_empty() by now; pick
the strugglers.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops
  2024-07-30  5:09 [PATCHSET][RFC] struct fd and memory safety Al Viro
@ 2024-07-30  5:15 ` viro
  2024-07-30  5:15   ` [PATCH 02/39] introduce fd_file(), convert all accessors to it viro
                     ` (38 more replies)
  2024-07-30  5:17 ` [PATCHSET][RFC] struct fd and memory safety Al Viro
                   ` (5 subsequent siblings)
  6 siblings, 39 replies; 134+ messages in thread
From: viro @ 2024-07-30  5:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

we are *not* guaranteed that anything past the terminating NUL
is mapped (let alone initialized with anything sane).

[the sucker got moved in mainline]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 mm/memcontrol-v1.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 2aeea4d8bf8e..417c96f2da28 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -1842,9 +1842,12 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	buf = endp + 1;
 
 	cfd = simple_strtoul(buf, &endp, 10);
-	if ((*endp != ' ') && (*endp != '\0'))
+	if (*endp == '\0')
+		buf = endp;
+	else if (*endp == ' ')
+		buf = endp + 1;
+	else
 		return -EINVAL;
-	buf = endp + 1;
 
 	event = kzalloc(sizeof(*event), GFP_KERNEL);
 	if (!event)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH 02/39] introduce fd_file(), convert all accessors to it.
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
@ 2024-07-30  5:15   ` viro
  2024-08-07  9:55     ` Christian Brauner
  2024-07-30  5:15   ` [PATCH 03/39] struct fd: representation change viro
                     ` (37 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

	For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are very few (3 in linux/file.h,
1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
explicit initializers).
	Those can be dealt with in the commit converting to
new layout; accesses to struct fd::file are too many for that.
	This commit converts (almost) all of f.file to
fd_file(f).  It's not entirely mechanical ('file' is used as
a member name more than just in struct fd) and it does not
even attempt to distinguish the uses in pointer context from
those in boolean context; the latter will be eventually turned
into a separate helper (fd_empty()).

	NOTE: mass conversion to fd_empty(), tempting as it
might be, is a bad idea; better do that piecewise in commit
that convert from fdget...() to CLASS(...).

[conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
caught by git; fs/stat.c one got caught by git grep]
[fs/xattr.c conflict]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/alpha/kernel/osf_sys.c                |   4 +-
 arch/arm/kernel/sys_oabi-compat.c          |  10 +-
 arch/powerpc/kvm/book3s_64_vio.c           |   4 +-
 arch/powerpc/kvm/powerpc.c                 |  12 +--
 arch/powerpc/platforms/cell/spu_syscalls.c |   8 +-
 arch/x86/kernel/cpu/sgx/main.c             |   4 +-
 arch/x86/kvm/svm/sev.c                     |  16 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c  |   8 +-
 drivers/gpu/drm/drm_syncobj.c              |   6 +-
 drivers/infiniband/core/ucma.c             |   6 +-
 drivers/infiniband/core/uverbs_cmd.c       |   8 +-
 drivers/media/mc/mc-request.c              |   6 +-
 drivers/media/rc/lirc_dev.c                |   8 +-
 drivers/vfio/group.c                       |   6 +-
 drivers/vfio/virqfd.c                      |   6 +-
 drivers/virt/acrn/irqfd.c                  |   6 +-
 drivers/xen/privcmd.c                      |  10 +-
 fs/btrfs/ioctl.c                           |   4 +-
 fs/coda/inode.c                            |   4 +-
 fs/eventfd.c                               |   4 +-
 fs/eventpoll.c                             |  30 +++---
 fs/ext4/ioctl.c                            |   6 +-
 fs/f2fs/file.c                             |   6 +-
 fs/fcntl.c                                 |  38 +++----
 fs/fhandle.c                               |   4 +-
 fs/fsopen.c                                |   6 +-
 fs/fuse/dev.c                              |   6 +-
 fs/ioctl.c                                 |  30 +++---
 fs/kernel_read_file.c                      |   4 +-
 fs/locks.c                                 |  14 +--
 fs/namei.c                                 |  10 +-
 fs/namespace.c                             |  12 +--
 fs/notify/fanotify/fanotify_user.c         |  12 +--
 fs/notify/inotify/inotify_user.c           |  12 +--
 fs/ocfs2/cluster/heartbeat.c               |   6 +-
 fs/open.c                                  |  24 ++---
 fs/overlayfs/file.c                        |  40 +++----
 fs/quota/quota.c                           |   8 +-
 fs/read_write.c                            | 118 ++++++++++-----------
 fs/readdir.c                               |  20 ++--
 fs/remap_range.c                           |   2 +-
 fs/select.c                                |   8 +-
 fs/signalfd.c                              |   6 +-
 fs/smb/client/ioctl.c                      |   8 +-
 fs/splice.c                                |  22 ++--
 fs/stat.c                                  |   8 +-
 fs/statfs.c                                |   4 +-
 fs/sync.c                                  |  14 +--
 fs/timerfd.c                               |   8 +-
 fs/utimes.c                                |   4 +-
 fs/xattr.c                                 |  36 +++----
 fs/xfs/xfs_exchrange.c                     |   4 +-
 fs/xfs/xfs_handle.c                        |   4 +-
 fs/xfs/xfs_ioctl.c                         |  28 ++---
 include/linux/cleanup.h                    |   2 +-
 include/linux/file.h                       |   6 +-
 io_uring/sqpoll.c                          |  10 +-
 ipc/mqueue.c                               |  50 ++++-----
 kernel/bpf/bpf_inode_storage.c             |  14 +--
 kernel/bpf/btf.c                           |   6 +-
 kernel/bpf/syscall.c                       |  42 ++++----
 kernel/bpf/token.c                         |  10 +-
 kernel/cgroup/cgroup.c                     |   4 +-
 kernel/events/core.c                       |  12 +--
 kernel/module/main.c                       |   2 +-
 kernel/nsproxy.c                           |  12 +--
 kernel/pid.c                               |  10 +-
 kernel/signal.c                            |   6 +-
 kernel/sys.c                               |  10 +-
 kernel/taskstats.c                         |   4 +-
 kernel/watch_queue.c                       |   4 +-
 mm/fadvise.c                               |   4 +-
 mm/filemap.c                               |   6 +-
 mm/memcontrol-v1.c                         |  12 +--
 mm/readahead.c                             |  10 +-
 net/core/net_namespace.c                   |   6 +-
 net/socket.c                               |  12 +--
 security/integrity/ima/ima_main.c          |   4 +-
 security/landlock/syscalls.c               |  22 ++--
 security/loadpin/loadpin.c                 |   4 +-
 sound/core/pcm_native.c                    |   6 +-
 virt/kvm/eventfd.c                         |   6 +-
 virt/kvm/vfio.c                            |   8 +-
 83 files changed, 504 insertions(+), 502 deletions(-)

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index e5f881bc8288..56fea57f9642 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -160,10 +160,10 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
 		.count = count
 	};
 
-	if (!arg.file)
+	if (!fd_file(arg))
 		return -EBADF;
 
-	error = iterate_dir(arg.file, &buf.ctx);
+	error = iterate_dir(fd_file(arg), &buf.ctx);
 	if (error >= 0)
 		error = buf.error;
 	if (count != buf.count)
diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index d00f4040a9f5..f5781ff54a5c 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -239,19 +239,19 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
 	struct flock64 flock;
 	long err = -EBADF;
 
-	if (!f.file)
+	if (!fd_file(f))
 		goto out;
 
 	switch (cmd) {
 	case F_GETLK64:
 	case F_OFD_GETLK:
-		err = security_file_fcntl(f.file, cmd, arg);
+		err = security_file_fcntl(fd_file(f), cmd, arg);
 		if (err)
 			break;
 		err = get_oabi_flock(&flock, argp);
 		if (err)
 			break;
-		err = fcntl_getlk64(f.file, cmd, &flock);
+		err = fcntl_getlk64(fd_file(f), cmd, &flock);
 		if (!err)
 		       err = put_oabi_flock(&flock, argp);
 		break;
@@ -259,13 +259,13 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
 	case F_SETLKW64:
 	case F_OFD_SETLK:
 	case F_OFD_SETLKW:
-		err = security_file_fcntl(f.file, cmd, arg);
+		err = security_file_fcntl(fd_file(f), cmd, arg);
 		if (err)
 			break;
 		err = get_oabi_flock(&flock, argp);
 		if (err)
 			break;
-		err = fcntl_setlk64(fd, f.file, cmd, &flock);
+		err = fcntl_setlk64(fd, fd_file(f), cmd, &flock);
 		break;
 	default:
 		err = sys_fcntl64(fd, cmd, arg);
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 3ff3de9a52ac..34c0adb9fdbf 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -118,12 +118,12 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 	struct fd f;
 
 	f = fdget(tablefd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(stt, &kvm->arch.spapr_tce_tables, list) {
-		if (stt == f.file->private_data) {
+		if (stt == fd_file(f)->private_data) {
 			found = true;
 			break;
 		}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 5e6c7b527677..f14329989e9a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1938,11 +1938,11 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 
 		r = -EBADF;
 		f = fdget(cap->args[0]);
-		if (!f.file)
+		if (!fd_file(f))
 			break;
 
 		r = -EPERM;
-		dev = kvm_device_from_filp(f.file);
+		dev = kvm_device_from_filp(fd_file(f));
 		if (dev)
 			r = kvmppc_mpic_connect_vcpu(dev, vcpu, cap->args[1]);
 
@@ -1957,11 +1957,11 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 
 		r = -EBADF;
 		f = fdget(cap->args[0]);
-		if (!f.file)
+		if (!fd_file(f))
 			break;
 
 		r = -EPERM;
-		dev = kvm_device_from_filp(f.file);
+		dev = kvm_device_from_filp(fd_file(f));
 		if (dev) {
 			if (xics_on_xive())
 				r = kvmppc_xive_connect_vcpu(dev, vcpu, cap->args[1]);
@@ -1980,7 +1980,7 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 
 		r = -EBADF;
 		f = fdget(cap->args[0]);
-		if (!f.file)
+		if (!fd_file(f))
 			break;
 
 		r = -ENXIO;
@@ -1990,7 +1990,7 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 		}
 
 		r = -EPERM;
-		dev = kvm_device_from_filp(f.file);
+		dev = kvm_device_from_filp(fd_file(f));
 		if (dev)
 			r = kvmppc_xive_native_connect_vcpu(dev, vcpu,
 							    cap->args[1]);
diff --git a/arch/powerpc/platforms/cell/spu_syscalls.c b/arch/powerpc/platforms/cell/spu_syscalls.c
index 87ad7d563cfa..cd7d42fc12a6 100644
--- a/arch/powerpc/platforms/cell/spu_syscalls.c
+++ b/arch/powerpc/platforms/cell/spu_syscalls.c
@@ -66,8 +66,8 @@ SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
 	if (flags & SPU_CREATE_AFFINITY_SPU) {
 		struct fd neighbor = fdget(neighbor_fd);
 		ret = -EBADF;
-		if (neighbor.file) {
-			ret = calls->create_thread(name, flags, mode, neighbor.file);
+		if (fd_file(neighbor)) {
+			ret = calls->create_thread(name, flags, mode, fd_file(neighbor));
 			fdput(neighbor);
 		}
 	} else
@@ -89,8 +89,8 @@ SYSCALL_DEFINE3(spu_run,int, fd, __u32 __user *, unpc, __u32 __user *, ustatus)
 
 	ret = -EBADF;
 	arg = fdget(fd);
-	if (arg.file) {
-		ret = calls->spu_run(arg.file, unpc, ustatus);
+	if (fd_file(arg)) {
+		ret = calls->spu_run(fd_file(arg), unpc, ustatus);
 		fdput(arg);
 	}
 
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 27892e57c4ef..d01deb386395 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -895,10 +895,10 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
 {
 	struct fd f = fdget(attribute_fd);
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EINVAL;
 
-	if (f.file->f_op != &sgx_provision_fops) {
+	if (fd_file(f)->f_op != &sgx_provision_fops) {
 		fdput(f);
 		return -EINVAL;
 	}
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a16c873b3232..0e38b5223263 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -534,10 +534,10 @@ static int __sev_issue_cmd(int fd, int id, void *data, int *error)
 	int ret;
 
 	f = fdget(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	ret = sev_issue_cmd_external_user(f.file, id, data, error);
+	ret = sev_issue_cmd_external_user(fd_file(f), id, data, error);
 
 	fdput(f);
 	return ret;
@@ -2078,15 +2078,15 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 	bool charged = false;
 	int ret;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	if (!file_is_kvm(f.file)) {
+	if (!file_is_kvm(fd_file(f))) {
 		ret = -EBADF;
 		goto out_fput;
 	}
 
-	source_kvm = f.file->private_data;
+	source_kvm = fd_file(f)->private_data;
 	ret = sev_lock_two_vms(kvm, source_kvm);
 	if (ret)
 		goto out_fput;
@@ -2801,15 +2801,15 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 	struct kvm_sev_info *source_sev, *mirror_sev;
 	int ret;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	if (!file_is_kvm(f.file)) {
+	if (!file_is_kvm(fd_file(f))) {
 		ret = -EBADF;
 		goto e_source_fput;
 	}
 
-	source_kvm = f.file->private_data;
+	source_kvm = fd_file(f)->private_data;
 	ret = sev_lock_two_vms(kvm, source_kvm);
 	if (ret)
 		goto e_source_fput;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
index 863b2a34b2d6..a9298cb8d19a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
@@ -43,10 +43,10 @@ static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
 	uint32_t id;
 	int r;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EINVAL;
 
-	r = amdgpu_file_to_fpriv(f.file, &fpriv);
+	r = amdgpu_file_to_fpriv(fd_file(f), &fpriv);
 	if (r) {
 		fdput(f);
 		return r;
@@ -72,10 +72,10 @@ static int amdgpu_sched_context_priority_override(struct amdgpu_device *adev,
 	struct amdgpu_ctx *ctx;
 	int r;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EINVAL;
 
-	r = amdgpu_file_to_fpriv(f.file, &fpriv);
+	r = amdgpu_file_to_fpriv(fd_file(f), &fpriv);
 	if (r) {
 		fdput(f);
 		return r;
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index a0e94217b511..7fb31ca3b5fc 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -715,16 +715,16 @@ static int drm_syncobj_fd_to_handle(struct drm_file *file_private,
 	struct fd f = fdget(fd);
 	int ret;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EINVAL;
 
-	if (f.file->f_op != &drm_syncobj_file_fops) {
+	if (fd_file(f)->f_op != &drm_syncobj_file_fops) {
 		fdput(f);
 		return -EINVAL;
 	}
 
 	/* take a reference to put in the idr */
-	syncobj = f.file->private_data;
+	syncobj = fd_file(f)->private_data;
 	drm_syncobj_get(syncobj);
 
 	idr_preload(GFP_KERNEL);
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 5f5ad8faf86e..dc57d07a1f45 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1624,13 +1624,13 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
 
 	/* Get current fd to protect against it being closed */
 	f = fdget(cmd.fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -ENOENT;
-	if (f.file->f_op != &ucma_fops) {
+	if (fd_file(f)->f_op != &ucma_fops) {
 		ret = -EINVAL;
 		goto file_put;
 	}
-	cur_file = f.file->private_data;
+	cur_file = fd_file(f)->private_data;
 
 	/* Validate current fd and prevent destruction of id. */
 	ctx = ucma_get_ctx(cur_file, cmd.id);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 1b3ea71f2c33..3f85575cf971 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -584,12 +584,12 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
 	if (cmd.fd != -1) {
 		/* search for file descriptor */
 		f = fdget(cmd.fd);
-		if (!f.file) {
+		if (!fd_file(f)) {
 			ret = -EBADF;
 			goto err_tree_mutex_unlock;
 		}
 
-		inode = file_inode(f.file);
+		inode = file_inode(fd_file(f));
 		xrcd = find_xrcd(ibudev, inode);
 		if (!xrcd && !(cmd.oflags & O_CREAT)) {
 			/* no file descriptor. Need CREATE flag */
@@ -632,7 +632,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
 		atomic_inc(&xrcd->usecnt);
 	}
 
-	if (f.file)
+	if (fd_file(f))
 		fdput(f);
 
 	mutex_unlock(&ibudev->xrcd_tree_mutex);
@@ -648,7 +648,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
 	uobj_alloc_abort(&obj->uobject, attrs);
 
 err_tree_mutex_unlock:
-	if (f.file)
+	if (fd_file(f))
 		fdput(f);
 
 	mutex_unlock(&ibudev->xrcd_tree_mutex);
diff --git a/drivers/media/mc/mc-request.c b/drivers/media/mc/mc-request.c
index addb8f2d8939..e064914c476e 100644
--- a/drivers/media/mc/mc-request.c
+++ b/drivers/media/mc/mc-request.c
@@ -254,12 +254,12 @@ media_request_get_by_fd(struct media_device *mdev, int request_fd)
 		return ERR_PTR(-EBADR);
 
 	f = fdget(request_fd);
-	if (!f.file)
+	if (!fd_file(f))
 		goto err_no_req_fd;
 
-	if (f.file->f_op != &request_fops)
+	if (fd_file(f)->f_op != &request_fops)
 		goto err_fput;
-	req = f.file->private_data;
+	req = fd_file(f)->private_data;
 	if (req->mdev != mdev)
 		goto err_fput;
 
diff --git a/drivers/media/rc/lirc_dev.c b/drivers/media/rc/lirc_dev.c
index 717c441b4a86..b8dfd530fab7 100644
--- a/drivers/media/rc/lirc_dev.c
+++ b/drivers/media/rc/lirc_dev.c
@@ -820,20 +820,20 @@ struct rc_dev *rc_dev_get_from_fd(int fd, bool write)
 	struct lirc_fh *fh;
 	struct rc_dev *dev;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return ERR_PTR(-EBADF);
 
-	if (f.file->f_op != &lirc_fops) {
+	if (fd_file(f)->f_op != &lirc_fops) {
 		fdput(f);
 		return ERR_PTR(-EINVAL);
 	}
 
-	if (write && !(f.file->f_mode & FMODE_WRITE)) {
+	if (write && !(fd_file(f)->f_mode & FMODE_WRITE)) {
 		fdput(f);
 		return ERR_PTR(-EPERM);
 	}
 
-	fh = f.file->private_data;
+	fh = fd_file(f)->private_data;
 	dev = fh->rc;
 
 	get_device(&dev->dev);
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index ded364588d29..95b336de8a17 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -112,7 +112,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 		return -EFAULT;
 
 	f = fdget(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
 	mutex_lock(&group->group_lock);
@@ -125,13 +125,13 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 		goto out_unlock;
 	}
 
-	container = vfio_container_from_file(f.file);
+	container = vfio_container_from_file(fd_file(f));
 	if (container) {
 		ret = vfio_container_attach_group(container, group);
 		goto out_unlock;
 	}
 
-	iommufd = iommufd_ctx_from_file(f.file);
+	iommufd = iommufd_ctx_from_file(fd_file(f));
 	if (!IS_ERR(iommufd)) {
 		if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
 		    group->type == VFIO_NO_IOMMU)
diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index 532269133801..d22881245e89 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -134,12 +134,12 @@ int vfio_virqfd_enable(void *opaque,
 	INIT_WORK(&virqfd->flush_inject, virqfd_flush_inject);
 
 	irqfd = fdget(fd);
-	if (!irqfd.file) {
+	if (!fd_file(irqfd)) {
 		ret = -EBADF;
 		goto err_fd;
 	}
 
-	ctx = eventfd_ctx_fileget(irqfd.file);
+	ctx = eventfd_ctx_fileget(fd_file(irqfd));
 	if (IS_ERR(ctx)) {
 		ret = PTR_ERR(ctx);
 		goto err_ctx;
@@ -171,7 +171,7 @@ int vfio_virqfd_enable(void *opaque,
 	init_waitqueue_func_entry(&virqfd->wait, virqfd_wakeup);
 	init_poll_funcptr(&virqfd->pt, virqfd_ptable_queue_proc);
 
-	events = vfs_poll(irqfd.file, &virqfd->pt);
+	events = vfs_poll(fd_file(irqfd), &virqfd->pt);
 
 	/*
 	 * Check if there was an event already pending on the eventfd
diff --git a/drivers/virt/acrn/irqfd.c b/drivers/virt/acrn/irqfd.c
index d4ad211dce7a..9994d818bb7e 100644
--- a/drivers/virt/acrn/irqfd.c
+++ b/drivers/virt/acrn/irqfd.c
@@ -125,12 +125,12 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 	INIT_WORK(&irqfd->shutdown, hsm_irqfd_shutdown_work);
 
 	f = fdget(args->fd);
-	if (!f.file) {
+	if (!fd_file(f)) {
 		ret = -EBADF;
 		goto out;
 	}
 
-	eventfd = eventfd_ctx_fileget(f.file);
+	eventfd = eventfd_ctx_fileget(fd_file(f));
 	if (IS_ERR(eventfd)) {
 		ret = PTR_ERR(eventfd);
 		goto fail;
@@ -157,7 +157,7 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 	mutex_unlock(&vm->irqfds_lock);
 
 	/* Check the pending event in this stage */
-	events = vfs_poll(f.file, &irqfd->pt);
+	events = vfs_poll(fd_file(f), &irqfd->pt);
 
 	if (events & EPOLLIN)
 		acrn_irqfd_inject(irqfd);
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 9563650dfbaf..54e4f285c0f4 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -959,12 +959,12 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
 	INIT_WORK(&kirqfd->shutdown, irqfd_shutdown);
 
 	f = fdget(irqfd->fd);
-	if (!f.file) {
+	if (!fd_file(f)) {
 		ret = -EBADF;
 		goto error_kfree;
 	}
 
-	kirqfd->eventfd = eventfd_ctx_fileget(f.file);
+	kirqfd->eventfd = eventfd_ctx_fileget(fd_file(f));
 	if (IS_ERR(kirqfd->eventfd)) {
 		ret = PTR_ERR(kirqfd->eventfd);
 		goto error_fd_put;
@@ -995,7 +995,7 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
 	 * Check if there was an event already pending on the eventfd before we
 	 * registered, and trigger it as if we didn't miss it.
 	 */
-	events = vfs_poll(f.file, &kirqfd->pt);
+	events = vfs_poll(fd_file(f), &kirqfd->pt);
 	if (events & EPOLLIN)
 		irqfd_inject(kirqfd);
 
@@ -1345,12 +1345,12 @@ static int privcmd_ioeventfd_assign(struct privcmd_ioeventfd *ioeventfd)
 		return -ENOMEM;
 
 	f = fdget(ioeventfd->event_fd);
-	if (!f.file) {
+	if (!fd_file(f)) {
 		ret = -EBADF;
 		goto error_kfree;
 	}
 
-	kioeventfd->eventfd = eventfd_ctx_fileget(f.file);
+	kioeventfd->eventfd = eventfd_ctx_fileget(fd_file(f));
 	fdput(f);
 
 	if (IS_ERR(kioeventfd->eventfd)) {
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index e0a664b8a46a..32ddd3d31719 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1312,12 +1312,12 @@ static noinline int __btrfs_ioctl_snap_create(struct file *file,
 	} else {
 		struct fd src = fdget(fd);
 		struct inode *src_inode;
-		if (!src.file) {
+		if (!fd_file(src)) {
 			ret = -EINVAL;
 			goto out_drop_write;
 		}
 
-		src_inode = file_inode(src.file);
+		src_inode = file_inode(fd_file(src));
 		if (src_inode->i_sb != file_inode(file)->i_sb) {
 			btrfs_info(BTRFS_I(file_inode(file))->root->fs_info,
 				   "Snapshot src from another FS");
diff --git a/fs/coda/inode.c b/fs/coda/inode.c
index 6898dc621011..7d56b6d1e4c3 100644
--- a/fs/coda/inode.c
+++ b/fs/coda/inode.c
@@ -127,9 +127,9 @@ static int coda_parse_fd(struct fs_context *fc, int fd)
 	int idx;
 
 	f = fdget(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
-	inode = file_inode(f.file);
+	inode = file_inode(fd_file(f));
 	if (!S_ISCHR(inode->i_mode) || imajor(inode) != CODA_PSDEV_MAJOR) {
 		fdput(f);
 		return invalf(fc, "code: Not coda psdev");
diff --git a/fs/eventfd.c b/fs/eventfd.c
index 9afdb722fa92..22c934f3a080 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -349,9 +349,9 @@ struct eventfd_ctx *eventfd_ctx_fdget(int fd)
 {
 	struct eventfd_ctx *ctx;
 	struct fd f = fdget(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return ERR_PTR(-EBADF);
-	ctx = eventfd_ctx_fileget(f.file);
+	ctx = eventfd_ctx_fileget(fd_file(f));
 	fdput(f);
 	return ctx;
 }
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f53ca4f7fced..28d1a754cf33 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2266,17 +2266,17 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 
 	error = -EBADF;
 	f = fdget(epfd);
-	if (!f.file)
+	if (!fd_file(f))
 		goto error_return;
 
 	/* Get the "struct file *" for the target file */
 	tf = fdget(fd);
-	if (!tf.file)
+	if (!fd_file(tf))
 		goto error_fput;
 
 	/* The target file descriptor must support poll */
 	error = -EPERM;
-	if (!file_can_poll(tf.file))
+	if (!file_can_poll(fd_file(tf)))
 		goto error_tgt_fput;
 
 	/* Check if EPOLLWAKEUP is allowed */
@@ -2289,7 +2289,7 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 	 * adding an epoll file descriptor inside itself.
 	 */
 	error = -EINVAL;
-	if (f.file == tf.file || !is_file_epoll(f.file))
+	if (fd_file(f) == fd_file(tf) || !is_file_epoll(fd_file(f)))
 		goto error_tgt_fput;
 
 	/*
@@ -2300,7 +2300,7 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 	if (ep_op_has_event(op) && (epds->events & EPOLLEXCLUSIVE)) {
 		if (op == EPOLL_CTL_MOD)
 			goto error_tgt_fput;
-		if (op == EPOLL_CTL_ADD && (is_file_epoll(tf.file) ||
+		if (op == EPOLL_CTL_ADD && (is_file_epoll(fd_file(tf)) ||
 				(epds->events & ~EPOLLEXCLUSIVE_OK_BITS)))
 			goto error_tgt_fput;
 	}
@@ -2309,7 +2309,7 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 	 * At this point it is safe to assume that the "private_data" contains
 	 * our own data structure.
 	 */
-	ep = f.file->private_data;
+	ep = fd_file(f)->private_data;
 
 	/*
 	 * When we insert an epoll file descriptor inside another epoll file
@@ -2330,16 +2330,16 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 	if (error)
 		goto error_tgt_fput;
 	if (op == EPOLL_CTL_ADD) {
-		if (READ_ONCE(f.file->f_ep) || ep->gen == loop_check_gen ||
-		    is_file_epoll(tf.file)) {
+		if (READ_ONCE(fd_file(f)->f_ep) || ep->gen == loop_check_gen ||
+		    is_file_epoll(fd_file(tf))) {
 			mutex_unlock(&ep->mtx);
 			error = epoll_mutex_lock(&epnested_mutex, 0, nonblock);
 			if (error)
 				goto error_tgt_fput;
 			loop_check_gen++;
 			full_check = 1;
-			if (is_file_epoll(tf.file)) {
-				tep = tf.file->private_data;
+			if (is_file_epoll(fd_file(tf))) {
+				tep = fd_file(tf)->private_data;
 				error = -ELOOP;
 				if (ep_loop_check(ep, tep) != 0)
 					goto error_tgt_fput;
@@ -2355,14 +2355,14 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 	 * above, we can be sure to be able to use the item looked up by
 	 * ep_find() till we release the mutex.
 	 */
-	epi = ep_find(ep, tf.file, fd);
+	epi = ep_find(ep, fd_file(tf), fd);
 
 	error = -EINVAL;
 	switch (op) {
 	case EPOLL_CTL_ADD:
 		if (!epi) {
 			epds->events |= EPOLLERR | EPOLLHUP;
-			error = ep_insert(ep, epds, tf.file, fd, full_check);
+			error = ep_insert(ep, epds, fd_file(tf), fd, full_check);
 		} else
 			error = -EEXIST;
 		break;
@@ -2443,7 +2443,7 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
 
 	/* Get the "struct file *" for the eventpoll file */
 	f = fdget(epfd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
 	/*
@@ -2451,14 +2451,14 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
 	 * the user passed to us _is_ an eventpoll file.
 	 */
 	error = -EINVAL;
-	if (!is_file_epoll(f.file))
+	if (!is_file_epoll(fd_file(f)))
 		goto error_fput;
 
 	/*
 	 * At this point it is safe to assume that the "private_data" contains
 	 * our own data structure.
 	 */
-	ep = f.file->private_data;
+	ep = fd_file(f)->private_data;
 
 	/* Time to fish for events ... */
 	error = ep_poll(ep, events, maxevents, to);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index e8bf5972dd47..1c77400bd88e 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -1343,10 +1343,10 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		me.moved_len = 0;
 
 		donor = fdget(me.donor_fd);
-		if (!donor.file)
+		if (!fd_file(donor))
 			return -EBADF;
 
-		if (!(donor.file->f_mode & FMODE_WRITE)) {
+		if (!(fd_file(donor)->f_mode & FMODE_WRITE)) {
 			err = -EBADF;
 			goto mext_out;
 		}
@@ -1367,7 +1367,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		if (err)
 			goto mext_out;
 
-		err = ext4_move_extents(filp, donor.file, me.orig_start,
+		err = ext4_move_extents(filp, fd_file(donor), me.orig_start,
 					me.donor_start, me.len, &me.moved_len);
 		mnt_drop_write_file(filp);
 
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 168f08507004..903337f8d21a 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3014,10 +3014,10 @@ static int __f2fs_ioc_move_range(struct file *filp,
 		return -EBADF;
 
 	dst = fdget(range->dst_fd);
-	if (!dst.file)
+	if (!fd_file(dst))
 		return -EBADF;
 
-	if (!(dst.file->f_mode & FMODE_WRITE)) {
+	if (!(fd_file(dst)->f_mode & FMODE_WRITE)) {
 		err = -EBADF;
 		goto err_out;
 	}
@@ -3026,7 +3026,7 @@ static int __f2fs_ioc_move_range(struct file *filp,
 	if (err)
 		goto err_out;
 
-	err = f2fs_move_file_range(filp, range->pos_in, dst.file,
+	err = f2fs_move_file_range(filp, range->pos_in, fd_file(dst),
 					range->pos_out, range->len);
 
 	mnt_drop_write_file(filp);
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 300e5d9ad913..2b5616762354 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -340,7 +340,7 @@ static long f_dupfd_query(int fd, struct file *filp)
 	 * overkill, but given our lockless file pointer lookup, the
 	 * alternatives are complicated.
 	 */
-	return f.file == filp;
+	return fd_file(f) == filp;
 }
 
 static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
@@ -479,17 +479,17 @@ SYSCALL_DEFINE3(fcntl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
 	struct fd f = fdget_raw(fd);
 	long err = -EBADF;
 
-	if (!f.file)
+	if (!fd_file(f))
 		goto out;
 
-	if (unlikely(f.file->f_mode & FMODE_PATH)) {
+	if (unlikely(fd_file(f)->f_mode & FMODE_PATH)) {
 		if (!check_fcntl_cmd(cmd))
 			goto out1;
 	}
 
-	err = security_file_fcntl(f.file, cmd, arg);
+	err = security_file_fcntl(fd_file(f), cmd, arg);
 	if (!err)
-		err = do_fcntl(fd, cmd, arg, f.file);
+		err = do_fcntl(fd, cmd, arg, fd_file(f));
 
 out1:
  	fdput(f);
@@ -506,15 +506,15 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd,
 	struct flock64 flock;
 	long err = -EBADF;
 
-	if (!f.file)
+	if (!fd_file(f))
 		goto out;
 
-	if (unlikely(f.file->f_mode & FMODE_PATH)) {
+	if (unlikely(fd_file(f)->f_mode & FMODE_PATH)) {
 		if (!check_fcntl_cmd(cmd))
 			goto out1;
 	}
 
-	err = security_file_fcntl(f.file, cmd, arg);
+	err = security_file_fcntl(fd_file(f), cmd, arg);
 	if (err)
 		goto out1;
 	
@@ -524,7 +524,7 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd,
 		err = -EFAULT;
 		if (copy_from_user(&flock, argp, sizeof(flock)))
 			break;
-		err = fcntl_getlk64(f.file, cmd, &flock);
+		err = fcntl_getlk64(fd_file(f), cmd, &flock);
 		if (!err && copy_to_user(argp, &flock, sizeof(flock)))
 			err = -EFAULT;
 		break;
@@ -535,10 +535,10 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd,
 		err = -EFAULT;
 		if (copy_from_user(&flock, argp, sizeof(flock)))
 			break;
-		err = fcntl_setlk64(fd, f.file, cmd, &flock);
+		err = fcntl_setlk64(fd, fd_file(f), cmd, &flock);
 		break;
 	default:
-		err = do_fcntl(fd, cmd, arg, f.file);
+		err = do_fcntl(fd, cmd, arg, fd_file(f));
 		break;
 	}
 out1:
@@ -643,15 +643,15 @@ static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
 	struct flock flock;
 	long err = -EBADF;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return err;
 
-	if (unlikely(f.file->f_mode & FMODE_PATH)) {
+	if (unlikely(fd_file(f)->f_mode & FMODE_PATH)) {
 		if (!check_fcntl_cmd(cmd))
 			goto out_put;
 	}
 
-	err = security_file_fcntl(f.file, cmd, arg);
+	err = security_file_fcntl(fd_file(f), cmd, arg);
 	if (err)
 		goto out_put;
 
@@ -660,7 +660,7 @@ static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
 		err = get_compat_flock(&flock, compat_ptr(arg));
 		if (err)
 			break;
-		err = fcntl_getlk(f.file, convert_fcntl_cmd(cmd), &flock);
+		err = fcntl_getlk(fd_file(f), convert_fcntl_cmd(cmd), &flock);
 		if (err)
 			break;
 		err = fixup_compat_flock(&flock);
@@ -672,7 +672,7 @@ static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
 		err = get_compat_flock64(&flock, compat_ptr(arg));
 		if (err)
 			break;
-		err = fcntl_getlk(f.file, convert_fcntl_cmd(cmd), &flock);
+		err = fcntl_getlk(fd_file(f), convert_fcntl_cmd(cmd), &flock);
 		if (!err)
 			err = put_compat_flock64(&flock, compat_ptr(arg));
 		break;
@@ -681,7 +681,7 @@ static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
 		err = get_compat_flock(&flock, compat_ptr(arg));
 		if (err)
 			break;
-		err = fcntl_setlk(fd, f.file, convert_fcntl_cmd(cmd), &flock);
+		err = fcntl_setlk(fd, fd_file(f), convert_fcntl_cmd(cmd), &flock);
 		break;
 	case F_SETLK64:
 	case F_SETLKW64:
@@ -690,10 +690,10 @@ static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
 		err = get_compat_flock64(&flock, compat_ptr(arg));
 		if (err)
 			break;
-		err = fcntl_setlk(fd, f.file, convert_fcntl_cmd(cmd), &flock);
+		err = fcntl_setlk(fd, fd_file(f), convert_fcntl_cmd(cmd), &flock);
 		break;
 	default:
-		err = do_fcntl(fd, cmd, arg, f.file);
+		err = do_fcntl(fd, cmd, arg, fd_file(f));
 		break;
 	}
 out_put:
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 6e8cea16790e..3f07b52874a8 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -125,9 +125,9 @@ static int get_path_from_fd(int fd, struct path *root)
 		spin_unlock(&fs->lock);
 	} else {
 		struct fd f = fdget(fd);
-		if (!f.file)
+		if (!fd_file(f))
 			return -EBADF;
-		*root = f.file->f_path;
+		*root = fd_file(f)->f_path;
 		path_get(root);
 		fdput(f);
 	}
diff --git a/fs/fsopen.c b/fs/fsopen.c
index ed2dd000622e..ee92ca58429e 100644
--- a/fs/fsopen.c
+++ b/fs/fsopen.c
@@ -394,13 +394,13 @@ SYSCALL_DEFINE5(fsconfig,
 	}
 
 	f = fdget(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 	ret = -EINVAL;
-	if (f.file->f_op != &fscontext_fops)
+	if (fd_file(f)->f_op != &fscontext_fops)
 		goto out_f;
 
-	fc = f.file->private_data;
+	fc = fd_file(f)->private_data;
 	if (fc->ops == &legacy_fs_context_ops) {
 		switch (cmd) {
 		case FSCONFIG_SET_BINARY:
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9eb191b5c4de..991b9ae8e7c9 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2321,15 +2321,15 @@ static long fuse_dev_ioctl_clone(struct file *file, __u32 __user *argp)
 		return -EFAULT;
 
 	f = fdget(oldfd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EINVAL;
 
 	/*
 	 * Check against file->f_op because CUSE
 	 * uses the same ioctl handler.
 	 */
-	if (f.file->f_op == file->f_op)
-		fud = fuse_get_dev(f.file);
+	if (fd_file(f)->f_op == file->f_op)
+		fud = fuse_get_dev(fd_file(f));
 
 	res = -EINVAL;
 	if (fud) {
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 64776891120c..6e0c954388d4 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -235,9 +235,9 @@ static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
 	loff_t cloned;
 	int ret;
 
-	if (!src_file.file)
+	if (!fd_file(src_file))
 		return -EBADF;
-	cloned = vfs_clone_file_range(src_file.file, off, dst_file, destoff,
+	cloned = vfs_clone_file_range(fd_file(src_file), off, dst_file, destoff,
 				      olen, 0);
 	if (cloned < 0)
 		ret = cloned;
@@ -895,16 +895,16 @@ SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
 	struct fd f = fdget(fd);
 	int error;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	error = security_file_ioctl(f.file, cmd, arg);
+	error = security_file_ioctl(fd_file(f), cmd, arg);
 	if (error)
 		goto out;
 
-	error = do_vfs_ioctl(f.file, fd, cmd, arg);
+	error = do_vfs_ioctl(fd_file(f), fd, cmd, arg);
 	if (error == -ENOIOCTLCMD)
-		error = vfs_ioctl(f.file, cmd, arg);
+		error = vfs_ioctl(fd_file(f), cmd, arg);
 
 out:
 	fdput(f);
@@ -953,32 +953,32 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
 	struct fd f = fdget(fd);
 	int error;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	error = security_file_ioctl_compat(f.file, cmd, arg);
+	error = security_file_ioctl_compat(fd_file(f), cmd, arg);
 	if (error)
 		goto out;
 
 	switch (cmd) {
 	/* FICLONE takes an int argument, so don't use compat_ptr() */
 	case FICLONE:
-		error = ioctl_file_clone(f.file, arg, 0, 0, 0);
+		error = ioctl_file_clone(fd_file(f), arg, 0, 0, 0);
 		break;
 
 #if defined(CONFIG_X86_64)
 	/* these get messy on amd64 due to alignment differences */
 	case FS_IOC_RESVSP_32:
 	case FS_IOC_RESVSP64_32:
-		error = compat_ioctl_preallocate(f.file, 0, compat_ptr(arg));
+		error = compat_ioctl_preallocate(fd_file(f), 0, compat_ptr(arg));
 		break;
 	case FS_IOC_UNRESVSP_32:
 	case FS_IOC_UNRESVSP64_32:
-		error = compat_ioctl_preallocate(f.file, FALLOC_FL_PUNCH_HOLE,
+		error = compat_ioctl_preallocate(fd_file(f), FALLOC_FL_PUNCH_HOLE,
 				compat_ptr(arg));
 		break;
 	case FS_IOC_ZERO_RANGE_32:
-		error = compat_ioctl_preallocate(f.file, FALLOC_FL_ZERO_RANGE,
+		error = compat_ioctl_preallocate(fd_file(f), FALLOC_FL_ZERO_RANGE,
 				compat_ptr(arg));
 		break;
 #endif
@@ -998,13 +998,13 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
 	 * argument.
 	 */
 	default:
-		error = do_vfs_ioctl(f.file, fd, cmd,
+		error = do_vfs_ioctl(fd_file(f), fd, cmd,
 				     (unsigned long)compat_ptr(arg));
 		if (error != -ENOIOCTLCMD)
 			break;
 
-		if (f.file->f_op->compat_ioctl)
-			error = f.file->f_op->compat_ioctl(f.file, cmd, arg);
+		if (fd_file(f)->f_op->compat_ioctl)
+			error = fd_file(f)->f_op->compat_ioctl(fd_file(f), cmd, arg);
 		if (error == -ENOIOCTLCMD)
 			error = -ENOTTY;
 		break;
diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c
index c429c42a6867..9ff37ae650ea 100644
--- a/fs/kernel_read_file.c
+++ b/fs/kernel_read_file.c
@@ -178,10 +178,10 @@ ssize_t kernel_read_file_from_fd(int fd, loff_t offset, void **buf,
 	struct fd f = fdget(fd);
 	ssize_t ret = -EBADF;
 
-	if (!f.file || !(f.file->f_mode & FMODE_READ))
+	if (!fd_file(f) || !(fd_file(f)->f_mode & FMODE_READ))
 		goto out;
 
-	ret = kernel_read_file(f.file, offset, buf, buf_size, file_size, id);
+	ret = kernel_read_file(fd_file(f), offset, buf, buf_size, file_size, id);
 out:
 	fdput(f);
 	return ret;
diff --git a/fs/locks.c b/fs/locks.c
index 9afb16e0683f..239759a356e9 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2153,15 +2153,15 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
 
 	error = -EBADF;
 	f = fdget(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return error;
 
-	if (type != F_UNLCK && !(f.file->f_mode & (FMODE_READ | FMODE_WRITE)))
+	if (type != F_UNLCK && !(fd_file(f)->f_mode & (FMODE_READ | FMODE_WRITE)))
 		goto out_putf;
 
-	flock_make_lock(f.file, &fl, type);
+	flock_make_lock(fd_file(f), &fl, type);
 
-	error = security_file_lock(f.file, fl.c.flc_type);
+	error = security_file_lock(fd_file(f), fl.c.flc_type);
 	if (error)
 		goto out_putf;
 
@@ -2169,12 +2169,12 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
 	if (can_sleep)
 		fl.c.flc_flags |= FL_SLEEP;
 
-	if (f.file->f_op->flock)
-		error = f.file->f_op->flock(f.file,
+	if (fd_file(f)->f_op->flock)
+		error = fd_file(f)->f_op->flock(fd_file(f),
 					    (can_sleep) ? F_SETLKW : F_SETLK,
 					    &fl);
 	else
-		error = locks_lock_file_wait(f.file, &fl);
+		error = locks_lock_file_wait(fd_file(f), &fl);
 
 	locks_release_private(&fl);
  out_putf:
diff --git a/fs/namei.c b/fs/namei.c
index 5512cb10fa89..af86e3330594 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2492,25 +2492,25 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 		struct fd f = fdget_raw(nd->dfd);
 		struct dentry *dentry;
 
-		if (!f.file)
+		if (!fd_file(f))
 			return ERR_PTR(-EBADF);
 
 		if (flags & LOOKUP_LINKAT_EMPTY) {
-			if (f.file->f_cred != current_cred() &&
-			    !ns_capable(f.file->f_cred->user_ns, CAP_DAC_READ_SEARCH)) {
+			if (fd_file(f)->f_cred != current_cred() &&
+			    !ns_capable(fd_file(f)->f_cred->user_ns, CAP_DAC_READ_SEARCH)) {
 				fdput(f);
 				return ERR_PTR(-ENOENT);
 			}
 		}
 
-		dentry = f.file->f_path.dentry;
+		dentry = fd_file(f)->f_path.dentry;
 
 		if (*s && unlikely(!d_can_lookup(dentry))) {
 			fdput(f);
 			return ERR_PTR(-ENOTDIR);
 		}
 
-		nd->path = f.file->f_path;
+		nd->path = fd_file(f)->f_path;
 		if (flags & LOOKUP_RCU) {
 			nd->inode = nd->path.dentry->d_inode;
 			nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
diff --git a/fs/namespace.c b/fs/namespace.c
index 328087a4df8a..c46d48bb38cd 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4099,14 +4099,14 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 	}
 
 	f = fdget(fs_fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
 	ret = -EINVAL;
-	if (f.file->f_op != &fscontext_fops)
+	if (fd_file(f)->f_op != &fscontext_fops)
 		goto err_fsfd;
 
-	fc = f.file->private_data;
+	fc = fd_file(f)->private_data;
 
 	ret = mutex_lock_interruptible(&fc->uapi_mutex);
 	if (ret < 0)
@@ -4649,15 +4649,15 @@ static int build_mount_idmapped(const struct mount_attr *attr, size_t usize,
 		return -EINVAL;
 
 	f = fdget(attr->userns_fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	if (!proc_ns_file(f.file)) {
+	if (!proc_ns_file(fd_file(f))) {
 		err = -EINVAL;
 		goto out_fput;
 	}
 
-	ns = get_proc_ns(file_inode(f.file));
+	ns = get_proc_ns(file_inode(fd_file(f)));
 	if (ns->ops->type != CLONE_NEWUSER) {
 		err = -EINVAL;
 		goto out_fput;
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9ec313e9f6e1..13454e5fd3fb 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1006,17 +1006,17 @@ static int fanotify_find_path(int dfd, const char __user *filename,
 		struct fd f = fdget(dfd);
 
 		ret = -EBADF;
-		if (!f.file)
+		if (!fd_file(f))
 			goto out;
 
 		ret = -ENOTDIR;
 		if ((flags & FAN_MARK_ONLYDIR) &&
-		    !(S_ISDIR(file_inode(f.file)->i_mode))) {
+		    !(S_ISDIR(file_inode(fd_file(f))->i_mode))) {
 			fdput(f);
 			goto out;
 		}
 
-		*path = f.file->f_path;
+		*path = fd_file(f)->f_path;
 		path_get(path);
 		fdput(f);
 	} else {
@@ -1753,14 +1753,14 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	}
 
 	f = fdget(fanotify_fd);
-	if (unlikely(!f.file))
+	if (unlikely(!fd_file(f)))
 		return -EBADF;
 
 	/* verify that this is indeed an fanotify instance */
 	ret = -EINVAL;
-	if (unlikely(f.file->f_op != &fanotify_fops))
+	if (unlikely(fd_file(f)->f_op != &fanotify_fops))
 		goto fput_and_out;
-	group = f.file->private_data;
+	group = fd_file(f)->private_data;
 
 	/*
 	 * An unprivileged user is not allowed to setup mount nor filesystem
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 4ffc30606e0b..c7e451d5bd51 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -753,7 +753,7 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 		return -EINVAL;
 
 	f = fdget(fd);
-	if (unlikely(!f.file))
+	if (unlikely(!fd_file(f)))
 		return -EBADF;
 
 	/* IN_MASK_ADD and IN_MASK_CREATE don't make sense together */
@@ -763,7 +763,7 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 	}
 
 	/* verify that this is indeed an inotify instance */
-	if (unlikely(f.file->f_op != &inotify_fops)) {
+	if (unlikely(fd_file(f)->f_op != &inotify_fops)) {
 		ret = -EINVAL;
 		goto fput_and_out;
 	}
@@ -780,7 +780,7 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 
 	/* inode held in place by reference to path; group by fget on fd */
 	inode = path.dentry->d_inode;
-	group = f.file->private_data;
+	group = fd_file(f)->private_data;
 
 	/* create/update an inode mark */
 	ret = inotify_update_watch(group, inode, mask);
@@ -798,14 +798,14 @@ SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd)
 	int ret = -EINVAL;
 
 	f = fdget(fd);
-	if (unlikely(!f.file))
+	if (unlikely(!fd_file(f)))
 		return -EBADF;
 
 	/* verify that this is indeed an inotify instance */
-	if (unlikely(f.file->f_op != &inotify_fops))
+	if (unlikely(fd_file(f)->f_op != &inotify_fops))
 		goto out;
 
-	group = f.file->private_data;
+	group = fd_file(f)->private_data;
 
 	i_mark = inotify_idr_find(group, wd);
 	if (unlikely(!i_mark))
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 1bde1281d514..4b9f45d7049e 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -1785,17 +1785,17 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
 		goto out;
 
 	f = fdget(fd);
-	if (f.file == NULL)
+	if (fd_file(f) == NULL)
 		goto out;
 
 	if (reg->hr_blocks == 0 || reg->hr_start_block == 0 ||
 	    reg->hr_block_bytes == 0)
 		goto out2;
 
-	if (!S_ISBLK(f.file->f_mapping->host->i_mode))
+	if (!S_ISBLK(fd_file(f)->f_mapping->host->i_mode))
 		goto out2;
 
-	reg->hr_bdev_file = bdev_file_open_by_dev(f.file->f_mapping->host->i_rdev,
+	reg->hr_bdev_file = bdev_file_open_by_dev(fd_file(f)->f_mapping->host->i_rdev,
 			BLK_OPEN_WRITE | BLK_OPEN_READ, NULL, NULL);
 	if (IS_ERR(reg->hr_bdev_file)) {
 		ret = PTR_ERR(reg->hr_bdev_file);
diff --git a/fs/open.c b/fs/open.c
index 22adbef7ecc2..a388828ccd22 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -193,10 +193,10 @@ long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
 	if (length < 0)
 		return -EINVAL;
 	f = fdget(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	error = do_ftruncate(f.file, length, small);
+	error = do_ftruncate(fd_file(f), length, small);
 
 	fdput(f);
 	return error;
@@ -353,8 +353,8 @@ int ksys_fallocate(int fd, int mode, loff_t offset, loff_t len)
 	struct fd f = fdget(fd);
 	int error = -EBADF;
 
-	if (f.file) {
-		error = vfs_fallocate(f.file, mode, offset, len);
+	if (fd_file(f)) {
+		error = vfs_fallocate(fd_file(f), mode, offset, len);
 		fdput(f);
 	}
 	return error;
@@ -585,16 +585,16 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd)
 	int error;
 
 	error = -EBADF;
-	if (!f.file)
+	if (!fd_file(f))
 		goto out;
 
 	error = -ENOTDIR;
-	if (!d_can_lookup(f.file->f_path.dentry))
+	if (!d_can_lookup(fd_file(f)->f_path.dentry))
 		goto out_putf;
 
-	error = file_permission(f.file, MAY_EXEC | MAY_CHDIR);
+	error = file_permission(fd_file(f), MAY_EXEC | MAY_CHDIR);
 	if (!error)
-		set_fs_pwd(current->fs, &f.file->f_path);
+		set_fs_pwd(current->fs, &fd_file(f)->f_path);
 out_putf:
 	fdput(f);
 out:
@@ -675,8 +675,8 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
 	struct fd f = fdget(fd);
 	int err = -EBADF;
 
-	if (f.file) {
-		err = vfs_fchmod(f.file, mode);
+	if (fd_file(f)) {
+		err = vfs_fchmod(fd_file(f), mode);
 		fdput(f);
 	}
 	return err;
@@ -869,8 +869,8 @@ int ksys_fchown(unsigned int fd, uid_t user, gid_t group)
 	struct fd f = fdget(fd);
 	int error = -EBADF;
 
-	if (f.file) {
-		error = vfs_fchown(f.file, user, group);
+	if (fd_file(f)) {
+		error = vfs_fchown(fd_file(f), user, group);
 		fdput(f);
 	}
 	return error;
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 1a411cae57ed..c4963d0c5549 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -209,13 +209,13 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 	 * files, so we use the real file to perform seeks.
 	 */
 	ovl_inode_lock(inode);
-	real.file->f_pos = file->f_pos;
+	fd_file(real)->f_pos = file->f_pos;
 
 	old_cred = ovl_override_creds(inode->i_sb);
-	ret = vfs_llseek(real.file, offset, whence);
+	ret = vfs_llseek(fd_file(real), offset, whence);
 	revert_creds(old_cred);
 
-	file->f_pos = real.file->f_pos;
+	file->f_pos = fd_file(real)->f_pos;
 	ovl_inode_unlock(inode);
 
 	fdput(real);
@@ -275,7 +275,7 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (ret)
 		return ret;
 
-	ret = backing_file_read_iter(real.file, iter, iocb, iocb->ki_flags,
+	ret = backing_file_read_iter(fd_file(real), iter, iocb, iocb->ki_flags,
 				     &ctx);
 	fdput(real);
 
@@ -314,7 +314,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	 * this property in case it is set by the issuer.
 	 */
 	ifl &= ~IOCB_DIO_CALLER_COMP;
-	ret = backing_file_write_iter(real.file, iter, iocb, ifl, &ctx);
+	ret = backing_file_write_iter(fd_file(real), iter, iocb, ifl, &ctx);
 	fdput(real);
 
 out_unlock:
@@ -339,7 +339,7 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
 	if (ret)
 		return ret;
 
-	ret = backing_file_splice_read(real.file, ppos, pipe, len, flags, &ctx);
+	ret = backing_file_splice_read(fd_file(real), ppos, pipe, len, flags, &ctx);
 	fdput(real);
 
 	return ret;
@@ -348,7 +348,7 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
 /*
  * Calling iter_file_splice_write() directly from overlay's f_op may deadlock
  * due to lock order inversion between pipe->mutex in iter_file_splice_write()
- * and file_start_write(real.file) in ovl_write_iter().
+ * and file_start_write(fd_file(real)) in ovl_write_iter().
  *
  * So do everything ovl_write_iter() does and call iter_file_splice_write() on
  * the real file.
@@ -373,7 +373,7 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	if (ret)
 		goto out_unlock;
 
-	ret = backing_file_splice_write(pipe, real.file, ppos, len, flags, &ctx);
+	ret = backing_file_splice_write(pipe, fd_file(real), ppos, len, flags, &ctx);
 	fdput(real);
 
 out_unlock:
@@ -397,9 +397,9 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 		return ret;
 
 	/* Don't sync lower file for fear of receiving EROFS error */
-	if (file_inode(real.file) == ovl_inode_upper(file_inode(file))) {
+	if (file_inode(fd_file(real)) == ovl_inode_upper(file_inode(file))) {
 		old_cred = ovl_override_creds(file_inode(file)->i_sb);
-		ret = vfs_fsync_range(real.file, start, end, datasync);
+		ret = vfs_fsync_range(fd_file(real), start, end, datasync);
 		revert_creds(old_cred);
 	}
 
@@ -439,7 +439,7 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 		goto out_unlock;
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	ret = vfs_fallocate(real.file, mode, offset, len);
+	ret = vfs_fallocate(fd_file(real), mode, offset, len);
 	revert_creds(old_cred);
 
 	/* Update size */
@@ -464,7 +464,7 @@ static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 		return ret;
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	ret = vfs_fadvise(real.file, offset, len, advice);
+	ret = vfs_fadvise(fd_file(real), offset, len, advice);
 	revert_creds(old_cred);
 
 	fdput(real);
@@ -509,18 +509,18 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 	old_cred = ovl_override_creds(file_inode(file_out)->i_sb);
 	switch (op) {
 	case OVL_COPY:
-		ret = vfs_copy_file_range(real_in.file, pos_in,
-					  real_out.file, pos_out, len, flags);
+		ret = vfs_copy_file_range(fd_file(real_in), pos_in,
+					  fd_file(real_out), pos_out, len, flags);
 		break;
 
 	case OVL_CLONE:
-		ret = vfs_clone_file_range(real_in.file, pos_in,
-					   real_out.file, pos_out, len, flags);
+		ret = vfs_clone_file_range(fd_file(real_in), pos_in,
+					   fd_file(real_out), pos_out, len, flags);
 		break;
 
 	case OVL_DEDUPE:
-		ret = vfs_dedupe_file_range_one(real_in.file, pos_in,
-						real_out.file, pos_out, len,
+		ret = vfs_dedupe_file_range_one(fd_file(real_in), pos_in,
+						fd_file(real_out), pos_out, len,
 						flags);
 		break;
 	}
@@ -583,9 +583,9 @@ static int ovl_flush(struct file *file, fl_owner_t id)
 	if (err)
 		return err;
 
-	if (real.file->f_op->flush) {
+	if (fd_file(real)->f_op->flush) {
 		old_cred = ovl_override_creds(file_inode(file)->i_sb);
-		err = real.file->f_op->flush(real.file, id);
+		err = fd_file(real)->f_op->flush(fd_file(real), id);
 		revert_creds(old_cred);
 	}
 	fdput(real);
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 0e41fb84060f..290157bc7bec 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -980,7 +980,7 @@ SYSCALL_DEFINE4(quotactl_fd, unsigned int, fd, unsigned int, cmd,
 	int ret;
 
 	f = fdget_raw(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
 	ret = -EINVAL;
@@ -988,12 +988,12 @@ SYSCALL_DEFINE4(quotactl_fd, unsigned int, fd, unsigned int, cmd,
 		goto out;
 
 	if (quotactl_cmd_write(cmds)) {
-		ret = mnt_want_write(f.file->f_path.mnt);
+		ret = mnt_want_write(fd_file(f)->f_path.mnt);
 		if (ret)
 			goto out;
 	}
 
-	sb = f.file->f_path.mnt->mnt_sb;
+	sb = fd_file(f)->f_path.mnt->mnt_sb;
 	if (quotactl_cmd_onoff(cmds))
 		down_write(&sb->s_umount);
 	else
@@ -1007,7 +1007,7 @@ SYSCALL_DEFINE4(quotactl_fd, unsigned int, fd, unsigned int, cmd,
 		up_read(&sb->s_umount);
 
 	if (quotactl_cmd_write(cmds))
-		mnt_drop_write(f.file->f_path.mnt);
+		mnt_drop_write(fd_file(f)->f_path.mnt);
 out:
 	fdput(f);
 	return ret;
diff --git a/fs/read_write.c b/fs/read_write.c
index 90e283b31ca1..59d6d0dee579 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -294,12 +294,12 @@ static off_t ksys_lseek(unsigned int fd, off_t offset, unsigned int whence)
 {
 	off_t retval;
 	struct fd f = fdget_pos(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
 	retval = -EINVAL;
 	if (whence <= SEEK_MAX) {
-		loff_t res = vfs_llseek(f.file, offset, whence);
+		loff_t res = vfs_llseek(fd_file(f), offset, whence);
 		retval = res;
 		if (res != (loff_t)retval)
 			retval = -EOVERFLOW;	/* LFS: should only happen on 32 bit platforms */
@@ -330,14 +330,14 @@ SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
 	struct fd f = fdget_pos(fd);
 	loff_t offset;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
 	retval = -EINVAL;
 	if (whence > SEEK_MAX)
 		goto out_putf;
 
-	offset = vfs_llseek(f.file, ((loff_t) offset_high << 32) | offset_low,
+	offset = vfs_llseek(fd_file(f), ((loff_t) offset_high << 32) | offset_low,
 			whence);
 
 	retval = (int)offset;
@@ -610,15 +610,15 @@ ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
 	struct fd f = fdget_pos(fd);
 	ssize_t ret = -EBADF;
 
-	if (f.file) {
-		loff_t pos, *ppos = file_ppos(f.file);
+	if (fd_file(f)) {
+		loff_t pos, *ppos = file_ppos(fd_file(f));
 		if (ppos) {
 			pos = *ppos;
 			ppos = &pos;
 		}
-		ret = vfs_read(f.file, buf, count, ppos);
+		ret = vfs_read(fd_file(f), buf, count, ppos);
 		if (ret >= 0 && ppos)
-			f.file->f_pos = pos;
+			fd_file(f)->f_pos = pos;
 		fdput_pos(f);
 	}
 	return ret;
@@ -634,15 +634,15 @@ ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
 	struct fd f = fdget_pos(fd);
 	ssize_t ret = -EBADF;
 
-	if (f.file) {
-		loff_t pos, *ppos = file_ppos(f.file);
+	if (fd_file(f)) {
+		loff_t pos, *ppos = file_ppos(fd_file(f));
 		if (ppos) {
 			pos = *ppos;
 			ppos = &pos;
 		}
-		ret = vfs_write(f.file, buf, count, ppos);
+		ret = vfs_write(fd_file(f), buf, count, ppos);
 		if (ret >= 0 && ppos)
-			f.file->f_pos = pos;
+			fd_file(f)->f_pos = pos;
 		fdput_pos(f);
 	}
 
@@ -665,10 +665,10 @@ ssize_t ksys_pread64(unsigned int fd, char __user *buf, size_t count,
 		return -EINVAL;
 
 	f = fdget(fd);
-	if (f.file) {
+	if (fd_file(f)) {
 		ret = -ESPIPE;
-		if (f.file->f_mode & FMODE_PREAD)
-			ret = vfs_read(f.file, buf, count, &pos);
+		if (fd_file(f)->f_mode & FMODE_PREAD)
+			ret = vfs_read(fd_file(f), buf, count, &pos);
 		fdput(f);
 	}
 
@@ -699,10 +699,10 @@ ssize_t ksys_pwrite64(unsigned int fd, const char __user *buf,
 		return -EINVAL;
 
 	f = fdget(fd);
-	if (f.file) {
+	if (fd_file(f)) {
 		ret = -ESPIPE;
-		if (f.file->f_mode & FMODE_PWRITE)  
-			ret = vfs_write(f.file, buf, count, &pos);
+		if (fd_file(f)->f_mode & FMODE_PWRITE)
+			ret = vfs_write(fd_file(f), buf, count, &pos);
 		fdput(f);
 	}
 
@@ -985,15 +985,15 @@ static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
 	struct fd f = fdget_pos(fd);
 	ssize_t ret = -EBADF;
 
-	if (f.file) {
-		loff_t pos, *ppos = file_ppos(f.file);
+	if (fd_file(f)) {
+		loff_t pos, *ppos = file_ppos(fd_file(f));
 		if (ppos) {
 			pos = *ppos;
 			ppos = &pos;
 		}
-		ret = vfs_readv(f.file, vec, vlen, ppos, flags);
+		ret = vfs_readv(fd_file(f), vec, vlen, ppos, flags);
 		if (ret >= 0 && ppos)
-			f.file->f_pos = pos;
+			fd_file(f)->f_pos = pos;
 		fdput_pos(f);
 	}
 
@@ -1009,15 +1009,15 @@ static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
 	struct fd f = fdget_pos(fd);
 	ssize_t ret = -EBADF;
 
-	if (f.file) {
-		loff_t pos, *ppos = file_ppos(f.file);
+	if (fd_file(f)) {
+		loff_t pos, *ppos = file_ppos(fd_file(f));
 		if (ppos) {
 			pos = *ppos;
 			ppos = &pos;
 		}
-		ret = vfs_writev(f.file, vec, vlen, ppos, flags);
+		ret = vfs_writev(fd_file(f), vec, vlen, ppos, flags);
 		if (ret >= 0 && ppos)
-			f.file->f_pos = pos;
+			fd_file(f)->f_pos = pos;
 		fdput_pos(f);
 	}
 
@@ -1043,10 +1043,10 @@ static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
 		return -EINVAL;
 
 	f = fdget(fd);
-	if (f.file) {
+	if (fd_file(f)) {
 		ret = -ESPIPE;
-		if (f.file->f_mode & FMODE_PREAD)
-			ret = vfs_readv(f.file, vec, vlen, &pos, flags);
+		if (fd_file(f)->f_mode & FMODE_PREAD)
+			ret = vfs_readv(fd_file(f), vec, vlen, &pos, flags);
 		fdput(f);
 	}
 
@@ -1066,10 +1066,10 @@ static ssize_t do_pwritev(unsigned long fd, const struct iovec __user *vec,
 		return -EINVAL;
 
 	f = fdget(fd);
-	if (f.file) {
+	if (fd_file(f)) {
 		ret = -ESPIPE;
-		if (f.file->f_mode & FMODE_PWRITE)
-			ret = vfs_writev(f.file, vec, vlen, &pos, flags);
+		if (fd_file(f)->f_mode & FMODE_PWRITE)
+			ret = vfs_writev(fd_file(f), vec, vlen, &pos, flags);
 		fdput(f);
 	}
 
@@ -1235,19 +1235,19 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	 */
 	retval = -EBADF;
 	in = fdget(in_fd);
-	if (!in.file)
+	if (!fd_file(in))
 		goto out;
-	if (!(in.file->f_mode & FMODE_READ))
+	if (!(fd_file(in)->f_mode & FMODE_READ))
 		goto fput_in;
 	retval = -ESPIPE;
 	if (!ppos) {
-		pos = in.file->f_pos;
+		pos = fd_file(in)->f_pos;
 	} else {
 		pos = *ppos;
-		if (!(in.file->f_mode & FMODE_PREAD))
+		if (!(fd_file(in)->f_mode & FMODE_PREAD))
 			goto fput_in;
 	}
-	retval = rw_verify_area(READ, in.file, &pos, count);
+	retval = rw_verify_area(READ, fd_file(in), &pos, count);
 	if (retval < 0)
 		goto fput_in;
 	if (count > MAX_RW_COUNT)
@@ -1258,13 +1258,13 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	 */
 	retval = -EBADF;
 	out = fdget(out_fd);
-	if (!out.file)
+	if (!fd_file(out))
 		goto fput_in;
-	if (!(out.file->f_mode & FMODE_WRITE))
+	if (!(fd_file(out)->f_mode & FMODE_WRITE))
 		goto fput_out;
-	in_inode = file_inode(in.file);
-	out_inode = file_inode(out.file);
-	out_pos = out.file->f_pos;
+	in_inode = file_inode(fd_file(in));
+	out_inode = file_inode(fd_file(out));
+	out_pos = fd_file(out)->f_pos;
 
 	if (!max)
 		max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes);
@@ -1284,33 +1284,33 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	 * and the application is arguably buggy if it doesn't expect
 	 * EAGAIN on a non-blocking file descriptor.
 	 */
-	if (in.file->f_flags & O_NONBLOCK)
+	if (fd_file(in)->f_flags & O_NONBLOCK)
 		fl = SPLICE_F_NONBLOCK;
 #endif
-	opipe = get_pipe_info(out.file, true);
+	opipe = get_pipe_info(fd_file(out), true);
 	if (!opipe) {
-		retval = rw_verify_area(WRITE, out.file, &out_pos, count);
+		retval = rw_verify_area(WRITE, fd_file(out), &out_pos, count);
 		if (retval < 0)
 			goto fput_out;
-		retval = do_splice_direct(in.file, &pos, out.file, &out_pos,
+		retval = do_splice_direct(fd_file(in), &pos, fd_file(out), &out_pos,
 					  count, fl);
 	} else {
-		if (out.file->f_flags & O_NONBLOCK)
+		if (fd_file(out)->f_flags & O_NONBLOCK)
 			fl |= SPLICE_F_NONBLOCK;
 
-		retval = splice_file_to_pipe(in.file, opipe, &pos, count, fl);
+		retval = splice_file_to_pipe(fd_file(in), opipe, &pos, count, fl);
 	}
 
 	if (retval > 0) {
 		add_rchar(current, retval);
 		add_wchar(current, retval);
-		fsnotify_access(in.file);
-		fsnotify_modify(out.file);
-		out.file->f_pos = out_pos;
+		fsnotify_access(fd_file(in));
+		fsnotify_modify(fd_file(out));
+		fd_file(out)->f_pos = out_pos;
 		if (ppos)
 			*ppos = pos;
 		else
-			in.file->f_pos = pos;
+			fd_file(in)->f_pos = pos;
 	}
 
 	inc_syscr(current);
@@ -1583,11 +1583,11 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
 	ssize_t ret = -EBADF;
 
 	f_in = fdget(fd_in);
-	if (!f_in.file)
+	if (!fd_file(f_in))
 		goto out2;
 
 	f_out = fdget(fd_out);
-	if (!f_out.file)
+	if (!fd_file(f_out))
 		goto out1;
 
 	ret = -EFAULT;
@@ -1595,21 +1595,21 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
 		if (copy_from_user(&pos_in, off_in, sizeof(loff_t)))
 			goto out;
 	} else {
-		pos_in = f_in.file->f_pos;
+		pos_in = fd_file(f_in)->f_pos;
 	}
 
 	if (off_out) {
 		if (copy_from_user(&pos_out, off_out, sizeof(loff_t)))
 			goto out;
 	} else {
-		pos_out = f_out.file->f_pos;
+		pos_out = fd_file(f_out)->f_pos;
 	}
 
 	ret = -EINVAL;
 	if (flags != 0)
 		goto out;
 
-	ret = vfs_copy_file_range(f_in.file, pos_in, f_out.file, pos_out, len,
+	ret = vfs_copy_file_range(fd_file(f_in), pos_in, fd_file(f_out), pos_out, len,
 				  flags);
 	if (ret > 0) {
 		pos_in += ret;
@@ -1619,14 +1619,14 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
 			if (copy_to_user(off_in, &pos_in, sizeof(loff_t)))
 				ret = -EFAULT;
 		} else {
-			f_in.file->f_pos = pos_in;
+			fd_file(f_in)->f_pos = pos_in;
 		}
 
 		if (off_out) {
 			if (copy_to_user(off_out, &pos_out, sizeof(loff_t)))
 				ret = -EFAULT;
 		} else {
-			f_out.file->f_pos = pos_out;
+			fd_file(f_out)->f_pos = pos_out;
 		}
 	}
 
diff --git a/fs/readdir.c b/fs/readdir.c
index d6c82421902a..6d29cab8576e 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -225,10 +225,10 @@ SYSCALL_DEFINE3(old_readdir, unsigned int, fd,
 		.dirent = dirent
 	};
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	error = iterate_dir(f.file, &buf.ctx);
+	error = iterate_dir(fd_file(f), &buf.ctx);
 	if (buf.result)
 		error = buf.result;
 
@@ -318,10 +318,10 @@ SYSCALL_DEFINE3(getdents, unsigned int, fd,
 	int error;
 
 	f = fdget_pos(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	error = iterate_dir(f.file, &buf.ctx);
+	error = iterate_dir(fd_file(f), &buf.ctx);
 	if (error >= 0)
 		error = buf.error;
 	if (buf.prev_reclen) {
@@ -401,10 +401,10 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
 	int error;
 
 	f = fdget_pos(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	error = iterate_dir(f.file, &buf.ctx);
+	error = iterate_dir(fd_file(f), &buf.ctx);
 	if (error >= 0)
 		error = buf.error;
 	if (buf.prev_reclen) {
@@ -483,10 +483,10 @@ COMPAT_SYSCALL_DEFINE3(old_readdir, unsigned int, fd,
 		.dirent = dirent
 	};
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	error = iterate_dir(f.file, &buf.ctx);
+	error = iterate_dir(fd_file(f), &buf.ctx);
 	if (buf.result)
 		error = buf.result;
 
@@ -569,10 +569,10 @@ COMPAT_SYSCALL_DEFINE3(getdents, unsigned int, fd,
 	int error;
 
 	f = fdget_pos(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	error = iterate_dir(f.file, &buf.ctx);
+	error = iterate_dir(fd_file(f), &buf.ctx);
 	if (error >= 0)
 		error = buf.error;
 	if (buf.prev_reclen) {
diff --git a/fs/remap_range.c b/fs/remap_range.c
index 28246dfc8485..4403d5c68fcb 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -537,7 +537,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 
 	for (i = 0, info = same->info; i < count; i++, info++) {
 		struct fd dst_fd = fdget(info->dest_fd);
-		struct file *dst_file = dst_fd.file;
+		struct file *dst_file = fd_file(dst_fd);
 
 		if (!dst_file) {
 			info->status = -EBADF;
diff --git a/fs/select.c b/fs/select.c
index 9515c3fa1a03..97e1009dde00 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -532,10 +532,10 @@ static noinline_for_stack int do_select(int n, fd_set_bits *fds, struct timespec
 					continue;
 				mask = EPOLLNVAL;
 				f = fdget(i);
-				if (f.file) {
+				if (fd_file(f)) {
 					wait_key_set(wait, in, out, bit,
 						     busy_flag);
-					mask = vfs_poll(f.file, wait);
+					mask = vfs_poll(fd_file(f), wait);
 
 					fdput(f);
 				}
@@ -864,13 +864,13 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
 		goto out;
 	mask = EPOLLNVAL;
 	f = fdget(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		goto out;
 
 	/* userland u16 ->events contains POLL... bitmap */
 	filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
 	pwait->_key = filter | busy_flag;
-	mask = vfs_poll(f.file, pwait);
+	mask = vfs_poll(fd_file(f), pwait);
 	if (mask & busy_flag)
 		*can_busy_poll = true;
 	mask &= filter;		/* Mask out unneeded events. */
diff --git a/fs/signalfd.c b/fs/signalfd.c
index ec7b2da2477a..777e889ab0e8 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -289,10 +289,10 @@ static int do_signalfd4(int ufd, sigset_t *mask, int flags)
 		fd_install(ufd, file);
 	} else {
 		struct fd f = fdget(ufd);
-		if (!f.file)
+		if (!fd_file(f))
 			return -EBADF;
-		ctx = f.file->private_data;
-		if (f.file->f_op != &signalfd_fops) {
+		ctx = fd_file(f)->private_data;
+		if (fd_file(f)->f_op != &signalfd_fops) {
 			fdput(f);
 			return -EINVAL;
 		}
diff --git a/fs/smb/client/ioctl.c b/fs/smb/client/ioctl.c
index 855ac5a62edf..94bf2e5014d9 100644
--- a/fs/smb/client/ioctl.c
+++ b/fs/smb/client/ioctl.c
@@ -90,23 +90,23 @@ static long cifs_ioctl_copychunk(unsigned int xid, struct file *dst_file,
 	}
 
 	src_file = fdget(srcfd);
-	if (!src_file.file) {
+	if (!fd_file(src_file)) {
 		rc = -EBADF;
 		goto out_drop_write;
 	}
 
-	if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
+	if (fd_file(src_file)->f_op->unlocked_ioctl != cifs_ioctl) {
 		rc = -EBADF;
 		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
 		goto out_fput;
 	}
 
-	src_inode = file_inode(src_file.file);
+	src_inode = file_inode(fd_file(src_file));
 	rc = -EINVAL;
 	if (S_ISDIR(src_inode->i_mode))
 		goto out_fput;
 
-	rc = cifs_file_copychunk_range(xid, src_file.file, 0, dst_file, 0,
+	rc = cifs_file_copychunk_range(xid, fd_file(src_file), 0, dst_file, 0,
 					src_inode->i_size, 0);
 	if (rc > 0)
 		rc = 0;
diff --git a/fs/splice.c b/fs/splice.c
index 60aed8de21f8..06232d7e505f 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1566,11 +1566,11 @@ static ssize_t vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
 
 static int vmsplice_type(struct fd f, int *type)
 {
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
-	if (f.file->f_mode & FMODE_WRITE) {
+	if (fd_file(f)->f_mode & FMODE_WRITE) {
 		*type = ITER_SOURCE;
-	} else if (f.file->f_mode & FMODE_READ) {
+	} else if (fd_file(f)->f_mode & FMODE_READ) {
 		*type = ITER_DEST;
 	} else {
 		fdput(f);
@@ -1621,9 +1621,9 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 	if (!iov_iter_count(&iter))
 		error = 0;
 	else if (type == ITER_SOURCE)
-		error = vmsplice_to_pipe(f.file, &iter, flags);
+		error = vmsplice_to_pipe(fd_file(f), &iter, flags);
 	else
-		error = vmsplice_to_user(f.file, &iter, flags);
+		error = vmsplice_to_user(fd_file(f), &iter, flags);
 
 	kfree(iov);
 out_fdput:
@@ -1646,10 +1646,10 @@ SYSCALL_DEFINE6(splice, int, fd_in, loff_t __user *, off_in,
 
 	error = -EBADF;
 	in = fdget(fd_in);
-	if (in.file) {
+	if (fd_file(in)) {
 		out = fdget(fd_out);
-		if (out.file) {
-			error = __do_splice(in.file, off_in, out.file, off_out,
+		if (fd_file(out)) {
+			error = __do_splice(fd_file(in), off_in, fd_file(out), off_out,
 					    len, flags);
 			fdput(out);
 		}
@@ -2016,10 +2016,10 @@ SYSCALL_DEFINE4(tee, int, fdin, int, fdout, size_t, len, unsigned int, flags)
 
 	error = -EBADF;
 	in = fdget(fdin);
-	if (in.file) {
+	if (fd_file(in)) {
 		out = fdget(fdout);
-		if (out.file) {
-			error = do_tee(in.file, out.file, len, flags);
+		if (fd_file(out)) {
+			error = do_tee(fd_file(in), fd_file(out), len, flags);
 			fdput(out);
 		}
  		fdput(in);
diff --git a/fs/stat.c b/fs/stat.c
index 89ce1be56310..41e598376d7e 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -224,9 +224,9 @@ int vfs_fstat(int fd, struct kstat *stat)
 	int error;
 
 	f = fdget_raw(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
-	error = vfs_getattr(&f.file->f_path, stat, STATX_BASIC_STATS, 0);
+	error = vfs_getattr(&fd_file(f)->f_path, stat, STATX_BASIC_STATS, 0);
 	fdput(f);
 	return error;
 }
@@ -277,9 +277,9 @@ static int vfs_statx_fd(int fd, int flags, struct kstat *stat,
 			  u32 request_mask)
 {
 	CLASS(fd_raw, f)(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
-	return vfs_statx_path(&f.file->f_path, flags, stat, request_mask);
+	return vfs_statx_path(&fd_file(f)->f_path, flags, stat, request_mask);
 }
 
 /**
diff --git a/fs/statfs.c b/fs/statfs.c
index 96d1c3edf289..9c7bb27e7932 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -116,8 +116,8 @@ int fd_statfs(int fd, struct kstatfs *st)
 {
 	struct fd f = fdget_raw(fd);
 	int error = -EBADF;
-	if (f.file) {
-		error = vfs_statfs(&f.file->f_path, st);
+	if (fd_file(f)) {
+		error = vfs_statfs(&fd_file(f)->f_path, st);
 		fdput(f);
 	}
 	return error;
diff --git a/fs/sync.c b/fs/sync.c
index dc725914e1ed..67df255eb189 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -152,15 +152,15 @@ SYSCALL_DEFINE1(syncfs, int, fd)
 	struct super_block *sb;
 	int ret, ret2;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
-	sb = f.file->f_path.dentry->d_sb;
+	sb = fd_file(f)->f_path.dentry->d_sb;
 
 	down_read(&sb->s_umount);
 	ret = sync_filesystem(sb);
 	up_read(&sb->s_umount);
 
-	ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
+	ret2 = errseq_check_and_advance(&sb->s_wb_err, &fd_file(f)->f_sb_err);
 
 	fdput(f);
 	return ret ? ret : ret2;
@@ -208,8 +208,8 @@ static int do_fsync(unsigned int fd, int datasync)
 	struct fd f = fdget(fd);
 	int ret = -EBADF;
 
-	if (f.file) {
-		ret = vfs_fsync(f.file, datasync);
+	if (fd_file(f)) {
+		ret = vfs_fsync(fd_file(f), datasync);
 		fdput(f);
 	}
 	return ret;
@@ -360,8 +360,8 @@ int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
 
 	ret = -EBADF;
 	f = fdget(fd);
-	if (f.file)
-		ret = sync_file_range(f.file, offset, nbytes, flags);
+	if (fd_file(f))
+		ret = sync_file_range(fd_file(f), offset, nbytes, flags);
 
 	fdput(f);
 	return ret;
diff --git a/fs/timerfd.c b/fs/timerfd.c
index 4bf2f8bfec11..137523e0bb21 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -397,9 +397,9 @@ static const struct file_operations timerfd_fops = {
 static int timerfd_fget(int fd, struct fd *p)
 {
 	struct fd f = fdget(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
-	if (f.file->f_op != &timerfd_fops) {
+	if (fd_file(f)->f_op != &timerfd_fops) {
 		fdput(f);
 		return -EINVAL;
 	}
@@ -482,7 +482,7 @@ static int do_timerfd_settime(int ufd, int flags,
 	ret = timerfd_fget(ufd, &f);
 	if (ret)
 		return ret;
-	ctx = f.file->private_data;
+	ctx = fd_file(f)->private_data;
 
 	if (isalarm(ctx) && !capable(CAP_WAKE_ALARM)) {
 		fdput(f);
@@ -546,7 +546,7 @@ static int do_timerfd_gettime(int ufd, struct itimerspec64 *t)
 	int ret = timerfd_fget(ufd, &f);
 	if (ret)
 		return ret;
-	ctx = f.file->private_data;
+	ctx = fd_file(f)->private_data;
 
 	spin_lock_irq(&ctx->wqh.lock);
 	if (ctx->expired && ctx->tintv) {
diff --git a/fs/utimes.c b/fs/utimes.c
index 3701b3946f88..99b26f792b89 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -115,9 +115,9 @@ static int do_utimes_fd(int fd, struct timespec64 *times, int flags)
 		return -EINVAL;
 
 	f = fdget(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
-	error = vfs_utimes(&f.file->f_path, times);
+	error = vfs_utimes(&fd_file(f)->f_path, times);
 	fdput(f);
 	return error;
 }
diff --git a/fs/xattr.c b/fs/xattr.c
index 7672ce5486c5..05ec7e7d9e87 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -697,19 +697,19 @@ SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
 	int error;
 
 	CLASS(fd, f)(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	audit_file(f.file);
+	audit_file(fd_file(f));
 	error = setxattr_copy(name, &ctx);
 	if (error)
 		return error;
 
-	error = mnt_want_write_file(f.file);
+	error = mnt_want_write_file(fd_file(f));
 	if (!error) {
-		error = do_setxattr(file_mnt_idmap(f.file),
-				    f.file->f_path.dentry, &ctx);
-		mnt_drop_write_file(f.file);
+		error = do_setxattr(file_mnt_idmap(fd_file(f)),
+				    fd_file(f)->f_path.dentry, &ctx);
+		mnt_drop_write_file(fd_file(f));
 	}
 	kvfree(ctx.kvalue);
 	return error;
@@ -812,10 +812,10 @@ SYSCALL_DEFINE4(fgetxattr, int, fd, const char __user *, name,
 	struct fd f = fdget(fd);
 	ssize_t error = -EBADF;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return error;
-	audit_file(f.file);
-	error = getxattr(file_mnt_idmap(f.file), f.file->f_path.dentry,
+	audit_file(fd_file(f));
+	error = getxattr(file_mnt_idmap(fd_file(f)), fd_file(f)->f_path.dentry,
 			 name, value, size);
 	fdput(f);
 	return error;
@@ -888,10 +888,10 @@ SYSCALL_DEFINE3(flistxattr, int, fd, char __user *, list, size_t, size)
 	struct fd f = fdget(fd);
 	ssize_t error = -EBADF;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return error;
-	audit_file(f.file);
-	error = listxattr(f.file->f_path.dentry, list, size);
+	audit_file(fd_file(f));
+	error = listxattr(fd_file(f)->f_path.dentry, list, size);
 	fdput(f);
 	return error;
 }
@@ -954,9 +954,9 @@ SYSCALL_DEFINE2(fremovexattr, int, fd, const char __user *, name)
 	char kname[XATTR_NAME_MAX + 1];
 	int error = -EBADF;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return error;
-	audit_file(f.file);
+	audit_file(fd_file(f));
 
 	error = strncpy_from_user(kname, name, sizeof(kname));
 	if (error == 0 || error == sizeof(kname))
@@ -964,11 +964,11 @@ SYSCALL_DEFINE2(fremovexattr, int, fd, const char __user *, name)
 	if (error < 0)
 		return error;
 
-	error = mnt_want_write_file(f.file);
+	error = mnt_want_write_file(fd_file(f));
 	if (!error) {
-		error = removexattr(file_mnt_idmap(f.file),
-				    f.file->f_path.dentry, kname);
-		mnt_drop_write_file(f.file);
+		error = removexattr(file_mnt_idmap(fd_file(f)),
+				    fd_file(f)->f_path.dentry, kname);
+		mnt_drop_write_file(fd_file(f));
 	}
 	fdput(f);
 	return error;
diff --git a/fs/xfs/xfs_exchrange.c b/fs/xfs/xfs_exchrange.c
index c8a655c92c92..9790e0f45d14 100644
--- a/fs/xfs/xfs_exchrange.c
+++ b/fs/xfs/xfs_exchrange.c
@@ -794,9 +794,9 @@ xfs_ioc_exchange_range(
 	fxr.flags		= args.flags;
 
 	file1 = fdget(args.file1_fd);
-	if (!file1.file)
+	if (!fd_file(file1))
 		return -EBADF;
-	fxr.file1 = file1.file;
+	fxr.file1 = fd_file(file1);
 
 	error = xfs_exchange_range(&fxr);
 	fdput(file1);
diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
index cf5acbd3c7ca..7bcc4f519cb8 100644
--- a/fs/xfs/xfs_handle.c
+++ b/fs/xfs/xfs_handle.c
@@ -92,9 +92,9 @@ xfs_find_handle(
 
 	if (cmd == XFS_IOC_FD_TO_HANDLE) {
 		f = fdget(hreq->fd);
-		if (!f.file)
+		if (!fd_file(f))
 			return -EBADF;
-		inode = file_inode(f.file);
+		inode = file_inode(fd_file(f));
 	} else {
 		error = user_path_at(AT_FDCWD, hreq->path, 0, &path);
 		if (error)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 4e933db75b12..95641817370b 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1005,33 +1005,33 @@ xfs_ioc_swapext(
 
 	/* Pull information for the target fd */
 	f = fdget((int)sxp->sx_fdtarget);
-	if (!f.file) {
+	if (!fd_file(f)) {
 		error = -EINVAL;
 		goto out;
 	}
 
-	if (!(f.file->f_mode & FMODE_WRITE) ||
-	    !(f.file->f_mode & FMODE_READ) ||
-	    (f.file->f_flags & O_APPEND)) {
+	if (!(fd_file(f)->f_mode & FMODE_WRITE) ||
+	    !(fd_file(f)->f_mode & FMODE_READ) ||
+	    (fd_file(f)->f_flags & O_APPEND)) {
 		error = -EBADF;
 		goto out_put_file;
 	}
 
 	tmp = fdget((int)sxp->sx_fdtmp);
-	if (!tmp.file) {
+	if (!fd_file(tmp)) {
 		error = -EINVAL;
 		goto out_put_file;
 	}
 
-	if (!(tmp.file->f_mode & FMODE_WRITE) ||
-	    !(tmp.file->f_mode & FMODE_READ) ||
-	    (tmp.file->f_flags & O_APPEND)) {
+	if (!(fd_file(tmp)->f_mode & FMODE_WRITE) ||
+	    !(fd_file(tmp)->f_mode & FMODE_READ) ||
+	    (fd_file(tmp)->f_flags & O_APPEND)) {
 		error = -EBADF;
 		goto out_put_tmp_file;
 	}
 
-	if (IS_SWAPFILE(file_inode(f.file)) ||
-	    IS_SWAPFILE(file_inode(tmp.file))) {
+	if (IS_SWAPFILE(file_inode(fd_file(f))) ||
+	    IS_SWAPFILE(file_inode(fd_file(tmp)))) {
 		error = -EINVAL;
 		goto out_put_tmp_file;
 	}
@@ -1041,14 +1041,14 @@ xfs_ioc_swapext(
 	 * before we cast and access them as XFS structures as we have no
 	 * control over what the user passes us here.
 	 */
-	if (f.file->f_op != &xfs_file_operations ||
-	    tmp.file->f_op != &xfs_file_operations) {
+	if (fd_file(f)->f_op != &xfs_file_operations ||
+	    fd_file(tmp)->f_op != &xfs_file_operations) {
 		error = -EINVAL;
 		goto out_put_tmp_file;
 	}
 
-	ip = XFS_I(file_inode(f.file));
-	tip = XFS_I(file_inode(tmp.file));
+	ip = XFS_I(file_inode(fd_file(f)));
+	tip = XFS_I(file_inode(fd_file(tmp)));
 
 	if (ip->i_mount != tip->i_mount) {
 		error = -EINVAL;
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index d9e613803df1..a3d3e888cf1f 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -98,7 +98,7 @@ const volatile void * __must_check_fn(const volatile void *val)
  * DEFINE_CLASS(fdget, struct fd, fdput(_T), fdget(fd), int fd)
  *
  *	CLASS(fdget, f)(fd);
- *	if (!f.file)
+ *	if (!fd_file(f))
  *		return -EBADF;
  *
  *	// use 'f' without concern
diff --git a/include/linux/file.h b/include/linux/file.h
index 237931f20739..0f3f369f2450 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -42,10 +42,12 @@ struct fd {
 #define FDPUT_FPUT       1
 #define FDPUT_POS_UNLOCK 2
 
+#define fd_file(f) ((f).file)
+
 static inline void fdput(struct fd fd)
 {
 	if (fd.flags & FDPUT_FPUT)
-		fput(fd.file);
+		fput(fd_file(fd));
 }
 
 extern struct file *fget(unsigned int fd);
@@ -79,7 +81,7 @@ static inline struct fd fdget_pos(int fd)
 static inline void fdput_pos(struct fd f)
 {
 	if (f.flags & FDPUT_POS_UNLOCK)
-		__f_unlock_pos(f.file);
+		__f_unlock_pos(fd_file(f));
 	fdput(f);
 }
 
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index b3722e5275e7..ffa7d341bd95 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -108,14 +108,14 @@ static struct io_sq_data *io_attach_sq_data(struct io_uring_params *p)
 	struct fd f;
 
 	f = fdget(p->wq_fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return ERR_PTR(-ENXIO);
-	if (!io_is_uring_fops(f.file)) {
+	if (!io_is_uring_fops(fd_file(f))) {
 		fdput(f);
 		return ERR_PTR(-EINVAL);
 	}
 
-	ctx_attach = f.file->private_data;
+	ctx_attach = fd_file(f)->private_data;
 	sqd = ctx_attach->sq_data;
 	if (!sqd) {
 		fdput(f);
@@ -418,9 +418,9 @@ __cold int io_sq_offload_create(struct io_ring_ctx *ctx,
 		struct fd f;
 
 		f = fdget(p->wq_fd);
-		if (!f.file)
+		if (!fd_file(f))
 			return -ENXIO;
-		if (!io_is_uring_fops(f.file)) {
+		if (!io_is_uring_fops(fd_file(f))) {
 			fdput(f);
 			return -EINVAL;
 		}
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index a7cbd69efbef..34fa0bd8bb11 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1085,20 +1085,20 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
 	audit_mq_sendrecv(mqdes, msg_len, msg_prio, ts);
 
 	f = fdget(mqdes);
-	if (unlikely(!f.file)) {
+	if (unlikely(!fd_file(f))) {
 		ret = -EBADF;
 		goto out;
 	}
 
-	inode = file_inode(f.file);
-	if (unlikely(f.file->f_op != &mqueue_file_operations)) {
+	inode = file_inode(fd_file(f));
+	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
 		ret = -EBADF;
 		goto out_fput;
 	}
 	info = MQUEUE_I(inode);
-	audit_file(f.file);
+	audit_file(fd_file(f));
 
-	if (unlikely(!(f.file->f_mode & FMODE_WRITE))) {
+	if (unlikely(!(fd_file(f)->f_mode & FMODE_WRITE))) {
 		ret = -EBADF;
 		goto out_fput;
 	}
@@ -1138,7 +1138,7 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
 	}
 
 	if (info->attr.mq_curmsgs == info->attr.mq_maxmsg) {
-		if (f.file->f_flags & O_NONBLOCK) {
+		if (fd_file(f)->f_flags & O_NONBLOCK) {
 			ret = -EAGAIN;
 		} else {
 			wait.task = current;
@@ -1199,20 +1199,20 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
 	audit_mq_sendrecv(mqdes, msg_len, 0, ts);
 
 	f = fdget(mqdes);
-	if (unlikely(!f.file)) {
+	if (unlikely(!fd_file(f))) {
 		ret = -EBADF;
 		goto out;
 	}
 
-	inode = file_inode(f.file);
-	if (unlikely(f.file->f_op != &mqueue_file_operations)) {
+	inode = file_inode(fd_file(f));
+	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
 		ret = -EBADF;
 		goto out_fput;
 	}
 	info = MQUEUE_I(inode);
-	audit_file(f.file);
+	audit_file(fd_file(f));
 
-	if (unlikely(!(f.file->f_mode & FMODE_READ))) {
+	if (unlikely(!(fd_file(f)->f_mode & FMODE_READ))) {
 		ret = -EBADF;
 		goto out_fput;
 	}
@@ -1242,7 +1242,7 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
 	}
 
 	if (info->attr.mq_curmsgs == 0) {
-		if (f.file->f_flags & O_NONBLOCK) {
+		if (fd_file(f)->f_flags & O_NONBLOCK) {
 			spin_unlock(&info->lock);
 			ret = -EAGAIN;
 		} else {
@@ -1356,11 +1356,11 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 			/* and attach it to the socket */
 retry:
 			f = fdget(notification->sigev_signo);
-			if (!f.file) {
+			if (!fd_file(f)) {
 				ret = -EBADF;
 				goto out;
 			}
-			sock = netlink_getsockbyfilp(f.file);
+			sock = netlink_getsockbyfilp(fd_file(f));
 			fdput(f);
 			if (IS_ERR(sock)) {
 				ret = PTR_ERR(sock);
@@ -1379,13 +1379,13 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 	}
 
 	f = fdget(mqdes);
-	if (!f.file) {
+	if (!fd_file(f)) {
 		ret = -EBADF;
 		goto out;
 	}
 
-	inode = file_inode(f.file);
-	if (unlikely(f.file->f_op != &mqueue_file_operations)) {
+	inode = file_inode(fd_file(f));
+	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
 		ret = -EBADF;
 		goto out_fput;
 	}
@@ -1460,31 +1460,31 @@ static int do_mq_getsetattr(int mqdes, struct mq_attr *new, struct mq_attr *old)
 		return -EINVAL;
 
 	f = fdget(mqdes);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	if (unlikely(f.file->f_op != &mqueue_file_operations)) {
+	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
 		fdput(f);
 		return -EBADF;
 	}
 
-	inode = file_inode(f.file);
+	inode = file_inode(fd_file(f));
 	info = MQUEUE_I(inode);
 
 	spin_lock(&info->lock);
 
 	if (old) {
 		*old = info->attr;
-		old->mq_flags = f.file->f_flags & O_NONBLOCK;
+		old->mq_flags = fd_file(f)->f_flags & O_NONBLOCK;
 	}
 	if (new) {
 		audit_mq_getsetattr(mqdes, new);
-		spin_lock(&f.file->f_lock);
+		spin_lock(&fd_file(f)->f_lock);
 		if (new->mq_flags & O_NONBLOCK)
-			f.file->f_flags |= O_NONBLOCK;
+			fd_file(f)->f_flags |= O_NONBLOCK;
 		else
-			f.file->f_flags &= ~O_NONBLOCK;
-		spin_unlock(&f.file->f_lock);
+			fd_file(f)->f_flags &= ~O_NONBLOCK;
+		spin_unlock(&fd_file(f)->f_lock);
 
 		inode_set_atime_to_ts(inode, inode_set_ctime_current(inode));
 	}
diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c
index b0ef45db207c..0a79aee6523d 100644
--- a/kernel/bpf/bpf_inode_storage.c
+++ b/kernel/bpf/bpf_inode_storage.c
@@ -80,10 +80,10 @@ static void *bpf_fd_inode_storage_lookup_elem(struct bpf_map *map, void *key)
 	struct bpf_local_storage_data *sdata;
 	struct fd f = fdget_raw(*(int *)key);
 
-	if (!f.file)
+	if (!fd_file(f))
 		return ERR_PTR(-EBADF);
 
-	sdata = inode_storage_lookup(file_inode(f.file), map, true);
+	sdata = inode_storage_lookup(file_inode(fd_file(f)), map, true);
 	fdput(f);
 	return sdata ? sdata->data : NULL;
 }
@@ -94,14 +94,14 @@ static long bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
 	struct bpf_local_storage_data *sdata;
 	struct fd f = fdget_raw(*(int *)key);
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
-	if (!inode_storage_ptr(file_inode(f.file))) {
+	if (!inode_storage_ptr(file_inode(fd_file(f)))) {
 		fdput(f);
 		return -EBADF;
 	}
 
-	sdata = bpf_local_storage_update(file_inode(f.file),
+	sdata = bpf_local_storage_update(file_inode(fd_file(f)),
 					 (struct bpf_local_storage_map *)map,
 					 value, map_flags, GFP_ATOMIC);
 	fdput(f);
@@ -126,10 +126,10 @@ static long bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
 	struct fd f = fdget_raw(*(int *)key);
 	int err;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	err = inode_storage_delete(file_inode(f.file), map);
+	err = inode_storage_delete(file_inode(fd_file(f)), map);
 	fdput(f);
 	return err;
 }
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 520f49f422fe..4de1e3dc2284 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -7677,15 +7677,15 @@ struct btf *btf_get_by_fd(int fd)
 
 	f = fdget(fd);
 
-	if (!f.file)
+	if (!fd_file(f))
 		return ERR_PTR(-EBADF);
 
-	if (f.file->f_op != &btf_fops) {
+	if (fd_file(f)->f_op != &btf_fops) {
 		fdput(f);
 		return ERR_PTR(-EINVAL);
 	}
 
-	btf = f.file->private_data;
+	btf = fd_file(f)->private_data;
 	refcount_inc(&btf->refcnt);
 	fdput(f);
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bf6c5f685ea2..3093bf2cc266 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -829,7 +829,7 @@ static int bpf_map_release(struct inode *inode, struct file *filp)
 
 static fmode_t map_get_sys_perms(struct bpf_map *map, struct fd f)
 {
-	fmode_t mode = f.file->f_mode;
+	fmode_t mode = fd_file(f)->f_mode;
 
 	/* Our file permissions may have been overridden by global
 	 * map permissions facing syscall side.
@@ -1423,14 +1423,14 @@ static int map_create(union bpf_attr *attr)
  */
 struct bpf_map *__bpf_map_get(struct fd f)
 {
-	if (!f.file)
+	if (!fd_file(f))
 		return ERR_PTR(-EBADF);
-	if (f.file->f_op != &bpf_map_fops) {
+	if (fd_file(f)->f_op != &bpf_map_fops) {
 		fdput(f);
 		return ERR_PTR(-EINVAL);
 	}
 
-	return f.file->private_data;
+	return fd_file(f)->private_data;
 }
 
 void bpf_map_inc(struct bpf_map *map)
@@ -1651,7 +1651,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
 		goto free_key;
 	}
 
-	err = bpf_map_update_value(map, f.file, key, value, attr->flags);
+	err = bpf_map_update_value(map, fd_file(f), key, value, attr->flags);
 	if (!err)
 		maybe_wait_bpf_programs(map);
 
@@ -2409,14 +2409,14 @@ int bpf_prog_new_fd(struct bpf_prog *prog)
 
 static struct bpf_prog *____bpf_prog_get(struct fd f)
 {
-	if (!f.file)
+	if (!fd_file(f))
 		return ERR_PTR(-EBADF);
-	if (f.file->f_op != &bpf_prog_fops) {
+	if (fd_file(f)->f_op != &bpf_prog_fops) {
 		fdput(f);
 		return ERR_PTR(-EINVAL);
 	}
 
-	return f.file->private_data;
+	return fd_file(f)->private_data;
 }
 
 void bpf_prog_add(struct bpf_prog *prog, int i)
@@ -3259,14 +3259,14 @@ struct bpf_link *bpf_link_get_from_fd(u32 ufd)
 	struct fd f = fdget(ufd);
 	struct bpf_link *link;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return ERR_PTR(-EBADF);
-	if (f.file->f_op != &bpf_link_fops && f.file->f_op != &bpf_link_fops_poll) {
+	if (fd_file(f)->f_op != &bpf_link_fops && fd_file(f)->f_op != &bpf_link_fops_poll) {
 		fdput(f);
 		return ERR_PTR(-EINVAL);
 	}
 
-	link = f.file->private_data;
+	link = fd_file(f)->private_data;
 	bpf_link_inc(link);
 	fdput(f);
 
@@ -4982,19 +4982,19 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
 		return -EINVAL;
 
 	f = fdget(ufd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADFD;
 
-	if (f.file->f_op == &bpf_prog_fops)
-		err = bpf_prog_get_info_by_fd(f.file, f.file->private_data, attr,
+	if (fd_file(f)->f_op == &bpf_prog_fops)
+		err = bpf_prog_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr,
 					      uattr);
-	else if (f.file->f_op == &bpf_map_fops)
-		err = bpf_map_get_info_by_fd(f.file, f.file->private_data, attr,
+	else if (fd_file(f)->f_op == &bpf_map_fops)
+		err = bpf_map_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr,
 					     uattr);
-	else if (f.file->f_op == &btf_fops)
-		err = bpf_btf_get_info_by_fd(f.file, f.file->private_data, attr, uattr);
-	else if (f.file->f_op == &bpf_link_fops || f.file->f_op == &bpf_link_fops_poll)
-		err = bpf_link_get_info_by_fd(f.file, f.file->private_data,
+	else if (fd_file(f)->f_op == &btf_fops)
+		err = bpf_btf_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr, uattr);
+	else if (fd_file(f)->f_op == &bpf_link_fops || fd_file(f)->f_op == &bpf_link_fops_poll)
+		err = bpf_link_get_info_by_fd(fd_file(f), fd_file(f)->private_data,
 					      attr, uattr);
 	else
 		err = -EINVAL;
@@ -5215,7 +5215,7 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
 	else if (cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH)
 		BPF_DO_BATCH(map->ops->map_lookup_and_delete_batch, map, attr, uattr);
 	else if (cmd == BPF_MAP_UPDATE_BATCH)
-		BPF_DO_BATCH(map->ops->map_update_batch, map, f.file, attr, uattr);
+		BPF_DO_BATCH(map->ops->map_update_batch, map, fd_file(f), attr, uattr);
 	else
 		BPF_DO_BATCH(map->ops->map_delete_batch, map, attr, uattr);
 err_put:
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index d6ccf8d00eab..9a1d356e79ed 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -122,10 +122,10 @@ int bpf_token_create(union bpf_attr *attr)
 	int err, fd;
 
 	f = fdget(attr->token_create.bpffs_fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	path = f.file->f_path;
+	path = fd_file(f)->f_path;
 	path_get(&path);
 	fdput(f);
 
@@ -235,14 +235,14 @@ struct bpf_token *bpf_token_get_from_fd(u32 ufd)
 	struct fd f = fdget(ufd);
 	struct bpf_token *token;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return ERR_PTR(-EBADF);
-	if (f.file->f_op != &bpf_token_fops) {
+	if (fd_file(f)->f_op != &bpf_token_fops) {
 		fdput(f);
 		return ERR_PTR(-EINVAL);
 	}
 
-	token = f.file->private_data;
+	token = fd_file(f)->private_data;
 	bpf_token_inc(token);
 	fdput(f);
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c8e4b62b436a..b96489277f70 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6901,10 +6901,10 @@ struct cgroup *cgroup_v1v2_get_from_fd(int fd)
 {
 	struct cgroup *cgrp;
 	struct fd f = fdget_raw(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return ERR_PTR(-EBADF);
 
-	cgrp = cgroup_v1v2_get_from_file(f.file);
+	cgrp = cgroup_v1v2_get_from_file(fd_file(f));
 	fdput(f);
 	return cgrp;
 }
diff --git a/kernel/events/core.c b/kernel/events/core.c
index aa3450bdc227..17b19d3e74ba 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -933,10 +933,10 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
 	struct fd f = fdget(fd);
 	int ret = 0;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	css = css_tryget_online_from_dir(f.file->f_path.dentry,
+	css = css_tryget_online_from_dir(fd_file(f)->f_path.dentry,
 					 &perf_event_cgrp_subsys);
 	if (IS_ERR(css)) {
 		ret = PTR_ERR(css);
@@ -5898,10 +5898,10 @@ static const struct file_operations perf_fops;
 static inline int perf_fget_light(int fd, struct fd *p)
 {
 	struct fd f = fdget(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	if (f.file->f_op != &perf_fops) {
+	if (fd_file(f)->f_op != &perf_fops) {
 		fdput(f);
 		return -EBADF;
 	}
@@ -5961,7 +5961,7 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 			ret = perf_fget_light(arg, &output);
 			if (ret)
 				return ret;
-			output_event = output.file->private_data;
+			output_event = fd_file(output)->private_data;
 			ret = perf_event_set_output(event, output_event);
 			fdput(output);
 		} else {
@@ -12549,7 +12549,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		err = perf_fget_light(group_fd, &group);
 		if (err)
 			goto err_fd;
-		group_leader = group.file->private_data;
+		group_leader = fd_file(group)->private_data;
 		if (flags & PERF_FLAG_FD_OUTPUT)
 			output_event = group_leader;
 		if (flags & PERF_FLAG_FD_NO_GROUP)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index d9592195c5bb..6ed334eecc14 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3211,7 +3211,7 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
 		return -EINVAL;
 
 	f = fdget(fd);
-	err = idempotent_init_module(f.file, uargs, flags);
+	err = idempotent_init_module(fd_file(f), uargs, flags);
 	fdput(f);
 	return err;
 }
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 6ec3deec68c2..dc952c3b05af 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -550,15 +550,15 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags)
 	struct nsset nsset = {};
 	int err = 0;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	if (proc_ns_file(f.file)) {
-		ns = get_proc_ns(file_inode(f.file));
+	if (proc_ns_file(fd_file(f))) {
+		ns = get_proc_ns(file_inode(fd_file(f)));
 		if (flags && (ns->ops->type != flags))
 			err = -EINVAL;
 		flags = ns->ops->type;
-	} else if (!IS_ERR(pidfd_pid(f.file))) {
+	} else if (!IS_ERR(pidfd_pid(fd_file(f)))) {
 		err = check_setns_flags(flags);
 	} else {
 		err = -EINVAL;
@@ -570,10 +570,10 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags)
 	if (err)
 		goto out;
 
-	if (proc_ns_file(f.file))
+	if (proc_ns_file(fd_file(f)))
 		err = validate_ns(&nsset, ns);
 	else
-		err = validate_nsset(&nsset, pidfd_pid(f.file));
+		err = validate_nsset(&nsset, pidfd_pid(fd_file(f)));
 	if (!err) {
 		commit_nsset(&nsset);
 		perf_event_namespaces(current);
diff --git a/kernel/pid.c b/kernel/pid.c
index da76ed1873f7..2715afb77eab 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -540,13 +540,13 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
 	struct pid *pid;
 
 	f = fdget(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return ERR_PTR(-EBADF);
 
-	pid = pidfd_pid(f.file);
+	pid = pidfd_pid(fd_file(f));
 	if (!IS_ERR(pid)) {
 		get_pid(pid);
-		*flags = f.file->f_flags;
+		*flags = fd_file(f)->f_flags;
 	}
 
 	fdput(f);
@@ -755,10 +755,10 @@ SYSCALL_DEFINE3(pidfd_getfd, int, pidfd, int, fd,
 		return -EINVAL;
 
 	f = fdget(pidfd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	pid = pidfd_pid(f.file);
+	pid = pidfd_pid(fd_file(f));
 	if (IS_ERR(pid))
 		ret = PTR_ERR(pid);
 	else
diff --git a/kernel/signal.c b/kernel/signal.c
index 60c737e423a1..cc5d87cfa7c0 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3922,11 +3922,11 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 		return -EINVAL;
 
 	f = fdget(pidfd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
 	/* Is this a pidfd? */
-	pid = pidfd_to_pid(f.file);
+	pid = pidfd_to_pid(fd_file(f));
 	if (IS_ERR(pid)) {
 		ret = PTR_ERR(pid);
 		goto err;
@@ -3939,7 +3939,7 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 	switch (flags) {
 	case 0:
 		/* Infer scope from the type of pidfd. */
-		if (f.file->f_flags & PIDFD_THREAD)
+		if (fd_file(f)->f_flags & PIDFD_THREAD)
 			type = PIDTYPE_PID;
 		else
 			type = PIDTYPE_TGID;
diff --git a/kernel/sys.c b/kernel/sys.c
index 3a2df1bd9f64..a4be1e568ff5 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1916,10 +1916,10 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 	int err;
 
 	exe = fdget(fd);
-	if (!exe.file)
+	if (!fd_file(exe))
 		return -EBADF;
 
-	inode = file_inode(exe.file);
+	inode = file_inode(fd_file(exe));
 
 	/*
 	 * Because the original mm->exe_file points to executable file, make
@@ -1927,14 +1927,14 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 	 * overall picture.
 	 */
 	err = -EACCES;
-	if (!S_ISREG(inode->i_mode) || path_noexec(&exe.file->f_path))
+	if (!S_ISREG(inode->i_mode) || path_noexec(&fd_file(exe)->f_path))
 		goto exit;
 
-	err = file_permission(exe.file, MAY_EXEC);
+	err = file_permission(fd_file(exe), MAY_EXEC);
 	if (err)
 		goto exit;
 
-	err = replace_mm_exe_file(mm, exe.file);
+	err = replace_mm_exe_file(mm, fd_file(exe));
 exit:
 	fdput(exe);
 	return err;
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 4354ea231fab..0700f40c53ac 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -419,7 +419,7 @@ static int cgroupstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
 
 	fd = nla_get_u32(info->attrs[CGROUPSTATS_CMD_ATTR_FD]);
 	f = fdget(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return 0;
 
 	size = nla_total_size(sizeof(struct cgroupstats));
@@ -440,7 +440,7 @@ static int cgroupstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
 	stats = nla_data(na);
 	memset(stats, 0, sizeof(*stats));
 
-	rc = cgroupstats_build(stats, f.file->f_path.dentry);
+	rc = cgroupstats_build(stats, fd_file(f)->f_path.dentry);
 	if (rc < 0) {
 		nlmsg_free(rep_skb);
 		goto err;
diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
index 03b90d7d2175..d36242fd4936 100644
--- a/kernel/watch_queue.c
+++ b/kernel/watch_queue.c
@@ -666,8 +666,8 @@ struct watch_queue *get_watch_queue(int fd)
 	struct fd f;
 
 	f = fdget(fd);
-	if (f.file) {
-		pipe = get_pipe_info(f.file, false);
+	if (fd_file(f)) {
+		pipe = get_pipe_info(fd_file(f), false);
 		if (pipe && pipe->watch_queue) {
 			wqueue = pipe->watch_queue;
 			kref_get(&wqueue->usage);
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 6c39d42f16dc..532dee205c6e 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -193,10 +193,10 @@ int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
 	struct fd f = fdget(fd);
 	int ret;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
-	ret = vfs_fadvise(f.file, offset, len, advice);
+	ret = vfs_fadvise(fd_file(f), offset, len, advice);
 
 	fdput(f);
 	return ret;
diff --git a/mm/filemap.c b/mm/filemap.c
index d62150418b91..0b5cbd644fdd 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4393,7 +4393,7 @@ SYSCALL_DEFINE4(cachestat, unsigned int, fd,
 	struct cachestat cs;
 	pgoff_t first_index, last_index;
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
 	if (copy_from_user(&csr, cstat_range,
@@ -4403,7 +4403,7 @@ SYSCALL_DEFINE4(cachestat, unsigned int, fd,
 	}
 
 	/* hugetlbfs is not supported */
-	if (is_file_hugepages(f.file)) {
+	if (is_file_hugepages(fd_file(f))) {
 		fdput(f);
 		return -EOPNOTSUPP;
 	}
@@ -4417,7 +4417,7 @@ SYSCALL_DEFINE4(cachestat, unsigned int, fd,
 	last_index =
 		csr.len == 0 ? ULONG_MAX : (csr.off + csr.len - 1) >> PAGE_SHIFT;
 	memset(&cs, 0, sizeof(struct cachestat));
-	mapping = f.file->f_mapping;
+	mapping = fd_file(f)->f_mapping;
 	filemap_cachestat(mapping, first_index, last_index, &cs);
 	fdput(f);
 
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 417c96f2da28..9725c731fb21 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -1860,26 +1860,26 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	INIT_WORK(&event->remove, memcg_event_remove);
 
 	efile = fdget(efd);
-	if (!efile.file) {
+	if (!fd_file(efile)) {
 		ret = -EBADF;
 		goto out_kfree;
 	}
 
-	event->eventfd = eventfd_ctx_fileget(efile.file);
+	event->eventfd = eventfd_ctx_fileget(fd_file(efile));
 	if (IS_ERR(event->eventfd)) {
 		ret = PTR_ERR(event->eventfd);
 		goto out_put_efile;
 	}
 
 	cfile = fdget(cfd);
-	if (!cfile.file) {
+	if (!fd_file(cfile)) {
 		ret = -EBADF;
 		goto out_put_eventfd;
 	}
 
 	/* the process need read permission on control file */
 	/* AV: shouldn't we check that it's been opened for read instead? */
-	ret = file_permission(cfile.file, MAY_READ);
+	ret = file_permission(fd_file(cfile), MAY_READ);
 	if (ret < 0)
 		goto out_put_cfile;
 
@@ -1887,7 +1887,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	 * The control file must be a regular cgroup1 file. As a regular cgroup
 	 * file can't be renamed, it's safe to access its name afterwards.
 	 */
-	cdentry = cfile.file->f_path.dentry;
+	cdentry = fd_file(cfile)->f_path.dentry;
 	if (cdentry->d_sb->s_type != &cgroup_fs_type || !d_is_reg(cdentry)) {
 		ret = -EINVAL;
 		goto out_put_cfile;
@@ -1939,7 +1939,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	if (ret)
 		goto out_put_css;
 
-	vfs_poll(efile.file, &event->pt);
+	vfs_poll(fd_file(efile), &event->pt);
 
 	spin_lock_irq(&memcg->event_list_lock);
 	list_add(&event->list, &memcg->event_list);
diff --git a/mm/readahead.c b/mm/readahead.c
index 517c0be7ce66..e83fe1c6e5ac 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -646,7 +646,7 @@ ssize_t ksys_readahead(int fd, loff_t offset, size_t count)
 
 	ret = -EBADF;
 	f = fdget(fd);
-	if (!f.file || !(f.file->f_mode & FMODE_READ))
+	if (!fd_file(f) || !(fd_file(f)->f_mode & FMODE_READ))
 		goto out;
 
 	/*
@@ -655,12 +655,12 @@ ssize_t ksys_readahead(int fd, loff_t offset, size_t count)
 	 * on this file, then we must return -EINVAL.
 	 */
 	ret = -EINVAL;
-	if (!f.file->f_mapping || !f.file->f_mapping->a_ops ||
-	    (!S_ISREG(file_inode(f.file)->i_mode) &&
-	    !S_ISBLK(file_inode(f.file)->i_mode)))
+	if (!fd_file(f)->f_mapping || !fd_file(f)->f_mapping->a_ops ||
+	    (!S_ISREG(file_inode(fd_file(f))->i_mode) &&
+	    !S_ISBLK(file_inode(fd_file(f))->i_mode)))
 		goto out;
 
-	ret = vfs_fadvise(f.file, offset, count, POSIX_FADV_WILLNEED);
+	ret = vfs_fadvise(fd_file(f), offset, count, POSIX_FADV_WILLNEED);
 out:
 	fdput(f);
 	return ret;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 6a823ba906c6..18b7c8f31234 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -711,11 +711,11 @@ struct net *get_net_ns_by_fd(int fd)
 	struct fd f = fdget(fd);
 	struct net *net = ERR_PTR(-EINVAL);
 
-	if (!f.file)
+	if (!fd_file(f))
 		return ERR_PTR(-EBADF);
 
-	if (proc_ns_file(f.file)) {
-		struct ns_common *ns = get_proc_ns(file_inode(f.file));
+	if (proc_ns_file(fd_file(f))) {
+		struct ns_common *ns = get_proc_ns(file_inode(fd_file(f)));
 		if (ns->ops == &netns_operations)
 			net = get_net(container_of(ns, struct net, ns));
 	}
diff --git a/net/socket.c b/net/socket.c
index fcbdd5bc47ac..f77a42a74510 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -556,8 +556,8 @@ static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed)
 	struct socket *sock;
 
 	*err = -EBADF;
-	if (f.file) {
-		sock = sock_from_file(f.file);
+	if (fd_file(f)) {
+		sock = sock_from_file(fd_file(f));
 		if (likely(sock)) {
 			*fput_needed = f.flags & FDPUT_FPUT;
 			return sock;
@@ -2008,8 +2008,8 @@ int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
 	struct fd f;
 
 	f = fdget(fd);
-	if (f.file) {
-		ret = __sys_accept4_file(f.file, upeer_sockaddr,
+	if (fd_file(f)) {
+		ret = __sys_accept4_file(fd_file(f), upeer_sockaddr,
 					 upeer_addrlen, flags);
 		fdput(f);
 	}
@@ -2070,12 +2070,12 @@ int __sys_connect(int fd, struct sockaddr __user *uservaddr, int addrlen)
 	struct fd f;
 
 	f = fdget(fd);
-	if (f.file) {
+	if (fd_file(f)) {
 		struct sockaddr_storage address;
 
 		ret = move_addr_to_kernel(uservaddr, addrlen, &address);
 		if (!ret)
-			ret = __sys_connect_file(f.file, &address, addrlen, 0);
+			ret = __sys_connect_file(fd_file(f), &address, addrlen, 0);
 		fdput(f);
 	}
 
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index f04f43af651c..e7c1d3ae33fe 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -1068,10 +1068,10 @@ void ima_kexec_cmdline(int kernel_fd, const void *buf, int size)
 		return;
 
 	f = fdget(kernel_fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return;
 
-	process_buffer_measurement(file_mnt_idmap(f.file), file_inode(f.file),
+	process_buffer_measurement(file_mnt_idmap(fd_file(f)), file_inode(fd_file(f)),
 				   buf, size, "kexec-cmdline", KEXEC_CMDLINE, 0,
 				   NULL, false, NULL, 0);
 	fdput(f);
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index ccc8bc6c1584..00b63971ab64 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -238,19 +238,19 @@ static struct landlock_ruleset *get_ruleset_from_fd(const int fd,
 	struct landlock_ruleset *ruleset;
 
 	ruleset_f = fdget(fd);
-	if (!ruleset_f.file)
+	if (!fd_file(ruleset_f))
 		return ERR_PTR(-EBADF);
 
 	/* Checks FD type and access right. */
-	if (ruleset_f.file->f_op != &ruleset_fops) {
+	if (fd_file(ruleset_f)->f_op != &ruleset_fops) {
 		ruleset = ERR_PTR(-EBADFD);
 		goto out_fdput;
 	}
-	if (!(ruleset_f.file->f_mode & mode)) {
+	if (!(fd_file(ruleset_f)->f_mode & mode)) {
 		ruleset = ERR_PTR(-EPERM);
 		goto out_fdput;
 	}
-	ruleset = ruleset_f.file->private_data;
+	ruleset = fd_file(ruleset_f)->private_data;
 	if (WARN_ON_ONCE(ruleset->num_layers != 1)) {
 		ruleset = ERR_PTR(-EINVAL);
 		goto out_fdput;
@@ -277,22 +277,22 @@ static int get_path_from_fd(const s32 fd, struct path *const path)
 
 	/* Handles O_PATH. */
 	f = fdget_raw(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 	/*
 	 * Forbids ruleset FDs, internal filesystems (e.g. nsfs), including
 	 * pseudo filesystems that will never be mountable (e.g. sockfs,
 	 * pipefs).
 	 */
-	if ((f.file->f_op == &ruleset_fops) ||
-	    (f.file->f_path.mnt->mnt_flags & MNT_INTERNAL) ||
-	    (f.file->f_path.dentry->d_sb->s_flags & SB_NOUSER) ||
-	    d_is_negative(f.file->f_path.dentry) ||
-	    IS_PRIVATE(d_backing_inode(f.file->f_path.dentry))) {
+	if ((fd_file(f)->f_op == &ruleset_fops) ||
+	    (fd_file(f)->f_path.mnt->mnt_flags & MNT_INTERNAL) ||
+	    (fd_file(f)->f_path.dentry->d_sb->s_flags & SB_NOUSER) ||
+	    d_is_negative(fd_file(f)->f_path.dentry) ||
+	    IS_PRIVATE(d_backing_inode(fd_file(f)->f_path.dentry))) {
 		err = -EBADFD;
 		goto out_fdput;
 	}
-	*path = f.file->f_path;
+	*path = fd_file(f)->f_path;
 	path_get(path);
 
 out_fdput:
diff --git a/security/loadpin/loadpin.c b/security/loadpin/loadpin.c
index 93fd4d47b334..02144ec39f43 100644
--- a/security/loadpin/loadpin.c
+++ b/security/loadpin/loadpin.c
@@ -296,7 +296,7 @@ static int read_trusted_verity_root_digests(unsigned int fd)
 		return -EPERM;
 
 	f = fdget(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EINVAL;
 
 	data = kzalloc(SZ_4K, GFP_KERNEL);
@@ -305,7 +305,7 @@ static int read_trusted_verity_root_digests(unsigned int fd)
 		goto err;
 	}
 
-	rc = kernel_read_file(f.file, 0, (void **)&data, SZ_4K - 1, NULL, READING_POLICY);
+	rc = kernel_read_file(fd_file(f), 0, (void **)&data, SZ_4K - 1, NULL, READING_POLICY);
 	if (rc < 0)
 		goto err;
 
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index 4057f9f10aee..cbb9c972cb93 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -2250,12 +2250,12 @@ static int snd_pcm_link(struct snd_pcm_substream *substream, int fd)
 	bool nonatomic = substream->pcm->nonatomic;
 	CLASS(fd, f)(fd);
 
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADFD;
-	if (!is_pcm_file(f.file))
+	if (!is_pcm_file(fd_file(f)))
 		return -EBADFD;
 
-	pcm_file = f.file->private_data;
+	pcm_file = fd_file(f)->private_data;
 	substream1 = pcm_file->substream;
 
 	if (substream == substream1)
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 229570059a1b..65efb3735e79 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -327,12 +327,12 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	seqcount_spinlock_init(&irqfd->irq_entry_sc, &kvm->irqfds.lock);
 
 	f = fdget(args->fd);
-	if (!f.file) {
+	if (!fd_file(f)) {
 		ret = -EBADF;
 		goto out;
 	}
 
-	eventfd = eventfd_ctx_fileget(f.file);
+	eventfd = eventfd_ctx_fileget(fd_file(f));
 	if (IS_ERR(eventfd)) {
 		ret = PTR_ERR(eventfd);
 		goto fail;
@@ -419,7 +419,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	 * Check if there was an event already pending on the eventfd
 	 * before we registered, and trigger it as if we didn't miss it.
 	 */
-	events = vfs_poll(f.file, &irqfd->pt);
+	events = vfs_poll(fd_file(f), &irqfd->pt);
 
 	if (events & EPOLLIN)
 		schedule_work(&irqfd->inject);
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 76b7f6085dcd..388ae471d258 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -194,7 +194,7 @@ static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
 	int ret;
 
 	f = fdget(fd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
 	ret = -ENOENT;
@@ -202,7 +202,7 @@ static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
 	mutex_lock(&kv->lock);
 
 	list_for_each_entry(kvf, &kv->file_list, node) {
-		if (kvf->file != f.file)
+		if (kvf->file != fd_file(f))
 			continue;
 
 		list_del(&kvf->node);
@@ -240,7 +240,7 @@ static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
 		return -EFAULT;
 
 	f = fdget(param.groupfd);
-	if (!f.file)
+	if (!fd_file(f))
 		return -EBADF;
 
 	ret = -ENOENT;
@@ -248,7 +248,7 @@ static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
 	mutex_lock(&kv->lock);
 
 	list_for_each_entry(kvf, &kv->file_list, node) {
-		if (kvf->file != f.file)
+		if (kvf->file != fd_file(f))
 			continue;
 
 		if (!kvf->iommu_group) {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 02/39] introduce fd_file(), convert all accessors to it.
  2024-07-30  5:15   ` [PATCH 02/39] introduce fd_file(), convert all accessors to it viro
@ 2024-08-07  9:55     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07  9:55 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:15:48AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> 	For any changes of struct fd representation we need to
> turn existing accesses to fields into calls of wrappers.
> Accesses to struct fd::flags are very few (3 in linux/file.h,
> 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
> explicit initializers).
> 	Those can be dealt with in the commit converting to
> new layout; accesses to struct fd::file are too many for that.
> 	This commit converts (almost) all of f.file to
> fd_file(f).  It's not entirely mechanical ('file' is used as
> a member name more than just in struct fd) and it does not
> even attempt to distinguish the uses in pointer context from
> those in boolean context; the latter will be eventually turned
> into a separate helper (fd_empty()).
> 
> 	NOTE: mass conversion to fd_empty(), tempting as it
> might be, is a bad idea; better do that piecewise in commit
> that convert from fdget...() to CLASS(...).
> 
> [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
> caught by git; fs/stat.c one got caught by git grep]
> [fs/xattr.c conflict]
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 03/39] struct fd: representation change
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
  2024-07-30  5:15   ` [PATCH 02/39] introduce fd_file(), convert all accessors to it viro
@ 2024-07-30  5:15   ` viro
  2024-07-30 18:10     ` Josef Bacik
  2024-08-07 10:03     ` Christian Brauner
  2024-07-30  5:15   ` [PATCH 04/39] add struct fd constructors, get rid of __to_fd() viro
                     ` (36 subsequent siblings)
  38 siblings, 2 replies; 134+ messages in thread
From: viro @ 2024-07-30  5:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

	The absolute majority of instances comes from fdget() and its
relatives; the underlying primitives actually return a struct file
reference and a couple of flags encoded into an unsigned long - the lower
two bits of file address are always zero, so we can stash the flags
into those.  On the way out we use __to_fd() to unpack that unsigned
long into struct fd.

	Let's use that representation for struct fd itself - make it
a structure with a single unsigned long member (.word), with the value
equal either to (unsigned long)p | flags, p being an address of some
struct file instance, or to 0 for an empty fd.

	Note that we never used a struct fd instance with NULL ->file
and non-zero ->flags; the emptiness had been checked as (!f.file) and
we expected e.g. fdput(empty) to be a no-op.  With new representation
we can use (!f.word) for emptiness check; that is enough for compiler
to figure out that (f.word & FDPUT_FPUT) will be false and that fdput(f)
will be a no-op in such case.

	For now the new predicate (fd_empty(f)) has no users; all the
existing checks have form (!fd_file(f)).  We will convert to fd_empty()
use later; here we only define it (and tell the compiler that it's
unlikely to return true).

	This commit only deals with representation change; there will
be followups.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/infiniband/core/uverbs_cmd.c |  2 +-
 fs/overlayfs/file.c                  | 28 +++++++++++++++-------------
 fs/xfs/xfs_handle.c                  |  2 +-
 include/linux/file.h                 | 22 ++++++++++++++++------
 kernel/events/core.c                 |  2 +-
 net/socket.c                         |  2 +-
 6 files changed, 35 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 3f85575cf971..a4cce360df21 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -572,7 +572,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
 	struct inode                   *inode = NULL;
 	int				new_xrcd = 0;
 	struct ib_device *ib_dev;
-	struct fd f = {};
+	struct fd f = EMPTY_FD;
 	int ret;
 
 	ret = uverbs_request(attrs, &cmd, sizeof(cmd));
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index c4963d0c5549..2b7a5a3a7a2f 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -93,11 +93,11 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
 			       bool allow_meta)
 {
 	struct dentry *dentry = file_dentry(file);
+	struct file *realfile = file->private_data;
 	struct path realpath;
 	int err;
 
-	real->flags = 0;
-	real->file = file->private_data;
+	real->word = (unsigned long)realfile;
 
 	if (allow_meta) {
 		ovl_path_real(dentry, &realpath);
@@ -113,16 +113,17 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
 		return -EIO;
 
 	/* Has it been copied up since we'd opened it? */
-	if (unlikely(file_inode(real->file) != d_inode(realpath.dentry))) {
-		real->flags = FDPUT_FPUT;
-		real->file = ovl_open_realfile(file, &realpath);
-
-		return PTR_ERR_OR_ZERO(real->file);
+	if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
+		struct file *f = ovl_open_realfile(file, &realpath);
+		if (IS_ERR(f))
+			return PTR_ERR(f);
+		real->word = (unsigned long)ovl_open_realfile(file, &realpath) | FDPUT_FPUT;
+		return 0;
 	}
 
 	/* Did the flags change since open? */
-	if (unlikely((file->f_flags ^ real->file->f_flags) & ~OVL_OPEN_FLAGS))
-		return ovl_change_flags(real->file, file->f_flags);
+	if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS))
+		return ovl_change_flags(realfile, file->f_flags);
 
 	return 0;
 }
@@ -130,10 +131,11 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
 static int ovl_real_fdget(const struct file *file, struct fd *real)
 {
 	if (d_is_dir(file_dentry(file))) {
-		real->flags = 0;
-		real->file = ovl_dir_real_file(file, false);
-
-		return PTR_ERR_OR_ZERO(real->file);
+		struct file *f = ovl_dir_real_file(file, false);
+		if (IS_ERR(f))
+			return PTR_ERR(f);
+		real->word = (unsigned long)f;
+		return 0;
 	}
 
 	return ovl_real_fdget_meta(file, real, false);
diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
index 7bcc4f519cb8..49e5e5f04e60 100644
--- a/fs/xfs/xfs_handle.c
+++ b/fs/xfs/xfs_handle.c
@@ -85,7 +85,7 @@ xfs_find_handle(
 	int			hsize;
 	xfs_handle_t		handle;
 	struct inode		*inode;
-	struct fd		f = {NULL};
+	struct fd		f = EMPTY_FD;
 	struct path		path;
 	int			error;
 	struct xfs_inode	*ip;
diff --git a/include/linux/file.h b/include/linux/file.h
index 0f3f369f2450..bdd6e1766839 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -35,18 +35,28 @@ static inline void fput_light(struct file *file, int fput_needed)
 		fput(file);
 }
 
+/* either a reference to struct file + flags
+ * (cloned vs. borrowed, pos locked), with
+ * flags stored in lower bits of value,
+ * or empty (represented by 0).
+ */
 struct fd {
-	struct file *file;
-	unsigned int flags;
+	unsigned long word;
 };
 #define FDPUT_FPUT       1
 #define FDPUT_POS_UNLOCK 2
 
-#define fd_file(f) ((f).file)
+#define fd_file(f) ((struct file *)((f).word & ~3))
+static inline bool fd_empty(struct fd f)
+{
+	return unlikely(!f.word);
+}
+
+#define EMPTY_FD (struct fd){0}
 
 static inline void fdput(struct fd fd)
 {
-	if (fd.flags & FDPUT_FPUT)
+	if (fd.word & FDPUT_FPUT)
 		fput(fd_file(fd));
 }
 
@@ -60,7 +70,7 @@ extern void __f_unlock_pos(struct file *);
 
 static inline struct fd __to_fd(unsigned long v)
 {
-	return (struct fd){(struct file *)(v & ~3),v & 3};
+	return (struct fd){v};
 }
 
 static inline struct fd fdget(unsigned int fd)
@@ -80,7 +90,7 @@ static inline struct fd fdget_pos(int fd)
 
 static inline void fdput_pos(struct fd f)
 {
-	if (f.flags & FDPUT_POS_UNLOCK)
+	if (f.word & FDPUT_POS_UNLOCK)
 		__f_unlock_pos(fd_file(f));
 	fdput(f);
 }
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 17b19d3e74ba..fd2ac9c7fd77 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12474,7 +12474,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	struct perf_event_attr attr;
 	struct perf_event_context *ctx;
 	struct file *event_file = NULL;
-	struct fd group = {NULL, 0};
+	struct fd group = EMPTY_FD;
 	struct task_struct *task = NULL;
 	struct pmu *pmu;
 	int event_fd;
diff --git a/net/socket.c b/net/socket.c
index f77a42a74510..c0d4f5032374 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -559,7 +559,7 @@ static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed)
 	if (fd_file(f)) {
 		sock = sock_from_file(fd_file(f));
 		if (likely(sock)) {
-			*fput_needed = f.flags & FDPUT_FPUT;
+			*fput_needed = f.word & FDPUT_FPUT;
 			return sock;
 		}
 		*err = -ENOTSOCK;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 03/39] struct fd: representation change
  2024-07-30  5:15   ` [PATCH 03/39] struct fd: representation change viro
@ 2024-07-30 18:10     ` Josef Bacik
  2024-08-07 10:07       ` Christian Brauner
  2024-08-07 10:03     ` Christian Brauner
  1 sibling, 1 reply; 134+ messages in thread
From: Josef Bacik @ 2024-07-30 18:10 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
	torvalds

On Tue, Jul 30, 2024 at 01:15:49AM -0400, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> 	The absolute majority of instances comes from fdget() and its
> relatives; the underlying primitives actually return a struct file
> reference and a couple of flags encoded into an unsigned long - the lower
> two bits of file address are always zero, so we can stash the flags
> into those.  On the way out we use __to_fd() to unpack that unsigned
> long into struct fd.
> 
> 	Let's use that representation for struct fd itself - make it
> a structure with a single unsigned long member (.word), with the value
> equal either to (unsigned long)p | flags, p being an address of some
> struct file instance, or to 0 for an empty fd.
> 
> 	Note that we never used a struct fd instance with NULL ->file
> and non-zero ->flags; the emptiness had been checked as (!f.file) and
> we expected e.g. fdput(empty) to be a no-op.  With new representation
> we can use (!f.word) for emptiness check; that is enough for compiler
> to figure out that (f.word & FDPUT_FPUT) will be false and that fdput(f)
> will be a no-op in such case.
> 
> 	For now the new predicate (fd_empty(f)) has no users; all the
> existing checks have form (!fd_file(f)).  We will convert to fd_empty()
> use later; here we only define it (and tell the compiler that it's
> unlikely to return true).
> 
> 	This commit only deals with representation change; there will
> be followups.

I'm still trawling through all of this code and trying to grok it, but one thing
I kept wondering was wtf would we do this ->word trick in the first place?  It
seemed needlessly complicated and I don't love having structs with inprecise
members.

But then buried deep in your cover letter you have this

"""
It's not that hard to deal with - the real primitives behind fdget()
et.al. are returning an unsigned long value, unpacked by (inlined)
__to_fd() into the current struct file * + int.  Linus suggested that
keeping that unsigned long around with the extractions done by inlined
accessors should generate a sane code and that turns out to be the case.
Turning struct fd into a struct-wrapped unsinged long, with
	fd_empty(f) => unlikely(f.word == 0)
	fd_file(f) => (struct file *)(f.word & ~3)
	fdput(f) => if (f.word & 1) fput(fd_file(f))
ends up with compiler doing the right thing.  The cost is the patch
footprint, of course - we need to switch f.file to fd_file(f) all over
the tree, and it's not doable with simple search and replace; there are
false positives, etc.  Might become a PITA at merge time; however,
actual update of that from 6.10-rc1 to 6.11-rc1 had brought surprisingly
few conflicts.
"""

Which makes a whole lot of sense.  The member name doesn't matter much since
we're now using helpers everywhere, but for an idiot like me having something
like this

struct fd {
	unsigned long __file_ptr;
};

would make it so that you don't have random people deciding they can access
fd->file, and it helps make it more clear what exactly the point of this thing
is.

That being said I think renaming it really isn't the important part, I think
including something like the bit you wrote above in the commit message would
help when somebody is trying to figure out why this more obtuse strategy is used
rather than just having struct file *__file with the low bit masking stuff
tacked on the side.

I'm still working through all the patches, but in general the doc made sense,
and the patches thusfar are easy enough to follow.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 03/39] struct fd: representation change
  2024-07-30 18:10     ` Josef Bacik
@ 2024-08-07 10:07       ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:07 UTC (permalink / raw)
  To: Josef Bacik
  Cc: viro, linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev,
	torvalds

> Which makes a whole lot of sense.  The member name doesn't matter much since
> we're now using helpers everywhere, but for an idiot like me having something
> like this
> 
> struct fd {
> 	unsigned long __file_ptr;
> };

I agree that that would be helpful.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 03/39] struct fd: representation change
  2024-07-30  5:15   ` [PATCH 03/39] struct fd: representation change viro
  2024-07-30 18:10     ` Josef Bacik
@ 2024-08-07 10:03     ` Christian Brauner
  1 sibling, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:03 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:15:49AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> 	The absolute majority of instances comes from fdget() and its
> relatives; the underlying primitives actually return a struct file
> reference and a couple of flags encoded into an unsigned long - the lower
> two bits of file address are always zero, so we can stash the flags
> into those.  On the way out we use __to_fd() to unpack that unsigned
> long into struct fd.
> 
> 	Let's use that representation for struct fd itself - make it
> a structure with a single unsigned long member (.word), with the value
> equal either to (unsigned long)p | flags, p being an address of some
> struct file instance, or to 0 for an empty fd.
> 
> 	Note that we never used a struct fd instance with NULL ->file
> and non-zero ->flags; the emptiness had been checked as (!f.file) and
> we expected e.g. fdput(empty) to be a no-op.  With new representation
> we can use (!f.word) for emptiness check; that is enough for compiler
> to figure out that (f.word & FDPUT_FPUT) will be false and that fdput(f)
> will be a no-op in such case.
> 
> 	For now the new predicate (fd_empty(f)) has no users; all the
> existing checks have form (!fd_file(f)).  We will convert to fd_empty()
> use later; here we only define it (and tell the compiler that it's
> unlikely to return true).
> 
> 	This commit only deals with representation change; there will
> be followups.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  drivers/infiniband/core/uverbs_cmd.c |  2 +-
>  fs/overlayfs/file.c                  | 28 +++++++++++++++-------------
>  fs/xfs/xfs_handle.c                  |  2 +-
>  include/linux/file.h                 | 22 ++++++++++++++++------
>  kernel/events/core.c                 |  2 +-
>  net/socket.c                         |  2 +-
>  6 files changed, 35 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index 3f85575cf971..a4cce360df21 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -572,7 +572,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
>  	struct inode                   *inode = NULL;
>  	int				new_xrcd = 0;
>  	struct ib_device *ib_dev;
> -	struct fd f = {};
> +	struct fd f = EMPTY_FD;
>  	int ret;
>  
>  	ret = uverbs_request(attrs, &cmd, sizeof(cmd));
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index c4963d0c5549..2b7a5a3a7a2f 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -93,11 +93,11 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
>  			       bool allow_meta)
>  {
>  	struct dentry *dentry = file_dentry(file);
> +	struct file *realfile = file->private_data;
>  	struct path realpath;
>  	int err;
>  
> -	real->flags = 0;
> -	real->file = file->private_data;
> +	real->word = (unsigned long)realfile;
>  
>  	if (allow_meta) {
>  		ovl_path_real(dentry, &realpath);
> @@ -113,16 +113,17 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
>  		return -EIO;
>  
>  	/* Has it been copied up since we'd opened it? */
> -	if (unlikely(file_inode(real->file) != d_inode(realpath.dentry))) {
> -		real->flags = FDPUT_FPUT;
> -		real->file = ovl_open_realfile(file, &realpath);
> -
> -		return PTR_ERR_OR_ZERO(real->file);
> +	if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
> +		struct file *f = ovl_open_realfile(file, &realpath);
> +		if (IS_ERR(f))
> +			return PTR_ERR(f);
> +		real->word = (unsigned long)ovl_open_realfile(file, &realpath) | FDPUT_FPUT;
> +		return 0;
>  	}
>  
>  	/* Did the flags change since open? */
> -	if (unlikely((file->f_flags ^ real->file->f_flags) & ~OVL_OPEN_FLAGS))
> -		return ovl_change_flags(real->file, file->f_flags);
> +	if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS))
> +		return ovl_change_flags(realfile, file->f_flags);
>  
>  	return 0;
>  }
> @@ -130,10 +131,11 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
>  static int ovl_real_fdget(const struct file *file, struct fd *real)
>  {
>  	if (d_is_dir(file_dentry(file))) {
> -		real->flags = 0;
> -		real->file = ovl_dir_real_file(file, false);
> -
> -		return PTR_ERR_OR_ZERO(real->file);
> +		struct file *f = ovl_dir_real_file(file, false);
> +		if (IS_ERR(f))
> +			return PTR_ERR(f);
> +		real->word = (unsigned long)f;
> +		return 0;
>  	}
>  
>  	return ovl_real_fdget_meta(file, real, false);
> diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
> index 7bcc4f519cb8..49e5e5f04e60 100644
> --- a/fs/xfs/xfs_handle.c
> +++ b/fs/xfs/xfs_handle.c
> @@ -85,7 +85,7 @@ xfs_find_handle(
>  	int			hsize;
>  	xfs_handle_t		handle;
>  	struct inode		*inode;
> -	struct fd		f = {NULL};
> +	struct fd		f = EMPTY_FD;
>  	struct path		path;
>  	int			error;
>  	struct xfs_inode	*ip;
> diff --git a/include/linux/file.h b/include/linux/file.h
> index 0f3f369f2450..bdd6e1766839 100644
> --- a/include/linux/file.h
> +++ b/include/linux/file.h
> @@ -35,18 +35,28 @@ static inline void fput_light(struct file *file, int fput_needed)
>  		fput(file);
>  }
>  
> +/* either a reference to struct file + flags
> + * (cloned vs. borrowed, pos locked), with
> + * flags stored in lower bits of value,
> + * or empty (represented by 0).
> + */
>  struct fd {
> -	struct file *file;
> -	unsigned int flags;
> +	unsigned long word;
>  };
>  #define FDPUT_FPUT       1
>  #define FDPUT_POS_UNLOCK 2
>  
> -#define fd_file(f) ((f).file)
> +#define fd_file(f) ((struct file *)((f).word & ~3))

Minor thing but it would be nice if you'd use the macros for this
instead of the raw number:

#define fd_file(f) ((struct file *)((f).word & ~(FDPUT_FPUT | FDPUT_POS_UNLOCK))

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 04/39] add struct fd constructors, get rid of __to_fd()
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
  2024-07-30  5:15   ` [PATCH 02/39] introduce fd_file(), convert all accessors to it viro
  2024-07-30  5:15   ` [PATCH 03/39] struct fd: representation change viro
@ 2024-07-30  5:15   ` viro
  2024-08-07 10:09     ` Christian Brauner
  2024-07-30  5:15   ` [PATCH 05/39] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range() viro
                     ` (35 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

	Make __fdget() et.al. return struct fd directly.
New helpers: BORROWED_FD(file) and CLONED_FD(file), for
borrowed and cloned file references resp.

	NOTE: this might need tuning; in particular, inline on
__fget_light() is there to keep the code generation same as
before - we probably want to keep it inlined in fdget() et.al.
(especially so in fdget_pos()), but that needs profiling.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/file.c            | 26 +++++++++++++-------------
 include/linux/file.h | 33 +++++++++++----------------------
 2 files changed, 24 insertions(+), 35 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index a3b72aa64f11..6f5ed4daafd2 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -1128,7 +1128,7 @@ EXPORT_SYMBOL(task_lookup_next_fdget_rcu);
  * The fput_needed flag returned by fget_light should be passed to the
  * corresponding fput_light.
  */
-static unsigned long __fget_light(unsigned int fd, fmode_t mask)
+static inline struct fd __fget_light(unsigned int fd, fmode_t mask)
 {
 	struct files_struct *files = current->files;
 	struct file *file;
@@ -1145,22 +1145,22 @@ static unsigned long __fget_light(unsigned int fd, fmode_t mask)
 	if (likely(atomic_read_acquire(&files->count) == 1)) {
 		file = files_lookup_fd_raw(files, fd);
 		if (!file || unlikely(file->f_mode & mask))
-			return 0;
-		return (unsigned long)file;
+			return EMPTY_FD;
+		return BORROWED_FD(file);
 	} else {
 		file = __fget_files(files, fd, mask);
 		if (!file)
-			return 0;
-		return FDPUT_FPUT | (unsigned long)file;
+			return EMPTY_FD;
+		return CLONED_FD(file);
 	}
 }
-unsigned long __fdget(unsigned int fd)
+struct fd fdget(unsigned int fd)
 {
 	return __fget_light(fd, FMODE_PATH);
 }
-EXPORT_SYMBOL(__fdget);
+EXPORT_SYMBOL(fdget);
 
-unsigned long __fdget_raw(unsigned int fd)
+struct fd fdget_raw(unsigned int fd)
 {
 	return __fget_light(fd, 0);
 }
@@ -1181,16 +1181,16 @@ static inline bool file_needs_f_pos_lock(struct file *file)
 		(file_count(file) > 1 || file->f_op->iterate_shared);
 }
 
-unsigned long __fdget_pos(unsigned int fd)
+struct fd fdget_pos(unsigned int fd)
 {
-	unsigned long v = __fdget(fd);
-	struct file *file = (struct file *)(v & ~3);
+	struct fd f = fdget(fd);
+	struct file *file = fd_file(f);
 
 	if (file && file_needs_f_pos_lock(file)) {
-		v |= FDPUT_POS_UNLOCK;
+		f.word |= FDPUT_POS_UNLOCK;
 		mutex_lock(&file->f_pos_lock);
 	}
-	return v;
+	return f;
 }
 
 void __f_unlock_pos(struct file *f)
diff --git a/include/linux/file.h b/include/linux/file.h
index bdd6e1766839..00a42604d322 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -53,6 +53,14 @@ static inline bool fd_empty(struct fd f)
 }
 
 #define EMPTY_FD (struct fd){0}
+static inline struct fd BORROWED_FD(struct file *f)
+{
+	return (struct fd){(unsigned long)f};
+}
+static inline struct fd CLONED_FD(struct file *f)
+{
+	return (struct fd){(unsigned long)f | FDPUT_FPUT};
+}
 
 static inline void fdput(struct fd fd)
 {
@@ -63,30 +71,11 @@ static inline void fdput(struct fd fd)
 extern struct file *fget(unsigned int fd);
 extern struct file *fget_raw(unsigned int fd);
 extern struct file *fget_task(struct task_struct *task, unsigned int fd);
-extern unsigned long __fdget(unsigned int fd);
-extern unsigned long __fdget_raw(unsigned int fd);
-extern unsigned long __fdget_pos(unsigned int fd);
 extern void __f_unlock_pos(struct file *);
 
-static inline struct fd __to_fd(unsigned long v)
-{
-	return (struct fd){v};
-}
-
-static inline struct fd fdget(unsigned int fd)
-{
-	return __to_fd(__fdget(fd));
-}
-
-static inline struct fd fdget_raw(unsigned int fd)
-{
-	return __to_fd(__fdget_raw(fd));
-}
-
-static inline struct fd fdget_pos(int fd)
-{
-	return __to_fd(__fdget_pos(fd));
-}
+struct fd fdget(unsigned int fd);
+struct fd fdget_raw(unsigned int fd);
+struct fd fdget_pos(unsigned int fd);
 
 static inline void fdput_pos(struct fd f)
 {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 04/39] add struct fd constructors, get rid of __to_fd()
  2024-07-30  5:15   ` [PATCH 04/39] add struct fd constructors, get rid of __to_fd() viro
@ 2024-08-07 10:09     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:09 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:15:50AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> 	Make __fdget() et.al. return struct fd directly.
> New helpers: BORROWED_FD(file) and CLONED_FD(file), for
> borrowed and cloned file references resp.
> 
> 	NOTE: this might need tuning; in particular, inline on
> __fget_light() is there to keep the code generation same as
> before - we probably want to keep it inlined in fdget() et.al.
> (especially so in fdget_pos()), but that needs profiling.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 05/39] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range()
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (2 preceding siblings ...)
  2024-07-30  5:15   ` [PATCH 04/39] add struct fd constructors, get rid of __to_fd() viro
@ 2024-07-30  5:15   ` viro
  2024-08-07 10:10     ` Christian Brauner
  2024-07-30  5:15   ` [PATCH 06/39] net/socket.c: switch to CLASS(fd) viro
                     ` (34 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Currently all emptiness checks are done as fd_file(...) in boolean
context (usually something like if (!fd_file(f))...); those will be
taken care of later.

However, there's a couple of places where we do those checks as
'store fd_file(...) into a variable, then check if this variable is
NULL' and those are harder to spot.

Get rid of those now.

use fd_empty() instead of extracting file and then checking it for NULL.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/remap_range.c     | 5 ++---
 kernel/module/main.c | 4 +++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/remap_range.c b/fs/remap_range.c
index 4403d5c68fcb..017d0d1ea6c9 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -537,9 +537,8 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 
 	for (i = 0, info = same->info; i < count; i++, info++) {
 		struct fd dst_fd = fdget(info->dest_fd);
-		struct file *dst_file = fd_file(dst_fd);
 
-		if (!dst_file) {
+		if (fd_empty(dst_fd)) {
 			info->status = -EBADF;
 			goto next_loop;
 		}
@@ -549,7 +548,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 			goto next_fdput;
 		}
 
-		deduped = vfs_dedupe_file_range_one(file, off, dst_file,
+		deduped = vfs_dedupe_file_range_one(file, off, fd_file(dst_fd),
 						    info->dest_offset, len,
 						    REMAP_FILE_CAN_SHORTEN);
 		if (deduped == -EBADE)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 6ed334eecc14..93cc9fea9d9d 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3180,7 +3180,7 @@ static int idempotent_init_module(struct file *f, const char __user * uargs, int
 {
 	struct idempotent idem;
 
-	if (!f || !(f->f_mode & FMODE_READ))
+	if (!(f->f_mode & FMODE_READ))
 		return -EBADF;
 
 	/* See if somebody else is doing the operation? */
@@ -3211,6 +3211,8 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
 		return -EINVAL;
 
 	f = fdget(fd);
+	if (fd_empty(f))
+		return -EBADF;
 	err = idempotent_init_module(fd_file(f), uargs, flags);
 	fdput(f);
 	return err;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 05/39] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range()
  2024-07-30  5:15   ` [PATCH 05/39] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range() viro
@ 2024-08-07 10:10     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:10 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:15:51AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Currently all emptiness checks are done as fd_file(...) in boolean
> context (usually something like if (!fd_file(f))...); those will be
> taken care of later.
> 
> However, there's a couple of places where we do those checks as
> 'store fd_file(...) into a variable, then check if this variable is
> NULL' and those are harder to spot.
> 
> Get rid of those now.
> 
> use fd_empty() instead of extracting file and then checking it for NULL.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 06/39] net/socket.c: switch to CLASS(fd)
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (3 preceding siblings ...)
  2024-07-30  5:15   ` [PATCH 05/39] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range() viro
@ 2024-07-30  5:15   ` viro
  2024-08-07 10:13     ` Christian Brauner
  2024-07-30  5:15   ` [PATCH 07/39] introduce struct fderr, convert overlayfs uses to that viro
                     ` (33 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

	I strongly suspect that important part in sockfd_lookup_light()
is avoiding needless file refcount operations, not the marginal reduction
of the register pressure from not keeping a struct file pointer in
the caller.

	If that's true, we should get the same benefits from straight
fdget()/fdput().  And AFAICS with sane use of CLASS(fd) we can get a
better code generation...

	Would be nice if somebody tested it on networking test suites
(including benchmarks)...

	sockfd_lookup_light() does fdget(), uses sock_from_file() to
get the associated socket and returns the struct socket reference to
the caller, along with "do we need to fput()" flag.  No matching fdput(),
the caller does its equivalent manually, using the fact that sock->file
points to the struct file the socket has come from.

	Get rid of that - have the callers do fdget()/fdput() and
use sock_from_file() directly.  That kills sockfd_lookup_light()
and fput_light() (no users left).

	What's more, we can get rid of explicit fdget()/fdput() by
switching to CLASS(fd, ...) - code generation does not suffer, since
now fdput() inserted on "descriptor is not opened" failure exit
is recognized to be a no-op by compiler.

	We could split that commit in two (getting rid of sockd_lookup_light()
and switch to CLASS(fd, ...)), but AFAICS it ends up being harder to read
that way.

[conflicts in a couple of functions]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/file.h |   6 -
 net/socket.c         | 303 +++++++++++++++++++------------------------
 2 files changed, 137 insertions(+), 172 deletions(-)

diff --git a/include/linux/file.h b/include/linux/file.h
index 00a42604d322..3353d70fd460 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -29,12 +29,6 @@ extern struct file *alloc_file_pseudo_noaccount(struct inode *, struct vfsmount
 extern struct file *alloc_file_clone(struct file *, int flags,
 	const struct file_operations *);
 
-static inline void fput_light(struct file *file, int fput_needed)
-{
-	if (fput_needed)
-		fput(file);
-}
-
 /* either a reference to struct file + flags
  * (cloned vs. borrowed, pos locked), with
  * flags stored in lower bits of value,
diff --git a/net/socket.c b/net/socket.c
index c0d4f5032374..c09c69eddafa 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -510,7 +510,7 @@ static int sock_map_fd(struct socket *sock, int flags)
 
 struct socket *sock_from_file(struct file *file)
 {
-	if (file->f_op == &socket_file_ops)
+	if (likely(file->f_op == &socket_file_ops))
 		return file->private_data;	/* set in sock_alloc_file */
 
 	return NULL;
@@ -550,24 +550,6 @@ struct socket *sockfd_lookup(int fd, int *err)
 }
 EXPORT_SYMBOL(sockfd_lookup);
 
-static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed)
-{
-	struct fd f = fdget(fd);
-	struct socket *sock;
-
-	*err = -EBADF;
-	if (fd_file(f)) {
-		sock = sock_from_file(fd_file(f));
-		if (likely(sock)) {
-			*fput_needed = f.word & FDPUT_FPUT;
-			return sock;
-		}
-		*err = -ENOTSOCK;
-		fdput(f);
-	}
-	return NULL;
-}
-
 static ssize_t sockfs_listxattr(struct dentry *dentry, char *buffer,
 				size_t size)
 {
@@ -1848,16 +1830,20 @@ int __sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen)
 {
 	struct socket *sock;
 	struct sockaddr_storage address;
-	int err, fput_needed;
-
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock) {
-		err = move_addr_to_kernel(umyaddr, addrlen, &address);
-		if (!err)
-			err = __sys_bind_socket(sock, &address, addrlen);
-		fput_light(sock->file, fput_needed);
-	}
-	return err;
+	CLASS(fd, f)(fd);
+	int err;
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+
+	err = move_addr_to_kernel(umyaddr, addrlen, &address);
+	if (unlikely(err))
+		return err;
+
+	return __sys_bind_socket(sock, &address, addrlen);
 }
 
 SYSCALL_DEFINE3(bind, int, fd, struct sockaddr __user *, umyaddr, int, addrlen)
@@ -1886,15 +1872,16 @@ int __sys_listen_socket(struct socket *sock, int backlog)
 
 int __sys_listen(int fd, int backlog)
 {
+	CLASS(fd, f)(fd);
 	struct socket *sock;
-	int err, fput_needed;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock) {
-		err = __sys_listen_socket(sock, backlog);
-		fput_light(sock->file, fput_needed);
-	}
-	return err;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+
+	return __sys_listen_socket(sock, backlog);
 }
 
 SYSCALL_DEFINE2(listen, int, fd, int, backlog)
@@ -2004,17 +1991,12 @@ static int __sys_accept4_file(struct file *file, struct sockaddr __user *upeer_s
 int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
 		  int __user *upeer_addrlen, int flags)
 {
-	int ret = -EBADF;
-	struct fd f;
+	CLASS(fd, f)(fd);
 
-	f = fdget(fd);
-	if (fd_file(f)) {
-		ret = __sys_accept4_file(fd_file(f), upeer_sockaddr,
+	if (fd_empty(f))
+		return -EBADF;
+	return __sys_accept4_file(fd_file(f), upeer_sockaddr,
 					 upeer_addrlen, flags);
-		fdput(f);
-	}
-
-	return ret;
 }
 
 SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr,
@@ -2066,20 +2048,18 @@ int __sys_connect_file(struct file *file, struct sockaddr_storage *address,
 
 int __sys_connect(int fd, struct sockaddr __user *uservaddr, int addrlen)
 {
-	int ret = -EBADF;
-	struct fd f;
+	struct sockaddr_storage address;
+	CLASS(fd, f)(fd);
+	int ret;
 
-	f = fdget(fd);
-	if (fd_file(f)) {
-		struct sockaddr_storage address;
+	if (fd_empty(f))
+		return -EBADF;
 
-		ret = move_addr_to_kernel(uservaddr, addrlen, &address);
-		if (!ret)
-			ret = __sys_connect_file(fd_file(f), &address, addrlen, 0);
-		fdput(f);
-	}
+	ret = move_addr_to_kernel(uservaddr, addrlen, &address);
+	if (ret)
+		return ret;
 
-	return ret;
+	return __sys_connect_file(fd_file(f), &address, addrlen, 0);
 }
 
 SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
@@ -2098,26 +2078,25 @@ int __sys_getsockname(int fd, struct sockaddr __user *usockaddr,
 {
 	struct socket *sock;
 	struct sockaddr_storage address;
-	int err, fput_needed;
+	CLASS(fd, f)(fd);
+	int err;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	err = security_socket_getsockname(sock);
 	if (err)
-		goto out_put;
+		return err;
 
 	err = READ_ONCE(sock->ops)->getname(sock, (struct sockaddr *)&address, 0);
 	if (err < 0)
-		goto out_put;
-	/* "err" is actually length in this case */
-	err = move_addr_to_user(&address, err, usockaddr, usockaddr_len);
+		return err;
 
-out_put:
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	/* "err" is actually length in this case */
+	return move_addr_to_user(&address, err, usockaddr, usockaddr_len);
 }
 
 SYSCALL_DEFINE3(getsockname, int, fd, struct sockaddr __user *, usockaddr,
@@ -2136,26 +2115,25 @@ int __sys_getpeername(int fd, struct sockaddr __user *usockaddr,
 {
 	struct socket *sock;
 	struct sockaddr_storage address;
-	int err, fput_needed;
+	CLASS(fd, f)(fd);
+	int err;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock != NULL) {
-		const struct proto_ops *ops = READ_ONCE(sock->ops);
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-		err = security_socket_getpeername(sock);
-		if (err) {
-			fput_light(sock->file, fput_needed);
-			return err;
-		}
+	err = security_socket_getpeername(sock);
+	if (err)
+		return err;
 
-		err = ops->getname(sock, (struct sockaddr *)&address, 1);
-		if (err >= 0)
-			/* "err" is actually length in this case */
-			err = move_addr_to_user(&address, err, usockaddr,
-						usockaddr_len);
-		fput_light(sock->file, fput_needed);
-	}
-	return err;
+	err = READ_ONCE(sock->ops)->getname(sock, (struct sockaddr *)&address, 1);
+	if (err < 0)
+		return err;
+
+	/* "err" is actually length in this case */
+	return move_addr_to_user(&address, err, usockaddr, usockaddr_len);
 }
 
 SYSCALL_DEFINE3(getpeername, int, fd, struct sockaddr __user *, usockaddr,
@@ -2176,14 +2154,17 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 	struct sockaddr_storage address;
 	int err;
 	struct msghdr msg;
-	int fput_needed;
 
 	err = import_ubuf(ITER_SOURCE, buff, len, &msg.msg_iter);
 	if (unlikely(err))
 		return err;
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
+
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	msg.msg_name = NULL;
 	msg.msg_control = NULL;
@@ -2193,7 +2174,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 	if (addr) {
 		err = move_addr_to_kernel(addr, addr_len, &address);
 		if (err < 0)
-			goto out_put;
+			return err;
 		msg.msg_name = (struct sockaddr *)&address;
 		msg.msg_namelen = addr_len;
 	}
@@ -2201,12 +2182,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 	msg.msg_flags = flags;
-	err = __sock_sendmsg(sock, &msg);
-
-out_put:
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	return __sock_sendmsg(sock, &msg);
 }
 
 SYSCALL_DEFINE6(sendto, int, fd, void __user *, buff, size_t, len,
@@ -2241,14 +2217,18 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
 	};
 	struct socket *sock;
 	int err, err2;
-	int fput_needed;
 
 	err = import_ubuf(ITER_DEST, ubuf, size, &msg.msg_iter);
 	if (unlikely(err))
 		return err;
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
+
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
@@ -2260,9 +2240,6 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
 		if (err2 < 0)
 			err = err2;
 	}
-
-	fput_light(sock->file, fput_needed);
-out:
 	return err;
 }
 
@@ -2337,17 +2314,16 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
 {
 	sockptr_t optval = USER_SOCKPTR(user_optval);
 	bool compat = in_compat_syscall();
-	int err, fput_needed;
 	struct socket *sock;
+	CLASS(fd, f)(fd);
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
-
-	err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen);
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-	fput_light(sock->file, fput_needed);
-	return err;
+	return do_sock_setsockopt(sock, compat, level, optname, optval, optlen);
 }
 
 SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
@@ -2403,20 +2379,17 @@ EXPORT_SYMBOL(do_sock_getsockopt);
 int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
 		int __user *optlen)
 {
-	int err, fput_needed;
 	struct socket *sock;
-	bool compat;
+	CLASS(fd, f)(fd);
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-	compat = in_compat_syscall();
-	err = do_sock_getsockopt(sock, compat, level, optname,
+	return do_sock_getsockopt(sock, in_compat_syscall(), level, optname,
 				 USER_SOCKPTR(optval), USER_SOCKPTR(optlen));
-
-	fput_light(sock->file, fput_needed);
-	return err;
 }
 
 SYSCALL_DEFINE5(getsockopt, int, fd, int, level, int, optname,
@@ -2442,15 +2415,16 @@ int __sys_shutdown_sock(struct socket *sock, int how)
 
 int __sys_shutdown(int fd, int how)
 {
-	int err, fput_needed;
 	struct socket *sock;
+	CLASS(fd, f)(fd);
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock != NULL) {
-		err = __sys_shutdown_sock(sock, how);
-		fput_light(sock->file, fput_needed);
-	}
-	return err;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+
+	return __sys_shutdown_sock(sock, how);
 }
 
 SYSCALL_DEFINE2(shutdown, int, fd, int, how)
@@ -2666,22 +2640,21 @@ long __sys_sendmsg_sock(struct socket *sock, struct msghdr *msg,
 long __sys_sendmsg(int fd, struct user_msghdr __user *msg, unsigned int flags,
 		   bool forbid_cmsg_compat)
 {
-	int fput_needed, err;
 	struct msghdr msg_sys;
 	struct socket *sock;
 
 	if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
 		return -EINVAL;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
+	CLASS(fd, f)(fd);
 
-	err = ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, 0);
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	return ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, 0);
 }
 
 SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int, flags)
@@ -2696,7 +2669,7 @@ SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int
 int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 		   unsigned int flags, bool forbid_cmsg_compat)
 {
-	int fput_needed, err, datagrams;
+	int err, datagrams;
 	struct socket *sock;
 	struct mmsghdr __user *entry;
 	struct compat_mmsghdr __user *compat_entry;
@@ -2712,9 +2685,13 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 
 	datagrams = 0;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	used_address.name_len = UINT_MAX;
 	entry = mmsg;
@@ -2751,8 +2728,6 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 		cond_resched();
 	}
 
-	fput_light(sock->file, fput_needed);
-
 	/* We only return an error if no datagrams were able to be sent */
 	if (datagrams != 0)
 		return datagrams;
@@ -2874,22 +2849,21 @@ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg,
 long __sys_recvmsg(int fd, struct user_msghdr __user *msg, unsigned int flags,
 		   bool forbid_cmsg_compat)
 {
-	int fput_needed, err;
 	struct msghdr msg_sys;
 	struct socket *sock;
 
 	if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
 		return -EINVAL;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
+	CLASS(fd, f)(fd);
 
-	err = ___sys_recvmsg(sock, msg, &msg_sys, flags, 0);
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	return ___sys_recvmsg(sock, msg, &msg_sys, flags, 0);
 }
 
 SYSCALL_DEFINE3(recvmsg, int, fd, struct user_msghdr __user *, msg,
@@ -2906,7 +2880,7 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 			  unsigned int vlen, unsigned int flags,
 			  struct timespec64 *timeout)
 {
-	int fput_needed, err, datagrams;
+	int err, datagrams;
 	struct socket *sock;
 	struct mmsghdr __user *entry;
 	struct compat_mmsghdr __user *compat_entry;
@@ -2921,16 +2895,18 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 
 	datagrams = 0;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	if (likely(!(flags & MSG_ERRQUEUE))) {
 		err = sock_error(sock->sk);
-		if (err) {
-			datagrams = err;
-			goto out_put;
-		}
+		if (err)
+			return err;
 	}
 
 	entry = mmsg;
@@ -2987,12 +2963,10 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 	}
 
 	if (err == 0)
-		goto out_put;
+		return datagrams;
 
-	if (datagrams == 0) {
-		datagrams = err;
-		goto out_put;
-	}
+	if (datagrams == 0)
+		return err;
 
 	/*
 	 * We may return less entries than requested (vlen) if the
@@ -3007,9 +2981,6 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 		 */
 		WRITE_ONCE(sock->sk->sk_err, -err);
 	}
-out_put:
-	fput_light(sock->file, fput_needed);
-
 	return datagrams;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 06/39] net/socket.c: switch to CLASS(fd)
  2024-07-30  5:15   ` [PATCH 06/39] net/socket.c: switch to CLASS(fd) viro
@ 2024-08-07 10:13     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:13 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:15:52AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> 	I strongly suspect that important part in sockfd_lookup_light()
> is avoiding needless file refcount operations, not the marginal reduction
> of the register pressure from not keeping a struct file pointer in
> the caller.
> 
> 	If that's true, we should get the same benefits from straight
> fdget()/fdput().  And AFAICS with sane use of CLASS(fd) we can get a
> better code generation...
> 
> 	Would be nice if somebody tested it on networking test suites
> (including benchmarks)...
> 
> 	sockfd_lookup_light() does fdget(), uses sock_from_file() to
> get the associated socket and returns the struct socket reference to
> the caller, along with "do we need to fput()" flag.  No matching fdput(),
> the caller does its equivalent manually, using the fact that sock->file
> points to the struct file the socket has come from.
> 
> 	Get rid of that - have the callers do fdget()/fdput() and
> use sock_from_file() directly.  That kills sockfd_lookup_light()
> and fput_light() (no users left).
> 
> 	What's more, we can get rid of explicit fdget()/fdput() by
> switching to CLASS(fd, ...) - code generation does not suffer, since
> now fdput() inserted on "descriptor is not opened" failure exit
> is recognized to be a no-op by compiler.
> 
> 	We could split that commit in two (getting rid of sockd_lookup_light()
> and switch to CLASS(fd, ...)), but AFAICS it ends up being harder to read
> that way.
> 
> [conflicts in a couple of functions]
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 07/39] introduce struct fderr, convert overlayfs uses to that
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (4 preceding siblings ...)
  2024-07-30  5:15   ` [PATCH 06/39] net/socket.c: switch to CLASS(fd) viro
@ 2024-07-30  5:15   ` viro
  2024-07-30  5:15   ` [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...) viro
                     ` (32 subsequent siblings)
  38 siblings, 0 replies; 134+ messages in thread
From: viro @ 2024-07-30  5:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Similar to struct fd; unlike struct fd, it can represent
error values.

Accessors:

* fd_empty(f):	true if f represents an error
* fd_file(f):	just as for struct fd it yields a pointer to
		struct file if fd_empty(f) is false.  If
		fd_empty(f) is true, fd_file(f) is guaranteed
		_not_ to be an address of any object (IS_ERR()
		will be true in that case)
* fd_error(f):	if f represents an error, returns that error,
		otherwise the return value is junk.

Constructors:

* ERR_FD(-E...):	an instance encoding given error [ERR_FDERR, perhaps?]
* BORROWED_FDERR(file):	if file points to a struct file instance,
			return a struct fderr representing that file
			reference with no flags set.
			if file is an ERR_PTR(-E...), return a struct
			fderr representing that error.
			file MUST NOT be NULL.
* CLONED_FDERR(file):	similar, but in case when file points to
			a struct file instance, set FDPUT_FPUT in flags.

Same destructor as for struct fd; I'm not entirely convinced that
playing with _Generic is a good idea here, but for now let's go
that way...

See fs/overlayfs/file.c for example of use.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/overlayfs/file.c  | 125 +++++++++++++++++++++----------------------
 include/linux/file.h |  38 +++++++++++--
 2 files changed, 95 insertions(+), 68 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 2b7a5a3a7a2f..4b9e145bc7b8 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -89,56 +89,47 @@ static int ovl_change_flags(struct file *file, unsigned int flags)
 	return 0;
 }
 
-static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
-			       bool allow_meta)
+static struct fderr ovl_real_fdget_meta(const struct file *file, bool allow_meta)
 {
 	struct dentry *dentry = file_dentry(file);
 	struct file *realfile = file->private_data;
 	struct path realpath;
 	int err;
 
-	real->word = (unsigned long)realfile;
-
 	if (allow_meta) {
 		ovl_path_real(dentry, &realpath);
 	} else {
 		/* lazy lookup and verify of lowerdata */
 		err = ovl_verify_lowerdata(dentry);
 		if (err)
-			return err;
+			return ERR_FD(err);
 
 		ovl_path_realdata(dentry, &realpath);
 	}
 	if (!realpath.dentry)
-		return -EIO;
+		return ERR_FD(-EIO);
 
 	/* Has it been copied up since we'd opened it? */
 	if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
-		struct file *f = ovl_open_realfile(file, &realpath);
-		if (IS_ERR(f))
-			return PTR_ERR(f);
-		real->word = (unsigned long)ovl_open_realfile(file, &realpath) | FDPUT_FPUT;
-		return 0;
+		return CLONED_FDERR(ovl_open_realfile(file, &realpath));
 	}
 
 	/* Did the flags change since open? */
-	if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS))
-		return ovl_change_flags(realfile, file->f_flags);
+	if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS)) {
+		err = ovl_change_flags(realfile, file->f_flags);
+		if (err)
+			return ERR_FD(err);
+	}
 
-	return 0;
+	return BORROWED_FDERR(realfile);
 }
 
-static int ovl_real_fdget(const struct file *file, struct fd *real)
+static struct fderr ovl_real_fdget(const struct file *file)
 {
-	if (d_is_dir(file_dentry(file))) {
-		struct file *f = ovl_dir_real_file(file, false);
-		if (IS_ERR(f))
-			return PTR_ERR(f);
-		real->word = (unsigned long)f;
-		return 0;
-	}
+	if (d_is_dir(file_dentry(file)))
+		return BORROWED_FDERR(ovl_dir_real_file(file, false));
 
-	return ovl_real_fdget_meta(file, real, false);
+	return ovl_real_fdget_meta(file, false);
 }
 
 static int ovl_open(struct inode *inode, struct file *file)
@@ -183,7 +174,7 @@ static int ovl_release(struct inode *inode, struct file *file)
 static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 {
 	struct inode *inode = file_inode(file);
-	struct fd real;
+	struct fderr real;
 	const struct cred *old_cred;
 	loff_t ret;
 
@@ -199,9 +190,9 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 			return vfs_setpos(file, 0, 0);
 	}
 
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
-		return ret;
+	real = ovl_real_fdget(file);
+	if (fd_empty(real))
+		return fd_error(real);
 
 	/*
 	 * Overlay file f_pos is the master copy that is preserved
@@ -262,7 +253,7 @@ static void ovl_file_accessed(struct file *file)
 static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
-	struct fd real;
+	struct fderr real;
 	ssize_t ret;
 	struct backing_file_ctx ctx = {
 		.cred = ovl_creds(file_inode(file)->i_sb),
@@ -273,9 +264,9 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (!iov_iter_count(iter))
 		return 0;
 
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
-		return ret;
+	real = ovl_real_fdget(file);
+	if (fd_empty(real))
+		return fd_error(real);
 
 	ret = backing_file_read_iter(fd_file(real), iter, iocb, iocb->ki_flags,
 				     &ctx);
@@ -288,7 +279,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
-	struct fd real;
+	struct fderr real;
 	ssize_t ret;
 	int ifl = iocb->ki_flags;
 	struct backing_file_ctx ctx = {
@@ -304,9 +295,11 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	/* Update mode */
 	ovl_copyattr(inode);
 
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
+	real = ovl_real_fdget(file);
+	if (fd_empty(real)) {
+		ret = fd_error(real);
 		goto out_unlock;
+	}
 
 	if (!ovl_should_sync(OVL_FS(inode->i_sb)))
 		ifl &= ~(IOCB_DSYNC | IOCB_SYNC);
@@ -329,7 +322,7 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
 			       struct pipe_inode_info *pipe, size_t len,
 			       unsigned int flags)
 {
-	struct fd real;
+	struct fderr real;
 	ssize_t ret;
 	struct backing_file_ctx ctx = {
 		.cred = ovl_creds(file_inode(in)->i_sb),
@@ -337,9 +330,9 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
 		.accessed = ovl_file_accessed,
 	};
 
-	ret = ovl_real_fdget(in, &real);
-	if (ret)
-		return ret;
+	real = ovl_real_fdget(in);
+	if (fd_empty(real))
+		return fd_error(real);
 
 	ret = backing_file_splice_read(fd_file(real), ppos, pipe, len, flags, &ctx);
 	fdput(real);
@@ -358,7 +351,7 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
 static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 				loff_t *ppos, size_t len, unsigned int flags)
 {
-	struct fd real;
+	struct fderr real;
 	struct inode *inode = file_inode(out);
 	ssize_t ret;
 	struct backing_file_ctx ctx = {
@@ -371,9 +364,11 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	/* Update mode */
 	ovl_copyattr(inode);
 
-	ret = ovl_real_fdget(out, &real);
-	if (ret)
+	real = ovl_real_fdget(out);
+	if (fd_empty(real)) {
+		ret = fd_error(real);
 		goto out_unlock;
+	}
 
 	ret = backing_file_splice_write(pipe, fd_file(real), ppos, len, flags, &ctx);
 	fdput(real);
@@ -386,7 +381,7 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 
 static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 {
-	struct fd real;
+	struct fderr real;
 	const struct cred *old_cred;
 	int ret;
 
@@ -394,9 +389,9 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	if (ret <= 0)
 		return ret;
 
-	ret = ovl_real_fdget_meta(file, &real, !datasync);
-	if (ret)
-		return ret;
+	real = ovl_real_fdget_meta(file, !datasync);
+	if (fd_empty(real))
+		return fd_error(real);
 
 	/* Don't sync lower file for fear of receiving EROFS error */
 	if (file_inode(fd_file(real)) == ovl_inode_upper(file_inode(file))) {
@@ -425,7 +420,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 {
 	struct inode *inode = file_inode(file);
-	struct fd real;
+	struct fderr real;
 	const struct cred *old_cred;
 	int ret;
 
@@ -435,10 +430,11 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 	ret = file_remove_privs(file);
 	if (ret)
 		goto out_unlock;
-
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
+	real = ovl_real_fdget(file);
+	if (fd_empty(real)) {
+		ret = fd_error(real);
 		goto out_unlock;
+	}
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	ret = vfs_fallocate(fd_file(real), mode, offset, len);
@@ -457,13 +453,13 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 
 static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 {
-	struct fd real;
+	struct fderr real;
 	const struct cred *old_cred;
 	int ret;
 
-	ret = ovl_real_fdget(file, &real);
-	if (ret)
-		return ret;
+	real = ovl_real_fdget(file);
+	if (fd_empty(real))
+		return fd_error(real);
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	ret = vfs_fadvise(fd_file(real), offset, len, advice);
@@ -485,7 +481,7 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 			    loff_t len, unsigned int flags, enum ovl_copyop op)
 {
 	struct inode *inode_out = file_inode(file_out);
-	struct fd real_in, real_out;
+	struct fderr real_in, real_out;
 	const struct cred *old_cred;
 	loff_t ret;
 
@@ -498,13 +494,16 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 			goto out_unlock;
 	}
 
-	ret = ovl_real_fdget(file_out, &real_out);
-	if (ret)
+	real_out = ovl_real_fdget(file_out);
+	if (fd_empty(real_out)) {
+		ret = fd_error(real_out);
 		goto out_unlock;
+	}
 
-	ret = ovl_real_fdget(file_in, &real_in);
-	if (ret) {
+	real_in = ovl_real_fdget(file_in);
+	if (fd_empty(real_in)) {
 		fdput(real_out);
+		ret = fd_error(real_in);
 		goto out_unlock;
 	}
 
@@ -577,13 +576,13 @@ static loff_t ovl_remap_file_range(struct file *file_in, loff_t pos_in,
 
 static int ovl_flush(struct file *file, fl_owner_t id)
 {
-	struct fd real;
+	struct fderr real;
 	const struct cred *old_cred;
-	int err;
+	int err = 0;
 
-	err = ovl_real_fdget(file, &real);
-	if (err)
-		return err;
+	real = ovl_real_fdget(file);
+	if (fd_empty(real))
+		return fd_error(real);
 
 	if (fd_file(real)->f_op->flush) {
 		old_cred = ovl_override_creds(file_inode(file)->i_sb);
diff --git a/include/linux/file.h b/include/linux/file.h
index 3353d70fd460..d3165d7a8112 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -10,6 +10,7 @@
 #include <linux/types.h>
 #include <linux/posix_types.h>
 #include <linux/errno.h>
+#include <linux/err.h>
 #include <linux/cleanup.h>
 
 struct file;
@@ -37,13 +38,26 @@ extern struct file *alloc_file_clone(struct file *, int flags,
 struct fd {
 	unsigned long word;
 };
+
+/* either a reference to struct file + flags
+ * (cloned vs. borrowed, pos locked), with
+ * flags stored in lower bits of value,
+ * or an error (represented by small negative value).
+ */
+struct fderr {
+	unsigned long word;
+};
+
 #define FDPUT_FPUT       1
 #define FDPUT_POS_UNLOCK 2
 
+#define fd_empty(f)	_Generic((f), \
+				struct fd: unlikely(!(f).word), \
+				struct fderr: IS_ERR_VALUE((f).word))
 #define fd_file(f) ((struct file *)((f).word & ~3))
-static inline bool fd_empty(struct fd f)
+static inline long fd_error(struct fderr f)
 {
-	return unlikely(!f.word);
+	return (long)f.word;
 }
 
 #define EMPTY_FD (struct fd){0}
@@ -56,11 +70,25 @@ static inline struct fd CLONED_FD(struct file *f)
 	return (struct fd){(unsigned long)f | FDPUT_FPUT};
 }
 
-static inline void fdput(struct fd fd)
+static inline struct fderr ERR_FD(long n)
+{
+	return (struct fderr){(unsigned long)n};
+}
+static inline struct fderr BORROWED_FDERR(struct file *f)
 {
-	if (fd.word & FDPUT_FPUT)
-		fput(fd_file(fd));
+	return (struct fderr){(unsigned long)f};
 }
+static inline struct fderr CLONED_FDERR(struct file *f)
+{
+	if (IS_ERR(f))
+		return BORROWED_FDERR(f);
+	return (struct fderr){(unsigned long)f | FDPUT_FPUT};
+}
+
+#define fdput(f)	(void) (_Generic((f), \
+				struct fderr: IS_ERR_VALUE((f).word),	\
+				struct fd: true) && \
+			    ((f).word & FDPUT_FPUT) && (fput(fd_file(f)),0))
 
 extern struct file *fget(unsigned int fd);
 extern struct file *fget_raw(unsigned int fd);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...)
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (5 preceding siblings ...)
  2024-07-30  5:15   ` [PATCH 07/39] introduce struct fderr, convert overlayfs uses to that viro
@ 2024-07-30  5:15   ` viro
  2024-07-30 19:10     ` Josef Bacik
  2024-08-07 10:23     ` Christian Brauner
  2024-07-30  5:15   ` [PATCH 09/39] timerfd: switch to CLASS(fd, ...) viro
                     ` (31 subsequent siblings)
  38 siblings, 2 replies; 134+ messages in thread
From: viro @ 2024-07-30  5:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

There are four places where we end up adding an extra scope
covering just the range from constructor to destructor;
not sure if that's the best way to handle that.

The functions in question are ovl_write_iter(), ovl_splice_write(),
ovl_fadvise() and ovl_copyfile().

This is very likely *NOT* the final form of that thing - it
needs to be discussed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/overlayfs/file.c | 72 ++++++++++++++++++---------------------------
 1 file changed, 29 insertions(+), 43 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 4b9e145bc7b8..a2911c632137 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -132,6 +132,8 @@ static struct fderr ovl_real_fdget(const struct file *file)
 	return ovl_real_fdget_meta(file, false);
 }
 
+DEFINE_CLASS(fd_real, struct fderr, fdput(_T), ovl_real_fdget(file), struct file *file)
+
 static int ovl_open(struct inode *inode, struct file *file)
 {
 	struct dentry *dentry = file_dentry(file);
@@ -174,7 +176,6 @@ static int ovl_release(struct inode *inode, struct file *file)
 static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 {
 	struct inode *inode = file_inode(file);
-	struct fderr real;
 	const struct cred *old_cred;
 	loff_t ret;
 
@@ -190,7 +191,7 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 			return vfs_setpos(file, 0, 0);
 	}
 
-	real = ovl_real_fdget(file);
+	CLASS(fd_real, real)(file);
 	if (fd_empty(real))
 		return fd_error(real);
 
@@ -211,8 +212,6 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 	file->f_pos = fd_file(real)->f_pos;
 	ovl_inode_unlock(inode);
 
-	fdput(real);
-
 	return ret;
 }
 
@@ -253,8 +252,6 @@ static void ovl_file_accessed(struct file *file)
 static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
-	struct fderr real;
-	ssize_t ret;
 	struct backing_file_ctx ctx = {
 		.cred = ovl_creds(file_inode(file)->i_sb),
 		.user_file = file,
@@ -264,22 +261,18 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (!iov_iter_count(iter))
 		return 0;
 
-	real = ovl_real_fdget(file);
+	CLASS(fd_real, real)(file);
 	if (fd_empty(real))
 		return fd_error(real);
 
-	ret = backing_file_read_iter(fd_file(real), iter, iocb, iocb->ki_flags,
-				     &ctx);
-	fdput(real);
-
-	return ret;
+	return backing_file_read_iter(fd_file(real), iter, iocb, iocb->ki_flags,
+				      &ctx);
 }
 
 static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
-	struct fderr real;
 	ssize_t ret;
 	int ifl = iocb->ki_flags;
 	struct backing_file_ctx ctx = {
@@ -295,7 +288,9 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	/* Update mode */
 	ovl_copyattr(inode);
 
-	real = ovl_real_fdget(file);
+	{
+
+	CLASS(fd_real, real)(file);
 	if (fd_empty(real)) {
 		ret = fd_error(real);
 		goto out_unlock;
@@ -310,7 +305,8 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	 */
 	ifl &= ~IOCB_DIO_CALLER_COMP;
 	ret = backing_file_write_iter(fd_file(real), iter, iocb, ifl, &ctx);
-	fdput(real);
+
+	}
 
 out_unlock:
 	inode_unlock(inode);
@@ -322,22 +318,18 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
 			       struct pipe_inode_info *pipe, size_t len,
 			       unsigned int flags)
 {
-	struct fderr real;
-	ssize_t ret;
+	CLASS(fd_real, real)(in);
 	struct backing_file_ctx ctx = {
 		.cred = ovl_creds(file_inode(in)->i_sb),
 		.user_file = in,
 		.accessed = ovl_file_accessed,
 	};
 
-	real = ovl_real_fdget(in);
 	if (fd_empty(real))
 		return fd_error(real);
 
-	ret = backing_file_splice_read(fd_file(real), ppos, pipe, len, flags, &ctx);
-	fdput(real);
-
-	return ret;
+	return backing_file_splice_read(fd_file(real), ppos, pipe, len, flags,
+					&ctx);
 }
 
 /*
@@ -351,7 +343,6 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
 static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 				loff_t *ppos, size_t len, unsigned int flags)
 {
-	struct fderr real;
 	struct inode *inode = file_inode(out);
 	ssize_t ret;
 	struct backing_file_ctx ctx = {
@@ -364,15 +355,17 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
 	/* Update mode */
 	ovl_copyattr(inode);
 
-	real = ovl_real_fdget(out);
+	{
+
+	CLASS(fd_real, real)(out);
 	if (fd_empty(real)) {
 		ret = fd_error(real);
 		goto out_unlock;
 	}
 
 	ret = backing_file_splice_write(pipe, fd_file(real), ppos, len, flags, &ctx);
-	fdput(real);
 
+	}
 out_unlock:
 	inode_unlock(inode);
 
@@ -420,7 +413,6 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 {
 	struct inode *inode = file_inode(file);
-	struct fderr real;
 	const struct cred *old_cred;
 	int ret;
 
@@ -430,7 +422,9 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 	ret = file_remove_privs(file);
 	if (ret)
 		goto out_unlock;
-	real = ovl_real_fdget(file);
+	{
+
+	CLASS(fd_real, real)(file);
 	if (fd_empty(real)) {
 		ret = fd_error(real);
 		goto out_unlock;
@@ -443,8 +437,7 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 	/* Update size */
 	ovl_file_modified(file);
 
-	fdput(real);
-
+	}
 out_unlock:
 	inode_unlock(inode);
 
@@ -453,11 +446,10 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
 
 static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 {
-	struct fderr real;
+	CLASS(fd_real, real)(file);
 	const struct cred *old_cred;
 	int ret;
 
-	real = ovl_real_fdget(file);
 	if (fd_empty(real))
 		return fd_error(real);
 
@@ -465,8 +457,6 @@ static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 	ret = vfs_fadvise(fd_file(real), offset, len, advice);
 	revert_creds(old_cred);
 
-	fdput(real);
-
 	return ret;
 }
 
@@ -481,7 +471,6 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 			    loff_t len, unsigned int flags, enum ovl_copyop op)
 {
 	struct inode *inode_out = file_inode(file_out);
-	struct fderr real_in, real_out;
 	const struct cred *old_cred;
 	loff_t ret;
 
@@ -494,15 +483,16 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 			goto out_unlock;
 	}
 
-	real_out = ovl_real_fdget(file_out);
+	{
+
+	CLASS(fd_real, real_out)(file_out);
 	if (fd_empty(real_out)) {
 		ret = fd_error(real_out);
 		goto out_unlock;
 	}
 
-	real_in = ovl_real_fdget(file_in);
+	CLASS(fd_real, real_in)(file_in);
 	if (fd_empty(real_in)) {
-		fdput(real_out);
 		ret = fd_error(real_in);
 		goto out_unlock;
 	}
@@ -530,8 +520,7 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
 	/* Update size */
 	ovl_file_modified(file_out);
 
-	fdput(real_in);
-	fdput(real_out);
+	}
 
 out_unlock:
 	inode_unlock(inode_out);
@@ -576,11 +565,10 @@ static loff_t ovl_remap_file_range(struct file *file_in, loff_t pos_in,
 
 static int ovl_flush(struct file *file, fl_owner_t id)
 {
-	struct fderr real;
+	CLASS(fd_real, real)(file);
 	const struct cred *old_cred;
 	int err = 0;
 
-	real = ovl_real_fdget(file);
 	if (fd_empty(real))
 		return fd_error(real);
 
@@ -589,8 +577,6 @@ static int ovl_flush(struct file *file, fl_owner_t id)
 		err = fd_file(real)->f_op->flush(fd_file(real), id);
 		revert_creds(old_cred);
 	}
-	fdput(real);
-
 	return err;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...)
  2024-07-30  5:15   ` [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...) viro
@ 2024-07-30 19:10     ` Josef Bacik
  2024-07-30 21:12       ` Al Viro
  2024-08-07 10:23     ` Christian Brauner
  1 sibling, 1 reply; 134+ messages in thread
From: Josef Bacik @ 2024-07-30 19:10 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
	torvalds

On Tue, Jul 30, 2024 at 01:15:54AM -0400, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> There are four places where we end up adding an extra scope
> covering just the range from constructor to destructor;
> not sure if that's the best way to handle that.
> 
> The functions in question are ovl_write_iter(), ovl_splice_write(),
> ovl_fadvise() and ovl_copyfile().
> 
> This is very likely *NOT* the final form of that thing - it
> needs to be discussed.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/overlayfs/file.c | 72 ++++++++++++++++++---------------------------
>  1 file changed, 29 insertions(+), 43 deletions(-)
> 
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index 4b9e145bc7b8..a2911c632137 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -132,6 +132,8 @@ static struct fderr ovl_real_fdget(const struct file *file)
>  	return ovl_real_fdget_meta(file, false);
>  }
>  
> +DEFINE_CLASS(fd_real, struct fderr, fdput(_T), ovl_real_fdget(file), struct file *file)
> +
>  static int ovl_open(struct inode *inode, struct file *file)
>  {
>  	struct dentry *dentry = file_dentry(file);
> @@ -174,7 +176,6 @@ static int ovl_release(struct inode *inode, struct file *file)
>  static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
>  {
>  	struct inode *inode = file_inode(file);
> -	struct fderr real;
>  	const struct cred *old_cred;
>  	loff_t ret;
>  
> @@ -190,7 +191,7 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
>  			return vfs_setpos(file, 0, 0);
>  	}
>  
> -	real = ovl_real_fdget(file);
> +	CLASS(fd_real, real)(file);
>  	if (fd_empty(real))
>  		return fd_error(real);
>  
> @@ -211,8 +212,6 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
>  	file->f_pos = fd_file(real)->f_pos;
>  	ovl_inode_unlock(inode);
>  
> -	fdput(real);
> -
>  	return ret;
>  }
>  
> @@ -253,8 +252,6 @@ static void ovl_file_accessed(struct file *file)
>  static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>  {
>  	struct file *file = iocb->ki_filp;
> -	struct fderr real;
> -	ssize_t ret;
>  	struct backing_file_ctx ctx = {
>  		.cred = ovl_creds(file_inode(file)->i_sb),
>  		.user_file = file,
> @@ -264,22 +261,18 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>  	if (!iov_iter_count(iter))
>  		return 0;
>  
> -	real = ovl_real_fdget(file);
> +	CLASS(fd_real, real)(file);
>  	if (fd_empty(real))
>  		return fd_error(real);
>  
> -	ret = backing_file_read_iter(fd_file(real), iter, iocb, iocb->ki_flags,
> -				     &ctx);
> -	fdput(real);
> -
> -	return ret;
> +	return backing_file_read_iter(fd_file(real), iter, iocb, iocb->ki_flags,
> +				      &ctx);
>  }
>  
>  static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
>  {
>  	struct file *file = iocb->ki_filp;
>  	struct inode *inode = file_inode(file);
> -	struct fderr real;
>  	ssize_t ret;
>  	int ifl = iocb->ki_flags;
>  	struct backing_file_ctx ctx = {
> @@ -295,7 +288,9 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
>  	/* Update mode */
>  	ovl_copyattr(inode);
>  
> -	real = ovl_real_fdget(file);
> +	{

Is this what we want to do from a code cleanliness standpoint?  This feels
pretty ugly to me, I feal like it would be better to have something like

scoped_class(fd_real, real) {
	// code
}

rather than the {} at the same indent level as the underlying block.

I don't feel super strongly about this, but I do feel like we need to either
explicitly say "this is the way/an acceptable way to do this" from a code
formatting standpoint, or we need to come up with a cleaner way of representing
the scoped area.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...)
  2024-07-30 19:10     ` Josef Bacik
@ 2024-07-30 21:12       ` Al Viro
  2024-07-31 21:11         ` Josef Bacik
  0 siblings, 1 reply; 134+ messages in thread
From: Al Viro @ 2024-07-30 21:12 UTC (permalink / raw)
  To: Josef Bacik
  Cc: viro, linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
	torvalds

On Tue, Jul 30, 2024 at 03:10:25PM -0400, Josef Bacik wrote:
> On Tue, Jul 30, 2024 at 01:15:54AM -0400, viro@kernel.org wrote:
> > From: Al Viro <viro@zeniv.linux.org.uk>
> > 
> > There are four places where we end up adding an extra scope
> > covering just the range from constructor to destructor;
> > not sure if that's the best way to handle that.
> > 
> > The functions in question are ovl_write_iter(), ovl_splice_write(),
> > ovl_fadvise() and ovl_copyfile().
> > 
> > This is very likely *NOT* the final form of that thing - it
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > needs to be discussed.

> Is this what we want to do from a code cleanliness standpoint?  This feels
> pretty ugly to me, I feal like it would be better to have something like
> 
> scoped_class(fd_real, real) {
> 	// code
> }
> 
> rather than the {} at the same indent level as the underlying block.
> 
> I don't feel super strongly about this, but I do feel like we need to either
> explicitly say "this is the way/an acceptable way to do this" from a code
> formatting standpoint, or we need to come up with a cleaner way of representing
> the scoped area.

That's a bit painful in these cases - sure, we can do something like
	scoped_class(fd_real, real)(file) {
		if (fd_empty(fd_real)) {
			ret = fd_error(real);
			break;
		}
		old_cred = ovl_override_creds(file_inode(file)->i_sb);
		ret = vfs_fallocate(fd_file(real), mode, offset, len);
		revert_creds(old_cred);

		/* Update size */
		ovl_file_modified(file);  
	}
but that use of break would need to be documented.  And IMO anything like
        scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR,
			   &task->signal->cred_guard_mutex) {
is just distasteful ;-/  Control flow should _not_ be hidden that way;
it's hard on casual reader.

The variant I'd put in there is obviously not suitable for merge - we need
something else, the question is what that something should be...

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...)
  2024-07-30 21:12       ` Al Viro
@ 2024-07-31 21:11         ` Josef Bacik
  0 siblings, 0 replies; 134+ messages in thread
From: Josef Bacik @ 2024-07-31 21:11 UTC (permalink / raw)
  To: Al Viro
  Cc: viro, linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
	torvalds

On Tue, Jul 30, 2024 at 10:12:25PM +0100, Al Viro wrote:
> On Tue, Jul 30, 2024 at 03:10:25PM -0400, Josef Bacik wrote:
> > On Tue, Jul 30, 2024 at 01:15:54AM -0400, viro@kernel.org wrote:
> > > From: Al Viro <viro@zeniv.linux.org.uk>
> > > 
> > > There are four places where we end up adding an extra scope
> > > covering just the range from constructor to destructor;
> > > not sure if that's the best way to handle that.
> > > 
> > > The functions in question are ovl_write_iter(), ovl_splice_write(),
> > > ovl_fadvise() and ovl_copyfile().
> > > 
> > > This is very likely *NOT* the final form of that thing - it
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > needs to be discussed.
> 

Fair, I think I misunderstood what you were unhappy with in that code.

> > Is this what we want to do from a code cleanliness standpoint?  This feels
> > pretty ugly to me, I feal like it would be better to have something like
> > 
> > scoped_class(fd_real, real) {
> > 	// code
> > }
> > 
> > rather than the {} at the same indent level as the underlying block.
> > 
> > I don't feel super strongly about this, but I do feel like we need to either
> > explicitly say "this is the way/an acceptable way to do this" from a code
> > formatting standpoint, or we need to come up with a cleaner way of representing
> > the scoped area.
> 
> That's a bit painful in these cases - sure, we can do something like
> 	scoped_class(fd_real, real)(file) {
> 		if (fd_empty(fd_real)) {
> 			ret = fd_error(real);
> 			break;
> 		}
> 		old_cred = ovl_override_creds(file_inode(file)->i_sb);
> 		ret = vfs_fallocate(fd_file(real), mode, offset, len);
> 		revert_creds(old_cred);
> 
> 		/* Update size */
> 		ovl_file_modified(file);  
> 	}
> but that use of break would need to be documented.  And IMO anything like
>         scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR,
> 			   &task->signal->cred_guard_mutex) {
> is just distasteful ;-/  Control flow should _not_ be hidden that way;
> it's hard on casual reader.
> 
> The variant I'd put in there is obviously not suitable for merge - we need
> something else, the question is what that something should be...

I went and looked at our c++ codebase to see what they do here, and it appears
that this is the accepted norm for this style of scoped variables

{
	CLASS(fd_real, real_out)(file_out);
	// blah blah
}

Looking at our code guidelines this appears to be the widely accepted norm, and
I don't hate it.  I feel like this is more readable than the scoped_class()
idea, and is honestly the cleanest solution.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...)
  2024-07-30  5:15   ` [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...) viro
  2024-07-30 19:10     ` Josef Bacik
@ 2024-08-07 10:23     ` Christian Brauner
  1 sibling, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:23 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:15:54AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> There are four places where we end up adding an extra scope
> covering just the range from constructor to destructor;
> not sure if that's the best way to handle that.

I think it's fine and not worth obsessing about it.
Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 09/39] timerfd: switch to CLASS(fd, ...)
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (6 preceding siblings ...)
  2024-07-30  5:15   ` [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...) viro
@ 2024-07-30  5:15   ` viro
  2024-08-07 10:24     ` Christian Brauner
  2024-07-30  5:15   ` [PATCH 10/39] get rid of perf_fget_light(), convert kernel/events/core.c to CLASS(fd) viro
                     ` (30 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Fold timerfd_fget() into both callers to have fdget() and fdput() in
the same scope.  Could be done in different ways, but this is probably
the smallest solution.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/timerfd.c | 40 ++++++++++++++--------------------------
 1 file changed, 14 insertions(+), 26 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index 137523e0bb21..4c32244b0508 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -394,19 +394,6 @@ static const struct file_operations timerfd_fops = {
 	.unlocked_ioctl	= timerfd_ioctl,
 };
 
-static int timerfd_fget(int fd, struct fd *p)
-{
-	struct fd f = fdget(fd);
-	if (!fd_file(f))
-		return -EBADF;
-	if (fd_file(f)->f_op != &timerfd_fops) {
-		fdput(f);
-		return -EINVAL;
-	}
-	*p = f;
-	return 0;
-}
-
 SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
 {
 	int ufd;
@@ -471,7 +458,6 @@ static int do_timerfd_settime(int ufd, int flags,
 		const struct itimerspec64 *new,
 		struct itimerspec64 *old)
 {
-	struct fd f;
 	struct timerfd_ctx *ctx;
 	int ret;
 
@@ -479,15 +465,17 @@ static int do_timerfd_settime(int ufd, int flags,
 		 !itimerspec64_valid(new))
 		return -EINVAL;
 
-	ret = timerfd_fget(ufd, &f);
-	if (ret)
-		return ret;
+	CLASS(fd, f)(ufd);
+	if (fd_empty(f))
+		return -EBADF;
+
+	if (fd_file(f)->f_op != &timerfd_fops)
+		return -EINVAL;
+
 	ctx = fd_file(f)->private_data;
 
-	if (isalarm(ctx) && !capable(CAP_WAKE_ALARM)) {
-		fdput(f);
+	if (isalarm(ctx) && !capable(CAP_WAKE_ALARM))
 		return -EPERM;
-	}
 
 	timerfd_setup_cancel(ctx, flags);
 
@@ -535,17 +523,18 @@ static int do_timerfd_settime(int ufd, int flags,
 	ret = timerfd_setup(ctx, flags, new);
 
 	spin_unlock_irq(&ctx->wqh.lock);
-	fdput(f);
 	return ret;
 }
 
 static int do_timerfd_gettime(int ufd, struct itimerspec64 *t)
 {
-	struct fd f;
 	struct timerfd_ctx *ctx;
-	int ret = timerfd_fget(ufd, &f);
-	if (ret)
-		return ret;
+	CLASS(fd, f)(ufd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	if (fd_file(f)->f_op != &timerfd_fops)
+		return -EINVAL;
 	ctx = fd_file(f)->private_data;
 
 	spin_lock_irq(&ctx->wqh.lock);
@@ -567,7 +556,6 @@ static int do_timerfd_gettime(int ufd, struct itimerspec64 *t)
 	t->it_value = ktime_to_timespec64(timerfd_get_remaining(ctx));
 	t->it_interval = ktime_to_timespec64(ctx->tintv);
 	spin_unlock_irq(&ctx->wqh.lock);
-	fdput(f);
 	return 0;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 09/39] timerfd: switch to CLASS(fd, ...)
  2024-07-30  5:15   ` [PATCH 09/39] timerfd: switch to CLASS(fd, ...) viro
@ 2024-08-07 10:24     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:24 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:15:55AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Fold timerfd_fget() into both callers to have fdget() and fdput() in
> the same scope.  Could be done in different ways, but this is probably
> the smallest solution.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 10/39] get rid of perf_fget_light(), convert kernel/events/core.c to CLASS(fd)
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (7 preceding siblings ...)
  2024-07-30  5:15   ` [PATCH 09/39] timerfd: switch to CLASS(fd, ...) viro
@ 2024-07-30  5:15   ` viro
  2024-08-07 10:25     ` Christian Brauner
  2024-07-30  5:15   ` [PATCH 11/39] switch netlink_getsockbyfilp() to taking descriptor viro
                     ` (29 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Lift fdget() and fdput() out of perf_fget_light(), turning it into
is_perf_file(struct fd f).  The life gets easier in both callers
if we do fdget() unconditionally, including the case when we are
given -1 instead of a descriptor - that avoids a reassignment in
perf_event_open(2) and it avoids a nasty temptation in _perf_ioctl()
where we must *not* lift output_event out of scope for output.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 kernel/events/core.c | 49 +++++++++++++++-----------------------------
 1 file changed, 16 insertions(+), 33 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index fd2ac9c7fd77..dae815c30514 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5895,18 +5895,9 @@ EXPORT_SYMBOL_GPL(perf_event_period);
 
 static const struct file_operations perf_fops;
 
-static inline int perf_fget_light(int fd, struct fd *p)
+static inline bool is_perf_file(struct fd f)
 {
-	struct fd f = fdget(fd);
-	if (!fd_file(f))
-		return -EBADF;
-
-	if (fd_file(f)->f_op != &perf_fops) {
-		fdput(f);
-		return -EBADF;
-	}
-	*p = f;
-	return 0;
+	return !fd_empty(f) && fd_file(f)->f_op == &perf_fops;
 }
 
 static int perf_event_set_output(struct perf_event *event,
@@ -5954,20 +5945,14 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 
 	case PERF_EVENT_IOC_SET_OUTPUT:
 	{
-		int ret;
+		CLASS(fd, output)(arg);	     // arg == -1 => empty
+		struct perf_event *output_event = NULL;
 		if (arg != -1) {
-			struct perf_event *output_event;
-			struct fd output;
-			ret = perf_fget_light(arg, &output);
-			if (ret)
-				return ret;
+			if (!is_perf_file(output))
+				return -EBADF;
 			output_event = fd_file(output)->private_data;
-			ret = perf_event_set_output(event, output_event);
-			fdput(output);
-		} else {
-			ret = perf_event_set_output(event, NULL);
 		}
-		return ret;
+		return perf_event_set_output(event, output_event);
 	}
 
 	case PERF_EVENT_IOC_SET_FILTER:
@@ -12474,7 +12459,6 @@ SYSCALL_DEFINE5(perf_event_open,
 	struct perf_event_attr attr;
 	struct perf_event_context *ctx;
 	struct file *event_file = NULL;
-	struct fd group = EMPTY_FD;
 	struct task_struct *task = NULL;
 	struct pmu *pmu;
 	int event_fd;
@@ -12545,10 +12529,12 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (event_fd < 0)
 		return event_fd;
 
+	CLASS(fd, group)(group_fd);     // group_fd == -1 => empty
 	if (group_fd != -1) {
-		err = perf_fget_light(group_fd, &group);
-		if (err)
+		if (!is_perf_file(group)) {
+			err = -EBADF;
 			goto err_fd;
+		}
 		group_leader = fd_file(group)->private_data;
 		if (flags & PERF_FLAG_FD_OUTPUT)
 			output_event = group_leader;
@@ -12560,7 +12546,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		task = find_lively_task_by_vpid(pid);
 		if (IS_ERR(task)) {
 			err = PTR_ERR(task);
-			goto err_group_fd;
+			goto err_fd;
 		}
 	}
 
@@ -12827,12 +12813,11 @@ SYSCALL_DEFINE5(perf_event_open,
 	mutex_unlock(&current->perf_event_mutex);
 
 	/*
-	 * Drop the reference on the group_event after placing the
-	 * new event on the sibling_list. This ensures destruction
-	 * of the group leader will find the pointer to itself in
-	 * perf_group_detach().
+	 * File reference in group guarantees that group_leader has been
+	 * kept alive until we place the new event on the sibling_list.
+	 * This ensures destruction of the group leader will find
+	 * the pointer to itself in perf_group_detach().
 	 */
-	fdput(group);
 	fd_install(event_fd, event_file);
 	return event_fd;
 
@@ -12851,8 +12836,6 @@ SYSCALL_DEFINE5(perf_event_open,
 err_task:
 	if (task)
 		put_task_struct(task);
-err_group_fd:
-	fdput(group);
 err_fd:
 	put_unused_fd(event_fd);
 	return err;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 10/39] get rid of perf_fget_light(), convert kernel/events/core.c to CLASS(fd)
  2024-07-30  5:15   ` [PATCH 10/39] get rid of perf_fget_light(), convert kernel/events/core.c to CLASS(fd) viro
@ 2024-08-07 10:25     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:25 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:15:56AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Lift fdget() and fdput() out of perf_fget_light(), turning it into
> is_perf_file(struct fd f).  The life gets easier in both callers
> if we do fdget() unconditionally, including the case when we are
> given -1 instead of a descriptor - that avoids a reassignment in
> perf_event_open(2) and it avoids a nasty temptation in _perf_ioctl()
> where we must *not* lift output_event out of scope for output.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 11/39] switch netlink_getsockbyfilp() to taking descriptor
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (8 preceding siblings ...)
  2024-07-30  5:15   ` [PATCH 10/39] get rid of perf_fget_light(), convert kernel/events/core.c to CLASS(fd) viro
@ 2024-07-30  5:15   ` viro
  2024-08-07 10:26     ` Christian Brauner
  2024-07-30  5:15   ` [PATCH 12/39] do_mq_notify(): saner skb freeing on failures viro
                     ` (28 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/netlink.h  | 2 +-
 ipc/mqueue.c             | 8 +-------
 net/netlink/af_netlink.c | 9 +++++++--
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index b332c2048c75..a48a30842d84 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -239,7 +239,7 @@ int netlink_register_notifier(struct notifier_block *nb);
 int netlink_unregister_notifier(struct notifier_block *nb);
 
 /* finegrained unicast helpers: */
-struct sock *netlink_getsockbyfilp(struct file *filp);
+struct sock *netlink_getsockbyfd(int fd);
 int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
 		      long *timeo, struct sock *ssk);
 void netlink_detachskb(struct sock *sk, struct sk_buff *skb);
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 34fa0bd8bb11..fd05e3d4f7b6 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1355,13 +1355,7 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 			skb_put(nc, NOTIFY_COOKIE_LEN);
 			/* and attach it to the socket */
 retry:
-			f = fdget(notification->sigev_signo);
-			if (!fd_file(f)) {
-				ret = -EBADF;
-				goto out;
-			}
-			sock = netlink_getsockbyfilp(fd_file(f));
-			fdput(f);
+			sock = netlink_getsockbyfd(notification->sigev_signo);
 			if (IS_ERR(sock)) {
 				ret = PTR_ERR(sock);
 				goto free_skb;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 0b7a89db3ab7..42451ac355d0 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1180,11 +1180,16 @@ static struct sock *netlink_getsockbyportid(struct sock *ssk, u32 portid)
 	return sock;
 }
 
-struct sock *netlink_getsockbyfilp(struct file *filp)
+struct sock *netlink_getsockbyfd(int fd)
 {
-	struct inode *inode = file_inode(filp);
+	CLASS(fd, f)(fd);
+	struct inode *inode;
 	struct sock *sock;
 
+	if (fd_empty(f))
+		return ERR_PTR(-EBADF);
+
+	inode = file_inode(fd_file(f));
 	if (!S_ISSOCK(inode->i_mode))
 		return ERR_PTR(-ENOTSOCK);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 11/39] switch netlink_getsockbyfilp() to taking descriptor
  2024-07-30  5:15   ` [PATCH 11/39] switch netlink_getsockbyfilp() to taking descriptor viro
@ 2024-08-07 10:26     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:26 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:15:57AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 12/39] do_mq_notify(): saner skb freeing on failures
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (9 preceding siblings ...)
  2024-07-30  5:15   ` [PATCH 11/39] switch netlink_getsockbyfilp() to taking descriptor viro
@ 2024-07-30  5:15   ` viro
  2024-07-30  5:15   ` [PATCH 13/39] do_mq_notify(): switch to CLASS(fd, ...) viro
                     ` (27 subsequent siblings)
  38 siblings, 0 replies; 134+ messages in thread
From: viro @ 2024-07-30  5:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

cleanup is convoluted enough as it is; it's easier to have early
failure outs do explicit kfree_skb(nc), rather than going to
convolutions needed to reuse the cleanup from late failures.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 ipc/mqueue.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index fd05e3d4f7b6..48640a362637 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1347,8 +1347,8 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 			if (copy_from_user(nc->data,
 					notification->sigev_value.sival_ptr,
 					NOTIFY_COOKIE_LEN)) {
-				ret = -EFAULT;
-				goto free_skb;
+				kfree_skb(nc);
+				return -EFAULT;
 			}
 
 			/* TODO: add a header? */
@@ -1357,16 +1357,14 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 retry:
 			sock = netlink_getsockbyfd(notification->sigev_signo);
 			if (IS_ERR(sock)) {
-				ret = PTR_ERR(sock);
-				goto free_skb;
+				kfree_skb(nc);
+				return PTR_ERR(sock);
 			}
 
 			timeo = MAX_SCHEDULE_TIMEOUT;
 			ret = netlink_attachskb(sock, nc, &timeo, NULL);
-			if (ret == 1) {
-				sock = NULL;
+			if (ret == 1)
 				goto retry;
-			}
 			if (ret)
 				return ret;
 		}
@@ -1425,10 +1423,6 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 out:
 	if (sock)
 		netlink_detachskb(sock, nc);
-	else
-free_skb:
-		dev_kfree_skb(nc);
-
 	return ret;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH 13/39] do_mq_notify(): switch to CLASS(fd, ...)
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (10 preceding siblings ...)
  2024-07-30  5:15   ` [PATCH 12/39] do_mq_notify(): saner skb freeing on failures viro
@ 2024-07-30  5:15   ` viro
  2024-08-07 10:27     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 14/39] simplify xfs_find_handle() a bit viro
                     ` (26 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

semi-simple.  The only failure exit before fdget() is a return, the
only thing done after fdput() is transposable with it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 ipc/mqueue.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 48640a362637..4f1dec518fae 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1317,7 +1317,6 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char __user *, u_msg_ptr,
 static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 {
 	int ret;
-	struct fd f;
 	struct sock *sock;
 	struct inode *inode;
 	struct mqueue_inode_info *info;
@@ -1370,8 +1369,8 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 		}
 	}
 
-	f = fdget(mqdes);
-	if (!fd_file(f)) {
+	CLASS(fd, f)(mqdes);
+	if (fd_empty(f)) {
 		ret = -EBADF;
 		goto out;
 	}
@@ -1379,7 +1378,7 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 	inode = file_inode(fd_file(f));
 	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
 		ret = -EBADF;
-		goto out_fput;
+		goto out;
 	}
 	info = MQUEUE_I(inode);
 
@@ -1418,8 +1417,6 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 		inode_set_atime_to_ts(inode, inode_set_ctime_current(inode));
 	}
 	spin_unlock(&info->lock);
-out_fput:
-	fdput(f);
 out:
 	if (sock)
 		netlink_detachskb(sock, nc);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 13/39] do_mq_notify(): switch to CLASS(fd, ...)
  2024-07-30  5:15   ` [PATCH 13/39] do_mq_notify(): switch to CLASS(fd, ...) viro
@ 2024-08-07 10:27     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:27 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:15:59AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> semi-simple.  The only failure exit before fdget() is a return, the
> only thing done after fdput() is transposable with it.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 14/39] simplify xfs_find_handle() a bit
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (11 preceding siblings ...)
  2024-07-30  5:15   ` [PATCH 13/39] do_mq_notify(): switch to CLASS(fd, ...) viro
@ 2024-07-30  5:16   ` viro
  2024-07-30  5:16   ` [PATCH 15/39] convert vmsplice() to CLASS(fd, ...) viro
                     ` (25 subsequent siblings)
  38 siblings, 0 replies; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

XFS_IOC_FD_TO_HANDLE can grab a reference to copied ->f_path and
let the file go; results in simpler control flow - cleanup is
the same for both "by descriptor" and "by pathname" cases.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/xfs/xfs_handle.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
index 49e5e5f04e60..f19fce557354 100644
--- a/fs/xfs/xfs_handle.c
+++ b/fs/xfs/xfs_handle.c
@@ -85,22 +85,23 @@ xfs_find_handle(
 	int			hsize;
 	xfs_handle_t		handle;
 	struct inode		*inode;
-	struct fd		f = EMPTY_FD;
 	struct path		path;
 	int			error;
 	struct xfs_inode	*ip;
 
 	if (cmd == XFS_IOC_FD_TO_HANDLE) {
-		f = fdget(hreq->fd);
-		if (!fd_file(f))
+		CLASS(fd, f)(hreq->fd);
+
+		if (fd_empty(f))
 			return -EBADF;
-		inode = file_inode(fd_file(f));
+		path = fd_file(f)->f_path;
+		path_get(&path);
 	} else {
 		error = user_path_at(AT_FDCWD, hreq->path, 0, &path);
 		if (error)
 			return error;
-		inode = d_inode(path.dentry);
 	}
+	inode = d_inode(path.dentry);
 	ip = XFS_I(inode);
 
 	/*
@@ -134,10 +135,7 @@ xfs_find_handle(
 	error = 0;
 
  out_put:
-	if (cmd == XFS_IOC_FD_TO_HANDLE)
-		fdput(f);
-	else
-		path_put(&path);
+	path_put(&path);
 	return error;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH 15/39] convert vmsplice() to CLASS(fd, ...)
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (12 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 14/39] simplify xfs_find_handle() a bit viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:27     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 16/39] convert __bpf_prog_get() " viro
                     ` (24 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Irregularity here is fdput() not in the same scope as fdget(); we could
just lift it out vmsplice_type() in vmsplice(2), but there's no much
point keeping vmsplice_type() separate after that...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/splice.c | 33 ++++++++++-----------------------
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 06232d7e505f..29cd39d7f4a0 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1564,21 +1564,6 @@ static ssize_t vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
 	return ret;
 }
 
-static int vmsplice_type(struct fd f, int *type)
-{
-	if (!fd_file(f))
-		return -EBADF;
-	if (fd_file(f)->f_mode & FMODE_WRITE) {
-		*type = ITER_SOURCE;
-	} else if (fd_file(f)->f_mode & FMODE_READ) {
-		*type = ITER_DEST;
-	} else {
-		fdput(f);
-		return -EBADF;
-	}
-	return 0;
-}
-
 /*
  * Note that vmsplice only really supports true splicing _from_ user memory
  * to a pipe, not the other way around. Splicing from user memory is a simple
@@ -1602,21 +1587,25 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 	struct iovec *iov = iovstack;
 	struct iov_iter iter;
 	ssize_t error;
-	struct fd f;
 	int type;
 
 	if (unlikely(flags & ~SPLICE_F_ALL))
 		return -EINVAL;
 
-	f = fdget(fd);
-	error = vmsplice_type(f, &type);
-	if (error)
-		return error;
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
+		return -EBADF;
+	if (fd_file(f)->f_mode & FMODE_WRITE)
+		type = ITER_SOURCE;
+	else if (fd_file(f)->f_mode & FMODE_READ)
+		type = ITER_DEST;
+	else
+		return -EBADF;
 
 	error = import_iovec(type, uiov, nr_segs,
 			     ARRAY_SIZE(iovstack), &iov, &iter);
 	if (error < 0)
-		goto out_fdput;
+		return error;
 
 	if (!iov_iter_count(&iter))
 		error = 0;
@@ -1626,8 +1615,6 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 		error = vmsplice_to_user(fd_file(f), &iter, flags);
 
 	kfree(iov);
-out_fdput:
-	fdput(f);
 	return error;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 15/39] convert vmsplice() to CLASS(fd, ...)
  2024-07-30  5:16   ` [PATCH 15/39] convert vmsplice() to CLASS(fd, ...) viro
@ 2024-08-07 10:27     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:27 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:01AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Irregularity here is fdput() not in the same scope as fdget(); we could
> just lift it out vmsplice_type() in vmsplice(2), but there's no much
> point keeping vmsplice_type() separate after that...
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 16/39] convert __bpf_prog_get() to CLASS(fd, ...)
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (13 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 15/39] convert vmsplice() to CLASS(fd, ...) viro
@ 2024-07-30  5:16   ` viro
  2024-08-06 21:08     ` Andrii Nakryiko
  2024-08-07 10:28     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper viro
                     ` (23 subsequent siblings)
  38 siblings, 2 replies; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Irregularity here is fdput() not in the same scope as fdget();
just fold ____bpf_prog_get() into its (only) caller and that's
it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 kernel/bpf/syscall.c | 32 +++++++++++---------------------
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3093bf2cc266..c5b252c0faa8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2407,18 +2407,6 @@ int bpf_prog_new_fd(struct bpf_prog *prog)
 				O_RDWR | O_CLOEXEC);
 }
 
-static struct bpf_prog *____bpf_prog_get(struct fd f)
-{
-	if (!fd_file(f))
-		return ERR_PTR(-EBADF);
-	if (fd_file(f)->f_op != &bpf_prog_fops) {
-		fdput(f);
-		return ERR_PTR(-EINVAL);
-	}
-
-	return fd_file(f)->private_data;
-}
-
 void bpf_prog_add(struct bpf_prog *prog, int i)
 {
 	atomic64_add(i, &prog->aux->refcnt);
@@ -2474,20 +2462,22 @@ bool bpf_prog_get_ok(struct bpf_prog *prog,
 static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *attach_type,
 				       bool attach_drv)
 {
-	struct fd f = fdget(ufd);
+	CLASS(fd, f)(ufd);
 	struct bpf_prog *prog;
 
-	prog = ____bpf_prog_get(f);
-	if (IS_ERR(prog))
+	if (fd_empty(f))
+		return ERR_PTR(-EBADF);
+	if (fd_file(f)->f_op != &bpf_prog_fops)
+		return ERR_PTR(-EINVAL);
+
+	prog = fd_file(f)->private_data;
+	if (IS_ERR(prog))	// can that actually happen?
 		return prog;
-	if (!bpf_prog_get_ok(prog, attach_type, attach_drv)) {
-		prog = ERR_PTR(-EINVAL);
-		goto out;
-	}
+
+	if (!bpf_prog_get_ok(prog, attach_type, attach_drv))
+		return ERR_PTR(-EINVAL);
 
 	bpf_prog_inc(prog);
-out:
-	fdput(f);
 	return prog;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 16/39] convert __bpf_prog_get() to CLASS(fd, ...)
  2024-07-30  5:16   ` [PATCH 16/39] convert __bpf_prog_get() " viro
@ 2024-08-06 21:08     ` Andrii Nakryiko
  2024-08-07 10:28     ` Christian Brauner
  1 sibling, 0 replies; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-06 21:08 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
	torvalds

On Mon, Jul 29, 2024 at 10:19 PM <viro@kernel.org> wrote:
>
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> Irregularity here is fdput() not in the same scope as fdget();
> just fold ____bpf_prog_get() into its (only) caller and that's
> it...
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  kernel/bpf/syscall.c | 32 +++++++++++---------------------
>  1 file changed, 11 insertions(+), 21 deletions(-)
>

Folding makes total sense, the logic lgtm (though I find CLASS(fd,
f)(ufd) utterly non-intuitive naming-wise). Extra IS_ERR(prog) check
should be dropped though, see below.

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 3093bf2cc266..c5b252c0faa8 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2407,18 +2407,6 @@ int bpf_prog_new_fd(struct bpf_prog *prog)
>                                 O_RDWR | O_CLOEXEC);
>  }
>
> -static struct bpf_prog *____bpf_prog_get(struct fd f)
> -{
> -       if (!fd_file(f))
> -               return ERR_PTR(-EBADF);
> -       if (fd_file(f)->f_op != &bpf_prog_fops) {
> -               fdput(f);
> -               return ERR_PTR(-EINVAL);
> -       }
> -
> -       return fd_file(f)->private_data;
> -}
> -
>  void bpf_prog_add(struct bpf_prog *prog, int i)
>  {
>         atomic64_add(i, &prog->aux->refcnt);
> @@ -2474,20 +2462,22 @@ bool bpf_prog_get_ok(struct bpf_prog *prog,
>  static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *attach_type,
>                                        bool attach_drv)
>  {
> -       struct fd f = fdget(ufd);
> +       CLASS(fd, f)(ufd);
>         struct bpf_prog *prog;
>
> -       prog = ____bpf_prog_get(f);
> -       if (IS_ERR(prog))
> +       if (fd_empty(f))
> +               return ERR_PTR(-EBADF);
> +       if (fd_file(f)->f_op != &bpf_prog_fops)
> +               return ERR_PTR(-EINVAL);
> +
> +       prog = fd_file(f)->private_data;
> +       if (IS_ERR(prog))       // can that actually happen?

no, it can't, private_data will always be a valid pointer, otherwise
that file would never be successfully created

>                 return prog;
> -       if (!bpf_prog_get_ok(prog, attach_type, attach_drv)) {
> -               prog = ERR_PTR(-EINVAL);
> -               goto out;
> -       }
> +
> +       if (!bpf_prog_get_ok(prog, attach_type, attach_drv))
> +               return ERR_PTR(-EINVAL);
>
>         bpf_prog_inc(prog);
> -out:
> -       fdput(f);
>         return prog;
>  }
>
> --
> 2.39.2
>
>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 16/39] convert __bpf_prog_get() to CLASS(fd, ...)
  2024-07-30  5:16   ` [PATCH 16/39] convert __bpf_prog_get() " viro
  2024-08-06 21:08     ` Andrii Nakryiko
@ 2024-08-07 10:28     ` Christian Brauner
  1 sibling, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:28 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:02AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Irregularity here is fdput() not in the same scope as fdget();
> just fold ____bpf_prog_get() into its (only) caller and that's
> it...
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  kernel/bpf/syscall.c | 32 +++++++++++---------------------
>  1 file changed, 11 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 3093bf2cc266..c5b252c0faa8 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2407,18 +2407,6 @@ int bpf_prog_new_fd(struct bpf_prog *prog)
>  				O_RDWR | O_CLOEXEC);
>  }
>  
> -static struct bpf_prog *____bpf_prog_get(struct fd f)
> -{
> -	if (!fd_file(f))
> -		return ERR_PTR(-EBADF);
> -	if (fd_file(f)->f_op != &bpf_prog_fops) {
> -		fdput(f);
> -		return ERR_PTR(-EINVAL);
> -	}
> -
> -	return fd_file(f)->private_data;
> -}
> -
>  void bpf_prog_add(struct bpf_prog *prog, int i)
>  {
>  	atomic64_add(i, &prog->aux->refcnt);
> @@ -2474,20 +2462,22 @@ bool bpf_prog_get_ok(struct bpf_prog *prog,
>  static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *attach_type,
>  				       bool attach_drv)
>  {
> -	struct fd f = fdget(ufd);
> +	CLASS(fd, f)(ufd);
>  	struct bpf_prog *prog;
>  
> -	prog = ____bpf_prog_get(f);
> -	if (IS_ERR(prog))
> +	if (fd_empty(f))
> +		return ERR_PTR(-EBADF);
> +	if (fd_file(f)->f_op != &bpf_prog_fops)
> +		return ERR_PTR(-EINVAL);
> +
> +	prog = fd_file(f)->private_data;
> +	if (IS_ERR(prog))	// can that actually happen?
>  		return prog;

With or without that removed:

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (14 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 16/39] convert __bpf_prog_get() " viro
@ 2024-07-30  5:16   ` viro
  2024-08-06 22:32     ` Andrii Nakryiko
  2024-07-30  5:16   ` [PATCH 18/39] bpf maps: switch to CLASS(fd, ...) viro
                     ` (22 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Equivalent transformation.  For one thing, it's easier to follow that way.
For another, that simplifies the control flow in the vicinity of struct fd
handling in there, which will allow a switch to CLASS(fd) and make the
thing much easier to verify wrt leaks.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
 1 file changed, 172 insertions(+), 170 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4cb5441ad75f..d54f61e12062 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18414,11 +18414,180 @@ static bool bpf_map_is_cgroup_storage(struct bpf_map *map)
  *
  * NOTE: btf_vmlinux is required for converting pseudo btf_id.
  */
+static int do_one_ldimm64(struct bpf_verifier_env *env,
+			  struct bpf_insn *insn,
+			  struct bpf_insn_aux_data *aux)
+{
+	struct bpf_map *map;
+	struct fd f;
+	u64 addr;
+	u32 fd;
+	int err;
+
+	if (insn[0].src_reg == 0)
+		/* valid generic load 64-bit imm */
+		return 0;
+
+	if (insn[0].src_reg == BPF_PSEUDO_BTF_ID)
+		return check_pseudo_btf_id(env, insn, aux);
+
+	if (insn[0].src_reg == BPF_PSEUDO_FUNC) {
+		aux->ptr_type = PTR_TO_FUNC;
+		return 0;
+	}
+
+	/* In final convert_pseudo_ld_imm64() step, this is
+	 * converted into regular 64-bit imm load insn.
+	 */
+	switch (insn[0].src_reg) {
+	case BPF_PSEUDO_MAP_VALUE:
+	case BPF_PSEUDO_MAP_IDX_VALUE:
+		break;
+	case BPF_PSEUDO_MAP_FD:
+	case BPF_PSEUDO_MAP_IDX:
+		if (insn[1].imm == 0)
+			break;
+		fallthrough;
+	default:
+		verbose(env, "unrecognized bpf_ld_imm64 insn\n");
+		return -EINVAL;
+	}
+
+	switch (insn[0].src_reg) {
+	case BPF_PSEUDO_MAP_IDX_VALUE:
+	case BPF_PSEUDO_MAP_IDX:
+		if (bpfptr_is_null(env->fd_array)) {
+			verbose(env, "fd_idx without fd_array is invalid\n");
+			return -EPROTO;
+		}
+		if (copy_from_bpfptr_offset(&fd, env->fd_array,
+					    insn[0].imm * sizeof(fd),
+					    sizeof(fd)))
+			return -EFAULT;
+		break;
+	default:
+		fd = insn[0].imm;
+		break;
+	}
+
+	f = fdget(fd);
+	map = __bpf_map_get(f);
+	if (IS_ERR(map)) {
+		verbose(env, "fd %d is not pointing to valid bpf_map\n", fd);
+		return PTR_ERR(map);
+	}
+
+	err = check_map_prog_compatibility(env, map, env->prog);
+	if (err) {
+		fdput(f);
+		return err;
+	}
+
+	if (insn[0].src_reg == BPF_PSEUDO_MAP_FD ||
+	    insn[0].src_reg == BPF_PSEUDO_MAP_IDX) {
+		addr = (unsigned long)map;
+	} else {
+		u32 off = insn[1].imm;
+
+		if (off >= BPF_MAX_VAR_OFF) {
+			verbose(env, "direct value offset of %u is not allowed\n", off);
+			fdput(f);
+			return -EINVAL;
+		}
+
+		if (!map->ops->map_direct_value_addr) {
+			verbose(env, "no direct value access support for this map type\n");
+			fdput(f);
+			return -EINVAL;
+		}
+
+		err = map->ops->map_direct_value_addr(map, &addr, off);
+		if (err) {
+			verbose(env, "invalid access to map value pointer, value_size=%u off=%u\n",
+				map->value_size, off);
+			fdput(f);
+			return err;
+		}
+
+		aux->map_off = off;
+		addr += off;
+	}
+
+	insn[0].imm = (u32)addr;
+	insn[1].imm = addr >> 32;
+
+	/* check whether we recorded this map already */
+	for (int i = 0; i < env->used_map_cnt; i++) {
+		if (env->used_maps[i] == map) {
+			aux->map_index = i;
+			fdput(f);
+			return 0;
+		}
+	}
+
+	if (env->used_map_cnt >= MAX_USED_MAPS) {
+		verbose(env, "The total number of maps per program has reached the limit of %u\n",
+			MAX_USED_MAPS);
+		fdput(f);
+		return -E2BIG;
+	}
+
+	if (env->prog->sleepable)
+		atomic64_inc(&map->sleepable_refcnt);
+	/* hold the map. If the program is rejected by verifier,
+	 * the map will be released by release_maps() or it
+	 * will be used by the valid program until it's unloaded
+	 * and all maps are released in bpf_free_used_maps()
+	 */
+	bpf_map_inc(map);
+
+	aux->map_index = env->used_map_cnt;
+	env->used_maps[env->used_map_cnt++] = map;
+
+	if (bpf_map_is_cgroup_storage(map) &&
+	    bpf_cgroup_storage_assign(env->prog->aux, map)) {
+		verbose(env, "only one cgroup storage of each type is allowed\n");
+		fdput(f);
+		return -EBUSY;
+	}
+	if (map->map_type == BPF_MAP_TYPE_ARENA) {
+		if (env->prog->aux->arena) {
+			verbose(env, "Only one arena per program\n");
+			fdput(f);
+			return -EBUSY;
+		}
+		if (!env->allow_ptr_leaks || !env->bpf_capable) {
+			verbose(env, "CAP_BPF and CAP_PERFMON are required to use arena\n");
+			fdput(f);
+			return -EPERM;
+		}
+		if (!env->prog->jit_requested) {
+			verbose(env, "JIT is required to use arena\n");
+			fdput(f);
+			return -EOPNOTSUPP;
+		}
+		if (!bpf_jit_supports_arena()) {
+			verbose(env, "JIT doesn't support arena\n");
+			fdput(f);
+			return -EOPNOTSUPP;
+		}
+		env->prog->aux->arena = (void *)map;
+		if (!bpf_arena_get_user_vm_start(env->prog->aux->arena)) {
+			verbose(env, "arena's user address must be set via map_extra or mmap()\n");
+			fdput(f);
+			return -EINVAL;
+		}
+	}
+
+	fdput(f);
+	return 0;
+}
+
 static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
 {
 	struct bpf_insn *insn = env->prog->insnsi;
 	int insn_cnt = env->prog->len;
-	int i, j, err;
+	int i, err;
 
 	err = bpf_prog_calc_tag(env->prog);
 	if (err)
@@ -18433,12 +18602,6 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
 		}
 
 		if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
-			struct bpf_insn_aux_data *aux;
-			struct bpf_map *map;
-			struct fd f;
-			u64 addr;
-			u32 fd;
-
 			if (i == insn_cnt - 1 || insn[1].code != 0 ||
 			    insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||
 			    insn[1].off != 0) {
@@ -18446,170 +18609,9 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
 				return -EINVAL;
 			}
 
-			if (insn[0].src_reg == 0)
-				/* valid generic load 64-bit imm */
-				goto next_insn;
-
-			if (insn[0].src_reg == BPF_PSEUDO_BTF_ID) {
-				aux = &env->insn_aux_data[i];
-				err = check_pseudo_btf_id(env, insn, aux);
-				if (err)
-					return err;
-				goto next_insn;
-			}
-
-			if (insn[0].src_reg == BPF_PSEUDO_FUNC) {
-				aux = &env->insn_aux_data[i];
-				aux->ptr_type = PTR_TO_FUNC;
-				goto next_insn;
-			}
-
-			/* In final convert_pseudo_ld_imm64() step, this is
-			 * converted into regular 64-bit imm load insn.
-			 */
-			switch (insn[0].src_reg) {
-			case BPF_PSEUDO_MAP_VALUE:
-			case BPF_PSEUDO_MAP_IDX_VALUE:
-				break;
-			case BPF_PSEUDO_MAP_FD:
-			case BPF_PSEUDO_MAP_IDX:
-				if (insn[1].imm == 0)
-					break;
-				fallthrough;
-			default:
-				verbose(env, "unrecognized bpf_ld_imm64 insn\n");
-				return -EINVAL;
-			}
-
-			switch (insn[0].src_reg) {
-			case BPF_PSEUDO_MAP_IDX_VALUE:
-			case BPF_PSEUDO_MAP_IDX:
-				if (bpfptr_is_null(env->fd_array)) {
-					verbose(env, "fd_idx without fd_array is invalid\n");
-					return -EPROTO;
-				}
-				if (copy_from_bpfptr_offset(&fd, env->fd_array,
-							    insn[0].imm * sizeof(fd),
-							    sizeof(fd)))
-					return -EFAULT;
-				break;
-			default:
-				fd = insn[0].imm;
-				break;
-			}
-
-			f = fdget(fd);
-			map = __bpf_map_get(f);
-			if (IS_ERR(map)) {
-				verbose(env, "fd %d is not pointing to valid bpf_map\n", fd);
-				return PTR_ERR(map);
-			}
-
-			err = check_map_prog_compatibility(env, map, env->prog);
-			if (err) {
-				fdput(f);
+			err = do_one_ldimm64(env, insn, &env->insn_aux_data[i]);
+			if (err)
 				return err;
-			}
-
-			aux = &env->insn_aux_data[i];
-			if (insn[0].src_reg == BPF_PSEUDO_MAP_FD ||
-			    insn[0].src_reg == BPF_PSEUDO_MAP_IDX) {
-				addr = (unsigned long)map;
-			} else {
-				u32 off = insn[1].imm;
-
-				if (off >= BPF_MAX_VAR_OFF) {
-					verbose(env, "direct value offset of %u is not allowed\n", off);
-					fdput(f);
-					return -EINVAL;
-				}
-
-				if (!map->ops->map_direct_value_addr) {
-					verbose(env, "no direct value access support for this map type\n");
-					fdput(f);
-					return -EINVAL;
-				}
-
-				err = map->ops->map_direct_value_addr(map, &addr, off);
-				if (err) {
-					verbose(env, "invalid access to map value pointer, value_size=%u off=%u\n",
-						map->value_size, off);
-					fdput(f);
-					return err;
-				}
-
-				aux->map_off = off;
-				addr += off;
-			}
-
-			insn[0].imm = (u32)addr;
-			insn[1].imm = addr >> 32;
-
-			/* check whether we recorded this map already */
-			for (j = 0; j < env->used_map_cnt; j++) {
-				if (env->used_maps[j] == map) {
-					aux->map_index = j;
-					fdput(f);
-					goto next_insn;
-				}
-			}
-
-			if (env->used_map_cnt >= MAX_USED_MAPS) {
-				verbose(env, "The total number of maps per program has reached the limit of %u\n",
-					MAX_USED_MAPS);
-				fdput(f);
-				return -E2BIG;
-			}
-
-			if (env->prog->sleepable)
-				atomic64_inc(&map->sleepable_refcnt);
-			/* hold the map. If the program is rejected by verifier,
-			 * the map will be released by release_maps() or it
-			 * will be used by the valid program until it's unloaded
-			 * and all maps are released in bpf_free_used_maps()
-			 */
-			bpf_map_inc(map);
-
-			aux->map_index = env->used_map_cnt;
-			env->used_maps[env->used_map_cnt++] = map;
-
-			if (bpf_map_is_cgroup_storage(map) &&
-			    bpf_cgroup_storage_assign(env->prog->aux, map)) {
-				verbose(env, "only one cgroup storage of each type is allowed\n");
-				fdput(f);
-				return -EBUSY;
-			}
-			if (map->map_type == BPF_MAP_TYPE_ARENA) {
-				if (env->prog->aux->arena) {
-					verbose(env, "Only one arena per program\n");
-					fdput(f);
-					return -EBUSY;
-				}
-				if (!env->allow_ptr_leaks || !env->bpf_capable) {
-					verbose(env, "CAP_BPF and CAP_PERFMON are required to use arena\n");
-					fdput(f);
-					return -EPERM;
-				}
-				if (!env->prog->jit_requested) {
-					verbose(env, "JIT is required to use arena\n");
-					fdput(f);
-					return -EOPNOTSUPP;
-				}
-				if (!bpf_jit_supports_arena()) {
-					verbose(env, "JIT doesn't support arena\n");
-					fdput(f);
-					return -EOPNOTSUPP;
-				}
-				env->prog->aux->arena = (void *)map;
-				if (!bpf_arena_get_user_vm_start(env->prog->aux->arena)) {
-					verbose(env, "arena's user address must be set via map_extra or mmap()\n");
-					fdput(f);
-					return -EINVAL;
-				}
-			}
-
-			fdput(f);
-next_insn:
 			insn++;
 			i++;
 			continue;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
  2024-07-30  5:16   ` [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper viro
@ 2024-08-06 22:32     ` Andrii Nakryiko
  2024-08-07 10:29       ` Christian Brauner
  0 siblings, 1 reply; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-06 22:32 UTC (permalink / raw)
  To: viro, bpf
  Cc: linux-fsdevel, amir73il, brauner, cgroups, kvm, netdev, torvalds

On Mon, Jul 29, 2024 at 10:20 PM <viro@kernel.org> wrote:
>
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> Equivalent transformation.  For one thing, it's easier to follow that way.
> For another, that simplifies the control flow in the vicinity of struct fd
> handling in there, which will allow a switch to CLASS(fd) and make the
> thing much easier to verify wrt leaks.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
>  1 file changed, 172 insertions(+), 170 deletions(-)
>

This looks unnecessarily intrusive. I think it's best to extract the
logic of fetching and adding bpf_map by fd into a helper and that way
contain fdget + fdput logic nicely. Something like below, which I can
send to bpf-next.

commit b5eec08241cc0263e560551de91eda73ccc5987d
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Aug 6 14:31:34 2024 -0700

    bpf: factor out fetching bpf_map from FD and adding it to used_maps list

    Factor out the logic to extract bpf_map instances from FD embedded in
    bpf_insns, adding it to the list of used_maps (unless it's already
    there, in which case we just reuse map's index). This simplifies the
    logic in resolve_pseudo_ldimm64(), especially around `struct fd`
    handling, as all that is now neatly contained in the helper and doesn't
    leak into a dozen error handling paths.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index df3be12096cf..14e4ef687a59 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18865,6 +18865,58 @@ static bool bpf_map_is_cgroup_storage(struct
bpf_map *map)
         map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
 }

+/* Add map behind fd to used maps list, if it's not already there, and return
+ * its index. Also set *reused to true if this map was already in the list of
+ * used maps.
+ * Returns <0 on error, or >= 0 index, on success.
+ */
+static int add_used_map_from_fd(struct bpf_verifier_env *env, int fd,
bool *reused)
+{
+    struct fd f = fdget(fd);
+    struct bpf_map *map;
+    int i;
+
+    map = __bpf_map_get(f);
+    if (IS_ERR(map)) {
+        verbose(env, "fd %d is not pointing to valid bpf_map\n", fd);
+        return PTR_ERR(map);
+    }
+
+    /* check whether we recorded this map already */
+    for (i = 0; i < env->used_map_cnt; i++) {
+        if (env->used_maps[i] == map) {
+            *reused = true;
+            fdput(f);
+            return i;
+        }
+    }
+
+    if (env->used_map_cnt >= MAX_USED_MAPS) {
+        verbose(env, "The total number of maps per program has
reached the limit of %u\n",
+            MAX_USED_MAPS);
+        fdput(f);
+        return -E2BIG;
+    }
+
+    if (env->prog->sleepable)
+        atomic64_inc(&map->sleepable_refcnt);
+
+    /* hold the map. If the program is rejected by verifier,
+     * the map will be released by release_maps() or it
+     * will be used by the valid program until it's unloaded
+     * and all maps are released in bpf_free_used_maps()
+     */
+    bpf_map_inc(map);
+
+    *reused = false;
+    env->used_maps[env->used_map_cnt++] = map;
+
+    fdput(f);
+
+    return env->used_map_cnt - 1;
+
+}
+
 /* find and rewrite pseudo imm in ld_imm64 instructions:
  *
  * 1. if it accesses map FD, replace it with actual map pointer.
@@ -18876,7 +18928,7 @@ static int resolve_pseudo_ldimm64(struct
bpf_verifier_env *env)
 {
     struct bpf_insn *insn = env->prog->insnsi;
     int insn_cnt = env->prog->len;
-    int i, j, err;
+    int i, err;

     err = bpf_prog_calc_tag(env->prog);
     if (err)
@@ -18893,9 +18945,10 @@ static int resolve_pseudo_ldimm64(struct
bpf_verifier_env *env)
         if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
             struct bpf_insn_aux_data *aux;
             struct bpf_map *map;
-            struct fd f;
+            int map_idx;
             u64 addr;
             u32 fd;
+            bool reused;

             if (i == insn_cnt - 1 || insn[1].code != 0 ||
                 insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||
@@ -18956,20 +19009,18 @@ static int resolve_pseudo_ldimm64(struct
bpf_verifier_env *env)
                 break;
             }

-            f = fdget(fd);
-            map = __bpf_map_get(f);
-            if (IS_ERR(map)) {
-                verbose(env, "fd %d is not pointing to valid bpf_map\n", fd);
-                return PTR_ERR(map);
-            }
+            map_idx = add_used_map_from_fd(env, fd, &reused);
+            if (map_idx < 0)
+                return map_idx;
+            map = env->used_maps[map_idx];
+
+            aux = &env->insn_aux_data[i];
+            aux->map_index = map_idx;

             err = check_map_prog_compatibility(env, map, env->prog);
-            if (err) {
-                fdput(f);
+            if (err)
                 return err;
-            }

-            aux = &env->insn_aux_data[i];
             if (insn[0].src_reg == BPF_PSEUDO_MAP_FD ||
                 insn[0].src_reg == BPF_PSEUDO_MAP_IDX) {
                 addr = (unsigned long)map;
@@ -18978,13 +19029,11 @@ static int resolve_pseudo_ldimm64(struct
bpf_verifier_env *env)

                 if (off >= BPF_MAX_VAR_OFF) {
                     verbose(env, "direct value offset of %u is not
allowed\n", off);
-                    fdput(f);
                     return -EINVAL;
                 }

                 if (!map->ops->map_direct_value_addr) {
                     verbose(env, "no direct value access support for
this map type\n");
-                    fdput(f);
                     return -EINVAL;
                 }

@@ -18992,7 +19041,6 @@ static int resolve_pseudo_ldimm64(struct
bpf_verifier_env *env)
                 if (err) {
                     verbose(env, "invalid access to map value
pointer, value_size=%u off=%u\n",
                         map->value_size, off);
-                    fdput(f);
                     return err;
                 }

@@ -19003,70 +19051,39 @@ static int resolve_pseudo_ldimm64(struct
bpf_verifier_env *env)
             insn[0].imm = (u32)addr;
             insn[1].imm = addr >> 32;

-            /* check whether we recorded this map already */
-            for (j = 0; j < env->used_map_cnt; j++) {
-                if (env->used_maps[j] == map) {
-                    aux->map_index = j;
-                    fdput(f);
-                    goto next_insn;
-                }
-            }
-
-            if (env->used_map_cnt >= MAX_USED_MAPS) {
-                verbose(env, "The total number of maps per program
has reached the limit of %u\n",
-                    MAX_USED_MAPS);
-                fdput(f);
-                return -E2BIG;
-            }
-
-            if (env->prog->sleepable)
-                atomic64_inc(&map->sleepable_refcnt);
-            /* hold the map. If the program is rejected by verifier,
-             * the map will be released by release_maps() or it
-             * will be used by the valid program until it's unloaded
-             * and all maps are released in bpf_free_used_maps()
-             */
-            bpf_map_inc(map);
-
-            aux->map_index = env->used_map_cnt;
-            env->used_maps[env->used_map_cnt++] = map;
+            /* proceed with extra checks only if its newly added used map */
+            if (reused)
+                goto next_insn;

             if (bpf_map_is_cgroup_storage(map) &&
                 bpf_cgroup_storage_assign(env->prog->aux, map)) {
                 verbose(env, "only one cgroup storage of each type is
allowed\n");
-                fdput(f);
                 return -EBUSY;
             }
             if (map->map_type == BPF_MAP_TYPE_ARENA) {
                 if (env->prog->aux->arena) {
                     verbose(env, "Only one arena per program\n");
-                    fdput(f);
                     return -EBUSY;
                 }
                 if (!env->allow_ptr_leaks || !env->bpf_capable) {
                     verbose(env, "CAP_BPF and CAP_PERFMON are
required to use arena\n");
-                    fdput(f);
                     return -EPERM;
                 }
                 if (!env->prog->jit_requested) {
                     verbose(env, "JIT is required to use arena\n");
-                    fdput(f);
                     return -EOPNOTSUPP;
                 }
                 if (!bpf_jit_supports_arena()) {
                     verbose(env, "JIT doesn't support arena\n");
-                    fdput(f);
                     return -EOPNOTSUPP;
                 }
                 env->prog->aux->arena = (void *)map;
                 if (!bpf_arena_get_user_vm_start(env->prog->aux->arena)) {
                     verbose(env, "arena's user address must be set
via map_extra or mmap()\n");
-                    fdput(f);
                     return -EINVAL;
                 }
             }

-            fdput(f);
 next_insn:
             insn++;
             i++;



[...]

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
  2024-08-06 22:32     ` Andrii Nakryiko
@ 2024-08-07 10:29       ` Christian Brauner
  2024-08-07 15:30         ` Andrii Nakryiko
  0 siblings, 1 reply; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:29 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: viro, bpf, linux-fsdevel, amir73il, cgroups, kvm, netdev,
	torvalds

On Tue, Aug 06, 2024 at 03:32:20PM GMT, Andrii Nakryiko wrote:
> On Mon, Jul 29, 2024 at 10:20 PM <viro@kernel.org> wrote:
> >
> > From: Al Viro <viro@zeniv.linux.org.uk>
> >
> > Equivalent transformation.  For one thing, it's easier to follow that way.
> > For another, that simplifies the control flow in the vicinity of struct fd
> > handling in there, which will allow a switch to CLASS(fd) and make the
> > thing much easier to verify wrt leaks.
> >
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > ---
> >  kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
> >  1 file changed, 172 insertions(+), 170 deletions(-)
> >
> 
> This looks unnecessarily intrusive. I think it's best to extract the
> logic of fetching and adding bpf_map by fd into a helper and that way
> contain fdget + fdput logic nicely. Something like below, which I can
> send to bpf-next.
> 
> commit b5eec08241cc0263e560551de91eda73ccc5987d
> Author: Andrii Nakryiko <andrii@kernel.org>
> Date:   Tue Aug 6 14:31:34 2024 -0700
> 
>     bpf: factor out fetching bpf_map from FD and adding it to used_maps list
> 
>     Factor out the logic to extract bpf_map instances from FD embedded in
>     bpf_insns, adding it to the list of used_maps (unless it's already
>     there, in which case we just reuse map's index). This simplifies the
>     logic in resolve_pseudo_ldimm64(), especially around `struct fd`
>     handling, as all that is now neatly contained in the helper and doesn't
>     leak into a dozen error handling paths.
> 
>     Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index df3be12096cf..14e4ef687a59 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -18865,6 +18865,58 @@ static bool bpf_map_is_cgroup_storage(struct
> bpf_map *map)
>          map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
>  }
> 
> +/* Add map behind fd to used maps list, if it's not already there, and return
> + * its index. Also set *reused to true if this map was already in the list of
> + * used maps.
> + * Returns <0 on error, or >= 0 index, on success.
> + */
> +static int add_used_map_from_fd(struct bpf_verifier_env *env, int fd,
> bool *reused)
> +{
> +    struct fd f = fdget(fd);

Use CLASS(fd, f)(fd) and you can avoid all that fdput() stuff.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
  2024-08-07 10:29       ` Christian Brauner
@ 2024-08-07 15:30         ` Andrii Nakryiko
  2024-08-08 16:51           ` Alexei Starovoitov
  0 siblings, 1 reply; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-07 15:30 UTC (permalink / raw)
  To: Christian Brauner
  Cc: viro, bpf, linux-fsdevel, amir73il, cgroups, kvm, netdev,
	torvalds

On Wed, Aug 7, 2024 at 3:30 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Aug 06, 2024 at 03:32:20PM GMT, Andrii Nakryiko wrote:
> > On Mon, Jul 29, 2024 at 10:20 PM <viro@kernel.org> wrote:
> > >
> > > From: Al Viro <viro@zeniv.linux.org.uk>
> > >
> > > Equivalent transformation.  For one thing, it's easier to follow that way.
> > > For another, that simplifies the control flow in the vicinity of struct fd
> > > handling in there, which will allow a switch to CLASS(fd) and make the
> > > thing much easier to verify wrt leaks.
> > >
> > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > ---
> > >  kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
> > >  1 file changed, 172 insertions(+), 170 deletions(-)
> > >
> >
> > This looks unnecessarily intrusive. I think it's best to extract the
> > logic of fetching and adding bpf_map by fd into a helper and that way
> > contain fdget + fdput logic nicely. Something like below, which I can
> > send to bpf-next.
> >
> > commit b5eec08241cc0263e560551de91eda73ccc5987d
> > Author: Andrii Nakryiko <andrii@kernel.org>
> > Date:   Tue Aug 6 14:31:34 2024 -0700
> >
> >     bpf: factor out fetching bpf_map from FD and adding it to used_maps list
> >
> >     Factor out the logic to extract bpf_map instances from FD embedded in
> >     bpf_insns, adding it to the list of used_maps (unless it's already
> >     there, in which case we just reuse map's index). This simplifies the
> >     logic in resolve_pseudo_ldimm64(), especially around `struct fd`
> >     handling, as all that is now neatly contained in the helper and doesn't
> >     leak into a dozen error handling paths.
> >
> >     Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index df3be12096cf..14e4ef687a59 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -18865,6 +18865,58 @@ static bool bpf_map_is_cgroup_storage(struct
> > bpf_map *map)
> >          map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
> >  }
> >
> > +/* Add map behind fd to used maps list, if it's not already there, and return
> > + * its index. Also set *reused to true if this map was already in the list of
> > + * used maps.
> > + * Returns <0 on error, or >= 0 index, on success.
> > + */
> > +static int add_used_map_from_fd(struct bpf_verifier_env *env, int fd,
> > bool *reused)
> > +{
> > +    struct fd f = fdget(fd);
>
> Use CLASS(fd, f)(fd) and you can avoid all that fdput() stuff.

That was the point of Al's next patch in the series, so I didn't want
to do it in this one that just refactored the logic of adding maps.
But I can fold that in and send it to bpf-next.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
  2024-08-07 15:30         ` Andrii Nakryiko
@ 2024-08-08 16:51           ` Alexei Starovoitov
  2024-08-08 20:35             ` Andrii Nakryiko
  2024-08-10  3:29             ` Al Viro
  0 siblings, 2 replies; 134+ messages in thread
From: Alexei Starovoitov @ 2024-08-08 16:51 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Christian Brauner, viro, bpf, Linux-Fsdevel, Amir Goldstein,
	open list:CONTROL GROUP (CGROUP), kvm, Network Development,
	Linus Torvalds

On Wed, Aug 7, 2024 at 8:31 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Aug 7, 2024 at 3:30 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Tue, Aug 06, 2024 at 03:32:20PM GMT, Andrii Nakryiko wrote:
> > > On Mon, Jul 29, 2024 at 10:20 PM <viro@kernel.org> wrote:
> > > >
> > > > From: Al Viro <viro@zeniv.linux.org.uk>
> > > >
> > > > Equivalent transformation.  For one thing, it's easier to follow that way.
> > > > For another, that simplifies the control flow in the vicinity of struct fd
> > > > handling in there, which will allow a switch to CLASS(fd) and make the
> > > > thing much easier to verify wrt leaks.
> > > >
> > > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > > ---
> > > >  kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
> > > >  1 file changed, 172 insertions(+), 170 deletions(-)
> > > >
> > >
> > > This looks unnecessarily intrusive. I think it's best to extract the
> > > logic of fetching and adding bpf_map by fd into a helper and that way
> > > contain fdget + fdput logic nicely. Something like below, which I can
> > > send to bpf-next.
> > >
> > > commit b5eec08241cc0263e560551de91eda73ccc5987d
> > > Author: Andrii Nakryiko <andrii@kernel.org>
> > > Date:   Tue Aug 6 14:31:34 2024 -0700
> > >
> > >     bpf: factor out fetching bpf_map from FD and adding it to used_maps list
> > >
> > >     Factor out the logic to extract bpf_map instances from FD embedded in
> > >     bpf_insns, adding it to the list of used_maps (unless it's already
> > >     there, in which case we just reuse map's index). This simplifies the
> > >     logic in resolve_pseudo_ldimm64(), especially around `struct fd`
> > >     handling, as all that is now neatly contained in the helper and doesn't
> > >     leak into a dozen error handling paths.
> > >
> > >     Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > >
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index df3be12096cf..14e4ef687a59 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -18865,6 +18865,58 @@ static bool bpf_map_is_cgroup_storage(struct
> > > bpf_map *map)
> > >          map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
> > >  }
> > >
> > > +/* Add map behind fd to used maps list, if it's not already there, and return
> > > + * its index. Also set *reused to true if this map was already in the list of
> > > + * used maps.
> > > + * Returns <0 on error, or >= 0 index, on success.
> > > + */
> > > +static int add_used_map_from_fd(struct bpf_verifier_env *env, int fd,
> > > bool *reused)
> > > +{
> > > +    struct fd f = fdget(fd);
> >
> > Use CLASS(fd, f)(fd) and you can avoid all that fdput() stuff.
>
> That was the point of Al's next patch in the series, so I didn't want
> to do it in this one that just refactored the logic of adding maps.
> But I can fold that in and send it to bpf-next.

+1.

The bpf changes look ok and Andrii's approach is easier to grasp.
It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
so it goes through bpf CI and our other testing.

bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
and fderr, so pretty much independent from other patches.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
  2024-08-08 16:51           ` Alexei Starovoitov
@ 2024-08-08 20:35             ` Andrii Nakryiko
  2024-08-09  1:23               ` Alexei Starovoitov
  2024-08-10  3:29             ` Al Viro
  1 sibling, 1 reply; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-08 20:35 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Christian Brauner, viro, bpf, Linux-Fsdevel, Amir Goldstein,
	open list:CONTROL GROUP (CGROUP), kvm, Network Development,
	Linus Torvalds

On Thu, Aug 8, 2024 at 9:51 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Aug 7, 2024 at 8:31 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Aug 7, 2024 at 3:30 AM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Tue, Aug 06, 2024 at 03:32:20PM GMT, Andrii Nakryiko wrote:
> > > > On Mon, Jul 29, 2024 at 10:20 PM <viro@kernel.org> wrote:
> > > > >
> > > > > From: Al Viro <viro@zeniv.linux.org.uk>
> > > > >
> > > > > Equivalent transformation.  For one thing, it's easier to follow that way.
> > > > > For another, that simplifies the control flow in the vicinity of struct fd
> > > > > handling in there, which will allow a switch to CLASS(fd) and make the
> > > > > thing much easier to verify wrt leaks.
> > > > >
> > > > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > > > ---
> > > > >  kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
> > > > >  1 file changed, 172 insertions(+), 170 deletions(-)
> > > > >
> > > >
> > > > This looks unnecessarily intrusive. I think it's best to extract the
> > > > logic of fetching and adding bpf_map by fd into a helper and that way
> > > > contain fdget + fdput logic nicely. Something like below, which I can
> > > > send to bpf-next.
> > > >
> > > > commit b5eec08241cc0263e560551de91eda73ccc5987d
> > > > Author: Andrii Nakryiko <andrii@kernel.org>
> > > > Date:   Tue Aug 6 14:31:34 2024 -0700
> > > >
> > > >     bpf: factor out fetching bpf_map from FD and adding it to used_maps list
> > > >
> > > >     Factor out the logic to extract bpf_map instances from FD embedded in
> > > >     bpf_insns, adding it to the list of used_maps (unless it's already
> > > >     there, in which case we just reuse map's index). This simplifies the
> > > >     logic in resolve_pseudo_ldimm64(), especially around `struct fd`
> > > >     handling, as all that is now neatly contained in the helper and doesn't
> > > >     leak into a dozen error handling paths.
> > > >
> > > >     Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > >
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index df3be12096cf..14e4ef687a59 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -18865,6 +18865,58 @@ static bool bpf_map_is_cgroup_storage(struct
> > > > bpf_map *map)
> > > >          map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
> > > >  }
> > > >
> > > > +/* Add map behind fd to used maps list, if it's not already there, and return
> > > > + * its index. Also set *reused to true if this map was already in the list of
> > > > + * used maps.
> > > > + * Returns <0 on error, or >= 0 index, on success.
> > > > + */
> > > > +static int add_used_map_from_fd(struct bpf_verifier_env *env, int fd,
> > > > bool *reused)
> > > > +{
> > > > +    struct fd f = fdget(fd);
> > >
> > > Use CLASS(fd, f)(fd) and you can avoid all that fdput() stuff.
> >
> > That was the point of Al's next patch in the series, so I didn't want
> > to do it in this one that just refactored the logic of adding maps.
> > But I can fold that in and send it to bpf-next.
>
> +1.
>
> The bpf changes look ok and Andrii's approach is easier to grasp.
> It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
> so it goes through bpf CI and our other testing.
>
> bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
> and fderr, so pretty much independent from other patches.

Ok, so CLASS(fd, f) won't work just yet because of peculiar
__bpf_map_get() contract: if it gets valid struct fd but it doesn't
contain a valid struct bpf_map, then __bpf_map_get() does fdput()
internally. In all other cases the caller has to do fdput() and
returned struct bpf_map's refcount has to be bumped by the caller
(__bpf_map_get() doesn't do that, I guess that's why it's
double-underscored).

I think the reason it was done was just a convenience to not have to
get/put bpf_map for temporary uses (and instead rely on file's
reference keeping bpf_map alive), plus we have bpf_map_inc() and
bpf_map_inc_uref() variants, so in some cases we need to bump just
refcount, and in some both user and normal refcounts.

So can't use CLASS(fd, ...) without some more clean up.

Alexei, how about changing __bpf_map_get(struct fd f) to
__bpf_map_get_from_fd(int ufd), doing fdget/fdput internally, and
always returning bpf_map with (normal) refcount bumped (if successful,
of course). We can then split bpf_map_inc_with_uref() into just
bpf_map_inc() and bpf_map_inc_uref(), and callers will be able to do
extra uref-only increment, if necessary.

I can do that as a pre-patch, there are about 15 callers, so not too
much work to clean this up. Let me know.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
  2024-08-08 20:35             ` Andrii Nakryiko
@ 2024-08-09  1:23               ` Alexei Starovoitov
  2024-08-09 17:23                 ` Andrii Nakryiko
  0 siblings, 1 reply; 134+ messages in thread
From: Alexei Starovoitov @ 2024-08-09  1:23 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Christian Brauner, viro, bpf, Linux-Fsdevel, Amir Goldstein,
	open list:CONTROL GROUP (CGROUP), kvm, Network Development,
	Linus Torvalds

On Thu, Aug 8, 2024 at 1:35 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Aug 8, 2024 at 9:51 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Aug 7, 2024 at 8:31 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Wed, Aug 7, 2024 at 3:30 AM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Tue, Aug 06, 2024 at 03:32:20PM GMT, Andrii Nakryiko wrote:
> > > > > On Mon, Jul 29, 2024 at 10:20 PM <viro@kernel.org> wrote:
> > > > > >
> > > > > > From: Al Viro <viro@zeniv.linux.org.uk>
> > > > > >
> > > > > > Equivalent transformation.  For one thing, it's easier to follow that way.
> > > > > > For another, that simplifies the control flow in the vicinity of struct fd
> > > > > > handling in there, which will allow a switch to CLASS(fd) and make the
> > > > > > thing much easier to verify wrt leaks.
> > > > > >
> > > > > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > > > > ---
> > > > > >  kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
> > > > > >  1 file changed, 172 insertions(+), 170 deletions(-)
> > > > > >
> > > > >
> > > > > This looks unnecessarily intrusive. I think it's best to extract the
> > > > > logic of fetching and adding bpf_map by fd into a helper and that way
> > > > > contain fdget + fdput logic nicely. Something like below, which I can
> > > > > send to bpf-next.
> > > > >
> > > > > commit b5eec08241cc0263e560551de91eda73ccc5987d
> > > > > Author: Andrii Nakryiko <andrii@kernel.org>
> > > > > Date:   Tue Aug 6 14:31:34 2024 -0700
> > > > >
> > > > >     bpf: factor out fetching bpf_map from FD and adding it to used_maps list
> > > > >
> > > > >     Factor out the logic to extract bpf_map instances from FD embedded in
> > > > >     bpf_insns, adding it to the list of used_maps (unless it's already
> > > > >     there, in which case we just reuse map's index). This simplifies the
> > > > >     logic in resolve_pseudo_ldimm64(), especially around `struct fd`
> > > > >     handling, as all that is now neatly contained in the helper and doesn't
> > > > >     leak into a dozen error handling paths.
> > > > >
> > > > >     Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > >
> > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > index df3be12096cf..14e4ef687a59 100644
> > > > > --- a/kernel/bpf/verifier.c
> > > > > +++ b/kernel/bpf/verifier.c
> > > > > @@ -18865,6 +18865,58 @@ static bool bpf_map_is_cgroup_storage(struct
> > > > > bpf_map *map)
> > > > >          map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
> > > > >  }
> > > > >
> > > > > +/* Add map behind fd to used maps list, if it's not already there, and return
> > > > > + * its index. Also set *reused to true if this map was already in the list of
> > > > > + * used maps.
> > > > > + * Returns <0 on error, or >= 0 index, on success.
> > > > > + */
> > > > > +static int add_used_map_from_fd(struct bpf_verifier_env *env, int fd,
> > > > > bool *reused)
> > > > > +{
> > > > > +    struct fd f = fdget(fd);
> > > >
> > > > Use CLASS(fd, f)(fd) and you can avoid all that fdput() stuff.
> > >
> > > That was the point of Al's next patch in the series, so I didn't want
> > > to do it in this one that just refactored the logic of adding maps.
> > > But I can fold that in and send it to bpf-next.
> >
> > +1.
> >
> > The bpf changes look ok and Andrii's approach is easier to grasp.
> > It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
> > so it goes through bpf CI and our other testing.
> >
> > bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
> > and fderr, so pretty much independent from other patches.
>
> Ok, so CLASS(fd, f) won't work just yet because of peculiar
> __bpf_map_get() contract: if it gets valid struct fd but it doesn't
> contain a valid struct bpf_map, then __bpf_map_get() does fdput()
> internally. In all other cases the caller has to do fdput() and
> returned struct bpf_map's refcount has to be bumped by the caller
> (__bpf_map_get() doesn't do that, I guess that's why it's
> double-underscored).
>
> I think the reason it was done was just a convenience to not have to
> get/put bpf_map for temporary uses (and instead rely on file's
> reference keeping bpf_map alive), plus we have bpf_map_inc() and
> bpf_map_inc_uref() variants, so in some cases we need to bump just
> refcount, and in some both user and normal refcounts.
>
> So can't use CLASS(fd, ...) without some more clean up.
>
> Alexei, how about changing __bpf_map_get(struct fd f) to
> __bpf_map_get_from_fd(int ufd), doing fdget/fdput internally, and
> always returning bpf_map with (normal) refcount bumped (if successful,
> of course). We can then split bpf_map_inc_with_uref() into just
> bpf_map_inc() and bpf_map_inc_uref(), and callers will be able to do
> extra uref-only increment, if necessary.
>
> I can do that as a pre-patch, there are about 15 callers, so not too
> much work to clean this up. Let me know.

Yeah. Let's kill __bpf_map_get(struct fd ..) altogether.
This logic was added in 2014.
fdget() had to be first and fdput() last to make sure
the map won't disappear while sys_bpf command is running.
All of the places can use bpf_map_get(), bpf_map_put() pair
and rely on map->refcnt, but...

- it's atomic64_inc(&map->refcnt); The cost is probably
in the noise compared to all the work that map sys_bpf commands do.

- It also opens new fuzzing opportunity to do some map operation
in one thread and close(map_fd) in the other, so map->usercnt can
drop to zero and map_release_uref() cleanup can start while
the other thread is still busy doing something like map_update_elem().
It can be mitigated by doing bpf_map_get_with_uref(), but two
atomic64_inc() is kinda too much.

So let's remove __bpf_map_get() and replace all users with bpf_map_get(),
but we may need to revisit that later.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
  2024-08-09  1:23               ` Alexei Starovoitov
@ 2024-08-09 17:23                 ` Andrii Nakryiko
  0 siblings, 0 replies; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-09 17:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Christian Brauner, viro, bpf, Linux-Fsdevel, Amir Goldstein,
	open list:CONTROL GROUP (CGROUP), kvm, Network Development,
	Linus Torvalds

On Thu, Aug 8, 2024 at 6:23 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Aug 8, 2024 at 1:35 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Aug 8, 2024 at 9:51 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Aug 7, 2024 at 8:31 AM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 7, 2024 at 3:30 AM Christian Brauner <brauner@kernel.org> wrote:
> > > > >
> > > > > On Tue, Aug 06, 2024 at 03:32:20PM GMT, Andrii Nakryiko wrote:
> > > > > > On Mon, Jul 29, 2024 at 10:20 PM <viro@kernel.org> wrote:
> > > > > > >
> > > > > > > From: Al Viro <viro@zeniv.linux.org.uk>
> > > > > > >
> > > > > > > Equivalent transformation.  For one thing, it's easier to follow that way.
> > > > > > > For another, that simplifies the control flow in the vicinity of struct fd
> > > > > > > handling in there, which will allow a switch to CLASS(fd) and make the
> > > > > > > thing much easier to verify wrt leaks.
> > > > > > >
> > > > > > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > > > > > ---
> > > > > > >  kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
> > > > > > >  1 file changed, 172 insertions(+), 170 deletions(-)
> > > > > > >
> > > > > >
> > > > > > This looks unnecessarily intrusive. I think it's best to extract the
> > > > > > logic of fetching and adding bpf_map by fd into a helper and that way
> > > > > > contain fdget + fdput logic nicely. Something like below, which I can
> > > > > > send to bpf-next.
> > > > > >
> > > > > > commit b5eec08241cc0263e560551de91eda73ccc5987d
> > > > > > Author: Andrii Nakryiko <andrii@kernel.org>
> > > > > > Date:   Tue Aug 6 14:31:34 2024 -0700
> > > > > >
> > > > > >     bpf: factor out fetching bpf_map from FD and adding it to used_maps list
> > > > > >
> > > > > >     Factor out the logic to extract bpf_map instances from FD embedded in
> > > > > >     bpf_insns, adding it to the list of used_maps (unless it's already
> > > > > >     there, in which case we just reuse map's index). This simplifies the
> > > > > >     logic in resolve_pseudo_ldimm64(), especially around `struct fd`
> > > > > >     handling, as all that is now neatly contained in the helper and doesn't
> > > > > >     leak into a dozen error handling paths.
> > > > > >
> > > > > >     Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > > >
> > > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > > index df3be12096cf..14e4ef687a59 100644
> > > > > > --- a/kernel/bpf/verifier.c
> > > > > > +++ b/kernel/bpf/verifier.c
> > > > > > @@ -18865,6 +18865,58 @@ static bool bpf_map_is_cgroup_storage(struct
> > > > > > bpf_map *map)
> > > > > >          map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
> > > > > >  }
> > > > > >
> > > > > > +/* Add map behind fd to used maps list, if it's not already there, and return
> > > > > > + * its index. Also set *reused to true if this map was already in the list of
> > > > > > + * used maps.
> > > > > > + * Returns <0 on error, or >= 0 index, on success.
> > > > > > + */
> > > > > > +static int add_used_map_from_fd(struct bpf_verifier_env *env, int fd,
> > > > > > bool *reused)
> > > > > > +{
> > > > > > +    struct fd f = fdget(fd);
> > > > >
> > > > > Use CLASS(fd, f)(fd) and you can avoid all that fdput() stuff.
> > > >
> > > > That was the point of Al's next patch in the series, so I didn't want
> > > > to do it in this one that just refactored the logic of adding maps.
> > > > But I can fold that in and send it to bpf-next.
> > >
> > > +1.
> > >
> > > The bpf changes look ok and Andrii's approach is easier to grasp.
> > > It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
> > > so it goes through bpf CI and our other testing.
> > >
> > > bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
> > > and fderr, so pretty much independent from other patches.
> >
> > Ok, so CLASS(fd, f) won't work just yet because of peculiar
> > __bpf_map_get() contract: if it gets valid struct fd but it doesn't
> > contain a valid struct bpf_map, then __bpf_map_get() does fdput()
> > internally. In all other cases the caller has to do fdput() and
> > returned struct bpf_map's refcount has to be bumped by the caller
> > (__bpf_map_get() doesn't do that, I guess that's why it's
> > double-underscored).
> >
> > I think the reason it was done was just a convenience to not have to
> > get/put bpf_map for temporary uses (and instead rely on file's
> > reference keeping bpf_map alive), plus we have bpf_map_inc() and
> > bpf_map_inc_uref() variants, so in some cases we need to bump just
> > refcount, and in some both user and normal refcounts.
> >
> > So can't use CLASS(fd, ...) without some more clean up.
> >
> > Alexei, how about changing __bpf_map_get(struct fd f) to
> > __bpf_map_get_from_fd(int ufd), doing fdget/fdput internally, and
> > always returning bpf_map with (normal) refcount bumped (if successful,
> > of course). We can then split bpf_map_inc_with_uref() into just
> > bpf_map_inc() and bpf_map_inc_uref(), and callers will be able to do
> > extra uref-only increment, if necessary.
> >
> > I can do that as a pre-patch, there are about 15 callers, so not too
> > much work to clean this up. Let me know.
>
> Yeah. Let's kill __bpf_map_get(struct fd ..) altogether.
> This logic was added in 2014.
> fdget() had to be first and fdput() last to make sure
> the map won't disappear while sys_bpf command is running.
> All of the places can use bpf_map_get(), bpf_map_put() pair
> and rely on map->refcnt, but...
>
> - it's atomic64_inc(&map->refcnt); The cost is probably
> in the noise compared to all the work that map sys_bpf commands do.
>

agreed, not too worried about this

> - It also opens new fuzzing opportunity to do some map operation
> in one thread and close(map_fd) in the other, so map->usercnt can
> drop to zero and map_release_uref() cleanup can start while
> the other thread is still busy doing something like map_update_elem().
> It can be mitigated by doing bpf_map_get_with_uref(), but two
> atomic64_inc() is kinda too much.
>

yep, with_uref() is an overkill for most cases. I'd rather fix any
such bugs, if we have them.

> So let's remove __bpf_map_get() and replace all users with bpf_map_get(),
> but we may need to revisit that later.

Ok, I will probably send something next week.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
  2024-08-08 16:51           ` Alexei Starovoitov
  2024-08-08 20:35             ` Andrii Nakryiko
@ 2024-08-10  3:29             ` Al Viro
  2024-08-12 20:05               ` Andrii Nakryiko
  1 sibling, 1 reply; 134+ messages in thread
From: Al Viro @ 2024-08-10  3:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Christian Brauner, viro, bpf, Linux-Fsdevel,
	Amir Goldstein, open list:CONTROL GROUP (CGROUP), kvm,
	Network Development, Linus Torvalds

On Thu, Aug 08, 2024 at 09:51:34AM -0700, Alexei Starovoitov wrote:

> The bpf changes look ok and Andrii's approach is easier to grasp.
> It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
> so it goes through bpf CI and our other testing.
> 
> bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
> and fderr, so pretty much independent from other patches.

Representation change and switch to accessors do matter, though.
OTOH, I can put just those into never-rebased branch (basically,
"introduce fd_file(), convert all accessors to it" +
"struct fd representation change" + possibly "add struct fd constructors,
get rid of __to_fd()", for completeness sake), so you could pull it.
Otherwise you'll get textual conflicts on all those f.file vs. fd_file(f)...

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
  2024-08-10  3:29             ` Al Viro
@ 2024-08-12 20:05               ` Andrii Nakryiko
  2024-08-13  2:06                 ` Al Viro
  0 siblings, 1 reply; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-12 20:05 UTC (permalink / raw)
  To: Al Viro
  Cc: Alexei Starovoitov, Christian Brauner, viro, bpf, Linux-Fsdevel,
	Amir Goldstein, open list:CONTROL GROUP (CGROUP), kvm,
	Network Development, Linus Torvalds

On Fri, Aug 9, 2024 at 8:29 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Thu, Aug 08, 2024 at 09:51:34AM -0700, Alexei Starovoitov wrote:
>
> > The bpf changes look ok and Andrii's approach is easier to grasp.
> > It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
> > so it goes through bpf CI and our other testing.
> >
> > bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
> > and fderr, so pretty much independent from other patches.
>
> Representation change and switch to accessors do matter, though.
> OTOH, I can put just those into never-rebased branch (basically,
> "introduce fd_file(), convert all accessors to it" +
> "struct fd representation change" + possibly "add struct fd constructors,
> get rid of __to_fd()", for completeness sake), so you could pull it.
> Otherwise you'll get textual conflicts on all those f.file vs. fd_file(f)...

Yep, makes sense. Let's do that, we can merge that branch into
bpf-next/master and I will follow up with my changes on top of that.

Let's just drop the do_one_ldimm64() extraction, and keep fdput(f)
logic, plus add fd_file() accessor changes. I'll then add a switch to
CLASS(fd) after a bit more BPF-specific clean ups. This code is pretty
sensitive, so I'd rather have all the non-trivial refactoring done
separately. Thanks!

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
  2024-08-12 20:05               ` Andrii Nakryiko
@ 2024-08-13  2:06                 ` Al Viro
  2024-08-13  3:32                   ` Andrii Nakryiko
  0 siblings, 1 reply; 134+ messages in thread
From: Al Viro @ 2024-08-13  2:06 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Christian Brauner, viro, bpf, Linux-Fsdevel,
	Amir Goldstein, open list:CONTROL GROUP (CGROUP), kvm,
	Network Development, Linus Torvalds

On Mon, Aug 12, 2024 at 01:05:19PM -0700, Andrii Nakryiko wrote:
> On Fri, Aug 9, 2024 at 8:29???PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Thu, Aug 08, 2024 at 09:51:34AM -0700, Alexei Starovoitov wrote:
> >
> > > The bpf changes look ok and Andrii's approach is easier to grasp.
> > > It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
> > > so it goes through bpf CI and our other testing.
> > >
> > > bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
> > > and fderr, so pretty much independent from other patches.
> >
> > Representation change and switch to accessors do matter, though.
> > OTOH, I can put just those into never-rebased branch (basically,
> > "introduce fd_file(), convert all accessors to it" +
> > "struct fd representation change" + possibly "add struct fd constructors,
> > get rid of __to_fd()", for completeness sake), so you could pull it.
> > Otherwise you'll get textual conflicts on all those f.file vs. fd_file(f)...
> 
> Yep, makes sense. Let's do that, we can merge that branch into
> bpf-next/master and I will follow up with my changes on top of that.
> 
> Let's just drop the do_one_ldimm64() extraction, and keep fdput(f)
> logic, plus add fd_file() accessor changes. I'll then add a switch to
> CLASS(fd) after a bit more BPF-specific clean ups. This code is pretty
> sensitive, so I'd rather have all the non-trivial refactoring done
> separately. Thanks!

Done (#stable-struct_fd); BTW, which tree do you want "convert __bpf_prog_get()
to CLASS(fd)" to go through?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
  2024-08-13  2:06                 ` Al Viro
@ 2024-08-13  3:32                   ` Andrii Nakryiko
  0 siblings, 0 replies; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-13  3:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Alexei Starovoitov, Christian Brauner, viro, bpf, Linux-Fsdevel,
	Amir Goldstein, open list:CONTROL GROUP (CGROUP), kvm,
	Network Development, Linus Torvalds

On Mon, Aug 12, 2024 at 7:06 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Aug 12, 2024 at 01:05:19PM -0700, Andrii Nakryiko wrote:
> > On Fri, Aug 9, 2024 at 8:29???PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > On Thu, Aug 08, 2024 at 09:51:34AM -0700, Alexei Starovoitov wrote:
> > >
> > > > The bpf changes look ok and Andrii's approach is easier to grasp.
> > > > It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
> > > > so it goes through bpf CI and our other testing.
> > > >
> > > > bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
> > > > and fderr, so pretty much independent from other patches.
> > >
> > > Representation change and switch to accessors do matter, though.
> > > OTOH, I can put just those into never-rebased branch (basically,
> > > "introduce fd_file(), convert all accessors to it" +
> > > "struct fd representation change" + possibly "add struct fd constructors,
> > > get rid of __to_fd()", for completeness sake), so you could pull it.
> > > Otherwise you'll get textual conflicts on all those f.file vs. fd_file(f)...
> >
> > Yep, makes sense. Let's do that, we can merge that branch into
> > bpf-next/master and I will follow up with my changes on top of that.
> >
> > Let's just drop the do_one_ldimm64() extraction, and keep fdput(f)
> > logic, plus add fd_file() accessor changes. I'll then add a switch to
> > CLASS(fd) after a bit more BPF-specific clean ups. This code is pretty
> > sensitive, so I'd rather have all the non-trivial refactoring done
> > separately. Thanks!
>
> Done (#stable-struct_fd);

great, thanks, I'll look at this tomorrow

> BTW, which tree do you want "convert __bpf_prog_get()
> to CLASS(fd)" to go through?

So we seem to have the following for BPF-related stuff:

[PATCH 16/39] convert __bpf_prog_get() to CLASS(fd, ...)

This looks to be ready to go in.

[PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single
ldimm64 insn into helper

This one I'd like to rework differently and land it through bpf-next.

[PATCH 18/39] bpf maps: switch to CLASS(fd, ...)

This one touches __bpf_map_get() which I'm going to remove or refactor
as part of the abovementioned refactoring, so there will be conflicts.

[PATCH 19/39] fdget_raw() users: switch to CLASS(fd_raw, ...)

This one touches a bunch of cases across multiple systems, including
BPF's kernel/bpf/bpf_inode_storage.c.


So how about this. We take #16 as is through bpf-next, change how #17
is done, take 18 mostly as is but adjust as necessary. As for #19, if
you could split out changes in bpf_inode_storage.c to a separate
patch, we can also apply it in bpf-next as one coherent set. I'll send
all that as one complete patch set for you to do the final review.

WDYT?

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 18/39] bpf maps: switch to CLASS(fd, ...)
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (15 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:34     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 19/39] fdget_raw() users: switch to CLASS(fd_raw, ...) viro
                     ` (21 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

        Calling conventions for __bpf_map_get() would be more convenient
if it left fpdut() on failure to callers.  Makes for simpler logics
in the callers.

	Among other things, the proof of memory safety no longer has to
rely upon file->private_data never being ERR_PTR(...) for bpffs files.
Original calling conventions made it impossible for the caller to tell
whether __bpf_map_get() has returned ERR_PTR(-EINVAL) because it has found
the file not be a bpf map one (in which case it would've done fdput())
or because it found that ERR_PTR(-EINVAL) in file->private_data of a
bpf map file (in which case fdput() would _not_ have been done).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/bpf.h     |   9 +++-
 kernel/bpf/map_in_map.c |  38 ++++---------
 kernel/bpf/syscall.c    | 117 ++++++++++++----------------------------
 kernel/bpf/verifier.c   |  19 +------
 net/core/sock_map.c     |  23 +++-----
 5 files changed, 60 insertions(+), 146 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3b94ec161e8c..004d37e8b8df 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2227,7 +2227,14 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec, bool percpu);
 
 struct bpf_map *bpf_map_get(u32 ufd);
 struct bpf_map *bpf_map_get_with_uref(u32 ufd);
-struct bpf_map *__bpf_map_get(struct fd f);
+
+struct bpf_map *bpf_file_to_map(struct file *f);
+static inline struct bpf_map *__bpf_map_get(struct fd f)
+{
+	if (fd_empty(f))
+		return ERR_PTR(-EBADF);
+	return bpf_file_to_map(fd_file(f));
+}
 void bpf_map_inc(struct bpf_map *map);
 void bpf_map_inc_with_uref(struct bpf_map *map);
 struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref);
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index b4f18c85d7bc..645bd30bc9a9 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -11,24 +11,18 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 {
 	struct bpf_map *inner_map, *inner_map_meta;
 	u32 inner_map_meta_size;
-	struct fd f;
-	int ret;
+	CLASS(fd, f)(inner_map_ufd);
 
-	f = fdget(inner_map_ufd);
 	inner_map = __bpf_map_get(f);
 	if (IS_ERR(inner_map))
 		return inner_map;
 
 	/* Does not support >1 level map-in-map */
-	if (inner_map->inner_map_meta) {
-		ret = -EINVAL;
-		goto put;
-	}
+	if (inner_map->inner_map_meta)
+		return ERR_PTR(-EINVAL);
 
-	if (!inner_map->ops->map_meta_equal) {
-		ret = -ENOTSUPP;
-		goto put;
-	}
+	if (!inner_map->ops->map_meta_equal)
+		return ERR_PTR(-ENOTSUPP);
 
 	inner_map_meta_size = sizeof(*inner_map_meta);
 	/* In some cases verifier needs to access beyond just base map. */
@@ -36,10 +30,8 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 		inner_map_meta_size = sizeof(struct bpf_array);
 
 	inner_map_meta = kzalloc(inner_map_meta_size, GFP_USER);
-	if (!inner_map_meta) {
-		ret = -ENOMEM;
-		goto put;
-	}
+	if (!inner_map_meta)
+		return ERR_PTR(-ENOMEM);
 
 	inner_map_meta->map_type = inner_map->map_type;
 	inner_map_meta->key_size = inner_map->key_size;
@@ -53,8 +45,9 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 		 * invalid/empty/valid, but ERR_PTR in case of errors. During
 		 * equality NULL or IS_ERR is equivalent.
 		 */
-		ret = PTR_ERR(inner_map_meta->record);
-		goto free;
+		struct bpf_map *ret = ERR_CAST(inner_map_meta->record);
+		kfree(inner_map_meta);
+		return ret;
 	}
 	/* Note: We must use the same BTF, as we also used btf_record_dup above
 	 * which relies on BTF being same for both maps, as some members like
@@ -77,14 +70,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 		inner_array_meta->elem_size = inner_array->elem_size;
 		inner_map_meta->bypass_spec_v1 = inner_map->bypass_spec_v1;
 	}
-
-	fdput(f);
 	return inner_map_meta;
-free:
-	kfree(inner_map_meta);
-put:
-	fdput(f);
-	return ERR_PTR(ret);
 }
 
 void bpf_map_meta_free(struct bpf_map *map_meta)
@@ -110,9 +96,8 @@ void *bpf_map_fd_get_ptr(struct bpf_map *map,
 			 int ufd)
 {
 	struct bpf_map *inner_map, *inner_map_meta;
-	struct fd f;
+	CLASS(fd, f)(ufd);
 
-	f = fdget(ufd);
 	inner_map = __bpf_map_get(f);
 	if (IS_ERR(inner_map))
 		return inner_map;
@@ -123,7 +108,6 @@ void *bpf_map_fd_get_ptr(struct bpf_map *map,
 	else
 		inner_map = ERR_PTR(-EINVAL);
 
-	fdput(f);
 	return inner_map;
 }
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c5b252c0faa8..807042c643c8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1418,19 +1418,11 @@ static int map_create(union bpf_attr *attr)
 	return err;
 }
 
-/* if error is returned, fd is released.
- * On success caller should complete fd access with matching fdput()
- */
-struct bpf_map *__bpf_map_get(struct fd f)
+struct bpf_map *bpf_file_to_map(struct file *f)
 {
-	if (!fd_file(f))
-		return ERR_PTR(-EBADF);
-	if (fd_file(f)->f_op != &bpf_map_fops) {
-		fdput(f);
+	if (unlikely(f->f_op != &bpf_map_fops))
 		return ERR_PTR(-EINVAL);
-	}
-
-	return fd_file(f)->private_data;
+	return f->private_data;
 }
 
 void bpf_map_inc(struct bpf_map *map)
@@ -1448,15 +1440,11 @@ EXPORT_SYMBOL_GPL(bpf_map_inc_with_uref);
 
 struct bpf_map *bpf_map_get(u32 ufd)
 {
-	struct fd f = fdget(ufd);
-	struct bpf_map *map;
-
-	map = __bpf_map_get(f);
-	if (IS_ERR(map))
-		return map;
+	CLASS(fd, f)(ufd);
+	struct bpf_map *map = __bpf_map_get(f);
 
-	bpf_map_inc(map);
-	fdput(f);
+	if (!IS_ERR(map))
+		bpf_map_inc(map);
 
 	return map;
 }
@@ -1464,15 +1452,11 @@ EXPORT_SYMBOL(bpf_map_get);
 
 struct bpf_map *bpf_map_get_with_uref(u32 ufd)
 {
-	struct fd f = fdget(ufd);
-	struct bpf_map *map;
-
-	map = __bpf_map_get(f);
-	if (IS_ERR(map))
-		return map;
+	CLASS(fd, f)(ufd);
+	struct bpf_map *map = __bpf_map_get(f);
 
-	bpf_map_inc_with_uref(map);
-	fdput(f);
+	if (!IS_ERR(map))
+		bpf_map_inc_with_uref(map);
 
 	return map;
 }
@@ -1537,11 +1521,9 @@ static int map_lookup_elem(union bpf_attr *attr)
 {
 	void __user *ukey = u64_to_user_ptr(attr->key);
 	void __user *uvalue = u64_to_user_ptr(attr->value);
-	int ufd = attr->map_fd;
 	struct bpf_map *map;
 	void *key, *value;
 	u32 value_size;
-	struct fd f;
 	int err;
 
 	if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM))
@@ -1550,26 +1532,20 @@ static int map_lookup_elem(union bpf_attr *attr)
 	if (attr->flags & ~BPF_F_LOCK)
 		return -EINVAL;
 
-	f = fdget(ufd);
+	CLASS(fd, f)(attr->map_fd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
-	if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
-		err = -EPERM;
-		goto err_put;
-	}
+	if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ))
+		return -EPERM;
 
 	if ((attr->flags & BPF_F_LOCK) &&
-	    !btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
-		err = -EINVAL;
-		goto err_put;
-	}
+	    !btf_record_has_field(map->record, BPF_SPIN_LOCK))
+		return -EINVAL;
 
 	key = __bpf_copy_key(ukey, map->key_size);
-	if (IS_ERR(key)) {
-		err = PTR_ERR(key);
-		goto err_put;
-	}
+	if (IS_ERR(key))
+		return PTR_ERR(key);
 
 	value_size = bpf_map_value_size(map);
 
@@ -1600,8 +1576,6 @@ static int map_lookup_elem(union bpf_attr *attr)
 	kvfree(value);
 free_key:
 	kvfree(key);
-err_put:
-	fdput(f);
 	return err;
 }
 
@@ -1612,17 +1586,15 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
 {
 	bpfptr_t ukey = make_bpfptr(attr->key, uattr.is_kernel);
 	bpfptr_t uvalue = make_bpfptr(attr->value, uattr.is_kernel);
-	int ufd = attr->map_fd;
 	struct bpf_map *map;
 	void *key, *value;
 	u32 value_size;
-	struct fd f;
 	int err;
 
 	if (CHECK_ATTR(BPF_MAP_UPDATE_ELEM))
 		return -EINVAL;
 
-	f = fdget(ufd);
+	CLASS(fd, f)(attr->map_fd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
@@ -1660,7 +1632,6 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
 	kvfree(key);
 err_put:
 	bpf_map_write_active_dec(map);
-	fdput(f);
 	return err;
 }
 
@@ -1669,16 +1640,14 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
 static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
 {
 	bpfptr_t ukey = make_bpfptr(attr->key, uattr.is_kernel);
-	int ufd = attr->map_fd;
 	struct bpf_map *map;
-	struct fd f;
 	void *key;
 	int err;
 
 	if (CHECK_ATTR(BPF_MAP_DELETE_ELEM))
 		return -EINVAL;
 
-	f = fdget(ufd);
+	CLASS(fd, f)(attr->map_fd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
@@ -1715,7 +1684,6 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
 	kvfree(key);
 err_put:
 	bpf_map_write_active_dec(map);
-	fdput(f);
 	return err;
 }
 
@@ -1726,30 +1694,24 @@ static int map_get_next_key(union bpf_attr *attr)
 {
 	void __user *ukey = u64_to_user_ptr(attr->key);
 	void __user *unext_key = u64_to_user_ptr(attr->next_key);
-	int ufd = attr->map_fd;
 	struct bpf_map *map;
 	void *key, *next_key;
-	struct fd f;
 	int err;
 
 	if (CHECK_ATTR(BPF_MAP_GET_NEXT_KEY))
 		return -EINVAL;
 
-	f = fdget(ufd);
+	CLASS(fd, f)(attr->map_fd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
-	if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
-		err = -EPERM;
-		goto err_put;
-	}
+	if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ))
+		return -EPERM;
 
 	if (ukey) {
 		key = __bpf_copy_key(ukey, map->key_size);
-		if (IS_ERR(key)) {
-			err = PTR_ERR(key);
-			goto err_put;
-		}
+		if (IS_ERR(key))
+			return PTR_ERR(key);
 	} else {
 		key = NULL;
 	}
@@ -1781,8 +1743,6 @@ static int map_get_next_key(union bpf_attr *attr)
 	kvfree(next_key);
 free_key:
 	kvfree(key);
-err_put:
-	fdput(f);
 	return err;
 }
 
@@ -2011,11 +1971,9 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
 {
 	void __user *ukey = u64_to_user_ptr(attr->key);
 	void __user *uvalue = u64_to_user_ptr(attr->value);
-	int ufd = attr->map_fd;
 	struct bpf_map *map;
 	void *key, *value;
 	u32 value_size;
-	struct fd f;
 	int err;
 
 	if (CHECK_ATTR(BPF_MAP_LOOKUP_AND_DELETE_ELEM))
@@ -2024,7 +1982,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
 	if (attr->flags & ~BPF_F_LOCK)
 		return -EINVAL;
 
-	f = fdget(ufd);
+	CLASS(fd, f)(attr->map_fd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
@@ -2094,7 +2052,6 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
 	kvfree(key);
 err_put:
 	bpf_map_write_active_dec(map);
-	fdput(f);
 	return err;
 }
 
@@ -2102,27 +2059,22 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
 
 static int map_freeze(const union bpf_attr *attr)
 {
-	int err = 0, ufd = attr->map_fd;
+	int err = 0;
 	struct bpf_map *map;
-	struct fd f;
 
 	if (CHECK_ATTR(BPF_MAP_FREEZE))
 		return -EINVAL;
 
-	f = fdget(ufd);
+	CLASS(fd, f)(attr->map_fd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
 
-	if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS || !IS_ERR_OR_NULL(map->record)) {
-		fdput(f);
+	if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS || !IS_ERR_OR_NULL(map->record))
 		return -ENOTSUPP;
-	}
 
-	if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
-		fdput(f);
+	if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE))
 		return -EPERM;
-	}
 
 	mutex_lock(&map->freeze_mutex);
 	if (bpf_map_write_active(map)) {
@@ -2137,7 +2089,6 @@ static int map_freeze(const union bpf_attr *attr)
 	WRITE_ONCE(map->frozen, true);
 err_put:
 	mutex_unlock(&map->freeze_mutex);
-	fdput(f);
 	return err;
 }
 
@@ -5178,14 +5129,13 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
 			 cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH;
 	bool has_write = cmd != BPF_MAP_LOOKUP_BATCH;
 	struct bpf_map *map;
-	int err, ufd;
-	struct fd f;
+	int err;
 
 	if (CHECK_ATTR(BPF_MAP_BATCH))
 		return -EINVAL;
 
-	ufd = attr->batch.map_fd;
-	f = fdget(ufd);
+	CLASS(fd, f)(attr->batch.map_fd);
+
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
@@ -5213,7 +5163,6 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
 		maybe_wait_bpf_programs(map);
 		bpf_map_write_active_dec(map);
 	}
-	fdput(f);
 	return err;
 }
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d54f61e12062..011c51a84158 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18419,7 +18419,6 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
 			  struct bpf_insn_aux_data *aux)
 {
 	struct bpf_map *map;
-	struct fd f;
 	u64 addr;
 	u32 fd;
 	int err;
@@ -18470,7 +18469,7 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
 		break;
 	}
 
-	f = fdget(fd);
+	CLASS(fd, f)(fd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map)) {
 		verbose(env, "fd %d is not pointing to valid bpf_map\n", fd);
@@ -18478,10 +18477,8 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
 	}
 
 	err = check_map_prog_compatibility(env, map, env->prog);
-	if (err) {
-		fdput(f);
+	if (err)
 		return err;
-	}
 
 	if (insn[0].src_reg == BPF_PSEUDO_MAP_FD ||
 	    insn[0].src_reg == BPF_PSEUDO_MAP_IDX) {
@@ -18491,13 +18488,11 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
 
 		if (off >= BPF_MAX_VAR_OFF) {
 			verbose(env, "direct value offset of %u is not allowed\n", off);
-			fdput(f);
 			return -EINVAL;
 		}
 
 		if (!map->ops->map_direct_value_addr) {
 			verbose(env, "no direct value access support for this map type\n");
-			fdput(f);
 			return -EINVAL;
 		}
 
@@ -18505,7 +18500,6 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
 		if (err) {
 			verbose(env, "invalid access to map value pointer, value_size=%u off=%u\n",
 				map->value_size, off);
-			fdput(f);
 			return err;
 		}
 
@@ -18520,7 +18514,6 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
 	for (int i = 0; i < env->used_map_cnt; i++) {
 		if (env->used_maps[i] == map) {
 			aux->map_index = i;
-			fdput(f);
 			return 0;
 		}
 	}
@@ -18528,7 +18521,6 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
 	if (env->used_map_cnt >= MAX_USED_MAPS) {
 		verbose(env, "The total number of maps per program has reached the limit of %u\n",
 			MAX_USED_MAPS);
-		fdput(f);
 		return -E2BIG;
 	}
 
@@ -18547,39 +18539,32 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
 	if (bpf_map_is_cgroup_storage(map) &&
 	    bpf_cgroup_storage_assign(env->prog->aux, map)) {
 		verbose(env, "only one cgroup storage of each type is allowed\n");
-		fdput(f);
 		return -EBUSY;
 	}
 	if (map->map_type == BPF_MAP_TYPE_ARENA) {
 		if (env->prog->aux->arena) {
 			verbose(env, "Only one arena per program\n");
-			fdput(f);
 			return -EBUSY;
 		}
 		if (!env->allow_ptr_leaks || !env->bpf_capable) {
 			verbose(env, "CAP_BPF and CAP_PERFMON are required to use arena\n");
-			fdput(f);
 			return -EPERM;
 		}
 		if (!env->prog->jit_requested) {
 			verbose(env, "JIT is required to use arena\n");
-			fdput(f);
 			return -EOPNOTSUPP;
 		}
 		if (!bpf_jit_supports_arena()) {
 			verbose(env, "JIT doesn't support arena\n");
-			fdput(f);
 			return -EOPNOTSUPP;
 		}
 		env->prog->aux->arena = (void *)map;
 		if (!bpf_arena_get_user_vm_start(env->prog->aux->arena)) {
 			verbose(env, "arena's user address must be set via map_extra or mmap()\n");
-			fdput(f);
 			return -EINVAL;
 		}
 	}
 
-	fdput(f);
 	return 0;
 }
 
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index d3dbb92153f2..0f5f80f44d52 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -67,46 +67,39 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 
 int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog)
 {
-	u32 ufd = attr->target_fd;
 	struct bpf_map *map;
-	struct fd f;
 	int ret;
 
 	if (attr->attach_flags || attr->replace_bpf_fd)
 		return -EINVAL;
 
-	f = fdget(ufd);
+	CLASS(fd, f)(attr->target_fd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
 	mutex_lock(&sockmap_mutex);
 	ret = sock_map_prog_update(map, prog, NULL, NULL, attr->attach_type);
 	mutex_unlock(&sockmap_mutex);
-	fdput(f);
 	return ret;
 }
 
 int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
 {
-	u32 ufd = attr->target_fd;
 	struct bpf_prog *prog;
 	struct bpf_map *map;
-	struct fd f;
 	int ret;
 
 	if (attr->attach_flags || attr->replace_bpf_fd)
 		return -EINVAL;
 
-	f = fdget(ufd);
+	CLASS(fd, f)(attr->target_fd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
 
 	prog = bpf_prog_get(attr->attach_bpf_fd);
-	if (IS_ERR(prog)) {
-		ret = PTR_ERR(prog);
-		goto put_map;
-	}
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
 
 	if (prog->type != ptype) {
 		ret = -EINVAL;
@@ -118,8 +111,6 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
 	mutex_unlock(&sockmap_mutex);
 put_prog:
 	bpf_prog_put(prog);
-put_map:
-	fdput(f);
 	return ret;
 }
 
@@ -1550,18 +1541,17 @@ int sock_map_bpf_prog_query(const union bpf_attr *attr,
 			    union bpf_attr __user *uattr)
 {
 	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
-	u32 prog_cnt = 0, flags = 0, ufd = attr->target_fd;
+	u32 prog_cnt = 0, flags = 0;
 	struct bpf_prog **pprog;
 	struct bpf_prog *prog;
 	struct bpf_map *map;
-	struct fd f;
 	u32 id = 0;
 	int ret;
 
 	if (attr->query.query_flags)
 		return -EINVAL;
 
-	f = fdget(ufd);
+	CLASS(fd, f)(attr->target_fd);
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
@@ -1593,7 +1583,6 @@ int sock_map_bpf_prog_query(const union bpf_attr *attr,
 	    copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
 		ret = -EFAULT;
 
-	fdput(f);
 	return ret;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 18/39] bpf maps: switch to CLASS(fd, ...)
  2024-07-30  5:16   ` [PATCH 18/39] bpf maps: switch to CLASS(fd, ...) viro
@ 2024-08-07 10:34     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:34 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:04AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
>         Calling conventions for __bpf_map_get() would be more convenient
> if it left fpdut() on failure to callers.  Makes for simpler logics
> in the callers.
> 
> 	Among other things, the proof of memory safety no longer has to
> rely upon file->private_data never being ERR_PTR(...) for bpffs files.
> Original calling conventions made it impossible for the caller to tell
> whether __bpf_map_get() has returned ERR_PTR(-EINVAL) because it has found
> the file not be a bpf map one (in which case it would've done fdput())
> or because it found that ERR_PTR(-EINVAL) in file->private_data of a
> bpf map file (in which case fdput() would _not_ have been done).
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 19/39] fdget_raw() users: switch to CLASS(fd_raw, ...)
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (16 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 18/39] bpf maps: switch to CLASS(fd, ...) viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:35     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 20/39] introduce "fd_pos" class, convert fdget_pos() users to it viro
                     ` (20 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/arm/kernel/sys_oabi-compat.c | 10 +++-----
 fs/fcntl.c                        | 42 +++++++++++++------------------
 fs/namei.c                        | 13 +++-------
 fs/open.c                         | 13 +++-------
 fs/quota/quota.c                  | 12 +++------
 fs/stat.c                         | 10 +++-----
 fs/statfs.c                       | 12 ++++-----
 kernel/bpf/bpf_inode_storage.c    | 25 ++++++------------
 kernel/cgroup/cgroup.c            |  9 +++----
 security/landlock/syscalls.c      | 19 +++++---------
 10 files changed, 58 insertions(+), 107 deletions(-)

diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index f5781ff54a5c..2944721e82a2 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -235,12 +235,12 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
 				 unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
-	struct fd f = fdget_raw(fd);
+	CLASS(fd_raw, f)(fd);
 	struct flock64 flock;
-	long err = -EBADF;
+	long err;
 
-	if (!fd_file(f))
-		goto out;
+	if (fd_empty(f))
+		return -EBADF;
 
 	switch (cmd) {
 	case F_GETLK64:
@@ -271,8 +271,6 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
 		err = sys_fcntl64(fd, cmd, arg);
 		break;
 	}
-	fdput(f);
-out:
 	return err;
 }
 
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 2b5616762354..f96328ee3853 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -476,24 +476,21 @@ static int check_fcntl_cmd(unsigned cmd)
 
 SYSCALL_DEFINE3(fcntl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
 {	
-	struct fd f = fdget_raw(fd);
-	long err = -EBADF;
+	CLASS(fd_raw, f)(fd);
+	long err;
 
-	if (!fd_file(f))
-		goto out;
+	if (fd_empty(f))
+		return -EBADF;
 
 	if (unlikely(fd_file(f)->f_mode & FMODE_PATH)) {
 		if (!check_fcntl_cmd(cmd))
-			goto out1;
+			return -EBADF;
 	}
 
 	err = security_file_fcntl(fd_file(f), cmd, arg);
 	if (!err)
 		err = do_fcntl(fd, cmd, arg, fd_file(f));
 
-out1:
- 	fdput(f);
-out:
 	return err;
 }
 
@@ -502,21 +499,21 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd,
 		unsigned long, arg)
 {	
 	void __user *argp = (void __user *)arg;
-	struct fd f = fdget_raw(fd);
+	CLASS(fd_raw, f)(fd);
 	struct flock64 flock;
-	long err = -EBADF;
+	long err;
 
-	if (!fd_file(f))
-		goto out;
+	if (fd_empty(f))
+		return -EBADF;
 
 	if (unlikely(fd_file(f)->f_mode & FMODE_PATH)) {
 		if (!check_fcntl_cmd(cmd))
-			goto out1;
+			return -EBADF;
 	}
 
 	err = security_file_fcntl(fd_file(f), cmd, arg);
 	if (err)
-		goto out1;
+		return err;
 	
 	switch (cmd) {
 	case F_GETLK64:
@@ -541,9 +538,6 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd,
 		err = do_fcntl(fd, cmd, arg, fd_file(f));
 		break;
 	}
-out1:
-	fdput(f);
-out:
 	return err;
 }
 #endif
@@ -639,21 +633,21 @@ static int fixup_compat_flock(struct flock *flock)
 static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
 			     compat_ulong_t arg)
 {
-	struct fd f = fdget_raw(fd);
+	CLASS(fd_raw, f)(fd);
 	struct flock flock;
-	long err = -EBADF;
+	long err;
 
-	if (!fd_file(f))
-		return err;
+	if (fd_empty(f))
+		return -EBADF;
 
 	if (unlikely(fd_file(f)->f_mode & FMODE_PATH)) {
 		if (!check_fcntl_cmd(cmd))
-			goto out_put;
+			return -EBADF;
 	}
 
 	err = security_file_fcntl(fd_file(f), cmd, arg);
 	if (err)
-		goto out_put;
+		return err;
 
 	switch (cmd) {
 	case F_GETLK:
@@ -696,8 +690,6 @@ static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
 		err = do_fcntl(fd, cmd, arg, fd_file(f));
 		break;
 	}
-out_put:
-	fdput(f);
 	return err;
 }
 
diff --git a/fs/namei.c b/fs/namei.c
index af86e3330594..9a1dcaf8be30 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2489,26 +2489,22 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 		}
 	} else {
 		/* Caller must check execute permissions on the starting path component */
-		struct fd f = fdget_raw(nd->dfd);
+		CLASS(fd_raw, f)(nd->dfd);
 		struct dentry *dentry;
 
-		if (!fd_file(f))
+		if (fd_empty(f))
 			return ERR_PTR(-EBADF);
 
 		if (flags & LOOKUP_LINKAT_EMPTY) {
 			if (fd_file(f)->f_cred != current_cred() &&
-			    !ns_capable(fd_file(f)->f_cred->user_ns, CAP_DAC_READ_SEARCH)) {
-				fdput(f);
+			    !ns_capable(fd_file(f)->f_cred->user_ns, CAP_DAC_READ_SEARCH))
 				return ERR_PTR(-ENOENT);
-			}
 		}
 
 		dentry = fd_file(f)->f_path.dentry;
 
-		if (*s && unlikely(!d_can_lookup(dentry))) {
-			fdput(f);
+		if (*s && unlikely(!d_can_lookup(dentry)))
 			return ERR_PTR(-ENOTDIR);
-		}
 
 		nd->path = fd_file(f)->f_path;
 		if (flags & LOOKUP_RCU) {
@@ -2518,7 +2514,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 			path_get(&nd->path);
 			nd->inode = nd->path.dentry->d_inode;
 		}
-		fdput(f);
 	}
 
 	/* For scoped-lookups we need to set the root to the dirfd as well. */
diff --git a/fs/open.c b/fs/open.c
index a388828ccd22..624c173a0270 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -581,23 +581,18 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename)
 
 SYSCALL_DEFINE1(fchdir, unsigned int, fd)
 {
-	struct fd f = fdget_raw(fd);
+	CLASS(fd_raw, f)(fd);
 	int error;
 
-	error = -EBADF;
-	if (!fd_file(f))
-		goto out;
+	if (fd_empty(f))
+		return -EBADF;
 
-	error = -ENOTDIR;
 	if (!d_can_lookup(fd_file(f)->f_path.dentry))
-		goto out_putf;
+		return -ENOTDIR;
 
 	error = file_permission(fd_file(f), MAY_EXEC | MAY_CHDIR);
 	if (!error)
 		set_fs_pwd(current->fs, &fd_file(f)->f_path);
-out_putf:
-	fdput(f);
-out:
 	return error;
 }
 
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 290157bc7bec..7c2b75a44485 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -976,21 +976,19 @@ SYSCALL_DEFINE4(quotactl_fd, unsigned int, fd, unsigned int, cmd,
 	struct super_block *sb;
 	unsigned int cmds = cmd >> SUBCMDSHIFT;
 	unsigned int type = cmd & SUBCMDMASK;
-	struct fd f;
+	CLASS(fd_raw, f)(fd);
 	int ret;
 
-	f = fdget_raw(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
-	ret = -EINVAL;
 	if (type >= MAXQUOTAS)
-		goto out;
+		return -EINVAL;
 
 	if (quotactl_cmd_write(cmds)) {
 		ret = mnt_want_write(fd_file(f)->f_path.mnt);
 		if (ret)
-			goto out;
+			return ret;
 	}
 
 	sb = fd_file(f)->f_path.mnt->mnt_sb;
@@ -1008,7 +1006,5 @@ SYSCALL_DEFINE4(quotactl_fd, unsigned int, fd, unsigned int, cmd,
 
 	if (quotactl_cmd_write(cmds))
 		mnt_drop_write(fd_file(f)->f_path.mnt);
-out:
-	fdput(f);
 	return ret;
 }
diff --git a/fs/stat.c b/fs/stat.c
index 41e598376d7e..56d6ce2b2c79 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -220,15 +220,11 @@ EXPORT_SYMBOL(vfs_getattr);
  */
 int vfs_fstat(int fd, struct kstat *stat)
 {
-	struct fd f;
-	int error;
+	CLASS(fd_raw, f)(fd);
 
-	f = fdget_raw(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
-	error = vfs_getattr(&fd_file(f)->f_path, stat, STATX_BASIC_STATS, 0);
-	fdput(f);
-	return error;
+	return vfs_getattr(&fd_file(f)->f_path, stat, STATX_BASIC_STATS, 0);
 }
 
 int getname_statx_lookup_flags(int flags)
diff --git a/fs/statfs.c b/fs/statfs.c
index 9c7bb27e7932..a45ac85e6048 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -114,13 +114,11 @@ int user_statfs(const char __user *pathname, struct kstatfs *st)
 
 int fd_statfs(int fd, struct kstatfs *st)
 {
-	struct fd f = fdget_raw(fd);
-	int error = -EBADF;
-	if (fd_file(f)) {
-		error = vfs_statfs(&fd_file(f)->f_path, st);
-		fdput(f);
-	}
-	return error;
+	CLASS(fd_raw, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	return vfs_statfs(&fd_file(f)->f_path, st);
 }
 
 static int do_statfs_native(struct kstatfs *st, struct statfs __user *p)
diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c
index 0a79aee6523d..a2c05db49ebc 100644
--- a/kernel/bpf/bpf_inode_storage.c
+++ b/kernel/bpf/bpf_inode_storage.c
@@ -78,13 +78,12 @@ void bpf_inode_storage_free(struct inode *inode)
 static void *bpf_fd_inode_storage_lookup_elem(struct bpf_map *map, void *key)
 {
 	struct bpf_local_storage_data *sdata;
-	struct fd f = fdget_raw(*(int *)key);
+	CLASS(fd_raw, f)(*(int *)key);
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return ERR_PTR(-EBADF);
 
 	sdata = inode_storage_lookup(file_inode(fd_file(f)), map, true);
-	fdput(f);
 	return sdata ? sdata->data : NULL;
 }
 
@@ -92,19 +91,16 @@ static long bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
 					     void *value, u64 map_flags)
 {
 	struct bpf_local_storage_data *sdata;
-	struct fd f = fdget_raw(*(int *)key);
+	CLASS(fd_raw, f)(*(int *)key);
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
-	if (!inode_storage_ptr(file_inode(fd_file(f)))) {
-		fdput(f);
+	if (!inode_storage_ptr(file_inode(fd_file(f))))
 		return -EBADF;
-	}
 
 	sdata = bpf_local_storage_update(file_inode(fd_file(f)),
 					 (struct bpf_local_storage_map *)map,
 					 value, map_flags, GFP_ATOMIC);
-	fdput(f);
 	return PTR_ERR_OR_ZERO(sdata);
 }
 
@@ -123,15 +119,10 @@ static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
 
 static long bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
 {
-	struct fd f = fdget_raw(*(int *)key);
-	int err;
-
-	if (!fd_file(f))
+	CLASS(fd_raw, f)(*(int *)key);
+	if (fd_empty(f))
 		return -EBADF;
-
-	err = inode_storage_delete(file_inode(fd_file(f)), map);
-	fdput(f);
-	return err;
+	return inode_storage_delete(file_inode(fd_file(f)), map);
 }
 
 /* *gfp_flags* is a hidden argument provided by the verifier */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index b96489277f70..1244a8c8878e 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6899,14 +6899,11 @@ EXPORT_SYMBOL_GPL(cgroup_get_from_path);
  */
 struct cgroup *cgroup_v1v2_get_from_fd(int fd)
 {
-	struct cgroup *cgrp;
-	struct fd f = fdget_raw(fd);
-	if (!fd_file(f))
+	CLASS(fd_raw, f)(fd);
+	if (fd_empty(f))
 		return ERR_PTR(-EBADF);
 
-	cgrp = cgroup_v1v2_get_from_file(fd_file(f));
-	fdput(f);
-	return cgrp;
+	return cgroup_v1v2_get_from_file(fd_file(f));
 }
 
 /**
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 00b63971ab64..9689169ad5cc 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -269,15 +269,12 @@ static struct landlock_ruleset *get_ruleset_from_fd(const int fd,
  */
 static int get_path_from_fd(const s32 fd, struct path *const path)
 {
-	struct fd f;
-	int err = 0;
+	CLASS(fd_raw, f)(fd);
 
 	BUILD_BUG_ON(!__same_type(
 		fd, ((struct landlock_path_beneath_attr *)NULL)->parent_fd));
 
-	/* Handles O_PATH. */
-	f = fdget_raw(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 	/*
 	 * Forbids ruleset FDs, internal filesystems (e.g. nsfs), including
@@ -288,16 +285,12 @@ static int get_path_from_fd(const s32 fd, struct path *const path)
 	    (fd_file(f)->f_path.mnt->mnt_flags & MNT_INTERNAL) ||
 	    (fd_file(f)->f_path.dentry->d_sb->s_flags & SB_NOUSER) ||
 	    d_is_negative(fd_file(f)->f_path.dentry) ||
-	    IS_PRIVATE(d_backing_inode(fd_file(f)->f_path.dentry))) {
-		err = -EBADFD;
-		goto out_fdput;
-	}
+	    IS_PRIVATE(d_backing_inode(fd_file(f)->f_path.dentry)))
+		return -EBADFD;
+
 	*path = fd_file(f)->f_path;
 	path_get(path);
-
-out_fdput:
-	fdput(f);
-	return err;
+	return 0;
 }
 
 static int add_rule_path_beneath(struct landlock_ruleset *const ruleset,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 19/39] fdget_raw() users: switch to CLASS(fd_raw, ...)
  2024-07-30  5:16   ` [PATCH 19/39] fdget_raw() users: switch to CLASS(fd_raw, ...) viro
@ 2024-08-07 10:35     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:35 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:05AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 20/39] introduce "fd_pos" class, convert fdget_pos() users to it.
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (17 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 19/39] fdget_raw() users: switch to CLASS(fd_raw, ...) viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:36     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 21/39] o2hb_region_dev_store(): avoid goto around fdget()/fdput() viro
                     ` (19 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

fdget_pos() for constructor, fdput_pos() for cleanup, all users of
fd..._pos() converted trivially.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/alpha/kernel/osf_sys.c |  5 ++---
 fs/read_write.c             | 34 +++++++++++++---------------------
 fs/readdir.c                | 28 ++++++++++------------------
 include/linux/file.h        |  1 +
 4 files changed, 26 insertions(+), 42 deletions(-)

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index 56fea57f9642..09d82f0963d0 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -152,7 +152,7 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
 		long __user *, basep)
 {
 	int error;
-	struct fd arg = fdget_pos(fd);
+	CLASS(fd_pos, arg)(fd);
 	struct osf_dirent_callback buf = {
 		.ctx.actor = osf_filldir,
 		.dirent = dirent,
@@ -160,7 +160,7 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
 		.count = count
 	};
 
-	if (!fd_file(arg))
+	if (fd_empty(arg))
 		return -EBADF;
 
 	error = iterate_dir(fd_file(arg), &buf.ctx);
@@ -169,7 +169,6 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
 	if (count != buf.count)
 		error = count - buf.count;
 
-	fdput_pos(arg);
 	return error;
 }
 
diff --git a/fs/read_write.c b/fs/read_write.c
index 59d6d0dee579..7e5a2e50e12b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -293,8 +293,8 @@ EXPORT_SYMBOL(vfs_llseek);
 static off_t ksys_lseek(unsigned int fd, off_t offset, unsigned int whence)
 {
 	off_t retval;
-	struct fd f = fdget_pos(fd);
-	if (!fd_file(f))
+	CLASS(fd_pos, f)(fd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	retval = -EINVAL;
@@ -304,7 +304,6 @@ static off_t ksys_lseek(unsigned int fd, off_t offset, unsigned int whence)
 		if (res != (loff_t)retval)
 			retval = -EOVERFLOW;	/* LFS: should only happen on 32 bit platforms */
 	}
-	fdput_pos(f);
 	return retval;
 }
 
@@ -327,15 +326,14 @@ SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
 		unsigned int, whence)
 {
 	int retval;
-	struct fd f = fdget_pos(fd);
+	CLASS(fd_pos, f)(fd);
 	loff_t offset;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
-	retval = -EINVAL;
 	if (whence > SEEK_MAX)
-		goto out_putf;
+		return -EINVAL;
 
 	offset = vfs_llseek(fd_file(f), ((loff_t) offset_high << 32) | offset_low,
 			whence);
@@ -346,8 +344,6 @@ SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
 		if (!copy_to_user(result, &offset, sizeof(offset)))
 			retval = 0;
 	}
-out_putf:
-	fdput_pos(f);
 	return retval;
 }
 #endif
@@ -607,10 +603,10 @@ static inline loff_t *file_ppos(struct file *file)
 
 ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
 {
-	struct fd f = fdget_pos(fd);
+	CLASS(fd_pos, f)(fd);
 	ssize_t ret = -EBADF;
 
-	if (fd_file(f)) {
+	if (!fd_empty(f)) {
 		loff_t pos, *ppos = file_ppos(fd_file(f));
 		if (ppos) {
 			pos = *ppos;
@@ -619,7 +615,6 @@ ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
 		ret = vfs_read(fd_file(f), buf, count, ppos);
 		if (ret >= 0 && ppos)
 			fd_file(f)->f_pos = pos;
-		fdput_pos(f);
 	}
 	return ret;
 }
@@ -631,10 +626,10 @@ SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
 
 ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
 {
-	struct fd f = fdget_pos(fd);
+	CLASS(fd_pos, f)(fd);
 	ssize_t ret = -EBADF;
 
-	if (fd_file(f)) {
+	if (!fd_empty(f)) {
 		loff_t pos, *ppos = file_ppos(fd_file(f));
 		if (ppos) {
 			pos = *ppos;
@@ -643,7 +638,6 @@ ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
 		ret = vfs_write(fd_file(f), buf, count, ppos);
 		if (ret >= 0 && ppos)
 			fd_file(f)->f_pos = pos;
-		fdput_pos(f);
 	}
 
 	return ret;
@@ -982,10 +976,10 @@ static ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
 static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
 			unsigned long vlen, rwf_t flags)
 {
-	struct fd f = fdget_pos(fd);
+	CLASS(fd_pos, f)(fd);
 	ssize_t ret = -EBADF;
 
-	if (fd_file(f)) {
+	if (!fd_empty(f)) {
 		loff_t pos, *ppos = file_ppos(fd_file(f));
 		if (ppos) {
 			pos = *ppos;
@@ -994,7 +988,6 @@ static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
 		ret = vfs_readv(fd_file(f), vec, vlen, ppos, flags);
 		if (ret >= 0 && ppos)
 			fd_file(f)->f_pos = pos;
-		fdput_pos(f);
 	}
 
 	if (ret > 0)
@@ -1006,10 +999,10 @@ static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
 static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
 			 unsigned long vlen, rwf_t flags)
 {
-	struct fd f = fdget_pos(fd);
+	CLASS(fd_pos, f)(fd);
 	ssize_t ret = -EBADF;
 
-	if (fd_file(f)) {
+	if (!fd_empty(f)) {
 		loff_t pos, *ppos = file_ppos(fd_file(f));
 		if (ppos) {
 			pos = *ppos;
@@ -1018,7 +1011,6 @@ static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
 		ret = vfs_writev(fd_file(f), vec, vlen, ppos, flags);
 		if (ret >= 0 && ppos)
 			fd_file(f)->f_pos = pos;
-		fdput_pos(f);
 	}
 
 	if (ret > 0)
diff --git a/fs/readdir.c b/fs/readdir.c
index 6d29cab8576e..0038efda417b 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -219,20 +219,19 @@ SYSCALL_DEFINE3(old_readdir, unsigned int, fd,
 		struct old_linux_dirent __user *, dirent, unsigned int, count)
 {
 	int error;
-	struct fd f = fdget_pos(fd);
+	CLASS(fd_pos, f)(fd);
 	struct readdir_callback buf = {
 		.ctx.actor = fillonedir,
 		.dirent = dirent
 	};
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	error = iterate_dir(fd_file(f), &buf.ctx);
 	if (buf.result)
 		error = buf.result;
 
-	fdput_pos(f);
 	return error;
 }
 
@@ -309,7 +308,7 @@ static bool filldir(struct dir_context *ctx, const char *name, int namlen,
 SYSCALL_DEFINE3(getdents, unsigned int, fd,
 		struct linux_dirent __user *, dirent, unsigned int, count)
 {
-	struct fd f;
+	CLASS(fd_pos, f)(fd);
 	struct getdents_callback buf = {
 		.ctx.actor = filldir,
 		.count = count,
@@ -317,8 +316,7 @@ SYSCALL_DEFINE3(getdents, unsigned int, fd,
 	};
 	int error;
 
-	f = fdget_pos(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	error = iterate_dir(fd_file(f), &buf.ctx);
@@ -333,7 +331,6 @@ SYSCALL_DEFINE3(getdents, unsigned int, fd,
 		else
 			error = count - buf.count;
 	}
-	fdput_pos(f);
 	return error;
 }
 
@@ -392,7 +389,7 @@ static bool filldir64(struct dir_context *ctx, const char *name, int namlen,
 SYSCALL_DEFINE3(getdents64, unsigned int, fd,
 		struct linux_dirent64 __user *, dirent, unsigned int, count)
 {
-	struct fd f;
+	CLASS(fd_pos, f)(fd);
 	struct getdents_callback64 buf = {
 		.ctx.actor = filldir64,
 		.count = count,
@@ -400,8 +397,7 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
 	};
 	int error;
 
-	f = fdget_pos(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	error = iterate_dir(fd_file(f), &buf.ctx);
@@ -417,7 +413,6 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
 		else
 			error = count - buf.count;
 	}
-	fdput_pos(f);
 	return error;
 }
 
@@ -477,20 +472,19 @@ COMPAT_SYSCALL_DEFINE3(old_readdir, unsigned int, fd,
 		struct compat_old_linux_dirent __user *, dirent, unsigned int, count)
 {
 	int error;
-	struct fd f = fdget_pos(fd);
+	CLASS(fd_pos, f)(fd);
 	struct compat_readdir_callback buf = {
 		.ctx.actor = compat_fillonedir,
 		.dirent = dirent
 	};
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	error = iterate_dir(fd_file(f), &buf.ctx);
 	if (buf.result)
 		error = buf.result;
 
-	fdput_pos(f);
 	return error;
 }
 
@@ -560,7 +554,7 @@ static bool compat_filldir(struct dir_context *ctx, const char *name, int namlen
 COMPAT_SYSCALL_DEFINE3(getdents, unsigned int, fd,
 		struct compat_linux_dirent __user *, dirent, unsigned int, count)
 {
-	struct fd f;
+	CLASS(fd_pos, f)(fd);
 	struct compat_getdents_callback buf = {
 		.ctx.actor = compat_filldir,
 		.current_dir = dirent,
@@ -568,8 +562,7 @@ COMPAT_SYSCALL_DEFINE3(getdents, unsigned int, fd,
 	};
 	int error;
 
-	f = fdget_pos(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	error = iterate_dir(fd_file(f), &buf.ctx);
@@ -584,7 +577,6 @@ COMPAT_SYSCALL_DEFINE3(getdents, unsigned int, fd,
 		else
 			error = count - buf.count;
 	}
-	fdput_pos(f);
 	return error;
 }
 #endif
diff --git a/include/linux/file.h b/include/linux/file.h
index d3165d7a8112..2a0f4e13e1af 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -108,6 +108,7 @@ static inline void fdput_pos(struct fd f)
 
 DEFINE_CLASS(fd, struct fd, fdput(_T), fdget(fd), int fd)
 DEFINE_CLASS(fd_raw, struct fd, fdput(_T), fdget_raw(fd), int fd)
+DEFINE_CLASS(fd_pos, struct fd, fdput_pos(_T), fdget_pos(fd), int fd)
 
 extern int f_dupfd(unsigned int from, struct file *file, unsigned flags);
 extern int replace_fd(unsigned fd, struct file *file, unsigned flags);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 20/39] introduce "fd_pos" class, convert fdget_pos() users to it.
  2024-07-30  5:16   ` [PATCH 20/39] introduce "fd_pos" class, convert fdget_pos() users to it viro
@ 2024-08-07 10:36     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:36 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:06AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> fdget_pos() for constructor, fdput_pos() for cleanup, all users of
> fd..._pos() converted trivially.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 21/39] o2hb_region_dev_store(): avoid goto around fdget()/fdput()
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (18 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 20/39] introduce "fd_pos" class, convert fdget_pos() users to it viro
@ 2024-07-30  5:16   ` viro
  2024-07-30  5:16   ` [PATCH 22/39] privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() viro
                     ` (18 subsequent siblings)
  38 siblings, 0 replies; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Preparation for CLASS(fd) conversion.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ocfs2/cluster/heartbeat.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 4b9f45d7049e..bc55340a60c3 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -1770,23 +1770,23 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
 	int live_threshold;
 
 	if (reg->hr_bdev_file)
-		goto out;
+		return -EINVAL;
 
 	/* We can't heartbeat without having had our node number
 	 * configured yet. */
 	if (o2nm_this_node() == O2NM_MAX_NODES)
-		goto out;
+		return -EINVAL;
 
 	fd = simple_strtol(p, &p, 0);
 	if (!p || (*p && (*p != '\n')))
-		goto out;
+		return -EINVAL;
 
 	if (fd < 0 || fd >= INT_MAX)
-		goto out;
+		return -EINVAL;
 
 	f = fdget(fd);
 	if (fd_file(f) == NULL)
-		goto out;
+		return -EINVAL;
 
 	if (reg->hr_blocks == 0 || reg->hr_start_block == 0 ||
 	    reg->hr_block_bytes == 0)
@@ -1908,7 +1908,6 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
 	}
 out2:
 	fdput(f);
-out:
 	return ret;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH 22/39] privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget()
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (19 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 21/39] o2hb_region_dev_store(): avoid goto around fdget()/fdput() viro
@ 2024-07-30  5:16   ` viro
  2024-07-30  5:16   ` [PATCH 23/39] fdget(), trivial conversions viro
                     ` (17 subsequent siblings)
  38 siblings, 0 replies; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

just call it, same as privcmd_ioeventfd_deassign() does...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/xen/privcmd.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 54e4f285c0f4..ba02b732fa49 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -1324,7 +1324,6 @@ static int privcmd_ioeventfd_assign(struct privcmd_ioeventfd *ioeventfd)
 	struct privcmd_kernel_ioeventfd *kioeventfd;
 	struct privcmd_kernel_ioreq *kioreq;
 	unsigned long flags;
-	struct fd f;
 	int ret;
 
 	/* Check for range overflow */
@@ -1344,15 +1343,7 @@ static int privcmd_ioeventfd_assign(struct privcmd_ioeventfd *ioeventfd)
 	if (!kioeventfd)
 		return -ENOMEM;
 
-	f = fdget(ioeventfd->event_fd);
-	if (!fd_file(f)) {
-		ret = -EBADF;
-		goto error_kfree;
-	}
-
-	kioeventfd->eventfd = eventfd_ctx_fileget(fd_file(f));
-	fdput(f);
-
+	kioeventfd->eventfd = eventfd_ctx_fdget(ioeventfd->event_fd);
 	if (IS_ERR(kioeventfd->eventfd)) {
 		ret = PTR_ERR(kioeventfd->eventfd);
 		goto error_kfree;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH 23/39] fdget(), trivial conversions
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (20 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 22/39] privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:37     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 24/39] fdget(), more " viro
                     ` (16 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

fdget() is the first thing done in scope, all matching fdput() are
immediately followed by leaving the scope.

[conflict in fs/fhandle.c and a trival one in kernel/bpf/syscall.c]
[conflict in fs/xattr.c]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/powerpc/kvm/book3s_64_vio.c           | 21 +++---------
 arch/powerpc/kvm/powerpc.c                 | 24 ++++---------
 arch/powerpc/platforms/cell/spu_syscalls.c |  6 ++--
 arch/x86/kernel/cpu/sgx/main.c             | 10 ++----
 arch/x86/kvm/svm/sev.c                     | 39 ++++++++--------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c  | 23 ++++---------
 drivers/gpu/drm/drm_syncobj.c              |  9 ++---
 drivers/media/rc/lirc_dev.c                | 13 +++-----
 fs/btrfs/ioctl.c                           |  5 ++-
 fs/eventfd.c                               |  9 ++---
 fs/eventpoll.c                             | 23 ++++---------
 fs/fhandle.c                               |  5 ++-
 fs/ioctl.c                                 | 23 +++++--------
 fs/kernel_read_file.c                      | 12 +++----
 fs/notify/fanotify/fanotify_user.c         | 15 +++------
 fs/notify/inotify/inotify_user.c           | 17 +++-------
 fs/open.c                                  | 36 +++++++++-----------
 fs/read_write.c                            | 28 +++++-----------
 fs/signalfd.c                              |  9 ++---
 fs/sync.c                                  | 29 ++++++----------
 fs/xattr.c                                 | 35 ++++++++-----------
 io_uring/sqpoll.c                          | 29 +++++-----------
 kernel/bpf/btf.c                           | 11 ++----
 kernel/bpf/syscall.c                       | 10 ++----
 kernel/bpf/token.c                         |  9 ++---
 kernel/events/core.c                       | 14 +++-----
 kernel/nsproxy.c                           |  5 ++-
 kernel/pid.c                               |  7 ++--
 kernel/sys.c                               | 15 +++------
 kernel/watch_queue.c                       |  6 ++--
 mm/fadvise.c                               | 10 ++----
 mm/readahead.c                             | 17 +++-------
 net/core/net_namespace.c                   | 10 +++---
 security/landlock/syscalls.c               | 26 +++++----------
 virt/kvm/vfio.c                            |  8 ++---
 35 files changed, 187 insertions(+), 381 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 34c0adb9fdbf..742aa58a7c7e 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -115,10 +115,9 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 	struct iommu_table_group *table_group;
 	long i;
 	struct kvmppc_spapr_tce_iommu_table *stit;
-	struct fd f;
+	CLASS(fd, f)(tablefd);
 
-	f = fdget(tablefd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	rcu_read_lock();
@@ -130,16 +129,12 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 	}
 	rcu_read_unlock();
 
-	if (!found) {
-		fdput(f);
+	if (!found)
 		return -EINVAL;
-	}
 
 	table_group = iommu_group_get_iommudata(grp);
-	if (WARN_ON(!table_group)) {
-		fdput(f);
+	if (WARN_ON(!table_group))
 		return -EFAULT;
-	}
 
 	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
 		struct iommu_table *tbltmp = table_group->tables[i];
@@ -160,10 +155,8 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 			break;
 		}
 	}
-	if (!tbl) {
-		fdput(f);
+	if (!tbl)
 		return -EINVAL;
-	}
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(stit, &stt->iommu_tables, next) {
@@ -174,7 +167,6 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 			/* stit is being destroyed */
 			iommu_tce_table_put(tbl);
 			rcu_read_unlock();
-			fdput(f);
 			return -ENOTTY;
 		}
 		/*
@@ -182,7 +174,6 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 		 * its KVM reference counter and can return.
 		 */
 		rcu_read_unlock();
-		fdput(f);
 		return 0;
 	}
 	rcu_read_unlock();
@@ -190,7 +181,6 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 	stit = kzalloc(sizeof(*stit), GFP_KERNEL);
 	if (!stit) {
 		iommu_tce_table_put(tbl);
-		fdput(f);
 		return -ENOMEM;
 	}
 
@@ -199,7 +189,6 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 
 	list_add_rcu(&stit->next, &stt->iommu_tables);
 
-	fdput(f);
 	return 0;
 }
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index f14329989e9a..b3b37ea77849 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1933,12 +1933,11 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 #endif
 #ifdef CONFIG_KVM_MPIC
 	case KVM_CAP_IRQ_MPIC: {
-		struct fd f;
+		CLASS(fd, f)(cap->args[0]);
 		struct kvm_device *dev;
 
 		r = -EBADF;
-		f = fdget(cap->args[0]);
-		if (!fd_file(f))
+		if (fd_empty(f))
 			break;
 
 		r = -EPERM;
@@ -1946,18 +1945,16 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 		if (dev)
 			r = kvmppc_mpic_connect_vcpu(dev, vcpu, cap->args[1]);
 
-		fdput(f);
 		break;
 	}
 #endif
 #ifdef CONFIG_KVM_XICS
 	case KVM_CAP_IRQ_XICS: {
-		struct fd f;
+		CLASS(fd, f)(cap->args[0]);
 		struct kvm_device *dev;
 
 		r = -EBADF;
-		f = fdget(cap->args[0]);
-		if (!fd_file(f))
+		if (fd_empty(f))
 			break;
 
 		r = -EPERM;
@@ -1968,34 +1965,27 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 			else
 				r = kvmppc_xics_connect_vcpu(dev, vcpu, cap->args[1]);
 		}
-
-		fdput(f);
 		break;
 	}
 #endif /* CONFIG_KVM_XICS */
 #ifdef CONFIG_KVM_XIVE
 	case KVM_CAP_PPC_IRQ_XIVE: {
-		struct fd f;
+		CLASS(fd, f)(cap->args[0]);
 		struct kvm_device *dev;
 
 		r = -EBADF;
-		f = fdget(cap->args[0]);
-		if (!fd_file(f))
+		if (fd_empty(f))
 			break;
 
 		r = -ENXIO;
-		if (!xive_enabled()) {
-			fdput(f);
+		if (!xive_enabled())
 			break;
-		}
 
 		r = -EPERM;
 		dev = kvm_device_from_filp(fd_file(f));
 		if (dev)
 			r = kvmppc_xive_native_connect_vcpu(dev, vcpu,
 							    cap->args[1]);
-
-		fdput(f);
 		break;
 	}
 #endif /* CONFIG_KVM_XIVE */
diff --git a/arch/powerpc/platforms/cell/spu_syscalls.c b/arch/powerpc/platforms/cell/spu_syscalls.c
index cd7d42fc12a6..da4fad7fc8bf 100644
--- a/arch/powerpc/platforms/cell/spu_syscalls.c
+++ b/arch/powerpc/platforms/cell/spu_syscalls.c
@@ -64,12 +64,10 @@ SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
 		return -ENOSYS;
 
 	if (flags & SPU_CREATE_AFFINITY_SPU) {
-		struct fd neighbor = fdget(neighbor_fd);
+		CLASS(fd, neighbor)(neighbor_fd);
 		ret = -EBADF;
-		if (fd_file(neighbor)) {
+		if (!fd_empty(neighbor))
 			ret = calls->create_thread(name, flags, mode, fd_file(neighbor));
-			fdput(neighbor);
-		}
 	} else
 		ret = calls->create_thread(name, flags, mode, NULL);
 
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index d01deb386395..d91284001162 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -893,19 +893,15 @@ static struct miscdevice sgx_dev_provision = {
 int sgx_set_attribute(unsigned long *allowed_attributes,
 		      unsigned int attribute_fd)
 {
-	struct fd f = fdget(attribute_fd);
+	CLASS(fd, f)(attribute_fd);
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EINVAL;
 
-	if (fd_file(f)->f_op != &sgx_provision_fops) {
-		fdput(f);
+	if (fd_file(f)->f_op != &sgx_provision_fops)
 		return -EINVAL;
-	}
 
 	*allowed_attributes |= SGX_ATTR_PROVISIONKEY;
-
-	fdput(f);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(sgx_set_attribute);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0e38b5223263..65d6670a5969 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -530,17 +530,12 @@ static int sev_bind_asid(struct kvm *kvm, unsigned int handle, int *error)
 
 static int __sev_issue_cmd(int fd, int id, void *data, int *error)
 {
-	struct fd f;
-	int ret;
+	CLASS(fd, f)(fd);
 
-	f = fdget(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
-	ret = sev_issue_cmd_external_user(fd_file(f), id, data, error);
-
-	fdput(f);
-	return ret;
+	return sev_issue_cmd_external_user(fd_file(f), id, data, error);
 }
 
 static int sev_issue_cmd(struct kvm *kvm, int id, void *data, int *error)
@@ -2073,23 +2068,21 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 {
 	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
 	struct kvm_sev_info *src_sev, *cg_cleanup_sev;
-	struct fd f = fdget(source_fd);
+	CLASS(fd, f)(source_fd);
 	struct kvm *source_kvm;
 	bool charged = false;
 	int ret;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
-	if (!file_is_kvm(fd_file(f))) {
-		ret = -EBADF;
-		goto out_fput;
-	}
+	if (!file_is_kvm(fd_file(f)))
+		return -EBADF;
 
 	source_kvm = fd_file(f)->private_data;
 	ret = sev_lock_two_vms(kvm, source_kvm);
 	if (ret)
-		goto out_fput;
+		return ret;
 
 	if (kvm->arch.vm_type != source_kvm->arch.vm_type ||
 	    sev_guest(kvm) || !sev_guest(source_kvm)) {
@@ -2136,8 +2129,6 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 	cg_cleanup_sev->misc_cg = NULL;
 out_unlock:
 	sev_unlock_two_vms(kvm, source_kvm);
-out_fput:
-	fdput(f);
 	return ret;
 }
 
@@ -2796,23 +2787,21 @@ int sev_mem_enc_unregister_region(struct kvm *kvm,
 
 int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 {
-	struct fd f = fdget(source_fd);
+	CLASS(fd, f)(source_fd);
 	struct kvm *source_kvm;
 	struct kvm_sev_info *source_sev, *mirror_sev;
 	int ret;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
-	if (!file_is_kvm(fd_file(f))) {
-		ret = -EBADF;
-		goto e_source_fput;
-	}
+	if (!file_is_kvm(fd_file(f)))
+		return -EBADF;
 
 	source_kvm = fd_file(f)->private_data;
 	ret = sev_lock_two_vms(kvm, source_kvm);
 	if (ret)
-		goto e_source_fput;
+		return ret;
 
 	/*
 	 * Mirrors of mirrors should work, but let's not get silly.  Also
@@ -2855,8 +2844,6 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 
 e_unlock:
 	sev_unlock_two_vms(kvm, source_kvm);
-e_source_fput:
-	fdput(f);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
index a9298cb8d19a..570634654489 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
@@ -36,21 +36,19 @@ static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
 						  int fd,
 						  int32_t priority)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	struct amdgpu_fpriv *fpriv;
 	struct amdgpu_ctx_mgr *mgr;
 	struct amdgpu_ctx *ctx;
 	uint32_t id;
 	int r;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EINVAL;
 
 	r = amdgpu_file_to_fpriv(fd_file(f), &fpriv);
-	if (r) {
-		fdput(f);
+	if (r)
 		return r;
-	}
 
 	mgr = &fpriv->ctx_mgr;
 	mutex_lock(&mgr->lock);
@@ -58,7 +56,6 @@ static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
 		amdgpu_ctx_priority_override(ctx, priority);
 	mutex_unlock(&mgr->lock);
 
-	fdput(f);
 	return 0;
 }
 
@@ -67,31 +64,25 @@ static int amdgpu_sched_context_priority_override(struct amdgpu_device *adev,
 						  unsigned ctx_id,
 						  int32_t priority)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	struct amdgpu_fpriv *fpriv;
 	struct amdgpu_ctx *ctx;
 	int r;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EINVAL;
 
 	r = amdgpu_file_to_fpriv(fd_file(f), &fpriv);
-	if (r) {
-		fdput(f);
+	if (r)
 		return r;
-	}
 
 	ctx = amdgpu_ctx_get(fpriv, ctx_id);
 
-	if (!ctx) {
-		fdput(f);
+	if (!ctx)
 		return -EINVAL;
-	}
 
 	amdgpu_ctx_priority_override(ctx, priority);
 	amdgpu_ctx_put(ctx);
-	fdput(f);
-
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 7fb31ca3b5fc..4eaebd69253b 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -712,16 +712,14 @@ static int drm_syncobj_fd_to_handle(struct drm_file *file_private,
 				    int fd, u32 *handle)
 {
 	struct drm_syncobj *syncobj;
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	int ret;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EINVAL;
 
-	if (fd_file(f)->f_op != &drm_syncobj_file_fops) {
-		fdput(f);
+	if (fd_file(f)->f_op != &drm_syncobj_file_fops)
 		return -EINVAL;
-	}
 
 	/* take a reference to put in the idr */
 	syncobj = fd_file(f)->private_data;
@@ -739,7 +737,6 @@ static int drm_syncobj_fd_to_handle(struct drm_file *file_private,
 	} else
 		drm_syncobj_put(syncobj);
 
-	fdput(f);
 	return ret;
 }
 
diff --git a/drivers/media/rc/lirc_dev.c b/drivers/media/rc/lirc_dev.c
index b8dfd530fab7..af6337489d21 100644
--- a/drivers/media/rc/lirc_dev.c
+++ b/drivers/media/rc/lirc_dev.c
@@ -816,28 +816,23 @@ void __exit lirc_dev_exit(void)
 
 struct rc_dev *rc_dev_get_from_fd(int fd, bool write)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	struct lirc_fh *fh;
 	struct rc_dev *dev;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return ERR_PTR(-EBADF);
 
-	if (fd_file(f)->f_op != &lirc_fops) {
-		fdput(f);
+	if (fd_file(f)->f_op != &lirc_fops)
 		return ERR_PTR(-EINVAL);
-	}
 
-	if (write && !(fd_file(f)->f_mode & FMODE_WRITE)) {
-		fdput(f);
+	if (write && !(fd_file(f)->f_mode & FMODE_WRITE))
 		return ERR_PTR(-EPERM);
-	}
 
 	fh = fd_file(f)->private_data;
 	dev = fh->rc;
 
 	get_device(&dev->dev);
-	fdput(f);
 
 	return dev;
 }
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 32ddd3d31719..04bee97a1ca0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1310,9 +1310,9 @@ static noinline int __btrfs_ioctl_snap_create(struct file *file,
 		ret = btrfs_mksubvol(&file->f_path, idmap, name,
 				     namelen, NULL, readonly, inherit);
 	} else {
-		struct fd src = fdget(fd);
+		CLASS(fd, src)(fd);
 		struct inode *src_inode;
-		if (!fd_file(src)) {
+		if (fd_empty(src)) {
 			ret = -EINVAL;
 			goto out_drop_write;
 		}
@@ -1343,7 +1343,6 @@ static noinline int __btrfs_ioctl_snap_create(struct file *file,
 					       BTRFS_I(src_inode)->root,
 					       readonly, inherit);
 		}
-		fdput(src);
 	}
 out_drop_write:
 	mnt_drop_write_file(file);
diff --git a/fs/eventfd.c b/fs/eventfd.c
index 22c934f3a080..76129bfcd663 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -347,13 +347,10 @@ EXPORT_SYMBOL_GPL(eventfd_fget);
  */
 struct eventfd_ctx *eventfd_ctx_fdget(int fd)
 {
-	struct eventfd_ctx *ctx;
-	struct fd f = fdget(fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return ERR_PTR(-EBADF);
-	ctx = eventfd_ctx_fileget(fd_file(f));
-	fdput(f);
-	return ctx;
+	return eventfd_ctx_fileget(fd_file(f));
 }
 EXPORT_SYMBOL_GPL(eventfd_ctx_fdget);
 
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 28d1a754cf33..4609f7342005 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2259,25 +2259,22 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 {
 	int error;
 	int full_check = 0;
-	struct fd f, tf;
 	struct eventpoll *ep;
 	struct epitem *epi;
 	struct eventpoll *tep = NULL;
 
-	error = -EBADF;
-	f = fdget(epfd);
-	if (!fd_file(f))
-		goto error_return;
+	CLASS(fd, f)(epfd);
+	if (fd_empty(f))
+		return -EBADF;
 
 	/* Get the "struct file *" for the target file */
-	tf = fdget(fd);
-	if (!fd_file(tf))
-		goto error_fput;
+	CLASS(fd, tf)(fd);
+	if (fd_empty(tf))
+		return -EBADF;
 
 	/* The target file descriptor must support poll */
-	error = -EPERM;
 	if (!file_can_poll(fd_file(tf)))
-		goto error_tgt_fput;
+		return -EPERM;
 
 	/* Check if EPOLLWAKEUP is allowed */
 	if (ep_op_has_event(op))
@@ -2396,12 +2393,6 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 		loop_check_gen++;
 		mutex_unlock(&epnested_mutex);
 	}
-
-	fdput(tf);
-error_fput:
-	fdput(f);
-error_return:
-
 	return error;
 }
 
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 3f07b52874a8..5996fd47a6f7 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -124,12 +124,11 @@ static int get_path_from_fd(int fd, struct path *root)
 		path_get(root);
 		spin_unlock(&fs->lock);
 	} else {
-		struct fd f = fdget(fd);
-		if (!fd_file(f))
+		CLASS(fd, f)(fd);
+		if (fd_empty(f))
 			return -EBADF;
 		*root = fd_file(f)->f_path;
 		path_get(root);
-		fdput(f);
 	}
 
 	return 0;
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 6e0c954388d4..638a36be31c1 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -231,11 +231,11 @@ static int ioctl_fiemap(struct file *filp, struct fiemap __user *ufiemap)
 static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
 			     u64 off, u64 olen, u64 destoff)
 {
-	struct fd src_file = fdget(srcfd);
+	CLASS(fd, src_file)(srcfd);
 	loff_t cloned;
 	int ret;
 
-	if (!fd_file(src_file))
+	if (fd_empty(src_file))
 		return -EBADF;
 	cloned = vfs_clone_file_range(fd_file(src_file), off, dst_file, destoff,
 				      olen, 0);
@@ -245,7 +245,6 @@ static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
 		ret = -EINVAL;
 	else
 		ret = 0;
-	fdput(src_file);
 	return ret;
 }
 
@@ -892,22 +891,20 @@ static int do_vfs_ioctl(struct file *filp, unsigned int fd,
 
 SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	int error;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	error = security_file_ioctl(fd_file(f), cmd, arg);
 	if (error)
-		goto out;
+		return error;
 
 	error = do_vfs_ioctl(fd_file(f), fd, cmd, arg);
 	if (error == -ENOIOCTLCMD)
 		error = vfs_ioctl(fd_file(f), cmd, arg);
 
-out:
-	fdput(f);
 	return error;
 }
 
@@ -950,15 +947,15 @@ EXPORT_SYMBOL(compat_ptr_ioctl);
 COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
 		       compat_ulong_t, arg)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	int error;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	error = security_file_ioctl_compat(fd_file(f), cmd, arg);
 	if (error)
-		goto out;
+		return error;
 
 	switch (cmd) {
 	/* FICLONE takes an int argument, so don't use compat_ptr() */
@@ -1009,10 +1006,6 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
 			error = -ENOTTY;
 		break;
 	}
-
- out:
-	fdput(f);
-
 	return error;
 }
 #endif
diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c
index 9ff37ae650ea..de32c95d823d 100644
--- a/fs/kernel_read_file.c
+++ b/fs/kernel_read_file.c
@@ -175,15 +175,11 @@ ssize_t kernel_read_file_from_fd(int fd, loff_t offset, void **buf,
 				 size_t buf_size, size_t *file_size,
 				 enum kernel_read_file_id id)
 {
-	struct fd f = fdget(fd);
-	ssize_t ret = -EBADF;
+	CLASS(fd, f)(fd);
 
-	if (!fd_file(f) || !(fd_file(f)->f_mode & FMODE_READ))
-		goto out;
+	if (fd_empty(f) || !(fd_file(f)->f_mode & FMODE_READ))
+		return -EBADF;
 
-	ret = kernel_read_file(fd_file(f), offset, buf, buf_size, file_size, id);
-out:
-	fdput(f);
-	return ret;
+	return kernel_read_file(fd_file(f), offset, buf, buf_size, file_size, id);
 }
 EXPORT_SYMBOL_GPL(kernel_read_file_from_fd);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 13454e5fd3fb..64b1adc75fe5 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1003,22 +1003,17 @@ static int fanotify_find_path(int dfd, const char __user *filename,
 		 dfd, filename, flags);
 
 	if (filename == NULL) {
-		struct fd f = fdget(dfd);
+		CLASS(fd, f)(dfd);
 
-		ret = -EBADF;
-		if (!fd_file(f))
-			goto out;
+		if (fd_empty(f))
+			return -EBADF;
 
-		ret = -ENOTDIR;
 		if ((flags & FAN_MARK_ONLYDIR) &&
-		    !(S_ISDIR(file_inode(fd_file(f))->i_mode))) {
-			fdput(f);
-			goto out;
-		}
+		    !(S_ISDIR(file_inode(fd_file(f))->i_mode)))
+			return -ENOTDIR;
 
 		*path = fd_file(f)->f_path;
 		path_get(path);
-		fdput(f);
 	} else {
 		unsigned int lookup_flags = 0;
 
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index c7e451d5bd51..711830b73275 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -794,33 +794,26 @@ SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd)
 {
 	struct fsnotify_group *group;
 	struct inotify_inode_mark *i_mark;
-	struct fd f;
-	int ret = -EINVAL;
+	CLASS(fd, f)(fd);
 
-	f = fdget(fd);
-	if (unlikely(!fd_file(f)))
+	if (fd_empty(f))
 		return -EBADF;
 
 	/* verify that this is indeed an inotify instance */
 	if (unlikely(fd_file(f)->f_op != &inotify_fops))
-		goto out;
+		return -EINVAL;
 
 	group = fd_file(f)->private_data;
 
 	i_mark = inotify_idr_find(group, wd);
 	if (unlikely(!i_mark))
-		goto out;
-
-	ret = 0;
+		return -EINVAL;
 
 	fsnotify_destroy_mark(&i_mark->fsn_mark, group);
 
 	/* match ref taken by inotify_idr_find */
 	fsnotify_put_mark(&i_mark->fsn_mark);
-
-out:
-	fdput(f);
-	return ret;
+	return 0;
 }
 
 /*
diff --git a/fs/open.c b/fs/open.c
index 624c173a0270..6d7dd58ce095 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -350,14 +350,12 @@ EXPORT_SYMBOL_GPL(vfs_fallocate);
 
 int ksys_fallocate(int fd, int mode, loff_t offset, loff_t len)
 {
-	struct fd f = fdget(fd);
-	int error = -EBADF;
+	CLASS(fd, f)(fd);
 
-	if (fd_file(f)) {
-		error = vfs_fallocate(fd_file(f), mode, offset, len);
-		fdput(f);
-	}
-	return error;
+	if (fd_empty(f))
+		return -EBADF;
+
+	return vfs_fallocate(fd_file(f), mode, offset, len);
 }
 
 SYSCALL_DEFINE4(fallocate, int, fd, int, mode, loff_t, offset, loff_t, len)
@@ -667,14 +665,12 @@ int vfs_fchmod(struct file *file, umode_t mode)
 
 SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
 {
-	struct fd f = fdget(fd);
-	int err = -EBADF;
+	CLASS(fd, f)(fd);
 
-	if (fd_file(f)) {
-		err = vfs_fchmod(fd_file(f), mode);
-		fdput(f);
-	}
-	return err;
+	if (fd_empty(f))
+		return -EBADF;
+
+	return vfs_fchmod(fd_file(f), mode);
 }
 
 static int do_fchmodat(int dfd, const char __user *filename, umode_t mode,
@@ -861,14 +857,12 @@ int vfs_fchown(struct file *file, uid_t user, gid_t group)
 
 int ksys_fchown(unsigned int fd, uid_t user, gid_t group)
 {
-	struct fd f = fdget(fd);
-	int error = -EBADF;
+	CLASS(fd, f)(fd);
 
-	if (fd_file(f)) {
-		error = vfs_fchown(fd_file(f), user, group);
-		fdput(f);
-	}
-	return error;
+	if (fd_empty(f))
+		return -EBADF;
+
+	return vfs_fchown(fd_file(f), user, group);
 }
 
 SYSCALL_DEFINE3(fchown, unsigned int, fd, uid_t, user, gid_t, group)
diff --git a/fs/read_write.c b/fs/read_write.c
index 7e5a2e50e12b..b3743e7dd93f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1570,36 +1570,32 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
 {
 	loff_t pos_in;
 	loff_t pos_out;
-	struct fd f_in;
-	struct fd f_out;
 	ssize_t ret = -EBADF;
 
-	f_in = fdget(fd_in);
-	if (!fd_file(f_in))
-		goto out2;
+	CLASS(fd, f_in)(fd_in);
+	if (fd_empty(f_in))
+		return -EBADF;
 
-	f_out = fdget(fd_out);
-	if (!fd_file(f_out))
-		goto out1;
+	CLASS(fd, f_out)(fd_out);
+	if (fd_empty(f_out))
+		return -EBADF;
 
-	ret = -EFAULT;
 	if (off_in) {
 		if (copy_from_user(&pos_in, off_in, sizeof(loff_t)))
-			goto out;
+			return -EFAULT;
 	} else {
 		pos_in = fd_file(f_in)->f_pos;
 	}
 
 	if (off_out) {
 		if (copy_from_user(&pos_out, off_out, sizeof(loff_t)))
-			goto out;
+			return -EFAULT;
 	} else {
 		pos_out = fd_file(f_out)->f_pos;
 	}
 
-	ret = -EINVAL;
 	if (flags != 0)
-		goto out;
+		return -EINVAL;
 
 	ret = vfs_copy_file_range(fd_file(f_in), pos_in, fd_file(f_out), pos_out, len,
 				  flags);
@@ -1621,12 +1617,6 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
 			fd_file(f_out)->f_pos = pos_out;
 		}
 	}
-
-out:
-	fdput(f_out);
-out1:
-	fdput(f_in);
-out2:
 	return ret;
 }
 
diff --git a/fs/signalfd.c b/fs/signalfd.c
index 777e889ab0e8..b1844e0f58eb 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -288,20 +288,17 @@ static int do_signalfd4(int ufd, sigset_t *mask, int flags)
 
 		fd_install(ufd, file);
 	} else {
-		struct fd f = fdget(ufd);
-		if (!fd_file(f))
+		CLASS(fd, f)(ufd);
+		if (fd_empty(f))
 			return -EBADF;
 		ctx = fd_file(f)->private_data;
-		if (fd_file(f)->f_op != &signalfd_fops) {
-			fdput(f);
+		if (fd_file(f)->f_op != &signalfd_fops)
 			return -EINVAL;
-		}
 		spin_lock_irq(&current->sighand->siglock);
 		ctx->sigmask = *mask;
 		spin_unlock_irq(&current->sighand->siglock);
 
 		wake_up(&current->sighand->signalfd_wqh);
-		fdput(f);
 	}
 
 	return ufd;
diff --git a/fs/sync.c b/fs/sync.c
index 67df255eb189..2955cd4c77a3 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -148,11 +148,11 @@ void emergency_sync(void)
  */
 SYSCALL_DEFINE1(syncfs, int, fd)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	struct super_block *sb;
 	int ret, ret2;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 	sb = fd_file(f)->f_path.dentry->d_sb;
 
@@ -162,7 +162,6 @@ SYSCALL_DEFINE1(syncfs, int, fd)
 
 	ret2 = errseq_check_and_advance(&sb->s_wb_err, &fd_file(f)->f_sb_err);
 
-	fdput(f);
 	return ret ? ret : ret2;
 }
 
@@ -205,14 +204,12 @@ EXPORT_SYMBOL(vfs_fsync);
 
 static int do_fsync(unsigned int fd, int datasync)
 {
-	struct fd f = fdget(fd);
-	int ret = -EBADF;
+	CLASS(fd, f)(fd);
 
-	if (fd_file(f)) {
-		ret = vfs_fsync(fd_file(f), datasync);
-		fdput(f);
-	}
-	return ret;
+	if (fd_empty(f))
+		return -EBADF;
+
+	return vfs_fsync(fd_file(f), datasync);
 }
 
 SYSCALL_DEFINE1(fsync, unsigned int, fd)
@@ -355,16 +352,12 @@ int sync_file_range(struct file *file, loff_t offset, loff_t nbytes,
 int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
 			 unsigned int flags)
 {
-	int ret;
-	struct fd f;
+	CLASS(fd, f)(fd);
 
-	ret = -EBADF;
-	f = fdget(fd);
-	if (fd_file(f))
-		ret = sync_file_range(fd_file(f), offset, nbytes, flags);
+	if (fd_empty(f))
+		return -EBADF;
 
-	fdput(f);
-	return ret;
+	return sync_file_range(fd_file(f), offset, nbytes, flags);
 }
 
 SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
diff --git a/fs/xattr.c b/fs/xattr.c
index 05ec7e7d9e87..0fc813cb005c 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -697,9 +697,9 @@ SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
 	int error;
 
 	CLASS(fd, f)(fd);
-	if (!fd_file(f))
-		return -EBADF;
 
+	if (fd_empty(f))
+		return -EBADF;
 	audit_file(fd_file(f));
 	error = setxattr_copy(name, &ctx);
 	if (error)
@@ -809,16 +809,13 @@ SYSCALL_DEFINE4(lgetxattr, const char __user *, pathname,
 SYSCALL_DEFINE4(fgetxattr, int, fd, const char __user *, name,
 		void __user *, value, size_t, size)
 {
-	struct fd f = fdget(fd);
-	ssize_t error = -EBADF;
+	CLASS(fd, f)(fd);
 
-	if (!fd_file(f))
-		return error;
+	if (fd_empty(f))
+		return -EBADF;
 	audit_file(fd_file(f));
-	error = getxattr(file_mnt_idmap(fd_file(f)), fd_file(f)->f_path.dentry,
+	return getxattr(file_mnt_idmap(fd_file(f)), fd_file(f)->f_path.dentry,
 			 name, value, size);
-	fdput(f);
-	return error;
 }
 
 /*
@@ -885,15 +882,12 @@ SYSCALL_DEFINE3(llistxattr, const char __user *, pathname, char __user *, list,
 
 SYSCALL_DEFINE3(flistxattr, int, fd, char __user *, list, size_t, size)
 {
-	struct fd f = fdget(fd);
-	ssize_t error = -EBADF;
+	CLASS(fd, f)(fd);
 
-	if (!fd_file(f))
-		return error;
+	if (fd_empty(f))
+		return -EBADF;
 	audit_file(fd_file(f));
-	error = listxattr(fd_file(f)->f_path.dentry, list, size);
-	fdput(f);
-	return error;
+	return listxattr(fd_file(f)->f_path.dentry, list, size);
 }
 
 /*
@@ -950,12 +944,12 @@ SYSCALL_DEFINE2(lremovexattr, const char __user *, pathname,
 
 SYSCALL_DEFINE2(fremovexattr, int, fd, const char __user *, name)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	char kname[XATTR_NAME_MAX + 1];
-	int error = -EBADF;
+	int error;
 
-	if (!fd_file(f))
-		return error;
+	if (fd_empty(f))
+		return -EBADF;
 	audit_file(fd_file(f));
 
 	error = strncpy_from_user(kname, name, sizeof(kname));
@@ -970,7 +964,6 @@ SYSCALL_DEFINE2(fremovexattr, int, fd, const char __user *, name)
 				    fd_file(f)->f_path.dentry, kname);
 		mnt_drop_write_file(fd_file(f));
 	}
-	fdput(f);
 	return error;
 }
 
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index ffa7d341bd95..26b1c14d5967 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -105,29 +105,21 @@ static struct io_sq_data *io_attach_sq_data(struct io_uring_params *p)
 {
 	struct io_ring_ctx *ctx_attach;
 	struct io_sq_data *sqd;
-	struct fd f;
+	CLASS(fd, f)(p->wq_fd);
 
-	f = fdget(p->wq_fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return ERR_PTR(-ENXIO);
-	if (!io_is_uring_fops(fd_file(f))) {
-		fdput(f);
+	if (!io_is_uring_fops(fd_file(f)))
 		return ERR_PTR(-EINVAL);
-	}
 
 	ctx_attach = fd_file(f)->private_data;
 	sqd = ctx_attach->sq_data;
-	if (!sqd) {
-		fdput(f);
+	if (!sqd)
 		return ERR_PTR(-EINVAL);
-	}
-	if (sqd->task_tgid != current->tgid) {
-		fdput(f);
+	if (sqd->task_tgid != current->tgid)
 		return ERR_PTR(-EPERM);
-	}
 
 	refcount_inc(&sqd->refs);
-	fdput(f);
 	return sqd;
 }
 
@@ -415,16 +407,11 @@ __cold int io_sq_offload_create(struct io_ring_ctx *ctx,
 	/* Retain compatibility with failing for an invalid attach attempt */
 	if ((ctx->flags & (IORING_SETUP_ATTACH_WQ | IORING_SETUP_SQPOLL)) ==
 				IORING_SETUP_ATTACH_WQ) {
-		struct fd f;
-
-		f = fdget(p->wq_fd);
-		if (!fd_file(f))
+		CLASS(fd, f)(p->wq_fd);
+		if (fd_empty(f))
 			return -ENXIO;
-		if (!io_is_uring_fops(fd_file(f))) {
-			fdput(f);
+		if (!io_is_uring_fops(fd_file(f)))
 			return -EINVAL;
-		}
-		fdput(f);
 	}
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
 		struct task_struct *tsk;
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 4de1e3dc2284..8afcffefbb56 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -7673,21 +7673,16 @@ int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 struct btf *btf_get_by_fd(int fd)
 {
 	struct btf *btf;
-	struct fd f;
+	CLASS(fd, f)(fd);
 
-	f = fdget(fd);
-
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return ERR_PTR(-EBADF);
 
-	if (fd_file(f)->f_op != &btf_fops) {
-		fdput(f);
+	if (fd_file(f)->f_op != &btf_fops)
 		return ERR_PTR(-EINVAL);
-	}
 
 	btf = fd_file(f)->private_data;
 	refcount_inc(&btf->refcnt);
-	fdput(f);
 
 	return btf;
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 807042c643c8..4b39a1ea63e2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3197,20 +3197,16 @@ int bpf_link_new_fd(struct bpf_link *link)
 
 struct bpf_link *bpf_link_get_from_fd(u32 ufd)
 {
-	struct fd f = fdget(ufd);
+	CLASS(fd, f)(ufd);
 	struct bpf_link *link;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return ERR_PTR(-EBADF);
-	if (fd_file(f)->f_op != &bpf_link_fops && fd_file(f)->f_op != &bpf_link_fops_poll) {
-		fdput(f);
+	if (fd_file(f)->f_op != &bpf_link_fops && fd_file(f)->f_op != &bpf_link_fops_poll)
 		return ERR_PTR(-EINVAL);
-	}
 
 	link = fd_file(f)->private_data;
 	bpf_link_inc(link);
-	fdput(f);
-
 	return link;
 }
 EXPORT_SYMBOL(bpf_link_get_from_fd);
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index 9a1d356e79ed..9b92cb886d49 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -232,19 +232,16 @@ int bpf_token_create(union bpf_attr *attr)
 
 struct bpf_token *bpf_token_get_from_fd(u32 ufd)
 {
-	struct fd f = fdget(ufd);
+	CLASS(fd, f)(ufd);
 	struct bpf_token *token;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return ERR_PTR(-EBADF);
-	if (fd_file(f)->f_op != &bpf_token_fops) {
-		fdput(f);
+	if (fd_file(f)->f_op != &bpf_token_fops)
 		return ERR_PTR(-EINVAL);
-	}
 
 	token = fd_file(f)->private_data;
 	bpf_token_inc(token);
-	fdput(f);
 
 	return token;
 }
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dae815c30514..81c3c926e938 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -930,22 +930,20 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
 {
 	struct perf_cgroup *cgrp;
 	struct cgroup_subsys_state *css;
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	int ret = 0;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	css = css_tryget_online_from_dir(fd_file(f)->f_path.dentry,
 					 &perf_event_cgrp_subsys);
-	if (IS_ERR(css)) {
-		ret = PTR_ERR(css);
-		goto out;
-	}
+	if (IS_ERR(css))
+		return PTR_ERR(css);
 
 	ret = perf_cgroup_ensure_storage(event, css);
 	if (ret)
-		goto out;
+		return ret;
 
 	cgrp = container_of(css, struct perf_cgroup, css);
 	event->cgrp = cgrp;
@@ -959,8 +957,6 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
 		perf_detach_cgroup(event);
 		ret = -EINVAL;
 	}
-out:
-	fdput(f);
 	return ret;
 }
 
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index dc952c3b05af..c9d97ed20122 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -545,12 +545,12 @@ static void commit_nsset(struct nsset *nsset)
 
 SYSCALL_DEFINE2(setns, int, fd, int, flags)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	struct ns_common *ns = NULL;
 	struct nsset nsset = {};
 	int err = 0;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	if (proc_ns_file(fd_file(f))) {
@@ -580,7 +580,6 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags)
 	}
 	put_nsset(&nsset);
 out:
-	fdput(f);
 	return err;
 }
 
diff --git a/kernel/pid.c b/kernel/pid.c
index 2715afb77eab..b5bbc1a8a6e4 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -536,11 +536,10 @@ EXPORT_SYMBOL_GPL(find_ge_pid);
 
 struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
 {
-	struct fd f;
+	CLASS(fd, f)(fd);
 	struct pid *pid;
 
-	f = fdget(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return ERR_PTR(-EBADF);
 
 	pid = pidfd_pid(fd_file(f));
@@ -548,8 +547,6 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
 		get_pid(pid);
 		*flags = fd_file(f)->f_flags;
 	}
-
-	fdput(f);
 	return pid;
 }
 
diff --git a/kernel/sys.c b/kernel/sys.c
index a4be1e568ff5..243d58916899 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1911,12 +1911,11 @@ SYSCALL_DEFINE1(umask, int, mask)
 
 static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 {
-	struct fd exe;
+	CLASS(fd, exe)(fd);
 	struct inode *inode;
 	int err;
 
-	exe = fdget(fd);
-	if (!fd_file(exe))
+	if (fd_empty(exe))
 		return -EBADF;
 
 	inode = file_inode(fd_file(exe));
@@ -1926,18 +1925,14 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 	 * sure that this one is executable as well, to avoid breaking an
 	 * overall picture.
 	 */
-	err = -EACCES;
 	if (!S_ISREG(inode->i_mode) || path_noexec(&fd_file(exe)->f_path))
-		goto exit;
+		return -EACCES;
 
 	err = file_permission(fd_file(exe), MAY_EXEC);
 	if (err)
-		goto exit;
+		return err;
 
-	err = replace_mm_exe_file(mm, fd_file(exe));
-exit:
-	fdput(exe);
-	return err;
+	return replace_mm_exe_file(mm, fd_file(exe));
 }
 
 /*
diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
index d36242fd4936..1895fbc32bcb 100644
--- a/kernel/watch_queue.c
+++ b/kernel/watch_queue.c
@@ -663,16 +663,14 @@ struct watch_queue *get_watch_queue(int fd)
 {
 	struct pipe_inode_info *pipe;
 	struct watch_queue *wqueue = ERR_PTR(-EINVAL);
-	struct fd f;
+	CLASS(fd, f)(fd);
 
-	f = fdget(fd);
-	if (fd_file(f)) {
+	if (!fd_empty(f)) {
 		pipe = get_pipe_info(fd_file(f), false);
 		if (pipe && pipe->watch_queue) {
 			wqueue = pipe->watch_queue;
 			kref_get(&wqueue->usage);
 		}
-		fdput(f);
 	}
 
 	return wqueue;
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 532dee205c6e..588fe76c5a14 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -190,16 +190,12 @@ EXPORT_SYMBOL(vfs_fadvise);
 
 int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
 {
-	struct fd f = fdget(fd);
-	int ret;
+	CLASS(fd, f)(fd);
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
-	ret = vfs_fadvise(fd_file(f), offset, len, advice);
-
-	fdput(f);
-	return ret;
+	return vfs_fadvise(fd_file(f), offset, len, advice);
 }
 
 SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
diff --git a/mm/readahead.c b/mm/readahead.c
index e83fe1c6e5ac..fa881c55646e 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -641,29 +641,22 @@ EXPORT_SYMBOL_GPL(page_cache_async_ra);
 
 ssize_t ksys_readahead(int fd, loff_t offset, size_t count)
 {
-	ssize_t ret;
-	struct fd f;
+	CLASS(fd, f)(fd);
 
-	ret = -EBADF;
-	f = fdget(fd);
-	if (!fd_file(f) || !(fd_file(f)->f_mode & FMODE_READ))
-		goto out;
+	if (fd_empty(f) || !(fd_file(f)->f_mode & FMODE_READ))
+		return -EBADF;
 
 	/*
 	 * The readahead() syscall is intended to run only on files
 	 * that can execute readahead. If readahead is not possible
 	 * on this file, then we must return -EINVAL.
 	 */
-	ret = -EINVAL;
 	if (!fd_file(f)->f_mapping || !fd_file(f)->f_mapping->a_ops ||
 	    (!S_ISREG(file_inode(fd_file(f))->i_mode) &&
 	    !S_ISBLK(file_inode(fd_file(f))->i_mode)))
-		goto out;
+		return -EINVAL;
 
-	ret = vfs_fadvise(fd_file(f), offset, count, POSIX_FADV_WILLNEED);
-out:
-	fdput(f);
-	return ret;
+	return vfs_fadvise(fd_file(f), offset, count, POSIX_FADV_WILLNEED);
 }
 
 SYSCALL_DEFINE3(readahead, int, fd, loff_t, offset, size_t, count)
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 18b7c8f31234..f4334aa5ef1f 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -708,20 +708,18 @@ EXPORT_SYMBOL_GPL(get_net_ns);
 
 struct net *get_net_ns_by_fd(int fd)
 {
-	struct fd f = fdget(fd);
-	struct net *net = ERR_PTR(-EINVAL);
+	CLASS(fd, f)(fd);
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return ERR_PTR(-EBADF);
 
 	if (proc_ns_file(fd_file(f))) {
 		struct ns_common *ns = get_proc_ns(file_inode(fd_file(f)));
 		if (ns->ops == &netns_operations)
-			net = get_net(container_of(ns, struct net, ns));
+			return get_net(container_of(ns, struct net, ns));
 	}
-	fdput(f);
 
-	return net;
+	return ERR_PTR(-EINVAL);
 }
 EXPORT_SYMBOL_GPL(get_net_ns_by_fd);
 #endif
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 9689169ad5cc..c7edb2b590d5 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -234,31 +234,21 @@ SYSCALL_DEFINE3(landlock_create_ruleset,
 static struct landlock_ruleset *get_ruleset_from_fd(const int fd,
 						    const fmode_t mode)
 {
-	struct fd ruleset_f;
+	CLASS(fd, ruleset_f)(fd);
 	struct landlock_ruleset *ruleset;
 
-	ruleset_f = fdget(fd);
-	if (!fd_file(ruleset_f))
+	if (fd_empty(ruleset_f))
 		return ERR_PTR(-EBADF);
 
 	/* Checks FD type and access right. */
-	if (fd_file(ruleset_f)->f_op != &ruleset_fops) {
-		ruleset = ERR_PTR(-EBADFD);
-		goto out_fdput;
-	}
-	if (!(fd_file(ruleset_f)->f_mode & mode)) {
-		ruleset = ERR_PTR(-EPERM);
-		goto out_fdput;
-	}
+	if (fd_file(ruleset_f)->f_op != &ruleset_fops)
+		return ERR_PTR(-EBADFD);
+	if (!(fd_file(ruleset_f)->f_mode & mode))
+		return ERR_PTR(-EPERM);
 	ruleset = fd_file(ruleset_f)->private_data;
-	if (WARN_ON_ONCE(ruleset->num_layers != 1)) {
-		ruleset = ERR_PTR(-EINVAL);
-		goto out_fdput;
-	}
+	if (WARN_ON_ONCE(ruleset->num_layers != 1))
+		return ERR_PTR(-EINVAL);
 	landlock_get_ruleset(ruleset);
-
-out_fdput:
-	fdput(ruleset_f);
 	return ruleset;
 }
 
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 388ae471d258..53262b8a7656 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -190,11 +190,10 @@ static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
 {
 	struct kvm_vfio *kv = dev->private;
 	struct kvm_vfio_file *kvf;
-	struct fd f;
+	CLASS(fd, f)(fd);
 	int ret;
 
-	f = fdget(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	ret = -ENOENT;
@@ -220,9 +219,6 @@ static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
 	kvm_vfio_update_coherency(dev);
 
 	mutex_unlock(&kv->lock);
-
-	fdput(f);
-
 	return ret;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 23/39] fdget(), trivial conversions
  2024-07-30  5:16   ` [PATCH 23/39] fdget(), trivial conversions viro
@ 2024-08-07 10:37     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:37 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:09AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> fdget() is the first thing done in scope, all matching fdput() are
> immediately followed by leaving the scope.
> 
> [conflict in fs/fhandle.c and a trival one in kernel/bpf/syscall.c]
> [conflict in fs/xattr.c]
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 24/39] fdget(), more trivial conversions
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (21 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 23/39] fdget(), trivial conversions viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:39     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 25/39] convert do_preadv()/do_pwritev() viro
                     ` (15 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

all failure exits prior to fdget() leave the scope, all matching fdput()
are immediately followed by leaving the scope.

[trivial conflicts in fs/fsopen.c and kernel/bpf/syscall.c]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/infiniband/core/ucma.c     | 19 +++-----
 drivers/vfio/group.c               |  6 +--
 fs/eventpoll.c                     | 15 ++----
 fs/ext4/ioctl.c                    | 21 +++------
 fs/f2fs/file.c                     | 15 ++----
 fs/fsopen.c                        | 19 +++-----
 fs/fuse/dev.c                      |  6 +--
 fs/locks.c                         | 15 ++----
 fs/namespace.c                     | 47 ++++++------------
 fs/notify/fanotify/fanotify_user.c | 29 +++++-------
 fs/notify/inotify/inotify_user.c   | 21 +++------
 fs/ocfs2/cluster/heartbeat.c       | 13 ++---
 fs/open.c                          | 12 ++---
 fs/read_write.c                    | 71 ++++++++++------------------
 fs/splice.c                        | 45 +++++++-----------
 fs/utimes.c                        | 11 ++---
 fs/xfs/xfs_exchrange.c             | 10 ++--
 fs/xfs/xfs_ioctl.c                 | 69 +++++++++------------------
 ipc/mqueue.c                       | 76 +++++++++---------------------
 kernel/bpf/syscall.c               | 22 +++------
 kernel/module/main.c               | 11 ++---
 kernel/pid.c                       | 13 ++---
 kernel/signal.c                    | 29 ++++--------
 kernel/taskstats.c                 | 18 +++----
 security/integrity/ima/ima_main.c  |  7 +--
 security/loadpin/loadpin.c         |  8 +---
 virt/kvm/vfio.c                    |  6 +--
 27 files changed, 207 insertions(+), 427 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index dc57d07a1f45..cc95718fd24b 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1615,7 +1615,6 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
 	struct ucma_event *uevent, *tmp;
 	struct ucma_context *ctx;
 	LIST_HEAD(event_list);
-	struct fd f;
 	struct ucma_file *cur_file;
 	int ret = 0;
 
@@ -1623,21 +1622,17 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
 		return -EFAULT;
 
 	/* Get current fd to protect against it being closed */
-	f = fdget(cmd.fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(cmd.fd);
+	if (fd_empty(f))
 		return -ENOENT;
-	if (fd_file(f)->f_op != &ucma_fops) {
-		ret = -EINVAL;
-		goto file_put;
-	}
+	if (fd_file(f)->f_op != &ucma_fops)
+		return -EINVAL;
 	cur_file = fd_file(f)->private_data;
 
 	/* Validate current fd and prevent destruction of id. */
 	ctx = ucma_get_ctx(cur_file, cmd.id);
-	if (IS_ERR(ctx)) {
-		ret = PTR_ERR(ctx);
-		goto file_put;
-	}
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
 
 	rdma_lock_handler(ctx->cm_id);
 	/*
@@ -1678,8 +1673,6 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
 err_unlock:
 	rdma_unlock_handler(ctx->cm_id);
 	ucma_put_ctx(ctx);
-file_put:
-	fdput(f);
 	return ret;
 }
 
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 95b336de8a17..49559605177e 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -104,15 +104,14 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 {
 	struct vfio_container *container;
 	struct iommufd_ctx *iommufd;
-	struct fd f;
 	int ret;
 	int fd;
 
 	if (get_user(fd, arg))
 		return -EFAULT;
 
-	f = fdget(fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	mutex_lock(&group->group_lock);
@@ -153,7 +152,6 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 
 out_unlock:
 	mutex_unlock(&group->group_lock);
-	fdput(f);
 	return ret;
 }
 
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4609f7342005..1e63f3b03ca5 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2420,8 +2420,6 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 static int do_epoll_wait(int epfd, struct epoll_event __user *events,
 			 int maxevents, struct timespec64 *to)
 {
-	int error;
-	struct fd f;
 	struct eventpoll *ep;
 
 	/* The maximum number of event must be greater than zero */
@@ -2433,17 +2431,16 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
 		return -EFAULT;
 
 	/* Get the "struct file *" for the eventpoll file */
-	f = fdget(epfd);
-	if (!fd_file(f))
+	CLASS(fd, f)(epfd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	/*
 	 * We have to check that the file structure underneath the fd
 	 * the user passed to us _is_ an eventpoll file.
 	 */
-	error = -EINVAL;
 	if (!is_file_epoll(fd_file(f)))
-		goto error_fput;
+		return -EINVAL;
 
 	/*
 	 * At this point it is safe to assume that the "private_data" contains
@@ -2452,11 +2449,7 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
 	ep = fd_file(f)->private_data;
 
 	/* Time to fish for events ... */
-	error = ep_poll(ep, events, maxevents, to);
-
-error_fput:
-	fdput(f);
-	return error;
+	return ep_poll(ep, events, maxevents, to);
 }
 
 SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 1c77400bd88e..7b9ce71c1c81 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -1330,7 +1330,6 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 
 	case EXT4_IOC_MOVE_EXT: {
 		struct move_extent me;
-		struct fd donor;
 		int err;
 
 		if (!(filp->f_mode & FMODE_READ) ||
@@ -1342,30 +1341,26 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 			return -EFAULT;
 		me.moved_len = 0;
 
-		donor = fdget(me.donor_fd);
-		if (!fd_file(donor))
+		CLASS(fd, donor)(me.donor_fd);
+		if (fd_empty(donor))
 			return -EBADF;
 
-		if (!(fd_file(donor)->f_mode & FMODE_WRITE)) {
-			err = -EBADF;
-			goto mext_out;
-		}
+		if (!(fd_file(donor)->f_mode & FMODE_WRITE))
+			return -EBADF;
 
 		if (ext4_has_feature_bigalloc(sb)) {
 			ext4_msg(sb, KERN_ERR,
 				 "Online defrag not supported with bigalloc");
-			err = -EOPNOTSUPP;
-			goto mext_out;
+			return -EOPNOTSUPP;
 		} else if (IS_DAX(inode)) {
 			ext4_msg(sb, KERN_ERR,
 				 "Online defrag not supported with DAX");
-			err = -EOPNOTSUPP;
-			goto mext_out;
+			return -EOPNOTSUPP;
 		}
 
 		err = mnt_want_write_file(filp);
 		if (err)
-			goto mext_out;
+			return err;
 
 		err = ext4_move_extents(filp, fd_file(donor), me.orig_start,
 					me.donor_start, me.len, &me.moved_len);
@@ -1374,8 +1369,6 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		if (copy_to_user((struct move_extent __user *)arg,
 				 &me, sizeof(me)))
 			err = -EFAULT;
-mext_out:
-		fdput(donor);
 		return err;
 	}
 
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 903337f8d21a..4682d83ab6b4 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3006,32 +3006,27 @@ static int f2fs_move_file_range(struct file *file_in, loff_t pos_in,
 static int __f2fs_ioc_move_range(struct file *filp,
 				struct f2fs_move_range *range)
 {
-	struct fd dst;
 	int err;
 
 	if (!(filp->f_mode & FMODE_READ) ||
 			!(filp->f_mode & FMODE_WRITE))
 		return -EBADF;
 
-	dst = fdget(range->dst_fd);
-	if (!fd_file(dst))
+	CLASS(fd, dst)(range->dst_fd);
+	if (fd_empty(dst))
 		return -EBADF;
 
-	if (!(fd_file(dst)->f_mode & FMODE_WRITE)) {
-		err = -EBADF;
-		goto err_out;
-	}
+	if (!(fd_file(dst)->f_mode & FMODE_WRITE))
+		return -EBADF;
 
 	err = mnt_want_write_file(filp);
 	if (err)
-		goto err_out;
+		return err;
 
 	err = f2fs_move_file_range(filp, range->pos_in, fd_file(dst),
 					range->pos_out, range->len);
 
 	mnt_drop_write_file(filp);
-err_out:
-	fdput(dst);
 	return err;
 }
 
diff --git a/fs/fsopen.c b/fs/fsopen.c
index ee92ca58429e..ff807662a41c 100644
--- a/fs/fsopen.c
+++ b/fs/fsopen.c
@@ -350,7 +350,6 @@ SYSCALL_DEFINE5(fsconfig,
 		int, aux)
 {
 	struct fs_context *fc;
-	struct fd f;
 	int ret;
 	int lookup_flags = 0;
 
@@ -393,12 +392,11 @@ SYSCALL_DEFINE5(fsconfig,
 		return -EOPNOTSUPP;
 	}
 
-	f = fdget(fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return -EBADF;
-	ret = -EINVAL;
 	if (fd_file(f)->f_op != &fscontext_fops)
-		goto out_f;
+		return -EINVAL;
 
 	fc = fd_file(f)->private_data;
 	if (fc->ops == &legacy_fs_context_ops) {
@@ -408,17 +406,14 @@ SYSCALL_DEFINE5(fsconfig,
 		case FSCONFIG_SET_PATH_EMPTY:
 		case FSCONFIG_SET_FD:
 		case FSCONFIG_CMD_CREATE_EXCL:
-			ret = -EOPNOTSUPP;
-			goto out_f;
+			return -EOPNOTSUPP;
 		}
 	}
 
 	if (_key) {
 		param.key = strndup_user(_key, 256);
-		if (IS_ERR(param.key)) {
-			ret = PTR_ERR(param.key);
-			goto out_f;
-		}
+		if (IS_ERR(param.key))
+			return PTR_ERR(param.key);
 	}
 
 	switch (cmd) {
@@ -497,7 +492,5 @@ SYSCALL_DEFINE5(fsconfig,
 	}
 out_key:
 	kfree(param.key);
-out_f:
-	fdput(f);
 	return ret;
 }
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 991b9ae8e7c9..27826116a4fb 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2315,13 +2315,12 @@ static long fuse_dev_ioctl_clone(struct file *file, __u32 __user *argp)
 	int res;
 	int oldfd;
 	struct fuse_dev *fud = NULL;
-	struct fd f;
 
 	if (get_user(oldfd, argp))
 		return -EFAULT;
 
-	f = fdget(oldfd);
-	if (!fd_file(f))
+	CLASS(fd, f)(oldfd);
+	if (fd_empty(f))
 		return -EINVAL;
 
 	/*
@@ -2338,7 +2337,6 @@ static long fuse_dev_ioctl_clone(struct file *file, __u32 __user *argp)
 		mutex_unlock(&fuse_mutex);
 	}
 
-	fdput(f);
 	return res;
 }
 
diff --git a/fs/locks.c b/fs/locks.c
index 239759a356e9..ea56dc70d445 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2132,7 +2132,6 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
 {
 	int can_sleep, error, type;
 	struct file_lock fl;
-	struct fd f;
 
 	/*
 	 * LOCK_MAND locks were broken for a long time in that they never
@@ -2151,19 +2150,18 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
 	if (type < 0)
 		return type;
 
-	error = -EBADF;
-	f = fdget(fd);
-	if (!fd_file(f))
-		return error;
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
+		return -EBADF;
 
 	if (type != F_UNLCK && !(fd_file(f)->f_mode & (FMODE_READ | FMODE_WRITE)))
-		goto out_putf;
+		return -EBADF;
 
 	flock_make_lock(fd_file(f), &fl, type);
 
 	error = security_file_lock(fd_file(f), fl.c.flc_type);
 	if (error)
-		goto out_putf;
+		return error;
 
 	can_sleep = !(cmd & LOCK_NB);
 	if (can_sleep)
@@ -2177,9 +2175,6 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
 		error = locks_lock_file_wait(fd_file(f), &fl);
 
 	locks_release_private(&fl);
- out_putf:
-	fdput(f);
-
 	return error;
 }
 
diff --git a/fs/namespace.c b/fs/namespace.c
index c46d48bb38cd..876ab546feba 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4070,7 +4070,6 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 	struct file *file;
 	struct path newmount;
 	struct mount *mnt;
-	struct fd f;
 	unsigned int mnt_flags = 0;
 	long ret;
 
@@ -4098,19 +4097,18 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 		return -EINVAL;
 	}
 
-	f = fdget(fs_fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fs_fd);
+	if (fd_empty(f))
 		return -EBADF;
 
-	ret = -EINVAL;
 	if (fd_file(f)->f_op != &fscontext_fops)
-		goto err_fsfd;
+		return -EINVAL;
 
 	fc = fd_file(f)->private_data;
 
 	ret = mutex_lock_interruptible(&fc->uapi_mutex);
 	if (ret < 0)
-		goto err_fsfd;
+		return ret;
 
 	/* There must be a valid superblock or we can't mount it */
 	ret = -EINVAL;
@@ -4177,8 +4175,6 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 	path_put(&newmount);
 err_unlock:
 	mutex_unlock(&fc->uapi_mutex);
-err_fsfd:
-	fdput(f);
 	return ret;
 }
 
@@ -4629,10 +4625,8 @@ static int do_mount_setattr(struct path *path, struct mount_kattr *kattr)
 static int build_mount_idmapped(const struct mount_attr *attr, size_t usize,
 				struct mount_kattr *kattr, unsigned int flags)
 {
-	int err = 0;
 	struct ns_common *ns;
 	struct user_namespace *mnt_userns;
-	struct fd f;
 
 	if (!((attr->attr_set | attr->attr_clr) & MOUNT_ATTR_IDMAP))
 		return 0;
@@ -4648,20 +4642,16 @@ static int build_mount_idmapped(const struct mount_attr *attr, size_t usize,
 	if (attr->userns_fd > INT_MAX)
 		return -EINVAL;
 
-	f = fdget(attr->userns_fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(attr->userns_fd);
+	if (fd_empty(f))
 		return -EBADF;
 
-	if (!proc_ns_file(fd_file(f))) {
-		err = -EINVAL;
-		goto out_fput;
-	}
+	if (!proc_ns_file(fd_file(f)))
+		return -EINVAL;
 
 	ns = get_proc_ns(file_inode(fd_file(f)));
-	if (ns->ops->type != CLONE_NEWUSER) {
-		err = -EINVAL;
-		goto out_fput;
-	}
+	if (ns->ops->type != CLONE_NEWUSER)
+		return -EINVAL;
 
 	/*
 	 * The initial idmapping cannot be used to create an idmapped
@@ -4672,22 +4662,15 @@ static int build_mount_idmapped(const struct mount_attr *attr, size_t usize,
 	 * result.
 	 */
 	mnt_userns = container_of(ns, struct user_namespace, ns);
-	if (mnt_userns == &init_user_ns) {
-		err = -EPERM;
-		goto out_fput;
-	}
+	if (mnt_userns == &init_user_ns)
+		return -EPERM;
 
 	/* We're not controlling the target namespace. */
-	if (!ns_capable(mnt_userns, CAP_SYS_ADMIN)) {
-		err = -EPERM;
-		goto out_fput;
-	}
+	if (!ns_capable(mnt_userns, CAP_SYS_ADMIN))
+		return -EPERM;
 
 	kattr->mnt_userns = get_user_ns(mnt_userns);
-
-out_fput:
-	fdput(f);
-	return err;
+	return 0;
 }
 
 static int build_mount_kattr(const struct mount_attr *attr, size_t usize,
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 64b1adc75fe5..7b5abc1b8c8f 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1677,7 +1677,6 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	struct inode *inode = NULL;
 	struct vfsmount *mnt = NULL;
 	struct fsnotify_group *group;
-	struct fd f;
 	struct path path;
 	struct fan_fsid __fsid, *fsid = NULL;
 	u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS;
@@ -1747,14 +1746,13 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 		umask = FANOTIFY_EVENT_FLAGS;
 	}
 
-	f = fdget(fanotify_fd);
-	if (unlikely(!fd_file(f)))
+	CLASS(fd, f)(fanotify_fd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	/* verify that this is indeed an fanotify instance */
-	ret = -EINVAL;
 	if (unlikely(fd_file(f)->f_op != &fanotify_fops))
-		goto fput_and_out;
+		return -EINVAL;
 	group = fd_file(f)->private_data;
 
 	/*
@@ -1762,23 +1760,21 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	 * marks.  This also includes setting up such marks by a group that
 	 * was initialized by an unprivileged user.
 	 */
-	ret = -EPERM;
 	if ((!capable(CAP_SYS_ADMIN) ||
 	     FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV)) &&
 	    mark_type != FAN_MARK_INODE)
-		goto fput_and_out;
+		return -EPERM;
 
 	/*
 	 * Permission events require minimum priority FAN_CLASS_CONTENT.
 	 */
-	ret = -EINVAL;
 	if (mask & FANOTIFY_PERM_EVENTS &&
 	    group->priority < FSNOTIFY_PRIO_CONTENT)
-		goto fput_and_out;
+		return -EINVAL;
 
 	if (mask & FAN_FS_ERROR &&
 	    mark_type != FAN_MARK_FILESYSTEM)
-		goto fput_and_out;
+		return -EINVAL;
 
 	/*
 	 * Evictable is only relevant for inode marks, because only inode object
@@ -1786,7 +1782,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	 */
 	if (flags & FAN_MARK_EVICTABLE &&
 	     mark_type != FAN_MARK_INODE)
-		goto fput_and_out;
+		return -EINVAL;
 
 	/*
 	 * Events that do not carry enough information to report
@@ -1798,7 +1794,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
 	if (mask & ~(FANOTIFY_FD_EVENTS|FANOTIFY_EVENT_FLAGS) &&
 	    (!fid_mode || mark_type == FAN_MARK_MOUNT))
-		goto fput_and_out;
+		return -EINVAL;
 
 	/*
 	 * FAN_RENAME uses special info type records to report the old and
@@ -1806,23 +1802,22 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	 * useful and was not implemented.
 	 */
 	if (mask & FAN_RENAME && !(fid_mode & FAN_REPORT_NAME))
-		goto fput_and_out;
+		return -EINVAL;
 
 	if (mark_cmd == FAN_MARK_FLUSH) {
-		ret = 0;
 		if (mark_type == FAN_MARK_MOUNT)
 			fsnotify_clear_vfsmount_marks_by_group(group);
 		else if (mark_type == FAN_MARK_FILESYSTEM)
 			fsnotify_clear_sb_marks_by_group(group);
 		else
 			fsnotify_clear_inode_marks_by_group(group);
-		goto fput_and_out;
+		return 0;
 	}
 
 	ret = fanotify_find_path(dfd, pathname, &path, flags,
 			(mask & ALL_FSNOTIFY_EVENTS), obj_type);
 	if (ret)
-		goto fput_and_out;
+		return ret;
 
 	if (mark_cmd == FAN_MARK_ADD) {
 		ret = fanotify_events_supported(group, &path, mask, flags);
@@ -1901,8 +1896,6 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 
 path_put_and_out:
 	path_put(&path);
-fput_and_out:
-	fdput(f);
 	return ret;
 }
 
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 711830b73275..f46ec6afee3c 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -732,7 +732,6 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 	struct fsnotify_group *group;
 	struct inode *inode;
 	struct path path;
-	struct fd f;
 	int ret;
 	unsigned flags = 0;
 
@@ -752,21 +751,17 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 	if (unlikely(!(mask & ALL_INOTIFY_BITS)))
 		return -EINVAL;
 
-	f = fdget(fd);
-	if (unlikely(!fd_file(f)))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	/* IN_MASK_ADD and IN_MASK_CREATE don't make sense together */
-	if (unlikely((mask & IN_MASK_ADD) && (mask & IN_MASK_CREATE))) {
-		ret = -EINVAL;
-		goto fput_and_out;
-	}
+	if (unlikely((mask & IN_MASK_ADD) && (mask & IN_MASK_CREATE)))
+		return -EINVAL;
 
 	/* verify that this is indeed an inotify instance */
-	if (unlikely(fd_file(f)->f_op != &inotify_fops)) {
-		ret = -EINVAL;
-		goto fput_and_out;
-	}
+	if (unlikely(fd_file(f)->f_op != &inotify_fops))
+		return -EINVAL;
 
 	if (!(mask & IN_DONT_FOLLOW))
 		flags |= LOOKUP_FOLLOW;
@@ -776,7 +771,7 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 	ret = inotify_find_inode(pathname, &path, flags,
 			(mask & IN_ALL_EVENTS));
 	if (ret)
-		goto fput_and_out;
+		return ret;
 
 	/* inode held in place by reference to path; group by fget on fd */
 	inode = path.dentry->d_inode;
@@ -785,8 +780,6 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 	/* create/update an inode mark */
 	ret = inotify_update_watch(group, inode, mask);
 	path_put(&path);
-fput_and_out:
-	fdput(f);
 	return ret;
 }
 
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index bc55340a60c3..4200a0341343 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -1765,7 +1765,6 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
 	long fd;
 	int sectsize;
 	char *p = (char *)page;
-	struct fd f;
 	ssize_t ret = -EINVAL;
 	int live_threshold;
 
@@ -1784,23 +1783,23 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
 	if (fd < 0 || fd >= INT_MAX)
 		return -EINVAL;
 
-	f = fdget(fd);
-	if (fd_file(f) == NULL)
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return -EINVAL;
 
 	if (reg->hr_blocks == 0 || reg->hr_start_block == 0 ||
 	    reg->hr_block_bytes == 0)
-		goto out2;
+		return -EINVAL;
 
 	if (!S_ISBLK(fd_file(f)->f_mapping->host->i_mode))
-		goto out2;
+		return -EINVAL;
 
 	reg->hr_bdev_file = bdev_file_open_by_dev(fd_file(f)->f_mapping->host->i_rdev,
 			BLK_OPEN_WRITE | BLK_OPEN_READ, NULL, NULL);
 	if (IS_ERR(reg->hr_bdev_file)) {
 		ret = PTR_ERR(reg->hr_bdev_file);
 		reg->hr_bdev_file = NULL;
-		goto out2;
+		return ret;
 	}
 
 	sectsize = bdev_logical_block_size(reg_bdev(reg));
@@ -1906,8 +1905,6 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
 		fput(reg->hr_bdev_file);
 		reg->hr_bdev_file = NULL;
 	}
-out2:
-	fdput(f);
 	return ret;
 }
 
diff --git a/fs/open.c b/fs/open.c
index 6d7dd58ce095..45884ff7dead 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -187,19 +187,13 @@ long do_ftruncate(struct file *file, loff_t length, int small)
 
 long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
 {
-	struct fd f;
-	int error;
-
 	if (length < 0)
 		return -EINVAL;
-	f = fdget(fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return -EBADF;
 
-	error = do_ftruncate(fd_file(f), length, small);
-
-	fdput(f);
-	return error;
+	return do_ftruncate(fd_file(f), length, small);
 }
 
 SYSCALL_DEFINE2(ftruncate, unsigned int, fd, off_t, length)
diff --git a/fs/read_write.c b/fs/read_write.c
index b3743e7dd93f..789ea6ecf147 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -652,21 +652,17 @@ SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
 ssize_t ksys_pread64(unsigned int fd, char __user *buf, size_t count,
 		     loff_t pos)
 {
-	struct fd f;
-	ssize_t ret = -EBADF;
-
 	if (pos < 0)
 		return -EINVAL;
 
-	f = fdget(fd);
-	if (fd_file(f)) {
-		ret = -ESPIPE;
-		if (fd_file(f)->f_mode & FMODE_PREAD)
-			ret = vfs_read(fd_file(f), buf, count, &pos);
-		fdput(f);
-	}
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
+		return -EBADF;
 
-	return ret;
+	if (fd_file(f)->f_mode & FMODE_PREAD)
+		return vfs_read(fd_file(f), buf, count, &pos);
+
+	return -ESPIPE;
 }
 
 SYSCALL_DEFINE4(pread64, unsigned int, fd, char __user *, buf,
@@ -686,21 +682,17 @@ COMPAT_SYSCALL_DEFINE5(pread64, unsigned int, fd, char __user *, buf,
 ssize_t ksys_pwrite64(unsigned int fd, const char __user *buf,
 		      size_t count, loff_t pos)
 {
-	struct fd f;
-	ssize_t ret = -EBADF;
-
 	if (pos < 0)
 		return -EINVAL;
 
-	f = fdget(fd);
-	if (fd_file(f)) {
-		ret = -ESPIPE;
-		if (fd_file(f)->f_mode & FMODE_PWRITE)
-			ret = vfs_write(fd_file(f), buf, count, &pos);
-		fdput(f);
-	}
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
+		return -EBADF;
 
-	return ret;
+	if (fd_file(f)->f_mode & FMODE_PWRITE)
+		return vfs_write(fd_file(f), buf, count, &pos);
+
+	return -ESPIPE;
 }
 
 SYSCALL_DEFINE4(pwrite64, unsigned int, fd, const char __user *, buf,
@@ -1214,7 +1206,6 @@ COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 			   size_t count, loff_t max)
 {
-	struct fd in, out;
 	struct inode *in_inode, *out_inode;
 	struct pipe_inode_info *opipe;
 	loff_t pos;
@@ -1225,35 +1216,32 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	/*
 	 * Get input file, and verify that it is ok..
 	 */
-	retval = -EBADF;
-	in = fdget(in_fd);
-	if (!fd_file(in))
-		goto out;
+	CLASS(fd, in)(in_fd);
+	if (fd_empty(in))
+		return -EBADF;
 	if (!(fd_file(in)->f_mode & FMODE_READ))
-		goto fput_in;
-	retval = -ESPIPE;
+		return -EBADF;
 	if (!ppos) {
 		pos = fd_file(in)->f_pos;
 	} else {
 		pos = *ppos;
 		if (!(fd_file(in)->f_mode & FMODE_PREAD))
-			goto fput_in;
+			return -ESPIPE;
 	}
 	retval = rw_verify_area(READ, fd_file(in), &pos, count);
 	if (retval < 0)
-		goto fput_in;
+		return retval;
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
 
 	/*
 	 * Get output file, and verify that it is ok..
 	 */
-	retval = -EBADF;
-	out = fdget(out_fd);
-	if (!fd_file(out))
-		goto fput_in;
+	CLASS(fd, out)(out_fd);
+	if (fd_empty(out))
+		return -EBADF;
 	if (!(fd_file(out)->f_mode & FMODE_WRITE))
-		goto fput_out;
+		return -EBADF;
 	in_inode = file_inode(fd_file(in));
 	out_inode = file_inode(fd_file(out));
 	out_pos = fd_file(out)->f_pos;
@@ -1262,9 +1250,8 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 		max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes);
 
 	if (unlikely(pos + count > max)) {
-		retval = -EOVERFLOW;
 		if (pos >= max)
-			goto fput_out;
+			return -EOVERFLOW;
 		count = max - pos;
 	}
 
@@ -1283,7 +1270,7 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	if (!opipe) {
 		retval = rw_verify_area(WRITE, fd_file(out), &out_pos, count);
 		if (retval < 0)
-			goto fput_out;
+			return retval;
 		retval = do_splice_direct(fd_file(in), &pos, fd_file(out), &out_pos,
 					  count, fl);
 	} else {
@@ -1309,12 +1296,6 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	inc_syscw(current);
 	if (pos > max)
 		retval = -EOVERFLOW;
-
-fput_out:
-	fdput(out);
-fput_in:
-	fdput(in);
-out:
 	return retval;
 }
 
diff --git a/fs/splice.c b/fs/splice.c
index 29cd39d7f4a0..2898fa1e9e63 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1622,27 +1622,22 @@ SYSCALL_DEFINE6(splice, int, fd_in, loff_t __user *, off_in,
 		int, fd_out, loff_t __user *, off_out,
 		size_t, len, unsigned int, flags)
 {
-	struct fd in, out;
-	ssize_t error;
-
 	if (unlikely(!len))
 		return 0;
 
 	if (unlikely(flags & ~SPLICE_F_ALL))
 		return -EINVAL;
 
-	error = -EBADF;
-	in = fdget(fd_in);
-	if (fd_file(in)) {
-		out = fdget(fd_out);
-		if (fd_file(out)) {
-			error = __do_splice(fd_file(in), off_in, fd_file(out), off_out,
+	CLASS(fd, in)(fd_in);
+	if (fd_empty(in))
+		return -EBADF;
+
+	CLASS(fd, out)(fd_out);
+	if (fd_empty(out))
+		return -EBADF;
+
+	return __do_splice(fd_file(in), off_in, fd_file(out), off_out,
 					    len, flags);
-			fdput(out);
-		}
-		fdput(in);
-	}
-	return error;
 }
 
 /*
@@ -1992,25 +1987,19 @@ ssize_t do_tee(struct file *in, struct file *out, size_t len,
 
 SYSCALL_DEFINE4(tee, int, fdin, int, fdout, size_t, len, unsigned int, flags)
 {
-	struct fd in, out;
-	ssize_t error;
-
 	if (unlikely(flags & ~SPLICE_F_ALL))
 		return -EINVAL;
 
 	if (unlikely(!len))
 		return 0;
 
-	error = -EBADF;
-	in = fdget(fdin);
-	if (fd_file(in)) {
-		out = fdget(fdout);
-		if (fd_file(out)) {
-			error = do_tee(fd_file(in), fd_file(out), len, flags);
-			fdput(out);
-		}
- 		fdput(in);
- 	}
+	CLASS(fd, in)(fdin);
+	if (fd_empty(in))
+		return -EBADF;
 
-	return error;
+	CLASS(fd, out)(fdout);
+	if (fd_empty(out))
+		return -EBADF;
+
+	return do_tee(fd_file(in), fd_file(out), len, flags);
 }
diff --git a/fs/utimes.c b/fs/utimes.c
index 99b26f792b89..c7c7958e57b2 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -108,18 +108,13 @@ static int do_utimes_path(int dfd, const char __user *filename,
 
 static int do_utimes_fd(int fd, struct timespec64 *times, int flags)
 {
-	struct fd f;
-	int error;
-
 	if (flags)
 		return -EINVAL;
 
-	f = fdget(fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return -EBADF;
-	error = vfs_utimes(&fd_file(f)->f_path, times);
-	fdput(f);
-	return error;
+	return vfs_utimes(&fd_file(f)->f_path, times);
 }
 
 /*
diff --git a/fs/xfs/xfs_exchrange.c b/fs/xfs/xfs_exchrange.c
index 9790e0f45d14..35b9b58a4f6f 100644
--- a/fs/xfs/xfs_exchrange.c
+++ b/fs/xfs/xfs_exchrange.c
@@ -778,8 +778,6 @@ xfs_ioc_exchange_range(
 		.file2			= file,
 	};
 	struct xfs_exchange_range	args;
-	struct fd			file1;
-	int				error;
 
 	if (copy_from_user(&args, argp, sizeof(args)))
 		return -EFAULT;
@@ -793,12 +791,10 @@ xfs_ioc_exchange_range(
 	fxr.length		= args.length;
 	fxr.flags		= args.flags;
 
-	file1 = fdget(args.file1_fd);
-	if (!fd_file(file1))
+	CLASS(fd, file1)(args.file1_fd);
+	if (fd_empty(file1))
 		return -EBADF;
 	fxr.file1 = fd_file(file1);
 
-	error = xfs_exchange_range(&fxr);
-	fdput(file1);
-	return error;
+	return xfs_exchange_range(&fxr);
 }
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 95641817370b..1bb89f85af1b 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1000,41 +1000,29 @@ xfs_ioc_swapext(
 	xfs_swapext_t	*sxp)
 {
 	xfs_inode_t     *ip, *tip;
-	struct fd	f, tmp;
-	int		error = 0;
 
 	/* Pull information for the target fd */
-	f = fdget((int)sxp->sx_fdtarget);
-	if (!fd_file(f)) {
-		error = -EINVAL;
-		goto out;
-	}
+	CLASS(fd, f)((int)sxp->sx_fdtarget);
+	if (fd_empty(f))
+		return -EINVAL;
 
 	if (!(fd_file(f)->f_mode & FMODE_WRITE) ||
 	    !(fd_file(f)->f_mode & FMODE_READ) ||
-	    (fd_file(f)->f_flags & O_APPEND)) {
-		error = -EBADF;
-		goto out_put_file;
-	}
+	    (fd_file(f)->f_flags & O_APPEND))
+		return -EBADF;
 
-	tmp = fdget((int)sxp->sx_fdtmp);
-	if (!fd_file(tmp)) {
-		error = -EINVAL;
-		goto out_put_file;
-	}
+	CLASS(fd, tmp)((int)sxp->sx_fdtmp);
+	if (fd_empty(tmp))
+		return -EINVAL;
 
 	if (!(fd_file(tmp)->f_mode & FMODE_WRITE) ||
 	    !(fd_file(tmp)->f_mode & FMODE_READ) ||
-	    (fd_file(tmp)->f_flags & O_APPEND)) {
-		error = -EBADF;
-		goto out_put_tmp_file;
-	}
+	    (fd_file(tmp)->f_flags & O_APPEND))
+		return -EBADF;
 
 	if (IS_SWAPFILE(file_inode(fd_file(f))) ||
-	    IS_SWAPFILE(file_inode(fd_file(tmp)))) {
-		error = -EINVAL;
-		goto out_put_tmp_file;
-	}
+	    IS_SWAPFILE(file_inode(fd_file(tmp))))
+		return -EINVAL;
 
 	/*
 	 * We need to ensure that the fds passed in point to XFS inodes
@@ -1042,37 +1030,22 @@ xfs_ioc_swapext(
 	 * control over what the user passes us here.
 	 */
 	if (fd_file(f)->f_op != &xfs_file_operations ||
-	    fd_file(tmp)->f_op != &xfs_file_operations) {
-		error = -EINVAL;
-		goto out_put_tmp_file;
-	}
+	    fd_file(tmp)->f_op != &xfs_file_operations)
+		return -EINVAL;
 
 	ip = XFS_I(file_inode(fd_file(f)));
 	tip = XFS_I(file_inode(fd_file(tmp)));
 
-	if (ip->i_mount != tip->i_mount) {
-		error = -EINVAL;
-		goto out_put_tmp_file;
-	}
-
-	if (ip->i_ino == tip->i_ino) {
-		error = -EINVAL;
-		goto out_put_tmp_file;
-	}
+	if (ip->i_mount != tip->i_mount)
+		return -EINVAL;
 
-	if (xfs_is_shutdown(ip->i_mount)) {
-		error = -EIO;
-		goto out_put_tmp_file;
-	}
+	if (ip->i_ino == tip->i_ino)
+		return -EINVAL;
 
-	error = xfs_swap_extents(ip, tip, sxp);
+	if (xfs_is_shutdown(ip->i_mount))
+		return -EIO;
 
- out_put_tmp_file:
-	fdput(tmp);
- out_put_file:
-	fdput(f);
- out:
-	return error;
+	return xfs_swap_extents(ip, tip, sxp);
 }
 
 static int
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 4f1dec518fae..35b4f8659904 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1063,7 +1063,6 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
 		size_t msg_len, unsigned int msg_prio,
 		struct timespec64 *ts)
 {
-	struct fd f;
 	struct inode *inode;
 	struct ext_wait_queue wait;
 	struct ext_wait_queue *receiver;
@@ -1084,37 +1083,27 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
 
 	audit_mq_sendrecv(mqdes, msg_len, msg_prio, ts);
 
-	f = fdget(mqdes);
-	if (unlikely(!fd_file(f))) {
-		ret = -EBADF;
-		goto out;
-	}
+	CLASS(fd, f)(mqdes);
+	if (fd_empty(f))
+		return -EBADF;
 
 	inode = file_inode(fd_file(f));
-	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
-		ret = -EBADF;
-		goto out_fput;
-	}
+	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations))
+		return -EBADF;
 	info = MQUEUE_I(inode);
 	audit_file(fd_file(f));
 
-	if (unlikely(!(fd_file(f)->f_mode & FMODE_WRITE))) {
-		ret = -EBADF;
-		goto out_fput;
-	}
+	if (unlikely(!(fd_file(f)->f_mode & FMODE_WRITE)))
+		return -EBADF;
 
-	if (unlikely(msg_len > info->attr.mq_msgsize)) {
-		ret = -EMSGSIZE;
-		goto out_fput;
-	}
+	if (unlikely(msg_len > info->attr.mq_msgsize))
+		return -EMSGSIZE;
 
 	/* First try to allocate memory, before doing anything with
 	 * existing queues. */
 	msg_ptr = load_msg(u_msg_ptr, msg_len);
-	if (IS_ERR(msg_ptr)) {
-		ret = PTR_ERR(msg_ptr);
-		goto out_fput;
-	}
+	if (IS_ERR(msg_ptr))
+		return PTR_ERR(msg_ptr);
 	msg_ptr->m_ts = msg_len;
 	msg_ptr->m_type = msg_prio;
 
@@ -1172,9 +1161,6 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
 out_free:
 	if (ret)
 		free_msg(msg_ptr);
-out_fput:
-	fdput(f);
-out:
 	return ret;
 }
 
@@ -1184,7 +1170,6 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
 {
 	ssize_t ret;
 	struct msg_msg *msg_ptr;
-	struct fd f;
 	struct inode *inode;
 	struct mqueue_inode_info *info;
 	struct ext_wait_queue wait;
@@ -1198,30 +1183,22 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
 
 	audit_mq_sendrecv(mqdes, msg_len, 0, ts);
 
-	f = fdget(mqdes);
-	if (unlikely(!fd_file(f))) {
-		ret = -EBADF;
-		goto out;
-	}
+	CLASS(fd, f)(mqdes);
+	if (fd_empty(f))
+		return -EBADF;
 
 	inode = file_inode(fd_file(f));
-	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
-		ret = -EBADF;
-		goto out_fput;
-	}
+	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations))
+		return -EBADF;
 	info = MQUEUE_I(inode);
 	audit_file(fd_file(f));
 
-	if (unlikely(!(fd_file(f)->f_mode & FMODE_READ))) {
-		ret = -EBADF;
-		goto out_fput;
-	}
+	if (unlikely(!(fd_file(f)->f_mode & FMODE_READ)))
+		return -EBADF;
 
 	/* checks if buffer is big enough */
-	if (unlikely(msg_len < info->attr.mq_msgsize)) {
-		ret = -EMSGSIZE;
-		goto out_fput;
-	}
+	if (unlikely(msg_len < info->attr.mq_msgsize))
+		return -EMSGSIZE;
 
 	/*
 	 * msg_insert really wants us to have a valid, spare node struct so
@@ -1275,9 +1252,6 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
 		}
 		free_msg(msg_ptr);
 	}
-out_fput:
-	fdput(f);
-out:
 	return ret;
 }
 
@@ -1437,21 +1411,18 @@ SYSCALL_DEFINE2(mq_notify, mqd_t, mqdes,
 
 static int do_mq_getsetattr(int mqdes, struct mq_attr *new, struct mq_attr *old)
 {
-	struct fd f;
 	struct inode *inode;
 	struct mqueue_inode_info *info;
 
 	if (new && (new->mq_flags & (~O_NONBLOCK)))
 		return -EINVAL;
 
-	f = fdget(mqdes);
-	if (!fd_file(f))
+	CLASS(fd, f)(mqdes);
+	if (fd_empty(f))
 		return -EBADF;
 
-	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
-		fdput(f);
+	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations))
 		return -EBADF;
-	}
 
 	inode = file_inode(fd_file(f));
 	info = MQUEUE_I(inode);
@@ -1475,7 +1446,6 @@ static int do_mq_getsetattr(int mqdes, struct mq_attr *new, struct mq_attr *old)
 	}
 
 	spin_unlock(&info->lock);
-	fdput(f);
 	return 0;
 }
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4b39a1ea63e2..abb15cdbc5a3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4911,33 +4911,25 @@ static int bpf_link_get_info_by_fd(struct file *file,
 static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
 				  union bpf_attr __user *uattr)
 {
-	int ufd = attr->info.bpf_fd;
-	struct fd f;
-	int err;
-
 	if (CHECK_ATTR(BPF_OBJ_GET_INFO_BY_FD))
 		return -EINVAL;
 
-	f = fdget(ufd);
-	if (!fd_file(f))
+	CLASS(fd, f)(attr->info.bpf_fd);
+	if (fd_empty(f))
 		return -EBADFD;
 
 	if (fd_file(f)->f_op == &bpf_prog_fops)
-		err = bpf_prog_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr,
+		return bpf_prog_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr,
 					      uattr);
 	else if (fd_file(f)->f_op == &bpf_map_fops)
-		err = bpf_map_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr,
+		return bpf_map_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr,
 					     uattr);
 	else if (fd_file(f)->f_op == &btf_fops)
-		err = bpf_btf_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr, uattr);
+		return bpf_btf_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr, uattr);
 	else if (fd_file(f)->f_op == &bpf_link_fops || fd_file(f)->f_op == &bpf_link_fops_poll)
-		err = bpf_link_get_info_by_fd(fd_file(f), fd_file(f)->private_data,
+		return bpf_link_get_info_by_fd(fd_file(f), fd_file(f)->private_data,
 					      attr, uattr);
-	else
-		err = -EINVAL;
-
-	fdput(f);
-	return err;
+	return -EINVAL;
 }
 
 #define BPF_BTF_LOAD_LAST_FIELD btf_token_fd
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 93cc9fea9d9d..ea036fd737b5 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3196,10 +3196,7 @@ static int idempotent_init_module(struct file *f, const char __user * uargs, int
 
 SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
 {
-	int err;
-	struct fd f;
-
-	err = may_init_module();
+	int err = may_init_module();
 	if (err)
 		return err;
 
@@ -3210,12 +3207,10 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
 		      |MODULE_INIT_COMPRESSED_FILE))
 		return -EINVAL;
 
-	f = fdget(fd);
+	CLASS(fd, f)(fd);
 	if (fd_empty(f))
 		return -EBADF;
-	err = idempotent_init_module(fd_file(f), uargs, flags);
-	fdput(f);
-	return err;
+	return idempotent_init_module(fd_file(f), uargs, flags);
 }
 
 /* Keep in sync with MODULE_FLAGS_BUF_SIZE !!! */
diff --git a/kernel/pid.c b/kernel/pid.c
index b5bbc1a8a6e4..115448e89c3e 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -744,23 +744,18 @@ SYSCALL_DEFINE3(pidfd_getfd, int, pidfd, int, fd,
 		unsigned int, flags)
 {
 	struct pid *pid;
-	struct fd f;
-	int ret;
 
 	/* flags is currently unused - make sure it's unset */
 	if (flags)
 		return -EINVAL;
 
-	f = fdget(pidfd);
-	if (!fd_file(f))
+	CLASS(fd, f)(pidfd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	pid = pidfd_pid(fd_file(f));
 	if (IS_ERR(pid))
-		ret = PTR_ERR(pid);
-	else
-		ret = pidfd_getfd(pid, fd);
+		return PTR_ERR(pid);
 
-	fdput(f);
-	return ret;
+	return pidfd_getfd(pid, fd);
 }
diff --git a/kernel/signal.c b/kernel/signal.c
index cc5d87cfa7c0..e555d22f8e42 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3908,7 +3908,6 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 		siginfo_t __user *, info, unsigned int, flags)
 {
 	int ret;
-	struct fd f;
 	struct pid *pid;
 	kernel_siginfo_t kinfo;
 	enum pid_type type;
@@ -3921,20 +3920,17 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 	if (hweight32(flags & PIDFD_SEND_SIGNAL_FLAGS) > 1)
 		return -EINVAL;
 
-	f = fdget(pidfd);
-	if (!fd_file(f))
+	CLASS(fd, f)(pidfd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	/* Is this a pidfd? */
 	pid = pidfd_to_pid(fd_file(f));
-	if (IS_ERR(pid)) {
-		ret = PTR_ERR(pid);
-		goto err;
-	}
+	if (IS_ERR(pid))
+		return PTR_ERR(pid);
 
-	ret = -EINVAL;
 	if (!access_pidfd_pidns(pid))
-		goto err;
+		return -EINVAL;
 
 	switch (flags) {
 	case 0:
@@ -3958,28 +3954,23 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 	if (info) {
 		ret = copy_siginfo_from_user_any(&kinfo, info);
 		if (unlikely(ret))
-			goto err;
+			return ret;
 
-		ret = -EINVAL;
 		if (unlikely(sig != kinfo.si_signo))
-			goto err;
+			return -EINVAL;
 
 		/* Only allow sending arbitrary signals to yourself. */
-		ret = -EPERM;
 		if ((task_pid(current) != pid || type > PIDTYPE_TGID) &&
 		    (kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL))
-			goto err;
+			return -EPERM;
 	} else {
 		prepare_kill_siginfo(sig, &kinfo, type);
 	}
 
 	if (type == PIDTYPE_PGID)
-		ret = kill_pgrp_info(sig, &kinfo, pid);
+		return kill_pgrp_info(sig, &kinfo, pid);
 	else
-		ret = kill_pid_info_type(sig, &kinfo, pid, type);
-err:
-	fdput(f);
-	return ret;
+		return kill_pid_info_type(sig, &kinfo, pid, type);
 }
 
 static int
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 0700f40c53ac..0cd680ccc7e5 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -411,15 +411,14 @@ static int cgroupstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
 	struct nlattr *na;
 	size_t size;
 	u32 fd;
-	struct fd f;
 
 	na = info->attrs[CGROUPSTATS_CMD_ATTR_FD];
 	if (!na)
 		return -EINVAL;
 
 	fd = nla_get_u32(info->attrs[CGROUPSTATS_CMD_ATTR_FD]);
-	f = fdget(fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return 0;
 
 	size = nla_total_size(sizeof(struct cgroupstats));
@@ -427,14 +426,13 @@ static int cgroupstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
 	rc = prepare_reply(info, CGROUPSTATS_CMD_NEW, &rep_skb,
 				size);
 	if (rc < 0)
-		goto err;
+		return rc;
 
 	na = nla_reserve(rep_skb, CGROUPSTATS_TYPE_CGROUP_STATS,
 				sizeof(struct cgroupstats));
 	if (na == NULL) {
 		nlmsg_free(rep_skb);
-		rc = -EMSGSIZE;
-		goto err;
+		return -EMSGSIZE;
 	}
 
 	stats = nla_data(na);
@@ -443,14 +441,10 @@ static int cgroupstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
 	rc = cgroupstats_build(stats, fd_file(f)->f_path.dentry);
 	if (rc < 0) {
 		nlmsg_free(rep_skb);
-		goto err;
+		return rc;
 	}
 
-	rc = send_reply(rep_skb, info);
-
-err:
-	fdput(f);
-	return rc;
+	return send_reply(rep_skb, info);
 }
 
 static int cmd_attr_register_cpumask(struct genl_info *info)
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index e7c1d3ae33fe..e6b4cdd6e84c 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -1062,19 +1062,16 @@ int process_buffer_measurement(struct mnt_idmap *idmap,
  */
 void ima_kexec_cmdline(int kernel_fd, const void *buf, int size)
 {
-	struct fd f;
-
 	if (!buf || !size)
 		return;
 
-	f = fdget(kernel_fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(kernel_fd);
+	if (fd_empty(f))
 		return;
 
 	process_buffer_measurement(file_mnt_idmap(fd_file(f)), file_inode(fd_file(f)),
 				   buf, size, "kexec-cmdline", KEXEC_CMDLINE, 0,
 				   NULL, false, NULL, 0);
-	fdput(f);
 }
 
 /**
diff --git a/security/loadpin/loadpin.c b/security/loadpin/loadpin.c
index 02144ec39f43..68252452b66c 100644
--- a/security/loadpin/loadpin.c
+++ b/security/loadpin/loadpin.c
@@ -283,7 +283,6 @@ enum loadpin_securityfs_interface_index {
 
 static int read_trusted_verity_root_digests(unsigned int fd)
 {
-	struct fd f;
 	void *data;
 	int rc;
 	char *p, *d;
@@ -295,8 +294,8 @@ static int read_trusted_verity_root_digests(unsigned int fd)
 	if (!list_empty(&dm_verity_loadpin_trusted_root_digests))
 		return -EPERM;
 
-	f = fdget(fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return -EINVAL;
 
 	data = kzalloc(SZ_4K, GFP_KERNEL);
@@ -359,7 +358,6 @@ static int read_trusted_verity_root_digests(unsigned int fd)
 	}
 
 	kfree(data);
-	fdput(f);
 
 	return 0;
 
@@ -379,8 +377,6 @@ static int read_trusted_verity_root_digests(unsigned int fd)
 	/* disallow further attempts after reading a corrupt/invalid file */
 	deny_reading_verity_digests = true;
 
-	fdput(f);
-
 	return rc;
 }
 
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 53262b8a7656..72aa1fdeb699 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -229,14 +229,13 @@ static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
 	struct kvm_vfio_spapr_tce param;
 	struct kvm_vfio *kv = dev->private;
 	struct kvm_vfio_file *kvf;
-	struct fd f;
 	int ret;
 
 	if (copy_from_user(&param, arg, sizeof(struct kvm_vfio_spapr_tce)))
 		return -EFAULT;
 
-	f = fdget(param.groupfd);
-	if (!fd_file(f))
+	CLASS(fd, f)(param.groupfd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	ret = -ENOENT;
@@ -262,7 +261,6 @@ static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
 
 err_fdput:
 	mutex_unlock(&kv->lock);
-	fdput(f);
 	return ret;
 }
 #endif
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 24/39] fdget(), more trivial conversions
  2024-07-30  5:16   ` [PATCH 24/39] fdget(), more " viro
@ 2024-08-07 10:39     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:39 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:10AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> all failure exits prior to fdget() leave the scope, all matching fdput()
> are immediately followed by leaving the scope.
> 
> [trivial conflicts in fs/fsopen.c and kernel/bpf/syscall.c]
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 25/39] convert do_preadv()/do_pwritev()
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (22 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 24/39] fdget(), more " viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:39     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 26/39] convert cachestat(2) viro
                     ` (14 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

fdput() can be transposed with add_{r,w}char() and inc_sysc{r,w}();
it's the same story as with do_readv()/do_writev(), only with
fdput() instead of fdput_pos().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/read_write.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 789ea6ecf147..486ed16ab633 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1020,18 +1020,16 @@ static inline loff_t pos_from_hilo(unsigned long high, unsigned long low)
 static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
 			 unsigned long vlen, loff_t pos, rwf_t flags)
 {
-	struct fd f;
 	ssize_t ret = -EBADF;
 
 	if (pos < 0)
 		return -EINVAL;
 
-	f = fdget(fd);
-	if (fd_file(f)) {
+	CLASS(fd, f)(fd);
+	if (!fd_empty(f)) {
 		ret = -ESPIPE;
 		if (fd_file(f)->f_mode & FMODE_PREAD)
 			ret = vfs_readv(fd_file(f), vec, vlen, &pos, flags);
-		fdput(f);
 	}
 
 	if (ret > 0)
@@ -1043,18 +1041,16 @@ static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
 static ssize_t do_pwritev(unsigned long fd, const struct iovec __user *vec,
 			  unsigned long vlen, loff_t pos, rwf_t flags)
 {
-	struct fd f;
 	ssize_t ret = -EBADF;
 
 	if (pos < 0)
 		return -EINVAL;
 
-	f = fdget(fd);
-	if (fd_file(f)) {
+	CLASS(fd, f)(fd);
+	if (!fd_empty(f)) {
 		ret = -ESPIPE;
 		if (fd_file(f)->f_mode & FMODE_PWRITE)
 			ret = vfs_writev(fd_file(f), vec, vlen, &pos, flags);
-		fdput(f);
 	}
 
 	if (ret > 0)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 25/39] convert do_preadv()/do_pwritev()
  2024-07-30  5:16   ` [PATCH 25/39] convert do_preadv()/do_pwritev() viro
@ 2024-08-07 10:39     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:39 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:11AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> fdput() can be transposed with add_{r,w}char() and inc_sysc{r,w}();
> it's the same story as with do_readv()/do_writev(), only with
> fdput() instead of fdput_pos().
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 26/39] convert cachestat(2)
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (23 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 25/39] convert do_preadv()/do_pwritev() viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:39     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 27/39] switch spufs_calls_{get,put}() to CLASS() use viro
                     ` (13 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

fdput() can be transposed with copy_to_user()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 mm/filemap.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 0b5cbd644fdd..9ef41935b0a7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4387,31 +4387,25 @@ SYSCALL_DEFINE4(cachestat, unsigned int, fd,
 		struct cachestat_range __user *, cstat_range,
 		struct cachestat __user *, cstat, unsigned int, flags)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	struct address_space *mapping;
 	struct cachestat_range csr;
 	struct cachestat cs;
 	pgoff_t first_index, last_index;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	if (copy_from_user(&csr, cstat_range,
-			sizeof(struct cachestat_range))) {
-		fdput(f);
+			sizeof(struct cachestat_range)))
 		return -EFAULT;
-	}
 
 	/* hugetlbfs is not supported */
-	if (is_file_hugepages(fd_file(f))) {
-		fdput(f);
+	if (is_file_hugepages(fd_file(f)))
 		return -EOPNOTSUPP;
-	}
 
-	if (flags != 0) {
-		fdput(f);
+	if (flags != 0)
 		return -EINVAL;
-	}
 
 	first_index = csr.off >> PAGE_SHIFT;
 	last_index =
@@ -4419,7 +4413,6 @@ SYSCALL_DEFINE4(cachestat, unsigned int, fd,
 	memset(&cs, 0, sizeof(struct cachestat));
 	mapping = fd_file(f)->f_mapping;
 	filemap_cachestat(mapping, first_index, last_index, &cs);
-	fdput(f);
 
 	if (copy_to_user(cstat, &cs, sizeof(struct cachestat)))
 		return -EFAULT;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 26/39] convert cachestat(2)
  2024-07-30  5:16   ` [PATCH 26/39] convert cachestat(2) viro
@ 2024-08-07 10:39     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:39 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:12AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> fdput() can be transposed with copy_to_user()
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 27/39] switch spufs_calls_{get,put}() to CLASS() use
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (24 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 26/39] convert cachestat(2) viro
@ 2024-07-30  5:16   ` viro
  2024-07-30  5:16   ` [PATCH 28/39] convert spu_run(2) viro
                     ` (12 subsequent siblings)
  38 siblings, 0 replies; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/powerpc/platforms/cell/spu_syscalls.c | 52 +++++++---------------
 1 file changed, 17 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spu_syscalls.c b/arch/powerpc/platforms/cell/spu_syscalls.c
index da4fad7fc8bf..64a4c9eac6e0 100644
--- a/arch/powerpc/platforms/cell/spu_syscalls.c
+++ b/arch/powerpc/platforms/cell/spu_syscalls.c
@@ -36,6 +36,9 @@ static inline struct spufs_calls *spufs_calls_get(void)
 
 static inline void spufs_calls_put(struct spufs_calls *calls)
 {
+	if (!calls)
+		return;
+
 	BUG_ON(calls != spufs_calls);
 
 	/* we don't need to rcu this, as we hold a reference to the module */
@@ -53,35 +56,30 @@ static inline void spufs_calls_put(struct spufs_calls *calls) { }
 
 #endif /* CONFIG_SPU_FS_MODULE */
 
+DEFINE_CLASS(spufs_calls, struct spufs_calls *, spufs_calls_put(_T), spufs_calls_get(), void)
+
 SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
 	umode_t, mode, int, neighbor_fd)
 {
-	long ret;
-	struct spufs_calls *calls;
-
-	calls = spufs_calls_get();
+	CLASS(spufs_calls, calls)();
 	if (!calls)
 		return -ENOSYS;
 
 	if (flags & SPU_CREATE_AFFINITY_SPU) {
 		CLASS(fd, neighbor)(neighbor_fd);
-		ret = -EBADF;
-		if (!fd_empty(neighbor))
-			ret = calls->create_thread(name, flags, mode, fd_file(neighbor));
-	} else
-		ret = calls->create_thread(name, flags, mode, NULL);
-
-	spufs_calls_put(calls);
-	return ret;
+		if (fd_empty(neighbor))
+			return -EBADF;
+		return calls->create_thread(name, flags, mode, fd_file(neighbor));
+	} else {
+		return calls->create_thread(name, flags, mode, NULL);
+	}
 }
 
 SYSCALL_DEFINE3(spu_run,int, fd, __u32 __user *, unpc, __u32 __user *, ustatus)
 {
 	long ret;
 	struct fd arg;
-	struct spufs_calls *calls;
-
-	calls = spufs_calls_get();
+	CLASS(spufs_calls, calls)();
 	if (!calls)
 		return -ENOSYS;
 
@@ -91,42 +89,26 @@ SYSCALL_DEFINE3(spu_run,int, fd, __u32 __user *, unpc, __u32 __user *, ustatus)
 		ret = calls->spu_run(fd_file(arg), unpc, ustatus);
 		fdput(arg);
 	}
-
-	spufs_calls_put(calls);
 	return ret;
 }
 
 #ifdef CONFIG_COREDUMP
 int elf_coredump_extra_notes_size(void)
 {
-	struct spufs_calls *calls;
-	int ret;
-
-	calls = spufs_calls_get();
+	CLASS(spufs_calls, calls)();
 	if (!calls)
 		return 0;
 
-	ret = calls->coredump_extra_notes_size();
-
-	spufs_calls_put(calls);
-
-	return ret;
+	return calls->coredump_extra_notes_size();
 }
 
 int elf_coredump_extra_notes_write(struct coredump_params *cprm)
 {
-	struct spufs_calls *calls;
-	int ret;
-
-	calls = spufs_calls_get();
+	CLASS(spufs_calls, calls)();
 	if (!calls)
 		return 0;
 
-	ret = calls->coredump_extra_notes_write(cprm);
-
-	spufs_calls_put(calls);
-
-	return ret;
+	return calls->coredump_extra_notes_write(cprm);
 }
 #endif
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH 28/39] convert spu_run(2)
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (25 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 27/39] switch spufs_calls_{get,put}() to CLASS() use viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:40     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 29/39] convert media_request_get_by_fd() viro
                     ` (11 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

all failure exits prior to fdget() are returns, fdput() is immediately
followed by return.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/powerpc/platforms/cell/spu_syscalls.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spu_syscalls.c b/arch/powerpc/platforms/cell/spu_syscalls.c
index 64a4c9eac6e0..000894e07b02 100644
--- a/arch/powerpc/platforms/cell/spu_syscalls.c
+++ b/arch/powerpc/platforms/cell/spu_syscalls.c
@@ -77,19 +77,15 @@ SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
 
 SYSCALL_DEFINE3(spu_run,int, fd, __u32 __user *, unpc, __u32 __user *, ustatus)
 {
-	long ret;
-	struct fd arg;
 	CLASS(spufs_calls, calls)();
 	if (!calls)
 		return -ENOSYS;
 
-	ret = -EBADF;
-	arg = fdget(fd);
-	if (fd_file(arg)) {
-		ret = calls->spu_run(fd_file(arg), unpc, ustatus);
-		fdput(arg);
-	}
-	return ret;
+	CLASS(fd, arg)(fd);
+	if (fd_empty(arg))
+		return -EBADF;
+
+	return calls->spu_run(fd_file(arg), unpc, ustatus);
 }
 
 #ifdef CONFIG_COREDUMP
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 28/39] convert spu_run(2)
  2024-07-30  5:16   ` [PATCH 28/39] convert spu_run(2) viro
@ 2024-08-07 10:40     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:40 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:14AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> all failure exits prior to fdget() are returns, fdput() is immediately
> followed by return.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 29/39] convert media_request_get_by_fd()
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (26 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 28/39] convert spu_run(2) viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:40     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 30/39] convert coda_parse_fd() viro
                     ` (10 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

the only thing done after fdput() (in failure cases) is a printk; safely
transposable with fdput()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/media/mc/mc-request.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/media/mc/mc-request.c b/drivers/media/mc/mc-request.c
index e064914c476e..df39c8c11e9a 100644
--- a/drivers/media/mc/mc-request.c
+++ b/drivers/media/mc/mc-request.c
@@ -246,22 +246,21 @@ static const struct file_operations request_fops = {
 struct media_request *
 media_request_get_by_fd(struct media_device *mdev, int request_fd)
 {
-	struct fd f;
 	struct media_request *req;
 
 	if (!mdev || !mdev->ops ||
 	    !mdev->ops->req_validate || !mdev->ops->req_queue)
 		return ERR_PTR(-EBADR);
 
-	f = fdget(request_fd);
-	if (!fd_file(f))
-		goto err_no_req_fd;
+	CLASS(fd, f)(request_fd);
+	if (fd_empty(f))
+		goto err;
 
 	if (fd_file(f)->f_op != &request_fops)
-		goto err_fput;
+		goto err;
 	req = fd_file(f)->private_data;
 	if (req->mdev != mdev)
-		goto err_fput;
+		goto err;
 
 	/*
 	 * Note: as long as someone has an open filehandle of the request,
@@ -272,14 +271,9 @@ media_request_get_by_fd(struct media_device *mdev, int request_fd)
 	 * before media_request_get() is called.
 	 */
 	media_request_get(req);
-	fdput(f);
-
 	return req;
 
-err_fput:
-	fdput(f);
-
-err_no_req_fd:
+err:
 	dev_dbg(mdev->dev, "cannot find request_fd %d\n", request_fd);
 	return ERR_PTR(-EINVAL);
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 29/39] convert media_request_get_by_fd()
  2024-07-30  5:16   ` [PATCH 29/39] convert media_request_get_by_fd() viro
@ 2024-08-07 10:40     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:40 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:15AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> the only thing done after fdput() (in failure cases) is a printk; safely
> transposable with fdput()...
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 30/39] convert coda_parse_fd()
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (27 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 29/39] convert media_request_get_by_fd() viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:41     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 31/39] convert cifs_ioctl_copychunk() viro
                     ` (9 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

fdput() is followed by invalf(), which is transposable with it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/coda/inode.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/coda/inode.c b/fs/coda/inode.c
index 7d56b6d1e4c3..293cf5e6dfeb 100644
--- a/fs/coda/inode.c
+++ b/fs/coda/inode.c
@@ -122,22 +122,17 @@ static const struct fs_parameter_spec coda_param_specs[] = {
 static int coda_parse_fd(struct fs_context *fc, int fd)
 {
 	struct coda_fs_context *ctx = fc->fs_private;
-	struct fd f;
+	CLASS(fd, f)(fd);
 	struct inode *inode;
 	int idx;
 
-	f = fdget(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 	inode = file_inode(fd_file(f));
-	if (!S_ISCHR(inode->i_mode) || imajor(inode) != CODA_PSDEV_MAJOR) {
-		fdput(f);
+	if (!S_ISCHR(inode->i_mode) || imajor(inode) != CODA_PSDEV_MAJOR)
 		return invalf(fc, "code: Not coda psdev");
-	}
 
 	idx = iminor(inode);
-	fdput(f);
-
 	if (idx < 0 || idx >= MAX_CODADEVS)
 		return invalf(fc, "coda: Bad minor number");
 	ctx->idx = idx;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 30/39] convert coda_parse_fd()
  2024-07-30  5:16   ` [PATCH 30/39] convert coda_parse_fd() viro
@ 2024-08-07 10:41     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:41 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:16AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> fdput() is followed by invalf(), which is transposable with it.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 31/39] convert cifs_ioctl_copychunk()
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (28 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 30/39] convert coda_parse_fd() viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:41     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 32/39] convert vfs_dedupe_file_range() viro
                     ` (8 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

fdput() moved past mnt_drop_file_write(); harmless, if somewhat cringeworthy.
Reordering could be avoided either by adding an explicit scope or by making
mnt_drop_file_write() called via __cleanup.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/smb/client/ioctl.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/smb/client/ioctl.c b/fs/smb/client/ioctl.c
index 94bf2e5014d9..6d9df3646df3 100644
--- a/fs/smb/client/ioctl.c
+++ b/fs/smb/client/ioctl.c
@@ -72,7 +72,6 @@ static long cifs_ioctl_copychunk(unsigned int xid, struct file *dst_file,
 			unsigned long srcfd)
 {
 	int rc;
-	struct fd src_file;
 	struct inode *src_inode;
 
 	cifs_dbg(FYI, "ioctl copychunk range\n");
@@ -89,8 +88,8 @@ static long cifs_ioctl_copychunk(unsigned int xid, struct file *dst_file,
 		return rc;
 	}
 
-	src_file = fdget(srcfd);
-	if (!fd_file(src_file)) {
+	CLASS(fd, src_file)(srcfd);
+	if (fd_empty(src_file)) {
 		rc = -EBADF;
 		goto out_drop_write;
 	}
@@ -98,20 +97,18 @@ static long cifs_ioctl_copychunk(unsigned int xid, struct file *dst_file,
 	if (fd_file(src_file)->f_op->unlocked_ioctl != cifs_ioctl) {
 		rc = -EBADF;
 		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
-		goto out_fput;
+		goto out_drop_write;
 	}
 
 	src_inode = file_inode(fd_file(src_file));
 	rc = -EINVAL;
 	if (S_ISDIR(src_inode->i_mode))
-		goto out_fput;
+		goto out_drop_write;
 
 	rc = cifs_file_copychunk_range(xid, fd_file(src_file), 0, dst_file, 0,
 					src_inode->i_size, 0);
 	if (rc > 0)
 		rc = 0;
-out_fput:
-	fdput(src_file);
 out_drop_write:
 	mnt_drop_write_file(dst_file);
 	return rc;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 31/39] convert cifs_ioctl_copychunk()
  2024-07-30  5:16   ` [PATCH 31/39] convert cifs_ioctl_copychunk() viro
@ 2024-08-07 10:41     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:41 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:17AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> fdput() moved past mnt_drop_file_write(); harmless, if somewhat cringeworthy.
> Reordering could be avoided either by adding an explicit scope or by making
> mnt_drop_file_write() called via __cleanup.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 32/39] convert vfs_dedupe_file_range().
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (29 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 31/39] convert cifs_ioctl_copychunk() viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:42     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 33/39] convert do_select() viro
                     ` (7 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

fdput() is followed by checking fatal_signal_pending() (and aborting
the loop in such case).  fdput() is transposable with that check.
Yes, it'll probably end up with slightly fatter code (call after the
check has returned false + call on the almost never taken out-of-line path
instead of one call before the check), but it's not worth bothering with
explicit extra scope there (or dragging the check into the loop condition,
for that matter).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/remap_range.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/remap_range.c b/fs/remap_range.c
index 017d0d1ea6c9..26afbbbfb10c 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -536,7 +536,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 	}
 
 	for (i = 0, info = same->info; i < count; i++, info++) {
-		struct fd dst_fd = fdget(info->dest_fd);
+		CLASS(fd, dst_fd)(info->dest_fd);
 
 		if (fd_empty(dst_fd)) {
 			info->status = -EBADF;
@@ -545,7 +545,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 
 		if (info->reserved) {
 			info->status = -EINVAL;
-			goto next_fdput;
+			goto next_loop;
 		}
 
 		deduped = vfs_dedupe_file_range_one(file, off, fd_file(dst_fd),
@@ -558,8 +558,6 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 		else
 			info->bytes_deduped = len;
 
-next_fdput:
-		fdput(dst_fd);
 next_loop:
 		if (fatal_signal_pending(current))
 			break;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 32/39] convert vfs_dedupe_file_range().
  2024-07-30  5:16   ` [PATCH 32/39] convert vfs_dedupe_file_range() viro
@ 2024-08-07 10:42     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:42 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:18AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> fdput() is followed by checking fatal_signal_pending() (and aborting
> the loop in such case).  fdput() is transposable with that check.
> Yes, it'll probably end up with slightly fatter code (call after the
> check has returned false + call on the almost never taken out-of-line path
> instead of one call before the check), but it's not worth bothering with
> explicit extra scope there (or dragging the check into the loop condition,
> for that matter).
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 33/39] convert do_select()
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (30 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 32/39] convert vfs_dedupe_file_range() viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:42     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 34/39] do_pollfd(): convert to CLASS(fd) viro
                     ` (6 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

take the logics from fdget() to fdput() into an inlined helper - with existing
wait_key_set() subsumed into that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/select.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 97e1009dde00..039f81c6f817 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -465,15 +465,22 @@ static int max_select_fd(unsigned long n, fd_set_bits *fds)
 			 EPOLLNVAL)
 #define POLLEX_SET (EPOLLPRI | EPOLLNVAL)
 
-static inline void wait_key_set(poll_table *wait, unsigned long in,
+static inline __poll_t select_poll_one(int fd, poll_table *wait, unsigned long in,
 				unsigned long out, unsigned long bit,
 				__poll_t ll_flag)
 {
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return EPOLLNVAL;
+
 	wait->_key = POLLEX_SET | ll_flag;
 	if (in & bit)
 		wait->_key |= POLLIN_SET;
 	if (out & bit)
 		wait->_key |= POLLOUT_SET;
+
+	return vfs_poll(fd_file(f), wait);
 }
 
 static noinline_for_stack int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
@@ -525,20 +532,12 @@ static noinline_for_stack int do_select(int n, fd_set_bits *fds, struct timespec
 			}
 
 			for (j = 0; j < BITS_PER_LONG; ++j, ++i, bit <<= 1) {
-				struct fd f;
 				if (i >= n)
 					break;
 				if (!(bit & all_bits))
 					continue;
-				mask = EPOLLNVAL;
-				f = fdget(i);
-				if (fd_file(f)) {
-					wait_key_set(wait, in, out, bit,
-						     busy_flag);
-					mask = vfs_poll(fd_file(f), wait);
-
-					fdput(f);
-				}
+				mask = select_poll_one(i, wait, in, out, bit,
+						       busy_flag);
 				if ((mask & POLLIN_SET) && (in & bit)) {
 					res_in |= bit;
 					retval++;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 33/39] convert do_select()
  2024-07-30  5:16   ` [PATCH 33/39] convert do_select() viro
@ 2024-08-07 10:42     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:42 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:19AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> take the logics from fdget() to fdput() into an inlined helper - with existing
> wait_key_set() subsumed into that.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 34/39] do_pollfd(): convert to CLASS(fd)
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (31 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 33/39] convert do_select() viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:43     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 35/39] convert bpf_token_create() viro
                     ` (5 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

lift setting ->revents into the caller, so that failure exits (including
the early one) would be plain returns.

We need the scope of our struct fd to end before the store to ->revents,
since that's shared with the failure exits prior to the point where we
can do fdget().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/select.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 039f81c6f817..995b4d9491d3 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -856,15 +856,14 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
 				     __poll_t busy_flag)
 {
 	int fd = pollfd->fd;
-	__poll_t mask = 0, filter;
-	struct fd f;
+	__poll_t mask, filter;
 
 	if (fd < 0)
-		goto out;
-	mask = EPOLLNVAL;
-	f = fdget(fd);
-	if (!fd_file(f))
-		goto out;
+		return 0;
+
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
+		return EPOLLNVAL;
 
 	/* userland u16 ->events contains POLL... bitmap */
 	filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
@@ -872,13 +871,7 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
 	mask = vfs_poll(fd_file(f), pwait);
 	if (mask & busy_flag)
 		*can_busy_poll = true;
-	mask &= filter;		/* Mask out unneeded events. */
-	fdput(f);
-
-out:
-	/* ... and so does ->revents */
-	pollfd->revents = mangle_poll(mask);
-	return mask;
+	return mask & filter;		/* Mask out unneeded events. */
 }
 
 static int do_poll(struct poll_list *list, struct poll_wqueues *wait,
@@ -910,6 +903,7 @@ static int do_poll(struct poll_list *list, struct poll_wqueues *wait,
 			pfd = walk->entries;
 			pfd_end = pfd + walk->len;
 			for (; pfd != pfd_end; pfd++) {
+				__poll_t mask;
 				/*
 				 * Fish for events. If we found one, record it
 				 * and kill poll_table->_qproc, so we don't
@@ -917,8 +911,9 @@ static int do_poll(struct poll_list *list, struct poll_wqueues *wait,
 				 * this. They'll get immediately deregistered
 				 * when we break out and return.
 				 */
-				if (do_pollfd(pfd, pt, &can_busy_loop,
-					      busy_flag)) {
+				mask = do_pollfd(pfd, pt, &can_busy_loop, busy_flag);
+				pfd->revents = mangle_poll(mask);
+				if (mask) {
 					count++;
 					pt->_qproc = NULL;
 					/* found something, stop busy polling */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 34/39] do_pollfd(): convert to CLASS(fd)
  2024-07-30  5:16   ` [PATCH 34/39] do_pollfd(): convert to CLASS(fd) viro
@ 2024-08-07 10:43     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:43 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:20AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> lift setting ->revents into the caller, so that failure exits (including
> the early one) would be plain returns.
> 
> We need the scope of our struct fd to end before the store to ->revents,
> since that's shared with the failure exits prior to the point where we
> can do fdget().
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 35/39] convert bpf_token_create()
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (32 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 34/39] do_pollfd(): convert to CLASS(fd) viro
@ 2024-07-30  5:16   ` viro
  2024-08-06 22:42     ` Andrii Nakryiko
  2024-08-07 10:44     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 36/39] assorted variants of irqfd setup: convert to CLASS(fd) viro
                     ` (4 subsequent siblings)
  38 siblings, 2 replies; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

keep file reference through the entire thing, don't bother with
grabbing struct path reference (except, for now, around the LSM
call and that only until it gets constified) and while we are
at it, don't confuse the hell out of readers by random mix of
path.dentry->d_sb and path.mnt->mnt_sb uses - these two are equal,
so just put one of those into a local variable and use that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 kernel/bpf/token.c | 69 +++++++++++++++++-----------------------------
 1 file changed, 26 insertions(+), 43 deletions(-)

diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index 9b92cb886d49..15da405d8302 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -116,67 +116,52 @@ int bpf_token_create(union bpf_attr *attr)
 	struct user_namespace *userns;
 	struct inode *inode;
 	struct file *file;
+	CLASS(fd, f)(attr->token_create.bpffs_fd);
 	struct path path;
-	struct fd f;
+	struct super_block *sb;
 	umode_t mode;
 	int err, fd;
 
-	f = fdget(attr->token_create.bpffs_fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	path = fd_file(f)->f_path;
-	path_get(&path);
-	fdput(f);
+	sb = path.dentry->d_sb;
 
-	if (path.dentry != path.mnt->mnt_sb->s_root) {
-		err = -EINVAL;
-		goto out_path;
-	}
-	if (path.mnt->mnt_sb->s_op != &bpf_super_ops) {
-		err = -EINVAL;
-		goto out_path;
-	}
+	if (path.dentry != sb->s_root)
+		return -EINVAL;
+	if (sb->s_op != &bpf_super_ops)
+		return -EINVAL;
 	err = path_permission(&path, MAY_ACCESS);
 	if (err)
-		goto out_path;
+		return err;
 
-	userns = path.dentry->d_sb->s_user_ns;
+	userns = sb->s_user_ns;
 	/*
 	 * Enforce that creators of BPF tokens are in the same user
 	 * namespace as the BPF FS instance. This makes reasoning about
 	 * permissions a lot easier and we can always relax this later.
 	 */
-	if (current_user_ns() != userns) {
-		err = -EPERM;
-		goto out_path;
-	}
-	if (!ns_capable(userns, CAP_BPF)) {
-		err = -EPERM;
-		goto out_path;
-	}
+	if (current_user_ns() != userns)
+		return -EPERM;
+	if (!ns_capable(userns, CAP_BPF))
+		return -EPERM;
 
 	/* Creating BPF token in init_user_ns doesn't make much sense. */
-	if (current_user_ns() == &init_user_ns) {
-		err = -EOPNOTSUPP;
-		goto out_path;
-	}
+	if (current_user_ns() == &init_user_ns)
+		return -EOPNOTSUPP;
 
-	mnt_opts = path.dentry->d_sb->s_fs_info;
+	mnt_opts = sb->s_fs_info;
 	if (mnt_opts->delegate_cmds == 0 &&
 	    mnt_opts->delegate_maps == 0 &&
 	    mnt_opts->delegate_progs == 0 &&
-	    mnt_opts->delegate_attachs == 0) {
-		err = -ENOENT; /* no BPF token delegation is set up */
-		goto out_path;
-	}
+	    mnt_opts->delegate_attachs == 0)
+		return -ENOENT; /* no BPF token delegation is set up */
 
 	mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
-	inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
-	if (IS_ERR(inode)) {
-		err = PTR_ERR(inode);
-		goto out_path;
-	}
+	inode = bpf_get_inode(sb, NULL, mode);
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
 
 	inode->i_op = &bpf_token_iops;
 	inode->i_fop = &bpf_token_fops;
@@ -185,8 +170,7 @@ int bpf_token_create(union bpf_attr *attr)
 	file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
 	if (IS_ERR(file)) {
 		iput(inode);
-		err = PTR_ERR(file);
-		goto out_path;
+		return PTR_ERR(file);
 	}
 
 	token = kzalloc(sizeof(*token), GFP_USER);
@@ -205,7 +189,9 @@ int bpf_token_create(union bpf_attr *attr)
 	token->allowed_progs = mnt_opts->delegate_progs;
 	token->allowed_attachs = mnt_opts->delegate_attachs;
 
-	err = security_bpf_token_create(token, attr, &path);
+	path_get(&path);	// kill it
+	err = security_bpf_token_create(token, attr, &path); // constify
+	path_put(&path);	// kill it
 	if (err)
 		goto out_token;
 
@@ -218,15 +204,12 @@ int bpf_token_create(union bpf_attr *attr)
 	file->private_data = token;
 	fd_install(fd, file);
 
-	path_put(&path);
 	return fd;
 
 out_token:
 	bpf_token_free(token);
 out_file:
 	fput(file);
-out_path:
-	path_put(&path);
 	return err;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 35/39] convert bpf_token_create()
  2024-07-30  5:16   ` [PATCH 35/39] convert bpf_token_create() viro
@ 2024-08-06 22:42     ` Andrii Nakryiko
  2024-08-10  3:46       ` Al Viro
  2024-08-07 10:44     ` Christian Brauner
  1 sibling, 1 reply; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-06 22:42 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
	torvalds

On Mon, Jul 29, 2024 at 10:27 PM <viro@kernel.org> wrote:
>
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> keep file reference through the entire thing, don't bother with
> grabbing struct path reference (except, for now, around the LSM
> call and that only until it gets constified) and while we are
> at it, don't confuse the hell out of readers by random mix of
> path.dentry->d_sb and path.mnt->mnt_sb uses - these two are equal,
> so just put one of those into a local variable and use that.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  kernel/bpf/token.c | 69 +++++++++++++++++-----------------------------
>  1 file changed, 26 insertions(+), 43 deletions(-)
>

LGTM overall (modulo // comments, but see below)

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> index 9b92cb886d49..15da405d8302 100644
> --- a/kernel/bpf/token.c
> +++ b/kernel/bpf/token.c
> @@ -116,67 +116,52 @@ int bpf_token_create(union bpf_attr *attr)

[...]

> -       err = security_bpf_token_create(token, attr, &path);
> +       path_get(&path);        // kill it
> +       err = security_bpf_token_create(token, attr, &path); // constify
> +       path_put(&path);        // kill it
>         if (err)
>                 goto out_token;
>

By constify you mean something like below?

commit 06a6442ca9cc441805881eea61fd57d7defadaca
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Tue Aug 6 15:38:12 2024 -0700

    security: constify struct path in bpf_token_create() LSM hook

    There is no reason why struct path pointer shouldn't be const-qualified
    when being passed into bpf_token_create() LSM hook. Add that const.

    Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 855db460e08b..462b55378241 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -431,7 +431,7 @@ LSM_HOOK(int, 0, bpf_prog_load, struct bpf_prog
*prog, union bpf_attr *attr,
      struct bpf_token *token)
 LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free, struct bpf_prog *prog)
 LSM_HOOK(int, 0, bpf_token_create, struct bpf_token *token, union
bpf_attr *attr,
-     struct path *path)
+     const struct path *path)
 LSM_HOOK(void, LSM_RET_VOID, bpf_token_free, struct bpf_token *token)
 LSM_HOOK(int, 0, bpf_token_cmd, const struct bpf_token *token, enum
bpf_cmd cmd)
 LSM_HOOK(int, 0, bpf_token_capable, const struct bpf_token *token, int cap)
diff --git a/include/linux/security.h b/include/linux/security.h
index 1390f1efb4f0..31523a2c71c4 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -2137,7 +2137,7 @@ extern int security_bpf_prog_load(struct
bpf_prog *prog, union bpf_attr *attr,
                   struct bpf_token *token);
 extern void security_bpf_prog_free(struct bpf_prog *prog);
 extern int security_bpf_token_create(struct bpf_token *token, union
bpf_attr *attr,
-                     struct path *path);
+                     const struct path *path);
 extern void security_bpf_token_free(struct bpf_token *token);
 extern int security_bpf_token_cmd(const struct bpf_token *token, enum
bpf_cmd cmd);
 extern int security_bpf_token_capable(const struct bpf_token *token, int cap);
@@ -2177,7 +2177,7 @@ static inline void security_bpf_prog_free(struct
bpf_prog *prog)
 { }

 static inline int security_bpf_token_create(struct bpf_token *token,
union bpf_attr *attr,
-                     struct path *path)
+                        const struct path *path)
 {
     return 0;
 }
diff --git a/security/security.c b/security/security.c
index 8cee5b6c6e6d..d8d0b67ced25 100644
--- a/security/security.c
+++ b/security/security.c
@@ -5510,7 +5510,7 @@ int security_bpf_prog_load(struct bpf_prog
*prog, union bpf_attr *attr,
  * Return: Returns 0 on success, error on failure.
  */
 int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
-                  struct path *path)
+                  const struct path *path)
 {
     return call_int_hook(bpf_token_create, token, attr, path);
 }
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 55c78c318ccd..0eec141a8f37 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6965,7 +6965,7 @@ static void selinux_bpf_prog_free(struct bpf_prog *prog)
 }

 static int selinux_bpf_token_create(struct bpf_token *token, union
bpf_attr *attr,
-                    struct path *path)
+                    const struct path *path)
 {
     struct bpf_security_struct *bpfsec;


[...]

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 35/39] convert bpf_token_create()
  2024-08-06 22:42     ` Andrii Nakryiko
@ 2024-08-10  3:46       ` Al Viro
  2024-08-12 20:06         ` Andrii Nakryiko
  0 siblings, 1 reply; 134+ messages in thread
From: Al Viro @ 2024-08-10  3:46 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: viro, linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
	torvalds

On Tue, Aug 06, 2024 at 03:42:56PM -0700, Andrii Nakryiko wrote:

> By constify you mean something like below?

Yep.  Should go through LSM folks, probably, and once it's in
those path_get() and path_put() around the call can go to hell,
along with 'path' itself (we can use file->f_path instead).

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 35/39] convert bpf_token_create()
  2024-08-10  3:46       ` Al Viro
@ 2024-08-12 20:06         ` Andrii Nakryiko
  0 siblings, 0 replies; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-12 20:06 UTC (permalink / raw)
  To: Al Viro
  Cc: viro, linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
	torvalds

On Fri, Aug 9, 2024 at 8:46 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Aug 06, 2024 at 03:42:56PM -0700, Andrii Nakryiko wrote:
>
> > By constify you mean something like below?
>
> Yep.  Should go through LSM folks, probably, and once it's in
> those path_get() and path_put() around the call can go to hell,
> along with 'path' itself (we can use file->f_path instead).

Ok, cool. Let's then do that branch you proposed, and I can send
everything on top of it, removing path_get()+path_put() you add here.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 35/39] convert bpf_token_create()
  2024-07-30  5:16   ` [PATCH 35/39] convert bpf_token_create() viro
  2024-08-06 22:42     ` Andrii Nakryiko
@ 2024-08-07 10:44     ` Christian Brauner
  1 sibling, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:44 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:21AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> keep file reference through the entire thing, don't bother with
> grabbing struct path reference (except, for now, around the LSM
> call and that only until it gets constified) and while we are
> at it, don't confuse the hell out of readers by random mix of
> path.dentry->d_sb and path.mnt->mnt_sb uses - these two are equal,
> so just put one of those into a local variable and use that.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 36/39] assorted variants of irqfd setup: convert to CLASS(fd)
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (33 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 35/39] convert bpf_token_create() viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:46     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 37/39] memcg_write_event_control(): switch " viro
                     ` (3 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

in all of those failure exits prior to fdget() are plain returns and
the only thing done after fdput() is (on failure exits) a kfree(),
which can be done before fdput() just fine.

NOTE: in acrn_irqfd_assign() 'fail:' failure exit is wrong for
eventfd_ctx_fileget() failure (we only want fdput() there) and once
we stop doing that, it doesn't need to check if eventfd is NULL or
ERR_PTR(...) there.

NOTE: in privcmd we move fdget() up before the allocation - more
to the point, before the copy_from_user() attempt.

[trivial conflict in privcmd]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/vfio/virqfd.c     | 16 +++-------------
 drivers/virt/acrn/irqfd.c | 13 ++++---------
 drivers/xen/privcmd.c     | 17 ++++-------------
 virt/kvm/eventfd.c        | 15 +++------------
 4 files changed, 14 insertions(+), 47 deletions(-)

diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index d22881245e89..aa2891f97508 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -113,7 +113,6 @@ int vfio_virqfd_enable(void *opaque,
 		       void (*thread)(void *, void *),
 		       void *data, struct virqfd **pvirqfd, int fd)
 {
-	struct fd irqfd;
 	struct eventfd_ctx *ctx;
 	struct virqfd *virqfd;
 	int ret = 0;
@@ -133,8 +132,8 @@ int vfio_virqfd_enable(void *opaque,
 	INIT_WORK(&virqfd->inject, virqfd_inject);
 	INIT_WORK(&virqfd->flush_inject, virqfd_flush_inject);
 
-	irqfd = fdget(fd);
-	if (!fd_file(irqfd)) {
+	CLASS(fd, irqfd)(fd);
+	if (fd_empty(irqfd)) {
 		ret = -EBADF;
 		goto err_fd;
 	}
@@ -142,7 +141,7 @@ int vfio_virqfd_enable(void *opaque,
 	ctx = eventfd_ctx_fileget(fd_file(irqfd));
 	if (IS_ERR(ctx)) {
 		ret = PTR_ERR(ctx);
-		goto err_ctx;
+		goto err_fd;
 	}
 
 	virqfd->eventfd = ctx;
@@ -181,18 +180,9 @@ int vfio_virqfd_enable(void *opaque,
 		if ((!handler || handler(opaque, data)) && thread)
 			schedule_work(&virqfd->inject);
 	}
-
-	/*
-	 * Do not drop the file until the irqfd is fully initialized,
-	 * otherwise we might race against the EPOLLHUP.
-	 */
-	fdput(irqfd);
-
 	return 0;
 err_busy:
 	eventfd_ctx_put(ctx);
-err_ctx:
-	fdput(irqfd);
 err_fd:
 	kfree(virqfd);
 
diff --git a/drivers/virt/acrn/irqfd.c b/drivers/virt/acrn/irqfd.c
index 9994d818bb7e..b7da24ca1475 100644
--- a/drivers/virt/acrn/irqfd.c
+++ b/drivers/virt/acrn/irqfd.c
@@ -112,7 +112,6 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 	struct eventfd_ctx *eventfd = NULL;
 	struct hsm_irqfd *irqfd, *tmp;
 	__poll_t events;
-	struct fd f;
 	int ret = 0;
 
 	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
@@ -124,8 +123,8 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 	INIT_LIST_HEAD(&irqfd->list);
 	INIT_WORK(&irqfd->shutdown, hsm_irqfd_shutdown_work);
 
-	f = fdget(args->fd);
-	if (!fd_file(f)) {
+	CLASS(fd, f)(args->fd);
+	if (fd_empty(f)) {
 		ret = -EBADF;
 		goto out;
 	}
@@ -133,7 +132,7 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 	eventfd = eventfd_ctx_fileget(fd_file(f));
 	if (IS_ERR(eventfd)) {
 		ret = PTR_ERR(eventfd);
-		goto fail;
+		goto out;
 	}
 
 	irqfd->eventfd = eventfd;
@@ -162,13 +161,9 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 	if (events & EPOLLIN)
 		acrn_irqfd_inject(irqfd);
 
-	fdput(f);
 	return 0;
 fail:
-	if (eventfd && !IS_ERR(eventfd))
-		eventfd_ctx_put(eventfd);
-
-	fdput(f);
+	eventfd_ctx_put(eventfd);
 out:
 	kfree(irqfd);
 	return ret;
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index ba02b732fa49..8a5bdf1f3050 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -939,10 +939,11 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
 	struct privcmd_kernel_irqfd *kirqfd, *tmp;
 	unsigned long flags;
 	__poll_t events;
-	struct fd f;
 	void *dm_op;
 	int ret, idx;
 
+	CLASS(fd, f)(irqfd->fd);
+
 	kirqfd = kzalloc(sizeof(*kirqfd) + irqfd->size, GFP_KERNEL);
 	if (!kirqfd)
 		return -ENOMEM;
@@ -958,8 +959,7 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
 	kirqfd->dom = irqfd->dom;
 	INIT_WORK(&kirqfd->shutdown, irqfd_shutdown);
 
-	f = fdget(irqfd->fd);
-	if (!fd_file(f)) {
+	if (fd_empty(f)) {
 		ret = -EBADF;
 		goto error_kfree;
 	}
@@ -967,7 +967,7 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
 	kirqfd->eventfd = eventfd_ctx_fileget(fd_file(f));
 	if (IS_ERR(kirqfd->eventfd)) {
 		ret = PTR_ERR(kirqfd->eventfd);
-		goto error_fd_put;
+		goto error_kfree;
 	}
 
 	/*
@@ -1000,20 +1000,11 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
 		irqfd_inject(kirqfd);
 
 	srcu_read_unlock(&irqfds_srcu, idx);
-
-	/*
-	 * Do not drop the file until the kirqfd is fully initialized, otherwise
-	 * we might race against the EPOLLHUP.
-	 */
-	fdput(f);
 	return 0;
 
 error_eventfd:
 	eventfd_ctx_put(kirqfd->eventfd);
 
-error_fd_put:
-	fdput(f);
-
 error_kfree:
 	kfree(kirqfd);
 	return ret;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 65efb3735e79..70bc0d1f5f6a 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -303,7 +303,6 @@ static int
 kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 {
 	struct kvm_kernel_irqfd *irqfd, *tmp;
-	struct fd f;
 	struct eventfd_ctx *eventfd = NULL, *resamplefd = NULL;
 	int ret;
 	__poll_t events;
@@ -326,8 +325,8 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
 	seqcount_spinlock_init(&irqfd->irq_entry_sc, &kvm->irqfds.lock);
 
-	f = fdget(args->fd);
-	if (!fd_file(f)) {
+	CLASS(fd, f)(args->fd);
+	if (fd_empty(f)) {
 		ret = -EBADF;
 		goto out;
 	}
@@ -335,7 +334,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	eventfd = eventfd_ctx_fileget(fd_file(f));
 	if (IS_ERR(eventfd)) {
 		ret = PTR_ERR(eventfd);
-		goto fail;
+		goto out;
 	}
 
 	irqfd->eventfd = eventfd;
@@ -439,12 +438,6 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 #endif
 
 	srcu_read_unlock(&kvm->irq_srcu, idx);
-
-	/*
-	 * do not drop the file until the irqfd is fully initialized, otherwise
-	 * we might race against the EPOLLHUP
-	 */
-	fdput(f);
 	return 0;
 
 fail:
@@ -457,8 +450,6 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	if (eventfd && !IS_ERR(eventfd))
 		eventfd_ctx_put(eventfd);
 
-	fdput(f);
-
 out:
 	kfree(irqfd);
 	return ret;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 36/39] assorted variants of irqfd setup: convert to CLASS(fd)
  2024-07-30  5:16   ` [PATCH 36/39] assorted variants of irqfd setup: convert to CLASS(fd) viro
@ 2024-08-07 10:46     ` Christian Brauner
  2024-08-10  3:53       ` Al Viro
  0 siblings, 1 reply; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:46 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:22AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> in all of those failure exits prior to fdget() are plain returns and
> the only thing done after fdput() is (on failure exits) a kfree(),

They could also be converted to:

struct virqfd *virqfd __free(kfree) = NULL;

and then direct returns are usable.

> which can be done before fdput() just fine.
> 
> NOTE: in acrn_irqfd_assign() 'fail:' failure exit is wrong for
> eventfd_ctx_fileget() failure (we only want fdput() there) and once
> we stop doing that, it doesn't need to check if eventfd is NULL or
> ERR_PTR(...) there.
> 
> NOTE: in privcmd we move fdget() up before the allocation - more
> to the point, before the copy_from_user() attempt.
> 
> [trivial conflict in privcmd]
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 36/39] assorted variants of irqfd setup: convert to CLASS(fd)
  2024-08-07 10:46     ` Christian Brauner
@ 2024-08-10  3:53       ` Al Viro
  0 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-08-10  3:53 UTC (permalink / raw)
  To: Christian Brauner
  Cc: viro, linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev,
	torvalds

On Wed, Aug 07, 2024 at 12:46:35PM +0200, Christian Brauner wrote:
> On Tue, Jul 30, 2024 at 01:16:22AM GMT, viro@kernel.org wrote:
> > From: Al Viro <viro@zeniv.linux.org.uk>
> > 
> > in all of those failure exits prior to fdget() are plain returns and
> > the only thing done after fdput() is (on failure exits) a kfree(),
> 
> They could also be converted to:
> 
> struct virqfd *virqfd __free(kfree) = NULL;
> 
> and then direct returns are usable.

No.  They could be converted, but kfree() is *not* the right
destructor for that thing.  This is a good example of the
reasons why __cleanup should not be used blindly - this
"oh, we'll just return, this object will be taken care of
automagically" would better really take care of the object.
Look for goto error_eventfd; in there...

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 37/39] memcg_write_event_control(): switch to CLASS(fd)
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (34 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 36/39] assorted variants of irqfd setup: convert to CLASS(fd) viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:47     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 38/39] css_set_fork(): switch to CLASS(fd_raw, ...) viro
                     ` (2 subsequent siblings)
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

some reordering required - take both fdget() to the point before
the allocations, with matching move of fdput() to the very end
of failure exit(s); after that it converts trivially.

simplify the cleanups that involve css_put(), while we are at it...

[stuff moved in mainline]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 mm/memcontrol-v1.c | 44 +++++++++++++++-----------------------------
 1 file changed, 15 insertions(+), 29 deletions(-)

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 9725c731fb21..478bb3130c1b 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -1824,8 +1824,6 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	struct mem_cgroup_event *event;
 	struct cgroup_subsys_state *cfile_css;
 	unsigned int efd, cfd;
-	struct fd efile;
-	struct fd cfile;
 	struct dentry *cdentry;
 	const char *name;
 	char *endp;
@@ -1849,6 +1847,12 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	else
 		return -EINVAL;
 
+	CLASS(fd, efile)(efd);
+	if (fd_empty(efile))
+		return -EBADF;
+
+	CLASS(fd, cfile)(cfd);
+
 	event = kzalloc(sizeof(*event), GFP_KERNEL);
 	if (!event)
 		return -ENOMEM;
@@ -1859,20 +1863,13 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	init_waitqueue_func_entry(&event->wait, memcg_event_wake);
 	INIT_WORK(&event->remove, memcg_event_remove);
 
-	efile = fdget(efd);
-	if (!fd_file(efile)) {
-		ret = -EBADF;
-		goto out_kfree;
-	}
-
 	event->eventfd = eventfd_ctx_fileget(fd_file(efile));
 	if (IS_ERR(event->eventfd)) {
 		ret = PTR_ERR(event->eventfd);
-		goto out_put_efile;
+		goto out_kfree;
 	}
 
-	cfile = fdget(cfd);
-	if (!fd_file(cfile)) {
+	if (fd_empty(cfile)) {
 		ret = -EBADF;
 		goto out_put_eventfd;
 	}
@@ -1881,7 +1878,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	/* AV: shouldn't we check that it's been opened for read instead? */
 	ret = file_permission(fd_file(cfile), MAY_READ);
 	if (ret < 0)
-		goto out_put_cfile;
+		goto out_put_eventfd;
 
 	/*
 	 * The control file must be a regular cgroup1 file. As a regular cgroup
@@ -1890,7 +1887,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	cdentry = fd_file(cfile)->f_path.dentry;
 	if (cdentry->d_sb->s_type != &cgroup_fs_type || !d_is_reg(cdentry)) {
 		ret = -EINVAL;
-		goto out_put_cfile;
+		goto out_put_eventfd;
 	}
 
 	/*
@@ -1917,7 +1914,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 		event->unregister_event = memsw_cgroup_usage_unregister_event;
 	} else {
 		ret = -EINVAL;
-		goto out_put_cfile;
+		goto out_put_eventfd;
 	}
 
 	/*
@@ -1929,11 +1926,9 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 					       &memory_cgrp_subsys);
 	ret = -EINVAL;
 	if (IS_ERR(cfile_css))
-		goto out_put_cfile;
-	if (cfile_css != css) {
-		css_put(cfile_css);
-		goto out_put_cfile;
-	}
+		goto out_put_eventfd;
+	if (cfile_css != css)
+		goto out_put_css;
 
 	ret = event->register_event(memcg, event->eventfd, buf);
 	if (ret)
@@ -1944,23 +1939,14 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	spin_lock_irq(&memcg->event_list_lock);
 	list_add(&event->list, &memcg->event_list);
 	spin_unlock_irq(&memcg->event_list_lock);
-
-	fdput(cfile);
-	fdput(efile);
-
 	return nbytes;
 
 out_put_css:
-	css_put(css);
-out_put_cfile:
-	fdput(cfile);
+	css_put(cfile_css);
 out_put_eventfd:
 	eventfd_ctx_put(event->eventfd);
-out_put_efile:
-	fdput(efile);
 out_kfree:
 	kfree(event);
-
 	return ret;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 37/39] memcg_write_event_control(): switch to CLASS(fd)
  2024-07-30  5:16   ` [PATCH 37/39] memcg_write_event_control(): switch " viro
@ 2024-08-07 10:47     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:47 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:23AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> some reordering required - take both fdget() to the point before
> the allocations, with matching move of fdput() to the very end
> of failure exit(s); after that it converts trivially.
> 
> simplify the cleanups that involve css_put(), while we are at it...
> 
> [stuff moved in mainline]
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 38/39] css_set_fork(): switch to CLASS(fd_raw, ...)
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (35 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 37/39] memcg_write_event_control(): switch " viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:47     ` Christian Brauner
  2024-07-30  5:16   ` [PATCH 39/39] deal with the last remaing boolean uses of fd_file() viro
  2024-07-30  7:13   ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops Michal Hocko
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

reference acquired there by fget_raw() is not stashed anywhere -
we could as well borrow instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 kernel/cgroup/cgroup.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1244a8c8878e..0838301e4f65 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6409,7 +6409,6 @@ static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
 	struct cgroup *dst_cgrp = NULL;
 	struct css_set *cset;
 	struct super_block *sb;
-	struct file *f;
 
 	if (kargs->flags & CLONE_INTO_CGROUP)
 		cgroup_lock();
@@ -6426,14 +6425,14 @@ static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
 		return 0;
 	}
 
-	f = fget_raw(kargs->cgroup);
-	if (!f) {
+	CLASS(fd_raw, f)(kargs->cgroup);
+	if (fd_empty(f)) {
 		ret = -EBADF;
 		goto err;
 	}
-	sb = f->f_path.dentry->d_sb;
+	sb = fd_file(f)->f_path.dentry->d_sb;
 
-	dst_cgrp = cgroup_get_from_file(f);
+	dst_cgrp = cgroup_get_from_file(fd_file(f));
 	if (IS_ERR(dst_cgrp)) {
 		ret = PTR_ERR(dst_cgrp);
 		dst_cgrp = NULL;
@@ -6481,15 +6480,12 @@ static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
 	}
 
 	put_css_set(cset);
-	fput(f);
 	kargs->cgrp = dst_cgrp;
 	return ret;
 
 err:
 	cgroup_threadgroup_change_end(current);
 	cgroup_unlock();
-	if (f)
-		fput(f);
 	if (dst_cgrp)
 		cgroup_put(dst_cgrp);
 	put_css_set(cset);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 38/39] css_set_fork(): switch to CLASS(fd_raw, ...)
  2024-07-30  5:16   ` [PATCH 38/39] css_set_fork(): switch to CLASS(fd_raw, ...) viro
@ 2024-08-07 10:47     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:47 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:24AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> reference acquired there by fget_raw() is not stashed anywhere -
> we could as well borrow instead.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 39/39] deal with the last remaing boolean uses of fd_file()
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (36 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 38/39] css_set_fork(): switch to CLASS(fd_raw, ...) viro
@ 2024-07-30  5:16   ` viro
  2024-08-07 10:48     ` Christian Brauner
  2024-07-30  7:13   ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops Michal Hocko
  38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30  5:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds

From: Al Viro <viro@zeniv.linux.org.uk>

[one added in fs/stat.c]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/infiniband/core/uverbs_cmd.c | 8 +++-----
 fs/stat.c                            | 2 +-
 include/linux/cleanup.h              | 2 +-
 sound/core/pcm_native.c              | 2 +-
 4 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a4cce360df21..66b02fbf077a 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -584,7 +584,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
 	if (cmd.fd != -1) {
 		/* search for file descriptor */
 		f = fdget(cmd.fd);
-		if (!fd_file(f)) {
+		if (fd_empty(f)) {
 			ret = -EBADF;
 			goto err_tree_mutex_unlock;
 		}
@@ -632,8 +632,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
 		atomic_inc(&xrcd->usecnt);
 	}
 
-	if (fd_file(f))
-		fdput(f);
+	fdput(f);
 
 	mutex_unlock(&ibudev->xrcd_tree_mutex);
 	uobj_finalize_uobj_create(&obj->uobject, attrs);
@@ -648,8 +647,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
 	uobj_alloc_abort(&obj->uobject, attrs);
 
 err_tree_mutex_unlock:
-	if (fd_file(f))
-		fdput(f);
+	fdput(f);
 
 	mutex_unlock(&ibudev->xrcd_tree_mutex);
 
diff --git a/fs/stat.c b/fs/stat.c
index 56d6ce2b2c79..b7be119b070f 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -273,7 +273,7 @@ static int vfs_statx_fd(int fd, int flags, struct kstat *stat,
 			  u32 request_mask)
 {
 	CLASS(fd_raw, f)(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 	return vfs_statx_path(&fd_file(f)->f_path, flags, stat, request_mask);
 }
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index a3d3e888cf1f..72615212b911 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -98,7 +98,7 @@ const volatile void * __must_check_fn(const volatile void *val)
  * DEFINE_CLASS(fdget, struct fd, fdput(_T), fdget(fd), int fd)
  *
  *	CLASS(fdget, f)(fd);
- *	if (!fd_file(f))
+ *	if (fd_empty(f))
  *		return -EBADF;
  *
  *	// use 'f' without concern
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index cbb9c972cb93..320a7637b662 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -2250,7 +2250,7 @@ static int snd_pcm_link(struct snd_pcm_substream *substream, int fd)
 	bool nonatomic = substream->pcm->nonatomic;
 	CLASS(fd, f)(fd);
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADFD;
 	if (!is_pcm_file(fd_file(f)))
 		return -EBADFD;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH 39/39] deal with the last remaing boolean uses of fd_file()
  2024-07-30  5:16   ` [PATCH 39/39] deal with the last remaing boolean uses of fd_file() viro
@ 2024-08-07 10:48     ` Christian Brauner
  0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:48 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds

On Tue, Jul 30, 2024 at 01:16:25AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> [one added in fs/stat.c]
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
                     ` (37 preceding siblings ...)
  2024-07-30  5:16   ` [PATCH 39/39] deal with the last remaing boolean uses of fd_file() viro
@ 2024-07-30  7:13   ` Michal Hocko
  2024-07-30  7:18     ` Al Viro
  38 siblings, 1 reply; 134+ messages in thread
From: Michal Hocko @ 2024-07-30  7:13 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
	torvalds

On Tue 30-07-24 01:15:47, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> we are *not* guaranteed that anything past the terminating NUL
> is mapped (let alone initialized with anything sane).
> 
> [the sucker got moved in mainline]
> 

You could have preserved
Fixes: 0dea116876ee ("cgroup: implement eventfd-based generic API for notifications")
Cc: stable

> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

and
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/memcontrol-v1.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
> index 2aeea4d8bf8e..417c96f2da28 100644
> --- a/mm/memcontrol-v1.c
> +++ b/mm/memcontrol-v1.c
> @@ -1842,9 +1842,12 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
>  	buf = endp + 1;
>  
>  	cfd = simple_strtoul(buf, &endp, 10);
> -	if ((*endp != ' ') && (*endp != '\0'))
> +	if (*endp == '\0')
> +		buf = endp;
> +	else if (*endp == ' ')
> +		buf = endp + 1;
> +	else
>  		return -EINVAL;
> -	buf = endp + 1;
>  
>  	event = kzalloc(sizeof(*event), GFP_KERNEL);
>  	if (!event)
> -- 
> 2.39.2
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops
  2024-07-30  7:13   ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops Michal Hocko
@ 2024-07-30  7:18     ` Al Viro
  2024-07-30  7:37       ` Michal Hocko
  0 siblings, 1 reply; 134+ messages in thread
From: Al Viro @ 2024-07-30  7:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: viro, linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
	torvalds

On Tue, Jul 30, 2024 at 09:13:38AM +0200, Michal Hocko wrote:
> On Tue 30-07-24 01:15:47, viro@kernel.org wrote:
> > From: Al Viro <viro@zeniv.linux.org.uk>
> > 
> > we are *not* guaranteed that anything past the terminating NUL
> > is mapped (let alone initialized with anything sane).
> > 
> > [the sucker got moved in mainline]
> > 
> 
> You could have preserved
> Fixes: 0dea116876ee ("cgroup: implement eventfd-based generic API for notifications")
> Cc: stable
> 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> 
> and
> Acked-by: Michal Hocko <mhocko@suse.com>

Will do; FWIW, I think it would be better off going via the
cgroup tree - it's completely orthogonal to the rest of the
series, the only relation being "got caught during the same
audit"...

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops
  2024-07-30  7:18     ` Al Viro
@ 2024-07-30  7:37       ` Michal Hocko
  0 siblings, 0 replies; 134+ messages in thread
From: Michal Hocko @ 2024-07-30  7:37 UTC (permalink / raw)
  To: Al Viro, Andrew Morton
  Cc: viro, linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
	torvalds

I think it can go through Andrew as well. The patch is 
https://lore.kernel.org/all/20240730051625.14349-1-viro@kernel.org/T/#u

On Tue 30-07-24 08:18:51, Al Viro wrote:
> On Tue, Jul 30, 2024 at 09:13:38AM +0200, Michal Hocko wrote:
> > On Tue 30-07-24 01:15:47, viro@kernel.org wrote:
> > > From: Al Viro <viro@zeniv.linux.org.uk>
> > > 
> > > we are *not* guaranteed that anything past the terminating NUL
> > > is mapped (let alone initialized with anything sane).
> > > 
> > > [the sucker got moved in mainline]
> > > 
> > 
> > You could have preserved
> > Fixes: 0dea116876ee ("cgroup: implement eventfd-based generic API for notifications")
> > Cc: stable
> > 
> > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > 
> > and
> > Acked-by: Michal Hocko <mhocko@suse.com>
> 
> Will do; FWIW, I think it would be better off going via the
> cgroup tree - it's completely orthogonal to the rest of the
> series, the only relation being "got caught during the same
> audit"...

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCHSET][RFC] struct fd and memory safety
  2024-07-30  5:09 [PATCHSET][RFC] struct fd and memory safety Al Viro
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
@ 2024-07-30  5:17 ` Al Viro
  2024-07-30 20:02 ` Josef Bacik
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-07-30  5:17 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, bpf, Amir Goldstein, kvm,
	cgroups, netdev

On Tue, Jul 30, 2024 at 06:09:27AM +0100, Al Viro wrote:
 
> Individual patches in followups; it definitely needs review and testing.

... followups sent via kernel.org account, due to git-send-email missing
on zeniv at the moment ;-/

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCHSET][RFC] struct fd and memory safety
  2024-07-30  5:09 [PATCHSET][RFC] struct fd and memory safety Al Viro
  2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
  2024-07-30  5:17 ` [PATCHSET][RFC] struct fd and memory safety Al Viro
@ 2024-07-30 20:02 ` Josef Bacik
  2024-07-31  0:43 ` Al Viro
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 134+ messages in thread
From: Josef Bacik @ 2024-07-30 20:02 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner, bpf,
	Amir Goldstein, kvm, cgroups, netdev

On Tue, Jul 30, 2024 at 06:09:27AM +0100, Al Viro wrote:
> 		Background
> 		==========
> 

<snip>

I reviewed the best I could, validated all the error paths, checked the
resulting code.  There's a few places where you've left // style comments that I
assume you'll clean up before you are done.  I had two questions, one about
formatting and one about a changelog.  I don't think either of those are going
to affect my review if you change them, so you can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

to the series.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCHSET][RFC] struct fd and memory safety
  2024-07-30  5:09 [PATCHSET][RFC] struct fd and memory safety Al Viro
                   ` (2 preceding siblings ...)
  2024-07-30 20:02 ` Josef Bacik
@ 2024-07-31  0:43 ` Al Viro
  2024-08-06 17:58 ` Jason Gunthorpe
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-07-31  0:43 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, bpf, Amir Goldstein, kvm,
	cgroups, netdev

On Tue, Jul 30, 2024 at 06:09:27AM +0100, Al Viro wrote:

> 	10a.  All calls of fdput_pos() that return a non-empty
	                   fdget_pos(), that is.

> value are followed by exactly one call of fdput_pos().

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCHSET][RFC] struct fd and memory safety
  2024-07-30  5:09 [PATCHSET][RFC] struct fd and memory safety Al Viro
                   ` (3 preceding siblings ...)
  2024-07-31  0:43 ` Al Viro
@ 2024-08-06 17:58 ` Jason Gunthorpe
  2024-08-06 18:56   ` Al Viro
  2024-08-07 10:51 ` Christian Brauner
  2024-11-02  5:02 ` [PATCHSET v3] " Al Viro
  6 siblings, 1 reply; 134+ messages in thread
From: Jason Gunthorpe @ 2024-08-06 17:58 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner, bpf,
	Amir Goldstein, kvm, cgroups, netdev

On Tue, Jul 30, 2024 at 06:09:27AM +0100, Al Viro wrote:

> 	* ib_uverbs_open_xrcd().  FWIW, a closer look shows that the
> damn thing is buggy - it accepts _any_ descriptor and pins the associated
> inode.  mount tmpfs, open a file there, feed it to that, unmount and
> watch the show...

What happens? There is still an igrab() while it is in the red black
tree?

> AFAICS, that's done for the sake of libibverbs and
> I've no idea how it's actually used - all examples I'd been able to
> find use -1 for descriptor here.  Needs to be discussed with infiniband
> folks (Sean Hefty?).  For now, leave that as-is.

The design seems insane, but it is what it is from 20 years ago..

Userspace can affiliate this "xrc domain" with a file in the
filesystem. Any file. That is actually a deliberate part of the API.

This is done as some ugly way to pass xrc domain object from process A
to process B. IIRC the idea is process A will affiliate the object
with a file and then B will be able to access the shared object if B
is able to open the file.

It looks like the code keeps a red/black tree of this association, and
holds an igrab while the inode is in that tree..

Jason

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCHSET][RFC] struct fd and memory safety
  2024-08-06 17:58 ` Jason Gunthorpe
@ 2024-08-06 18:56   ` Al Viro
  0 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-08-06 18:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-fsdevel, Linus Torvalds, Christian Brauner, bpf,
	Amir Goldstein, kvm, cgroups, netdev

On Tue, Aug 06, 2024 at 02:58:59PM -0300, Jason Gunthorpe wrote:
> On Tue, Jul 30, 2024 at 06:09:27AM +0100, Al Viro wrote:
> 
> > 	* ib_uverbs_open_xrcd().  FWIW, a closer look shows that the
> > damn thing is buggy - it accepts _any_ descriptor and pins the associated
> > inode.  mount tmpfs, open a file there, feed it to that, unmount and
> > watch the show...
> 
> What happens? There is still an igrab() while it is in the red black
> tree?

... which does not render the mount busy.

> > AFAICS, that's done for the sake of libibverbs and
> > I've no idea how it's actually used - all examples I'd been able to
> > find use -1 for descriptor here.  Needs to be discussed with infiniband
> > folks (Sean Hefty?).  For now, leave that as-is.
> 
> The design seems insane, but it is what it is from 20 years ago..
> 
> Userspace can affiliate this "xrc domain" with a file in the
> filesystem. Any file. That is actually a deliberate part of the API.
> 
> This is done as some ugly way to pass xrc domain object from process A
> to process B. IIRC the idea is process A will affiliate the object
> with a file and then B will be able to access the shared object if B
> is able to open the file.
> 
> It looks like the code keeps a red/black tree of this association, and
> holds an igrab while the inode is in that tree..

You need a mount (or file) reference to prevent fs destruction by umount.
igrab() pins an _inode_, but the caller must arrange for the hosting
filesystem to stay alive.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCHSET][RFC] struct fd and memory safety
  2024-07-30  5:09 [PATCHSET][RFC] struct fd and memory safety Al Viro
                   ` (4 preceding siblings ...)
  2024-08-06 17:58 ` Jason Gunthorpe
@ 2024-08-07 10:51 ` Christian Brauner
  2024-11-02  5:02 ` [PATCHSET v3] " Al Viro
  6 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:51 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, bpf, Amir Goldstein, kvm, cgroups,
	netdev

On Tue, Jul 30, 2024 at 06:09:27AM GMT, Al Viro wrote:
> 		Background
> 		==========

That series looks good and pretty simple to me. There was nothing really
suprising in that code.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCHSET v3] struct fd and memory safety
  2024-07-30  5:09 [PATCHSET][RFC] struct fd and memory safety Al Viro
                   ` (5 preceding siblings ...)
  2024-08-07 10:51 ` Christian Brauner
@ 2024-11-02  5:02 ` Al Viro
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
  6 siblings, 1 reply; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:02 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Linus Torvalds, Christian Brauner, kvm, cgroups, netdev

	struct fd stuff got rebased (with fairly minor conflicts), branch
lives in the same place -
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git work.fd

	Changes since the previous version:
* branch rebased to 6.12-rc2
* the fixes gone into mainline.
* so's the conversion to new layout and accessors.
* bpf side of things (with modifications) is gone into mainline (via bpf tree).
* struct fderr side dropped - overlayfs doesn't need that anymore and while it's
possible that use cases show up, for now there's none.
* coda_parse_fd() part dropped - no longer valid due to mainline changes.
* fs/xattr.c and fs/stat.c changes moved to separate branches (#work.xattr2 and
#work.statx2 resp.)

Individual patches in followups; review and testing would be welcome.
If no objections materialize, I'm going to put that into #for-next on
Monday.

Diffstat:
 arch/alpha/kernel/osf_sys.c                |   5 +-
 arch/arm/kernel/sys_oabi-compat.c          |  10 +-
 arch/powerpc/kvm/book3s_64_vio.c           |  21 +-
 arch/powerpc/kvm/powerpc.c                 |  24 +--
 arch/powerpc/platforms/cell/spu_syscalls.c |  68 +++----
 arch/x86/kernel/cpu/sgx/main.c             |  10 +-
 arch/x86/kvm/svm/sev.c                     |  39 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c  |  23 +--
 drivers/gpu/drm/drm_syncobj.c              |   9 +-
 drivers/infiniband/core/ucma.c             |  19 +-
 drivers/infiniband/core/uverbs_cmd.c       |   8 +-
 drivers/media/mc/mc-request.c              |  18 +-
 drivers/media/rc/lirc_dev.c                |  13 +-
 drivers/vfio/group.c                       |   6 +-
 drivers/vfio/virqfd.c                      |  16 +-
 drivers/virt/acrn/irqfd.c                  |  13 +-
 drivers/xen/privcmd.c                      |  28 +--
 fs/btrfs/ioctl.c                           |   5 +-
 fs/eventfd.c                               |   9 +-
 fs/eventpoll.c                             |  38 ++--
 fs/ext4/ioctl.c                            |  21 +-
 fs/f2fs/file.c                             |  15 +-
 fs/fcntl.c                                 |  42 ++--
 fs/fhandle.c                               |   5 +-
 fs/fsopen.c                                |  19 +-
 fs/fuse/dev.c                              |   6 +-
 fs/ioctl.c                                 |  23 +--
 fs/kernel_read_file.c                      |  12 +-
 fs/locks.c                                 |  15 +-
 fs/namei.c                                 |  13 +-
 fs/namespace.c                             |  47 ++---
 fs/notify/fanotify/fanotify_user.c         |  44 ++---
 fs/notify/inotify/inotify_user.c           |  38 ++--
 fs/ocfs2/cluster/heartbeat.c               |  24 +--
 fs/open.c                                  |  61 +++---
 fs/quota/quota.c                           |  12 +-
 fs/read_write.c                            | 145 +++++---------
 fs/readdir.c                               |  28 +--
 fs/remap_range.c                           |  11 +-
 fs/select.c                                |  48 ++---
 fs/signalfd.c                              |   9 +-
 fs/smb/client/ioctl.c                      |  11 +-
 fs/splice.c                                |  78 +++-----
 fs/statfs.c                                |  12 +-
 fs/sync.c                                  |  29 ++-
 fs/timerfd.c                               |  40 ++--
 fs/utimes.c                                |  11 +-
 fs/xfs/xfs_exchrange.c                     |  18 +-
 fs/xfs/xfs_handle.c                        |  16 +-
 fs/xfs/xfs_ioctl.c                         |  69 ++-----
 include/linux/cleanup.h                    |   2 +-
 include/linux/file.h                       |   7 +-
 include/linux/netlink.h                    |   2 +-
 io_uring/sqpoll.c                          |  29 +--
 ipc/mqueue.c                               | 109 +++--------
 kernel/cgroup/cgroup.c                     |  21 +-
 kernel/events/core.c                       |  63 ++----
 kernel/module/main.c                       |  15 +-
 kernel/nsproxy.c                           |   5 +-
 kernel/pid.c                               |  20 +-
 kernel/signal.c                            |  29 +--
 kernel/sys.c                               |  15 +-
 kernel/taskstats.c                         |  18 +-
 kernel/watch_queue.c                       |   6 +-
 mm/fadvise.c                               |  10 +-
 mm/filemap.c                               |  17 +-
 mm/memcontrol-v1.c                         |  44 ++---
 mm/readahead.c                             |  17 +-
 net/core/net_namespace.c                   |  10 +-
 net/netlink/af_netlink.c                   |   9 +-
 net/socket.c                               | 303 +++++++++++++----------------
 security/integrity/ima/ima_main.c          |   7 +-
 security/landlock/syscalls.c               |  45 ++---
 security/loadpin/loadpin.c                 |   8 +-
 sound/core/pcm_native.c                    |   2 +-
 virt/kvm/eventfd.c                         |  15 +-
 virt/kvm/vfio.c                            |  14 +-
 77 files changed, 751 insertions(+), 1395 deletions(-)

Shortlog and commit summaries:

01/28) net/socket.c: switch to CLASS(fd)
	Get rid of the sockfd_lookup_light() and associated irregularities;
fput_light() gone, old users of sockfd_lookup_light() switched to CLASS(fd) +
sock_from_file().

02/28) regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range()

Getting rid of passing struct fd by reference:
03/28) timerfd: switch to CLASS(fd, ...)
04/28) get rid of perf_fget_light(), convert kernel/events/core.c to CLASS(fd)

do_mq_notify() regularization:
05/28) switch netlink_getsockbyfilp() to taking descriptor
06/28) do_mq_notify(): saner skb freeing on failures
07/28) do_mq_notify(): switch to CLASS(fd, ...)

After that the weirdness with reassignments in do_mq_notify() is gone
(and, IMO, the result is easier to follow).

08/28) simplify xfs_find_handle() a bit
	Massage to get rid of reassignment there; simplifies control flow...

Making sure that fdget() and fdput() are done in the same function:
09/28) convert vmsplice() to CLASS(fd, ...)

Deal with fdget_raw() and fdget_pos() users - all trivial to convert.
10/28) fdget_raw() users: switch to CLASS(fd_raw, ...)
11/28) introduce "fd_pos" class, convert fdget_pos() users to it.

Prep for fdget() conversions:
12/28) o2hb_region_dev_store(): avoid goto around fdget()/fdput()
13/28) privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget()

14/28) fdget(), trivial conversions.
	Big one: all callers that have fdget() done the first thing in
scope, with all matching fdput() immediately followed by leaving the
scope.  All of those are trivial to convert.
15/28) fdget(), more trivial conversions
	Same, except that fdget() is preceded by some work.  All fdput()
are still immediately followed by leaving the scope.  These are also
trivial to convert, and along with the previous commit that takes care
of the majority of fdget() calls.

16/28) convert do_preadv()/do_pwritev()
	fdput() is transposable with everything done after it (inc_syscw()
et.al.)
17/28) convert cachestat(2)
	fdput() is transposable with copy_to_user() downstream of it.

18/28) switch spufs_calls_{get,put}() to CLASS() use
19/28) convert spu_run(2)
	fdput() used to be followed by spufs_calls_put(); we could transpose
those two, but spufs_calls_get()/spufd_calls_put() itself can be converted
to CLASS() use and it's cleaner that way.

20/28) convert media_request_get_by_fd()
	fdput() is transposable with debugging printk

21/28) convert cifs_ioctl_copychunk()
	fdput() moved past mnt_drop_file_write(); harmless, if somewhat
cringeworthy.  Reordering could be avoided either by adding an explicit
scope or by making mnt_drop_file_write() called via __cleanup...

22/28) convert vfs_dedupe_file_range()
	fdput() is followed by checking fatal_signal_pending() (and aborting
the loop in such case).  fdput() is transposable with that check.
Yes, it'll probably end up with slightly fatter code (call after the
check has returned false + call on the almost never taken out-of-line path
instead of one call before the check), but it's not worth bothering with
explicit extra scope there (or dragging the check into the loop condition,
for that matter).

23/28) convert do_select()
	take the logics from fdget() to fdput() into an inlined helper -
with existing wait_key_set() subsumed into that.
24/28) do_pollfd(): convert to CLASS(fd)
	lift setting ->revents into the caller, so that failure exits
(including the early one) would be plain returns.

25/28) assorted variants of irqfd setup: convert to CLASS(fd)
	fdput() is transposable with kfree(); some reordering
is required in one of those (we do fdget() a bit earlier there).
26/28) memcg_write_event_control(): switch to CLASS(fd)
	similar to the previous.  As the matter of fact, there
might be a missing common helper or two hiding in both...

27/28) css_set_fork(): switch to CLASS(fd_raw, ...)
	could be separated from the series; its use of fget_raw()
could be converted to fdget_raw(), with the result convertible to
CLASS(fd_raw)

28/28) deal with the last remaing boolean uses of fd_file()
	most of them had been converted to fd_empty() by now; pick
the few remaining strugglers.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v3 01/28] net/socket.c: switch to CLASS(fd)
  2024-11-02  5:02 ` [PATCHSET v3] " Al Viro
@ 2024-11-02  5:07   ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 02/28] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range() Al Viro
                       ` (27 more replies)
  0 siblings, 28 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

	The important part in sockfd_lookup_light() is avoiding needless
file refcount operations, not the marginal reduction of the register
pressure from not keeping a struct file pointer in the caller.

	Switch to use fdget()/fdpu(); with sane use of CLASS(fd) we can
get a better code generation...

	Would be nice if somebody tested it on networking test suites
(including benchmarks)...

	sockfd_lookup_light() does fdget(), uses sock_from_file() to
get the associated socket and returns the struct socket reference to
the caller, along with "do we need to fput()" flag.  No matching fdput(),
the caller does its equivalent manually, using the fact that sock->file
points to the struct file the socket has come from.

	Get rid of that - have the callers do fdget()/fdput() and
use sock_from_file() directly.  That kills sockfd_lookup_light()
and fput_light() (no users left).

	What's more, we can get rid of explicit fdget()/fdput() by
switching to CLASS(fd, ...) - code generation does not suffer, since
now fdput() inserted on "descriptor is not opened" failure exit
is recognized to be a no-op by compiler.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/file.h |   6 -
 net/socket.c         | 303 +++++++++++++++++++------------------------
 2 files changed, 137 insertions(+), 172 deletions(-)

diff --git a/include/linux/file.h b/include/linux/file.h
index f98de143245a..b49a92295b3f 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -30,12 +30,6 @@ extern struct file *alloc_file_pseudo_noaccount(struct inode *, struct vfsmount
 extern struct file *alloc_file_clone(struct file *, int flags,
 	const struct file_operations *);
 
-static inline void fput_light(struct file *file, int fput_needed)
-{
-	if (fput_needed)
-		fput(file);
-}
-
 /* either a reference to struct file + flags
  * (cloned vs. borrowed, pos locked), with
  * flags stored in lower bits of value,
diff --git a/net/socket.c b/net/socket.c
index 601ad74930ef..fb3806a11f94 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -509,7 +509,7 @@ static int sock_map_fd(struct socket *sock, int flags)
 
 struct socket *sock_from_file(struct file *file)
 {
-	if (file->f_op == &socket_file_ops)
+	if (likely(file->f_op == &socket_file_ops))
 		return file->private_data;	/* set in sock_alloc_file */
 
 	return NULL;
@@ -549,24 +549,6 @@ struct socket *sockfd_lookup(int fd, int *err)
 }
 EXPORT_SYMBOL(sockfd_lookup);
 
-static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed)
-{
-	struct fd f = fdget(fd);
-	struct socket *sock;
-
-	*err = -EBADF;
-	if (fd_file(f)) {
-		sock = sock_from_file(fd_file(f));
-		if (likely(sock)) {
-			*fput_needed = f.word & FDPUT_FPUT;
-			return sock;
-		}
-		*err = -ENOTSOCK;
-		fdput(f);
-	}
-	return NULL;
-}
-
 static ssize_t sockfs_listxattr(struct dentry *dentry, char *buffer,
 				size_t size)
 {
@@ -1853,16 +1835,20 @@ int __sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen)
 {
 	struct socket *sock;
 	struct sockaddr_storage address;
-	int err, fput_needed;
-
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock) {
-		err = move_addr_to_kernel(umyaddr, addrlen, &address);
-		if (!err)
-			err = __sys_bind_socket(sock, &address, addrlen);
-		fput_light(sock->file, fput_needed);
-	}
-	return err;
+	CLASS(fd, f)(fd);
+	int err;
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+
+	err = move_addr_to_kernel(umyaddr, addrlen, &address);
+	if (unlikely(err))
+		return err;
+
+	return __sys_bind_socket(sock, &address, addrlen);
 }
 
 SYSCALL_DEFINE3(bind, int, fd, struct sockaddr __user *, umyaddr, int, addrlen)
@@ -1891,15 +1877,16 @@ int __sys_listen_socket(struct socket *sock, int backlog)
 
 int __sys_listen(int fd, int backlog)
 {
+	CLASS(fd, f)(fd);
 	struct socket *sock;
-	int err, fput_needed;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock) {
-		err = __sys_listen_socket(sock, backlog);
-		fput_light(sock->file, fput_needed);
-	}
-	return err;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+
+	return __sys_listen_socket(sock, backlog);
 }
 
 SYSCALL_DEFINE2(listen, int, fd, int, backlog)
@@ -2009,17 +1996,12 @@ static int __sys_accept4_file(struct file *file, struct sockaddr __user *upeer_s
 int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
 		  int __user *upeer_addrlen, int flags)
 {
-	int ret = -EBADF;
-	struct fd f;
+	CLASS(fd, f)(fd);
 
-	f = fdget(fd);
-	if (fd_file(f)) {
-		ret = __sys_accept4_file(fd_file(f), upeer_sockaddr,
+	if (fd_empty(f))
+		return -EBADF;
+	return __sys_accept4_file(fd_file(f), upeer_sockaddr,
 					 upeer_addrlen, flags);
-		fdput(f);
-	}
-
-	return ret;
 }
 
 SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr,
@@ -2071,20 +2053,18 @@ int __sys_connect_file(struct file *file, struct sockaddr_storage *address,
 
 int __sys_connect(int fd, struct sockaddr __user *uservaddr, int addrlen)
 {
-	int ret = -EBADF;
-	struct fd f;
+	struct sockaddr_storage address;
+	CLASS(fd, f)(fd);
+	int ret;
 
-	f = fdget(fd);
-	if (fd_file(f)) {
-		struct sockaddr_storage address;
+	if (fd_empty(f))
+		return -EBADF;
 
-		ret = move_addr_to_kernel(uservaddr, addrlen, &address);
-		if (!ret)
-			ret = __sys_connect_file(fd_file(f), &address, addrlen, 0);
-		fdput(f);
-	}
+	ret = move_addr_to_kernel(uservaddr, addrlen, &address);
+	if (ret)
+		return ret;
 
-	return ret;
+	return __sys_connect_file(fd_file(f), &address, addrlen, 0);
 }
 
 SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
@@ -2103,26 +2083,25 @@ int __sys_getsockname(int fd, struct sockaddr __user *usockaddr,
 {
 	struct socket *sock;
 	struct sockaddr_storage address;
-	int err, fput_needed;
+	CLASS(fd, f)(fd);
+	int err;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	err = security_socket_getsockname(sock);
 	if (err)
-		goto out_put;
+		return err;
 
 	err = READ_ONCE(sock->ops)->getname(sock, (struct sockaddr *)&address, 0);
 	if (err < 0)
-		goto out_put;
-	/* "err" is actually length in this case */
-	err = move_addr_to_user(&address, err, usockaddr, usockaddr_len);
+		return err;
 
-out_put:
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	/* "err" is actually length in this case */
+	return move_addr_to_user(&address, err, usockaddr, usockaddr_len);
 }
 
 SYSCALL_DEFINE3(getsockname, int, fd, struct sockaddr __user *, usockaddr,
@@ -2141,26 +2120,25 @@ int __sys_getpeername(int fd, struct sockaddr __user *usockaddr,
 {
 	struct socket *sock;
 	struct sockaddr_storage address;
-	int err, fput_needed;
+	CLASS(fd, f)(fd);
+	int err;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock != NULL) {
-		const struct proto_ops *ops = READ_ONCE(sock->ops);
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-		err = security_socket_getpeername(sock);
-		if (err) {
-			fput_light(sock->file, fput_needed);
-			return err;
-		}
+	err = security_socket_getpeername(sock);
+	if (err)
+		return err;
 
-		err = ops->getname(sock, (struct sockaddr *)&address, 1);
-		if (err >= 0)
-			/* "err" is actually length in this case */
-			err = move_addr_to_user(&address, err, usockaddr,
-						usockaddr_len);
-		fput_light(sock->file, fput_needed);
-	}
-	return err;
+	err = READ_ONCE(sock->ops)->getname(sock, (struct sockaddr *)&address, 1);
+	if (err < 0)
+		return err;
+
+	/* "err" is actually length in this case */
+	return move_addr_to_user(&address, err, usockaddr, usockaddr_len);
 }
 
 SYSCALL_DEFINE3(getpeername, int, fd, struct sockaddr __user *, usockaddr,
@@ -2181,14 +2159,17 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 	struct sockaddr_storage address;
 	int err;
 	struct msghdr msg;
-	int fput_needed;
 
 	err = import_ubuf(ITER_SOURCE, buff, len, &msg.msg_iter);
 	if (unlikely(err))
 		return err;
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
+
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	msg.msg_name = NULL;
 	msg.msg_control = NULL;
@@ -2198,7 +2179,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 	if (addr) {
 		err = move_addr_to_kernel(addr, addr_len, &address);
 		if (err < 0)
-			goto out_put;
+			return err;
 		msg.msg_name = (struct sockaddr *)&address;
 		msg.msg_namelen = addr_len;
 	}
@@ -2206,12 +2187,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 	msg.msg_flags = flags;
-	err = __sock_sendmsg(sock, &msg);
-
-out_put:
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	return __sock_sendmsg(sock, &msg);
 }
 
 SYSCALL_DEFINE6(sendto, int, fd, void __user *, buff, size_t, len,
@@ -2246,14 +2222,18 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
 	};
 	struct socket *sock;
 	int err, err2;
-	int fput_needed;
 
 	err = import_ubuf(ITER_DEST, ubuf, size, &msg.msg_iter);
 	if (unlikely(err))
 		return err;
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
+
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
@@ -2265,9 +2245,6 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
 		if (err2 < 0)
 			err = err2;
 	}
-
-	fput_light(sock->file, fput_needed);
-out:
 	return err;
 }
 
@@ -2342,17 +2319,16 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
 {
 	sockptr_t optval = USER_SOCKPTR(user_optval);
 	bool compat = in_compat_syscall();
-	int err, fput_needed;
 	struct socket *sock;
+	CLASS(fd, f)(fd);
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
-
-	err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen);
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-	fput_light(sock->file, fput_needed);
-	return err;
+	return do_sock_setsockopt(sock, compat, level, optname, optval, optlen);
 }
 
 SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
@@ -2408,20 +2384,17 @@ EXPORT_SYMBOL(do_sock_getsockopt);
 int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
 		int __user *optlen)
 {
-	int err, fput_needed;
 	struct socket *sock;
-	bool compat;
+	CLASS(fd, f)(fd);
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-	compat = in_compat_syscall();
-	err = do_sock_getsockopt(sock, compat, level, optname,
+	return do_sock_getsockopt(sock, in_compat_syscall(), level, optname,
 				 USER_SOCKPTR(optval), USER_SOCKPTR(optlen));
-
-	fput_light(sock->file, fput_needed);
-	return err;
 }
 
 SYSCALL_DEFINE5(getsockopt, int, fd, int, level, int, optname,
@@ -2447,15 +2420,16 @@ int __sys_shutdown_sock(struct socket *sock, int how)
 
 int __sys_shutdown(int fd, int how)
 {
-	int err, fput_needed;
 	struct socket *sock;
+	CLASS(fd, f)(fd);
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (sock != NULL) {
-		err = __sys_shutdown_sock(sock, how);
-		fput_light(sock->file, fput_needed);
-	}
-	return err;
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+
+	return __sys_shutdown_sock(sock, how);
 }
 
 SYSCALL_DEFINE2(shutdown, int, fd, int, how)
@@ -2671,22 +2645,21 @@ long __sys_sendmsg_sock(struct socket *sock, struct msghdr *msg,
 long __sys_sendmsg(int fd, struct user_msghdr __user *msg, unsigned int flags,
 		   bool forbid_cmsg_compat)
 {
-	int fput_needed, err;
 	struct msghdr msg_sys;
 	struct socket *sock;
 
 	if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
 		return -EINVAL;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
+	CLASS(fd, f)(fd);
 
-	err = ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, 0);
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	return ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, 0);
 }
 
 SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int, flags)
@@ -2701,7 +2674,7 @@ SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int
 int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 		   unsigned int flags, bool forbid_cmsg_compat)
 {
-	int fput_needed, err, datagrams;
+	int err, datagrams;
 	struct socket *sock;
 	struct mmsghdr __user *entry;
 	struct compat_mmsghdr __user *compat_entry;
@@ -2717,9 +2690,13 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 
 	datagrams = 0;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	used_address.name_len = UINT_MAX;
 	entry = mmsg;
@@ -2756,8 +2733,6 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
 		cond_resched();
 	}
 
-	fput_light(sock->file, fput_needed);
-
 	/* We only return an error if no datagrams were able to be sent */
 	if (datagrams != 0)
 		return datagrams;
@@ -2879,22 +2854,21 @@ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg,
 long __sys_recvmsg(int fd, struct user_msghdr __user *msg, unsigned int flags,
 		   bool forbid_cmsg_compat)
 {
-	int fput_needed, err;
 	struct msghdr msg_sys;
 	struct socket *sock;
 
 	if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
 		return -EINVAL;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		goto out;
+	CLASS(fd, f)(fd);
 
-	err = ___sys_recvmsg(sock, msg, &msg_sys, flags, 0);
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
-	fput_light(sock->file, fput_needed);
-out:
-	return err;
+	return ___sys_recvmsg(sock, msg, &msg_sys, flags, 0);
 }
 
 SYSCALL_DEFINE3(recvmsg, int, fd, struct user_msghdr __user *, msg,
@@ -2911,7 +2885,7 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 			  unsigned int vlen, unsigned int flags,
 			  struct timespec64 *timeout)
 {
-	int fput_needed, err, datagrams;
+	int err, datagrams;
 	struct socket *sock;
 	struct mmsghdr __user *entry;
 	struct compat_mmsghdr __user *compat_entry;
@@ -2926,16 +2900,18 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 
 	datagrams = 0;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	sock = sock_from_file(fd_file(f));
+	if (unlikely(!sock))
+		return -ENOTSOCK;
 
 	if (likely(!(flags & MSG_ERRQUEUE))) {
 		err = sock_error(sock->sk);
-		if (err) {
-			datagrams = err;
-			goto out_put;
-		}
+		if (err)
+			return err;
 	}
 
 	entry = mmsg;
@@ -2992,12 +2968,10 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 	}
 
 	if (err == 0)
-		goto out_put;
+		return datagrams;
 
-	if (datagrams == 0) {
-		datagrams = err;
-		goto out_put;
-	}
+	if (datagrams == 0)
+		return err;
 
 	/*
 	 * We may return less entries than requested (vlen) if the
@@ -3012,9 +2986,6 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 		 */
 		WRITE_ONCE(sock->sk->sk_err, -err);
 	}
-out_put:
-	fput_light(sock->file, fput_needed);
-
 	return datagrams;
 }
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 02/28] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range()
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 03/28] timerfd: switch to CLASS(fd) Al Viro
                       ` (26 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

With few exceptions emptiness checks are done as fd_file(...) in boolean
context (usually something like if (!fd_file(f))...); those will be
taken care of later.

However, there's a couple of places where we do those checks as
'store fd_file(...) into a variable, then check if this variable is
NULL' and those are harder to spot.

Get rid of those now.

use fd_empty() instead of extracting file and then checking it for NULL.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/remap_range.c     | 5 ++---
 kernel/module/main.c | 4 +++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/remap_range.c b/fs/remap_range.c
index 4403d5c68fcb..017d0d1ea6c9 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -537,9 +537,8 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 
 	for (i = 0, info = same->info; i < count; i++, info++) {
 		struct fd dst_fd = fdget(info->dest_fd);
-		struct file *dst_file = fd_file(dst_fd);
 
-		if (!dst_file) {
+		if (fd_empty(dst_fd)) {
 			info->status = -EBADF;
 			goto next_loop;
 		}
@@ -549,7 +548,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 			goto next_fdput;
 		}
 
-		deduped = vfs_dedupe_file_range_one(file, off, dst_file,
+		deduped = vfs_dedupe_file_range_one(file, off, fd_file(dst_fd),
 						    info->dest_offset, len,
 						    REMAP_FILE_CAN_SHORTEN);
 		if (deduped == -EBADE)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 49b9bca9de12..d785973d8a51 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3202,7 +3202,7 @@ static int idempotent_init_module(struct file *f, const char __user * uargs, int
 {
 	struct idempotent idem;
 
-	if (!f || !(f->f_mode & FMODE_READ))
+	if (!(f->f_mode & FMODE_READ))
 		return -EBADF;
 
 	/* Are we the winners of the race and get to do this? */
@@ -3234,6 +3234,8 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
 		return -EINVAL;
 
 	f = fdget(fd);
+	if (fd_empty(f))
+		return -EBADF;
 	err = idempotent_init_module(fd_file(f), uargs, flags);
 	fdput(f);
 	return err;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 03/28] timerfd: switch to CLASS(fd)
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
  2024-11-02  5:08     ` [PATCH v3 02/28] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range() Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 04/28] get rid of perf_fget_light(), convert kernel/events/core.c " Al Viro
                       ` (25 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

Fold timerfd_fget() into both callers to have fdget() and fdput() in
the same scope.  Could be done in different ways, but this is probably
the smallest solution.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/timerfd.c | 40 ++++++++++++++--------------------------
 1 file changed, 14 insertions(+), 26 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index 137523e0bb21..4c32244b0508 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -394,19 +394,6 @@ static const struct file_operations timerfd_fops = {
 	.unlocked_ioctl	= timerfd_ioctl,
 };
 
-static int timerfd_fget(int fd, struct fd *p)
-{
-	struct fd f = fdget(fd);
-	if (!fd_file(f))
-		return -EBADF;
-	if (fd_file(f)->f_op != &timerfd_fops) {
-		fdput(f);
-		return -EINVAL;
-	}
-	*p = f;
-	return 0;
-}
-
 SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
 {
 	int ufd;
@@ -471,7 +458,6 @@ static int do_timerfd_settime(int ufd, int flags,
 		const struct itimerspec64 *new,
 		struct itimerspec64 *old)
 {
-	struct fd f;
 	struct timerfd_ctx *ctx;
 	int ret;
 
@@ -479,15 +465,17 @@ static int do_timerfd_settime(int ufd, int flags,
 		 !itimerspec64_valid(new))
 		return -EINVAL;
 
-	ret = timerfd_fget(ufd, &f);
-	if (ret)
-		return ret;
+	CLASS(fd, f)(ufd);
+	if (fd_empty(f))
+		return -EBADF;
+
+	if (fd_file(f)->f_op != &timerfd_fops)
+		return -EINVAL;
+
 	ctx = fd_file(f)->private_data;
 
-	if (isalarm(ctx) && !capable(CAP_WAKE_ALARM)) {
-		fdput(f);
+	if (isalarm(ctx) && !capable(CAP_WAKE_ALARM))
 		return -EPERM;
-	}
 
 	timerfd_setup_cancel(ctx, flags);
 
@@ -535,17 +523,18 @@ static int do_timerfd_settime(int ufd, int flags,
 	ret = timerfd_setup(ctx, flags, new);
 
 	spin_unlock_irq(&ctx->wqh.lock);
-	fdput(f);
 	return ret;
 }
 
 static int do_timerfd_gettime(int ufd, struct itimerspec64 *t)
 {
-	struct fd f;
 	struct timerfd_ctx *ctx;
-	int ret = timerfd_fget(ufd, &f);
-	if (ret)
-		return ret;
+	CLASS(fd, f)(ufd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	if (fd_file(f)->f_op != &timerfd_fops)
+		return -EINVAL;
 	ctx = fd_file(f)->private_data;
 
 	spin_lock_irq(&ctx->wqh.lock);
@@ -567,7 +556,6 @@ static int do_timerfd_gettime(int ufd, struct itimerspec64 *t)
 	t->it_value = ktime_to_timespec64(timerfd_get_remaining(ctx));
 	t->it_interval = ktime_to_timespec64(ctx->tintv);
 	spin_unlock_irq(&ctx->wqh.lock);
-	fdput(f);
 	return 0;
 }
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 04/28] get rid of perf_fget_light(), convert kernel/events/core.c to CLASS(fd)
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
  2024-11-02  5:08     ` [PATCH v3 02/28] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range() Al Viro
  2024-11-02  5:08     ` [PATCH v3 03/28] timerfd: switch to CLASS(fd) Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 05/28] switch netlink_getsockbyfilp() to taking descriptor Al Viro
                       ` (24 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

Lift fdget() and fdput() out of perf_fget_light(), turning it into
is_perf_file(struct fd f).  The life gets easier in both callers
if we do fdget() unconditionally, including the case when we are
given -1 instead of a descriptor - that avoids a reassignment in
perf_event_open(2) and it avoids a nasty temptation in _perf_ioctl()
where we must *not* lift output_event out of scope for output.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 kernel/events/core.c | 49 +++++++++++++++-----------------------------
 1 file changed, 16 insertions(+), 33 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index e3589c4287cb..85b209626dd7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5998,18 +5998,9 @@ EXPORT_SYMBOL_GPL(perf_event_period);
 
 static const struct file_operations perf_fops;
 
-static inline int perf_fget_light(int fd, struct fd *p)
+static inline bool is_perf_file(struct fd f)
 {
-	struct fd f = fdget(fd);
-	if (!fd_file(f))
-		return -EBADF;
-
-	if (fd_file(f)->f_op != &perf_fops) {
-		fdput(f);
-		return -EBADF;
-	}
-	*p = f;
-	return 0;
+	return !fd_empty(f) && fd_file(f)->f_op == &perf_fops;
 }
 
 static int perf_event_set_output(struct perf_event *event,
@@ -6057,20 +6048,14 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 
 	case PERF_EVENT_IOC_SET_OUTPUT:
 	{
-		int ret;
+		CLASS(fd, output)(arg);	     // arg == -1 => empty
+		struct perf_event *output_event = NULL;
 		if (arg != -1) {
-			struct perf_event *output_event;
-			struct fd output;
-			ret = perf_fget_light(arg, &output);
-			if (ret)
-				return ret;
+			if (!is_perf_file(output))
+				return -EBADF;
 			output_event = fd_file(output)->private_data;
-			ret = perf_event_set_output(event, output_event);
-			fdput(output);
-		} else {
-			ret = perf_event_set_output(event, NULL);
 		}
-		return ret;
+		return perf_event_set_output(event, output_event);
 	}
 
 	case PERF_EVENT_IOC_SET_FILTER:
@@ -12664,7 +12649,6 @@ SYSCALL_DEFINE5(perf_event_open,
 	struct perf_event_attr attr;
 	struct perf_event_context *ctx;
 	struct file *event_file = NULL;
-	struct fd group = EMPTY_FD;
 	struct task_struct *task = NULL;
 	struct pmu *pmu;
 	int event_fd;
@@ -12735,10 +12719,12 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (event_fd < 0)
 		return event_fd;
 
+	CLASS(fd, group)(group_fd);     // group_fd == -1 => empty
 	if (group_fd != -1) {
-		err = perf_fget_light(group_fd, &group);
-		if (err)
+		if (!is_perf_file(group)) {
+			err = -EBADF;
 			goto err_fd;
+		}
 		group_leader = fd_file(group)->private_data;
 		if (flags & PERF_FLAG_FD_OUTPUT)
 			output_event = group_leader;
@@ -12750,7 +12736,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		task = find_lively_task_by_vpid(pid);
 		if (IS_ERR(task)) {
 			err = PTR_ERR(task);
-			goto err_group_fd;
+			goto err_fd;
 		}
 	}
 
@@ -13017,12 +13003,11 @@ SYSCALL_DEFINE5(perf_event_open,
 	mutex_unlock(&current->perf_event_mutex);
 
 	/*
-	 * Drop the reference on the group_event after placing the
-	 * new event on the sibling_list. This ensures destruction
-	 * of the group leader will find the pointer to itself in
-	 * perf_group_detach().
+	 * File reference in group guarantees that group_leader has been
+	 * kept alive until we place the new event on the sibling_list.
+	 * This ensures destruction of the group leader will find
+	 * the pointer to itself in perf_group_detach().
 	 */
-	fdput(group);
 	fd_install(event_fd, event_file);
 	return event_fd;
 
@@ -13041,8 +13026,6 @@ SYSCALL_DEFINE5(perf_event_open,
 err_task:
 	if (task)
 		put_task_struct(task);
-err_group_fd:
-	fdput(group);
 err_fd:
 	put_unused_fd(event_fd);
 	return err;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 05/28] switch netlink_getsockbyfilp() to taking descriptor
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (2 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 04/28] get rid of perf_fget_light(), convert kernel/events/core.c " Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 06/28] do_mq_notify(): saner skb freeing on failures Al Viro
                       ` (23 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

the only call site (in do_mq_notify()) obtains the argument
from an immediately preceding fdget() and it is immediately
followed by fdput(); might as well just replace it with
a variant that would take a descriptor instead of struct file *
and have file lookups handled inside that function.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/netlink.h  | 2 +-
 ipc/mqueue.c             | 8 +-------
 net/netlink/af_netlink.c | 9 +++++++--
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index b332c2048c75..a48a30842d84 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -239,7 +239,7 @@ int netlink_register_notifier(struct notifier_block *nb);
 int netlink_unregister_notifier(struct notifier_block *nb);
 
 /* finegrained unicast helpers: */
-struct sock *netlink_getsockbyfilp(struct file *filp);
+struct sock *netlink_getsockbyfd(int fd);
 int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
 		      long *timeo, struct sock *ssk);
 void netlink_detachskb(struct sock *sk, struct sk_buff *skb);
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 34fa0bd8bb11..fd05e3d4f7b6 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1355,13 +1355,7 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 			skb_put(nc, NOTIFY_COOKIE_LEN);
 			/* and attach it to the socket */
 retry:
-			f = fdget(notification->sigev_signo);
-			if (!fd_file(f)) {
-				ret = -EBADF;
-				goto out;
-			}
-			sock = netlink_getsockbyfilp(fd_file(f));
-			fdput(f);
+			sock = netlink_getsockbyfd(notification->sigev_signo);
 			if (IS_ERR(sock)) {
 				ret = PTR_ERR(sock);
 				goto free_skb;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 0b7a89db3ab7..42451ac355d0 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1180,11 +1180,16 @@ static struct sock *netlink_getsockbyportid(struct sock *ssk, u32 portid)
 	return sock;
 }
 
-struct sock *netlink_getsockbyfilp(struct file *filp)
+struct sock *netlink_getsockbyfd(int fd)
 {
-	struct inode *inode = file_inode(filp);
+	CLASS(fd, f)(fd);
+	struct inode *inode;
 	struct sock *sock;
 
+	if (fd_empty(f))
+		return ERR_PTR(-EBADF);
+
+	inode = file_inode(fd_file(f));
 	if (!S_ISSOCK(inode->i_mode))
 		return ERR_PTR(-ENOTSOCK);
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 06/28] do_mq_notify(): saner skb freeing on failures
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (3 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 05/28] switch netlink_getsockbyfilp() to taking descriptor Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 07/28] do_mq_notify(): switch to CLASS(fd) Al Viro
                       ` (22 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

cleanup is convoluted enough as it is; it's easier to have early
failure outs do explicit kfree_skb(nc), rather than going to
contortions needed to reuse the cleanup from late failures.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 ipc/mqueue.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index fd05e3d4f7b6..48640a362637 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1347,8 +1347,8 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 			if (copy_from_user(nc->data,
 					notification->sigev_value.sival_ptr,
 					NOTIFY_COOKIE_LEN)) {
-				ret = -EFAULT;
-				goto free_skb;
+				kfree_skb(nc);
+				return -EFAULT;
 			}
 
 			/* TODO: add a header? */
@@ -1357,16 +1357,14 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 retry:
 			sock = netlink_getsockbyfd(notification->sigev_signo);
 			if (IS_ERR(sock)) {
-				ret = PTR_ERR(sock);
-				goto free_skb;
+				kfree_skb(nc);
+				return PTR_ERR(sock);
 			}
 
 			timeo = MAX_SCHEDULE_TIMEOUT;
 			ret = netlink_attachskb(sock, nc, &timeo, NULL);
-			if (ret == 1) {
-				sock = NULL;
+			if (ret == 1)
 				goto retry;
-			}
 			if (ret)
 				return ret;
 		}
@@ -1425,10 +1423,6 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 out:
 	if (sock)
 		netlink_detachskb(sock, nc);
-	else
-free_skb:
-		dev_kfree_skb(nc);
-
 	return ret;
 }
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 07/28] do_mq_notify(): switch to CLASS(fd)
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (4 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 06/28] do_mq_notify(): saner skb freeing on failures Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 08/28] simplify xfs_find_handle() a bit Al Viro
                       ` (21 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

The only failure exit before fdget() is a return, the only thing done
after fdput() is transposable with it.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 ipc/mqueue.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 48640a362637..4f1dec518fae 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1317,7 +1317,6 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char __user *, u_msg_ptr,
 static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 {
 	int ret;
-	struct fd f;
 	struct sock *sock;
 	struct inode *inode;
 	struct mqueue_inode_info *info;
@@ -1370,8 +1369,8 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 		}
 	}
 
-	f = fdget(mqdes);
-	if (!fd_file(f)) {
+	CLASS(fd, f)(mqdes);
+	if (fd_empty(f)) {
 		ret = -EBADF;
 		goto out;
 	}
@@ -1379,7 +1378,7 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 	inode = file_inode(fd_file(f));
 	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
 		ret = -EBADF;
-		goto out_fput;
+		goto out;
 	}
 	info = MQUEUE_I(inode);
 
@@ -1418,8 +1417,6 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
 		inode_set_atime_to_ts(inode, inode_set_ctime_current(inode));
 	}
 	spin_unlock(&info->lock);
-out_fput:
-	fdput(f);
 out:
 	if (sock)
 		netlink_detachskb(sock, nc);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 08/28] simplify xfs_find_handle() a bit
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (5 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 07/28] do_mq_notify(): switch to CLASS(fd) Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 09/28] convert vmsplice() to CLASS(fd) Al Viro
                       ` (20 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

XFS_IOC_FD_TO_HANDLE can grab a reference to copied ->f_path and
let the file go; results in simpler control flow - cleanup is
the same for both "by descriptor" and "by pathname" cases.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/xfs/xfs_handle.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
index 49e5e5f04e60..f19fce557354 100644
--- a/fs/xfs/xfs_handle.c
+++ b/fs/xfs/xfs_handle.c
@@ -85,22 +85,23 @@ xfs_find_handle(
 	int			hsize;
 	xfs_handle_t		handle;
 	struct inode		*inode;
-	struct fd		f = EMPTY_FD;
 	struct path		path;
 	int			error;
 	struct xfs_inode	*ip;
 
 	if (cmd == XFS_IOC_FD_TO_HANDLE) {
-		f = fdget(hreq->fd);
-		if (!fd_file(f))
+		CLASS(fd, f)(hreq->fd);
+
+		if (fd_empty(f))
 			return -EBADF;
-		inode = file_inode(fd_file(f));
+		path = fd_file(f)->f_path;
+		path_get(&path);
 	} else {
 		error = user_path_at(AT_FDCWD, hreq->path, 0, &path);
 		if (error)
 			return error;
-		inode = d_inode(path.dentry);
 	}
+	inode = d_inode(path.dentry);
 	ip = XFS_I(inode);
 
 	/*
@@ -134,10 +135,7 @@ xfs_find_handle(
 	error = 0;
 
  out_put:
-	if (cmd == XFS_IOC_FD_TO_HANDLE)
-		fdput(f);
-	else
-		path_put(&path);
+	path_put(&path);
 	return error;
 }
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 09/28] convert vmsplice() to CLASS(fd)
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (6 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 08/28] simplify xfs_find_handle() a bit Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 10/28] fdget_raw() users: switch to CLASS(fd_raw) Al Viro
                       ` (19 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

Irregularity here is fdput() not in the same scope as fdget(); we could
just lift it out vmsplice_type() in vmsplice(2), but there's no much
point keeping vmsplice_type() separate after that...

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/splice.c | 33 ++++++++++-----------------------
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 06232d7e505f..29cd39d7f4a0 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1564,21 +1564,6 @@ static ssize_t vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
 	return ret;
 }
 
-static int vmsplice_type(struct fd f, int *type)
-{
-	if (!fd_file(f))
-		return -EBADF;
-	if (fd_file(f)->f_mode & FMODE_WRITE) {
-		*type = ITER_SOURCE;
-	} else if (fd_file(f)->f_mode & FMODE_READ) {
-		*type = ITER_DEST;
-	} else {
-		fdput(f);
-		return -EBADF;
-	}
-	return 0;
-}
-
 /*
  * Note that vmsplice only really supports true splicing _from_ user memory
  * to a pipe, not the other way around. Splicing from user memory is a simple
@@ -1602,21 +1587,25 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 	struct iovec *iov = iovstack;
 	struct iov_iter iter;
 	ssize_t error;
-	struct fd f;
 	int type;
 
 	if (unlikely(flags & ~SPLICE_F_ALL))
 		return -EINVAL;
 
-	f = fdget(fd);
-	error = vmsplice_type(f, &type);
-	if (error)
-		return error;
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
+		return -EBADF;
+	if (fd_file(f)->f_mode & FMODE_WRITE)
+		type = ITER_SOURCE;
+	else if (fd_file(f)->f_mode & FMODE_READ)
+		type = ITER_DEST;
+	else
+		return -EBADF;
 
 	error = import_iovec(type, uiov, nr_segs,
 			     ARRAY_SIZE(iovstack), &iov, &iter);
 	if (error < 0)
-		goto out_fdput;
+		return error;
 
 	if (!iov_iter_count(&iter))
 		error = 0;
@@ -1626,8 +1615,6 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
 		error = vmsplice_to_user(fd_file(f), &iter, flags);
 
 	kfree(iov);
-out_fdput:
-	fdput(f);
 	return error;
 }
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 10/28] fdget_raw() users: switch to CLASS(fd_raw)
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (7 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 09/28] convert vmsplice() to CLASS(fd) Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 11/28] introduce "fd_pos" class, convert fdget_pos() users to it Al Viro
                       ` (18 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/arm/kernel/sys_oabi-compat.c | 10 +++-----
 fs/fcntl.c                        | 42 +++++++++++++------------------
 fs/namei.c                        | 13 +++-------
 fs/open.c                         | 13 +++-------
 fs/quota/quota.c                  | 12 +++------
 fs/statfs.c                       | 12 ++++-----
 kernel/cgroup/cgroup.c            |  9 +++----
 security/landlock/syscalls.c      | 19 +++++---------
 8 files changed, 47 insertions(+), 83 deletions(-)

diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index f5781ff54a5c..2944721e82a2 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -235,12 +235,12 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
 				 unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
-	struct fd f = fdget_raw(fd);
+	CLASS(fd_raw, f)(fd);
 	struct flock64 flock;
-	long err = -EBADF;
+	long err;
 
-	if (!fd_file(f))
-		goto out;
+	if (fd_empty(f))
+		return -EBADF;
 
 	switch (cmd) {
 	case F_GETLK64:
@@ -271,8 +271,6 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
 		err = sys_fcntl64(fd, cmd, arg);
 		break;
 	}
-	fdput(f);
-out:
 	return err;
 }
 
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 22dd9dcce7ec..bd022a54bd0d 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -570,24 +570,21 @@ static int check_fcntl_cmd(unsigned cmd)
 
 SYSCALL_DEFINE3(fcntl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
 {	
-	struct fd f = fdget_raw(fd);
-	long err = -EBADF;
+	CLASS(fd_raw, f)(fd);
+	long err;
 
-	if (!fd_file(f))
-		goto out;
+	if (fd_empty(f))
+		return -EBADF;
 
 	if (unlikely(fd_file(f)->f_mode & FMODE_PATH)) {
 		if (!check_fcntl_cmd(cmd))
-			goto out1;
+			return -EBADF;
 	}
 
 	err = security_file_fcntl(fd_file(f), cmd, arg);
 	if (!err)
 		err = do_fcntl(fd, cmd, arg, fd_file(f));
 
-out1:
- 	fdput(f);
-out:
 	return err;
 }
 
@@ -596,21 +593,21 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd,
 		unsigned long, arg)
 {	
 	void __user *argp = (void __user *)arg;
-	struct fd f = fdget_raw(fd);
+	CLASS(fd_raw, f)(fd);
 	struct flock64 flock;
-	long err = -EBADF;
+	long err;
 
-	if (!fd_file(f))
-		goto out;
+	if (fd_empty(f))
+		return -EBADF;
 
 	if (unlikely(fd_file(f)->f_mode & FMODE_PATH)) {
 		if (!check_fcntl_cmd(cmd))
-			goto out1;
+			return -EBADF;
 	}
 
 	err = security_file_fcntl(fd_file(f), cmd, arg);
 	if (err)
-		goto out1;
+		return err;
 	
 	switch (cmd) {
 	case F_GETLK64:
@@ -635,9 +632,6 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd,
 		err = do_fcntl(fd, cmd, arg, fd_file(f));
 		break;
 	}
-out1:
-	fdput(f);
-out:
 	return err;
 }
 #endif
@@ -733,21 +727,21 @@ static int fixup_compat_flock(struct flock *flock)
 static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
 			     compat_ulong_t arg)
 {
-	struct fd f = fdget_raw(fd);
+	CLASS(fd_raw, f)(fd);
 	struct flock flock;
-	long err = -EBADF;
+	long err;
 
-	if (!fd_file(f))
-		return err;
+	if (fd_empty(f))
+		return -EBADF;
 
 	if (unlikely(fd_file(f)->f_mode & FMODE_PATH)) {
 		if (!check_fcntl_cmd(cmd))
-			goto out_put;
+			return -EBADF;
 	}
 
 	err = security_file_fcntl(fd_file(f), cmd, arg);
 	if (err)
-		goto out_put;
+		return err;
 
 	switch (cmd) {
 	case F_GETLK:
@@ -790,8 +784,6 @@ static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
 		err = do_fcntl(fd, cmd, arg, fd_file(f));
 		break;
 	}
-out_put:
-	fdput(f);
 	return err;
 }
 
diff --git a/fs/namei.c b/fs/namei.c
index 4a4a22a08ac2..f0db1e724262 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2503,26 +2503,22 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 		}
 	} else {
 		/* Caller must check execute permissions on the starting path component */
-		struct fd f = fdget_raw(nd->dfd);
+		CLASS(fd_raw, f)(nd->dfd);
 		struct dentry *dentry;
 
-		if (!fd_file(f))
+		if (fd_empty(f))
 			return ERR_PTR(-EBADF);
 
 		if (flags & LOOKUP_LINKAT_EMPTY) {
 			if (fd_file(f)->f_cred != current_cred() &&
-			    !ns_capable(fd_file(f)->f_cred->user_ns, CAP_DAC_READ_SEARCH)) {
-				fdput(f);
+			    !ns_capable(fd_file(f)->f_cred->user_ns, CAP_DAC_READ_SEARCH))
 				return ERR_PTR(-ENOENT);
-			}
 		}
 
 		dentry = fd_file(f)->f_path.dentry;
 
-		if (*s && unlikely(!d_can_lookup(dentry))) {
-			fdput(f);
+		if (*s && unlikely(!d_can_lookup(dentry)))
 			return ERR_PTR(-ENOTDIR);
-		}
 
 		nd->path = fd_file(f)->f_path;
 		if (flags & LOOKUP_RCU) {
@@ -2532,7 +2528,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 			path_get(&nd->path);
 			nd->inode = nd->path.dentry->d_inode;
 		}
-		fdput(f);
 	}
 
 	/* For scoped-lookups we need to set the root to the dirfd as well. */
diff --git a/fs/open.c b/fs/open.c
index acaeb3e25c88..a0c1fa3f60d5 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -580,23 +580,18 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename)
 
 SYSCALL_DEFINE1(fchdir, unsigned int, fd)
 {
-	struct fd f = fdget_raw(fd);
+	CLASS(fd_raw, f)(fd);
 	int error;
 
-	error = -EBADF;
-	if (!fd_file(f))
-		goto out;
+	if (fd_empty(f))
+		return -EBADF;
 
-	error = -ENOTDIR;
 	if (!d_can_lookup(fd_file(f)->f_path.dentry))
-		goto out_putf;
+		return -ENOTDIR;
 
 	error = file_permission(fd_file(f), MAY_EXEC | MAY_CHDIR);
 	if (!error)
 		set_fs_pwd(current->fs, &fd_file(f)->f_path);
-out_putf:
-	fdput(f);
-out:
 	return error;
 }
 
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 290157bc7bec..7c2b75a44485 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -976,21 +976,19 @@ SYSCALL_DEFINE4(quotactl_fd, unsigned int, fd, unsigned int, cmd,
 	struct super_block *sb;
 	unsigned int cmds = cmd >> SUBCMDSHIFT;
 	unsigned int type = cmd & SUBCMDMASK;
-	struct fd f;
+	CLASS(fd_raw, f)(fd);
 	int ret;
 
-	f = fdget_raw(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
-	ret = -EINVAL;
 	if (type >= MAXQUOTAS)
-		goto out;
+		return -EINVAL;
 
 	if (quotactl_cmd_write(cmds)) {
 		ret = mnt_want_write(fd_file(f)->f_path.mnt);
 		if (ret)
-			goto out;
+			return ret;
 	}
 
 	sb = fd_file(f)->f_path.mnt->mnt_sb;
@@ -1008,7 +1006,5 @@ SYSCALL_DEFINE4(quotactl_fd, unsigned int, fd, unsigned int, cmd,
 
 	if (quotactl_cmd_write(cmds))
 		mnt_drop_write(fd_file(f)->f_path.mnt);
-out:
-	fdput(f);
 	return ret;
 }
diff --git a/fs/statfs.c b/fs/statfs.c
index 9c7bb27e7932..a45ac85e6048 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -114,13 +114,11 @@ int user_statfs(const char __user *pathname, struct kstatfs *st)
 
 int fd_statfs(int fd, struct kstatfs *st)
 {
-	struct fd f = fdget_raw(fd);
-	int error = -EBADF;
-	if (fd_file(f)) {
-		error = vfs_statfs(&fd_file(f)->f_path, st);
-		fdput(f);
-	}
-	return error;
+	CLASS(fd_raw, f)(fd);
+
+	if (fd_empty(f))
+		return -EBADF;
+	return vfs_statfs(&fd_file(f)->f_path, st);
 }
 
 static int do_statfs_native(struct kstatfs *st, struct statfs __user *p)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 5886b95c6eae..8305a67ea8d9 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6966,14 +6966,11 @@ EXPORT_SYMBOL_GPL(cgroup_get_from_path);
  */
 struct cgroup *cgroup_v1v2_get_from_fd(int fd)
 {
-	struct cgroup *cgrp;
-	struct fd f = fdget_raw(fd);
-	if (!fd_file(f))
+	CLASS(fd_raw, f)(fd);
+	if (fd_empty(f))
 		return ERR_PTR(-EBADF);
 
-	cgrp = cgroup_v1v2_get_from_file(fd_file(f));
-	fdput(f);
-	return cgrp;
+	return cgroup_v1v2_get_from_file(fd_file(f));
 }
 
 /**
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index f5a0e7182ec0..f32eb38abd0f 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -276,15 +276,12 @@ static struct landlock_ruleset *get_ruleset_from_fd(const int fd,
  */
 static int get_path_from_fd(const s32 fd, struct path *const path)
 {
-	struct fd f;
-	int err = 0;
+	CLASS(fd_raw, f)(fd);
 
 	BUILD_BUG_ON(!__same_type(
 		fd, ((struct landlock_path_beneath_attr *)NULL)->parent_fd));
 
-	/* Handles O_PATH. */
-	f = fdget_raw(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 	/*
 	 * Forbids ruleset FDs, internal filesystems (e.g. nsfs), including
@@ -295,16 +292,12 @@ static int get_path_from_fd(const s32 fd, struct path *const path)
 	    (fd_file(f)->f_path.mnt->mnt_flags & MNT_INTERNAL) ||
 	    (fd_file(f)->f_path.dentry->d_sb->s_flags & SB_NOUSER) ||
 	    d_is_negative(fd_file(f)->f_path.dentry) ||
-	    IS_PRIVATE(d_backing_inode(fd_file(f)->f_path.dentry))) {
-		err = -EBADFD;
-		goto out_fdput;
-	}
+	    IS_PRIVATE(d_backing_inode(fd_file(f)->f_path.dentry)))
+		return -EBADFD;
+
 	*path = fd_file(f)->f_path;
 	path_get(path);
-
-out_fdput:
-	fdput(f);
-	return err;
+	return 0;
 }
 
 static int add_rule_path_beneath(struct landlock_ruleset *const ruleset,
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 11/28] introduce "fd_pos" class, convert fdget_pos() users to it.
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (8 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 10/28] fdget_raw() users: switch to CLASS(fd_raw) Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 12/28] o2hb_region_dev_store(): avoid goto around fdget()/fdput() Al Viro
                       ` (17 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

fdget_pos() for constructor, fdput_pos() for cleanup, all users of
fd..._pos() converted trivially.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/alpha/kernel/osf_sys.c |  5 ++---
 fs/read_write.c             | 34 +++++++++++++---------------------
 fs/readdir.c                | 28 ++++++++++------------------
 include/linux/file.h        |  1 +
 4 files changed, 26 insertions(+), 42 deletions(-)

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index c0424de9e7cd..86185021f75a 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -152,7 +152,7 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
 		long __user *, basep)
 {
 	int error;
-	struct fd arg = fdget_pos(fd);
+	CLASS(fd_pos, arg)(fd);
 	struct osf_dirent_callback buf = {
 		.ctx.actor = osf_filldir,
 		.dirent = dirent,
@@ -160,7 +160,7 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
 		.count = count
 	};
 
-	if (!fd_file(arg))
+	if (fd_empty(arg))
 		return -EBADF;
 
 	error = iterate_dir(fd_file(arg), &buf.ctx);
@@ -169,7 +169,6 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
 	if (count != buf.count)
 		error = count - buf.count;
 
-	fdput_pos(arg);
 	return error;
 }
 
diff --git a/fs/read_write.c b/fs/read_write.c
index 64dc24afdb3a..ef3ee3725714 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -386,8 +386,8 @@ EXPORT_SYMBOL(vfs_llseek);
 static off_t ksys_lseek(unsigned int fd, off_t offset, unsigned int whence)
 {
 	off_t retval;
-	struct fd f = fdget_pos(fd);
-	if (!fd_file(f))
+	CLASS(fd_pos, f)(fd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	retval = -EINVAL;
@@ -397,7 +397,6 @@ static off_t ksys_lseek(unsigned int fd, off_t offset, unsigned int whence)
 		if (res != (loff_t)retval)
 			retval = -EOVERFLOW;	/* LFS: should only happen on 32 bit platforms */
 	}
-	fdput_pos(f);
 	return retval;
 }
 
@@ -420,15 +419,14 @@ SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
 		unsigned int, whence)
 {
 	int retval;
-	struct fd f = fdget_pos(fd);
+	CLASS(fd_pos, f)(fd);
 	loff_t offset;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
-	retval = -EINVAL;
 	if (whence > SEEK_MAX)
-		goto out_putf;
+		return -EINVAL;
 
 	offset = vfs_llseek(fd_file(f), ((loff_t) offset_high << 32) | offset_low,
 			whence);
@@ -439,8 +437,6 @@ SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
 		if (!copy_to_user(result, &offset, sizeof(offset)))
 			retval = 0;
 	}
-out_putf:
-	fdput_pos(f);
 	return retval;
 }
 #endif
@@ -700,10 +696,10 @@ static inline loff_t *file_ppos(struct file *file)
 
 ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
 {
-	struct fd f = fdget_pos(fd);
+	CLASS(fd_pos, f)(fd);
 	ssize_t ret = -EBADF;
 
-	if (fd_file(f)) {
+	if (!fd_empty(f)) {
 		loff_t pos, *ppos = file_ppos(fd_file(f));
 		if (ppos) {
 			pos = *ppos;
@@ -712,7 +708,6 @@ ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
 		ret = vfs_read(fd_file(f), buf, count, ppos);
 		if (ret >= 0 && ppos)
 			fd_file(f)->f_pos = pos;
-		fdput_pos(f);
 	}
 	return ret;
 }
@@ -724,10 +719,10 @@ SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
 
 ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
 {
-	struct fd f = fdget_pos(fd);
+	CLASS(fd_pos, f)(fd);
 	ssize_t ret = -EBADF;
 
-	if (fd_file(f)) {
+	if (!fd_empty(f)) {
 		loff_t pos, *ppos = file_ppos(fd_file(f));
 		if (ppos) {
 			pos = *ppos;
@@ -736,7 +731,6 @@ ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
 		ret = vfs_write(fd_file(f), buf, count, ppos);
 		if (ret >= 0 && ppos)
 			fd_file(f)->f_pos = pos;
-		fdput_pos(f);
 	}
 
 	return ret;
@@ -1075,10 +1069,10 @@ static ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
 static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
 			unsigned long vlen, rwf_t flags)
 {
-	struct fd f = fdget_pos(fd);
+	CLASS(fd_pos, f)(fd);
 	ssize_t ret = -EBADF;
 
-	if (fd_file(f)) {
+	if (!fd_empty(f)) {
 		loff_t pos, *ppos = file_ppos(fd_file(f));
 		if (ppos) {
 			pos = *ppos;
@@ -1087,7 +1081,6 @@ static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
 		ret = vfs_readv(fd_file(f), vec, vlen, ppos, flags);
 		if (ret >= 0 && ppos)
 			fd_file(f)->f_pos = pos;
-		fdput_pos(f);
 	}
 
 	if (ret > 0)
@@ -1099,10 +1092,10 @@ static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
 static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
 			 unsigned long vlen, rwf_t flags)
 {
-	struct fd f = fdget_pos(fd);
+	CLASS(fd_pos, f)(fd);
 	ssize_t ret = -EBADF;
 
-	if (fd_file(f)) {
+	if (!fd_empty(f)) {
 		loff_t pos, *ppos = file_ppos(fd_file(f));
 		if (ppos) {
 			pos = *ppos;
@@ -1111,7 +1104,6 @@ static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
 		ret = vfs_writev(fd_file(f), vec, vlen, ppos, flags);
 		if (ret >= 0 && ppos)
 			fd_file(f)->f_pos = pos;
-		fdput_pos(f);
 	}
 
 	if (ret > 0)
diff --git a/fs/readdir.c b/fs/readdir.c
index 6d29cab8576e..0038efda417b 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -219,20 +219,19 @@ SYSCALL_DEFINE3(old_readdir, unsigned int, fd,
 		struct old_linux_dirent __user *, dirent, unsigned int, count)
 {
 	int error;
-	struct fd f = fdget_pos(fd);
+	CLASS(fd_pos, f)(fd);
 	struct readdir_callback buf = {
 		.ctx.actor = fillonedir,
 		.dirent = dirent
 	};
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	error = iterate_dir(fd_file(f), &buf.ctx);
 	if (buf.result)
 		error = buf.result;
 
-	fdput_pos(f);
 	return error;
 }
 
@@ -309,7 +308,7 @@ static bool filldir(struct dir_context *ctx, const char *name, int namlen,
 SYSCALL_DEFINE3(getdents, unsigned int, fd,
 		struct linux_dirent __user *, dirent, unsigned int, count)
 {
-	struct fd f;
+	CLASS(fd_pos, f)(fd);
 	struct getdents_callback buf = {
 		.ctx.actor = filldir,
 		.count = count,
@@ -317,8 +316,7 @@ SYSCALL_DEFINE3(getdents, unsigned int, fd,
 	};
 	int error;
 
-	f = fdget_pos(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	error = iterate_dir(fd_file(f), &buf.ctx);
@@ -333,7 +331,6 @@ SYSCALL_DEFINE3(getdents, unsigned int, fd,
 		else
 			error = count - buf.count;
 	}
-	fdput_pos(f);
 	return error;
 }
 
@@ -392,7 +389,7 @@ static bool filldir64(struct dir_context *ctx, const char *name, int namlen,
 SYSCALL_DEFINE3(getdents64, unsigned int, fd,
 		struct linux_dirent64 __user *, dirent, unsigned int, count)
 {
-	struct fd f;
+	CLASS(fd_pos, f)(fd);
 	struct getdents_callback64 buf = {
 		.ctx.actor = filldir64,
 		.count = count,
@@ -400,8 +397,7 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
 	};
 	int error;
 
-	f = fdget_pos(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	error = iterate_dir(fd_file(f), &buf.ctx);
@@ -417,7 +413,6 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
 		else
 			error = count - buf.count;
 	}
-	fdput_pos(f);
 	return error;
 }
 
@@ -477,20 +472,19 @@ COMPAT_SYSCALL_DEFINE3(old_readdir, unsigned int, fd,
 		struct compat_old_linux_dirent __user *, dirent, unsigned int, count)
 {
 	int error;
-	struct fd f = fdget_pos(fd);
+	CLASS(fd_pos, f)(fd);
 	struct compat_readdir_callback buf = {
 		.ctx.actor = compat_fillonedir,
 		.dirent = dirent
 	};
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	error = iterate_dir(fd_file(f), &buf.ctx);
 	if (buf.result)
 		error = buf.result;
 
-	fdput_pos(f);
 	return error;
 }
 
@@ -560,7 +554,7 @@ static bool compat_filldir(struct dir_context *ctx, const char *name, int namlen
 COMPAT_SYSCALL_DEFINE3(getdents, unsigned int, fd,
 		struct compat_linux_dirent __user *, dirent, unsigned int, count)
 {
-	struct fd f;
+	CLASS(fd_pos, f)(fd);
 	struct compat_getdents_callback buf = {
 		.ctx.actor = compat_filldir,
 		.current_dir = dirent,
@@ -568,8 +562,7 @@ COMPAT_SYSCALL_DEFINE3(getdents, unsigned int, fd,
 	};
 	int error;
 
-	f = fdget_pos(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	error = iterate_dir(fd_file(f), &buf.ctx);
@@ -584,7 +577,6 @@ COMPAT_SYSCALL_DEFINE3(getdents, unsigned int, fd,
 		else
 			error = count - buf.count;
 	}
-	fdput_pos(f);
 	return error;
 }
 #endif
diff --git a/include/linux/file.h b/include/linux/file.h
index b49a92295b3f..4b09a8de2fd5 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -81,6 +81,7 @@ static inline void fdput_pos(struct fd f)
 
 DEFINE_CLASS(fd, struct fd, fdput(_T), fdget(fd), int fd)
 DEFINE_CLASS(fd_raw, struct fd, fdput(_T), fdget_raw(fd), int fd)
+DEFINE_CLASS(fd_pos, struct fd, fdput_pos(_T), fdget_pos(fd), int fd)
 
 extern int f_dupfd(unsigned int from, struct file *file, unsigned flags);
 extern int replace_fd(unsigned fd, struct file *file, unsigned flags);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 12/28] o2hb_region_dev_store(): avoid goto around fdget()/fdput()
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (9 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 11/28] introduce "fd_pos" class, convert fdget_pos() users to it Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 13/28] privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() Al Viro
                       ` (16 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

Preparation for CLASS(fd) conversion.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ocfs2/cluster/heartbeat.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 4b9f45d7049e..bc55340a60c3 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -1770,23 +1770,23 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
 	int live_threshold;
 
 	if (reg->hr_bdev_file)
-		goto out;
+		return -EINVAL;
 
 	/* We can't heartbeat without having had our node number
 	 * configured yet. */
 	if (o2nm_this_node() == O2NM_MAX_NODES)
-		goto out;
+		return -EINVAL;
 
 	fd = simple_strtol(p, &p, 0);
 	if (!p || (*p && (*p != '\n')))
-		goto out;
+		return -EINVAL;
 
 	if (fd < 0 || fd >= INT_MAX)
-		goto out;
+		return -EINVAL;
 
 	f = fdget(fd);
 	if (fd_file(f) == NULL)
-		goto out;
+		return -EINVAL;
 
 	if (reg->hr_blocks == 0 || reg->hr_start_block == 0 ||
 	    reg->hr_block_bytes == 0)
@@ -1908,7 +1908,6 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
 	}
 out2:
 	fdput(f);
-out:
 	return ret;
 }
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 13/28] privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget()
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (10 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 12/28] o2hb_region_dev_store(): avoid goto around fdget()/fdput() Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 14/28] fdget(), trivial conversions Al Viro
                       ` (15 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

just call it, same as privcmd_ioeventfd_deassign() does...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/xen/privcmd.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 3273cb8c2a66..79070494070d 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -1352,7 +1352,6 @@ static int privcmd_ioeventfd_assign(struct privcmd_ioeventfd *ioeventfd)
 	struct privcmd_kernel_ioeventfd *kioeventfd;
 	struct privcmd_kernel_ioreq *kioreq;
 	unsigned long flags;
-	struct fd f;
 	int ret;
 
 	/* Check for range overflow */
@@ -1372,15 +1371,7 @@ static int privcmd_ioeventfd_assign(struct privcmd_ioeventfd *ioeventfd)
 	if (!kioeventfd)
 		return -ENOMEM;
 
-	f = fdget(ioeventfd->event_fd);
-	if (!fd_file(f)) {
-		ret = -EBADF;
-		goto error_kfree;
-	}
-
-	kioeventfd->eventfd = eventfd_ctx_fileget(fd_file(f));
-	fdput(f);
-
+	kioeventfd->eventfd = eventfd_ctx_fdget(ioeventfd->event_fd);
 	if (IS_ERR(kioeventfd->eventfd)) {
 		ret = PTR_ERR(kioeventfd->eventfd);
 		goto error_kfree;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 14/28] fdget(), trivial conversions
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (11 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 13/28] privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-11 17:22       ` Francesco Lavra
  2024-11-02  5:08     ` [PATCH v3 15/28] fdget(), more " Al Viro
                       ` (14 subsequent siblings)
  27 siblings, 1 reply; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

fdget() is the first thing done in scope, all matching fdput() are
immediately followed by leaving the scope.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/powerpc/kvm/book3s_64_vio.c           | 21 +++---------
 arch/powerpc/kvm/powerpc.c                 | 24 ++++---------
 arch/powerpc/platforms/cell/spu_syscalls.c |  6 ++--
 arch/x86/kernel/cpu/sgx/main.c             | 10 ++----
 arch/x86/kvm/svm/sev.c                     | 39 ++++++++--------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c  | 23 ++++---------
 drivers/gpu/drm/drm_syncobj.c              |  9 ++---
 drivers/media/rc/lirc_dev.c                | 13 +++-----
 fs/btrfs/ioctl.c                           |  5 ++-
 fs/eventfd.c                               |  9 ++---
 fs/eventpoll.c                             | 23 ++++---------
 fs/fhandle.c                               |  5 ++-
 fs/ioctl.c                                 | 23 +++++--------
 fs/kernel_read_file.c                      | 12 +++----
 fs/notify/fanotify/fanotify_user.c         | 15 +++------
 fs/notify/inotify/inotify_user.c           | 17 +++-------
 fs/open.c                                  | 36 +++++++++-----------
 fs/read_write.c                            | 28 +++++-----------
 fs/signalfd.c                              |  9 ++---
 fs/sync.c                                  | 29 ++++++----------
 io_uring/sqpoll.c                          | 29 +++++-----------
 kernel/events/core.c                       | 14 +++-----
 kernel/nsproxy.c                           |  5 ++-
 kernel/pid.c                               |  7 ++--
 kernel/sys.c                               | 15 +++------
 kernel/watch_queue.c                       |  6 ++--
 mm/fadvise.c                               | 10 ++----
 mm/readahead.c                             | 17 +++-------
 net/core/net_namespace.c                   | 10 +++---
 security/landlock/syscalls.c               | 26 +++++----------
 virt/kvm/vfio.c                            |  8 ++---
 31 files changed, 164 insertions(+), 339 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 34c0adb9fdbf..742aa58a7c7e 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -115,10 +115,9 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 	struct iommu_table_group *table_group;
 	long i;
 	struct kvmppc_spapr_tce_iommu_table *stit;
-	struct fd f;
+	CLASS(fd, f)(tablefd);
 
-	f = fdget(tablefd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	rcu_read_lock();
@@ -130,16 +129,12 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 	}
 	rcu_read_unlock();
 
-	if (!found) {
-		fdput(f);
+	if (!found)
 		return -EINVAL;
-	}
 
 	table_group = iommu_group_get_iommudata(grp);
-	if (WARN_ON(!table_group)) {
-		fdput(f);
+	if (WARN_ON(!table_group))
 		return -EFAULT;
-	}
 
 	for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
 		struct iommu_table *tbltmp = table_group->tables[i];
@@ -160,10 +155,8 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 			break;
 		}
 	}
-	if (!tbl) {
-		fdput(f);
+	if (!tbl)
 		return -EINVAL;
-	}
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(stit, &stt->iommu_tables, next) {
@@ -174,7 +167,6 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 			/* stit is being destroyed */
 			iommu_tce_table_put(tbl);
 			rcu_read_unlock();
-			fdput(f);
 			return -ENOTTY;
 		}
 		/*
@@ -182,7 +174,6 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 		 * its KVM reference counter and can return.
 		 */
 		rcu_read_unlock();
-		fdput(f);
 		return 0;
 	}
 	rcu_read_unlock();
@@ -190,7 +181,6 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 	stit = kzalloc(sizeof(*stit), GFP_KERNEL);
 	if (!stit) {
 		iommu_tce_table_put(tbl);
-		fdput(f);
 		return -ENOMEM;
 	}
 
@@ -199,7 +189,6 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 
 	list_add_rcu(&stit->next, &stt->iommu_tables);
 
-	fdput(f);
 	return 0;
 }
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index f14329989e9a..b3b37ea77849 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1933,12 +1933,11 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 #endif
 #ifdef CONFIG_KVM_MPIC
 	case KVM_CAP_IRQ_MPIC: {
-		struct fd f;
+		CLASS(fd, f)(cap->args[0]);
 		struct kvm_device *dev;
 
 		r = -EBADF;
-		f = fdget(cap->args[0]);
-		if (!fd_file(f))
+		if (fd_empty(f))
 			break;
 
 		r = -EPERM;
@@ -1946,18 +1945,16 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 		if (dev)
 			r = kvmppc_mpic_connect_vcpu(dev, vcpu, cap->args[1]);
 
-		fdput(f);
 		break;
 	}
 #endif
 #ifdef CONFIG_KVM_XICS
 	case KVM_CAP_IRQ_XICS: {
-		struct fd f;
+		CLASS(fd, f)(cap->args[0]);
 		struct kvm_device *dev;
 
 		r = -EBADF;
-		f = fdget(cap->args[0]);
-		if (!fd_file(f))
+		if (fd_empty(f))
 			break;
 
 		r = -EPERM;
@@ -1968,34 +1965,27 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 			else
 				r = kvmppc_xics_connect_vcpu(dev, vcpu, cap->args[1]);
 		}
-
-		fdput(f);
 		break;
 	}
 #endif /* CONFIG_KVM_XICS */
 #ifdef CONFIG_KVM_XIVE
 	case KVM_CAP_PPC_IRQ_XIVE: {
-		struct fd f;
+		CLASS(fd, f)(cap->args[0]);
 		struct kvm_device *dev;
 
 		r = -EBADF;
-		f = fdget(cap->args[0]);
-		if (!fd_file(f))
+		if (fd_empty(f))
 			break;
 
 		r = -ENXIO;
-		if (!xive_enabled()) {
-			fdput(f);
+		if (!xive_enabled())
 			break;
-		}
 
 		r = -EPERM;
 		dev = kvm_device_from_filp(fd_file(f));
 		if (dev)
 			r = kvmppc_xive_native_connect_vcpu(dev, vcpu,
 							    cap->args[1]);
-
-		fdput(f);
 		break;
 	}
 #endif /* CONFIG_KVM_XIVE */
diff --git a/arch/powerpc/platforms/cell/spu_syscalls.c b/arch/powerpc/platforms/cell/spu_syscalls.c
index cd7d42fc12a6..da4fad7fc8bf 100644
--- a/arch/powerpc/platforms/cell/spu_syscalls.c
+++ b/arch/powerpc/platforms/cell/spu_syscalls.c
@@ -64,12 +64,10 @@ SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
 		return -ENOSYS;
 
 	if (flags & SPU_CREATE_AFFINITY_SPU) {
-		struct fd neighbor = fdget(neighbor_fd);
+		CLASS(fd, neighbor)(neighbor_fd);
 		ret = -EBADF;
-		if (fd_file(neighbor)) {
+		if (!fd_empty(neighbor))
 			ret = calls->create_thread(name, flags, mode, fd_file(neighbor));
-			fdput(neighbor);
-		}
 	} else
 		ret = calls->create_thread(name, flags, mode, NULL);
 
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 9ace84486499..eb5848d1851a 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -901,19 +901,15 @@ static struct miscdevice sgx_dev_provision = {
 int sgx_set_attribute(unsigned long *allowed_attributes,
 		      unsigned int attribute_fd)
 {
-	struct fd f = fdget(attribute_fd);
+	CLASS(fd, f)(attribute_fd);
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EINVAL;
 
-	if (fd_file(f)->f_op != &sgx_provision_fops) {
-		fdput(f);
+	if (fd_file(f)->f_op != &sgx_provision_fops)
 		return -EINVAL;
-	}
 
 	*allowed_attributes |= SGX_ATTR_PROVISIONKEY;
-
-	fdput(f);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(sgx_set_attribute);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0b851ef937f2..34304f6c36be 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -530,17 +530,12 @@ static int sev_bind_asid(struct kvm *kvm, unsigned int handle, int *error)
 
 static int __sev_issue_cmd(int fd, int id, void *data, int *error)
 {
-	struct fd f;
-	int ret;
+	CLASS(fd, f)(fd);
 
-	f = fdget(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
-	ret = sev_issue_cmd_external_user(fd_file(f), id, data, error);
-
-	fdput(f);
-	return ret;
+	return sev_issue_cmd_external_user(fd_file(f), id, data, error);
 }
 
 static int sev_issue_cmd(struct kvm *kvm, int id, void *data, int *error)
@@ -2073,23 +2068,21 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 {
 	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
 	struct kvm_sev_info *src_sev, *cg_cleanup_sev;
-	struct fd f = fdget(source_fd);
+	CLASS(fd, f)(source_fd);
 	struct kvm *source_kvm;
 	bool charged = false;
 	int ret;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
-	if (!file_is_kvm(fd_file(f))) {
-		ret = -EBADF;
-		goto out_fput;
-	}
+	if (!file_is_kvm(fd_file(f)))
+		return -EBADF;
 
 	source_kvm = fd_file(f)->private_data;
 	ret = sev_lock_two_vms(kvm, source_kvm);
 	if (ret)
-		goto out_fput;
+		return ret;
 
 	if (kvm->arch.vm_type != source_kvm->arch.vm_type ||
 	    sev_guest(kvm) || !sev_guest(source_kvm)) {
@@ -2136,8 +2129,6 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 	cg_cleanup_sev->misc_cg = NULL;
 out_unlock:
 	sev_unlock_two_vms(kvm, source_kvm);
-out_fput:
-	fdput(f);
 	return ret;
 }
 
@@ -2798,23 +2789,21 @@ int sev_mem_enc_unregister_region(struct kvm *kvm,
 
 int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 {
-	struct fd f = fdget(source_fd);
+	CLASS(fd, f)(source_fd);
 	struct kvm *source_kvm;
 	struct kvm_sev_info *source_sev, *mirror_sev;
 	int ret;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
-	if (!file_is_kvm(fd_file(f))) {
-		ret = -EBADF;
-		goto e_source_fput;
-	}
+	if (!file_is_kvm(fd_file(f)))
+		return -EBADF;
 
 	source_kvm = fd_file(f)->private_data;
 	ret = sev_lock_two_vms(kvm, source_kvm);
 	if (ret)
-		goto e_source_fput;
+		return ret;
 
 	/*
 	 * Mirrors of mirrors should work, but let's not get silly.  Also
@@ -2857,8 +2846,6 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 
 e_unlock:
 	sev_unlock_two_vms(kvm, source_kvm);
-e_source_fput:
-	fdput(f);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
index b0a8abc7a8ec..341beec59537 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
@@ -35,21 +35,19 @@ static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
 						  int fd,
 						  int32_t priority)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	struct amdgpu_fpriv *fpriv;
 	struct amdgpu_ctx_mgr *mgr;
 	struct amdgpu_ctx *ctx;
 	uint32_t id;
 	int r;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EINVAL;
 
 	r = amdgpu_file_to_fpriv(fd_file(f), &fpriv);
-	if (r) {
-		fdput(f);
+	if (r)
 		return r;
-	}
 
 	mgr = &fpriv->ctx_mgr;
 	mutex_lock(&mgr->lock);
@@ -57,7 +55,6 @@ static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
 		amdgpu_ctx_priority_override(ctx, priority);
 	mutex_unlock(&mgr->lock);
 
-	fdput(f);
 	return 0;
 }
 
@@ -66,31 +63,25 @@ static int amdgpu_sched_context_priority_override(struct amdgpu_device *adev,
 						  unsigned ctx_id,
 						  int32_t priority)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	struct amdgpu_fpriv *fpriv;
 	struct amdgpu_ctx *ctx;
 	int r;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EINVAL;
 
 	r = amdgpu_file_to_fpriv(fd_file(f), &fpriv);
-	if (r) {
-		fdput(f);
+	if (r)
 		return r;
-	}
 
 	ctx = amdgpu_ctx_get(fpriv, ctx_id);
 
-	if (!ctx) {
-		fdput(f);
+	if (!ctx)
 		return -EINVAL;
-	}
 
 	amdgpu_ctx_priority_override(ctx, priority);
 	amdgpu_ctx_put(ctx);
-	fdput(f);
-
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 8e3d2d7060f8..4f2ab8a7b50f 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -712,16 +712,14 @@ static int drm_syncobj_fd_to_handle(struct drm_file *file_private,
 				    int fd, u32 *handle)
 {
 	struct drm_syncobj *syncobj;
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	int ret;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EINVAL;
 
-	if (fd_file(f)->f_op != &drm_syncobj_file_fops) {
-		fdput(f);
+	if (fd_file(f)->f_op != &drm_syncobj_file_fops)
 		return -EINVAL;
-	}
 
 	/* take a reference to put in the idr */
 	syncobj = fd_file(f)->private_data;
@@ -739,7 +737,6 @@ static int drm_syncobj_fd_to_handle(struct drm_file *file_private,
 	} else
 		drm_syncobj_put(syncobj);
 
-	fdput(f);
 	return ret;
 }
 
diff --git a/drivers/media/rc/lirc_dev.c b/drivers/media/rc/lirc_dev.c
index f042f3f14afa..a2257dc2f25d 100644
--- a/drivers/media/rc/lirc_dev.c
+++ b/drivers/media/rc/lirc_dev.c
@@ -815,28 +815,23 @@ void __exit lirc_dev_exit(void)
 
 struct rc_dev *rc_dev_get_from_fd(int fd, bool write)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	struct lirc_fh *fh;
 	struct rc_dev *dev;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return ERR_PTR(-EBADF);
 
-	if (fd_file(f)->f_op != &lirc_fops) {
-		fdput(f);
+	if (fd_file(f)->f_op != &lirc_fops)
 		return ERR_PTR(-EINVAL);
-	}
 
-	if (write && !(fd_file(f)->f_mode & FMODE_WRITE)) {
-		fdput(f);
+	if (write && !(fd_file(f)->f_mode & FMODE_WRITE))
 		return ERR_PTR(-EPERM);
-	}
 
 	fh = fd_file(f)->private_data;
 	dev = fh->rc;
 
 	get_device(&dev->dev);
-	fdput(f);
 
 	return dev;
 }
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 226c91fe31a7..adb591b1d071 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1308,9 +1308,9 @@ static noinline int __btrfs_ioctl_snap_create(struct file *file,
 		ret = btrfs_mksubvol(&file->f_path, idmap, name,
 				     namelen, NULL, readonly, inherit);
 	} else {
-		struct fd src = fdget(fd);
+		CLASS(fd, src)(fd);
 		struct inode *src_inode;
-		if (!fd_file(src)) {
+		if (fd_empty(src)) {
 			ret = -EINVAL;
 			goto out_drop_write;
 		}
@@ -1341,7 +1341,6 @@ static noinline int __btrfs_ioctl_snap_create(struct file *file,
 					       BTRFS_I(src_inode)->root,
 					       readonly, inherit);
 		}
-		fdput(src);
 	}
 out_drop_write:
 	mnt_drop_write_file(file);
diff --git a/fs/eventfd.c b/fs/eventfd.c
index 22c934f3a080..76129bfcd663 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -347,13 +347,10 @@ EXPORT_SYMBOL_GPL(eventfd_fget);
  */
 struct eventfd_ctx *eventfd_ctx_fdget(int fd)
 {
-	struct eventfd_ctx *ctx;
-	struct fd f = fdget(fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return ERR_PTR(-EBADF);
-	ctx = eventfd_ctx_fileget(fd_file(f));
-	fdput(f);
-	return ctx;
+	return eventfd_ctx_fileget(fd_file(f));
 }
 EXPORT_SYMBOL_GPL(eventfd_ctx_fdget);
 
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1ae4542f0bd8..4607dcbc2851 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2254,25 +2254,22 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 {
 	int error;
 	int full_check = 0;
-	struct fd f, tf;
 	struct eventpoll *ep;
 	struct epitem *epi;
 	struct eventpoll *tep = NULL;
 
-	error = -EBADF;
-	f = fdget(epfd);
-	if (!fd_file(f))
-		goto error_return;
+	CLASS(fd, f)(epfd);
+	if (fd_empty(f))
+		return -EBADF;
 
 	/* Get the "struct file *" for the target file */
-	tf = fdget(fd);
-	if (!fd_file(tf))
-		goto error_fput;
+	CLASS(fd, tf)(fd);
+	if (fd_empty(tf))
+		return -EBADF;
 
 	/* The target file descriptor must support poll */
-	error = -EPERM;
 	if (!file_can_poll(fd_file(tf)))
-		goto error_tgt_fput;
+		return -EPERM;
 
 	/* Check if EPOLLWAKEUP is allowed */
 	if (ep_op_has_event(op))
@@ -2391,12 +2388,6 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 		loop_check_gen++;
 		mutex_unlock(&epnested_mutex);
 	}
-
-	fdput(tf);
-error_fput:
-	fdput(f);
-error_return:
-
 	return error;
 }
 
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 82df28d45cd7..5f801139358e 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -139,12 +139,11 @@ static int get_path_from_fd(int fd, struct path *root)
 		path_get(root);
 		spin_unlock(&fs->lock);
 	} else {
-		struct fd f = fdget(fd);
-		if (!fd_file(f))
+		CLASS(fd, f)(fd);
+		if (fd_empty(f))
 			return -EBADF;
 		*root = fd_file(f)->f_path;
 		path_get(root);
-		fdput(f);
 	}
 
 	return 0;
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 6e0c954388d4..638a36be31c1 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -231,11 +231,11 @@ static int ioctl_fiemap(struct file *filp, struct fiemap __user *ufiemap)
 static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
 			     u64 off, u64 olen, u64 destoff)
 {
-	struct fd src_file = fdget(srcfd);
+	CLASS(fd, src_file)(srcfd);
 	loff_t cloned;
 	int ret;
 
-	if (!fd_file(src_file))
+	if (fd_empty(src_file))
 		return -EBADF;
 	cloned = vfs_clone_file_range(fd_file(src_file), off, dst_file, destoff,
 				      olen, 0);
@@ -245,7 +245,6 @@ static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
 		ret = -EINVAL;
 	else
 		ret = 0;
-	fdput(src_file);
 	return ret;
 }
 
@@ -892,22 +891,20 @@ static int do_vfs_ioctl(struct file *filp, unsigned int fd,
 
 SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	int error;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	error = security_file_ioctl(fd_file(f), cmd, arg);
 	if (error)
-		goto out;
+		return error;
 
 	error = do_vfs_ioctl(fd_file(f), fd, cmd, arg);
 	if (error == -ENOIOCTLCMD)
 		error = vfs_ioctl(fd_file(f), cmd, arg);
 
-out:
-	fdput(f);
 	return error;
 }
 
@@ -950,15 +947,15 @@ EXPORT_SYMBOL(compat_ptr_ioctl);
 COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
 		       compat_ulong_t, arg)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	int error;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	error = security_file_ioctl_compat(fd_file(f), cmd, arg);
 	if (error)
-		goto out;
+		return error;
 
 	switch (cmd) {
 	/* FICLONE takes an int argument, so don't use compat_ptr() */
@@ -1009,10 +1006,6 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
 			error = -ENOTTY;
 		break;
 	}
-
- out:
-	fdput(f);
-
 	return error;
 }
 #endif
diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c
index 9ff37ae650ea..de32c95d823d 100644
--- a/fs/kernel_read_file.c
+++ b/fs/kernel_read_file.c
@@ -175,15 +175,11 @@ ssize_t kernel_read_file_from_fd(int fd, loff_t offset, void **buf,
 				 size_t buf_size, size_t *file_size,
 				 enum kernel_read_file_id id)
 {
-	struct fd f = fdget(fd);
-	ssize_t ret = -EBADF;
+	CLASS(fd, f)(fd);
 
-	if (!fd_file(f) || !(fd_file(f)->f_mode & FMODE_READ))
-		goto out;
+	if (fd_empty(f) || !(fd_file(f)->f_mode & FMODE_READ))
+		return -EBADF;
 
-	ret = kernel_read_file(fd_file(f), offset, buf, buf_size, file_size, id);
-out:
-	fdput(f);
-	return ret;
+	return kernel_read_file(fd_file(f), offset, buf, buf_size, file_size, id);
 }
 EXPORT_SYMBOL_GPL(kernel_read_file_from_fd);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9644bc72e457..07c5ffc8523b 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1003,22 +1003,17 @@ static int fanotify_find_path(int dfd, const char __user *filename,
 		 dfd, filename, flags);
 
 	if (filename == NULL) {
-		struct fd f = fdget(dfd);
+		CLASS(fd, f)(dfd);
 
-		ret = -EBADF;
-		if (!fd_file(f))
-			goto out;
+		if (fd_empty(f))
+			return -EBADF;
 
-		ret = -ENOTDIR;
 		if ((flags & FAN_MARK_ONLYDIR) &&
-		    !(S_ISDIR(file_inode(fd_file(f))->i_mode))) {
-			fdput(f);
-			goto out;
-		}
+		    !(S_ISDIR(file_inode(fd_file(f))->i_mode)))
+			return -ENOTDIR;
 
 		*path = fd_file(f)->f_path;
 		path_get(path);
-		fdput(f);
 	} else {
 		unsigned int lookup_flags = 0;
 
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 0794dcaf1e47..dc645af2a6ad 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -794,33 +794,26 @@ SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd)
 {
 	struct fsnotify_group *group;
 	struct inotify_inode_mark *i_mark;
-	struct fd f;
-	int ret = -EINVAL;
+	CLASS(fd, f)(fd);
 
-	f = fdget(fd);
-	if (unlikely(!fd_file(f)))
+	if (fd_empty(f))
 		return -EBADF;
 
 	/* verify that this is indeed an inotify instance */
 	if (unlikely(fd_file(f)->f_op != &inotify_fops))
-		goto out;
+		return -EINVAL;
 
 	group = fd_file(f)->private_data;
 
 	i_mark = inotify_idr_find(group, wd);
 	if (unlikely(!i_mark))
-		goto out;
-
-	ret = 0;
+		return -EINVAL;
 
 	fsnotify_destroy_mark(&i_mark->fsn_mark, group);
 
 	/* match ref taken by inotify_idr_find */
 	fsnotify_put_mark(&i_mark->fsn_mark);
-
-out:
-	fdput(f);
-	return ret;
+	return 0;
 }
 
 /*
diff --git a/fs/open.c b/fs/open.c
index a0c1fa3f60d5..24d22f4222f0 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -349,14 +349,12 @@ EXPORT_SYMBOL_GPL(vfs_fallocate);
 
 int ksys_fallocate(int fd, int mode, loff_t offset, loff_t len)
 {
-	struct fd f = fdget(fd);
-	int error = -EBADF;
+	CLASS(fd, f)(fd);
 
-	if (fd_file(f)) {
-		error = vfs_fallocate(fd_file(f), mode, offset, len);
-		fdput(f);
-	}
-	return error;
+	if (fd_empty(f))
+		return -EBADF;
+
+	return vfs_fallocate(fd_file(f), mode, offset, len);
 }
 
 SYSCALL_DEFINE4(fallocate, int, fd, int, mode, loff_t, offset, loff_t, len)
@@ -666,14 +664,12 @@ int vfs_fchmod(struct file *file, umode_t mode)
 
 SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
 {
-	struct fd f = fdget(fd);
-	int err = -EBADF;
+	CLASS(fd, f)(fd);
 
-	if (fd_file(f)) {
-		err = vfs_fchmod(fd_file(f), mode);
-		fdput(f);
-	}
-	return err;
+	if (fd_empty(f))
+		return -EBADF;
+
+	return vfs_fchmod(fd_file(f), mode);
 }
 
 static int do_fchmodat(int dfd, const char __user *filename, umode_t mode,
@@ -860,14 +856,12 @@ int vfs_fchown(struct file *file, uid_t user, gid_t group)
 
 int ksys_fchown(unsigned int fd, uid_t user, gid_t group)
 {
-	struct fd f = fdget(fd);
-	int error = -EBADF;
+	CLASS(fd, f)(fd);
 
-	if (fd_file(f)) {
-		error = vfs_fchown(fd_file(f), user, group);
-		fdput(f);
-	}
-	return error;
+	if (fd_empty(f))
+		return -EBADF;
+
+	return vfs_fchown(fd_file(f), user, group);
 }
 
 SYSCALL_DEFINE3(fchown, unsigned int, fd, uid_t, user, gid_t, group)
diff --git a/fs/read_write.c b/fs/read_write.c
index ef3ee3725714..5e3df2d39283 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1663,36 +1663,32 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
 {
 	loff_t pos_in;
 	loff_t pos_out;
-	struct fd f_in;
-	struct fd f_out;
 	ssize_t ret = -EBADF;
 
-	f_in = fdget(fd_in);
-	if (!fd_file(f_in))
-		goto out2;
+	CLASS(fd, f_in)(fd_in);
+	if (fd_empty(f_in))
+		return -EBADF;
 
-	f_out = fdget(fd_out);
-	if (!fd_file(f_out))
-		goto out1;
+	CLASS(fd, f_out)(fd_out);
+	if (fd_empty(f_out))
+		return -EBADF;
 
-	ret = -EFAULT;
 	if (off_in) {
 		if (copy_from_user(&pos_in, off_in, sizeof(loff_t)))
-			goto out;
+			return -EFAULT;
 	} else {
 		pos_in = fd_file(f_in)->f_pos;
 	}
 
 	if (off_out) {
 		if (copy_from_user(&pos_out, off_out, sizeof(loff_t)))
-			goto out;
+			return -EFAULT;
 	} else {
 		pos_out = fd_file(f_out)->f_pos;
 	}
 
-	ret = -EINVAL;
 	if (flags != 0)
-		goto out;
+		return -EINVAL;
 
 	ret = vfs_copy_file_range(fd_file(f_in), pos_in, fd_file(f_out), pos_out, len,
 				  flags);
@@ -1714,12 +1710,6 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
 			fd_file(f_out)->f_pos = pos_out;
 		}
 	}
-
-out:
-	fdput(f_out);
-out1:
-	fdput(f_in);
-out2:
 	return ret;
 }
 
diff --git a/fs/signalfd.c b/fs/signalfd.c
index 736bebf93591..d1a5f43ce466 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -288,20 +288,17 @@ static int do_signalfd4(int ufd, sigset_t *mask, int flags)
 
 		fd_install(ufd, file);
 	} else {
-		struct fd f = fdget(ufd);
-		if (!fd_file(f))
+		CLASS(fd, f)(ufd);
+		if (fd_empty(f))
 			return -EBADF;
 		ctx = fd_file(f)->private_data;
-		if (fd_file(f)->f_op != &signalfd_fops) {
-			fdput(f);
+		if (fd_file(f)->f_op != &signalfd_fops)
 			return -EINVAL;
-		}
 		spin_lock_irq(&current->sighand->siglock);
 		ctx->sigmask = *mask;
 		spin_unlock_irq(&current->sighand->siglock);
 
 		wake_up(&current->sighand->signalfd_wqh);
-		fdput(f);
 	}
 
 	return ufd;
diff --git a/fs/sync.c b/fs/sync.c
index 67df255eb189..2955cd4c77a3 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -148,11 +148,11 @@ void emergency_sync(void)
  */
 SYSCALL_DEFINE1(syncfs, int, fd)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	struct super_block *sb;
 	int ret, ret2;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 	sb = fd_file(f)->f_path.dentry->d_sb;
 
@@ -162,7 +162,6 @@ SYSCALL_DEFINE1(syncfs, int, fd)
 
 	ret2 = errseq_check_and_advance(&sb->s_wb_err, &fd_file(f)->f_sb_err);
 
-	fdput(f);
 	return ret ? ret : ret2;
 }
 
@@ -205,14 +204,12 @@ EXPORT_SYMBOL(vfs_fsync);
 
 static int do_fsync(unsigned int fd, int datasync)
 {
-	struct fd f = fdget(fd);
-	int ret = -EBADF;
+	CLASS(fd, f)(fd);
 
-	if (fd_file(f)) {
-		ret = vfs_fsync(fd_file(f), datasync);
-		fdput(f);
-	}
-	return ret;
+	if (fd_empty(f))
+		return -EBADF;
+
+	return vfs_fsync(fd_file(f), datasync);
 }
 
 SYSCALL_DEFINE1(fsync, unsigned int, fd)
@@ -355,16 +352,12 @@ int sync_file_range(struct file *file, loff_t offset, loff_t nbytes,
 int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
 			 unsigned int flags)
 {
-	int ret;
-	struct fd f;
+	CLASS(fd, f)(fd);
 
-	ret = -EBADF;
-	f = fdget(fd);
-	if (fd_file(f))
-		ret = sync_file_range(fd_file(f), offset, nbytes, flags);
+	if (fd_empty(f))
+		return -EBADF;
 
-	fdput(f);
-	return ret;
+	return sync_file_range(fd_file(f), offset, nbytes, flags);
 }
 
 SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index a26593979887..d5f0c3d9c35f 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -106,29 +106,21 @@ static struct io_sq_data *io_attach_sq_data(struct io_uring_params *p)
 {
 	struct io_ring_ctx *ctx_attach;
 	struct io_sq_data *sqd;
-	struct fd f;
+	CLASS(fd, f)(p->wq_fd);
 
-	f = fdget(p->wq_fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return ERR_PTR(-ENXIO);
-	if (!io_is_uring_fops(fd_file(f))) {
-		fdput(f);
+	if (!io_is_uring_fops(fd_file(f)))
 		return ERR_PTR(-EINVAL);
-	}
 
 	ctx_attach = fd_file(f)->private_data;
 	sqd = ctx_attach->sq_data;
-	if (!sqd) {
-		fdput(f);
+	if (!sqd)
 		return ERR_PTR(-EINVAL);
-	}
-	if (sqd->task_tgid != current->tgid) {
-		fdput(f);
+	if (sqd->task_tgid != current->tgid)
 		return ERR_PTR(-EPERM);
-	}
 
 	refcount_inc(&sqd->refs);
-	fdput(f);
 	return sqd;
 }
 
@@ -417,16 +409,11 @@ __cold int io_sq_offload_create(struct io_ring_ctx *ctx,
 	/* Retain compatibility with failing for an invalid attach attempt */
 	if ((ctx->flags & (IORING_SETUP_ATTACH_WQ | IORING_SETUP_SQPOLL)) ==
 				IORING_SETUP_ATTACH_WQ) {
-		struct fd f;
-
-		f = fdget(p->wq_fd);
-		if (!fd_file(f))
+		CLASS(fd, f)(p->wq_fd);
+		if (fd_empty(f))
 			return -ENXIO;
-		if (!io_is_uring_fops(fd_file(f))) {
-			fdput(f);
+		if (!io_is_uring_fops(fd_file(f)))
 			return -EINVAL;
-		}
-		fdput(f);
 	}
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
 		struct task_struct *tsk;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 85b209626dd7..075ce7299973 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -966,22 +966,20 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
 {
 	struct perf_cgroup *cgrp;
 	struct cgroup_subsys_state *css;
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	int ret = 0;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	css = css_tryget_online_from_dir(fd_file(f)->f_path.dentry,
 					 &perf_event_cgrp_subsys);
-	if (IS_ERR(css)) {
-		ret = PTR_ERR(css);
-		goto out;
-	}
+	if (IS_ERR(css))
+		return PTR_ERR(css);
 
 	ret = perf_cgroup_ensure_storage(event, css);
 	if (ret)
-		goto out;
+		return ret;
 
 	cgrp = container_of(css, struct perf_cgroup, css);
 	event->cgrp = cgrp;
@@ -995,8 +993,6 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
 		perf_detach_cgroup(event);
 		ret = -EINVAL;
 	}
-out:
-	fdput(f);
 	return ret;
 }
 
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index dc952c3b05af..c9d97ed20122 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -545,12 +545,12 @@ static void commit_nsset(struct nsset *nsset)
 
 SYSCALL_DEFINE2(setns, int, fd, int, flags)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	struct ns_common *ns = NULL;
 	struct nsset nsset = {};
 	int err = 0;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	if (proc_ns_file(fd_file(f))) {
@@ -580,7 +580,6 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags)
 	}
 	put_nsset(&nsset);
 out:
-	fdput(f);
 	return err;
 }
 
diff --git a/kernel/pid.c b/kernel/pid.c
index 2715afb77eab..b5bbc1a8a6e4 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -536,11 +536,10 @@ EXPORT_SYMBOL_GPL(find_ge_pid);
 
 struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
 {
-	struct fd f;
+	CLASS(fd, f)(fd);
 	struct pid *pid;
 
-	f = fdget(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return ERR_PTR(-EBADF);
 
 	pid = pidfd_pid(fd_file(f));
@@ -548,8 +547,6 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
 		get_pid(pid);
 		*flags = fd_file(f)->f_flags;
 	}
-
-	fdput(f);
 	return pid;
 }
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 4da31f28fda8..ebe10c27a9f4 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1911,12 +1911,11 @@ SYSCALL_DEFINE1(umask, int, mask)
 
 static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 {
-	struct fd exe;
+	CLASS(fd, exe)(fd);
 	struct inode *inode;
 	int err;
 
-	exe = fdget(fd);
-	if (!fd_file(exe))
+	if (fd_empty(exe))
 		return -EBADF;
 
 	inode = file_inode(fd_file(exe));
@@ -1926,18 +1925,14 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 	 * sure that this one is executable as well, to avoid breaking an
 	 * overall picture.
 	 */
-	err = -EACCES;
 	if (!S_ISREG(inode->i_mode) || path_noexec(&fd_file(exe)->f_path))
-		goto exit;
+		return -EACCES;
 
 	err = file_permission(fd_file(exe), MAY_EXEC);
 	if (err)
-		goto exit;
+		return err;
 
-	err = replace_mm_exe_file(mm, fd_file(exe));
-exit:
-	fdput(exe);
-	return err;
+	return replace_mm_exe_file(mm, fd_file(exe));
 }
 
 /*
diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
index d36242fd4936..1895fbc32bcb 100644
--- a/kernel/watch_queue.c
+++ b/kernel/watch_queue.c
@@ -663,16 +663,14 @@ struct watch_queue *get_watch_queue(int fd)
 {
 	struct pipe_inode_info *pipe;
 	struct watch_queue *wqueue = ERR_PTR(-EINVAL);
-	struct fd f;
+	CLASS(fd, f)(fd);
 
-	f = fdget(fd);
-	if (fd_file(f)) {
+	if (!fd_empty(f)) {
 		pipe = get_pipe_info(fd_file(f), false);
 		if (pipe && pipe->watch_queue) {
 			wqueue = pipe->watch_queue;
 			kref_get(&wqueue->usage);
 		}
-		fdput(f);
 	}
 
 	return wqueue;
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 532dee205c6e..588fe76c5a14 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -190,16 +190,12 @@ EXPORT_SYMBOL(vfs_fadvise);
 
 int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
 {
-	struct fd f = fdget(fd);
-	int ret;
+	CLASS(fd, f)(fd);
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
-	ret = vfs_fadvise(fd_file(f), offset, len, advice);
-
-	fdput(f);
-	return ret;
+	return vfs_fadvise(fd_file(f), offset, len, advice);
 }
 
 SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
diff --git a/mm/readahead.c b/mm/readahead.c
index 3dc6c7a128dd..9a807727d809 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -673,29 +673,22 @@ EXPORT_SYMBOL_GPL(page_cache_async_ra);
 
 ssize_t ksys_readahead(int fd, loff_t offset, size_t count)
 {
-	ssize_t ret;
-	struct fd f;
+	CLASS(fd, f)(fd);
 
-	ret = -EBADF;
-	f = fdget(fd);
-	if (!fd_file(f) || !(fd_file(f)->f_mode & FMODE_READ))
-		goto out;
+	if (fd_empty(f) || !(fd_file(f)->f_mode & FMODE_READ))
+		return -EBADF;
 
 	/*
 	 * The readahead() syscall is intended to run only on files
 	 * that can execute readahead. If readahead is not possible
 	 * on this file, then we must return -EINVAL.
 	 */
-	ret = -EINVAL;
 	if (!fd_file(f)->f_mapping || !fd_file(f)->f_mapping->a_ops ||
 	    (!S_ISREG(file_inode(fd_file(f))->i_mode) &&
 	    !S_ISBLK(file_inode(fd_file(f))->i_mode)))
-		goto out;
+		return -EINVAL;
 
-	ret = vfs_fadvise(fd_file(f), offset, count, POSIX_FADV_WILLNEED);
-out:
-	fdput(f);
-	return ret;
+	return vfs_fadvise(fd_file(f), offset, count, POSIX_FADV_WILLNEED);
 }
 
 SYSCALL_DEFINE3(readahead, int, fd, loff_t, offset, size_t, count)
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index e39479f1c9a4..b231b27d8268 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -694,20 +694,18 @@ EXPORT_SYMBOL_GPL(get_net_ns);
 
 struct net *get_net_ns_by_fd(int fd)
 {
-	struct fd f = fdget(fd);
-	struct net *net = ERR_PTR(-EINVAL);
+	CLASS(fd, f)(fd);
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return ERR_PTR(-EBADF);
 
 	if (proc_ns_file(fd_file(f))) {
 		struct ns_common *ns = get_proc_ns(file_inode(fd_file(f)));
 		if (ns->ops == &netns_operations)
-			net = get_net(container_of(ns, struct net, ns));
+			return get_net(container_of(ns, struct net, ns));
 	}
-	fdput(f);
 
-	return net;
+	return ERR_PTR(-EINVAL);
 }
 EXPORT_SYMBOL_GPL(get_net_ns_by_fd);
 #endif
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index f32eb38abd0f..f937f748d9e8 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -241,31 +241,21 @@ SYSCALL_DEFINE3(landlock_create_ruleset,
 static struct landlock_ruleset *get_ruleset_from_fd(const int fd,
 						    const fmode_t mode)
 {
-	struct fd ruleset_f;
+	CLASS(fd, ruleset_f)(fd);
 	struct landlock_ruleset *ruleset;
 
-	ruleset_f = fdget(fd);
-	if (!fd_file(ruleset_f))
+	if (fd_empty(ruleset_f))
 		return ERR_PTR(-EBADF);
 
 	/* Checks FD type and access right. */
-	if (fd_file(ruleset_f)->f_op != &ruleset_fops) {
-		ruleset = ERR_PTR(-EBADFD);
-		goto out_fdput;
-	}
-	if (!(fd_file(ruleset_f)->f_mode & mode)) {
-		ruleset = ERR_PTR(-EPERM);
-		goto out_fdput;
-	}
+	if (fd_file(ruleset_f)->f_op != &ruleset_fops)
+		return ERR_PTR(-EBADFD);
+	if (!(fd_file(ruleset_f)->f_mode & mode))
+		return ERR_PTR(-EPERM);
 	ruleset = fd_file(ruleset_f)->private_data;
-	if (WARN_ON_ONCE(ruleset->num_layers != 1)) {
-		ruleset = ERR_PTR(-EINVAL);
-		goto out_fdput;
-	}
+	if (WARN_ON_ONCE(ruleset->num_layers != 1))
+		return ERR_PTR(-EINVAL);
 	landlock_get_ruleset(ruleset);
-
-out_fdput:
-	fdput(ruleset_f);
 	return ruleset;
 }
 
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 388ae471d258..53262b8a7656 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -190,11 +190,10 @@ static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
 {
 	struct kvm_vfio *kv = dev->private;
 	struct kvm_vfio_file *kvf;
-	struct fd f;
+	CLASS(fd, f)(fd);
 	int ret;
 
-	f = fdget(fd);
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	ret = -ENOENT;
@@ -220,9 +219,6 @@ static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
 	kvm_vfio_update_coherency(dev);
 
 	mutex_unlock(&kv->lock);
-
-	fdput(f);
-
 	return ret;
 }
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 14/28] fdget(), trivial conversions
  2024-11-02  5:08     ` [PATCH v3 14/28] fdget(), trivial conversions Al Viro
@ 2024-11-11 17:22       ` Francesco Lavra
  0 siblings, 0 replies; 134+ messages in thread
From: Francesco Lavra @ 2024-11-11 17:22 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel; +Cc: brauner, cgroups, kvm, netdev, torvalds

On Sat, 2024-11-02 at 05:08 +0000, Al Viro wrote:
> fdget() is the first thing done in scope, all matching fdput() are
> immediately followed by leaving the scope.
> 
> Reviewed-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  arch/powerpc/kvm/book3s_64_vio.c           | 21 +++---------
>  arch/powerpc/kvm/powerpc.c                 | 24 ++++---------
>  arch/powerpc/platforms/cell/spu_syscalls.c |  6 ++--
>  arch/x86/kernel/cpu/sgx/main.c             | 10 ++----
>  arch/x86/kvm/svm/sev.c                     | 39 ++++++++------------
> --
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c  | 23 ++++---------
>  drivers/gpu/drm/drm_syncobj.c              |  9 ++---
>  drivers/media/rc/lirc_dev.c                | 13 +++-----
>  fs/btrfs/ioctl.c                           |  5 ++-
>  fs/eventfd.c                               |  9 ++---
>  fs/eventpoll.c                             | 23 ++++---------
>  fs/fhandle.c                               |  5 ++-
>  fs/ioctl.c                                 | 23 +++++--------
>  fs/kernel_read_file.c                      | 12 +++----
>  fs/notify/fanotify/fanotify_user.c         | 15 +++------
>  fs/notify/inotify/inotify_user.c           | 17 +++-------
>  fs/open.c                                  | 36 +++++++++-----------
>  fs/read_write.c                            | 28 +++++-----------
>  fs/signalfd.c                              |  9 ++---
>  fs/sync.c                                  | 29 ++++++----------
>  io_uring/sqpoll.c                          | 29 +++++-----------
>  kernel/events/core.c                       | 14 +++-----
>  kernel/nsproxy.c                           |  5 ++-
>  kernel/pid.c                               |  7 ++--
>  kernel/sys.c                               | 15 +++------
>  kernel/watch_queue.c                       |  6 ++--
>  mm/fadvise.c                               | 10 ++----
>  mm/readahead.c                             | 17 +++-------
>  net/core/net_namespace.c                   | 10 +++---
>  security/landlock/syscalls.c               | 26 +++++----------
>  virt/kvm/vfio.c                            |  8 ++---
>  31 files changed, 164 insertions(+), 339 deletions(-)

[...]

> diff --git a/fs/read_write.c b/fs/read_write.c
> index ef3ee3725714..5e3df2d39283 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1663,36 +1663,32 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in,
> loff_t __user *, off_in,
>  {
>         loff_t pos_in;
>         loff_t pos_out;
> -       struct fd f_in;
> -       struct fd f_out;
>         ssize_t ret = -EBADF;

This initialization is no longer needed.

>  
> -       f_in = fdget(fd_in);
> -       if (!fd_file(f_in))
> -               goto out2;
> +       CLASS(fd, f_in)(fd_in);
> +       if (fd_empty(f_in))
> +               return -EBADF;
>  
> -       f_out = fdget(fd_out);
> -       if (!fd_file(f_out))
> -               goto out1;
> +       CLASS(fd, f_out)(fd_out);
> +       if (fd_empty(f_out))
> +               return -EBADF;
>  
> -       ret = -EFAULT;
>         if (off_in) {
>                 if (copy_from_user(&pos_in, off_in, sizeof(loff_t)))
> -                       goto out;
> +                       return -EFAULT;
>         } else {
>                 pos_in = fd_file(f_in)->f_pos;
>         }
>  
>         if (off_out) {
>                 if (copy_from_user(&pos_out, off_out,
> sizeof(loff_t)))
> -                       goto out;
> +                       return -EFAULT;
>         } else {
>                 pos_out = fd_file(f_out)->f_pos;
>         }
>  
> -       ret = -EINVAL;
>         if (flags != 0)
> -               goto out;
> +               return -EINVAL;
>  
>         ret = vfs_copy_file_range(fd_file(f_in), pos_in,
> fd_file(f_out), pos_out, len,
>                                   flags);


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v3 15/28] fdget(), more trivial conversions
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (12 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 14/28] fdget(), trivial conversions Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 16/28] convert do_preadv()/do_pwritev() Al Viro
                       ` (13 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

all failure exits prior to fdget() leave the scope, all matching fdput()
are immediately followed by leaving the scope.

[xfs_ioc_commit_range() chunk moved here as well]

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/infiniband/core/ucma.c     | 19 +++-----
 drivers/vfio/group.c               |  6 +--
 fs/eventpoll.c                     | 15 ++----
 fs/ext4/ioctl.c                    | 21 +++------
 fs/f2fs/file.c                     | 15 ++----
 fs/fsopen.c                        | 19 +++-----
 fs/fuse/dev.c                      |  6 +--
 fs/locks.c                         | 15 ++----
 fs/namespace.c                     | 47 ++++++------------
 fs/notify/fanotify/fanotify_user.c | 29 +++++-------
 fs/notify/inotify/inotify_user.c   | 21 +++------
 fs/ocfs2/cluster/heartbeat.c       | 13 ++---
 fs/open.c                          | 12 ++---
 fs/read_write.c                    | 71 ++++++++++------------------
 fs/splice.c                        | 45 +++++++-----------
 fs/utimes.c                        | 11 ++---
 fs/xfs/xfs_exchrange.c             | 18 ++-----
 fs/xfs/xfs_ioctl.c                 | 69 +++++++++------------------
 ipc/mqueue.c                       | 76 +++++++++---------------------
 kernel/module/main.c               | 11 ++---
 kernel/pid.c                       | 13 ++---
 kernel/signal.c                    | 29 ++++--------
 kernel/taskstats.c                 | 18 +++----
 security/integrity/ima/ima_main.c  |  7 +--
 security/loadpin/loadpin.c         |  8 +---
 virt/kvm/vfio.c                    |  6 +--
 26 files changed, 202 insertions(+), 418 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 5dbb248e9625..02f1666f3cba 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1615,7 +1615,6 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
 	struct ucma_event *uevent, *tmp;
 	struct ucma_context *ctx;
 	LIST_HEAD(event_list);
-	struct fd f;
 	struct ucma_file *cur_file;
 	int ret = 0;
 
@@ -1623,21 +1622,17 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
 		return -EFAULT;
 
 	/* Get current fd to protect against it being closed */
-	f = fdget(cmd.fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(cmd.fd);
+	if (fd_empty(f))
 		return -ENOENT;
-	if (fd_file(f)->f_op != &ucma_fops) {
-		ret = -EINVAL;
-		goto file_put;
-	}
+	if (fd_file(f)->f_op != &ucma_fops)
+		return -EINVAL;
 	cur_file = fd_file(f)->private_data;
 
 	/* Validate current fd and prevent destruction of id. */
 	ctx = ucma_get_ctx(cur_file, cmd.id);
-	if (IS_ERR(ctx)) {
-		ret = PTR_ERR(ctx);
-		goto file_put;
-	}
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
 
 	rdma_lock_handler(ctx->cm_id);
 	/*
@@ -1678,8 +1673,6 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
 err_unlock:
 	rdma_unlock_handler(ctx->cm_id);
 	ucma_put_ctx(ctx);
-file_put:
-	fdput(f);
 	return ret;
 }
 
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 95b336de8a17..49559605177e 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -104,15 +104,14 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 {
 	struct vfio_container *container;
 	struct iommufd_ctx *iommufd;
-	struct fd f;
 	int ret;
 	int fd;
 
 	if (get_user(fd, arg))
 		return -EFAULT;
 
-	f = fdget(fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	mutex_lock(&group->group_lock);
@@ -153,7 +152,6 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 
 out_unlock:
 	mutex_unlock(&group->group_lock);
-	fdput(f);
 	return ret;
 }
 
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4607dcbc2851..7873d75a43cb 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2415,8 +2415,6 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 static int do_epoll_wait(int epfd, struct epoll_event __user *events,
 			 int maxevents, struct timespec64 *to)
 {
-	int error;
-	struct fd f;
 	struct eventpoll *ep;
 
 	/* The maximum number of event must be greater than zero */
@@ -2428,17 +2426,16 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
 		return -EFAULT;
 
 	/* Get the "struct file *" for the eventpoll file */
-	f = fdget(epfd);
-	if (!fd_file(f))
+	CLASS(fd, f)(epfd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	/*
 	 * We have to check that the file structure underneath the fd
 	 * the user passed to us _is_ an eventpoll file.
 	 */
-	error = -EINVAL;
 	if (!is_file_epoll(fd_file(f)))
-		goto error_fput;
+		return -EINVAL;
 
 	/*
 	 * At this point it is safe to assume that the "private_data" contains
@@ -2447,11 +2444,7 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
 	ep = fd_file(f)->private_data;
 
 	/* Time to fish for events ... */
-	error = ep_poll(ep, events, maxevents, to);
-
-error_fput:
-	fdput(f);
-	return error;
+	return ep_poll(ep, events, maxevents, to);
 }
 
 SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 1c77400bd88e..7b9ce71c1c81 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -1330,7 +1330,6 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 
 	case EXT4_IOC_MOVE_EXT: {
 		struct move_extent me;
-		struct fd donor;
 		int err;
 
 		if (!(filp->f_mode & FMODE_READ) ||
@@ -1342,30 +1341,26 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 			return -EFAULT;
 		me.moved_len = 0;
 
-		donor = fdget(me.donor_fd);
-		if (!fd_file(donor))
+		CLASS(fd, donor)(me.donor_fd);
+		if (fd_empty(donor))
 			return -EBADF;
 
-		if (!(fd_file(donor)->f_mode & FMODE_WRITE)) {
-			err = -EBADF;
-			goto mext_out;
-		}
+		if (!(fd_file(donor)->f_mode & FMODE_WRITE))
+			return -EBADF;
 
 		if (ext4_has_feature_bigalloc(sb)) {
 			ext4_msg(sb, KERN_ERR,
 				 "Online defrag not supported with bigalloc");
-			err = -EOPNOTSUPP;
-			goto mext_out;
+			return -EOPNOTSUPP;
 		} else if (IS_DAX(inode)) {
 			ext4_msg(sb, KERN_ERR,
 				 "Online defrag not supported with DAX");
-			err = -EOPNOTSUPP;
-			goto mext_out;
+			return -EOPNOTSUPP;
 		}
 
 		err = mnt_want_write_file(filp);
 		if (err)
-			goto mext_out;
+			return err;
 
 		err = ext4_move_extents(filp, fd_file(donor), me.orig_start,
 					me.donor_start, me.len, &me.moved_len);
@@ -1374,8 +1369,6 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		if (copy_to_user((struct move_extent __user *)arg,
 				 &me, sizeof(me)))
 			err = -EFAULT;
-mext_out:
-		fdput(donor);
 		return err;
 	}
 
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 9ae54c4c72fe..8ba0b6d47c8c 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3038,32 +3038,27 @@ static int f2fs_move_file_range(struct file *file_in, loff_t pos_in,
 static int __f2fs_ioc_move_range(struct file *filp,
 				struct f2fs_move_range *range)
 {
-	struct fd dst;
 	int err;
 
 	if (!(filp->f_mode & FMODE_READ) ||
 			!(filp->f_mode & FMODE_WRITE))
 		return -EBADF;
 
-	dst = fdget(range->dst_fd);
-	if (!fd_file(dst))
+	CLASS(fd, dst)(range->dst_fd);
+	if (fd_empty(dst))
 		return -EBADF;
 
-	if (!(fd_file(dst)->f_mode & FMODE_WRITE)) {
-		err = -EBADF;
-		goto err_out;
-	}
+	if (!(fd_file(dst)->f_mode & FMODE_WRITE))
+		return -EBADF;
 
 	err = mnt_want_write_file(filp);
 	if (err)
-		goto err_out;
+		return err;
 
 	err = f2fs_move_file_range(filp, range->pos_in, fd_file(dst),
 					range->pos_out, range->len);
 
 	mnt_drop_write_file(filp);
-err_out:
-	fdput(dst);
 	return err;
 }
 
diff --git a/fs/fsopen.c b/fs/fsopen.c
index 6cef3deccded..094a7f510edf 100644
--- a/fs/fsopen.c
+++ b/fs/fsopen.c
@@ -349,7 +349,6 @@ SYSCALL_DEFINE5(fsconfig,
 		int, aux)
 {
 	struct fs_context *fc;
-	struct fd f;
 	int ret;
 	int lookup_flags = 0;
 
@@ -392,12 +391,11 @@ SYSCALL_DEFINE5(fsconfig,
 		return -EOPNOTSUPP;
 	}
 
-	f = fdget(fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return -EBADF;
-	ret = -EINVAL;
 	if (fd_file(f)->f_op != &fscontext_fops)
-		goto out_f;
+		return -EINVAL;
 
 	fc = fd_file(f)->private_data;
 	if (fc->ops == &legacy_fs_context_ops) {
@@ -407,17 +405,14 @@ SYSCALL_DEFINE5(fsconfig,
 		case FSCONFIG_SET_PATH_EMPTY:
 		case FSCONFIG_SET_FD:
 		case FSCONFIG_CMD_CREATE_EXCL:
-			ret = -EOPNOTSUPP;
-			goto out_f;
+			return -EOPNOTSUPP;
 		}
 	}
 
 	if (_key) {
 		param.key = strndup_user(_key, 256);
-		if (IS_ERR(param.key)) {
-			ret = PTR_ERR(param.key);
-			goto out_f;
-		}
+		if (IS_ERR(param.key))
+			return PTR_ERR(param.key);
 	}
 
 	switch (cmd) {
@@ -496,7 +491,5 @@ SYSCALL_DEFINE5(fsconfig,
 	}
 out_key:
 	kfree(param.key);
-out_f:
-	fdput(f);
 	return ret;
 }
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 1f64ae6d7a69..0723c6344b20 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2371,13 +2371,12 @@ static long fuse_dev_ioctl_clone(struct file *file, __u32 __user *argp)
 	int res;
 	int oldfd;
 	struct fuse_dev *fud = NULL;
-	struct fd f;
 
 	if (get_user(oldfd, argp))
 		return -EFAULT;
 
-	f = fdget(oldfd);
-	if (!fd_file(f))
+	CLASS(fd, f)(oldfd);
+	if (fd_empty(f))
 		return -EINVAL;
 
 	/*
@@ -2394,7 +2393,6 @@ static long fuse_dev_ioctl_clone(struct file *file, __u32 __user *argp)
 		mutex_unlock(&fuse_mutex);
 	}
 
-	fdput(f);
 	return res;
 }
 
diff --git a/fs/locks.c b/fs/locks.c
index 204847628f3e..25afc8d9c9d1 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2136,7 +2136,6 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
 {
 	int can_sleep, error, type;
 	struct file_lock fl;
-	struct fd f;
 
 	/*
 	 * LOCK_MAND locks were broken for a long time in that they never
@@ -2155,19 +2154,18 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
 	if (type < 0)
 		return type;
 
-	error = -EBADF;
-	f = fdget(fd);
-	if (!fd_file(f))
-		return error;
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
+		return -EBADF;
 
 	if (type != F_UNLCK && !(fd_file(f)->f_mode & (FMODE_READ | FMODE_WRITE)))
-		goto out_putf;
+		return -EBADF;
 
 	flock_make_lock(fd_file(f), &fl, type);
 
 	error = security_file_lock(fd_file(f), fl.c.flc_type);
 	if (error)
-		goto out_putf;
+		return error;
 
 	can_sleep = !(cmd & LOCK_NB);
 	if (can_sleep)
@@ -2181,9 +2179,6 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
 		error = locks_lock_file_wait(fd_file(f), &fl);
 
 	locks_release_private(&fl);
- out_putf:
-	fdput(f);
-
 	return error;
 }
 
diff --git a/fs/namespace.c b/fs/namespace.c
index 93c377816d75..d2eccbdd0439 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4105,7 +4105,6 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 	struct file *file;
 	struct path newmount;
 	struct mount *mnt;
-	struct fd f;
 	unsigned int mnt_flags = 0;
 	long ret;
 
@@ -4133,19 +4132,18 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 		return -EINVAL;
 	}
 
-	f = fdget(fs_fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fs_fd);
+	if (fd_empty(f))
 		return -EBADF;
 
-	ret = -EINVAL;
 	if (fd_file(f)->f_op != &fscontext_fops)
-		goto err_fsfd;
+		return -EINVAL;
 
 	fc = fd_file(f)->private_data;
 
 	ret = mutex_lock_interruptible(&fc->uapi_mutex);
 	if (ret < 0)
-		goto err_fsfd;
+		return ret;
 
 	/* There must be a valid superblock or we can't mount it */
 	ret = -EINVAL;
@@ -4212,8 +4210,6 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 	path_put(&newmount);
 err_unlock:
 	mutex_unlock(&fc->uapi_mutex);
-err_fsfd:
-	fdput(f);
 	return ret;
 }
 
@@ -4668,10 +4664,8 @@ static int do_mount_setattr(struct path *path, struct mount_kattr *kattr)
 static int build_mount_idmapped(const struct mount_attr *attr, size_t usize,
 				struct mount_kattr *kattr, unsigned int flags)
 {
-	int err = 0;
 	struct ns_common *ns;
 	struct user_namespace *mnt_userns;
-	struct fd f;
 
 	if (!((attr->attr_set | attr->attr_clr) & MOUNT_ATTR_IDMAP))
 		return 0;
@@ -4687,20 +4681,16 @@ static int build_mount_idmapped(const struct mount_attr *attr, size_t usize,
 	if (attr->userns_fd > INT_MAX)
 		return -EINVAL;
 
-	f = fdget(attr->userns_fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(attr->userns_fd);
+	if (fd_empty(f))
 		return -EBADF;
 
-	if (!proc_ns_file(fd_file(f))) {
-		err = -EINVAL;
-		goto out_fput;
-	}
+	if (!proc_ns_file(fd_file(f)))
+		return -EINVAL;
 
 	ns = get_proc_ns(file_inode(fd_file(f)));
-	if (ns->ops->type != CLONE_NEWUSER) {
-		err = -EINVAL;
-		goto out_fput;
-	}
+	if (ns->ops->type != CLONE_NEWUSER)
+		return -EINVAL;
 
 	/*
 	 * The initial idmapping cannot be used to create an idmapped
@@ -4711,22 +4701,15 @@ static int build_mount_idmapped(const struct mount_attr *attr, size_t usize,
 	 * result.
 	 */
 	mnt_userns = container_of(ns, struct user_namespace, ns);
-	if (mnt_userns == &init_user_ns) {
-		err = -EPERM;
-		goto out_fput;
-	}
+	if (mnt_userns == &init_user_ns)
+		return -EPERM;
 
 	/* We're not controlling the target namespace. */
-	if (!ns_capable(mnt_userns, CAP_SYS_ADMIN)) {
-		err = -EPERM;
-		goto out_fput;
-	}
+	if (!ns_capable(mnt_userns, CAP_SYS_ADMIN))
+		return -EPERM;
 
 	kattr->mnt_userns = get_user_ns(mnt_userns);
-
-out_fput:
-	fdput(f);
-	return err;
+	return 0;
 }
 
 static int build_mount_kattr(const struct mount_attr *attr, size_t usize,
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 07c5ffc8523b..e19b28b44805 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1677,7 +1677,6 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	struct inode *inode = NULL;
 	struct vfsmount *mnt = NULL;
 	struct fsnotify_group *group;
-	struct fd f;
 	struct path path;
 	struct fan_fsid __fsid, *fsid = NULL;
 	u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS;
@@ -1747,14 +1746,13 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 		umask = FANOTIFY_EVENT_FLAGS;
 	}
 
-	f = fdget(fanotify_fd);
-	if (unlikely(!fd_file(f)))
+	CLASS(fd, f)(fanotify_fd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	/* verify that this is indeed an fanotify instance */
-	ret = -EINVAL;
 	if (unlikely(fd_file(f)->f_op != &fanotify_fops))
-		goto fput_and_out;
+		return -EINVAL;
 	group = fd_file(f)->private_data;
 
 	/*
@@ -1762,23 +1760,21 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	 * marks.  This also includes setting up such marks by a group that
 	 * was initialized by an unprivileged user.
 	 */
-	ret = -EPERM;
 	if ((!capable(CAP_SYS_ADMIN) ||
 	     FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV)) &&
 	    mark_type != FAN_MARK_INODE)
-		goto fput_and_out;
+		return -EPERM;
 
 	/*
 	 * Permission events require minimum priority FAN_CLASS_CONTENT.
 	 */
-	ret = -EINVAL;
 	if (mask & FANOTIFY_PERM_EVENTS &&
 	    group->priority < FSNOTIFY_PRIO_CONTENT)
-		goto fput_and_out;
+		return -EINVAL;
 
 	if (mask & FAN_FS_ERROR &&
 	    mark_type != FAN_MARK_FILESYSTEM)
-		goto fput_and_out;
+		return -EINVAL;
 
 	/*
 	 * Evictable is only relevant for inode marks, because only inode object
@@ -1786,7 +1782,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	 */
 	if (flags & FAN_MARK_EVICTABLE &&
 	     mark_type != FAN_MARK_INODE)
-		goto fput_and_out;
+		return -EINVAL;
 
 	/*
 	 * Events that do not carry enough information to report
@@ -1798,7 +1794,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
 	if (mask & ~(FANOTIFY_FD_EVENTS|FANOTIFY_EVENT_FLAGS) &&
 	    (!fid_mode || mark_type == FAN_MARK_MOUNT))
-		goto fput_and_out;
+		return -EINVAL;
 
 	/*
 	 * FAN_RENAME uses special info type records to report the old and
@@ -1806,23 +1802,22 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	 * useful and was not implemented.
 	 */
 	if (mask & FAN_RENAME && !(fid_mode & FAN_REPORT_NAME))
-		goto fput_and_out;
+		return -EINVAL;
 
 	if (mark_cmd == FAN_MARK_FLUSH) {
-		ret = 0;
 		if (mark_type == FAN_MARK_MOUNT)
 			fsnotify_clear_vfsmount_marks_by_group(group);
 		else if (mark_type == FAN_MARK_FILESYSTEM)
 			fsnotify_clear_sb_marks_by_group(group);
 		else
 			fsnotify_clear_inode_marks_by_group(group);
-		goto fput_and_out;
+		return 0;
 	}
 
 	ret = fanotify_find_path(dfd, pathname, &path, flags,
 			(mask & ALL_FSNOTIFY_EVENTS), obj_type);
 	if (ret)
-		goto fput_and_out;
+		return ret;
 
 	if (mark_cmd == FAN_MARK_ADD) {
 		ret = fanotify_events_supported(group, &path, mask, flags);
@@ -1901,8 +1896,6 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 
 path_put_and_out:
 	path_put(&path);
-fput_and_out:
-	fdput(f);
 	return ret;
 }
 
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index dc645af2a6ad..e0c48956608a 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -732,7 +732,6 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 	struct fsnotify_group *group;
 	struct inode *inode;
 	struct path path;
-	struct fd f;
 	int ret;
 	unsigned flags = 0;
 
@@ -752,21 +751,17 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 	if (unlikely(!(mask & ALL_INOTIFY_BITS)))
 		return -EINVAL;
 
-	f = fdget(fd);
-	if (unlikely(!fd_file(f)))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	/* IN_MASK_ADD and IN_MASK_CREATE don't make sense together */
-	if (unlikely((mask & IN_MASK_ADD) && (mask & IN_MASK_CREATE))) {
-		ret = -EINVAL;
-		goto fput_and_out;
-	}
+	if (unlikely((mask & IN_MASK_ADD) && (mask & IN_MASK_CREATE)))
+		return -EINVAL;
 
 	/* verify that this is indeed an inotify instance */
-	if (unlikely(fd_file(f)->f_op != &inotify_fops)) {
-		ret = -EINVAL;
-		goto fput_and_out;
-	}
+	if (unlikely(fd_file(f)->f_op != &inotify_fops))
+		return -EINVAL;
 
 	if (!(mask & IN_DONT_FOLLOW))
 		flags |= LOOKUP_FOLLOW;
@@ -776,7 +771,7 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 	ret = inotify_find_inode(pathname, &path, flags,
 			(mask & IN_ALL_EVENTS));
 	if (ret)
-		goto fput_and_out;
+		return ret;
 
 	/* inode held in place by reference to path; group by fget on fd */
 	inode = path.dentry->d_inode;
@@ -785,8 +780,6 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 	/* create/update an inode mark */
 	ret = inotify_update_watch(group, inode, mask);
 	path_put(&path);
-fput_and_out:
-	fdput(f);
 	return ret;
 }
 
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index bc55340a60c3..4200a0341343 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -1765,7 +1765,6 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
 	long fd;
 	int sectsize;
 	char *p = (char *)page;
-	struct fd f;
 	ssize_t ret = -EINVAL;
 	int live_threshold;
 
@@ -1784,23 +1783,23 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
 	if (fd < 0 || fd >= INT_MAX)
 		return -EINVAL;
 
-	f = fdget(fd);
-	if (fd_file(f) == NULL)
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return -EINVAL;
 
 	if (reg->hr_blocks == 0 || reg->hr_start_block == 0 ||
 	    reg->hr_block_bytes == 0)
-		goto out2;
+		return -EINVAL;
 
 	if (!S_ISBLK(fd_file(f)->f_mapping->host->i_mode))
-		goto out2;
+		return -EINVAL;
 
 	reg->hr_bdev_file = bdev_file_open_by_dev(fd_file(f)->f_mapping->host->i_rdev,
 			BLK_OPEN_WRITE | BLK_OPEN_READ, NULL, NULL);
 	if (IS_ERR(reg->hr_bdev_file)) {
 		ret = PTR_ERR(reg->hr_bdev_file);
 		reg->hr_bdev_file = NULL;
-		goto out2;
+		return ret;
 	}
 
 	sectsize = bdev_logical_block_size(reg_bdev(reg));
@@ -1906,8 +1905,6 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
 		fput(reg->hr_bdev_file);
 		reg->hr_bdev_file = NULL;
 	}
-out2:
-	fdput(f);
 	return ret;
 }
 
diff --git a/fs/open.c b/fs/open.c
index 24d22f4222f0..33468aaa5311 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -187,19 +187,13 @@ long do_ftruncate(struct file *file, loff_t length, int small)
 
 long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
 {
-	struct fd f;
-	int error;
-
 	if (length < 0)
 		return -EINVAL;
-	f = fdget(fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return -EBADF;
 
-	error = do_ftruncate(fd_file(f), length, small);
-
-	fdput(f);
-	return error;
+	return do_ftruncate(fd_file(f), length, small);
 }
 
 SYSCALL_DEFINE2(ftruncate, unsigned int, fd, off_t, length)
diff --git a/fs/read_write.c b/fs/read_write.c
index 5e3df2d39283..deb87457aa76 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -745,21 +745,17 @@ SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
 ssize_t ksys_pread64(unsigned int fd, char __user *buf, size_t count,
 		     loff_t pos)
 {
-	struct fd f;
-	ssize_t ret = -EBADF;
-
 	if (pos < 0)
 		return -EINVAL;
 
-	f = fdget(fd);
-	if (fd_file(f)) {
-		ret = -ESPIPE;
-		if (fd_file(f)->f_mode & FMODE_PREAD)
-			ret = vfs_read(fd_file(f), buf, count, &pos);
-		fdput(f);
-	}
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
+		return -EBADF;
 
-	return ret;
+	if (fd_file(f)->f_mode & FMODE_PREAD)
+		return vfs_read(fd_file(f), buf, count, &pos);
+
+	return -ESPIPE;
 }
 
 SYSCALL_DEFINE4(pread64, unsigned int, fd, char __user *, buf,
@@ -779,21 +775,17 @@ COMPAT_SYSCALL_DEFINE5(pread64, unsigned int, fd, char __user *, buf,
 ssize_t ksys_pwrite64(unsigned int fd, const char __user *buf,
 		      size_t count, loff_t pos)
 {
-	struct fd f;
-	ssize_t ret = -EBADF;
-
 	if (pos < 0)
 		return -EINVAL;
 
-	f = fdget(fd);
-	if (fd_file(f)) {
-		ret = -ESPIPE;
-		if (fd_file(f)->f_mode & FMODE_PWRITE)
-			ret = vfs_write(fd_file(f), buf, count, &pos);
-		fdput(f);
-	}
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
+		return -EBADF;
 
-	return ret;
+	if (fd_file(f)->f_mode & FMODE_PWRITE)
+		return vfs_write(fd_file(f), buf, count, &pos);
+
+	return -ESPIPE;
 }
 
 SYSCALL_DEFINE4(pwrite64, unsigned int, fd, const char __user *, buf,
@@ -1307,7 +1299,6 @@ COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 			   size_t count, loff_t max)
 {
-	struct fd in, out;
 	struct inode *in_inode, *out_inode;
 	struct pipe_inode_info *opipe;
 	loff_t pos;
@@ -1318,35 +1309,32 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	/*
 	 * Get input file, and verify that it is ok..
 	 */
-	retval = -EBADF;
-	in = fdget(in_fd);
-	if (!fd_file(in))
-		goto out;
+	CLASS(fd, in)(in_fd);
+	if (fd_empty(in))
+		return -EBADF;
 	if (!(fd_file(in)->f_mode & FMODE_READ))
-		goto fput_in;
-	retval = -ESPIPE;
+		return -EBADF;
 	if (!ppos) {
 		pos = fd_file(in)->f_pos;
 	} else {
 		pos = *ppos;
 		if (!(fd_file(in)->f_mode & FMODE_PREAD))
-			goto fput_in;
+			return -ESPIPE;
 	}
 	retval = rw_verify_area(READ, fd_file(in), &pos, count);
 	if (retval < 0)
-		goto fput_in;
+		return retval;
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
 
 	/*
 	 * Get output file, and verify that it is ok..
 	 */
-	retval = -EBADF;
-	out = fdget(out_fd);
-	if (!fd_file(out))
-		goto fput_in;
+	CLASS(fd, out)(out_fd);
+	if (fd_empty(out))
+		return -EBADF;
 	if (!(fd_file(out)->f_mode & FMODE_WRITE))
-		goto fput_out;
+		return -EBADF;
 	in_inode = file_inode(fd_file(in));
 	out_inode = file_inode(fd_file(out));
 	out_pos = fd_file(out)->f_pos;
@@ -1355,9 +1343,8 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 		max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes);
 
 	if (unlikely(pos + count > max)) {
-		retval = -EOVERFLOW;
 		if (pos >= max)
-			goto fput_out;
+			return -EOVERFLOW;
 		count = max - pos;
 	}
 
@@ -1376,7 +1363,7 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	if (!opipe) {
 		retval = rw_verify_area(WRITE, fd_file(out), &out_pos, count);
 		if (retval < 0)
-			goto fput_out;
+			return retval;
 		retval = do_splice_direct(fd_file(in), &pos, fd_file(out), &out_pos,
 					  count, fl);
 	} else {
@@ -1402,12 +1389,6 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 	inc_syscw(current);
 	if (pos > max)
 		retval = -EOVERFLOW;
-
-fput_out:
-	fdput(out);
-fput_in:
-	fdput(in);
-out:
 	return retval;
 }
 
diff --git a/fs/splice.c b/fs/splice.c
index 29cd39d7f4a0..2898fa1e9e63 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1622,27 +1622,22 @@ SYSCALL_DEFINE6(splice, int, fd_in, loff_t __user *, off_in,
 		int, fd_out, loff_t __user *, off_out,
 		size_t, len, unsigned int, flags)
 {
-	struct fd in, out;
-	ssize_t error;
-
 	if (unlikely(!len))
 		return 0;
 
 	if (unlikely(flags & ~SPLICE_F_ALL))
 		return -EINVAL;
 
-	error = -EBADF;
-	in = fdget(fd_in);
-	if (fd_file(in)) {
-		out = fdget(fd_out);
-		if (fd_file(out)) {
-			error = __do_splice(fd_file(in), off_in, fd_file(out), off_out,
+	CLASS(fd, in)(fd_in);
+	if (fd_empty(in))
+		return -EBADF;
+
+	CLASS(fd, out)(fd_out);
+	if (fd_empty(out))
+		return -EBADF;
+
+	return __do_splice(fd_file(in), off_in, fd_file(out), off_out,
 					    len, flags);
-			fdput(out);
-		}
-		fdput(in);
-	}
-	return error;
 }
 
 /*
@@ -1992,25 +1987,19 @@ ssize_t do_tee(struct file *in, struct file *out, size_t len,
 
 SYSCALL_DEFINE4(tee, int, fdin, int, fdout, size_t, len, unsigned int, flags)
 {
-	struct fd in, out;
-	ssize_t error;
-
 	if (unlikely(flags & ~SPLICE_F_ALL))
 		return -EINVAL;
 
 	if (unlikely(!len))
 		return 0;
 
-	error = -EBADF;
-	in = fdget(fdin);
-	if (fd_file(in)) {
-		out = fdget(fdout);
-		if (fd_file(out)) {
-			error = do_tee(fd_file(in), fd_file(out), len, flags);
-			fdput(out);
-		}
- 		fdput(in);
- 	}
+	CLASS(fd, in)(fdin);
+	if (fd_empty(in))
+		return -EBADF;
 
-	return error;
+	CLASS(fd, out)(fdout);
+	if (fd_empty(out))
+		return -EBADF;
+
+	return do_tee(fd_file(in), fd_file(out), len, flags);
 }
diff --git a/fs/utimes.c b/fs/utimes.c
index 99b26f792b89..c7c7958e57b2 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -108,18 +108,13 @@ static int do_utimes_path(int dfd, const char __user *filename,
 
 static int do_utimes_fd(int fd, struct timespec64 *times, int flags)
 {
-	struct fd f;
-	int error;
-
 	if (flags)
 		return -EINVAL;
 
-	f = fdget(fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return -EBADF;
-	error = vfs_utimes(&fd_file(f)->f_path, times);
-	fdput(f);
-	return error;
+	return vfs_utimes(&fd_file(f)->f_path, times);
 }
 
 /*
diff --git a/fs/xfs/xfs_exchrange.c b/fs/xfs/xfs_exchrange.c
index 75cb53f090d1..fa29c8b334d2 100644
--- a/fs/xfs/xfs_exchrange.c
+++ b/fs/xfs/xfs_exchrange.c
@@ -813,8 +813,6 @@ xfs_ioc_exchange_range(
 		.file2			= file,
 	};
 	struct xfs_exchange_range	args;
-	struct fd			file1;
-	int				error;
 
 	if (copy_from_user(&args, argp, sizeof(args)))
 		return -EFAULT;
@@ -828,14 +826,12 @@ xfs_ioc_exchange_range(
 	fxr.length		= args.length;
 	fxr.flags		= args.flags;
 
-	file1 = fdget(args.file1_fd);
-	if (!fd_file(file1))
+	CLASS(fd, file1)(args.file1_fd);
+	if (fd_empty(file1))
 		return -EBADF;
 	fxr.file1 = fd_file(file1);
 
-	error = xfs_exchange_range(&fxr);
-	fdput(file1);
-	return error;
+	return xfs_exchange_range(&fxr);
 }
 
 /* Opaque freshness blob for XFS_IOC_COMMIT_RANGE */
@@ -909,8 +905,6 @@ xfs_ioc_commit_range(
 	struct xfs_commit_range_fresh	*kern_f;
 	struct xfs_inode		*ip2 = XFS_I(file_inode(file));
 	struct xfs_mount		*mp = ip2->i_mount;
-	struct fd			file1;
-	int				error;
 
 	kern_f = (struct xfs_commit_range_fresh *)&args.file2_freshness;
 
@@ -934,12 +928,10 @@ xfs_ioc_commit_range(
 	fxr.file2_ctime.tv_sec	= kern_f->file2_ctime;
 	fxr.file2_ctime.tv_nsec	= kern_f->file2_ctime_nsec;
 
-	file1 = fdget(args.file1_fd);
+	CLASS(fd, file1)(args.file1_fd);
 	if (fd_empty(file1))
 		return -EBADF;
 	fxr.file1 = fd_file(file1);
 
-	error = xfs_exchange_range(&fxr);
-	fdput(file1);
-	return error;
+	return xfs_exchange_range(&fxr);
 }
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index a20d426ef021..a24fcdc8ad4f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -881,41 +881,29 @@ xfs_ioc_swapext(
 	xfs_swapext_t	*sxp)
 {
 	xfs_inode_t     *ip, *tip;
-	struct fd	f, tmp;
-	int		error = 0;
 
 	/* Pull information for the target fd */
-	f = fdget((int)sxp->sx_fdtarget);
-	if (!fd_file(f)) {
-		error = -EINVAL;
-		goto out;
-	}
+	CLASS(fd, f)((int)sxp->sx_fdtarget);
+	if (fd_empty(f))
+		return -EINVAL;
 
 	if (!(fd_file(f)->f_mode & FMODE_WRITE) ||
 	    !(fd_file(f)->f_mode & FMODE_READ) ||
-	    (fd_file(f)->f_flags & O_APPEND)) {
-		error = -EBADF;
-		goto out_put_file;
-	}
+	    (fd_file(f)->f_flags & O_APPEND))
+		return -EBADF;
 
-	tmp = fdget((int)sxp->sx_fdtmp);
-	if (!fd_file(tmp)) {
-		error = -EINVAL;
-		goto out_put_file;
-	}
+	CLASS(fd, tmp)((int)sxp->sx_fdtmp);
+	if (fd_empty(tmp))
+		return -EINVAL;
 
 	if (!(fd_file(tmp)->f_mode & FMODE_WRITE) ||
 	    !(fd_file(tmp)->f_mode & FMODE_READ) ||
-	    (fd_file(tmp)->f_flags & O_APPEND)) {
-		error = -EBADF;
-		goto out_put_tmp_file;
-	}
+	    (fd_file(tmp)->f_flags & O_APPEND))
+		return -EBADF;
 
 	if (IS_SWAPFILE(file_inode(fd_file(f))) ||
-	    IS_SWAPFILE(file_inode(fd_file(tmp)))) {
-		error = -EINVAL;
-		goto out_put_tmp_file;
-	}
+	    IS_SWAPFILE(file_inode(fd_file(tmp))))
+		return -EINVAL;
 
 	/*
 	 * We need to ensure that the fds passed in point to XFS inodes
@@ -923,37 +911,22 @@ xfs_ioc_swapext(
 	 * control over what the user passes us here.
 	 */
 	if (fd_file(f)->f_op != &xfs_file_operations ||
-	    fd_file(tmp)->f_op != &xfs_file_operations) {
-		error = -EINVAL;
-		goto out_put_tmp_file;
-	}
+	    fd_file(tmp)->f_op != &xfs_file_operations)
+		return -EINVAL;
 
 	ip = XFS_I(file_inode(fd_file(f)));
 	tip = XFS_I(file_inode(fd_file(tmp)));
 
-	if (ip->i_mount != tip->i_mount) {
-		error = -EINVAL;
-		goto out_put_tmp_file;
-	}
-
-	if (ip->i_ino == tip->i_ino) {
-		error = -EINVAL;
-		goto out_put_tmp_file;
-	}
+	if (ip->i_mount != tip->i_mount)
+		return -EINVAL;
 
-	if (xfs_is_shutdown(ip->i_mount)) {
-		error = -EIO;
-		goto out_put_tmp_file;
-	}
+	if (ip->i_ino == tip->i_ino)
+		return -EINVAL;
 
-	error = xfs_swap_extents(ip, tip, sxp);
+	if (xfs_is_shutdown(ip->i_mount))
+		return -EIO;
 
- out_put_tmp_file:
-	fdput(tmp);
- out_put_file:
-	fdput(f);
- out:
-	return error;
+	return xfs_swap_extents(ip, tip, sxp);
 }
 
 static int
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 4f1dec518fae..35b4f8659904 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1063,7 +1063,6 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
 		size_t msg_len, unsigned int msg_prio,
 		struct timespec64 *ts)
 {
-	struct fd f;
 	struct inode *inode;
 	struct ext_wait_queue wait;
 	struct ext_wait_queue *receiver;
@@ -1084,37 +1083,27 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
 
 	audit_mq_sendrecv(mqdes, msg_len, msg_prio, ts);
 
-	f = fdget(mqdes);
-	if (unlikely(!fd_file(f))) {
-		ret = -EBADF;
-		goto out;
-	}
+	CLASS(fd, f)(mqdes);
+	if (fd_empty(f))
+		return -EBADF;
 
 	inode = file_inode(fd_file(f));
-	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
-		ret = -EBADF;
-		goto out_fput;
-	}
+	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations))
+		return -EBADF;
 	info = MQUEUE_I(inode);
 	audit_file(fd_file(f));
 
-	if (unlikely(!(fd_file(f)->f_mode & FMODE_WRITE))) {
-		ret = -EBADF;
-		goto out_fput;
-	}
+	if (unlikely(!(fd_file(f)->f_mode & FMODE_WRITE)))
+		return -EBADF;
 
-	if (unlikely(msg_len > info->attr.mq_msgsize)) {
-		ret = -EMSGSIZE;
-		goto out_fput;
-	}
+	if (unlikely(msg_len > info->attr.mq_msgsize))
+		return -EMSGSIZE;
 
 	/* First try to allocate memory, before doing anything with
 	 * existing queues. */
 	msg_ptr = load_msg(u_msg_ptr, msg_len);
-	if (IS_ERR(msg_ptr)) {
-		ret = PTR_ERR(msg_ptr);
-		goto out_fput;
-	}
+	if (IS_ERR(msg_ptr))
+		return PTR_ERR(msg_ptr);
 	msg_ptr->m_ts = msg_len;
 	msg_ptr->m_type = msg_prio;
 
@@ -1172,9 +1161,6 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
 out_free:
 	if (ret)
 		free_msg(msg_ptr);
-out_fput:
-	fdput(f);
-out:
 	return ret;
 }
 
@@ -1184,7 +1170,6 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
 {
 	ssize_t ret;
 	struct msg_msg *msg_ptr;
-	struct fd f;
 	struct inode *inode;
 	struct mqueue_inode_info *info;
 	struct ext_wait_queue wait;
@@ -1198,30 +1183,22 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
 
 	audit_mq_sendrecv(mqdes, msg_len, 0, ts);
 
-	f = fdget(mqdes);
-	if (unlikely(!fd_file(f))) {
-		ret = -EBADF;
-		goto out;
-	}
+	CLASS(fd, f)(mqdes);
+	if (fd_empty(f))
+		return -EBADF;
 
 	inode = file_inode(fd_file(f));
-	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
-		ret = -EBADF;
-		goto out_fput;
-	}
+	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations))
+		return -EBADF;
 	info = MQUEUE_I(inode);
 	audit_file(fd_file(f));
 
-	if (unlikely(!(fd_file(f)->f_mode & FMODE_READ))) {
-		ret = -EBADF;
-		goto out_fput;
-	}
+	if (unlikely(!(fd_file(f)->f_mode & FMODE_READ)))
+		return -EBADF;
 
 	/* checks if buffer is big enough */
-	if (unlikely(msg_len < info->attr.mq_msgsize)) {
-		ret = -EMSGSIZE;
-		goto out_fput;
-	}
+	if (unlikely(msg_len < info->attr.mq_msgsize))
+		return -EMSGSIZE;
 
 	/*
 	 * msg_insert really wants us to have a valid, spare node struct so
@@ -1275,9 +1252,6 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
 		}
 		free_msg(msg_ptr);
 	}
-out_fput:
-	fdput(f);
-out:
 	return ret;
 }
 
@@ -1437,21 +1411,18 @@ SYSCALL_DEFINE2(mq_notify, mqd_t, mqdes,
 
 static int do_mq_getsetattr(int mqdes, struct mq_attr *new, struct mq_attr *old)
 {
-	struct fd f;
 	struct inode *inode;
 	struct mqueue_inode_info *info;
 
 	if (new && (new->mq_flags & (~O_NONBLOCK)))
 		return -EINVAL;
 
-	f = fdget(mqdes);
-	if (!fd_file(f))
+	CLASS(fd, f)(mqdes);
+	if (fd_empty(f))
 		return -EBADF;
 
-	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
-		fdput(f);
+	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations))
 		return -EBADF;
-	}
 
 	inode = file_inode(fd_file(f));
 	info = MQUEUE_I(inode);
@@ -1475,7 +1446,6 @@ static int do_mq_getsetattr(int mqdes, struct mq_attr *new, struct mq_attr *old)
 	}
 
 	spin_unlock(&info->lock);
-	fdput(f);
 	return 0;
 }
 
diff --git a/kernel/module/main.c b/kernel/module/main.c
index d785973d8a51..4490924fe24e 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3219,10 +3219,7 @@ static int idempotent_init_module(struct file *f, const char __user * uargs, int
 
 SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
 {
-	int err;
-	struct fd f;
-
-	err = may_init_module();
+	int err = may_init_module();
 	if (err)
 		return err;
 
@@ -3233,12 +3230,10 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
 		      |MODULE_INIT_COMPRESSED_FILE))
 		return -EINVAL;
 
-	f = fdget(fd);
+	CLASS(fd, f)(fd);
 	if (fd_empty(f))
 		return -EBADF;
-	err = idempotent_init_module(fd_file(f), uargs, flags);
-	fdput(f);
-	return err;
+	return idempotent_init_module(fd_file(f), uargs, flags);
 }
 
 /* Keep in sync with MODULE_FLAGS_BUF_SIZE !!! */
diff --git a/kernel/pid.c b/kernel/pid.c
index b5bbc1a8a6e4..115448e89c3e 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -744,23 +744,18 @@ SYSCALL_DEFINE3(pidfd_getfd, int, pidfd, int, fd,
 		unsigned int, flags)
 {
 	struct pid *pid;
-	struct fd f;
-	int ret;
 
 	/* flags is currently unused - make sure it's unset */
 	if (flags)
 		return -EINVAL;
 
-	f = fdget(pidfd);
-	if (!fd_file(f))
+	CLASS(fd, f)(pidfd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	pid = pidfd_pid(fd_file(f));
 	if (IS_ERR(pid))
-		ret = PTR_ERR(pid);
-	else
-		ret = pidfd_getfd(pid, fd);
+		return PTR_ERR(pid);
 
-	fdput(f);
-	return ret;
+	return pidfd_getfd(pid, fd);
 }
diff --git a/kernel/signal.c b/kernel/signal.c
index 4344860ffcac..6be807ecb94c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3908,7 +3908,6 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 		siginfo_t __user *, info, unsigned int, flags)
 {
 	int ret;
-	struct fd f;
 	struct pid *pid;
 	kernel_siginfo_t kinfo;
 	enum pid_type type;
@@ -3921,20 +3920,17 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 	if (hweight32(flags & PIDFD_SEND_SIGNAL_FLAGS) > 1)
 		return -EINVAL;
 
-	f = fdget(pidfd);
-	if (!fd_file(f))
+	CLASS(fd, f)(pidfd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	/* Is this a pidfd? */
 	pid = pidfd_to_pid(fd_file(f));
-	if (IS_ERR(pid)) {
-		ret = PTR_ERR(pid);
-		goto err;
-	}
+	if (IS_ERR(pid))
+		return PTR_ERR(pid);
 
-	ret = -EINVAL;
 	if (!access_pidfd_pidns(pid))
-		goto err;
+		return -EINVAL;
 
 	switch (flags) {
 	case 0:
@@ -3958,28 +3954,23 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 	if (info) {
 		ret = copy_siginfo_from_user_any(&kinfo, info);
 		if (unlikely(ret))
-			goto err;
+			return ret;
 
-		ret = -EINVAL;
 		if (unlikely(sig != kinfo.si_signo))
-			goto err;
+			return -EINVAL;
 
 		/* Only allow sending arbitrary signals to yourself. */
-		ret = -EPERM;
 		if ((task_pid(current) != pid || type > PIDTYPE_TGID) &&
 		    (kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL))
-			goto err;
+			return -EPERM;
 	} else {
 		prepare_kill_siginfo(sig, &kinfo, type);
 	}
 
 	if (type == PIDTYPE_PGID)
-		ret = kill_pgrp_info(sig, &kinfo, pid);
+		return kill_pgrp_info(sig, &kinfo, pid);
 	else
-		ret = kill_pid_info_type(sig, &kinfo, pid, type);
-err:
-	fdput(f);
-	return ret;
+		return kill_pid_info_type(sig, &kinfo, pid, type);
 }
 
 static int
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 0700f40c53ac..0cd680ccc7e5 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -411,15 +411,14 @@ static int cgroupstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
 	struct nlattr *na;
 	size_t size;
 	u32 fd;
-	struct fd f;
 
 	na = info->attrs[CGROUPSTATS_CMD_ATTR_FD];
 	if (!na)
 		return -EINVAL;
 
 	fd = nla_get_u32(info->attrs[CGROUPSTATS_CMD_ATTR_FD]);
-	f = fdget(fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return 0;
 
 	size = nla_total_size(sizeof(struct cgroupstats));
@@ -427,14 +426,13 @@ static int cgroupstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
 	rc = prepare_reply(info, CGROUPSTATS_CMD_NEW, &rep_skb,
 				size);
 	if (rc < 0)
-		goto err;
+		return rc;
 
 	na = nla_reserve(rep_skb, CGROUPSTATS_TYPE_CGROUP_STATS,
 				sizeof(struct cgroupstats));
 	if (na == NULL) {
 		nlmsg_free(rep_skb);
-		rc = -EMSGSIZE;
-		goto err;
+		return -EMSGSIZE;
 	}
 
 	stats = nla_data(na);
@@ -443,14 +441,10 @@ static int cgroupstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
 	rc = cgroupstats_build(stats, fd_file(f)->f_path.dentry);
 	if (rc < 0) {
 		nlmsg_free(rep_skb);
-		goto err;
+		return rc;
 	}
 
-	rc = send_reply(rep_skb, info);
-
-err:
-	fdput(f);
-	return rc;
+	return send_reply(rep_skb, info);
 }
 
 static int cmd_attr_register_cpumask(struct genl_info *info)
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 06132cf47016..db5e2dd7cec9 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -1062,19 +1062,16 @@ int process_buffer_measurement(struct mnt_idmap *idmap,
  */
 void ima_kexec_cmdline(int kernel_fd, const void *buf, int size)
 {
-	struct fd f;
-
 	if (!buf || !size)
 		return;
 
-	f = fdget(kernel_fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(kernel_fd);
+	if (fd_empty(f))
 		return;
 
 	process_buffer_measurement(file_mnt_idmap(fd_file(f)), file_inode(fd_file(f)),
 				   buf, size, "kexec-cmdline", KEXEC_CMDLINE, 0,
 				   NULL, false, NULL, 0);
-	fdput(f);
 }
 
 /**
diff --git a/security/loadpin/loadpin.c b/security/loadpin/loadpin.c
index 02144ec39f43..68252452b66c 100644
--- a/security/loadpin/loadpin.c
+++ b/security/loadpin/loadpin.c
@@ -283,7 +283,6 @@ enum loadpin_securityfs_interface_index {
 
 static int read_trusted_verity_root_digests(unsigned int fd)
 {
-	struct fd f;
 	void *data;
 	int rc;
 	char *p, *d;
@@ -295,8 +294,8 @@ static int read_trusted_verity_root_digests(unsigned int fd)
 	if (!list_empty(&dm_verity_loadpin_trusted_root_digests))
 		return -EPERM;
 
-	f = fdget(fd);
-	if (!fd_file(f))
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
 		return -EINVAL;
 
 	data = kzalloc(SZ_4K, GFP_KERNEL);
@@ -359,7 +358,6 @@ static int read_trusted_verity_root_digests(unsigned int fd)
 	}
 
 	kfree(data);
-	fdput(f);
 
 	return 0;
 
@@ -379,8 +377,6 @@ static int read_trusted_verity_root_digests(unsigned int fd)
 	/* disallow further attempts after reading a corrupt/invalid file */
 	deny_reading_verity_digests = true;
 
-	fdput(f);
-
 	return rc;
 }
 
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 53262b8a7656..72aa1fdeb699 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -229,14 +229,13 @@ static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
 	struct kvm_vfio_spapr_tce param;
 	struct kvm_vfio *kv = dev->private;
 	struct kvm_vfio_file *kvf;
-	struct fd f;
 	int ret;
 
 	if (copy_from_user(&param, arg, sizeof(struct kvm_vfio_spapr_tce)))
 		return -EFAULT;
 
-	f = fdget(param.groupfd);
-	if (!fd_file(f))
+	CLASS(fd, f)(param.groupfd);
+	if (fd_empty(f))
 		return -EBADF;
 
 	ret = -ENOENT;
@@ -262,7 +261,6 @@ static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
 
 err_fdput:
 	mutex_unlock(&kv->lock);
-	fdput(f);
 	return ret;
 }
 #endif
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 16/28] convert do_preadv()/do_pwritev()
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (13 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 15/28] fdget(), more " Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 17/28] convert cachestat(2) Al Viro
                       ` (12 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

fdput() can be transposed with add_{r,w}char() and inc_sysc{r,w}();
it's the same story as with do_readv()/do_writev(), only with
fdput() instead of fdput_pos().

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/read_write.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index deb87457aa76..f6e9232eacd8 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1113,18 +1113,16 @@ static inline loff_t pos_from_hilo(unsigned long high, unsigned long low)
 static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
 			 unsigned long vlen, loff_t pos, rwf_t flags)
 {
-	struct fd f;
 	ssize_t ret = -EBADF;
 
 	if (pos < 0)
 		return -EINVAL;
 
-	f = fdget(fd);
-	if (fd_file(f)) {
+	CLASS(fd, f)(fd);
+	if (!fd_empty(f)) {
 		ret = -ESPIPE;
 		if (fd_file(f)->f_mode & FMODE_PREAD)
 			ret = vfs_readv(fd_file(f), vec, vlen, &pos, flags);
-		fdput(f);
 	}
 
 	if (ret > 0)
@@ -1136,18 +1134,16 @@ static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
 static ssize_t do_pwritev(unsigned long fd, const struct iovec __user *vec,
 			  unsigned long vlen, loff_t pos, rwf_t flags)
 {
-	struct fd f;
 	ssize_t ret = -EBADF;
 
 	if (pos < 0)
 		return -EINVAL;
 
-	f = fdget(fd);
-	if (fd_file(f)) {
+	CLASS(fd, f)(fd);
+	if (!fd_empty(f)) {
 		ret = -ESPIPE;
 		if (fd_file(f)->f_mode & FMODE_PWRITE)
 			ret = vfs_writev(fd_file(f), vec, vlen, &pos, flags);
-		fdput(f);
 	}
 
 	if (ret > 0)
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 17/28] convert cachestat(2)
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (14 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 16/28] convert do_preadv()/do_pwritev() Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 18/28] switch spufs_calls_{get,put}() to CLASS() use Al Viro
                       ` (11 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

fdput() can be transposed with copy_to_user()

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 mm/filemap.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 36d22968be9a..5f2504428f3d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4421,31 +4421,25 @@ SYSCALL_DEFINE4(cachestat, unsigned int, fd,
 		struct cachestat_range __user *, cstat_range,
 		struct cachestat __user *, cstat, unsigned int, flags)
 {
-	struct fd f = fdget(fd);
+	CLASS(fd, f)(fd);
 	struct address_space *mapping;
 	struct cachestat_range csr;
 	struct cachestat cs;
 	pgoff_t first_index, last_index;
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADF;
 
 	if (copy_from_user(&csr, cstat_range,
-			sizeof(struct cachestat_range))) {
-		fdput(f);
+			sizeof(struct cachestat_range)))
 		return -EFAULT;
-	}
 
 	/* hugetlbfs is not supported */
-	if (is_file_hugepages(fd_file(f))) {
-		fdput(f);
+	if (is_file_hugepages(fd_file(f)))
 		return -EOPNOTSUPP;
-	}
 
-	if (flags != 0) {
-		fdput(f);
+	if (flags != 0)
 		return -EINVAL;
-	}
 
 	first_index = csr.off >> PAGE_SHIFT;
 	last_index =
@@ -4453,7 +4447,6 @@ SYSCALL_DEFINE4(cachestat, unsigned int, fd,
 	memset(&cs, 0, sizeof(struct cachestat));
 	mapping = fd_file(f)->f_mapping;
 	filemap_cachestat(mapping, first_index, last_index, &cs);
-	fdput(f);
 
 	if (copy_to_user(cstat, &cs, sizeof(struct cachestat)))
 		return -EFAULT;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 18/28] switch spufs_calls_{get,put}() to CLASS() use
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (15 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 17/28] convert cachestat(2) Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 19/28] convert spu_run(2) Al Viro
                       ` (10 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/powerpc/platforms/cell/spu_syscalls.c | 52 +++++++---------------
 1 file changed, 17 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spu_syscalls.c b/arch/powerpc/platforms/cell/spu_syscalls.c
index da4fad7fc8bf..64a4c9eac6e0 100644
--- a/arch/powerpc/platforms/cell/spu_syscalls.c
+++ b/arch/powerpc/platforms/cell/spu_syscalls.c
@@ -36,6 +36,9 @@ static inline struct spufs_calls *spufs_calls_get(void)
 
 static inline void spufs_calls_put(struct spufs_calls *calls)
 {
+	if (!calls)
+		return;
+
 	BUG_ON(calls != spufs_calls);
 
 	/* we don't need to rcu this, as we hold a reference to the module */
@@ -53,35 +56,30 @@ static inline void spufs_calls_put(struct spufs_calls *calls) { }
 
 #endif /* CONFIG_SPU_FS_MODULE */
 
+DEFINE_CLASS(spufs_calls, struct spufs_calls *, spufs_calls_put(_T), spufs_calls_get(), void)
+
 SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
 	umode_t, mode, int, neighbor_fd)
 {
-	long ret;
-	struct spufs_calls *calls;
-
-	calls = spufs_calls_get();
+	CLASS(spufs_calls, calls)();
 	if (!calls)
 		return -ENOSYS;
 
 	if (flags & SPU_CREATE_AFFINITY_SPU) {
 		CLASS(fd, neighbor)(neighbor_fd);
-		ret = -EBADF;
-		if (!fd_empty(neighbor))
-			ret = calls->create_thread(name, flags, mode, fd_file(neighbor));
-	} else
-		ret = calls->create_thread(name, flags, mode, NULL);
-
-	spufs_calls_put(calls);
-	return ret;
+		if (fd_empty(neighbor))
+			return -EBADF;
+		return calls->create_thread(name, flags, mode, fd_file(neighbor));
+	} else {
+		return calls->create_thread(name, flags, mode, NULL);
+	}
 }
 
 SYSCALL_DEFINE3(spu_run,int, fd, __u32 __user *, unpc, __u32 __user *, ustatus)
 {
 	long ret;
 	struct fd arg;
-	struct spufs_calls *calls;
-
-	calls = spufs_calls_get();
+	CLASS(spufs_calls, calls)();
 	if (!calls)
 		return -ENOSYS;
 
@@ -91,42 +89,26 @@ SYSCALL_DEFINE3(spu_run,int, fd, __u32 __user *, unpc, __u32 __user *, ustatus)
 		ret = calls->spu_run(fd_file(arg), unpc, ustatus);
 		fdput(arg);
 	}
-
-	spufs_calls_put(calls);
 	return ret;
 }
 
 #ifdef CONFIG_COREDUMP
 int elf_coredump_extra_notes_size(void)
 {
-	struct spufs_calls *calls;
-	int ret;
-
-	calls = spufs_calls_get();
+	CLASS(spufs_calls, calls)();
 	if (!calls)
 		return 0;
 
-	ret = calls->coredump_extra_notes_size();
-
-	spufs_calls_put(calls);
-
-	return ret;
+	return calls->coredump_extra_notes_size();
 }
 
 int elf_coredump_extra_notes_write(struct coredump_params *cprm)
 {
-	struct spufs_calls *calls;
-	int ret;
-
-	calls = spufs_calls_get();
+	CLASS(spufs_calls, calls)();
 	if (!calls)
 		return 0;
 
-	ret = calls->coredump_extra_notes_write(cprm);
-
-	spufs_calls_put(calls);
-
-	return ret;
+	return calls->coredump_extra_notes_write(cprm);
 }
 #endif
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 19/28] convert spu_run(2)
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (16 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 18/28] switch spufs_calls_{get,put}() to CLASS() use Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 20/28] convert media_request_get_by_fd() Al Viro
                       ` (9 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

all failure exits prior to fdget() are returns, fdput() is immediately
followed by return.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/powerpc/platforms/cell/spu_syscalls.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spu_syscalls.c b/arch/powerpc/platforms/cell/spu_syscalls.c
index 64a4c9eac6e0..000894e07b02 100644
--- a/arch/powerpc/platforms/cell/spu_syscalls.c
+++ b/arch/powerpc/platforms/cell/spu_syscalls.c
@@ -77,19 +77,15 @@ SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
 
 SYSCALL_DEFINE3(spu_run,int, fd, __u32 __user *, unpc, __u32 __user *, ustatus)
 {
-	long ret;
-	struct fd arg;
 	CLASS(spufs_calls, calls)();
 	if (!calls)
 		return -ENOSYS;
 
-	ret = -EBADF;
-	arg = fdget(fd);
-	if (fd_file(arg)) {
-		ret = calls->spu_run(fd_file(arg), unpc, ustatus);
-		fdput(arg);
-	}
-	return ret;
+	CLASS(fd, arg)(fd);
+	if (fd_empty(arg))
+		return -EBADF;
+
+	return calls->spu_run(fd_file(arg), unpc, ustatus);
 }
 
 #ifdef CONFIG_COREDUMP
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 20/28] convert media_request_get_by_fd()
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (17 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 19/28] convert spu_run(2) Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 21/28] convert cifs_ioctl_copychunk() Al Viro
                       ` (8 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

the only thing done after fdput() (in failure cases) is a printk; safely
transposable with fdput()...

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/media/mc/mc-request.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/media/mc/mc-request.c b/drivers/media/mc/mc-request.c
index e064914c476e..df39c8c11e9a 100644
--- a/drivers/media/mc/mc-request.c
+++ b/drivers/media/mc/mc-request.c
@@ -246,22 +246,21 @@ static const struct file_operations request_fops = {
 struct media_request *
 media_request_get_by_fd(struct media_device *mdev, int request_fd)
 {
-	struct fd f;
 	struct media_request *req;
 
 	if (!mdev || !mdev->ops ||
 	    !mdev->ops->req_validate || !mdev->ops->req_queue)
 		return ERR_PTR(-EBADR);
 
-	f = fdget(request_fd);
-	if (!fd_file(f))
-		goto err_no_req_fd;
+	CLASS(fd, f)(request_fd);
+	if (fd_empty(f))
+		goto err;
 
 	if (fd_file(f)->f_op != &request_fops)
-		goto err_fput;
+		goto err;
 	req = fd_file(f)->private_data;
 	if (req->mdev != mdev)
-		goto err_fput;
+		goto err;
 
 	/*
 	 * Note: as long as someone has an open filehandle of the request,
@@ -272,14 +271,9 @@ media_request_get_by_fd(struct media_device *mdev, int request_fd)
 	 * before media_request_get() is called.
 	 */
 	media_request_get(req);
-	fdput(f);
-
 	return req;
 
-err_fput:
-	fdput(f);
-
-err_no_req_fd:
+err:
 	dev_dbg(mdev->dev, "cannot find request_fd %d\n", request_fd);
 	return ERR_PTR(-EINVAL);
 }
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 21/28] convert cifs_ioctl_copychunk()
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (18 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 20/28] convert media_request_get_by_fd() Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 22/28] convert vfs_dedupe_file_range() Al Viro
                       ` (7 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

fdput() moved past mnt_drop_file_write(); harmless, if somewhat cringeworthy.
Reordering could be avoided either by adding an explicit scope or by making
mnt_drop_file_write() called via __cleanup.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/smb/client/ioctl.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/smb/client/ioctl.c b/fs/smb/client/ioctl.c
index 2ce193609d8b..56439da4f119 100644
--- a/fs/smb/client/ioctl.c
+++ b/fs/smb/client/ioctl.c
@@ -72,7 +72,6 @@ static long cifs_ioctl_copychunk(unsigned int xid, struct file *dst_file,
 			unsigned long srcfd)
 {
 	int rc;
-	struct fd src_file;
 	struct inode *src_inode;
 
 	cifs_dbg(FYI, "ioctl copychunk range\n");
@@ -89,8 +88,8 @@ static long cifs_ioctl_copychunk(unsigned int xid, struct file *dst_file,
 		return rc;
 	}
 
-	src_file = fdget(srcfd);
-	if (!fd_file(src_file)) {
+	CLASS(fd, src_file)(srcfd);
+	if (fd_empty(src_file)) {
 		rc = -EBADF;
 		goto out_drop_write;
 	}
@@ -98,20 +97,18 @@ static long cifs_ioctl_copychunk(unsigned int xid, struct file *dst_file,
 	if (fd_file(src_file)->f_op->unlocked_ioctl != cifs_ioctl) {
 		rc = -EBADF;
 		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
-		goto out_fput;
+		goto out_drop_write;
 	}
 
 	src_inode = file_inode(fd_file(src_file));
 	rc = -EINVAL;
 	if (S_ISDIR(src_inode->i_mode))
-		goto out_fput;
+		goto out_drop_write;
 
 	rc = cifs_file_copychunk_range(xid, fd_file(src_file), 0, dst_file, 0,
 					src_inode->i_size, 0);
 	if (rc > 0)
 		rc = 0;
-out_fput:
-	fdput(src_file);
 out_drop_write:
 	mnt_drop_write_file(dst_file);
 	return rc;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 22/28] convert vfs_dedupe_file_range().
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (19 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 21/28] convert cifs_ioctl_copychunk() Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 23/28] convert do_select() Al Viro
                       ` (6 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

fdput() is followed by checking fatal_signal_pending() (and aborting
the loop in such case).  fdput() is transposable with that check.
Yes, it'll probably end up with slightly fatter code (call after the
check has returned false + call on the almost never taken out-of-line path
instead of one call before the check), but it's not worth bothering with
explicit extra scope there (or dragging the check into the loop condition,
for that matter).

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/remap_range.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/remap_range.c b/fs/remap_range.c
index 017d0d1ea6c9..26afbbbfb10c 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -536,7 +536,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 	}
 
 	for (i = 0, info = same->info; i < count; i++, info++) {
-		struct fd dst_fd = fdget(info->dest_fd);
+		CLASS(fd, dst_fd)(info->dest_fd);
 
 		if (fd_empty(dst_fd)) {
 			info->status = -EBADF;
@@ -545,7 +545,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 
 		if (info->reserved) {
 			info->status = -EINVAL;
-			goto next_fdput;
+			goto next_loop;
 		}
 
 		deduped = vfs_dedupe_file_range_one(file, off, fd_file(dst_fd),
@@ -558,8 +558,6 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 		else
 			info->bytes_deduped = len;
 
-next_fdput:
-		fdput(dst_fd);
 next_loop:
 		if (fatal_signal_pending(current))
 			break;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 23/28] convert do_select()
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (20 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 22/28] convert vfs_dedupe_file_range() Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 24/28] do_pollfd(): convert to CLASS(fd) Al Viro
                       ` (5 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

take the logics from fdget() to fdput() into an inlined helper - with existing
wait_key_set() subsumed into that.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/select.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index a77907faf2b4..b41e2d651cc1 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -462,15 +462,22 @@ static int max_select_fd(unsigned long n, fd_set_bits *fds)
 			 EPOLLNVAL)
 #define POLLEX_SET (EPOLLPRI | EPOLLNVAL)
 
-static inline void wait_key_set(poll_table *wait, unsigned long in,
+static inline __poll_t select_poll_one(int fd, poll_table *wait, unsigned long in,
 				unsigned long out, unsigned long bit,
 				__poll_t ll_flag)
 {
+	CLASS(fd, f)(fd);
+
+	if (fd_empty(f))
+		return EPOLLNVAL;
+
 	wait->_key = POLLEX_SET | ll_flag;
 	if (in & bit)
 		wait->_key |= POLLIN_SET;
 	if (out & bit)
 		wait->_key |= POLLOUT_SET;
+
+	return vfs_poll(fd_file(f), wait);
 }
 
 static noinline_for_stack int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
@@ -522,20 +529,12 @@ static noinline_for_stack int do_select(int n, fd_set_bits *fds, struct timespec
 			}
 
 			for (j = 0; j < BITS_PER_LONG; ++j, ++i, bit <<= 1) {
-				struct fd f;
 				if (i >= n)
 					break;
 				if (!(bit & all_bits))
 					continue;
-				mask = EPOLLNVAL;
-				f = fdget(i);
-				if (fd_file(f)) {
-					wait_key_set(wait, in, out, bit,
-						     busy_flag);
-					mask = vfs_poll(fd_file(f), wait);
-
-					fdput(f);
-				}
+				mask = select_poll_one(i, wait, in, out, bit,
+						       busy_flag);
 				if ((mask & POLLIN_SET) && (in & bit)) {
 					res_in |= bit;
 					retval++;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 24/28] do_pollfd(): convert to CLASS(fd)
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (21 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 23/28] convert do_select() Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 25/28] assorted variants of irqfd setup: " Al Viro
                       ` (4 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

lift setting ->revents into the caller, so that failure exits (including
the early one) would be plain returns.

We need the scope of our struct fd to end before the store to ->revents,
since that's shared with the failure exits prior to the point where we
can do fdget().

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/select.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index b41e2d651cc1..e223d1fe9d55 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -855,15 +855,14 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
 				     __poll_t busy_flag)
 {
 	int fd = pollfd->fd;
-	__poll_t mask = 0, filter;
-	struct fd f;
+	__poll_t mask, filter;
 
 	if (fd < 0)
-		goto out;
-	mask = EPOLLNVAL;
-	f = fdget(fd);
-	if (!fd_file(f))
-		goto out;
+		return 0;
+
+	CLASS(fd, f)(fd);
+	if (fd_empty(f))
+		return EPOLLNVAL;
 
 	/* userland u16 ->events contains POLL... bitmap */
 	filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
@@ -871,13 +870,7 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
 	mask = vfs_poll(fd_file(f), pwait);
 	if (mask & busy_flag)
 		*can_busy_poll = true;
-	mask &= filter;		/* Mask out unneeded events. */
-	fdput(f);
-
-out:
-	/* ... and so does ->revents */
-	pollfd->revents = mangle_poll(mask);
-	return mask;
+	return mask & filter;		/* Mask out unneeded events. */
 }
 
 static int do_poll(struct poll_list *list, struct poll_wqueues *wait,
@@ -909,6 +902,7 @@ static int do_poll(struct poll_list *list, struct poll_wqueues *wait,
 			pfd = walk->entries;
 			pfd_end = pfd + walk->len;
 			for (; pfd != pfd_end; pfd++) {
+				__poll_t mask;
 				/*
 				 * Fish for events. If we found one, record it
 				 * and kill poll_table->_qproc, so we don't
@@ -916,8 +910,9 @@ static int do_poll(struct poll_list *list, struct poll_wqueues *wait,
 				 * this. They'll get immediately deregistered
 				 * when we break out and return.
 				 */
-				if (do_pollfd(pfd, pt, &can_busy_loop,
-					      busy_flag)) {
+				mask = do_pollfd(pfd, pt, &can_busy_loop, busy_flag);
+				pfd->revents = mangle_poll(mask);
+				if (mask) {
 					count++;
 					pt->_qproc = NULL;
 					/* found something, stop busy polling */
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 25/28] assorted variants of irqfd setup: convert to CLASS(fd)
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (22 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 24/28] do_pollfd(): convert to CLASS(fd) Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 26/28] memcg_write_event_control(): switch " Al Viro
                       ` (3 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

in all of those failure exits prior to fdget() are plain returns and
the only thing done after fdput() is (on failure exits) a kfree(),
which can be done before fdput() just fine.

NOTE: in acrn_irqfd_assign() 'fail:' failure exit is wrong for
eventfd_ctx_fileget() failure (we only want fdput() there) and once
we stop doing that, it doesn't need to check if eventfd is NULL or
ERR_PTR(...) there.

NOTE: in privcmd we move fdget() up before the allocation - more
to the point, before the copy_from_user() attempt.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/vfio/virqfd.c     | 16 +++-------------
 drivers/virt/acrn/irqfd.c | 13 ++++---------
 drivers/xen/privcmd.c     | 17 ++++-------------
 virt/kvm/eventfd.c        | 15 +++------------
 4 files changed, 14 insertions(+), 47 deletions(-)

diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index d22881245e89..aa2891f97508 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -113,7 +113,6 @@ int vfio_virqfd_enable(void *opaque,
 		       void (*thread)(void *, void *),
 		       void *data, struct virqfd **pvirqfd, int fd)
 {
-	struct fd irqfd;
 	struct eventfd_ctx *ctx;
 	struct virqfd *virqfd;
 	int ret = 0;
@@ -133,8 +132,8 @@ int vfio_virqfd_enable(void *opaque,
 	INIT_WORK(&virqfd->inject, virqfd_inject);
 	INIT_WORK(&virqfd->flush_inject, virqfd_flush_inject);
 
-	irqfd = fdget(fd);
-	if (!fd_file(irqfd)) {
+	CLASS(fd, irqfd)(fd);
+	if (fd_empty(irqfd)) {
 		ret = -EBADF;
 		goto err_fd;
 	}
@@ -142,7 +141,7 @@ int vfio_virqfd_enable(void *opaque,
 	ctx = eventfd_ctx_fileget(fd_file(irqfd));
 	if (IS_ERR(ctx)) {
 		ret = PTR_ERR(ctx);
-		goto err_ctx;
+		goto err_fd;
 	}
 
 	virqfd->eventfd = ctx;
@@ -181,18 +180,9 @@ int vfio_virqfd_enable(void *opaque,
 		if ((!handler || handler(opaque, data)) && thread)
 			schedule_work(&virqfd->inject);
 	}
-
-	/*
-	 * Do not drop the file until the irqfd is fully initialized,
-	 * otherwise we might race against the EPOLLHUP.
-	 */
-	fdput(irqfd);
-
 	return 0;
 err_busy:
 	eventfd_ctx_put(ctx);
-err_ctx:
-	fdput(irqfd);
 err_fd:
 	kfree(virqfd);
 
diff --git a/drivers/virt/acrn/irqfd.c b/drivers/virt/acrn/irqfd.c
index 9994d818bb7e..b7da24ca1475 100644
--- a/drivers/virt/acrn/irqfd.c
+++ b/drivers/virt/acrn/irqfd.c
@@ -112,7 +112,6 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 	struct eventfd_ctx *eventfd = NULL;
 	struct hsm_irqfd *irqfd, *tmp;
 	__poll_t events;
-	struct fd f;
 	int ret = 0;
 
 	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
@@ -124,8 +123,8 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 	INIT_LIST_HEAD(&irqfd->list);
 	INIT_WORK(&irqfd->shutdown, hsm_irqfd_shutdown_work);
 
-	f = fdget(args->fd);
-	if (!fd_file(f)) {
+	CLASS(fd, f)(args->fd);
+	if (fd_empty(f)) {
 		ret = -EBADF;
 		goto out;
 	}
@@ -133,7 +132,7 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 	eventfd = eventfd_ctx_fileget(fd_file(f));
 	if (IS_ERR(eventfd)) {
 		ret = PTR_ERR(eventfd);
-		goto fail;
+		goto out;
 	}
 
 	irqfd->eventfd = eventfd;
@@ -162,13 +161,9 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 	if (events & EPOLLIN)
 		acrn_irqfd_inject(irqfd);
 
-	fdput(f);
 	return 0;
 fail:
-	if (eventfd && !IS_ERR(eventfd))
-		eventfd_ctx_put(eventfd);
-
-	fdput(f);
+	eventfd_ctx_put(eventfd);
 out:
 	kfree(irqfd);
 	return ret;
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 79070494070d..0ca532614343 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -967,10 +967,11 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
 	struct privcmd_kernel_irqfd *kirqfd, *tmp;
 	unsigned long flags;
 	__poll_t events;
-	struct fd f;
 	void *dm_op;
 	int ret, idx;
 
+	CLASS(fd, f)(irqfd->fd);
+
 	kirqfd = kzalloc(sizeof(*kirqfd) + irqfd->size, GFP_KERNEL);
 	if (!kirqfd)
 		return -ENOMEM;
@@ -986,8 +987,7 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
 	kirqfd->dom = irqfd->dom;
 	INIT_WORK(&kirqfd->shutdown, irqfd_shutdown);
 
-	f = fdget(irqfd->fd);
-	if (!fd_file(f)) {
+	if (fd_empty(f)) {
 		ret = -EBADF;
 		goto error_kfree;
 	}
@@ -995,7 +995,7 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
 	kirqfd->eventfd = eventfd_ctx_fileget(fd_file(f));
 	if (IS_ERR(kirqfd->eventfd)) {
 		ret = PTR_ERR(kirqfd->eventfd);
-		goto error_fd_put;
+		goto error_kfree;
 	}
 
 	/*
@@ -1028,20 +1028,11 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
 		irqfd_inject(kirqfd);
 
 	srcu_read_unlock(&irqfds_srcu, idx);
-
-	/*
-	 * Do not drop the file until the kirqfd is fully initialized, otherwise
-	 * we might race against the EPOLLHUP.
-	 */
-	fdput(f);
 	return 0;
 
 error_eventfd:
 	eventfd_ctx_put(kirqfd->eventfd);
 
-error_fd_put:
-	fdput(f);
-
 error_kfree:
 	kfree(kirqfd);
 	return ret;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 6b390b622b72..249ba5b72e9b 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -304,7 +304,6 @@ static int
 kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 {
 	struct kvm_kernel_irqfd *irqfd, *tmp;
-	struct fd f;
 	struct eventfd_ctx *eventfd = NULL, *resamplefd = NULL;
 	int ret;
 	__poll_t events;
@@ -327,8 +326,8 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
 	seqcount_spinlock_init(&irqfd->irq_entry_sc, &kvm->irqfds.lock);
 
-	f = fdget(args->fd);
-	if (!fd_file(f)) {
+	CLASS(fd, f)(args->fd);
+	if (fd_empty(f)) {
 		ret = -EBADF;
 		goto out;
 	}
@@ -336,7 +335,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	eventfd = eventfd_ctx_fileget(fd_file(f));
 	if (IS_ERR(eventfd)) {
 		ret = PTR_ERR(eventfd);
-		goto fail;
+		goto out;
 	}
 
 	irqfd->eventfd = eventfd;
@@ -440,12 +439,6 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 #endif
 
 	srcu_read_unlock(&kvm->irq_srcu, idx);
-
-	/*
-	 * do not drop the file until the irqfd is fully initialized, otherwise
-	 * we might race against the EPOLLHUP
-	 */
-	fdput(f);
 	return 0;
 
 fail:
@@ -458,8 +451,6 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	if (eventfd && !IS_ERR(eventfd))
 		eventfd_ctx_put(eventfd);
 
-	fdput(f);
-
 out:
 	kfree(irqfd);
 	return ret;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 26/28] memcg_write_event_control(): switch to CLASS(fd)
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (23 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 25/28] assorted variants of irqfd setup: " Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 27/28] css_set_fork(): switch to CLASS(fd_raw, ...) Al Viro
                       ` (2 subsequent siblings)
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

some reordering required - take both fdget() to the point before
the allocations, with matching move of fdput() to the very end
of failure exit(s); after that it converts trivially.

simplify the cleanups that involve css_put(), while we are at it...

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 mm/memcontrol-v1.c | 44 +++++++++++++++-----------------------------
 1 file changed, 15 insertions(+), 29 deletions(-)

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 81d8819f13cd..bc54cff7615f 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -1911,8 +1911,6 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	struct mem_cgroup_event *event;
 	struct cgroup_subsys_state *cfile_css;
 	unsigned int efd, cfd;
-	struct fd efile;
-	struct fd cfile;
 	struct dentry *cdentry;
 	const char *name;
 	char *endp;
@@ -1936,6 +1934,12 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	else
 		return -EINVAL;
 
+	CLASS(fd, efile)(efd);
+	if (fd_empty(efile))
+		return -EBADF;
+
+	CLASS(fd, cfile)(cfd);
+
 	event = kzalloc(sizeof(*event), GFP_KERNEL);
 	if (!event)
 		return -ENOMEM;
@@ -1946,20 +1950,13 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	init_waitqueue_func_entry(&event->wait, memcg_event_wake);
 	INIT_WORK(&event->remove, memcg_event_remove);
 
-	efile = fdget(efd);
-	if (!fd_file(efile)) {
-		ret = -EBADF;
-		goto out_kfree;
-	}
-
 	event->eventfd = eventfd_ctx_fileget(fd_file(efile));
 	if (IS_ERR(event->eventfd)) {
 		ret = PTR_ERR(event->eventfd);
-		goto out_put_efile;
+		goto out_kfree;
 	}
 
-	cfile = fdget(cfd);
-	if (!fd_file(cfile)) {
+	if (fd_empty(cfile)) {
 		ret = -EBADF;
 		goto out_put_eventfd;
 	}
@@ -1968,7 +1965,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	/* AV: shouldn't we check that it's been opened for read instead? */
 	ret = file_permission(fd_file(cfile), MAY_READ);
 	if (ret < 0)
-		goto out_put_cfile;
+		goto out_put_eventfd;
 
 	/*
 	 * The control file must be a regular cgroup1 file. As a regular cgroup
@@ -1977,7 +1974,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	cdentry = fd_file(cfile)->f_path.dentry;
 	if (cdentry->d_sb->s_type != &cgroup_fs_type || !d_is_reg(cdentry)) {
 		ret = -EINVAL;
-		goto out_put_cfile;
+		goto out_put_eventfd;
 	}
 
 	/*
@@ -2010,7 +2007,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 		event->unregister_event = memsw_cgroup_usage_unregister_event;
 	} else {
 		ret = -EINVAL;
-		goto out_put_cfile;
+		goto out_put_eventfd;
 	}
 
 	/*
@@ -2022,11 +2019,9 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 					       &memory_cgrp_subsys);
 	ret = -EINVAL;
 	if (IS_ERR(cfile_css))
-		goto out_put_cfile;
-	if (cfile_css != css) {
-		css_put(cfile_css);
-		goto out_put_cfile;
-	}
+		goto out_put_eventfd;
+	if (cfile_css != css)
+		goto out_put_css;
 
 	ret = event->register_event(memcg, event->eventfd, buf);
 	if (ret)
@@ -2037,23 +2032,14 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	spin_lock_irq(&memcg->event_list_lock);
 	list_add(&event->list, &memcg->event_list);
 	spin_unlock_irq(&memcg->event_list_lock);
-
-	fdput(cfile);
-	fdput(efile);
-
 	return nbytes;
 
 out_put_css:
-	css_put(css);
-out_put_cfile:
-	fdput(cfile);
+	css_put(cfile_css);
 out_put_eventfd:
 	eventfd_ctx_put(event->eventfd);
-out_put_efile:
-	fdput(efile);
 out_kfree:
 	kfree(event);
-
 	return ret;
 }
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 27/28] css_set_fork(): switch to CLASS(fd_raw, ...)
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (24 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 26/28] memcg_write_event_control(): switch " Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02  5:08     ` [PATCH v3 28/28] deal with the last remaing boolean uses of fd_file() Al Viro
  2024-11-02 12:21     ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Simon Horman
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

reference acquired there by fget_raw() is not stashed anywhere -
we could as well borrow instead.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 kernel/cgroup/cgroup.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 8305a67ea8d9..02acc2540c46 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6476,7 +6476,6 @@ static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
 	struct cgroup *dst_cgrp = NULL;
 	struct css_set *cset;
 	struct super_block *sb;
-	struct file *f;
 
 	if (kargs->flags & CLONE_INTO_CGROUP)
 		cgroup_lock();
@@ -6493,14 +6492,14 @@ static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
 		return 0;
 	}
 
-	f = fget_raw(kargs->cgroup);
-	if (!f) {
+	CLASS(fd_raw, f)(kargs->cgroup);
+	if (fd_empty(f)) {
 		ret = -EBADF;
 		goto err;
 	}
-	sb = f->f_path.dentry->d_sb;
+	sb = fd_file(f)->f_path.dentry->d_sb;
 
-	dst_cgrp = cgroup_get_from_file(f);
+	dst_cgrp = cgroup_get_from_file(fd_file(f));
 	if (IS_ERR(dst_cgrp)) {
 		ret = PTR_ERR(dst_cgrp);
 		dst_cgrp = NULL;
@@ -6548,15 +6547,12 @@ static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
 	}
 
 	put_css_set(cset);
-	fput(f);
 	kargs->cgrp = dst_cgrp;
 	return ret;
 
 err:
 	cgroup_threadgroup_change_end(current);
 	cgroup_unlock();
-	if (f)
-		fput(f);
 	if (dst_cgrp)
 		cgroup_put(dst_cgrp);
 	put_css_set(cset);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* [PATCH v3 28/28] deal with the last remaing boolean uses of fd_file()
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (25 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 27/28] css_set_fork(): switch to CLASS(fd_raw, ...) Al Viro
@ 2024-11-02  5:08     ` Al Viro
  2024-11-02 12:21     ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Simon Horman
  27 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-11-02  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, brauner, cgroups, kvm, netdev, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/infiniband/core/uverbs_cmd.c | 8 +++-----
 include/linux/cleanup.h              | 2 +-
 sound/core/pcm_native.c              | 2 +-
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a4cce360df21..66b02fbf077a 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -584,7 +584,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
 	if (cmd.fd != -1) {
 		/* search for file descriptor */
 		f = fdget(cmd.fd);
-		if (!fd_file(f)) {
+		if (fd_empty(f)) {
 			ret = -EBADF;
 			goto err_tree_mutex_unlock;
 		}
@@ -632,8 +632,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
 		atomic_inc(&xrcd->usecnt);
 	}
 
-	if (fd_file(f))
-		fdput(f);
+	fdput(f);
 
 	mutex_unlock(&ibudev->xrcd_tree_mutex);
 	uobj_finalize_uobj_create(&obj->uobject, attrs);
@@ -648,8 +647,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
 	uobj_alloc_abort(&obj->uobject, attrs);
 
 err_tree_mutex_unlock:
-	if (fd_file(f))
-		fdput(f);
+	fdput(f);
 
 	mutex_unlock(&ibudev->xrcd_tree_mutex);
 
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index 038b2d523bf8..875c998275c0 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -234,7 +234,7 @@ const volatile void * __must_check_fn(const volatile void *val)
  * DEFINE_CLASS(fdget, struct fd, fdput(_T), fdget(fd), int fd)
  *
  *	CLASS(fdget, f)(fd);
- *	if (!fd_file(f))
+ *	if (fd_empty(f))
  *		return -EBADF;
  *
  *	// use 'f' without concern
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index b465fb6e1f5f..3320cce35a03 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -2250,7 +2250,7 @@ static int snd_pcm_link(struct snd_pcm_substream *substream, int fd)
 	bool nonatomic = substream->pcm->nonatomic;
 	CLASS(fd, f)(fd);
 
-	if (!fd_file(f))
+	if (fd_empty(f))
 		return -EBADFD;
 	if (!is_pcm_file(fd_file(f)))
 		return -EBADFD;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 01/28] net/socket.c: switch to CLASS(fd)
  2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
                       ` (26 preceding siblings ...)
  2024-11-02  5:08     ` [PATCH v3 28/28] deal with the last remaing boolean uses of fd_file() Al Viro
@ 2024-11-02 12:21     ` Simon Horman
  2024-11-03  6:31       ` Al Viro
  27 siblings, 1 reply; 134+ messages in thread
From: Simon Horman @ 2024-11-02 12:21 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, brauner, cgroups, kvm, netdev, torvalds

On Sat, Nov 02, 2024 at 05:07:59AM +0000, Al Viro wrote:
> 	The important part in sockfd_lookup_light() is avoiding needless
> file refcount operations, not the marginal reduction of the register
> pressure from not keeping a struct file pointer in the caller.
> 
> 	Switch to use fdget()/fdpu(); with sane use of CLASS(fd) we can
> get a better code generation...
> 
> 	Would be nice if somebody tested it on networking test suites
> (including benchmarks)...
> 
> 	sockfd_lookup_light() does fdget(), uses sock_from_file() to
> get the associated socket and returns the struct socket reference to
> the caller, along with "do we need to fput()" flag.  No matching fdput(),
> the caller does its equivalent manually, using the fact that sock->file
> points to the struct file the socket has come from.
> 
> 	Get rid of that - have the callers do fdget()/fdput() and
> use sock_from_file() directly.  That kills sockfd_lookup_light()
> and fput_light() (no users left).
> 
> 	What's more, we can get rid of explicit fdget()/fdput() by
> switching to CLASS(fd, ...) - code generation does not suffer, since
> now fdput() inserted on "descriptor is not opened" failure exit
> is recognized to be a no-op by compiler.
> 
> Reviewed-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

...

> diff --git a/net/socket.c b/net/socket.c

...

> @@ -2926,16 +2900,18 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
>  
>  	datagrams = 0;
>  
> -	sock = sockfd_lookup_light(fd, &err, &fput_needed);
> -	if (!sock)
> -		return err;
> +	CLASS(fd, f)(fd);
> +
> +	if (fd_empty(f))
> +		return -EBADF;
> +	sock = sock_from_file(fd_file(f));
> +	if (unlikely(!sock))
> +		return -ENOTSOCK;

Hi Al,

There is an unconditional check on err down on line 2977.
However, with the above change err is now only conditionally
set before we reach that line. Are you sure that it will always
be initialised by the time line 2977 is reached?

...

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 01/28] net/socket.c: switch to CLASS(fd)
  2024-11-02 12:21     ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Simon Horman
@ 2024-11-03  6:31       ` Al Viro
  2024-11-06 10:03         ` Simon Horman
  0 siblings, 1 reply; 134+ messages in thread
From: Al Viro @ 2024-11-03  6:31 UTC (permalink / raw)
  To: Simon Horman; +Cc: linux-fsdevel, brauner, cgroups, kvm, netdev, torvalds

On Sat, Nov 02, 2024 at 12:21:32PM +0000, Simon Horman wrote:

> > @@ -2926,16 +2900,18 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
> >  
> >  	datagrams = 0;
> >  
> > -	sock = sockfd_lookup_light(fd, &err, &fput_needed);
> > -	if (!sock)
> > -		return err;
> > +	CLASS(fd, f)(fd);
> > +
> > +	if (fd_empty(f))
> > +		return -EBADF;
> > +	sock = sock_from_file(fd_file(f));
> > +	if (unlikely(!sock))
> > +		return -ENOTSOCK;
> 
> Hi Al,
> 
> There is an unconditional check on err down on line 2977.
> However, with the above change err is now only conditionally
> set before we reach that line. Are you sure that it will always
> be initialised by the time line 2977 is reached?

Nice catch, thank you.  It is possible, if you call recvmmsg(2) with
zero vlen and MSG_ERRQUEUE in flags.  Which is not going to be done in
any well-behaving code, making it really nasty - nothing like a kernel
bug that shows up only when trying to narrow down a userland bug upstream
of the syscall in question ;-/

AFAICS, that's the only bug of that sort in this commit - all other
places that used to rely upon successful sockfd_lookup_light() zeroing
err have an unconditional assignment to err shortly downstream of that.

Fix folded into commit in question, branch force-pushed; incremental follows
(I would rather not spam the lists with repost of the entire patchset for
the sake of that):

diff --git a/net/socket.c b/net/socket.c
index fb3806a11f94..c3ac02d060c0 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2885,7 +2885,7 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
 			  unsigned int vlen, unsigned int flags,
 			  struct timespec64 *timeout)
 {
-	int err, datagrams;
+	int err = 0, datagrams;
 	struct socket *sock;
 	struct mmsghdr __user *entry;
 	struct compat_mmsghdr __user *compat_entry;

^ permalink raw reply related	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 01/28] net/socket.c: switch to CLASS(fd)
  2024-11-03  6:31       ` Al Viro
@ 2024-11-06 10:03         ` Simon Horman
  0 siblings, 0 replies; 134+ messages in thread
From: Simon Horman @ 2024-11-06 10:03 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, brauner, cgroups, kvm, netdev, torvalds

On Sun, Nov 03, 2024 at 06:31:13AM +0000, Al Viro wrote:
> On Sat, Nov 02, 2024 at 12:21:32PM +0000, Simon Horman wrote:
> 
> > > @@ -2926,16 +2900,18 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
> > >  
> > >  	datagrams = 0;
> > >  
> > > -	sock = sockfd_lookup_light(fd, &err, &fput_needed);
> > > -	if (!sock)
> > > -		return err;
> > > +	CLASS(fd, f)(fd);
> > > +
> > > +	if (fd_empty(f))
> > > +		return -EBADF;
> > > +	sock = sock_from_file(fd_file(f));
> > > +	if (unlikely(!sock))
> > > +		return -ENOTSOCK;
> > 
> > Hi Al,
> > 
> > There is an unconditional check on err down on line 2977.
> > However, with the above change err is now only conditionally
> > set before we reach that line. Are you sure that it will always
> > be initialised by the time line 2977 is reached?
> 
> Nice catch, thank you.  It is possible, if you call recvmmsg(2) with
> zero vlen and MSG_ERRQUEUE in flags.  Which is not going to be done in
> any well-behaving code, making it really nasty - nothing like a kernel
> bug that shows up only when trying to narrow down a userland bug upstream
> of the syscall in question ;-/

Ouch.

> AFAICS, that's the only bug of that sort in this commit - all other
> places that used to rely upon successful sockfd_lookup_light() zeroing
> err have an unconditional assignment to err shortly downstream of that.

Yes, I was unable to find any others either.

> 
> Fix folded into commit in question, branch force-pushed; incremental follows
> (I would rather not spam the lists with repost of the entire patchset for
> the sake of that):

Thanks, will run some checks for good measure.
But the fix below looks good to me.

> diff --git a/net/socket.c b/net/socket.c
> index fb3806a11f94..c3ac02d060c0 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -2885,7 +2885,7 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
>  			  unsigned int vlen, unsigned int flags,
>  			  struct timespec64 *timeout)
>  {
> -	int err, datagrams;
> +	int err = 0, datagrams;
>  	struct socket *sock;
>  	struct mmsghdr __user *entry;
>  	struct compat_mmsghdr __user *compat_entry;
> 

^ permalink raw reply	[flat|nested] 134+ messages in thread

end of thread, other threads:[~2024-11-11 17:22 UTC | newest]

Thread overview: 134+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-30  5:09 [PATCHSET][RFC] struct fd and memory safety Al Viro
2024-07-30  5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
2024-07-30  5:15   ` [PATCH 02/39] introduce fd_file(), convert all accessors to it viro
2024-08-07  9:55     ` Christian Brauner
2024-07-30  5:15   ` [PATCH 03/39] struct fd: representation change viro
2024-07-30 18:10     ` Josef Bacik
2024-08-07 10:07       ` Christian Brauner
2024-08-07 10:03     ` Christian Brauner
2024-07-30  5:15   ` [PATCH 04/39] add struct fd constructors, get rid of __to_fd() viro
2024-08-07 10:09     ` Christian Brauner
2024-07-30  5:15   ` [PATCH 05/39] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range() viro
2024-08-07 10:10     ` Christian Brauner
2024-07-30  5:15   ` [PATCH 06/39] net/socket.c: switch to CLASS(fd) viro
2024-08-07 10:13     ` Christian Brauner
2024-07-30  5:15   ` [PATCH 07/39] introduce struct fderr, convert overlayfs uses to that viro
2024-07-30  5:15   ` [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...) viro
2024-07-30 19:10     ` Josef Bacik
2024-07-30 21:12       ` Al Viro
2024-07-31 21:11         ` Josef Bacik
2024-08-07 10:23     ` Christian Brauner
2024-07-30  5:15   ` [PATCH 09/39] timerfd: switch to CLASS(fd, ...) viro
2024-08-07 10:24     ` Christian Brauner
2024-07-30  5:15   ` [PATCH 10/39] get rid of perf_fget_light(), convert kernel/events/core.c to CLASS(fd) viro
2024-08-07 10:25     ` Christian Brauner
2024-07-30  5:15   ` [PATCH 11/39] switch netlink_getsockbyfilp() to taking descriptor viro
2024-08-07 10:26     ` Christian Brauner
2024-07-30  5:15   ` [PATCH 12/39] do_mq_notify(): saner skb freeing on failures viro
2024-07-30  5:15   ` [PATCH 13/39] do_mq_notify(): switch to CLASS(fd, ...) viro
2024-08-07 10:27     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 14/39] simplify xfs_find_handle() a bit viro
2024-07-30  5:16   ` [PATCH 15/39] convert vmsplice() to CLASS(fd, ...) viro
2024-08-07 10:27     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 16/39] convert __bpf_prog_get() " viro
2024-08-06 21:08     ` Andrii Nakryiko
2024-08-07 10:28     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper viro
2024-08-06 22:32     ` Andrii Nakryiko
2024-08-07 10:29       ` Christian Brauner
2024-08-07 15:30         ` Andrii Nakryiko
2024-08-08 16:51           ` Alexei Starovoitov
2024-08-08 20:35             ` Andrii Nakryiko
2024-08-09  1:23               ` Alexei Starovoitov
2024-08-09 17:23                 ` Andrii Nakryiko
2024-08-10  3:29             ` Al Viro
2024-08-12 20:05               ` Andrii Nakryiko
2024-08-13  2:06                 ` Al Viro
2024-08-13  3:32                   ` Andrii Nakryiko
2024-07-30  5:16   ` [PATCH 18/39] bpf maps: switch to CLASS(fd, ...) viro
2024-08-07 10:34     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 19/39] fdget_raw() users: switch to CLASS(fd_raw, ...) viro
2024-08-07 10:35     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 20/39] introduce "fd_pos" class, convert fdget_pos() users to it viro
2024-08-07 10:36     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 21/39] o2hb_region_dev_store(): avoid goto around fdget()/fdput() viro
2024-07-30  5:16   ` [PATCH 22/39] privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() viro
2024-07-30  5:16   ` [PATCH 23/39] fdget(), trivial conversions viro
2024-08-07 10:37     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 24/39] fdget(), more " viro
2024-08-07 10:39     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 25/39] convert do_preadv()/do_pwritev() viro
2024-08-07 10:39     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 26/39] convert cachestat(2) viro
2024-08-07 10:39     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 27/39] switch spufs_calls_{get,put}() to CLASS() use viro
2024-07-30  5:16   ` [PATCH 28/39] convert spu_run(2) viro
2024-08-07 10:40     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 29/39] convert media_request_get_by_fd() viro
2024-08-07 10:40     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 30/39] convert coda_parse_fd() viro
2024-08-07 10:41     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 31/39] convert cifs_ioctl_copychunk() viro
2024-08-07 10:41     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 32/39] convert vfs_dedupe_file_range() viro
2024-08-07 10:42     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 33/39] convert do_select() viro
2024-08-07 10:42     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 34/39] do_pollfd(): convert to CLASS(fd) viro
2024-08-07 10:43     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 35/39] convert bpf_token_create() viro
2024-08-06 22:42     ` Andrii Nakryiko
2024-08-10  3:46       ` Al Viro
2024-08-12 20:06         ` Andrii Nakryiko
2024-08-07 10:44     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 36/39] assorted variants of irqfd setup: convert to CLASS(fd) viro
2024-08-07 10:46     ` Christian Brauner
2024-08-10  3:53       ` Al Viro
2024-07-30  5:16   ` [PATCH 37/39] memcg_write_event_control(): switch " viro
2024-08-07 10:47     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 38/39] css_set_fork(): switch to CLASS(fd_raw, ...) viro
2024-08-07 10:47     ` Christian Brauner
2024-07-30  5:16   ` [PATCH 39/39] deal with the last remaing boolean uses of fd_file() viro
2024-08-07 10:48     ` Christian Brauner
2024-07-30  7:13   ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops Michal Hocko
2024-07-30  7:18     ` Al Viro
2024-07-30  7:37       ` Michal Hocko
2024-07-30  5:17 ` [PATCHSET][RFC] struct fd and memory safety Al Viro
2024-07-30 20:02 ` Josef Bacik
2024-07-31  0:43 ` Al Viro
2024-08-06 17:58 ` Jason Gunthorpe
2024-08-06 18:56   ` Al Viro
2024-08-07 10:51 ` Christian Brauner
2024-11-02  5:02 ` [PATCHSET v3] " Al Viro
2024-11-02  5:07   ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Al Viro
2024-11-02  5:08     ` [PATCH v3 02/28] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range() Al Viro
2024-11-02  5:08     ` [PATCH v3 03/28] timerfd: switch to CLASS(fd) Al Viro
2024-11-02  5:08     ` [PATCH v3 04/28] get rid of perf_fget_light(), convert kernel/events/core.c " Al Viro
2024-11-02  5:08     ` [PATCH v3 05/28] switch netlink_getsockbyfilp() to taking descriptor Al Viro
2024-11-02  5:08     ` [PATCH v3 06/28] do_mq_notify(): saner skb freeing on failures Al Viro
2024-11-02  5:08     ` [PATCH v3 07/28] do_mq_notify(): switch to CLASS(fd) Al Viro
2024-11-02  5:08     ` [PATCH v3 08/28] simplify xfs_find_handle() a bit Al Viro
2024-11-02  5:08     ` [PATCH v3 09/28] convert vmsplice() to CLASS(fd) Al Viro
2024-11-02  5:08     ` [PATCH v3 10/28] fdget_raw() users: switch to CLASS(fd_raw) Al Viro
2024-11-02  5:08     ` [PATCH v3 11/28] introduce "fd_pos" class, convert fdget_pos() users to it Al Viro
2024-11-02  5:08     ` [PATCH v3 12/28] o2hb_region_dev_store(): avoid goto around fdget()/fdput() Al Viro
2024-11-02  5:08     ` [PATCH v3 13/28] privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() Al Viro
2024-11-02  5:08     ` [PATCH v3 14/28] fdget(), trivial conversions Al Viro
2024-11-11 17:22       ` Francesco Lavra
2024-11-02  5:08     ` [PATCH v3 15/28] fdget(), more " Al Viro
2024-11-02  5:08     ` [PATCH v3 16/28] convert do_preadv()/do_pwritev() Al Viro
2024-11-02  5:08     ` [PATCH v3 17/28] convert cachestat(2) Al Viro
2024-11-02  5:08     ` [PATCH v3 18/28] switch spufs_calls_{get,put}() to CLASS() use Al Viro
2024-11-02  5:08     ` [PATCH v3 19/28] convert spu_run(2) Al Viro
2024-11-02  5:08     ` [PATCH v3 20/28] convert media_request_get_by_fd() Al Viro
2024-11-02  5:08     ` [PATCH v3 21/28] convert cifs_ioctl_copychunk() Al Viro
2024-11-02  5:08     ` [PATCH v3 22/28] convert vfs_dedupe_file_range() Al Viro
2024-11-02  5:08     ` [PATCH v3 23/28] convert do_select() Al Viro
2024-11-02  5:08     ` [PATCH v3 24/28] do_pollfd(): convert to CLASS(fd) Al Viro
2024-11-02  5:08     ` [PATCH v3 25/28] assorted variants of irqfd setup: " Al Viro
2024-11-02  5:08     ` [PATCH v3 26/28] memcg_write_event_control(): switch " Al Viro
2024-11-02  5:08     ` [PATCH v3 27/28] css_set_fork(): switch to CLASS(fd_raw, ...) Al Viro
2024-11-02  5:08     ` [PATCH v3 28/28] deal with the last remaing boolean uses of fd_file() Al Viro
2024-11-02 12:21     ` [PATCH v3 01/28] net/socket.c: switch to CLASS(fd) Simon Horman
2024-11-03  6:31       ` Al Viro
2024-11-06 10:03         ` Simon Horman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).