[RFC] struct fd situation

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC] struct fd situation
@ 2024-05-31  3:18 Al Viro
  2024-05-31 16:30 ` Al Viro
  0 siblings, 1 reply; 5+ messages in thread
From: Al Viro @ 2024-05-31  3:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel

	I've done another round of review of users.

1)  There are two leaks I'd missed earlier; missing fdput() on a failure
exit in arch/powerpc/kvm/powerpc.c and in drivers/media/rc/lirc_dev.c.
	I'll throw fixes into #fixes and send a pull request; that
part is obvious.

2)  Almost all values come from fdget(), fdget_raw() or fdget_pos().
Exceptions: several places that initialize to empty and a couple
of overlayfs functions that construct the sucker manually.  The latter
can bloody well generate bogus values on error.  fdget_pos() pairs
with fdput_pos(), everything else pairs with fdput().

3)  fdput(empty) should be a no-op; we have enough places relying upon
that and it makes sense in general.  Currently it is true, but compiler
has no way to figure that out.

4)  emptiness check is done by absolute majority of users as the
first thing after initialization.  The way it is implemented is
"check if fd.file is NULL"; currently we almost always skip fdput()
if that condition holds.  There are exceptions where we have fdput()
called (kernel_read_file_from_fd(), ksys_sync_file_range(),
do_utimes_fd(), ksys_readahead(), f_dupfd_query(), snd_pcm_link(),
xfs_find_handle(), finit_module(2)) regardless of emptiness.
That may result in excessive check (we look at FDPUT_FPUT in fd.flags,
which is never going to be set when fd.file is NULL).

Said that, there's a worse (still minor) problem with code generation -
most of the places checking for emptiness do not bother with
unlikely().  IMO that's a convincing argument in favour of explicit
fd_empty() or whatever we end up calling it.

5)  the most common (by far) pattern is
	f = fdget(n);
	if (!f.file)
		bugger off
	stuff using f.file
	fdput(f);
_usually_ the failure happens with -EBADF, but there's a weirdness galore
there - EINVAL, ENOENT, ENXIO, EBADFD (almost certainly misspelled EBADF),
etc.

There are several groups of unusual cases:
	* as mentioned above, some do fdput() on all exits, including the
"fd is empty" one.  Otherwise identical to the common variant.
	* two CLASS(fd{,_raw}) users.  Equivalent to the above.
	* a few places where we have struct fd initialized empty (NULL, 0),
with fdget() done in some cases, followed by unconditional fdput().
	* ipc/mqueue.c:do_mq_notify(); struct fd reused - fdget() after
fdput().  Usual otherwise.
	* net/socket.c; sockfd_lookup_light().  What happens there is
that struct fd instance is created in frame of sockfd_lookup_light(),
dropped if file is not a socket, then socket and fd.flags gets passed
to caller, with fput_light(sock->file, need_fput) done on subsequent
exits in caller.  Minimal patch would be to lift the struct fd into
the caller, along with dropping it on non-sockets and replace fput_light()
calls with fdput().  That would convert them to usual pattern; I don't
believe that it would make things any slower than they are now.
	* fs/splice.c:vmsplice_type().  One caller, called after fdget()
without a prior emptiness check, consumes struct fd on failure.  IMO
the call of fdput() ought to be lifted into the caller; perhaps the
entire thing should be expanded there.
	* fs/timerfd.c: timerfd_fget(): fdget() + check that it has the
right ->f_op then fdput() or move to caller's struct fd (passed by
reference).  Caller's instance is left uninitialized on error.  Only
two callers.  My preference would be to expand both, TBH - then
it would be the usual pattern.
	* kernel/events/core.c:perf_fget_light().  Similar to timerfd
case.
	* BPF: messy.  __bpf_map_get(f) is called right after fdget(),
without an emptiness check.  Verifies that fd is not empty, that
file is ours (fdput() if not), then returns file->private_data.
Note that proof of correctness requires showing that ->private_data is
never ERR_PTR() for those - otherwise the callers would leak.
IMO fdput() ought to be lifted into the callers, probably along with
the emptiness check.  All subsequent exits in the callers are
calling fdput() anyway... ____bpf_prog_get() is similar, but has
only one caller; I'd simply expand it.
	* overlayfs.  ovl_real_fdget() has rather unpleasant calling
conventions; it gets struct fd *real and returns an int; it builds
struct fd in *real and return 0 or -E...; the value left in *real
may be invalid unless 0 has been returned.  And it's not just left
uninitialized - bogus values are explicitly stored there.
ovl_real_fdget_meta() is similar.  Callers use it to initialize
their struct fd instance and return immediately if an error has
been returned.  All subsequent exits call fdput().

	That's it.

6)  The things clean up considerably if we use CLASS(fd{,_raw}) on
those suckers.  I've done mockup patches for that, and the end
result is a lot nicer than what we have right now.  The main issue
with doing such conversions is that failure exit on emptiness
grows an out-of-line check for FDPUT_FPUT and (never taken) call
of fput().  OTOH, the win from having that failure exit moved out
of line offsets that and it _is_ a slow path.

7)  Avoiding that stuff on EBADF exits is interesting.

It is possible to do without changing struct fd representation, but it's
somewhat iffy.  IME the following bit of ugliness
static always_inline void fdput(struct fd f)
{
	if (f.flags & FDPUT_FPUT && f.file)
		fput(f.file);
}

is enough for good code generation - I don's see any needless checks for
f.file in the resulting assembler, at least not with gcc 12 builds on x86.

gcc does manage to keep track of having seen a local struct member
NULL or non-NULL.

Alternative would be to turn struct fd into a struct-wrapped unsigned long
and either use a flag for emptiness checks (we can steal more from the
pointer) or just compare with zero for empitness check.

The former might allow to represent specific errors, which would give a neat
solution for ovl_real_fd() calling conventions - instead of "return an error,
fill user-supplied struct fd" it could just return an error-representing
struct fd and be done with that.

Unfortunately, gcc optimizer really stinks when it comes to bitops -
it can keep track of 'this field of struct has been found non-NULL'
easily, but 'this bit had been found set' is lost very easily.
So clever encodings are going to be brittle as far as code generation
is concerned.

	One lovely example (already posted in one of the old threads)
is this:
	if (v & 1)
		if (!(v & 1) && !(v & 2))
			foo();
is not recognized as a no-op.  Reason: gcc figures out that condition
in second if can be rewritten as !(v & 3) and after it does that
rewrite it can't figure out that if bit 0 is set, we can't have
both bit 0 and bit 1 clear.  clang manages that, gcc 12 does not.
It is possible to work around -
	if (v & 1)
		if (!(v & 1)) if (!(v & 2))
			foo();
is recognized to be a no-op.  But that kind of crap is too brittle -
minor changes in optimizer can screw it over.

That example is very close to what we'd need if fd_empty() turned
into checking a flag.  So unless somebody has a clever scheme that
would avoid that kind of fun, this is probably no-go.

If we give up on representing errors, we can use 0 for empty.
That means the following sequence of patches:
	* introduce fd_empty() and fd_file().
	* convert f.file accesses into fd_empty(f) (in boolean
contexts) and fd_file(f) (everything else).  That would have
to be done as a series of commits - doing a single-commit variant
would guarantee fuckloads of conflicts through the cycle, all
with the same commit.  And it's not entirely mechanical either.
	* switch representation.
	* follow with CLASS(fd, f) series of cleanups.
Doable, but it would be a long series.

Alternative would be to do the minimal fdput() modification (as above)
plus definitions of fd_empty() and fd_file(), followed by combinations
of "switch to fd_file/fd_empty, use CLASS(fd) where it makes sense" patches
from the first variant and finally switch the representation.  The series
is obviously shorter that way...

Comments?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] struct fd situation
  2024-05-31  3:18 [RFC] struct fd situation Al Viro
@ 2024-05-31 16:30 ` Al Viro
  2024-05-31 16:40   ` Al Viro
  2024-05-31 18:44   ` Linus Torvalds
  0 siblings, 2 replies; 5+ messages in thread
From: Al Viro @ 2024-05-31 16:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel

On Fri, May 31, 2024 at 04:18:02AM +0100, Al Viro wrote:

> Alternative would be to turn struct fd into a struct-wrapped unsigned long
> and either use a flag for emptiness checks (we can steal more from the
> pointer) or just compare with zero for empitness check.
> 
> The former might allow to represent specific errors, which would give a neat
> solution for ovl_real_fd() calling conventions - instead of "return an error,
> fill user-supplied struct fd" it could just return an error-representing
> struct fd and be done with that.
> 
> Unfortunately, gcc optimizer really stinks when it comes to bitops -
[snip]

> That example is very close to what we'd need if fd_empty() turned
> into checking a flag.  So unless somebody has a clever scheme that
> would avoid that kind of fun, this is probably no-go.
> 
> If we give up on representing errors, we can use 0 for empty.

Hmm...  FWIW, we could do something like this:

fd: 0 for empty, (unsigned long)p | flags, with p being an address of struct file
fd_err: a non-zero value possible for fd or (unsigned long)-E... for an error.

The range of valid values for fd does not overlap with
	(unsigned long)-MAX_ERRNO..(unsigned long)-1
since that would require IS_ERR(p) to be true and if we ever had that for
an address of some object, we would have worse problems.

That would be something along the lines of

#define fd_file(f) (struct file *)((f).word & ~3)

static inline bool fd_empty(struct fd f)
{
	return unlikely(!f.word);
}

static inline void fdput(struct fd f)
{
	if (f.word & FDPUT_FPUT)
		fput(fd_file(f));
}

static inline bool __must_check FD_IS_ERR(struct fd_err f)
{
	return IS_ERR_VALUE(f.word);
}

static inline void fdput_err(struct fd_err f)
{
	if (!FD_IS_ERR(f))
		fdput((struct fd){f.word});
}

static inline long __must_check FD_ERR(struct fd_err f)
{
	return (long)fd.word;
}

That's basically the difference between "pointer to object or NULL" and
"pointer to object or ERR_PTR()", only with distinguishable types for
those...

Interesting...  So we get something like

static inline struct fd_err file_fd_err(struct file *f, bool is_cloned)
{
	if (IS_ERR(f))
		return (struct fd_err){PTR_ERR(f)};
	else
		return (struct fd_err){(unsigned long)f | is_cloned};
}

(or, perhaps, file_fd_cloned and file_fd_borrowed, to get rid of
'is_cloned' argument in public API; need to play around a bit and
see what works better - the interesting part is what the constructor
for struct fd would look like)

with 

static struct fd_err ovl_real_fdget_meta(const struct file *file, bool allow_meta)
{
        struct dentry *dentry = file_dentry(file);
        struct path realpath;
        int err;

        if (allow_meta) {
                ovl_path_real(dentry, &realpath);
        } else {
                /* lazy lookup and verify of lowerdata */
                err = ovl_verify_lowerdata(dentry);
                if (err)
                        return file_fd_err(ERR_PTR(err), false);

                ovl_path_realdata(dentry, &realpath);
        }
        if (!realpath.dentry)
                return file_fd_err(ERR_PTR(-EIO), false);

        /* Has it been copied up since we'd opened it? */
        if (unlikely(file_inode(real->file) != d_inode(realpath.dentry)))
		return file_fd_err(ovl_open_realfile(file, &realpath), true);

        /* Did the flags change since open? */
        if (unlikely((file->f_flags ^ real->file->f_flags) & ~OVL_OPEN_FLAGS)) {
		err = ovl_change_flags(real->file, file->f_flags);
		if (err)
			return file_fd_err(ERR_PTR(err), false);
	}

        return file_fd_err(file->private_data, false);
}

static struct fd_err ovl_real_fdget(const struct file *file)
{
        if (d_is_dir(file_dentry(file)))
		return file_fd_err(ovl_dir_real_file(file, false), false);

        return ovl_real_fdget_meta(file, false);
}

with callers looking like

        real = ovl_real_fdget(file);
        if (FD_IS_ERR(real))
                return FD_ERR(real);
	do stuff, using fd_file(real)
	fdput_err(real);

with possibility of __cleanup() use, since fdput_err() on
early failure exit would be a no-op.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] struct fd situation
  2024-05-31 16:30 ` Al Viro
@ 2024-05-31 16:40   ` Al Viro
  2024-05-31 18:44   ` Linus Torvalds
  1 sibling, 0 replies; 5+ messages in thread
From: Al Viro @ 2024-05-31 16:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel

On Fri, May 31, 2024 at 05:30:45PM +0100, Al Viro wrote:
> static inline struct fd_err file_fd_err(struct file *f, bool is_cloned)
> {
> 	if (IS_ERR(f))
> 		return (struct fd_err){PTR_ERR(f)};
> 	else
> 		return (struct fd_err){(unsigned long)f | is_cloned};
> }
> 
> (or, perhaps, file_fd_cloned and file_fd_borrowed, to get rid of
> 'is_cloned' argument in public API; need to play around a bit and
> see what works better - the interesting part is what the constructor
> for struct fd would look like)

... along with

static inline struct fd_err ERR_FD(long n)
{
	return (struct fd_err){.word = (unsigned long)n};
}

yielding

> static struct fd_err ovl_real_fdget_meta(const struct file *file, bool allow_meta)
> {
>         struct dentry *dentry = file_dentry(file);
>         struct path realpath;
>         int err;
> 
>         if (allow_meta) {
>                 ovl_path_real(dentry, &realpath);
>         } else {
>                 /* lazy lookup and verify of lowerdata */
>                 err = ovl_verify_lowerdata(dentry);
>                 if (err)
			return ERR_FD(err);
> 
>                 ovl_path_realdata(dentry, &realpath);
>         }
>         if (!realpath.dentry)
		return ERR_FD(-EIO);
> 
>         /* Has it been copied up since we'd opened it? */
>         if (unlikely(file_inode(real->file) != d_inode(realpath.dentry)))
		return file_fd_cloned(ovl_open_realfile(file, &realpath));
> 
>         /* Did the flags change since open? */
>         if (unlikely((file->f_flags ^ real->file->f_flags) & ~OVL_OPEN_FLAGS)) {
> 		err = ovl_change_flags(real->file, file->f_flags);
> 		if (err)
			return ERR_FD(err);
> 	}
> 
	return file_fd_borrowed(file->private_data);
> }

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] struct fd situation
  2024-05-31 16:30 ` Al Viro
  2024-05-31 16:40   ` Al Viro
@ 2024-05-31 18:44   ` Linus Torvalds
  2024-05-31 22:01     ` Al Viro
  1 sibling, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2024-05-31 18:44 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel

On Fri, 31 May 2024 at 09:30, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> That would be something along the lines of

Looks all sane to me.

Putting error values in there worries me just a tiny bit, because they
won't have the low bits clear, so error values will match the FDPUT_X
bits, but your approach of using another type for it looks like a sane
and clean model.

In fact, with the different types, I think you could make things
cleaner by using _Generic(), and do things like

   #define fd_empty(f) _Generic(f, \
        struct fd: unlikely(!(f).word), \
        struct fd_err: unlikely(IS_ERR((f).word))

or something like that. Untested, written in the MUA, but I think it's
a good way to avoid having to remember which kind you're working with.

The above obviously assumes that an 'fd_err' never contains NULL.
Either it's the pointer to the fd (with whatever FDPUT_xyz bits) or
it's an error value. I don't know if that was your plan.

You can use similar tricks to then get the error value, ie

   #define fd_error(f) _Generic(f, \
        struct fd: -EBADF, \
        struct fd_err: PTR_ERR((f).word))

and now you can write code like

        if (fd_empty(x))
                return fd_error(x);

and it will automagically just DTRT, regardless of whether 'x' is a
'struct fd' or a 'struct fd_err'.

The same model goes for 'fdput()' - don't make people have to use two
names and pointlessly state the type.

The bad old days where people were convinced that "Hungarian notation"
was a good thing were bad. Let's leave them behind completely.

Worth it? Who knows.. But I like your type-based approach, and I think
that _Generic() thing would fit very very with it.

             Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] struct fd situation
  2024-05-31 18:44   ` Linus Torvalds
@ 2024-05-31 22:01     ` Al Viro
  0 siblings, 0 replies; 5+ messages in thread
From: Al Viro @ 2024-05-31 22:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel

On Fri, May 31, 2024 at 11:44:22AM -0700, Linus Torvalds wrote:

> The above obviously assumes that an 'fd_err' never contains NULL.
> Either it's the pointer to the fd (with whatever FDPUT_xyz bits) or
> it's an error value. I don't know if that was your plan.

Definitely.  Mixing NULL and ERR_PTR() for error reporting is a bloody
bad idea; we *do* have functions that may legitimately return both,
but there NULL does not serve as an error indicator - ->lookup()
is one such case (ERR_PTR() on error, NULL - success, use dentry
passed to ->lookup(), non-NULL - a different dentry had to be
spliced in, use it instead).  IS_ERR_OR_NULL() is almost always
a sign of bad calling conventions, or the author being too lazy to
check what the calling conventions are... ;-/

> You can use similar tricks to then get the error value, ie
> 
>    #define fd_error(f) _Generic(f, \
>         struct fd: -EBADF, \
>         struct fd_err: PTR_ERR((f).word))
> 
> and now you can write code like
> 
>         if (fd_empty(x))
>                 return fd_error(x);
> 
> and it will automagically just DTRT, regardless of whether 'x' is a
> 'struct fd' or a 'struct fd_err'.

... except that there's a sizable fraction of those checks that return
something completely different instead of -EBADF ;-/  Userland ABI,
can't do anything about it...

> The same model goes for 'fdput()' - don't make people have to use two
> names and pointlessly state the type.

Maybe, but that depends upon the amount of explicit calls of either
left when we are done.  Conversions to CLASS(fd) eliminate a lot
of those...

> The bad old days where people were convinced that "Hungarian notation"
> was a good thing were bad. Let's leave them behind completely.
> 
> Worth it? Who knows.. But I like your type-based approach, and I think
> that _Generic() thing would fit very very with it.

I'm fine with that for fd_empty(); for the rest... let's see how the
things look like at the end of that series.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-05-31 22:01 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-31  3:18 [RFC] struct fd situation Al Viro
2024-05-31 16:30 ` Al Viro
2024-05-31 16:40   ` Al Viro
2024-05-31 18:44   ` Linus Torvalds
2024-05-31 22:01     ` Al Viro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.