linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [GIT PULL] pidfd updates
Date: Tue, 25 Apr 2023 07:04:27 +0100	[thread overview]
Message-ID: <20230425060427.GP3390869@ZenIV> (raw)
In-Reply-To: <CAHk-=whOE+wXrxykHK0GimbNmxyr4a07kTpG8dzoceowTz1Yxg@mail.gmail.com>

On Mon, Apr 24, 2023 at 01:24:24PM -0700, Linus Torvalds wrote:

> But I really think a potentially much nicer model would have been to
> extend our "get_unused_fd_flags()" model.
> 
> IOW, we could have instead marked the 'struct file *' in the file
> descriptor table as being "not ready yet".
> 
> I wonder how nasty it would have been to have the low bit of the
> 'struct file *' mark "not ready to be used yet" or something similar.
> You already can't just access the 'fdt->fd[]' array willy-nilly since
> we have both normal RCU issues _and_ the somewhat unusual spectre
> array indexing issues.
> 
> So looking around with
> 
>     git grep -e '->fd\['
> 
> we seem to be pretty good about that and it probably wouldn't be too
> horrid to add a "check low bit isn't set" to the rules.
> 
> Then pidfd_prepare() could actually install the file pointer in the fd
> table, just marked as "not ready", and then instead of "fd_install()",
> yuo'd have "fd_expose(fd)" or something.
> 
> I dislike interfaces that return two different things. Particularly
> ones that are supposed to be there to make things easy for the user. I
> think your pidfd_prepare() helper fails that "make it easy to use"
> test.
> 
> Hmm?

I'm not fond of "return two things" kind of helpers, but I'm even less
fond of "return fd, file is already there" ones, TBH.  {__,}pidfd_prepare()
users are thankfully very limited in the things they do to the file that
had been returned, but that really invites abuse.

The deeper in call chain we mess with descriptor table, the more painful it
gets, IME.

Speaking of {__,}pidfd_prepare(), I wonder if we wouldn't be better off
with get_unused_fd_flags() lifted into the callers - all three of those
(fanotify copy_event_to_user(), copy_process() and pidfd_create()).
Switch from anon_inode_getfd() to anon_inode_getfile() certainly
made sense, ditto for combining it with get_pid(), but mixing
get_unused_fd_flags() into that is a mistake, IMO.

As for your suggestion... let's see what it leads to.

	Suppose we add such entries (reserved, hold a reference to file,
marked "not yet available" in the LSB).  From the current tree POV those
would be equivalent to descriptor already reserved, but fd_install() not
done.  So behaviour of existing primitives should be the same as for this
situation, except for fd_install() and put_unused_fd().

	* pick_file(), __fget_files_rcu(), iterate_fd(), files_lookup_fd_raw(),
	  loop in dup_fd(), io_close() - treat odd pointers as NULL.
	* close_files() should, AFAICS, treat an odd pointer as "should never
happen" (and that xchg() in there needs to go anyway - it's pointless, since
we are freeing the the array immediately afterwards.
	* do_close_on_exec() should probably treat them as "should never happen".
	* do_dup2() - odd value should be treated as -EBUSY.

The interesting part, of course, is how to legitimize (or dispose of) such
a beast.  The former is your "fd_expose()" - parallel to fd_install(),
AFAICS.  The latter... another primitive that would
	grab ->files_lock
	pick_file() variant that *expects* an odd value
	drop ->files_lock
	clear LSB and pass to fput().

It's doable, but AFAICS doesn't make callers all that happier...

  parent reply	other threads:[~2023-04-25  6:04 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-21 13:41 [GIT PULL] pidfd updates Christian Brauner
2023-04-24 20:24 ` Linus Torvalds
2023-04-24 20:35   ` Linus Torvalds
2023-04-25 12:08     ` Christian Brauner
2023-04-25  6:04   ` Al Viro [this message]
2023-04-25 12:34     ` Christian Brauner
2023-04-25 13:54       ` Al Viro
2023-04-25 14:36         ` Christian Brauner
2023-04-25 15:48           ` Al Viro
2023-04-25 16:28       ` Linus Torvalds
2023-04-25 17:19         ` Al Viro
2023-04-28  8:40         ` David Laight
2023-04-28 18:26           ` Linus Torvalds
2023-04-27  1:07       ` Al Viro
2023-04-27  7:39         ` Al Viro
2023-04-27  8:33           ` Christian Brauner
2023-04-27  8:59             ` Al Viro
2023-04-27  9:40               ` Christian Brauner
2023-04-27 15:21           ` Linus Torvalds
2023-04-27 17:02             ` Al Viro
2023-05-02  7:11               ` Christian Brauner
2023-04-27  8:11         ` Christian Brauner
2023-04-24 21:45 ` pr-tracker-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230425060427.GP3390869@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=brauner@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).