From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org, Christian Brauner <brauner@kernel.org>
Subject: Re: [RFC] more close_range() fun
Date: Fri, 16 Aug 2024 18:19:25 +0100
Message-ID: <20240816171925.GB504335@ZenIV>
In-Reply-To: <CAHk-=wh_K+qj=gmTjiUqr8R3x9Tco31FSBZ5qkikKN02bL4y7A@mail.gmail.com>

On Fri, Aug 16, 2024 at 09:26:45AM -0700, Linus Torvalds wrote:
> On Thu, 15 Aug 2024 at 20:03, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > It *can* actually happen - all it takes is close_range(2) decision
> > to trim the copied descriptor table made before the first dup2()
> > and actual copying done after both dup2() are done.
> 
> I think this is fine. It's one of those "if user threads have no
> serialization, they get what they get" situations.

As it is, unshare(CLONE_FILES) gives you a state that might be possible
if you e.g. attached a debugger to the process and poked around in
the descriptor table.  CLOSE_RANGE_UNSHARE is supposed to be a shortcut
for unshare + plain close_range(), so having it end up with weird
states looks wrong.
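
For reference, the two userland forms being compared can be sketched as
below.  This is only an illustration: the helper names are made up, and
it assumes a libc that provides the close_range() wrapper (glibc 2.34 or
later) plus kernel headers that ship <linux/close_range.h>.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>
#include <linux/close_range.h>

/*
 * Long form (hypothetical helper): get a private copy of the descriptor
 * table first, then close the range in that private copy.
 */
static int close_private_two_step(unsigned int from, unsigned int to)
{
	if (unshare(CLONE_FILES) < 0)
		return -1;
	return close_range(from, to, 0);
}

/*
 * Shortcut (hypothetical helper): a single call that is supposed to
 * leave the caller in a state the long form could have produced.
 */
static int close_private_one_step(unsigned int from, unsigned int to)
{
	return close_range(from, to, CLOSE_RANGE_UNSHARE);
}
```

In a single-threaded process the two are indistinguishable; the question
in this thread is only about what another thread sharing the table can
make the shortcut observe.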

For descriptor tables we have something very close to TSO (and possibly
the full TSO - I'll need to get some coffee and go through the barriers
we've got on the lockless side of fd_install()); this, OTOH, is not
quite Alpha-level weirdness, but it's not far from that.  And unlike
Alpha we don't have excuses along the lines of "it's cheaper that way" -
it really isn't any cheaper.

The variant I'm testing right now seems to be doing fine (LTP and about
halfway through the xfstests, with no regressions and no slowdowns)
and it's at
 fs/file.c               | 63 +++++++++++++++++--------------------------------
 include/linux/fdtable.h |  6 ++---
 kernel/fork.c           | 11 ++++-----
 3 files changed, 28 insertions(+), 52 deletions(-)

Basically,
	* switch CLOSE_RANGE_UNSHARE from unshare_fd() to dup_fd()
	* instead of "trim down to that much", pass dup_fd() an
	  optional "we'll be punching a hole from <this> to <that>", which
	  gets passed to sane_fdtable_size() (NULL == no hole to be punched).
	* in sane_fdtable_size()
		find the last occupied bit in ->open_fds[]
		if asked to punch a hole and if that last bit is within
		the hole, find the last occupied bit below the hole
		round up last occupied plus 1 to a multiple of BITS_PER_LONG.
That's all it takes, and IMO it's simpler that way.
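
The sane_fdtable_size() steps listed above can be modelled in plain
userland C roughly as follows.  This is a sketch, not the patch itself:
struct fd_hole, last_set_bit() and model_fdtable_size() are hypothetical
names, the hole is taken to be an inclusive range, and returning one
word's worth of bits when nothing is occupied is an assumption made for
the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical stand-in for the optional "hole" argument. */
struct fd_hole {
	unsigned int from, to;	/* inclusive range about to be closed */
};

/* Index of the highest set bit below 'limit', or 'limit' if none is set. */
static unsigned int last_set_bit(const unsigned long *bits, unsigned int limit)
{
	unsigned int i = limit;

	while (i-- > 0)
		if (bits[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
			return i;
	return limit;
}

/* Size (in bits) the copied table needs; hole == NULL means no hole. */
static unsigned int model_fdtable_size(const unsigned long *open_fds,
				       unsigned int max_fds,
				       const struct fd_hole *hole)
{
	unsigned int last;

	assert(open_fds != NULL);
	last = last_set_bit(open_fds, max_fds);
	if (last == max_fds)
		return BITS_PER_LONG;	/* assumed minimum: nothing open */
	if (hole && hole->from <= last && last <= hole->to) {
		/* last occupied bit is doomed; look below the hole instead */
		last = last_set_bit(open_fds, hole->from);
		if (last == hole->from)
			return BITS_PER_LONG;	/* nothing survives below */
	}
	/* round up (last + 1) to a multiple of BITS_PER_LONG */
	return (last + BITS_PER_LONG) / BITS_PER_LONG * BITS_PER_LONG;
}
```

With descriptors 3 and BITS_PER_LONG + 1 open, no hole yields two words;
a hole starting at 10 and running to the end yields one word, since only
descriptor 3 survives.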
