public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <brauner@kernel.org>, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
Date: Tue, 6 Aug 2024 00:44:56 +0100	[thread overview]
Message-ID: <20240805234456.GK5334@ZenIV> (raw)
In-Reply-To: <20240804211322.GD5334@ZenIV>

On Sun, Aug 04, 2024 at 10:13:22PM +0100, Al Viro wrote:
> On Sun, Aug 04, 2024 at 08:18:14AM -0700, Linus Torvalds wrote:
> > On Sat, 3 Aug 2024 at 20:47, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > >         bitmap_copy_and_extend(nfdt->open_fds, ofdt->open_fds,
> > >                         copy_words * BITS_PER_LONG, nwords * BITS_PER_LONG);
> > 
> > Ok, thinking about it, I like this interface, since it looks like the
> > compiler would see that it's in BITS_PER_LONG chunks if we inline it
> > and decide it's important.
> > 
> > So make it so.
> 
> FWIW, what I'm testing right now is the patch below; it does fix the reproducer
> and it seems to be having no problems with xfstests so far.  Needs profiling;
> again, no visible slowdowns on xfstests, but...

Seems to work, no visible slowdowns.  Pushed into #fixes.

BTW, speaking of fdtable, the longer I look at it, the more I'm tempted to
try and kill it off completely.

Look:
	->fdt->max_fds is non-decreasing
	->fdt->full_fds_bits only accessed under ->files_lock
	->fdt->open_fds is sometimes used with rcu_read_lock() alone,
but such users are very few - it's
	bitmap_weight(fdt->open_fds, fdt->max_fds);
in proc_readfd_count() (stat on /proc/*/fd) and max_select_fds();
the latter is checking if anything in the fd_set_bits passed to
select(2) is not opened, in addition to looking for the maximal
descriptor passed to select(2) in that.  Then we do fdget() on
all of them during the main loop of select().  Sure, we want
-EBADF if it hadn't been opened at all and we want EPOLLNVAL
on subsequent passes, but that could've been handled easily
in the main loop itself.  As for the bitmap_weight(), I seriously
suspect that keeping the count (all updates are under ->files_lock)
might be cheap enough, and IIRC we already had cgroup (or was it bpf?)
folks trying to get that count in really scary ways...
	->fdt->close_on_exec has two places accessing it
with rcu_read_lock() alone (F_GETFD and alloc_bprm()).  In both
cases we have already done fdget() on the same descrpiptor.

So... do we really need that indirect?  The problem would be
seeing ->max_fds update before that of the array pointers.
Smells like that should be doable with barrier(s) - and not
many, at that...  

Note that if we coallocate the file references' array and bitmaps,
we could e.g. slap rcu_head over the beginning of ->open_fds
bitmap after the sucker's gone from files_struct and do RCU-delayed
freeing on that.  Am I missing something obvious here?

  reply	other threads:[~2024-08-05 23:44 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-03 22:50 [PATCH] fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE Al Viro
2024-08-03 23:06 ` Al Viro
2024-08-03 23:51 ` Linus Torvalds
2024-08-04  0:05   ` Linus Torvalds
2024-08-04  0:34   ` Al Viro
2024-08-04  3:42     ` Linus Torvalds
2024-08-04  3:47     ` Al Viro
2024-08-04  4:17       ` Al Viro
2024-08-04 15:18       ` Linus Torvalds
2024-08-04 21:13         ` Al Viro
2024-08-05 23:44           ` Al Viro [this message]
2024-08-06  0:04             ` Linus Torvalds
2024-08-06  1:02               ` Al Viro
2024-08-06  8:41                 ` Christian Brauner
2024-08-06 16:32                   ` Al Viro
2024-08-06 17:01                     ` Linus Torvalds
2024-08-05  7:22 ` Christian Brauner
2024-08-05 18:54   ` Al Viro
2024-08-06  9:11     ` Christian Brauner
2024-08-05  9:48 ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240805234456.GK5334@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=brauner@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox