Re: [PATCH v2] fs: always return zero on success from replace_fd()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Al Viro <viro@zeniv.linux.org.uk>
To: Christian Brauner <brauner@kernel.org>
Cc: "Thomas Weißschuh" <thomas.weissschuh@linutronix.de>,
	"Jan Kara" <jack@suse.cz>, "Sargun Dhillon" <sargun@sargun.me>,
	"Kees Cook" <kees@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] fs: always return zero on success from replace_fd()
Date: Wed, 6 Aug 2025 02:54:49 +0100	[thread overview]
Message-ID: <20250806015449.GA1638962@ZenIV> (raw)
In-Reply-To: <20250805195003.GC222315@ZenIV>

On Tue, Aug 05, 2025 at 08:50:03PM +0100, Al Viro wrote:
> On Tue, Aug 05, 2025 at 04:34:57PM +0100, Al Viro wrote:
> 
> > They do no allow to express things like "foo() consumes lock X".
> > >From time to time, we *do* need that, and when that happens guards
> > become a menace.
> > 
> > Another case is
> > 	lock
> > 	if (lock-dependent condition)
> > 		some work
> > 		unlock
> > 		work that can't be under that lock
> > 	else
> > 		some other work
> > 		unlock
> > 		more work that can't be under that lock
> > 
> > Fairly common, especially when that's a spinlock and "can't be under that
> > lock" includes blocking operations.  Can't be expressed with guards, not
> > without a massage that often ends up with bloody awful results.
> 
> FWIW, I'm looking through the raw data I've got during ->d_lock audit.
> Except for a few functions (all static in fs/dcache.c), all scopes
> terminate in the same function where they begin.

... and for ->file_lock we have the following:
	expand_fdtable(): drops and regains
	expand_files(): either nothing or drops and regains
	do_dup2(): drops
everything else is neutral.

20 functions touching that stuff total.  Convertible to guard() or
scoped_guard(): put_unused_fd(), fd_install(), close_fd() (scoped_guard
only), __range_cloexec(), file_close_fd(), set_close_on_exec(),
iterate_fd(), fcntl_setlk() and fcntl_setlk64() (scoped_guard only), 
seq_show() - 10 out of 20.

Anything that uses expand_fdtable() is in the best case an abuse of
guard/scoped_guard; in the worst, it's actively confusing, since
there's *not* one continuous scope there.  expand_files() is in
the same boat.  That covers alloc_fd(), replace_fd() and ksys_dup3().
That's 5 out of remaining 10.  BTW, trying to eliminate gotos from
alloc_fd() is not fun either.

dup_fd():
	spin_lock(&oldf->file_lock);
	...
	while (unlikely(open_files > new_fdt->max_fds)) {
		spin_unlock(&oldf->file_lock);
		... (blocking, possibly return on failure here)
		spin_lock(&oldf->file_lock);
		...
	}
	...
	spin_unlock(&oldf->file_lock);
	...
No way to do that with guard() - not unless you mix it with explicit
unlock/lock in the middle of scope, and even that will be bitch to
deal with due to failure exit after allocation failure.  We'd need
to do this:
	spin_unlock(&oldf->file_lock);
	if (new_fdt != &newf->fdtab)
		__free_fdtable(new_fdt);
	new_fdt = alloc_fdtable(open_files);
	spin_lock(&oldf->file_lock);
	if (IS_ERR(new_fdt)) {
		kmem_cache_free(files_cachep, newf);
		return ERR_CAST(new_fdt);
	}
all of that under guard(spinlock)(&oldf->file_lock).  IMO that would
be too confusing and brittle.

__range_close():
	spin_lock(&files->file_lock);
	...
	for (; fd <= max_fd; fd++) { 
		file = file_close_fd_locked(files, fd);
		if (file) {
			spin_unlock(&files->file_lock);
			filp_close(file, files);
			cond_resched();
			spin_lock(&files->file_lock);
		} else if (need_resched()) {
			spin_unlock(&files->file_lock);
			cond_resched();
			spin_lock(&files->file_lock);
		}
	}
	spin_unlock(&files->file_lock);
Not a good fit for guard(), for the same reasons.

do_close_on_exec():
	...
	spin_lock(&files->file_lock);
	for (i = 0; ; i++) {
		....
		for ( ; set ; fd++, set >>= 1) {
			....
			spin_unlock(&files->file_lock);
			filp_close(file, files);
			cond_resched();
			spin_lock(&files->file_lock);
		}
	}
	spin_unlock(&files->file_lock);
Same story.

io_close():
	might be convertible to scoped_guard; won't be pretty,
AFAICS - that -EAGAIN case in the middle makes it very clumsy.

do_dup2(): well... we could lift filp_close() into the callers,
but that ends up with fairly unpleasant boilerplate in the
callers, and one of those callers is a fairly hot syscall.

And that's the remaining 5.  For some locks scoped_guard() is
a decent fit; for some it really isn't ;-/

     prev parent reply	other threads:[~2025-08-06  1:54 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-04  8:40 [PATCH v2] fs: always return zero on success from replace_fd() Thomas Weißschuh
2025-08-04 12:33 ` Christian Brauner
2025-08-04 15:52   ` Al Viro
2025-08-04 16:02     ` Thomas Weißschuh
2025-08-05 11:56       ` Christian Brauner
2025-08-04 17:27     ` Al Viro
2025-08-05 11:55     ` Christian Brauner
2025-08-05 15:34       ` Al Viro
2025-08-05 19:50         ` Al Viro
2025-08-06  1:54           ` Al Viro [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250806015449.GA1638962@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=kees@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sargun@sargun.me \
    --cc=stable@vger.kernel.org \
    --cc=thomas.weissschuh@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.