linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Christian Brauner <brauner@kernel.org>
Cc: "Thomas Weißschuh" <thomas.weissschuh@linutronix.de>,
	"Jan Kara" <jack@suse.cz>, "Sargun Dhillon" <sargun@sargun.me>,
	"Kees Cook" <kees@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] fs: always return zero on success from replace_fd()
Date: Wed, 6 Aug 2025 02:54:49 +0100	[thread overview]
Message-ID: <20250806015449.GA1638962@ZenIV> (raw)
In-Reply-To: <20250805195003.GC222315@ZenIV>

On Tue, Aug 05, 2025 at 08:50:03PM +0100, Al Viro wrote:
> On Tue, Aug 05, 2025 at 04:34:57PM +0100, Al Viro wrote:
> 
> > They do no allow to express things like "foo() consumes lock X".
> > >From time to time, we *do* need that, and when that happens guards
> > become a menace.
> > 
> > Another case is
> > 	lock
> > 	if (lock-dependent condition)
> > 		some work
> > 		unlock
> > 		work that can't be under that lock
> > 	else
> > 		some other work
> > 		unlock
> > 		more work that can't be under that lock
> > 
> > Fairly common, especially when that's a spinlock and "can't be under that
> > lock" includes blocking operations.  Can't be expressed with guards, not
> > without a massage that often ends up with bloody awful results.
> 
> FWIW, I'm looking through the raw data I've got during ->d_lock audit.
> Except for a few functions (all static in fs/dcache.c), all scopes
> terminate in the same function where they begin.

... and for ->file_lock we have the following:
	expand_fdtable(): drops and regains
	expand_files(): either nothing or drops and regains
	do_dup2(): drops
everything else is neutral.

20 functions touching that stuff total.  Convertible to guard() or
scoped_guard(): put_unused_fd(), fd_install(), close_fd() (scoped_guard
only), __range_cloexec(), file_close_fd(), set_close_on_exec(),
iterate_fd(), fcntl_setlk() and fcntl_setlk64() (scoped_guard only), 
seq_show() - 10 out of 20.

Anything that uses expand_fdtable() is in the best case an abuse of
guard/scoped_guard; in the worst, it's actively confusing, since
there's *not* one continuous scope there.  expand_files() is in
the same boat.  That covers alloc_fd(), replace_fd() and ksys_dup3().
That's 5 out of remaining 10.  BTW, trying to eliminate gotos from
alloc_fd() is not fun either.

dup_fd():
	spin_lock(&oldf->file_lock);
	...
	while (unlikely(open_files > new_fdt->max_fds)) {
		spin_unlock(&oldf->file_lock);
		... (blocking, possibly return on failure here)
		spin_lock(&oldf->file_lock);
		...
	}
	...
	spin_unlock(&oldf->file_lock);
	...
No way to do that with guard() - not unless you mix it with explicit
unlock/lock in the middle of scope, and even that will be bitch to
deal with due to failure exit after allocation failure.  We'd need
to do this:
	spin_unlock(&oldf->file_lock);
	if (new_fdt != &newf->fdtab)
		__free_fdtable(new_fdt);
	new_fdt = alloc_fdtable(open_files);
	spin_lock(&oldf->file_lock);
	if (IS_ERR(new_fdt)) {
		kmem_cache_free(files_cachep, newf);
		return ERR_CAST(new_fdt);
	}
all of that under guard(spinlock)(&oldf->file_lock).  IMO that would
be too confusing and brittle.

__range_close():
	spin_lock(&files->file_lock);
	...
	for (; fd <= max_fd; fd++) { 
		file = file_close_fd_locked(files, fd);
		if (file) {
			spin_unlock(&files->file_lock);
			filp_close(file, files);
			cond_resched();
			spin_lock(&files->file_lock);
		} else if (need_resched()) {
			spin_unlock(&files->file_lock);
			cond_resched();
			spin_lock(&files->file_lock);
		}
	}
	spin_unlock(&files->file_lock);
Not a good fit for guard(), for the same reasons.

do_close_on_exec():
	...
	spin_lock(&files->file_lock);
	for (i = 0; ; i++) {
		....
		for ( ; set ; fd++, set >>= 1) {
			....
			spin_unlock(&files->file_lock);
			filp_close(file, files);
			cond_resched();
			spin_lock(&files->file_lock);
		}
	}
	spin_unlock(&files->file_lock);
Same story.

io_close():
	might be convertible to scoped_guard; won't be pretty,
AFAICS - that -EAGAIN case in the middle makes it very clumsy.

do_dup2(): well... we could lift filp_close() into the callers,
but that ends up with fairly unpleasant boilerplate in the
callers, and one of those callers is a fairly hot syscall.

And that's the remaining 5.  For some locks scoped_guard() is
a decent fit; for some it really isn't ;-/

      reply	other threads:[~2025-08-06  1:54 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-04  8:40 [PATCH v2] fs: always return zero on success from replace_fd() Thomas Weißschuh
2025-08-04 12:33 ` Christian Brauner
2025-08-04 15:52   ` Al Viro
2025-08-04 16:02     ` Thomas Weißschuh
2025-08-05 11:56       ` Christian Brauner
2025-08-04 17:27     ` Al Viro
2025-08-05 11:55     ` Christian Brauner
2025-08-05 15:34       ` Al Viro
2025-08-05 19:50         ` Al Viro
2025-08-06  1:54           ` Al Viro [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250806015449.GA1638962@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=kees@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sargun@sargun.me \
    --cc=stable@vger.kernel.org \
    --cc=thomas.weissschuh@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).