Re: [PATCH v4 00/54] tree-in-dcache stuff

From: Al Viro <viro@zeniv.linux.org.uk>
To: Samuel Wu <wusamuel@google.com>
Cc: Greg KH <gregkh@linuxfoundation.org>,
	linux-fsdevel@vger.kernel.org, torvalds@linux-foundation.org,
	brauner@kernel.org, jack@suse.cz, raven@themaw.net,
	miklos@szeredi.hu, neil@brown.name, a.hindborg@kernel.org,
	linux-mm@kvack.org, linux-efi@vger.kernel.org,
	ocfs2-devel@lists.linux.dev, kees@kernel.org,
	rostedt@goodmis.org, linux-usb@vger.kernel.org,
	paul@paul-moore.com, casey@schaufler-ca.com,
	linuxppc-dev@lists.ozlabs.org, john.johansen@canonical.com,
	selinux@vger.kernel.org, borntraeger@linux.ibm.com,
	bpf@vger.kernel.org, clm@meta.com,
	android-kernel-team <android-kernel-team@google.com>
Subject: Re: [PATCH v4 00/54] tree-in-dcache stuff
Date: Fri, 30 Jan 2026 23:57:43 +0000
Message-ID: <20260130235743.GW3183987@ZenIV>
In-Reply-To: <CAG2Kctoqja9R1bBzdEAV15_yt=sBGkcub6C2nGE6VHMJh13=FQ@mail.gmail.com>

On Fri, Jan 30, 2026 at 02:31:54PM -0800, Samuel Wu wrote:
> On Thu, Jan 29, 2026 at 11:02 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > OK.  Could you take a clone of mainline repository and in there run
> > ; git fetch git://git.kernel.org:/pub/scm/linux/kernel/git/viro/vfs.git for-wsamuel:for-wsamuel
> > then
> > ; git diff for-wsamuel e5bf5ee26663
> > to verify that for-wsamuel is identical to tree you've seen breakage on
> > ; git diff for-wsamuel-base 1544775687f0
> > to verify that for-wsamuel-base is the tree where the breakage did not reproduce
> > Then bisect from for-wsamuel-base to for-wsamuel.
> >
> > Basically, that's the offending commit split into steps; let's try to figure
> > out what causes the breakage with better resolution...
> 
> Confirming that bisect points to this patch: 09e88dc22ea2 (serialize
> ffs_ep0_open() on ffs->mutex)

So we have something that does O_NDELAY opens of ep0 *and* does not retry on
EAGAIN?
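
For contrast, a minimal userland sketch of what retrying on EAGAIN could look
like.  The helper name open_retry_eagain(), the retry count and the 10ms delay
are all hypothetical; nothing here is taken from the userland in question.

```c
#include <errno.h>
#include <fcntl.h>
#include <time.h>

/*
 * Hypothetical helper: open() a path, retrying a bounded number of
 * times if the open fails with EAGAIN.  Any other error is returned
 * to the caller immediately, with errno left as open() set it.
 */
static int open_retry_eagain(const char *path, int flags, int max_tries)
{
	/* Assumed back-off between attempts: 10 milliseconds. */
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	int fd = -1;

	for (int i = 0; i < max_tries; i++) {
		if (i > 0)
			nanosleep(&delay, NULL);
		fd = open(path, flags);
		if (fd >= 0 || errno != EAGAIN)
			break;
	}
	return fd;
}
```

A caller would use it as open_retry_eagain(path, O_RDWR | O_NONBLOCK, 5) and
still has to handle the case where every attempt came back with EAGAIN.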

How lovely...  Could you slap
	WARN_ON(ret == -EAGAIN);
right before that
	if (ret < 0)
		return ret;
in there and see which process is doing that?  A regression is a regression,
odd userland or not, but I would like to see what that userland is actually
trying to do there.

*grumble*

IMO at this point we have two problems - one is how to avoid a revert of the
tail of the tree-in-dcache series, the other is how to deal with the quite
real preexisting bugs in functionfs.

Another thing to try (not as a suggestion of a fix, just an attempt to figure
out how badly things would break): in current mainline replace that
	ffs_mutex_lock(&ffs->mutex, file->f_flags & O_NONBLOCK)
in ffs_ep0_open() with
	ffs_mutex_lock(&ffs->mutex, false)
and see how badly things regress for userland.  Again, I'm not saying
that this is a fix - just trying to get some sense of what the userland
is doing.
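
A userspace sketch of the difference between the two calls above.  It assumes
that a true second argument means "try the lock and return -EAGAIN if it is
held", which is what the EAGAIN discussion implies; the toy_* names and the
atomic_flag-based lock are stand-ins, not the kernel's struct mutex.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock standing in for struct mutex (assumption, not kernel code). */
struct toy_mutex {
	atomic_flag held;
};

static bool toy_trylock(struct toy_mutex *m)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&m->held);
}

static void toy_unlock(struct toy_mutex *m)
{
	atomic_flag_clear(&m->held);
}

/* nonblock == true: fail with -EAGAIN instead of waiting. */
static int toy_mutex_lock(struct toy_mutex *m, bool nonblock)
{
	if (nonblock)
		return toy_trylock(m) ? 0 : -EAGAIN;
	while (!toy_trylock(m))
		;	/* spin here; a real mutex would sleep */
	return 0;
}

/* Model of an open() that has to take the lock first. */
static int toy_open(struct toy_mutex *m, bool o_nonblock)
{
	int ret = toy_mutex_lock(m, o_nonblock);

	if (ret < 0)
		return ret;	/* this is where -EAGAIN reaches the caller */
	/* ... state checks would happen here, under the lock ... */
	toy_unlock(m);
	return 0;
}
```

With the lock held by somebody else, toy_open(&m, true) fails with -EAGAIN,
while toy_open(&m, false) waits for the holder to let go.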

FWIW, it might make sense to try a lighter serialization in ffs_ep0_open() -
taking the mutex there is due to the following scenario (assuming 6.18 or earlier):
ffs->state is FFS_DEACTIVATED.  ffs->opened is 0.  Two threads attempt to
open ep0.  Here's what happens prior to these patches:

static int ffs_ep0_open(struct inode *inode, struct file *file)
{
        struct ffs_data *ffs = inode->i_private;
 
        if (ffs->state == FFS_CLOSING)
                return -EBUSY;
 
        file->private_data = ffs;
        ffs_data_opened(ffs);

with
static void ffs_data_opened(struct ffs_data *ffs)
{
        refcount_inc(&ffs->ref);
        if (atomic_add_return(1, &ffs->opened) == 1 &&
                        ffs->state == FFS_DEACTIVATED) {
                ffs->state = FFS_CLOSING;
                ffs_data_reset(ffs);
        }
}

IOW, the sequence is
	if (state == FFS_CLOSING)
		return -EBUSY;
	n = atomic_add_return(1, &opened);
	if (n == 1 && state == FFS_DEACTIVATED) {
		state = FFS_CLOSING;
		ffs_data_reset();

See the race there?  If the second open() comes between the
increment of ffs->opened and setting the state to FFS_CLOSING,
it will *not* fail with EBUSY - it will proceed to return to
userland, while the first sucker is crawling through the work
in ffs_data_reset()/ffs_data_clear()/ffs_epfiles_destroy().
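
The interleaving can be replayed deterministically in a single-threaded
userspace model.  This is only a sketch: struct ffs_model, the open_*() step
helpers and the resets counter are made up for illustration, and the reset is
reduced to a counter bump plus a state change.

```c
#include <errno.h>
#include <stdatomic.h>

enum ffs_state { FFS_READ_DESCRIPTORS, FFS_DEACTIVATED, FFS_CLOSING };

struct ffs_model {
	enum ffs_state state;
	atomic_int opened;
	int resets;	/* how many times the reset ran (illustration only) */
};

/* Step 1 of open: the -EBUSY check. */
static int open_check(struct ffs_model *ffs)
{
	return ffs->state == FFS_CLOSING ? -EBUSY : 0;
}

/* Step 2: bump ->opened; nonzero means this opener has to do the reset. */
static int open_count(struct ffs_model *ffs)
{
	return atomic_fetch_add(&ffs->opened, 1) + 1 == 1 &&
	       ffs->state == FFS_DEACTIVATED;
}

/* Step 3: mark FFS_CLOSING and reset. */
static void open_reset(struct ffs_model *ffs)
{
	ffs->state = FFS_CLOSING;
	ffs->resets++;
	ffs->state = FFS_READ_DESCRIPTORS;
}

/*
 * Replay: opener A does steps 1 and 2, opener B runs steps 1 and 2 before
 * A reaches step 3.  Returns 1 if B's open succeeded while A's reset was
 * still pending.
 */
static int replay_race(void)
{
	struct ffs_model ffs = { .state = FFS_DEACTIVATED };

	int a_ret = open_check(&ffs);		/* passes */
	int a_resets = open_count(&ffs);	/* opened 0 -> 1, A must reset */

	int b_ret = open_check(&ffs);		/* state is not FFS_CLOSING yet */
	int b_resets = open_count(&ffs);	/* opened 1 -> 2, no reset for B */

	if (a_resets)
		open_reset(&ffs);		/* A only now starts resetting */

	return a_ret == 0 && b_ret == 0 && a_resets && !b_resets &&
	       ffs.resets == 1;
}
```

In the model B never sees -EBUSY, and the single reset runs after B has
already been told that its open succeeded.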

What's more, there's nothing to stop that second opener from
calling write() on the descriptor it got.  No exclusion there -
        ffs->state = FFS_READ_DESCRIPTORS;
        ffs->setup_state = FFS_NO_SETUP;
        ffs->flags = 0;
in ffs_data_reset() is *not* serialized against ffs_ep0_write().
Get preempted right after setting ->state and that write()
will go just fine, only to be surprised when the first thread
regains CPU and continues modifying the contents of *ffs
under whatever the second thread is doing.

That code obviously relies upon that kind of shit being prevented
by that -EBUSY logic in ep0 open() and that logic is obviously
racy as it is.  Note that other callers of ffs_data_reset() have
a similar problem: ffs_func_set_alt(), for example, has
        if (ffs->state == FFS_DEACTIVATED) {
                ffs->state = FFS_CLOSING;
                INIT_WORK(&ffs->reset_work, ffs_reset_work);
                schedule_work(&ffs->reset_work);
                return -ENODEV;
        }
again, with no exclusion.  Lose CPU just after seeing FFS_DEACTIVATED,
then have another thread open() the sucker and start going through
ffs_data_reset(), only to have us regain CPU and schedule this for
execution:
static void ffs_reset_work(struct work_struct *work)
{
        struct ffs_data *ffs = container_of(work,
                struct ffs_data, reset_work);
        ffs_data_reset(ffs);
}
IOW, stray ffs_data_reset() coming to surprise the opener who'd
just finished ffs_data_reset() during open(2) and proceeded to
write to the damn thing, etc.
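
The same kind of single-threaded replay works for the stray reset.  Again a
sketch under assumptions: struct ffs_model, model_reset() and the
descriptors_written flag (standing in for whatever write() on ep0 sets up)
are invented for illustration, and the workqueue is reduced to a boolean.

```c
#include <stdbool.h>

enum ffs_state { FFS_READ_DESCRIPTORS, FFS_DEACTIVATED, FFS_CLOSING };

struct ffs_model {
	enum ffs_state state;
	bool descriptors_written;	/* what the opener's write() set up */
	bool reset_scheduled;		/* stands in for schedule_work() */
};

/* Reduced stand-in for the reset: wipe the state the opener built. */
static void model_reset(struct ffs_model *ffs)
{
	ffs->descriptors_written = false;
	ffs->state = FFS_READ_DESCRIPTORS;
}

/* Returns true if the opener's written state got wiped behind its back. */
static bool replay_stray_reset(void)
{
	struct ffs_model ffs = { .state = FFS_DEACTIVATED };

	/* set_alt path: sees FFS_DEACTIVATED, then loses the CPU. */
	bool saw_deactivated = ffs.state == FFS_DEACTIVATED;

	/* Opener: open() goes through the reset, then write() succeeds. */
	ffs.state = FFS_CLOSING;
	model_reset(&ffs);
	ffs.descriptors_written = true;

	/* set_alt path resumes and acts on its stale observation. */
	if (saw_deactivated) {
		ffs.state = FFS_CLOSING;
		ffs.reset_scheduled = true;
	}

	/* The scheduled work eventually runs. */
	if (ffs.reset_scheduled)
		model_reset(&ffs);

	return !ffs.descriptors_written;
}
```

The point of the model is only that the check of ->state and the action taken
on it are not one atomic step, so the second reset lands on an object that
has already been reopened and written to.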

That's obviously on the "how do we fix the preexisting bugs" side
of things, though - regression needs to be dealt with ASAP anyway.

Thread overview: 81+ messages
2025-11-18  5:15 [PATCH v4 00/54] tree-in-dcache stuff Al Viro
2025-11-18  5:15 ` [PATCH v4 01/54] fuse_ctl_add_conn(): fix nlink breakage in case of early failure Al Viro
2025-11-18  5:15 ` [PATCH v4 02/54] tracefs: fix a leak in eventfs_create_events_dir() Al Viro
2025-11-18  5:15 ` [PATCH v4 03/54] new helper: simple_remove_by_name() Al Viro
2025-11-18  5:15 ` [PATCH v4 04/54] new helper: simple_done_creating() Al Viro
2025-11-18  5:15 ` [PATCH v4 05/54] introduce a flag for explicitly marking persistently pinned dentries Al Viro
2025-11-18  5:15 ` [PATCH v4 06/54] primitives for maintaining persisitency Al Viro
2025-11-18  5:15 ` [PATCH v4 07/54] convert simple_{link,unlink,rmdir,rename,fill_super}() to new primitives Al Viro
2025-11-18  5:15 ` [PATCH v4 08/54] convert ramfs and tmpfs Al Viro
2025-11-18  5:15 ` [PATCH v4 09/54] procfs: make /self and /thread_self dentries persistent Al Viro
2025-11-18  5:15 ` [PATCH v4 10/54] configfs, securityfs: kill_litter_super() not needed Al Viro
2025-11-18  5:15 ` [PATCH v4 11/54] convert xenfs Al Viro
2025-11-18  5:15 ` [PATCH v4 12/54] convert smackfs Al Viro
2025-11-18  5:15 ` [PATCH v4 13/54] convert hugetlbfs Al Viro
2025-11-18  5:15 ` [PATCH v4 14/54] convert mqueue Al Viro
2025-11-18  5:15 ` [PATCH v4 15/54] convert bpf Al Viro
2025-11-18  5:15 ` [PATCH v4 16/54] convert dlmfs Al Viro
2025-11-18  5:15 ` [PATCH v4 17/54] convert fuse_ctl Al Viro
2025-11-18  5:15 ` [PATCH v4 18/54] convert pstore Al Viro
2025-11-18  5:15 ` [PATCH v4 19/54] convert tracefs Al Viro
2025-11-18  5:15 ` [PATCH v4 20/54] convert debugfs Al Viro
2025-11-18  5:15 ` [PATCH v4 21/54] debugfs: remove duplicate checks in callers of start_creating() Al Viro
2025-11-18  5:15 ` [PATCH v4 22/54] convert efivarfs Al Viro
2025-11-18  5:15 ` [PATCH v4 23/54] convert spufs Al Viro
2025-11-18  5:15 ` [PATCH v4 24/54] convert ibmasmfs Al Viro
2025-11-18  5:15 ` [PATCH v4 25/54] ibmasmfs: get rid of ibmasmfs_dir_ops Al Viro
2025-11-18  5:15 ` [PATCH v4 26/54] convert devpts Al Viro
2025-11-18  5:15 ` [PATCH v4 27/54] binderfs: use simple_start_creating() Al Viro
2025-11-18  5:15 ` [PATCH v4 28/54] binderfs_binder_ctl_create(): kill a bogus check Al Viro
2025-11-18  5:15 ` [PATCH v4 29/54] convert binderfs Al Viro
2025-11-18  5:15 ` [PATCH v4 30/54] autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there Al Viro
2025-11-18  5:15 ` [PATCH v4 31/54] convert autofs Al Viro
2025-11-18  5:15 ` [PATCH v4 32/54] convert binfmt_misc Al Viro
2025-11-18  5:15 ` [PATCH v4 33/54] selinuxfs: don't stash the dentry of /policy_capabilities Al Viro
2025-11-18  5:15 ` [PATCH v4 34/54] selinuxfs: new helper for attaching files to tree Al Viro
2025-11-18  5:15 ` [PATCH v4 35/54] convert selinuxfs Al Viro
2025-11-18  5:15 ` [PATCH v4 36/54] functionfs: don't abuse ffs_data_closed() on fs shutdown Al Viro
2025-11-18  5:15 ` [PATCH v4 37/54] functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}() Al Viro
2025-11-18  5:15 ` [PATCH v4 38/54] functionfs: need to cancel ->reset_work in ->kill_sb() Al Viro
2025-11-18  5:15 ` [PATCH v4 39/54] functionfs: fix the open/removal races Al Viro
2025-11-18  5:15 ` [PATCH v4 40/54] functionfs: switch to simple_remove_by_name() Al Viro
2025-11-18  5:15 ` [PATCH v4 41/54] convert functionfs Al Viro
2025-11-18  5:15 ` [PATCH v4 42/54] gadgetfs: switch to simple_remove_by_name() Al Viro
2025-11-18  5:15 ` [PATCH v4 43/54] convert gadgetfs Al Viro
2025-11-18  5:15 ` [PATCH v4 44/54] hypfs: don't pin dentries twice Al Viro
2025-11-18  5:15 ` [PATCH v4 45/54] hypfs: switch hypfs_create_str() to returning int Al Viro
2025-11-18  5:15 ` [PATCH v4 46/54] hypfs: swich hypfs_create_u64() " Al Viro
2025-11-18  5:15 ` [PATCH v4 47/54] convert hypfs Al Viro
2025-11-18  5:15 ` [PATCH v4 48/54] convert rpc_pipefs Al Viro
2025-11-18  5:15 ` [PATCH v4 49/54] convert nfsctl Al Viro
2025-11-18  5:15 ` [PATCH v4 50/54] convert rust_binderfs Al Viro
2025-11-18  5:16 ` [PATCH v4 51/54] get rid of kill_litter_super() Al Viro
2025-11-18  5:16 ` [PATCH v4 52/54] convert securityfs Al Viro
2025-11-18  5:16 ` [PATCH v4 53/54] kill securityfs_recursive_remove() Al Viro
2025-11-18  5:16 ` [PATCH v4 54/54] d_make_discardable(): warn if given a non-persistent dentry Al Viro
2026-01-27  0:56 ` [PATCH v4 00/54] tree-in-dcache stuff Samuel Wu
2026-01-27  7:42   ` Greg KH
2026-01-27 18:39     ` Linus Torvalds
2026-01-27 20:14       ` Al Viro
2026-01-28  8:53         ` Greg KH
2026-01-28  2:02     ` Samuel Wu
2026-01-28  4:59       ` Al Viro
2026-01-29  0:58         ` Samuel Wu
2026-01-29  3:23           ` Al Viro
2026-01-29 22:54             ` Al Viro
2026-01-30  1:16               ` Samuel Wu
2026-01-30  7:04                 ` Al Viro
2026-01-30 22:31                   ` Samuel Wu
2026-01-30 23:57                     ` Al Viro [this message]
2026-01-31  0:14                       ` Linus Torvalds
2026-01-31  1:08                         ` Al Viro
2026-01-31  1:11                           ` Linus Torvalds
2026-02-01  0:11                             ` Al Viro
2026-01-31  0:59                       ` Al Viro
2026-01-31  1:05                       ` Samuel Wu
2026-01-31  1:18                         ` Al Viro
2026-01-31  2:09                           ` Samuel Wu
2026-01-31  2:43                             ` Al Viro
2026-01-31 19:48                               ` Samuel Wu
2026-01-31 14:58                 ` Krishna Kurapati PSSNV
2026-01-31 20:02                   ` Samuel Wu
