All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Samuel Wu <wusamuel@google.com>
Cc: Greg KH <gregkh@linuxfoundation.org>,
	linux-fsdevel@vger.kernel.org, torvalds@linux-foundation.org,
	brauner@kernel.org, jack@suse.cz, raven@themaw.net,
	miklos@szeredi.hu, neil@brown.name, a.hindborg@kernel.org,
	linux-mm@kvack.org, linux-efi@vger.kernel.org,
	ocfs2-devel@lists.linux.dev, kees@kernel.org,
	rostedt@goodmis.org, linux-usb@vger.kernel.org,
	paul@paul-moore.com, casey@schaufler-ca.com,
	linuxppc-dev@lists.ozlabs.org, john.johansen@canonical.com,
	selinux@vger.kernel.org, borntraeger@linux.ibm.com,
	bpf@vger.kernel.org, clm@meta.com,
	android-kernel-team <android-kernel-team@google.com>
Subject: Re: [PATCH v4 00/54] tree-in-dcache stuff
Date: Fri, 30 Jan 2026 23:57:43 +0000	[thread overview]
Message-ID: <20260130235743.GW3183987@ZenIV> (raw)
In-Reply-To: <CAG2Kctoqja9R1bBzdEAV15_yt=sBGkcub6C2nGE6VHMJh13=FQ@mail.gmail.com>

On Fri, Jan 30, 2026 at 02:31:54PM -0800, Samuel Wu wrote:
> On Thu, Jan 29, 2026 at 11:02 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > OK.  Could you take a clone of mainline repository and in there run
> > ; git fetch git://git.kernel.org:/pub/scm/linux/kernel/git/viro/vfs.git for-wsamuel:for-wsamuel
> > then
> > ; git diff for-wsamuel e5bf5ee26663
> > to verify that for-wsamuel is identical to tree you've seen breakage on
> > ; git diff for-wsamuel-base 1544775687f0
> > to verify that for-wsamuel-base is the tree where the breakage did not reproduce
> > Then bisect from for-wsamuel-base to for-wsamuel.
> >
> > Basically, that's the offending commit split into steps; let's try to figure
> > out what causes the breakage with better resolution...
> 
> Confirming that bisect points to this patch: 09e88dc22ea2 (serialize
> ffs_ep0_open() on ffs->mutex)

So we have something that does O_NDELAY opens of ep0 *and* does not retry on
EAGAIN?

How lovely...  Could you slap
	WARN_ON(ret == -EAGAIN);
right before that
	if (ret < 0)
		return ret;
in there and see which process is doing that?  Regression is a regression, 
odd userland or not, but I would like to see what is that userland actually
trying to do there.

*grumble*

IMO at that point we have two problems - one is how to avoid a revert of the
tail of tree-in-dcache series, another is how to deal with quite real
preexisting bugs in functionfs.

Another thing to try (not as a suggestion of a fix, just an attempt to figure
out how badly would the things break): in current mainline replace that
	ffs_mutex_lock(&ffs->mutex, file->f_flags & O_NONBLOCK)
in ffs_ep0_open() with
	ffs_mutex_lock(&ffs->mutex, false)
and see how badly do the things regress for userland.  Again, I'm not saying
that this is a fix - just trying to get some sense of what's the userland
is doing.

FWIW, it might make sense to try a lighter serialization in ffs_ep0_open() -
taking it there is due to the following scenario (assuming 6.18 or earlier):
ffs->state is FFS_DEACTIVATED.  ffs->opened is 0.  Two threads attempt to
open ep0.  Here's what happens prior to these patches:

static int ffs_ep0_open(struct inode *inode, struct file *file)
{
        struct ffs_data *ffs = inode->i_private;
 
        if (ffs->state == FFS_CLOSING)
                return -EBUSY;
 
        file->private_data = ffs;
        ffs_data_opened(ffs);

with
static void ffs_data_opened(struct ffs_data *ffs)
{
        refcount_inc(&ffs->ref);
        if (atomic_add_return(1, &ffs->opened) == 1 &&
                        ffs->state == FFS_DEACTIVATED) {
                ffs->state = FFS_CLOSING;
                ffs_data_reset(ffs);
        }
}

IOW, the sequence is
	if (state == FFS_CLOSING)
		return -EBUSY;
	n = atomic_add_return(1, &opened);
	if (n == 1 && state == FFS_DEACTIVATED) {
		state = FFS_CLOSING;
		ffs_data_reset();

See the race there?  If the second open() comes between the
increment of ffs->opened and setting the state to FFS_CLOSING,
it will *not* fail with EBUSY - it will proceed to return to
userland, while the first sucker is crawling through the work
in ffs_data_reset()/ffs_data_clear()/ffs_epfiles_destroy().

What's more, there's nothing to stop that second opener from
calling write() on the descriptor it got.  No exclusion there -
        ffs->state = FFS_READ_DESCRIPTORS;
        ffs->setup_state = FFS_NO_SETUP;
        ffs->flags = 0;
in ffs_data_reset() is *not* serialized against ffs_ep0_write().
Get preempted right after setting ->state and that write()
will go just fine, only to be surprised when the first thread
regains CPU and continues modifying the contents of *ffs
under whatever the second thread is doing.

That code obviously relies upon that kind of shit being prevented
by that -EBUSY logics in ep0 open() and that logics is obviously
racy as it is.  Note that other callers of ffs_data_reset() have
similar problem: ffs_func_set_alt(), for example has
        if (ffs->state == FFS_DEACTIVATED) {
                ffs->state = FFS_CLOSING;
                INIT_WORK(&ffs->reset_work, ffs_reset_work);
                schedule_work(&ffs->reset_work);
                return -ENODEV;
        }
again, with no exclusion.  Lose CPU just after seeing FFS_DEACTIVATED,
then have another thread open() the sucker and start going through
ffs_data_reset(), only to have us regain CPU and schedule this for
execution:
static void ffs_reset_work(struct work_struct *work)
{
        struct ffs_data *ffs = container_of(work,
                struct ffs_data, reset_work);
        ffs_data_reset(ffs);
}
IOW, stray ffs_data_reset() coming to surprise the opener who'd
just finished ffs_data_reset() during open(2) and proceeded to
write to the damn thing, etc.

That's obviously on the "how do we fix the preexisting bugs" side
of things, though - regression needs to be dealt with ASAP anyway.

  reply	other threads:[~2026-01-30 23:55 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-18  5:15 [PATCH v4 00/54] tree-in-dcache stuff Al Viro
2025-11-18  5:15 ` [PATCH v4 01/54] fuse_ctl_add_conn(): fix nlink breakage in case of early failure Al Viro
2025-11-18  5:15 ` [PATCH v4 02/54] tracefs: fix a leak in eventfs_create_events_dir() Al Viro
2025-11-18  5:15 ` [PATCH v4 03/54] new helper: simple_remove_by_name() Al Viro
2025-11-18  5:15 ` [PATCH v4 04/54] new helper: simple_done_creating() Al Viro
2025-11-18  5:15 ` [PATCH v4 05/54] introduce a flag for explicitly marking persistently pinned dentries Al Viro
2025-11-18  5:15 ` [PATCH v4 06/54] primitives for maintaining persisitency Al Viro
2025-11-18  5:15 ` [PATCH v4 07/54] convert simple_{link,unlink,rmdir,rename,fill_super}() to new primitives Al Viro
2025-11-18  5:15 ` [PATCH v4 08/54] convert ramfs and tmpfs Al Viro
2025-11-18  5:15 ` [PATCH v4 09/54] procfs: make /self and /thread_self dentries persistent Al Viro
2025-11-18  5:15 ` [PATCH v4 10/54] configfs, securityfs: kill_litter_super() not needed Al Viro
2025-11-18  5:15 ` [PATCH v4 11/54] convert xenfs Al Viro
2025-11-18  5:15 ` [PATCH v4 12/54] convert smackfs Al Viro
2025-11-18  5:15 ` [PATCH v4 13/54] convert hugetlbfs Al Viro
2025-11-18  5:15 ` [PATCH v4 14/54] convert mqueue Al Viro
2025-11-18  5:15 ` [PATCH v4 15/54] convert bpf Al Viro
2025-11-18  5:15 ` [PATCH v4 16/54] convert dlmfs Al Viro
2025-11-18  5:15 ` [PATCH v4 17/54] convert fuse_ctl Al Viro
2025-11-18  5:15 ` [PATCH v4 18/54] convert pstore Al Viro
2025-11-18  5:15 ` [PATCH v4 19/54] convert tracefs Al Viro
2025-11-18  5:15 ` [PATCH v4 20/54] convert debugfs Al Viro
2025-11-18  5:15 ` [PATCH v4 21/54] debugfs: remove duplicate checks in callers of start_creating() Al Viro
2025-11-18  5:15 ` [PATCH v4 22/54] convert efivarfs Al Viro
2025-11-18  5:15 ` [PATCH v4 23/54] convert spufs Al Viro
2025-11-18  5:15 ` [PATCH v4 24/54] convert ibmasmfs Al Viro
2025-11-18  5:15 ` [PATCH v4 25/54] ibmasmfs: get rid of ibmasmfs_dir_ops Al Viro
2025-11-18  5:15 ` [PATCH v4 26/54] convert devpts Al Viro
2025-11-18  5:15 ` [PATCH v4 27/54] binderfs: use simple_start_creating() Al Viro
2025-11-18  5:15 ` [PATCH v4 28/54] binderfs_binder_ctl_create(): kill a bogus check Al Viro
2025-11-18  5:15 ` [PATCH v4 29/54] convert binderfs Al Viro
2025-11-18  5:15 ` [PATCH v4 30/54] autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there Al Viro
2025-11-18  5:15 ` [PATCH v4 31/54] convert autofs Al Viro
2025-11-18  5:15 ` [PATCH v4 32/54] convert binfmt_misc Al Viro
2025-11-18  5:15 ` [PATCH v4 33/54] selinuxfs: don't stash the dentry of /policy_capabilities Al Viro
2025-11-18  5:15 ` [PATCH v4 34/54] selinuxfs: new helper for attaching files to tree Al Viro
2025-11-18  5:15 ` [PATCH v4 35/54] convert selinuxfs Al Viro
2025-11-18  5:15 ` [PATCH v4 36/54] functionfs: don't abuse ffs_data_closed() on fs shutdown Al Viro
2025-11-18  5:15 ` [PATCH v4 37/54] functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}() Al Viro
2025-11-18  5:15 ` [PATCH v4 38/54] functionfs: need to cancel ->reset_work in ->kill_sb() Al Viro
2025-11-18  5:15 ` [PATCH v4 39/54] functionfs: fix the open/removal races Al Viro
2025-11-18  5:15 ` [PATCH v4 40/54] functionfs: switch to simple_remove_by_name() Al Viro
2025-11-18  5:15 ` [PATCH v4 41/54] convert functionfs Al Viro
2025-11-18  5:15 ` [PATCH v4 42/54] gadgetfs: switch to simple_remove_by_name() Al Viro
2025-11-18  5:15 ` [PATCH v4 43/54] convert gadgetfs Al Viro
2025-11-18  5:15 ` [PATCH v4 44/54] hypfs: don't pin dentries twice Al Viro
2025-11-18  5:15 ` [PATCH v4 45/54] hypfs: switch hypfs_create_str() to returning int Al Viro
2025-11-18  5:15 ` [PATCH v4 46/54] hypfs: swich hypfs_create_u64() " Al Viro
2025-11-18  5:15 ` [PATCH v4 47/54] convert hypfs Al Viro
2025-11-18  5:15 ` [PATCH v4 48/54] convert rpc_pipefs Al Viro
2025-11-18  5:15 ` [PATCH v4 49/54] convert nfsctl Al Viro
2025-11-18  5:15 ` [PATCH v4 50/54] convert rust_binderfs Al Viro
2025-11-18  5:16 ` [PATCH v4 51/54] get rid of kill_litter_super() Al Viro
2025-11-18  5:16 ` [PATCH v4 52/54] convert securityfs Al Viro
2025-11-18  5:16 ` [PATCH v4 53/54] kill securityfs_recursive_remove() Al Viro
2025-11-18  5:16 ` [PATCH v4 54/54] d_make_discardable(): warn if given a non-persistent dentry Al Viro
2026-01-27  0:56 ` [PATCH v4 00/54] tree-in-dcache stuff Samuel Wu
2026-01-27  7:42   ` Greg KH
2026-01-27 18:39     ` Linus Torvalds
2026-01-27 20:14       ` Al Viro
2026-01-28  8:53         ` Greg KH
2026-01-28  2:02     ` Samuel Wu
2026-01-28  4:59       ` Al Viro
2026-01-29  0:58         ` Samuel Wu
2026-01-29  3:23           ` Al Viro
2026-01-29 22:54             ` Al Viro
2026-01-30  1:16               ` Samuel Wu
2026-01-30  7:04                 ` Al Viro
2026-01-30 22:31                   ` Samuel Wu
2026-01-30 23:57                     ` Al Viro [this message]
2026-01-31  0:14                       ` Linus Torvalds
2026-01-31  1:08                         ` Al Viro
2026-01-31  1:11                           ` Linus Torvalds
2026-02-01  0:11                             ` Al Viro
2026-01-31  0:59                       ` Al Viro
2026-01-31  1:05                       ` Samuel Wu
2026-01-31  1:18                         ` Al Viro
2026-01-31  2:09                           ` Samuel Wu
2026-01-31  2:43                             ` Al Viro
2026-01-31 19:48                               ` Samuel Wu
2026-01-31 14:58                 ` Krishna Kurapati PSSNV
2026-01-31 20:02                   ` Samuel Wu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260130235743.GW3183987@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=a.hindborg@kernel.org \
    --cc=android-kernel-team@google.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=bpf@vger.kernel.org \
    --cc=brauner@kernel.org \
    --cc=casey@schaufler-ca.com \
    --cc=clm@meta.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jack@suse.cz \
    --cc=john.johansen@canonical.com \
    --cc=kees@kernel.org \
    --cc=linux-efi@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=miklos@szeredi.hu \
    --cc=neil@brown.name \
    --cc=ocfs2-devel@lists.linux.dev \
    --cc=paul@paul-moore.com \
    --cc=raven@themaw.net \
    --cc=rostedt@goodmis.org \
    --cc=selinux@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=wusamuel@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.