linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: brauner@kernel.org, jack@suse.cz, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] vfs: avoid spurious dentry ref/unref cycle on open
Date: Wed, 7 Aug 2024 06:32:32 +0100	[thread overview]
Message-ID: <20240807053232.GT5334@ZenIV> (raw)
In-Reply-To: <CAGudoHFJe0X-OD42cWrgTObq=G_AZnqCHWPPGawy0ur1b84HGw@mail.gmail.com>

On Wed, Aug 07, 2024 at 05:57:07AM +0200, Mateusz Guzik wrote:

[there'll be a separate reply with what I hope might be a usable
approach]

> Yes, this is my understanding of the code and part of my compliant. :)
> 
> Things just work(tm) as is with NULLified pointers, but this is error-prone.

And carrying the arseloads of information (which ones do and which do not
need to be dropped) is *less* error-prone?  Are you serious?

> As a hypothetical suppose there is code executing some time after
> vfs_open which looks at nd->path.dentry and by finding the pointer is
> NULL it concludes the lookup did not work out.
> 
> If such code exists *and* the pointer is poisoned in the above sense
> (notably merely branching on it with kasan already traps), then the
> consumer will be caught immediately during coverage testing by
> syzkaller.

You are much too optimistic about the quality of test coverage in this
particular area.

> If such code exists but the pointer is only nullified, one is only
> going to find out the hard way when some functionality weirdly breaks.

To do _useful_ asserts, one needs invariants to check.  And "we got
to this check after having passed through that assignment at some
earlier point" is not it.  That's why I'm asking questions about
the state.

The thing is, suppose I (or you, or somebody else) is trying to modify
the whole thing.  There's a magical mystery assert in the way; what
should be done with it?  Move it/split it/remove it/do something
random and hope syzkaller won't catch anything?  If I can reason
about the predicate being checked, I can at least start figuring out
what should be done.  If not, it's bloody guaranteed to rot.

This particular area (pathwalk machinery) has a nasty history of
growing complexity once in a while, with following cleanups and
massage to get it back into more or less tolerable shape.
And refactoring that had been _painful_ - I'd done more than
a few there.

As far as I can tell, at the moment this flag (and yes, I've seen its
removal in the next version) is "we'd called vfs_open_consume() at
some point, then found ourselves still in RCU mode or we'd called
vfs_open_consume() more than once".

This is *NOT* a property of state; it's a property of execution
history.  The first part is checked in the wrong place - one of
the invariants (trivially verified by code examination) is that
LOOKUP_RCU is never regained after it had been dropped.  The
only place where it can be set is path_init() and calling _that_
between path_init() and terminate_walk() would be
	a) a hard and very visible bug
	b) would've wiped your flag anyway.
So that part of the check is basically "we are not calling
vfs_open_consume() under rcu_read_lock()".  Which is definitely
a desirable property, since ->open() can block.  So can
mnt_want_write() several lines prior.  Invariant here is
"the places where we set FMODE_OPENED or FMODE_CREATED may
not have LOOKUP_RCU".  Having
        if (!(file->f_mode & (FMODE_OPENED | FMODE_CREATED))) {
		error = complete_walk(nd);
		if (error)
			return error;
	}
in the beginning of do_open() guarantees that for vfs_open()
call there.  All other places where that can happen are in
lookup_open() or called from it (via ->atomic_open() to
finish_open()).  And *that* definitely should not be done
in RCU mode, due to
        if (open_flag & O_CREAT)
                inode_lock(dir->d_inode);
        else
                inode_lock_shared(dir->d_inode);
        dentry = lookup_open(nd, file, op, got_write);
in the sole caller of that thing.  Again, can't grab a blocking
lock under rcu_read_lock().  Which is why we have this
                if (WARN_ON_ONCE(nd->flags & LOOKUP_RCU))
                        return ERR_PTR(-ECHILD);
        } else {
                /* create side of things */
                if (nd->flags & LOOKUP_RCU) {
                        if (!try_to_unlazy(nd))
                                return ERR_PTR(-ECHILD);
                }
slightly prior to that call.  WARN_ON_ONCE is basically "lookup_fast()
has returned NULL and stayed in RCU mode", which should never happen.
try_to_unlazy() is straight "either switch to non-RCU mode or return an
error" - that's what this function is for.  No WARN_ON after that - it
would only obfuscate things.

*IF* you want to add debugging checks for that kind of stuff, just call
that assert_nonrcu(nd), make it check and whine and feel free to slap
them in reasonable amount of places (anything that makes a reader go
"for fuck sake, hadn't we (a) done that on the entry to this function
and (b) done IO since then, anyway?" is obviously not reasonable, etc. -
no more than common sense limitations).

Another common sense thing: extra asserts won't confuse syzkaller, but
they very much can confuse a human reader.  And any rewrites are done
by humans...

As for the double call of vfs_open_consume()...  You do realize that
the damn thing wouldn't have reached that check if it would ever have
cause to be triggered, right?  Seeing that we call
static inline struct mnt_idmap *mnt_idmap(const struct vfsmount *mnt)
{
        /* Pairs with smp_store_release() in do_idmap_mount(). */
        return smp_load_acquire(&mnt->mnt_idmap);
}
near the beginning of do_open(), ~20 lines before the place where
you added that check...

I'm not sure it makes sense to defend against a weird loop appearing
out of nowhere near the top of call chain, but if you want to do that,
this is not the right place for that.

  reply	other threads:[~2024-08-07  5:32 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-06 14:46 [PATCH] vfs: avoid spurious dentry ref/unref cycle on open Mateusz Guzik
2024-08-06 15:53 ` Al Viro
2024-08-06 16:09   ` Mateusz Guzik
2024-08-06 16:14     ` Mateusz Guzik
2024-08-07  3:38     ` Al Viro
2024-08-07  3:57       ` Mateusz Guzik
2024-08-07  5:32         ` Al Viro [this message]
2024-08-07  5:46           ` Mateusz Guzik
2024-08-07  6:23         ` Al Viro
2024-08-07  6:33           ` Al Viro
2024-08-07  6:40             ` Mateusz Guzik
2024-08-07  7:05               ` Al Viro
2024-08-07  7:22                 ` Mateusz Guzik
2024-08-07  7:52                   ` Al Viro
2024-08-07  7:59                     ` Mateusz Guzik
2024-08-07  9:50                       ` Mateusz Guzik
2024-08-07 12:43                         ` Al Viro
2024-08-07 20:38                           ` Al Viro
2024-08-20 11:38                             ` Mateusz Guzik
2024-08-22  0:33                               ` Al Viro
2024-08-22  0:34                                 ` [PATCH 1/3] don't duplicate vfs_open() in kernel_file_open() Al Viro
2024-08-22  7:53                                   ` Christian Brauner
2024-08-22  0:41                                 ` [PATCH 2/3] lift grabbing path into caller of do_dentry_open() Al Viro
2024-08-22  7:54                                   ` Christian Brauner
2024-08-22  0:41                                 ` [PATCH 3/3] avoid extra path_get/path_put cycle in path_openat() Al Viro
2024-08-22  9:31                                   ` Christian Brauner
2024-08-22 10:21                                   ` Mateusz Guzik
2025-02-17  8:03                                 ` [PATCH] vfs: avoid spurious dentry ref/unref cycle on open Mateusz Guzik
2024-08-08  6:26                           ` Mateusz Guzik
2024-08-06 22:51 ` Dave Chinner
2024-08-06 22:55   ` Mateusz Guzik
2024-08-07  2:56     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240807053232.GT5334@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mjguzik@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).