From: Al Viro <viro@zeniv.linux.org.uk>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg KH <gregkh@linuxfoundation.org>,
yu kuai <yukuai3@huawei.com>,
rafael@kernel.org, oleg@redhat.com, mchehab+samsung@kernel.org,
corbet@lwn.net, tytso@mit.edu, jmorris@namei.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
zhengbin13@huawei.com, yi.zhang@huawei.com,
chenxiang66@hisilicon.com, xiexiuqi@huawei.com
Subject: Re: [PATCH 1/3] dcache: add a new enum type for 'dentry_d_lock_class'
Date: Fri, 15 Nov 2019 17:54:23 +0000 [thread overview]
Message-ID: <20191115175423.GS26530@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20191115141754.GR26530@ZenIV.linux.org.uk>
On Fri, Nov 15, 2019 at 02:17:54PM +0000, Al Viro wrote:
> On Fri, Nov 15, 2019 at 08:58:05AM -0500, Steven Rostedt wrote:
> > On Fri, 15 Nov 2019 13:48:23 +0000
> > Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > > > BTW, what do you mean by "can debugfs_remove_recursive() rely upon the
> > > > lack of attempts to create new entries inside the subtree it's trying
> > > > to kill?"
> > >
> > > Is it possible for something to call e.g. debugfs_create_dir() (or any
> > > similar primitive) with parent inside the subtree that has been
> > > passed to debugfs_remove_recursive() call that is still in progress?
> > >
> > > If debugfs needs to cope with that, debugfs_remove_recursive() needs
> > > considerably heavier locking, to start with.
> >
> > I don't know about debugfs, but at least tracefs (which cut and pasted
> > from debugfs) does not allow that. At least in theory it doesn't allow
> > that (and if it does, it's a bug in the locking at the higher levels).
> >
> > And perhaps debugfs shouldn't allow that either. As it is only suppose
> > to be a light weight way to interact with the kernel, hence the name
> > "debugfs".
> >
> > Yu, do you have a test case for the "infinite loop" case?
>
> Infinite loop, AFAICS, is reasonably easy to trigger - just open
> a non-empty subdirectory and lseek to e.g. next-to-last element
> in it. Again, list_empty() use in there is quite wrong - it can
> give false negatives just on the cursors. No arguments about
> that part...
FWIW, given that debugfs_remove_recursive() has no way to report an error
*and* that we have a single chokepoint for all entry creations (start_creating())
we could make sure nothing gets added pretty easily - just mark the victim
dentry as "don't allow any creations here" when we first reach it and
have start_creating check that, using e.g. inode_lock() for serialization.
Marking could be done e.g. by setting ->d_fsdata to
(void *)DEBUGFS_FSDATA_IS_REAL_FOPS_BIT, so that ->d_release() doesn't
need any changes.
BTW, this
if (!ret)
d_delete(dentry);
if (d_is_reg(dentry))
__debugfs_file_removed(dentry);
dput(dentry);
in __debugfs_remove() is both subtle and bogus. If we get here
with refcount > 1, d_delete() is just a d_drop() - dentry can't
be made negative, so it gets unhashed instead. If we *do* get
here with refcount 1, it will be made negative, all right.
Only to be killed off by immediately following dput(), since
debugfs doesn't retain unpinned dentries.
Why immediate? Because d_is_reg() is obviously false on
negative dentries, so __debugfs_file_removed() is not called.
That's where the subtle part begins: there should've been
nobody for __debugfs_file_removed() to wait for, since they
would've had to hold additional references to dentry and
refcount wouldn't have been 1. So that's actually not
a bug. However, it's too subtle to introduce without
having bothered to even comment the damn thing.
As for the "bogus" part - that d_delete() is bollocks.
First of all, it is fully equivalent to d_drop(). Always
had been. What's more, that sucker should've been
d_invalidate() instead.
Anyway, AFAICS removal could be done this way:
// parent is held exclusive, after is NULL or a child of parent
find_next_child(parent, prev)
child = NULL
node = prev ? &prev->d_child : &parent->d_subdirs;
grab parent->d_lock
for each entry in the list starting at node->next
d = container_of(entry, struct dentry, d_child)
grab d->d_lock
if simple_positive(d)
bump d->d_count
child = d
drop d->d_lock
if child
break
drop parent->d_lock
dput(prev);
return child
kill_it(victim, parent)
if simple_positive(victim)
d_invalidate(victim); // needed to avoid lost mounts
if victim is a directory
fsnotify_rmdir
else
fsnotify_unlink
if victim is regular
__debugfs_file_removed(victim)
dput(victim) // unpin it
recursive_removal(dentry)
this = dentry;
while (true) {
victim = NULL;
inode = this->d_inode;
inode_lock(inode);
if (d_is_dir(this))
mark this doomed
while ((child = find_next_child(this, victim)) == NULL) {
// no children (left); kill and ascend
// update metadata while it's still locked
inode->i_ctime = current_time(inode);
clear_nlink(inode);
inode_unlock(inode);
victim = this;
this = this->d_parent;
inode = this->d_inode;
inode_lock(inode);
kill_it(victim, this);
if (victim == dentry) {
inode->i_ctime = inode->i_mtime = current_time(inode);
if (d_is_dir(dentry))
drop_nlink(inode);
inode_unlock(inode);
dput(dentry);
return;
}
}
inode_unlock(inode);
this = child;
}
next prev parent reply other threads:[~2019-11-15 17:54 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-15 3:27 [PATCH 0/3] fix potential infinite loop in debugfs_remove_recursive yu kuai
2019-11-15 3:27 ` [PATCH 1/3] dcache: add a new enum type for 'dentry_d_lock_class' yu kuai
2019-11-15 3:27 ` Greg KH
2019-11-15 4:12 ` Al Viro
2019-11-15 7:20 ` Greg KH
2019-11-15 10:08 ` yukuai (C)
2019-11-15 13:16 ` Al Viro
2019-11-15 13:38 ` Steven Rostedt
2019-11-15 13:39 ` Steven Rostedt
2019-11-15 13:48 ` Al Viro
2019-11-15 13:58 ` Steven Rostedt
2019-11-15 14:17 ` Al Viro
2019-11-15 17:54 ` Al Viro [this message]
2019-11-15 18:42 ` [RFC] simple_recursive_removal() Al Viro
2019-11-15 19:41 ` Al Viro
2019-11-15 21:18 ` Al Viro
2019-11-15 21:26 ` Steven Rostedt
2019-11-15 22:10 ` Al Viro
2019-11-16 12:04 ` Greg KH
2019-11-17 22:24 ` Al Viro
2019-11-18 6:37 ` Greg KH
2019-11-15 10:02 ` [PATCH 1/3] dcache: add a new enum type for 'dentry_d_lock_class' yukuai (C)
2019-11-15 3:27 ` [PATCH 2/3] fs/libfs.c: use 'spin_lock_nested' when taking 'd_lock' for dentry in simple_empty yu kuai
2019-11-15 3:27 ` [PATCH 3/3] debugfs: fix potential infinite loop in debugfs_remove_recursive yu kuai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191115175423.GS26530@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=chenxiang66@hisilicon.com \
--cc=corbet@lwn.net \
--cc=gregkh@linuxfoundation.org \
--cc=jmorris@namei.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mchehab+samsung@kernel.org \
--cc=oleg@redhat.com \
--cc=rafael@kernel.org \
--cc=rostedt@goodmis.org \
--cc=tytso@mit.edu \
--cc=xiexiuqi@huawei.com \
--cc=yi.zhang@huawei.com \
--cc=yukuai3@huawei.com \
--cc=zhengbin13@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.