All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg KH <gregkh@linuxfoundation.org>,
	yu kuai <yukuai3@huawei.com>,
	rafael@kernel.org, oleg@redhat.com, mchehab+samsung@kernel.org,
	corbet@lwn.net, tytso@mit.edu, jmorris@namei.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	zhengbin13@huawei.com, yi.zhang@huawei.com,
	chenxiang66@hisilicon.com, xiexiuqi@huawei.com
Subject: Re: [PATCH 1/3] dcache: add a new enum type for 'dentry_d_lock_class'
Date: Fri, 15 Nov 2019 17:54:23 +0000	[thread overview]
Message-ID: <20191115175423.GS26530@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20191115141754.GR26530@ZenIV.linux.org.uk>

On Fri, Nov 15, 2019 at 02:17:54PM +0000, Al Viro wrote:
> On Fri, Nov 15, 2019 at 08:58:05AM -0500, Steven Rostedt wrote:
> > On Fri, 15 Nov 2019 13:48:23 +0000
> > Al Viro <viro@zeniv.linux.org.uk> wrote:
> > 
> > > > BTW, what do you mean by "can debugfs_remove_recursive() rely upon the
> > > > lack of attempts to create new entries inside the subtree it's trying
> > > > to kill?"  
> > > 
> > > Is it possible for something to call e.g. debugfs_create_dir() (or any
> > > similar primitive) with parent inside the subtree that has been
> > > passed to debugfs_remove_recursive() call that is still in progress?
> > > 
> > > If debugfs needs to cope with that, debugfs_remove_recursive() needs
> > > considerably heavier locking, to start with.
> > 
> > I don't know about debugfs, but at least tracefs (which cut and pasted
> > from debugfs) does not allow that. At least in theory it doesn't allow
> > that (and if it does, it's a bug in the locking at the higher levels).
> > 
> > And perhaps debugfs shouldn't allow that either. As it is only suppose
> > to be a light weight way to interact with the kernel, hence the name
> > "debugfs".
> > 
> > Yu, do you have a test case for the "infinite loop" case?
> 
> Infinite loop, AFAICS, is reasonably easy to trigger - just open
> a non-empty subdirectory and lseek to e.g. next-to-last element
> in it.  Again, list_empty() use in there is quite wrong - it can
> give false negatives just on the cursors.  No arguments about
> that part...

FWIW, given that debugfs_remove_recursive() has no way to report an error
*and* that we have a single chokepoint for all entry creations (start_creating())
we could make sure nothing gets added pretty easily - just mark the victim
dentry as "don't allow any creations here" when we first reach it and
have start_creating check that, using e.g. inode_lock() for serialization.
Marking could be done e.g. by setting ->d_fsdata to
(void *)DEBUGFS_FSDATA_IS_REAL_FOPS_BIT, so that ->d_release() doesn't
need any changes.

BTW, this
                if (!ret)
                        d_delete(dentry);
                if (d_is_reg(dentry))
                        __debugfs_file_removed(dentry);
                dput(dentry);
in __debugfs_remove() is both subtle and bogus.  If we get here
with refcount > 1, d_delete() is just a d_drop() - dentry can't
be made negative, so it gets unhashed instead.  If we *do* get
here with refcount 1, it will be made negative, all right.
Only to be killed off by immediately following dput(), since
debugfs doesn't retain unpinned dentries.

Why immediate?  Because d_is_reg() is obviously false on
negative dentries, so __debugfs_file_removed() is not called.
That's where the subtle part begins: there should've been
nobody for __debugfs_file_removed() to wait for, since they
would've had to hold additional references to dentry and
refcount wouldn't have been 1.  So that's actually not
a bug.  However, it's too subtle to introduce without
having bothered to even comment the damn thing.

As for the "bogus" part - that d_delete() is bollocks.
First of all, it is fully equivalent to d_drop().  Always
had been.  What's more, that sucker should've been
d_invalidate() instead.

Anyway, AFAICS removal could be done this way:

// parent is held exclusive, after is NULL or a child of parent
find_next_child(parent, prev)
	child = NULL
	node = prev ? &prev->d_child : &parent->d_subdirs;
	grab parent->d_lock
	for each entry in the list starting at node->next
		d = container_of(entry, struct dentry, d_child)
		grab d->d_lock
		if simple_positive(d)
			bump d->d_count
			child = d
		drop d->d_lock
		if child
			break
	drop parent->d_lock
	dput(prev);
	return child

kill_it(victim, parent)
	if simple_positive(victim)
		d_invalidate(victim);	// needed to avoid lost mounts
		if victim is a directory
			fsnotify_rmdir
		else
			fsnotify_unlink
		if victim is regular
			__debugfs_file_removed(victim)
		dput(victim)		// unpin it

recursive_removal(dentry)
	this = dentry;
	while (true) {
		victim = NULL;
		inode = this->d_inode;
		inode_lock(inode);
		if (d_is_dir(this))
			mark this doomed
		while ((child = find_next_child(this, victim)) == NULL) {
			// no children (left); kill and ascend
			// update metadata while it's still locked
			inode->i_ctime = current_time(inode);
			clear_nlink(inode);
			inode_unlock(inode);
			victim = this;
			this = this->d_parent;
			inode = this->d_inode;
			inode_lock(inode);
			kill_it(victim, this);
			if (victim == dentry) {
				inode->i_ctime = inode->i_mtime = current_time(inode);
				if (d_is_dir(dentry))
					drop_nlink(inode);
				inode_unlock(inode);
				dput(dentry);
				return;
			}
		}
		inode_unlock(inode);
		this = child;
	}

  reply	other threads:[~2019-11-15 17:54 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-15  3:27 [PATCH 0/3] fix potential infinite loop in debugfs_remove_recursive yu kuai
2019-11-15  3:27 ` [PATCH 1/3] dcache: add a new enum type for 'dentry_d_lock_class' yu kuai
2019-11-15  3:27   ` Greg KH
2019-11-15  4:12     ` Al Viro
2019-11-15  7:20       ` Greg KH
2019-11-15 10:08         ` yukuai (C)
2019-11-15 13:16         ` Al Viro
2019-11-15 13:38           ` Steven Rostedt
2019-11-15 13:39             ` Steven Rostedt
2019-11-15 13:48             ` Al Viro
2019-11-15 13:58               ` Steven Rostedt
2019-11-15 14:17                 ` Al Viro
2019-11-15 17:54                   ` Al Viro [this message]
2019-11-15 18:42                     ` [RFC] simple_recursive_removal() Al Viro
2019-11-15 19:41                       ` Al Viro
2019-11-15 21:18                         ` Al Viro
2019-11-15 21:26                           ` Steven Rostedt
2019-11-15 22:10                             ` Al Viro
2019-11-16 12:04                               ` Greg KH
2019-11-17 22:24                               ` Al Viro
2019-11-18  6:37                                 ` Greg KH
2019-11-15 10:02     ` [PATCH 1/3] dcache: add a new enum type for 'dentry_d_lock_class' yukuai (C)
2019-11-15  3:27 ` [PATCH 2/3] fs/libfs.c: use 'spin_lock_nested' when taking 'd_lock' for dentry in simple_empty yu kuai
2019-11-15  3:27 ` [PATCH 3/3] debugfs: fix potential infinite loop in debugfs_remove_recursive yu kuai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191115175423.GS26530@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=chenxiang66@hisilicon.com \
    --cc=corbet@lwn.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=jmorris@namei.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchehab+samsung@kernel.org \
    --cc=oleg@redhat.com \
    --cc=rafael@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tytso@mit.edu \
    --cc=xiexiuqi@huawei.com \
    --cc=yi.zhang@huawei.com \
    --cc=yukuai3@huawei.com \
    --cc=zhengbin13@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.