From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
"Chen, Tim C" <tim.c.chen@intel.com>,
Ingo Molnar <mingo@redhat.com>, Davidlohr Bueso <dbueso@suse.de>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Jason Low <jason.low2@hp.com>,
Michel Lespinasse <walken@google.com>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Waiman Long <waiman.long@hp.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: performance delta after VFS i_mutex=>i_rwsem conversion
Date: Tue, 7 Jun 2016 01:44:27 +0100 [thread overview]
Message-ID: <20160607004427.GI14480@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20160607004058.GH14480@ZenIV.linux.org.uk>
On Tue, Jun 07, 2016 at 01:40:58AM +0100, Al Viro wrote:
> On Mon, Jun 06, 2016 at 04:50:59PM -0700, Linus Torvalds wrote:
> >
> >
> > On Mon, 6 Jun 2016, Al Viro wrote:
> > >
> > > True in general, but here we really do a lot under that ->d_lock - all
> > > list traversals are under it. So I suspect that contention on nested
> > > lock is not an issue in that particular load. It's certainly a separate
> > > commit, so we'll see how much does it give on its own, but I doubt that
> > > it'll be anywhere near enough.
> >
> > Hmm. Maybe.
> >
> > But at least we can try to minimize everything that happens under the
> > dentry->d_lock spinlock.
> >
> > So how about this patch? It's entirely untested, but it rewrites that
> > readdir() function to try to do the minimum possible under the d_lock
> > spinlock.
> >
> > I say "rewrite", because it really is totally different. It's not just
> > that the nested "next" locking is gone, it also treats the cursor very
> > differently and tries to avoid doing any unnecessary cursor list
> > operations.
>
> Similar to what I've got here, except that mine has a couple of helper
> functions usable in dcache_dir_lseek() as well:
>
> next_positive(parent, child, n) - returns nth positive child after that one
> or NULL if there's less than n such. NULL as the second argument => search
> from the beginning.
>
> move_cursor(cursor, child) - moves cursor immediately past child *or* to
> the very end if child is NULL.
>
> The third commit in series will be the lockless replacement for
> for next_positive(). move_cursor() is easy - it became simply
> struct dentry *parent = cursor->d_parent;
> unsigned n, *seq = &parent->d_inode->i_dir_seq;
> spin_lock(&parent->d_lock);
> for (;;) {
> n = *seq;
> if (!(n & 1) && cmpxchg(seq, n, n + 1) == n)
> break;
> cpu_relax();
> }
> __list_del(cursor->d_child.prev, cursor->d_child.next);
> if (child)
> list_add(&cursor->d_child, &child->d_child);
> else
> list_add_tail(&cursor->d_child, &parent->d_subdirs);
> smp_store_release(seq, n + 2);
> spin_unlock(&parent->d_lock);
>
> with
>
> static struct dentry *next_positive(struct dentry *parent,
> struct dentry *child, int count)
> {
> struct list_head *p = child ? &child->d_child : &parent->d_subdirs;
> unsigned *seq = &parent->d_inode->i_dir_seq, n;
> do {
> int i = count;
> n = smp_load_acquire(seq) & ~1;
> rcu_read_lock();
> do {
> p = p->next;
> if (p == &parent->d_subdirs) {
> child = NULL;
> break;
> }
> child = list_entry(p, struct dentry, d_child);
> } while (!simple_positive(child) || --i);
> rcu_read_unlock();
> } while (unlikely(smp_load_acquire(seq) != n));
> return child;
> }
> as initial attempt at lockless next_positive(); barriers are probably wrong,
> though...
FWIW,
loff_t dcache_dir_lseek(struct file *file, loff_t offset, int whence)
{
struct dentry *dentry = file->f_path.dentry;
switch (whence) {
case 1:
offset += file->f_pos;
case 0:
if (offset >= 0)
break;
default:
return -EINVAL;
}
if (offset != file->f_pos) {
file->f_pos = offset;
if (file->f_pos >= 2) {
struct dentry *cursor = file->private_data;
loff_t n = file->f_pos - 2;
inode_lock_shared(dentry->d_inode);
move_cursor(cursor, next_positive(dentry, NULL, n));
inode_unlock_shared(dentry->d_inode);
}
}
return offset;
}
and
int dcache_readdir(struct file *file, struct dir_context *ctx)
{
struct dentry *dentry = file->f_path.dentry;
struct dentry *cursor = file->private_data;
struct dentry *child, *next;
bool moved = false;
if (!dir_emit_dots(file, ctx))
return 0;
child = ctx->pos != 2 ? cursor : NULL;
while ((next = next_positive(dentry, child, 1)) != NULL) {
if (!dir_emit(ctx, next->d_name.name, next->d_name.len,
d_inode(next)->i_ino, dt_type(d_inode(next))))
break;
moved = true;
child = next;
ctx->pos++;
}
if (moved)
move_cursor(cursor, child);
return 0;
}
is what the methods themselves became.
next prev parent reply other threads:[~2016-06-07 0:44 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-06 20:00 performance delta after VFS i_mutex=>i_rwsem conversion Dave Hansen
2016-06-06 20:46 ` Linus Torvalds
2016-06-06 21:13 ` Waiman Long
2016-06-06 21:20 ` Linus Torvalds
2016-06-07 3:22 ` Valdis.Kletnieks
2016-06-07 15:22 ` Waiman Long
2016-06-08 8:58 ` Ingo Molnar
2016-06-09 10:25 ` Ingo Molnar
2016-06-09 18:14 ` Dave Hansen
2016-06-09 20:10 ` Chen, Tim C
2016-06-06 21:15 ` Al Viro
2016-06-06 21:46 ` Linus Torvalds
2016-06-06 22:07 ` Al Viro
2016-06-06 23:50 ` Linus Torvalds
2016-06-06 23:59 ` Linus Torvalds
2016-06-07 0:29 ` Linus Torvalds
2016-06-07 0:40 ` Al Viro
2016-06-07 0:44 ` Al Viro [this message]
2016-06-07 0:58 ` Al Viro
2016-06-07 0:58 ` Linus Torvalds
2016-06-07 1:19 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160607004427.GI14480@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=dave.hansen@intel.com \
--cc=dbueso@suse.de \
--cc=jason.low2@hp.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=tim.c.chen@intel.com \
--cc=torvalds@linux-foundation.org \
--cc=waiman.long@hp.com \
--cc=walken@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.