Date: Tue, 7 Jun 2016 01:40:58 +0100
From: Al Viro
To: Linus Torvalds
Cc: Dave Hansen, "Chen, Tim C", Ingo Molnar, Davidlohr Bueso,
	"Peter Zijlstra (Intel)", Jason Low, Michel Lespinasse,
	"Paul E. McKenney", Waiman Long, LKML
Subject: Re: performance delta after VFS i_mutex=>i_rwsem conversion
Message-ID: <20160607004058.GH14480@ZenIV.linux.org.uk>
References: <5755D671.9070908@intel.com> <20160606211522.GF14480@ZenIV.linux.org.uk> <20160606220753.GG14480@ZenIV.linux.org.uk>

On Mon, Jun 06, 2016 at 04:50:59PM -0700, Linus Torvalds wrote:
>
> On Mon, 6 Jun 2016, Al Viro wrote:
> >
> > True in general, but here we really do a lot under that ->d_lock - all
> > list traversals are under it.  So I suspect that contention on the
> > nested lock is not an issue in that particular load.  It's certainly a
> > separate commit, so we'll see how much it gives on its own, but I
> > doubt that it'll be anywhere near enough.
>
> Hmm. Maybe.
>
> But at least we can try to minimize everything that happens under the
> dentry->d_lock spinlock.
>
> So how about this patch? It's entirely untested, but it rewrites that
> readdir() function to try to do the minimum possible under the d_lock
> spinlock.
>
> I say "rewrite", because it really is totally different.
> It's not just that the nested "next" locking is gone, it also treats
> the cursor very differently and tries to avoid doing any unnecessary
> cursor list operations.

Similar to what I've got here, except that mine has a couple of helper
functions usable in dcache_dir_lseek() as well:

	next_positive(parent, child, n) - returns the nth positive child
	after the given one, or NULL if there are fewer than n such
	children.  NULL as the second argument => search from the
	beginning.

	move_cursor(cursor, child) - moves the cursor immediately past
	child, *or* to the very end if child is NULL.

The third commit in the series will be the lockless replacement for
next_positive().  move_cursor() is easy - it became simply

	struct dentry *parent = cursor->d_parent;
	unsigned n, *seq = &parent->d_inode->i_dir_seq;
	spin_lock(&parent->d_lock);
	for (;;) {
		n = *seq;
		if (!(n & 1) && cmpxchg(seq, n, n + 1) == n)
			break;
		cpu_relax();
	}
	__list_del(cursor->d_child.prev, cursor->d_child.next);
	if (child)
		list_add(&cursor->d_child, &child->d_child);
	else
		list_add_tail(&cursor->d_child, &parent->d_subdirs);
	smp_store_release(seq, n + 2);
	spin_unlock(&parent->d_lock);

with

	static struct dentry *next_positive(struct dentry *parent,
					    struct dentry *child, int count)
	{
		struct list_head *start, *p;
		unsigned *seq = &parent->d_inode->i_dir_seq, n;
		start = child ? &child->d_child : &parent->d_subdirs;
		do {
			int i = count;
			p = start;	/* restart the scan on retry */
			n = smp_load_acquire(seq) & ~1;
			rcu_read_lock();
			do {
				p = p->next;
				if (p == &parent->d_subdirs) {
					child = NULL;
					break;
				}
				child = list_entry(p, struct dentry, d_child);
			} while (!simple_positive(child) || --i);
			rcu_read_unlock();
		} while (unlikely(smp_load_acquire(seq) != n));
		return child;
	}

as an initial attempt at a lockless next_positive(); the barriers are
probably wrong, though...