* [PATCH v3 00/25] Lockless update of reference count protected by spinlock
@ 2013-07-03 20:18 Waiman Long
2013-07-03 20:18 ` [PATCH v3 01/25] spinlock: A new lockref structure for lockless update of refcount Waiman Long
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Waiman Long @ 2013-07-03 20:18 UTC (permalink / raw)
To: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
Thomas Gleixner
Cc: Waiman Long, linux-fsdevel, linux-kernel, Peter Zijlstra,
Steven Rostedt, Linus Torvalds, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J
v2->v3:
- Completely revamp the packaging by adding a new lockref data
structure that combines the spinlock with the reference
count. Helper functions are also added to manipulate the new data
structure. That results in modifying over 50 files, but the changes
are trivial in most of them.
- Change initial spinlock wait to use a timeout.
- Force 64-bit alignment of the spinlock & reference count structure.
- Add a new way to use the combo by using a new union and helper
functions.
v1->v2:
- Add one more layer of indirection to LOCK_WITH_REFCOUNT macro.
- Add __LINUX_SPINLOCK_REFCOUNT_H protection to spinlock_refcount.h.
- Add some generic get/put macros into spinlock_refcount.h.
This patchset introduces a generic mechanism to atomically update
a reference count that is protected by a spinlock without actually
acquiring the lock itself. If the lockless update doesn't succeed, the
caller falls back to acquiring the lock and updating the reference
count the old way. This helps in situations where there is a lot of
spinlock contention because of frequent reference count updates.
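As a rough illustration (this is a sketch, not code from the patchset),
the intended caller-side pattern with the new helpers looks something
like the following. The embedding "struct foo" and its functions are
made up for illustration; the lockref helpers are the ones added in
patch 1.

	struct foo {
		struct lockref	f_lockcnt;	/* spinlock + refcount combo */
		/* other fields protected by the embedded spinlock */
	};

	static void foo_get(struct foo *f)
	{
		/* Tries a lockless increment, falls back to locking internally */
		lockref_get(&f->f_lockcnt);
	}

	static void foo_put(struct foo *f)
	{
		if (lockref_put_or_locked(&f->f_lockcnt))
			return;	/* count was > 1 and was decremented locklessly */
		/*
		 * The count is 1 (it was not decremented) and the embedded
		 * spinlock is now held; finish the final drop the old way
		 * under the lock, then release it.
		 */
		/* ... teardown of the last reference ... */
		lockref_unlock(&f->f_lockcnt);
	}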
The d_lock and d_count fields of struct dentry in dcache.h were
modified to use the new lockref data structure, with the associated
helper functions used to access the spinlock and the reference
count. This change yields a significant performance improvement in the
short workload of the AIM7 benchmark on an 8-socket x86-64 machine
with 80 cores.
patch 1: Introduce the new lockref data structure
patch 2: Enable x86 architecture to use the feature
patch 3: Change the dentry structure to use lockref to improve
performance in high-contention situations
patches 4-25: Use new helper functions to access the d_lock and
d_count fields of the original dentry structure.
Thanks to Thomas Gleixner, Andi Kleen and Linus for their valuable
input in shaping this patchset.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Waiman Long (25):
spinlock: A new lockref structure for lockless update of refcount
spinlock: Enable x86 architecture to do lockless refcount update
dcache: Enable lockless update of d_count in dentry structure
powerpc: Change how dentry's d_lock field is accessed
infiniband: Change how dentry's d_lock field is accessed
9p-fs: Change how dentry's d_lock field is accessed
afs-fs: Change how dentry's d_lock field is accessed
auto-fs: Change how dentry's d_lock and d_count fields are accessed
ceph-fs: Change how dentry's d_lock and d_count fields are accessed
cifs: Change how dentry's d_lock field is accessed
coda-fs: Change how dentry's d_lock and d_count fields are accessed
config-fs: Change how dentry's d_lock and d_count fields are accessed
vfs: Change how dentry's d_lock and d_count fields are accessed
ecrypt-fs: Change how dentry's d_count field is accessed
export-fs: Change how dentry's d_lock field is accessed
vfat: Change how dentry's d_lock field is accessed
file locking: Change how dentry's d_lock and d_count fields are
accessed
ncp-fs: Change how dentry's d_lock field is accessed
nfs: Change how dentry's d_lock and d_count fields are accessed
nilfs2: Change how dentry's d_count field is accessed
fsnotify: Change how dentry's d_lock field is accessed
ocfs2: Change how dentry's d_lock field is accessed
cgroup: Change how dentry's d_lock field is accessed
sunrpc: Change how dentry's d_lock field is accessed
selinux: Change how dentry's d_lock field is accessed
arch/powerpc/platforms/cell/spufs/inode.c | 6 +-
arch/x86/Kconfig | 3 +
arch/x86/include/asm/spinlock_refcount.h | 1 +
drivers/infiniband/hw/ipath/ipath_fs.c | 6 +-
drivers/infiniband/hw/qib/qib_fs.c | 6 +-
fs/9p/fid.c | 14 +-
fs/afs/dir.c | 4 +-
fs/autofs4/autofs_i.h | 24 +-
fs/autofs4/expire.c | 48 ++--
fs/autofs4/root.c | 14 +-
fs/ceph/caps.c | 8 +-
fs/ceph/debugfs.c | 8 +-
fs/ceph/dir.c | 34 ++--
fs/ceph/export.c | 4 +-
fs/ceph/inode.c | 24 +-
fs/ceph/mds_client.c | 22 +-
fs/cifs/dir.c | 10 +-
fs/coda/cache.c | 4 +-
fs/coda/dir.c | 4 +-
fs/configfs/configfs_internal.h | 4 +-
fs/configfs/dir.c | 2 +-
fs/configfs/inode.c | 6 +-
fs/dcache.c | 324 +++++++++++++++--------------
fs/dcookies.c | 8 +-
fs/ecryptfs/inode.c | 2 +-
fs/exportfs/expfs.c | 8 +-
fs/fat/namei_vfat.c | 4 +-
fs/fs-writeback.c | 4 +-
fs/libfs.c | 36 ++--
fs/locks.c | 2 +-
fs/namei.c | 28 ++--
fs/namespace.c | 10 +-
fs/ncpfs/dir.c | 6 +-
fs/ncpfs/ncplib_kernel.h | 8 +-
fs/nfs/dir.c | 14 +-
fs/nfs/getroot.c | 8 +-
fs/nfs/namespace.c | 16 +-
fs/nfs/unlink.c | 22 +-
fs/nilfs2/super.c | 2 +-
fs/notify/fsnotify.c | 8 +-
fs/notify/vfsmount_mark.c | 24 +-
fs/ocfs2/dcache.c | 6 +-
include/asm-generic/spinlock_refcount.h | 98 +++++++++
include/linux/dcache.h | 52 ++++--
include/linux/fs.h | 4 +-
include/linux/fsnotify_backend.h | 6 +-
include/linux/spinlock_refcount.h | 159 ++++++++++++++
kernel/Kconfig.locks | 5 +
kernel/cgroup.c | 10 +-
lib/Makefile | 2 +
lib/spinlock_refcount.c | 229 ++++++++++++++++++++
net/sunrpc/rpc_pipe.c | 6 +-
security/selinux/selinuxfs.c | 14 +-
53 files changed, 958 insertions(+), 423 deletions(-)
create mode 100644 arch/x86/include/asm/spinlock_refcount.h
create mode 100644 include/asm-generic/spinlock_refcount.h
create mode 100644 include/linux/spinlock_refcount.h
create mode 100644 lib/spinlock_refcount.c
* [PATCH v3 01/25] spinlock: A new lockref structure for lockless update of refcount
2013-07-03 20:18 [PATCH v3 00/25] Lockless update of reference count protected by spinlock Waiman Long
@ 2013-07-03 20:18 ` Waiman Long
2013-07-03 20:19 ` [PATCH v3 02/25] spinlock: Enable x86 architecture to do lockless refcount update Waiman Long
2013-07-03 20:19 ` [PATCH v3 03/25] dcache: Enable lockless update of d_count in dentry structure Waiman Long
2 siblings, 0 replies; 6+ messages in thread
From: Waiman Long @ 2013-07-03 20:18 UTC (permalink / raw)
To: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
Thomas Gleixner
Cc: Waiman Long, linux-fsdevel, linux-kernel, Peter Zijlstra,
Steven Rostedt, Linus Torvalds, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J
This patch introduces a new set of spinlock_refcount.h header files to
be included by kernel code that wants to do a faster lockless update
of a reference count protected by a spinlock.
The new lockref structure consists of just the spinlock and the
reference count data. Helper functions are defined in the new
<linux/spinlock_refcount.h> header file to access the content of
the new structure. A generic version is defined for all
architectures, but each architecture can also optionally define its
own structure and use its own helper functions.
Two new config parameters are introduced:
1. SPINLOCK_REFCOUNT
2. ARCH_SPINLOCK_REFCOUNT
The first one is defined in kernel/Kconfig.locks and is used
to enable or disable the faster lockless reference count update
optimization. The second one has to be defined in each
architecture's Kconfig file to enable the optimization for that
architecture. Therefore, each architecture has to opt in to this
optimization or it won't get it. This gives each architecture plenty
of time to test the optimization before deciding to use it or replace
it with a better architecture-specific solution.
This optimization won't work on non-SMP systems or when spinlock
debugging is turned on, so it is disabled in either of those cases. It
also won't work with full preempt-RT and so should be turned off in
that case as well.
The current patch allows 3 levels of access to the new lockref
structure:
1. The lockless update optimization is turned off (SPINLOCK_REFCOUNT=n).
2. The lockless update optimization is turned on and the generic version
is used (SPINLOCK_REFCOUNT=y and ARCH_SPINLOCK_REFCOUNT=y).
3. The lockless update optimization is turned on and the architecture
provides its own version.
To maximize the chance of doing a lockless update in the generic
version, the inlined __lockref_add_unless() function will wait a
certain amount of time for the lock to become free before trying the
update. The new code also attempts the lockless atomic update twice
before falling back to the old code path of acquiring the lock and
then updating the count, because a fair amount of contention would
remain with only one attempt.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
include/asm-generic/spinlock_refcount.h | 98 +++++++++++++
include/linux/spinlock_refcount.h | 159 +++++++++++++++++++++
kernel/Kconfig.locks | 5 +
lib/Makefile | 2 +
lib/spinlock_refcount.c | 229 +++++++++++++++++++++++++++++++
5 files changed, 493 insertions(+), 0 deletions(-)
create mode 100644 include/asm-generic/spinlock_refcount.h
create mode 100644 include/linux/spinlock_refcount.h
create mode 100644 lib/spinlock_refcount.c
diff --git a/include/asm-generic/spinlock_refcount.h b/include/asm-generic/spinlock_refcount.h
new file mode 100644
index 0000000..8b646cc
--- /dev/null
+++ b/include/asm-generic/spinlock_refcount.h
@@ -0,0 +1,98 @@
+/*
+ * Spinlock with reference count combo
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * (c) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *
+ * Authors: Waiman Long <waiman.long@hp.com>
+ */
+#ifndef __ASM_GENERIC_SPINLOCK_REFCOUNT_H
+#define __ASM_GENERIC_SPINLOCK_REFCOUNT_H
+
+/*
+ * The lockref structure defines a combined spinlock with reference count
+ * data structure to be embedded in a larger structure. The combined data
+ * structure is always 8-byte aligned. So proper placement of this structure
+ * in the larger embedding data structure is needed to ensure that there is
+ * no hole in it.
+ */
+struct __aligned(sizeof(u64)) lockref {
+ union {
+ u64 __lock_count;
+ struct {
+ unsigned int refcnt; /* Reference count */
+ spinlock_t lock;
+ };
+ };
+};
+
+/*
+ * Struct lockref helper functions
+ */
+extern void lockref_get(struct lockref *lockcnt);
+extern int lockref_put(struct lockref *lockcnt);
+extern int lockref_get_not_zero(struct lockref *lockcnt);
+extern int lockref_put_or_locked(struct lockref *lockcnt);
+
+/*
+ * lockref_lock - locks the embedded spinlock
+ * @lockcnt: pointer to lockref structure
+ */
+static __always_inline void
+lockref_lock(struct lockref *lockcnt)
+{
+ spin_lock(&lockcnt->lock);
+}
+
+/*
+ * lockref_lock_nested - locks the embedded spinlock
+ * @lockcnt: pointer to lockref structure
+ */
+static __always_inline void
+lockref_lock_nested(struct lockref *lockcnt, int subclass)
+{
+ spin_lock_nested(&lockcnt->lock, subclass);
+}
+
+/*
+ * lockref_trylock - tries to acquire the embedded spinlock
+ * @lockcnt: pointer to lockref structure
+ */
+static __always_inline int
+lockref_trylock(struct lockref *lockcnt)
+{
+ return spin_trylock(&lockcnt->lock);
+}
+
+/*
+ * lockref_unlock - unlocks the embedded spinlock
+ * @lockcnt: pointer to lockref structure
+ */
+static __always_inline void
+lockref_unlock(struct lockref *lockcnt)
+{
+ spin_unlock(&lockcnt->lock);
+}
+
+/*
+ * lockref_ret_lock - returns address of the embedded spinlock
+ * @lockcnt: pointer to lockref structure
+ */
+#define lockref_ret_lock(lockcnt) ((lockcnt)->lock)
+
+/*
+ * lockref_ret_count - returns the embedded reference count
+ * @lockcnt: pointer to lockref structure
+ */
+#define lockref_ret_count(lockcnt) ((lockcnt)->refcnt)
+
+#endif /* __ASM_GENERIC_SPINLOCK_REFCOUNT_H */
diff --git a/include/linux/spinlock_refcount.h b/include/linux/spinlock_refcount.h
new file mode 100644
index 0000000..0d6d10c
--- /dev/null
+++ b/include/linux/spinlock_refcount.h
@@ -0,0 +1,159 @@
+/*
+ * Spinlock with reference count combo
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * (c) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *
+ * Authors: Waiman Long <waiman.long@hp.com>
+ */
+#ifndef __LINUX_SPINLOCK_REFCOUNT_H
+#define __LINUX_SPINLOCK_REFCOUNT_H
+
+#include <linux/spinlock.h>
+
+#ifdef CONFIG_SPINLOCK_REFCOUNT
+#include <asm/spinlock_refcount.h>
+#else
+/*
+ * If the spinlock & reference count optimization feature is disabled,
+ * the spinlock and reference count are accessed separately on its own.
+ */
+struct lockref {
+ unsigned int refcnt; /* Reference count */
+ spinlock_t lock;
+};
+
+/*
+ * Struct lockref helper functions
+ */
+/*
+ * lockref_get - increments reference count while not locked
+ * @lockcnt: pointer to lockref structure
+ */
+static __always_inline void
+lockref_get(struct lockref *lockcnt)
+{
+ spin_lock(&lockcnt->lock);
+ lockcnt->refcnt++;
+ spin_unlock(&lockcnt->lock);
+}
+
+/*
+ * lockref_get_not_zero - increments count unless the count is 0
+ * @lockcnt: pointer to lockref structure
+ * Return: 1 if count updated successfully or 0 if count is 0
+ */
+static __always_inline int
+lockref_get_not_zero(struct lockref *lockcnt)
+{
+ int retval = 0;
+
+ spin_lock(&lockcnt->lock);
+ if (likely(lockcnt->refcnt)) {
+ lockcnt->refcnt++;
+ retval = 1;
+ }
+ spin_unlock(&lockcnt->lock);
+ return retval;
+}
+
+/*
+ * lockref_put - decrements count unless count <= 1 before decrement
+ * @lockcnt: pointer to lockref structure
+ * Return: 1 if count updated successfully or 0 if count <= 1
+ */
+static __always_inline int
+lockref_put(struct lockref *lockcnt)
+{
+ int retval = 0;
+
+ spin_lock(&lockcnt->lock);
+ if (likely(lockcnt->refcnt > 1)) {
+ lockcnt->refcnt--;
+ retval = 1;
+ }
+ spin_unlock(&lockcnt->lock);
+ return retval;
+}
+
+/*
+ * lockref_put_or_locked - decrements count unless count <= 1 before decrement
+ * otherwise the lock will be taken
+ * @lockcnt: pointer to lockref structure
+ * Return: 1 if count updated successfully or 0 if count <= 1 and lock taken
+ */
+static __always_inline int
+lockref_put_or_locked(struct lockref *lockcnt)
+{
+ spin_lock(&lockcnt->lock);
+ if (likely(lockcnt->refcnt > 1)) {
+ lockcnt->refcnt--;
+ spin_unlock(&lockcnt->lock);
+ return 1;
+ }
+ return 0; /* Count is 1 & lock taken */
+}
+
+/*
+ * lockref_lock - locks the embedded spinlock
+ * @lockcnt: pointer to lockref structure
+ */
+static __always_inline void
+lockref_lock(struct lockref *lockcnt)
+{
+ spin_lock(&lockcnt->lock);
+}
+
+/*
+ * lockref_lock_nested - locks the embedded spinlock
+ * @lockcnt: pointer to lockref structure
+ */
+static __always_inline void
+lockref_lock_nested(struct lockref *lockcnt, int subclass)
+{
+ spin_lock_nested(&lockcnt->lock, subclass);
+}
+
+/*
+ * lockref_trylock - tries to acquire the embedded spinlock
+ * @lockcnt: pointer to lockref structure
+ */
+static __always_inline int
+lockref_trylock(struct lockref *lockcnt)
+{
+ return spin_trylock(&lockcnt->lock);
+}
+
+/*
+ * lockref_unlock - unlocks the embedded spinlock
+ * @lockcnt: pointer to lockref structure
+ */
+static __always_inline void
+lockref_unlock(struct lockref *lockcnt)
+{
+ spin_unlock(&lockcnt->lock);
+}
+
+/*
+ * lockref_ret_lock - returns the embedded spinlock
+ * @lockcnt: pointer to lockref structure
+ */
+#define lockref_ret_lock(lockcnt) ((lockcnt)->lock)
+
+/*
+ * lockref_ret_count - returns the embedded reference count
+ * @lockcnt: pointer to lockref structure
+ */
+#define lockref_ret_count(lockcnt) ((lockcnt)->refcnt)
+
+#endif /* CONFIG_SPINLOCK_REFCOUNT */
+#endif /* __LINUX_SPINLOCK_REFCOUNT_H */
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 44511d1..d1f8670 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -223,3 +223,8 @@ endif
config MUTEX_SPIN_ON_OWNER
def_bool y
depends on SMP && !DEBUG_MUTEXES
+
+config SPINLOCK_REFCOUNT
+ def_bool y
+ depends on ARCH_SPINLOCK_REFCOUNT && SMP
+ depends on !GENERIC_LOCKBREAK && !DEBUG_SPINLOCK && !DEBUG_LOCK_ALLOC
diff --git a/lib/Makefile b/lib/Makefile
index c55a037..0367915 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -179,3 +179,5 @@ quiet_cmd_build_OID_registry = GEN $@
clean-files += oid_registry_data.c
obj-$(CONFIG_UCS2_STRING) += ucs2_string.o
+
+obj-$(CONFIG_SPINLOCK_REFCOUNT) += spinlock_refcount.o
diff --git a/lib/spinlock_refcount.c b/lib/spinlock_refcount.c
new file mode 100644
index 0000000..1f599a7
--- /dev/null
+++ b/lib/spinlock_refcount.c
@@ -0,0 +1,229 @@
+/*
+ * Generic spinlock with reference count combo
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * (C) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *
+ * Authors: Waiman Long <waiman.long@hp.com>
+ */
+
+#include <linux/spinlock.h>
+#include <asm-generic/spinlock_refcount.h>
+
+#ifdef _SHOW_WAIT_LOOP_TIME
+#include <linux/time.h>
+#endif
+
+/*
+ * The maximum number of waiting spins while the lock is busy before
+ * attempting a lockless update. The purpose of the timeout is to
+ * limit the amount of unfairness to the thread that is doing the reference
+ * count update. Otherwise, it is theoretically possible for that thread to
+ * be starved for a really long time causing all kind of problems. If it
+ * times out and the lock is still not free, the code will fall back to the
+ * traditional way of queuing up to acquire the lock before updating the count.
+ *
+ * The actual time spent in the wait loop very much depends on the CPU being
+ * used. On a 2.4GHz Westmere CPU, the execution time of a PAUSE instruction
+ * (cpu_relax) in a 4k loop is about 16us. The lock checking and looping
+ * overhead is about 8us. With 4 cpu_relax's in the loop, it will wait
+ * about 72us before the count reaches 0. With cacheline contention, the
+ * wait time can go up to 3x as much (about 210us). Having multiple
+ * cpu_relax's in the wait loop does seem to reduce cacheline contention
+ * somewhat and give slightly better performance.
+ *
+ * The preset timeout value is rather arbitrary and really depends on the CPU
+ * being used. If customization is needed, we could add a config variable for
+ * that. The exact value is not that important. A longer wait time will
+ * increase the chance of doing a lockless update, but it results in more
+ * unfairness to the waiting thread and vice versa.
+ */
+#ifndef CONFIG_LOCKREF_WAIT_SHIFT
+#define CONFIG_LOCKREF_WAIT_SHIFT 12
+#endif
+#define LOCKREF_SPIN_WAIT_MAX (1<<CONFIG_LOCKREF_WAIT_SHIFT)
+
+/**
+ *
+ * __lockref_add_unless - atomically add the given value to the count unless
+ * the lock was acquired or the count equals to the
+ * given threshold value.
+ *
+ * @plockcnt : pointer to the combined lock and count 8-byte data
+ * @plock : pointer to the spinlock
+ * @pcount : pointer to the reference count
+ * @value : value to be added
+ * @threshold: threshold value for acquiring the lock
+ * Return : 1 if operation succeeds, 0 otherwise
+ *
+ * If the lock was not acquired, __lockref_add_unless() atomically adds the
+ * given value to the reference count unless the given threshold is reached.
+ * If the lock was acquired or the threshold was reached, 0 is returned and
+ * the caller will have to acquire the lock and update the count accordingly
+ * (can be done in a non-atomic way).
+ *
+ * N.B. The lockdep code won't be run as this optimization should be disabled
+ * when LOCK_STAT is enabled.
+ */
+static __always_inline int
+__lockref_add_unless(u64 *plockcnt, spinlock_t *plock, unsigned int *pcount,
+ int value, int threshold)
+{
+ struct lockref old, new;
+ int get_lock, loopcnt;
+#ifdef _SHOW_WAIT_LOOP_TIME
+ struct timespec tv1, tv2;
+#endif
+
+ /*
+ * Code doesn't work if raw spinlock is larger than 4 bytes
+ * or is empty.
+ */
+ BUILD_BUG_ON(sizeof(arch_spinlock_t) == 0);
+ if (sizeof(arch_spinlock_t) > 4)
+ return 0; /* Need to acquire the lock explicitly */
+
+ /*
+ * Wait a certain amount of time until the lock is free or the loop
+ * counter reaches 0. Doing multiple cpu_relax() helps to reduce
+ * contention in the spinlock cacheline and achieve better performance.
+ */
+#ifdef _SHOW_WAIT_LOOP_TIME
+ getnstimeofday(&tv1);
+#endif
+ loopcnt = LOCKREF_SPIN_WAIT_MAX;
+ while (loopcnt && spin_is_locked(plock)) {
+ loopcnt--;
+ cpu_relax();
+ cpu_relax();
+ cpu_relax();
+ cpu_relax();
+ }
+#ifdef _SHOW_WAIT_LOOP_TIME
+ if (loopcnt == 0) {
+ unsigned long ns;
+
+ getnstimeofday(&tv2);
+ ns = (tv2.tv_sec - tv1.tv_sec) * NSEC_PER_SEC +
+ (tv2.tv_nsec - tv1.tv_nsec);
+ pr_info("lockref wait loop time = %lu ns\n", ns);
+ }
+#endif
+
+#ifdef CONFIG_LOCK_STAT
+ if (loopcnt != LOCKREF_SPIN_WAIT_MAX)
+ lock_contended(plock->dep_map, _RET_IP_);
+#endif
+ old.__lock_count = ACCESS_ONCE(*plockcnt);
+ get_lock = ((threshold >= 0) && (old.refcnt <= threshold));
+ if (likely(!get_lock && !spin_is_locked(&old.lock))) {
+ new.__lock_count = old.__lock_count;
+ new.refcnt += value;
+ if (cmpxchg64(plockcnt, old.__lock_count, new.__lock_count)
+ == old.__lock_count)
+ return 1;
+ cpu_relax();
+ cpu_relax();
+ /*
+ * Try one more time
+ */
+ old.__lock_count = ACCESS_ONCE(*plockcnt);
+ get_lock = ((threshold >= 0) && (old.refcnt <= threshold));
+ if (likely(!get_lock && !spin_is_locked(&old.lock))) {
+ new.__lock_count = old.__lock_count;
+ new.refcnt += value;
+ if (cmpxchg64(plockcnt, old.__lock_count,
+ new.__lock_count) == old.__lock_count)
+ return 1;
+ cpu_relax();
+ }
+ }
+ return 0; /* The caller will need to acquire the lock */
+}
+
+/*
+ * Generic macros to increment and decrement reference count
+ * @sname : pointer to lockref structure
+ * @lock : name of spinlock in structure
+ * @count : name of reference count in structure
+ *
+ * LOCKREF_GET() - increments count unless it is locked
+ * LOCKREF_GET0() - increments count unless it is locked or count is 0
+ * LOCKREF_PUT() - decrements count unless it is locked or count is 1
+ */
+#define LOCKREF_GET(sname, lock, count) \
+ __lockref_add_unless(&(sname)->__lock_count, &(sname)->lock, \
+ &(sname)->count, 1, -1)
+#define LOCKREF_GET0(sname, lock, count) \
+ __lockref_add_unless(&(sname)->__lock_count, &(sname)->lock, \
+ &(sname)->count, 1, 0)
+#define LOCKREF_PUT(sname, lock, count) \
+ __lockref_add_unless(&(sname)->__lock_count, &(sname)->lock, \
+ &(sname)->count, -1, 1)
+
+/*
+ * Struct lockref helper functions
+ */
+/*
+ * lockref_get - increments reference count
+ * @lockcnt: pointer to struct lockref structure
+ */
+void
+lockref_get(struct lockref *lockcnt)
+{
+ if (LOCKREF_GET(lockcnt, lock, refcnt))
+ return;
+ spin_lock(&lockcnt->lock);
+ lockcnt->refcnt++;
+ spin_unlock(&lockcnt->lock);
+}
+EXPORT_SYMBOL(lockref_get);
+
+/*
+ * lockref_get_not_zero - increments count unless the count is 0
+ * @lockcnt: pointer to struct lockref structure
+ * Return: 1 if count updated successfully or 0 if count is 0 and lock taken
+ */
+int
+lockref_get_not_zero(struct lockref *lockcnt)
+{
+ return LOCKREF_GET0(lockcnt, lock, refcnt);
+}
+EXPORT_SYMBOL(lockref_get_not_zero);
+
+/*
+ * lockref_put - decrements count unless the count <= 1
+ * @lockcnt: pointer to struct lockref structure
+ * Return: 1 if count updated successfully or 0 if count <= 1
+ */
+int
+lockref_put(struct lockref *lockcnt)
+{
+ return LOCKREF_PUT(lockcnt, lock, refcnt);
+}
+EXPORT_SYMBOL(lockref_put);
+
+/*
+ * lockref_put_or_locked - decrements count unless the count is <= 1
+ * otherwise, the lock will be taken
+ * @lockcnt: pointer to struct lockref structure
+ * Return: 1 if count updated successfully or 0 if count <= 1 and lock taken
+ */
+int
+lockref_put_or_locked(struct lockref *lockcnt)
+{
+ if (LOCKREF_PUT(lockcnt, lock, refcnt))
+ return 1;
+ spin_lock(&lockcnt->lock);
+ return 0;
+}
+EXPORT_SYMBOL(lockref_put_or_locked);
--
1.7.1
* [PATCH v3 02/25] spinlock: Enable x86 architecture to do lockless refcount update
2013-07-03 20:18 [PATCH v3 00/25] Lockless update of reference count protected by spinlock Waiman Long
2013-07-03 20:18 ` [PATCH v3 01/25] spinlock: A new lockref structure for lockless update of refcount Waiman Long
@ 2013-07-03 20:19 ` Waiman Long
2013-07-03 20:19 ` [PATCH v3 03/25] dcache: Enable lockless update of d_count in dentry structure Waiman Long
2 siblings, 0 replies; 6+ messages in thread
From: Waiman Long @ 2013-07-03 20:19 UTC (permalink / raw)
To: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
Thomas Gleixner
Cc: Waiman Long, linux-fsdevel, linux-kernel, Peter Zijlstra,
Steven Rostedt, Linus Torvalds, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J
There are two steps to enable an architecture to do lockless
reference count updates:
1. Define the ARCH_SPINLOCK_REFCOUNT config parameter in its Kconfig
file.
2. Add a <asm/spinlock_refcount.h> architecture specific header file.
This patch does both for the x86 architecture, which simply uses the
generic version.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
arch/x86/Kconfig | 3 +++
arch/x86/include/asm/spinlock_refcount.h | 1 +
2 files changed, 4 insertions(+), 0 deletions(-)
create mode 100644 arch/x86/include/asm/spinlock_refcount.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index fe120da..649ed4b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -253,6 +253,9 @@ config ARCH_CPU_PROBE_RELEASE
config ARCH_SUPPORTS_UPROBES
def_bool y
+config ARCH_SPINLOCK_REFCOUNT
+ def_bool y
+
source "init/Kconfig"
source "kernel/Kconfig.freezer"
diff --git a/arch/x86/include/asm/spinlock_refcount.h b/arch/x86/include/asm/spinlock_refcount.h
new file mode 100644
index 0000000..ab6224f
--- /dev/null
+++ b/arch/x86/include/asm/spinlock_refcount.h
@@ -0,0 +1 @@
+#include <asm-generic/spinlock_refcount.h>
--
1.7.1
* [PATCH v3 03/25] dcache: Enable lockless update of d_count in dentry structure
2013-07-03 20:18 [PATCH v3 00/25] Lockless update of reference count protected by spinlock Waiman Long
2013-07-03 20:18 ` [PATCH v3 01/25] spinlock: A new lockref structure for lockless update of refcount Waiman Long
2013-07-03 20:19 ` [PATCH v3 02/25] spinlock: Enable x86 architecture to do lockless refcount update Waiman Long
@ 2013-07-03 20:19 ` Waiman Long
2013-07-03 20:37 ` Linus Torvalds
2 siblings, 1 reply; 6+ messages in thread
From: Waiman Long @ 2013-07-03 20:19 UTC (permalink / raw)
To: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
Thomas Gleixner
Cc: Waiman Long, linux-fsdevel, linux-kernel, Peter Zijlstra,
Steven Rostedt, Linus Torvalds, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J
The current code takes the dentry's d_lock whenever the d_count
reference count is updated. In reality, nothing significant happens
until d_count goes to 0 in dput(), so it is not necessary to take the
lock if the reference count won't reach 0. On the other hand, there
are cases where d_count should not be, or is not expected to be,
updated while d_lock is held by another thread.
To use the new lockref infrastructure for lockless reference count
updates, the d_lock and d_count fields of the dentry structure were
combined into a new d_lockcnt field. To access the spinlock and
reference count inside it, a number of helper functions were added to
the dcache.h header file, as sketched below.
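For easier reading of the fs/dcache.c diff below, the new dcache.h
accessors are assumed to look roughly like the following sketch. It is
inferred from how they are used in this patch; see the
include/linux/dcache.h changes in this patch for the real definitions.

	/* Sketch only - assumed mapping of dentry accessors onto lockref */
	#define d_ret_lock(dentry)	lockref_ret_lock(&(dentry)->d_lockcnt)
	#define d_ret_count(dentry)	lockref_ret_count(&(dentry)->d_lockcnt)

	static inline void d_lock(struct dentry *dentry)
	{
		lockref_lock(&dentry->d_lockcnt);
	}

	static inline void d_lock_nested(struct dentry *dentry, int subclass)
	{
		lockref_lock_nested(&dentry->d_lockcnt, subclass);
	}

	static inline int d_trylock(struct dentry *dentry)
	{
		return lockref_trylock(&dentry->d_lockcnt);
	}

	static inline void d_unlock(struct dentry *dentry)
	{
		lockref_unlock(&dentry->d_lockcnt);
	}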
The offset of the new d_lockcnt field is at byte 72 on 32-bit SMP
systems and byte 88 on 64-bit SMP systems. In both cases, the field is
8-byte aligned, and combining the lock and count into a single 8-byte
word does not introduce a hole that increases the size of the dentry
structure.
This patch has a particularly big impact on the short workload of the
AIM7 benchmark with a ramdisk filesystem. The table below shows the
improvement in JPM (jobs per minute) throughput due to this patch on
an 8-socket 80-core x86-64 system running a 3.10 kernel in 1/2/4/8-node
configurations, using numactl to restrict the execution of the workload
to certain nodes.
+-----------------+----------------+-----------------+----------+
| Configuration | Mean JPM | Mean JPM | % Change |
| | Rate w/o patch | Rate with patch | |
+-----------------+---------------------------------------------+
| | User Range 10 - 100 |
+-----------------+---------------------------------------------+
| 8 nodes, HT off | 1650355 | 5191497 | +214.6% |
| 4 nodes, HT off | 1665137 | 5204267 | +212.5% |
| 2 nodes, HT off | 1667552 | 3815637 | +128.8% |
| 1 node , HT off | 2442998 | 2352103 | -3.7% |
+-----------------+---------------------------------------------+
| | User Range 200 - 1000 |
+-----------------+---------------------------------------------+
| 8 nodes, HT off | 1008604 | 5972142 | +492.1% |
| 4 nodes, HT off | 1317284 | 7190302 | +445.8% |
| 2 nodes, HT off | 1048363 | 4516400 | +330.8% |
| 1 node , HT off | 2461802 | 2466583 | +0.2% |
+-----------------+---------------------------------------------+
| | User Range 1100 - 2000 |
+-----------------+---------------------------------------------+
| 8 nodes, HT off | 995149 | 6424182 | +545.6% |
| 4 nodes, HT off | 1313386 | 7012193 | +433.9% |
| 2 nodes, HT off | 1041411 | 4478519 | +330.0% |
| 1 node , HT off | 2511186 | 2482650 | -1.1% |
+-----------------+----------------+-----------------+----------+
It can be seen that with 20 CPUs (2 nodes) or more, this patch can
significantly improve the short workload performance. With only 1
node, the performance is similar with or without the patch. The short
workload also scales pretty well up to 4 nodes with this patch.
The following table shows the short-workload performance difference
between the original 3.10 kernel and one with the patch applied but
with the SPINLOCK_REFCOUNT config option disabled.
+-----------------+----------------+-----------------+----------+
| Configuration | Mean JPM | Mean JPM | % Change |
| | Rate w/o patch | Rate with patch | |
+-----------------+---------------------------------------------+
| | User Range 10 - 100 |
+-----------------+---------------------------------------------+
| 8 nodes, HT off | 1650355 | 1634232 | -1.0% |
| 4 nodes, HT off | 1665137 | 1675791 | +0.6% |
| 2 nodes, HT off | 1667552 | 2985552 | +79.0% |
| 1 node , HT off | 2442998 | 2396091 | -1.9% |
+-----------------+---------------------------------------------+
| | User Range 200 - 1000 |
+-----------------+---------------------------------------------+
| 8 nodes, HT off | 1008604 | 1005153 | -0.3% |
| 4 nodes, HT off | 1317284 | 1330782 | +1.0% |
| 2 nodes, HT off | 1048363 | 2056871 | +96.2% |
| 1 node , HT off | 2461802 | 2463877 | +0.1% |
+-----------------+---------------------------------------------+
| | User Range 1100 - 2000 |
+-----------------+---------------------------------------------+
| 8 nodes, HT off | 995149 | 991157 | -0.4% |
| 4 nodes, HT off | 1313386 | 1321806 | +0.6% |
| 2 nodes, HT off | 1041411 | 2032808 | +95.2% |
| 1 node , HT off | 2511186 | 2483815 | -1.1% |
+-----------------+----------------+-----------------+----------+
There are some abnormalities in the original 3.10 2-node data. Ignoring
that, the performance difference for the other node counts, if any,
is insignificant.
A perf call-graph report of the short workload at 1500 users
without the patch on the same 8-node machine indicates that about
78% of the workload's total time was spent in the _raw_spin_lock()
function, almost all of which can be attributed to the following two
kernel functions:
1. dget_parent (49.91%)
2. dput (49.89%)
The relevant perf report lines are:
+ 78.37% reaim [kernel.kallsyms] [k] _raw_spin_lock
+ 0.09% reaim [kernel.kallsyms] [k] dput
+ 0.05% reaim [kernel.kallsyms] [k] _raw_spin_lock_irq
+ 0.00% reaim [kernel.kallsyms] [k] dget_parent
With this patch installed, the new perf report lines are:
+ 19.65% reaim [kernel.kallsyms] [k] _raw_spin_lock_irqsave
+ 3.94% reaim [kernel.kallsyms] [k] _raw_spin_lock
+ 2.47% reaim [kernel.kallsyms] [k] lockref_get_not_zero
+ 0.62% reaim [kernel.kallsyms] [k] lockref_put_or_locked
+ 0.36% reaim [kernel.kallsyms] [k] dput
+ 0.31% reaim [kernel.kallsyms] [k] lockref_get
+ 0.02% reaim [kernel.kallsyms] [k] dget_parent
- 3.94% reaim [kernel.kallsyms] [k] _raw_spin_lock
- _raw_spin_lock
+ 32.86% SyS_getcwd
+ 31.99% d_path
+ 4.81% prepend_path
+ 4.14% __rcu_process_callbacks
+ 3.73% complete_walk
+ 2.31% dget_parent
+ 1.99% unlazy_walk
+ 1.44% do_anonymous_page
+ 1.22% lockref_put_or_locked
+ 1.16% sem_lock
+ 0.95% task_rq_lock
+ 0.89% selinux_inode_free_security
+ 0.89% process_backlog
+ 0.79% enqueue_to_backlog
+ 0.72% unix_dgram_sendmsg
+ 0.69% unix_stream_sendmsg
The lockref_put_or_locked() call accounted for only 1.22% of the
_raw_spin_lock time, while dget_parent() accounted for only 2.31%.
The impact of this patch on other AIM7 workloads was much more
modest. The table below shows the mean % change due to this patch on
the same 8-socket system with a 3.10 kernel.
+--------------+---------------+----------------+-----------------+
| Workload | mean % change | mean % change | mean % change |
| | 10-100 users | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| alltests | -0.2% | +0.5% | -0.3% |
| five_sec | +2.5% | -4.2% | -4.7% |
| fserver | +1.7% | +1.6% | +0.3% |
| high_systime | +0.1% | +1.4% | +5.5% |
| new_fserver | +0.4% | +1.2% | +0.3% |
| shared | +0.8% | -0.3% | 0.0% |
+--------------+---------------+----------------+-----------------+
There are slight drops in performance for the five_sec workload,
but slight increases for the high_systime workload.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
fs/dcache.c | 324 +++++++++++++++++++++++++-----------------------
include/linux/dcache.h | 52 ++++++--
2 files changed, 207 insertions(+), 169 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index f09b908..d3a1693 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -42,6 +42,9 @@
/*
* Usage:
+ * d_lock - an alias to the spinlock in d_lockcnt
+ * d_count - an alias to the reference count in d_lockcnt
+ *
* dcache->d_inode->i_lock protects:
* - i_dentry, d_alias, d_inode of aliases
* dcache_hash_bucket lock protects:
@@ -229,7 +232,7 @@ static void __d_free(struct rcu_head *head)
*/
static void d_free(struct dentry *dentry)
{
- BUG_ON(dentry->d_count);
+ BUG_ON(d_ret_count(dentry));
this_cpu_dec(nr_dentry);
if (dentry->d_op && dentry->d_op->d_release)
dentry->d_op->d_release(dentry);
@@ -250,7 +253,7 @@ static void d_free(struct dentry *dentry)
*/
static inline void dentry_rcuwalk_barrier(struct dentry *dentry)
{
- assert_spin_locked(&dentry->d_lock);
+ assert_spin_locked(&d_ret_lock(dentry));
/* Go through a barrier */
write_seqcount_barrier(&dentry->d_seq);
}
@@ -261,14 +264,14 @@ static inline void dentry_rcuwalk_barrier(struct dentry *dentry)
* and is unhashed.
*/
static void dentry_iput(struct dentry * dentry)
- __releases(dentry->d_lock)
+ __releases(d_ret_lock(dentry))
__releases(dentry->d_inode->i_lock)
{
struct inode *inode = dentry->d_inode;
if (inode) {
dentry->d_inode = NULL;
hlist_del_init(&dentry->d_alias);
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
spin_unlock(&inode->i_lock);
if (!inode->i_nlink)
fsnotify_inoderemove(inode);
@@ -277,7 +280,7 @@ static void dentry_iput(struct dentry * dentry)
else
iput(inode);
} else {
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
}
}
@@ -286,14 +289,14 @@ static void dentry_iput(struct dentry * dentry)
* d_iput() operation if defined. dentry remains in-use.
*/
static void dentry_unlink_inode(struct dentry * dentry)
- __releases(dentry->d_lock)
+ __releases(d_ret_lock(dentry))
__releases(dentry->d_inode->i_lock)
{
struct inode *inode = dentry->d_inode;
dentry->d_inode = NULL;
hlist_del_init(&dentry->d_alias);
dentry_rcuwalk_barrier(dentry);
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
spin_unlock(&inode->i_lock);
if (!inode->i_nlink)
fsnotify_inoderemove(inode);
@@ -359,12 +362,12 @@ static void dentry_lru_move_list(struct dentry *dentry, struct list_head *list)
*
* If this is the root of the dentry tree, return NULL.
*
- * dentry->d_lock and parent->d_lock must be held by caller, and are dropped by
- * d_kill.
+ * d_ret_lock(dentry) and d_ret_lock(parent) must be held by caller,
+ * and are dropped by d_kill.
*/
static struct dentry *d_kill(struct dentry *dentry, struct dentry *parent)
- __releases(dentry->d_lock)
- __releases(parent->d_lock)
+ __releases(d_ret_lock(dentry))
+ __releases(d_ret_lock(parent))
__releases(dentry->d_inode->i_lock)
{
list_del(&dentry->d_u.d_child);
@@ -374,7 +377,7 @@ static struct dentry *d_kill(struct dentry *dentry, struct dentry *parent)
*/
dentry->d_flags |= DCACHE_DENTRY_KILLED;
if (parent)
- spin_unlock(&parent->d_lock);
+ d_unlock(parent);
dentry_iput(dentry);
/*
* dentry_iput drops the locks, at which point nobody (except
@@ -386,7 +389,7 @@ static struct dentry *d_kill(struct dentry *dentry, struct dentry *parent)
/*
* Unhash a dentry without inserting an RCU walk barrier or checking that
- * dentry->d_lock is locked. The caller must take care of that, if
+ * d_ret_lock(dentry) is locked. The caller must take care of that, if
* appropriate.
*/
static void __d_shrink(struct dentry *dentry)
@@ -418,7 +421,7 @@ static void __d_shrink(struct dentry *dentry)
* d_drop() is used mainly for stuff that wants to invalidate a dentry for some
* reason (NFS timeouts or autofs deletes).
*
- * __d_drop requires dentry->d_lock.
+ * __d_drop requires d_ret_lock(dentry)
*/
void __d_drop(struct dentry *dentry)
{
@@ -431,20 +434,20 @@ EXPORT_SYMBOL(__d_drop);
void d_drop(struct dentry *dentry)
{
- spin_lock(&dentry->d_lock);
+ d_lock(dentry);
__d_drop(dentry);
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
}
EXPORT_SYMBOL(d_drop);
/*
* Finish off a dentry we've decided to kill.
- * dentry->d_lock must be held, returns with it unlocked.
+ * d_ret_lock(dentry) must be held, returns with it unlocked.
* If ref is non-zero, then decrement the refcount too.
* Returns dentry requiring refcount drop, or NULL if we're done.
*/
static inline struct dentry *dentry_kill(struct dentry *dentry, int ref)
- __releases(dentry->d_lock)
+ __releases(d_ret_lock(dentry))
{
struct inode *inode;
struct dentry *parent;
@@ -452,7 +455,7 @@ static inline struct dentry *dentry_kill(struct dentry *dentry, int ref)
inode = dentry->d_inode;
if (inode && !spin_trylock(&inode->i_lock)) {
relock:
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
cpu_relax();
return dentry; /* try again with same dentry */
}
@@ -460,14 +463,14 @@ relock:
parent = NULL;
else
parent = dentry->d_parent;
- if (parent && !spin_trylock(&parent->d_lock)) {
+ if (parent && !d_trylock(parent)) {
if (inode)
spin_unlock(&inode->i_lock);
goto relock;
}
if (ref)
- dentry->d_count--;
+ d_ret_count(dentry)--;
/*
* inform the fs via d_prune that this dentry is about to be
* unhashed and destroyed.
@@ -513,13 +516,15 @@ void dput(struct dentry *dentry)
return;
repeat:
- if (dentry->d_count == 1)
+ if (d_ret_count(dentry) == 1)
might_sleep();
- spin_lock(&dentry->d_lock);
- BUG_ON(!dentry->d_count);
- if (dentry->d_count > 1) {
- dentry->d_count--;
- spin_unlock(&dentry->d_lock);
+ if (lockref_put_or_locked(&dentry->d_lockcnt))
+ return;
+ /* dentry's lock taken */
+ BUG_ON(!d_ret_count(dentry));
+ if (d_ret_count(dentry) > 1) {
+ d_ret_count(dentry)--;
+ d_unlock(dentry);
return;
}
@@ -535,8 +540,8 @@ repeat:
dentry->d_flags |= DCACHE_REFERENCED;
dentry_lru_add(dentry);
- dentry->d_count--;
- spin_unlock(&dentry->d_lock);
+ d_ret_count(dentry)--;
+ d_unlock(dentry);
return;
kill_it:
@@ -563,9 +568,9 @@ int d_invalidate(struct dentry * dentry)
/*
* If it's already been dropped, return OK.
*/
- spin_lock(&dentry->d_lock);
+ d_lock(dentry);
if (d_unhashed(dentry)) {
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
return 0;
}
/*
@@ -573,9 +578,9 @@ int d_invalidate(struct dentry * dentry)
* to get rid of unused child entries.
*/
if (!list_empty(&dentry->d_subdirs)) {
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
shrink_dcache_parent(dentry);
- spin_lock(&dentry->d_lock);
+ d_lock(dentry);
}
/*
@@ -590,15 +595,15 @@ int d_invalidate(struct dentry * dentry)
* We also need to leave mountpoints alone,
* directory or not.
*/
- if (dentry->d_count > 1 && dentry->d_inode) {
+ if (d_ret_count(dentry) > 1 && dentry->d_inode) {
if (S_ISDIR(dentry->d_inode->i_mode) || d_mountpoint(dentry)) {
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
return -EBUSY;
}
}
__d_drop(dentry);
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
return 0;
}
EXPORT_SYMBOL(d_invalidate);
@@ -606,37 +611,41 @@ EXPORT_SYMBOL(d_invalidate);
/* This must be called with d_lock held */
static inline void __dget_dlock(struct dentry *dentry)
{
- dentry->d_count++;
+ d_ret_count(dentry)++;
}
static inline void __dget(struct dentry *dentry)
{
- spin_lock(&dentry->d_lock);
- __dget_dlock(dentry);
- spin_unlock(&dentry->d_lock);
+ lockref_get(&dentry->d_lockcnt);
}
struct dentry *dget_parent(struct dentry *dentry)
{
struct dentry *ret;
+ rcu_read_lock();
+ ret = rcu_dereference(dentry->d_parent);
+ if (lockref_get_not_zero(&ret->d_lockcnt)) {
+ rcu_read_unlock();
+ return ret;
+ }
repeat:
/*
* Don't need rcu_dereference because we re-check it was correct under
* the lock.
*/
- rcu_read_lock();
- ret = dentry->d_parent;
- spin_lock(&ret->d_lock);
+ ret = ACCESS_ONCE(dentry->d_parent);
+ d_lock(ret);
if (unlikely(ret != dentry->d_parent)) {
- spin_unlock(&ret->d_lock);
+ d_unlock(ret);
rcu_read_unlock();
+ rcu_read_lock();
goto repeat;
}
rcu_read_unlock();
- BUG_ON(!ret->d_count);
- ret->d_count++;
- spin_unlock(&ret->d_lock);
+ BUG_ON(!d_ret_count(ret));
+ d_ret_count(ret)++;
+ d_unlock(ret);
return ret;
}
EXPORT_SYMBOL(dget_parent);
@@ -664,31 +673,31 @@ static struct dentry *__d_find_alias(struct inode *inode, int want_discon)
again:
discon_alias = NULL;
hlist_for_each_entry(alias, &inode->i_dentry, d_alias) {
- spin_lock(&alias->d_lock);
+ d_lock(alias);
if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
if (IS_ROOT(alias) &&
(alias->d_flags & DCACHE_DISCONNECTED)) {
discon_alias = alias;
} else if (!want_discon) {
__dget_dlock(alias);
- spin_unlock(&alias->d_lock);
+ d_unlock(alias);
return alias;
}
}
- spin_unlock(&alias->d_lock);
+ d_unlock(alias);
}
if (discon_alias) {
alias = discon_alias;
- spin_lock(&alias->d_lock);
+ d_lock(alias);
if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
if (IS_ROOT(alias) &&
(alias->d_flags & DCACHE_DISCONNECTED)) {
__dget_dlock(alias);
- spin_unlock(&alias->d_lock);
+ d_unlock(alias);
return alias;
}
}
- spin_unlock(&alias->d_lock);
+ d_unlock(alias);
goto again;
}
return NULL;
@@ -717,16 +726,16 @@ void d_prune_aliases(struct inode *inode)
restart:
spin_lock(&inode->i_lock);
hlist_for_each_entry(dentry, &inode->i_dentry, d_alias) {
- spin_lock(&dentry->d_lock);
- if (!dentry->d_count) {
+ d_lock(dentry);
+ if (!d_ret_count(dentry)) {
__dget_dlock(dentry);
__d_drop(dentry);
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
spin_unlock(&inode->i_lock);
dput(dentry);
goto restart;
}
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
}
spin_unlock(&inode->i_lock);
}
@@ -734,13 +743,13 @@ EXPORT_SYMBOL(d_prune_aliases);
/*
* Try to throw away a dentry - free the inode, dput the parent.
- * Requires dentry->d_lock is held, and dentry->d_count == 0.
- * Releases dentry->d_lock.
+ * Requires d_ret_lock(dentry) is held, and dentry->d_count == 0.
+ * Releases d_ret_lock(dentry)
*
* This may fail if locks cannot be acquired no problem, just try again.
*/
static void try_prune_one_dentry(struct dentry *dentry)
- __releases(dentry->d_lock)
+ __releases(d_ret_lock(dentry))
{
struct dentry *parent;
@@ -763,10 +772,10 @@ static void try_prune_one_dentry(struct dentry *dentry)
/* Prune ancestors. */
dentry = parent;
while (dentry) {
- spin_lock(&dentry->d_lock);
- if (dentry->d_count > 1) {
- dentry->d_count--;
- spin_unlock(&dentry->d_lock);
+ d_lock(dentry);
+ if (d_ret_count(dentry) > 1) {
+ d_ret_count(dentry)--;
+ d_unlock(dentry);
return;
}
dentry = dentry_kill(dentry, 1);
@@ -782,9 +791,9 @@ static void shrink_dentry_list(struct list_head *list)
dentry = list_entry_rcu(list->prev, struct dentry, d_lru);
if (&dentry->d_lru == list)
break; /* empty */
- spin_lock(&dentry->d_lock);
+ d_lock(dentry);
if (dentry != list_entry(list->prev, struct dentry, d_lru)) {
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
continue;
}
@@ -793,9 +802,9 @@ static void shrink_dentry_list(struct list_head *list)
* the LRU because of laziness during lookup. Do not free
* it - just keep it off the LRU list.
*/
- if (dentry->d_count) {
+ if (d_ret_count(dentry)) {
dentry_lru_del(dentry);
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
continue;
}
@@ -833,7 +842,7 @@ relock:
struct dentry, d_lru);
BUG_ON(dentry->d_sb != sb);
- if (!spin_trylock(&dentry->d_lock)) {
+ if (!d_trylock(dentry)) {
spin_unlock(&dcache_lru_lock);
cpu_relax();
goto relock;
@@ -842,11 +851,11 @@ relock:
if (dentry->d_flags & DCACHE_REFERENCED) {
dentry->d_flags &= ~DCACHE_REFERENCED;
list_move(&dentry->d_lru, &referenced);
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
} else {
list_move_tail(&dentry->d_lru, &tmp);
dentry->d_flags |= DCACHE_SHRINK_LIST;
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
if (!--count)
break;
}
@@ -913,7 +922,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
dentry_lru_del(dentry);
__d_shrink(dentry);
- if (dentry->d_count != 0) {
+ if (d_ret_count(dentry) != 0) {
printk(KERN_ERR
"BUG: Dentry %p{i=%lx,n=%s}"
" still in use (%d)"
@@ -922,7 +931,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
dentry->d_inode ?
dentry->d_inode->i_ino : 0UL,
dentry->d_name.name,
- dentry->d_count,
+ d_ret_count(dentry),
dentry->d_sb->s_type->name,
dentry->d_sb->s_id);
BUG();
@@ -933,7 +942,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
list_del(&dentry->d_u.d_child);
} else {
parent = dentry->d_parent;
- parent->d_count--;
+ d_ret_count(parent)--;
list_del(&dentry->d_u.d_child);
}
@@ -964,7 +973,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
/*
* destroy the dentries attached to a superblock on unmounting
- * - we don't need to use dentry->d_lock because:
+ * - we don't need to use d_ret_lock(dentry) because:
* - the superblock is detached from all mountings and open files, so the
* dentry trees will not be rearranged by the VFS
* - s_umount is write-locked, so the memory pressure shrinker will ignore
@@ -981,7 +990,7 @@ void shrink_dcache_for_umount(struct super_block *sb)
dentry = sb->s_root;
sb->s_root = NULL;
- dentry->d_count--;
+ d_ret_count(dentry)--;
shrink_dcache_for_umount_subtree(dentry);
while (!hlist_bl_empty(&sb->s_anon)) {
@@ -1001,8 +1010,8 @@ static struct dentry *try_to_ascend(struct dentry *old, int locked, unsigned seq
struct dentry *new = old->d_parent;
rcu_read_lock();
- spin_unlock(&old->d_lock);
- spin_lock(&new->d_lock);
+ d_unlock(old);
+ d_lock(new);
/*
* might go back up the wrong parent if we have had a rename
@@ -1011,7 +1020,7 @@ static struct dentry *try_to_ascend(struct dentry *old, int locked, unsigned seq
if (new != old->d_parent ||
(old->d_flags & DCACHE_DENTRY_KILLED) ||
(!locked && read_seqretry(&rename_lock, seq))) {
- spin_unlock(&new->d_lock);
+ d_unlock(new);
new = NULL;
}
rcu_read_unlock();
@@ -1045,7 +1054,7 @@ again:
if (d_mountpoint(parent))
goto positive;
- spin_lock(&this_parent->d_lock);
+ d_lock(this_parent);
repeat:
next = this_parent->d_subdirs.next;
resume:
@@ -1054,21 +1063,22 @@ resume:
struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
next = tmp->next;
- spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+ d_lock_nested(dentry, DENTRY_D_LOCK_NESTED);
/* Have we found a mount point ? */
if (d_mountpoint(dentry)) {
- spin_unlock(&dentry->d_lock);
- spin_unlock(&this_parent->d_lock);
+ d_unlock(dentry);
+ d_unlock(this_parent);
goto positive;
}
if (!list_empty(&dentry->d_subdirs)) {
- spin_unlock(&this_parent->d_lock);
- spin_release(&dentry->d_lock.dep_map, 1, _RET_IP_);
+ d_unlock(this_parent);
+ spin_release(&d_ret_lock(dentry).dep_map, 1, _RET_IP_);
this_parent = dentry;
- spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_);
+ spin_acquire(&d_ret_lock(this_parent).dep_map,
+ 0, 1, _RET_IP_);
goto repeat;
}
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
}
/*
* All done at this level ... ascend and resume the search.
@@ -1081,7 +1091,7 @@ resume:
next = child->d_u.d_child.next;
goto resume;
}
- spin_unlock(&this_parent->d_lock);
+ d_unlock(this_parent);
if (!locked && read_seqretry(&rename_lock, seq))
goto rename_retry;
if (locked)
@@ -1128,7 +1138,7 @@ static int select_parent(struct dentry *parent, struct list_head *dispose)
seq = read_seqbegin(&rename_lock);
again:
this_parent = parent;
- spin_lock(&this_parent->d_lock);
+ d_lock(this_parent);
repeat:
next = this_parent->d_subdirs.next;
resume:
@@ -1137,7 +1147,7 @@ resume:
struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
next = tmp->next;
- spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+ d_lock_nested(dentry, DENTRY_D_LOCK_NESTED);
/*
* move only zero ref count dentries to the dispose list.
@@ -1147,7 +1157,7 @@ resume:
* loop in shrink_dcache_parent() might not make any progress
* and loop forever.
*/
- if (dentry->d_count) {
+ if (d_ret_count(dentry)) {
dentry_lru_del(dentry);
} else if (!(dentry->d_flags & DCACHE_SHRINK_LIST)) {
dentry_lru_move_list(dentry, dispose);
@@ -1160,7 +1170,7 @@ resume:
* the rest.
*/
if (found && need_resched()) {
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
goto out;
}
@@ -1168,14 +1178,15 @@ resume:
* Descend a level if the d_subdirs list is non-empty.
*/
if (!list_empty(&dentry->d_subdirs)) {
- spin_unlock(&this_parent->d_lock);
- spin_release(&dentry->d_lock.dep_map, 1, _RET_IP_);
+ d_unlock(this_parent);
+ spin_release(&d_ret_lock(dentry).dep_map, 1, _RET_IP_);
this_parent = dentry;
- spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_);
+ spin_acquire(&d_ret_lock(this_parent).dep_map,
+ 0, 1, _RET_IP_);
goto repeat;
}
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
}
/*
* All done at this level ... ascend and resume the search.
@@ -1189,7 +1200,7 @@ resume:
goto resume;
}
out:
- spin_unlock(&this_parent->d_lock);
+ d_unlock(this_parent);
if (!locked && read_seqretry(&rename_lock, seq))
goto rename_retry;
if (locked)
@@ -1269,9 +1280,9 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
smp_wmb();
dentry->d_name.name = dname;
- dentry->d_count = 1;
+ d_ret_count(dentry) = 1;
dentry->d_flags = 0;
- spin_lock_init(&dentry->d_lock);
+ spin_lock_init(&d_ret_lock(dentry));
seqcount_init(&dentry->d_seq);
dentry->d_inode = NULL;
dentry->d_parent = dentry;
@@ -1305,7 +1316,7 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
if (!dentry)
return NULL;
- spin_lock(&parent->d_lock);
+ d_lock(parent);
/*
* don't need child lock because it is not subject
* to concurrency here
@@ -1313,7 +1324,7 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
__dget_dlock(parent);
dentry->d_parent = parent;
list_add(&dentry->d_u.d_child, &parent->d_subdirs);
- spin_unlock(&parent->d_lock);
+ d_unlock(parent);
return dentry;
}
@@ -1368,7 +1379,7 @@ EXPORT_SYMBOL(d_set_d_op);
static void __d_instantiate(struct dentry *dentry, struct inode *inode)
{
- spin_lock(&dentry->d_lock);
+ d_lock(dentry);
if (inode) {
if (unlikely(IS_AUTOMOUNT(inode)))
dentry->d_flags |= DCACHE_NEED_AUTOMOUNT;
@@ -1376,7 +1387,7 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
}
dentry->d_inode = inode;
dentry_rcuwalk_barrier(dentry);
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
fsnotify_d_instantiate(dentry, inode);
}
@@ -1438,7 +1449,7 @@ static struct dentry *__d_instantiate_unique(struct dentry *entry,
hlist_for_each_entry(alias, &inode->i_dentry, d_alias) {
/*
- * Don't need alias->d_lock here, because aliases with
+ * Don't need d_ret_lock(alias) here, because aliases with
* d_parent == entry->d_parent are not subject to name or
* parent changes, because the parent inode i_mutex is held.
*/
@@ -1576,14 +1587,14 @@ struct dentry *d_obtain_alias(struct inode *inode)
}
/* attach a disconnected dentry */
- spin_lock(&tmp->d_lock);
+ d_lock(tmp);
tmp->d_inode = inode;
tmp->d_flags |= DCACHE_DISCONNECTED;
hlist_add_head(&tmp->d_alias, &inode->i_dentry);
hlist_bl_lock(&tmp->d_sb->s_anon);
hlist_bl_add_head(&tmp->d_hash, &tmp->d_sb->s_anon);
hlist_bl_unlock(&tmp->d_sb->s_anon);
- spin_unlock(&tmp->d_lock);
+ d_unlock(tmp);
spin_unlock(&inode->i_lock);
security_d_instantiate(tmp, inode);
@@ -1946,7 +1957,7 @@ struct dentry *__d_lookup(const struct dentry *parent, const struct qstr *name)
if (dentry->d_name.hash != hash)
continue;
- spin_lock(&dentry->d_lock);
+ d_lock(dentry);
if (dentry->d_parent != parent)
goto next;
if (d_unhashed(dentry))
@@ -1970,12 +1981,12 @@ struct dentry *__d_lookup(const struct dentry *parent, const struct qstr *name)
goto next;
}
- dentry->d_count++;
+ d_ret_count(dentry)++;
found = dentry;
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
break;
next:
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
}
rcu_read_unlock();
@@ -2021,17 +2032,17 @@ int d_validate(struct dentry *dentry, struct dentry *dparent)
{
struct dentry *child;
- spin_lock(&dparent->d_lock);
+ d_lock(dparent);
list_for_each_entry(child, &dparent->d_subdirs, d_u.d_child) {
if (dentry == child) {
- spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+ d_lock_nested(dentry, DENTRY_D_LOCK_NESTED);
__dget_dlock(dentry);
- spin_unlock(&dentry->d_lock);
- spin_unlock(&dparent->d_lock);
+ d_unlock(dentry);
+ d_unlock(dparent);
return 1;
}
}
- spin_unlock(&dparent->d_lock);
+ d_unlock(dparent);
return 0;
}
@@ -2066,12 +2077,12 @@ void d_delete(struct dentry * dentry)
* Are we the only user?
*/
again:
- spin_lock(&dentry->d_lock);
+ d_lock(dentry);
inode = dentry->d_inode;
isdir = S_ISDIR(inode->i_mode);
- if (dentry->d_count == 1) {
+ if (d_ret_count(dentry) == 1) {
if (!spin_trylock(&inode->i_lock)) {
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
cpu_relax();
goto again;
}
@@ -2084,7 +2095,7 @@ again:
if (!d_unhashed(dentry))
__d_drop(dentry);
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
fsnotify_nameremove(dentry, isdir);
}
@@ -2113,9 +2124,9 @@ static void _d_rehash(struct dentry * entry)
void d_rehash(struct dentry * entry)
{
- spin_lock(&entry->d_lock);
+ d_lock(entry);
_d_rehash(entry);
- spin_unlock(&entry->d_lock);
+ d_unlock(entry);
}
EXPORT_SYMBOL(d_rehash);
@@ -2138,11 +2149,11 @@ void dentry_update_name_case(struct dentry *dentry, struct qstr *name)
BUG_ON(!mutex_is_locked(&dentry->d_parent->d_inode->i_mutex));
BUG_ON(dentry->d_name.len != name->len); /* d_lookup gives this */
- spin_lock(&dentry->d_lock);
+ d_lock(dentry);
write_seqcount_begin(&dentry->d_seq);
memcpy((unsigned char *)dentry->d_name.name, name->name, name->len);
write_seqcount_end(&dentry->d_seq);
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
}
EXPORT_SYMBOL(dentry_update_name_case);
@@ -2190,27 +2201,27 @@ static void switch_names(struct dentry *dentry, struct dentry *target)
static void dentry_lock_for_move(struct dentry *dentry, struct dentry *target)
{
/*
- * XXXX: do we really need to take target->d_lock?
+ * XXXX: do we really need to take d_ret_lock(target)?
*/
if (IS_ROOT(dentry) || dentry->d_parent == target->d_parent)
- spin_lock(&target->d_parent->d_lock);
+ d_lock(target->d_parent);
else {
if (d_ancestor(dentry->d_parent, target->d_parent)) {
- spin_lock(&dentry->d_parent->d_lock);
- spin_lock_nested(&target->d_parent->d_lock,
+ d_lock(dentry->d_parent);
+ d_lock_nested(target->d_parent,
DENTRY_D_LOCK_NESTED);
} else {
- spin_lock(&target->d_parent->d_lock);
- spin_lock_nested(&dentry->d_parent->d_lock,
+ d_lock(target->d_parent);
+ d_lock_nested(dentry->d_parent,
DENTRY_D_LOCK_NESTED);
}
}
if (target < dentry) {
- spin_lock_nested(&target->d_lock, 2);
- spin_lock_nested(&dentry->d_lock, 3);
+ d_lock_nested(target, 2);
+ d_lock_nested(dentry, 3);
} else {
- spin_lock_nested(&dentry->d_lock, 2);
- spin_lock_nested(&target->d_lock, 3);
+ d_lock_nested(dentry, 2);
+ d_lock_nested(target, 3);
}
}
@@ -2218,9 +2229,9 @@ static void dentry_unlock_parents_for_move(struct dentry *dentry,
struct dentry *target)
{
if (target->d_parent != dentry->d_parent)
- spin_unlock(&dentry->d_parent->d_lock);
+ d_unlock(dentry->d_parent);
if (target->d_parent != target)
- spin_unlock(&target->d_parent->d_lock);
+ d_unlock(target->d_parent);
}
/*
@@ -2294,9 +2305,9 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
write_seqcount_end(&dentry->d_seq);
dentry_unlock_parents_for_move(dentry, target);
- spin_unlock(&target->d_lock);
+ d_unlock(target);
fsnotify_d_move(dentry);
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
}
/*
@@ -2378,7 +2389,7 @@ out_err:
/*
* Prepare an anonymous dentry for life in the superblock's dentry tree as a
* named dentry in place of the dentry to be replaced.
- * returns with anon->d_lock held!
+ * returns with d_ret_lock(anon) held!
*/
static void __d_materialise_dentry(struct dentry *dentry, struct dentry *anon)
{
@@ -2403,9 +2414,9 @@ static void __d_materialise_dentry(struct dentry *dentry, struct dentry *anon)
write_seqcount_end(&anon->d_seq);
dentry_unlock_parents_for_move(anon, dentry);
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
- /* anon->d_lock still locked, returns locked */
+ /* d_ret_lock(anon) still locked, returns locked */
anon->d_flags &= ~DCACHE_DISCONNECTED;
}
@@ -2480,10 +2491,10 @@ struct dentry *d_materialise_unique(struct dentry *dentry, struct inode *inode)
else
BUG_ON(!d_unhashed(actual));
- spin_lock(&actual->d_lock);
+ d_lock(actual);
found:
_d_rehash(actual);
- spin_unlock(&actual->d_lock);
+ d_unlock(actual);
spin_unlock(&inode->i_lock);
out_nolock:
if (actual == dentry) {
@@ -2544,9 +2555,9 @@ static int prepend_path(const struct path *path,
}
parent = dentry->d_parent;
prefetch(parent);
- spin_lock(&dentry->d_lock);
+ d_lock(dentry);
error = prepend_name(buffer, buflen, &dentry->d_name);
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
if (!error)
error = prepend(buffer, buflen, "/", 1);
if (error)
@@ -2744,9 +2755,9 @@ static char *__dentry_path(struct dentry *dentry, char *buf, int buflen)
int error;
prefetch(parent);
- spin_lock(&dentry->d_lock);
+ d_lock(dentry);
error = prepend_name(&end, &buflen, &dentry->d_name);
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
if (error != 0 || prepend(&end, &buflen, "/", 1) != 0)
goto Elong;
@@ -2914,7 +2925,7 @@ void d_genocide(struct dentry *root)
seq = read_seqbegin(&rename_lock);
again:
this_parent = root;
- spin_lock(&this_parent->d_lock);
+ d_lock(this_parent);
repeat:
next = this_parent->d_subdirs.next;
resume:
@@ -2923,29 +2934,30 @@ resume:
struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
next = tmp->next;
- spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+ d_lock_nested(dentry, DENTRY_D_LOCK_NESTED);
if (d_unhashed(dentry) || !dentry->d_inode) {
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
continue;
}
if (!list_empty(&dentry->d_subdirs)) {
- spin_unlock(&this_parent->d_lock);
- spin_release(&dentry->d_lock.dep_map, 1, _RET_IP_);
+ d_unlock(this_parent);
+ spin_release(&d_ret_lock(dentry).dep_map, 1, _RET_IP_);
this_parent = dentry;
- spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_);
+ spin_acquire(&d_ret_lock(this_parent).dep_map,
+ 0, 1, _RET_IP_);
goto repeat;
}
if (!(dentry->d_flags & DCACHE_GENOCIDE)) {
dentry->d_flags |= DCACHE_GENOCIDE;
- dentry->d_count--;
+ d_ret_count(dentry)--;
}
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
}
if (this_parent != root) {
struct dentry *child = this_parent;
if (!(this_parent->d_flags & DCACHE_GENOCIDE)) {
this_parent->d_flags |= DCACHE_GENOCIDE;
- this_parent->d_count--;
+ d_ret_count(this_parent)--;
}
this_parent = try_to_ascend(this_parent, locked, seq);
if (!this_parent)
@@ -2953,7 +2965,7 @@ resume:
next = child->d_u.d_child.next;
goto resume;
}
- spin_unlock(&this_parent->d_lock);
+ d_unlock(this_parent);
if (!locked && read_seqretry(&rename_lock, seq))
goto rename_retry;
if (locked)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 1a6bb81..52af188 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -9,6 +9,7 @@
#include <linux/seqlock.h>
#include <linux/cache.h>
#include <linux/rcupdate.h>
+#include <linux/spinlock_refcount.h>
struct nameidata;
struct path;
@@ -112,8 +113,7 @@ struct dentry {
unsigned char d_iname[DNAME_INLINE_LEN]; /* small names */
/* Ref lookup also touches following */
- unsigned int d_count; /* protected by d_lock */
- spinlock_t d_lock; /* per dentry lock */
+ struct lockref d_lockcnt; /* per dentry lock & count */
const struct dentry_operations *d_op;
struct super_block *d_sb; /* The root of the dentry tree */
unsigned long d_time; /* used by d_revalidate */
@@ -132,7 +132,7 @@ struct dentry {
};
/*
- * dentry->d_lock spinlock nesting subclasses:
+ * d_ret_lock(dentry) spinlock nesting subclasses:
*
* 0: normal
* 1: nested
@@ -303,6 +303,10 @@ extern struct dentry *__d_lookup_rcu(const struct dentry *parent,
const struct qstr *name,
unsigned *seq, struct inode *inode);
+/* Return the embedded spinlock and reference count */
+#define d_ret_lock(dentry) lockref_ret_lock(&(dentry)->d_lockcnt)
+#define d_ret_count(dentry) lockref_ret_count(&(dentry)->d_lockcnt)
+
/**
* __d_rcu_to_refcount - take a refcount on dentry if sequence check is ok
* @dentry: dentry to take a ref on
@@ -316,10 +320,10 @@ static inline int __d_rcu_to_refcount(struct dentry *dentry, unsigned seq)
{
int ret = 0;
- assert_spin_locked(&dentry->d_lock);
+ assert_spin_locked(&d_ret_lock(dentry));
if (!read_seqcount_retry(&dentry->d_seq, seq)) {
ret = 1;
- dentry->d_count++;
+ d_ret_count(dentry)++;
}
return ret;
@@ -342,6 +346,31 @@ extern char *dentry_path(struct dentry *, char *, int);
/* Allocation counts.. */
/**
+ * d_lock, d_lock_nested, d_trylock, d_unlock
+ * - lock and unlock the embedding spinlock
+ * @dentry: dentry to be locked or unlocked
+ */
+static inline void d_lock(struct dentry *dentry)
+{
+ lockref_lock(&dentry->d_lockcnt);
+}
+
+static inline void d_lock_nested(struct dentry *dentry, int subclass)
+{
+ lockref_lock_nested(&dentry->d_lockcnt, subclass);
+}
+
+static inline int d_trylock(struct dentry *dentry)
+{
+ return lockref_trylock(&dentry->d_lockcnt);
+}
+
+static inline void d_unlock(struct dentry *dentry)
+{
+ lockref_unlock(&dentry->d_lockcnt);
+}
+
+/**
* dget, dget_dlock - get a reference to a dentry
* @dentry: dentry to get a reference to
*
@@ -352,17 +381,14 @@ extern char *dentry_path(struct dentry *, char *, int);
static inline struct dentry *dget_dlock(struct dentry *dentry)
{
if (dentry)
- dentry->d_count++;
+ d_ret_count(dentry)++;
return dentry;
}
static inline struct dentry *dget(struct dentry *dentry)
{
- if (dentry) {
- spin_lock(&dentry->d_lock);
- dget_dlock(dentry);
- spin_unlock(&dentry->d_lock);
- }
+ if (dentry)
+ lockref_get(&dentry->d_lockcnt);
return dentry;
}
@@ -392,9 +418,9 @@ static inline int cant_mount(struct dentry *dentry)
static inline void dont_mount(struct dentry *dentry)
{
- spin_lock(&dentry->d_lock);
+ d_lock(dentry);
dentry->d_flags |= DCACHE_CANT_MOUNT;
- spin_unlock(&dentry->d_lock);
+ d_unlock(dentry);
}
extern void dput(struct dentry *);
--
1.7.1
* Re: [PATCH v3 03/25] dcache: Enable lockless update of d_count in dentry structure
2013-07-03 20:19 ` [PATCH v3 03/25] dcache: Enable lockless update of d_count in dentry structure Waiman Long
@ 2013-07-03 20:37 ` Linus Torvalds
2013-07-03 20:52 ` Waiman Long
0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2013-07-03 20:37 UTC (permalink / raw)
To: Waiman Long
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
Thomas Gleixner, linux-fsdevel, Linux Kernel Mailing List,
Peter Zijlstra, Steven Rostedt, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J
This patch grew a lot, and that seems to be mainly because of bad reasons.
I'd suggest dropping the whole
"lockref_ret_count()"/"lockref_ret_lock()" helpers, which cause all
the annoyance, and just make people use the members directly.
Then, just do
#define d_lock d_lockref.lockref_lock
or similar, so that all the existing code just continues to work,
without the need for the syntactic changes:
- spin_lock(&dentry->d_lock);
+ d_lock(dentry);
For d_count, we probably do need to have the wrapper macro:
#define dentry_count(dentry) ((dentry)->d_lockref.lockref_count)
and change the existing users of "dentry->d_count" to use that, but
there are fewer of those than there are of people taking the dentry
lock. And most of them are in fs/dcache.c and would be affected by
this set of patches anyway.
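Something like this, roughly (the lockref member names here are purely
illustrative and may not match what patch 1 actually uses):
	struct lockref {
		spinlock_t	lockref_lock;
		unsigned int	lockref_count;
	};
	struct dentry {
		/* ... other fields unchanged ... */
		struct lockref	d_lockref;	/* replaces d_lock + d_count */
	};
	/* existing spin_lock(&dentry->d_lock) callers compile unchanged */
	#define d_lock		d_lockref.lockref_lock
	/* d_count users get converted to this wrapper instead */
	#define dentry_count(dentry)	((dentry)->d_lockref.lockref_count)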
Linus
* Re: [PATCH v3 03/25] dcache: Enable lockless update of d_count in dentry structure
2013-07-03 20:37 ` Linus Torvalds
@ 2013-07-03 20:52 ` Waiman Long
0 siblings, 0 replies; 6+ messages in thread
From: Waiman Long @ 2013-07-03 20:52 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
Thomas Gleixner, linux-fsdevel, Linux Kernel Mailing List,
Peter Zijlstra, Steven Rostedt, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J
On 07/03/2013 04:37 PM, Linus Torvalds wrote:
> This patch grew a lot, and that seems to be mainly because of bad reasons.
That is the main reason why I chose to implement it the way I did in
my previous version. Adding one more level of indirection to access
d_lock and d_count means I have to change a lot more files.
> I'd suggest dropping the whole
> "lockref_ret_count()"/"lockref_ret_lock()" helpers, which cause all
> the annoyance, and just make people use the members directly.
Yes, I can do that. They are not used in that many places.
> Then, just do
>
> #define d_lock d_lockref.lockref_lock
>
> or similar, so that all the existing code just continues to work,
> without the need for the syntactic changes:
>
> - spin_lock(&dentry->d_lock);
> + d_lock(dentry);
I had been thinking about that. The d_lock mapping should be pretty safe,
as I didn't see that name used anywhere else. I didn't do it because I was
afraid people might say that a macro mapping like this is not a good idea.
Doing it that way would shrink the patch considerably.
> For d_count, we probably do need to have the wrapper macro:
>
> #define dentry_count(dentry) ((dentry)->d_lockref.lockref_count)
>
> and change the existing users of "dentry->d_count" to use that, but
> there are fewer of those than there are of people taking the dentry
> lock. And most of them are in fs/dcache.c and would be affected by
> this set of patches anyway.
The d_count name is not unique to the dentry structure, so files that
access d_count have to be modified explicitly.
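For example (this is just a made-up illustration, not real kernel code),
some other subsystem could have its own, unrelated d_count member, and a
bare "#define d_count ..." in dcache.h would silently rewrite it:
	struct foo_stats {
		unsigned int	d_count;	/* hypothetical, unrelated counter */
	};
That is why each d_count call site needs an explicit wrapper such as
dentry_count(dentry) instead of a plain macro mapping.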
I will see if there is more feedback and send an updated patchset by
the end of this week or early next week.
Regards,
Longman
Thread overview: 6+ messages
2013-07-03 20:18 [PATCH v3 00/25] Lockless update of reference count protected by spinlock Waiman Long
2013-07-03 20:18 ` [PATCH v3 01/25] spinlock: A new lockref structure for lockless update of refcount Waiman Long
2013-07-03 20:19 ` [PATCH v3 02/25] spinlock: Enable x86 architecture to do lockless refcount update Waiman Long
2013-07-03 20:19 ` [PATCH v3 03/25] dcache: Enable lockless update of d_count in dentry structure Waiman Long
2013-07-03 20:37 ` Linus Torvalds
2013-07-03 20:52 ` Waiman Long