* [PATCH v2 0/2] Lockless update of reference count protected by spinlock
@ 2013-06-26 17:43 Waiman Long
  2013-06-26 17:43 ` [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount Waiman Long
  2013-06-26 17:43 ` [PATCH v2 2/2] dcache: Locklessly update d_count whenever possible Waiman Long
  0 siblings, 2 replies; 28+ messages in thread
From: Waiman Long @ 2013-06-26 17:43 UTC (permalink / raw)
  To: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar
  Cc: Waiman Long, linux-fsdevel, linux-kernel, Linus Torvalds,
	Benjamin Herrenschmidt, Andi Kleen, Chandramouleeswaran, Aswin,
	Norton, Scott J

v1->v2:
 - Add one more layer of indirection to LOCK_WITH_REFCOUNT macro.
 - Add __LINUX_SPINLOCK_REFCOUNT_H protection to spinlock_refcount.h.
 - Add some generic get/put macros into spinlock_refcount.h.

This patchset supports a generic mechanism to atomically update
a reference count that is protected by a spinlock without actually
acquiring the lock itself. If the update doesn't succeed, the caller
will have to acquire the lock and update the reference count the
old way.  This helps in situations where there is a lot of spinlock
contention because of frequent reference count updates.

The new mechanism is designed to be easily retrofitted into existing
code with minimal changes. Both the spinlock and the reference count
can still be accessed in the same way as before.

The d_lock and d_count fields of struct dentry in dcache.h were
modified to use the new mechanism. This serves as an example of how
to convert an existing spinlock and reference count combo to the new
way of locklessly updating the reference count.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>

Waiman Long (2):
  spinlock: New spinlock_refcount.h for lockless update of refcount
  dcache: Locklessly update d_count whenever possible

 fs/dcache.c                       |   14 +++-
 include/linux/dcache.h            |    6 +-
 include/linux/spinlock_refcount.h |  177 +++++++++++++++++++++++++++++++++++++
 include/linux/spinlock_types.h    |   19 ++++
 4 files changed, 212 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/spinlock_refcount.h

* [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-26 17:43 [PATCH v2 0/2] Lockless update of reference count protected by spinlock Waiman Long
@ 2013-06-26 17:43 ` Waiman Long
  2013-06-26 20:17   ` Andi Kleen
                     ` (2 more replies)
  2013-06-26 17:43 ` [PATCH v2 2/2] dcache: Locklessly update d_count whenever possible Waiman Long
  1 sibling, 3 replies; 28+ messages in thread
From: Waiman Long @ 2013-06-26 17:43 UTC (permalink / raw)
  To: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar
  Cc: Waiman Long, linux-fsdevel, linux-kernel, Linus Torvalds,
	Benjamin Herrenschmidt, Andi Kleen, Chandramouleeswaran, Aswin,
	Norton, Scott J

This patch introduces a new spinlock_refcount.h header file to be
included by kernel code that wants to do lockless updates of a
reference count protected by a spinlock.

To allow lockless updates of the reference count while the lock is
not held by others, the 32-bit count and 32-bit raw spinlock can be
combined into a single 64-bit word that is updated atomically whenever
the following conditions are true:

1. The lock is not taken, i.e. spin_can_lock() returns true.
2. The value of the count isn't equal to the given non-negative
   threshold value.

To maximize the chance of doing lockless update, the inlined
__lockcnt_add_unless() function calls spin_unlock_wait() before trying
to do the update.

The new code also attempts the lockless atomic update twice before
falling back to the old code path of acquiring the lock before doing
the update. This is because a fair amount of contention remains when
only one attempt is made.

After including the header file, the LOCK_WITH_REFCOUNT() macro
should be used to define the spinlock with reference count combo in
an embedding data structure.  The __lockcnt_add_unless() inline
function can then be used to locklessly update the reference count
if the lock hasn't been acquired by others. Some predefined
lockref_*() macros that call __lockcnt_add_unless() can also be used
for this purpose.
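
For illustration, usage in an embedding data structure would look
roughly like the sketch below (the foo structure and foo_get() helper
are made-up names for this example, not part of the patch):

    struct foo {
            LOCK_WITH_REFCOUNT(lock, refcnt);  /* combined spinlock + count */
            void *data;
    };

    static void foo_get(struct foo *f)
    {
            /* Fast path: bump the count locklessly if the lock is free */
            if (lockref_get(f, lock, refcnt))
                    return;

            /* Slow path: fall back to taking the lock and updating as before */
            spin_lock(&f->lock);
            f->refcnt++;
            spin_unlock(&f->lock);
    }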

Build and boot tests of the new code and the associated dcache changes
were conducted for the following configurations:
1. x86 64-bit SMP, CONFIG_DEBUG_SPINLOCK=n
2. x86 64-bit SMP, CONFIG_DEBUG_SPINLOCK=y
3. x86 32-bit UP , CONFIG_DEBUG_SPINLOCK=n
4. x86 32-bit SMP, CONFIG_DEBUG_SPINLOCK=n
5. x86 32-bit SMP, CONFIG_DEBUG_SPINLOCK=y

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 include/linux/spinlock_refcount.h |  177 +++++++++++++++++++++++++++++++++++++
 include/linux/spinlock_types.h    |   19 ++++
 2 files changed, 196 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/spinlock_refcount.h

diff --git a/include/linux/spinlock_refcount.h b/include/linux/spinlock_refcount.h
new file mode 100644
index 0000000..0160861
--- /dev/null
+++ b/include/linux/spinlock_refcount.h
@@ -0,0 +1,177 @@
+/*
+ * Spinlock with reference count combo
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * (C) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *
+ * Authors: Waiman Long <waiman.long@hp.com>
+ */
+#ifndef __LINUX_SPINLOCK_REFCOUNT_H
+#define __LINUX_SPINLOCK_REFCOUNT_H
+
+#include <linux/spinlock.h>
+
+/*
+ * The raw __LOCK_WITH_REFCOUNT() macro defines the combined spinlock with
+ * reference count data structure to be embedded in a larger structure.
+ * With unnamed union, the lock and count names can be accessed directly if
+ * no field name is assigned to the structure. Otherwise, they will have to
+ * be accessed indirectly via the assigned field name of the combined
+ * structure.
+ *
+ * The combined data structure is 8-byte aligned. So proper placement of this
+ * structure in the larger embedding data structure is needed to ensure that
+ * there is no hole in it.
+ *
+ * @lock:   Name of the spinlock
+ * @count:  Name of the reference count
+ * @u_name: Name of combined data structure union (can be empty for unnamed
+ *	    union)
+ */
+#ifndef CONFIG_SMP
+#define __LOCK_WITH_REFCOUNT(lock, count, u_name)	\
+	unsigned int count;				\
+	spinlock_t   lock
+
+#elif defined(__SPINLOCK_HAS_REFCOUNT)
+#define __LOCK_WITH_REFCOUNT(lock, count, u_name)	\
+	union u_name {					\
+		u64		__lock_count;		\
+		spinlock_t	lock;			\
+		struct {				\
+			arch_spinlock_t __raw_lock;	\
+			unsigned int	count;		\
+		};					\
+	}
+
+#else /* __SPINLOCK_HAS_REFCOUNT */
+#define __LOCK_WITH_REFCOUNT(lock, count, u_name)	\
+	union u_name {					\
+		u64		__lock_count;		\
+		struct {				\
+			unsigned int	count;		\
+			spinlock_t	lock;		\
+		};					\
+	}
+
+#endif /* __SPINLOCK_HAS_REFCOUNT */
+
+/*
+ * The LOCK_WITH_REFCOUNT() macro create an unnamed union structure
+ */
+#define LOCK_WITH_REFCOUNT(lock, count)			\
+	__LOCK_WITH_REFCOUNT(lock, count, /**/)
+
+#ifdef CONFIG_SMP
+#define	LOCKCNT_COMBO_PTR(s)	(&(s)->__lock_count)
+
+/*
+ * Define a "union _lock_refcnt" structure to be used by the helper function
+ */
+__LOCK_WITH_REFCOUNT(lock, count, __lock_refcnt);
+
+/**
+ *
+ * __lockcnt_add_unless - atomically add the given value to the count unless
+ *			  the lock was acquired or the count equals to the
+ *			  given threshold value.
+ *
+ * @plockcnt : pointer to the combined lock and count 8-byte data
+ * @plock    : pointer to the spinlock
+ * @pcount   : pointer to the reference count
+ * @value    : value to be added
+ * @threshold: threshold value for acquiring the lock
+ * Return    : 1 if operation succeed, 0 otherwise
+ *
+ * If the lock was not acquired, __lockcnt_add_unless() atomically adds the
+ * given value to the reference count unless the given threshold is reached.
+ * If the lock was acquired or the threshold was reached, 0 is returned and
+ * the caller will have to acquire the lock and update the count accordingly
+ * (can be done in a non-atomic way).
+ */
+static inline int
+__lockcnt_add_unless(u64 *plockcnt, spinlock_t *plock, unsigned int *pcount,
+		     int value, int threshold)
+{
+	union __lock_refcnt old, new;
+	int   get_lock;
+
+	/*
+	 * Code doesn't work if raw spinlock is larger than 4 bytes
+	 * or is empty.
+	 */
+	BUG_ON((sizeof(arch_spinlock_t) > 4) || (sizeof(arch_spinlock_t) == 0));
+
+	spin_unlock_wait(plock);	/* Wait until lock is released */
+	old.__lock_count = ACCESS_ONCE(*plockcnt);
+	get_lock = ((threshold >= 0) && (old.count == threshold));
+	if (likely(!get_lock && spin_can_lock(&old.lock))) {
+		new.__lock_count = old.__lock_count;
+		new.count += value;
+		if (cmpxchg64(plockcnt, old.__lock_count, new.__lock_count)
+		    == old.__lock_count)
+			return 1;
+		cpu_relax();
+		/*
+		 * Try one more time
+		 */
+		old.__lock_count = ACCESS_ONCE(*plockcnt);
+		get_lock = ((threshold >= 0) && (old.count == threshold));
+		if (likely(!get_lock && spin_can_lock(&old.lock))) {
+			new.__lock_count = old.__lock_count;
+			new.count += value;
+			if (cmpxchg64(plockcnt, old.__lock_count,
+				      new.__lock_count) == old.__lock_count)
+				return 1;
+			cpu_relax();
+		}
+	}
+	return 0;	/* The caller will need to acquire the lock */
+}
+#else /* CONFIG_SMP */
+#define	LOCKCNT_COMBO_PTR(s)	NULL
+
+/*
+ * Just add the value as the spinlock is a no-op
+ */
+static inline int
+__lockcnt_add_unless(u64 *plockcnt, spinlock_t *plock, unsigned int *pcount,
+		     int value, int threshold)
+{
+	if ((threshold >= 0) && (*pcount == threshold))
+		return 0;
+	(*pcount) += value;
+	return 1;
+}
+#endif /* CONFIG_SMP */
+
+/*
+ * Generic macros to increment and decrement reference count
+ * @sname : pointer to structure that embeds spinlock & reference count combo
+ * @lock  : name of spinlock in structure
+ * @count : name of reference count in structure
+ *
+ * lockref_get()  - increments count unless it is locked
+ * lockref_get0() - increments count unless it is locked or count is 0
+ * lockref_put()  - decrements count unless it is locked or count is 1
+ */
+#define lockref_get(sname, lock, count) 				\
+	__lockcnt_add_unless(LOCKCNT_COMBO_PTR(sname), &(sname)->lock,	\
+			     &(sname)->count, 1, -1)
+#define lockref_get0(sname, lock, count) 				\
+	__lockcnt_add_unless(LOCKCNT_COMBO_PTR(sname), &(sname)->lock,	\
+			     &(sname)->count, 1, 0)
+#define lockref_put(sname, lock, count) 				\
+	__lockcnt_add_unless(LOCKCNT_COMBO_PTR(sname), &(sname)->lock,	\
+			     &(sname)->count, -1, 1)
+
+#endif /* __LINUX_SPINLOCK_REFCOUNT_H */
diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
index 73548eb..cc107ad 100644
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -17,8 +17,27 @@
 
 #include <linux/lockdep.h>
 
+/*
+ * The presence of either one of the CONFIG_DEBUG_SPINLOCK or
+ * CONFIG_DEBUG_LOCK_ALLOC configuration parameter will force the
+ * spinlock_t structure to be 8-byte aligned.
+ *
+ * To support the spinlock/reference count combo data type for 64-bit SMP
+ * environment with spinlock debugging turned on, the reference count has
+ * to be integrated into the spinlock_t data structure in this special case.
+ * The spinlock_t data type will be 8 bytes larger if CONFIG_GENERIC_LOCKBREAK
+ * is also defined.
+ */
+#if defined(CONFIG_64BIT) && (defined(CONFIG_DEBUG_SPINLOCK) ||\
+			      defined(CONFIG_DEBUG_LOCK_ALLOC))
+#define	__SPINLOCK_HAS_REFCOUNT	1
+#endif
+
 typedef struct raw_spinlock {
 	arch_spinlock_t raw_lock;
+#ifdef __SPINLOCK_HAS_REFCOUNT
+	unsigned int count;
+#endif
 #ifdef CONFIG_GENERIC_LOCKBREAK
 	unsigned int break_lock;
 #endif
-- 
1.7.1

* [PATCH v2 2/2] dcache: Locklessly update d_count whenever possible
  2013-06-26 17:43 [PATCH v2 0/2] Lockless update of reference count protected by spinlock Waiman Long
  2013-06-26 17:43 ` [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount Waiman Long
@ 2013-06-26 17:43 ` Waiman Long
  1 sibling, 0 replies; 28+ messages in thread
From: Waiman Long @ 2013-06-26 17:43 UTC (permalink / raw)
  To: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar
  Cc: Waiman Long, linux-fsdevel, linux-kernel, Linus Torvalds,
	Benjamin Herrenschmidt, Andi Kleen, Chandramouleeswaran, Aswin,
	Norton, Scott J

The current code takes the dentry's d_lock lock whenever the d_count
reference count is being updated. In reality, nothing big really
happens until d_count goes to 0 in dput(). So it is not necessary
to take the lock if the reference count won't go to 0. On the other
hand, there are cases where d_count should not be updated, or was not
expected to be updated, while d_lock is held by another thread.

The new mechanism for locklessly updating a reference count in a
spinlock & reference count combo is used here so that the reference
count is updated without taking the lock whenever possible.

The d_count/d_lock combo sits at byte offset 72 on 32-bit SMP systems
and at byte offset 88 on 64-bit SMP systems. In both cases it is
8-byte aligned, so combining the two fields into a single 8-byte word
does not introduce a hole that would increase the size of the dentry
structure.

This patch has a particularly big impact on the short workload of the
AIM7 benchmark with a ramdisk filesystem. The table below shows the
improvement in JPM (jobs per minute) throughput due to this patch on
an 8-socket 80-core x86-64 system with a 3.10-rc4 kernel in a
1/2/4/8 node configuration, using numactl to restrict execution of
the workload to certain nodes.

+-----------------+----------------+-----------------+----------+
|  Configuration  |    Mean JPM    |    Mean JPM     | % Change |
|                 | Rate w/o patch | Rate with patch |          |
+-----------------+---------------------------------------------+
|                 |              User Range 10 - 100            |
+-----------------+---------------------------------------------+
| 8 nodes, HT off |    1596798     |     4721367     | +195.7%  |
| 4 nodes, HT off |    1653817     |     5020983     | +203.6%  |
| 2 nodes, HT off |    3802258     |     3813229     |   +0.3%  |
| 1 node , HT off |    2403297     |     2433240     |   +1.3%  |
+-----------------+---------------------------------------------+
|                 |              User Range 200 - 1000          |
+-----------------+---------------------------------------------+
| 8 nodes, HT off |    1070992     |     5878321     | +448.9%  |
| 4 nodes, HT off |    1367668     |     7578718     | +454.1%  |
| 2 nodes, HT off |    4554370     |     4614674     |   +1.3%  |
| 1 node , HT off |    2534826     |     2540622     |   +0.2%  |
+-----------------+---------------------------------------------+
|                 |              User Range 1100 - 2000         |
+-----------------+---------------------------------------------+
| 8 nodes, HT off |    1061322     |     6397776     | +502.8%  |
| 4 nodes, HT off |    1365111     |     6980558     | +411.3%  |
| 2 nodes, HT off |    4583947     |     4637919     |   +1.2%  |
| 1 node , HT off |    2563721     |     2587611     |   +0.9%  |
+-----------------+----------------+-----------------+----------+

It can be seen that with 40 CPUs (4 nodes) or more, this patch can
significantly improve the short workload performance. With only 1 or
2 nodes, the performance is similar with or without the patch. The
short workload also scales pretty well up to 4 nodes with this patch.

A perf call-graph report of the short workload at 1500 users
without the patch on the same 8-node machine indicates that about
78% of the workload's total time was spent in the _raw_spin_lock()
function. Almost all of that time can be attributed to the following
2 kernel functions:
 1. dget_parent (49.91%)
 2. dput (49.89%)

The relevant perf report lines are:
+  78.37%    reaim  [kernel.kallsyms]     [k] _raw_spin_lock
+   0.09%    reaim  [kernel.kallsyms]     [k] dput
+   0.05%    reaim  [kernel.kallsyms]     [k] _raw_spin_lock_irq
+   0.00%    reaim  [kernel.kallsyms]     [k] dget_parent

With this patch installed, the new perf report lines are:
+  12.24%    reaim  [kernel.kallsyms]     [k] _raw_spin_lock_irqsave
+   4.82%    reaim  [kernel.kallsyms]     [k] _raw_spin_lock
+   3.26%    reaim  [kernel.kallsyms]     [k] dget_parent
+   1.08%    reaim  [kernel.kallsyms]     [k] dput

-   4.82%    reaim  [kernel.kallsyms]     [k] _raw_spin_lock
   - _raw_spin_lock
      + 34.41% d_path
      + 33.22% SyS_getcwd
      + 4.38% prepend_path
      + 3.71% dget_parent
      + 2.54% inet_twsk_schedule
      + 2.44% complete_walk
      + 2.01% __rcu_process_callbacks
      + 1.78% dput
      + 1.36% unlazy_walk
      + 1.18% do_anonymous_page
      + 1.10% sem_lock
      + 0.82% process_backlog
      + 0.78% task_rq_lock
      + 0.67% selinux_inode_free_security
      + 0.60% unix_dgram_sendmsg
      + 0.54% enqueue_to_backlog

dput used up only 1.78% of the _raw_spin_lock time while dget_parent
used only 3.71%. The time spent in dput and dget_parent did increase
because of busy waiting for the lock to be released as well as the
overhead of the cmpxchg operations.

The impact of this patch on other AIM7 workloads was much more
modest.  The table below shows the mean %change due to this patch on
the same 8-socket system with a 3.10-rc4 kernel.

+--------------+---------------+----------------+-----------------+
|   Workload   | mean % change | mean % change  | mean % change   |
|              | 10-100 users  | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| alltests     |      0.0%     |     -0.8%      |     +0.1%       |
| five_sec     |     -2.4%     |     +1.7%      |     -0.1%       |
| fserver      |     -1.8%     |     -2.4%      |     -2.1%       |
| high_systime |     +0.1%     |     +0.7%      |     +2.1%       |
| new_fserver  |     +0.2%     |     -1.7%      |     -0.9%       |
| shared       |     -0.1%     |      0.0%      |     -0.1%       |
+--------------+---------------+----------------+-----------------+

There are slight drops in performance for the fserver and new_fserver
workloads, but a slight increase in the high_systime workload.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 fs/dcache.c            |   14 ++++++++++++--
 include/linux/dcache.h |    6 ++++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index f09b908..1f8f78d 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -515,6 +515,8 @@ void dput(struct dentry *dentry)
 repeat:
 	if (dentry->d_count == 1)
 		might_sleep();
+	else if (lockref_put(dentry, d_lock, d_count))
+		return;
 	spin_lock(&dentry->d_lock);
 	BUG_ON(!dentry->d_count);
 	if (dentry->d_count > 1) {
@@ -611,6 +613,8 @@ static inline void __dget_dlock(struct dentry *dentry)
 
 static inline void __dget(struct dentry *dentry)
 {
+	if (lockref_get(dentry, d_lock, d_count))
+		return;
 	spin_lock(&dentry->d_lock);
 	__dget_dlock(dentry);
 	spin_unlock(&dentry->d_lock);
@@ -620,17 +624,23 @@ struct dentry *dget_parent(struct dentry *dentry)
 {
 	struct dentry *ret;
 
+	rcu_read_lock();
+	ret = rcu_dereference(dentry->d_parent);
+	if (lockref_get0(ret, d_lock, d_count)) {
+		rcu_read_unlock();
+		return ret;
+	}
 repeat:
 	/*
 	 * Don't need rcu_dereference because we re-check it was correct under
 	 * the lock.
 	 */
-	rcu_read_lock();
-	ret = dentry->d_parent;
+	ret = ACCESS_ONCE(dentry->d_parent);
 	spin_lock(&ret->d_lock);
 	if (unlikely(ret != dentry->d_parent)) {
 		spin_unlock(&ret->d_lock);
 		rcu_read_unlock();
+		rcu_read_lock();
 		goto repeat;
 	}
 	rcu_read_unlock();
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 1a6bb81..35d900a 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -6,6 +6,7 @@
 #include <linux/rculist.h>
 #include <linux/rculist_bl.h>
 #include <linux/spinlock.h>
+#include <linux/spinlock_refcount.h>
 #include <linux/seqlock.h>
 #include <linux/cache.h>
 #include <linux/rcupdate.h>
@@ -112,8 +113,7 @@ struct dentry {
 	unsigned char d_iname[DNAME_INLINE_LEN];	/* small names */
 
 	/* Ref lookup also touches following */
-	unsigned int d_count;		/* protected by d_lock */
-	spinlock_t d_lock;		/* per dentry lock */
+	LOCK_WITH_REFCOUNT(d_lock, d_count);	/* per dentry lock & count */
 	const struct dentry_operations *d_op;
 	struct super_block *d_sb;	/* The root of the dentry tree */
 	unsigned long d_time;		/* used by d_revalidate */
@@ -359,6 +359,8 @@ static inline struct dentry *dget_dlock(struct dentry *dentry)
 static inline struct dentry *dget(struct dentry *dentry)
 {
 	if (dentry) {
+		if (lockref_get(dentry, d_lock, d_count))
+			return dentry;
 		spin_lock(&dentry->d_lock);
 		dget_dlock(dentry);
 		spin_unlock(&dentry->d_lock);
-- 
1.7.1

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-26 17:43 ` [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount Waiman Long
@ 2013-06-26 20:17   ` Andi Kleen
  2013-06-26 21:07     ` Waiman Long
  2013-06-26 23:06   ` Thomas Gleixner
  2013-06-29 17:45   ` Linus Torvalds
  2 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2013-06-26 20:17 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, linux-kernel, Linus Torvalds,
	Benjamin Herrenschmidt, Andi Kleen, Chandramouleeswaran, Aswin,
	Norton, Scott J

> + * The combined data structure is 8-byte aligned. So proper placement of this
> + * structure in the larger embedding data structure is needed to ensure that
> + * there is no hole in it.

On i386 u64 is only 4 bytes aligned. So you need to explicitly align
it to 8 bytes. Otherwise you risk the two members crossing a cache line,
which would be really expensive with atomics.
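
For example, something along these lines (just a sketch, shown for the
non-debug variant of the macro; __aligned(8) is the usual kernel
shorthand for the GCC aligned attribute):

    #define __LOCK_WITH_REFCOUNT(lock, count, u_name)       \
            union u_name {                                   \
                    u64             __lock_count;            \
                    struct {                                  \
                            unsigned int    count;            \
                            spinlock_t      lock;             \
                    };                                        \
            } __aligned(8)  /* keep lock + count in one cache line */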

> +	/*
> +	 * Code doesn't work if raw spinlock is larger than 4 bytes
> +	 * or is empty.
> +	 */
> +	BUG_ON((sizeof(arch_spinlock_t) > 4) || (sizeof(arch_spinlock_t) == 0));

BUILD_BUG_ON 
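
i.e. something like:

    BUILD_BUG_ON(sizeof(arch_spinlock_t) > 4 || sizeof(arch_spinlock_t) == 0);

so the size check is done at compile time rather than at runtime.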

> +
> +	spin_unlock_wait(plock);	/* Wait until lock is released */
> +	old.__lock_count = ACCESS_ONCE(*plockcnt);
> +	get_lock = ((threshold >= 0) && (old.count == threshold));
> +	if (likely(!get_lock && spin_can_lock(&old.lock))) {

What is that for? Why can't you do the CMPXCHG unconditionally?

If it's really needed, it is most likely a race?

The duplicated code should likely be an inline.

> +/*
> + * The presence of either one of the CONFIG_DEBUG_SPINLOCK or
> + * CONFIG_DEBUG_LOCK_ALLOC configuration parameter will force the
> + * spinlock_t structure to be 8-byte aligned.
> + *
> + * To support the spinlock/reference count combo data type for 64-bit SMP
> + * environment with spinlock debugging turned on, the reference count has
> + * to be integrated into the spinlock_t data structure in this special case.
> + * The spinlock_t data type will be 8 bytes larger if CONFIG_GENERIC_LOCKBREAK
> + * is also defined.

I would rather just disable the optimization when these CONFIGs are set

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-26 20:17   ` Andi Kleen
@ 2013-06-26 21:07     ` Waiman Long
  2013-06-26 21:22       ` Andi Kleen
  0 siblings, 1 reply; 28+ messages in thread
From: Waiman Long @ 2013-06-26 21:07 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, linux-kernel, Linus Torvalds,
	Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
	Norton, Scott J

On 06/26/2013 04:17 PM, Andi Kleen wrote:
>> + * The combined data structure is 8-byte aligned. So proper placement of this
>> + * structure in the larger embedding data structure is needed to ensure that
>> + * there is no hole in it.
> On i386 u64 is only 4 bytes aligned. So you need to explicitely align
> it to 8 bytes. Otherwise you risk the two members crossing a cache line, which
> would be really expensive with atomics.

Do you mean the original i386, or the i586 that is now used by most 
distributions? If it is the former, I recall that i386 is no longer 
supported.

I also looked around some existing code that uses cmpxchg64. It doesn't 
seem to use explicit alignment. I will need to investigate more to see 
if it is a real problem.
>> +	/*
>> +	 * Code doesn't work if raw spinlock is larger than 4 bytes
>> +	 * or is empty.
>> +	 */
>> +	BUG_ON((sizeof(arch_spinlock_t)>  4) || (sizeof(arch_spinlock_t) == 0));
> BUILD_BUG_ON

Thanks for the suggestion, I will make the change.

>> +
>> +	spin_unlock_wait(plock);	/* Wait until lock is released */
>> +	old.__lock_count = ACCESS_ONCE(*plockcnt);
>> +	get_lock = ((threshold>= 0)&&  (old.count == threshold));
>> +	if (likely(!get_lock&&  spin_can_lock(&old.lock))) {
> What is that for? Why can't you do the CMPXCHG unconditially ?

An unconditional CMPXCHG can be as bad as acquiring the spinlock. So we 
need to check that the conditions are right before doing an actual CMPXCHG.
> If it's really needed, it is most likely a race?

If there is a race going on between threads, the code will fall back to 
the old way of acquiring the spinlock.

> The duplicated code should be likely an inline.

The duplicated code is only used within this one function. I don't think 
an additional inline is really needed, but I can do it if other people 
also think that is a good idea.

>> +/*
>> + * The presence of either one of the CONFIG_DEBUG_SPINLOCK or
>> + * CONFIG_DEBUG_LOCK_ALLOC configuration parameter will force the
>> + * spinlock_t structure to be 8-byte aligned.
>> + *
>> + * To support the spinlock/reference count combo data type for 64-bit SMP
>> + * environment with spinlock debugging turned on, the reference count has
>> + * to be integrated into the spinlock_t data structure in this special case.
>> + * The spinlock_t data type will be 8 bytes larger if CONFIG_GENERIC_LOCKBREAK
>> + * is also defined.
> I would rather just disable the optimization when these CONFIGs are set

Looking at it from the other perspective, we may want the locking code 
to have the same behavior whether spinlock debugging is enabled or not. 
Disabling the optimization will cause the code paths to differ, which 
may not be what we want. Of course, I can change it if other people 
also think that is the right way to do it.

Regards,
Longman

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-26 21:07     ` Waiman Long
@ 2013-06-26 21:22       ` Andi Kleen
  2013-06-26 23:26         ` Waiman Long
  2013-06-26 23:27         ` Thomas Gleixner
  0 siblings, 2 replies; 28+ messages in thread
From: Andi Kleen @ 2013-06-26 21:22 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andi Kleen, Alexander Viro, Jeff Layton, Miklos Szeredi,
	Ingo Molnar, linux-fsdevel, linux-kernel, Linus Torvalds,
	Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
	Norton, Scott J

On Wed, Jun 26, 2013 at 05:07:02PM -0400, Waiman Long wrote:
> On 06/26/2013 04:17 PM, Andi Kleen wrote:
> >>+ * The combined data structure is 8-byte aligned. So proper placement of this
> >>+ * structure in the larger embedding data structure is needed to ensure that
> >>+ * there is no hole in it.
> >On i386 u64 is only 4 bytes aligned. So you need to explicitely align
> >it to 8 bytes. Otherwise you risk the two members crossing a cache line, which
> >would be really expensive with atomics.
> 
> Do you mean the original i386 or the i586 that are now used by most
> distribution now? If it is the former, I recall that i386 is now no
> longer supported.

I mean i386, as in the 32bit x86 architecture.

> 
> I also look around some existing codes that use cmpxchg64. It
> doesn't seem like they use explicit alignment. I will need more
> investigation to see if it is a real problem.

Adding the alignment is basically free. If 32bit users don't enforce
it they're likely potentially broken yes, but they may be lucky.
> >>+	get_lock = ((threshold>= 0)&&  (old.count == threshold));
> >>+	if (likely(!get_lock&&  spin_can_lock(&old.lock))) {
> >What is that for? Why can't you do the CMPXCHG unconditially ?
> 
> An unconditional CMPXCHG can be as bad as acquiring the spinlock. So
> we need to check the conditions are ready before doing an actual
> CMPXCHG.

But this isn't a cheap check. Especially spin_unlock_wait can be 
very expensive.

And all these functions have weird semantics. Perhaps just a quick
spin_is_locked.

> 
> Looking from the other perspective, we may want the locking code to
> have the same behavior whether spinlock debugging is enabled or not.
> Disabling the optimization will cause the code path to differ which
> may not be what we want. Of course, I can change it if other people
> also think it is the right way to do it.

Lock debugging already has quite different timing/lock semantics.

-Andi

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-26 17:43 ` [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount Waiman Long
  2013-06-26 20:17   ` Andi Kleen
@ 2013-06-26 23:06   ` Thomas Gleixner
  2013-06-27  0:16     ` Waiman Long
  2013-06-27  0:26     ` Waiman Long
  2013-06-29 17:45   ` Linus Torvalds
  2 siblings, 2 replies; 28+ messages in thread
From: Thomas Gleixner @ 2013-06-26 23:06 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, LKML, Linus Torvalds, Benjamin Herrenschmidt,
	Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
	Peter Zijlstra, Steven Rostedt

On Wed, 26 Jun 2013, Waiman Long wrote:

It would have been nice if you'd cc'ed people who spent a massive
amount of time working on the spinlock code. Did that now.

> +#ifndef CONFIG_SMP
> +#define __LOCK_WITH_REFCOUNT(lock, count, u_name)	\
> +	unsigned int count;				\
> +	spinlock_t   lock
> +
> +#elif defined(__SPINLOCK_HAS_REFCOUNT)
> +#define __LOCK_WITH_REFCOUNT(lock, count, u_name)	\
> +	union u_name {					\
> +		u64		__lock_count;		\
> +		spinlock_t	lock;			\
> +		struct {				\
> +			arch_spinlock_t __raw_lock;	\
> +			unsigned int	count;		\
> +		};					\
> +	}
> +
> +#else /* __SPINLOCK_HAS_REFCOUNT */
> +#define __LOCK_WITH_REFCOUNT(lock, count, u_name)	\
> +	union u_name {					\
> +		u64		__lock_count;		\
> +		struct {				\
> +			unsigned int	count;		\
> +			spinlock_t	lock;		\
> +		};					\
> +	}
> +
> +#endif /* __SPINLOCK_HAS_REFCOUNT */

This is a complete design disaster. You are violating every single
rule of proper layering. The differentiation of spinlock, raw_spinlock
and arch_spinlock is there for a reason and you just take the current
implementation and map your desired function to it. If we take this,
then we have fundamentally ruled out a change to the mappings of
spinlock, raw_spinlock and arch_spinlock. This layering is on purpose
and it took a lot of effort to make it as clean as it is now. Your
proposed change undoes that and completely wrecks preempt-rt.

What's even worse is that you go and fiddle with the basic data
structures:

> diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
> index 73548eb..cc107ad 100644
> --- a/include/linux/spinlock_types.h
> +++ b/include/linux/spinlock_types.h
> @@ -17,8 +17,27 @@
>  
>  #include <linux/lockdep.h>
>  
> +/*
> + * The presence of either one of the CONFIG_DEBUG_SPINLOCK or
> + * CONFIG_DEBUG_LOCK_ALLOC configuration parameter will force the
> + * spinlock_t structure to be 8-byte aligned.
> + *
> + * To support the spinlock/reference count combo data type for 64-bit SMP
> + * environment with spinlock debugging turned on, the reference count has
> + * to be integrated into the spinlock_t data structure in this special case.
> + * The spinlock_t data type will be 8 bytes larger if CONFIG_GENERIC_LOCKBREAK
> + * is also defined.
> + */
> +#if defined(CONFIG_64BIT) && (defined(CONFIG_DEBUG_SPINLOCK) ||\
> +			      defined(CONFIG_DEBUG_LOCK_ALLOC))
> +#define	__SPINLOCK_HAS_REFCOUNT	1
> +#endif
> +
>  typedef struct raw_spinlock {
>  	arch_spinlock_t raw_lock;
> +#ifdef __SPINLOCK_HAS_REFCOUNT
> +	unsigned int count;
> +#endif
>  #ifdef CONFIG_GENERIC_LOCKBREAK
>  	unsigned int break_lock;
>  #endif

You need to do that, because you blindly bolt your spinlock extension
into the existing code without taking the time to think about the
design implications.

Do not misunderstand me, I'm impressed by the improvement numbers, but
we need to find a way in which this can be done sanely. You really have
to understand that there is a world outside of x86 and big machines.

Let's have a look at your UP implementation. It's completely broken:

> +/*
> + * Just add the value as the spinlock is a no-op

So what protects that from being preempted? Nothing AFAICT.

What's even worse is that you break all architectures which have an
arch_spinlock type whose size is != sizeof(unsigned int), e.g. SPARC.

So what you really want is a new data structure, e.g. something like
a struct spinlock_refcount and a proper set of new accessor functions,
instead of an ad hoc interface which is designed solely for the needs
of the dcache improvements.

The first step to achieve your goal is to consolidate all the
identical copies of arch_spinlock_t and talk to the maintainers of the
architectures which have a different (i.e. !32bit) notion of the arch
specific spinlock implementation about whether it's possible to move
over to a common data struct. Once you have achieved this, you can
build your spinlock + refcount construct upon that.

Thanks,

	tglx

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-26 21:22       ` Andi Kleen
@ 2013-06-26 23:26         ` Waiman Long
  2013-06-27  1:06           ` Andi Kleen
  2013-06-26 23:27         ` Thomas Gleixner
  1 sibling, 1 reply; 28+ messages in thread
From: Waiman Long @ 2013-06-26 23:26 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, linux-kernel, Linus Torvalds,
	Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
	Norton, Scott J

On 06/26/2013 05:22 PM, Andi Kleen wrote:
> On Wed, Jun 26, 2013 at 05:07:02PM -0400, Waiman Long wrote:
>> On 06/26/2013 04:17 PM, Andi Kleen wrote:
>>>> + * The combined data structure is 8-byte aligned. So proper placement of this
>>>> + * structure in the larger embedding data structure is needed to ensure that
>>>> + * there is no hole in it.
>>> On i386 u64 is only 4 bytes aligned. So you need to explicitely align
>>> it to 8 bytes. Otherwise you risk the two members crossing a cache line, which
>>> would be really expensive with atomics.
>> Do you mean the original i386 or the i586 that are now used by most
>> distribution now? If it is the former, I recall that i386 is now no
>> longer supported.
> I mean i386, as in the 32bit x86 architecture.
>
>> I also look around some existing codes that use cmpxchg64. It
>> doesn't seem like they use explicit alignment. I will need more
>> investigation to see if it is a real problem.
> Adding the alignment is basically free. If 32bit users don't enforce
> it they're likely potentially broken yes, but they may be lucky.

You are right. I will add the 8-byte alignment attribute to the union 
definition and document that in the code.

>>>> +	get_lock = ((threshold>= 0)&&   (old.count == threshold));
>>>> +	if (likely(!get_lock&&   spin_can_lock(&old.lock))) {
>>> What is that for? Why can't you do the CMPXCHG unconditially ?
>> An unconditional CMPXCHG can be as bad as acquiring the spinlock. So
>> we need to check the conditions are ready before doing an actual
>> CMPXCHG.
> But this isn't a cheap check. Especially spin_unlock_wait can be
> very expensive.
> And all these functions have weird semantics. Perhaps just a quick
> spin_is_locked.

In the uncontended case, doing spin_unlock_wait will be similar to 
spin_can_lock. This, when combined with a cmpxchg, is still faster than 
doing 2 atomic operations in spin_lock/spin_unlock.

In the contended case, doing spin_unlock_wait won't be more expensive 
than doing spin_lock. Without it, most of the threads will fall back to 
acquiring the lock, negating any performance benefit. I had tried it 
without spin_unlock_wait and there wasn't much performance gain in the 
AIM7 short workload. BTW, spin_can_lock is just the negation of 
spin_is_locked.
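
For reference, spin_can_lock() in <linux/spinlock.h> is roughly just:

    #define raw_spin_can_lock(lock)     (!raw_spin_is_locked(lock))

    static inline int spin_can_lock(spinlock_t *lock)
    {
            return raw_spin_can_lock(&lock->rlock);
    }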

Regards,
Longman

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-26 21:22       ` Andi Kleen
  2013-06-26 23:26         ` Waiman Long
@ 2013-06-26 23:27         ` Thomas Gleixner
  1 sibling, 0 replies; 28+ messages in thread
From: Thomas Gleixner @ 2013-06-26 23:27 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Waiman Long, Alexander Viro, Jeff Layton, Miklos Szeredi,
	Ingo Molnar, linux-fsdevel, linux-kernel, Linus Torvalds,
	Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
	Norton, Scott J

On Wed, 26 Jun 2013, Andi Kleen wrote:
> On Wed, Jun 26, 2013 at 05:07:02PM -0400, Waiman Long wrote:
> > Looking from the other perspective, we may want the locking code to
> > have the same behavior whether spinlock debugging is enabled or not.
> > Disabling the optimization will cause the code path to differ which
> > may not be what we want. Of course, I can change it if other people
> > also think it is the right way to do it.
> 
> Lock debugging already has quite different timing/lock semantics.

Locks have no timing-related semantics and lock debugging does not
change the semantics of locks. Lock debugging adds overhead to the
locking functions, but it does not affect the semantics of locks.

Thanks,

	tglx

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-26 23:06   ` Thomas Gleixner
@ 2013-06-27  0:16     ` Waiman Long
  2013-06-27 14:44       ` Thomas Gleixner
  2013-06-27  0:26     ` Waiman Long
  1 sibling, 1 reply; 28+ messages in thread
From: Waiman Long @ 2013-06-27  0:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, LKML, Linus Torvalds, Benjamin Herrenschmidt,
	Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
	Peter Zijlstra, Steven Rostedt

On 06/26/2013 07:06 PM, Thomas Gleixner wrote:
> On Wed, 26 Jun 2013, Waiman Long wrote:
>
> It would have been nice if you'd had cc'ed people who spent a massive
> amount of time working on the spinlock code. Did that now.

I am sorry; I am relatively new to the Linux community, and running 
get_maintainer.pl didn't turn up your name when I modified 
spinlock_types.h. Thanks for letting me know about the shortcomings of 
my patch; this is exactly the kind of feedback I am looking for.

>> +#ifndef CONFIG_SMP
>> +#define __LOCK_WITH_REFCOUNT(lock, count, u_name)	\
>> +	unsigned int count;				\
>> +	spinlock_t   lock
>> +
>> +#elif defined(__SPINLOCK_HAS_REFCOUNT)
>> +#define __LOCK_WITH_REFCOUNT(lock, count, u_name)	\
>> +	union u_name {					\
>> +		u64		__lock_count;		\
>> +		spinlock_t	lock;			\
>> +		struct {				\
>> +			arch_spinlock_t __raw_lock;	\
>> +			unsigned int	count;		\
>> +		};					\
>> +	}
>> +
>> +#else /* __SPINLOCK_HAS_REFCOUNT */
>> +#define __LOCK_WITH_REFCOUNT(lock, count, u_name)	\
>> +	union u_name {					\
>> +		u64		__lock_count;		\
>> +		struct {				\
>> +			unsigned int	count;		\
>> +			spinlock_t	lock;		\
>> +		};					\
>> +	}
>> +
>> +#endif /* __SPINLOCK_HAS_REFCOUNT */
> This is a complete design disaster. You are violating every single
> rule of proper layering. The differentiation of spinlock, raw_spinlock
> and arch_spinlock is there for a reason and you just take the current
> implementation and map your desired function to it. If we take this,
> then we fundamentally ruled out a change to the mappings of spinlock,
> raw_spinlock and arch_spinlock. This layering is on purpose and it
> took a lot of effort to make it as clean as it is now. Your proposed
> change undoes that and completely wreckages preempt-rt.

Would you mind enlightening me on how this change will break preempt-rt?

> What's even worse is that you go and fiddle with the basic data
> structures:
>
>> diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
>> index 73548eb..cc107ad 100644
>> --- a/include/linux/spinlock_types.h
>> +++ b/include/linux/spinlock_types.h
>> @@ -17,8 +17,27 @@
>>
>>   #include<linux/lockdep.h>
>>
>> +/*
>> + * The presence of either one of the CONFIG_DEBUG_SPINLOCK or
>> + * CONFIG_DEBUG_LOCK_ALLOC configuration parameter will force the
>> + * spinlock_t structure to be 8-byte aligned.
>> + *
>> + * To support the spinlock/reference count combo data type for 64-bit SMP
>> + * environment with spinlock debugging turned on, the reference count has
>> + * to be integrated into the spinlock_t data structure in this special case.
>> + * The spinlock_t data type will be 8 bytes larger if CONFIG_GENERIC_LOCKBREAK
>> + * is also defined.
>> + */
>> +#if defined(CONFIG_64BIT)&&  (defined(CONFIG_DEBUG_SPINLOCK) ||\
>> +			      defined(CONFIG_DEBUG_LOCK_ALLOC))
>> +#define	__SPINLOCK_HAS_REFCOUNT	1
>> +#endif
>> +
>>   typedef struct raw_spinlock {
>>   	arch_spinlock_t raw_lock;
>> +#ifdef __SPINLOCK_HAS_REFCOUNT
>> +	unsigned int count;
>> +#endif
>>   #ifdef CONFIG_GENERIC_LOCKBREAK
>>   	unsigned int break_lock;
>>   #endif
> You need to do that, because you blindly bolt your spinlock extension
> into the existing code without taking the time to think about the
> design implications.

Another alternative is to disable this lockless update optimization for 
this special case. With that, I don't need to modify the raw_spinlock_t 
data structure.

> Do not misunderstand me, I'm impressed by the improvement numbers, but
> we need to find a way how this can be done sanely. You really have to
> understand that there is a world aside of x86 and big machines.

I am well aware that my knowledge of other architectures is very 
limited, and this is why we have this kind of public review process.

> Let's have a look at your UP implementation. It's completely broken:
>
>> +/*
>> + * Just add the value as the spinlock is a no-op
> So what protects that from being preempted? Nothing AFAICT.

Thanks for pointing this out. I can change it to return 0 to disable 
the optimization.
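
For illustration, the UP variant would then be reduced to roughly:

    static inline int
    __lockcnt_add_unless(u64 *plockcnt, spinlock_t *plock,
                         unsigned int *pcount, int value, int threshold)
    {
            /* No safe lockless path on UP (preemption), take the lock */
            return 0;
    }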

> What's even worse is that you break all architectures, which have an
> arch_spinlock type != sizeof(unsigned int), e.g. SPARC

Actually the code should work as long as arch_spinlock type is not 
bigger than 4 bytes because of structure field alignment.  It can be 1 
or 2 bytes. If NR_CPUS is less than 256, x86's arch_spinlock type should 
only be 2 bytes.

The only architecture that will break, according to the data in the 
respective arch/*/include/asm/spinlock_types.h files, is PA-RISC 1.x 
(it is OK in PA-RISC 2.0), whose arch_spinlock type has a size of 16 
bytes. I am not sure if 32-bit PA-RISC 1.x is still supported, but we 
can always disable the optimization for this case.

> So what you really want is a new data structure, e.g. something like
> struct spinlock_refcount() and a proper set of new accessor functions
> instead of an adhoc interface which is designed solely for the needs
> of dcache improvements.

I had thought about that. The only downside is that we need more code 
changes to make existing code use the new infrastructure. One of the 
drivers in my design was to minimize changes to existing code. 
Personally, I have no objection to doing that if others think this is 
the right way to go.

> The first step to achieve your goal is to consolidate all the
> identical copies of arch_spinlock_t and talk to the maintainers of the
> architectures which have a different (i.e. !32bit) notion of the arch
> specific spinlock implementation, whether it's possible to move over
> to a common data struct. Once you have achieved this, you can build
> your spinlock + refcount construct upon that.

As I said above, as long as the arch_spinlock type isn't bigger than 4 
bytes, the code should work.

Regards,
Longman

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-26 23:26         ` Waiman Long
@ 2013-06-27  1:06           ` Andi Kleen
  2013-06-27  1:15             ` Waiman Long
  0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2013-06-27  1:06 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andi Kleen, Alexander Viro, Jeff Layton, Miklos Szeredi,
	Ingo Molnar, linux-fsdevel, linux-kernel, Linus Torvalds,
	Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
	Norton, Scott J

> In the uncontended case, doing spin_unlock_wait will be similar to
> spin_can_lock. This, when combined with a cmpxchg, is still faster
> than doing 2 atomic operations in spin_lock/spin_unlock.

I'm totally against any new users of spin_unlock_wait()

It has bizarre semantics, it will most likely make various
lock optimizations impossible, and it's race-condition hell
for most users.

spin_can_lock() is not quite as bad but has a lot of similar problems.

> BTW, spin_can_lock is just the negation of spin_is_locked.

e.g. with elision it's not.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-27  1:06           ` Andi Kleen
@ 2013-06-27  1:15             ` Waiman Long
  2013-06-27  1:24               ` Waiman Long
  0 siblings, 1 reply; 28+ messages in thread
From: Waiman Long @ 2013-06-27  1:15 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, linux-kernel, Linus Torvalds,
	Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
	Norton, Scott J, Thomas Gleixner, Peter Zijlstra, Steven Rostedt

On 06/26/2013 09:06 PM, Andi Kleen wrote:
>> In the uncontended case, doing spin_unlock_wait will be similar to
>> spin_can_lock. This, when combined with a cmpxchg, is still faster
>> than doing 2 atomic operations in spin_lock/spin_unlock.
> I'm totally against any new users of spin_unlock_wait()
>
> It has bizarre semantics, most likely will make various
> lock optimizations impossible, it's race condition hell
> for most users etc.
>
> spin_can_lock() is not quite as bad has a lot of the similar problems.
>
>> BTW, spin_can_lock is just the negation of spin_is_locked.
> e.g. with elision it's not.
>
> -Andi

OK, it is about Haswell's lock elision feature. I will see what I can do 
to remove those problematic function calls.

Regards,
Longman

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-27  1:15             ` Waiman Long
@ 2013-06-27  1:24               ` Waiman Long
  2013-06-27  1:37                 ` Andi Kleen
  0 siblings, 1 reply; 28+ messages in thread
From: Waiman Long @ 2013-06-27  1:24 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, linux-kernel, Linus Torvalds,
	Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
	Norton, Scott J, Thomas Gleixner, Peter Zijlstra, Steven Rostedt

On 06/26/2013 09:15 PM, Waiman Long wrote:
> On 06/26/2013 09:06 PM, Andi Kleen wrote:
>>> In the uncontended case, doing spin_unlock_wait will be similar to
>>> spin_can_lock. This, when combined with a cmpxchg, is still faster
>>> than doing 2 atomic operations in spin_lock/spin_unlock.
>> I'm totally against any new users of spin_unlock_wait()
>>
>> It has bizarre semantics, most likely will make various
>> lock optimizations impossible, it's race condition hell
>> for most users etc.
>>
>> spin_can_lock() is not quite as bad has a lot of the similar problems.
>>
>>> BTW, spin_can_lock is just the negation of spin_is_locked.
>> e.g. with elision it's not.
>>
>> -Andi
>
> OK, it is about Haswell's lock elision feature. I will see what I can 
> do to remove those problematic function calls.

It will be hard to know what changes will be needed without knowing the 
exact semantics of the spinlock functions with lock elision. Can you 
explain a little more what bizarre semantics you are referring to?

regards,
Longman

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-27  1:24               ` Waiman Long
@ 2013-06-27  1:37                 ` Andi Kleen
  2013-06-27 14:56                   ` Waiman Long
  0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2013-06-27  1:37 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andi Kleen, Alexander Viro, Jeff Layton, Miklos Szeredi,
	Ingo Molnar, linux-fsdevel, linux-kernel, Linus Torvalds,
	Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
	Norton, Scott J, Thomas Gleixner, Peter Zijlstra, Steven Rostedt

> It will be hard to know what changes will be needed without knowing
> the exact semantics of the spinlock functions with lock elision. Can
> you explain a little more what bizarre semantics you are referring
> to?

Totally independent of elision.

For example, what semantics does spin_unlock_wait() have with a ticket
lock? Where in the queue does it wait?
It doesn't really make sense with a ticket lock.

What semantics would lockdep put on it?

-Andi

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-27  0:16     ` Waiman Long
@ 2013-06-27 14:44       ` Thomas Gleixner
  2013-06-29 21:03         ` Waiman Long
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2013-06-27 14:44 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, LKML, Linus Torvalds, Benjamin Herrenschmidt,
	Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
	Peter Zijlstra, Steven Rostedt

On Wed, 26 Jun 2013, Waiman Long wrote:
> On 06/26/2013 07:06 PM, Thomas Gleixner wrote:
> > On Wed, 26 Jun 2013, Waiman Long wrote:
> > This is a complete design disaster. You are violating every single
> > rule of proper layering. The differentiation of spinlock, raw_spinlock
> > and arch_spinlock is there for a reason and you just take the current
> > implementation and map your desired function to it. If we take this,
> > then we fundamentally ruled out a change to the mappings of spinlock,
> > raw_spinlock and arch_spinlock. This layering is on purpose and it
> > took a lot of effort to make it as clean as it is now. Your proposed
> > change undoes that and completely wreckages preempt-rt.
> 
> Would you mind enlightening me how this change will break preempt-rt?

The whole spinlock layering was done for RT. In mainline spinlock is
mapped to raw_spinlock. On RT spinlock is mapped to a PI aware
rtmutex.

So your mixing of the various layers and the assumption of lock plus
count being adjacent, does not work on RT at all.

> The only architecture that will break, according to data in the
> respective arch/*/include/asm/spinlock_types.h files, is PA-RISC
> 1.x (it is OK in PA-RISC 2.0) whose arch_spinlock type has a size of
> 16 bytes. I am not sure if 32-bit PA-RISC 1.x is still supported or
> not, but we can always disable the optimization for this case.

You have to do that right from the beginning with a proper data
structure and proper accessor functions. Introducing wreckage and then
retroactively fixing it is not a really good idea.

> > So what you really want is a new data structure, e.g. something like
> > struct spinlock_refcount() and a proper set of new accessor functions
> > instead of an adhoc interface which is designed solely for the needs
> > of dcache improvements.
> 
> I had thought about that. The only downside is we need more code changes to
> make existing code use the new infrastructure. One of the drivers in my

That's not a downside. It makes the usage of the infrastructure more
obvious and not hidden behind macro magic. And the changes are trivial
to script.

> design is to minimize change to existing code. Personally, I have no
> objection of doing that if others think this is the right way to go.

Definitely. Something like this would be ok:

struct lock_refcnt {
       int                 refcnt;
       struct spinlock     lock;
};

It does not require a gazillion of ifdefs and works for
UP,SMP,DEBUG....

extern bool lock_refcnt_mod(struct lock_refcnt *lr, int mod, int cond);

You also want something like:

extern void lock_refcnt_disable(void);

So we can runtime disable it e.g. when lock elision is detected and
active. So you can force lock_refcnt_mod() to return false.
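
One possible shape for that runtime switch, as a hedged sketch (the
flag name is an assumption; a static key would do the same job):

static bool lock_refcnt_enabled __read_mostly = true;

void lock_refcnt_disable(void)
{
	lock_refcnt_enabled = false;
}

bool lock_refcnt_mod(struct lock_refcnt *lr, int mod, int cond)
{
	/* Once disabled, report failure so callers use their spin_lock() path. */
	if (!lock_refcnt_enabled)
		return false;

	/* the actual lockless cmpxchg attempt is elided in this sketch */
	return false;
}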

static inline bool lock_refcnt_inc(struct lock_refcnt *sr)
{
#ifdef CONFIG_HAVE_SPINLOCK_REFCNT
	return lock_refcnt_mod(sr, 1, INT_MAX);
#else
	return false;
#endif
}

That does not break any code as long as CONFIG_HAVE_SPINLOCK_REFCNT=n.

So we can enable it per architecture and make it depend on SMP. For RT
we simply can force this switch to n.

The other question is the semantic of these refcount functions. From
your description the usage pattern is:

     if (refcnt_xxx())
	   return;
     /* Slow path */
     spin_lock();
     ...
     spin_unlock();

So it might be sensible to make this explicit:

static inline bool refcnt_inc_or_lock(struct lock_refcnt *lr)
{
#ifdef CONFIG_HAVE_SPINLOCK_REFCNT
	if (lock_refcnt_mod(lr, 1, INT_MAX))
		return true;
#endif
	spin_lock(&lr->lock);
	return false;
}
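
For illustration, a caller of that interface might then look like the
sketch below; obj_get() and its minimal slow path are assumptions, not
code from the patch:

/*
 * Fast path: bump the count locklessly.  If that fails, the helper has
 * already taken the lock, so update the count (and do whatever else
 * needs the lock) before unlocking.
 */
static void obj_get(struct lock_refcnt *lr)
{
	if (refcnt_inc_or_lock(lr))
		return;

	lr->refcnt++;
	spin_unlock(&lr->lock);
}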

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-27  1:37                 ` Andi Kleen
@ 2013-06-27 14:56                   ` Waiman Long
  2013-06-28 13:46                     ` Thomas Gleixner
  0 siblings, 1 reply; 28+ messages in thread
From: Waiman Long @ 2013-06-27 14:56 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, linux-kernel, Linus Torvalds,
	Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
	Norton, Scott J, Thomas Gleixner, Peter Zijlstra, Steven Rostedt

On 06/26/2013 09:37 PM, Andi Kleen wrote:
>> It will be hard to know what changes will be needed without knowing
>> the exact semantics of the spinlock functions with lock elision. Can
>> you explain a little more what bizarre semantics you are referring
>> to?
> Totally independent of elision.
>
> For example, what semantics does spin_unlock_wait() have with a ticket
> lock. Where in the queue does it wait?
> It doesn't really make sense with a ticket lock.
>
> What semantics would lockdep put on it?
>
> -Andi

Calling spin_unlock_wait() doesn't put the caller into a queue. It just 
waits until the lock is no longer held by any thread. Yes, there is a 
possibility that the lock can be so busy that it may be held by various 
threads continuously for a long time, making it hard for those who wait 
to proceed. Perhaps I should change the code to abandon the use of 
spin_unlock_wait(). Instead, I can make it wait for the lock to be free 
with some kind of timeout to make sure that it won't wait too long.

With this timeout mechanism, additional lockdep code shouldn't be needed 
as the code will eventually call spin_lock() if the lock is really busy.
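
A rough sketch of that idea, with a bounded retry count standing in
for the timeout; the layout, names and retry bound below are
assumptions for illustration (and the union only works when spinlock_t
is a bare 32-bit arch lock, i.e. no debug fields):

struct lockcnt {
	union {
		u64 lock_count;			/* combined 64-bit view */
		struct {
			spinlock_t	lock;
			unsigned int	count;
		};
	};
};

#define LOCKCNT_MAX_RETRY	16		/* arbitrary bound */

static bool lockcnt_add_unless_locked(struct lockcnt *lc, int add)
{
	struct lockcnt old, new;
	int retry = LOCKCNT_MAX_RETRY;

	while (retry--) {
		old.lock_count = ACCESS_ONCE(lc->lock_count);
		if (spin_is_locked(&old.lock)) {
			cpu_relax();		/* bounded wait for the holder */
			continue;
		}
		new.lock_count = old.lock_count;
		new.count = old.count + add;
		if (cmpxchg64(&lc->lock_count, old.lock_count,
			      new.lock_count) == old.lock_count)
			return true;
		cpu_relax();
	}
	return false;	/* caller falls back to spin_lock()/update/unlock */
}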

Regards,
Longman

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-27 14:56                   ` Waiman Long
@ 2013-06-28 13:46                     ` Thomas Gleixner
  2013-06-29 20:30                       ` Waiman Long
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2013-06-28 13:46 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andi Kleen, Alexander Viro, Jeff Layton, Miklos Szeredi,
	Ingo Molnar, linux-fsdevel, linux-kernel, Linus Torvalds,
	Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
	Norton, Scott J, Peter Zijlstra, Steven Rostedt

On Thu, 27 Jun 2013, Waiman Long wrote:
> On 06/26/2013 09:37 PM, Andi Kleen wrote:
> > > It will be hard to know what changes will be needed without knowing
> > > the exact semantics of the spinlock functions with lock elision. Can
> > > you explain a little more what bizarre semantics you are referring
> > > to?
> > Totally independent of elision.
> > 
> > For example, what semantics does spin_unlock_wait() have with a ticket
> > lock. Where in the queue does it wait?
> > It doesn't really make sense with a ticket lock.
> > 
> > What semantics would lockdep put on it?
> > 
> > -Andi
> 
> Calling spin_unlock_wait() doesn't put the caller into a queue. It just waits
> until the lock is no longer held by any thread. Yes, there is a possibility
> that the lock can be so busy that it may be held by various threads
> continuously for a long time making it hard for those who wait to proceed.
> Perhaps, I should change the code to abandon the use of spin_unlock_wait().
> Instead, I can make it wait for the lock to be free with some kind of timeout
> to make sure that it won't wait too long.

Please no timeout heuristics. They are bound to be wrong.

If the lock is held by some other cpu, then waiting for it with
unlock_wait() or a magic timeout is probably equally expensive as just
going into the slow path right away.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-26 17:43 ` [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount Waiman Long
  2013-06-26 20:17   ` Andi Kleen
  2013-06-26 23:06   ` Thomas Gleixner
@ 2013-06-29 17:45   ` Linus Torvalds
  2013-06-29 20:23     ` Waiman Long
  2 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2013-06-29 17:45 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
	Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J

Sorry for not commenting earlier, I was traveling and keeping email to
a minimum..

On Wed, Jun 26, 2013 at 10:43 AM, Waiman Long <Waiman.Long@hp.com> wrote:
> This patch introduces a new spinlock_refcount.h header file to be
> included by kernel code that want to do a lockless update of reference
> count protected by a spinlock.

So I really like the concept, but the implementation is a mess, and
tries to do too much, while actually achieving too little.

I do not believe you should care about debug spinlocks at all, and
just leave them be. Have a simple fallback code that defaults to
regular counts and spinlocks, and have any debug cases just use that.

But more importantly, I think this needs to be architecture-specific,
and using <linux/spinlock_refcount.h> to try to do some generic 64-bit
cmpxchg() version is a bad bad idea.

We have several architectures coming up that have memory transaction
support, and the "spinlock with refcount" is a perfect candidate for a
transactional memory implementation. So when introducing a new atomic
like this that is very performance-critical and used for some very
core code, I really think architectures would want to make their own
optimized versions.
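
As a user-space style illustration of that point (RTM intrinsics from
<immintrin.h>, built with -mrtm; the type and function names are
assumptions, not kernel code), a transactional fast path could read the
lock word and bump the count with no atomic read-modify-write at all:

#include <immintrin.h>

struct htm_lockcnt {
	unsigned int lock;	/* 0 == free, for this sketch */
	unsigned int count;
};

static int htm_lockcnt_inc(struct htm_lockcnt *lc)
{
	if (_xbegin() == _XBEGIN_STARTED) {
		if (lc->lock != 0)
			_xabort(0x01);	/* lock busy: abort and fall back */
		lc->count++;
		_xend();
		return 1;
	}
	return 0;	/* transaction aborted: take the lock instead */
}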

These things should also not be inlined, I think.

So I think the concept is good, but I think the implementation needs
more thought.

                       Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-29 17:45   ` Linus Torvalds
@ 2013-06-29 20:23     ` Waiman Long
  2013-06-29 21:34       ` Waiman Long
  2013-06-29 21:58       ` Linus Torvalds
  0 siblings, 2 replies; 28+ messages in thread
From: Waiman Long @ 2013-06-29 20:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
	Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt

On 06/29/2013 01:45 PM, Linus Torvalds wrote:
> Sorry for not commenting earlier, I was traveling and keeping email to
> a minimum..
>
> On Wed, Jun 26, 2013 at 10:43 AM, Waiman Long<Waiman.Long@hp.com>  wrote:
>> This patch introduces a new spinlock_refcount.h header file to be
>> included by kernel code that want to do a lockless update of reference
>> count protected by a spinlock.
> So I really like the concept, but the implementation is a mess, and
> tries to do too much, while actually achieving too little.
>
> I do not believe you should care about debug spinlocks at all, and
> just leave them be. Have a simple fallback code that defaults to
> regular counts and spinlocks, and have any debug cases just use that.

I was concerned that people might want to have the same behavior even when 
spinlock debugging was on. Apparently, this is not really needed. Now I 
can just disable the optimization and fall back to the old path when 
spinlock debugging is on.

> But more importantly, I think this needs to be architecture-specific,
> and using<linux/spinlock_refcount.h>  to try to do some generic 64-bit
> cmpxchg() version is a bad bad idea.

Yes, I can put the current implementation into 
asm-generic/spinlock_refcount.h. Now I need to put an 
asm/spinlock_refcount.h into every arch's include/asm directory. Right? 
I don't think there is a mechanism in the build script to create a 
symlink from asm to asm-generic when a header file is missing. Is it the 
general rule that we should have a linux/spinlock_refcount.h that 
includes asm/spinlock_refcount.h instead of including 
asm/spinlock_refcount.h directly?

> We have several architectures coming up that have memory transaction
> support, and the "spinlock with refcount" is a perfect candidate for a
> transactional memory implementation. So when introducing a new atomic
> like this that is very performance-critical and used for some very
> core code, I really think architectures would want to make their own
> optimized versions.
>
> These things should also not be inlined, I think.
>
> So I think the concept is good, but I think the implementation needs
> more thought.
>
>                         Linus

Thanks for the comment. I will try to come up with a version that is 
acceptable to all stakeholders.

Regards,
Longman


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-28 13:46                     ` Thomas Gleixner
@ 2013-06-29 20:30                       ` Waiman Long
  0 siblings, 0 replies; 28+ messages in thread
From: Waiman Long @ 2013-06-29 20:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andi Kleen, Alexander Viro, Jeff Layton, Miklos Szeredi,
	Ingo Molnar, linux-fsdevel, linux-kernel, Linus Torvalds,
	Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
	Norton, Scott J, Peter Zijlstra, Steven Rostedt

On 06/28/2013 09:46 AM, Thomas Gleixner wrote:
> On Thu, 27 Jun 2013, Waiman Long wrote:
>> On 06/26/2013 09:37 PM, Andi Kleen wrote:
>>>> It will be hard to know what changes will be needed without knowing
>>>> the exact semantics of the spinlock functions with lock elision. Can
>>>> you explain a little more what bizarre semantics you are referring
>>>> to?
>>> Totally independent of elision.
>>>
>>> For example, what semantics does spin_unlock_wait() have with a ticket
>>> lock. Where in the queue does it wait?
>>> It doesn't really make sense with a ticket lock.
>>>
>>> What semantics would lockdep put on it?
>>>
>>> -Andi
>> Calling spin_unlock_wait() doesn't put the caller into a queue. It just waits
>> until the lock is no longer held by any thread. Yes, there is a possibility
>> that the lock can be so busy that it may be held by various threads
>> continuously for a long time making it hard for those who wait to proceed.
>> Perhaps, I should change the code to abandon the use of spin_unlock_wait().
>> Instead, I can make it wait for the lock to be free with some kind of timeout
>> to make sure that it won't wait too long.
> Please no timeout heuristics. They are bound to be wrong.
>
> If the lock is held by some other cpu, then waiting for it with
> unlock_wait() or a magic timeout is probably equally expensive as just
> going into the slow path right away.

After some more thought, it may not be such a bad idea to have some kind 
of timeout. When the timeout expires, the code will just fall back to the 
old way of acquiring the spinlock before updating the count. If the lock 
is really busy, it is possible that the waiting thread may get starved to 
the point that it cannot proceed for a really long time. A timeout 
mechanism ensures that unfairness to the waiting thread will be limited. 
The exact timeout value is not that critical; a larger value will 
increase the probability of doing a lockless update and a smaller value 
will decrease it.

Regards,
Longman

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-27 14:44       ` Thomas Gleixner
@ 2013-06-29 21:03         ` Waiman Long
  0 siblings, 0 replies; 28+ messages in thread
From: Waiman Long @ 2013-06-29 21:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, LKML, Linus Torvalds, Benjamin Herrenschmidt,
	Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
	Peter Zijlstra, Steven Rostedt

On 06/27/2013 10:44 AM, Thomas Gleixner wrote:
> On Wed, 26 Jun 2013, Waiman Long wrote:
>> On 06/26/2013 07:06 PM, Thomas Gleixner wrote:
>>> On Wed, 26 Jun 2013, Waiman Long wrote:
>>> This is a complete design disaster. You are violating every single
>>> rule of proper layering. The differentiation of spinlock, raw_spinlock
>>> and arch_spinlock is there for a reason and you just take the current
>>> implementation and map your desired function to it. If we take this,
>>> then we fundamentally ruled out a change to the mappings of spinlock,
>>> raw_spinlock and arch_spinlock. This layering is on purpose and it
>>> took a lot of effort to make it as clean as it is now. Your proposed
>>> change undoes that and completely wreckages preempt-rt.
>> Would you mind enlightening me how this change will break preempt-rt?
> The whole spinlock layering was done for RT. In mainline spinlock is
> mapped to raw_spinlock. On RT spinlock is mapped to a PI aware
> rtmutex.
>
> So your mixing of the various layers and the assumption of lock plus
> count being adjacent, does not work on RT at all.

Thanks for the explanation. I had downloaded and looked at the RT patch. 
My code won't work for the full RT kernel. I guess there is no way to 
work around this, and the only logical choice is to disable it for the 
full RT kernel.

>> The only architecture that will break, according to data in the
>> respective arch/*/include/asm/spinlock_types.h files, is PA-RISC
>> 1.x (it is OK in PA-RISC 2.0) whose arch_spinlock type has a size of
>> 16 bytes. I am not sure if 32-bit PA-RISC 1.x is still supported or
>> not, but we can always disable the optimization for this case.
> You have to do that right from the beginning with a proper data
> structure and proper accessor functions. Introducing wreckage and then
> retroactively fixing it is not a really good idea.

For architectures that need a larger-than-32-bit arch_spinlock type, 
the optimization code will be disabled, just like in the non-SMP and 
full RT cases.

>>> So what you really want is a new data structure, e.g. something like
>>> struct spinlock_refcount() and a proper set of new accessor functions
>>> instead of an adhoc interface which is designed solely for the needs
>>> of dcache improvements.
>> I had thought about that. The only downside is we need more code changes to
>> make existing code use the new infrastructure. One of the drivers in my
> That's not a downside. It makes the usage of the infrastructure more
> obvious and not hidden behind macro magic. And the changes are trivial
> to script.

Changing the code from d_lock to lockref.lock, for example, is trivial. 
However, there are about 40 different files, owned by different 
maintainers, that need to be changed. Do I need to get an ACK from all of 
them, or can I just go ahead with such trivial changes without their ACK? 
You know it can be hard to get responses from so many maintainers in a 
timely manner. This is actually my main concern.

>> design is to minimize change to existing code. Personally, I have no
>> objection of doing that if others think this is the right way to go.
> Definitely. Something like this would be ok:
>
> struct lock_refcnt {
>         int                 refcnt;
>         struct spinlock     lock;
> };
>
> It does not require a gazillion of ifdefs and works for
> UP,SMP,DEBUG....
>
> extern bool lock_refcnt_mod(struct lock_refcnt *lr, int mod, int cond);
>
> You also want something like:
>
> extern void lock_refcnt_disable(void);
>
> So we can runtime disable it e.g. when lock elision is detected and
> active. So you can force lock_refcnt_mod() to return false.
>
> static inline bool lock_refcnt_inc(struct lock_refcnt *sr)
> {
> #ifdef CONFIG_HAVE_SPINLOCK_REFCNT
> 	return lock_refcnt_mod(sr, 1, INT_MAX);
> #else
> 	return false;
> #endif
> }
>
> That does not break any code as long as CONFIG_HAVE_SPINLOCK_REFCNT=n.
>
> So we can enable it per architecture and make it depend on SMP. For RT
> we simply can force this switch to n.
>
> The other question is the semantic of these refcount functions. From
> your description the usage pattern is:
>
>       if (refcnt_xxx())
> 	   return;
>       /* Slow path */
>       spin_lock();
>       ...
>       spin_unlock();
>
> So it might be sensible to make this explicit:
>
> static inline bool refcnt_inc_or_lock(struct lock_refcnt *lr)
> {
> #ifdef CONFIG_HAVE_SPINLOCK_REFCNT
> 	if (lock_refcnt_mod(lr, 1, INT_MAX))
> 	   return true;
> #endif
> 	spin_lock(&lr->lock);
> 	return false;
> }

Yes, it is a good idea to have a config variable to enable/disable it as 
long as the default is "y". Of course, a full RT kernel or a non-SMP 
kernel will have it disabled.

Regards,
Longman

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-29 20:23     ` Waiman Long
@ 2013-06-29 21:34       ` Waiman Long
  2013-06-29 22:11         ` Linus Torvalds
  2013-06-29 21:58       ` Linus Torvalds
  1 sibling, 1 reply; 28+ messages in thread
From: Waiman Long @ 2013-06-29 21:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
	Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt

On 06/29/2013 04:23 PM, Waiman Long wrote:
> On 06/29/2013 01:45 PM, Linus Torvalds wrote:
>
>> But more importantly, I think this needs to be architecture-specific,
>> and using<linux/spinlock_refcount.h>  to try to do some generic 64-bit
>> cmpxchg() version is a bad bad idea.
>
> Yes, I can put the current implementation into 
> asm-generic/spinlock_refcount.h. Now I need to put an 
> asm/spinlock_refcount.h into every arch's include/asm directory. 
> Right? I don't think there is a mechanism in the build script to 
> create a symlink from asm to generic-asm when a header file is 
> missing. Is it the general rule that we should have a 
> linux/spinlock_refcount.h that include asm/spinlock_refcount.h instead 
> of including asm/spinlock_refcount.h directly?
>
>> We have several architectures coming up that have memory transaction
>> support, and the "spinlock with refcount" is a perfect candidate for a
>> transactional memory implementation. So when introducing a new atomic
>> like this that is very performance-critical and used for some very
>> core code, I really think architectures would want to make their own
>> optimized versions.
>>
>> These things should also not be inlined, I think.
>>

I think I got it now. For an architecture with transactional memory 
support to use an alternative implementation, we will need to use some 
kind of dynamic patching at kernel boot-up time, as not all CPUs in that 
architecture will have that support. In that case the helper functions 
have to be real functions and cannot be inlined. That means I need to 
put the implementation into a spinlock_refcount.c file, with the header 
file containing structure definitions and function prototypes only. Is 
that what you are looking for?

Regards,
Longman

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-29 20:23     ` Waiman Long
  2013-06-29 21:34       ` Waiman Long
@ 2013-06-29 21:58       ` Linus Torvalds
  2013-06-29 22:47         ` Linus Torvalds
  1 sibling, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2013-06-29 21:58 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
	Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt

On Sat, Jun 29, 2013 at 1:23 PM, Waiman Long <waiman.long@hp.com> wrote:
>>
>> But more importantly, I think this needs to be architecture-specific,
>> and using<linux/spinlock_refcount.h>  to try to do some generic 64-bit
>> cmpxchg() version is a bad bad idea.
>
> Yes, I can put the current implementation into
> asm-generic/spinlock_refcount.h. Now I need to put an
> asm/spinlock_refcount.h into every arch's include/asm directory. Right?

No, I'd suggest something slightly different: we did something fairly
similar to this with the per-arch optimized version of the
word-at-a-time filename hashing/lookup.

So the sanest approach is something like this:

STEP 1:

 - create a new set of config options (say
"CONFIG_[ARCH_]SPINLOCK_REFCOUNT") that defaults to 'y' and isn't
actually asked about anywhere. Just a simple

    config ARCH_SPINLOCK_REFCOUNT
        bool

    config SPINLOCK_REFCOUNT
        depends on ARCH_SPINLOCK_REFCOUNT
        depends on !GENERIC_LOCKBREAK
        depends on !DEBUG_SPINLOCK
        depends on !DEBUG_LOCK_ALLOC
        bool

   in init/Kconfig will do that.

 - create a <linux/spinlock-refcount.h> that looks something like this

    #ifdef CONFIG_SPINLOCK_REFCOUNT
    #include <asm/spinlock-refcount.h>
    #else
     .. fallback implementation with normal spinlocks that works for
all cases, including lockdep etc ..
    #endif

and now you should have something that should already work; it just
doesn't actually do the optimized cases for any architecture. Equally
importantly, it already disables itself if any of the complicated
debug options are set that mean a spinlock is anything but the
raw architecture spinlock.
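
For concreteness, a minimal sketch of what that elided fallback branch
could look like (the type and function names here are assumptions, not
taken from the posted patches):

struct spinlock_refcount {
	spinlock_t	lock;
	int		refcnt;
};

/* Fallback: always uses the real lock, so lockdep et al. just work. */
static inline bool spinlock_refcount_add_unless(struct spinlock_refcount *lr,
						int add, int threshold)
{
	bool done = false;

	spin_lock(&lr->lock);
	if (lr->refcnt != threshold) {
		lr->refcnt += add;
		done = true;
	}
	spin_unlock(&lr->lock);
	return done;
}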

STEP 2:

 - Now, step #2 is *not* to do a per-architecture version, but to do the
generic cmpxchg version, and make that in
<asm-generic/spinlock-refcount.h>

At this point, any architecture can decide to hook into the generic
version with something like this in their own architecture Kconfig
file:

    select CONFIG_ARCH_SPINLOCK_REFCOUNT

and adding the line

    generic-y += spinlock-refcount.h

to their architecture Kbuild file. So after step two, you basically
have an easy way for architectures to step in one by one with the
cmpxchg version, after they have verified that their spinlocks work
for it. And notice how little the architecture actually needed to do?

STEP 3:

 - This is the "architecture decides to not use the
<asm-generic/spinlock-refcount.h> file, and instead creates its own
optimized version of it.

All these steps are nice and easy and can be tested on their own.  And
make sense on their own. Even that first step is important, since
that's when you create the logic, and that's the code that will be
used when spinlock debugging is enabled etc. The third step may not
ever happen for all architectures. Even the second step might be
something that some architecture decides is not worth it (ie slow and
complex atomics, but fast spinlocks).

And all three steps should be fairly small and logical, I think.

                  Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-29 21:34       ` Waiman Long
@ 2013-06-29 22:11         ` Linus Torvalds
  2013-06-29 22:34           ` Waiman Long
  0 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2013-06-29 22:11 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
	Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt

On Sat, Jun 29, 2013 at 2:34 PM, Waiman Long <waiman.long@hp.com> wrote:
>
> I think I got it now. For architecture with transactional memory support to
> use an alternative implementation, we will need to use some kind of dynamic
> patching at kernel boot up time as not all CPUs in that architecture will
> have that support. In that case the helper functions have to be real
> functions and cannot be inlined. That means I need to put the implementation
> into a spinlock_refcount.c file with the header file containing structure
> definitions and function prototypes only. Is that what you are looking for?

Yes. Except even more complex: I want the generic fallbacks in
lib/*.c files too.

So we basically have multiple "levels" of specialization:

 (a) the purely lock-based model that doesn't do any optimization at
all, because we have lockdep enabled etc, so we *want* things to fall
back to real spinlocks.

 (b) the generic cmpxchg approach for the case when that works

 (c) the capability for an architecture to make up its own very
specialized version

and while I think in all cases the actual functions are big enough
that you don't ever want to inline them, at least in the case of (c)
it is entirely possible that the architecture actually wants a
particular layout for the spinlock and refcount, so we do want the
architecture to be able to specify the exact data structure in its own
<asm/spinlock-refcount.h> file. In fact, that may well be true of case
(b) too, as Andi already pointed out: on x86-32, a "u64" is not
necessarily sufficiently aligned for efficient cmpxchg (it may *work*,
but cacheline-crossing atomics are very very slow).
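
To make the alignment point concrete, an architecture-specific header
could pin the combined word to a natural 64-bit boundary; the layout
below is purely illustrative:

struct arch_lockcnt {
	union {
		/* keep the 64-bit cmpxchg target naturally aligned */
		u64 lock_count __aligned(sizeof(u64));
		struct {
			arch_spinlock_t	lock;
			unsigned int	count;
		};
	};
};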

Other architectures may have other issues - even with a "generic"
cmpxchg-based library version, they may well want to specify exactly
how to take the lock. So while (a) would be 100% generic, (b) might
need small architecture-specific tweaks, and (c) would be a full
custom implementation.

See how we do <asm/word-at-a-time.h> and CONFIG_DCACHE_WORD_ACCESS.
Notice how there is a "generic" <asm-generic/word-at-a-time.h> file
(actually, big-endian only) for reference implementations (used by
sparc, m68k and parisc, for example), and then you have "full custom"
implementations for x86, powerpc, alpha and ARM.

See also lib/strnlen_user.c and CONFIG_GENERIC_STRNLEN_USER as an
example of how architectures may choose to opt in to using generic
library versions - if those work sufficiently well for that
architecture. Again, some architectures may decide to write their own
fully custom strlen_user() function.

Very similar concept.

                 Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-29 22:11         ` Linus Torvalds
@ 2013-06-29 22:34           ` Waiman Long
  0 siblings, 0 replies; 28+ messages in thread
From: Waiman Long @ 2013-06-29 22:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
	Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt

On 06/29/2013 06:11 PM, Linus Torvalds wrote:
> On Sat, Jun 29, 2013 at 2:34 PM, Waiman Long<waiman.long@hp.com>  wrote:
>> I think I got it now. For architecture with transactional memory support to
>> use an alternative implementation, we will need to use some kind of dynamic
>> patching at kernel boot up time as not all CPUs in that architecture will
>> have that support. In that case the helper functions have to be real
>> functions and cannot be inlined. That means I need to put the implementation
>> into a spinlock_refcount.c file with the header file containing structure
>> definitions and function prototypes only. Is that what you are looking for?
> Yes. Except even more complex: I want the generic fallbacks in a
> lib/*.c files too.
>
> So we basically have multiple "levels" of specialization:
>
>   (a) the purely lock-based model that doesn't do any optimization at
> all, because we have lockdep enabled etc, so we *want* things to fall
> back to real spinlocks.
>
>   (b) the generic cmpxchg approach for the case when that works
>
>   (c) the capability for an architecture to make up its own very
> specialized version
>
> and while I think in all cases the actual functions are big enough
> that you don't ever want to inline them, at least in the case of (c)
> it is entirely possible that the architecture actually wants a
> particular layout for the spinlock and refcount, so we do want the
> architecture to be able to specify the exact data structure in its own
> <asm/spinlock-refcount.h>  file. In fact, that may well be true of case
> (b) too, as Andi already pointed out that on x86-32, an "u64" is not
> necessarily sufficiently aligned for efficient cmpxchg (it may *work*,
> but cacheline-crossing atomics are very very slow).
>
> Other architectures may have other issues - even with a "generic"
> cmpxchg-based library version, they may well want to specify exactly
> how to take the lock. So while (a) would be 100% generic, (b) might
> need small architecture-specific tweaks, and (c) would be a full
> custom implementation.
>
> See how we do<asm/word-at-a-time.h>  and CONFIG_DCACHE_WORD_ACCESS.
> Notice how there is a "generic"<asm-generic/word-at-a-time.h>  file
> (actually, big-endian only) for reference implementations (used by
> sparc, m68k and parisc, for example), and then you have "full custom"
> implementations for x86, powerpc, alpha and ARM.
>
> See also lib/strnlen_user.c and CONFIG_GENERIC_STRNLEN_USER as an
> example of how architectures may choose to opt in to using generic
> library versions - if those work sufficiently well for that
> architecture. Again, some architectures may decide to write their own
> fully custom strlen_user() function.
>
> Very similar concept.
>
>                   Linus

Thanks for the quick response. I now have a much better idea of what I 
need to do. I will send out a new patch for review once the code is ready.

Regards,
Longman

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-29 21:58       ` Linus Torvalds
@ 2013-06-29 22:47         ` Linus Torvalds
  2013-07-01 13:40           ` Waiman Long
  0 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2013-06-29 22:47 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
	Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt

On Sat, Jun 29, 2013 at 2:58 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> STEP 1:
>
>  - create a new set of config options (say
> "CONFIG_[ARCH_]SPINLOCK_REFCOUNT") that defaults to 'y' and isn't
> actually asked about anywhere.

That "defauls to 'y'" was incomplete/misleading. The
ARCH_SPINLOCK_REFCOUNT config option should default to 'n' (the
example code snippet did that correctly: a bool config option defaults
to 'n' unless stated otherwise).

The SPINLOCK_REFCOUNT in turn _should_ have a "default y" (the sample
snippet I posted didn't have that), but that 'y' is then obviously
normally disabled by the "depends on" logic, so it only takes effect
when ARCH_SPINLOCK_REFCOUNT is set, and none of the spinlock debug
things are enabled.

I hope that was all fairly obvious from the concept (and the real
examples of us already doing things like this with the whole
CONFIG_GENERIC_STRNLEN_USER and CONFIG_DCACHE_WORD_ACCESS examples),
but I thought I'd follow up with an explicit correction, since both
the explanation and sample snippets were somewhat misleading.

                Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
  2013-06-29 22:47         ` Linus Torvalds
@ 2013-07-01 13:40           ` Waiman Long
  0 siblings, 0 replies; 28+ messages in thread
From: Waiman Long @ 2013-07-01 13:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
	linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
	Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt

On 06/29/2013 06:47 PM, Linus Torvalds wrote:
> On Sat, Jun 29, 2013 at 2:58 PM, Linus Torvalds
> <torvalds@linux-foundation.org>  wrote:
>> STEP 1:
>>
>>   - create a new set of config option (say
>> "CONFIG_[ARCH_]SPINLOCK_REFCOUNT") that defaults to 'y' and isn't
>> actually asked about anywhere.
> That "defauls to 'y'" was incomplete/misleading. The
> ARCH_SPINLOCK_REFCOUNT config option should default to 'n' (the
> example code snippet did that correctly: a bool config option defaults
> to 'n' unless stated otherwise).
>
> The SPINLOCK_REFCOUNT in turn _should_ have a "default y" (the sample
> snippet I posted didn't have that), but that 'y' is then obviously
> normally disabled by the "depends on" logic, so it only takes effect
> when ARCH_SPINLOCK_REFCOUNT is set, and none of the spinlock debug
> things are enabled.
>
> I hope that was all fairly obvious from the concept (and the real
> examples of us already doing things like this with the whole
> CONFIG_GENERIC_STRNLEN_USER and CONFIG_DCACHE_WORD_ACCESS examples),
> but I thought I'd follow up with an explicit correction, since both
> the explanation and sample snippets were somewhat misleading.
>
>                  Linus

Thanks for the clarification, but I had already understood the 
implication of your suggestion. Each architecture has to explicitly 
opt in by selecting ARCH_SPINLOCK_REFCOUNT in its Kconfig file to have 
this lockref feature enabled. In this way, each architecture can take 
its time to verify that it works before enabling it. My patch will only 
enable it for x86, as I don't have test machines for the other architectures.

Regards,
Longman

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2013-07-01 13:40 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-26 17:43 [PATCH v2 0/2] Lockless update of reference count protected by spinlock Waiman Long
2013-06-26 17:43 ` [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount Waiman Long
2013-06-26 20:17   ` Andi Kleen
2013-06-26 21:07     ` Waiman Long
2013-06-26 21:22       ` Andi Kleen
2013-06-26 23:26         ` Waiman Long
2013-06-27  1:06           ` Andi Kleen
2013-06-27  1:15             ` Waiman Long
2013-06-27  1:24               ` Waiman Long
2013-06-27  1:37                 ` Andi Kleen
2013-06-27 14:56                   ` Waiman Long
2013-06-28 13:46                     ` Thomas Gleixner
2013-06-29 20:30                       ` Waiman Long
2013-06-26 23:27         ` Thomas Gleixner
2013-06-26 23:06   ` Thomas Gleixner
2013-06-27  0:16     ` Waiman Long
2013-06-27 14:44       ` Thomas Gleixner
2013-06-29 21:03         ` Waiman Long
2013-06-27  0:26     ` Waiman Long
2013-06-29 17:45   ` Linus Torvalds
2013-06-29 20:23     ` Waiman Long
2013-06-29 21:34       ` Waiman Long
2013-06-29 22:11         ` Linus Torvalds
2013-06-29 22:34           ` Waiman Long
2013-06-29 21:58       ` Linus Torvalds
2013-06-29 22:47         ` Linus Torvalds
2013-07-01 13:40           ` Waiman Long
2013-06-26 17:43 ` [PATCH v2 2/2] dcache: Locklessly update d_count whenever possible Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).