* [PATCH v2 0/2] Lockless update of reference count protected by spinlock
@ 2013-06-26 17:43 Waiman Long
2013-06-26 17:43 ` [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount Waiman Long
2013-06-26 17:43 ` [PATCH v2 2/2] dcache: Locklessly update d_count whenever possible Waiman Long
0 siblings, 2 replies; 28+ messages in thread
From: Waiman Long @ 2013-06-26 17:43 UTC (permalink / raw)
To: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar
Cc: Waiman Long, linux-fsdevel, linux-kernel, Linus Torvalds,
Benjamin Herrenschmidt, Andi Kleen, Chandramouleeswaran, Aswin,
Norton, Scott J
v1->v2:
- Add one more layer of indirection to LOCK_WITH_REFCOUNT macro.
- Add __LINUX_SPINLOCK_REFCOUNT_H protection to spinlock_refcount.h.
- Add some generic get/put macros into spinlock_refcount.h.
This patchset supports a generic mechanism to atomically update
a reference count that is protected by a spinlock without actually
acquiring the lock itself. If the update doesn't succeed, the caller
will have to acquire the lock and update the reference count the
old way. This will help in situations where there is a lot of
spinlock contention because of frequent reference count updates.
The new mechanism is designed so that it can easily be retrofitted
into existing code with minimal changes. Both the spinlock and the
reference count can be accessed in the same way as before.
The d_lock and d_count fields of struct dentry in dcache.h were
modified to use the new mechanism. This serves as an example of how
to convert an existing spinlock and reference count combo to the
new way of locklessly updating the reference count.
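For illustration only, the basic idea can be sketched as follows (a
simplified, untested sketch; the names below are made up and the real
interface is introduced in patch 1):

#include <linux/spinlock.h>

/*
 * Sketch: a 32-bit spinlock and a 32-bit reference count sharing one
 * 64-bit word. Assumes sizeof(spinlock_t) == 4 (no lock debugging)
 * and that the union is 8-byte aligned.
 */
union lockcnt {
        u64 lock_count;
        struct {
                unsigned int count;
                spinlock_t lock;
        };
};

/* Try a lockless increment of the count; return true on success. */
static bool lockcnt_get_fast(union lockcnt *lc)
{
        union lockcnt old, new;

        old.lock_count = ACCESS_ONCE(lc->lock_count);
        if (spin_is_locked(&old.lock))
                return false;
        new.lock_count = old.lock_count;
        new.count++;
        return cmpxchg64(&lc->lock_count, old.lock_count,
                         new.lock_count) == old.lock_count;
}

/* Callers fall back to the usual locked update when the fast path fails. */
static void lockcnt_get(union lockcnt *lc)
{
        if (lockcnt_get_fast(lc))
                return;
        spin_lock(&lc->lock);
        lc->count++;
        spin_unlock(&lc->lock);
}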
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Waiman Long (2):
spinlock: New spinlock_refcount.h for lockless update of refcount
dcache: Locklessly update d_count whenever possible
fs/dcache.c | 14 +++-
include/linux/dcache.h | 6 +-
include/linux/spinlock_refcount.h | 177 +++++++++++++++++++++++++++++++++++++
include/linux/spinlock_types.h | 19 ++++
4 files changed, 212 insertions(+), 4 deletions(-)
create mode 100644 include/linux/spinlock_refcount.h
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-26 17:43 [PATCH v2 0/2] Lockless update of reference count protected by spinlock Waiman Long
@ 2013-06-26 17:43 ` Waiman Long
2013-06-26 20:17 ` Andi Kleen
` (2 more replies)
2013-06-26 17:43 ` [PATCH v2 2/2] dcache: Locklessly update d_count whenever possible Waiman Long
1 sibling, 3 replies; 28+ messages in thread
From: Waiman Long @ 2013-06-26 17:43 UTC (permalink / raw)
To: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar
Cc: Waiman Long, linux-fsdevel, linux-kernel, Linus Torvalds,
Benjamin Herrenschmidt, Andi Kleen, Chandramouleeswaran, Aswin,
Norton, Scott J
This patch introduces a new spinlock_refcount.h header file to be
included by kernel code that wants to do a lockless update of a
reference count protected by a spinlock.
To allow a lockless update of the reference count while the lock isn't
held by others, the 32-bit count and 32-bit raw spinlock can be
combined into a single 64-bit word that is updated atomically whenever
the following conditions are true:
1. The lock is not taken, i.e. spin_can_lock() returns true.
2. The value of the count isn't equal to the given non-negative
threshold value.
To maximize the chance of doing a lockless update, the inlined
__lockcnt_add_unless() function calls spin_unlock_wait() before trying
to do the update.
The new code also attempts the lockless atomic update twice before
falling back to the old code path of acquiring the lock before doing
the update. This is because there is still a fair amount of
contention with only one attempt.
After including the header file, the LOCK_WITH_REFCOUNT() macro
should be used to define the spinlock/reference count combo in
an embedding data structure. The inlined __lockcnt_add_unless()
function can then be used to locklessly update the reference count if
the lock hasn't been acquired by others. Some predefined lockref_*()
macros that call __lockcnt_add_unless() can also be used for this
purpose; a usage sketch follows.
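As a usage example, a hypothetical embedding structure could adopt the
combo as follows (struct foo and foo_get() are made up for
illustration; only LOCK_WITH_REFCOUNT() and lockref_get() come from
this patch):

#include <linux/spinlock_refcount.h>

struct foo {
        void *data;
        /* replaces: unsigned int refcnt; spinlock_t lock; */
        LOCK_WITH_REFCOUNT(lock, refcnt);
};

static void foo_get(struct foo *f)
{
        /* Fast path: lockless increment while f->lock is not held */
        if (lockref_get(f, lock, refcnt))
                return;
        /* Slow path: conventional update under the lock */
        spin_lock(&f->lock);
        f->refcnt++;
        spin_unlock(&f->lock);
}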
Build and boot tests of the new code and the associated dcache changes
were conducted for the following configurations:
1. x86 64-bit SMP, CONFIG_DEBUG_SPINLOCK=n
2. x86 64-bit SMP, CONFIG_DEBUG_SPINLOCK=y
3. x86 32-bit UP , CONFIG_DEBUG_SPINLOCK=n
4. x86 32-bit SMP, CONFIG_DEBUG_SPINLOCK=n
5. x86 32-bit SMP, CONFIG_DEBUG_SPINLOCK=y
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
include/linux/spinlock_refcount.h | 177 +++++++++++++++++++++++++++++++++++++
include/linux/spinlock_types.h | 19 ++++
2 files changed, 196 insertions(+), 0 deletions(-)
create mode 100644 include/linux/spinlock_refcount.h
diff --git a/include/linux/spinlock_refcount.h b/include/linux/spinlock_refcount.h
new file mode 100644
index 0000000..0160861
--- /dev/null
+++ b/include/linux/spinlock_refcount.h
@@ -0,0 +1,177 @@
+/*
+ * Spinlock with reference count combo
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * (C) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *
+ * Authors: Waiman Long <waiman.long@hp.com>
+ */
+#ifndef __LINUX_SPINLOCK_REFCOUNT_H
+#define __LINUX_SPINLOCK_REFCOUNT_H
+
+#include <linux/spinlock.h>
+
+/*
+ * The raw __LOCK_WITH_REFCOUNT() macro defines the combined spinlock with
+ * reference count data structure to be embedded in a larger structure.
+ * With unnamed union, the lock and count names can be accessed directly if
+ * no field name is assigned to the structure. Otherwise, they will have to
+ * be accessed indirectly via the assigned field name of the combined
+ * structure.
+ *
+ * The combined data structure is 8-byte aligned. So proper placement of this
+ * structure in the larger embedding data structure is needed to ensure that
+ * there is no hole in it.
+ *
+ * @lock: Name of the spinlock
+ * @count: Name of the reference count
+ * @u_name: Name of combined data structure union (can be empty for unnamed
+ * union)
+ */
+#ifndef CONFIG_SMP
+#define __LOCK_WITH_REFCOUNT(lock, count, u_name) \
+ unsigned int count; \
+ spinlock_t lock
+
+#elif defined(__SPINLOCK_HAS_REFCOUNT)
+#define __LOCK_WITH_REFCOUNT(lock, count, u_name) \
+ union u_name { \
+ u64 __lock_count; \
+ spinlock_t lock; \
+ struct { \
+ arch_spinlock_t __raw_lock; \
+ unsigned int count; \
+ }; \
+ }
+
+#else /* __SPINLOCK_HAS_REFCOUNT */
+#define __LOCK_WITH_REFCOUNT(lock, count, u_name) \
+ union u_name { \
+ u64 __lock_count; \
+ struct { \
+ unsigned int count; \
+ spinlock_t lock; \
+ }; \
+ }
+
+#endif /* __SPINLOCK_HAS_REFCOUNT */
+
+/*
+ * The LOCK_WITH_REFCOUNT() macro creates an unnamed union structure
+ */
+#define LOCK_WITH_REFCOUNT(lock, count) \
+ __LOCK_WITH_REFCOUNT(lock, count, /**/)
+
+#ifdef CONFIG_SMP
+#define LOCKCNT_COMBO_PTR(s) (&(s)->__lock_count)
+
+/*
+ * Define a "union _lock_refcnt" structure to be used by the helper function
+ */
+__LOCK_WITH_REFCOUNT(lock, count, __lock_refcnt);
+
+/**
+ *
+ * __lockcnt_add_unless - atomically add the given value to the count unless
+ * the lock was acquired or the count equals the
+ * given threshold value.
+ *
+ * @plockcnt : pointer to the combined lock and count 8-byte data
+ * @plock : pointer to the spinlock
+ * @pcount : pointer to the reference count
+ * @value : value to be added
+ * @threshold: threshold value for acquiring the lock
+ * Return : 1 if the operation succeeds, 0 otherwise
+ *
+ * If the lock was not acquired, __lockcnt_add_unless() atomically adds the
+ * given value to the reference count unless the given threshold is reached.
+ * If the lock was acquired or the threshold was reached, 0 is returned and
+ * the caller will have to acquire the lock and update the count accordingly
+ * (can be done in a non-atomic way).
+ */
+static inline int
+__lockcnt_add_unless(u64 *plockcnt, spinlock_t *plock, unsigned int *pcount,
+ int value, int threshold)
+{
+ union __lock_refcnt old, new;
+ int get_lock;
+
+ /*
+ * Code doesn't work if raw spinlock is larger than 4 bytes
+ * or is empty.
+ */
+ BUG_ON((sizeof(arch_spinlock_t) > 4) || (sizeof(arch_spinlock_t) == 0));
+
+ spin_unlock_wait(plock); /* Wait until lock is released */
+ old.__lock_count = ACCESS_ONCE(*plockcnt);
+ get_lock = ((threshold >= 0) && (old.count == threshold));
+ if (likely(!get_lock && spin_can_lock(&old.lock))) {
+ new.__lock_count = old.__lock_count;
+ new.count += value;
+ if (cmpxchg64(plockcnt, old.__lock_count, new.__lock_count)
+ == old.__lock_count)
+ return 1;
+ cpu_relax();
+ /*
+ * Try one more time
+ */
+ old.__lock_count = ACCESS_ONCE(*plockcnt);
+ get_lock = ((threshold >= 0) && (old.count == threshold));
+ if (likely(!get_lock && spin_can_lock(&old.lock))) {
+ new.__lock_count = old.__lock_count;
+ new.count += value;
+ if (cmpxchg64(plockcnt, old.__lock_count,
+ new.__lock_count) == old.__lock_count)
+ return 1;
+ cpu_relax();
+ }
+ }
+ return 0; /* The caller will need to acquire the lock */
+}
+#else /* CONFIG_SMP */
+#define LOCKCNT_COMBO_PTR(s) NULL
+
+/*
+ * Just add the value as the spinlock is a no-op
+ */
+static inline int
+__lockcnt_add_unless(u64 *plockcnt, spinlock_t *plock, unsigned int *pcount,
+ int value, int threshold)
+{
+ if ((threshold >= 0) && (*pcount == threshold))
+ return 0;
+ (*pcount) += value;
+ return 1;
+}
+#endif /* CONFIG_SMP */
+
+/*
+ * Generic macros to increment and decrement reference count
+ * @sname : pointer to structure that embeds spinlock & reference count combo
+ * @lock : name of spinlock in structure
+ * @count : name of reference count in structure
+ *
+ * lockref_get() - increments count unless it is locked
+ * lockref_get0() - increments count unless it is locked or count is 0
+ * lockref_put() - decrements count unless it is locked or count is 1
+ */
+#define lockref_get(sname, lock, count) \
+ __lockcnt_add_unless(LOCKCNT_COMBO_PTR(sname), &(sname)->lock, \
+ &(sname)->count, 1, -1)
+#define lockref_get0(sname, lock, count) \
+ __lockcnt_add_unless(LOCKCNT_COMBO_PTR(sname), &(sname)->lock, \
+ &(sname)->count, 1, 0)
+#define lockref_put(sname, lock, count) \
+ __lockcnt_add_unless(LOCKCNT_COMBO_PTR(sname), &(sname)->lock, \
+ &(sname)->count, -1, 1)
+
+#endif /* __LINUX_SPINLOCK_REFCOUNT_H */
diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
index 73548eb..cc107ad 100644
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -17,8 +17,27 @@
#include <linux/lockdep.h>
+/*
+ * The presence of either one of the CONFIG_DEBUG_SPINLOCK or
+ * CONFIG_DEBUG_LOCK_ALLOC configuration parameter will force the
+ * spinlock_t structure to be 8-byte aligned.
+ *
+ * To support the spinlock/reference count combo data type for 64-bit SMP
+ * environment with spinlock debugging turned on, the reference count has
+ * to be integrated into the spinlock_t data structure in this special case.
+ * The spinlock_t data type will be 8 bytes larger if CONFIG_GENERIC_LOCKBREAK
+ * is also defined.
+ */
+#if defined(CONFIG_64BIT) && (defined(CONFIG_DEBUG_SPINLOCK) ||\
+ defined(CONFIG_DEBUG_LOCK_ALLOC))
+#define __SPINLOCK_HAS_REFCOUNT 1
+#endif
+
typedef struct raw_spinlock {
arch_spinlock_t raw_lock;
+#ifdef __SPINLOCK_HAS_REFCOUNT
+ unsigned int count;
+#endif
#ifdef CONFIG_GENERIC_LOCKBREAK
unsigned int break_lock;
#endif
--
1.7.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v2 2/2] dcache: Locklessly update d_count whenever possible
2013-06-26 17:43 [PATCH v2 0/2] Lockless update of reference count protected by spinlock Waiman Long
2013-06-26 17:43 ` [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount Waiman Long
@ 2013-06-26 17:43 ` Waiman Long
1 sibling, 0 replies; 28+ messages in thread
From: Waiman Long @ 2013-06-26 17:43 UTC (permalink / raw)
To: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar
Cc: Waiman Long, linux-fsdevel, linux-kernel, Linus Torvalds,
Benjamin Herrenschmidt, Andi Kleen, Chandramouleeswaran, Aswin,
Norton, Scott J
The current code takes the dentry's d_lock lock whenever the d_count
reference count is being updated. In reality, nothing big really
happens until d_count goes to 0 in dput(). So it is not necessary
to take the lock if the reference count won't go to 0. On the other
hand, there are cases where d_count should not be updated, or was not
expected to be updated, while d_lock was acquired by another thread.
The new mechanism for locklessly updating a reference count in the
spinlock & reference count combo is used to ensure that the reference
count is updated locklessly as much as possible.
The d_count/d_lock combo sits at byte offset 72 and 88 of the dentry
structure for 32-bit and 64-bit SMP systems respectively. In both
cases, it is 8-byte aligned and combining the two fields into a single
8-byte word does not introduce a hole that increases the size of the
dentry structure.
This patch has a particularly big impact on the short workload of the
AIM7 benchmark with a ramdisk filesystem. The table below shows the
improvement in JPM (jobs per minute) throughput due to this patch on
an 8-socket 80-core x86-64 system with a 3.10-rc4 kernel in 1/2/4/8
node configurations, using numactl to restrict the execution of the
workload to certain nodes.
+-----------------+----------------+-----------------+----------+
|  Configuration  |    Mean JPM    |    Mean JPM     | % Change |
|                 | Rate w/o patch | Rate with patch |          |
+-----------------+---------------------------------------------+
|                 |             User Range 10 - 100             |
+-----------------+---------------------------------------------+
| 8 nodes, HT off |        1596798 |         4721367 |  +195.7% |
| 4 nodes, HT off |        1653817 |         5020983 |  +203.6% |
| 2 nodes, HT off |        3802258 |         3813229 |    +0.3% |
| 1 node , HT off |        2403297 |         2433240 |    +1.3% |
+-----------------+---------------------------------------------+
|                 |            User Range 200 - 1000            |
+-----------------+---------------------------------------------+
| 8 nodes, HT off |        1070992 |         5878321 |  +448.9% |
| 4 nodes, HT off |        1367668 |         7578718 |  +454.1% |
| 2 nodes, HT off |        4554370 |         4614674 |    +1.3% |
| 1 node , HT off |        2534826 |         2540622 |    +0.2% |
+-----------------+---------------------------------------------+
|                 |           User Range 1100 - 2000            |
+-----------------+---------------------------------------------+
| 8 nodes, HT off |        1061322 |         6397776 |  +502.8% |
| 4 nodes, HT off |        1365111 |         6980558 |  +411.3% |
| 2 nodes, HT off |        4583947 |         4637919 |    +1.2% |
| 1 node , HT off |        2563721 |         2587611 |    +0.9% |
+-----------------+----------------+-----------------+----------+
It can be seen that with 40 CPUs (4 nodes) or more, this patch can
significantly improve the short workload performance. With only 1 or
2 nodes, the performance is similar with or without the patch. The
short workload also scales pretty well up to 4 nodes with this patch.
A perf call-graph report of the short workload at 1500 users
without the patch on the same 8-node machine indicates that about
78% of the workload's total time was spent in the _raw_spin_lock()
function. Almost all of it can be attributed to the following 2
kernel functions:
1. dget_parent (49.91%)
2. dput (49.89%)
The relevant perf report lines are:
+ 78.37% reaim [kernel.kallsyms] [k] _raw_spin_lock
+ 0.09% reaim [kernel.kallsyms] [k] dput
+ 0.05% reaim [kernel.kallsyms] [k] _raw_spin_lock_irq
+ 0.00% reaim [kernel.kallsyms] [k] dget_parent
With this patch installed, the new perf report lines are:
+ 12.24% reaim [kernel.kallsyms] [k] _raw_spin_lock_irqsave
+ 4.82% reaim [kernel.kallsyms] [k] _raw_spin_lock
+ 3.26% reaim [kernel.kallsyms] [k] dget_parent
+ 1.08% reaim [kernel.kallsyms] [k] dput
- 4.82% reaim [kernel.kallsyms] [k] _raw_spin_lock
- _raw_spin_lock
+ 34.41% d_path
+ 33.22% SyS_getcwd
+ 4.38% prepend_path
+ 3.71% dget_parent
+ 2.54% inet_twsk_schedule
+ 2.44% complete_walk
+ 2.01% __rcu_process_callbacks
+ 1.78% dput
+ 1.36% unlazy_walk
+ 1.18% do_anonymous_page
+ 1.10% sem_lock
+ 0.82% process_backlog
+ 0.78% task_rq_lock
+ 0.67% selinux_inode_free_security
+ 0.60% unix_dgram_sendmsg
+ 0.54% enqueue_to_backlog
dput now uses up only 1.78% of the _raw_spin_lock time while
dget_parent uses only 3.71%. The time spent in dput and dget_parent
themselves did increase because of busy waiting for the lock to be
released as well as the overhead of doing cmpxchg operations.
The impact of this patch on other AIM7 workloads was much more
modest. The table below shows the mean % change due to this patch on
the same 8-socket system with a 3.10-rc4 kernel.
+--------------+---------------+----------------+-----------------+
|   Workload   | mean % change | mean % change  |  mean % change  |
|              | 10-100 users  | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| alltests     |      0.0%     |      -0.8%     |      +0.1%      |
| five_sec     |     -2.4%     |      +1.7%     |      -0.1%      |
| fserver      |     -1.8%     |      -2.4%     |      -2.1%      |
| high_systime |     +0.1%     |      +0.7%     |      +2.1%      |
| new_fserver  |     +0.2%     |      -1.7%     |      -0.9%      |
| shared       |     -0.1%     |      0.0%      |      -0.1%      |
+--------------+---------------+----------------+-----------------+
There are slight drops in performance for the fserver and new_fserver
workloads, but a slight increase in the high_systime workload.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
fs/dcache.c | 14 ++++++++++++--
include/linux/dcache.h | 6 ++++--
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index f09b908..1f8f78d 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -515,6 +515,8 @@ void dput(struct dentry *dentry)
repeat:
if (dentry->d_count == 1)
might_sleep();
+ else if (lockref_put(dentry, d_lock, d_count))
+ return;
spin_lock(&dentry->d_lock);
BUG_ON(!dentry->d_count);
if (dentry->d_count > 1) {
@@ -611,6 +613,8 @@ static inline void __dget_dlock(struct dentry *dentry)
static inline void __dget(struct dentry *dentry)
{
+ if (lockref_get(dentry, d_lock, d_count))
+ return;
spin_lock(&dentry->d_lock);
__dget_dlock(dentry);
spin_unlock(&dentry->d_lock);
@@ -620,17 +624,23 @@ struct dentry *dget_parent(struct dentry *dentry)
{
struct dentry *ret;
+ rcu_read_lock();
+ ret = rcu_dereference(dentry->d_parent);
+ if (lockref_get0(ret, d_lock, d_count)) {
+ rcu_read_unlock();
+ return ret;
+ }
repeat:
/*
* Don't need rcu_dereference because we re-check it was correct under
* the lock.
*/
- rcu_read_lock();
- ret = dentry->d_parent;
+ ret = ACCESS_ONCE(dentry->d_parent);
spin_lock(&ret->d_lock);
if (unlikely(ret != dentry->d_parent)) {
spin_unlock(&ret->d_lock);
rcu_read_unlock();
+ rcu_read_lock();
goto repeat;
}
rcu_read_unlock();
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 1a6bb81..35d900a 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -6,6 +6,7 @@
#include <linux/rculist.h>
#include <linux/rculist_bl.h>
#include <linux/spinlock.h>
+#include <linux/spinlock_refcount.h>
#include <linux/seqlock.h>
#include <linux/cache.h>
#include <linux/rcupdate.h>
@@ -112,8 +113,7 @@ struct dentry {
unsigned char d_iname[DNAME_INLINE_LEN]; /* small names */
/* Ref lookup also touches following */
- unsigned int d_count; /* protected by d_lock */
- spinlock_t d_lock; /* per dentry lock */
+ LOCK_WITH_REFCOUNT(d_lock, d_count); /* per dentry lock & count */
const struct dentry_operations *d_op;
struct super_block *d_sb; /* The root of the dentry tree */
unsigned long d_time; /* used by d_revalidate */
@@ -359,6 +359,8 @@ static inline struct dentry *dget_dlock(struct dentry *dentry)
static inline struct dentry *dget(struct dentry *dentry)
{
if (dentry) {
+ if (lockref_get(dentry, d_lock, d_count))
+ return dentry;
spin_lock(&dentry->d_lock);
dget_dlock(dentry);
spin_unlock(&dentry->d_lock);
--
1.7.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-26 17:43 ` [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount Waiman Long
@ 2013-06-26 20:17 ` Andi Kleen
2013-06-26 21:07 ` Waiman Long
2013-06-26 23:06 ` Thomas Gleixner
2013-06-29 17:45 ` Linus Torvalds
2 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2013-06-26 20:17 UTC (permalink / raw)
To: Waiman Long
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, linux-kernel, Linus Torvalds,
Benjamin Herrenschmidt, Andi Kleen, Chandramouleeswaran, Aswin,
Norton, Scott J
> + * The combined data structure is 8-byte aligned. So proper placement of this
> + * structure in the larger embedding data structure is needed to ensure that
> + * there is no hole in it.
On i386, a u64 is only 4-byte aligned. So you need to explicitly align
it to 8 bytes. Otherwise you risk the two members crossing a cache
line, which would be really expensive with atomics.
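For example, the non-debug SMP variant could carry an explicit
alignment attribute (sketch only, against the v2 macro, untested):

#define __LOCK_WITH_REFCOUNT(lock, count, u_name)       \
        union u_name {                                  \
                u64 __lock_count;                       \
                struct {                                \
                        unsigned int count;             \
                        spinlock_t lock;                \
                };                                      \
        } __aligned(8)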
> + /*
> + * Code doesn't work if raw spinlock is larger than 4 bytes
> + * or is empty.
> + */
> + BUG_ON((sizeof(arch_spinlock_t) > 4) || (sizeof(arch_spinlock_t) == 0));
BUILD_BUG_ON
> +
> + spin_unlock_wait(plock); /* Wait until lock is released */
> + old.__lock_count = ACCESS_ONCE(*plockcnt);
> + get_lock = ((threshold >= 0) && (old.count == threshold));
> + if (likely(!get_lock && spin_can_lock(&old.lock))) {
What is that for? Why can't you do the CMPXCHG unconditionally?
If it's really needed, it is most likely a race?
The duplicated code should likely be an inline.
> +/*
> + * The presence of either one of the CONFIG_DEBUG_SPINLOCK or
> + * CONFIG_DEBUG_LOCK_ALLOC configuration parameter will force the
> + * spinlock_t structure to be 8-byte aligned.
> + *
> + * To support the spinlock/reference count combo data type for 64-bit SMP
> + * environment with spinlock debugging turned on, the reference count has
> + * to be integrated into the spinlock_t data structure in this special case.
> + * The spinlock_t data type will be 8 bytes larger if CONFIG_GENERIC_LOCKBREAK
> + * is also defined.
I would rather just disable the optimization when these CONFIGs are set
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-26 20:17 ` Andi Kleen
@ 2013-06-26 21:07 ` Waiman Long
2013-06-26 21:22 ` Andi Kleen
0 siblings, 1 reply; 28+ messages in thread
From: Waiman Long @ 2013-06-26 21:07 UTC (permalink / raw)
To: Andi Kleen
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, linux-kernel, Linus Torvalds,
Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
Norton, Scott J
On 06/26/2013 04:17 PM, Andi Kleen wrote:
>> + * The combined data structure is 8-byte aligned. So proper placement of this
>> + * structure in the larger embedding data structure is needed to ensure that
>> + * there is no hole in it.
> On i386 u64 is only 4 bytes aligned. So you need to explicitely align
> it to 8 bytes. Otherwise you risk the two members crossing a cache line, which
> would be really expensive with atomics.
Do you mean the original i386 or the i586+ processors that most
distributions target now? If it is the former, I recall that i386 is
no longer supported.
I also looked around some existing code that uses cmpxchg64. It
doesn't seem to use explicit alignment. I will need to investigate
more to see if it is a real problem.
>> + /*
>> + * Code doesn't work if raw spinlock is larger than 4 bytes
>> + * or is empty.
>> + */
>> + BUG_ON((sizeof(arch_spinlock_t) > 4) || (sizeof(arch_spinlock_t) == 0));
> BUILD_BUG_ON
Thanks for the suggestion, I will make the change.
>> +
>> + spin_unlock_wait(plock); /* Wait until lock is released */
>> + old.__lock_count = ACCESS_ONCE(*plockcnt);
>> + get_lock = ((threshold >= 0) && (old.count == threshold));
>> + if (likely(!get_lock && spin_can_lock(&old.lock))) {
> What is that for? Why can't you do the CMPXCHG unconditially ?
An unconditional CMPXCHG can be as bad as acquiring the spinlock. So we
need to check the conditions are ready before doing an actual CMPXCHG.
> If it's really needed, it is most likely a race?
If there is a race going on between threads, the code will fall back to
the old way of acquiring the spinlock.
> The duplicated code should be likely an inline.
The duplicated code is only used once in the function. I don't think an
additional inline is really needed, but I can do it if other people also
think that is a good idea.
>> +/*
>> + * The presence of either one of the CONFIG_DEBUG_SPINLOCK or
>> + * CONFIG_DEBUG_LOCK_ALLOC configuration parameter will force the
>> + * spinlock_t structure to be 8-byte aligned.
>> + *
>> + * To support the spinlock/reference count combo data type for 64-bit SMP
>> + * environment with spinlock debugging turned on, the reference count has
>> + * to be integrated into the spinlock_t data structure in this special case.
>> + * The spinlock_t data type will be 8 bytes larger if CONFIG_GENERIC_LOCKBREAK
>> + * is also defined.
> I would rather just disable the optimization when these CONFIGs are set
Looking from the other perspective, we may want the locking code to have
the same behavior whether spinlock debugging is enabled or not.
Disabling the optimization will cause the code path to differ which may
not be what we want. Of course, I can change it if other people also
think it is the right way to do it.
Regards,
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-26 21:07 ` Waiman Long
@ 2013-06-26 21:22 ` Andi Kleen
2013-06-26 23:26 ` Waiman Long
2013-06-26 23:27 ` Thomas Gleixner
0 siblings, 2 replies; 28+ messages in thread
From: Andi Kleen @ 2013-06-26 21:22 UTC (permalink / raw)
To: Waiman Long
Cc: Andi Kleen, Alexander Viro, Jeff Layton, Miklos Szeredi,
Ingo Molnar, linux-fsdevel, linux-kernel, Linus Torvalds,
Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
Norton, Scott J
On Wed, Jun 26, 2013 at 05:07:02PM -0400, Waiman Long wrote:
> On 06/26/2013 04:17 PM, Andi Kleen wrote:
> >>+ * The combined data structure is 8-byte aligned. So proper placement of this
> >>+ * structure in the larger embedding data structure is needed to ensure that
> >>+ * there is no hole in it.
> >On i386 u64 is only 4 bytes aligned. So you need to explicitely align
> >it to 8 bytes. Otherwise you risk the two members crossing a cache line, which
> >would be really expensive with atomics.
>
> Do you mean the original i386 or the i586 that are now used by most
> distribution now? If it is the former, I recall that i386 is now no
> longer supported.
I mean i386, as in the 32bit x86 architecture.
>
> I also look around some existing codes that use cmpxchg64. It
> doesn't seem like they use explicit alignment. I will need more
> investigation to see if it is a real problem.
Adding the alignment is basically free. If 32bit users don't enforce
it they're likely potentially broken yes, but they may be lucky.
> >>+ get_lock = ((threshold >= 0) && (old.count == threshold));
> >>+ if (likely(!get_lock && spin_can_lock(&old.lock))) {
> >What is that for? Why can't you do the CMPXCHG unconditially ?
>
> An unconditional CMPXCHG can be as bad as acquiring the spinlock. So
> we need to check the conditions are ready before doing an actual
> CMPXCHG.
But this isn't a cheap check. Especially spin_unlock_wait can be
very expensive.
And all these functions have weird semantics. Perhaps just a quick
spin_is_locked.
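I.e. the fast path could be reduced to something like this (rough
sketch against the v2 helper, untested; it reuses union __lock_refcnt
from the patch):

static inline int
__lockcnt_add_unless(u64 *plockcnt, spinlock_t *plock, unsigned int *pcount,
                     int value, int threshold)
{
        union __lock_refcnt old, new;

        /* One snapshot, one quick check, one cmpxchg; no waiting */
        old.__lock_count = ACCESS_ONCE(*plockcnt);
        if (spin_is_locked(&old.lock) ||
            ((threshold >= 0) && (old.count == threshold)))
                return 0;       /* caller falls back to spin_lock() */

        new.__lock_count = old.__lock_count;
        new.count += value;
        return cmpxchg64(plockcnt, old.__lock_count,
                         new.__lock_count) == old.__lock_count;
}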
>
> Looking from the other perspective, we may want the locking code to
> have the same behavior whether spinlock debugging is enabled or not.
> Disabling the optimization will cause the code path to differ which
> may not be what we want. Of course, I can change it if other people
> also think it is the right way to do it.
Lock debugging already has quite different timing/lock semantics.
-Andi
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-26 17:43 ` [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount Waiman Long
2013-06-26 20:17 ` Andi Kleen
@ 2013-06-26 23:06 ` Thomas Gleixner
2013-06-27 0:16 ` Waiman Long
2013-06-27 0:26 ` Waiman Long
2013-06-29 17:45 ` Linus Torvalds
2 siblings, 2 replies; 28+ messages in thread
From: Thomas Gleixner @ 2013-06-26 23:06 UTC (permalink / raw)
To: Waiman Long
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, LKML, Linus Torvalds, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
Peter Zijlstra, Steven Rostedt
On Wed, 26 Jun 2013, Waiman Long wrote:
It would have been nice if you'd cc'ed people who spent a massive
amount of time working on the spinlock code. Did that now.
> +#ifndef CONFIG_SMP
> +#define __LOCK_WITH_REFCOUNT(lock, count, u_name) \
> + unsigned int count; \
> + spinlock_t lock
> +
> +#elif defined(__SPINLOCK_HAS_REFCOUNT)
> +#define __LOCK_WITH_REFCOUNT(lock, count, u_name) \
> + union u_name { \
> + u64 __lock_count; \
> + spinlock_t lock; \
> + struct { \
> + arch_spinlock_t __raw_lock; \
> + unsigned int count; \
> + }; \
> + }
> +
> +#else /* __SPINLOCK_HAS_REFCOUNT */
> +#define __LOCK_WITH_REFCOUNT(lock, count, u_name) \
> + union u_name { \
> + u64 __lock_count; \
> + struct { \
> + unsigned int count; \
> + spinlock_t lock; \
> + }; \
> + }
> +
> +#endif /* __SPINLOCK_HAS_REFCOUNT */
This is a complete design disaster. You are violating every single
rule of proper layering. The differentiation of spinlock, raw_spinlock
and arch_spinlock is there for a reason and you just take the current
implementation and map your desired function to it. If we take this,
then we have fundamentally ruled out any change to the mappings of
spinlock, raw_spinlock and arch_spinlock. This layering is on purpose
and it took a lot of effort to make it as clean as it is now. Your
proposed change undoes that and completely wrecks preempt-rt.
What's even worse is that you go and fiddle with the basic data
structures:
> diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
> index 73548eb..cc107ad 100644
> --- a/include/linux/spinlock_types.h
> +++ b/include/linux/spinlock_types.h
> @@ -17,8 +17,27 @@
>
> #include <linux/lockdep.h>
>
> +/*
> + * The presence of either one of the CONFIG_DEBUG_SPINLOCK or
> + * CONFIG_DEBUG_LOCK_ALLOC configuration parameter will force the
> + * spinlock_t structure to be 8-byte aligned.
> + *
> + * To support the spinlock/reference count combo data type for 64-bit SMP
> + * environment with spinlock debugging turned on, the reference count has
> + * to be integrated into the spinlock_t data structure in this special case.
> + * The spinlock_t data type will be 8 bytes larger if CONFIG_GENERIC_LOCKBREAK
> + * is also defined.
> + */
> +#if defined(CONFIG_64BIT) && (defined(CONFIG_DEBUG_SPINLOCK) ||\
> + defined(CONFIG_DEBUG_LOCK_ALLOC))
> +#define __SPINLOCK_HAS_REFCOUNT 1
> +#endif
> +
> typedef struct raw_spinlock {
> arch_spinlock_t raw_lock;
> +#ifdef __SPINLOCK_HAS_REFCOUNT
> + unsigned int count;
> +#endif
> #ifdef CONFIG_GENERIC_LOCKBREAK
> unsigned int break_lock;
> #endif
You need to do that, because you blindly bolt your spinlock extension
into the existing code without taking the time to think about the
design implications.
Do not misunderstand me, I'm impressed by the improvement numbers, but
we need to find a way how this can be done sanely. You really have to
understand that there is a world aside of x86 and big machines.
Let's have a look at your UP implementation. It's completely broken:
> +/*
> + * Just add the value as the spinlock is a no-op
So what protects that from being preempted? Nothing AFAICT.
What's even worse is that you break all architectures, which have an
arch_spinlock type != sizeof(unsigned int), e.g. SPARC
So what you really want is a new data structure, e.g. something like
struct spinlock_refcount() and a proper set of new accessor functions
instead of an adhoc interface which is designed solely for the needs
of dcache improvements.
The first step to achieve your goal is to consolidate all the
identical copies of arch_spinlock_t and talk to the maintainers of the
architectures which have a different (i.e. !32bit) notion of the arch
specific spinlock implementation, whether it's possible to move over
to a common data struct. Once you have achieved this, you can build
your spinlock + refcount construct upon that.
Thanks,
tglx
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-26 21:22 ` Andi Kleen
@ 2013-06-26 23:26 ` Waiman Long
2013-06-27 1:06 ` Andi Kleen
2013-06-26 23:27 ` Thomas Gleixner
1 sibling, 1 reply; 28+ messages in thread
From: Waiman Long @ 2013-06-26 23:26 UTC (permalink / raw)
To: Andi Kleen
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, linux-kernel, Linus Torvalds,
Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
Norton, Scott J
On 06/26/2013 05:22 PM, Andi Kleen wrote:
> On Wed, Jun 26, 2013 at 05:07:02PM -0400, Waiman Long wrote:
>> On 06/26/2013 04:17 PM, Andi Kleen wrote:
>>>> + * The combined data structure is 8-byte aligned. So proper placement of this
>>>> + * structure in the larger embedding data structure is needed to ensure that
>>>> + * there is no hole in it.
>>> On i386 u64 is only 4 bytes aligned. So you need to explicitely align
>>> it to 8 bytes. Otherwise you risk the two members crossing a cache line, which
>>> would be really expensive with atomics.
>> Do you mean the original i386 or the i586 that are now used by most
>> distribution now? If it is the former, I recall that i386 is now no
>> longer supported.
> I mean i386, as in the 32bit x86 architecture.
>
>> I also look around some existing codes that use cmpxchg64. It
>> doesn't seem like they use explicit alignment. I will need more
>> investigation to see if it is a real problem.
> Adding the alignment is basically free. If 32bit users don't enforce
> it they're likely potentially broken yes, but they may be lucky.
You are right. I will add the 8-byte alignment attribute to the union
definition and document that in the code.
>>>> + get_lock = ((threshold >= 0) && (old.count == threshold));
>>>> + if (likely(!get_lock && spin_can_lock(&old.lock))) {
>>> What is that for? Why can't you do the CMPXCHG unconditially ?
>> An unconditional CMPXCHG can be as bad as acquiring the spinlock. So
>> we need to check the conditions are ready before doing an actual
>> CMPXCHG.
> But this isn't a cheap check. Especially spin_unlock_wait can be
> very expensive.
> And all these functions have weird semantics. Perhaps just a quick
> spin_is_locked.
In the uncontended case, doing spin_unlock_wait will be similar to
spin_can_lock. This, when combined with a cmpxchg, is still faster than
doing 2 atomic operations in spin_lock/spin_unlock.
In the contended case, doing spin_unlock_wait won't be more expensive
than doing spin_lock. Without doing that, most of the threads will fall
back to acquiring the lock negating any performance benefit. I had tried
that without spin_unlock_wait and there wasn't that much performance
gain in the AIM7 short workload. BTW, spin_can_lock is just the negation
of spin_is_locked.
Regards,
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-26 21:22 ` Andi Kleen
2013-06-26 23:26 ` Waiman Long
@ 2013-06-26 23:27 ` Thomas Gleixner
1 sibling, 0 replies; 28+ messages in thread
From: Thomas Gleixner @ 2013-06-26 23:27 UTC (permalink / raw)
To: Andi Kleen
Cc: Waiman Long, Alexander Viro, Jeff Layton, Miklos Szeredi,
Ingo Molnar, linux-fsdevel, linux-kernel, Linus Torvalds,
Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
Norton, Scott J
On Wed, 26 Jun 2013, Andi Kleen wrote:
> On Wed, Jun 26, 2013 at 05:07:02PM -0400, Waiman Long wrote:
> > Looking from the other perspective, we may want the locking code to
> > have the same behavior whether spinlock debugging is enabled or not.
> > Disabling the optimization will cause the code path to differ which
> > may not be what we want. Of course, I can change it if other people
> > also think it is the right way to do it.
>
> Lock debugging already has quite different timing/lock semantics.
Locks have no timing related semantics and lock debugging does not
change the semantics of locks. Lock debugging adds overhead to the
locking functions, but it does not affect the semantics of locks.
Thanks,
tglx
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-26 23:06 ` Thomas Gleixner
@ 2013-06-27 0:16 ` Waiman Long
2013-06-27 14:44 ` Thomas Gleixner
2013-06-27 0:26 ` Waiman Long
1 sibling, 1 reply; 28+ messages in thread
From: Waiman Long @ 2013-06-27 0:16 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, LKML, Linus Torvalds, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
Peter Zijlstra, Steven Rostedt
On 06/26/2013 07:06 PM, Thomas Gleixner wrote:
> On Wed, 26 Jun 2013, Waiman Long wrote:
>
> It would have been nice if you'd had cc'ed people who spent a massive
> amount of time working on the spinlock code. Did that now.
I am sorry; I am relatively new to the Linux community, and running
get_maintainer.pl didn't output your name when I modified
spinlock_types.h. Thanks for letting me know about the shortcomings
of my patch; this is the kind of feedback I am looking for.
>> +#ifndef CONFIG_SMP
>> +#define __LOCK_WITH_REFCOUNT(lock, count, u_name) \
>> + unsigned int count; \
>> + spinlock_t lock
>> +
>> +#elif defined(__SPINLOCK_HAS_REFCOUNT)
>> +#define __LOCK_WITH_REFCOUNT(lock, count, u_name) \
>> + union u_name { \
>> + u64 __lock_count; \
>> + spinlock_t lock; \
>> + struct { \
>> + arch_spinlock_t __raw_lock; \
>> + unsigned int count; \
>> + }; \
>> + }
>> +
>> +#else /* __SPINLOCK_HAS_REFCOUNT */
>> +#define __LOCK_WITH_REFCOUNT(lock, count, u_name) \
>> + union u_name { \
>> + u64 __lock_count; \
>> + struct { \
>> + unsigned int count; \
>> + spinlock_t lock; \
>> + }; \
>> + }
>> +
>> +#endif /* __SPINLOCK_HAS_REFCOUNT */
> This is a complete design disaster. You are violating every single
> rule of proper layering. The differentiation of spinlock, raw_spinlock
> and arch_spinlock is there for a reason and you just take the current
> implementation and map your desired function to it. If we take this,
> then we fundamentally ruled out a change to the mappings of spinlock,
> raw_spinlock and arch_spinlock. This layering is on purpose and it
> took a lot of effort to make it as clean as it is now. Your proposed
> change undoes that and completely wreckages preempt-rt.
Would you mind enlightening me on how this change will break preempt-rt?
> What's even worse is that you go and fiddle with the basic data
> structures:
>
>> diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
>> index 73548eb..cc107ad 100644
>> --- a/include/linux/spinlock_types.h
>> +++ b/include/linux/spinlock_types.h
>> @@ -17,8 +17,27 @@
>>
>> #include <linux/lockdep.h>
>>
>> +/*
>> + * The presence of either one of the CONFIG_DEBUG_SPINLOCK or
>> + * CONFIG_DEBUG_LOCK_ALLOC configuration parameter will force the
>> + * spinlock_t structure to be 8-byte aligned.
>> + *
>> + * To support the spinlock/reference count combo data type for 64-bit SMP
>> + * environment with spinlock debugging turned on, the reference count has
>> + * to be integrated into the spinlock_t data structure in this special case.
>> + * The spinlock_t data type will be 8 bytes larger if CONFIG_GENERIC_LOCKBREAK
>> + * is also defined.
>> + */
>> +#if defined(CONFIG_64BIT) && (defined(CONFIG_DEBUG_SPINLOCK) ||\
>> + defined(CONFIG_DEBUG_LOCK_ALLOC))
>> +#define __SPINLOCK_HAS_REFCOUNT 1
>> +#endif
>> +
>> typedef struct raw_spinlock {
>> arch_spinlock_t raw_lock;
>> +#ifdef __SPINLOCK_HAS_REFCOUNT
>> + unsigned int count;
>> +#endif
>> #ifdef CONFIG_GENERIC_LOCKBREAK
>> unsigned int break_lock;
>> #endif
> You need to do that, because you blindly bolt your spinlock extension
> into the existing code without taking the time to think about the
> design implications.
Another alternative is to disable this lockless update optimization for
this special case. With that, I don't need to modify the raw_spinlock_t
data structure.
> Do not misunderstand me, I'm impressed by the improvement numbers, but
> we need to find a way how this can be done sanely. You really have to
> understand that there is a world aside of x86 and big machines.
I am well aware that my knowledge in other architectures is very
limited. And this is why we have this kind of public review process.
> Let's have a look at your UP implementation. It's completely broken:
>
>> +/*
>> + * Just add the value as the spinlock is a no-op
> So what protects that from being preempted? Nothing AFAICT.
Thanks for pointing this out. I can change it to return 0 to disable
the optimization.
> What's even worse is that you break all architectures, which have an
> arch_spinlock type != sizeof(unsigned int), e.g. SPARC
Actually the code should work as long as arch_spinlock type is not
bigger than 4 bytes because of structure field alignment. It can be 1
or 2 bytes. If NR_CPUS is less than 256, x86's arch_spinlock type should
only be 2 bytes.
The only architecture that will break, according to the data in the
respective arch/*/include/asm/spinlock_types.h files, is PA-RISC 1.x
(it is OK in PA-RISC 2.0), whose arch_spinlock type has a size of 16
bytes. I am not sure if 32-bit PA-RISC 1.x is still supported or not,
but we can always disable the optimization for this case.
> So what you really want is a new data structure, e.g. something like
> struct spinlock_refcount() and a proper set of new accessor functions
> instead of an adhoc interface which is designed solely for the needs
> of dcache improvements.
I had thought about that. The only downside is that we need more code
changes to make existing code use the new infrastructure. One of the
drivers of my design is to minimize changes to existing code.
Personally, I have no objection to doing that if others think this is
the right way to go.
> The first step to achieve your goal is to consolidate all the
> identical copies of arch_spinlock_t and talk to the maintainers of the
> architectures which have a different (i.e. !32bit) notion of the arch
> specific spinlock implementation, whether it's possible to move over
> to a common data struct. Once you have achieved this, you can build
> your spinlock + refcount construct upon that.
As I said above, as long as the arch_spinlock type isn't bigger than 4
bytes, the code should work.
Regards,
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-26 23:26 ` Waiman Long
@ 2013-06-27 1:06 ` Andi Kleen
2013-06-27 1:15 ` Waiman Long
0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2013-06-27 1:06 UTC (permalink / raw)
To: Waiman Long
Cc: Andi Kleen, Alexander Viro, Jeff Layton, Miklos Szeredi,
Ingo Molnar, linux-fsdevel, linux-kernel, Linus Torvalds,
Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
Norton, Scott J
> In the uncontended case, doing spin_unlock_wait will be similar to
> spin_can_lock. This, when combined with a cmpxchg, is still faster
> than doing 2 atomic operations in spin_lock/spin_unlock.
I'm totally against any new users of spin_unlock_wait()
It has bizarre semantics, most likely will make various
lock optimizations impossible, it's race condition hell
for most users etc.
spin_can_lock() is not quite as bad, but has a lot of the same problems.
> BTW, spin_can_lock is just the negation of spin_is_locked.
e.g. with elision it's not.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-27 1:06 ` Andi Kleen
@ 2013-06-27 1:15 ` Waiman Long
2013-06-27 1:24 ` Waiman Long
0 siblings, 1 reply; 28+ messages in thread
From: Waiman Long @ 2013-06-27 1:15 UTC (permalink / raw)
To: Andi Kleen
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, linux-kernel, Linus Torvalds,
Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
Norton, Scott J, Thomas Gleixner, Peter Zijlstra, Steven Rostedt
On 06/26/2013 09:06 PM, Andi Kleen wrote:
>> In the uncontended case, doing spin_unlock_wait will be similar to
>> spin_can_lock. This, when combined with a cmpxchg, is still faster
>> than doing 2 atomic operations in spin_lock/spin_unlock.
> I'm totally against any new users of spin_unlock_wait()
>
> It has bizarre semantics, most likely will make various
> lock optimizations impossible, it's race condition hell
> for most users etc.
>
> spin_can_lock() is not quite as bad has a lot of the similar problems.
>
>> BTW, spin_can_lock is just the negation of spin_is_locked.
> e.g. with elision it's not.
>
> -Andi
OK, it is about Haswell's lock elision feature. I will see what I can do
to remove those problematic function calls.
Regards,
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-27 1:15 ` Waiman Long
@ 2013-06-27 1:24 ` Waiman Long
2013-06-27 1:37 ` Andi Kleen
0 siblings, 1 reply; 28+ messages in thread
From: Waiman Long @ 2013-06-27 1:24 UTC (permalink / raw)
To: Andi Kleen
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, linux-kernel, Linus Torvalds,
Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
Norton, Scott J, Thomas Gleixner, Peter Zijlstra, Steven Rostedt
On 06/26/2013 09:15 PM, Waiman Long wrote:
> On 06/26/2013 09:06 PM, Andi Kleen wrote:
>>> In the uncontended case, doing spin_unlock_wait will be similar to
>>> spin_can_lock. This, when combined with a cmpxchg, is still faster
>>> than doing 2 atomic operations in spin_lock/spin_unlock.
>> I'm totally against any new users of spin_unlock_wait()
>>
>> It has bizarre semantics, most likely will make various
>> lock optimizations impossible, it's race condition hell
>> for most users etc.
>>
>> spin_can_lock() is not quite as bad has a lot of the similar problems.
>>
>>> BTW, spin_can_lock is just the negation of spin_is_locked.
>> e.g. with elision it's not.
>>
>> -Andi
>
> OK, it is about Haswell's lock elision feature. I will see what I can
> do to remove those problematic function calls.
It will be hard to know what changes will be needed without knowing the
exact semantics of the spinlock functions with lock elision. Can you
explain a little more what bizarre semantics you are referring to?
regards,
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-27 1:24 ` Waiman Long
@ 2013-06-27 1:37 ` Andi Kleen
2013-06-27 14:56 ` Waiman Long
0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2013-06-27 1:37 UTC (permalink / raw)
To: Waiman Long
Cc: Andi Kleen, Alexander Viro, Jeff Layton, Miklos Szeredi,
Ingo Molnar, linux-fsdevel, linux-kernel, Linus Torvalds,
Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
Norton, Scott J, Thomas Gleixner, Peter Zijlstra, Steven Rostedt
> It will be hard to know what changes will be needed without knowing
> the exact semantics of the spinlock functions with lock elision. Can
> you explain a little more what bizarre semantics you are referring
> to?
Totally independent of elision.
For example, what semantics does spin_unlock_wait() have with a ticket
lock? Where in the queue does it wait?
It doesn't really make sense with a ticket lock.
What semantics would lockdep put on it?
-Andi
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-27 0:16 ` Waiman Long
@ 2013-06-27 14:44 ` Thomas Gleixner
2013-06-29 21:03 ` Waiman Long
0 siblings, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2013-06-27 14:44 UTC (permalink / raw)
To: Waiman Long
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, LKML, Linus Torvalds, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
Peter Zijlstra, Steven Rostedt
On Wed, 26 Jun 2013, Waiman Long wrote:
> On 06/26/2013 07:06 PM, Thomas Gleixner wrote:
> > On Wed, 26 Jun 2013, Waiman Long wrote:
> > This is a complete design disaster. You are violating every single
> > rule of proper layering. The differentiation of spinlock, raw_spinlock
> > and arch_spinlock is there for a reason and you just take the current
> > implementation and map your desired function to it. If we take this,
> > then we fundamentally ruled out a change to the mappings of spinlock,
> > raw_spinlock and arch_spinlock. This layering is on purpose and it
> > took a lot of effort to make it as clean as it is now. Your proposed
> > change undoes that and completely wreckages preempt-rt.
>
> Would you mind enlighten me how this change will break preempt-rt?
The whole spinlock layering was done for RT. In mainline spinlock is
mapped to raw_spinlock. On RT spinlock is mapped to a PI aware
rtmutex.
So your mixing of the various layers and the assumption of lock plus
count being adjacent, does not work on RT at all.
> The only architecture that will break, according to data in the
> respectively arch/*/include/asm/spinlock_types.h files is PA-RISC
> 1.x (it is OK in PA-RISC 2.0) whose arch_spinlock type has a size of
> 16 bytes. I am not sure if 32-bit PA-RISC 1.x is still supported or
> not, but we can always disable the optimization for this case.
You have to do that right from the beginning with a proper data
structure and proper accessor functions. Introducing wreckage and then
retroactively fixing it is not a really good idea.
> > So what you really want is a new data structure, e.g. something like
> > struct spinlock_refcount() and a proper set of new accessor functions
> > instead of an adhoc interface which is designed solely for the needs
> > of dcache improvements.
>
> I had thought about that. The only downside is we need more code changes to
> make existing code use the new infrastructure. One of the drivers in my
That's not a downside. It makes the usage of the infrastructure more
obvious and not hidden behind macro magic. And the changes are trivial
to script.
> design is to minimize change to existing code. Personally, I have no
> objection of doing that if others think this is the right way to go.
Definitely. Something like this would be ok:
struct lock_refcnt {
        int refcnt;
        struct spinlock lock;
};
It does not require a gazillion of ifdefs and works for
UP,SMP,DEBUG....
extern bool lock_refcnt_mod(struct lock_refcnt *lr, int mod, int cond);
You also want something like:
extern void lock_refcnt_disable(void);
So we can runtime disable it e.g. when lock elision is detected and
active. So you can force lock_refcnt_mod() to return false.
static inline bool lock_refcnt_inc(struct lock_refcnt *sr)
{
#ifdef CONFIG_HAVE_LOCK_REFCNT
        return lock_refcnt_mod(sr, 1, INTMAX);
#else
        return false;
#endif
}
That does not break any code as long as CONFIG_HAVE_SPINLOCK_REFCNT=n.
So we can enable it per architecture and make it depend on SMP. For RT
we simply can force this switch to n.
The other question is the semantic of these refcount functions. From
your description the usage pattern is:
        if (refcnt_xxx())
                return;

        /* Slow path */
        spin_lock();
        ...
        spin_unlock();
So it might be sensible to make this explicit:
static inline bool refcnt_inc_or_lock(struct lock_refcnt *lr)
{
#ifdef CONFIG_HAVE_SPINLOCK_REFCNT
        if (lock_refcnt_mod(lr, 1, INTMAX))
                return true;
#endif
        spin_lock(&lr->lock);
        return false;
}
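As an illustration of how a caller could then look (a hypothetical dget()-style fast path, assuming struct dentry gains a struct lock_refcnt d_lockcnt field; this is not code from the posted patch):

        static void dget_fastpath(struct dentry *dentry)
        {
                if (refcnt_inc_or_lock(&dentry->d_lockcnt))
                        return;         /* lockless increment worked */
                /* Slow path: refcnt_inc_or_lock() has taken the lock for us. */
                dentry->d_lockcnt.refcnt++;
                spin_unlock(&dentry->d_lockcnt.lock);
        }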
Thanks,
tglx
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-27 1:37 ` Andi Kleen
@ 2013-06-27 14:56 ` Waiman Long
2013-06-28 13:46 ` Thomas Gleixner
0 siblings, 1 reply; 28+ messages in thread
From: Waiman Long @ 2013-06-27 14:56 UTC (permalink / raw)
To: Andi Kleen
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, linux-kernel, Linus Torvalds,
Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
Norton, Scott J, Thomas Gleixner, Peter Zijlstra, Steven Rostedt
On 06/26/2013 09:37 PM, Andi Kleen wrote:
>> It will be hard to know what changes will be needed without knowing
>> the exact semantics of the spinlock functions with lock elision. Can
>> you explain a little more what bizarre semantics you are referring
>> to?
> Totally independent of elision.
>
> For example, what semantics does spin_unlock_wait() have with a ticket
> lock. Where in the queue does it wait?
> It doesn't really make sense with a ticket lock.
>
> What semantics would lockdep put on it?
>
> -Andi
Calling spin_unlock_wait() doesn't put the caller into a queue. It just
waits until the lock is no longer held by any thread. Yes, there is a
possibility that the lock can be so busy that it may be held by various
threads continuously for a long time, making it hard for those who wait
to proceed. Perhaps I should change the code to abandon the use of
spin_unlock_wait(). Instead, I can make it wait for the lock to be free
with some kind of timeout to make sure that it won't wait too long.
With this timeout mechanism, additional lockdep code shouldn't be needed
as the code will eventually call spin_lock() if the lock is really busy.
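(A rough sketch of that bounded wait, with an assumed retry limit and a hypothetical helper name, purely for illustration:)

        /* Sketch only: spin a bounded number of times waiting for the lock
         * to look free, then give up and let the caller take the spinlock. */
        static bool lockcnt_wait_unlocked(spinlock_t *lock)
        {
                int retry = 100;                /* assumed tuning value */

                while (spin_is_locked(lock)) {
                        if (--retry == 0)
                                return false;   /* "timed out", use the slow path */
                        cpu_relax();
                }
                return true;
        }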
Regards,
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-27 14:56 ` Waiman Long
@ 2013-06-28 13:46 ` Thomas Gleixner
2013-06-29 20:30 ` Waiman Long
0 siblings, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2013-06-28 13:46 UTC (permalink / raw)
To: Waiman Long
Cc: Andi Kleen, Alexander Viro, Jeff Layton, Miklos Szeredi,
Ingo Molnar, linux-fsdevel, linux-kernel, Linus Torvalds,
Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
Norton, Scott J, Peter Zijlstra, Steven Rostedt
On Thu, 27 Jun 2013, Waiman Long wrote:
> On 06/26/2013 09:37 PM, Andi Kleen wrote:
> > > It will be hard to know what changes will be needed without knowing
> > > the exact semantics of the spinlock functions with lock elision. Can
> > > you explain a little more what bizarre semantics you are referring
> > > to?
> > Totally independent of elision.
> >
> > For example, what semantics does spin_unlock_wait() have with a ticket
> > lock. Where in the queue does it wait?
> > It doesn't really make sense with a ticket lock.
> >
> > What semantics would lockdep put on it?
> >
> > -Andi
>
> Calling spin_unlock_wait() doesn't put the caller into a queue. It just waits
> until the lock is no longer held by any thread. Yes, there is a possibility
> that the lock can be so busy that it may be held by various threads
> continuously for a long time, making it hard for those who wait to proceed.
> Perhaps I should change the code to abandon the use of spin_unlock_wait().
> Instead, I can make it wait for the lock to be free with some kind of timeout
> to make sure that it won't wait too long.
Please no timeout heuristics. They are bound to be wrong.
If the lock is held by some other cpu, then waiting for it with
unlock_wait() or a magic timeout is probably equally expensive as just
going into the slow path right away.
Thanks,
tglx
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-26 17:43 ` [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount Waiman Long
2013-06-26 20:17 ` Andi Kleen
2013-06-26 23:06 ` Thomas Gleixner
@ 2013-06-29 17:45 ` Linus Torvalds
2013-06-29 20:23 ` Waiman Long
2 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2013-06-29 17:45 UTC (permalink / raw)
To: Waiman Long
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J
Sorry for not commenting earlier, I was traveling and keeping email to
a minimum..
On Wed, Jun 26, 2013 at 10:43 AM, Waiman Long <Waiman.Long@hp.com> wrote:
> This patch introduces a new spinlock_refcount.h header file to be
> included by kernel code that want to do a lockless update of reference
> count protected by a spinlock.
So I really like the concept, but the implementation is a mess, and
tries to do too much, while actually achieving too little.
I do not believe you should care about debug spinlocks at all, and
just leave them be. Have a simple fallback code that defaults to
regular counts and spinlocks, and have any debug cases just use that.
But more importantly, I think this needs to be architecture-specific,
and using <linux/spinlock_refcount.h> to try to do some generic 64-bit
cmpxchg() version is a bad bad idea.
We have several architectures coming up that have memory transaction
support, and the "spinlock with refcount" is a perfect candidate for a
transactional memory implementation. So when introducing a new atomic
like this that is very performance-critical and used for some very
core code, I really think architectures would want to make their own
optimized versions.
These things should also not be inlined, I think.
So I think the concept is good, but I think the implementation needs
more thought.
Linus
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-29 17:45 ` Linus Torvalds
@ 2013-06-29 20:23 ` Waiman Long
2013-06-29 21:34 ` Waiman Long
2013-06-29 21:58 ` Linus Torvalds
0 siblings, 2 replies; 28+ messages in thread
From: Waiman Long @ 2013-06-29 20:23 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
Thomas Gleixner, Peter Zijlstra, Steven Rostedt
On 06/29/2013 01:45 PM, Linus Torvalds wrote:
> Sorry for not commenting earlier, I was traveling and keeping email to
> a minimum..
>
> On Wed, Jun 26, 2013 at 10:43 AM, Waiman Long<Waiman.Long@hp.com> wrote:
>> This patch introduces a new spinlock_refcount.h header file to be
>> included by kernel code that want to do a lockless update of reference
>> count protected by a spinlock.
> So I really like the concept, but the implementation is a mess, and
> tries to do too much, while actually achieving too little.
>
> I do not believe you should care about debug spinlocks at all, and
> just leave them be. Have a simple fallback code that defaults to
> regular counts and spinlocks, and have any debug cases just use that.
I was concerned that people might want to have the same behavior even when
spinlock debugging was on. Apparently, this is not really needed. Now I
can just disable the optimization and fall back to the old path when
spinlock debugging is on.
> But more importantly, I think this needs to be architecture-specific,
> and using<linux/spinlock_refcount.h> to try to do some generic 64-bit
> cmpxchg() version is a bad bad idea.
Yes, I can put the current implementation into
asm-generic/spinlock_refcount.h. Now I need to put an
asm/spinlock_refcount.h into every arch's include/asm directory. Right?
I don't think there is a mechanism in the build script to create a
symlink from asm to asm-generic when a header file is missing. Is it the
general rule that we should have a linux/spinlock_refcount.h that
includes asm/spinlock_refcount.h instead of including
asm/spinlock_refcount.h directly?
> We have several architectures coming up that have memory transaction
> support, and the "spinlock with refcount" is a perfect candidate for a
> transactional memory implementation. So when introducing a new atomic
> like this that is very performance-critical and used for some very
> core code, I really think architectures would want to make their own
> optimized versions.
>
> These things should also not be inlined, I think.
>
> So I think the concept is good, but I think the implementation needs
> more thought.
>
> Linus
Thanks for the comments. I will try to come up with a version that is
acceptable to all stakeholders.
Regards,
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-28 13:46 ` Thomas Gleixner
@ 2013-06-29 20:30 ` Waiman Long
0 siblings, 0 replies; 28+ messages in thread
From: Waiman Long @ 2013-06-29 20:30 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Andi Kleen, Alexander Viro, Jeff Layton, Miklos Szeredi,
Ingo Molnar, linux-fsdevel, linux-kernel, Linus Torvalds,
Benjamin Herrenschmidt, Chandramouleeswaran, Aswin,
Norton, Scott J, Peter Zijlstra, Steven Rostedt
On 06/28/2013 09:46 AM, Thomas Gleixner wrote:
> On Thu, 27 Jun 2013, Waiman Long wrote:
>> On 06/26/2013 09:37 PM, Andi Kleen wrote:
>>>> It will be hard to know what changes will be needed without knowing
>>>> the exact semantics of the spinlock functions with lock elision. Can
>>>> you explain a little more what bizarre semantics you are referring
>>>> to?
>>> Totally independent of elision.
>>>
>>> For example, what semantics does spin_unlock_wait() have with a ticket
>>> lock. Where in the queue does it wait?
>>> It doesn't really make sense with a ticket lock.
>>>
>>> What semantics would lockdep put on it?
>>>
>>> -Andi
>> Calling spin_unlock_wait() doesn't put the caller into a queue. It just waits
>> until the lock is no longer held by any thread. Yes, there is a possibility
>> that the lock can be so busy that it may be held by various threads
>> continuously for a long time, making it hard for those who wait to proceed.
>> Perhaps I should change the code to abandon the use of spin_unlock_wait().
>> Instead, I can make it wait for the lock to be free with some kind of timeout
>> to make sure that it won't wait too long.
> Please no timeout heuristics. They are bound to be wrong.
>
> If the lock is held by some other cpu, then waiting for it with
> unlock_wait() or a magic timeout is probably equally expensive as just
> going into the slow path right away.
After some more thought, it may not be such a bad idea to have some kind
of timeout. In that case, the code will just fall back to the old way of
acquiring the spinlock before updating the count. If the lock is really
busy, it is possible that the waiting thread may get starved to a point
that it cannot proceed for a really long time. A timeout mechanism
ensures that unfairness to the waiting thread will be limited. The exact
timeout value is not that critical, a larger value will increase the
probability of doing a lockless update and a smaller value will decrease
the probability.
Regards,
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-27 14:44 ` Thomas Gleixner
@ 2013-06-29 21:03 ` Waiman Long
0 siblings, 0 replies; 28+ messages in thread
From: Waiman Long @ 2013-06-29 21:03 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, LKML, Linus Torvalds, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
Peter Zijlstra, Steven Rostedt
On 06/27/2013 10:44 AM, Thomas Gleixner wrote:
> On Wed, 26 Jun 2013, Waiman Long wrote:
>> On 06/26/2013 07:06 PM, Thomas Gleixner wrote:
>>> On Wed, 26 Jun 2013, Waiman Long wrote:
>>> This is a complete design disaster. You are violating every single
>>> rule of proper layering. The differentiation of spinlock, raw_spinlock
>>> and arch_spinlock is there for a reason and you just take the current
>>> implementation and map your desired function to it. If we take this,
>>> then we fundamentally ruled out a change to the mappings of spinlock,
>>> raw_spinlock and arch_spinlock. This layering is on purpose and it
>>> took a lot of effort to make it as clean as it is now. Your proposed
>>> change undoes that and completely wreckages preempt-rt.
>> Would you mind enlightening me how this change will break preempt-rt?
> The whole spinlock layering was done for RT. In mainline spinlock is
> mapped to raw_spinlock. On RT spinlock is mapped to a PI aware
> rtmutex.
>
> So your mixing of the various layers and the assumption of lock plus
> count being adjacent, does not work on RT at all.
Thanks for the explanation. I had downloaded and looked at the RT patch.
My code won't work for the full RT kernel. I guess there is no way to
work around this, and the only logical choice is to disable it for the full
RT kernel.
>> The only architecture that will break, according to data in the
>> respective arch/*/include/asm/spinlock_types.h files, is PA-RISC
>> 1.x (it is OK in PA-RISC 2.0), whose arch_spinlock type has a size of
>> 16 bytes. I am not sure if 32-bit PA-RISC 1.x is still supported or
>> not, but we can always disable the optimization for this case.
> You have to do that right from the beginning with a proper data
> structure and proper accessor functions. Introducing wreckage and then
> retroactively fixing it is not a really good idea.
For architectures that need an arch_spinlock type larger than 32 bits,
the optimization code will be disabled, just like in the non-SMP and full RT
cases.
>>> So what you really want is a new data structure, e.g. something like
>>> struct spinlock_refcount() and a proper set of new accessor functions
>>> instead of an adhoc interface which is designed solely for the needs
>>> of dcache improvements.
>> I had thought about that. The only downside is we need more code changes to
>> make existing code use the new infrastructure. One of the drivers in my
> That's not a downside. It makes the usage of the infrastructure more
> obvious and not hidden behind macro magic. And the changes are trivial
> to script.
Changing the code from d_lock to lockref.lock, for example, is trivial.
However, there are about 40 different files that need to be changed, with
different maintainers. Do I need to get an ACK from all of them, or can I
just go ahead with such trivial changes without their ACKs? You know it
can be hard to get responses from so many maintainers in a timely
manner. This is actually my main concern.
>> design is to minimize change to existing code. Personally, I have no
>> objection of doing that if others think this is the right way to go.
> Definitely. Something like this would be ok:
>
> struct lock_refcnt {
>         int refcnt;
>         struct spinlock lock;
> };
>
> It does not require a gazillion of ifdefs and works for
> UP,SMP,DEBUG....
>
> extern bool lock_refcnt_mod(struct lock_refcnt *lr, int mod, int cond);
>
> You also want something like:
>
> extern void lock_refcnt_disable(void);
>
> So we can runtime disable it e.g. when lock elision is detected and
> active. So you can force lock_refcnt_mod() to return false.
>
> static inline bool lock_refcnt_inc(struct lock_refcnt *sr)
> {
> #ifdef CONFIG_HAVE_LOCK_REFCNT
>         return lock_refcnt_mod(sr, 1, INTMAX);
> #else
>         return false;
> #endif
> }
>
> That does not break any code as long as CONFIG_HAVE_SPINLOCK_REFCNT=n.
>
> So we can enable it per architecture and make it depend on SMP. For RT
> we simply can force this switch to n.
>
> The other question is the semantic of these refcount functions. From
> your description the usage pattern is:
>
>         if (refcnt_xxx())
>                 return;
>         /* Slow path */
>         spin_lock();
>         ...
>         spin_unlock();
>
> So it might be sensible to make this explicit:
>
> static inline bool refcnt_inc_or_lock(struct lock_refcnt *lr)
> {
> #ifdef CONFIG_HAVE_SPINLOCK_REFCNT
>         if (lock_refcnt_mod(lr, 1, INTMAX))
>                 return true;
> #endif
>         spin_lock(&lr->lock);
>         return false;
> }
Yes, it is a good idea to have a config variable to enable/disable it as
long as the default is "y". Of course, a full RT kernel or a non-SMP
kernel will have it disabled.
Regards,
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-29 20:23 ` Waiman Long
@ 2013-06-29 21:34 ` Waiman Long
2013-06-29 22:11 ` Linus Torvalds
2013-06-29 21:58 ` Linus Torvalds
1 sibling, 1 reply; 28+ messages in thread
From: Waiman Long @ 2013-06-29 21:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
Thomas Gleixner, Peter Zijlstra, Steven Rostedt
On 06/29/2013 04:23 PM, Waiman Long wrote:
> On 06/29/2013 01:45 PM, Linus Torvalds wrote:
>
>> But more importantly, I think this needs to be architecture-specific,
>> and using<linux/spinlock_refcount.h> to try to do some generic 64-bit
>> cmpxchg() version is a bad bad idea.
>
> Yes, I can put the current implementation into
> asm-generic/spinlock_refcount.h. Now I need to put an
> asm/spinlock_refcount.h into every arch's include/asm directory.
> Right? I don't think there is a mechanism in the build script to
> create a symlink from asm to asm-generic when a header file is
> missing. Is it the general rule that we should have a
> linux/spinlock_refcount.h that includes asm/spinlock_refcount.h instead
> of including asm/spinlock_refcount.h directly?
>
>> We have several architectures coming up that have memory transaction
>> support, and the "spinlock with refcount" is a perfect candidate for a
>> transactional memory implementation. So when introducing a new atomic
>> like this that is very performance-critical and used for some very
>> core code, I really think architectures would want to make their own
>> optimized versions.
>>
>> These things should also not be inlined, I think.
>>
I think I got it now. For an architecture with transactional memory support
to use an alternative implementation, we will need to use some kind of
dynamic patching at kernel boot-up time, as not all CPUs in that
architecture will have that support. In that case the helper functions
have to be real functions and cannot be inlined. That means I need to
put the implementation into a spinlock_refcount.c file, with the header
file containing only structure definitions and function prototypes. Is
that what you are looking for?
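(One possible shape for that run-time selection, sketched with a static key; lockref_get_tm(), lockref_get_generic() and cpu_has_tm() are hypothetical names, and the real patch may well do the patching differently:)

        struct lockref;

        /* Hypothetical helpers, declared here only so the sketch hangs together. */
        extern bool lockref_get_tm(struct lockref *lr);
        extern bool lockref_get_generic(struct lockref *lr);
        extern bool cpu_has_tm(void);

        static struct static_key lockref_use_tm = STATIC_KEY_INIT_FALSE;

        bool lockref_get_or_lock(struct lockref *lr)
        {
                if (static_key_false(&lockref_use_tm))
                        return lockref_get_tm(lr);      /* hypothetical HTM path */
                return lockref_get_generic(lr);         /* cmpxchg/spinlock path */
        }

        /* Run once at boot after CPU feature detection (illustrative only). */
        void __init lockref_select_impl(void)
        {
                if (cpu_has_tm())
                        static_key_slow_inc(&lockref_use_tm);
        }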
Regards,
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-29 20:23 ` Waiman Long
2013-06-29 21:34 ` Waiman Long
@ 2013-06-29 21:58 ` Linus Torvalds
2013-06-29 22:47 ` Linus Torvalds
1 sibling, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2013-06-29 21:58 UTC (permalink / raw)
To: Waiman Long
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
Thomas Gleixner, Peter Zijlstra, Steven Rostedt
On Sat, Jun 29, 2013 at 1:23 PM, Waiman Long <waiman.long@hp.com> wrote:
>>
>> But more importantly, I think this needs to be architecture-specific,
>> and using<linux/spinlock_refcount.h> to try to do some generic 64-bit
>> cmpxchg() version is a bad bad idea.
>
> Yes, I can put the current implementation into
> asm-generic/spinlock_refcount.h. Now I need to put an
> asm/spinlock_refcount.h into every arch's include/asm directory. Right?
No, I'd suggest something slightly different: we did something fairly
similar to this with the per-arch optimized version of the
word-at-a-time filename hashing/lookup.
So the sanest approach is something like this:
STEP 1:
- create a new set of config option (say
"CONFIG_[ARCH_]SPINLOCK_REFCOUNT") that defaults to 'y' and isn't
actually asked about anywhere. Just a simple
config ARCH_SPINLOCK_REFCOUNT
        bool

config SPINLOCK_REFCOUNT
        depends on ARCH_SPINLOCK_REFCOUNT
        depends on !GENERIC_LOCKBREAK
        depends on !DEBUG_SPINLOCK
        depends on !DEBUG_LOCK_ALLOC
        bool
in init/Kconfig will do that.
- create a <linux/spinlock-refcount.h> that looks something like this
#ifdef CONFIG_SPINLOCK_REFCOUNT
#include <asm/spinlock-refcount.h>
#else
.. fallback implementation with normal spinlocks that works for
all cases, including lockdep etc ..
#endif
and now you should have something that should already work; it just
doesn't actually do the optimized cases for any architecture. Equally
importantly, it already disables itself if any of the complicated
debug options are set that mean a spinlock is anything but the
raw architecture spinlock.
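(A minimal sketch of what that fallback branch could contain; the structure layout and the lockref_get() name are illustrative, not taken from the patch:)

        /* Fallback: no lockless trickery, just take the spinlock. */
        struct lockref {
                spinlock_t      lock;
                unsigned int    refcnt;
        };

        static inline void lockref_get(struct lockref *lr)
        {
                spin_lock(&lr->lock);
                lr->refcnt++;
                spin_unlock(&lr->lock);
        }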
STEP 2:
- Now, step #2 is *not* to do a per-architecture version, but to do the
generic cmpxchg version, and make that in
<asm-generic/spinlock-refcount.h>
At this point, any architecture can decide to hook into the generic
version with something like this in their own architecture Kconfig
file:
select CONFIG_ARCH_SPINLOCK_REFCOUNT
and adding the line
generic-y += spinlock-refcount.h
to their architecture Kbuild file. So after step two, you basically
have an easy way for architectures to step in one by one with the
cmpxchg version, after they have verified that their spinlocks work
for it. And notice how little the architecture actually needed to do?
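(For concreteness, the generic cmpxchg version could be built around something like the following; the union layout, the field names and the use of cmpxchg64() are assumptions for illustration, not the posted code:)

        /* Sketch: treat the adjacent lock and count as one 64-bit word, and
         * only update the count when the lock half is observed to be free. */
        struct lockref {
                union {
                        u64 lock_count;
                        struct {
                                spinlock_t      lock;
                                unsigned int    refcnt;
                        };
                };
        };

        static bool lockref_get_cmpxchg(struct lockref *lr)
        {
                struct lockref old, new;

                old.lock_count = ACCESS_ONCE(lr->lock_count);
                if (spin_is_locked(&old.lock))
                        return false;           /* lock busy, caller falls back */
                new.lock_count = old.lock_count;
                new.refcnt++;
                return cmpxchg64(&lr->lock_count, old.lock_count,
                                 new.lock_count) == old.lock_count;
        }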
STEP 3:
- This is the "architecture decides to not use the
<asm-generic/spinlock-refcount.h> file, and instead creates its own
optimized version of it.
All these steps are nice and easy and can be tested on their own. And
make sense on their own. Even that first step is important, since
that's when you create the logic, and that's the code that will be
used when spinlock debugging is enabled etc. The third step may not
ever happen for all architectures. Even the second step might be
something that some architecture decides is not worth it (ie slow and
complex atomics, but fast spinlocks).
And all three steps should be fairly small and logical, I think.
Linus
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-29 21:34 ` Waiman Long
@ 2013-06-29 22:11 ` Linus Torvalds
2013-06-29 22:34 ` Waiman Long
0 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2013-06-29 22:11 UTC (permalink / raw)
To: Waiman Long
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
Thomas Gleixner, Peter Zijlstra, Steven Rostedt
On Sat, Jun 29, 2013 at 2:34 PM, Waiman Long <waiman.long@hp.com> wrote:
>
> I think I got it now. For an architecture with transactional memory support to
> use an alternative implementation, we will need to use some kind of dynamic
> patching at kernel boot-up time, as not all CPUs in that architecture will
> have that support. In that case the helper functions have to be real
> functions and cannot be inlined. That means I need to put the implementation
> into a spinlock_refcount.c file, with the header file containing only structure
> definitions and function prototypes. Is that what you are looking for?
Yes. Except even more complex: I want the generic fallbacks in
lib/*.c files too.
So we basically have multiple "levels" of specialization:
(a) the purely lock-based model that doesn't do any optimization at
all, because we have lockdep enabled etc, so we *want* things to fall
back to real spinlocks.
(b) the generic cmpxchg approach for the case when that works
(c) the capability for an architecture to make up its own very
specialized version
and while I think in all cases the actual functions are big enough
that you don't ever want to inline them, at least in the case of (c)
it is entirely possible that the architecture actually wants a
particular layout for the spinlock and refcount, so we do want the
architecture to be able to specify the exact data structure in its own
<asm/spinlock-refcount.h> file. In fact, that may well be true of case
(b) too, as Andi already pointed out that on x86-32, an "u64" is not
necessarily sufficiently aligned for efficient cmpxchg (it may *work*,
but cacheline-crossing atomics are very very slow).
Other architectures may have other issues - even with a "generic"
cmpxchg-based library version, they may well want to specify exactly
how to take the lock. So while (a) would be 100% generic, (b) might
need small architecture-specific tweaks, and (c) would be a full
custom implementation.
See how we do <asm/word-at-a-time.h> and CONFIG_DCACHE_WORD_ACCESS.
Notice how there is a "generic" <asm-generic/word-at-a-time.h> file
(actually, big-endian only) for reference implementations (used by
sparc, m68k and parisc, for example), and then you have "full custom"
implementations for x86, powerpc, alpha and ARM.
See also lib/strnlen_user.c and CONFIG_GENERIC_STRNLEN_USER as an
example of how architectures may choose to opt in to using generic
library versions - if those work sufficiently well for that
architecture. Again, some architectures may decide to write their own
fully custom strlen_user() function.
Very similar concept.
Linus
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-29 22:11 ` Linus Torvalds
@ 2013-06-29 22:34 ` Waiman Long
0 siblings, 0 replies; 28+ messages in thread
From: Waiman Long @ 2013-06-29 22:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
Thomas Gleixner, Peter Zijlstra, Steven Rostedt
On 06/29/2013 06:11 PM, Linus Torvalds wrote:
> On Sat, Jun 29, 2013 at 2:34 PM, Waiman Long<waiman.long@hp.com> wrote:
>> I think I got it now. For an architecture with transactional memory support to
>> use an alternative implementation, we will need to use some kind of dynamic
>> patching at kernel boot-up time, as not all CPUs in that architecture will
>> have that support. In that case the helper functions have to be real
>> functions and cannot be inlined. That means I need to put the implementation
>> into a spinlock_refcount.c file, with the header file containing only structure
>> definitions and function prototypes. Is that what you are looking for?
> Yes. Except even more complex: I want the generic fallbacks in
> lib/*.c files too.
>
> So we basically have multiple "levels" of specialization:
>
> (a) the purely lock-based model that doesn't do any optimization at
> all, because we have lockdep enabled etc, so we *want* things to fall
> back to real spinlocks.
>
> (b) the generic cmpxchg approach for the case when that works
>
> (c) the capability for an architecture to make up its own very
> specialized version
>
> and while I think in all cases the actual functions are big enough
> that you don't ever want to inline them, at least in the case of (c)
> it is entirely possible that the architecture actually wants a
> particular layout for the spinlock and refcount, so we do want the
> architecture to be able to specify the exact data structure in its own
> <asm/spinlock-refcount.h> file. In fact, that may well be true of case
> (b) too, as Andi already pointed out that on x86-32, an "u64" is not
> necessarily sufficiently aligned for efficient cmpxchg (it may *work*,
> but cacheline-crossing atomics are very very slow).
>
> Other architectures may have other issues - even with a "generic"
> cmpxchg-based library version, they may well want to specify exactly
> how to take the lock. So while (a) would be 100% generic, (b) might
> need small architecture-specific tweaks, and (c) would be a full
> custom implementation.
>
> See how we do<asm/word-at-a-time.h> and CONFIG_DCACHE_WORD_ACCESS.
> Notice how there is a "generic"<asm-generic/word-at-a-time.h> file
> (actually, big-endian only) for reference implementations (used by
> sparc, m68k and parisc, for example), and then you have "full custom"
> implementations for x86, powerpc, alpha and ARM.
>
> See also lib/strnlen_user.c and CONFIG_GENERIC_STRNLEN_USER as an
> example of how architectures may choose to opt in to using generic
> library versions - if those work sufficiently well for that
> architecture. Again, some architectures may decide to write their own
> fully custom strlen_user() function.
>
> Very similar concept.
>
> Linus
Thanks for the quick response. I now have a much better idea of what I
need to do. I will send out a new patch for review once the code is ready.
Regards,
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-29 21:58 ` Linus Torvalds
@ 2013-06-29 22:47 ` Linus Torvalds
2013-07-01 13:40 ` Waiman Long
0 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2013-06-29 22:47 UTC (permalink / raw)
To: Waiman Long
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
Thomas Gleixner, Peter Zijlstra, Steven Rostedt
On Sat, Jun 29, 2013 at 2:58 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> STEP 1:
>
> - create a new set of config options (say
> "CONFIG_[ARCH_]SPINLOCK_REFCOUNT") that defaults to 'y' and isn't
> actually asked about anywhere.
That "defauls to 'y'" was incomplete/misleading. The
ARCH_SPINLOCK_REFCOUNT config option should default to 'n' (the
example code snippet did that correctly: a bool config option defaults
to 'n' unless stated otherwise).
The SPINLOCK_REFCOUNT in turn _should_ have a "default y" (the sample
snippet I posted didn't have that), but that 'y' is then obviously
normally disabled by the "depends on" logic, so it only takes effect
when ARCH_SPINLOCK_REFCOUNT is set, and none of the spinlock debug
things are enabled.
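(With that correction folded in, the second entry would read roughly like this:)

config SPINLOCK_REFCOUNT
        depends on ARCH_SPINLOCK_REFCOUNT
        depends on !GENERIC_LOCKBREAK
        depends on !DEBUG_SPINLOCK
        depends on !DEBUG_LOCK_ALLOC
        bool
        default y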
I hope that was all fairly obvious from the concept (and the real
examples of us already doing things like this with the whole
CONFIG_GENERIC_STRNLEN_USER and CONFIG_DCACHE_WORD_ACCESS examples),
but I thought I'd follow up with an explicit correction, since both
the explanation and sample snippets were somewhat misleading.
Linus
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount
2013-06-29 22:47 ` Linus Torvalds
@ 2013-07-01 13:40 ` Waiman Long
0 siblings, 0 replies; 28+ messages in thread
From: Waiman Long @ 2013-07-01 13:40 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alexander Viro, Jeff Layton, Miklos Szeredi, Ingo Molnar,
linux-fsdevel, Linux Kernel Mailing List, Benjamin Herrenschmidt,
Andi Kleen, Chandramouleeswaran, Aswin, Norton, Scott J,
Thomas Gleixner, Peter Zijlstra, Steven Rostedt
On 06/29/2013 06:47 PM, Linus Torvalds wrote:
> On Sat, Jun 29, 2013 at 2:58 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> STEP 1:
>>
>> - create a new set of config option (say
>> "CONFIG_[ARCH_]SPINLOCK_REFCOUNT") that defaults to 'y' and isn't
>> actually asked about anywhere.
> That "defauls to 'y'" was incomplete/misleading. The
> ARCH_SPINLOCK_REFCOUNT config option should default to 'n' (the
> example code snippet did that correctly: a bool config option defaults
> to 'n' unless stated otherwise).
>
> The SPINLOCK_REFCOUNT in turn _should_ have a "default y" (the sample
> snippet I posted didn't have that), but that 'y' is then obviously
> normally disabled by the "depends on" logic, so it only takes effect
> when ARCH_SPINLOCK_REFCOUNT is set, and none of the spinlock debug
> things are enabled.
>
> I hope that was all fairly obvious from the concept (and the real
> examples of us already doing things like this with the whole
> CONFIG_GENERIC_STRNLEN_USER and CONFIG_DCACHE_WORD_ACCESS examples),
> but I thought I'd follow up with an explicit correction, since both
> the explanation and sample snippets were somewhat misleading.
>
> Linus
Thanks for the clarification, but I had already understood the
implication of your suggestion. Each architecture has to explicitly
opt in by defining ARCH_SPINLOCK_REFCOUNT in its Kconfig file to have
this lockref feature enabled. In this way, each architecture can take
its time to verify that it works before enabling it. My patch will only
enable x86, as I don't have test machines for the other architectures.
Regards,
Longman
^ permalink raw reply [flat|nested] 28+ messages in thread
Thread overview: 28+ messages
2013-06-26 17:43 [PATCH v2 0/2] Lockless update of reference count protected by spinlock Waiman Long
2013-06-26 17:43 ` [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount Waiman Long
2013-06-26 20:17 ` Andi Kleen
2013-06-26 21:07 ` Waiman Long
2013-06-26 21:22 ` Andi Kleen
2013-06-26 23:26 ` Waiman Long
2013-06-27 1:06 ` Andi Kleen
2013-06-27 1:15 ` Waiman Long
2013-06-27 1:24 ` Waiman Long
2013-06-27 1:37 ` Andi Kleen
2013-06-27 14:56 ` Waiman Long
2013-06-28 13:46 ` Thomas Gleixner
2013-06-29 20:30 ` Waiman Long
2013-06-26 23:27 ` Thomas Gleixner
2013-06-26 23:06 ` Thomas Gleixner
2013-06-27 0:16 ` Waiman Long
2013-06-27 14:44 ` Thomas Gleixner
2013-06-29 21:03 ` Waiman Long
2013-06-27 0:26 ` Waiman Long
2013-06-29 17:45 ` Linus Torvalds
2013-06-29 20:23 ` Waiman Long
2013-06-29 21:34 ` Waiman Long
2013-06-29 22:11 ` Linus Torvalds
2013-06-29 22:34 ` Waiman Long
2013-06-29 21:58 ` Linus Torvalds
2013-06-29 22:47 ` Linus Torvalds
2013-07-01 13:40 ` Waiman Long
2013-06-26 17:43 ` [PATCH v2 2/2] dcache: Locklessly update d_count whenever possible Waiman Long