From: Jeff Layton <jlayton@kernel.org>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Josef Bacik <josef@toxicpanda.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
Jeff Layton <jlayton@kernel.org>
Subject: [PATCH RFC 3/4] lockref: rework CMPXCHG_LOOP to handle contention better
Date: Fri, 02 Aug 2024 17:45:04 -0400
Message-ID: <20240802-openfast-v1-3-a1cff2a33063@kernel.org>
In-Reply-To: <20240802-openfast-v1-0-a1cff2a33063@kernel.org>
In a later patch, we want to change the open(..., O_CREAT) codepath to
avoid taking the inode->i_rwsem for write when the dentry already exists.
When we tested that initially, performance degraded significantly
due to contention on the parent's d_lockref spinlock.
There are two problems with lockrefs today: First, once one task takes
the spinlock, every concurrent task ends up taking it as well, which is
much more costly than a single cmpxchg operation. Second, once a task
has failed its cmpxchg 100 times, it falls back to the spinlock. The
upshot is that even moderate contention causes a fallback to serialized
spinlocking, which worsens performance.
This patch changes CMPXCHG_LOOP in two ways:
First, when the spinlock is held, spin instead of falling back to the
locked codepath. Once the lock is released, the task continues its
cmpxchg loop as before rather than taking the lock. Second, don't give
up after 100 retries; keep retrying indefinitely.
This greatly reduces contention on the lockref when there are large
numbers of concurrent increments and decrements occurring.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
lib/lockref.c | 85 ++++++++++++++++++++++-------------------------------------
1 file changed, 32 insertions(+), 53 deletions(-)
diff --git a/lib/lockref.c b/lib/lockref.c
index 2afe4c5d8919..b76941043fe9 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -8,22 +8,25 @@
* Note that the "cmpxchg()" reloads the "old" value for the
* failure case.
*/
-#define CMPXCHG_LOOP(CODE, SUCCESS) do { \
- int retry = 100; \
- struct lockref old; \
- BUILD_BUG_ON(sizeof(old) != 8); \
- old.lock_count = READ_ONCE(lockref->lock_count); \
- while (likely(arch_spin_value_unlocked(old.lock.rlock.raw_lock))) { \
- struct lockref new = old; \
- CODE \
- if (likely(try_cmpxchg64_relaxed(&lockref->lock_count, \
- &old.lock_count, \
- new.lock_count))) { \
- SUCCESS; \
- } \
- if (!--retry) \
- break; \
- } \
+#define CMPXCHG_LOOP(CODE, SUCCESS) do { \
+ struct lockref old; \
+ BUILD_BUG_ON(sizeof(old) != 8); \
+ old.lock_count = READ_ONCE(lockref->lock_count); \
+ for (;;) { \
+ struct lockref new = old; \
+ \
+ if (likely(arch_spin_value_unlocked(old.lock.rlock.raw_lock))) { \
+ CODE \
+ if (likely(try_cmpxchg64_relaxed(&lockref->lock_count, \
+ &old.lock_count, \
+ new.lock_count))) { \
+ SUCCESS; \
+ } \
+ } else { \
+ cpu_relax(); \
+ old.lock_count = READ_ONCE(lockref->lock_count); \
+ } \
+ } \
} while (0)
#else
@@ -46,10 +49,8 @@ void lockref_get(struct lockref *lockref)
,
return;
);
-
- spin_lock(&lockref->lock);
- lockref->count++;
- spin_unlock(&lockref->lock);
+ /* should never get here */
+ WARN_ON_ONCE(1);
}
EXPORT_SYMBOL(lockref_get);
@@ -60,8 +61,6 @@ EXPORT_SYMBOL(lockref_get);
*/
int lockref_get_not_zero(struct lockref *lockref)
{
- int retval;
-
CMPXCHG_LOOP(
new.count++;
if (old.count <= 0)
@@ -69,15 +68,9 @@ int lockref_get_not_zero(struct lockref *lockref)
,
return 1;
);
-
- spin_lock(&lockref->lock);
- retval = 0;
- if (lockref->count > 0) {
- lockref->count++;
- retval = 1;
- }
- spin_unlock(&lockref->lock);
- return retval;
+ /* should never get here */
+ WARN_ON_ONCE(1);
+ return -1;
}
EXPORT_SYMBOL(lockref_get_not_zero);
@@ -88,8 +81,6 @@ EXPORT_SYMBOL(lockref_get_not_zero);
*/
int lockref_put_not_zero(struct lockref *lockref)
{
- int retval;
-
CMPXCHG_LOOP(
new.count--;
if (old.count <= 1)
@@ -97,15 +88,9 @@ int lockref_put_not_zero(struct lockref *lockref)
,
return 1;
);
-
- spin_lock(&lockref->lock);
- retval = 0;
- if (lockref->count > 1) {
- lockref->count--;
- retval = 1;
- }
- spin_unlock(&lockref->lock);
- return retval;
+ /* should never get here */
+ WARN_ON_ONCE(1);
+ return -1;
}
EXPORT_SYMBOL(lockref_put_not_zero);
@@ -125,6 +110,8 @@ int lockref_put_return(struct lockref *lockref)
,
return new.count;
);
+ /* should never get here */
+ WARN_ON_ONCE(1);
return -1;
}
EXPORT_SYMBOL(lockref_put_return);
@@ -171,8 +158,6 @@ EXPORT_SYMBOL(lockref_mark_dead);
*/
int lockref_get_not_dead(struct lockref *lockref)
{
- int retval;
-
CMPXCHG_LOOP(
new.count++;
if (old.count < 0)
@@ -180,14 +165,8 @@ int lockref_get_not_dead(struct lockref *lockref)
,
return 1;
);
-
- spin_lock(&lockref->lock);
- retval = 0;
- if (lockref->count >= 0) {
- lockref->count++;
- retval = 1;
- }
- spin_unlock(&lockref->lock);
- return retval;
+ /* should never get here */
+ WARN_ON_ONCE(1);
+ return -1;
}
EXPORT_SYMBOL(lockref_get_not_dead);
--
2.45.2