From: Linus Torvalds <torvalds@linux-foundation.org>
To: Guo Ren <guoren@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
Peter Zijlstra <peterz@infradead.org>,
linux-fsdevel@vger.kernel.org
Subject: Re: lockless case of retain_dentry() (was Re: [PATCH 09/15] fold the call of retain_dentry() into fast_dput())
Date: Wed, 22 Nov 2023 11:11:38 -0800 [thread overview]
Message-ID: <CAHk-=wj2ky85K5HYYLeLCP23qyTJpirnpiVSu5gWyT_GRXbJaQ@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wg6D_d-zaRfXZ=sUX1fbTJykQ4KxXCmEk3aq73wVk_ORA@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1952 bytes --]
On Wed, 22 Nov 2023 at 09:52, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Still not actually tested, but the code generation on x86 looks
> reasonable, so it migth be worth looking at whether it helps the
> RISC-V case.
Doing some more munging, and actually looking at RISC-V code
generation too (I obviously had to enable ARCH_USE_CMPXCHG_LOCKREF for
RISC-V).
I get this:
lockref_get:
addi sp,sp,-32
sd s0,16(sp)
sd s1,8(sp)
sd ra,24(sp)
addi s0,sp,32
li a1,65536
ld a5,0(a0)
mv s1,a0
addi a1,a1,-1
li a0,100
.L43:
sext.w a3,a5
li a4,1
srliw a2,a5,16
and a3,a3,a1
slli a4,a4,32
bne a2,a3,.L49
add a4,a5,a4
0:
lr.d a3, 0(s1)
bne a3, a5, 1f
sc.d.rl a2, a4, 0(s1)
bnez a2, 0b
fence rw, rw
1:
bne a5,a3,.L52
ld ra,24(sp)
ld s0,16(sp)
ld s1,8(sp)
addi sp,sp,32
jr ra
...
so now that single update is indeed just one single instruction:
add a4,a5,a4
is that "increment count in the high 32 bits".
The ticket lock being unlocked checks are those
li a1,65536
sext.w a3,a5
srliw a2,a5,16
and a3,a3,a1
bne a2,a3,.L49
instructions if I read it right.
That actually looks fairly close to optimal, although the frame setup
is kind of sad.
(The above does not include the "loop if the cmpxchg failed" part of
the code generation)
Anyway, apart from enabling LOCKREF, the patch to get this for RISC-V
is attached.
I'm not going to play with this any more, but you might want to check
whether this actually does work on RISC-V.
Becaue I only looked at the code generation, I didn't actually look at
whether it *worked*.
Linus
[-- Attachment #2: 0001-lockref-improve-code-generation-for-ref-updates.patch --]
[-- Type: text/x-patch, Size: 3125 bytes --]
From 168f35850c15468941e597907e33daacd179d54a Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 22 Nov 2023 09:33:29 -0800
Subject: [PATCH] lockref: improve code generation for ref updates
Our lockref data structure is two 32-bit words laid out next to each
other, combining the spinlock and the count into one entity that can be
accessed atomically together.
In particular, the structure is laid out so that the count is the upper
32 bit word (on little-endian), so that you can do basic arithmetic on
the count in 64 bits: instead of adding one to the 32-bit word, you can
just add a value shifted by 32 to the full 64-bit word.
Sadly, neither gcc nor clang are quite clever enough to work that out on
their own, so this does that "manually".
Also, try to do any compares against zero values, which generally
improves the code generation. So rather than check that the value was
at least 1 before a decrement, check that it's positive or zero after
the decrement. We don't worry about the overflow point in lockrefs.
Cc: Guo Ren <guoren@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
lib/lockref.c | 29 ++++++++++++++++++++---------
1 file changed, 20 insertions(+), 9 deletions(-)
diff --git a/lib/lockref.c b/lib/lockref.c
index 2afe4c5d8919..f3c30c538af1 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -26,6 +26,17 @@
} \
} while (0)
+/*
+ * The compiler isn't smart enough to the the count
+ * increment in the high 32 bits of the 64-bit value,
+ * so do this optimization by hand.
+ */
+#if defined(__LITTLE_ENDIAN) && BITS_PER_LONG == 64
+ #define LOCKREF_ADD(n,x) ((n).lock_count += (unsigned long)(x)<<32)
+#else
+ #define LOCKREF_ADD(n,x) ((n).count += (unsigned long)(x)<<32)
+#endif
+
#else
#define CMPXCHG_LOOP(CODE, SUCCESS) do { } while (0)
@@ -42,7 +53,7 @@
void lockref_get(struct lockref *lockref)
{
CMPXCHG_LOOP(
- new.count++;
+ LOCKREF_ADD(new,1);
,
return;
);
@@ -63,9 +74,9 @@ int lockref_get_not_zero(struct lockref *lockref)
int retval;
CMPXCHG_LOOP(
- new.count++;
if (old.count <= 0)
return 0;
+ LOCKREF_ADD(new,1);
,
return 1;
);
@@ -91,8 +102,8 @@ int lockref_put_not_zero(struct lockref *lockref)
int retval;
CMPXCHG_LOOP(
- new.count--;
- if (old.count <= 1)
+ LOCKREF_ADD(new,-1);
+ if (new.count <= 0)
return 0;
,
return 1;
@@ -119,8 +130,8 @@ EXPORT_SYMBOL(lockref_put_not_zero);
int lockref_put_return(struct lockref *lockref)
{
CMPXCHG_LOOP(
- new.count--;
- if (old.count <= 0)
+ LOCKREF_ADD(new,-1);
+ if (new.count < 0)
return -1;
,
return new.count;
@@ -137,8 +148,8 @@ EXPORT_SYMBOL(lockref_put_return);
int lockref_put_or_lock(struct lockref *lockref)
{
CMPXCHG_LOOP(
- new.count--;
- if (old.count <= 1)
+ LOCKREF_ADD(new,-1);
+ if (new.count <= 0)
break;
,
return 1;
@@ -174,9 +185,9 @@ int lockref_get_not_dead(struct lockref *lockref)
int retval;
CMPXCHG_LOOP(
- new.count++;
if (old.count < 0)
return 0;
+ LOCKREF_ADD(new,1);
,
return 1;
);
--
2.43.0.5.g38fb137bdb
next prev parent reply other threads:[~2023-11-22 19:11 UTC|newest]
Thread overview: 119+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-30 0:37 [RFC] simplifying fast_dput(), dentry_kill() et.al Al Viro
2023-10-30 21:53 ` Al Viro
2023-10-30 22:18 ` Linus Torvalds
2023-10-31 0:18 ` Al Viro
2023-10-31 1:53 ` Al Viro
2023-10-31 6:12 ` Al Viro
2023-11-01 6:18 ` Al Viro
2023-11-01 6:20 ` [PATCH 01/15] fast_dput(): having ->d_delete() is not reason to delay refcount decrement Al Viro
2023-11-01 6:20 ` [PATCH 02/15] fast_dput(): handle underflows gracefully Al Viro
2023-11-01 6:20 ` [PATCH 03/15] fast_dput(): new rules for refcount Al Viro
2023-11-01 6:20 ` [PATCH 04/15] __dput_to_list(): do decrement of refcount in the caller Al Viro
2023-11-01 6:20 ` [PATCH 05/15] retain_dentry(): lift decrement of ->d_count into callers Al Viro
2023-11-01 6:20 ` [PATCH 06/15] __dentry_kill(): get consistent rules for ->d_count Al Viro
2023-11-01 6:20 ` [PATCH 07/15] dentry_kill(): don't bother with retain_dentry() on slow path Al Viro
2023-11-01 6:20 ` [PATCH 08/15] Call retain_dentry() with refcount 0 Al Viro
2023-11-01 6:20 ` [PATCH 09/15] fold the call of retain_dentry() into fast_dput() Al Viro
2023-11-01 8:45 ` Al Viro
2023-11-01 17:30 ` Linus Torvalds
2023-11-01 18:19 ` Al Viro
2023-11-10 4:20 ` lockless case of retain_dentry() (was Re: [PATCH 09/15] fold the call of retain_dentry() into fast_dput()) Al Viro
2023-11-10 5:57 ` Linus Torvalds
2023-11-10 6:22 ` Linus Torvalds
2023-11-22 6:29 ` Guo Ren
2023-11-10 8:19 ` Al Viro
2023-11-22 7:19 ` Guo Ren
2023-11-22 17:20 ` Linus Torvalds
2023-11-22 17:52 ` Linus Torvalds
2023-11-22 18:05 ` Linus Torvalds
2023-11-22 19:11 ` Linus Torvalds [this message]
2023-11-29 7:14 ` Guo Ren
2023-11-29 12:25 ` Guo Ren
2023-11-29 14:42 ` Linus Torvalds
2023-11-26 16:39 ` Guo Ren
2023-11-26 16:51 ` Linus Torvalds
2023-11-30 10:00 ` Guo Ren
2023-12-01 1:09 ` Linus Torvalds
2023-12-01 3:36 ` Guo Ren
2023-12-01 5:15 ` Linus Torvalds
2023-12-01 7:31 ` Guo Ren
2023-11-26 16:51 ` Guo Ren
2023-11-26 17:06 ` Linus Torvalds
2023-11-26 17:59 ` Linus Torvalds
2023-11-29 9:52 ` Guo Ren
2023-11-01 6:20 ` [PATCH 10/15] don't try to cut corners in shrink_lock_dentry() Al Viro
2023-11-01 6:21 ` [PATCH 11/15] fold dentry_kill() into dput() Al Viro
2023-11-01 6:21 ` [PATCH 12/15] get rid of __dget() Al Viro
2023-11-01 6:21 ` [PATCH 13/15] shrink_dentry_list(): no need to check that dentry refcount is marked dead Al Viro
2023-11-01 6:21 ` [PATCH 14/15] to_shrink_list(): call only if refcount is 0 Al Viro
2023-11-01 6:21 ` [PATCH 15/15] switch select_collect{,2}() to use of to_shrink_list() Al Viro
2023-11-01 2:22 ` [RFC] simplifying fast_dput(), dentry_kill() et.al Al Viro
2023-11-01 14:29 ` Benjamin Coddington
2023-11-05 19:54 ` Al Viro
2023-11-05 21:59 ` Al Viro
2023-11-06 5:53 ` Al Viro
2023-11-07 2:08 ` Al Viro
2023-11-09 6:19 ` [RFC][PATCHSET v2] " Al Viro
2023-11-09 6:20 ` [PATCH 01/22] struct dentry: get rid of randomize_layout idiocy Al Viro
2023-11-09 6:20 ` [PATCH 02/22] switch nfsd_client_rmdir() to use of simple_recursive_removal() Al Viro
2023-11-09 13:42 ` Christian Brauner
2023-11-09 14:01 ` Chuck Lever
2023-11-09 18:47 ` Al Viro
2023-11-09 18:50 ` Chuck Lever III
2023-11-09 6:20 ` [PATCH 03/22] coda_flag_children(): cope with dentries turning negative Al Viro
2023-11-09 13:43 ` Christian Brauner
2023-11-09 6:20 ` [PATCH 04/22] dentry: switch the lists of children to hlist Al Viro
2023-11-09 13:48 ` Christian Brauner
2023-11-09 19:32 ` Al Viro
2023-11-09 6:20 ` [PATCH 05/22] centralize killing dentry from shrink list Al Viro
2023-11-09 13:49 ` Christian Brauner
2023-11-09 6:20 ` [PATCH 06/22] get rid of __dget() Al Viro
2023-11-09 13:50 ` Christian Brauner
2023-11-09 6:20 ` [PATCH 07/22] shrink_dentry_list(): no need to check that dentry refcount is marked dead Al Viro
2023-11-09 13:53 ` Christian Brauner
2023-11-09 20:28 ` Al Viro
2023-11-09 6:20 ` [PATCH 08/22] fast_dput(): having ->d_delete() is not reason to delay refcount decrement Al Viro
2023-11-09 13:58 ` Christian Brauner
2023-11-09 6:20 ` [PATCH 09/22] fast_dput(): handle underflows gracefully Al Viro
2023-11-09 14:46 ` Christian Brauner
2023-11-09 20:39 ` Al Viro
2023-11-09 6:20 ` [PATCH 10/22] fast_dput(): new rules for refcount Al Viro
2023-11-09 14:54 ` Christian Brauner
2023-11-09 20:52 ` Al Viro
2023-11-09 6:20 ` [PATCH 11/22] __dput_to_list(): do decrement of refcount in the callers Al Viro
2023-11-09 15:21 ` Christian Brauner
2023-11-09 6:20 ` [PATCH 12/22] Make retain_dentry() neutral with respect to refcounting Al Viro
2023-11-09 15:22 ` Christian Brauner
2023-11-09 6:20 ` [PATCH 13/22] __dentry_kill(): get consistent rules for victim's refcount Al Viro
2023-11-09 15:27 ` Christian Brauner
2023-11-09 6:20 ` [PATCH 14/22] dentry_kill(): don't bother with retain_dentry() on slow path Al Viro
2023-11-09 15:53 ` Christian Brauner
2023-11-09 21:29 ` Al Viro
2023-11-09 6:20 ` [PATCH 15/22] Call retain_dentry() with refcount 0 Al Viro
2023-11-09 16:09 ` Christian Brauner
2023-11-09 6:20 ` [PATCH 16/22] fold the call of retain_dentry() into fast_dput() Al Viro
2023-11-09 16:17 ` Christian Brauner
2023-11-09 6:20 ` [PATCH 17/22] don't try to cut corners in shrink_lock_dentry() Al Viro
2023-11-09 17:20 ` Christian Brauner
2023-11-09 21:45 ` Al Viro
2023-11-10 9:07 ` Christian Brauner
2023-11-09 17:39 ` Linus Torvalds
2023-11-09 18:11 ` Linus Torvalds
2023-11-09 18:20 ` Al Viro
2023-11-09 6:20 ` [PATCH 18/22] fold dentry_kill() into dput() Al Viro
2023-11-09 17:22 ` Christian Brauner
2023-11-09 6:20 ` [PATCH 19/22] to_shrink_list(): call only if refcount is 0 Al Viro
2023-11-09 17:29 ` Christian Brauner
2023-11-09 6:20 ` [PATCH 20/22] switch select_collect{,2}() to use of to_shrink_list() Al Viro
2023-11-09 17:31 ` Christian Brauner
2023-11-09 6:20 ` [PATCH 21/22] d_prune_aliases(): use a shrink list Al Viro
2023-11-09 17:33 ` Christian Brauner
2023-11-09 6:20 ` [PATCH 22/22] __dentry_kill(): new locking scheme Al Viro
2023-11-10 13:34 ` Christian Brauner
2023-11-09 13:33 ` [PATCH 01/22] struct dentry: get rid of randomize_layout idiocy Christian Brauner
2023-10-31 2:25 ` [RFC] simplifying fast_dput(), dentry_kill() et.al Gao Xiang
2023-10-31 2:29 ` Gao Xiang
2023-10-31 3:02 ` Linus Torvalds
2023-10-31 3:13 ` Gao Xiang
2023-10-31 3:26 ` Al Viro
2023-10-31 3:41 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHk-=wj2ky85K5HYYLeLCP23qyTJpirnpiVSu5gWyT_GRXbJaQ@mail.gmail.com' \
--to=torvalds@linux-foundation.org \
--cc=guoren@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).