public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fs/xfs: Use atomic64_try_cmpxchg in xlog_grant_{add,sub}_space
@ 2022-08-09 16:56 Uros Bizjak
  2022-08-09 22:05 ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Uros Bizjak @ 2022-08-09 16:56 UTC (permalink / raw)
  To: linux-xfs, linux-kernel; +Cc: Uros Bizjak, Darrick J. Wong

Use `!atomic64_try_cmpxchg(ptr, &old, new)` instead of
`atomic64_cmpxchg(ptr, old, new) != old` in xlog_grant_{add,sub}_space.
This has two benefits:

- The x86 cmpxchg instruction returns success in the ZF flag, so this
  change saves a compare after cmpxchg, as well as a related move
  instruction in the front of cmpxchg.

- atomic64_try_cmpxchg implicitly assigns the *ptr value to &old when
  cmpxchg fails, enabling further code simplifications.

This patch has no functional change.

Cc: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 fs/xfs/xfs_log.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 4b1c0a9c6368..92e39873d09e 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -148,7 +148,7 @@ xlog_grant_sub_space(
 	int			bytes)
 {
 	int64_t	head_val = atomic64_read(head);
-	int64_t new, old;
+	int64_t new;
 
 	do {
 		int	cycle, space;
@@ -161,10 +161,9 @@ xlog_grant_sub_space(
 			cycle--;
 		}
 
-		old = head_val;
 		new = xlog_assign_grant_head_val(cycle, space);
-		head_val = atomic64_cmpxchg(head, old, new);
-	} while (head_val != old);
+
+	} while (!atomic64_try_cmpxchg(head, &head_val, new));
 }
 
 static void
@@ -174,7 +173,7 @@ xlog_grant_add_space(
 	int			bytes)
 {
 	int64_t	head_val = atomic64_read(head);
-	int64_t new, old;
+	int64_t new;
 
 	do {
 		int		tmp;
@@ -190,10 +189,9 @@ xlog_grant_add_space(
 			cycle++;
 		}
 
-		old = head_val;
 		new = xlog_assign_grant_head_val(cycle, space);
-		head_val = atomic64_cmpxchg(head, old, new);
-	} while (head_val != old);
+
+	} while (!atomic64_try_cmpxchg(head, &head_val, new));
 }
 
 STATIC void
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] fs/xfs: Use atomic64_try_cmpxchg in xlog_grant_{add,sub}_space
  2022-08-09 16:56 [PATCH] fs/xfs: Use atomic64_try_cmpxchg in xlog_grant_{add,sub}_space Uros Bizjak
@ 2022-08-09 22:05 ` Dave Chinner
  2022-08-09 23:02   ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2022-08-09 22:05 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: linux-xfs, linux-kernel, Darrick J. Wong

On Tue, Aug 09, 2022 at 06:56:15PM +0200, Uros Bizjak wrote:
> Use `!atomic64_try_cmpxchg(ptr, &old, new)` instead of
> `atomic64_cmpxchg(ptr, old, new) != old` in xlog_grant_{add,sub}_space.
> This has two benefits:
> 
> - The x86 cmpxchg instruction returns success in the ZF flag, so this
>   change saves a compare after cmpxchg, as well as a related move
>   instruction in the front of cmpxchg.
> 
> - atomic64_try_cmpxchg implicitly assigns the *ptr value to &old when
>   cmpxchg fails, enabling further code simplifications.

Do the two cmpxchg operations have the same memory ordering
semantics on failure?

> This patch has no functional change.

The patch looks ok, but ....

... I'm about 2 hours away from posting a patchset that completely
removes the cmpxchg and the new grant head accounting has
significantly lower fast path overhead. It also opens the door for
tracking more than 2GB of log space in the grant heads.

So I don't think we need to be micro-optimising this code given that
there are much bigger perf gains about to land...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] fs/xfs: Use atomic64_try_cmpxchg in xlog_grant_{add,sub}_space
  2022-08-09 22:05 ` Dave Chinner
@ 2022-08-09 23:02   ` Dave Chinner
  2022-08-10  7:02     ` Uros Bizjak
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2022-08-09 23:02 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: linux-xfs, linux-kernel, Darrick J. Wong

On Wed, Aug 10, 2022 at 08:05:11AM +1000, Dave Chinner wrote:
> On Tue, Aug 09, 2022 at 06:56:15PM +0200, Uros Bizjak wrote:
> > Use `!atomic64_try_cmpxchg(ptr, &old, new)` instead of
> > `atomic64_cmpxchg(ptr, old, new) != old` in xlog_grant_{add,sub}_space.
> > This has two benefits:
> > 
> > - The x86 cmpxchg instruction returns success in the ZF flag, so this
> >   change saves a compare after cmpxchg, as well as a related move
> >   instruction in the front of cmpxchg.
> > 
> > - atomic64_try_cmpxchg implicitly assigns the *ptr value to &old when
> >   cmpxchg fails, enabling further code simplifications.
> 
> Do the two cmpxchg operations have the same memory ordering
> semantics on failure?
> 
> > This patch has no functional change.
> 
> The patch looks ok, but ....
> 
> ... I'm about 2 hours away from posting a patchset that completely
> removes the cmpxchg and the new grant head accounting has
> significantly lower fast path overhead. It also opens the door for
> tracking more than 2GB of log space in the grant heads.

FYI, the original RFC for this was posted a bit over a month ago:

https://lore.kernel.org/linux-xfs/20220708015558.1134330-1-david@fromorbit.com/

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] fs/xfs: Use atomic64_try_cmpxchg in xlog_grant_{add,sub}_space
  2022-08-09 23:02   ` Dave Chinner
@ 2022-08-10  7:02     ` Uros Bizjak
  2022-08-10 22:56       ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Uros Bizjak @ 2022-08-10  7:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, linux-kernel, Darrick J. Wong

On Wed, Aug 10, 2022 at 1:02 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Aug 10, 2022 at 08:05:11AM +1000, Dave Chinner wrote:
> > On Tue, Aug 09, 2022 at 06:56:15PM +0200, Uros Bizjak wrote:
> > > Use `!atomic64_try_cmpxchg(ptr, &old, new)` instead of
> > > `atomic64_cmpxchg(ptr, old, new) != old` in xlog_grant_{add,sub}_space.
> > > This has two benefits:
> > >
> > > - The x86 cmpxchg instruction returns success in the ZF flag, so this
> > >   change saves a compare after cmpxchg, as well as a related move
> > >   instruction in the front of cmpxchg.
> > >
> > > - atomic64_try_cmpxchg implicitly assigns the *ptr value to &old when
> > >   cmpxchg fails, enabling further code simplifications.
> >
> > Do the two cmpxchg operations have the same memory ordering
> > semantics on failure?

Yes. The API also provides _acquire, _release and _relaxed variants of
both, atomic64_cmpxchg and atomic64_try_cmpxchg. On x86, these two
functions actually compile to the same CMPXCHG instruction, the
difference is only in how the comparison is handled:

      15:    48 09 c2                 or     %rax,%rdx
      18:    48 89 c8                 mov    %rcx,%rax
      1b:    f0 48 0f b1 16           lock cmpxchg %rdx,(%rsi)
      20:    48 39 c1                 cmp    %rax,%rcx
      23:    74 2a                    je     4f <xlog_grant_add_space+0x4f>

becomes:

     29c:    48 09 ca                 or     %rcx,%rdx
     29f:    f0 48 0f b1 16           lock cmpxchg %rdx,(%rsi)
     2a4:    75 d2                    jne    278 <xlog_grant_add_space+0x8>

And as demonstrated in [1], even the fallback code compiles to a
better assembly.

[1] https://lore.kernel.org/lkml/CAFULd4bc54+_FmJ=f++zzz99mR8r5c11-Y49pz86Yb8G3dyJpA@mail.gmail.com/

> > > This patch has no functional change.
> >
> > The patch looks ok, but ....
> >
> > ... I'm about 2 hours away from posting a patchset that completely
> > removes the cmpxchg and the new grant head accounting has
> > significantly lower fast path overhead. It also opens the door for
> > tracking more than 2GB of log space in the grant heads.
>
> FYI, the original RFC for this was posted a bit over a month ago:
>
> https://lore.kernel.org/linux-xfs/20220708015558.1134330-1-david@fromorbit.com/

-static void
+void
 xlog_grant_sub_space(

[...]

- old = head_val;
- new = xlog_assign_grant_head_val(cycle, space);
- head_val = atomic64_cmpxchg(&head->grant, old, new);
- } while (head_val != old);
+ atomic64_sub(bytes, &head->grant);
 }

I actually wondered why these two functions were not implemented as
atomic64_{add,sub}. Of course, this is a much better solution that
renders the proposed patch obsolete.

BR,
Uros.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] fs/xfs: Use atomic64_try_cmpxchg in xlog_grant_{add,sub}_space
  2022-08-10  7:02     ` Uros Bizjak
@ 2022-08-10 22:56       ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2022-08-10 22:56 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: linux-xfs, linux-kernel, Darrick J. Wong

On Wed, Aug 10, 2022 at 09:02:16AM +0200, Uros Bizjak wrote:
> On Wed, Aug 10, 2022 at 1:02 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Wed, Aug 10, 2022 at 08:05:11AM +1000, Dave Chinner wrote:
> > > On Tue, Aug 09, 2022 at 06:56:15PM +0200, Uros Bizjak wrote:
> > > > Use `!atomic64_try_cmpxchg(ptr, &old, new)` instead of
> > > > `atomic64_cmpxchg(ptr, old, new) != old` in xlog_grant_{add,sub}_space.
> > > > This has two benefits:
> > > >
> > > > - The x86 cmpxchg instruction returns success in the ZF flag, so this
> > > >   change saves a compare after cmpxchg, as well as a related move
> > > >   instruction in the front of cmpxchg.
> > > >
> > > > - atomic64_try_cmpxchg implicitly assigns the *ptr value to &old when
> > > >   cmpxchg fails, enabling further code simplifications.
> > >
> > > Do the two cmpxchg operations have the same memory ordering
> > > semantics on failure?
> 
> Yes. The API also provides _acquire, _release and _relaxed variants of
> both, atomic64_cmpxchg and atomic64_try_cmpxchg.

Yes, I know this, which was why I was asking if the default
behaviour is the same - many people get caught out by assuming that
cmpxchg implies release semantics even if it fails....

> On x86, these two
> functions actually compile to the same CMPXCHG instruction, the
> difference is only in how the comparison is handled:
> 
>       15:    48 09 c2                 or     %rax,%rdx
>       18:    48 89 c8                 mov    %rcx,%rax
>       1b:    f0 48 0f b1 16           lock cmpxchg %rdx,(%rsi)
>       20:    48 39 c1                 cmp    %rax,%rcx
>       23:    74 2a                    je     4f <xlog_grant_add_space+0x4f>
> 
> becomes:
> 
>      29c:    48 09 ca                 or     %rcx,%rdx
>      29f:    f0 48 0f b1 16           lock cmpxchg %rdx,(%rsi)
>      2a4:    75 d2                    jne    278 <xlog_grant_add_space+0x8>
> 
> And as demonstrated in [1], even the fallback code compiles to a
> better assembly.o

FWIW, I mostly don't care about assembler level optimisations for
the code I write. I'll try to write code efficiently, but I don't
really care that much for micro-optimisation. Fundamentally, 
focussing on optimising code down to the instruction level means you
are not looking for algorithmic optimisations, which is where all
the big gains typically come from....

This thread demonstrates that - I get an improvement from roughly
1.4 million transactions/s to roughly 2 million transactions/s with
the change to the grant head accounting algorithm, whilst the
improvement from removing 2 instructions from the cmpxchg can't
actually be measured on my tests. i.e. the improvement is lost
within the noise floor of the benchmarks.

So, yeah, when it comes to making code faster, I focus on efficient
algorithms rather than efficient code because algorithms are where
the gains users will notice are....

> > FYI, the original RFC for this was posted a bit over a month ago:
> >
> > https://lore.kernel.org/linux-xfs/20220708015558.1134330-1-david@fromorbit.com/
> 
> -static void
> +void
>  xlog_grant_sub_space(
> 
> [...]
> 
> - old = head_val;
> - new = xlog_assign_grant_head_val(cycle, space);
> - head_val = atomic64_cmpxchg(&head->grant, old, new);
> - } while (head_val != old);
> + atomic64_sub(bytes, &head->grant);
>  }
> 
> I actually wondered why these two functions were not implemented as
> atomic64_{add,sub}.

Because the grant heads were not integer values that can be added
and subtracted. Log sequence numbers (LSNs) are 64 bit objects made
up of two discrete 32 bit values. Essentially the upper 32 bits
counts the number of overflows of the fixed size space the lower 32
bits accounts.

The lower 32 bits matches the size of the journal, so will overflow
at some boundary much lower than 2^32. Hence adding or subtracting
to a LSN has to handle the space overflow/underflow itself to
modify the overflow (cycle) counter top 32 bits appropriately.

These calculations cannot be done as a single atomic operation,
hence the crack/calc/combine/cmpxchg loops to enable them to be done
without requiring locks in the fast path.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-08-10 22:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-09 16:56 [PATCH] fs/xfs: Use atomic64_try_cmpxchg in xlog_grant_{add,sub}_space Uros Bizjak
2022-08-09 22:05 ` Dave Chinner
2022-08-09 23:02   ` Dave Chinner
2022-08-10  7:02     ` Uros Bizjak
2022-08-10 22:56       ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox