All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	David Howells <dhowells@redhat.com>,
	akpm@linux-foundation.org, paulus@samba.org, arnd@arndb.de,
	linux-kernel@vger.kernel.org
Subject: [PATCH] x86: atomic64_t: _cmpxchg() & _read() optimizations
Date: Fri, 03 Jul 2009 14:26:33 +0200	[thread overview]
Message-ID: <4A4DF8F9.9030503@gmail.com> (raw)
In-Reply-To: <20090703120159.GB7161@elte.hu>

Ingo Molnar a écrit :
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
>> On Thu, 2 Jul 2009, Eric Dumazet wrote:
>>> Using a fixed initial value (instead of __atomic64_read()) is even faster, 
>>> it apparently permits cpu to use an appropriate bus transaction.
>> Yeah, I guess it does a "read-for-write-ownership" and allows the 
>> thing to be done as a single cache transaction.
>>
>> If we read it first, it will first get the cacheline for 
>> shared-read, and then the cmpxchg8b will need to turn it from 
>> shared to exclusive.
>>
>> Of course, the _optimal_ situation would be if the cmpxchg8b 
>> didn't actually do the write at all when the value matches (and 
>> all cores could just keep it shared), but I guess that's not going 
>> to happen.
>>
>> Too bad there is no pure 8-byte read op. Using MMX has too many 
>> downsides.
>>
>> Btw, your numbers imply that for the atomic64_add_return(), we 
>> really would be much better off not reading the original value at 
>> all. Again, in that case, we really do want the 
>> "read-for-write-ownership" cache transaction, not a read.
> 
> Something like the patch below?
> 
> Please review it carefully, as the perfcounter exposure to the 
> conditional-arithmetics atomic64_t APIs is very low:
> 
>   earth4:~/tip> for N in $(git grep atomic64_ | grep perf_ |
>     sed 's/(/ /g'); do echo $N; done | grep ^atomic64_ | sort | uniq -c | sort -n
> 
>       1 atomic64_add_negative
>       1 atomic64_inc_return
>       2 atomic64_xchg
>       3 atomic64_cmpxchg
>       3 atomic64_sub
>       7 atomic64_t
>      11 atomic64_add
>      21 atomic64_set
>      22 atomic64_read
> 
> So while i have tested it on a 32-bit box, it's only lightly tested 
> (and possibly broken) due to the low exposure of the API.
> 
> Thanks,
> 
> 	Ingo
> 
> ----------------------->
> Subject: x86: atomic64_t: Improve atomic64_add_return()
> From: Ingo Molnar <mingo@elte.hu>
> Date: Fri Jul 03 12:39:07 CEST 2009
> 
> Linus noted (based on Eric Dumazet's numbers) that we would
> probably be better off not trying an atomic_read() in
> atomic64_add_return() but intead intentionally let the first
> cmpxchg8b fail - to get a cache-friendly 'give me ownership
> of this cacheline' transaction. That can then be followed
> by the real cmpxchg8b which sets the value local to the CPU.
> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/x86/lib/atomic64_32.c |   15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> Index: linux/arch/x86/lib/atomic64_32.c
> ===================================================================
> --- linux.orig/arch/x86/lib/atomic64_32.c
> +++ linux/arch/x86/lib/atomic64_32.c
> @@ -76,13 +76,22 @@ u64 atomic64_read(atomic64_t *ptr)
>   */
>  u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
>  {
> -	u64 old_val, new_val;
> +	/*
> +	 * Try first with a (probably incorrect) assumption about
> +	 * what we have there. We'll do two loops most likely,
> +	 * but we'll get an ownership MESI transaction straight away
> +	 * instead of a read transaction followed by a
> +	 * flush-for-ownership transaction:
> +	 */
> +	u64 old_val, new_val, real_val = 1ULL << 32;
>  
>  	do {
> -		old_val = atomic_read(ptr);
> +		old_val = real_val;
>  		new_val = old_val + delta;
>  
> -	} while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);
> +		real_val = atomic64_cmpxchg(ptr, old_val, new_val);
> +
> +	} while (real_val != old_val);
>  
>  	return new_val;
>  }

Seems cool, I am going to apply it and test it too.

I cooked following patch, to address _cmpxchg() and _read().

[PATCH] x86: atomic64_t: _cmpxchg() & _read() optimizations

atomic64_cmpxchg() can use existing optimized __cmpxchg64() function,
no need to redefine another cmpxchg8b() variant (BTW this variant had
aliasing problems)

As suggested by Linus, atomic64_read() doesnt have to loop.

We can implement it with a single cmpxchg8b instruction, and avoid
initial reading of the value so that cpu can use a "read-for-write-ownership"
bus transaction.

We do not call __cmpxchg64() here because ebx/ecx values are not changed,
lowering register pressure for the compiler.

This saves 1008 bytes of code on vmlinux (CONFIG_PERF_COUNTERS=y)

Performance results (with a user mode program doing 10.000.000 atomic64_read()
on 2x4 cpus, all fighting on same atomic64_t location)

before :

 Performance counter stats for './atomic64_slow':

   62897.582084  task-clock-msecs         #      7.670 CPUs
            186  context-switches         #      0.000 M/sec
              4  CPU-migrations           #      0.000 M/sec
            132  page-faults              #      0.000 M/sec
   188218718344  cycles                   #   2992.463 M/sec
     1017372948  instructions             #      0.005 IPC
      132425024  cache-references         #      2.105 M/sec
      109006338  cache-misses             #      1.733 M/sec

    8.200718697  seconds time elapsed

(2352 cycles per atomic64_read())

after :
 Performance counter stats for './atomic64_fast':

   10912.935654  task-clock-msecs         #      6.663 CPUs
             41  context-switches         #      0.000 M/sec
              4  CPU-migrations           #      0.000 M/sec
            132  page-faults              #      0.000 M/sec
    32657177400  cycles                   #   2992.520 M/sec
      838338807  instructions             #      0.026 IPC
       41855076  cache-references         #      3.835 M/sec
       36402586  cache-misses             #      3.336 M/sec

    1.637767671  seconds time elapsed

(408 cycles per atomic64_read())

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/x86/include/asm/atomic_32.h |   35 +++++++++--------------------
 1 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index 2503d4e..14c9e9c 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -265,29 +265,10 @@ typedef struct {
 #define __atomic64_read(ptr)		((ptr)->counter)
 
 static inline unsigned long long
-cmpxchg8b(unsigned long long *ptr, unsigned long long old, unsigned long long new)
-{
-	asm volatile(
-
-		LOCK_PREFIX "cmpxchg8b (%[ptr])\n"
-
-		     :		"=A" (old)
-
-		     : [ptr]	"D" (ptr),
-				"A" (old),
-				"b" (ll_low(new)),
-				"c" (ll_high(new))
-
-		     : "memory");
-
-	return old;
-}
-
-static inline unsigned long long
 atomic64_cmpxchg(atomic64_t *ptr, unsigned long long old_val,
 		 unsigned long long new_val)
 {
-	return cmpxchg8b(&ptr->counter, old_val, new_val);
+	return __cmpxchg64(&ptr->counter, old_val, new_val);
 }
 
 /**
@@ -333,9 +314,17 @@ static inline unsigned long long atomic64_read(atomic64_t *ptr)
 {
 	unsigned long long curr_val;
 
-	do {
-		curr_val = __atomic64_read(ptr);
-	} while (atomic64_cmpxchg(ptr, curr_val, curr_val) != curr_val);
+	/*
+	 * "+A" constraint to make sure gcc wont use eax or edx
+	 * as a base or offset register for address computation
+	 */
+	asm volatile(
+		"mov\t%%ebx, %%eax\n\t"
+		"mov\t%%ecx, %%edx\n\t"
+		LOCK_PREFIX "cmpxchg8b %1\n"
+			: "+A" (curr_val)
+			: "m" (*__xg(ptr))
+		);
 
 	return curr_val;
 }


  reply	other threads:[~2009-07-03 12:27 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-30 21:24 [PATCH] FRV: Wire up new syscalls David Howells
2009-06-30 21:34 ` Ingo Molnar
2009-06-30 21:41   ` Arnd Bergmann
2009-06-30 21:54     ` Ingo Molnar
2009-07-01 11:28       ` David Howells
2009-07-01 11:54         ` Ingo Molnar
2009-07-01 12:19           ` David Howells
2009-07-01 12:36             ` Paul Mackerras
2009-07-01 12:41               ` David Howells
2009-07-01 13:13                 ` Ingo Molnar
2009-07-01 14:10                   ` David Howells
2009-07-01 14:49                     ` Ingo Molnar
2009-07-01 15:19                       ` David Howells
2009-07-01 16:47                       ` [PATCH 1/2] FRV: Implement atomic64_t David Howells
2009-07-01 17:20                         ` Linus Torvalds
2009-07-01 17:33                           ` David Howells
2009-07-01 21:11                           ` Ingo Molnar
2009-07-01 22:57                           ` [PATCH] x86: Code atomic(64)_read and atomic(64)_set in C not CPP [was Re: FRV: Implement atomic64_t] Paul Mackerras
2009-07-02  7:21                             ` [tip:x86/urgent] x86: Code atomic(64)_read and atomic(64)_set in C not CPP tip-bot for Paul Mackerras
2009-07-02  7:21                             ` [PATCH] x86: Code atomic(64)_read and atomic(64)_set in C not CPP [was Re: FRV: Implement atomic64_t] Ingo Molnar
2009-07-01 23:46                           ` [PATCH 1/2] FRV: Implement atomic64_t [ver #2] David Howells
2009-07-01 23:46                           ` [PATCH 2/2] FRV: Add basic performance counter support " David Howells
2009-07-01 23:48                           ` [PATCH 1/2] FRV: Implement atomic64_t David Howells
2009-07-02 21:10                           ` Eric Dumazet
2009-07-02 21:28                             ` Linus Torvalds
2009-07-02 22:08                               ` [PATCH] x86: atomic64_t should be 8 bytes aligned Eric Dumazet
2009-07-02 23:53                                 ` Linus Torvalds
2009-07-03  6:14                                   ` Ingo Molnar
2009-07-03 12:42                                   ` [tip:perfcounters/urgent] x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too tip-bot for Eric Dumazet
2009-07-03 16:58                                     ` Linus Torvalds
2009-07-03 17:49                                       ` H. Peter Anvin
2009-07-03 12:42                                   ` [tip:perfcounters/urgent] x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file tip-bot for Ingo Molnar
2009-07-03 16:47                                     ` Linus Torvalds
2009-07-03 18:31                                       ` [tip:perfcounters/urgent] x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative() tip-bot for Ingo Molnar
2009-07-03 19:18                                       ` tip-bot for Ingo Molnar
2009-07-04  0:05                                     ` [tip:perfcounters/urgent] x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file Paul Mackerras
2009-07-05 11:25                                       ` Ingo Molnar
2009-07-03 12:43                                   ` [tip:perfcounters/urgent] x86: atomic64: Improve atomic64_read() tip-bot for Eric Dumazet
2009-07-03 12:43                                   ` [tip:perfcounters/urgent] x86: atomic64: Improve cmpxchg8b() tip-bot for Eric Dumazet
2009-07-03 12:43                                   ` [tip:perfcounters/urgent] x86: atomic64: Improve atomic64_add_return() tip-bot for Ingo Molnar
2009-07-03 12:43                                   ` [tip:perfcounters/urgent] x86: atomic64: Reduce size of functions tip-bot for Ingo Molnar
2009-07-03 12:44                                   ` [tip:perfcounters/urgent] x86: atomic64: Fix unclean type use in atomic64_xchg() tip-bot for Ingo Molnar
2009-07-03 17:02                                     ` Linus Torvalds
2009-07-03 18:00                                       ` Ingo Molnar
2009-07-03 12:44                                   ` [tip:perfcounters/urgent] x86: atomic64: Improve atomic64_read() tip-bot for Eric Dumazet
2009-07-03 14:50                                     ` [PATCH -tip] x86: atomic64: inline atomic64_read() Eric Dumazet
2009-07-03 18:04                                       ` Ingo Molnar
2009-07-03 18:10                                         ` Arjan van de Ven
2009-07-03 18:18                                           ` Ingo Molnar
2009-07-03 18:25                                             ` Andi Kleen
2009-07-03 18:30                                             ` Arjan van de Ven
2009-07-03 18:43                                               ` Ingo Molnar
2009-07-03 18:24                                           ` Andi Kleen
2009-07-03 18:31                                           ` [tip:perfcounters/urgent] x86: atomic64: Optimize CMPXCHG8B sequences to not use the LOCK prefix tip-bot for Ingo Molnar
2009-07-03 18:45                                             ` Ingo Molnar
2009-07-03 19:10                                         ` [PATCH -tip] x86: atomic64: inline atomic64_read() Linus Torvalds
2009-07-03 19:17                                           ` Ingo Molnar
2009-07-03 19:38                                             ` Linus Torvalds
2009-07-03 21:40                                               ` Ingo Molnar
2009-07-03 18:31                                       ` [tip:perfcounters/urgent] x86: atomic64: Inline atomic64_read() again tip-bot for Eric Dumazet
2009-07-03 19:18                                       ` tip-bot for Eric Dumazet
2009-07-04  9:49                                       ` tip-bot for Eric Dumazet
2009-07-03 12:44                                   ` [tip:perfcounters/urgent] x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP tip-bot for Paul Mackerras
2009-07-03 12:48                                   ` tip-bot for Paul Mackerras
2009-07-03 12:48                                   ` [tip:perfcounters/urgent] x86: atomic64: Improve atomic64_read() tip-bot for Eric Dumazet
2009-07-03 15:33                                   ` [tip:perfcounters/urgent] x86: atomic64: Export APIs to modules tip-bot for Ingo Molnar
2009-07-03 18:30                                     ` tip-bot for Ingo Molnar
2009-07-03 18:30                                     ` [tip:perfcounters/urgent] x86: atomic64: Improve atomic64_xchg() tip-bot for Ingo Molnar
2009-07-03 12:01                               ` [patch] x86: atomic64_t: Improve atomic64_add_return() Ingo Molnar
2009-07-03 12:26                                 ` Eric Dumazet [this message]
2009-07-03 12:40                                   ` [PATCH] x86: atomic64_t: _cmpxchg() & _read() optimizations Ingo Molnar
2009-07-03 17:38                                 ` [patch] x86: atomic64_t: Improve atomic64_add_return() Linus Torvalds
2009-07-03  6:05                             ` [PATCH 1/2] FRV: Implement atomic64_t Eric Dumazet
2009-07-03 12:27                               ` Ingo Molnar
2009-07-03 12:39                                 ` Eric Dumazet
2009-07-03 11:17                             ` Ingo Molnar
2009-07-03 11:26                               ` Ingo Molnar
2009-07-01 16:47                       ` [PATCH 2/2] FRV: Add basic performance counter support David Howells
2009-07-01 21:10                         ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A4DF8F9.9030503@gmail.com \
    --to=eric.dumazet@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=dhowells@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=paulus@samba.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.