From: Russell King <rmk+lkml@arm.linux.org.uk>
To: Christoph Lameter <clameter@sgi.com>
Cc: David Howells <dhowells@redhat.com>,
torvalds@osdl.org, akpm@osdl.org,
linux-arm-kernel@lists.arm.linux.org.uk,
linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org
Subject: Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
Date: Wed, 6 Dec 2006 19:58:20 +0000 [thread overview]
Message-ID: <20061206195820.GA15281@flint.arm.linux.org.uk> (raw)
In-Reply-To: <Pine.LNX.4.64.0612061111130.27263@schroedinger.engr.sgi.com>
On Wed, Dec 06, 2006 at 11:16:55AM -0800, Christoph Lameter wrote:
> On Wed, 6 Dec 2006, Russell King wrote:
>
> > On Wed, Dec 06, 2006 at 10:56:14AM -0800, Christoph Lameter wrote:
> > > I'd really appreciate a cmpxchg that is generically available for
> > > all arches. It will allow lockless implementation for various performance
> > > criticial portions of the kernel.
> >
> > Let's recap on cmpxchg.
> >
> > For CPUs with no atomic operation other than SWP, it is not lockless.
>
> But then its also just requires disable/enable interrupts on UP which may
> be cheaper than an atomic operation.
No. SWP is atomic on the CPU it's being issued on, especially wrt
interrupts. Only on one ARM CPU (which is UP) does it have a
questionable use, and there we do it via interrupt disable/enable.
> > For CPUs with load locked + store conditional, it is expensive.
>
> Because it locks the bus? I am not that familiar with those architectures
> but it seems that those will have a general problem anyways.
No. That certainly would be bad for performance. I can talk
authoritively from the ARM implementation.
When you use a special "ldrex" (load exclusive) instruction, the
CPU remembers the "address + cpu" pair. If another access occurs
to the same address, this state is reset.
Only if this state is preserved will a "strex" (store exclusive)
instruction succeed. This instruction returns status indicating
whether it succeeded.
So, to implement cmpxchg, you need to do this:
; r1 = temporary register
; r2 = address
; r4 = new value
; r3 = returned status
ldrex r1, [r2]
cmp r1, old_value
streqex r3, r4, [r2]
> > If you want an operation for performance critical portions of the
> > kernel, please allow architecture maintainers the freedom to use their
> > best performance enhancements.
>
> And thereby denying the kernel developers to use a simple atomic SMP
> operation? Adding additional defines for each arch and each performance
> critical piece of kernel logic?
No. If you read what I said, you'll see that you can _cheaply_ use
cmpxchg in a ll/sc based implementation. Take an atomic increment
operation.
do {
old = load_locked(addr);
} while (store_exclusive(old, old + 1, addr);
On a cmpxchg, that "store_exclusive" (loosely) becomes your cmpxchg
instruction, comparing the first arg, and if equal storing the second.
The "load_locked" macro becomes a standard pointer deref. Ergo, x86
becomes:
do {
load value
manipulate it
conditional store
} while not stored
On ll/sc, the load_locked() macro is the load locked instruction. The
store_exclusive() macro is the exclusive store and it doesn't need to
use the first parameter at all. Ergo, ARM becomes:
do {
ldrex r1, [r2]
manipulate r1
strex r0, r1, [r2]
} while failed
Notice that both are optimal.
Now let's consider the cmpxchg case.
do {
val = *addr;
} while (cmpxchg(val, val + 1, addr);
The x86 case is _identical_ to the ll/sc based implementation. Absolutely
entirely. No impact what so ever.
Let's look at the ll/sc case. The cmpxchg code implemented on this has
to reload the original value, compare it, if equal store the new value.
So:
do {
val = *addr;
(r2 = addr,
ldrex r1, [r2]
compare r1, r0
strexeq r4, r3, [r2] (store exclusive if equal)
} while store failed or comparecondition failed
Note how the cmpxchg has _forced_ the ll/sc implementation to become
more complex.
So, let's recap.
Implementing ll/sc based accessor macros allows both ll/sc _and_ cmpxchg
architectures to produce optimal code.
Implementing an cmpxchg based accessor macro allows cmpxchg architectures
to produce optimal code and ll/sc non-optimal code.
See my point?
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
next prev parent reply other threads:[~2006-12-06 19:58 UTC|newest]
Thread overview: 76+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-12-06 16:43 [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it David Howells
2006-12-06 17:21 ` Linus Torvalds
2006-12-06 18:56 ` Christoph Lameter
2006-12-06 19:00 ` Russell King
2006-12-06 19:16 ` Christoph Lameter
2006-12-06 19:28 ` Linus Torvalds
2006-12-06 19:58 ` Russell King [this message]
2006-12-06 21:36 ` Matthew Wilcox
2006-12-06 21:52 ` Christoph Lameter
2006-12-06 22:05 ` Matthew Wilcox
2006-12-06 22:15 ` Christoph Lameter
2006-12-07 0:37 ` Roman Zippel
2006-12-07 0:54 ` Linus Torvalds
2006-12-07 1:05 ` Roman Zippel
2006-12-07 1:18 ` Linus Torvalds
2006-12-07 1:24 ` Roman Zippel
2006-12-07 1:36 ` Linus Torvalds
2006-12-07 1:44 ` Matthew Wilcox
2006-12-07 2:09 ` Douglas McNaught
2006-12-07 1:52 ` Roman Zippel
2006-12-07 9:23 ` Nick Piggin
2006-12-06 22:38 ` Linus Torvalds
2006-12-07 9:31 ` Nick Piggin
2006-12-07 13:20 ` Ivan Kokshaysky
2006-12-07 15:03 ` Russell King
2006-12-08 1:18 ` Nick Piggin
2006-12-08 8:56 ` Russell King
2006-12-08 16:06 ` Christoph Lameter
2006-12-08 16:31 ` Russell King
2006-12-08 16:43 ` Christoph Lameter
2006-12-08 16:47 ` Russell King
2006-12-08 16:53 ` Christoph Lameter
2006-12-08 16:58 ` Russell King
2006-12-08 16:56 ` David Howells
2006-12-08 17:06 ` Christoph Lameter
2006-12-08 17:18 ` Russell King
2006-12-08 17:23 ` Christoph Lameter
2006-12-08 19:15 ` Linus Torvalds
2006-12-08 19:31 ` Russell King
2006-12-08 19:37 ` Linus Torvalds
2006-12-08 19:43 ` Russell King
2006-12-08 20:01 ` Linus Torvalds
2006-12-08 18:46 ` Linus Torvalds
2006-12-08 19:04 ` Russell King
2006-12-08 19:35 ` Linus Torvalds
2006-12-08 19:59 ` Russell King
2006-12-08 20:34 ` Linus Torvalds
2006-12-11 11:04 ` David Howells
2006-12-08 22:33 ` Nick Piggin
2006-12-07 15:36 ` Linus Torvalds
2006-12-07 16:51 ` Ralf Baechle
2006-12-07 0:46 ` Ralf Baechle
2006-12-06 19:05 ` Linus Torvalds
2006-12-06 19:08 ` Al Viro
2006-12-06 19:25 ` Linus Torvalds
2006-12-06 19:29 ` Matthew Wilcox
2006-12-06 19:43 ` David Howells
2006-12-06 19:54 ` Linus Torvalds
2006-12-06 19:56 ` Linus Torvalds
2006-12-07 1:09 ` David Miller
2006-12-06 19:26 ` Matthew Wilcox
2006-12-06 19:29 ` Christoph Lameter
2006-12-06 19:36 ` Matthew Wilcox
2006-12-06 19:47 ` Christoph Lameter
2006-12-06 19:50 ` Matthew Wilcox
2006-12-06 20:11 ` Christoph Lameter
2006-12-06 20:17 ` Matthew Wilcox
2006-12-06 19:34 ` Linus Torvalds
2006-12-06 19:41 ` Matthew Wilcox
2006-12-06 19:45 ` David Howells
2006-12-06 20:00 ` Russell King
2006-12-07 15:06 ` Russell King
2006-12-08 15:32 ` Russell King
2006-12-06 19:12 ` Lennert Buytenhek
2006-12-06 19:47 ` David Howells
2006-12-06 20:09 ` Lennert Buytenhek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20061206195820.GA15281@flint.arm.linux.org.uk \
--to=rmk+lkml@arm.linux.org.uk \
--cc=akpm@osdl.org \
--cc=clameter@sgi.com \
--cc=dhowells@redhat.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-arm-kernel@lists.arm.linux.org.uk \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox