From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp110.mail.mud.yahoo.com ([209.191.85.220]:38974 "HELO smtp110.mail.mud.yahoo.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1762637AbWLKHhG (ORCPT ); Mon, 11 Dec 2006 02:37:06 -0500 Message-ID: <457D0A6A.1090806@yahoo.com.au> Date: Mon, 11 Dec 2006 18:36:10 +1100 From: Nick Piggin MIME-Version: 1.0 Subject: Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an References: <20061211061751.22553.qmail@science.horizon.com> In-Reply-To: <20061211061751.22553.qmail@science.horizon.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-arch-owner@vger.kernel.org To: linux@horizon.com Cc: linux-arch@vger.kernel.org, linux-arm-kernel@lists.arm.linux.org.uk, linux-kernel@vger.kernel.org, torvalds@osdl.org List-ID: linux@horizon.com wrote: >> atomic_ll() / atomic_sc() with the restriction that they cannot be >> nested, you cannot write any C code between them, and may only call >> into some specific set of atomic_llsc_xxx primitives, operating on >> the address given to ll, and must not have more than a given number >> of instructions between them. Also, the atomic_sc won't always fail >> if there were interleaving stores. > > > I'm not entirely sure what you're talking about. You generally want I'm talking about what a generic API (which you were proposing) would look like. > to keep the amount of code between ll and sc to an absolute minimum > to avoid interference which causes livelock. Processor timeouts > are generally much longer than any reasonable code sequence. "Generally" does not mean you can just ignore it and hope the C compiler does the right thing. Nor is it enough for just SOME of the architectures to have the properties you require. > I hear you and Linus say that loads and stores are not allowed. > I'm trying to find that documented somewhere. Let's see... Wikipedia > says that LL/SC is implemented on Alpha, MIPS, PowerPC, and ARM. Ralf tells us that MIPS cannot execute any loads, stores, or sync instructions on MIPS. Ivan says no loads, stores, taken branches etc on Alpha. MIPS also has a limit of 2048 bytes between the ll and sc. So you almost definitely cannot have gcc generated assembly between. I think we agree on that much. So what remains is for you to propose your API. > Okay, so Alpha is the special case; write those primitives directly > in asm. It's still possible to get three out of four. Yes, but the code *between* the time you call atomic_ll and atomic_sc has to be performed in arch calls as well. Ie. you cannot write: do { val = atomic_ll(&A); if (val == blah) { ... } while (!atomic_sc(&A, newval)); AFAIKS, you cannot even write: do { val = atomic_ll(&A); val++; } while(!atomic_sc(&A, val)); Is the do {} while loop itself even safe? I'm not sure, I don't think we could guarantee it. On the other hand, atomic_cmpxchg can be used in arbitrary C code, and has no restrictions on use whatsoever. > In truth, however, realizing that we're only talking about three > architectures (wo of which have 32 & 64-bit versions) it's probably not > worth it. If there were five, it would probably be a savings, but 3x > code duplication of some small, well-defined primitives is a fair price > to pay for avoiding another layer of abstraction (a.k.a. obfuscation). > > And it lets you optimize them better. > > I apologize for not having counted them before. I disagree. It wouldn't be another layer of abstraction, it would be an abstraction on pretty much the same level as the other atomic primitives. What it would be is a bad abstraction if it leaked all these architecture specific details into generic C code. But if you can get around that... I also disagree that the architectures don't matter. ARM and PPC are pretty important, and I believe Linux on MIPS is growing too. I also think it is really nice to be able to come up with an optimal implementation over a wider range of architectures even if they are relatively rare. It often implies that you have come up with a better API[*]. One proposal that I could buy is an atomic_ll/sc API, which mapped to a cmpxchg emulation even on those llsc architectures which had any sort of restriction whatsoever. This could be used in regular C code (eg. you indicate powerpc might be able to do this). But it may also help cmpxchg architectures optimise their code, because the load really wants to be a "load with intent to store" -- and is IMO the biggest suboptimal aspect of current atomic_cmpxchg. [*] Of course you can take this in the wrong direction. For example, the 2 which implement SMP atomic operations with spinlocks would often prefer to use a spinlock in the data structure being modified than hashed off the address of the atomic (much better locality of reference). But doing this all over the kernel is idiotic. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com