From: Paul Brook
Subject: Re: [Qemu-devel] [PATCH 0/7] target-ppc/linux-user: NPTL support
Date: Sat, 6 Jun 2009 03:56:55 +0100
References: <1244141522-21802-1-git-send-email-froydnj@codesourcery.com>
Message-Id: <200906060356.56398.paul@codesourcery.com>
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org
Cc: Nathan Froyd

On Saturday 06 June 2009, malc wrote:
> On Thu, 4 Jun 2009, Nathan Froyd wrote:
> > This patch series adds NPTL support in Linux user-mode emulation to
> > 32-bit PowerPC targets.
> >
> > The main complication comes from implementing atomic instructions
> > properly.  We chose to implement a simplistic model:
> >
> > - reserved loads record the value loaded;

An important point here is that the address/value pair is per thread/cpu.
A nice side-effect is that these loads reduce to a simple atomic load and some
thread local bookkeeping. Conditional stores require somewhat more exotic
atomic operations, but still don't need to go poking at system global state or
other CPUs. This sounds strange at first reading, but ll/sc semantics are
deliberately designed to minimize contention between unrelated CPUs/resources
in large systems.

> > - conditional stores check that the memory at the effective address
> >   contains the value loaded by the previous reserved load, in addition
> >   to all other checks.  if so, the store succeeds; otherwise, it fails.
> >
> I think this will break code that relies on the fact that ll/sc is not
> affected by the ABA problem.

I'm not absolutely certain about PPC, but on other architectures (ARM, MIPS,
Alpha) this implementation is sufficient.

The only questionable case is when a second thread overwrites and then
restores the original value between a locked load and a conditional store.
However limited coherency and memory ordering between CPUs make it impossible
to know whether this modify+restore occurred before or after the initial load.

The worst that can happen here is that another thread gains and releases the
lock[1] while the current thread is in the process of acquiring the lock. Even
when this happens it is impossible for two threads to acquire the lock
simultaneously.

The only difference is the window between ll and sc. During this period we
don't know whether we have the lock or not, so it's extremely unlikely that we
will do anything that relies on no other thread having the lock. In practice
ll/sc are always used as matching pairs with no intervening memory accesses,
so this is never a problem.

I could probably come up with synthetic testcases where qemu behavior is
observably different to real hardware. However I'm pretty certain this never
occurs in real code, and it is questionable whether such behavior is
architecturally defined.
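
To make the model under discussion concrete, here is a minimal sketch in C11 of
a value-comparing reserved load / conditional store pair. It is an illustration
only, not the code from the patch series: reservation_t, emu_load_reserved and
emu_store_conditional are hypothetical names, and the sketch assumes a 32-bit
word and one reservation record per emulated thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread reservation record (not QEMU's real structures). */
typedef struct {
    _Atomic uint32_t *addr;  /* address of the last reserved load */
    uint32_t value;          /* value that load observed */
    bool valid;              /* true while a reservation is outstanding */
} reservation_t;

/* Reserved load: a plain atomic load plus thread-local bookkeeping. */
static uint32_t emu_load_reserved(reservation_t *r, _Atomic uint32_t *addr)
{
    r->addr = addr;
    r->value = atomic_load(addr);
    r->valid = true;
    return r->value;
}

/* Conditional store: succeeds only if this thread holds a reservation on
 * addr and memory still contains the value recorded by the reserved load.
 * The check and the store are done as one compare-and-swap. */
static bool emu_store_conditional(reservation_t *r, _Atomic uint32_t *addr,
                                  uint32_t newval)
{
    bool ok = false;
    if (r->valid && r->addr == addr) {
        uint32_t expected = r->value;
        ok = atomic_compare_exchange_strong(addr, &expected, newval);
    }
    r->valid = false;  /* the reservation is consumed either way */
    return ok;
}
```

In this sketch, if another thread stores 9 and then restores the original value
between the two calls, emu_store_conditional still succeeds. That is the
modify+restore case described above, and the one place where such a model would
be expected to differ from hardware that clears the reservation on any
intervening store.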
If you still believe this is a problem you need to come up with an actual
testcase that demonstrates how this can introduce a race condition.

Paul

[1] Lock acquisition is the most obvious example, but the same applies to any
atomic operation implemented on top of ll/sc.
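
As a rough illustration of the footnote, the sketch below builds a try-lock and
a fetch-and-add on top of a value-comparing ll/sc pair, used as a matching pair
with no intervening memory accesses. All names (ll, sc, try_lock_llsc,
fetch_add_llsc) are hypothetical, and the thread-local reserved_value stands in
for the per-thread bookkeeping; address tracking is omitted for brevity.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread record of the last value seen by ll(). */
static _Thread_local uint32_t reserved_value;

/* Value-recording load: stands in for a reserved load. */
static uint32_t ll(_Atomic uint32_t *p)
{
    reserved_value = atomic_load(p);
    return reserved_value;
}

/* Value-comparing store: stands in for a conditional store. */
static bool sc(_Atomic uint32_t *p, uint32_t newval)
{
    uint32_t expected = reserved_value;
    return atomic_compare_exchange_strong(p, &expected, newval);
}

/* Try once to take a lock word (0 = free, 1 = held). Even if another thread
 * took and released the lock between ll and sc, at most one thread can
 * change the word from 0 to 1 at a time. */
static bool try_lock_llsc(_Atomic uint32_t *lock)
{
    if (ll(lock) != 0) {
        return false;  /* already held */
    }
    return sc(lock, 1);
}

/* Atomic fetch-and-add as an ll/sc retry loop; returns the old value. */
static uint32_t fetch_add_llsc(_Atomic uint32_t *p, uint32_t delta)
{
    uint32_t old;
    do {
        old = ll(p);
    } while (!sc(p, old + delta));
    return old;
}
```

For the add, a modify+restore by another thread between ll and sc leaves memory
holding old again, so storing old + delta still yields a result consistent with
some serial ordering of the operations.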