From: Paul Brook
Subject: Re: [Qemu-devel] [PATCH 00/10, v3] target-alpha improvements
Date: Thu, 1 Apr 2010 13:44:37 +0000
Message-Id: <201004011444.37844.paul@codesourcery.com>
In-Reply-To: <20100326015252.GG19308@shareable.org>
To: qemu-devel@nongnu.org
Cc: aurelien@aurel32.net, Richard Henderson

> CPU #0                        CPU #1
>
> x <- load-locked(A)
> y <- load(B)
> x+1 -> store(A)
> y+1 -> store(B)
> x -> store(A)
> f(x,y) -> store-cond(A)
>
> Unless I made a mistake, the above cannot store f(x,y+1) into A, for
> any interleaving (assume strongly ordered memory or barriers), on
> machines where any store by another CPU breaks the condition. But on
> machines which implement store-cond by atomic-cmpxchg using the
> load-locked value, f(x,y+1) can be stored.

I investigated this fairly closely for the initial implementation.
Your key assumption is that you have strict ordering between CPUs. While it is possible to construct theoretical failure cases where this is observable, in practice you end up falling foul of architectural limitations on the use of ll/sc. Your example fails to describe how x and y are transferred from CPU0 to CPU1.

I'd regard any code that has a barrier between a load-locked and store-conditional with extreme suspicion. For example, PPC states that barrier instructions cause a CPU to lose the lock[1].

Paul

[1] We currently get this wrong in QEMU.