From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53247) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZhjZu-0003KZ-OR for qemu-devel@nongnu.org; Thu, 01 Oct 2015 15:31:48 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZhjZr-0005oX-Av for qemu-devel@nongnu.org; Thu, 01 Oct 2015 15:31:42 -0400 Received: from out1-smtp.messagingengine.com ([66.111.4.25]:51957) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZhjZr-0005nf-6P for qemu-devel@nongnu.org; Thu, 01 Oct 2015 15:31:39 -0400 Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 7662120638 for ; Thu, 1 Oct 2015 15:31:37 -0400 (EDT) Date: Thu, 1 Oct 2015 15:32:36 -0400 From: "Emilio G. Cota" Message-ID: <20151001193236.GA26447@flamenco> References: <1443083566-10994-1-git-send-email-a.rigo@virtualopensystems.com> <560B68B0.1080702@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <560B68B0.1080702@redhat.com> Subject: Re: [Qemu-devel] [RFC v5 0/6] Slow-path for atomic instruction translation List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini Cc: mttcg@greensocs.com, claudio.fontana@huawei.com, Alvise Rigo , qemu-devel@nongnu.org, jani.kokkonen@huawei.com, tech@virtualopensystems.com, alex.bennee@linaro.org On Wed, Sep 30, 2015 at 06:44:32 +0200, Paolo Bonzini wrote: > I have a doubt about your patches for ll/sc emulation, that I hope you > can clarify. > > From 10000ft, both approaches rely on checking a flag during stores. > This is split between the TLB and the CPUState for Alvise's patches (in > order to exploit the existing fast-path checks), and entirely in the > radix tree for Emilio's. However, the idea is the same. Not quite the same idea, see below. > Now, the patch are okay for serial emulation, but I am not sure if it's > possible to do lock-free ll/sc emulation, because there is a race. > > If we check the flag before the store, the race is as follows: > > CPU0 CPU1 > ------------------------------------------------------- > check flag > load locked: > set flag > load value (normal load on CPU) > store > store conditional (normal store on CPU) > > where the sc doesn't fail. For completeness, if we check it afterwards > (which would be possible with Emilio's approach, though not for the > TLB-based one): > > CPU0 CPU1 > ------------------------------------------------------ > load locked > set bit > load value (normal load on CPU) > store > store conditional (normal store on CPU) > check flag > > and again the sc doesn't fail. (snip) > Tell me I'm wrong. :) The difference between Alvise's implementation and what I submitted is that in the radix tree a cache line that has *ever* had an atomic op performed on it, is marked as "slow path" for the rest of the execution, meaning that *all* subsequent stores to said cache line will take the cache line entry's lock. This does not fix the race completely (e.g. you could have a store and an atomic op concurrently executing on a line that hasn't yet had an atomic op on it), but it significantly closes it. My guess is that it would be very hard to trigger by practical programs. E.