From mboxrd@z Thu Jan 1 00:00:00 1970
Sender: Richard Henderson
References: <20170724212327.GA24963@flamenco>
From: Richard Henderson
Date: Mon, 24 Jul 2017 15:02:53 -0700
MIME-Version: 1.0
In-Reply-To: <20170724212327.GA24963@flamenco>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Torn read/write possible on aarch64/x86-64 MTTCG?
To: "Emilio G. Cota", Andrew Baumann
Cc: "qemu-devel@nongnu.org", "qemu-arm@nongnu.org", Andrey Shedel, Alex Bennée, Pranith Kumar

On 07/24/2017 02:23 PM, Emilio G. Cota wrote:
> (Adding some Cc's)
>
> On Mon, Jul 24, 2017 at 19:05:33 +0000, Andrew Baumann via Qemu-devel wrote:
>> Hi all,
>>
>> I'm trying to track down what appears to be a translation bug in either
>> the aarch64 target or x86_64 TCG (in multithreaded mode). The symptoms

I assume this is really x86_64 and not i686 as host.

>> are entirely consistent with a torn read/write -- that is, a 64-bit
>> load or store that was translated to two 32-bit loads and stores --
>> but that's obviously not what happens in the common path through the
>> translation for this code, so I'm wondering: are there any cases in
>> which qemu will split a 64-bit memory access into two 32-bit accesses?
>
> That would be a bug in MTTCG.
>
>> The code: Guest CPU A writes a 64-bit value to an aligned memory
>> location that was previously 0, using a regular store; e.g.:
>>     f9000034 str x20,[x1]
>>
>> Guest CPU B (who is busy-waiting) reads a value from the same location:
>>     f9400280 ldr x0,[x20]
>>
>> The symptom: CPU B loads a value that is neither NULL nor the value
>> written. Instead, x0 gets only the low 32-bits of the value written
>> (high bits are all zero). By the time this value is dereferenced (a
>> few instructions later) and the exception handlers run, the memory
>> location from which it was loaded has the correct 64-bit value with
>> a non-zero upper half.
>>
>> Obviously on a real ARM memory barriers are critical, and indeed
>> the code has such barriers in it, but I'm assuming that any possible
>> mistranslation of the barriers is irrelevant because for a 64-bit load
>> and a 64-bit store you should get all or nothing. Other clues that may
>> be relevant: the code is _near_ a LDREX/STREX pair (the busy-waiting
>> is used to resolve a race when updating another variable), and the
>> busy-wait loop has a yield instruction in it (although those appear
>> to be no-ops with MTTCG).
>
> This might have to do with how ldrex/strex is emulated; are you relying
> on the exclusive pair detecting ABA? If so, your code won't work in
> QEMU since it uses cmpxchg to emulate ldrex/strex.

ABA problem is nothing to do with tearing. And cmpxchg will definitely
not create tearing problems.

I don't know how we would manage 64-bit tearing on a 64-bit host, at
least for the aarch64 guest, for which I believe we have a good
emulation.

> - Pin the QEMU-MTTCG process to a single CPU. Can you repro then?

A good suggestion.

> - Force the emulation of cmpxchg via EXCP_ATOMIC with:
>
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 87f673e..771effe5 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -2856,7 +2856,7 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
>          }
>          tcg_temp_free_i64(t1);
>      } else if ((memop & MO_SIZE) == MO_64) {
> -#ifdef CONFIG_ATOMIC64
> +#if 0

I suspect this will simply alter the timing. However, give it a go by
all means.

If there's a test case that you can share, that would be awesome.
Especially if you can prod it to happen with a standalone minimal
binary. With luck you can reproduce via aarch64-linux-user too, and
simply signal an error via branch to __builtin_trap.


r~
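
For reference, a minimal sketch of the kind of standalone test described
above. It is not Andrew's code: the names PATTERN, shared, writer and
count_torn_reads are made up for illustration, and it assumes a
pthreads environment with C11 atomics. The relaxed accesses to an
aligned 64-bit object are expected to compile to a single plain 64-bit
store and load on an aarch64 target, but that is worth confirming in
the disassembly.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical test value: both 32-bit halves are non-zero, so a load
 * that sees only one half is distinguishable from 0 and from PATTERN. */
#define PATTERN 0xdeadbeefcafef00dULL

static _Atomic uint64_t shared;   /* naturally aligned 64-bit slot */
static atomic_int stop;           /* tells the writer thread to exit */

/* Writer: flips the slot between PATTERN and 0 with single 64-bit stores. */
static void *writer(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&stop, memory_order_relaxed)) {
        atomic_store_explicit(&shared, PATTERN, memory_order_relaxed);
        atomic_store_explicit(&shared, 0, memory_order_relaxed);
    }
    return NULL;
}

/* Reader: performs `iterations` 64-bit loads while the writer runs and
 * returns how many of them observed a value that is neither 0 nor PATTERN. */
static unsigned long count_torn_reads(unsigned long iterations)
{
    pthread_t tid;
    unsigned long torn = 0;

    atomic_store(&stop, 0);
    int rc = pthread_create(&tid, NULL, writer, NULL);
    assert(rc == 0);
    (void)rc;

    for (unsigned long i = 0; i < iterations; i++) {
        uint64_t v = atomic_load_explicit(&shared, memory_order_relaxed);
        if (v != 0 && v != PATTERN) {
            torn++;               /* a mix of old and new halves */
        }
    }

    atomic_store(&stop, 1);
    pthread_join(tid, NULL);
    return torn;
}
```

A main() that calls count_torn_reads() with a large iteration count and
branches to __builtin_trap() on a non-zero result would give the
"signal an error" behaviour mentioned above. On a correct host or
emulator the count should always be zero.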