From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:46923)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <bobby.prani@gmail.com>) id 1bPaAr-0005IS-9a
	for qemu-devel@nongnu.org; Tue, 19 Jul 2016 14:55:22 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <bobby.prani@gmail.com>) id 1bPaAn-0006w9-A9
	for qemu-devel@nongnu.org; Tue, 19 Jul 2016 14:55:20 -0400
Received: from mail-yw0-x231.google.com ([2607:f8b0:4002:c05::231]:36156)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <bobby.prani@gmail.com>) id 1bPaAn-0006w4-5E
	for qemu-devel@nongnu.org; Tue, 19 Jul 2016 14:55:17 -0400
Received: by mail-yw0-x231.google.com with SMTP id u134so24092082ywg.3
	for <qemu-devel@nongnu.org>; Tue, 19 Jul 2016 11:55:16 -0700 (PDT)
References: <20160714202940.18399-1-bobby.prani@gmail.com>
	<558fdb52-fe3e-2841-cc67-3ec2744c0224@redhat.com>
From: Pranith Kumar <bobby.prani@gmail.com>
In-reply-to: <558fdb52-fe3e-2841-cc67-3ec2744c0224@redhat.com>
Date: Tue, 19 Jul 2016 14:55:15 -0400
Message-ID: <87zipde9v0.fsf@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [Qemu-devel] [RFC PATCH] tcg: Optimize fence instructions
List-Id: <qemu-devel.nongnu.org>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>, "open list:All patches CC here" <qemu-devel@nongnu.org>, serge.fdrv@gmail.com, alex.bennee@linaro.org


Paolo Bonzini writes:

> On 14/07/2016 22:29, Pranith Kumar wrote:
>> +            } else if (curr_mb_type == TCG_BAR_STRL &&
>> +                       prev_mb_type == TCG_BAR_LDAQ) {
>> +                /* Consecutive load-acquire and store-release barriers
>> +                 * can be merged into one stronger SC barrier
>> +                 * ldaq; strl => ld; mb; st
>> +                 */
>> +                args[0] = (args[0] & 0x0F) | TCG_BAR_SC;
>> +                tcg_op_remove(s, prev_op);
>
> Is this really an optimization?  For example the processor could reorder
> "st1; ldaq1; strl2; ld2" to "ldaq1; ld2; st1; strl2".  It cannot do this
> if you change ldaq1/strl2 to ld1/mb/st2.
>
> On x86 for example a memory fence costs ~50 clock cycles, while normal
> loads and stores are of course faster.
>
> Of course this is useful if your target doesn't have ldaq/strl
> instructions.  In this case, however, you probably want to lower ldaq to
> "ld;mb" and strl to "mb;st"; the other optimizations then will remove
> the unnecessary barrier.
>

I agree that this is a conservative optimization. The problem is that
currently, even for architectures which have ldaq/strl instructions, the TCG
backend does not generate them; TCG just generates plain loads and stores. I
guess there was no need to, since TCG was single-threaded before MTTCG.

I am trying to add support for generating these instructions on AArch64.
Once this is done, we can disable the above optimization.

-- 
Pranith