From: Richard Henderson
Date: Thu, 2 Jun 2016 18:08:57 -0700
Subject: Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
To: Sergey Fedorov, Pranith Kumar, "open list:All patches CC here"
Cc: serge.fdrv@linaro.org, alex.bennee@linaro.org

On 06/02/2016 02:37 PM, Sergey Fedorov wrote:
> On 03/06/16 00:18, Richard Henderson wrote:
>> On 06/02/2016 01:38 PM, Sergey Fedorov wrote:
>>> On 02/06/16 23:36, Richard Henderson wrote:
>>>> On 06/02/2016 09:30 AM, Sergey Fedorov wrote:
>>>>> I think we need to extend the TCG load/store instruction attributes to
>>>>> provide information about guest ordering requirements, and leave this
>>>>> TCG operation only for explicit barrier instruction translation.
>>>>
>>>> I do not agree. I think separate barriers are much cleaner and easier
>>>> to manage and reason about.
>>>>
>>>
>>> How are we going to emulate strongly-ordered guests on weakly-ordered
>>> hosts then? I think if every load/store operation must specify which
>>> ordering it implies, then this task would be quite simple.
>>
>> Hmm. That does seem helpful-ish. But I'm not certain how helpful it
>> is to complicate the helper functions even further.
>>
>> What if we have tcg_canonicalize_memop (or some such) split off the
>> barriers into separate opcodes?  E.g.
>>
>>   MO_BAR_LD_B   = 32    // prevent earlier loads from crossing current op
>>   MO_BAR_ST_B   = 64    // prevent earlier stores from crossing current op
>>   MO_BAR_LD_A   = 128   // prevent later loads from crossing current op
>>   MO_BAR_ST_A   = 256   // prevent later stores from crossing current op
>>   MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
>>   MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
>>   MO_BAR_MASK   = MO_BAR_LDST_B | MO_BAR_LDST_A
>>
>>   // Match Sparc MEMBAR as the most flexible host.
>>   TCG_BAR_LD_LD = 1     // #LoadLoad barrier
>>   TCG_BAR_ST_LD = 2     // #StoreLoad barrier
>>   TCG_BAR_LD_ST = 4     // #LoadStore barrier
>>   TCG_BAR_ST_ST = 8     // #StoreStore barrier
>>   TCG_BAR_SYNC  = 64    // SEQ_CST barrier
>>
>> where
>>
>>   tcg_gen_qemu_ld_i32(x, y, i, m | MO_BAR_LD_B | MO_BAR_ST_A)
>>
>> emits
>>
>>   mb          TCG_BAR_LD_LD
>>   qemu_ld_i32 x, y, i, m
>>   mb          TCG_BAR_LD_ST
>>
>> We can then add an optimization pass which folds barriers with no
>> memory operations in between, so that duplicates are eliminated.
>
> It would give us three TCG operations for each memory operation instead
> of one. But then we might like to combine these barrier operations back
> with memory operations in each backend. If we propagate memory ordering
> semantics up to the backend, it can decide for itself what instructions
> are best to generate.

A strongly ordered target would generally set only BEFORE bits or only
AFTER bits, but not both (and I suggest we canonicalize on AFTER for all
such targets). Thus a strongly ordered target would produce only two
opcodes per memory op. I supplied both to make it easier to handle a
weakly ordered target with acquire/release bits.

I would *not* combine the barrier operations back with memory operations
in the backend. Only armv8 and ia64 can do that, and given the
optimization level at which we generate code, I doubt it would make much
difference over separate barriers.

> So I would just focus on translating only explicit memory barrier
> operations for now.

Then why did you bring it up?

r~
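
For reference, a minimal standalone sketch in C of how the flag scheme
quoted above could behave: the MO_BAR_* bits ride along in the memop
argument of a guest load/store, and a canonicalization step maps them to
the Sparc-style TCG_BAR_* masks of the mb opcodes emitted before and
after the access. The MO_BAR_*/TCG_BAR_* values are the ones quoted in
the proposal; the enum wrappers, the helper name and main() are
assumptions made for illustration, not code from any QEMU patch.

    /*
     * Illustrative sketch only, not taken from QEMU.
     */
    #include <stdio.h>

    /* Ordering-request bits carried in the memop argument of a load/store. */
    enum {
        MO_BAR_LD_B   = 32,    /* prevent earlier loads from crossing current op  */
        MO_BAR_ST_B   = 64,    /* prevent earlier stores from crossing current op */
        MO_BAR_LD_A   = 128,   /* prevent later loads from crossing current op    */
        MO_BAR_ST_A   = 256,   /* prevent later stores from crossing current op   */
        MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B,
        MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A,
        MO_BAR_MASK   = MO_BAR_LDST_B | MO_BAR_LDST_A,
    };

    /* Arguments to the mb opcode, modeled on Sparc MEMBAR. */
    enum {
        TCG_BAR_LD_LD = 1,     /* #LoadLoad   */
        TCG_BAR_ST_LD = 2,     /* #StoreLoad  */
        TCG_BAR_LD_ST = 4,     /* #LoadStore  */
        TCG_BAR_ST_ST = 8,     /* #StoreStore */
        TCG_BAR_SYNC  = 64,    /* SEQ_CST     */
    };

    /*
     * Split the barrier bits of one guest memory access into the TCG_BAR_*
     * masks of the mb opcodes that would surround it.  Whether the access
     * is a load or a store selects the matching side of the Sparc-style
     * barrier name.
     */
    static void split_memop_barriers(unsigned memop, int is_store,
                                     int *bar_before, int *bar_after)
    {
        *bar_before = 0;
        *bar_after = 0;

        if (memop & MO_BAR_LD_B) {      /* earlier loads vs. this op  */
            *bar_before |= is_store ? TCG_BAR_LD_ST : TCG_BAR_LD_LD;
        }
        if (memop & MO_BAR_ST_B) {      /* earlier stores vs. this op */
            *bar_before |= is_store ? TCG_BAR_ST_ST : TCG_BAR_ST_LD;
        }
        if (memop & MO_BAR_LD_A) {      /* this op vs. later loads    */
            *bar_after |= is_store ? TCG_BAR_ST_LD : TCG_BAR_LD_LD;
        }
        if (memop & MO_BAR_ST_A) {      /* this op vs. later stores   */
            *bar_after |= is_store ? TCG_BAR_ST_ST : TCG_BAR_LD_ST;
        }
    }

    int main(void)
    {
        int before, after;

        /* The load example from the message: earlier loads and later stores
           may not cross it, i.e. #LoadLoad before and #LoadStore after. */
        split_memop_barriers(MO_BAR_LD_B | MO_BAR_ST_A, 0, &before, &after);
        printf("mb %d / qemu_ld_i32 / mb %d\n", before, after);  /* 1 and 4 */
        return 0;
    }

Under the canonicalize-on-AFTER convention discussed above, a strongly
ordered guest front end would, roughly, set AFTER bits on every access,
and the duplicate-barrier folding sketched next keeps the resulting op
stream from drowning in mb opcodes.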
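
Likewise, a rough sketch of the folding pass mentioned in the quoted
proposal: scan the op stream and merge an mb into the previous mb when no
memory operation occurs between them. The op representation and opcode
names here are hypothetical, and a real pass would also have to end the
folding window at labels, branches and calls.

    #include <stddef.h>

    enum fold_opc { OPC_MB, OPC_QEMU_LD, OPC_QEMU_ST, OPC_OTHER };

    struct fold_op {
        enum fold_opc opc;
        int arg;           /* TCG_BAR_* mask when opc == OPC_MB */
        int dead;          /* set when the op should be deleted */
    };

    static void fold_barriers(struct fold_op *ops, size_t n)
    {
        struct fold_op *prev_mb = NULL;

        for (size_t i = 0; i < n; i++) {
            switch (ops[i].opc) {
            case OPC_MB:
                if (prev_mb) {
                    /* No memory op since the last barrier: combine them. */
                    prev_mb->arg |= ops[i].arg;
                    ops[i].dead = 1;
                } else {
                    prev_mb = &ops[i];
                }
                break;
            case OPC_QEMU_LD:
            case OPC_QEMU_ST:
                /* A memory op separates barriers; stop folding across it. */
                prev_mb = NULL;
                break;
            default:
                /* Arithmetic etc. does not order memory; keep folding. */
                break;
            }
        }
    }

For example, the trailing mb of an acquire-style load followed directly
by the leading mb of a release-style store have no memory op between
them, so they collapse into a single mb carrying the union of the two
masks, which is the duplicate elimination the proposal describes.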