From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:46823)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <serge.fdrv@gmail.com>) id 1b8aJK-0006sp-IQ
	for qemu-devel@nongnu.org; Thu, 02 Jun 2016 17:37:51 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <serge.fdrv@gmail.com>) id 1b8aJE-0000Nu-Jw
	for qemu-devel@nongnu.org; Thu, 02 Jun 2016 17:37:49 -0400
Received: from mail-lf0-x243.google.com ([2a00:1450:4010:c07::243]:35962)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <serge.fdrv@gmail.com>) id 1b8aJE-0000Nl-BG
	for qemu-devel@nongnu.org; Thu, 02 Jun 2016 17:37:44 -0400
Received: by mail-lf0-x243.google.com with SMTP id h68so6313110lfh.3
	for <qemu-devel@nongnu.org>; Thu, 02 Jun 2016 14:37:43 -0700 (PDT)
References: <20160531183928.29406-1-bobby.prani@gmail.com>
	<20160531183928.29406-2-bobby.prani@gmail.com>
	<57505F1A.3020808@gmail.com>
	<68c32d50-adc2-25b2-b136-2a486f6b3de7@twiddle.net>
	<5750995D.6030005@gmail.com>
	<8e9b8569-89a5-845a-a856-7f2fa4435659@twiddle.net>
From: Sergey Fedorov <serge.fdrv@gmail.com>
Message-ID: <5750A725.2050303@gmail.com>
Date: Fri, 3 Jun 2016 00:37:41 +0300
MIME-Version: 1.0
In-Reply-To: <8e9b8569-89a5-845a-a856-7f2fa4435659@twiddle.net>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for
 memory barrier
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Richard Henderson <rth@twiddle.net>, Pranith Kumar <bobby.prani@gmail.com>, "open list:All patches CC here" <qemu-devel@nongnu.org>
Cc: serge.fdrv@linaro.org, alex.bennee@linaro.org

On 03/06/16 00:18, Richard Henderson wrote:
> On 06/02/2016 01:38 PM, Sergey Fedorov wrote:
>> On 02/06/16 23:36, Richard Henderson wrote:
>>> On 06/02/2016 09:30 AM, Sergey Fedorov wrote:
>>>> I think we need to extend TCG load/store instruction attributes to
>>>> provide information about guest ordering requirements and leave
>>>> this TCG operation only for explicit barrier instruction
>>>> translation.
>>>
>>> I do not agree.  I think separate barriers are much cleaner and easier
>>> to manage and reason with.
>>>
>>
>> How are we going to emulate strongly-ordered guests on weakly-ordered
>> hosts then? I think if every load/store operation must specify which
>> ordering it implies then this task would be quite simple.
>
> Hum.  That does seem helpful-ish.  But I'm not certain how helpful it
> is to complicate the helper functions even further.
>
> What if we have tcg_canonicalize_memop (or some such) split off the
> barriers into separate opcodes.  E.g.
>
> MO_BAR_LD_B = 32    // prevent earlier loads from crossing current op
> MO_BAR_ST_B = 64    // prevent earlier stores from crossing current op
> MO_BAR_LD_A = 128    // prevent later loads from crossing current op
> MO_BAR_ST_A = 256    // prevent later stores from crossing current op
> MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
> MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
> MO_BAR_MASK = MO_BAR_LDST_B | MO_BAR_LDST_A
>
> // Match Sparc MEMBAR as the most flexible host.
> TCG_BAR_LD_LD = 1    // #LoadLoad barrier
> TCG_BAR_ST_LD = 2    // #StoreLoad barrier
> TCG_BAR_LD_ST = 4    // #LoadStore barrier
> TCG_BAR_ST_ST = 8    // #StoreStore barrier
> TCG_BAR_SYNC  = 64    // SEQ_CST barrier
>
> where
>
>   tcg_gen_qemu_ld_i32(x, y, i, m | MO_BAR_LD_B | MO_BAR_ST_A)
>
> emits
>
>   mb        TCG_BAR_LD_LD
>   qemu_ld_i32    x, y, i, m
>   mb        TCG_BAR_LD_ST
>
> We can then add an optimization pass which folds barriers with no
> memory operations in between, so that duplicates are eliminated.

That would give us up to three TCG operations for each memory operation
instead of one. And then we might well want to combine these barrier
operations back with the memory operations in each backend. If we instead
propagate the memory ordering semantics down to the backend, it can decide
for itself which instructions are best to generate.
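
To make the op-count concern concrete, here is a minimal standalone
sketch of the split-then-fold scheme quoted above. It is not QEMU code:
the Op type and the emit(), gen_ld() and fold_barriers() helpers are
hypothetical names made up for illustration, and only the flag values
are taken from the proposal. It assumes the current op is a load, so
"before" flags map to #LoadLoad/#StoreLoad and "after" flags map to
#LoadLoad/#LoadStore.

```c
#include <assert.h>
#include <stddef.h>

/* Barrier bits carried on a memop; values as in the quoted proposal. */
enum {
    MO_BAR_LD_B = 32,   /* earlier loads must not cross the current op */
    MO_BAR_ST_B = 64,   /* earlier stores must not cross the current op */
    MO_BAR_LD_A = 128,  /* later loads must not cross the current op */
    MO_BAR_ST_A = 256,  /* later stores must not cross the current op */
    MO_BAR_MASK = MO_BAR_LD_B | MO_BAR_ST_B | MO_BAR_LD_A | MO_BAR_ST_A,
};

/* Barrier kinds for the separate "mb" op, Sparc MEMBAR style. */
enum {
    TCG_BAR_LD_LD = 1,
    TCG_BAR_ST_LD = 2,
    TCG_BAR_LD_ST = 4,
    TCG_BAR_ST_ST = 8,
};

/* Hypothetical, heavily simplified op stream (not the real TCGOp). */
typedef enum { OP_MB, OP_LD } OpKind;
typedef struct {
    OpKind kind;
    int arg;            /* barrier kind for OP_MB, plain memop for OP_LD */
} Op;

static size_t emit(Op *buf, size_t n, size_t cap, OpKind kind, int arg)
{
    assert(n < cap);
    buf[n].kind = kind;
    buf[n].arg = arg;
    return n + 1;
}

/* Split the barrier bits of a load's memop into separate mb ops. */
static size_t gen_ld(Op *buf, size_t n, size_t cap, int memop)
{
    int before = 0, after = 0;

    if (memop & MO_BAR_LD_B) before |= TCG_BAR_LD_LD;
    if (memop & MO_BAR_ST_B) before |= TCG_BAR_ST_LD;
    if (memop & MO_BAR_LD_A) after |= TCG_BAR_LD_LD;
    if (memop & MO_BAR_ST_A) after |= TCG_BAR_LD_ST;

    if (before) {
        n = emit(buf, n, cap, OP_MB, before);
    }
    n = emit(buf, n, cap, OP_LD, memop & ~MO_BAR_MASK);
    if (after) {
        n = emit(buf, n, cap, OP_MB, after);
    }
    return n;
}

/* Merge adjacent mb ops (no memory op in between) by OR-ing their kinds. */
static size_t fold_barriers(Op *buf, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (out > 0 && buf[i].kind == OP_MB && buf[out - 1].kind == OP_MB) {
            buf[out - 1].arg |= buf[i].arg;
        } else {
            buf[out++] = buf[i];
        }
    }
    return out;
}
```

In this sketch, two back-to-back loads generated with
MO_BAR_LD_B | MO_BAR_ST_A produce six ops, and folding only brings that
down to five, since just the one adjacent pair of barriers in the middle
can be merged. Each load still carries its own barrier ops, which the
backend would then have to match back up with the load.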

So I would just focus on translating only explicit memory barrier
operations for now.

Kind regards,
Sergey