From: Richard Henderson
Date: Thu, 2 Jun 2016 18:08:57 -0700
Subject: Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
To: Sergey Fedorov, Pranith Kumar, "open list:All patches CC here"
Cc: serge.fdrv@linaro.org, alex.bennee@linaro.org

On 06/02/2016 02:37 PM, Sergey Fedorov wrote:
> On 03/06/16 00:18, Richard Henderson wrote:
>> On 06/02/2016 01:38 PM, Sergey Fedorov wrote:
>>> On 02/06/16 23:36, Richard Henderson wrote:
>>>> On 06/02/2016 09:30 AM, Sergey Fedorov wrote:
>>>>> I think we need to extend the TCG load/store instruction attributes to
>>>>> provide information about guest ordering requirements, and leave this
>>>>> TCG operation only for explicit barrier instruction translation.
>>>>
>>>> I do not agree. I think separate barriers are much cleaner and easier
>>>> to manage and reason about.
>>>>
>>>
>>> How are we going to emulate strongly-ordered guests on weakly-ordered
>>> hosts then? I think if every load/store operation must specify which
>>> ordering it implies, then this task would be quite simple.
>>
>> Hmm. That does seem helpful-ish. But I'm not certain how helpful it
>> is to complicate the helper functions even further.
>>
>> What if we have tcg_canonicalize_memop (or some such) split off the
>> barriers into separate opcodes?  E.g.
>>
>>   MO_BAR_LD_B   = 32    // prevent earlier loads from crossing current op
>>   MO_BAR_ST_B   = 64    // prevent earlier stores from crossing current op
>>   MO_BAR_LD_A   = 128   // prevent later loads from crossing current op
>>   MO_BAR_ST_A   = 256   // prevent later stores from crossing current op
>>   MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B
>>   MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A
>>   MO_BAR_MASK   = MO_BAR_LDST_B | MO_BAR_LDST_A
>>
>>   // Match Sparc MEMBAR as the most flexible host.
>>   TCG_BAR_LD_LD = 1     // #LoadLoad barrier
>>   TCG_BAR_ST_LD = 2     // #StoreLoad barrier
>>   TCG_BAR_LD_ST = 4     // #LoadStore barrier
>>   TCG_BAR_ST_ST = 8     // #StoreStore barrier
>>   TCG_BAR_SYNC  = 64    // SEQ_CST barrier
>>
>> where
>>
>>   tcg_gen_qemu_ld_i32(x, y, i, m | MO_BAR_LD_B | MO_BAR_ST_A)
>>
>> emits
>>
>>   mb          TCG_BAR_LD_LD
>>   qemu_ld_i32 x, y, i, m
>>   mb          TCG_BAR_LD_ST
>>
>> We can then add an optimization pass which folds barriers with no
>> memory operations in between, so that duplicates are eliminated.
>
> It would give us three TCG operations for each memory operation instead
> of one. But then we might like to combine these barrier operations back
> with memory operations in each backend. If we propagate memory ordering
> semantics up to the backend, it can decide for itself what instructions
> are best to generate.

A strongly ordered target would generally set only BEFORE bits or only
AFTER bits, but not both (and I suggest we canonicalize on AFTER for all
such targets). Thus a strongly ordered target would produce only two
opcodes per memory op. I supplied both to make it easier to handle a
weakly ordered target with acquire/release bits.

I would *not* combine the barrier operations back with memory operations
in the backend. Only armv8 and ia64 can do that, and given the
optimization level at which we generate code, I doubt it would make much
difference over separate barriers.

> So I would just focus on translating only explicit memory barrier
> operations for now.

Then why did you bring it up?

r~
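
For reference, a minimal standalone sketch in C of how the flag scheme
quoted above could behave: the MO_BAR_* bits ride along in the memop
argument of a guest load/store, and a canonicalization step maps them to
the Sparc-style TCG_BAR_* masks of the mb opcodes emitted before and
after the access. The MO_BAR_*/TCG_BAR_* values are the ones quoted in
the proposal; the enum wrappers, the helper name and main() are
assumptions made for illustration, not code from any QEMU patch.

    /*
     * Illustrative sketch only, not taken from QEMU.
     */
    #include <stdio.h>

    /* Ordering-request bits carried in the memop argument of a load/store. */
    enum {
        MO_BAR_LD_B   = 32,    /* prevent earlier loads from crossing current op  */
        MO_BAR_ST_B   = 64,    /* prevent earlier stores from crossing current op */
        MO_BAR_LD_A   = 128,   /* prevent later loads from crossing current op    */
        MO_BAR_ST_A   = 256,   /* prevent later stores from crossing current op   */
        MO_BAR_LDST_B = MO_BAR_LD_B | MO_BAR_ST_B,
        MO_BAR_LDST_A = MO_BAR_LD_A | MO_BAR_ST_A,
        MO_BAR_MASK   = MO_BAR_LDST_B | MO_BAR_LDST_A,
    };

    /* Arguments to the mb opcode, modeled on Sparc MEMBAR. */
    enum {
        TCG_BAR_LD_LD = 1,     /* #LoadLoad   */
        TCG_BAR_ST_LD = 2,     /* #StoreLoad  */
        TCG_BAR_LD_ST = 4,     /* #LoadStore  */
        TCG_BAR_ST_ST = 8,     /* #StoreStore */
        TCG_BAR_SYNC  = 64,    /* SEQ_CST     */
    };

    /*
     * Split the barrier bits of one guest memory access into the TCG_BAR_*
     * masks of the mb opcodes that would surround it.  Whether the access
     * is a load or a store selects the matching side of the Sparc-style
     * barrier name.
     */
    static void split_memop_barriers(unsigned memop, int is_store,
                                     int *bar_before, int *bar_after)
    {
        *bar_before = 0;
        *bar_after = 0;

        if (memop & MO_BAR_LD_B) {      /* earlier loads vs. this op  */
            *bar_before |= is_store ? TCG_BAR_LD_ST : TCG_BAR_LD_LD;
        }
        if (memop & MO_BAR_ST_B) {      /* earlier stores vs. this op */
            *bar_before |= is_store ? TCG_BAR_ST_ST : TCG_BAR_ST_LD;
        }
        if (memop & MO_BAR_LD_A) {      /* this op vs. later loads    */
            *bar_after |= is_store ? TCG_BAR_ST_LD : TCG_BAR_LD_LD;
        }
        if (memop & MO_BAR_ST_A) {      /* this op vs. later stores   */
            *bar_after |= is_store ? TCG_BAR_ST_ST : TCG_BAR_LD_ST;
        }
    }

    int main(void)
    {
        int before, after;

        /* The load example from the message: earlier loads and later stores
           may not cross it, i.e. #LoadLoad before and #LoadStore after. */
        split_memop_barriers(MO_BAR_LD_B | MO_BAR_ST_A, 0, &before, &after);
        printf("mb %d / qemu_ld_i32 / mb %d\n", before, after);  /* 1 and 4 */
        return 0;
    }

Under the canonicalize-on-AFTER convention discussed above, a strongly
ordered guest front end would, roughly, set AFTER bits on every access,
and the duplicate-barrier folding sketched next keeps the resulting op
stream from drowning in mb opcodes.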
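
Likewise, a rough sketch of the folding pass mentioned in the quoted
proposal: scan the op stream and merge an mb into the previous mb when no
memory operation occurs between them. The op representation and opcode
names here are hypothetical, and a real pass would also have to end the
folding window at labels, branches and calls.

    #include <stddef.h>

    enum fold_opc { OPC_MB, OPC_QEMU_LD, OPC_QEMU_ST, OPC_OTHER };

    struct fold_op {
        enum fold_opc opc;
        int arg;           /* TCG_BAR_* mask when opc == OPC_MB */
        int dead;          /* set when the op should be deleted */
    };

    static void fold_barriers(struct fold_op *ops, size_t n)
    {
        struct fold_op *prev_mb = NULL;

        for (size_t i = 0; i < n; i++) {
            switch (ops[i].opc) {
            case OPC_MB:
                if (prev_mb) {
                    /* No memory op since the last barrier: combine them. */
                    prev_mb->arg |= ops[i].arg;
                    ops[i].dead = 1;
                } else {
                    prev_mb = &ops[i];
                }
                break;
            case OPC_QEMU_LD:
            case OPC_QEMU_ST:
                /* A memory op separates barriers; stop folding across it. */
                prev_mb = NULL;
                break;
            default:
                /* Arithmetic etc. does not order memory; keep folding. */
                break;
            }
        }
    }

For example, the trailing mb of an acquire-style load followed directly
by the leading mb of a release-style store have no memory op between
them, so they collapse into a single mb carrying the union of the two
masks, which is the duplicate elimination the proposal describes.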