From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:40999) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bPZkX-0005Y4-1M for qemu-devel@nongnu.org; Tue, 19 Jul 2016 14:28:10 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bPZkT-0000SZ-T1 for qemu-devel@nongnu.org; Tue, 19 Jul 2016 14:28:09 -0400 Received: from mail-yw0-x242.google.com ([2607:f8b0:4002:c05::242]:33827) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bPZkT-0000ST-NI for qemu-devel@nongnu.org; Tue, 19 Jul 2016 14:28:05 -0400 Received: by mail-yw0-x242.google.com with SMTP id j12so1213240ywb.1 for ; Tue, 19 Jul 2016 11:28:05 -0700 (PDT) References: <20160714202940.18399-1-bobby.prani@gmail.com> <4ca466b4-2271-b08f-9766-cc944cfe8ba5@twiddle.net> From: Pranith Kumar In-reply-to: <4ca466b4-2271-b08f-9766-cc944cfe8ba5@twiddle.net> Date: Tue, 19 Jul 2016 14:28:03 -0400 Message-ID: <874m7lfpos.fsf@gmail.com> MIME-Version: 1.0 Content-Type: text/plain Subject: Re: [Qemu-devel] [RFC PATCH] tcg: Optimize fence instructions List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Richard Henderson Cc: "open list:All patches CC here" , serge.fdrv@gmail.com, alex.bennee@linaro.org Richard Henderson writes: > On 07/15/2016 01:59 AM, Pranith Kumar wrote: >> This patch applies on top of the fence generation patch series. >> >> This commit optimizes fence instructions. Two optimizations are >> currently implemented. These are: >> >> 1. Unnecessary duplicate fence instructions >> >> If the same fence instruction is detected consecutively, we remove >> one instance of it. >> >> ex: mb; mb => mb, strl; strl => strl >> >> 2. Merging weaker fence with subsequent/previous stronger fence >> >> load-acquire/store-release fence can be combined with a full fence >> without relaxing the ordering constraint. >> >> ex: a) ld; ldaq; mb => ld; mb >> b) mb; strl; st => mb; st >> >> Signed-off-by: Pranith Kumar >> --- >> tcg/optimize.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> tcg/tcg.h | 1 + >> 2 files changed, 60 insertions(+) >> >> diff --git a/tcg/optimize.c b/tcg/optimize.c >> index c0d975b..a655829 100644 >> --- a/tcg/optimize.c >> +++ b/tcg/optimize.c >> @@ -569,6 +569,63 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2) >> return false; >> } >> >> +/* Eliminate duplicate and unnecessary fence instructions */ >> +void tcg_optimize_mb(TCGContext *s) >> +{ >> + int oi, oi_next; >> + TCGArg prev_op_mb = -1; >> + TCGOp *prev_op; >> + >> + for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) { >> + TCGOp *op = &s->gen_op_buf[oi]; >> + TCGArg *args = &s->gen_opparam_buf[op->args]; >> + TCGOpcode opc = op->opc; >> + >> + switch (opc) { >> + case INDEX_op_mb: >> + { >> + TCGBar curr_mb_type = args[0] & 0xF0; >> + TCGBar prev_mb_type = prev_op_mb & 0xF0; > > No check to make sure that prev_op_mb != -1? > I don't think that is necessary since if prev_op_mb is -1 then prev_mb_type will be 0x70 which will not match any of the cases we compare against below. >> + >> + if (curr_mb_type == prev_mb_type || >> + (curr_mb_type == TCG_BAR_STRL && prev_mb_type == TCG_BAR_SC)) { >> + /* Remove the current weaker barrier op. The previous >> + * barrier is stronger and sufficient. >> + * mb; strl => mb; st >> + */ >> + tcg_op_remove(s, op); >> + } else if (curr_mb_type == TCG_BAR_SC && >> + prev_mb_type == TCG_BAR_LDAQ) { >> + /* Remove the previous weaker barrier op. The current >> + * barrier is stronger and sufficient. >> + * ldaq; mb => ld; mb >> + */ >> + tcg_op_remove(s, prev_op); >> + } else if (curr_mb_type == TCG_BAR_STRL && >> + prev_mb_type == TCG_BAR_LDAQ) { >> + /* Consecutive load-acquire and store-release barriers >> + * can be merged into one stronger SC barrier >> + * ldaq; strl => ld; mb; st >> + */ >> + args[0] = (args[0] & 0x0F) | TCG_BAR_SC; >> + tcg_op_remove(s, prev_op); > > Don't you need to merge the bottom 4 bits as well? > That is an interesting point. Ideally, the TCG_MO_* flags for both the barriers should be the same. Atleast now, for LDAR and STRL barriers this flag is TCG_MO_ALL since we do not have any other variant. >> + default: >> + prev_op_mb = -1; >> + prev_op = NULL; >> + break; > > A reasonable start, I suppose, but there's nothing stopping you from > considering barriers that aren't exactly adjacent. > > You only need to reset prev_op_mb when you see qemu_ld, qemu_st, call, or > tcg_op_defs[opc].flags & TCG_OPF_BB_END. > That sounds like a good idea to extend this to. I will work on it. Thanks! -- Pranith