From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:43934)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1ZIfs8-0001b7-M1
    for qemu-devel@nongnu.org; Fri, 24 Jul 2015 12:31:01 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1ZIfs3-0004To-SK
    for qemu-devel@nongnu.org; Fri, 24 Jul 2015 12:30:56 -0400
Received: from hall.aurel32.net ([2001:bc8:30d7:100::1]:53323)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1ZIfs3-0004SN-Jr
    for qemu-devel@nongnu.org; Fri, 24 Jul 2015 12:30:51 -0400
From: Aurelien Jarno
Date: Fri, 24 Jul 2015 18:30:38 +0200
Message-Id: <1437755447-10537-2-git-send-email-aurelien@aurel32.net>
In-Reply-To: <1437755447-10537-1-git-send-email-aurelien@aurel32.net>
References: <1437755447-10537-1-git-send-email-aurelien@aurel32.net>
Subject: [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: qemu-devel@nongnu.org
Cc: Aurelien Jarno , Richard Henderson

The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate
vector registers on most guests, it's not uncommon to have more than 100
used temps. This means we have to initialize more than 2kB at least twice
per TB, often more when there are a few goto_tb.

Instead, use a TCGTempSet bit array to track which temps are in use in
the current basic block. This means there are only around 16 bytes to
initialize.

This improves the boot time of a MIPS guest on an x86-64 host by around
7% and moves tcg_optimize out from the top of the profiler list.
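For readers who want to try the idea outside of QEMU, here is a minimal standalone sketch of the lazy-initialization scheme described above. It is only an illustration under stated assumptions: TempInfo, MAX_TEMPS, LONGS_FOR_BITS, reset_all() and init_temp() are hypothetical stand-ins invented for this example, not the definitions used in tcg/optimize.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; these are not QEMU's definitions. */
#define MAX_TEMPS 512
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define LONGS_FOR_BITS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

typedef struct {
    int state;              /* 0 plays the role of "undefined" */
    unsigned long mask;     /* all ones: nothing is known about the bits */
} TempInfo;

static TempInfo temps[MAX_TEMPS];
static unsigned long temps_used[LONGS_FOR_BITS(MAX_TEMPS)];

/* Forget every temp by clearing only the bitmap words covering nb_temps.
   The per-temp array is left untouched. Returns the number of bytes cleared
   so that the cost is easy to inspect. */
static size_t reset_all(size_t nb_temps)
{
    size_t bytes = sizeof(unsigned long) * LONGS_FOR_BITS(nb_temps);

    assert(nb_temps <= MAX_TEMPS);
    memset(temps_used, 0, bytes);
    return bytes;
}

/* Initialize a temp on first use since the last reset.
   Returns true if this call did the initialization. */
static bool init_temp(size_t t)
{
    unsigned long bit = 1UL << (t % BITS_PER_LONG);
    unsigned long *word = &temps_used[t / BITS_PER_LONG];

    assert(t < MAX_TEMPS);
    if (*word & bit) {
        return false;       /* already live, keep its current info */
    }
    temps[t].state = 0;
    temps[t].mask = ~0UL;
    *word |= bit;
    return true;
}
```

With 100 temps and a 64-bit long, reset_all(100) clears two words, i.e. 16 bytes, instead of rewriting 100 per-temp entries. The price is one bit test per operand in init_temp(), which is why each op has to call it on its arguments before reading their info.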

Cc: Richard Henderson
Signed-off-by: Aurelien Jarno
---
 tcg/optimize.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index cd0e793..20e24b3 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -50,6 +50,7 @@ struct tcg_temp_info {
 };
 
 static struct tcg_temp_info temps[TCG_MAX_TEMPS];
+static TCGTempSet temps_used;
 
 /* Reset TEMP's state to TCG_TEMP_UNDEF. If TEMP only had one copy, remove
    the copy flag from the left temp. */
@@ -67,6 +68,22 @@ static void reset_temp(TCGArg temp)
     temps[temp].mask = -1;
 }
 
+/* Reset all temporaries, given that there are NB_TEMPS of them. */
+static void reset_all_temps(int nb_temps)
+{
+    memset(&temps_used.l, 0, sizeof(long) * BITS_TO_LONGS(nb_temps));
+}
+
+/* Initialize and activate a temporary. */
+static void init_temp_info(TCGArg temp)
+{
+    if (!test_bit(temp, temps_used.l)) {
+        temps[temp].state = TCG_TEMP_UNDEF;
+        temps[temp].mask = -1;
+        set_bit(temp, temps_used.l);
+    }
+}
+
 static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
                                 TCGOpcode opc, int nargs)
 {
@@ -98,16 +115,6 @@ static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
     return new_op;
 }
 
-/* Reset all temporaries, given that there are NB_TEMPS of them. */
-static void reset_all_temps(int nb_temps)
-{
-    int i;
-    for (i = 0; i < nb_temps; i++) {
-        temps[i].state = TCG_TEMP_UNDEF;
-        temps[i].mask = -1;
-    }
-}
-
 static int op_bits(TCGOpcode op)
 {
     const TCGOpDef *def = &tcg_op_defs[op];
@@ -606,6 +613,11 @@ void tcg_optimize(TCGContext *s)
             nb_iargs = def->nb_iargs;
         }
 
+        /* Initialize the temps that are going to be used */
+        for (i = 0; i < nb_oargs + nb_iargs; i++) {
+            init_temp_info(args[i]);
+        }
+
         /* Do copy propagation */
         for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
             if (temps[args[i]].state == TCG_TEMP_COPY) {
-- 
2.1.4