From mboxrd@z Thu Jan 1 00:00:00 1970
Sender: Paolo Bonzini
References: <1437755447-10537-1-git-send-email-aurelien@aurel32.net> <1437755447-10537-2-git-send-email-aurelien@aurel32.net>
From: Paolo Bonzini
Message-ID: <55B5EA14.2050209@redhat.com>
Date: Mon, 27 Jul 2015 10:21:40 +0200
MIME-Version: 1.0
In-Reply-To: <1437755447-10537-2-git-send-email-aurelien@aurel32.net>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking
To: Aurelien Jarno, qemu-devel@nongnu.org
Cc: Richard Henderson

On 24/07/2015 18:30, Aurelien Jarno wrote:
> The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate
> vector registers on most guests, it's not uncommon to have more than 100
> used temps. This means we have to initialize more than 2kB at least twice
> per TB, often more when there are a few goto_tb.
>
> Instead use a TCGTempSet bit array to track which temps are in use in
> the current basic block. This means there are only around 16 bytes to
> initialize.
>
> This improves the boot time of a MIPS guest on an x86-64 host by around
> 7% and moves tcg_optimize out from the top of the profiler list.
>
> Cc: Richard Henderson
> Signed-off-by: Aurelien Jarno
> ---
>  tcg/optimize.c | 32 ++++++++++++++++++++++----------
>  1 file changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index cd0e793..20e24b3 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -50,6 +50,7 @@ struct tcg_temp_info {
>  };
>
>  static struct tcg_temp_info temps[TCG_MAX_TEMPS];
> +static TCGTempSet temps_used;
>
>  /* Reset TEMP's state to TCG_TEMP_UNDEF. If TEMP only had one copy, remove
>     the copy flag from the left temp. */
> @@ -67,6 +68,22 @@ static void reset_temp(TCGArg temp)
>      temps[temp].mask = -1;
>  }
>
> +/* Reset all temporaries, given that there are NB_TEMPS of them. */
> +static void reset_all_temps(int nb_temps)
> +{
> +    memset(&temps_used.l, 0, sizeof(long) * BITS_TO_LONGS(nb_temps));

You can use bitmap_zero here.

Paolo

> +}
> +
> +/* Initialize and activate a temporary. */
> +static void init_temp_info(TCGArg temp)
> +{
> +    if (!test_bit(temp, temps_used.l)) {
> +        temps[temp].state = TCG_TEMP_UNDEF;
> +        temps[temp].mask = -1;
> +        set_bit(temp, temps_used.l);
> +    }
> +}
> +
>  static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
>                                 TCGOpcode opc, int nargs)
>  {
> @@ -98,16 +115,6 @@ static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
>      return new_op;
>  }
>
> -/* Reset all temporaries, given that there are NB_TEMPS of them. */
> -static void reset_all_temps(int nb_temps)
> -{
> -    int i;
> -    for (i = 0; i < nb_temps; i++) {
> -        temps[i].state = TCG_TEMP_UNDEF;
> -        temps[i].mask = -1;
> -    }
> -}
> -
>  static int op_bits(TCGOpcode op)
>  {
>      const TCGOpDef *def = &tcg_op_defs[op];
> @@ -606,6 +613,11 @@ void tcg_optimize(TCGContext *s)
>              nb_iargs = def->nb_iargs;
>          }
>
> +        /* Initialize the temps that are going to be used */
> +        for (i = 0; i < nb_oargs + nb_iargs; i++) {
> +            init_temp_info(args[i]);
> +        }
> +
>          /* Do copy propagation */
>          for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
>              if (temps[args[i]].state == TCG_TEMP_COPY) {
>
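
For readers following the thread, here is a rough standalone sketch of the idea in the quoted patch: clear a small "used" bitmap instead of every per-temp record, and initialize a record lazily the first time the temp is touched. It is not the QEMU code. MAX_TEMPS, the temp_info layout, and the sketch_* helpers are stand-ins invented for this illustration so that it builds outside the QEMU tree; in the patch itself the reset would presumably go through QEMU's own bitmap_zero() on temps_used.l, as suggested above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for illustration only; the real patch uses TCG_MAX_TEMPS,
   struct tcg_temp_info and TCGTempSet from the QEMU tree. */
#define MAX_TEMPS 512
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

enum temp_state { TEMP_UNDEF = 0, TEMP_CONST, TEMP_COPY };

struct temp_info {
    enum temp_state state;
    unsigned long mask;
};

static struct temp_info temps[MAX_TEMPS];
static unsigned long temps_used[BITS_TO_LONGS(MAX_TEMPS)];

/* Hypothetical local helpers, named sketch_* so they are not mistaken
   for QEMU's bitmap_zero(), test_bit() and set_bit(). */
static void sketch_bitmap_zero(unsigned long *map, size_t nbits)
{
    memset(map, 0, BITS_TO_LONGS(nbits) * sizeof(unsigned long));
}

static int sketch_test_bit(size_t nr, const unsigned long *map)
{
    return (int)((map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}

static void sketch_set_bit(size_t nr, unsigned long *map)
{
    map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/* Reset is now a few words of bitmap rather than one record per temp. */
static void reset_all_temps(size_t nb_temps)
{
    assert(nb_temps <= MAX_TEMPS);
    sketch_bitmap_zero(temps_used, nb_temps);
}

/* Initialize a temp's record only the first time it is seen after a reset. */
static void init_temp_info(size_t temp)
{
    assert(temp < MAX_TEMPS);
    if (!sketch_test_bit(temp, temps_used)) {
        temps[temp].state = TEMP_UNDEF;
        temps[temp].mask = ~0UL;
        sketch_set_bit(temp, temps_used);
    }
}
```

The trade-off is that every access path has to call init_temp_info() before reading a record, since stale contents from the previous block stay in the array until the temp is touched again. That is why the patch adds the loop over args[] at the top of the per-op processing in tcg_optimize().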