From: Aurelien Jarno <aurelien@aurel32.net>
To: Richard Henderson <rth@twiddle.net>
Cc: Peter Maydell <peter.maydell@linaro.org>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v3 14/20] tcg-arm: Cleanup goto_tb handling
Date: Thu, 28 Mar 2013 21:09:46 +0100
Message-ID: <20130328200946.GD5000@ohm.aurel32.net>
In-Reply-To: <1364484781-15561-15-git-send-email-rth@twiddle.net>

On Thu, Mar 28, 2013 at 08:32:55AM -0700, Richard Henderson wrote:
> Eliminate 2 disabled code blocks.  Choose the load-to-pc method of
> jumping so that we can eliminate the 16M code_gen_buffer limitation.
> Remove a test in the indirect jump method that is always true.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/exec/exec-all.h | 23 +++--------------------
>  tcg/arm/tcg-target.c    | 26 ++++++--------------------
>  translate-all.c         |  2 --
>  3 files changed, 9 insertions(+), 42 deletions(-)

I already proposed such a patch, but it seems to improve things only in
some specific cases (kernel boot), while increasing the I/D cache and
TLB pressure.

See https://lists.gnu.org/archive/html/qemu-devel/2012-10/msg01684.html 
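
Background on the limit being removed: under USE_DIRECT_JUMP, goto_tb
emitted an ARM B instruction (tcg_out_b_noaddr) and tb_set_jmp_target1
later rewrote its offset field. B stores a signed 24-bit word offset
relative to the instruction address + 8, so it only reaches about +/-32MB,
and patching it modifies an instruction word, hence the icache flush. The
sketch below reproduces that offset arithmetic with an explicit range
check; encode_arm_b is a hypothetical helper, not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a QEMU function): retarget an ARM-state B
   instruction located at jmp_addr so that it branches to addr.
   Same arithmetic as the removed tb_set_jmp_target1: the PC reads as
   jmp_addr + 8, and the displacement is stored in words in the low
   24 bits.  Returns 0 on success, -1 if addr is not reachable. */
static int encode_arm_b(uint32_t insn_in, uint32_t jmp_addr, uint32_t addr,
                        uint32_t *insn_out)
{
    /* Bits 27..25 == 101 identify B/BL; only the offset field is rewritten. */
    assert((insn_in & 0x0e000000u) == 0x0a000000u);

    int64_t disp = (int64_t)addr - ((int64_t)jmp_addr + 8);

    /* A signed 24-bit word offset covers bytes [-2^25, 2^25 - 4],
       i.e. roughly +/- 32MB around the branch. */
    if (disp % 4 != 0 || disp < -(INT64_C(1) << 25) ||
        disp >= (INT64_C(1) << 25)) {
        return -1;
    }

    /* disp is a multiple of 4, so the division is exact; the conversion to
       uint32_t wraps negative values into two's complement. */
    *insn_out = (insn_in & ~0xffffffu) | ((uint32_t)(disp / 4) & 0xffffffu);
    return 0;
}
```

For example, a B at 0x1000 targeting 0x2000 gets offset field 0x3fe
(instruction 0xea0003fe when starting from 0xea000000), while a target
64MB away returns -1. The load-to-pc method needs no such check because
the destination is stored as a full 32-bit word.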

> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index e856191..190effa 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -233,26 +233,9 @@ static inline void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
>  #elif defined(__arm__)
>  static inline void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
>  {
> -#if !QEMU_GNUC_PREREQ(4, 1)
> -    register unsigned long _beg __asm ("a1");
> -    register unsigned long _end __asm ("a2");
> -    register unsigned long _flg __asm ("a3");
> -#endif
> -
> -    /* we could use a ldr pc, [pc, #-4] kind of branch and avoid the flush */
> -    *(uint32_t *)jmp_addr =
> -        (*(uint32_t *)jmp_addr & ~0xffffff)
> -        | (((addr - (jmp_addr + 8)) >> 2) & 0xffffff);
> -
> -#if QEMU_GNUC_PREREQ(4, 1)
> -    __builtin___clear_cache((char *) jmp_addr, (char *) jmp_addr + 4);
> -#else
> -    /* flush icache */
> -    _beg = jmp_addr;
> -    _end = jmp_addr + 4;
> -    _flg = 0;
> -    __asm __volatile__ ("swi 0x9f0002" : : "r" (_beg), "r" (_end), "r" (_flg));
> -#endif
> +    /* We're using "ldr pc, [pc,#-4]", so we can just store the raw
> +       address, without caring for flushing the icache.  */
> +    *(uint32_t *)jmp_addr = addr;
>  }
>  #elif defined(__sparc__)
>  void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr);
> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index 7bcba19..6de5e90 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -1595,30 +1595,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>      case INDEX_op_goto_tb:
>          if (s->tb_jmp_offset) {
> -            /* Direct jump method */
> -#if defined(USE_DIRECT_JUMP)
> -            s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
> -            tcg_out_b_noaddr(s, COND_AL);
> -#else
> +            /* "Direct" jump method.  Rather than limit the code gen buffer
> +               to 16M, load the destination from the next word.  */
>              tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
>              s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
> -            tcg_out32(s, 0);
> -#endif
> +            s->code_ptr += 4;
>          } else {
> -            /* Indirect jump method */
> -#if 1
> -            c = (int) (s->tb_next + args[0]) - ((int) s->code_ptr + 8);
> -            if (c > 0xfff || c < -0xfff) {
> -                tcg_out_movi32(s, COND_AL, TCG_REG_R0,
> -                                (tcg_target_long) (s->tb_next + args[0]));
> -                tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_R0, 0);
> -            } else
> -                tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, c);
> -#else
> -            tcg_out_ld32_12(s, COND_AL, TCG_REG_R0, TCG_REG_PC, 0);
> +            /* Indirect jump method.  */
> +            tcg_out_movi32(s, COND_AL, TCG_REG_R0,
> +                           (tcg_target_long) (s->tb_next + args[0]));
>              tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_R0, 0);
> -            tcg_out32(s, (tcg_target_long) (s->tb_next + args[0]));
> -#endif
>          }
>          s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
>          break;
> diff --git a/translate-all.c b/translate-all.c
> index a98c646..3ca839f 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -460,8 +460,6 @@ static inline PageDesc *page_find(tb_page_addr_t index)
>  # define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
>  #elif defined(__sparc__)
>  # define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
> -#elif defined(__arm__)
> -# define MAX_CODE_GEN_BUFFER_SIZE  (16u * 1024 * 1024)
>  #elif defined(__s390x__)
>    /* We have a +- 4GB range on the branches; leave some slop.  */
>  # define MAX_CODE_GEN_BUFFER_SIZE  (3ul * 1024 * 1024 * 1024)
> -- 
> 1.8.1.4
> 
> 
> 
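
The replacement stub is "ldr pc, [pc, #-4]" followed by a literal word
holding the destination. The sketch below models the code buffer as an
array of words; emit_goto_tb, set_jmp_target and ldr_pc_rel_address are
hypothetical names, and unlike the patch (which only reserves the literal
with s->code_ptr += 4) it initializes the slot for simplicity. It shows why
the new tb_set_jmp_target1 is a single store: only the literal changes, and
the CPU fetches it as data. It also shows the cost: each goto_tb now takes
two words instead of one, and the target goes through a data load, which is
plausibly where the extra I/D cache and TLB pressure mentioned above comes
from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A32 encoding of "ldr pc, [pc, #-4]" (cond = AL, P = 1, U = 0, L = 1,
   Rn = Rd = 15, imm12 = 4). */
#define ARM_LDR_PC_PC_M4 0xe51ff004u

/* Address read by a PC-relative LDR (immediate offset) at insn_addr.
   In ARM state the PC reads as insn_addr + 8, so an offset of -4
   selects the word immediately after the instruction. */
static uint32_t ldr_pc_rel_address(uint32_t insn, uint32_t insn_addr)
{
    uint32_t imm12 = insn & 0xfffu;
    int up = (insn >> 23) & 1;            /* U bit: 1 = add, 0 = subtract */

    assert(((insn >> 16) & 0xfu) == 15);  /* base register is the PC */
    uint32_t base = insn_addr + 8;
    return up ? base + imm12 : base - imm12;
}

/* Hypothetical emitter: writes the LDR at word index i and the destination
   in the following word, and returns the index of that literal slot (the
   analogue of tb_jmp_offset). */
static size_t emit_goto_tb(uint32_t *buf, size_t i, uint32_t dest)
{
    buf[i] = ARM_LDR_PC_PC_M4;
    buf[i + 1] = dest;
    return i + 1;
}

/* Retargeting is a plain store into the literal slot; the instruction word
   itself is never modified. */
static void set_jmp_target(uint32_t *buf, size_t slot, uint32_t dest)
{
    buf[slot] = dest;
}
```

With the stub at word index 1 (byte address 4), the decoded load address
is 4 + 8 - 4 = 8, which is word index 2, the slot returned by
emit_goto_tb.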

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net
