From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
    id 1KzISb-0006iv-A5
    for qemu-devel@nongnu.org; Sun, 09 Nov 2008 17:08:45 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
    id 1KzISa-0006iX-BV
    for qemu-devel@nongnu.org; Sun, 09 Nov 2008 17:08:44 -0500
Received: from [199.232.76.173] (port=39271 helo=monty-python.gnu.org)
    by lists.gnu.org with esmtp (Exim 4.43)
    id 1KzISa-0006iM-0m
    for qemu-devel@nongnu.org; Sun, 09 Nov 2008 17:08:44 -0500
Received: from spsmtp01oc.spray.mail2world.com ([209.67.128.166]:2031
    helo=spsmtp01oc.mail2world.com)
    by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from )
    id 1KzISZ-0005mY-Ir
    for qemu-devel@nongnu.org; Sun, 09 Nov 2008 17:08:43 -0500
From: Torbjörn Andersson
Date: Sun, 9 Nov 2008 22:21:37 +0100
Message-ID: <005301c942b1$27625490$7626fdb0$@tt@home.se>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Language: sv
Subject: [Qemu-devel] Less interrupt overhead patch
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org

Hi all.

I have been using QEMU for some time now and I have found that
interrupts cost more than necessary.

The TB chaining is lost, at least partially, when an interrupt arrives.
These chain links are then recreated at some time in the future, with a
performance penalty.

My proposed solution is to skip unlinking TBs recursively and simply flag
that there is an interrupt request pending. Then we check the interrupt
flag in goto_tb and skip jumping to the next TB. I think the code shows
this change quite well.

My experience is that this results in better overall performance.
However, I guess targets with very few interrupts will be somewhat
slower, since every chained jump now carries an extra load and compare.
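The difference between the two schemes described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not QEMU code: ToyTB, ToyCPU, toy_unlink_recursive and toy_run_chain are made-up names, and the max_jumps bound exists only so that a ring of blocks terminates in the example. The first function models unlinking on interrupt, the second models leaving the links alone and testing a pending-interrupt flag before each chained jump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a translation block: one chained successor. */
typedef struct ToyTB {
    struct ToyTB *jmp_next;   /* next chained block, or NULL if unlinked */
} ToyTB;

/* Hypothetical stand-in for the CPU state. */
typedef struct ToyCPU {
    int interrupt_request;    /* nonzero while an interrupt is pending */
} ToyCPU;

/* Old scheme (modelled): when an interrupt is raised, break every chain
 * link reachable from tb. Returns the number of links that were broken
 * and would have to be recreated later. */
static int toy_unlink_recursive(ToyTB *tb)
{
    int broken = 0;
    while (tb != NULL && tb->jmp_next != NULL) {
        ToyTB *next = tb->jmp_next;
        tb->jmp_next = NULL;
        broken++;
        tb = next;
    }
    return broken;
}

/* Proposed scheme (modelled): links are never touched. Before each
 * chained jump the flag is tested, and execution falls back to the main
 * loop if an interrupt is pending. Returns the number of chained jumps
 * taken, bounded by max_jumps. */
static int toy_run_chain(const ToyCPU *cpu, const ToyTB *tb, int max_jumps)
{
    int jumps = 0;
    assert(cpu != NULL && tb != NULL);
    while (jumps < max_jumps && tb->jmp_next != NULL) {
        if (cpu->interrupt_request > 0) {
            break;            /* same "greater than zero" test as the patch */
        }
        tb = tb->jmp_next;
        jumps++;
    }
    return jumps;
}
```

With three blocks linked in a ring, toy_run_chain keeps jumping while the flag is clear, stops immediately once it is set, and resumes at full speed after it is cleared because no link was destroyed. After toy_unlink_recursive the same ring takes no chained jumps until the links are rebuilt.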
Unfortunately, I have no structured measurements worth sending, so you
will have to do some tests on your own. If you find that this patch does
not improve overall performance, please reply and explain why.

Regarding the code, I have made some small changes in all targets, but I
guess it's possible to isolate the change in tcg-op.h and exec.c. However,
the type name of the CPU state struct is not the same in all targets.

Signed-off-by: Torbjörn Anderson

Index: target-arm/translate.c
===================================================================
--- target-arm/translate.c (revision 5661)
+++ target-arm/translate.c (working copy)
@@ -3376,7 +3376,14 @@
 
     tb = s->tb;
     if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
+        uint32_t offset = offsetof(CPUARMState, interrupt_request);
+        TCGv tmp = new_tmp();
+        tcg_gen_ld_i32(tmp, cpu_env, offset);
+        int label = gen_new_label();
+        tcg_gen_brcondi_i32(TCG_COND_GT, tmp, 0, label);
+        dead_tmp(tmp);
         tcg_gen_goto_tb(n);
+        gen_set_label(label);
         gen_set_pc_im(dest);
         tcg_gen_exit_tb((long)tb + n);
     } else {
Index: target-sh4/translate.c
===================================================================
--- target-sh4/translate.c (revision 5661)
+++ target-sh4/translate.c (working copy)
@@ -264,7 +264,14 @@
     if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) &&
         !ctx->singlestep_enabled) {
         /* Use a direct jump if in same page and singlestep not enabled */
+        uint32_t offset = offsetof(CPUState, interrupt_request);
+        TCGv tmp = tcg_temp_new(TCG_TYPE_I32);
+        tcg_gen_ld_i32(tmp, cpu_env, offset);
+        int label = gen_new_label();
+        tcg_gen_brcondi_i32(TCG_COND_GT, tmp, 0, label);
+        tcg_temp_free(tmp);
         tcg_gen_goto_tb(n);
+        gen_set_label(label);
         tcg_gen_movi_i32(cpu_pc, dest);
         tcg_gen_exit_tb((long) tb + n);
     } else {
Index: exec.c
===================================================================
--- exec.c (revision 5661)
+++ exec.c (working copy)
@@ -1495,15 +1495,6 @@
             cpu_abort(env, "Raised interrupt while not in I/O function");
         }
 #endif
-    } else {
-        tb = env->current_tb;
-        /* if the cpu is currently executing code, we must unlink it and
-           all the potentially executing TB */
-        if (tb && !testandset(&interrupt_lock)) {
-            env->current_tb = NULL;
-            tb_reset_jump_recursive(tb);
-            resetlock(&interrupt_lock);
-        }
     }
 #endif
 }
Index: target-mips/translate.c
===================================================================
--- target-mips/translate.c (revision 5661)
+++ target-mips/translate.c (working copy)
@@ -2438,7 +2438,14 @@
     TranslationBlock *tb;
     tb = ctx->tb;
     if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
+        uint32_t offset = offsetof(CPUState, interrupt_request);
+        TCGv tmp = tcg_temp_new(TCG_TYPE_I32);
+        tcg_gen_ld_i32(tmp, cpu_env, offset);
+        int label = gen_new_label();
+        tcg_gen_brcondi_i32(TCG_COND_GT, tmp, 0, label);
+        tcg_temp_free(tmp);
         tcg_gen_goto_tb(n);
+        gen_set_label(label);
         gen_save_pc(dest);
         tcg_gen_exit_tb((long)tb + n);
     } else {
Index: target-m68k/translate.c
===================================================================
--- target-m68k/translate.c (revision 5661)
+++ target-m68k/translate.c (working copy)
@@ -876,7 +876,14 @@
         gen_exception(s, dest, EXCP_DEBUG);
     } else if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) ||
                (s->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
+        uint32_t offset = offsetof(CPUState, interrupt_request);
+        TCGv tmp = tcg_temp_new(TCG_TYPE_I32);
+        tcg_gen_ld_i32(tmp, cpu_env, offset);
+        int label = gen_new_label();
+        tcg_gen_brcondi_i32(TCG_COND_GT, tmp, 0, label);
+        tcg_temp_free(tmp);
         tcg_gen_goto_tb(n);
+        gen_set_label(label);
         tcg_gen_movi_i32(QREG_PC, dest);
         tcg_gen_exit_tb((long)tb + n);
     } else {
Index: target-i386/translate.c
===================================================================
--- target-i386/translate.c (revision 5661)
+++ target-i386/translate.c (working copy)
@@ -2212,8 +2212,15 @@
     /* NOTE: we handle the case where the TB spans two pages here */
     if ((pc & TARGET_PAGE_MASK) == (tb->pc & TARGET_PAGE_MASK) ||
         (pc & TARGET_PAGE_MASK) == ((s->pc - 1) & TARGET_PAGE_MASK)) {
+        uint32_t offset = offsetof(CPUX86State, interrupt_request);
+        TCGv tmp = tcg_temp_new(TCG_TYPE_I32);
+        tcg_gen_ld_i32(tmp, cpu_env, offset);
+        int label = gen_new_label();
+        tcg_gen_brcondi_i32(TCG_COND_GT, tmp, 0, label);
+        tcg_temp_free(tmp);
         /* jump to same page: we can use a direct jump */
         tcg_gen_goto_tb(tb_num);
+        gen_set_label(label);
         gen_jmp_im(eip);
         tcg_gen_exit_tb((long)tb + tb_num);
     } else {
Index: target-cris/translate.c
===================================================================
--- target-cris/translate.c (revision 5663)
+++ target-cris/translate.c (working copy)
@@ -643,7 +643,14 @@
     TranslationBlock *tb;
     tb = dc->tb;
     if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK)) {
-        tcg_gen_goto_tb(n);
+        uint32_t offset = offsetof(CPUState, interrupt_request);
+        TCGv tmp = tcg_temp_new(TCG_TYPE_I32);
+        tcg_gen_ld_i32(tmp, cpu_env, offset);
+        int label = gen_new_label();
+        tcg_gen_brcondi_i32(TCG_COND_GT, tmp, 0, label);
+        tcg_temp_free(tmp);
+        tcg_gen_goto_tb(n);
+        gen_set_label(label);
         tcg_gen_movi_tl(env_pc, dest);
         tcg_gen_exit_tb((long)tb + n);
     } else {
Index: target-sparc/translate.c
===================================================================
--- target-sparc/translate.c (revision 5661)
+++ target-sparc/translate.c (working copy)
@@ -226,7 +226,14 @@
     if ((pc & TARGET_PAGE_MASK) == (tb->pc & TARGET_PAGE_MASK) &&
         (npc & TARGET_PAGE_MASK) == (tb->pc & TARGET_PAGE_MASK)) {
         /* jump to same page: we can use a direct jump */
+        uint32_t offset = offsetof(CPUState, interrupt_request);
+        TCGv tmp = tcg_temp_new(TCG_TYPE_I32);
+        tcg_gen_ld_i32(tmp, cpu_env, offset);
+        int label = gen_new_label();
+        tcg_gen_brcondi_i32(TCG_COND_GT, tmp, 0, label);
+        tcg_temp_free(tmp);
         tcg_gen_goto_tb(tb_num);
+        gen_set_label(label);
         tcg_gen_movi_tl(cpu_pc, pc);
         tcg_gen_movi_tl(cpu_npc, npc);
         tcg_gen_exit_tb((long)tb + tb_num);
Index: target-ppc/translate.c
===================================================================
--- target-ppc/translate.c (revision 5661)
+++ target-ppc/translate.c (working copy)
@@ -3463,7 +3463,14 @@
 #endif
     if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) &&
         likely(!ctx->singlestep_enabled)) {
+        uint32_t offset = offsetof(CPUState, interrupt_request);
+        TCGv tmp = tcg_temp_new(TCG_TYPE_I32);
+        tcg_gen_ld_i32(tmp, cpu_env, offset);
+        int label = gen_new_label();
+        tcg_gen_brcondi_i32(TCG_COND_GT, tmp, 0, label);
+        tcg_temp_free(tmp);
         tcg_gen_goto_tb(n);
+        gen_set_label(label);
         tcg_gen_movi_tl(cpu_nip, dest & ~3);
         tcg_gen_exit_tb((long)tb + n);
     } else {