qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH] Huge TLB performance improvement
@ 2006-03-06 14:59 Thiemo Seufer
  2006-11-05 15:38 ` Daniel Jacobowitz
  0 siblings, 1 reply; 16+ messages in thread
From: Thiemo Seufer @ 2006-03-06 14:59 UTC (permalink / raw)
  To: qemu-devel

Hello All,

This patch vastly improves TLB performance on MIPS, and probably also
on other architectures. I measured a Linux boot-shutdown cycle,
including userland init.

With minimal jump cache invalidation:

real    11m43.429s
user    9m51.975s
sys     0m1.375s

 64.19   1476.81  1476.81 20551904     0.00     0.00  tlb_flush_page
  6.72   1631.36   154.55   184346     0.00     0.00  cpu_mips_exec
  4.35   1731.46   100.10  3550500     0.00     0.00  dyngen_code
  3.66   1815.77    84.31 90897893     0.00     0.00  decode_opc
  2.89   1882.21    66.44 11170487     0.00     0.00  gen_intermediate_code_internal
  1.72   1921.80    39.59 29919267     0.00     0.00  map_address
  1.52   1956.66    34.86  7619987     0.00     0.00  tb_find_pc
  0.96   1978.85    22.19 26361969     0.00     0.00  tlb_set_page_exec
  0.96   2000.84    21.99                             __ldl_mmu
  0.90   2021.59    20.75 27279747     0.00     0.00  gen_arith_imm


With global jump cache kill:

real    6m19.811s
user    4m23.650s
sys     0m0.617s

 21.67    188.78   188.78   146571     0.00     0.00  cpu_mips_exec
 11.37    287.88    99.10  3393051     0.00     0.00  dyngen_code
  9.59    371.45    83.57 89839869     0.00     0.00  decode_opc
  7.68    438.33    66.88 10989930     0.00     0.00  gen_intermediate_code_internal
  4.24    475.26    36.93 30124659     0.00     0.00  map_address
  3.80    508.33    33.07  7596879     0.00     0.00  tb_find_pc
  2.74    532.22    23.89 27781692     0.00     0.00  tlb_set_page_exec
  2.62    555.02    22.80 39891573     0.00     0.00  cpu_mips_handle_mmu_fault
  2.55    577.25    22.23                             __ldl_mmu
  2.30    597.26    20.01 26968709     0.00     0.00  gen_arith_imm


Thiemo


Index: qemu-work/exec.c
===================================================================
--- qemu-work.orig/exec.c	2006-03-06 01:30:09.000000000 +0000
+++ qemu-work/exec.c	2006-03-06 01:30:28.000000000 +0000
@@ -1247,7 +1247,6 @@
 void tlb_flush_page(CPUState *env, target_ulong addr)
 {
     int i;
-    TranslationBlock *tb;
 
 #if defined(DEBUG_TLB)
     printf("tlb_flush_page: " TARGET_FMT_lx "\n", addr);
@@ -1261,14 +1260,10 @@
     tlb_flush_entry(&env->tlb_table[0][i], addr);
     tlb_flush_entry(&env->tlb_table[1][i], addr);
 
-    for(i = 0; i < TB_JMP_CACHE_SIZE; i++) {
-        tb = env->tb_jmp_cache[i];
-        if (tb && 
-            ((tb->pc & TARGET_PAGE_MASK) == addr ||
-             ((tb->pc + tb->size - 1) & TARGET_PAGE_MASK) == addr)) {
-            env->tb_jmp_cache[i] = NULL;
-        }
-    }
+    /* We throw away the jump cache altogether. This is cheaper than
+       trying to be smart by invalidating only the entries in the
+       affected address range. */
+    memset (env->tb_jmp_cache, 0, TB_JMP_CACHE_SIZE * sizeof (void *));
 
 #if !defined(CONFIG_SOFTMMU)
     if (addr < MMAP_AREA_END)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
  2006-03-06 14:59 [Qemu-devel] [PATCH] Huge TLB performance improvement Thiemo Seufer
@ 2006-11-05 15:38 ` Daniel Jacobowitz
  2006-11-12  1:10   ` Daniel Jacobowitz
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Jacobowitz @ 2006-11-05 15:38 UTC (permalink / raw)
  To: qemu-devel

On Mon, Mar 06, 2006 at 02:59:29PM +0000, Thiemo Seufer wrote:
> Hello All,
> 
> This patch vastly improves TLB performance on MIPS, and probably also
> on other architectures. I measured a Linux boot-shutdown cycle,
> including userland init.

Quoting the whole message since this is from March...

I don't remember seeing any followup discussion of this patch, but I
may have missed it.  Thiemo's definitely right about "vastly".  Is this
patch appropriate, or would anyone care to suggest a more
sophisticated data structure to avoid the full cache invalidate?

> 
> With minimal jump cache invalidation:
> 
> real    11m43.429s
> user    9m51.975s
> sys     0m1.375s
> 
>  64.19   1476.81  1476.81 20551904     0.00     0.00  tlb_flush_page
>   6.72   1631.36   154.55   184346     0.00     0.00  cpu_mips_exec
>   4.35   1731.46   100.10  3550500     0.00     0.00  dyngen_code
>   3.66   1815.77    84.31 90897893     0.00     0.00  decode_opc
>   2.89   1882.21    66.44 11170487     0.00     0.00  gen_intermediate_code_internal
>   1.72   1921.80    39.59 29919267     0.00     0.00  map_address
>   1.52   1956.66    34.86  7619987     0.00     0.00  tb_find_pc
>   0.96   1978.85    22.19 26361969     0.00     0.00  tlb_set_page_exec
>   0.96   2000.84    21.99                             __ldl_mmu
>   0.90   2021.59    20.75 27279747     0.00     0.00  gen_arith_imm
> 
> 
> With global jump cache kill:
> 
> real    6m19.811s
> user    4m23.650s
> sys     0m0.617s
> 
>  21.67    188.78   188.78   146571     0.00     0.00  cpu_mips_exec
>  11.37    287.88    99.10  3393051     0.00     0.00  dyngen_code
>   9.59    371.45    83.57 89839869     0.00     0.00  decode_opc
>   7.68    438.33    66.88 10989930     0.00     0.00  gen_intermediate_code_internal
>   4.24    475.26    36.93 30124659     0.00     0.00  map_address
>   3.80    508.33    33.07  7596879     0.00     0.00  tb_find_pc
>   2.74    532.22    23.89 27781692     0.00     0.00  tlb_set_page_exec
>   2.62    555.02    22.80 39891573     0.00     0.00  cpu_mips_handle_mmu_fault
>   2.55    577.25    22.23                             __ldl_mmu
>   2.30    597.26    20.01 26968709     0.00     0.00  gen_arith_imm
> 
> 
> Thiemo
> 
> 
> Index: qemu-work/exec.c
> ===================================================================
> --- qemu-work.orig/exec.c	2006-03-06 01:30:09.000000000 +0000
> +++ qemu-work/exec.c	2006-03-06 01:30:28.000000000 +0000
> @@ -1247,7 +1247,6 @@
>  void tlb_flush_page(CPUState *env, target_ulong addr)
>  {
>      int i;
> -    TranslationBlock *tb;
>  
>  #if defined(DEBUG_TLB)
>      printf("tlb_flush_page: " TARGET_FMT_lx "\n", addr);
> @@ -1261,14 +1260,10 @@
>      tlb_flush_entry(&env->tlb_table[0][i], addr);
>      tlb_flush_entry(&env->tlb_table[1][i], addr);
>  
> -    for(i = 0; i < TB_JMP_CACHE_SIZE; i++) {
> -        tb = env->tb_jmp_cache[i];
> -        if (tb && 
> -            ((tb->pc & TARGET_PAGE_MASK) == addr ||
> -             ((tb->pc + tb->size - 1) & TARGET_PAGE_MASK) == addr)) {
> -            env->tb_jmp_cache[i] = NULL;
> -        }
> -    }
> +    /* We throw away the jump cache altogether. This is cheaper than
> +       trying to be smart by invalidating only the entries in the
> +       affected address range. */
> +    memset (env->tb_jmp_cache, 0, TB_JMP_CACHE_SIZE * sizeof (void *));
>  
>  #if !defined(CONFIG_SOFTMMU)
>      if (addr < MMAP_AREA_END)
> 
> 
> _______________________________________________
> Qemu-devel mailing list
> Qemu-devel@nongnu.org
> http://lists.nongnu.org/mailman/listinfo/qemu-devel
> 

-- 
Daniel Jacobowitz
CodeSourcery


* Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
  2006-11-05 15:38 ` Daniel Jacobowitz
@ 2006-11-12  1:10   ` Daniel Jacobowitz
  2006-11-12 11:49     ` Laurent Desnogues
  2006-11-12 20:42     ` Paul Brook
  0 siblings, 2 replies; 16+ messages in thread
From: Daniel Jacobowitz @ 2006-11-12  1:10 UTC (permalink / raw)
  To: qemu-devel

On Sun, Nov 05, 2006 at 10:38:20AM -0500, Daniel Jacobowitz wrote:
> On Mon, Mar 06, 2006 at 02:59:29PM +0000, Thiemo Seufer wrote:
> > Hello All,
> > 
> > This patch vastly improves TLB performance on MIPS, and probably also
> > on other architectures. I measured a Linux boot-shutdown cycle,
> > including userland init.
> 
> Quoting the whole message since this is from March...
> 
> I don't remember seeing any followup discussion of this patch, but I
> may have missed it.  Thiemo's definitely right about "vastly".  Is this
> patch appropriate, or would anyone care to suggest a more
> sophisticated data structure to avoid the full cache invalidate?

This patch is an even nicer alternative, I think.  I benchmarked four
alternatives (several times each):

Straight qemu with my previously posted MIPS patches takes 6:13 to
start and reboot a MIPS userspace (through init, so lots of fork/exec).

Thiemo's patch, which flushes the whole jump buffer, cuts it to 1:40.

A patch which finds the entries which need to be flushed more
efficiently cuts it to 1:21.

A patch which flushes up to 1/32nd of the jump buffer indiscriminately
cuts it to 1:11-1:13.

Here's that last patch.  It changes the hash function so that entries
from a particular page are always grouped together in tb_jmp_cache,
then finds the possibly two affected ranges and memsets them clear.
Thoughts?  Is this acceptable, and where else should it be tested besides
MIPS?  I haven't fine-tuned the numbers; it currently allows for at most 64
cached jump targets per target page, but that could be made higher or
lower.

-- 
Daniel Jacobowitz
CodeSourcery

---
 cpu-defs.h |    5 +++++
 exec-all.h |   12 +++++++++++-
 exec.c     |   15 +++++++--------
 3 files changed, 23 insertions(+), 9 deletions(-)

Index: qemu/cpu-defs.h
===================================================================
--- qemu.orig/cpu-defs.h	2006-11-11 15:12:26.000000000 -0500
+++ qemu/cpu-defs.h	2006-11-11 15:12:33.000000000 -0500
@@ -80,6 +80,11 @@ typedef unsigned long ram_addr_t;
 #define TB_JMP_CACHE_BITS 12
 #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
 
+#define TB_JMP_PAGE_BITS (TB_JMP_CACHE_BITS / 2)
+#define TB_JMP_PAGE_SIZE (1 << TB_JMP_PAGE_BITS)
+#define TB_JMP_ADDR_MASK (TB_JMP_PAGE_SIZE - 1)
+#define TB_JMP_PAGE_MASK (TB_JMP_ADDR_MASK << TB_JMP_PAGE_BITS)
+
 #define CPU_TLB_BITS 8
 #define CPU_TLB_SIZE (1 << CPU_TLB_BITS)
 
Index: qemu/exec-all.h
===================================================================
--- qemu.orig/exec-all.h	2006-11-11 15:12:26.000000000 -0500
+++ qemu/exec-all.h	2006-11-11 19:56:36.000000000 -0500
@@ -196,9 +196,19 @@ typedef struct TranslationBlock {
     struct TranslationBlock *jmp_first;
 } TranslationBlock;
 
+static inline unsigned int tb_jmp_cache_hash_page(target_ulong pc)
+{
+    target_ulong tmp;
+    tmp = pc ^ (pc >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS));
+    return (tmp >> TB_JMP_PAGE_BITS) & TB_JMP_PAGE_MASK;
+}
+
 static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
 {
-    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
+    target_ulong tmp;
+    tmp = pc ^ (pc >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS));
+    return (((tmp >> TB_JMP_PAGE_BITS) & TB_JMP_PAGE_MASK) |
+	    (tmp & TB_JMP_ADDR_MASK));
 }
 
 static inline unsigned int tb_phys_hash_func(unsigned long pc)
Index: qemu/exec.c
===================================================================
--- qemu.orig/exec.c	2006-11-11 15:12:26.000000000 -0500
+++ qemu/exec.c	2006-11-11 19:39:45.000000000 -0500
@@ -1299,14 +1299,13 @@ void tlb_flush_page(CPUState *env, targe
     tlb_flush_entry(&env->tlb_table[0][i], addr);
     tlb_flush_entry(&env->tlb_table[1][i], addr);
 
-    for(i = 0; i < TB_JMP_CACHE_SIZE; i++) {
-        tb = env->tb_jmp_cache[i];
-        if (tb && 
-            ((tb->pc & TARGET_PAGE_MASK) == addr ||
-             ((tb->pc + tb->size - 1) & TARGET_PAGE_MASK) == addr)) {
-            env->tb_jmp_cache[i] = NULL;
-        }
-    }
+    /* Discard jump cache entries for any tb which might potentially
+       overlap the flushed page.  */
+    i = tb_jmp_cache_hash_page(addr - TARGET_PAGE_SIZE);
+    memset (&env->tb_jmp_cache[i], 0, TB_JMP_PAGE_SIZE * sizeof(tb));
+
+    i = tb_jmp_cache_hash_page(addr);
+    memset (&env->tb_jmp_cache[i], 0, TB_JMP_PAGE_SIZE * sizeof(tb));
 
 #if !defined(CONFIG_SOFTMMU)
     if (addr < MMAP_AREA_END)


* Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
  2006-11-12  1:10   ` Daniel Jacobowitz
@ 2006-11-12 11:49     ` Laurent Desnogues
  2006-11-12 13:52       ` Thiemo Seufer
  2006-11-12 14:08       ` Paul Brook
  2006-11-12 20:42     ` Paul Brook
  1 sibling, 2 replies; 16+ messages in thread
From: Laurent Desnogues @ 2006-11-12 11:49 UTC (permalink / raw)
  To: qemu-devel

Daniel Jacobowitz wrote:
> 
> Straight qemu with my previously posted MIPS patches takes 6:13 to
> start and reboot a MIPS userspace (through init, so lots of fork/exec).
> 
> Thiemo's patch, which flushes the whole jump buffer, cuts it to 1:40.
> 
> A patch which finds the entries which need to be flushed more
> efficiently cuts it to 1:21.
> 
> A patch which flushes up to 1/32nd of the jump buffer indiscriminately
> cuts it to 1:11-1:13.

Warning:  I don't know anything about the Qemu MMU implementation
so this question is perhaps stupid :)

Did you try to benchmark some user space applications with the
various implementations you propose?  The boot of a Linux kernel
is quite heavy on various kinds of flushes and so is very
different from "standard" applications.


			Laurent


* Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
  2006-11-12 11:49     ` Laurent Desnogues
@ 2006-11-12 13:52       ` Thiemo Seufer
  2006-11-12 14:08       ` Paul Brook
  1 sibling, 0 replies; 16+ messages in thread
From: Thiemo Seufer @ 2006-11-12 13:52 UTC (permalink / raw)
  To: qemu-devel

Laurent Desnogues wrote:
> Daniel Jacobowitz wrote:
> >
> >Straight qemu with my previously posted MIPS patches takes 6:13 to
> >start and reboot a MIPS userspace (through init, so lots of fork/exec).
> >
> >Thiemo's patch, which flushes the whole jump buffer, cuts it to 1:40.
> >
> >A patch which finds the entries which need to be flushed more
> >efficiently cuts it to 1:21.
> >
> >A patch which flushes up to 1/32nd of the jump buffer indiscriminately
> >cuts it to 1:11-1:13.
> 
> Warning:  I don't know anything about the Qemu MMU implementation
> so this question is perhaps stupid :)
> 
> Did you try to benchmark some user space applications with the
> various implementations you propose?

A "benchmark" I did was compiling lmbench, which became more than twice
as fast. I didn't bother to do real measurements.

> The boot of a Linux kernel
> is quite heavy on various kinds of flushes and so is very
> different from "standard" applications.

At least the MIPS kernel is indeed different in that it uses non-trivial
TLB mappings nearly exclusively for modules and userland. IOW, the
kernel-side MMU overhead at boot time is negligible. My patch made no
significant difference for that case.


Thiemo


* Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
  2006-11-12 11:49     ` Laurent Desnogues
  2006-11-12 13:52       ` Thiemo Seufer
@ 2006-11-12 14:08       ` Paul Brook
  2006-11-12 14:29         ` Thiemo Seufer
  1 sibling, 1 reply; 16+ messages in thread
From: Paul Brook @ 2006-11-12 14:08 UTC (permalink / raw)
  To: qemu-devel

On Sunday 12 November 2006 11:49, Laurent Desnogues wrote:
> Daniel Jacobowitz wrote:
> > Straight qemu with my previously posted MIPS patches takes 6:13 to
> > start and reboot a MIPS userspace (through init, so lots of fork/exec).
> >
> > Thiemo's patch, which flushes the whole jump buffer, cuts it to 1:40.
> >
> > A patch which finds the entries which need to be flushed more
> > efficiently cuts it to 1:21.
> >
> > A patch which flushes up to 1/32nd of the jump buffer indiscriminately
> > cuts it to 1:11-1:13.
>
> Warning:  I don't know anything about the Qemu MMU implementation
> so this question is perhaps stupid :)
>
> Did you try to benchmark some user space applications with the
> various implementations you propose?  The boot of a Linux kernel
> is quite heavy on various kinds of flushes and so is very
> different from "standard" applications.

MIPS is different because it has a relatively small software-managed TLB. 
Other targets have a hardware-managed TLB. On a hardware-managed TLB the OS 
treats it as if it were of infinite size, and invalidation only occurs when the 
OS changes the mappings. On a software-managed TLB, "flushes" are more likely to 
occur during normal operation as TLB slots are reused.

Paul


* Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
  2006-11-12 14:08       ` Paul Brook
@ 2006-11-12 14:29         ` Thiemo Seufer
  2006-11-12 14:44           ` Paul Brook
  2006-11-12 16:56           ` Daniel Jacobowitz
  0 siblings, 2 replies; 16+ messages in thread
From: Thiemo Seufer @ 2006-11-12 14:29 UTC (permalink / raw)
  To: qemu-devel

Paul Brook wrote:
> On Sunday 12 November 2006 11:49, Laurent Desnogues wrote:
> > Daniel Jacobowitz wrote:
> > > Straight qemu with my previously posted MIPS patches takes 6:13 to
> > > start and reboot a MIPS userspace (through init, so lots of fork/exec).
> > >
> > > Thiemo's patch, which flushes the whole jump buffer, cuts it to 1:40.
> > >
> > > A patch which finds the entries which need to be flushed more
> > > efficiently cuts it to 1:21.
> > >
> > > A patch which flushes up to 1/32nd of the jump buffer indiscriminately
> > > cuts it to 1:11-1:13.
> >
> > Warning:  I don't know anything about the Qemu MMU implementation
> > so this question is perhaps stupid :)
> >
> > Did you try to benchmark some user space applications with the
> > various implementations you propose?  The boot of a Linux kernel
> > is quite heavy on various kinds of flushes and so is very
> > different from "standard" applications.
> 
> MIPS is different because it has a relatively small software managed TLB. 

JFTR, increasing the TLB size from 16 to 64 entries made no performance
difference whatsoever.

> Other targets have a hardware managed TLB. On a hardware managed TLB the OS 
> treats it as if it were infinite size, and invalidation only occurs when the OS 
> changes the mappings. On a software managed TLB "flushes" are more likely to 
> occur during normal operation as TLB slots are reused.

The excessive flushing for mips happens because Qemu doesn't properly
model the hardware's ASID handling.


Thiemo


* Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
  2006-11-12 14:29         ` Thiemo Seufer
@ 2006-11-12 14:44           ` Paul Brook
  2006-11-12 15:07             ` Daniel Jacobowitz
  2006-11-12 15:26             ` Thiemo Seufer
  2006-11-12 16:56           ` Daniel Jacobowitz
  1 sibling, 2 replies; 16+ messages in thread
From: Paul Brook @ 2006-11-12 14:44 UTC (permalink / raw)
  To: qemu-devel

> > Other targets have a hardware managed TLB. On a hardware managed TLB the
> > OS treats it as if it were infinite size, and invalidation only occurs
> > when the OS changes the mappings. On a software managed TLB "flushes" are
> > more likely to occur during normal operation as TLB slots are reused.
>
> The excessive flushing for mips happens because Qemu doesn't properly
> model the hardware's ASID handling.

Are you sure? IIUC changing the ASID causes a full qemu TLB flush. The code 
we're tweaking here is for single-page flushes.

Actually that gives me an idea. When a TLB entry with a different ASID gets 
evicted we currently flush that page. This should be a no-op because we 
already did a full flush when the ASID changed.

The other explanation is that the guest OS is doing a full TLB flush 
by manually evicting all the TLB entries. I'd hope that a sane guest OS would 
only do that as a last resort, though.

Paul


* Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
  2006-11-12 14:44           ` Paul Brook
@ 2006-11-12 15:07             ` Daniel Jacobowitz
  2006-11-12 15:24               ` Daniel Jacobowitz
  2006-11-12 15:26             ` Thiemo Seufer
  1 sibling, 1 reply; 16+ messages in thread
From: Daniel Jacobowitz @ 2006-11-12 15:07 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

On Sun, Nov 12, 2006 at 02:44:46PM +0000, Paul Brook wrote:
> > > Other targets have a hardware managed TLB. On a hardware managed TLB the
> > > OS treats it as if it were infinite size, and invalidation only occurs
> > > when the OS changes the mappings. On a software managed TLB "flushes" are
> > > more likely to occur during normal operation as TLB slots are reused.
> >
> > The excessive flushing for mips happens because Qemu doesn't properly
> > model the hardware's ASID handling.
> 
> Are you sure? IIUC changing the ASID causes a full qemu TLB flush. The code 
> we're tweaking here is for single page flush.

The brutal performance problem that Thiemo and I have been working on
involves single-page flushes.  With that solved, though, there's still
rather more memory management overhead than I'd like.

I was also thinking about pretending there were more TLB entries than
there actually are.  We have a certain leeway to do this, because there
are two TLB write instructions: indexed and random.  So there is a window
during which the OS can't say for sure that an entry has been evicted.  That
might cut down on flushes, or it might not.  My hope would be to hold
on to the evicted entries until a global flush.

But what Thiemo is getting at, I think, is that we have to flush qemu's
TLB at every ASID switch.  That's pretty lousy!  It makes the ASID a
net performance penalty, instead of a boost.  I don't see anything
obvious that I could do about it, though.  The qemu tlb table only has
room for is_user and the virtual address.

> Actually that gives me an idea. When a TLB entry with a different ASID gets 
> evicted we currently flush that page. This should be a no-op because we 
> already did a full flush when the ASID changed.

Let me see if this makes any difference.

-- 
Daniel Jacobowitz
CodeSourcery


* Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
  2006-11-12 15:07             ` Daniel Jacobowitz
@ 2006-11-12 15:24               ` Daniel Jacobowitz
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Jacobowitz @ 2006-11-12 15:24 UTC (permalink / raw)
  To: qemu-devel

On Sun, Nov 12, 2006 at 10:07:15AM -0500, Daniel Jacobowitz wrote:
> > Actually that gives me an idea. When a TLB entry with a different ASID gets 
> > evicted we currently flush that page. This should be a no-op because we 
> > already did a full flush when the ASID changed.
> 
> Let me see if this makes any difference.

Saves about 2% of invalidate_tlb calls, no measurable time change, but
we might as well.  Attached.

-- 
Daniel Jacobowitz
CodeSourcery

---
 target-mips/op_helper.c |    7 +++++++
 1 file changed, 7 insertions(+)

Index: qemu/target-mips/op_helper.c
===================================================================
--- qemu.orig/target-mips/op_helper.c	2006-11-12 10:09:44.000000000 -0500
+++ qemu/target-mips/op_helper.c	2006-11-12 10:21:16.000000000 -0500
@@ -573,8 +573,15 @@ static void invalidate_tlb (int idx)
 {
     tlb_t *tlb;
     target_ulong addr;
+    uint8_t ASID;
+
+    ASID = env->CP0_EntryHi & 0xFF;
 
     tlb = &env->tlb[idx];
+    if (tlb->G == 0 && tlb->ASID != ASID) {
+        return;
+    }
+
     if (tlb->V0) {
         tb_invalidate_page_range(tlb->PFN[0], tlb->end - tlb->VPN);
         addr = tlb->VPN;


* Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
  2006-11-12 14:44           ` Paul Brook
  2006-11-12 15:07             ` Daniel Jacobowitz
@ 2006-11-12 15:26             ` Thiemo Seufer
  1 sibling, 0 replies; 16+ messages in thread
From: Thiemo Seufer @ 2006-11-12 15:26 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

Paul Brook wrote:
> > > Other targets have a hardware managed TLB. On a hardware managed TLB the
> > > OS treats it as if it were infinite size, and invalidation only occurs
> > > when the OS changes the mappings. On a software managed TLB "flushes" are
> > > more likely to occur during normal operation as TLB slots are reused.
> >
> > The excessive flushing for mips happens because Qemu doesn't properly
> > model the hardware's ASID handling.
> 
> Are you sure? IIUC changing the ASID causes a full qemu TLB flush. The code 
> we're tweaking here is for single page flush.

With that comment I was referring to the general problem of emulating an MMU
with ASIDs in the current qemu framework.

> Actually that gives me an idea. When a TLB entry with a different ASID gets 
> evicted we currently flush that page. This should be a no-op because we 
> already did a full flush when the ASID changed.

That's the way the MIPS MMU is supposed to work.

> The other explanation is that the guest OS is manually doing a full TLB flush 
> by manually evicting all the TLB entries. I'd hope that a sane guest OS would 
> only do that as a last resort though.

Linux/MIPS does this only at an ASID wraparound, which doesn't happen
that often.


Thiemo


* Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
  2006-11-12 14:29         ` Thiemo Seufer
  2006-11-12 14:44           ` Paul Brook
@ 2006-11-12 16:56           ` Daniel Jacobowitz
  2006-11-12 17:49             ` Daniel Jacobowitz
  2006-11-12 18:02             ` Dirk Behme
  1 sibling, 2 replies; 16+ messages in thread
From: Daniel Jacobowitz @ 2006-11-12 16:56 UTC (permalink / raw)
  To: qemu-devel

On Sun, Nov 12, 2006 at 02:29:38PM +0000, Thiemo Seufer wrote:
> JFTR, increasing the TLB size from 16 to 64 entries made no performance
> difference whatsoever.

I suspect that's because we do about as much eviction.  Here's a
different approach.  Whenever an entry is evicted by tlbwr, the guest
can't predict which existing entry will be removed.  So, let's evict
none of them.  This takes the "evicted" entry and swaps it out to
a second set of TLB entries, avoiding the qemu internal TLB flush.

I'm trying for a complete as-if implementation, so tlbp only searches
the "real" entries (I don't know if it should cause a flush of the
shadowed entries, but things seem to work OK without it).  tlbwi and
tlbr both discard the shadowed entries.

This appears to cut single page flushes by 90%.

My best time for boot/runlevel-2/halt yesterday was 73 seconds.  This
runs at about 51 seconds.  apt-get update finishes in a reasonable
amount of time.  This is with all of the patches I've posted to the
list applied, including the improved tb_jmp_cache handling - we still
do a non-trivial number of single page cache flushes so I think it's
a good idea.

> The excessive flushing for mips happens because Qemu doesn't properly
> model the hardware's ASID handling.

We still do flushes at ASID switches, by the way, so it might be
possible to get further gains here.  But we're down to under about 15%
of CPU time for soft-mmu routines and tb management routines, which
is very good.  Then there's about 65% executing guest code, and the rest
in translation, virtual hardware, and other overhead.

-- 
Daniel Jacobowitz
CodeSourcery

---
 target-mips/cpu.h       |    3 ++-
 target-mips/exec.h      |    1 +
 target-mips/helper.c    |    2 +-
 target-mips/mips-defs.h |    1 +
 target-mips/op_helper.c |   43 +++++++++++++++++++++++++++++++++++++------
 target-mips/translate.c |    1 +
 6 files changed, 43 insertions(+), 8 deletions(-)

Index: qemu/target-mips/cpu.h
===================================================================
--- qemu.orig/target-mips/cpu.h	2006-11-12 11:34:01.000000000 -0500
+++ qemu/target-mips/cpu.h	2006-11-12 11:34:24.000000000 -0500
@@ -94,7 +94,8 @@ struct CPUMIPSState {
 		
 #endif
 #if defined(MIPS_USES_R4K_TLB)
-    tlb_t tlb[MIPS_TLB_NB];
+    tlb_t tlb[MIPS_TLB_MAX];
+    uint32_t tlb_in_use;
 #endif
     uint32_t CP0_index;
     uint32_t CP0_random;
Index: qemu/target-mips/exec.h
===================================================================
--- qemu.orig/target-mips/exec.h	2006-11-12 11:34:01.000000000 -0500
+++ qemu/target-mips/exec.h	2006-11-12 11:34:24.000000000 -0500
@@ -115,5 +115,6 @@ uint32_t cpu_mips_get_count (CPUState *e
 void cpu_mips_store_count (CPUState *env, uint32_t value);
 void cpu_mips_store_compare (CPUState *env, uint32_t value);
 void cpu_mips_clock_init (CPUState *env);
+void cpu_mips_tlb_flush (CPUState *env, int flush_global);
 
 #endif /* !defined(__QEMU_MIPS_EXEC_H__) */
Index: qemu/target-mips/helper.c
===================================================================
--- qemu.orig/target-mips/helper.c	2006-11-12 11:34:01.000000000 -0500
+++ qemu/target-mips/helper.c	2006-11-12 11:34:24.000000000 -0500
@@ -46,7 +46,7 @@ static int map_address (CPUState *env, t
     tlb_t *tlb;
     int i, n;
 
-    for (i = 0; i < MIPS_TLB_NB; i++) {
+    for (i = 0; i < env->tlb_in_use; i++) {
         tlb = &env->tlb[i];
         /* Check ASID, virtual page number & size */
         if ((tlb->G == 1 || tlb->ASID == ASID) &&
Index: qemu/target-mips/mips-defs.h
===================================================================
--- qemu.orig/target-mips/mips-defs.h	2006-11-12 11:34:01.000000000 -0500
+++ qemu/target-mips/mips-defs.h	2006-11-12 11:34:24.000000000 -0500
@@ -22,6 +22,7 @@
 /* Uses MIPS R4Kc TLB model */
 #define MIPS_USES_R4K_TLB
 #define MIPS_TLB_NB 16
+#define MIPS_TLB_MAX 128
 /* basic FPU register support */
 #define MIPS_USES_FPU 1
 /* Define a implementation number of 1.
Index: qemu/target-mips/op_helper.c
===================================================================
--- qemu.orig/target-mips/op_helper.c	2006-11-12 11:34:02.000000000 -0500
+++ qemu/target-mips/op_helper.c	2006-11-12 11:42:44.000000000 -0500
@@ -367,7 +367,7 @@ void do_mtc0 (int reg, int sel)
         env->CP0_EntryHi = val;
 	/* If the ASID changes, flush qemu's TLB.  */
 	if ((old & 0xFF) != (val & 0xFF))
-	  tlb_flush (env, 1);
+	  cpu_mips_tlb_flush (env, 1);
         rn = "EntryHi";
         break;
     case 11:
@@ -569,7 +569,14 @@ void fpu_handle_exception(void)
 
 /* TLB management */
 #if defined(MIPS_USES_R4K_TLB)
-static void invalidate_tlb (int idx)
+void cpu_mips_tlb_flush (CPUState *env, int flush_global)
+{
+    /* Flush qemu's TLB and discard all shadowed entries.  */
+    tlb_flush (env, flush_global);
+    env->tlb_in_use = MIPS_TLB_NB;
+}
+
+static void invalidate_tlb (int idx, int use_extra)
 {
     tlb_t *tlb;
     target_ulong addr;
@@ -582,6 +589,15 @@ static void invalidate_tlb (int idx)
         return;
     }
 
+    if (use_extra && env->tlb_in_use < MIPS_TLB_MAX) {
+        /* For tlbwr, we can shadow the discarded entry into
+	   a new (fake) TLB entry, as long as the guest can not
+	   tell that it's there.  */
+        memcpy (&env->tlb[env->tlb_in_use], tlb, sizeof (*tlb));
+        env->tlb_in_use++;
+        return;
+    }
+
     if (tlb->V0) {
         tb_invalidate_page_range(tlb->PFN[0], tlb->end - tlb->VPN);
         addr = tlb->VPN;
@@ -600,6 +616,14 @@ static void invalidate_tlb (int idx)
     }
 }
 
+static void mips_tlb_flush_extra (CPUState *env)
+{
+    tlb_random = 2;
+    while (env->tlb_in_use > MIPS_TLB_NB) {
+        invalidate_tlb(--env->tlb_in_use, 0);
+    }
+}
+
 static void fill_tlb (int idx)
 {
     tlb_t *tlb;
@@ -626,9 +650,14 @@ static void fill_tlb (int idx)
 
 void do_tlbwi (void)
 {
+    /* Discard cached TLB entries.  We could avoid doing this if the
+       tlbwi is just upgrading access permissions on the current entry;
+       that might be a further win.  */
+    mips_tlb_flush_extra (env);
+
     /* Wildly undefined effects for CP0_index containing a too high value and
        MIPS_TLB_NB not being a power of two.  But so does real silicon.  */
-    invalidate_tlb(env->CP0_index & (MIPS_TLB_NB - 1));
+    invalidate_tlb(env->CP0_index & (MIPS_TLB_NB - 1), 0);
     fill_tlb(env->CP0_index & (MIPS_TLB_NB - 1));
 }
 
@@ -636,7 +665,7 @@ void do_tlbwr (void)
 {
     int r = cpu_mips_get_random(env);
 
-    invalidate_tlb(r);
+    invalidate_tlb(r, 1);
     fill_tlb(r);
 }
 
@@ -673,8 +702,10 @@ void do_tlbr (void)
     tlb = &env->tlb[env->CP0_index & (MIPS_TLB_NB - 1)];
 
     /* If this will change the current ASID, flush qemu's TLB.  */
-    if (ASID != tlb->ASID && tlb->G != 1)
-      tlb_flush (env, 1);
+    if (ASID != tlb->ASID)
+        cpu_mips_tlb_flush (env, 1);
+
+    mips_tlb_flush_extra(env);
 
     env->CP0_EntryHi = tlb->VPN | tlb->ASID;
     size = (tlb->end - tlb->VPN) >> 12;
Index: qemu/target-mips/translate.c
===================================================================
--- qemu.orig/target-mips/translate.c	2006-11-12 11:34:01.000000000 -0500
+++ qemu/target-mips/translate.c	2006-11-12 11:34:24.000000000 -0500
@@ -2450,6 +2450,7 @@ void cpu_reset (CPUMIPSState *env)
     env->PC = 0xBFC00000;
 #if defined (MIPS_USES_R4K_TLB)
     env->CP0_random = MIPS_TLB_NB - 1;
+    env->tlb_in_use = MIPS_TLB_NB;
 #endif
     env->CP0_Wired = 0;
     env->CP0_Config0 = MIPS_CONFIG0;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
  2006-11-12 16:56           ` Daniel Jacobowitz
@ 2006-11-12 17:49             ` Daniel Jacobowitz
  2006-11-12 18:02             ` Dirk Behme
  1 sibling, 0 replies; 16+ messages in thread
From: Daniel Jacobowitz @ 2006-11-12 17:49 UTC (permalink / raw)
  To: qemu-devel

On Sun, Nov 12, 2006 at 11:56:35AM -0500, Daniel Jacobowitz wrote:
> ---
>  target-mips/cpu.h       |    3 ++-
>  target-mips/exec.h      |    1 +
>  target-mips/helper.c    |    2 +-
>  target-mips/mips-defs.h |    1 +
>  target-mips/op_helper.c |   43 +++++++++++++++++++++++++++++++++++++------
>  target-mips/translate.c |    1 +
>  6 files changed, 43 insertions(+), 8 deletions(-)

Let's try that again.

-- 
Daniel Jacobowitz
CodeSourcery

---
 target-mips/cpu.h       |    3 ++-
 target-mips/exec.h      |    1 +
 target-mips/helper.c    |    2 +-
 target-mips/mips-defs.h |    1 +
 target-mips/op_helper.c |   43 +++++++++++++++++++++++++++++++++++++------
 target-mips/translate.c |    1 +
 6 files changed, 43 insertions(+), 8 deletions(-)

Index: qemu/target-mips/cpu.h
===================================================================
--- qemu.orig/target-mips/cpu.h	2006-11-12 11:34:01.000000000 -0500
+++ qemu/target-mips/cpu.h	2006-11-12 11:34:24.000000000 -0500
@@ -94,7 +94,8 @@ struct CPUMIPSState {
 		
 #endif
 #if defined(MIPS_USES_R4K_TLB)
-    tlb_t tlb[MIPS_TLB_NB];
+    tlb_t tlb[MIPS_TLB_MAX];
+    uint32_t tlb_in_use;
 #endif
     uint32_t CP0_index;
     uint32_t CP0_random;
Index: qemu/target-mips/exec.h
===================================================================
--- qemu.orig/target-mips/exec.h	2006-11-12 11:34:01.000000000 -0500
+++ qemu/target-mips/exec.h	2006-11-12 11:34:24.000000000 -0500
@@ -115,5 +115,6 @@ uint32_t cpu_mips_get_count (CPUState *e
 void cpu_mips_store_count (CPUState *env, uint32_t value);
 void cpu_mips_store_compare (CPUState *env, uint32_t value);
 void cpu_mips_clock_init (CPUState *env);
+void cpu_mips_tlb_flush (CPUState *env, int flush_global);
 
 #endif /* !defined(__QEMU_MIPS_EXEC_H__) */
Index: qemu/target-mips/helper.c
===================================================================
--- qemu.orig/target-mips/helper.c	2006-11-12 11:34:01.000000000 -0500
+++ qemu/target-mips/helper.c	2006-11-12 11:34:24.000000000 -0500
@@ -46,7 +46,7 @@ static int map_address (CPUState *env, t
     tlb_t *tlb;
     int i, n;
 
-    for (i = 0; i < MIPS_TLB_NB; i++) {
+    for (i = 0; i < env->tlb_in_use; i++) {
         tlb = &env->tlb[i];
         /* Check ASID, virtual page number & size */
         if ((tlb->G == 1 || tlb->ASID == ASID) &&
Index: qemu/target-mips/mips-defs.h
===================================================================
--- qemu.orig/target-mips/mips-defs.h	2006-11-12 11:34:01.000000000 -0500
+++ qemu/target-mips/mips-defs.h	2006-11-12 11:34:24.000000000 -0500
@@ -22,6 +22,7 @@
 /* Uses MIPS R4Kc TLB model */
 #define MIPS_USES_R4K_TLB
 #define MIPS_TLB_NB 16
+#define MIPS_TLB_MAX 128
 /* basic FPU register support */
 #define MIPS_USES_FPU 1
 /* Define a implementation number of 1.
Index: qemu/target-mips/op_helper.c
===================================================================
--- qemu.orig/target-mips/op_helper.c	2006-11-12 11:34:02.000000000 -0500
+++ qemu/target-mips/op_helper.c	2006-11-12 11:49:32.000000000 -0500
@@ -18,6 +18,7 @@
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
 #include "exec.h"
+#include <string.h>
 
 #define MIPS_DEBUG_DISAS
 
@@ -367,7 +368,7 @@ void do_mtc0 (int reg, int sel)
         env->CP0_EntryHi = val;
 	/* If the ASID changes, flush qemu's TLB.  */
 	if ((old & 0xFF) != (val & 0xFF))
-	  tlb_flush (env, 1);
+	  cpu_mips_tlb_flush (env, 1);
         rn = "EntryHi";
         break;
     case 11:
@@ -569,7 +570,14 @@ void fpu_handle_exception(void)
 
 /* TLB management */
 #if defined(MIPS_USES_R4K_TLB)
-static void invalidate_tlb (int idx)
+void cpu_mips_tlb_flush (CPUState *env, int flush_global)
+{
+    /* Flush qemu's TLB and discard all shadowed entries.  */
+    tlb_flush (env, flush_global);
+    env->tlb_in_use = MIPS_TLB_NB;
+}
+
+static void invalidate_tlb (int idx, int use_extra)
 {
     tlb_t *tlb;
     target_ulong addr;
@@ -582,6 +590,15 @@ static void invalidate_tlb (int idx)
         return;
     }
 
+    if (use_extra && env->tlb_in_use < MIPS_TLB_MAX) {
+        /* For tlbwr, we can shadow the discarded entry into
+	   a new (fake) TLB entry, as long as the guest can not
+	   tell that it's there.  */
+        memcpy (&env->tlb[env->tlb_in_use], tlb, sizeof (*tlb));
+        env->tlb_in_use++;
+        return;
+    }
+
     if (tlb->V0) {
         tb_invalidate_page_range(tlb->PFN[0], tlb->end - tlb->VPN);
         addr = tlb->VPN;
@@ -600,6 +617,13 @@ static void invalidate_tlb (int idx)
     }
 }
 
+static void mips_tlb_flush_extra (CPUState *env)
+{
+    while (env->tlb_in_use > MIPS_TLB_NB) {
+        invalidate_tlb(--env->tlb_in_use, 0);
+    }
+}
+
 static void fill_tlb (int idx)
 {
     tlb_t *tlb;
@@ -626,9 +650,14 @@ static void fill_tlb (int idx)
 
 void do_tlbwi (void)
 {
+    /* Discard cached TLB entries.  We could avoid doing this if the
+       tlbwi is just upgrading access permissions on the current entry;
+       that might be a further win.  */
+    mips_tlb_flush_extra (env);
+
     /* Wildly undefined effects for CP0_index containing a too high value and
        MIPS_TLB_NB not being a power of two.  But so does real silicon.  */
-    invalidate_tlb(env->CP0_index & (MIPS_TLB_NB - 1));
+    invalidate_tlb(env->CP0_index & (MIPS_TLB_NB - 1), 0);
     fill_tlb(env->CP0_index & (MIPS_TLB_NB - 1));
 }
 
@@ -636,7 +665,7 @@ void do_tlbwr (void)
 {
     int r = cpu_mips_get_random(env);
 
-    invalidate_tlb(r);
+    invalidate_tlb(r, 1);
     fill_tlb(r);
 }
 
@@ -673,8 +702,10 @@ void do_tlbr (void)
     tlb = &env->tlb[env->CP0_index & (MIPS_TLB_NB - 1)];
 
     /* If this will change the current ASID, flush qemu's TLB.  */
-    if (ASID != tlb->ASID && tlb->G != 1)
-      tlb_flush (env, 1);
+    if (ASID != tlb->ASID)
+        cpu_mips_tlb_flush (env, 1);
+
+    mips_tlb_flush_extra(env);
 
     env->CP0_EntryHi = tlb->VPN | tlb->ASID;
     size = (tlb->end - tlb->VPN) >> 12;
Index: qemu/target-mips/translate.c
===================================================================
--- qemu.orig/target-mips/translate.c	2006-11-12 11:34:01.000000000 -0500
+++ qemu/target-mips/translate.c	2006-11-12 11:34:24.000000000 -0500
@@ -2450,6 +2450,7 @@ void cpu_reset (CPUMIPSState *env)
     env->PC = 0xBFC00000;
 #if defined (MIPS_USES_R4K_TLB)
     env->CP0_random = MIPS_TLB_NB - 1;
+    env->tlb_in_use = MIPS_TLB_NB;
 #endif
     env->CP0_Wired = 0;
     env->CP0_Config0 = MIPS_CONFIG0;

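To make the shadow-entry trick in the patch concrete, here is a small, self-contained model (illustrative only, not QEMU code: the struct and helper names are simplified stand-ins for tlb_t, map_address, do_tlbwr and mips_tlb_flush_extra). A tlbwr-style replacement parks the evicted entry in a shadow slot past MIPS_TLB_NB instead of flushing its translations, lookups scan up to tlb_in_use, and any guest-visible TLB operation discards the shadows so the guest can never observe them:

```c
#include <assert.h>
#include <string.h>

#define MIPS_TLB_NB  16   /* architected entries the guest can see */
#define MIPS_TLB_MAX 128  /* architected + shadow entries */

typedef struct {
    unsigned vpn;   /* virtual page number */
    unsigned pfn;   /* physical frame number */
    int      valid;
} tlb_entry;

typedef struct {
    tlb_entry tlb[MIPS_TLB_MAX];
    unsigned  tlb_in_use;    /* first free shadow slot */
} mips_env;

static void env_reset(mips_env *env)
{
    memset(env, 0, sizeof *env);
    env->tlb_in_use = MIPS_TLB_NB;
}

/* map_address analogue: scan architected *and* shadow entries. */
static int lookup(mips_env *env, unsigned vpn, unsigned *pfn)
{
    unsigned i;
    for (i = 0; i < env->tlb_in_use; i++) {
        if (env->tlb[i].valid && env->tlb[i].vpn == vpn) {
            *pfn = env->tlb[i].pfn;
            return 1;
        }
    }
    return 0;
}

/* tlbwr analogue: before overwriting the chosen slot, shadow the old
   entry so already-translated code keeps hitting it. */
static void tlbwr(mips_env *env, unsigned idx, tlb_entry new_entry)
{
    if (env->tlb[idx].valid && env->tlb_in_use < MIPS_TLB_MAX)
        env->tlb[env->tlb_in_use++] = env->tlb[idx];
    env->tlb[idx] = new_entry;
}

/* Guest-visible TLB op (tlbwi, tlbr, ASID change): shadows must go. */
static void flush_extra(mips_env *env)
{
    while (env->tlb_in_use > MIPS_TLB_NB)
        env->tlb[--env->tlb_in_use].valid = 0;
}
```

The win is that a Linux-style tlbwr of a new mapping no longer forces tlb_flush_page on the entry it happens to evict; the expensive flush is deferred until the guest does something that could expose the shadow.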

* Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
  2006-11-12 16:56           ` Daniel Jacobowitz
  2006-11-12 17:49             ` Daniel Jacobowitz
@ 2006-11-12 18:02             ` Dirk Behme
  2006-11-12 22:13               ` Daniel Jacobowitz
  1 sibling, 1 reply; 16+ messages in thread
From: Dirk Behme @ 2006-11-12 18:02 UTC (permalink / raw)
  To: qemu-devel

Daniel Jacobowitz wrote:
> This is with all of the patches I've posted to the
> list applied

Once the patches settle down, it would be nice to get a list of 
the patches (or one summary patch) saying in which order they 
apply and against which base. It seems I mixed up applying the 
correct ones in the correct order ;)

Many thanks

Dirk


* Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
  2006-11-12  1:10   ` Daniel Jacobowitz
  2006-11-12 11:49     ` Laurent Desnogues
@ 2006-11-12 20:42     ` Paul Brook
  1 sibling, 0 replies; 16+ messages in thread
From: Paul Brook @ 2006-11-12 20:42 UTC (permalink / raw)
  To: qemu-devel

> A patch which flushes up to 1/32nd of the jump buffer indiscriminately
> cuts it to 1:11-1:13.
>
> Here's that last patch.  It changes the hash function so that entries
> from a particular page are always grouped together in tb_jmp_cache,
> then finds the possibly two affected ranges and memsets them clear.
> Thoughts?  Is this acceptable, and where else should it be tested besides
> MIPS?  I haven't fine-tuned the numbers; it currently allows for max 64
> cached jump targets per target page, but that could be made higher or
> lower.
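A sketch of that hashing scheme follows (sizes and identifier names here are illustrative, not necessarily those of the applied patch). The upper index bits are derived only from the page address, so all cached jump targets of one guest page occupy one contiguous run of slots; clearing a page then takes two memsets, one for the page itself and one for the preceding page, since a translation block may start there and spill over:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TARGET_PAGE_BITS 12
#define JMP_CACHE_BITS   12                    /* 4096 slots total   */
#define JMP_PAGE_BITS    (JMP_CACHE_BITS / 2)  /* 64 slots per page  */
#define JMP_CACHE_SIZE   (1u << JMP_CACHE_BITS)
#define JMP_PAGE_SIZE    (1u << JMP_PAGE_BITS)
#define JMP_ADDR_MASK    (JMP_PAGE_SIZE - 1)
#define JMP_PAGE_MASK    (JMP_CACHE_SIZE - JMP_PAGE_SIZE)

static void *jmp_cache[JMP_CACHE_SIZE];

/* First slot of the 64-slot run belonging to pc's page; always a
   multiple of JMP_PAGE_SIZE, and identical for every pc in the page. */
static unsigned hash_page(uint32_t pc)
{
    uint32_t tmp = pc ^ (pc >> (TARGET_PAGE_BITS - JMP_PAGE_BITS));
    return (tmp >> JMP_PAGE_BITS) & JMP_PAGE_MASK;
}

/* Full cache index: page bits on top, offset-derived bits below. */
static unsigned hash_func(uint32_t pc)
{
    uint32_t tmp = pc ^ (pc >> (TARGET_PAGE_BITS - JMP_PAGE_BITS));
    return ((tmp >> JMP_PAGE_BITS) & JMP_PAGE_MASK) | (tmp & JMP_ADDR_MASK);
}

/* tlb_flush_page analogue: wipe the page's run and the previous
   page's run, instead of scanning or clearing the whole cache. */
static void flush_jmp_cache_page(uint32_t addr)
{
    memset(&jmp_cache[hash_page(addr)], 0,
           JMP_PAGE_SIZE * sizeof jmp_cache[0]);
    memset(&jmp_cache[hash_page(addr - (1u << TARGET_PAGE_BITS))], 0,
           JMP_PAGE_SIZE * sizeof jmp_cache[0]);
}
```

With 6 bits per page this caps the cache at 64 jump targets per guest page, which is the tuning knob mentioned above.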

I've applied this patch.  It seems a reasonable compromise solution.

I tested a couple of different x86 guests, and couldn't measure any 
significant difference in performance.

Paul


* Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
  2006-11-12 18:02             ` Dirk Behme
@ 2006-11-12 22:13               ` Daniel Jacobowitz
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Jacobowitz @ 2006-11-12 22:13 UTC (permalink / raw)
  To: qemu-devel

On Sun, Nov 12, 2006 at 07:02:55PM +0100, Dirk Behme wrote:
> Daniel Jacobowitz wrote:
> >This is with all of the patches I've posted to the
> >list applied
> 
> Once the patches settle down, it would be nice to get a list of 
> the patches (or one summary patch) saying in which order they 
> apply and against which base. It seems I mixed up applying the 
> correct ones in the correct order ;)

I'll do that, though not until Paul is finished applying whichever ones
he has time for :-)

-- 
Daniel Jacobowitz
CodeSourcery


end of thread, other threads:[~2006-11-12 22:13 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-03-06 14:59 [Qemu-devel] [PATCH] Huge TLB performance improvement Thiemo Seufer
2006-11-05 15:38 ` Daniel Jacobowitz
2006-11-12  1:10   ` Daniel Jacobowitz
2006-11-12 11:49     ` Laurent Desnogues
2006-11-12 13:52       ` Thiemo Seufer
2006-11-12 14:08       ` Paul Brook
2006-11-12 14:29         ` Thiemo Seufer
2006-11-12 14:44           ` Paul Brook
2006-11-12 15:07             ` Daniel Jacobowitz
2006-11-12 15:24               ` Daniel Jacobowitz
2006-11-12 15:26             ` Thiemo Seufer
2006-11-12 16:56           ` Daniel Jacobowitz
2006-11-12 17:49             ` Daniel Jacobowitz
2006-11-12 18:02             ` Dirk Behme
2006-11-12 22:13               ` Daniel Jacobowitz
2006-11-12 20:42     ` Paul Brook

This is a public inbox; see the mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for its NNTP newsgroup(s).