* [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMU system emulated TLB
@ 2014-01-23 19:49 Xin Tong
  2014-01-23 19:54 ` Xin Tong
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Xin Tong @ 2014-01-23 19:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Xin Tong, afaerber, rth
This patch adds a victim TLB to the QEMU system mode TLB.
QEMU system mode page table walks are expensive. Measurements taken by
running qemu-system-x86_64 under Intel PIN show that a TLB miss and the
walk of a 4-level page table in a guest Linux OS take ~450 x86
instructions on average.
The QEMU system mode TLB is implemented as a direct-mapped hash table.
This structure suffers from conflict misses. Increasing the
associativity of the TLB may not solve the conflict-miss problem, as
all the ways may have to be walked in series.
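For reference, the inline fast path picks a single candidate entry from
the low bits of the virtual page number (as in cputlb.c below), so any
two pages whose virtual page numbers agree in those CPU_TLB_BITS bits
compete for the same slot:

    index = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
    te = &env->tlb_table[mmu_idx][index];  /* only slot that can hold vaddr */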
A victim TLB holds translations evicted from the primary TLB upon
replacement. It lies between the main TLB and its refill path, and has
greater associativity (fully associative in this patch). Looking up the
victim TLB takes longer, but it is likely still much cheaper than a full
page table walk. The memory translation path is changed as follows (a
condensed sketch of the new slow path follows the two lists):
Before Victim TLB:
1. Inline TLB lookup
2. Exit code cache on TLB miss.
3. Check for unaligned, IO accesses
4. TLB refill.
5. Do the memory access.
6. Return to code cache.
After Victim TLB:
1. Inline TLB lookup
2. Exit code cache on TLB miss.
3. Check for unaligned, IO accesses
4. Victim TLB lookup.
5. If victim TLB misses, TLB refill
6. Do the memory access.
7. Return to code cache
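Concretely, the new steps 4-5 in the softmmu helpers look roughly like
this (a simplified sketch of the read case; names follow the patch
below):

    /* not in the direct-mapped TLB: scan the victim TLB before refilling */
    for (vtlb_idx = CPU_VTLB_SIZE - 1; vtlb_idx >= 0; --vtlb_idx) {
        if (env->tlb_v_table[mmu_idx][vtlb_idx].addr_read
            == (addr & TARGET_PAGE_MASK)) {
            /* hit: swap the victim entry into the direct-mapped slot */
            swap_tlb(&env->tlb_table[mmu_idx][index],
                     &env->tlb_v_table[mmu_idx][vtlb_idx],
                     &env->iotlb[mmu_idx][index],
                     &env->iotlb_v[mmu_idx][vtlb_idx]);
            break;
        }
    }
    if (vtlb_idx < 0) {
        /* victim TLB miss: fall back to the full page table walk */
        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
    }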
The advantage is that a victim TLB adds associativity to the
direct-mapped TLB, and thus potentially saves page table walks, while
still keeping the time taken to flush it within reasonable limits.
However, placing the victim TLB before the refill path lengthens the
refill path, since the victim TLB is consulted before the TLB refill.
The performance results demonstrate that the pros outweigh the cons.
Attached are some performance results taken on the SPECINT2006 train
datasets, a kernel boot, and the qemu configure script, on an
Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Linux machine. In summary, the
victim TLB improves the performance of qemu-system-x86_64 by 10.7% on
average on SPECINT2006, with the largest improvement of 25.4% in
464.h264ref, and it does not cause a performance degradation in any of
the measured benchmarks. Furthermore, the implemented victim TLB is
architecture independent and is expected to benefit other architectures
in QEMU as well. Although there are measurement fluctuations, the
performance improvement is significant and well outside the noise.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Xin Tong <trent.tong@gmail.com>
---
 cputlb.c                        | 50 +++++++++++++++++++++++++-
 include/exec/cpu-defs.h         | 16 ++++++---
 include/exec/exec-all.h         |  2 ++
 include/exec/softmmu_template.h | 80 ++++++++++++++++++++++++++++++++++++++---
 4 files changed, 138 insertions(+), 10 deletions(-)
diff --git a/cputlb.c b/cputlb.c
index b533f3f..03a048a 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -34,6 +34,22 @@
 /* statistics */
 int tlb_flush_count;
 
+/* swap the 2 given TLB entries as well as their corresponding IOTLB */
+inline void swap_tlb(CPUTLBEntry *te, CPUTLBEntry *se, hwaddr *iote,
+                     hwaddr *iose)
+{
+   hwaddr iotmp;
+   CPUTLBEntry t;
+   /* swap iotlb */
+   iotmp = *iote;
+   *iote = *iose;
+   *iose = iotmp;
+   /* swap tlb */
+   memcpy(&t, te, sizeof(CPUTLBEntry));
+   memcpy(te, se, sizeof(CPUTLBEntry));
+   memcpy(se, &t, sizeof(CPUTLBEntry));
+}
+
 /* NOTE:
  * If flush_global is true (the usual case), flush all tlb entries.
  * If flush_global is false, flush (at least) all tlb entries not
@@ -58,8 +74,10 @@ void tlb_flush(CPUArchState *env, int flush_global)
     cpu->current_tb = NULL;
 
     memset(env->tlb_table, -1, sizeof(env->tlb_table));
+    memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
     memset(env->tb_jmp_cache, 0, sizeof(env->tb_jmp_cache));
 
+    env->vtlb_index = 0;
     env->tlb_flush_addr = -1;
     env->tlb_flush_mask = 0;
     tlb_flush_count++;
@@ -106,6 +124,14 @@ void tlb_flush_page(CPUArchState *env, target_ulong addr)
         tlb_flush_entry(&env->tlb_table[mmu_idx][i], addr);
     }
 
+    /* check whether there are entries that need to be flushed in the vtlb */
+    for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
+        unsigned int k;
+        for (k = 0; k < CPU_VTLB_SIZE; k++) {
+            tlb_flush_entry(&env->tlb_v_table[mmu_idx][k], addr);
+        }
+    }
+
     tb_flush_jmp_cache(env, addr);
 }
 
@@ -170,6 +196,11 @@ void cpu_tlb_reset_dirty_all(ram_addr_t start1, ram_addr_t length)
                 tlb_reset_dirty_range(&env->tlb_table[mmu_idx][i],
                                       start1, length);
             }
+
+            for (i = 0; i < CPU_VTLB_SIZE; i++) {
+                tlb_reset_dirty_range(&env->tlb_v_table[mmu_idx][i],
+                                      start1, length);
+            }
         }
     }
 }
@@ -193,6 +224,13 @@ void tlb_set_dirty(CPUArchState *env, target_ulong vaddr)
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         tlb_set_dirty1(&env->tlb_table[mmu_idx][i], vaddr);
     }
+
+    for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
+        unsigned int k;
+        for (k = 0; k < CPU_VTLB_SIZE; k++) {
+            tlb_set_dirty1(&env->tlb_v_table[mmu_idx][k], vaddr);
+        }
+    }
 }
 
 /* Our TLB does not support large pages, so remember the area covered by
@@ -264,8 +302,18 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr,
                                             prot, &address);
 
     index = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
-    env->iotlb[mmu_idx][index] = iotlb - vaddr;
     te = &env->tlb_table[mmu_idx][index];
+
+    /* do not discard the translation in te, evict it into a victim tlb */
+    unsigned vidx = env->vtlb_index++ % CPU_VTLB_SIZE;
+    env->tlb_v_table[mmu_idx][vidx].addr_read  = te->addr_read;
+    env->tlb_v_table[mmu_idx][vidx].addr_write = te->addr_write;
+    env->tlb_v_table[mmu_idx][vidx].addr_code  = te->addr_code;
+    env->tlb_v_table[mmu_idx][vidx].addend     = te->addend;
+    env->iotlb_v[mmu_idx][vidx]                = env->iotlb[mmu_idx][index];
+
+    /* refill the tlb */
+    env->iotlb[mmu_idx][index] = iotlb - vaddr;
     te->addend = addend - vaddr;
     if (prot & PAGE_READ) {
         te->addr_read = address;
diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 01cd8c7..18d5f0d 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -72,8 +72,10 @@ typedef uint64_t target_ulong;
 #define TB_JMP_PAGE_MASK (TB_JMP_CACHE_SIZE - TB_JMP_PAGE_SIZE)
 
 #if !defined(CONFIG_USER_ONLY)
-#define CPU_TLB_BITS 8
-#define CPU_TLB_SIZE (1 << CPU_TLB_BITS)
+#define CPU_TLB_BITS  8
+#define CPU_TLB_SIZE  (1 << CPU_TLB_BITS)
+/* use a fully associative victim tlb */
+#define CPU_VTLB_SIZE 8
 
 #if HOST_LONG_BITS == 32 && TARGET_LONG_BITS == 32
 #define CPU_TLB_ENTRY_BITS 4
@@ -103,12 +105,16 @@ typedef struct CPUTLBEntry {
 
 QEMU_BUILD_BUG_ON(sizeof(CPUTLBEntry) != (1 << CPU_TLB_ENTRY_BITS));
 
+/* The meaning of the MMU modes is defined in the target code. */
 #define CPU_COMMON_TLB \
     /* The meaning of the MMU modes is defined in the target code. */   \
-    CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];                  \
-    hwaddr iotlb[NB_MMU_MODES][CPU_TLB_SIZE];               \
+    CPUTLBEntry  tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];                 \
+    CPUTLBEntry  tlb_v_table[NB_MMU_MODES][CPU_VTLB_SIZE];              \
+    hwaddr       iotlb[NB_MMU_MODES][CPU_TLB_SIZE];                     \
+    hwaddr       iotlb_v[NB_MMU_MODES][CPU_VTLB_SIZE];                  \
     target_ulong tlb_flush_addr;                                        \
-    target_ulong tlb_flush_mask;
+    target_ulong tlb_flush_mask;                                        \
+    target_ulong vtlb_index;                                            \
 
 #else
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index ea90b64..7e88b08 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -102,6 +102,8 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr,
                   hwaddr paddr, int prot,
                   int mmu_idx, target_ulong size);
 void tb_invalidate_phys_addr(hwaddr addr);
+/* swap the 2 given tlb entries as well as their iotlb */
+void swap_tlb(CPUTLBEntry *te, CPUTLBEntry *se, hwaddr *iote, hwaddr *iose);
 #else
 static inline void tlb_flush_page(CPUArchState *env, target_ulong addr)
 {
diff --git a/include/exec/softmmu_template.h b/include/exec/softmmu_template.h
index c6a5440..fe11343 100644
--- a/include/exec/softmmu_template.h
+++ b/include/exec/softmmu_template.h
@@ -141,6 +141,7 @@ WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr, int mmu_idx,
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
     uintptr_t haddr;
     DATA_TYPE res;
+    int vtlb_idx;
 
     /* Adjust the given return address.  */
     retaddr -= GETPC_ADJ;
@@ -153,7 +154,24 @@ WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr, int mmu_idx,
             do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
         }
 #endif
-        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
+        /* we are about to do a page table walk. our last hope is the
+         * victim tlb. try to refill from the victim tlb before walking the
+         * page table. */
+        for (vtlb_idx = CPU_VTLB_SIZE; vtlb_idx >= 0; --vtlb_idx) {
+            if (env->tlb_v_table[mmu_idx][vtlb_idx].ADDR_READ
+                == (addr & TARGET_PAGE_MASK)) {
+                /* found entry in victim tlb */
+                swap_tlb(&env->tlb_table[mmu_idx][index],
+                         &env->tlb_v_table[mmu_idx][vtlb_idx],
+                         &env->iotlb[mmu_idx][index],
+                         &env->iotlb_v[mmu_idx][vtlb_idx]);
+                break;
+            }
+        }
+        /* miss victim tlb */
+        if (vtlb_idx < 0) {
+            tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
+        }
         tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
     }
 
@@ -223,6 +241,7 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr, int mmu_idx,
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
     uintptr_t haddr;
     DATA_TYPE res;
+    int vtlb_idx;
 
     /* Adjust the given return address.  */
     retaddr -= GETPC_ADJ;
@@ -235,7 +254,24 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr, int mmu_idx,
             do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
         }
 #endif
-        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
+        /* we are about to do a page table walk. our last hope is the
+         * victim tlb. try to refill from the victim tlb before walking the
+         * page table. */
+        for (vtlb_idx = CPU_VTLB_SIZE; vtlb_idx >= 0; --vtlb_idx) {
+            if (env->tlb_v_table[mmu_idx][vtlb_idx].ADDR_READ
+                == (addr & TARGET_PAGE_MASK)) {
+                /* found entry in victim tlb */
+                swap_tlb(&env->tlb_table[mmu_idx][index],
+                         &env->tlb_v_table[mmu_idx][vtlb_idx],
+                         &env->iotlb[mmu_idx][index],
+                         &env->iotlb_v[mmu_idx][vtlb_idx]);
+                break;
+            }
+        }
+        /* miss victim tlb */
+        if (vtlb_idx < 0) {
+            tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
+        }
         tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
     }
 
@@ -342,6 +378,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
     uintptr_t haddr;
+    int vtlb_idx;
 
     /* Adjust the given return address.  */
     retaddr -= GETPC_ADJ;
@@ -354,7 +391,24 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
             do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
         }
 #endif
-        tlb_fill(env, addr, 1, mmu_idx, retaddr);
+        /* we are about to do a page table walk. our last hope is the
+         * victim tlb. try to refill from the victim tlb before walking the
+         * page table. */
+        for (vtlb_idx = CPU_VTLB_SIZE; vtlb_idx >= 0; --vtlb_idx) {
+            if (env->tlb_v_table[mmu_idx][vtlb_idx].addr_write
+                == (addr & TARGET_PAGE_MASK)) {
+                /* found entry in victim tlb */
+                swap_tlb(&env->tlb_table[mmu_idx][index],
+                         &env->tlb_v_table[mmu_idx][vtlb_idx],
+                         &env->iotlb[mmu_idx][index],
+                         &env->iotlb_v[mmu_idx][vtlb_idx]);
+                break;
+            }
+        }
+        /* miss victim tlb */
+        if (vtlb_idx < 0) {
+            tlb_fill(env, addr, 1, mmu_idx, retaddr);
+        }
         tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
     }
 
@@ -418,6 +472,7 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
     uintptr_t haddr;
+    int vtlb_idx;
 
     /* Adjust the given return address.  */
     retaddr -= GETPC_ADJ;
@@ -430,7 +485,24 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
             do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
         }
 #endif
-        tlb_fill(env, addr, 1, mmu_idx, retaddr);
+        /* we are about to do a page table walk. our last hope is the
+         * victim tlb. try to refill from the victim tlb before walking the
+         * page table. */
+        for (vtlb_idx = CPU_VTLB_SIZE; vtlb_idx >= 0; --vtlb_idx) {
+            if (env->tlb_v_table[mmu_idx][vtlb_idx].addr_write
+                == (addr & TARGET_PAGE_MASK)) {
+                /* found entry in victim tlb */
+                swap_tlb(&env->tlb_table[mmu_idx][index],
+                         &env->tlb_v_table[mmu_idx][vtlb_idx],
+                         &env->iotlb[mmu_idx][index],
+                         &env->iotlb_v[mmu_idx][vtlb_idx]);
+                break;
+            }
+        }
+        /* miss victim tlb */
+        if (vtlb_idx < 0) {
+            tlb_fill(env, addr, 1, mmu_idx, retaddr);
+        }
         tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
     }
 
-- 
1.8.3.2
* Re: [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMU system emulated TLB
  2014-01-23 19:49 [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMU system emulated TLB Xin Tong
@ 2014-01-23 19:54 ` Xin Tong
  2014-01-24 12:15   ` [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMUsystem emulated TLBB Alex Bennée
  2014-01-23 20:44 ` [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMU system emulated TLB Max Filippov
  2014-01-23 21:47 ` Richard Henderson
  2 siblings, 1 reply; 12+ messages in thread
From: Xin Tong @ 2014-01-23 19:54 UTC (permalink / raw)
  To: QEMU Developers; +Cc: Xin Tong, afaerber, Richard Henderson
Attaching the data in an Excel file, which could not be sent together with the patch.
On Thu, Jan 23, 2014 at 1:49 PM, Xin Tong <trent.tong@gmail.com> wrote:
> This patch adds a victim TLB to the QEMU system mode TLB.
> [...]
[-- Attachment #2: vtlb.xlsx --]
[-- Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, Size: 15093 bytes --]
* Re: [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMU system emulated TLB
  2014-01-23 19:49 [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMU system emulated TLB Xin Tong
  2014-01-23 19:54 ` Xin Tong
@ 2014-01-23 20:44 ` Max Filippov
  2014-01-23 21:29   ` Xin Tong
  2014-01-23 21:47 ` Richard Henderson
  2 siblings, 1 reply; 12+ messages in thread
From: Max Filippov @ 2014-01-23 20:44 UTC (permalink / raw)
  To: Xin Tong; +Cc: Richard Henderson, qemu-devel, Andreas Färber
Hi Xin,
On Thu, Jan 23, 2014 at 11:49 PM, Xin Tong <trent.tong@gmail.com> wrote:
[...]
> diff --git a/cputlb.c b/cputlb.c
> index b533f3f..03a048a 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -34,6 +34,22 @@
>  /* statistics */
>  int tlb_flush_count;
>
> +/* swap the 2 given TLB entries as well as their corresponding IOTLB */
> +inline void swap_tlb(CPUTLBEntry *te, CPUTLBEntry *se, hwaddr *iote,
> +                     hwaddr *iose)
> +{
> +   hwaddr iotmp;
> +   CPUTLBEntry t;
> +   /* swap iotlb */
> +   iotmp = *iote;
> +   *iote = *iose;
> +   *iose = iotmp;
> +   /* swap tlb */
> +   memcpy(&t, te, sizeof(CPUTLBEntry));
> +   memcpy(te, se, sizeof(CPUTLBEntry));
> +   memcpy(se, &t, sizeof(CPUTLBEntry));
You could use assignment operator, it works:
    t = *te;
    *te = *se;
    *se = t;
Also all users of this function are inline functions from softmmu_template.h
Could this function be a static inline function in that file too?
> +}
[...]
> --- a/include/exec/softmmu_template.h
> +++ b/include/exec/softmmu_template.h
> @@ -141,6 +141,7 @@ WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr, int mmu_idx,
>      target_ulong tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
>      uintptr_t haddr;
>      DATA_TYPE res;
> +    int vtlb_idx;
>
>      /* Adjust the given return address.  */
>      retaddr -= GETPC_ADJ;
> @@ -153,7 +154,24 @@ WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr, int mmu_idx,
>              do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
>          }
>  #endif
> -        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
> +        /* we are about to do a page table walk. our last hope is the
> +         * victim tlb. try to refill from the victim tlb before walking the
> +         * page table. */
> +        for (vtlb_idx = CPU_VTLB_SIZE; vtlb_idx >= 0; --vtlb_idx) {
> +            if (env->tlb_v_table[mmu_idx][vtlb_idx].ADDR_READ
Is it ADDR_READ or addr_read, as it is called in other places?
> +                == (addr & TARGET_PAGE_MASK)) {
> +                /* found entry in victim tlb */
> +                swap_tlb(&env->tlb_table[mmu_idx][index],
> +                         &env->tlb_v_table[mmu_idx][vtlb_idx],
> +                         &env->iotlb[mmu_idx][index],
> +                         &env->iotlb_v[mmu_idx][vtlb_idx]);
> +                break;
> +            }
> +        }
> +        /* miss victim tlb */
> +        if (vtlb_idx < 0) {
> +            tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
> +        }
>          tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
>      }
>
> @@ -223,6 +241,7 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr, int mmu_idx,
>      target_ulong tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
>      uintptr_t haddr;
>      DATA_TYPE res;
> +    int vtlb_idx;
>
>      /* Adjust the given return address.  */
>      retaddr -= GETPC_ADJ;
> @@ -235,7 +254,24 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr, int mmu_idx,
>              do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
>          }
>  #endif
> -        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
> +        /* we are about to do a page table walk. our last hope is the
> +         * victim tlb. try to refill from the victim tlb before walking the
> +         * page table. */
> +        for (vtlb_idx = CPU_VTLB_SIZE; vtlb_idx >= 0; --vtlb_idx) {
> +            if (env->tlb_v_table[mmu_idx][vtlb_idx].ADDR_READ
> +                == (addr & TARGET_PAGE_MASK)) {
> +                /* found entry in victim tlb */
> +                swap_tlb(&env->tlb_table[mmu_idx][index],
> +                         &env->tlb_v_table[mmu_idx][vtlb_idx],
> +                         &env->iotlb[mmu_idx][index],
> +                         &env->iotlb_v[mmu_idx][vtlb_idx]);
> +                break;
> +            }
> +        }
It could be a good inline function.
> +        /* miss victim tlb */
> +        if (vtlb_idx < 0) {
> +            tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
> +        }
>          tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
>      }
>
> @@ -342,6 +378,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>      int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>      target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
>      uintptr_t haddr;
> +    int vtlb_idx;
>
>      /* Adjust the given return address.  */
>      retaddr -= GETPC_ADJ;
> @@ -354,7 +391,24 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>              do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
>          }
>  #endif
> -        tlb_fill(env, addr, 1, mmu_idx, retaddr);
> +        /* we are about to do a page table walk. our last hope is the
> +         * victim tlb. try to refill from the victim tlb before walking the
> +         * page table. */
> +        for (vtlb_idx = CPU_VTLB_SIZE; vtlb_idx >= 0; --vtlb_idx) {
> +            if (env->tlb_v_table[mmu_idx][vtlb_idx].addr_write
> +                == (addr & TARGET_PAGE_MASK)) {
> +                /* found entry in victim tlb */
> +                swap_tlb(&env->tlb_table[mmu_idx][index],
> +                         &env->tlb_v_table[mmu_idx][vtlb_idx],
> +                         &env->iotlb[mmu_idx][index],
> +                         &env->iotlb_v[mmu_idx][vtlb_idx]);
> +                break;
> +            }
> +        }
> +        /* miss victim tlb */
> +        if (vtlb_idx < 0) {
> +            tlb_fill(env, addr, 1, mmu_idx, retaddr);
> +        }
>          tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
>      }
>
> @@ -418,6 +472,7 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>      int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>      target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
>      uintptr_t haddr;
> +    int vtlb_idx;
>
>      /* Adjust the given return address.  */
>      retaddr -= GETPC_ADJ;
> @@ -430,7 +485,24 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>              do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
>          }
>  #endif
> -        tlb_fill(env, addr, 1, mmu_idx, retaddr);
> +        /* we are about to do a page table walk. our last hope is the
> +         * victim tlb. try to refill from the victim tlb before walking the
> +         * page table. */
> +        for (vtlb_idx = CPU_VTLB_SIZE; vtlb_idx >= 0; --vtlb_idx) {
> +            if (env->tlb_v_table[mmu_idx][vtlb_idx].addr_write
> +                == (addr & TARGET_PAGE_MASK)) {
> +                /* found entry in victim tlb */
> +                swap_tlb(&env->tlb_table[mmu_idx][index],
> +                         &env->tlb_v_table[mmu_idx][vtlb_idx],
> +                         &env->iotlb[mmu_idx][index],
> +                         &env->iotlb_v[mmu_idx][vtlb_idx]);
> +                break;
> +            }
> +        }
It could be a good inline function too.
> +        /* miss victim tlb */
> +        if (vtlb_idx < 0) {
> +            tlb_fill(env, addr, 1, mmu_idx, retaddr);
> +        }
>          tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
>      }
-- 
Thanks.
-- Max
* Re: [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMU system emulated TLB
  2014-01-23 20:44 ` [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMU system emulated TLB Max Filippov
@ 2014-01-23 21:29   ` Xin Tong
  2014-01-23 21:46     ` Max Filippov
  0 siblings, 1 reply; 12+ messages in thread
From: Xin Tong @ 2014-01-23 21:29 UTC (permalink / raw)
  To: Max Filippov; +Cc: Richard Henderson, qemu-devel, Andreas Färber
Hi Max
Thank you for taking the time to review my patch
On Thu, Jan 23, 2014 at 2:44 PM, Max Filippov <jcmvbkbc@gmail.com> wrote:
> Hi Xin,
>
> On Thu, Jan 23, 2014 at 11:49 PM, Xin Tong <trent.tong@gmail.com> wrote:
>
> [...]
>
>> diff --git a/cputlb.c b/cputlb.c
>> index b533f3f..03a048a 100644
>> --- a/cputlb.c
>> +++ b/cputlb.c
>> @@ -34,6 +34,22 @@
>>  /* statistics */
>>  int tlb_flush_count;
>>
>> +/* swap the 2 given TLB entries as well as their corresponding IOTLB */
>> +inline void swap_tlb(CPUTLBEntry *te, CPUTLBEntry *se, hwaddr *iote,
>> +                     hwaddr *iose)
>> +{
>> +   hwaddr iotmp;
>> +   CPUTLBEntry t;
>> +   /* swap iotlb */
>> +   iotmp = *iote;
>> +   *iote = *iose;
>> +   *iose = iotmp;
>> +   /* swap tlb */
>> +   memcpy(&t, te, sizeof(CPUTLBEntry));
>> +   memcpy(te, se, sizeof(CPUTLBEntry));
>> +   memcpy(se, &t, sizeof(CPUTLBEntry));
>
> You could use assignment operator, it works:
>     t = *te;
>     *te = *se;
>     *se = t;
>
I see, will fix in patch v3.
> Also all users of this function are inline functions from softmmu_template.h
> Could this function be a static inline function in that file too?
I am following where tlb_fill and tlb_flush are placed. Also,
softmmu_template.h is included more than once; would putting swap_tlb
in softmmu_template.h result in multiple definitions?
>
>> +}
>
> [...]
>
>> --- a/include/exec/softmmu_template.h
>> +++ b/include/exec/softmmu_template.h
>> @@ -141,6 +141,7 @@ WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr, int mmu_idx,
>>      target_ulong tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
>>      uintptr_t haddr;
>>      DATA_TYPE res;
>> +    int vtlb_idx;
>>
>>      /* Adjust the given return address.  */
>>      retaddr -= GETPC_ADJ;
>> @@ -153,7 +154,24 @@ WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr, int mmu_idx,
>>              do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
>>          }
>>  #endif
>> -        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
>> +        /* we are about to do a page table walk. our last hope is the
>> +         * victim tlb. try to refill from the victim tlb before walking the
>> +         * page table. */
>> +        for (vtlb_idx = CPU_VTLB_SIZE; vtlb_idx >= 0; --vtlb_idx) {
>> +            if (env->tlb_v_table[mmu_idx][vtlb_idx].ADDR_READ
>
> Is it ADDR_READ or addr_read, as it is called in other places?
I am following target_ulong tlb_addr =
env->tlb_table[mmu_idx][index].ADDR_READ; in the same function. I am
not sure I get what you mean.
>
>> +                == (addr & TARGET_PAGE_MASK)) {
>> +                /* found entry in victim tlb */
>> +                swap_tlb(&env->tlb_table[mmu_idx][index],
>> +                         &env->tlb_v_table[mmu_idx][vtlb_idx],
>> +                         &env->iotlb[mmu_idx][index],
>> +                         &env->iotlb_v[mmu_idx][vtlb_idx]);
>> +                break;
>> +            }
>> +        }
>> +        /* miss victim tlb */
>> +        if (vtlb_idx < 0) {
>> +            tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
>> +        }
>>          tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
>>      }
>>
>> @@ -223,6 +241,7 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr, int mmu_idx,
>>      target_ulong tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
>>      uintptr_t haddr;
>>      DATA_TYPE res;
>> +    int vtlb_idx;
>>
>>      /* Adjust the given return address.  */
>>      retaddr -= GETPC_ADJ;
>> @@ -235,7 +254,24 @@ WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr, int mmu_idx,
>>              do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
>>          }
>>  #endif
>> -        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
>> +        /* we are about to do a page table walk. our last hope is the
>> +         * victim tlb. try to refill from the victim tlb before walking the
>> +         * page table. */
>> +        for (vtlb_idx = CPU_VTLB_SIZE; vtlb_idx >= 0; --vtlb_idx) {
>> +            if (env->tlb_v_table[mmu_idx][vtlb_idx].ADDR_READ
>> +                == (addr & TARGET_PAGE_MASK)) {
>> +                /* found entry in victim tlb */
>> +                swap_tlb(&env->tlb_table[mmu_idx][index],
>> +                         &env->tlb_v_table[mmu_idx][vtlb_idx],
>> +                         &env->iotlb[mmu_idx][index],
>> +                         &env->iotlb_v[mmu_idx][vtlb_idx]);
>> +                break;
>> +            }
>> +        }
>
> It could be a good inline function.
>
>> +        /* miss victim tlb */
>> +        if (vtlb_idx < 0) {
>> +            tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
>> +        }
>>          tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
>>      }
>>
>> @@ -342,6 +378,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>      int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>      target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
>>      uintptr_t haddr;
>> +    int vtlb_idx;
>>
>>      /* Adjust the given return address.  */
>>      retaddr -= GETPC_ADJ;
>> @@ -354,7 +391,24 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>              do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
>>          }
>>  #endif
>> -        tlb_fill(env, addr, 1, mmu_idx, retaddr);
>> +        /* we are about to do a page table walk. our last hope is the
>> +         * victim tlb. try to refill from the victim tlb before walking the
>> +         * page table. */
>> +        for (vtlb_idx = CPU_VTLB_SIZE; vtlb_idx >= 0; --vtlb_idx) {
>> +            if (env->tlb_v_table[mmu_idx][vtlb_idx].addr_write
>> +                == (addr & TARGET_PAGE_MASK)) {
>> +                /* found entry in victim tlb */
>> +                swap_tlb(&env->tlb_table[mmu_idx][index],
>> +                         &env->tlb_v_table[mmu_idx][vtlb_idx],
>> +                         &env->iotlb[mmu_idx][index],
>> +                         &env->iotlb_v[mmu_idx][vtlb_idx]);
>> +                break;
>> +            }
>> +        }
>> +        /* miss victim tlb */
>> +        if (vtlb_idx < 0) {
>> +            tlb_fill(env, addr, 1, mmu_idx, retaddr);
>> +        }
>>          tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
>>      }
>>
>> @@ -418,6 +472,7 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>      int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>      target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
>>      uintptr_t haddr;
>> +    int vtlb_idx;
>>
>>      /* Adjust the given return address.  */
>>      retaddr -= GETPC_ADJ;
>> @@ -430,7 +485,24 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>              do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
>>          }
>>  #endif
>> -        tlb_fill(env, addr, 1, mmu_idx, retaddr);
>> +        /* we are about to do a page table walk. our last hope is the
>> +         * victim tlb. try to refill from the victim tlb before walking the
>> +         * page table. */
>> +        for (vtlb_idx = CPU_VTLB_SIZE; vtlb_idx >= 0; --vtlb_idx) {
>> +            if (env->tlb_v_table[mmu_idx][vtlb_idx].addr_write
>> +                == (addr & TARGET_PAGE_MASK)) {
>> +                /* found entry in victim tlb */
>> +                swap_tlb(&env->tlb_table[mmu_idx][index],
>> +                         &env->tlb_v_table[mmu_idx][vtlb_idx],
>> +                         &env->iotlb[mmu_idx][index],
>> +                         &env->iotlb_v[mmu_idx][vtlb_idx]);
>> +                break;
>> +            }
>> +        }
>
> It could be a good inline function too.
One way is to define the victim tlb lookup code in a C function which
gets called before tlb_fill, but I need the ADDR_READ and addr_write
macros.
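As a rough illustration only (not part of this patch, names invented):
one way around that is to pass the byte offset of the field instead of
its name, so a single C function can serve reads, writes and code
fetches:

    /* hypothetical helper; 'elt_ofs' is offsetof(CPUTLBEntry, addr_read),
     * offsetof(CPUTLBEntry, addr_write) or offsetof(CPUTLBEntry, addr_code),
     * and 'page' is addr & TARGET_PAGE_MASK */
    static bool victim_tlb_hit(CPUArchState *env, int mmu_idx, int index,
                               size_t elt_ofs, target_ulong page)
    {
        int vidx;

        for (vidx = CPU_VTLB_SIZE - 1; vidx >= 0; --vidx) {
            CPUTLBEntry *vtlb = &env->tlb_v_table[mmu_idx][vidx];
            target_ulong cmp = *(target_ulong *)((uintptr_t)vtlb + elt_ofs);

            if (cmp == page) {
                /* hit: promote the entry into the direct-mapped TLB */
                swap_tlb(&env->tlb_table[mmu_idx][index], vtlb,
                         &env->iotlb[mmu_idx][index],
                         &env->iotlb_v[mmu_idx][vidx]);
                return true;
            }
        }
        return false;
    }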
>
>> +        /* miss victim tlb */
>> +        if (vtlb_idx < 0) {
>> +            tlb_fill(env, addr, 1, mmu_idx, retaddr);
>> +        }
>>          tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
>>      }
>
> --
> Thanks.
> -- Max
* Re: [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMU system emulated TLB
  2014-01-23 21:29   ` Xin Tong
@ 2014-01-23 21:46     ` Max Filippov
  0 siblings, 0 replies; 12+ messages in thread
From: Max Filippov @ 2014-01-23 21:46 UTC (permalink / raw)
  To: Xin Tong; +Cc: Richard Henderson, qemu-devel, Andreas Färber
On Fri, Jan 24, 2014 at 1:29 AM, Xin Tong <trent.tong@gmail.com> wrote:
>>> +/* swap the 2 given TLB entries as well as their corresponding IOTLB */
>>> +inline void swap_tlb(CPUTLBEntry *te, CPUTLBEntry *se, hwaddr *iote,
>>> +                     hwaddr *iose)
>>> +{
>>> +   hwaddr iotmp;
>>> +   CPUTLBEntry t;
>>> +   /* swap iotlb */
>>> +   iotmp = *iote;
>>> +   *iote = *iose;
>>> +   *iose = iotmp;
>>> +   /* swap tlb */
>>> +   memcpy(&t, te, sizeof(CPUTLBEntry));
>>> +   memcpy(te, se, sizeof(CPUTLBEntry));
>>> +   memcpy(se, &t, sizeof(CPUTLBEntry));
>>
>> You could use assignment operator, it works:
>>     t = *te;
>>     *te = *se;
>>     *se = t;
>>
> I see, will fix in patch v3.
>
>> Also all users of this function are inline functions from softmmu_template.h
>> Could this function be a static inline function in that file too?
>
> I am following where tlb_fill and tlb_flush are placed. Also,
> softmmu_template.h is included more than once; would putting swap_tlb
> in softmmu_template.h result in multiple definitions?
You're right. Could it be put into a file that is included by softmmu_template.h
and has include guards in it, or could it be turned into a macro?
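For illustration (hypothetical file name, not part of the patch), the
first option could be a small guarded header shared by cputlb.c and the
softmmu helpers, using the plain assignments suggested above:

    /* cputlb-victim.h -- hypothetical shared header with include guards */
    #ifndef CPUTLB_VICTIM_H
    #define CPUTLB_VICTIM_H

    /* swap two TLB entries together with their corresponding IOTLB entries */
    static inline void swap_tlb(CPUTLBEntry *te, CPUTLBEntry *se,
                                hwaddr *iote, hwaddr *iose)
    {
        hwaddr iotmp = *iote;
        CPUTLBEntry t = *te;

        *iote = *iose;
        *iose = iotmp;
        *te = *se;
        *se = t;
    }

    #endif /* CPUTLB_VICTIM_H */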
[...]
>>> -        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
>>> +        /* we are about to do a page table walk. our last hope is the
>>> +         * victim tlb. try to refill from the victim tlb before walking the
>>> +         * page table. */
>>> +        for (vtlb_idx = CPU_VTLB_SIZE; vtlb_idx >= 0; --vtlb_idx) {
>>> +            if (env->tlb_v_table[mmu_idx][vtlb_idx].ADDR_READ
>>
>> Is it ADDR_READ or addr_read, as it is called in other places?
>
> I am following target_ulong tlb_addr =
> env->tlb_table[mmu_idx][index].ADDR_READ; in the same function. I am
> not sure I get what you mean.
Ok, I see ADDR_READ is a macro defined depending on
SOFTMMU_CODE_ACCESS. I got a bit confused by addr_read used
in other places vs. ADDR_READ used here vs. addr_write used below.
Sorry for the noise.
[...]
>>> -        tlb_fill(env, addr, 1, mmu_idx, retaddr);
>>> +        /* we are about to do a page table walk. our last hope is the
>>> +         * victim tlb. try to refill from the victim tlb before walking the
>>> +         * page table. */
>>> +        for (vtlb_idx = CPU_VTLB_SIZE; vtlb_idx >= 0; --vtlb_idx) {
>>> +            if (env->tlb_v_table[mmu_idx][vtlb_idx].addr_write
>>> +                == (addr & TARGET_PAGE_MASK)) {
>>> +                /* found entry in victim tlb */
>>> +                swap_tlb(&env->tlb_table[mmu_idx][index],
>>> +                         &env->tlb_v_table[mmu_idx][vtlb_idx],
>>> +                         &env->iotlb[mmu_idx][index],
>>> +                         &env->iotlb_v[mmu_idx][vtlb_idx]);
>>> +                break;
>>> +            }
>>> +        }
>>
>> It could be a good inline function too.
>
> One way is to define the victim tlb lookup code in a C function which
> gets called before tlb_fill, but I need the ADDR_READ and addr_write
> macros.
It could be a single macro then, taking ADDR_READ or addr_write
as a parameter.
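As a rough sketch of that idea (the macro name is invented here, not
part of this patch): a statement-expression macro could take the field
name and report whether the victim TLB hit, so the helpers only call
tlb_fill on a miss:

    /* FIELD is ADDR_READ or addr_write; relies on env, mmu_idx, index and
     * addr being in scope in the softmmu helper */
    #define VICTIM_TLB_HIT(FIELD)                                        \
    ({                                                                   \
        int vidx;                                                        \
        for (vidx = CPU_VTLB_SIZE - 1; vidx >= 0; --vidx) {              \
            if (env->tlb_v_table[mmu_idx][vidx].FIELD                    \
                == (addr & TARGET_PAGE_MASK)) {                          \
                /* hit: promote the entry into the direct-mapped TLB */  \
                swap_tlb(&env->tlb_table[mmu_idx][index],                \
                         &env->tlb_v_table[mmu_idx][vidx],               \
                         &env->iotlb[mmu_idx][index],                    \
                         &env->iotlb_v[mmu_idx][vidx]);                  \
                break;                                                   \
            }                                                            \
        }                                                                \
        vidx >= 0;                                                       \
    })

    /* usage in the load helpers, replacing the open-coded loop: */
    if (!VICTIM_TLB_HIT(ADDR_READ)) {
        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
    }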
-- 
Thanks.
-- Max
* Re: [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMU system emulated TLB
  2014-01-23 19:49 [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMU system emulated TLB Xin Tong
  2014-01-23 19:54 ` Xin Tong
  2014-01-23 20:44 ` [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMU system emulated TLB Max Filippov
@ 2014-01-23 21:47 ` Richard Henderson
  2014-01-23 21:52   ` Xin Tong
  2 siblings, 1 reply; 12+ messages in thread
From: Richard Henderson @ 2014-01-23 21:47 UTC (permalink / raw)
  To: Xin Tong, qemu-devel; +Cc: afaerber
On 01/23/2014 11:49 AM, Xin Tong wrote:
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Xin Tong <trent.tong@gmail.com>
I did not give you a Reviewed-by.  That implies that I
approve of the patch as written, or with minor tweaks.
But you can't just add this on your own.
r~
* Re: [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMU system emulated TLB
  2014-01-23 21:47 ` Richard Henderson
@ 2014-01-23 21:52   ` Xin Tong
  2014-01-23 21:56     ` Peter Maydell
  0 siblings, 1 reply; 12+ messages in thread
From: Xin Tong @ 2014-01-23 21:52 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Andreas Färber
Richard, I am sorry. I thought the patch submitter added the
Reviewed-by tags themselves. How do I get a Reviewed-by?
Xin
On Thu, Jan 23, 2014 at 3:47 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 01/23/2014 11:49 AM, Xin Tong wrote:
>> Reviewed-by: Richard Henderson <rth@twiddle.net>
>> Signed-off-by: Xin Tong <trent.tong@gmail.com>
>
> I did not give you a Reviewed-by.  That implies that I
> approve of the patch as written, or with minor tweaks.
> But you can't just add this on your own.
>
>
> r~
* Re: [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMU system emulated TLB
  2014-01-23 21:52   ` Xin Tong
@ 2014-01-23 21:56     ` Peter Maydell
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Maydell @ 2014-01-23 21:56 UTC (permalink / raw)
  To: Xin Tong; +Cc: QEMU Developers, Andreas Färber, Richard Henderson
On 23 January 2014 21:52, Xin Tong <trent.tong@gmail.com> wrote:
> Richard, I am sorry. I thought the patch submitter added the
> Reviewed-by tags themselves. How do I get a Reviewed-by?
You submit your patch, you fix any issues people raise, and
you resubmit the fixed version. Eventually when all the problems
have been corrected people will reply to it with their Reviewed-by
tag. This is that person saying "yes, I am happy that this version
is now correct".
This is described in the section "Proper use of Reviewed-by: tags
can aid review" of http://wiki.qemu.org/Contribute/SubmitAPatch
thanks
-- PMM
* Re: [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMUsystem emulated TLBB
  2014-01-23 19:54 ` Xin Tong
@ 2014-01-24 12:15   ` Alex Bennée
  2014-01-24 16:46     ` Richard Henderson
  2014-01-24 22:32     ` BALATON Zoltan
  0 siblings, 2 replies; 12+ messages in thread
From: Alex Bennée @ 2014-01-24 12:15 UTC (permalink / raw)
  To: Xin Tong; +Cc: Richard Henderson, QEMU Developers, afaerber
trent.tong@gmail.com writes:
> Attaching the data in an Excel file, which could not be sent together with the patch.
>
<snip>
If you can attach the summary of the data as plain text that would be
useful. Not all of us have access to a Windows box with Excel!
-- 
Alex Bennée
* Re: [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMUsystem emulated TLBB
  2014-01-24 12:15   ` [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMUsystem emulated TLBB Alex Bennée
@ 2014-01-24 16:46     ` Richard Henderson
  2014-01-24 22:32     ` BALATON Zoltan
  1 sibling, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2014-01-24 16:46 UTC (permalink / raw)
  To: Alex Bennée, Xin Tong; +Cc: QEMU Developers, afaerber
On 01/24/2014 04:15 AM, Alex Bennée wrote:
> If you can attach the summary of the data as plain text that would be
> useful. Not all of us have access to a Windows box with Excel!
To be fair, it can be opened with OpenOffice.
r~
* Re: [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMUsystem emulated TLBB
  2014-01-24 12:15   ` [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMUsystem emulated TLBB Alex Bennée
  2014-01-24 16:46     ` Richard Henderson
@ 2014-01-24 22:32     ` BALATON Zoltan
  2014-01-25  8:28       ` Paolo Bonzini
  1 sibling, 1 reply; 12+ messages in thread
From: BALATON Zoltan @ 2014-01-24 22:32 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Xin Tong, QEMU Developers, afaerber, Richard Henderson
On Fri, 24 Jan 2014, Alex Bennée wrote:
> trent.tong@gmail.com writes:
>> Attaching the data in an Excel file, which could not be sent together with the patch.
>>
> <snip>
>
> If you can attach the summary of the data as plain text that would be
> useful. Not all of us have access to a Windows box with Excel!
Opens in LibreOffice as well, but plain text is easier to read.
I wonder how much of the measured performance increase is due to the
increased effective cache size and how much is due to the reduction in
the number of instructions by the victim TLB mechanism. Wouldn't it be a
fairer measurement to use a bigger TLB size for the TLB-only case that
matches the effective size of the TLB+victim case, instead of using the
same TLB size for both?
* Re: [Qemu-devel] [PATCH v2] cpu: implementing victim TLB for QEMUsystem emulated TLBB
  2014-01-24 22:32     ` BALATON Zoltan
@ 2014-01-25  8:28       ` Paolo Bonzini
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2014-01-25  8:28 UTC (permalink / raw)
  To: BALATON Zoltan, Alex Bennée
  Cc: Richard Henderson, Xin Tong, QEMU Developers, afaerber
On 24/01/2014 23:32, BALATON Zoltan wrote:
> On Fri, 24 Jan 2014, Alex Bennée wrote:
>> trent.tong@gmail.com writes:
>>> Attaching the data in an Excel file, which could not be sent together
>>> with the patch.
>>>
>> <snip>
>>
>> If you can attach the summary of the data as plain text that would be
>> useful. Not all of us have access to a Windows box with Excel!
>
> Opens on LibreOffice as well but plain text is easier to read.
>
> I wonder how much of the measured performance increase is due to the
> increased effective cache size and how much is due to the reduction in
> the number of instructions by the victim TLB mechanism. Wouldn't it be a
> fairer measurement to use a bigger TLB size for the TLB-only case that
> matches the effective size of the TLB+victim case, instead of using the
> same TLB size for both?
That's quite unlikely, the TLB size is 256 while TLB+victim is 264.
Paolo