From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:37610)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1XROUs-0007Mz-Ns for qemu-devel@nongnu.org;
	Tue, 09 Sep 2014 12:42:31 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1XROUn-0003n8-Ho for qemu-devel@nongnu.org;
	Tue, 09 Sep 2014 12:42:26 -0400
Sender: Paolo Bonzini
Message-ID: <540F2DE7.2020501@redhat.com>
Date: Tue, 09 Sep 2014 18:42:15 +0200
From: Paolo Bonzini
MIME-Version: 1.0
References: <1409246113-6519-1-git-send-email-pbonzini@redhat.com>
	<1409246113-6519-3-git-send-email-pbonzini@redhat.com>
	<540961C9.9000705@suse.de>
	<474664106.41699271.1409919082051.JavaMail.zimbra@redhat.com>
In-Reply-To: <474664106.41699271.1409919082051.JavaMail.zimbra@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH 02/17] ppc: avoid excessive TLB flushing
List-Id: 
List-Unsubscribe: ,
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: ,
To: Alexander Graf
Cc: dgibson@redhat.com, qemu-ppc@nongnu.org, qemu-devel@nongnu.org, tommusta@gmail.com

On 05/09/2014 14:11, Paolo Bonzini wrote:
> 
> 
> ----- Original Message -----
>> From: "Alexander Graf"
>> To: "Paolo Bonzini" , qemu-devel@nongnu.org
>> Cc: dgibson@redhat.com, qemu-ppc@nongnu.org, tommusta@gmail.com
>> Sent: Friday, September 5, 2014 9:10:01
>> Subject: Re: [Qemu-ppc] [PATCH 02/17] ppc: avoid excessive TLB flushing
>> 
>> 
>> 
>> On 28.08.14 19:14, Paolo Bonzini wrote:
>>> PowerPC TCG flushes the TLB on every IR/DR change, which basically
>>> means on every user<->kernel context switch.  Use the 6-element
>>> TLB array as a cache, where each MMU index is mapped to a different
>>> state of the IR/DR/PR/HV bits.
>>> 
>>> This brings the number of TLB flushes down from ~900000 to ~50000
>>> for starting up the Debian installer, which is in line with x86
>>> and gives a ~10% performance improvement.
>>> 
>>> Signed-off-by: Paolo Bonzini
>>> ---
>>>  cputlb.c                    | 19 +++++++++++++++++
>>>  hw/ppc/spapr_hcall.c        |  6 +++++-
>>>  include/exec/exec-all.h     |  5 +++++
>>>  target-ppc/cpu.h            |  4 +++-
>>>  target-ppc/excp_helper.c    |  6 +-----
>>>  target-ppc/helper_regs.h    | 52 +++++++++++++++++++++++++++++++--------------
>>>  target-ppc/translate_init.c |  5 +++++
>>>  7 files changed, 74 insertions(+), 23 deletions(-)
>>> 
>>> diff --git a/cputlb.c b/cputlb.c
>>> index afd3705..17e1b03 100644
>>> --- a/cputlb.c
>>> +++ b/cputlb.c
>>> @@ -67,6 +67,25 @@ void tlb_flush(CPUState *cpu, int flush_global)
>>>      tlb_flush_count++;
>>>  }
>>>  
>>> +void tlb_flush_idx(CPUState *cpu, int mmu_idx)
>>> +{
>>> +    CPUArchState *env = cpu->env_ptr;
>>> +
>>> +#if defined(DEBUG_TLB)
>>> +    printf("tlb_flush_idx %d:\n", mmu_idx);
>>> +#endif
>>> +    /* must reset current TB so that interrupts cannot modify the
>>> +       links while we are modifying them */
>>> +    cpu->current_tb = NULL;
>>> +
>>> +    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[mmu_idx]));
>>> +    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
>>> +
>>> +    env->tlb_flush_addr = -1;
>>> +    env->tlb_flush_mask = 0;
>>> +    tlb_flush_count++;
>>> +}
>>> +
>>>  static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
>>>  {
>>>      if (addr == (tlb_entry->addr_read &
>>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>>> index 467858c..b95961c 100644
>>> --- a/hw/ppc/spapr_hcall.c
>>> +++ b/hw/ppc/spapr_hcall.c
>>> @@ -556,13 +556,17 @@ static target_ulong h_cede(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>>  {
>>>      CPUPPCState *env = &cpu->env;
>>>      CPUState *cs = CPU(cpu);
>>> +    bool flush;
>>>  
>>>      env->msr |= (1ULL << MSR_EE);
>>> -    hreg_compute_hflags(env);
>>> +    flush = hreg_compute_hflags(env);
>>>      if (!cpu_has_work(cs)) {
>>>          cs->halted = 1;
>>>          cs->exception_index = EXCP_HLT;
>>>          cs->exit_request = 1;
>>> +    } else if (flush) {
>>> +        cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
>>> +        cs->exit_request = 1;
>> 
>> Can this ever happen?
> 
> No, I think it can't.
> 
>> Ok, so this basically changes the semantics of mmu_idx from a static
>> array with predefined meanings to a dynamic array with runtime-changing
>> semantics.
>> 
>> The first thing that comes to mind here is why we're not just extending
>> the existing array? After all, we have 4 bits -> 16 states minus one for
>> PR+HV. Can our existing logic not deal with this?
> 
> Yeah, that would require 12 MMU indices.  Right now, include/exec/cpu_ldst.h
> only supports 6, but that's easy to extend.
> 
> tlb_flush becomes progressively more expensive as you add more MMU modes,
> but it may work.  This patch removes 98.8% of the TLB flushes, makes the
> remaining ones twice as slow (NB_MMU_MODES goes from 3 to 6), and speeds
> up QEMU by 10%.  You can solve this:
> 
>     0.9 = 0.988 * 0 + 0.012 * tlb_time * 2 + (1 - tlb_time) * 1
>     tlb_time = 0.1 / 0.976 ≈ 0.102
> 
> to compute that the time spent in TLB flushes before the patch was about
> 10.2% of the whole emulation time.
> 
> Doubling NB_MMU_MODES further, from 6 to 12, would still save 98.8% of the
> TLB flushes, while making the remaining ones even more expensive.  The
> savings would be smaller, but actually not by much:
> 
>     0.988 * 0 + 0.012 * tlb_time * 4 + (1 - tlb_time) * 1 ≈ 0.903
> 
> i.e. what you propose would still save 9.7%.  Still, having 12 modes seemed
> like a waste, since only 4 or 5 are used in practice...

The 12 MMU modes work just fine.  Thanks for the suggestion!

Paolo
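Graf's counting argument (4 MSR bits giving 16 states, with the PR+HV combination excluded, hence 12 usable MMU indices) can be checked with a quick enumeration. This is an illustrative sketch of the static-index alternative he proposes, not code from the patch; `static_mmu_idx` is a hypothetical name:

```python
# Enumerate the valid IR/DR/PR/HV combinations a "static mmu_idx" scheme
# would need: 2^4 = 16 states, minus the four where PR and HV are both
# set (PR+HV is not a valid state), leaving 12.
from itertools import product

def static_mmu_idx(ir, dr, pr, hv):
    """Hypothetical: pack the four MSR bits into an index.
    The result is sparse, since PR+HV combinations never occur."""
    assert not (pr and hv), "PR and HV cannot both be set"
    return (ir << 3) | (dr << 2) | (pr << 1) | hv

valid = [bits for bits in product((0, 1), repeat=4)
         if not (bits[2] and bits[3])]   # bits = (ir, dr, pr, hv)
print(len(valid))   # 12 -> the 12 MMU indices discussed in the thread
```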
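Bonzini's back-of-the-envelope model can be replayed numerically. A sketch using only the figures quoted in the thread (98.8% of flushes eliminated, remaining flushes 2x slower with 6 MMU modes, 4x with 12, ~10% overall speedup); `flush_fraction` is an illustrative name:

```python
# Figures from the thread: the patch eliminates 98.8% of TLB flushes,
# and the surviving flushes get 2x slower with 6 MMU modes, 4x with 12.
eliminated = 0.988
after = 0.90   # observed runtime after the patch, relative to before

# Solve  after = (1 - eliminated) * slowdown * t + (1 - t)
# for t, the fraction of pre-patch time spent flushing the TLB.
def flush_fraction(after, eliminated, slowdown):
    return (1 - after) / (1 - (1 - eliminated) * slowdown)

t = flush_fraction(after, eliminated, slowdown=2)
print(round(t, 3))           # 0.102: flushes were ~10.2% of emulation time

# Projected runtime if NB_MMU_MODES doubled again to 12 (flushes 4x slower):
projected = (1 - eliminated) * 4 * t + (1 - t)
print(round(projected, 3))   # 0.902, i.e. still close to a 10% saving
```

Up to rounding, this reproduces the ~0.903 runtime and ~9.7% saving quoted in the message.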