Re: [Qemu-devel] [Qemu-ppc] [PATCH 02/17] ppc: avoid excessive TLB flushing

From: Paolo Bonzini <pbonzini@redhat.com>
To: Alexander Graf <agraf@suse.de>
Cc: dgibson@redhat.com, qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
	tommusta@gmail.com
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH 02/17] ppc: avoid excessive TLB flushing
Date: Tue, 09 Sep 2014 18:42:15 +0200
Message-ID: <540F2DE7.2020501@redhat.com>
In-Reply-To: <474664106.41699271.1409919082051.JavaMail.zimbra@redhat.com>

On 05/09/2014 14:11, Paolo Bonzini wrote:
> 
> 
> ----- Original Message -----
>> From: "Alexander Graf" <agraf@suse.de>
>> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
>> Cc: dgibson@redhat.com, qemu-ppc@nongnu.org, tommusta@gmail.com
>> Sent: Friday, 5 September 2014 9:10:01
>> Subject: Re: [Qemu-ppc] [PATCH 02/17] ppc: avoid excessive TLB flushing
>>
>>
>>
>> On 28.08.14 19:14, Paolo Bonzini wrote:
>>> PowerPC TCG flushes the TLB on every IR/DR change, which basically
>>> means on every user<->kernel context switch.  Use the 6-element
>>> TLB array as a cache, where each MMU index is mapped to a different
>>> state of the IR/DR/PR/HV bits.
>>>
>>> This brings the number of TLB flushes down from ~900000 to ~50000
>>> for starting up the Debian installer, which is in line with x86
>>> and gives a ~10% performance improvement.
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>  cputlb.c                    | 19 +++++++++++++++++
>>>  hw/ppc/spapr_hcall.c        |  6 +++++-
>>>  include/exec/exec-all.h     |  5 +++++
>>>  target-ppc/cpu.h            |  4 +++-
>>>  target-ppc/excp_helper.c    |  6 +-----
>>>  target-ppc/helper_regs.h    | 52 +++++++++++++++++++++++++++++++--------------
>>>  target-ppc/translate_init.c |  5 +++++
>>>  7 files changed, 74 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/cputlb.c b/cputlb.c
>>> index afd3705..17e1b03 100644
>>> --- a/cputlb.c
>>> +++ b/cputlb.c
>>> @@ -67,6 +67,25 @@ void tlb_flush(CPUState *cpu, int flush_global)
>>>      tlb_flush_count++;
>>>  }
>>>  
>>> +void tlb_flush_idx(CPUState *cpu, int mmu_idx)
>>> +{
>>> +    CPUArchState *env = cpu->env_ptr;
>>> +
>>> +#if defined(DEBUG_TLB)
>>> +    printf("tlb_flush_idx %d:\n", mmu_idx);
>>> +#endif
>>> +    /* must reset current TB so that interrupts cannot modify the
>>> +       links while we are modifying them */
>>> +    cpu->current_tb = NULL;
>>> +
>>> +    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[mmu_idx]));
>>> +    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
>>> +
>>> +    env->tlb_flush_addr = -1;
>>> +    env->tlb_flush_mask = 0;
>>> +    tlb_flush_count++;
>>> +}
>>> +
>>>  static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
>>>  {
>>>      if (addr == (tlb_entry->addr_read &
>>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>>> index 467858c..b95961c 100644
>>> --- a/hw/ppc/spapr_hcall.c
>>> +++ b/hw/ppc/spapr_hcall.c
>>> @@ -556,13 +556,17 @@ static target_ulong h_cede(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>>  {
>>>      CPUPPCState *env = &cpu->env;
>>>      CPUState *cs = CPU(cpu);
>>> +    bool flush;
>>>  
>>>      env->msr |= (1ULL << MSR_EE);
>>> -    hreg_compute_hflags(env);
>>> +    flush = hreg_compute_hflags(env);
>>>      if (!cpu_has_work(cs)) {
>>>          cs->halted = 1;
>>>          cs->exception_index = EXCP_HLT;
>>>          cs->exit_request = 1;
>>> +    } else if (flush) {
>>> +        cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
>>> +        cs->exit_request = 1;
>>
>> Can this ever happen?
> 
> No, I think it can't.
> 
>> Ok, so this basically changes the semantics of mmu_idx from a static
>> array with predefined meanings to a dynamic array with runtime changing
>> semantics.
>>
>> The first thing that comes to mind here is why we're not just extending
>> the existing array? After all, we have 4 bits -> 16 states minus one for
>> PR+HV. Can our existing logic not deal with this?
> 
> Yeah, that would require 12 MMU indices.  Right now, include/exec/cpu_ldst.h
> only supports 6 but that's easy to extend.
> 
> tlb_flush becomes progressively more expensive as you add more MMU modes,
> but it may work.  This patch removes 98.8% of the TLB flushes, makes the
> remaining ones twice as slow (NB_MMU_MODES goes from 3 to 6), and speeds
> up QEMU by 10%.  You can solve this:
> 
>     0.9 = 0.988 * 0 + 0.012 * tlb_time * 2 + (1 - tlb_time) * 1
>     tlb_time = 0.1 / 0.976 = 0.102
> 
> to compute that the time spent in TLB flushes before the patch is 10.2% of the
> whole emulation time.
> 
> Doubling the NB_MMU_MODES further from 6 to 12 would still save 98.8% of the TLB
> flushes, while making the remaining ones even more expensive.  The savings will be
> smaller, but actually not by much:
> 
>     0.988 * 0 + 0.012 * tlb_time * 4 + (1 - tlb_time) * 1 = 0.903
> 
> i.e. what you propose would still save 9.7%.  Still, having 12 modes seemed like a
> waste, since only 4 or 5 are used in practice...
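
For anyone who wants to replay the arithmetic quoted above, here is a
minimal sketch in C. It only encodes the cost model from the quoted text:
1.2% of the flushes remain, and each remaining flush is assumed to cost 2x
(6 modes) or 4x (12 modes) what it did with 3 modes. The function names are
made up for this illustration and do not exist in QEMU.

```c
#include <assert.h>

/*
 * Relative run time after the patch, with the pre-patch run time as 1.
 *   tlb_time: fraction of pre-patch run time spent in TLB flushes
 *   kept:     fraction of flushes that remain (0.012 in the thread)
 *   cost:     cost of one remaining flush relative to before
 *             (2 for 6 MMU modes, 4 for 12, per the thread's estimate)
 */
double relative_runtime(double tlb_time, double kept, double cost)
{
    return kept * cost * tlb_time + (1.0 - tlb_time);
}

/* Invert the model: given a measured relative run time, solve for tlb_time. */
double solve_tlb_time(double runtime, double kept, double cost)
{
    double denom = 1.0 - kept * cost;

    /* The inversion is only defined while flushing got cheaper overall. */
    assert(denom > 0.0);
    return (1.0 - runtime) / denom;
}
```

Calling solve_tlb_time(0.9, 0.012, 2.0) gives roughly 0.102, and feeding that
into relative_runtime(..., 0.012, 4.0) gives roughly 0.90, which matches the
figures quoted above to within rounding.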

The 12 MMU modes work just fine.  Thanks for the suggestion!
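
To make the count of 12 concrete, here is one way the four bits could be
folded into an index. This is a hypothetical sketch, not the code from the
patch: it assumes that PR=1 is treated as user mode whether or not HV is
set, which leaves three privilege levels times four IR/DR combinations.
The names sketch_mmu_idx and SKETCH_NB_MMU_MODES are invented for the
example.

```c
#include <assert.h>
#include <stdbool.h>

enum { SKETCH_NB_MMU_MODES = 12 };  /* hypothetical name, 3 levels * 4 IR/DR states */

/*
 * Hypothetical mapping from the IR/DR/PR/HV bits to a static MMU index.
 * Assumption: PR=1 means user mode regardless of HV, so the four
 * PR+HV combinations collapse onto the four PR-only ones.
 */
int sketch_mmu_idx(bool ir, bool dr, bool pr, bool hv)
{
    int priv = pr ? 0 : (hv ? 2 : 1);  /* 0 = user, 1 = supervisor, 2 = hypervisor */
    int idx = priv * 4 + (ir ? 2 : 0) + (dr ? 1 : 0);

    assert(idx >= 0 && idx < SKETCH_NB_MMU_MODES);
    return idx;
}
```

With a static mapping like this, an IR/DR or privilege change only selects
a different index and no TLB contents have to be thrown away.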

Paolo
