From: Paolo Bonzini <pbonzini@redhat.com>
To: Alexander Graf <agraf@suse.de>
Cc: dgibson@redhat.com, qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
	tommusta@gmail.com
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH 02/17] ppc: avoid excessive TLB flushing
Date: Tue, 09 Sep 2014 18:42:15 +0200
Message-ID: <540F2DE7.2020501@redhat.com>
In-Reply-To: <474664106.41699271.1409919082051.JavaMail.zimbra@redhat.com>

On 05/09/2014 14:11, Paolo Bonzini wrote:
> 
> 
> ----- Original Message -----
>> From: "Alexander Graf" <agraf@suse.de>
>> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
>> Cc: dgibson@redhat.com, qemu-ppc@nongnu.org, tommusta@gmail.com
>> Sent: Friday, 5 September 2014 9:10:01
>> Subject: Re: [Qemu-ppc] [PATCH 02/17] ppc: avoid excessive TLB flushing
>>
>>
>>
>> On 28.08.14 19:14, Paolo Bonzini wrote:
>>> PowerPC TCG flushes the TLB on every IR/DR change, which basically
>>> means on every user<->kernel context switch.  Use the 6-element
>>> TLB array as a cache, where each MMU index is mapped to a different
>>> state of the IR/DR/PR/HV bits.
>>>
>>> This brings the number of TLB flushes down from ~900000 to ~50000
>>> for starting up the Debian installer, which is in line with x86
>>> and gives a ~10% performance improvement.
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>  cputlb.c                    | 19 +++++++++++++++++
>>>  hw/ppc/spapr_hcall.c        |  6 +++++-
>>>  include/exec/exec-all.h     |  5 +++++
>>>  target-ppc/cpu.h            |  4 +++-
>>>  target-ppc/excp_helper.c    |  6 +-----
>>>  target-ppc/helper_regs.h    | 52 +++++++++++++++++++++++++++++++--------------
>>>  target-ppc/translate_init.c |  5 +++++
>>>  7 files changed, 74 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/cputlb.c b/cputlb.c
>>> index afd3705..17e1b03 100644
>>> --- a/cputlb.c
>>> +++ b/cputlb.c
>>> @@ -67,6 +67,25 @@ void tlb_flush(CPUState *cpu, int flush_global)
>>>      tlb_flush_count++;
>>>  }
>>>  
>>> +void tlb_flush_idx(CPUState *cpu, int mmu_idx)
>>> +{
>>> +    CPUArchState *env = cpu->env_ptr;
>>> +
>>> +#if defined(DEBUG_TLB)
>>> +    printf("tlb_flush_idx %d:\n", mmu_idx);
>>> +#endif
>>> +    /* must reset current TB so that interrupts cannot modify the
>>> +       links while we are modifying them */
>>> +    cpu->current_tb = NULL;
>>> +
>>> +    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[mmu_idx]));
>>> +    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
>>> +
>>> +    env->tlb_flush_addr = -1;
>>> +    env->tlb_flush_mask = 0;
>>> +    tlb_flush_count++;
>>> +}
>>> +
>>>  static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
>>>  {
>>>      if (addr == (tlb_entry->addr_read &
>>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>>> index 467858c..b95961c 100644
>>> --- a/hw/ppc/spapr_hcall.c
>>> +++ b/hw/ppc/spapr_hcall.c
>>> @@ -556,13 +556,17 @@ static target_ulong h_cede(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>>  {
>>>      CPUPPCState *env = &cpu->env;
>>>      CPUState *cs = CPU(cpu);
>>> +    bool flush;
>>>  
>>>      env->msr |= (1ULL << MSR_EE);
>>> -    hreg_compute_hflags(env);
>>> +    flush = hreg_compute_hflags(env);
>>>      if (!cpu_has_work(cs)) {
>>>          cs->halted = 1;
>>>          cs->exception_index = EXCP_HLT;
>>>          cs->exit_request = 1;
>>> +    } else if (flush) {
>>> +        cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
>>> +        cs->exit_request = 1;
>>
>> Can this ever happen?
> 
> No, I think it can't.
> 
>> Ok, so this basically changes the semantics of mmu_idx from a static
>> array with predefined meanings to a dynamic array with runtime changing
>> semantics.
>>
>> The first thing that comes to mind here is why we're not just extending
>> the existing array? After all, we have 4 bits -> 16 states minus one for
>> PR+HV. Can our existing logic not deal with this?
> 
> Yeah, that would require 12 MMU indices.  Right now, include/exec/cpu_ldst.h
> only supports 6 but that's easy to extend.
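
For illustration only, here is one way the 12 states counted above could be numbered. This is a sketch, not the code from the patch: the function name, the constant, and the ordering of the indices are all hypothetical. It assumes the PR+HV combination is excluded, as in the count above, which leaves three privilege states times four IR/DR combinations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constant: 3 privilege states x 4 IR/DR combinations. */
enum { SKETCH_NB_MMU_MODES = 12 };

/*
 * Hypothetical mapping from the four MSR bits discussed in this thread
 * to a fixed MMU index in the range 0..11.  The PR+HV combination is
 * assumed to be excluded, following the count given in the thread.
 */
int sketch_mmu_index(bool ir, bool dr, bool pr, bool hv)
{
    assert(!(pr && hv));               /* excluded state, by assumption */

    int priv = pr ? 0 : (hv ? 2 : 1);  /* 0 = PR, 1 = neither, 2 = HV */
    int idx = priv * 4 + (ir ? 2 : 0) + (dr ? 1 : 0);

    assert(idx >= 0 && idx < SKETCH_NB_MMU_MODES);
    return idx;
}
```

With a static mapping like this, an IR/DR change only selects a different index and does not by itself require a flush, at the price of a larger table to clear when a full flush does happen.
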
> 
> tlb_flush becomes progressively more expensive as you add more MMU modes,
> but it may work.  This patch removes 98.8% of the TLB flushes, makes the
> remaining ones twice as slow (NB_MMU_MODES goes from 3 to 6), and speeds
> up QEMU by 10%.  You can solve this:
> 
>     0.9 = 0.988 * 0 + 0.012 * tlb_time * 2 + (1 - tlb_time) * 1
>     tlb_time = 0.1 / 0.976 = 0.102
> 
> to compute that the time spent in TLB flushes before the patch is 10.2% of the
> whole emulation time.
> 
> Doubling the NB_MMU_MODES further from 6 to 12 would still save 98.8% of the TLB
> flushes, while making the remaining ones even more expensive.  The savings will be
> smaller, but actually not by much:
> 
>     0.988 * 0 + 0.012 * tlb_time * 4 + (1 - tlb_time) * 1 = 0.903
> 
> i.e. what you propose would still save 9.7%.  Still, having 12 modes seemed like a
> waste, since only 4 or 5 are used in practice...
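
The estimate quoted above can be reproduced with a few lines of C. This is only a sketch of the same back-of-the-envelope model: the function and parameter names are made up for illustration, and it assumes, as the quoted text does, that flush cost scales linearly with NB_MMU_MODES and that everything else is unaffected.

```c
#include <assert.h>

/*
 * Model from the quoted text:
 *   runtime_after = flushes_kept * cost_factor * tlb_time + (1 - tlb_time)
 * where tlb_time is the fraction of the original run time spent in
 * TLB flushes.  Solving for tlb_time gives the expression below.
 */
double sketch_solve_tlb_time(double runtime_after, double flushes_kept,
                             double cost_factor)
{
    double denom = 1.0 - flushes_kept * cost_factor;
    assert(denom > 0.0);  /* the model only makes sense if flushing got cheaper overall */
    return (1.0 - runtime_after) / denom;
}

/* Relative run time predicted by the same model for a given cost factor. */
double sketch_predicted_runtime(double tlb_time, double flushes_kept,
                                double cost_factor)
{
    return flushes_kept * cost_factor * tlb_time + (1.0 - tlb_time);
}
```

Calling sketch_solve_tlb_time(0.9, 0.012, 2.0) gives a tlb_time of about 0.102, and passing that result to sketch_predicted_runtime with a cost factor of 4.0 (12 modes instead of 3) gives a relative run time of about 0.90, in line with the figures above.
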

The 12 MMU modes work just fine.  Thanks for the suggestion!

Paolo
