Re: [Qemu-devel] outlined TLB lookup on x86

From: Richard Henderson <rth@twiddle.net>
To: Xin Tong <trent.tong@gmail.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] outlined TLB lookup on x86
Date: Thu, 28 Nov 2013 15:12:04 +1300
Message-ID: <5296A674.2080406@twiddle.net>
In-Reply-To: <CA+JLOiuyXVhU-=yHra73G-Few4WicoPPW1AAf9eMupJXUmWqWw@mail.gmail.com>

On 11/27/2013 08:41 PM, Xin Tong wrote:
> I am trying to implement an out-of-line TLB lookup for QEMU softmmu-x86-64 on
> an x86-64 machine, potentially for better instruction cache performance. I
> have a few questions.
> 
> 1. I see that tcg_out_qemu_ld_slow_path/tcg_out_qemu_st_slow_path are generated
> when tcg_out_tb_finalize is called. When a TLB lookup misses, it jumps to
> the generated slow path; the slow path refills the TLB, performs the
> load/store, and jumps to the next emulated instruction. I am wondering whether
> it is easy to outline the code for the slow path.

Hard.  There's quite a bit of code on that slow path that's unique to the
surrounding code context -- which registers contain inputs and outputs, and
where to continue after the slow path.

The amount of code that's in the TB slow path now is approximately minimal, as
far as I can see.  If you've got an idea for improvement, please share.  ;-)

> I am thinking that when a TLB lookup misses, the outlined TLB
> lookup code should generate a call out to the qemu_ld/st_helpers[opc &
> ~MO_SIGN] and rewalk the TLB after it's refilled? This code is off the critical
> path, so it's not as important as the code for TLB hits.

That would work for true TLB misses to RAM, but does not work for memory mapped
I/O.

> 2. Why not use a TLB of bigger size? Currently the TLB has 1<<8 entries. The
> TLB lookup is 10 x86 instructions, but every miss needs ~450 instructions; I
> measured this using Intel PIN. So even if the miss rate is low (say 3%), the
> overall time spent in cpu_x86_handle_mmu_fault is still significant.

I'd be interested to experiment with different TLB sizes, to see what effect
that has on performance.  But I suspect that the lack of TLB contexts means that
we wind up flushing the TLB more often than real hardware does, and therefore a
larger TLB merely takes longer to flush.
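
To make the trade-off concrete, here is a minimal sketch of a direct-mapped
software TLB. It is not QEMU's actual data structure: the names
(SketchTLBEntry, sketch_tlb_*), the 4 KiB page size, and the all-ones
"invalid" tag are assumptions chosen for illustration. The point is only that
the lookup cost is independent of the table size, while the flush touches
every entry.

```c
#include <stdint.h>
#include <string.h>

#define SK_PAGE_BITS 12                      /* assumed 4 KiB guest pages */
#define SK_TLB_BITS  8                       /* 1 << 8 entries, as in the thread */
#define SK_TLB_SIZE  (1u << SK_TLB_BITS)
#define SK_PAGE_MASK (~(((uint64_t)1 << SK_PAGE_BITS) - 1))

/* Hypothetical entry layout, for illustration only. */
typedef struct {
    uint64_t tag;     /* page-aligned guest address; all-ones means invalid */
    uint64_t addend;  /* illustrative payload, e.g. a host-minus-guest offset */
} SketchTLBEntry;

static SketchTLBEntry sketch_tlb[SK_TLB_SIZE];

/* Flush: cost grows linearly with SK_TLB_SIZE. An all-ones tag is not
 * page-aligned, so it can never match a masked address. */
static void sketch_tlb_flush(void)
{
    memset(sketch_tlb, 0xff, sizeof(sketch_tlb));
}

/* Refill the single slot that this address maps to. */
static void sketch_tlb_fill(uint64_t vaddr, uint64_t addend)
{
    SketchTLBEntry *e =
        &sketch_tlb[(vaddr >> SK_PAGE_BITS) & (SK_TLB_SIZE - 1)];
    e->tag = vaddr & SK_PAGE_MASK;
    e->addend = addend;
}

/* Fast path: one index computation, one compare. Returns 1 on a hit. */
static int sketch_tlb_lookup(uint64_t vaddr, uint64_t *addend)
{
    const SketchTLBEntry *e =
        &sketch_tlb[(vaddr >> SK_PAGE_BITS) & (SK_TLB_SIZE - 1)];
    if (e->tag != (vaddr & SK_PAGE_MASK)) {
        return 0;
    }
    *addend = e->addend;
    return 1;
}
```

Raising SK_TLB_BITS leaves sketch_tlb_lookup unchanged but makes every
sketch_tlb_flush proportionally more expensive, which is the concern when
flushes are frequent.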

But be aware that we can't simply make the change universally.  E.g. ARM can
use an immediate 8-bit operand during the TLB lookup, but would have to use
several insns to perform a 9-bit mask.
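
As a rough illustration of the immediate constraint: a classic ARM (A32)
data-processing immediate is an 8-bit value rotated right by an even amount.
The check below is a sketch of that encoding rule only; the function names are
made up here, and it says nothing about how the ARM TCG backend actually emits
its TLB lookup.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits, 0 <= n < 32. */
static uint32_t sketch_rol32(uint32_t x, unsigned n)
{
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Return 1 if imm can be written as an 8-bit value rotated right by an
 * even amount, i.e. if undoing some even rotation leaves a value that
 * fits in the low 8 bits. */
static int sketch_arm_imm_encodable(uint32_t imm)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if ((sketch_rol32(imm, rot) & ~0xffu) == 0) {
            return 1;
        }
    }
    return 0;
}
```

With this rule, an index mask of 0xff (256 entries) is a single immediate,
while 0x1ff (512 entries) has nine set bits and cannot be encoded, so the mask
would need more than one instruction.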

> I am
> thinking the TLB may need to be organized in a set-associative fashion to
> reduce conflict misses, e.g. 2-way set associative to reduce the miss rate. Or
> have a victim TLB that is 4-way associative and use x86 SIMD instructions to do
> the lookup once the direct-mapped TLB misses. Has anybody done any work on this
> front?

Even with SIMD, I don't believe you could make the fast-path of a set
associative lookup fast.  This is the sort of thing for which you really need
the dedicated hardware of the real TLB.  Feel free to prove me wrong with code,
of course.
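
For reference, a scalar 2-way set-associative lookup might look like the
sketch below. Everything in it is hypothetical (the SAEntry layout, the sa_*
names, the 4 KiB page size, and the alternating replacement policy); it is
meant only to show the extra compare and branch per access on the hit path
compared with a direct-mapped table of the same total size.

```c
#include <stdint.h>

#define SA_PAGE_BITS 12                 /* assumed 4 KiB guest pages */
#define SA_SETS      128                /* 128 sets x 2 ways = 256 entries */
#define SA_WAYS      2
#define SA_PAGE_MASK (~(((uint64_t)1 << SA_PAGE_BITS) - 1))

/* Hypothetical entry layout, for illustration only. */
typedef struct {
    uint64_t tag;     /* page-aligned guest address; UINT64_MAX = invalid */
    uint64_t addend;  /* illustrative payload */
} SAEntry;

static SAEntry sa_tlb[SA_SETS][SA_WAYS];
static unsigned char sa_victim[SA_SETS];   /* next way to replace per set */

static void sa_flush(void)
{
    for (unsigned s = 0; s < SA_SETS; s++) {
        for (unsigned w = 0; w < SA_WAYS; w++) {
            sa_tlb[s][w].tag = UINT64_MAX;
        }
        sa_victim[s] = 0;
    }
}

/* Refill: replace the ways of a set alternately (an assumed policy). */
static void sa_fill(uint64_t vaddr, uint64_t addend)
{
    unsigned set = (vaddr >> SA_PAGE_BITS) & (SA_SETS - 1);
    unsigned way = sa_victim[set];
    sa_tlb[set][way].tag = vaddr & SA_PAGE_MASK;
    sa_tlb[set][way].addend = addend;
    sa_victim[set] = (unsigned char)(way ^ 1);
}

/* Lookup: up to SA_WAYS tag compares instead of one. Returns 1 on a hit. */
static int sa_lookup(uint64_t vaddr, uint64_t *addend)
{
    unsigned set = (vaddr >> SA_PAGE_BITS) & (SA_SETS - 1);
    uint64_t tag = vaddr & SA_PAGE_MASK;
    for (unsigned w = 0; w < SA_WAYS; w++) {
        if (sa_tlb[set][w].tag == tag) {
            *addend = sa_tlb[set][w].addend;
            return 1;
        }
    }
    return 0;
}
```

Two pages that collide in a direct-mapped table can coexist here, but the
generated fast path would have to carry the second compare on every access,
which is the cost being questioned.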

r~
