From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:60237)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <alex.bennee@linaro.org>) id 1W6INP-00017W-GF
	for qemu-devel@nongnu.org; Thu, 23 Jan 2014 06:23:21 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <alex.bennee@linaro.org>) id 1W6INJ-0005OF-Hu
	for qemu-devel@nongnu.org; Thu, 23 Jan 2014 06:23:15 -0500
Received: from static.88-198-71-155.clients.your-server.de
	([88.198.71.155]:34337 helo=socrates.bennee.com)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <alex.bennee@linaro.org>) id 1W6INJ-0005O6-Bb
	for qemu-devel@nongnu.org; Thu, 23 Jan 2014 06:23:09 -0500
References: <CA+JLOisqRbnhnGN4uQh_1yMOQ9__X2FDgNGtf9rz_hTg4Txdig@mail.gmail.com>
From: Alex =?utf-8?Q?Benn=C3=A9e?= <alex.bennee@linaro.org>
In-reply-to: <CA+JLOisqRbnhnGN4uQh_1yMOQ9__X2FDgNGtf9rz_hTg4Txdig@mail.gmail.com>
Date: Thu, 23 Jan 2014 11:23:04 +0000
Message-ID: <87k3drc57r.fsf@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [PATCH] cpu: implementing victim TLB for QEMU
	system emulated TLB
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Xin Tong <trent.tong@gmail.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>, QEMU Developers <qemu-devel@nongnu.org>, aliguori@amazon.com, afaerber@suse.de


trent.tong@gmail.com writes:

> This patch adds a victim TLB to the QEMU system mode TLB.
>
> QEMU system mode page table walks are expensive. Measured by running
> qemu-system-x86_64 in system mode under Intel PIN, a TLB miss followed
> by a walk of the 4-level page tables in a guest Linux OS takes ~450
> x86 instructions on average.
<snip>
>
> Attached are some performance results taken on the SPECINT2006 train
> dataset on an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Linux machine. In
> summary, the victim TLB improves the performance of qemu-system-x86_64
> by 11% on average on SPECINT2006, with the highest improvement, 254%,
> in 464.h264ref. The victim TLB does not result in any performance
> degradation in any of the measured benchmarks. Furthermore, the
> implemented victim TLB is architecture independent and is expected to
> benefit other architectures in QEMU as well.
>
> Although there are measurement fluctuations, the performance
> improvements are significant and well outside the range of noise.
<snip>

I'm curious, as the implication seems to be that entries are evicted from
the main TLB before they are "done", i.e. while the guest still needs
them. What would the impact be of simply growing the size of the main
TLB cache?
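
To make the question concrete, here is a toy model rather than QEMU code.
It is a minimal sketch assuming a direct-mapped main TLB indexed by the
low bits of the virtual page number, plus a small fully associative
victim buffer. The class, the helper and all sizes (256, 512, 8) are
hypothetical and chosen only for illustration. It counts main hits,
victim hits and misses so the two options can be compared on the same
access pattern.

```python
from collections import OrderedDict

PAGE_BITS = 12  # assumption: 4 KiB pages


class ToyTLB:
    """Toy direct-mapped TLB with an optional fully associative victim buffer."""

    def __init__(self, size, victim_size=0):
        self.size = size
        self.entries = [None] * size      # slot -> virtual page number (vpn)
        self.victim_size = victim_size
        self.victims = OrderedDict()      # evicted vpns, oldest first
        self.main_hits = 0
        self.victim_hits = 0
        self.misses = 0

    def access(self, vaddr):
        vpn = vaddr >> PAGE_BITS
        slot = vpn % self.size
        if self.entries[slot] == vpn:
            self.main_hits += 1
            return "main"
        if vpn in self.victims:
            # Victim hit: move the entry back into the main TLB.
            del self.victims[vpn]
            self._install(slot, vpn)
            self.victim_hits += 1
            return "victim"
        # Full miss: this is where the expensive page table walk would happen.
        self.misses += 1
        self._install(slot, vpn)
        return "miss"

    def _install(self, slot, vpn):
        old = self.entries[slot]
        self.entries[slot] = vpn
        if old is not None and self.victim_size:
            self.victims[old] = True
            if len(self.victims) > self.victim_size:
                self.victims.popitem(last=False)  # drop the oldest victim


def run(tlb, pages, rounds):
    """Touch each page in order, `rounds` times; return the three counters."""
    for _ in range(rounds):
        for vpn in pages:
            tlb.access(vpn << PAGE_BITS)
    return tlb.main_hits, tlb.victim_hits, tlb.misses


# Three pages that all collide in slot 0 of a 256-entry direct-mapped TLB.
pages = [0, 256, 512]
plain = run(ToyTLB(256), pages, 10)
with_victim = run(ToyTLB(256, victim_size=8), pages, 10)
doubled = run(ToyTLB(512), pages, 10)
print(plain, with_victim, doubled)
```

In this pattern the plain 256-entry TLB misses on every access, the
victim buffer turns everything after the first round into victim hits,
and doubling the main TLB only rescues the one page that no longer
shares a slot. That is a property of this contrived pattern, not a
measurement; real workloads would need real counters.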

What's the current state of instrumentation around the system TLB
handling? Can we trace the hit rates of the various caches with
perf/oprofile/whatever (Stefan?)?

-- 
Alex Bennée