From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <20090520162441.L73304@stanley.csl.cornell.edu>
References: <1242745197.24234.7.camel@peak10.cs.hut.fi>
	<200905201148.43631.paul@codesourcery.com>
	<761ea48b0905200516g47713089g5d0b06f6f94bcd1a@mail.gmail.com>
	<20090520162441.L73304@stanley.csl.cornell.edu>
Date: Sat, 23 May 2009 15:23:45 +0200
Message-ID: <761ea48b0905230623y1732cbcdt174cfeee010d495b@mail.gmail.com>
Subject: Re: [Qemu-devel] Instruction counting instrumentation for ARM +
	initial patch
From: Laurent Desnogues <laurent.desnogues@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
List-Id: qemu-devel.nongnu.org
To: Vince Weaver <vince@csl.cornell.edu>
Cc: qemu-devel@nongnu.org

On Wed, May 20, 2009 at 10:35 PM, Vince Weaver <vince@csl.cornell.edu> wrote:
>
> I wonder if a simplistic stats gathering frame work could be added to Qemu.
>
> The problem is there currently are at least 3 users of Qemu:
>  1.  People who want fast simulation
>  2.  People who are doing virtualization
>  3.  People trying to do instrumentation/research
>
> Unfortunately those three groups have conflicting interests.
>
> The main problem is that adding instrumentation infrastructure will either
> slow down the common case, or else introduce lots of #ifdefs all over the
> code.  Neither is very attractive.

I don't think adding options enabled from the command line will slow
down the standard translation in a measurable way, provided, for
instance, the option isn't checked before running every translated
block.  If it's checked before/after translating a block, then it
shouldn't affect performance.
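
A toy sketch of that point, in Python rather than QEMU's C.  This is
not QEMU code: translate_block, run_block, count_insns and stats are
hypothetical names made up for the illustration.  The option is tested
once, when the block is translated, so a block translated with the
option off carries no extra work when it runs.

```python
# Toy model only: every name here is hypothetical, none is a QEMU API.

def translate_block(guest_insns, count_insns, stats):
    """Build the list of host ops for one block.

    The option is tested here, once per translation, and never again
    while the translated block executes.
    """
    ops = []
    if count_insns:
        n = len(guest_insns)

        def bump():
            # Single counter update for the whole block.
            stats["insns"] += n

        ops.append(bump)
    ops.extend(guest_insns)
    return ops


def run_block(ops):
    """Execute a translated block; no option check on this path."""
    for op in ops:
        op()
```

With the option off, the translated block has exactly as many ops as
the guest block, so the run path is the same as without the feature.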

> It would be nice if maybe a limited instrumentation architecture could
> be put into qemu, that could be configured out.  It would save the various
> researchers the problem of everyone re-implementing it differently.
>
> It would be nice to have:
>  1. A way to dump an instruction trace (address, length (for CISC),
>     and opcode, CPU# for multi-thread)
>  2. A way to dump memory traces (address, length, possibly the value
>     loaded/stored, CPU# for multi-thread)
>  3. A way to dump basic-block entry/exit
>
> Many of the various research metrics can be gained from these stats.
>  #1 and #2 are enough for cache simulators.
>  #1 (if post-processed) is enough to get a frequency plot for instruction
>     count and type.
>  #1 can be used to extrapolate branch-taken statistics for branch
>     predictors
>  #3 Can be used for basic block vectors, or to get faster instruction
>     counts

As I said in my previous mail, you don't need to generate an
instruction trace.  For user mode applications, a TB trace is enough
to derive one (of course some fine points, such as dynamically
generated code or TB flushing, can make that derivation troublesome).
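
A minimal sketch of that derivation, under assumptions that are not
from this thread: each TB record is a (start address, instruction
count) pair, and every instruction is 4 bytes long (ARM mode only, no
Thumb).  expand_tb_trace is a hypothetical name.  It also assumes the
code at a given address never changes, which is exactly where
dynamically generated code and TB flushing cause trouble.

```python
def expand_tb_trace(tb_trace, insn_size=4):
    """Yield one instruction address per executed instruction.

    tb_trace is an iterable of (start_pc, n_insns) pairs, one per
    executed TB, in execution order (a hypothetical record format).
    """
    for start_pc, n_insns in tb_trace:
        for i in range(n_insns):
            yield start_pc + i * insn_size
```

For example, list(expand_tb_trace([(0x8000, 2), (0x9000, 1)])) walks
two instructions of the first TB, then one of the second.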

As an example, my TB counter requires less than 30% more time to
run one of the SPEC 2k tests, while a full TB trace (binary format,
more than 2.7 GB) going to a file doubles the run-time, which is
very acceptable.  Of course, you then need to process the output
using other programs.
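
A sketch of what such a post-processing program could look like.  The
record layout is an assumption made for illustration (two
little-endian 32-bit fields: TB start address, then instruction
count); the actual binary format is not described here.  RECORD and
summarize are hypothetical names.

```python
import struct
from collections import Counter

# Hypothetical record layout: start PC, instruction count (both uint32 LE).
RECORD = struct.Struct("<II")


def summarize(data):
    """Return (executions per TB start address, total instruction count).

    data must be a bytes-like object whose length is a multiple of
    RECORD.size.
    """
    tb_hits = Counter()
    total_insns = 0
    for pc, n_insns in RECORD.iter_unpack(data):
        tb_hits[pc] += 1
        total_insns += n_insns
    return tb_hits, total_insns
```

For a trace of several GB, the file would have to be read in chunks
that are a multiple of RECORD.size rather than loaded whole.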

Getting memory traces would be more intrusive and would
certainly slow down simulation significantly.

> Pin manages to have their null plugin run very fast; at least one
> of the Spec2k binaries runs faster translated than it does natively.

Too bad they only support x86 now and are not open source.
Anyway, Pin is not the only binary tool that can make programs
faster: Diablo (LTO) was also able to speed up ARM programs.


Laurent