From: Laurent Desnogues
To: Vince Weaver
Cc: qemu-devel@nongnu.org
Date: Sat, 23 May 2009 15:23:45 +0200
Subject: Re: [Qemu-devel] Instruction counting instrumentation for ARM + initial patch
Message-ID: <761ea48b0905230623y1732cbcdt174cfeee010d495b@mail.gmail.com>
In-Reply-To: <20090520162441.L73304@stanley.csl.cornell.edu>
References: <1242745197.24234.7.camel@peak10.cs.hut.fi> <200905201148.43631.paul@codesourcery.com> <761ea48b0905200516g47713089g5d0b06f6f94bcd1a@mail.gmail.com> <20090520162441.L73304@stanley.csl.cornell.edu>

On Wed, May 20, 2009 at 10:35 PM, Vince Weaver wrote:
>
> I wonder if a simplistic stats gathering frame work could be added to Qemu.
>
> The problem is there currently are at least 3 users of Qemu:
>  1.  People who want fast simulation
>  2.  People who are doing virtualization
>  3.  People trying to do instrumentation/research
>
> Unfortunately those three groups have conflicting interests.
>
> The main problem is that adding instrumentation infrastructure will either
> slow down the common case, or else introduce lots of #ifdefs all over the
> code.  Neither is very attractive.

I don't think adding options enabled from the command line will slow
down standard translation measurably, provided, for instance, that the
option isn't checked before running every translated block (TB). If it
is checked only before or after translating a block, it shouldn't
affect performance.

> It would be nice if maybe a limited instrumentation architecture could
> be put into qemu, that could be configured out.  It would save the various
> researchers the problem of everyone re-implementing it differently.
>
> It would be nice to have:
>  1. A way to dump an instruction trace (address, length (for CISC),
>     and opcode, CPU# for multi-thread)
>  2. A way to dump memory traces (address, length, possibly the value
>     loaded/stored, CPU# for multi-thread)
>  3. A way to dump basic-block entry/exit
>
> Many of the various research metrics can be gained from these stats.
>  #1 and #2 are enough for cache simulators.
>  #1 (if post-processed) is enough to get a frequency plot for instruction
>     count and type.
>  #1 can be used to extrapolate branch-taken statistics for branch
>     predictors
>  #3 Can be used for basic block vectors, or to get faster instruction
>     counts

As I said in my previous mail, you don't need to generate an
instruction trace. For user-mode applications, a TB trace is enough to
derive one, although some fine points, such as dynamically generated
code or TB flushing, can make the derivation troublesome.

As an example, my TB counter adds less than 30% to the run time of one
of the SPEC 2k tests, while writing a full TB trace (binary format,
more than 2.7 GB) to a file doubles the run time, which is very
acceptable.
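
To make the TB-trace idea concrete, here is a minimal sketch of how an
instruction count and an instruction address trace could be derived
from a TB trace. Everything in it is an assumption for illustration:
the table layout (TB id mapped to start address and instruction
count), the names, and the fixed 4-byte ARM instruction size are
hypothetical and are not what my patch or QEMU actually emits. It also
assumes every TB runs to completion and ignores the fine points
mentioned above (dynamically generated code, TB flushing).

```python
from collections import Counter

# Hypothetical TB table, assumed to be recorded once at translation time:
# TB id -> (start address, number of guest instructions in the TB).
TB_TABLE = {
    0: (0x8000, 4),
    1: (0x8010, 2),
    2: (0x8018, 3),
}

INSN_SIZE = 4  # assumption: fixed-width ARM (non-Thumb) instructions


def count_instructions(tb_trace, tb_table):
    """Sum the instruction counts of all executed TBs.

    Assumes each TB in the trace ran to completion.
    """
    return sum(tb_table[tb_id][1] for tb_id in tb_trace)


def expand_trace(tb_trace, tb_table, insn_size=INSN_SIZE):
    """Yield the address of every executed instruction, in execution order."""
    for tb_id in tb_trace:
        start, n_insns = tb_table[tb_id]
        for i in range(n_insns):
            yield start + i * insn_size


def tb_histogram(tb_trace):
    """Count how many times each TB was executed (what a TB counter gives)."""
    return Counter(tb_trace)


# Hypothetical trace: the sequence of TB ids in the order they were executed.
trace = [0, 1, 1, 2, 1]

total = count_instructions(trace, TB_TABLE)
addresses = list(expand_trace(trace, TB_TABLE))
hist = tb_histogram(trace)
```

The trace itself only needs one small record per executed TB, which is
why it stays much cheaper to produce than a per-instruction trace; the
expansion cost is paid offline.
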
Of course, you then need to process the output using other programs.

Getting memory traces would be more intrusive and would certainly slow
down simulation significantly.

> Pin manages to have their null plugin run very fast; at least one
> of the Spec2k binaries runs faster translated than it does natively.

Too bad it only supports x86 now and is not open source. Anyway, Pin
is not the only binary tool that can make programs faster; Diablo
(LTO) was also able to speed up ARM programs.

Laurent