Subject: Re: [Qemu-devel] Instruction counting instrumentation for ARM + initial patch
From: Sami Kiminki
In-Reply-To: <761ea48b0905230623y1732cbcdt174cfeee010d495b@mail.gmail.com>
References: <1242745197.24234.7.camel@peak10.cs.hut.fi> <200905201148.43631.paul@codesourcery.com> <761ea48b0905200516g47713089g5d0b06f6f94bcd1a@mail.gmail.com> <20090520162441.L73304@stanley.csl.cornell.edu> <761ea48b0905230623y1732cbcdt174cfeee010d495b@mail.gmail.com>
Date: Mon, 25 May 2009 18:04:09 +0300
Message-Id: <1243263849.7173.119.camel@dis>
To:
qemu-devel@nongnu.org

On Sat, 2009-05-23 at 15:23 +0200, Laurent Desnogues wrote:
> On Wed, May 20, 2009 at 10:35 PM, Vince Weaver wrote:
> > The main problem is that adding instrumentation infrastructure will either
> > slow down the common case, or else introduce lots of #ifdefs all over the
> > code. Neither is very attractive.
>
> I don't think adding command-line enabled options will slow down
> the standard translation in a measurable way, provided, for instance,
> it isn't being checked before running every translated block. If it's
> checked before/after translating a block, then it shouldn't affect
> performance.

I tried to measure the performance difference between vanilla Qemu and Qemu with this patch applied but without the command-line switch. As suggested above, I couldn't measure any difference.

However, to disable this feature at compile time, I think it should be enough to:

1. define a macro instr_count_inc that conveniently eliminates all calls to instr_count_inc() and instr_count_inc_init();
2. insert a few #ifdefs to disable the framework code (e.g. an #ifdef in CPUARMState to remove the counters).

For my small set of workloads, I've measured around 10% to 40% overhead when instruction counting is enabled, which is definitely acceptable for us. Your mileage may vary.

> > It would be nice if maybe a limited instrumentation architecture could
> > be put into qemu, that could be configured out. It would save the various
> > researchers the problem of everyone re-implementing it differently.

I think this is something many software developers would be interested in, too, e.g. for measuring cache utilization properly. BTW, for on-line cache simulation, wouldn't it be enough to instrument memory accesses at the TCG level (e.g., tcg_gen_ld8u_i32, ...)?

> You don't need to generate an instruction trace as I said in my
> previous mail.
> For user mode applications, a TB trace is enough
> (of course there are some fine points that can cause trouble to
> derive the instruction trace from a TB trace such as dynamically
> generated code, or TB flushing) to derive an instruction trace.

We considered this when designing the patch. However, we decided to start with the current implementation because of concerns about dynamically generated code, as you pointed out. And then there is system emulation.

Anyway, the most important thing would be to have the instr_count_inc() calls in the decoders. With those in place, it shouldn't be too hard to change the implementation later, with at most trivial modifications to the decoders.

Sami Kiminki
Embedded Software Group / Helsinki University of Technology
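A minimal sketch of the compile-time switch proposed above (define the instr_count_inc macro away, and #ifdef out the counter storage), written as standalone C rather than QEMU code. Everything in it is a hypothetical stand-in: the INSTR_COUNT_ENABLED switch, the instruction classes, the counter array standing in for fields in CPUARMState, and the toy decode_insn() function. It also increments the counters directly inside the toy decoder, which sidesteps how the real patch turns decoder calls into run-time counting; only the macro technique is the point.

```c
/* Hypothetical build switch: compile with -DINSTR_COUNT_ENABLED=0 to
   remove instruction counting entirely. Not the patch's real option name. */
#ifndef INSTR_COUNT_ENABLED
#define INSTR_COUNT_ENABLED 1
#endif

/* Hypothetical instruction classes, for illustration only. */
enum instr_class { IC_DATA_PROC, IC_LOAD_STORE, IC_BRANCH_BLOCK, IC_OTHER,
                   IC_NUM_CLASSES };

#if INSTR_COUNT_ENABLED
/* Stand-in for counters that would live in CPUARMState. */
static unsigned long instr_counts[IC_NUM_CLASSES];

static void instr_count_inc(enum instr_class cls)
{
    instr_counts[cls]++;
}
#else
/* Compiled out: every call site expands to an empty statement, so the
   decoders themselves need no #ifdefs. */
#define instr_count_inc(cls) do { } while (0)
#endif

/* Toy decoder: a coarse split on ARM bits [27:26]. Roughly, 00 is data
   processing (plus misc/halfword transfers), 01 single load/store,
   10 branch or block transfer, 11 coprocessor/SWI. Real decoding is far
   more detailed. */
static void decode_insn(unsigned int insn)
{
    switch ((insn >> 26) & 3u) {
    case 0:  instr_count_inc(IC_DATA_PROC);    break;
    case 1:  instr_count_inc(IC_LOAD_STORE);   break;
    case 2:  instr_count_inc(IC_BRANCH_BLOCK); break;
    default: instr_count_inc(IC_OTHER);        break;
    }
}
```

Building with -DINSTR_COUNT_ENABLED=0 turns each instr_count_inc() call into an empty statement and drops the counter array, so the only #ifdefs are around the framework code, not in the decoders.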
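Laurent's point that a TB trace is enough to recover instruction counts can be sketched as follows, assuming each translated block's start PC and guest instruction count were recorded at translation time, and that every traced block ran to completion. The struct tb_info table and count_from_tb_trace() are hypothetical names, not QEMU APIs. Keying blocks by PC is exactly what breaks with dynamically generated code or TB flushing: if the code at a PC changes, the table no longer describes what actually ran.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one translated block: its guest start PC and
   the number of guest instructions it contains. */
struct tb_info {
    uint32_t pc;
    unsigned n_insns;
};

/* Sum guest instructions executed, given the TB table and a trace of
   executed TB start PCs. Assumes one TB per PC for the whole run and no
   early exits from a block. Returns 0 and stores the sum in *total on
   success, or -1 if a traced PC has no entry in the table. */
static int count_from_tb_trace(const struct tb_info *tbs, size_t n_tbs,
                               const uint32_t *trace, size_t n_trace,
                               unsigned long long *total)
{
    unsigned long long sum = 0;

    for (size_t i = 0; i < n_trace; i++) {
        size_t j;

        /* Linear lookup keeps the sketch short; a hash table would scale. */
        for (j = 0; j < n_tbs; j++)
            if (tbs[j].pc == trace[i])
                break;
        if (j == n_tbs)
            return -1;
        sum += tbs[j].n_insns;
    }
    *total = sum;
    return 0;
}
```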