From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1K19QH-0004G8-Qm
	for qemu-devel@nongnu.org; Tue, 27 May 2008 20:21:45 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1K19QH-0004FA-1h
	for qemu-devel@nongnu.org; Tue, 27 May 2008 20:21:45 -0400
Received: from [199.232.76.173] (port=39193 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1K19QG-0004Ev-Rv
	for qemu-devel@nongnu.org; Tue, 27 May 2008 20:21:44 -0400
Received: from csl.cornell.edu ([128.84.224.10]:3798 helo=vlsi.csl.cornell.edu)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <vince@bell.csl.cornell.edu>) id 1K19QG-0003Oc-Iu
	for qemu-devel@nongnu.org; Tue, 27 May 2008 20:21:44 -0400
Received: from bell.csl.cornell.edu (bell.csl.cornell.edu [128.84.224.41])
	by vlsi.csl.cornell.edu (8.13.4/8.13.4) with ESMTP id m4S0LXUA049407
	for <qemu-devel@nongnu.org>; Tue, 27 May 2008 20:21:38 -0400 (EDT)
Date: Tue, 27 May 2008 20:21:33 -0400 (EDT)
From: Vince Weaver <vince@csl.cornell.edu>
Subject: Re: [Qemu-devel] Re: Performance Monitoring
In-Reply-To: <3000d2e90805250522o54fdaa17g43d716d8f15dfe9d@mail.gmail.com>
Message-ID: <20080527201827.D48790-100000@bell.csl.cornell.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org


On Sun, 25 May 2008, Cheif Jones wrote:
> Your suggested patch is a good solution. One thing bothers me: there is
> a TB caching mechanism on top of the opcode translation mechanism. If the TB
> cache is not disabled, your patch might give inaccurate results (e.g. a MIPS
> loop is translated to host CPU, gets cached, and executed N times from cache
> without being re-translated). The TB cache is implemented in tb_find_*()
> FYI.

I'm pretty sure the code I gave you inserts the call to the dumping code
into the TB itself, so it runs every time the TB executes and the fact
that the TB is cached shouldn't matter.  I could be wrong, as the Qemu
internals can be a bit confusing.

I've run experiments with MIPS binaries on both actual r12k hardware with
performance counters and under Qemu, though, and the results differ by
less than 1% on instructions retired for the SPEC CPU 2000 benchmarks.

Vince


>
> Am I missing something?
>
> Chief
>
> On Fri, May 23, 2008 at 6:38 AM, Vince Weaver <vince@csl.cornell.edu> wrote:
>
> >
> > > I would like to run an OS, say Linux, and take a sample for a small
> > > period of time (seconds) while some app(s) are running and get a list
> > > of opcode names and how many times they were executed. I'm not
> > > interested in CPI at the moment.
> >
> > What you are trying to do is relatively straightforward, especially if you
> > are going to be running binaries from a RISC type machine.
> >
> > The way I'd recommend doing it is getting Qemu to output the raw
> > instruction stream, and then write an external program that
> > decodes the instructions and counts what kinds are in each.  This
> > is fairly straightforward to do on an arch like MIPS; it would be
> > very complicated on something like x86.
> >
> > I have some code I can dig up that does this kind of thing (I used
> > it to run a branch predictor simulator).  I'll include it at the end of
> > this e-mail.
> >
> > >    - Paul mentioned "With either alternative you'll still have issues
> > >    with exceptions. MMU faults abort a TB early, so will screw up your
> > >    statistics. One possibility is to terminate a TB on every memory
> > >    access, like we do for watchpoints." - is this an issue addressed
> > >    by your patch?
> >
> > I've actually only tested my method with the userspace (linux-user)
> > type of emulation; I haven't tested it at all when doing full-system
> > simulation.  I'd imagine it would still work.
> >
> >
> > Here's the code.  It's based on a pre-TCG version of Qemu so you can't use
> > it on the latest snapshots.  It also only works with MIPS, but it
> > probably will be similar with other architectures.  The code
> > buffers a large block of values before writing it out (for performance).
> > To avoid creating huge traces to disk (and they will be huge) you
> > can write to a named pipe (mkfifo) and have your analysis routine
> > run at the same time reading in from the same pipe.
> >
> > Hopefully if I am doing something horribly wrong with this code, someone
> > will correct me.  I've been using it for a while now though and have been
> > getting good results when compared to hw perf counters.
> >
> >
> > This adds code to dump the pc and instruction for every executed
> > instruction:
> >
> > --- ./target-mips/translate.c   2008-04-23 12:23:55.000000000 -0400
> > +++ ./target-mips/translate.c   2008-05-22 23:31:13.000000000 -0400
> > @@ -6696,6 +6696,7 @@
> >             gen_opc_instr_start[lj] = 1;
> >         }
> >         ctx.opcode = ldl_code(ctx.pc);
> > +        gen_op_dump_brpred(ctx.pc,ctx.opcode);
> >         decode_opc(env, &ctx);
> >         ctx.pc += 4;
> >
> >
> > Add this to "op.c"
> >
> > void op_dump_brpred(void) {
> >   helper_dump_brpred(PARAM1,PARAM2);
> > }
> >
> > Add this to "helper.c":
> >
> > static int brpred_fd=-1,brpred_ptr=0;
> >
> > static char error_message[]="Write error!\n";
> >
> > struct brpredtype {
> >   unsigned int addr;
> >   unsigned int insn;
> > } __attribute__((__packed__));
> >
> > #define TRACE_UNITS 4096
> >
> > static struct brpredtype brpred_buf[TRACE_UNITS];
> >
> > void helper_dump_brpred(unsigned long address,unsigned long insn) {
> >
> >     int result;
> >
> >     if (brpred_fd<0) {
> >        brpred_fd=creat("trace.bpred",0666);
> >     }
> >
> >     brpred_buf[brpred_ptr].addr=address;
> >     brpred_buf[brpred_ptr].insn=insn;
> >
> >     brpred_ptr++;
> >
> >     if (brpred_ptr>=TRACE_UNITS) {
> >        brpred_ptr=0;
> >        result=write(brpred_fd,brpred_buf,
> >                     TRACE_UNITS*sizeof(struct brpredtype));
> >        if (result!=TRACE_UNITS*sizeof(struct brpredtype)) {
> >           write(2,error_message,13);
> >        }
> >     }
> > }
> >
> >
> >
> >
> >
>

-- 
/*  Vince Weaver  vince@csl.cornell.edu  http://csl.cornell.edu/~vince  */

main(){char O,o[66]="|\n\\/_  ",*I=o+7,l[]="B!FhhBHCWE9C?cJFKET$+h'Iq*chT"
,i=0,_;while(_=l[i++])for(O=0;O++<_>>5;)*I=*(I++-(_&31));*I=0;puts(o+5);}