Date: Thu, 22 May 2008 23:38:21 -0400 (EDT)
From: Vince Weaver
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Re: Performance Monitoring
In-Reply-To: <3000d2e90805212313r3c3fbaf0laabb4074f6cb52e7@mail.gmail.com>
Message-ID: <20080522224940.M38819-100000@bell.csl.cornell.edu>

> I would like to run an OS, say Linux, and take a sample for a small period
> of time (seconds) while some app(s) are running and get a list of opcode
> names and how many times they were executed. I'm not interested in CPI at
> the moment.

What you are trying to do is relatively straightforward, especially if you
are going to be running binaries from a RISC-type machine.

The way I'd recommend doing it is to have Qemu output the raw instruction
stream, and then write an external program that decodes each instruction
and counts how many times each kind (opcode) occurs.
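A minimal sketch of such a counter in C, assuming the trace format written by the patch at the end of this mail: 8-byte packed records holding a 32-bit PC followed by the 32-bit instruction word, in the host's byte order. It only buckets instructions by MIPS major opcode (bits 31..26) and, for the SPECIAL group (major opcode 0), by the funct field (bits 5..0). Other groups such as REGIMM and COP1 have their own sub-fields and would need similar treatment, and mapping numbers to mnemonics (for example 0x23 is lw, SPECIAL funct 0x08 is jr) is left to a lookup table. All names here (struct trace_rec, struct opcode_hist, hist_*) are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Record layout assumed to match the tracing patch: PC, then the raw
   instruction word, packed, in host byte order (assumes the trace is
   read on the same kind of host that wrote it). */
struct trace_rec {
    uint32_t addr;
    uint32_t insn;
} __attribute__((__packed__));

/* Counts per MIPS major opcode (bits 31..26), plus per "funct" field
   (bits 5..0) for major opcode 0 (SPECIAL: sll, jr, addu, ...). */
struct opcode_hist {
    unsigned long major[64];
    unsigned long special[64];
    unsigned long total;
};

static void hist_init(struct opcode_hist *h)
{
    memset(h, 0, sizeof *h);
}

static void hist_add(struct opcode_hist *h, uint32_t insn)
{
    unsigned op = insn >> 26;            /* always in 0..63 */

    h->major[op]++;
    if (op == 0)
        h->special[insn & 0x3f]++;
    h->total++;
}

/* Consume records until EOF; works on a regular file or a named pipe. */
static void hist_read(struct opcode_hist *h, FILE *in)
{
    struct trace_rec buf[4096];
    size_t n;

    while ((n = fread(buf, sizeof buf[0], 4096, in)) > 0)
        for (size_t i = 0; i < n; i++)
            hist_add(h, buf[i].insn);
}
```

A driver would call hist_init(), then hist_read() on stdin or on a FILE * opened on the trace file or pipe, and finally print the nonzero entries of major[] and special[].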
Decoding is fairly straightforward on a fixed-width ISA like MIPS, where
every instruction is a single 32-bit word; it would be very complicated on
something like x86, with its variable-length encodings. I have some code I
can dig up that does this kind of thing (I used it to drive a branch
predictor simulator). I'll include it at the end of this e-mail.

> - Paul mentioned "With either alternative you'll still have issues with
> exceptions. MMU faults abort a TB early, so will screw up your statistics.
> One possibility is to terminate a TB on every memory access, like we do for
> watchpoints." - is this an issue addressed by your patch?

I've actually only tested my method with the userspace (linux-user)
emulation; I haven't tested it at all with full-system simulation. I'd
imagine it would still work.

Here's the code. It's based on a pre-TCG version of Qemu, so you can't use
it on the latest snapshots. It also only works with MIPS, though porting it
to other architectures should be similar. The code buffers a large block of
records before writing them out, for performance. To avoid creating huge
trace files on disk (and they will be huge), you can write to a named pipe
(mkfifo) and have your analysis program run at the same time, reading from
the same pipe.

Hopefully if I am doing something horribly wrong with this code, someone
will correct me. I've been using it for a while now, though, and have been
getting good results when compared to hardware performance counters.
This adds code to dump the PC and instruction word for every executed
instruction:

--- ./target-mips/translate.c	2008-04-23 12:23:55.000000000 -0400
+++ ./target-mips/translate.c	2008-05-22 23:31:13.000000000 -0400
@@ -6696,6 +6696,7 @@
             gen_opc_instr_start[lj] = 1;
         }
         ctx.opcode = ldl_code(ctx.pc);
+        gen_op_dump_brpred(ctx.pc,ctx.opcode);
         decode_opc(env, &ctx);
         ctx.pc += 4;

Add this to "op.c":

void op_dump_brpred(void)
{
    helper_dump_brpred(PARAM1,PARAM2);
}

Add this to "helper.c":

#include <fcntl.h>    /* creat() */
#include <unistd.h>   /* write(), ssize_t */

static int brpred_fd=-1,brpred_ptr=0;
static char error_message[]="Write error!\n";

/* One trace record: PC and raw instruction word, packed so the
   on-disk layout has no padding. */
struct brpredtype {
    unsigned int addr;
    unsigned int insn;
} __attribute__((__packed__));

#define TRACE_UNITS 4096   /* records buffered per write() */

static struct brpredtype brpred_buf[TRACE_UNITS];

void helper_dump_brpred(unsigned long address,unsigned long insn)
{
    ssize_t result;

    /* Open the trace file lazily on the first call. */
    if (brpred_fd<0) {
        brpred_fd=creat("trace.bpred",0666);
    }

    brpred_buf[brpred_ptr].addr=address;
    brpred_buf[brpred_ptr].insn=insn;
    brpred_ptr++;

    /* Buffer full: write it out and start refilling from the top. */
    if (brpred_ptr>=TRACE_UNITS) {
        brpred_ptr=0;
        result=write(brpred_fd,brpred_buf,
                     TRACE_UNITS*sizeof(struct brpredtype));
        if (result!=(ssize_t)(TRACE_UNITS*sizeof(struct brpredtype))) {
            write(2,error_message,sizeof(error_message)-1);
        }
    }
}
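One thing the helper above does not do is write out the final, partially filled buffer, so up to TRACE_UNITS - 1 (4095) records are silently lost when QEMU exits. Below is a self-contained sketch of the same buffering scheme with an explicit flush that also retries short writes and EINTR. The names (struct trace_writer, trace_add, trace_flush) are hypothetical, and where to call the flush inside QEMU is left open: note that a raw _exit() bypasses atexit() handlers, so the hook would have to sit on whatever exit path the emulator actually takes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define TRACE_UNITS 4096

/* Same 8-byte record as struct brpredtype in the patch. */
struct trace_rec {
    uint32_t addr;
    uint32_t insn;
} __attribute__((__packed__));

/* Hypothetical writer state; the patch keeps the equivalent as globals. */
struct trace_writer {
    int fd;
    size_t n;                            /* records currently buffered */
    struct trace_rec buf[TRACE_UNITS];
};

/* Write out every buffered record, retrying short writes and EINTR.
   On error the buffer is dropped (as the patch does) and -1 is returned. */
static int trace_flush(struct trace_writer *w)
{
    const char *p = (const char *)w->buf;
    size_t left = w->n * sizeof w->buf[0];

    while (left > 0) {
        ssize_t r = write(w->fd, p, left);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            perror("trace_flush: write");
            w->n = 0;
            return -1;
        }
        p += r;
        left -= (size_t)r;
    }
    w->n = 0;
    return 0;
}

/* Append one record; write the buffer out as soon as it is full. */
static int trace_add(struct trace_writer *w, uint32_t addr, uint32_t insn)
{
    assert(w->n < TRACE_UNITS);
    w->buf[w->n].addr = addr;
    w->buf[w->n].insn = insn;
    if (++w->n == TRACE_UNITS)
        return trace_flush(w);
    return 0;
}
```

The design keeps the common path to two stores and a compare, exactly like the patch; the only additions are the loop in trace_flush() and the requirement that someone calls it one last time at shutdown.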
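For the named-pipe setup suggested above, a sketch of the analysis side's first step, assuming it uses the same "trace.bpred" path that the patch passes to creat(). POSIX specifies that O_TRUNC (which creat() implies) has no effect on a FIFO, so creat() on an existing FIFO simply opens its write end. Opening either end blocks until the other side arrives, so QEMU and the analyzer can be started in either order once the FIFO exists. The function names make_trace_fifo and open_trace_fifo are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create the FIFO at `path`, or reuse one left over from an earlier run.
   Returns 0 on success, -1 on failure (including "a regular file is in
   the way", which would otherwise silently fill the disk). */
static int make_trace_fifo(const char *path)
{
    struct stat st;

    if (mkfifo(path, 0666) == 0)
        return 0;
    if (errno == EEXIST && stat(path, &st) == 0 && S_ISFIFO(st.st_mode))
        return 0;
    return -1;
}

/* Open the read end.  fopen() blocks until a writer (QEMU's creat() call)
   opens the other end; fread() then returns 0 once the writer closes it,
   for example when QEMU exits. */
static FILE *open_trace_fifo(const char *path)
{
    if (make_trace_fifo(path) != 0)
        return NULL;
    return fopen(path, "rb");
}
```

The returned FILE * can be handed straight to a record-reading loop, so the trace never touches the disk.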