From mboxrd@z Thu Jan 1 00:00:00 1970 From: Christian Ehrhardt Date: Tue, 15 Apr 2008 07:34:47 +0000 Subject: Re: [kvm-ppc-devel] [PATCH] [1/3] kvmppc: add detailed instruction Message-Id: <48045A97.90804@linux.vnet.ibm.com> List-Id: References: <12081758423120-git-send-email-ehrhardt@linux.vnet.ibm.com> In-Reply-To: <12081758423120-git-send-email-ehrhardt@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable To: kvm-ppc@vger.kernel.org Hollis Blanchard wrote: > On Monday 14 April 2008 07:23:56 ehrhardt@linux.vnet.ibm.com wrote: >> From: Christian Ehrhardt >> >> This extends the kvm_stat reports for kvm on embedded powerpc. Since kvm= ppc >> is using emulation (no hardware support yet) this gives people interested >> in performance a detailed split of the emul_instruction counter already >> available. This statistic does not cover e.g. operants of the commands, = but >> that way it should have only a small perf impact (never break what you w= ant >> to measure). This feature is configurable in .config under the kvmppc >> virtualization itself. >=20 > This array-based approach seems to add a lot of lines of code, and it's a= lso=20 > copy/paste stuff that is just begging for a typo (e.g. miscounting STHX i= n=20 > the STHUX bucket) or being forgotten entirely when adding new emulation c= ode. >=20 > A more general approach would be to record just the opcode/extended opcod= e in=20 > a variable-sized structure, allocating a new bucket as we encounter a=20 > previously unseen instruction. I'm thinking of something like this: I already thought about some bucket like counting, but I focused small runt= ime overhead above everything else here by hijacking the already existent s= witch/case and avoiding any new if's & function calls. I agree that the array like solution is messy in the aspect of lines and is= more error prone by forgetting something. > log_instruction(inst) { > bucket =3D hash_u32(inst, 5); > list_for_each(l, vcpu->instlog[bucket]) { > struct instlog =3D list_entry(l); > if (instlog->inst =3D inst) { > instlog->count++; > break; > } > } > } >=20 > emulate(inst) { > log =3D get_op(inst); > switch (get_op(inst)) { > ... > case 31: > log |=3D get_xop(inst); > switch (inst) { > ... > } > } > log_instruction(log); > } I agree that this looks better, but is has per instruction: 1x hash function a loop wich runs ~1-2 times an if (do we actually have long pipelines that suffer from wrong branch pre= diction?) When we add more functions here like mapping sprn's and whatever else comes= in mind later this might get even more complex, while this statistics shou= ld be the low overhead stats. The thing I want to point out is that the tracing with the relay channel al= so only have (speaking of the reduced version discussed in the other thread= with only 3 integers) 3 integers to move into a buffer and some overhead t= o select the right buffer. The userspace application reads 4096b =3D> 341 r= ecords per scheduling of the reading app and we might even increase that si= ze. I really like the "full trace" feeling of the relay based patch, so I d= on't like to increase the functionality&overhead of this patch which was in= tended to be low-overhead. As stated above I agree with you about lines of code and error proneness of= that patch. What about that: - for integration into our upstream code only the relay based approach is t= ake into account, giving a full trace while it has some overhead - we keep the slight overhead, but error prone kvm_stat based patch in our = private queues and only use it for our own measurements until we either nee= d or change it once we have more experience with the relay based tracing (e= .g. know the perf. overhead of that better) - That way we would have two very similar solutions that share code and are= configurable so developers can use them as they want - additionally I think again about Arnd's comments and if we might need/wan= t a runtime switch e.g. proc/sys interface to enable/disable that tracing f= unctions on the fly Comments welcome > It looks like we could build a hash table pretty easily with hash_long() = and=20 > list_entry stuff (see fs/mbcache.c for example). So far you've found 17=20 > different instructions types emulated in the "boot" workload, so 32 entri= es=20 > would probably be a reasonable hash size. The same approach could be used= for=20 > SPR accesses, where you've hit 31 different registers. Really I think tho= se=20 > numbers won't vary much by workload, but rather by guest kernel... >=20 --=20 Gr=FCsse / regards,=20 Christian Ehrhardt IBM Linux Technology Center, Open Virtualization ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference=20 Don't miss this year's exciting event. There's still time to save $100.=20 Use priority code J8TL2D2.=20 http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/java= one _______________________________________________ kvm-ppc-devel mailing list kvm-ppc-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-ppc-devel