* [Qemu-devel] [PATCH 0/3] RFC: TCG ARM optimizations
From: Filip Navara @ 2009-06-29 17:50 UTC
To: qemu-devel

Hello!

I have been playing with some optimizations on the generated code for
ARM target and x86 host. The result is the following three patches, which
improve performance by 10% for the Dhrystone benchmark compiled for the
ARM7TDMI target. Also, the size of the output x86 code has shrunk by up
to 40% in some cases.

These patches are relatively small and self-contained, so I would like
to get them merged eventually if others agree. I have some ideas for
further optimizations, but implementing them would require substantial
effort and the benefits are questionable.

Big thanks goes to Laurent Desnogues who actually had suggested where
the bottlenecks are.

Best regards,
Filip Navara

* Re: [Qemu-devel] [PATCH 0/3] RFC: TCG ARM optimizations
From: Laurent Desnogues @ 2009-06-29 17:59 UTC
To: Filip Navara
Cc: qemu-devel

On Mon, Jun 29, 2009 at 7:50 PM, Filip Navara <filip.navara@gmail.com> wrote:
>
> Big thanks goes to Laurent Desnogues who actually had suggested where
> the bottlenecks are.

IIRC it's Paul who suggested that idea first on IRC months
ago, as I was complaining about the stupidity of the generated
code :-)

Laurent

* Re: [Qemu-devel] [PATCH 0/3] RFC: TCG ARM optimizations
From: Filip Navara @ 2009-06-29 18:26 UTC
To: Laurent Desnogues
Cc: qemu-devel

On Mon, Jun 29, 2009 at 7:59 PM, Laurent Desnogues <laurent.desnogues@gmail.com> wrote:
> On Mon, Jun 29, 2009 at 7:50 PM, Filip Navara <filip.navara@gmail.com> wrote:
>>
>> Big thanks goes to Laurent Desnogues who actually had suggested where
>> the bottlenecks are.
>
> IIRC it's Paul who suggested that idea first on IRC months
> ago,

Yeah, this Paul guy keeps coming up with good ideas lately. His work
helped me a lot in writing my bachelor thesis and saved me countless
hours. I owe him a beverage at the very least, but somehow I doubt he
will come to my little country any time soon.

> as I was complaining about the stupidity of the generated
> code :-)

Let's keep complaining, maybe someone will improve it over time.

With the applied patches the OP statistics now look like this:

    mov_i32      1925
    movi_i32     1556
    add_i32       518
    ld_i32        257
    exit_tb       247
    brcond_i32    225
    qemu_ld32u    219
    set_label     207
    ...

Some minor improvements could be made to the usage of TCG temporary
variables in target-arm/translate.c. That's something that could be
done gradually and without any substantial effort. It would probably
increase the speed by about 1 to 5 percent.

Another idea is to group blocks of conditional instructions to avoid
unnecessary jumps. That would help with code like this:

    0x00200d28:  cmp    lr, #0    ; 0x0
    0x00200d2c:  movle  r0, #1    ; 0x1
    0x00200d30:  movle  r1, r5
    0x00200d34:  movle  ip, r0
    0x00200d38:  ble    0x200d64

I'm not sure how common this pattern is and I haven't investigated it
any further yet.

Lastly, the code generated for softmmu memory loads/stores could
probably be optimized in some cases. It uses hard-coded registers.
It's not optimized for multiple stores to adjacent locations (pushing
multiple registers to stack) and does all the calculations again and
again. This results not only in recomputing numbers we already have
(as long as the stack is still on the same guest page), but also in
huge TBs. I imagine that doesn't help the processor cache too much.
This would probably benefit all targets. In fact I believe the softmmu
code could be moved out of the TCG target-specific code and into the
main code (with the possibility to override it with optimized
version).

Best regards,
Filip Navara
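
The idea of grouping runs of conditional instructions, mentioned in the message above, can be illustrated with a small counting model. This is only a sketch and not code from the patches or from target-arm/translate.c: the GuestInsn type and count_skip_branches() are hypothetical names, and the model assumes that each conditional instruction normally gets its own "skip if the condition fails" branch, while a grouped translator would emit one skip per run of instructions that share a condition and do not modify the flags inside the run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ARM condition field values used below (from the ARM encoding). */
enum { COND_LE = 13, COND_AL = 14 };

/* Hypothetical, simplified view of a decoded guest instruction. */
typedef struct {
    int  cond;        /* ARM condition field, COND_AL means unconditional */
    bool sets_flags;  /* true if the instruction writes the NZCV flags */
} GuestInsn;

/*
 * Count the conditional skip branches a translator would emit.
 * group == false: every conditional instruction gets its own skip.
 * group == true:  a run of instructions with the same condition shares
 *                 one skip, as long as nothing in the run changes flags.
 */
size_t count_skip_branches(const GuestInsn *insn, size_t n, bool group)
{
    size_t branches = 0;
    int open_cond = COND_AL;   /* condition of the currently open group */

    assert(insn != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (insn[i].cond == COND_AL) {
            open_cond = COND_AL;          /* unconditional: close group */
        } else if (!group || insn[i].cond != open_cond) {
            branches++;                   /* emit a new skip branch */
            open_cond = insn[i].cond;
        }
        if (insn[i].sets_flags) {
            open_cond = COND_AL;          /* flags changed: close group */
        }
    }
    return branches;
}
```

For the cmp / movle / movle / movle / ble sequence quoted above, this model counts four skip branches without grouping and one with grouping. A real implementation would also have to deal with exceptions and branches inside the group, which the sketch ignores.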

* Re: [Qemu-devel] [PATCH 0/3] RFC: TCG ARM optimizations
From: Blue Swirl @ 2009-06-29 18:58 UTC
To: Filip Navara
Cc: Laurent Desnogues, qemu-devel

On 6/29/09, Filip Navara <filip.navara@gmail.com> wrote:
> Lastly, the code generated for softmmu memory loads/stores could
> probably be optimized in some cases. It uses hard-coded registers.
> It's not optimized for multiple stores to adjacent locations (pushing
> multiple registers to stack) and does all the calculations again and
> again. This results not only in recomputing numbers we already have
> (as long as the stack is still on the same guest page), but also in
> huge TBs. I imagine that doesn't help the processor cache too much.
> This would probably benefit all targets. In fact I believe the softmmu
> code could be moved out of the TCG target-specific code and into the
> main code (with the possibility to override it with optimized
> version).

Interesting. We could add a new optional TCG instruction op_ld_g2h
(extracted from qemu_ld) that performs the TLB lookup and returns the
host address. When multiple accesses near the same guest address are
detected (how?), the translator can reuse the host address, perform
some math and check if the guest page is still the same. If true,
ld_raw can be used, otherwise the host address is recalculated.

On the performance side, qemu_ld on a Sparc host uses 9 instructions in
the TLB hit case before the access. Maybe this would lower the number
a bit, but not by much.
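
The "look up once, reuse while still on the same guest page" idea from the message above can be sketched as a toy model in C. Nothing here is QEMU code: ToyMmu, ToyTlbEntry, G2hCache, toy_g2h() and toy_g2h_cached() are hypothetical names, the TLB is a plain direct-mapped table, and the model runs the page check at run time in a helper, whereas the proposal is about TCG emitting the equivalent check in generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS   12
#define OFFSET_MASK ((((uint32_t)1) << PAGE_BITS) - 1)
#define PAGE_MASK   (~OFFSET_MASK)
#define TLB_SIZE    256

/* Toy TLB entry: guest page -> host memory backing it (not CPUTLBEntry). */
typedef struct {
    uint32_t guest_page;  /* page-aligned guest address */
    uint8_t *host_page;   /* host base of the page, NULL if the entry is invalid */
} ToyTlbEntry;

typedef struct {
    ToyTlbEntry tlb[TLB_SIZE];
    unsigned lookups;     /* number of full TLB lookups performed */
} ToyMmu;

/* Full lookup (the "g2h" step): host address on a hit, NULL on a miss. */
uint8_t *toy_g2h(ToyMmu *mmu, uint32_t gaddr)
{
    assert(mmu != NULL);
    ToyTlbEntry *e = &mmu->tlb[(gaddr >> PAGE_BITS) % TLB_SIZE];
    mmu->lookups++;
    if (e->host_page == NULL || e->guest_page != (gaddr & PAGE_MASK)) {
        return NULL;      /* a real MMU would take the slow path here */
    }
    return e->host_page + (gaddr & OFFSET_MASK);
}

/* Last successful translation, remembered per guest page. */
typedef struct {
    uint32_t guest_page;
    uint8_t *host_page;   /* NULL while nothing is cached */
} G2hCache;

/* Reuse the cached page if gaddr is on it, otherwise do a full lookup. */
uint8_t *toy_g2h_cached(ToyMmu *mmu, G2hCache *c, uint32_t gaddr)
{
    assert(c != NULL);
    if (c->host_page != NULL && c->guest_page == (gaddr & PAGE_MASK)) {
        return c->host_page + (gaddr & OFFSET_MASK);   /* cheap path */
    }
    uint8_t *haddr = toy_g2h(mmu, gaddr);
    if (haddr != NULL) {
        c->guest_page = gaddr & PAGE_MASK;
        c->host_page  = haddr - (gaddr & OFFSET_MASK);
    }
    return haddr;
}
```

With one guest page mapped, a run of adjacent word accesses through toy_g2h_cached() performs a single full lookup, and the first access that crosses onto another page triggers a new one. The sketch leaves out everything that makes the real problem hard, in particular invalidating the cached translation when the TLB is flushed and handling MMIO or unaligned accesses.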