Date: Mon, 29 Jun 2009 20:26:35 +0200
From: Filip Navara
To: Laurent Desnogues
Cc: qemu-devel
Subject: Re: [Qemu-devel] [PATCH 0/3] RFC: TCG ARM optimizations
Message-ID: <5b31733c0906291126l5c4df229pdfd9e7faf88aa292@mail.gmail.com>
In-Reply-To: <761ea48b0906291059g13103602uc678cc318ac63015@mail.gmail.com>
References: <5b31733c0906291050w355b2fe0n9ac6f62f3486e47c@mail.gmail.com>
 <761ea48b0906291059g13103602uc678cc318ac63015@mail.gmail.com>

On Mon, Jun 29, 2009 at 7:59 PM, Laurent Desnogues wrote:
> On Mon, Jun 29, 2009 at 7:50 PM, Filip Navara wrote:
>>
>> Big thanks goes to Laurent Desnogues who actually had suggested where
>> the bottlenecks are.
>
> IIRC it's Paul who suggested that idea first on IRC months
> ago,

Yeah, this Paul guy keeps coming up with good ideas lately. His work helped
me a lot in writing my bachelor thesis and saved me countless hours. I owe
him a beverage at the very least, but somehow I doubt he will come to my
little country any time soon.

> as I was complaining about the stupidity of the generated
> code :-)

Let's keep complaining; maybe someone will improve it over time.

With the patches applied, the op statistics now look like this:

    mov_i32      1925
    movi_i32     1556
    add_i32       518
    ld_i32        257
    exit_tb       247
    brcond_i32    225
    qemu_ld32u    219
    set_label     207
    ...

Some minor improvements could be made to the usage of TCG temporary
variables in target-arm/translate.c. That could be done gradually and
without any substantial effort. It would probably increase the speed by
about 1 to 5 percent.

Another idea is to group blocks of conditional instructions to avoid
unnecessary jumps. That would help with code like this:

    0x00200d28:  cmp    lr, #0    ; 0x0
    0x00200d2c:  movle  r0, #1    ; 0x1
    0x00200d30:  movle  r1, r5
    0x00200d34:  movle  ip, r0
    0x00200d38:  ble    0x200d64

I'm not sure how common this pattern is, and I haven't investigated it
any further yet.

Lastly, the code generated for softmmu memory loads/stores could probably
be optimized in some cases. It uses hard-coded registers. It's not
optimized for multiple stores to adjacent locations (such as pushing
multiple registers to the stack) and does all the calculations again and
again. This results not only in recomputing numbers we already have (as
long as the stack is still on the same guest page), but also in huge TBs.
I imagine that doesn't help the processor cache much. This would probably
benefit all targets. In fact, I believe the softmmu code could be moved
out of the TCG target-specific code and into the main code (with the
possibility of overriding it with an optimized version).

Best regards,
Filip Navara
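
To make the idea of grouping conditional instructions a bit more concrete, here is a rough sketch in Python. It is only an illustration, not QEMU code: the instruction tuples, `group_by_condition` and `count_condition_checks` are hypothetical names made up for this example, and the only assumption modelled is that a run of instructions sharing a condition code can be guarded by a single test as long as no instruction inside the run writes the flags.

```python
# Hypothetical, simplified instruction records: (mnemonic, condition, sets_flags).
# "al" means "always" (unconditional). This does not mirror QEMU's data structures.
INSNS = [
    ("cmp", "al", True),   # cmp lr, #0   (writes the flags)
    ("mov", "le", False),  # movle r0, #1
    ("mov", "le", False),  # movle r1, r5
    ("mov", "le", False),  # movle ip, r0
    ("b",   "le", False),  # ble 0x200d64
]


def group_by_condition(insns):
    """Split insns into runs that share one condition code.

    A run is closed after an instruction that writes the flags, because
    the condition of the following instruction must then be re-evaluated.
    """
    runs = []
    for mnemonic, cond, sets_flags in insns:
        start_new = not runs or runs[-1]["cond"] != cond or runs[-1]["closed"]
        if start_new:
            runs.append({"cond": cond, "insns": [], "closed": False})
        runs[-1]["insns"].append(mnemonic)
        if sets_flags:
            runs[-1]["closed"] = True
    return runs


def count_condition_checks(runs):
    """One condition test (and skip branch) per conditional run."""
    return sum(1 for run in runs if run["cond"] != "al")


runs = group_by_condition(INSNS)
naive_checks = sum(1 for _, cond, _ in INSNS if cond != "al")
grouped_checks = count_condition_checks(runs)
```

For the five-instruction sequence quoted above, testing the condition per instruction needs four checks, while the grouped form needs one. A real translator would also have to treat anything else that can change the flags or leave the block early as a run boundary; the sketch ignores that.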