From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <2e5a35a6050105175820db00e5@mail.gmail.com>
Date: Wed, 5 Jan 2005 17:58:13 -0800
From: Dan Hecht
Subject: [Qemu-devel] conditional branch implementation using dyngen labels
Reply-To: dmh23@cornell.edu, qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org

Hi,

I am wondering why you changed the code generation for conditional
branches (i386) in gen_jcc() to use dyngen labels. It seems the new
code will perform worse than the old, since there is an extra jump
instruction along one of the paths.
For example, prior to the recent CVS commit, the code generated for a
conditional jump would be something like:

  0x087666ce: mov    0x2c(%ebp),%eax
  0x087666d1: test   %eax,%eax
  0x087666d3: jne    0x87666ea
  0x087666d5: jmp    0x95e3502
  0x087666da: mov    $0x82e98ac,%ebx
  0x087666df: movl   $0x6ca,0x20(%ebp)
  0x087666e6: jmp    0x87666fb
  0x087666e8: mov    %esi,%esi
  0x087666ea: jmp    0x95e3f26
  0x087666ef: movl   $0x6b3,0x20(%ebp)
  0x087666f6: mov    $0x82e98ad,%ebx
  0x087666fb: ret

with the jmps at 0x087666d5 and 0x087666ea being chained.

Now, the code is something like:

  0x08bd2085: mov    0x2c(%ebp),%eax
  0x08bd2088: test   %eax,%eax
  0x08bd208a: jne    0x8bd2091
  0x08bd208c: jmp    0x8bd20a3
  0x08bd2091: jmp    0x9a4f35d
  0x08bd2096: movl   $0x80555c30,0x20(%ebp)
  0x08bd209d: mov    $0x8406390,%ebx
  0x08bd20a2: ret
  0x08bd20a3: jmp    0x9a4fd8f
  0x08bd20a8: movl   $0x80555c8c,0x20(%ebp)
  0x08bd20af: mov    $0x8406391,%ebx
  0x08bd20b4: ret

with the jmps at 0x08bd2091 and 0x08bd20a3 being chained. Notice the
extra jmp along the path to the later part of the block.

Locally, I have optimized the i386 target to generate something like:

  0x087686dd: cmpl   $0x0,0x2c(%ebp)
  0x087686e1: jne    0x95e7981
  0x087686e7: jmp    0x95e6f5d
  0x087686ec: movl   $0x6ca,0x20(%ebp)
  0x087686f3: mov    $0x82eba54,%ebx
  0x087686f8: ret
  0x087686f9: movl   $0x6b3,0x20(%ebp)
  0x08768700: mov    $0x82eba55,%ebx
  0x08768705: ret

with the jne at 0x087686e1 and the jmp at 0x087686e7 getting chained
directly, as an optimization (but I haven't had time to clean it up
enough to send in as a patch). However, this target-specific code is
harder to implement with the new broken-down micro operations.

So, I'm wondering what the reasoning behind this change is, and whether
there's another way I should have gone about implementing this
optimization. Previously, I just rewrote op_jz_subx, etc., all in
assembly.

Thanks in advance,
Dan