From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1NLgMc-0002eN-40
	for qemu-devel@nongnu.org; Fri, 18 Dec 2009 12:11:38 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1NLgMa-0002e1-Nw
	for qemu-devel@nongnu.org; Fri, 18 Dec 2009 12:11:37 -0500
Received: from [199.232.76.173] (port=53284 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1NLgMa-0002du-3k
	for qemu-devel@nongnu.org; Fri, 18 Dec 2009 12:11:36 -0500
Received: from are.twiddle.net ([75.149.56.221]:59915)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from )
	id 1NLgMZ-0000J5-Lr
	for qemu-devel@nongnu.org; Fri, 18 Dec 2009 12:11:36 -0500
Message-ID: <4B2BB7C3.2040203@twiddle.net>
Date: Fri, 18 Dec 2009 09:11:31 -0800
From: Richard Henderson
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [PATCH 3/6] tcg-x86_64: Implement setcond and movcond.
References: <761ea48b0912170620l534dcb02m8ea6b59524d76dbe@mail.gmail.com>
	<761ea48b0912180339k18573822wea90289345c58a84@mail.gmail.com>
In-Reply-To: <761ea48b0912180339k18573822wea90289345c58a84@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
To: Laurent Desnogues
Cc: qemu-devel@nongnu.org

On 12/18/2009 03:39 AM, Laurent Desnogues wrote:
>> +static void tcg_out_setcond(TCGContext *s, int cond, TCGArg arg0,
>> +                            TCGArg arg1, TCGArg arg2, int const_arg2, int rexw)
>
> Perhaps renaming arg0 to dest would make things slightly
> more readable.

Ok.

> Also note that tcg_out_modrm will generate an unneeded prefix
> for some registers. cf. the patch I sent to the list months ago.

Huh.  Didn't notice since the disassembler printed what I expected to
see.  Is fixing this at the same time a requirement for acceptance?
I'd prefer to tackle that separately, since no doubt it affects every
use of P_REXB.

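To make the prefix question concrete, here is a minimal sketch of when a REX prefix is actually required for a register-direct byte operation on x86-64. It assumes the "unneeded prefix" refers to a REX byte being emitted for %al, %cl, %dl and %bl, which can be encoded without one. The helper name rex_prefix and its parameters are hypothetical and are not taken from the QEMU sources or from the patch mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not QEMU code.  Returns the REX prefix byte for
   a register-direct "reg, rm" instruction with no index register, or 0
   when no prefix is required.  rm_is_byte says whether the rm operand
   is accessed as an 8-bit register (as for setcc or a movzbl source). */
static uint8_t rex_prefix(int r, int rm, bool rexw, bool rm_is_byte)
{
    uint8_t rex = 0;

    assert(r >= 0 && r < 16 && rm >= 0 && rm < 16);

    if (rexw) {
        rex |= 0x08;            /* REX.W: 64-bit operand size */
    }
    if (r & 8) {
        rex |= 0x04;            /* REX.R: extends the reg field */
    }
    if (rm & 8) {
        rex |= 0x01;            /* REX.B: extends the rm field */
    }

    /* Byte registers 0..3 (%al, %cl, %dl, %bl) need no prefix.
       Encodings 4..7 mean %ah, %ch, %dh, %bh without REX, so a bare
       0x40 is needed to select %spl, %bpl, %sil, %dil instead. */
    if (rex != 0 || (rm_is_byte && rm >= 4)) {
        return (uint8_t)(0x40 | rex);
    }
    return 0;
}
```

With this rule a byte access to register 0 gets no prefix, register 6 gets a bare 0x40, and registers 8 and above get the extension bits they need in any case.
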
>> +    tgen_arithi32(s, ARITH_AND, arg0, 0xff);
>
> Wouldn't movzbl be better?

Handled inside tgen_arithi32:

     } else if (c == ARITH_AND && val == 0xffu) {
         /* movzbl */
         tcg_out_modrm(s, 0xb6 | P_EXT | P_REXB, r0, r0);

I didn't feel the need to replicate that.

> Regarding the xor optimization, I tested it on my i7 and it was
> (very) slightly slower running a 64-bit SPEC2k gcc.

Huh.  It used to be recommended.  The partial word store used to stall
the pipeline until the old value was ready, and the XOR was
special-cased as a clear, which broke both the input dependency and
also prevented a partial-register stall on the output.

Actually, this recommendation is still present: Section 3.5.1.6 in the
November 2009 revision of the Intel Optimization Reference Manual.

If it's all the same, I'd prefer to keep what I have there.  All other
things being equal, the XOR is 2 bytes and the MOVZBL is 3.

>> +static void tcg_out_movcond(TCGContext *s, int cond, TCGArg arg0,
>> +                            TCGArg arg1, TCGArg arg2, int const_arg2,
>> +                            TCGArg arg3, TCGArg arg4, int rexw)
>
> Perhaps renaming arg0 to dest would make things slightly
> more readable.

Ok.

> You should also add a note stating that arg3 != arg4.

I don't believe that's true though.  It's caught immediately when we
emit the movcond opcode, but there's no check later once
copy-propagation has been done within TCG.

I check for that in the i386 and sparc backends, because dest==arg3 &&
dest==arg4 would actually generate incorrect code.  Here in the x86_64
backend, where we always use cmov it doesn't generate incorrect code,
merely inefficient.  I could add an early out for that case, if you
prefer.

>> +    { INDEX_op_setcond_i32, { "r", "r", "re" } },
>> +    { INDEX_op_setcond_i64, { "r", "r", "re" } },
>> +
>> +    { INDEX_op_movcond_i32, { "r", "r", "re", "r", "r" } },
>> +    { INDEX_op_movcond_i64, { "r", "r", "re", "r", "r" } },
>
> For the i32 variants, "ri" instead of "re" is enough.

Ah, quite right.


r~
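
For reference, the 2-byte versus 3-byte comparison above can be checked with a small encoder sketch. It covers only the no-REX case (registers 0..3 for the byte operand), and the function names are hypothetical, not the ones used in tcg-target.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ModRM byte for a register-direct operand pair (mod = 3).  Only
   registers 0..7 are handled, so no REX prefix is modeled here. */
static uint8_t modrm_reg(int reg, int rm)
{
    assert(reg >= 0 && reg < 8 && rm >= 0 && rm < 8);
    return (uint8_t)(0xc0 | (reg << 3) | rm);
}

/* xorl %r, %r: opcode 0x31 /r.  Returns the number of bytes written. */
static size_t emit_xor_self(uint8_t *buf, int r)
{
    buf[0] = 0x31;
    buf[1] = modrm_reg(r, r);
    return 2;
}

/* movzbl %r8, %r32 with the same register: opcode 0x0f 0xb6 /r.
   Restricted to registers 0..3, whose byte form needs no REX prefix. */
static size_t emit_movzbl_self(uint8_t *buf, int r)
{
    assert(r >= 0 && r < 4);
    buf[0] = 0x0f;
    buf[1] = 0xb6;
    buf[2] = modrm_reg(r, r);
    return 3;
}
```

One ordering detail this sketch does not model: XOR writes the flags, so in a setcc sequence the clear has to be emitted before the compare, and the destination must not alias either compare operand.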