Date: Sun, 9 Jan 2011 23:40:03 +0100
From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH 3/3] tcg/arm: improve constant loading
Message-ID: <20110109224002.GC21189@volta.aurel32.net>
References: <1294350874-6885-1-git-send-email-aurelien@aurel32.net> <1294350874-6885-3-git-send-email-aurelien@aurel32.net> <20110107144035.GA18176@hall.aurel32.net>
To: andrzej zaborowski
Cc: qemu-devel@nongnu.org

On Fri, Jan 07, 2011 at 04:56:32PM +0100, andrzej zaborowski wrote:
> On 7 January 2011 15:40, Aurelien Jarno wrote:
> > On Fri, Jan 07, 2011 at 01:52:25PM +0100, andrzej zaborowski wrote:
> >> Hi,
> >>
> >> On 6 January 2011 22:54, Aurelien Jarno wrote:
> >> > Improve constant loading in two ways:
> >> > - On all ARM versions, it's possible to load 0xffffff00 = -0x100 using
> >> >   the mvn rd, #0. Fix the conditions.
> >> > - On <= ARMv6 versions, where movw and movt are not available, load the
> >> >   constants using mov and orr with rotations depending on the constant
> >> >   to load. This is very useful for example to load constants where the
> >> >   low byte is 0. This reduces the generated code size by about 7%.
> >>
> >> That's a nice improvement.
> >> For some instructions using MVN and AND
> >> could yield even shorter code and I think with that the optimisation
> >> options (except loading from a constant pool) would be exhausted :)
> >
> > I also did something with MVN and BIC, it works well, but the problem is
> > to find the right heuristic to choose between MOV/ORR and MVN/BIC. In my
> > tries, it was making the code bigger.
>
> I was thinking of running both without writing the instructions, then
> comparing the lengths and then running the better method. It's
> possible that the cost of this outweighs the shorter code advantage
> though.
>
> >
> >> ...
> >> >         }
> >> > +    } else {
> >> > +        int opc = ARITH_MOV;
> >> > +        int rn = 0;
> >> > +
> >> > +        do {
> >> > +            int i, rot;
> >> > +
> >> > +            i = ctz32(arg) & ~1;
> >> > +            rot = ((32 - i) << 7) & 0xf00;
> >> > +            tcg_out_dat_imm(s, cond, opc, rd, rn, ((arg >> i) & 0xff) | rot);
> >> > +            arg &= ~(0xff << i);
> >> > +
> >> > +            opc = ARITH_ORR;
> >> > +            rn = rd;
> >>
> >> I think you could get rid of rn and just use rd from the start of the
> >> loop.  Otherwise acked by me too.
> >>
> >
> > What do you mean exactly? rn has to be 0 when opc is ARITH_MOV in order
> > to generate a correct ARM instruction.
>
> According to my ARM926 manual rn is ignored for MOV/MVN, perhaps it's
> different in later revisions.
>

I have just tried, and it actually works (tried on ARMv5 and ARMv7).
Note that binutils is not able to disassemble such an instruction and
outputs in qemu.log something like:

| 0x01000008: e3aa50ff undefined instruction 0xe3aa50ff

However, what worries me the most is that the "ARM Architecture Reference
Manual ARMv7-A and ARMv7-R edition" defines this opcode with the rn field
as "(0)(0)(0)(0)". Looking at what it means:

| An instruction is UNPREDICTABLE if:
| [...]
| * the pseudocode for that encoding does not indicate that a different
|   special case applies, and a bit marked (0) or (1) in the encoding
|   diagram of an instruction is not 0 or 1 respectively.

In short, is it still going to work on newer CPUs?

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net
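
A minimal sketch of the dry-run comparison suggested in the thread: count
how many instructions each strategy would need without emitting anything,
then pick the shorter one. It assumes the same greedy walk as the quoted
loop (8-bit chunks at even bit offsets) and assumes that MVN followed by
BIC can be costed by applying that walk to the inverted constant. The
names sketch_ctz32, sketch_chunks and sketch_prefer_mvn are hypothetical
and are not part of the patch or of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-trailing-zeros (hypothetical helper); v must be non-zero. */
static int sketch_ctz32(uint32_t v)
{
    int n = 0;

    assert(v != 0);
    while ((v & 1u) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/*
 * Number of 8-bit chunks at even bit offsets needed to cover arg,
 * following the same greedy walk as the quoted do/while loop.
 * Zero is counted as one instruction (a single MOV or MVN with #0).
 */
static int sketch_chunks(uint32_t arg)
{
    int n = 0;

    if (arg == 0) {
        return 1;
    }
    do {
        int i = sketch_ctz32(arg) & ~1;   /* round down to an even offset */
        arg &= ~((uint32_t)0xff << i);    /* drop the bits this chunk covers */
        n++;
    } while (arg != 0);
    return n;
}

/*
 * Returns 1 if the MVN/BIC sequence would be strictly shorter than
 * MOV/ORR for this constant, 0 otherwise (ties go to MOV/ORR).
 * Assumption: MVN/BIC builds the constant from the chunks of ~arg.
 */
static int sketch_prefer_mvn(uint32_t arg)
{
    return sketch_chunks((uint32_t)~arg) < sketch_chunks(arg);
}
```

With this sketch, 0xffffff00 costs three instructions with MOV/ORR but one
with MVN, while 0x12345678 costs four either way and stays on MOV/ORR.
The walk never uses immediates that wrap around bit 31 (0xf000000f, for
instance), so the counts are upper bounds rather than guaranteed minima.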