From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:59834) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QLaWP-0003Yl-Ak for qemu-devel@nongnu.org; Sun, 15 May 2011 08:34:10 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QLaWN-0008GF-UM for qemu-devel@nongnu.org; Sun, 15 May 2011 08:34:09 -0400 Received: from mail-qy0-f173.google.com ([209.85.216.173]:42721) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QLaWN-0008GB-Q9 for qemu-devel@nongnu.org; Sun, 15 May 2011 08:34:07 -0400 Received: by qyk36 with SMTP id 36so1086778qyk.4 for ; Sun, 15 May 2011 05:34:07 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20110515113742.GJ30615@hall.aurel32.net> References: <20110514211616.GA13600@volta.aurel32.net> <20110514220447.GA30615@hall.aurel32.net> <20110515111428.GH30615@hall.aurel32.net> <20110515113742.GJ30615@hall.aurel32.net> From: Blue Swirl Date: Sun, 15 May 2011 15:33:47 +0300 Message-ID: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Aurelien Jarno Cc: Laurent Desnogues , qemu-devel On Sun, May 15, 2011 at 2:37 PM, Aurelien Jarno wrot= e: > On Sun, May 15, 2011 at 02:33:55PM +0300, Blue Swirl wrote: >> On Sun, May 15, 2011 at 2:14 PM, Aurelien Jarno w= rote: >> > On Sun, May 15, 2011 at 02:03:40PM +0300, Blue Swirl wrote: >> >> On Sun, May 15, 2011 at 12:27 PM, Laurent Desnogues >> >> wrote: >> >> > On Sun, May 15, 2011 at 9:15 AM, Blue Swirl = wrote: >> >> >> On Sun, May 15, 2011 at 1:04 AM, Aurelien Jarno wrote: >> >> >>> On Sun, May 15, 2011 at 12:52:35AM +0300, Blue Swirl wrote: >> >> >>>> On Sun, May 15, 2011 at 12:16 AM, Aurelien Jarno wrote: >> >> >>>> > On Sat, May 14, 2011 at 10:35:20PM +0300, Blue Swirl wrote: >> >> > [...] >> >> >>>> > The env register is used very often (basically for every load/= store, but >> >> >>>> > also a lot of helpers), so it makes sense to reserve a registe= r for it. >> >> >>>> > >> >> >>>> > For what I understand from your patch series, you prefer to pa= ss this >> >> >>>> > register explicitly to TCG functions. This basically means thi= s TCG >> >> >>>> > global will be loaded to host register as soon as it is used, = but also >> >> >>>> > regularly, as globals are saved back to their canonical locati= on before >> >> >>>> > an helper or a load/store. >> >> >>>> > >> >> >>>> > So it seems that this patch series will just allowing the "env= register" >> >> >>>> > to change over time, though it will not spare one more registe= r for the >> >> >>>> > TCG code, and it will emit longer TCG code to regularly reload= the env >> >> >>>> > global into a host register. >> >> >>>> >> >> >>>> But there will be one more register available in some cases. In = other >> >> >>> >> >> >>> Inside the TCG code, it will basically happens very rarely, given >> >> >>> load/store are really the most used instructions, and they need t= o load >> >> >>> the env register. >> >> >> >> >> >> Not exactly, from a sample run with -d op_opt: >> >> >> $ egrep -v -e '^$' -v -e 'OP after' -v -e ' end' -v -e 'Search PC' >> >> >> /tmp/qemu.log | awk '{print $1}' | sort | uniq -c|sort -rn >> >> >> 1673966 movi_i32 >> >> >> =C2=A0653931 ld_i32 >> >> >> =C2=A0607432 mov_i32 >> >> >> =C2=A0428684 st_i32 >> >> >> =C2=A0326878 movi_i64 >> >> >> =C2=A0308626 add_i32 >> >> >> =C2=A0283186 call >> >> >> =C2=A0256817 exit_tb >> >> >> =C2=A0207232 nopn >> >> >> =C2=A0189388 goto_tb >> >> >> =C2=A0122398 and_i32 >> >> >> =C2=A0117997 shr_i32 >> >> >> =C2=A089107 qemu_ld32 >> >> >> =C2=A082926 set_label >> >> >> =C2=A082713 brcond_i32 >> >> >> =C2=A067169 qemu_st32 >> >> >> =C2=A055109 or_i32 >> >> >> =C2=A046536 ext32u_i64 >> >> >> =C2=A044288 xor_i32 >> >> >> =C2=A038103 sub_i32 >> >> >> =C2=A026361 shl_i32 >> >> >> =C2=A023218 shl_i64 >> >> >> =C2=A023218 qemu_st64 >> >> >> =C2=A023218 or_i64 >> >> >> =C2=A020474 shr_i64 >> >> >> =C2=A020445 qemu_ld64 >> >> >> =C2=A011161 qemu_ld8u >> >> >> =C2=A010409 qemu_st8 >> >> >> =C2=A0 5013 qemu_ld16u >> >> >> =C2=A0 3795 qemu_st16 >> >> >> =C2=A0 2776 qemu_ld8s >> >> >> =C2=A0 1915 sar_i32 >> >> >> =C2=A0 1414 qemu_ld16s >> >> >> =C2=A0 =C2=A0839 not_i32 >> >> >> =C2=A0 =C2=A0579 setcond_i32 >> >> >> =C2=A0 =C2=A0213 br >> >> >> =C2=A0 =C2=A0 42 ext32s_i64 >> >> >> =C2=A0 =C2=A0 30 mul_i64 >> >> > >> >> > Unless I missed something, this doesn't show the usage of >> >> > ld/st per TB, which is what Aur=C3=A9lien was looking for if I >> >> > understood correctly. =C2=A0All I can say is that you had at >> >> > most 256817 TB's and 234507 qemu_ld/st, so about one per >> >> > TB. >> >> >> >> The question was ratio of loads/stores to other instructions. The >> >> statistics are not per TB. There were about 174880 TBs. >> >> >> >> > Anyway I must be thick, because I fail to see how >> >> > generated code could access guest CPU registers without a >> >> > pointer to the CPU env :-) >> >> > >> >> > IIUC the SPARC translator uses ld_i32/st_i32 mainly for >> >> > accessing the guest CPU registers, which due to register >> >> > windows is held in a dedicated global temp. =C2=A0Is that >> >> > correct? =C2=A0If so this is kind of hiding accesses to the >> >> > CPU env; =C2=A0all other targets read/write registers by using >> >> > CPU env (through the use global temps in most cases). >> >> > >> >> > So I think most (if not almost all) TB will need a pointer >> >> > to CPU env, which is why I think Aur=C3=A9lien's proposal to >> >> > keep a dedicated register that'd be loaded in the prologue >> >> > is the only way to not degrade performance of the >> >> > generated code (I'd add that this dedicated register >> >> > should be the one defined by the ABI as holding the first >> >> > parameter value, if that's possible; =C2=A0I'm afraid this is >> >> > not necessarily a good idea). >> >> >> >> CPU env will be used, but the register could be made available for >> >> other uses too. >> >> >> > >> > What other uses? It is needed for almost every TB, so there is not oth= er >> > possible use. Let's take an example, a simple TB from mips on x86_64: >> > >> > |=C2=A0IN: start_kernel >> > |=C2=A00x80510958: =C2=A0jal =C2=A0 =C2=A0 =C2=A0 =C2=A00x8051d50c >> > | 0x8051095c: =C2=A0nop >> > | >> > | OP after liveness analysis: >> > |=C2=A0 movi_i32 ra,$0x80510960 >> > |=C2=A0 movi_i32 PC,$0x8051d50c >> > | =C2=A0exit_tb $0x0 >> > | =C2=A0end >> > >> > |=C2=A0OUT: [size=3D28] >> > | 0x40f22960: =C2=A0mov =C2=A0 =C2=A0$0x80510960,%ebp >> > | 0x40f22965: =C2=A0mov =C2=A0 =C2=A0%ebp,0x7c(%r14) >> > | 0x40f22969: =C2=A0mov =C2=A0 =C2=A0$0x8051d50c,%ebp >> > | 0x40f2296e: =C2=A0mov =C2=A0 =C2=A0%ebp,0x80(%r14) >> > |=C2=A00x40f22975: =C2=A0xor =C2=A0 =C2=A0%eax,%eax >> > | 0x40f22977: =C2=A0jmpq =C2=A0 0x115042e >> > >> > x86_64 uses r14 as TCG_AREG0. Despite the instructions being quite >> > simple (only 2 movi_i32), the resulting code makes 2 access to env to >> > save the two registers. Having to reload the env pointer each time to = a >> > register would clearly increase the size of this TB. >> >> I don't think TCG would be that simple, instead the pointer would be >> loaded only once in this case. >> >> > Another question, imagine env is not in a register anymore. What kind = of >> > code do you plan to use to load the env pointer into a register? This = is >> > something we need to know to evaluate the cost of not having a fixed >> > register for env withing the TCG code. >> >> The prologue could save it to the first temp dedicated for this. >> > > So basically you mean you want to keep a dedicated temp for env in the > TCG generated code??? It seems to go in the opposite of your previous > mails stating a full elimination of TCG_AREG0. I meant the first stack/temp_buf temp.