From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:39191) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QLc2Q-0007bQ-36 for qemu-devel@nongnu.org; Sun, 15 May 2011 10:11:19 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QLc2O-0007HD-Bf for qemu-devel@nongnu.org; Sun, 15 May 2011 10:11:18 -0400 Received: from mail-qy0-f180.google.com ([209.85.216.180]:40230) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QLc2N-0007GZ-Vs for qemu-devel@nongnu.org; Sun, 15 May 2011 10:11:16 -0400 Received: by qyk10 with SMTP id 10so2286278qyk.4 for ; Sun, 15 May 2011 07:11:15 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20110515140616.GR30615@hall.aurel32.net> References: <20110515092449.GD30615@hall.aurel32.net> <20110515101941.GF30615@hall.aurel32.net> <20110515113609.GI30615@hall.aurel32.net> <20110515124942.GN30615@hall.aurel32.net> <20110515134337.GP30615@hall.aurel32.net> <20110515140616.GR30615@hall.aurel32.net> From: Blue Swirl Date: Sun, 15 May 2011 17:10:54 +0300 Message-ID: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Aurelien Jarno Cc: qemu-devel On Sun, May 15, 2011 at 5:06 PM, Aurelien Jarno wrot= e: > On Sun, May 15, 2011 at 05:02:37PM +0300, Blue Swirl wrote: >> On Sun, May 15, 2011 at 4:43 PM, Aurelien Jarno w= rote: >> > On Sun, May 15, 2011 at 04:16:20PM +0300, Blue Swirl wrote: >> >> On Sun, May 15, 2011 at 3:49 PM, Aurelien Jarno wrote: >> >> > On Sun, May 15, 2011 at 03:30:17PM +0300, Blue Swirl wrote: >> >> >> On Sun, May 15, 2011 at 2:36 PM, Aurelien Jarno wrote: >> >> >> > On Sun, May 15, 2011 at 02:12:27PM +0300, Blue Swirl wrote: >> >> >> >> On Sun, May 15, 2011 at 1:19 PM, Aurelien Jarno wrote: >> >> >> >> > I still don't get where are this list of possible changes? Di= d I miss >> >> >> >> > something in another thread? >> >> >> >> >> >> >> >> I'm referring to the patches I sent. >> >> >> > >> >> >> > Ok, patches 1, 2 and 4 to 7 looks ok at a first glance, though I= think >> >> >> > patches 6 and 7 should be done for all hosts or none of them. >> >> >> >> >> >> The changes can be done in steps, but of course removing temp_buf = from >> >> >> CPUState would need all targets to be converted first. >> >> >> >> >> >> >> > On the TCG generated code, the env structure is used almost f= or every >> >> >> >> > op, so it really makes sense to keep it in a register instead= of having to >> >> >> >> > reload the address of env regularly from memory. Given it onl= y affects >> >> >> >> > TCG generated code, I don't see the point of portability here= . >> >> >> >> >> >> >> >> For example, maybe the bugs in Sparc glibc could be avoided by = using >> >> >> >> one of %i set of registers (not accessible from helpers) for AR= EG0 >> >> >> >> within generated code instead of %g registers which seem to be >> >> >> >> fragile. >> >> >> > >> >> >> > First of all, but it's a different subject, I am not sure there = are >> >> >> > sparc glibc bugs, I'd rather says QEMU mis-uses some register. F= or >> >> >> > example the following code is probably wrong: >> >> >> > >> >> >> > /* Note: must be synced with dyngen-exec.h */ >> >> >> > #ifdef CONFIG_SOLARIS >> >> >> > #define TCG_AREG0 TCG_REG_G2 >> >> >> > #elif defined(__sparc_v9__) >> >> >> > #define TCG_AREG0 TCG_REG_G5 >> >> >> > #else >> >> >> > #define TCG_AREG0 TCG_REG_G6 >> >> >> > #endif >> >> >> > >> >> >> > __sparc_v9__ can set on the 32-bit ABI, when the compiler target= s V8+, >> >> >> > so the condition is probably wrong there. Secondly the SPARC ABI= [1] on >> >> >> > page 23 that %g5 to %g7 are reserved for system. I don't think Q= EMU has >> >> >> > the right to use this registers. >> >> >> >> >> >> Yes, but the situation is not so nice. Please see this post for st= atus >> >> >> as of 2010: >> >> >> http://article.gmane.org/gmane.comp.emulators.qemu/63610 >> >> >> >> >> >> This is from Debian glibc 2.11.2-10: >> >> >> $ file /lib/libc-2.11.2.so >> >> >> /lib/libc-2.11.2.so: ELF 32-bit MSB shared object, SPARC32PLUS, >> >> >> version 1 (SYSV), dynamically linked (uses shared libs), for GNU/L= inux >> >> >> 2.6.18, stripped >> >> >> $ objdump -d /lib/libc.so.6 |grep %g1|wc -l >> >> >> 69648 >> >> >> $ objdump -d /lib/libc.so.6 |grep %g2|wc -l >> >> >> 37299 >> >> >> $ objdump -d /lib/libc.so.6 |grep %g3|wc -l >> >> >> 20635 >> >> >> $ objdump -d /lib/libc.so.6 |grep %g4|wc -l >> >> >> 11603 >> >> >> $ objdump -d /lib/libc.so.6 |grep %g5|wc -l >> >> >> 448 >> >> >> $ objdump -d /lib/libc.so.6 |grep %g6|wc -l >> >> >> 150 >> >> >> $ objdump -d /lib/libc.so.6 |grep %g7|wc -l >> >> >> 3052 >> >> >> >> >> >> Glibc is compiled for Sparc32plus, so it should only use %g6 and %= g7, >> >> > >> >> > From the calling convention point of view, sparc32 and sparc32plus = are >> >> > the same ABI, so %g5 is also reserved for system use. >> >> > >> >> >> or %g1 and %g5 for scratch purposes. However, it is the applicatio= n >> >> >> registers %g2 to %g4 that are used heaviest. Looking inside the >> >> >> objdump it's easy to see that the uses are not for example saving = or >> >> >> restoring, but actually using them without saving the previous val= ue >> >> >> first: >> >> > >> >> > Well, we have to define system and application. System is defined a= s >> >> > library in Chapter 6, and I don't see the libc there, and is probab= ly >> >> > considered as part of the application. >> >> >> >> No, for example unistd.h is described and even X11. GCC also says tha= t >> >> libraries should be compiled without using the registers: >> >> >> >> http://gcc.gnu.org/onlinedocs/gcc-4.6.0/gcc/SPARC-Options.html#SPARC-= Options >> >> >> >> >> 000211e0 <__divdi3>: >> >> >> =C2=A0 =C2=A0211e0: =C2=A0 =C2=A0 =C2=A0 9d e3 bf a0 =C2=A0 =C2=A0= save =C2=A0%sp, -96, %sp >> >> >> =C2=A0 =C2=A0211e4: =C2=A0 =C2=A0 =C2=A0 90 10 00 18 =C2=A0 =C2=A0= mov =C2=A0%i0, %o0 >> >> >> =C2=A0 =C2=A0211e8: =C2=A0 =C2=A0 =C2=A0 92 10 00 19 =C2=A0 =C2=A0= mov =C2=A0%i1, %o1 >> >> >> =C2=A0 =C2=A0211ec: =C2=A0 =C2=A0 =C2=A0 94 10 00 1a =C2=A0 =C2=A0= mov =C2=A0%i2, %o2 >> >> >> =C2=A0 =C2=A0211f0: =C2=A0 =C2=A0 =C2=A0 96 10 00 1b =C2=A0 =C2=A0= mov =C2=A0%i3, %o3 >> >> >> =C2=A0 =C2=A0211f4: =C2=A0 =C2=A0 =C2=A0 80 a6 20 00 =C2=A0 =C2=A0= cmp =C2=A0%i0, 0 >> >> >> =C2=A0 =C2=A0211f8: =C2=A0 =C2=A0 =C2=A0 06 40 00 10 =C2=A0 =C2=A0= bl,pn =C2=A0 %icc, 21238 <__divdi3+0x58> >> >> >> =C2=A0 =C2=A0211fc: =C2=A0 =C2=A0 =C2=A0 a0 10 20 00 =C2=A0 =C2=A0= clr =C2=A0%l0 >> >> >> =C2=A0 =C2=A021200: =C2=A0 =C2=A0 =C2=A0 80 a2 a0 00 =C2=A0 =C2=A0= cmp =C2=A0%o2, 0 >> >> >> =C2=A0 =C2=A021204: =C2=A0 =C2=A0 =C2=A0 26 40 00 13 =C2=A0 =C2=A0= bl,a,pn =C2=A0 %icc, 21250 <__divdi3+0x70> >> >> >> =C2=A0 =C2=A021208: =C2=A0 =C2=A0 =C2=A0 a0 38 00 10 =C2=A0 =C2=A0= xnor =C2=A0%g0, %l0, %l0 >> >> >> =C2=A0 =C2=A02120c: =C2=A0 =C2=A0 =C2=A0 7f ff fe ed =C2=A0 =C2=A0= call =C2=A020dc0 <__ashldi3+0x40> >> >> >> =C2=A0 =C2=A021210: =C2=A0 =C2=A0 =C2=A0 98 10 20 00 =C2=A0 =C2=A0= clr =C2=A0%o4 >> >> >> =C2=A0 =C2=A021214: =C2=A0 =C2=A0 =C2=A0 84 10 00 08 =C2=A0 =C2=A0= mov =C2=A0%o0, %g2 >> >> >> >> >> >> ...whoops... >> >> >> >> >> >> =C2=A0 =C2=A021218: =C2=A0 =C2=A0 =C2=A0 80 a4 20 00 =C2=A0 =C2=A0= cmp =C2=A0%l0, 0 >> >> >> =C2=A0 =C2=A02121c: =C2=A0 =C2=A0 =C2=A0 02 40 00 04 =C2=A0 =C2=A0= be,pn =C2=A0 %icc, 2122c <__divdi3+0x4c> >> >> >> =C2=A0 =C2=A021220: =C2=A0 =C2=A0 =C2=A0 86 10 00 09 =C2=A0 =C2=A0= mov =C2=A0%o1, %g3 >> >> >> >> >> >> ...whoops... >> >> >> >> >> >> =C2=A0 =C2=A021224: =C2=A0 =C2=A0 =C2=A0 86 a0 00 09 =C2=A0 =C2=A0= subcc =C2=A0%g0, %o1, %g3 >> >> >> =C2=A0 =C2=A021228: =C2=A0 =C2=A0 =C2=A0 84 60 00 02 =C2=A0 =C2=A0= subc =C2=A0%g0, %g2, %g2 >> >> >> =C2=A0 =C2=A02122c: =C2=A0 =C2=A0 =C2=A0 b2 10 00 03 =C2=A0 =C2=A0= mov =C2=A0%g3, %i1 >> >> >> =C2=A0 =C2=A021230: =C2=A0 =C2=A0 =C2=A0 81 cf e0 08 =C2=A0 =C2=A0= rett =C2=A0%i7 + 8 >> >> >> =C2=A0 =C2=A021234: =C2=A0 =C2=A0 =C2=A0 90 10 00 02 =C2=A0 =C2=A0= mov =C2=A0%g2, %o0 >> >> >> =C2=A0 =C2=A021238: =C2=A0 =C2=A0 =C2=A0 92 a0 00 19 =C2=A0 =C2=A0= subcc =C2=A0%g0, %i1, %o1 >> >> >> =C2=A0 =C2=A02123c: =C2=A0 =C2=A0 =C2=A0 90 60 00 18 =C2=A0 =C2=A0= subc =C2=A0%g0, %i0, %o0 >> >> >> =C2=A0 =C2=A021240: =C2=A0 =C2=A0 =C2=A0 80 a2 a0 00 =C2=A0 =C2=A0= cmp =C2=A0%o2, 0 >> >> >> =C2=A0 =C2=A021244: =C2=A0 =C2=A0 =C2=A0 16 4f ff f2 =C2=A0 =C2=A0= bge =C2=A0%icc, 2120c <__divdi3+0x2c> >> >> >> =C2=A0 =C2=A021248: =C2=A0 =C2=A0 =C2=A0 a0 10 3f ff =C2=A0 =C2=A0= mov =C2=A0-1, %l0 >> >> >> =C2=A0 =C2=A02124c: =C2=A0 =C2=A0 =C2=A0 a0 38 00 10 =C2=A0 =C2=A0= xnor =C2=A0%g0, %l0, %l0 >> >> >> =C2=A0 =C2=A021250: =C2=A0 =C2=A0 =C2=A0 96 a0 00 0b =C2=A0 =C2=A0= subcc =C2=A0%g0, %o3, %o3 >> >> >> =C2=A0 =C2=A021254: =C2=A0 =C2=A0 =C2=A0 10 6f ff ee =C2=A0 =C2=A0= b =C2=A0%xcc, 2120c <__divdi3+0x2c> >> >> >> =C2=A0 =C2=A021258: =C2=A0 =C2=A0 =C2=A0 94 60 00 0a =C2=A0 =C2=A0= subc =C2=A0%g0, %o2, %o2 >> >> >> =C2=A0 =C2=A02125c: =C2=A0 =C2=A0 =C2=A0 01 00 00 00 =C2=A0 =C2=A0= nop >> >> >> >> >> >> This is libc from OpenBSD/Sparc64 4.9: >> >> >> $ objdump -d /usr/lib/libc.so.58.0 |grep %g1|wc -l >> >> >> =C2=A0 =C2=A040562 >> >> >> $ objdump -d /usr/lib/libc.so.58.0 |grep %g2|wc -l >> >> >> =C2=A0 =C2=A020384 >> >> >> $ objdump -d /usr/lib/libc.so.58.0 |grep %g3|wc -l >> >> >> =C2=A0 =C2=A010240 >> >> >> $ objdump -d /usr/lib/libc.so.58.0 |grep %g4|wc -l >> >> >> =C2=A0 =C2=A0 6606 >> >> >> $ objdump -d /usr/lib/libc.so.58.0 |grep %g5|wc -l >> >> >> =C2=A0 =C2=A0 3811 >> >> >> $ objdump -d /usr/lib/libc.so.58.0 |grep %g6|wc -l >> >> >> =C2=A0 =C2=A0 =C2=A0 =C2=A04 >> >> >> $ objdump -d /usr/lib/libc.so.58.0 |grep %g7|wc -l >> >> >> =C2=A0 =C2=A0 =C2=A0 20 >> >> >> >> >> >> Not so great there either. >> >> >> >> >> >> > Anyway, I don't see why keeping TCG_AREG0 inside the TCG generat= ed code >> >> >> > would prevent you to use a register from the %i set for it. >> >> >> >> >> >> The helpers currently use global env register, but %i registers ca= n't >> >> >> be accessed from the next level of function call nesting hierarchy= so >> >> >> they can't be used for global env. >> >> >> >> >> > >> >> > That's the current situation yes. Using %i registers for TCG_AREG0 = does >> >> > mean you can't use a global env register in the helpers, but it doe= sn't >> >> > mean that internal TCG code can't use them for TCG_AREG0. >> >> >> >> Exactly. >> >> >> >> > What I am telling you since the beginning is that: >> >> > - I have no objection that we stop using a fixed register in GCC >> >> > =C2=A0generated code (that is completely removing HELPER_CFLAGS). H= owever I >> >> > =C2=A0don't really see the point of doing that, though the Sparc is= sue might >> >> > =C2=A0be an argument. >> >> > - I do have objection to remove TCG_AREG0 from inside the TCG gener= ated >> >> > =C2=A0code, this register is used for almost every TCG op, and I do= n't see >> >> > =C2=A0any real argument for not keeping it. >> >> >> >> I'm pretty much open at this point for all alternatives. >> >> >> > >> > So what about getting rid of TCG_AREG0 for GCC generated code only, at >> > least as a first step? >> > >> > So what about the following changes: >> > - Change TCG_AREG0 of all targets to a callee saved register (if >> > =C2=A0possible, e.g. sparc) >> > - Change the prologue of all TCG targets to take env as an argument, >> > =C2=A0and save it into TCG_AREG0. >> >> This can be the first step. >> >> > - Change all helpers to explicitly take an env pointer instead of usin= g >> > =C2=A0the fixed register. Note that it also includes softmmu helpers, = but >> > =C2=A0the TCG load/store instructions should be kept unchanged. >> >> I think this step will lose performance slightly if TCG is not changed. > > Agreed. What do you have in mind by "if TCG is not changed"? If the register is still fixed in generated code, it would be easily available for the helpers as well in for of global env. Not taking advantage of this should be a minor loss, which is offset with one register more for GCC at the next step. >> > - Remove HELPER_CFLAGS from makefiles when all helpers have been >> > =C2=A0changed. >> >> This should restore most of the performance loss from previous step, >> maybe even improve. >> >> > - TCG_AREG0 can then be changed to another register if needed. >> >> I'd combine this with callee saved register change. Anyway, at this >> point there should be a lot of flexibility with the register choice. >> >> > And later we can do more steps to get a complete removal of TCG_AREG0, >> > including in TCG code, though I still think it is a really bad idea. >> >> Maybe. At this point most of the hard work has been done, so it's >> possible to make experiments. > > That's exactly my point. We more or less agree on previous step, not on > this one, so we will need to experiment for that. OK. Also, most of the cleanup benefits will have been achieved by this poin= t.