Date: Wed, 29 Aug 2007 19:29:36 +0200 (CEST)
From: Michael Matz
Subject: Re: [Qemu-devel] [patch] make qemu work with GCC 4
In-Reply-To: <200708291606.14173.paul@codesourcery.com>
References: <200708291606.14173.paul@codesourcery.com>
To: Paul Brook
Cc: qemu-devel@nongnu.org, Alexander Graf

Hi,

On Wed, 29 Aug 2007, Paul Brook wrote:

> > I solved that by placing one of the T[012] operands into memory
> > for HOST_I386, thereby freeing one reg.
> > Here's some justification of why that doesn't really cost
> > performance: with three free regs GCC is already spilling like mad
> > in the snippets, we just trade one of those memory accesses (to
> > stack) with one other mem access to the cpu_state structure, which
> > will be in cache.
>
> Do you have any evidence to support this claim?

Not really, only an apples-and-oranges comparison.  A 10000-iteration
tests/sha1 run in the same Linux image, with -no-kqemu, on host and
target i386:

time ./sha1
  with qemu-0.8.2 (compiled by gcc 3.3-hammer):          7.92 seconds
  with qemu-0.9.0-cvs (gcc4.1 compiled, with the patch): 8.15 seconds

I'll try to get a better comparison.  Note, though, that this is the only
easy solution.  Any other solution would either involve improving reload
pretty much, or rewriting many of the op.c patterns (for all targets) to
never require more than three registers, and ensuring that gcc doesn't
become clever and mash some insns together again (and in trying to do
that probably slow down the snippets again).  Or doing away with dyngen
altogether.  All three solutions are fairly involved, so I'm personally
fine with the above slowdown (for i386 targets you won't normally get any
noticeable slowdown as you'd use kqemu).

> Last time I did this it caused a significant performance hit. I'd guess
> that most common ops are simple enough that we don't need more than 3
> registers.

For the i386 target I'm redefining only T1 (as I figured that A0 and T0
are used a bit more).  It might be that some of the code could be
generated in a way that doesn't make use of T1 too much; I haven't really
poked at that.

> > --- qemu-0.9.0.cvs.orig/softmmu_header.h
> > -                  : "%eax", "%ecx", "%edx", "memory", "cc");
> > +                  : "%eax", "%edx", "memory", "cc");
>
> This change is wrong.
> The inline asm calls C code which clobbers %ecx.

Indeed, must have been blind.  Okay, these are too many clobbers for poor
gcc4 nevertheless, so a push/pop of %ecx around the call needs to be done.
Curious that this didn't lead to fire all over the place.  See patch
below.


Ciao,
Michael.

Index: softmmu_header.h
===================================================================
RCS file: /sources/qemu/qemu/softmmu_header.h,v
retrieving revision 1.15
diff -u -p -r1.15 softmmu_header.h
--- softmmu_header.h	23 May 2007 19:58:10 -0000	1.15
+++ softmmu_header.h	29 Aug 2007 17:27:37 -0000
@@ -230,9 +230,11 @@ static inline void glue(glue(st, SUFFIX)
 #else
 #error unsupported size
 #endif
+                  "pushl %%ecx\n"
                   "pushl %6\n"
                   "call %7\n"
                   "popl %%eax\n"
+                  "popl %%ecx\n"
                   "jmp 2f\n"
                   "1:\n"
                   "addl 8(%%edx), %%eax\n"
@@ -250,14 +252,18 @@ static inline void glue(glue(st, SUFFIX)
                   : "r" (ptr),
 /* NOTE: 'q' would be needed as constraint, but we could not use it
    with T1 ! */
+#if DATA_SIZE == 1 || DATA_SIZE == 2
+                  "q" (v),
+#else
                   "r" (v),
+#endif
                   "i" ((CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS),
                   "i" (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS),
                   "i" (TARGET_PAGE_MASK | (DATA_SIZE - 1)),
                   "m" (*(uint32_t *)offsetof(CPUState, tlb_table[CPU_MEM_INDEX][0].addr_write)),
                   "i" (CPU_MEM_INDEX),
                   "m" (*(uint8_t *)&glue(glue(__st, SUFFIX), MMUSUFFIX))
-                  : "%eax", "%ecx", "%edx", "memory", "cc");
+                  : "%eax", "%edx", "memory", "cc");
 }
 
 #else
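
A note on the "q" constraint the patch selects for DATA_SIZE 1 and 2: on
i386 only %eax, %ebx, %ecx and %edx have a byte-addressable low part, so
a narrow store needs its value in one of those.  The following is only a
minimal sketch to illustrate that, not code from QEMU; store_low_byte is
a hypothetical helper, and the non-x86 branch is an assumption added so
the snippet builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of QEMU): store the low byte of v to *dst. */
static inline void store_low_byte(uint8_t *dst, uint32_t v)
{
    assert(dst != NULL);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "q" asks for a register that has a byte sub-register (on i386:
       a, b, c or d).  The %b1 modifier prints that byte register's
       name, e.g. %al, so movb gets a valid 8-bit source operand. */
    __asm__ volatile("movb %b1, %0" : "=m" (*dst) : "q" (v));
#else
    /* Portable fallback for other hosts or compilers. */
    *dst = (uint8_t)v;
#endif
}
```

With a plain "r" constraint the compiler would be free to pick %esi or
%edi on i386, which have no byte form, and the assembler would then
reject the movb.  Calling store_low_byte(&out, 0x12345678) on a uint8_t
out leaves 0x78 in it.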