From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1IQRML-0001qp-Ic
	for qemu-devel@nongnu.org; Wed, 29 Aug 2007 13:29:41 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1IQRMJ-0001mX-QL
	for qemu-devel@nongnu.org; Wed, 29 Aug 2007 13:29:40 -0400
Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1IQRMJ-0001mF-D1
	for qemu-devel@nongnu.org; Wed, 29 Aug 2007 13:29:39 -0400
Received: from mx2.suse.de ([195.135.220.15])
	by monty-python.gnu.org with esmtps
	(TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60)
	(envelope-from <matz@suse.de>) id 1IQRMI-0008Kg-Rg
	for qemu-devel@nongnu.org; Wed, 29 Aug 2007 13:29:39 -0400
Date: Wed, 29 Aug 2007 19:29:36 +0200 (CEST)
From: Michael Matz <matz@suse.de>
Subject: Re: [Qemu-devel] [patch] make qemu work with GCC 4
In-Reply-To: <200708291606.14173.paul@codesourcery.com>
Message-ID: <Pine.LNX.4.64.0708291842500.23011@wotan.suse.de>
References: <Pine.LNX.4.64.0708282124590.23011@wotan.suse.de>
	<200708291606.14173.paul@codesourcery.com>
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED;
	BOUNDARY="168427776-1581796550-1188408576=:23011"
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paul Brook <paul@codesourcery.com>
Cc: qemu-devel@nongnu.org, Alexander Graf <agraf@suse.de>

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--168427776-1581796550-1188408576=:23011
Content-Type: TEXT/PLAIN; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hi,

On Wed, 29 Aug 2007, Paul Brook wrote:

> >        I solved that by placing one of the T[012] operands into memory
> >        for HOST_I386, thereby freeing one reg.  Here's some justification
> >        of why that doesn't really cost performance: with three free regs
> >        GCC is already spilling like mad in the snippets, we just trade one
> >        of those memory accesses (to stack) with one other mem access to
> >        the cpu_state structure, which will be in cache.
>
> Do you have any evidence to support this claim?

Not really, only an apples-to-oranges comparison: a 10000-iteration
tests/sha1 run in the same Linux image, with -no-kqemu, with host and
target both i386:  time ./sha1

with qemu-0.8.2 (compiled by gcc 3.3-hammer): 7.92 seconds
with qemu-0.9.0-cvs (compiled by gcc 4.1, with the patch): 8.15 seconds
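
In relative terms the two timings differ by roughly 2.9% (0.23 s on 7.92 s). Since the two builds differ in both QEMU version and compiler, that number only bounds the patch's cost loosely; it is not a measurement of the patch alone. A minimal C helper for the arithmetic (the function name is illustrative):

```c
#include <assert.h>

/* Relative slowdown of t_new compared with t_base, in percent.
   A positive result means t_new is slower. */
static double relative_slowdown_pct(double t_base, double t_new)
{
    assert(t_base > 0.0);   /* a zero or negative baseline makes no sense */
    return (t_new - t_base) / t_base * 100.0;
}
```

For the sha1 runs above, relative_slowdown_pct(7.92, 8.15) gives about 2.9.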

I'll try to get a better comparison.  Note, though, that this is the only
easy solution.  Any other solution would involve either improving reload
substantially, or rewriting many of the op.c patterns (for all targets) to
never require more than three registers, while making sure gcc doesn't
become clever and merge some insns together again (and, in trying to do
that, probably slowing down the snippets again).  Or doing away with dyngen
altogether.  All three solutions are fairly involved, so I'm personally fine
with the above slowdown (for i386 targets you normally won't notice any
slowdown, as you'd use kqemu).

> Last time I did this it caused a significant performance hit. I'd guess
> that most common ops are simple enough that we don't need more than 3
> registers.

For the i386 target I'm redefining only T1 (as I figured that A0 and T0
are used a bit more).  It might be that some of the code could be generated
in a way that doesn't use T1 as much; I haven't really poked at that.
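
To make "T1 lives in memory" concrete, here is a portable sketch of the idea. In the dyngen design, env and the T registers are normally GCC global register variables pinned to host registers; this sketch uses plain globals so it builds anywhere, and CPUStateSketch, its t1 field, and both op names are hypothetical, not the actual patch. The point is only that once T1 names a field of *env, every use becomes a load or store relative to env (a structure that should be cache-hot) instead of tying up a host register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the guest CPU state. */
typedef struct CPUStateSketch {
    uint32_t regs[8];
    uint32_t t1;        /* hypothetical memory slot replacing a T1 register */
} CPUStateSketch;

/* In the real code these would be global register variables;
   plain globals keep the sketch portable. */
static CPUStateSketch *env;
static uint32_t T0;

/* T1 now names a field of *env: each use is a memory access via env. */
#define T1 (env->t1)

/* Toy micro-ops in the style of op.c; passing the register index as a
   C argument is a simplification for this sketch. */
static void op_movl_T1_reg(int r)
{
    assert(env != NULL && r >= 0 && r < 8);
    T1 = env->regs[r];
}

static void op_addl_T0_T1(void)
{
    T0 += T1;
}
```

With env pointing at a state whose regs[3] is 40 and T0 set to 2, running op_movl_T1_reg(3) then op_addl_T0_T1() leaves T0 at 42, with the intermediate value held in env->t1 rather than in a register.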

> > --- qemu-0.9.0.cvs.orig/softmmu_header.h
> > -                  : "%eax", "%ecx", "%edx", "memory", "cc");
> > +                  : "%eax", "%edx", "memory", "cc");
>
> This change is wrong. The inline asm calls C code which clobbers %ecx.

Indeed, I must have been blind.  Even so, that's too many clobbers for
poor gcc4, so %ecx needs to be saved with a push/pop around the call.
Curious that this didn't set things on fire all over the place.  See the
patch below.
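
The rule behind the fix can be written down as a small check. Under the i386 System V calling convention, %eax, %ecx and %edx are caller-saved, so an inline asm that issues a call must either declare them clobbered or preserve them itself. The toy C model below (all names are illustrative, not QEMU code, and it ignores operand constraints entirely) encodes the three variants in this thread: the original clobber list, the first patch that dropped %ecx, and the revised patch that saves %ecx with pushl/popl.

```c
#include <assert.h>

/* Toy model: i386 general-purpose registers as bit flags. */
enum {
    EAX = 1 << 0, ECX = 1 << 1, EDX = 1 << 2, EBX = 1 << 3,
    ESI = 1 << 4, EDI = 1 << 5, EBP = 1 << 6
};

#define ALL_GPRS     (EAX | ECX | EDX | EBX | ESI | EDI | EBP)

/* i386 System V ABI: a called C function may overwrite these. */
#define CALLER_SAVED (EAX | ECX | EDX)

/* An asm block that calls C code is safe only if every caller-saved
   register is either listed as clobbered or saved/restored inside the
   asm itself (e.g. pushl/popl around the call). */
static int call_is_safe(unsigned clobbers, unsigned saved_in_asm)
{
    assert(((clobbers | saved_in_asm) & ~(unsigned)ALL_GPRS) == 0);
    return ((clobbers | saved_in_asm) & CALLER_SAVED) == CALLER_SAVED;
}
```

call_is_safe(EAX | ECX | EDX, 0) and call_is_safe(EAX | EDX, ECX) both return 1, while call_is_safe(EAX | EDX, 0), the first version of the patch, returns 0. Saving %ecx inside the asm frees it for gcc's register allocator everywhere except across the call itself.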


Ciao,
Michael.

Index: softmmu_header.h
===================================================================
RCS file: /sources/qemu/qemu/softmmu_header.h,v
retrieving revision 1.15
diff -u -p -r1.15 softmmu_header.h
--- softmmu_header.h	23 May 2007 19:58:10 -0000	1.15
+++ softmmu_header.h	29 Aug 2007 17:27:37 -0000
@@ -230,9 +230,11 @@ static inline void glue(glue(st, SUFFIX)
 #else
 #error unsupported size
 #endif
+		  "pushl %%ecx\n"
                   "pushl %6\n"
                   "call %7\n"
                   "popl %%eax\n"
+                  "popl %%ecx\n"
                   "jmp 2f\n"
                   "1:\n"
                   "addl 8(%%edx), %%eax\n"
@@ -250,14 +252,18 @@ static inline void glue(glue(st, SUFFIX)
                   : "r" (ptr), 
 /* NOTE: 'q' would be needed as constraint, but we could not use it
    with T1 ! */
+#if DATA_SIZE == 1 || DATA_SIZE == 2
+		  "q" (v),
+#else
                   "r" (v), 
+#endif
                   "i" ((CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS), 
                   "i" (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS), 
                   "i" (TARGET_PAGE_MASK | (DATA_SIZE - 1)),
                   "m" (*(uint32_t *)offsetof(CPUState, tlb_table[CPU_MEM_INDEX][0].addr_write)),
                   "i" (CPU_MEM_INDEX),
                   "m" (*(uint8_t *)&glue(glue(__st, SUFFIX), MMUSUFFIX))
-                  : "%eax", "%ecx", "%edx", "memory", "cc");
+                  : "%eax", "%edx", "memory", "cc");
 }
 
 #else
--168427776-1581796550-1188408576=:23011--