* [Qemu-devel] [patch] performance improvement (softmmu, x86, GCC 3)
@ 2004-07-28 14:24 Piotr Krysik
2004-07-31 5:00 ` André Braga
2004-08-03 21:21 ` Fabrice Bellard
0 siblings, 2 replies; 6+ messages in thread
From: Piotr Krysik @ 2004-07-28 14:24 UTC (permalink / raw)
To: qemu-devel
[-- Attachment #1: Type: text/plain, Size: 828 bytes --]
Hi!
I'm attaching a small patch that enables the assembly
implementations of ld, lds and st (from
softmmu_header.h) for GCC 3.3 and GCC 3.4 when
running a softmmu x86 guest on an x86 host.
With my simple benchmark (dd if=/dev/zero bs=1M
count=16 | gzip -9 on a Linux guest), this patch
improves performance by about 8% (QEMU compiled
with GCC 3.3 on a Pentium II Debian host).
Regards,
Piotrek
PS. I also considered removing "%ecx" from the register
constraints of st (softmmu_header.h, line 224) and
explicitly saving ecx before calling __st (line 198),
but the performance gain was much smaller. I suspect that
the gcse optimization and asm blocks under GCC 3.3 and
GCC 3.4 don't mix well in QEMU.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: gcc3-no-gcse.diff --]
[-- Type: text/x-patch; name="gcc3-no-gcse.diff", Size: 1177 bytes --]
diff -ru qemu-0.6.0/Makefile.target qemu-0.6.0-gcc3/Makefile.target
--- qemu-0.6.0/Makefile.target 2004-07-10 20:20:09.000000000 +0200
+++ qemu-0.6.0-gcc3/Makefile.target 2004-07-28 13:05:31.000000000 +0200
@@ -73,7 +73,7 @@
 CFLAGS+=-fomit-frame-pointer
 OP_CFLAGS=$(CFLAGS) -mpreferred-stack-boundary=2
 ifeq ($(HAVE_GCC3_OPTIONS),yes)
-OP_CFLAGS+= -falign-functions=0
+OP_CFLAGS+= -falign-functions=0 -fno-gcse
 else
 OP_CFLAGS+= -malign-functions=0
 endif
diff -ru qemu-0.6.0/target-i386/op.c qemu-0.6.0-gcc3/target-i386/op.c
--- qemu-0.6.0/target-i386/op.c 2004-07-10 20:20:09.000000000 +0200
+++ qemu-0.6.0-gcc3/target-i386/op.c 2004-07-28 13:08:00.000000000 +0200
@@ -20,11 +20,9 @@
 
 /* XXX: must use this define because the soft mmu macros have huge
    register constraints so they cannot be used in any C code. gcc 3.3
-   does not seem to be able to handle some constraints in rol
-   operations, so we disable it. */
-#if !(__GNUC__ == 3 && __GNUC_MINOR__ == 3)
+   does not seem to be able to handle some constraints in rol unless we
+   disable gcse optimization. */
 #define ASM_SOFTMMU
-#endif
 #include "exec.h"
 
 /* n must be a constant to be efficient */
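
The effect of the op.c hunk on the ASM_SOFTMMU guard can be sketched as a pair of plain C predicates. This is only an illustration: the function names are hypothetical, and the real guard is a preprocessor test on __GNUC__ and __GNUC_MINOR__, not a run-time check.

```c
#include <stdbool.h>

/* Hypothetical helper: the condition the patch removes,
   !(__GNUC__ == 3 && __GNUC_MINOR__ == 3), written as a run-time
   predicate on a compiler version. */
static bool asm_softmmu_before_patch(int major, int minor)
{
    /* Only GCC 3.3 was excluded by the old guard. */
    return !(major == 3 && minor == 3);
}

/* Hypothetical helper: after the patch ASM_SOFTMMU is defined for
   every compiler version, and -fno-gcse is added to OP_CFLAGS
   instead. */
static bool asm_softmmu_after_patch(int major, int minor)
{
    (void)major;
    (void)minor;
    return true;
}
```

For example, asm_softmmu_before_patch(3, 3) is false while asm_softmmu_after_patch(3, 3) is true, which is the case the patch is meant to enable.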
* Re: [Qemu-devel] [patch] performance improvement (softmmu, x86, GCC 3)
2004-07-28 14:24 [Qemu-devel] [patch] performance improvement (softmmu, x86, GCC 3) Piotr Krysik
@ 2004-07-31 5:00 ` André Braga
2004-08-04 12:50 ` Piotr Krysik
2004-08-03 21:21 ` Fabrice Bellard
1 sibling, 1 reply; 6+ messages in thread
From: André Braga @ 2004-07-31 5:00 UTC (permalink / raw)
To: qemu-devel
Awesome ;)
I haven't dug into the code, so could you please tell me whether the ecx
thing you mentioned at the bottom of your message and disabling GCSE
are mutually exclusive? Have you tried to narrow the problem down to
one or more of the separate GCSE flags, instead of the broader
-f[no-]gcse one?
--
"A year spent in artificial intelligence is enough to make one believe in God"
Alan J. Perlis
* Re: [Qemu-devel] [patch] performance improvement (softmmu, x86, GCC 3)
2004-07-31 5:00 ` André Braga
@ 2004-08-04 12:50 ` Piotr Krysik
2004-08-04 17:21 ` André Braga
0 siblings, 1 reply; 6+ messages in thread
From: Piotr Krysik @ 2004-08-04 12:50 UTC (permalink / raw)
To: qemu-devel
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=us-ascii, Size: 2406 bytes --]
Hi,
The "ecx thing" and disabling GCSE are not mutually
exclusive, but I didn't try to run or benchmark QEMU with
both. I'm not a GCC guru, but I believe that it should not
significantly impact QEMU performance. If you are willing
to do some tests, I could send you the "ecx" patch.
And yes, I tried different combinations of -fno-gcse
suboptions, but none worked.
To get more information about the problem, I used the
compiler's -da flag to trace GCC's optimizations of
op_rolb_kernel_T0_T1_cc. I discovered that the GCSE pass
introduces a transformation that cannot be optimized away
later. GCC insists on using a copy of the T0 value instead of
using the register ebx, which is globally reserved for T0 (and
as there are no free registers, it gives an error). The strangest
thing I noticed is that if I inline the stXXXX function by hand
instead of using the inline directive, the problem disappears.
Piotrek
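
For readers unfamiliar with the op named above, here is a simplified sketch of what a "rotate byte left and store" micro-op does. It is not QEMU's code: the names store_target, stb_sketch and rolb_store_sketch are hypothetical, flag handling is omitted, and T0, T1 and A0 are ordinary globals here, whereas the thread says T0 is a global register variable pinned to ebx on an x86 host.

```c
#include <stdint.h>

/* Ordinary globals so the sketch builds on any host; in QEMU 0.6.0
   on x86 these temporaries are pinned to host registers. */
static uint32_t T0, T1, A0;

/* Hypothetical stand-in for guest memory. */
static uint8_t store_target[256];

/* Hypothetical stand-in for the stXXXX inline store helper from
   softmmu_header.h; the real one does a TLB lookup in inline asm. */
static inline void stb_sketch(uint32_t addr, uint32_t val)
{
    store_target[addr & 0xff] = (uint8_t)val;
}

/* Rotate the low byte of T0 left by T1 (modulo 8) and store the
   result at A0. Nothing is stored when the effective count is 0. */
static void rolb_store_sketch(void)
{
    unsigned count = T1 & 7;

    T0 &= 0xff;
    if (count) {
        T0 = ((T0 << count) | (T0 >> (8 - count))) & 0xff;
        stb_sketch(A0, T0);
    }
}
```

With T0 = 0x81, T1 = 1 and A0 = 0x10, the call leaves T0 equal to 0x03 and writes 0x03 to store_target[0x10]. The point of the sketch is that T0 is both an input and an output of the op and is also passed to the store helper, which is where the report above says GCSE introduces an extra copy.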
* Re: [Qemu-devel] [patch] performance improvement (softmmu, x86, GCC 3)
2004-08-04 12:50 ` Piotr Krysik
@ 2004-08-04 17:21 ` André Braga
2004-08-05 1:43 ` Piotr Krysik
0 siblings, 1 reply; 6+ messages in thread
From: André Braga @ 2004-08-04 17:21 UTC (permalink / raw)
To: qemu-devel
Hmmm, that's very, very interesting (and exciting)! Squeezing every
ounce of performance out of QEMU is *very* desirable IMO, provided
that it doesn't break compatibility with other architectures. I'd
personally enjoy seeing a patch (better yet, three discrete and
independent patches): one that disables GCSE (this one is done!), one
that introduces the "ecx thing" (sorry, I have no idea what you meant
in your message about this -- a patch with some code would certainly
help) and another one that manually inlines the function you
mentioned. All in all, I'd like to see all the code working without
special GCC switches that are not pro-optimization ones, because I see
those as regressions.
Could you send me these patches? I'd be glad to test them! I'm not sure
if I can be any more helpful than this, since I'm just beginning to get
familiar with the emulation techniques of QEMU, let alone the code
itself...
--
"Dealing with failure is easy: Work hard to improve. Success is also
easy to handle: You've solved the wrong problem. Work hard to improve"
Alan J. Perlis
* Re: [Qemu-devel] [patch] performance improvement (softmmu, x86, GCC 3)
2004-08-04 17:21 ` André Braga
@ 2004-08-05 1:43 ` Piotr Krysik
0 siblings, 0 replies; 6+ messages in thread
From: Piotr Krysik @ 2004-08-05 1:43 UTC (permalink / raw)
To: qemu-devel
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=us-ascii, Size: 1189 bytes --]
I'm sending you the patches off-list.
Piotrek
* Re: [Qemu-devel] [patch] performance improvement (softmmu, x86, GCC 3)
2004-07-28 14:24 [Qemu-devel] [patch] performance improvement (softmmu, x86, GCC 3) Piotr Krysik
2004-07-31 5:00 ` André Braga
@ 2004-08-03 21:21 ` Fabrice Bellard
1 sibling, 0 replies; 6+ messages in thread
From: Fabrice Bellard @ 2004-08-03 21:21 UTC (permalink / raw)
To: qemu-devel
I am trying your patch and will include it if there is no performance
loss on my benchmarks with gcc 3.2.
BTW, it could be possible to go a little faster by coding in assembler
the unaligned access case of the C helpers which are called by the
inline functions in softmmu_header.h.
Fabrice.
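
As a rough illustration of the unaligned case mentioned above, the following C sketch combines two aligned little-endian loads to serve an unaligned 32-bit read. It assumes a flat byte array as guest memory and leaves out the TLB lookup entirely; guest_ram, ldl_aligned, ldl_unaligned and ldl_sketch are hypothetical names, not QEMU's helpers.

```c
#include <stdint.h>

/* Hypothetical flat guest memory for the sketch. */
static uint8_t guest_ram[64];

/* Aligned 32-bit little-endian load, assembled byte by byte so the
   result does not depend on host endianness. */
static uint32_t ldl_aligned(uint32_t addr)
{
    const uint8_t *p = &guest_ram[addr];

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Slow path: read the two aligned words that straddle addr and
   splice them together. Only called with (addr & 3) != 0, so the
   shift is 8, 16 or 24 and both shift counts stay below 32. */
static uint32_t ldl_unaligned(uint32_t addr)
{
    uint32_t base  = addr & ~3u;
    uint32_t shift = (addr & 3u) * 8;
    uint32_t lo    = ldl_aligned(base);
    uint32_t hi    = ldl_aligned(base + 4);

    return (lo >> shift) | (hi << (32 - shift));
}

/* Entry point: aligned accesses take the short path, everything
   else goes through the slow helper. The caller must keep addr + 7
   inside guest_ram. */
static uint32_t ldl_sketch(uint32_t addr)
{
    if ((addr & 3u) == 0)
        return ldl_aligned(addr);
    return ldl_unaligned(addr);
}
```

If guest_ram[i] holds the value i, ldl_sketch(4) returns 0x07060504 and ldl_sketch(5) returns 0x08070605. The slow path is the part suggested above as a candidate for hand-written assembler.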