qemu-devel.nongnu.org archive mirror
* [Qemu-devel] User mode emulation and TCG_OPF_CALL_CLOBBER
@ 2008-12-26 14:32 Laurent Desnogues
  2008-12-29 10:46 ` Edgar E. Iglesias
  0 siblings, 1 reply; 4+ messages in thread
From: Laurent Desnogues @ 2008-12-26 14:32 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 864 bytes --]

Hello,

while looking at generated code for a user mode emulated program
I noticed some registers were saved/restored for qemu_{ld,st}
operations.  My understanding is that this is only needed for softmmu
(and even in that case for the slow path as a comment in tcg.c says)
since in that case, a call to a helper might be generated.

This register save & restore behavior is enabled by the op flag
TCG_OPF_CALL_CLOBBER.

A quick test on ARM target and x86_64 host for a SPEC2000 test
shows removing that flag speeds up execution by about 15%.

Did I understand things correctly?  If so, what would be the best
way to patch this:

   - get rid of TCG_OPF_CALL_CLOBBER and the associated code, or
   - simply omit that flag in the opc definitions?

The former would be slightly more intrusive for a probably low
benefit.  A patch is attached that implements the latter.


Laurent

[-- Attachment #2: tcg-call-clobber-usermode.patch --]
[-- Type: text/x-patch; name=tcg-call-clobber-usermode.patch, Size: 5968 bytes --]

Index: tcg/tcg-opc.h
===================================================================
--- tcg/tcg-opc.h	(revision 6131)
+++ tcg/tcg-opc.h	(working copy)
@@ -156,78 +156,83 @@
 DEF2(goto_tb, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
 /* Note: even if TARGET_LONG_BITS is not defined, the INDEX_op
    constants must be defined */
+#ifdef CONFIG_SOFTMMU
+#define LOCAL_TCG_OPF_CALL_CLOBBER TCG_OPF_CALL_CLOBBER
+#else
+#define LOCAL_TCG_OPF_CALL_CLOBBER 0
+#endif
 #if TCG_TARGET_REG_BITS == 32
 #if TARGET_LONG_BITS == 32
-DEF2(qemu_ld8u, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld8u, 1, 1, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #else
-DEF2(qemu_ld8u, 1, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld8u, 1, 2, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF2(qemu_ld8s, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld8s, 1, 1, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #else
-DEF2(qemu_ld8s, 1, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld8s, 1, 2, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF2(qemu_ld16u, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld16u, 1, 1, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #else
-DEF2(qemu_ld16u, 1, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld16u, 1, 2, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF2(qemu_ld16s, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld16s, 1, 1, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #else
-DEF2(qemu_ld16s, 1, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld16s, 1, 2, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF2(qemu_ld32u, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld32u, 1, 1, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #else
-DEF2(qemu_ld32u, 1, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld32u, 1, 2, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF2(qemu_ld32s, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld32s, 1, 1, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #else
-DEF2(qemu_ld32s, 1, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld32s, 1, 2, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF2(qemu_ld64, 2, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld64, 2, 1, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #else
-DEF2(qemu_ld64, 2, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld64, 2, 2, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #endif
 
 #if TARGET_LONG_BITS == 32
-DEF2(qemu_st8, 0, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_st8, 0, 2, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #else
-DEF2(qemu_st8, 0, 3, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_st8, 0, 3, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF2(qemu_st16, 0, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_st16, 0, 2, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #else
-DEF2(qemu_st16, 0, 3, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_st16, 0, 3, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF2(qemu_st32, 0, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_st32, 0, 2, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #else
-DEF2(qemu_st32, 0, 3, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_st32, 0, 3, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF2(qemu_st64, 0, 3, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_st64, 0, 3, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #else
-DEF2(qemu_st64, 0, 4, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_st64, 0, 4, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 #endif
 
 #else /* TCG_TARGET_REG_BITS == 32 */
 
-DEF2(qemu_ld8u, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF2(qemu_ld8s, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF2(qemu_ld16u, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF2(qemu_ld16s, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF2(qemu_ld32u, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF2(qemu_ld32s, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF2(qemu_ld64, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld8u, 1, 1, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld8s, 1, 1, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld16u, 1, 1, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld16s, 1, 1, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld32u, 1, 1, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld32s, 1, 1, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_ld64, 1, 1, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 
-DEF2(qemu_st8, 0, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF2(qemu_st16, 0, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF2(qemu_st32, 0, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF2(qemu_st64, 0, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_st8, 0, 2, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_st16, 0, 2, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_st32, 0, 2, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF2(qemu_st64, 0, 2, 1, LOCAL_TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
 
 #endif /* TCG_TARGET_REG_BITS != 32 */
 


* Re: [Qemu-devel] User mode emulation and TCG_OPF_CALL_CLOBBER
  2008-12-26 14:32 [Qemu-devel] User mode emulation and TCG_OPF_CALL_CLOBBER Laurent Desnogues
@ 2008-12-29 10:46 ` Edgar E. Iglesias
  2008-12-29 11:35   ` Edgar E. Iglesias
  0 siblings, 1 reply; 4+ messages in thread
From: Edgar E. Iglesias @ 2008-12-29 10:46 UTC (permalink / raw)
  To: Laurent Desnogues; +Cc: qemu-devel

On Fri, Dec 26, 2008 at 03:32:06PM +0100, Laurent Desnogues wrote:
> Hello,
> 
> while looking at generated code for a user mode emulated program
> I noticed some registers were saved/restored for qemu_{ld,st}
> operations.  My understanding is that this is only needed for softmmu
> (and even in that case for the slow path as a comment in tcg.c says)
> since in that case, a call to a helper might be generated.
> 
> This register save & restore behavior is enabled by the op flag
> TCG_OPF_CALL_CLOBBER.
> 
> A quick test on ARM target and x86_64 host for a SPEC2000 test
> shows removing that flag speeds up execution by about 15%.
> 
> Did I understand things correctly?  If so what would be the best

Hello Laurent,

I think you did and I think what you propose kind of makes sense but
unfortunately your patch exposes errors on my setup.

The i386 backend's ld64 seems to clobber registers (eax/edx) behind
tcg's back and with your patch at least CRIS no longer passes its testsuite
on i386 hosts. (Actually, I can't see how the plain tcg_gen_ld_i64 can work
reliably with the i386 backend from svn.)

Anyway, I made a dirty local fix for the ld64 issue and I am seeing about
5% performance improvements with my tests.

Another issue I'm worried about is what happens when apps segfault: I think
there is a risk that part of the CPUState remains in host registers, making
programs much harder to debug.
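
The concern can be sketched with a toy model in which a guest register
value lives only in a host register when a guest memory access faults.
Everything here (ToyCPUState, ToyCachedReg, toy_sync, toy_faulting_load)
is hypothetical and only simulates the fault; it is not how QEMU
delivers signals.

```c
#include <stdint.h>

/* Stand-in for the in-memory guest CPU state. */
typedef struct {
    uint32_t regs[4];
} ToyCPUState;

/* A guest register whose current value is held in a host register. */
typedef struct {
    uint32_t value;  /* value in the host register */
    int dirty;       /* nonzero if not yet written back to ToyCPUState */
    int guest_reg;   /* index of the guest register it shadows */
} ToyCachedReg;

/* Write the cached value back to the in-memory state. */
static void toy_sync(ToyCPUState *env, ToyCachedReg *c)
{
    if (c->dirty) {
        env->regs[c->guest_reg] = c->value;
        c->dirty = 0;
    }
}

/* Model a guest load that faults. The return value is what a signal
   handler, which can only inspect env, would read for the guest register. */
static uint32_t toy_faulting_load(ToyCPUState *env, ToyCachedReg *c,
                                  int save_before_access)
{
    if (save_before_access) {
        toy_sync(env, c);
    }
    /* The fault happens here; the host register contents are not
       visible to the handler in this model. */
    return env->regs[c->guest_reg];
}
```

With the save step the handler sees the up-to-date value; without it the
handler sees whatever was last written back, which is the stale state
described above.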

I'd very much like to see the speedup but IMO we should first fix these
issues.

Thanks


* Re: [Qemu-devel] User mode emulation and TCG_OPF_CALL_CLOBBER
  2008-12-29 10:46 ` Edgar E. Iglesias
@ 2008-12-29 11:35   ` Edgar E. Iglesias
  2008-12-29 16:11     ` Laurent Desnogues
  0 siblings, 1 reply; 4+ messages in thread
From: Edgar E. Iglesias @ 2008-12-29 11:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Desnogues

On Mon, Dec 29, 2008 at 11:46:07AM +0100, Edgar E. Iglesias wrote:
> On Fri, Dec 26, 2008 at 03:32:06PM +0100, Laurent Desnogues wrote:
> > Hello,
> > 
> > while looking at generated code for a user mode emulated program
> > I noticed some registers were saved/restored for qemu_{ld,st}
> > operations.  My understanding is that this is only needed for softmmu
> > (and even in that case for the slow path as a comment in tcg.c says)
> > since in that case, a call to a helper might be generated.
> > 
> > This register save & restore behavior is enabled by the op flag
> > TCG_OPF_CALL_CLOBBER.
> > 
> > A quick test on ARM target and x86_64 host for a SPEC2000 test
> > shows removing that flag speeds up execution by about 15%.
> > 
> > Did I understand things correctly?  If so what would be the best
> 
> Hello Laurent,
> 
> I think you did and I think what you propose kind of makes sense but
> unfortunately your patch exposes errors on my setup.
> 
> The i386 backend's ld64 seems to clobber registers (eax/edx) behind
> tcg's back and with your patch at least CRIS no longer passes its testsuite
> on i386 hosts. (Actually, I can't see how the plain tcg_gen_ld_i64 can work
> reliably with the i386 backend from svn.)

I see now. AFAICT, only qemu_ld64 has issues and only if you remove the
clobber flag.

Cheers


* Re: [Qemu-devel] User mode emulation and TCG_OPF_CALL_CLOBBER
  2008-12-29 11:35   ` Edgar E. Iglesias
@ 2008-12-29 16:11     ` Laurent Desnogues
  0 siblings, 0 replies; 4+ messages in thread
From: Laurent Desnogues @ 2008-12-29 16:11 UTC (permalink / raw)
  To: qemu-devel

On Mon, Dec 29, 2008 at 12:35 PM, Edgar E. Iglesias
<edgar.iglesias@axis.com> wrote:
> On Mon, Dec 29, 2008 at 11:46:07AM +0100, Edgar E. Iglesias wrote:
>> On Fri, Dec 26, 2008 at 03:32:06PM +0100, Laurent Desnogues wrote:
>> > Hello,
>> >
>> > while looking at generated code for a user mode emulated program
>> > I noticed some registers were saved/restored for qemu_{ld,st}
>> > operations.  My understanding is that this is only needed for softmmu
>> > (and even in that case for the slow path as a comment in tcg.c says)
>> > since in that case, a call to a helper might be generated.
>> >
>> > This register save & restore behavior is enabled by the op flag
>> > TCG_OPF_CALL_CLOBBER.
>> >
>> > A quick test on ARM target and x86_64 host for a SPEC2000 test
>> > shows removing that flag speeds up execution by about 15%.
>> >
>> > Did I understand things correctly?  If so what would be the best
>>
>> Hello Laurent,
>>
>> I think you did and I think what you propose kind of makes sense but
>> unfortunately your patch exposes errors on my setup.
>>
>> The i386 backend's ld64 seems to clobber registers (eax/edx) behind
>> tcg's back and with your patch at least CRIS no longer passes its testsuite
>> on i386 hosts. (Actually, I can't see how the plain tcg_gen_ld_i64 can work
>> reliably with the i386 backend from svn.)
>
> I see now. AFAICT, only qemu_ld64 has issues and only if you remove the
> clobber flag.

Well, even if that didn't break qemu_ld64 on i386 (which it does),
the lack of saves before doing a memory access that could
generate a signal is a killer.

Trash the idea :)


Laurent
