[Qemu-devel] Redundant repz prefixes in generated amd64 code


* [Qemu-devel] Redundant repz prefixes in generated amd64 code
@ 2007-03-16 14:15 Julian Seward
  2007-03-16 14:28 ` Paul Brook
  2007-03-16 19:30 ` Igor Kovalenko
  0 siblings, 2 replies; 9+ messages in thread
From: Julian Seward @ 2007-03-16 14:15 UTC (permalink / raw)
  To: qemu-devel

I'm seeing redundant repz (0xF3) prefixes in generated code, typically
just before jumps:

<code_gen_buffer+415>:  repz mov $0xe07f,%eax
<code_gen_buffer+421>:  mov    %eax,0x20(%rbp)
<code_gen_buffer+424>:  lea    -25168302(%rip),%ebx  # 0xaf0420 <tbs+96>
<code_gen_buffer+430>:  retq
<code_gen_buffer+431>:  mov    -25168245(%rip),%eax  # 0xaf0460 <tbs+160>
<code_gen_buffer+437>:  jmpq   *%rax
<code_gen_buffer+439>:  repz mov $0xe092,%eax
<code_gen_buffer+445>:  mov    %eax,0x20(%rbp)
<code_gen_buffer+448>:  lea    -25168325(%rip),%ebx   # 0xaf0421 <tbs+97>
<code_gen_buffer+454>:  retq

I assume these are something to do with translation chaining/unchaining but
have been unable to figure out where they come from.  I know they get executed
and so are not data - valgrind barfs on them.

This is on a 64-bit host (Core 2) with qemu-0.9.0 compiled from source by
gcc-3.4.6, running an x86 (32-bit) guest.

At a guess I'd say the mov $imm,%eax is (created by? to do with?) 
gen_jmp_im in target-i386/translate.c, but I don't see how the F3 
got in on the act.  Grepping the source for 0xF3 turns up nothing 
plausible.  Any ideas where it comes from and how to get rid of it?

J


* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
  2007-03-16 14:15 [Qemu-devel] Redundant repz prefixes in generated amd64 code Julian Seward
@ 2007-03-16 14:28 ` Paul Brook
  2007-03-16 14:45   ` Julian Seward
  2007-03-16 19:30 ` Igor Kovalenko
  1 sibling, 1 reply; 9+ messages in thread
From: Paul Brook @ 2007-03-16 14:28 UTC (permalink / raw)
  To: qemu-devel

On Friday 16 March 2007 14:15, Julian Seward wrote:
> I'm seeing redundant repz (0xF3) prefixes in generated code, typically
> just before jumps:
>
> <code_gen_buffer+415>:  repz mov $0xe07f,%eax
> <code_gen_buffer+421>:  mov    %eax,0x20(%rbp)
> <code_gen_buffer+424>:  lea    -25168302(%rip),%ebx  # 0xaf0420 <tbs+96>
> <code_gen_buffer+430>:  retq
> <code_gen_buffer+431>:  mov    -25168245(%rip),%eax  # 0xaf0460 <tbs+160>
> <code_gen_buffer+437>:  jmpq   *%rax
> <code_gen_buffer+439>:  repz mov $0xe092,%eax
> <code_gen_buffer+445>:  mov    %eax,0x20(%rbp)
> <code_gen_buffer+448>:  lea    -25168325(%rip),%ebx   # 0xaf0421 <tbs+97>
> <code_gen_buffer+454>:  retq
>
> I assume these are something to do with translation chaining/unchaining but
> have been unable to figure out where they come from.

0000000000008b50 <op_goto_tb1>:
    8b50:       8b 05 00 00 00 00       mov    0(%rip),%eax 
                        8b52: R_X86_64_PC32     __op_param1+0x3c
    8b56:       ff e0                   jmpq   *%rax
    8b58:       f3 c3                   repz retq

qemu only strips the final ret off, so the f3 prefix byte is left behind.
The prefixed ret is there to avoid prefetch stalls on AMD CPUs.
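
As a rough illustration of the point above (a sketch, not dyngen's actual code): assume a copier that drops only a trailing 0xc3 byte from each op. The helper name strip_final_ret is hypothetical; the bytes are those of op_goto_tb1 from the objdump listing, with the relocation field left as zeros.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of op_goto_tb1 as shown in the objdump listing
 * (relocation field left as zeros). */
static const unsigned char op_goto_tb1[] = {
    0x8b, 0x05, 0x00, 0x00, 0x00, 0x00, /* mov  0(%rip),%eax */
    0xff, 0xe0,                         /* jmpq *%rax        */
    0xf3, 0xc3                          /* repz retq         */
};

/* Hypothetical helper: number of bytes to copy when only a
 * trailing 0xc3 (ret) is removed. */
static size_t strip_final_ret(const unsigned char *code, size_t len)
{
    assert(len > 0);
    if (code[len - 1] == 0xc3)
        len--;
    return len;
}
```

With these bytes, strip_final_ret(op_goto_tb1, sizeof op_goto_tb1) gives 9, and the last copied byte is 0xf3, which then sits in front of whatever is emitted next.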

Paul


* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
  2007-03-16 14:28 ` Paul Brook
@ 2007-03-16 14:45   ` Julian Seward
  2007-03-16 18:14     ` Paul Brook
  0 siblings, 1 reply; 9+ messages in thread
From: Julian Seward @ 2007-03-16 14:45 UTC (permalink / raw)
  To: qemu-devel

On Friday 16 March 2007 14:28, Paul Brook wrote:
> On Friday 16 March 2007 14:15, Julian Seward wrote:
> > I'm seeing redundant repz (0xF3) prefixes in generated code, typically
> > just before jumps:
> >
> > <code_gen_buffer+415>:  repz mov $0xe07f,%eax
> > <code_gen_buffer+421>:  mov    %eax,0x20(%rbp)
> > <code_gen_buffer+424>:  lea    -25168302(%rip),%ebx  # 0xaf0420 <tbs+96>
> > <code_gen_buffer+430>:  retq
> > <code_gen_buffer+431>:  mov    -25168245(%rip),%eax  # 0xaf0460 <tbs+160>
> > <code_gen_buffer+437>:  jmpq   *%rax
> > <code_gen_buffer+439>:  repz mov $0xe092,%eax
> > <code_gen_buffer+445>:  mov    %eax,0x20(%rbp)
> > <code_gen_buffer+448>:  lea    -25168325(%rip),%ebx   # 0xaf0421 <tbs+97>
> > <code_gen_buffer+454>:  retq
> >
> > I assume these are something to do with translation chaining/unchaining
> > but have been unable to figure out where they come from.
>
> 0000000000008b50 <op_goto_tb1>:
>     8b50:       8b 05 00 00 00 00       mov    0(%rip),%eax
>                         8b52: R_X86_64_PC32     __op_param1+0x3c
>     8b56:       ff e0                   jmpq   *%rax
>     8b58:       f3 c3                   repz retq
>
> qemu only strips the final ret off.
> The prefixed ret is to avoid prefetch stalls on amd cpus.

So the implication of this is that the generated code just happens to
work only because the dangling F3 never ends up in front of some other
instruction which it would change the meaning of?

J


* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
  2007-03-16 14:45   ` Julian Seward
@ 2007-03-16 18:14     ` Paul Brook
  0 siblings, 0 replies; 9+ messages in thread
From: Paul Brook @ 2007-03-16 18:14 UTC (permalink / raw)
  To: qemu-devel

> > 0000000000008b50 <op_goto_tb1>:
> >     8b50:       8b 05 00 00 00 00       mov    0(%rip),%eax
> >                         8b52: R_X86_64_PC32     __op_param1+0x3c
> >     8b56:       ff e0                   jmpq   *%rax
> >     8b58:       f3 c3                   repz retq
> >
> > qemu only strips the final ret off.
> > The prefixed ret is to avoid prefetch stalls on amd cpus.
>
> So the implication of this is that the generated code just happens to
> work only because the dangling F3 never ends up in front of some other
> instruction which it would change the meaning of?

Correct.

Paul


* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
  2007-03-16 14:15 [Qemu-devel] Redundant repz prefixes in generated amd64 code Julian Seward
  2007-03-16 14:28 ` Paul Brook
@ 2007-03-16 19:30 ` Igor Kovalenko
  2007-03-16 23:06   ` Julian Seward
  2007-03-17  7:35   ` axel
  1 sibling, 2 replies; 9+ messages in thread
From: Igor Kovalenko @ 2007-03-16 19:30 UTC (permalink / raw)
  To: qemu-devel

On 3/16/07, Julian Seward <jseward@acm.org> wrote:
>
> I'm seeing redundant repz (0xF3) prefixes in generated code, typically
> just before jumps:
>
> <code_gen_buffer+415>:  repz mov $0xe07f,%eax
> <code_gen_buffer+421>:  mov    %eax,0x20(%rbp)
> <code_gen_buffer+424>:  lea    -25168302(%rip),%ebx  # 0xaf0420 <tbs+96>
> <code_gen_buffer+430>:  retq
> <code_gen_buffer+431>:  mov    -25168245(%rip),%eax  # 0xaf0460 <tbs+160>
> <code_gen_buffer+437>:  jmpq   *%rax
> <code_gen_buffer+439>:  repz mov $0xe092,%eax
> <code_gen_buffer+445>:  mov    %eax,0x20(%rbp)
> <code_gen_buffer+448>:  lea    -25168325(%rip),%ebx   # 0xaf0421 <tbs+97>
> <code_gen_buffer+454>:  retq
>
> I assume these are something to do with translation chaining/unchaining but
> have been unable to figure out where they come from.  I know they get executed
> and so are not data - valgrind barfs on them.
>
> This is on a 64-bit host (Core 2) with qemu-0.9.0 compiled from source by
> gcc-3.4.6, running an x86 (32-bit) guest.
>
> At a guess I'd say the mov $imm,%eax is (created by? to do with?)
> gen_jmp_im in target-i386/translate.c, but I don't see how the F3
> got in on the act.  Grepping the source for 0xF3 turns up nothing
> plausible.  Any ideas where it comes from and how to get rid of it?
>

Try -mtune=nocona, something like the following:

Index: Makefile.target
===================================================================
RCS file: /cvsroot/qemu/qemu/Makefile.target,v
retrieving revision 1.147
diff -u -r1.147 Makefile.target
--- Makefile.target     28 Feb 2007 21:36:41 -0000      1.147
+++ Makefile.target     16 Mar 2007 19:29:04 -0000
@@ -99,6 +99,7 @@
 endif

 ifeq ($(ARCH),x86_64)
+OP_CFLAGS+= -mtune=nocona -W -Wall -O4
 BASE_LDFLAGS+=-Wl,-T,$(SRC_PATH)/$(ARCH).ld
 endif


-- 
Kind regards,
Igor V. Kovalenko


* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
  2007-03-16 19:30 ` Igor Kovalenko
@ 2007-03-16 23:06   ` Julian Seward
  2007-03-17  7:35   ` axel
  1 sibling, 0 replies; 9+ messages in thread
From: Julian Seward @ 2007-03-16 23:06 UTC (permalink / raw)
  To: qemu-devel


>  ifeq ($(ARCH),x86_64)
> +OP_CFLAGS+= -mtune=nocona -W -Wall -O4
>  BASE_LDFLAGS+=-Wl,-T,$(SRC_PATH)/$(ARCH).ld
>  endif

That works.  Thanks.

J


* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
  2007-03-16 19:30 ` Igor Kovalenko
  2007-03-16 23:06   ` Julian Seward
@ 2007-03-17  7:35   ` axel
  2007-03-17  9:51     ` Johannes Schindelin
  1 sibling, 1 reply; 9+ messages in thread
From: axel @ 2007-03-17  7:35 UTC (permalink / raw)
  To: qemu-devel

On Friday 16 March 2007 20:30, Igor Kovalenko wrote:
> On 3/16/07, Julian Seward <jseward@acm.org> wrote:
> > I'm seeing redundant repz (0xF3) prefixes in generated code, typically
> > just before jumps:
> >
> > <code_gen_buffer+415>:  repz mov $0xe07f,%eax
> > <code_gen_buffer+421>:  mov    %eax,0x20(%rbp)
> > <code_gen_buffer+424>:  lea    -25168302(%rip),%ebx  # 0xaf0420 <tbs+96>
> > <code_gen_buffer+430>:  retq
> > <code_gen_buffer+431>:  mov    -25168245(%rip),%eax  # 0xaf0460 <tbs+160>
> > <code_gen_buffer+437>:  jmpq   *%rax
> > <code_gen_buffer+439>:  repz mov $0xe092,%eax
> > <code_gen_buffer+445>:  mov    %eax,0x20(%rbp)
> > <code_gen_buffer+448>:  lea    -25168325(%rip),%ebx   # 0xaf0421 <tbs+97>
> > <code_gen_buffer+454>:  retq
> >
> > I assume these are something to do with translation chaining/unchaining
> > but have been unable to figure out where they come from.  I know they get
> > executed and so are not data - valgrind barfs on them.
> >
> > This is on a 64-bit host (Core 2) with qemu-0.9.0 compiled from source by
> > gcc-3.4.6, running an x86 (32-bit) guest.
> >
> > At a guess I'd say the mov $imm,%eax is (created by? to do with?)
> > gen_jmp_im in target-i386/translate.c, but I don't see how the F3
> > got in on the act.  Grepping the source for 0xF3 turns up nothing
> > plausible.  Any ideas where it comes from and how to get rid of it?
>
> Try -mtune=nocona something like the following

IMHO one should change dyngen. Below is a hack (ELF only; I cannot test the COFF
branch). It works for amd64->amd64 (tested with -no-kqemu), but it is not safe,
because the instruction before the ret may contain the 0xf3 byte as an immediate
operand.
A full solution would disassemble the whole function, determine the boundaries
of each instruction and then decide where to cut the block to copy. Perhaps one
could then also detect multiple returns in a function and try to
rewrite the opcode blocks, replacing the multiple returns with jumps.

Why do two different blocks exist for COFF and ELF on x86/x86_64 hosts?

Axel

Index: dyngen.c
===================================================================
RCS file: /sources/qemu/qemu/dyngen.c,v
retrieving revision 1.49
diff -u -r1.49 dyngen.c
--- dyngen.c    4 Mar 2007 00:52:16 -0000       1.49
+++ dyngen.c    17 Mar 2007 07:19:41 -0000
@@ -1458,6 +1458,8 @@
             error("empty code for %s", name);
         if (p_end[-1] == 0xc3) {
             len--;
+           if ( len>0 && p_end[-2] == 0xf3 )
+               --len;
         } else {
             error("ret or jmp expected at the end of %s", name);
         }
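
The caveat about immediates can be sketched as follows. This is only an illustration of the patched check, not the dyngen code itself: the helper strip_ret_and_repz and both byte arrays are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the patched logic: drop the final ret,
 * then drop a 0xf3 byte directly before it, if there is one. */
static size_t strip_ret_and_repz(const unsigned char *code, size_t len)
{
    assert(len > 0);
    if (code[len - 1] == 0xc3) {
        len--;
        if (len > 0 && code[len - 1] == 0xf3)
            len--;
    }
    return len;
}

/* nop; repz ret: here 0xf3 really is a prefix of the ret. */
static const unsigned char with_prefix[] = { 0x90, 0xf3, 0xc3 };

/* mov $0xf3,%al; ret: here 0xf3 is the immediate operand of the mov. */
static const unsigned char with_immediate[] = { 0xb0, 0xf3, 0xc3 };
```

Both arrays come out with length 1. For with_prefix that is the intended result; for with_immediate only the 0xb0 opcode byte would be copied, so the mov loses its operand. Telling the two cases apart needs real instruction decoding.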


* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
  2007-03-17  7:35   ` axel
@ 2007-03-17  9:51     ` Johannes Schindelin
  2007-03-17 11:16       ` Axel Zeuner
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Schindelin @ 2007-03-17  9:51 UTC (permalink / raw)
  To: axel; +Cc: qemu-devel

Hi,

On Sat, 17 Mar 2007, axel wrote:

> Why do two different blocks exist for COFF and ELF on x86/x86_64
> hosts?

Because COFF is used by Windows, and ELF by Linux, and they are 
substantially different?

> @@ -1458,6 +1458,8 @@
>              error("empty code for %s", name);
>          if (p_end[-1] == 0xc3) {
>              len--;
> +           if ( len>0 && p_end[-2] == 0xf3 )
> +               --len;

This is wrong on several counts:

- style (space after opening parentheses and before closing parentheses, 
  no space before and after ">", "--" before instead of after "len", just 
  see the if clause above)

- if you want to access "p_end[-2]", you must check for "len > 1"

- you most likely want to check "p_end[-1]" anyway

- worst: there is no appropriate explanation why this patch is needed, and 
  even more importantly, why it does not break existing code

Hth,
Dscho


* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
  2007-03-17  9:51     ` Johannes Schindelin
@ 2007-03-17 11:16       ` Axel Zeuner
  0 siblings, 0 replies; 9+ messages in thread
From: Axel Zeuner @ 2007-03-17 11:16 UTC (permalink / raw)
  To: qemu-devel

Hi,

On Saturday 17 March 2007 10:51, Johannes Schindelin wrote:
> Hi,
>
> On Sat, 17 Mar 2007, axel wrote:
> > Why do two different blocks exist for COFF and ELF on x86/x86_64
> > hosts?
>
> Because COFF is used by Windows, and ELF by Linux, and they are
> substantially different?
>
Sorry, I did not mean to criticise the code; I apologise for that.
But do these blocks do different things? Both should check the last byte,
strip off trailing padding bytes including the ret instruction, and determine
the size of the block of code to copy. Am I really wrong here?
>
> > @@ -1458,6 +1458,8 @@
> >              error("empty code for %s", name);
> >          if (p_end[-1] == 0xc3) {
> >              len--;
> > +           if ( len>0 && p_end[-2] == 0xf3 )
> > +               --len;
>
> This is wrong on several counts:
>
> - style (space after opening parentheses and before closing parentheses,
>   no space before and after ">", "--" before instead of after "len", just
>   see the if clause above)
I agree, sorry for that; next time I will follow the coding rules. Most of the
time I use C++, where it makes sense to prefer prefix decrement and prefix
increment operations for performance reasons.
> - if you want to access "p_end[-2]", you must check for "len > 1"
I do not agree: len was decremented on the line above, so "len > 0" after the
decrement means the original length was at least 2. len is a signed int and
p_end was not changed.
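
A small sketch of that bounds argument, assuming (as in the quoted hunk) that empty code has already been rejected. The function strip_len and its looked_back flag are hypothetical and exist only to make the access to p_end[-2] observable.

```c
#include <assert.h>

/* Hypothetical stand-in for the patched block. p_end points one past the
 * last byte of the op; len is its signed length before stripping.
 * Returns the new length; *looked_back records whether p_end[-2] was read. */
static int strip_len(const unsigned char *p_end, int len, int *looked_back)
{
    assert(len > 0);            /* empty code is rejected before this point */
    *looked_back = 0;
    if (p_end[-1] == 0xc3) {
        len--;                  /* len is now the original length minus 1 */
        if (len > 0) {          /* hence the original length was >= 2 */
            *looked_back = 1;
            if (p_end[-2] == 0xf3)
                len--;
        }
    }
    return len;
}
```

For a one-byte op consisting only of 0xc3, len drops to 0 and p_end[-2] is never read; for the two bytes 0xf3 0xc3 it is read and both bytes are stripped.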
>
> - you most likely want to check "p_end[-1]" anyway
No, because p_end[-1] was already checked and is known to be 0xc3. I want to
check the byte before p_end[-1], because repz; ret is encoded as 0xf3 0xc3.
>
> - worst: there is no appropriate explanation why this patch is needed, and
The currently generated op_XXX functions are not affected by the stale repz 
prefixes at the end of the generated and copied blocks, but the following 
scenario is possible, at least in theory:
op_1:
	movl $0,%ecx
	do_what_ever_but_do_not_change_ecx
	repz; ret

op_2:
	stosd
	ret
Now the following op code sequence op_1, op_2 is generated. The resulting code 
in the code generation buffer will be
	movl $0,%ecx
	do_what_ever_but_do_not_change_ecx
	repz; # stale from op_1
	stosd; # body of op_2
This is probably not what one wants to execute.
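
The scenario can be reproduced at the byte level with a small sketch. The encodings are illustrative: a nop stands in for the unspecified body of op_1, and emit_op is a hypothetical copier that drops only the final ret, which is the behaviour described earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative encodings; the nop is a placeholder for the op body. */
static const unsigned char op_1[] = {
    0xb9, 0x00, 0x00, 0x00, 0x00, /* movl $0,%ecx */
    0x90,                         /* nop (placeholder) */
    0xf3, 0xc3                    /* repz; ret */
};
static const unsigned char op_2[] = {
    0xab,                         /* stosd */
    0xc3                          /* ret */
};

/* Hypothetical copier: append an op to buf at offset pos, dropping only
 * its final ret byte. Returns the new offset. */
static size_t emit_op(unsigned char *buf, size_t pos,
                      const unsigned char *code, size_t len)
{
    assert(len > 0 && code[len - 1] == 0xc3);
    memcpy(buf + pos, code, len - 1);
    return pos + len - 1;
}
```

Emitting op_1 and then op_2 into one buffer leaves the bytes 0xf3 0xab next to each other, which decode as a rep-prefixed stos. Since rep repeats according to the count register, and op_1 set that to 0, the store would not happen at all.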
>   even more importantly, why it does not break existing code
I agree fully, as I mentioned, this is a HACK and WILL break existing code 
sooner or later. 

Kind regards
Axel

> Hth,
> Dscho

