* [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Julian Seward @ 2007-03-16 14:15 UTC (permalink / raw)
To: qemu-devel
I'm seeing redundant repz (0xF3) prefixes in generated code, typically
just before jumps:
<code_gen_buffer+415>: repz mov $0xe07f,%eax
<code_gen_buffer+421>: mov %eax,0x20(%rbp)
<code_gen_buffer+424>: lea -25168302(%rip),%ebx # 0xaf0420 <tbs+96>
<code_gen_buffer+430>: retq
<code_gen_buffer+431>: mov -25168245(%rip),%eax # 0xaf0460 <tbs+160>
<code_gen_buffer+437>: jmpq *%rax
<code_gen_buffer+439>: repz mov $0xe092,%eax
<code_gen_buffer+445>: mov %eax,0x20(%rbp)
<code_gen_buffer+448>: lea -25168325(%rip),%ebx # 0xaf0421 <tbs+97>
<code_gen_buffer+454>: retq
I assume these are something to do with translation chaining/unchaining but
have been unable to figure out where they come from. I know they get executed
and so are not data - valgrind barfs on them.
This is on a 64-bit host (Core 2) with qemu-0.9.0 compiled from source by
gcc-3.4.6, running an x86 (32-bit) guest.
At a guess I'd say the mov $imm,%eax is (created by? to do with?)
gen_jmp_im in target-i386/translate.c, but I don't see how the F3
got in on the act. Grepping the source for 0xF3 turns up nothing
plausible. Any ideas where it comes from and how to get rid of it?
J
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Paul Brook @ 2007-03-16 14:28 UTC (permalink / raw)
To: qemu-devel
On Friday 16 March 2007 14:15, Julian Seward wrote:
> I'm seeing redundant repz (0xF3) prefixes in generated code, typically
> just before jumps:
>
> <code_gen_buffer+415>: repz mov $0xe07f,%eax
> <code_gen_buffer+421>: mov %eax,0x20(%rbp)
> <code_gen_buffer+424>: lea -25168302(%rip),%ebx # 0xaf0420 <tbs+96>
> <code_gen_buffer+430>: retq
> <code_gen_buffer+431>: mov -25168245(%rip),%eax # 0xaf0460 <tbs+160>
> <code_gen_buffer+437>: jmpq *%rax
> <code_gen_buffer+439>: repz mov $0xe092,%eax
> <code_gen_buffer+445>: mov %eax,0x20(%rbp)
> <code_gen_buffer+448>: lea -25168325(%rip),%ebx # 0xaf0421 <tbs+97>
> <code_gen_buffer+454>: retq
>
> I assume these are something to do with translation chaining/unchaining but
> have been unable to figure out where they come from.
0000000000008b50 <op_goto_tb1>:
8b50: 8b 05 00 00 00 00 mov 0(%rip),%eax
8b52: R_X86_64_PC32 __op_param1+0x3c
8b56: ff e0 jmpq *%rax
8b58: f3 c3 repz retq
qemu only strips the final ret off.
The prefixed ret is to avoid prefetch stalls on amd cpus.
Paul
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Julian Seward @ 2007-03-16 14:45 UTC (permalink / raw)
To: qemu-devel
On Friday 16 March 2007 14:28, Paul Brook wrote:
> On Friday 16 March 2007 14:15, Julian Seward wrote:
> > I'm seeing redundant repz (0xF3) prefixes in generated code, typically
> > just before jumps:
> >
> > <code_gen_buffer+415>: repz mov $0xe07f,%eax
> > <code_gen_buffer+421>: mov %eax,0x20(%rbp)
> > <code_gen_buffer+424>: lea -25168302(%rip),%ebx # 0xaf0420 <tbs+96>
> > <code_gen_buffer+430>: retq
> > <code_gen_buffer+431>: mov -25168245(%rip),%eax # 0xaf0460 <tbs+160>
> > <code_gen_buffer+437>: jmpq *%rax
> > <code_gen_buffer+439>: repz mov $0xe092,%eax
> > <code_gen_buffer+445>: mov %eax,0x20(%rbp)
> > <code_gen_buffer+448>: lea -25168325(%rip),%ebx # 0xaf0421 <tbs+97>
> > <code_gen_buffer+454>: retq
> >
> > I assume these are something to do with translation chaining/unchaining
> > but have been unable to figure out where they come from.
>
> 0000000000008b50 <op_goto_tb1>:
> 8b50: 8b 05 00 00 00 00 mov 0(%rip),%eax
> 8b52: R_X86_64_PC32 __op_param1+0x3c
> 8b56: ff e0 jmpq *%rax
> 8b58: f3 c3 repz retq
>
> qemu only strips the final ret off.
> The prefixed ret is to avoid prefetch stalls on amd cpus.
So the implication of this is that the generated code just happens to
work only because the dangling F3 never ends up in front of some other
instruction which it would change the meaning of?
J
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Paul Brook @ 2007-03-16 18:14 UTC (permalink / raw)
To: qemu-devel
> > 0000000000008b50 <op_goto_tb1>:
> > 8b50: 8b 05 00 00 00 00 mov 0(%rip),%eax
> > 8b52: R_X86_64_PC32 __op_param1+0x3c
> > 8b56: ff e0 jmpq *%rax
> > 8b58: f3 c3 repz retq
> >
> > qemu only strips the final ret off.
> > The prefixed ret is to avoid prefetch stalls on amd cpus.
>
> So the implication of this is that the generated code just happens to
> work only because the dangling F3 never ends up in front of some other
> instruction which it would change the meaning of?
Correct.
Paul
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Igor Kovalenko @ 2007-03-16 19:30 UTC (permalink / raw)
To: qemu-devel
On 3/16/07, Julian Seward <jseward@acm.org> wrote:
>
> I'm seeing redundant repz (0xF3) prefixes in generated code, typically
> just before jumps:
>
> <code_gen_buffer+415>: repz mov $0xe07f,%eax
> <code_gen_buffer+421>: mov %eax,0x20(%rbp)
> <code_gen_buffer+424>: lea -25168302(%rip),%ebx # 0xaf0420 <tbs+96>
> <code_gen_buffer+430>: retq
> <code_gen_buffer+431>: mov -25168245(%rip),%eax # 0xaf0460 <tbs+160>
> <code_gen_buffer+437>: jmpq *%rax
> <code_gen_buffer+439>: repz mov $0xe092,%eax
> <code_gen_buffer+445>: mov %eax,0x20(%rbp)
> <code_gen_buffer+448>: lea -25168325(%rip),%ebx # 0xaf0421 <tbs+97>
> <code_gen_buffer+454>: retq
>
> I assume these are something to do with translation chaining/unchaining but
> have been unable to figure out where they come from. I know they get executed
> and so are not data - valgrind barfs on them.
>
> This is on a 64-bit host (Core 2) with qemu-0.9.0 compiled from source by
> gcc-3.4.6, running an x86 (32-bit) guest.
>
> At a guess I'd say the mov $imm,%eax is (created by? to do with?)
> gen_jmp_im in target-i386/translate.c, but I don't see how the F3
> got in on the act. Grepping the source for 0xF3 turns up nothing
> plausible. Any ideas where it comes from and how to get rid of it?
>
Try -mtune=nocona, something like the following:
Index: Makefile.target
===================================================================
RCS file: /cvsroot/qemu/qemu/Makefile.target,v
retrieving revision 1.147
diff -u -r1.147 Makefile.target
--- Makefile.target 28 Feb 2007 21:36:41 -0000 1.147
+++ Makefile.target 16 Mar 2007 19:29:04 -0000
@@ -99,6 +99,7 @@
endif
ifeq ($(ARCH),x86_64)
+OP_CFLAGS+= -mtune=nocona -W -Wall -O4
BASE_LDFLAGS+=-Wl,-T,$(SRC_PATH)/$(ARCH).ld
endif
--
Kind regards,
Igor V. Kovalenko
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Julian Seward @ 2007-03-16 23:06 UTC (permalink / raw)
To: qemu-devel
> ifeq ($(ARCH),x86_64)
> +OP_CFLAGS+= -mtune=nocona -W -Wall -O4
> BASE_LDFLAGS+=-Wl,-T,$(SRC_PATH)/$(ARCH).ld
> endif
That works. Thanks.
J
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: axel @ 2007-03-17 7:35 UTC (permalink / raw)
To: qemu-devel
On Friday 16 March 2007 20:30, Igor Kovalenko wrote:
> On 3/16/07, Julian Seward <jseward@acm.org> wrote:
> > I'm seeing redundant repz (0xF3) prefixes in generated code, typically
> > just before jumps:
> >
> > <code_gen_buffer+415>: repz mov $0xe07f,%eax
> > <code_gen_buffer+421>: mov %eax,0x20(%rbp)
> > <code_gen_buffer+424>: lea -25168302(%rip),%ebx # 0xaf0420 <tbs+96>
> > <code_gen_buffer+430>: retq
> > <code_gen_buffer+431>: mov -25168245(%rip),%eax # 0xaf0460 <tbs+160>
> > <code_gen_buffer+437>: jmpq *%rax
> > <code_gen_buffer+439>: repz mov $0xe092,%eax
> > <code_gen_buffer+445>: mov %eax,0x20(%rbp)
> > <code_gen_buffer+448>: lea -25168325(%rip),%ebx # 0xaf0421 <tbs+97>
> > <code_gen_buffer+454>: retq
> >
> > I assume these are something to do with translation chaining/unchaining
> > but have been unable to figure out where they come from. I know they get
> > executed and so are not data - valgrind barfs on them.
> >
> > This is on a 64-bit host (Core 2) with qemu-0.9.0 compiled from source by
> > gcc-3.4.6, running an x86 (32-bit) guest.
> >
> > At a guess I'd say the mov $imm,%eax is (created by? to do with?)
> > gen_jmp_im in target-i386/translate.c, but I don't see how the F3
> > got in on the act. Grepping the source for 0xF3 turns up nothing
> > plausible. Any ideas where it comes from and how to get rid of it?
>
> Try -mtune=nocona something like the following
IMHO one should change dyngen. Below is a hack (ELF only, I cannot test the COFF
branch). It works for amd64->amd64 (tested with -no-kqemu), but it is not safe,
because the instruction before the ret may contain the 0xf3 byte as an immediate
operand.
A full solution would disassemble the whole function, determine the
instruction boundaries and then decide where to cut the block to copy. Perhaps
one could then also detect multiple returns in a function and try to rewrite
the opcode blocks, replacing the multiple returns with jumps.
Why are there two different blocks for COFF and ELF on x86/x86_64 hosts?
Axel
Index: dyngen.c
===================================================================
RCS file: /sources/qemu/qemu/dyngen.c,v
retrieving revision 1.49
diff -u -r1.49 dyngen.c
--- dyngen.c 4 Mar 2007 00:52:16 -0000 1.49
+++ dyngen.c 17 Mar 2007 07:19:41 -0000
@@ -1458,6 +1458,8 @@
error("empty code for %s", name);
if (p_end[-1] == 0xc3) {
len--;
+ if ( len>0 && p_end[-2] == 0xf3 )
+ --len;
} else {
error("ret or jmp expected at the end of %s", name);
}
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Johannes Schindelin @ 2007-03-17 9:51 UTC (permalink / raw)
To: axel; +Cc: qemu-devel
Hi,
On Sat, 17 Mar 2007, axel wrote:
> Why are there two different blocks for COFF and ELF on x86/x86_64
> hosts?
Because COFF is used by Windows, and ELF by Linux, and they are
substantially different?
> @@ -1458,6 +1458,8 @@
> error("empty code for %s", name);
> if (p_end[-1] == 0xc3) {
> len--;
> + if ( len>0 && p_end[-2] == 0xf3 )
> + --len;
This is wrong on several counts:
- style (space after opening parentheses and before closing parentheses,
no space before and after ">", "--" before instead of after "len", just
see the if clause above)
- if you want to access "p_end[-2]", you must check for "len > 1"
- you most likely want to check "p_end[-1]" anyway
- worst: there is no appropriate explanation why this patch is needed, and
even more importantly, why it does not break existing code
Hth,
Dscho
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Axel Zeuner @ 2007-03-17 11:16 UTC (permalink / raw)
To: qemu-devel
Hi,
On Saturday 17 March 2007 10:51, Johannes Schindelin wrote:
> Hi,
>
> On Sat, 17 Mar 2007, axel wrote:
> > Why are there two different blocks for COFF and ELF on x86/x86_64
> > hosts?
>
> Because COFF is used by Windows, and ELF by Linux, and they are
> substantially different?
>
Sorry, I did not want to criticise the code, I apologise for that.
But do these blocks do different things? Both should check the last byte,
strip off trailing padding bytes including the ret instruction, and determine
the size of the block of code to copy. Am I really wrong here?
>
> > @@ -1458,6 +1458,8 @@
> > error("empty code for %s", name);
> > if (p_end[-1] == 0xc3) {
> > len--;
> > + if ( len>0 && p_end[-2] == 0xf3 )
> > + --len;
>
> This is wrong on several counts:
>
> - style (space after opening parentheses and before closing parentheses,
> no space before and after ">", "--" before instead of after "len", just
> see the if clause above)
I agree, sorry for that, next time I will follow the coding rules. Most of my
time I use C++ and there it makes sense to prefer prefix decrement and prefix
increment operations for performance reasons.
> - if you want to access "p_end[-2]", you must check for "len > 1"
I do not agree, because len was decremented in the line above and len is a
signed int and p_end was not changed.
>
> - you most likely want to check "p_end[-1]" anyway
No, because p_end[-1] was already checked and is known to be 0xc3. I want to
check the byte before p_end[-1], because repz; ret translates to 0xf3 0xc3
>
> - worst: there is no appropriate explanation why this patch is needed, and
The currently generated op_XXX functions are not affected by the stale repz
prefixes at the end of the generated and copied blocks, but the following
scenario is possible, at least in theory:
op_1:
movl $0,%ecx
do_what_ever_but_do_not_change_ecx
repz; ret
op_2:
stosd
ret
Now the following op code sequence op_1, op_2 is generated. The resulting code
in the code generation buffer will be
movl $0,%ecx
do_what_ever_but_do_not_change_ecx
repz; # stale from op_1
stosd; # body of op_2
This is probably not what one wants to execute.
> even more importantly, why it does not break existing code
I agree fully, as I mentioned, this is a HACK and WILL break existing code
sooner or later.
Kind regards
Axel
> Hth,
> Dscho