* [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Julian Seward @ 2007-03-16 14:15 UTC
To: qemu-devel

I'm seeing redundant repz (0xF3) prefixes in generated code, typically
just before jumps:

  <code_gen_buffer+415>: repz mov $0xe07f,%eax
  <code_gen_buffer+421>: mov %eax,0x20(%rbp)
  <code_gen_buffer+424>: lea -25168302(%rip),%ebx # 0xaf0420 <tbs+96>
  <code_gen_buffer+430>: retq
  <code_gen_buffer+431>: mov -25168245(%rip),%eax # 0xaf0460 <tbs+160>
  <code_gen_buffer+437>: jmpq *%rax
  <code_gen_buffer+439>: repz mov $0xe092,%eax
  <code_gen_buffer+445>: mov %eax,0x20(%rbp)
  <code_gen_buffer+448>: lea -25168325(%rip),%ebx # 0xaf0421 <tbs+97>
  <code_gen_buffer+454>: retq

I assume these have something to do with translation chaining/unchaining,
but I have been unable to figure out where they come from. I know they get
executed and so are not data: valgrind barfs on them.

This is on a 64-bit host (Core 2) with qemu-0.9.0 compiled from source by
gcc-3.4.6, running an x86 (32-bit) guest.

At a guess I'd say the mov $imm,%eax is (created by? to do with?)
gen_jmp_im in target-i386/translate.c, but I don't see how the F3
got in on the act. Grepping the source for 0xF3 turns up nothing
plausible. Any ideas where it comes from and how to get rid of it?

J
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Paul Brook @ 2007-03-16 14:28 UTC
To: qemu-devel

On Friday 16 March 2007 14:15, Julian Seward wrote:
> I'm seeing redundant repz (0xF3) prefixes in generated code, typically
> just before jumps:
>
> <code_gen_buffer+415>: repz mov $0xe07f,%eax
> <code_gen_buffer+421>: mov %eax,0x20(%rbp)
> <code_gen_buffer+424>: lea -25168302(%rip),%ebx # 0xaf0420 <tbs+96>
> <code_gen_buffer+430>: retq
> <code_gen_buffer+431>: mov -25168245(%rip),%eax # 0xaf0460 <tbs+160>
> <code_gen_buffer+437>: jmpq *%rax
> <code_gen_buffer+439>: repz mov $0xe092,%eax
> <code_gen_buffer+445>: mov %eax,0x20(%rbp)
> <code_gen_buffer+448>: lea -25168325(%rip),%ebx # 0xaf0421 <tbs+97>
> <code_gen_buffer+454>: retq
>
> I assume these are something to do with translation chaining/unchaining but
> have been unable to figure out where they come from.

0000000000008b50 <op_goto_tb1>:
    8b50: 8b 05 00 00 00 00    mov 0(%rip),%eax
            8b52: R_X86_64_PC32 __op_param1+0x3c
    8b56: ff e0                jmpq *%rax
    8b58: f3 c3                repz retq

qemu only strips the final ret off.
The prefixed ret is to avoid prefetch stalls on amd cpus.

Paul
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Julian Seward @ 2007-03-16 14:45 UTC
To: qemu-devel

On Friday 16 March 2007 14:28, Paul Brook wrote:
> On Friday 16 March 2007 14:15, Julian Seward wrote:
> > I'm seeing redundant repz (0xF3) prefixes in generated code, typically
> > just before jumps:
> >
> > <code_gen_buffer+415>: repz mov $0xe07f,%eax
> > <code_gen_buffer+421>: mov %eax,0x20(%rbp)
> > <code_gen_buffer+424>: lea -25168302(%rip),%ebx # 0xaf0420 <tbs+96>
> > <code_gen_buffer+430>: retq
> > <code_gen_buffer+431>: mov -25168245(%rip),%eax # 0xaf0460 <tbs+160>
> > <code_gen_buffer+437>: jmpq *%rax
> > <code_gen_buffer+439>: repz mov $0xe092,%eax
> > <code_gen_buffer+445>: mov %eax,0x20(%rbp)
> > <code_gen_buffer+448>: lea -25168325(%rip),%ebx # 0xaf0421 <tbs+97>
> > <code_gen_buffer+454>: retq
> >
> > I assume these are something to do with translation chaining/unchaining
> > but have been unable to figure out where they come from.
>
> 0000000000008b50 <op_goto_tb1>:
>     8b50: 8b 05 00 00 00 00    mov 0(%rip),%eax
>             8b52: R_X86_64_PC32 __op_param1+0x3c
>     8b56: ff e0                jmpq *%rax
>     8b58: f3 c3                repz retq
>
> qemu only strips the final ret off.
> The prefixed ret is to avoid prefetch stalls on amd cpus.

So the implication of this is that the generated code just happens to
work only because the dangling F3 never ends up in front of some other
instruction which it would change the meaning of?

J
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Paul Brook @ 2007-03-16 18:14 UTC
To: qemu-devel

> > 0000000000008b50 <op_goto_tb1>:
> >     8b50: 8b 05 00 00 00 00    mov 0(%rip),%eax
> >             8b52: R_X86_64_PC32 __op_param1+0x3c
> >     8b56: ff e0                jmpq *%rax
> >     8b58: f3 c3                repz retq
> >
> > qemu only strips the final ret off.
> > The prefixed ret is to avoid prefetch stalls on amd cpus.
>
> So the implication of this is that the generated code just happens to
> work only because the dangling F3 never ends up in front of some other
> instruction which it would change the meaning of?

Correct.

Paul
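
The point confirmed above (only the final ret byte is stripped, so the F3
prefix is left behind) can be illustrated with a minimal sketch. This is not
dyngen's actual code: the helper name strip_final_ret_naive and the array
op_goto_tb1_bytes are hypothetical, and the bytes are simply those of the
op_goto_tb1 dump quoted in this thread, with the relocation field left as
zeros.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of op_goto_tb1 as shown in the objdump output quoted in the thread
 * (the 4-byte relocation field is left as zeros, as in the dump). */
static const unsigned char op_goto_tb1_bytes[] = {
    0x8b, 0x05, 0x00, 0x00, 0x00, 0x00, /* mov 0(%rip),%eax */
    0xff, 0xe0,                         /* jmpq *%rax       */
    0xf3, 0xc3                          /* repz retq        */
};

/* Hypothetical helper: return how many bytes would be copied if only a
 * trailing ret (0xc3) is stripped. If the block does not end in 0xc3, the
 * length is returned unchanged. */
static size_t strip_final_ret_naive(const unsigned char *code, size_t len)
{
    assert(code != NULL);
    if (len == 0 || code[len - 1] != 0xc3)
        return len;
    return len - 1; /* drops 0xc3 only; a preceding 0xf3 stays in the copy */
}
```

Applied to the ten bytes above, the helper returns 9, and the last copied
byte is 0xf3, which is the dangling prefix that then shows up in front of
whatever is emitted next.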
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Igor Kovalenko @ 2007-03-16 19:30 UTC
To: qemu-devel

On 3/16/07, Julian Seward <jseward@acm.org> wrote:
>
> I'm seeing redundant repz (0xF3) prefixes in generated code, typically
> just before jumps:
>
> <code_gen_buffer+415>: repz mov $0xe07f,%eax
> <code_gen_buffer+421>: mov %eax,0x20(%rbp)
> <code_gen_buffer+424>: lea -25168302(%rip),%ebx # 0xaf0420 <tbs+96>
> <code_gen_buffer+430>: retq
> <code_gen_buffer+431>: mov -25168245(%rip),%eax # 0xaf0460 <tbs+160>
> <code_gen_buffer+437>: jmpq *%rax
> <code_gen_buffer+439>: repz mov $0xe092,%eax
> <code_gen_buffer+445>: mov %eax,0x20(%rbp)
> <code_gen_buffer+448>: lea -25168325(%rip),%ebx # 0xaf0421 <tbs+97>
> <code_gen_buffer+454>: retq
>
> I assume these are something to do with translation chaining/unchaining but
> have been unable to figure out where they come from. I know they get executed
> are so are not data - valgrind barfs on them.
>
> This is on a 64-bit host (Core 2) with qemu-0.9.0 compiled from source by
> gcc-3.4.6, running an x86 (32-bit) guest.
>
> At a guess I'd say the mov $imm,%eax is (created by? to do with?)
> gen_jmp_im in target-i386/translate.c, but I don't see how the F3
> got in on the act. Grepping the source for 0xF3 turns up nothing
> plausible. Any ideas where it comes from and how to get rid of it?
>

Try -mtune=nocona, something like the following:

Index: Makefile.target
===================================================================
RCS file: /cvsroot/qemu/qemu/Makefile.target,v
retrieving revision 1.147
diff -u -r1.147 Makefile.target
--- Makefile.target 28 Feb 2007 21:36:41 -0000 1.147
+++ Makefile.target 16 Mar 2007 19:29:04 -0000
@@ -99,6 +99,7 @@
 endif
 ifeq ($(ARCH),x86_64)
+OP_CFLAGS+= -mtune=nocona -W -Wall -O4
 BASE_LDFLAGS+=-Wl,-T,$(SRC_PATH)/$(ARCH).ld
 endif

--
Kind regards,
Igor V. Kovalenko
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Julian Seward @ 2007-03-16 23:06 UTC
To: qemu-devel

> ifeq ($(ARCH),x86_64)
> +OP_CFLAGS+= -mtune=nocona -W -Wall -O4
> BASE_LDFLAGS+=-Wl,-T,$(SRC_PATH)/$(ARCH).ld
> endif

That works. Thanks.

J
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: axel @ 2007-03-17 7:35 UTC
To: qemu-devel

On Friday 16 March 2007 20:30, Igor Kovalenko wrote:
> On 3/16/07, Julian Seward <jseward@acm.org> wrote:
> > I'm seeing redundant repz (0xF3) prefixes in generated code, typically
> > just before jumps:
> >
> > <code_gen_buffer+415>: repz mov $0xe07f,%eax
> > <code_gen_buffer+421>: mov %eax,0x20(%rbp)
> > <code_gen_buffer+424>: lea -25168302(%rip),%ebx # 0xaf0420 <tbs+96>
> > <code_gen_buffer+430>: retq
> > <code_gen_buffer+431>: mov -25168245(%rip),%eax # 0xaf0460 <tbs+160>
> > <code_gen_buffer+437>: jmpq *%rax
> > <code_gen_buffer+439>: repz mov $0xe092,%eax
> > <code_gen_buffer+445>: mov %eax,0x20(%rbp)
> > <code_gen_buffer+448>: lea -25168325(%rip),%ebx # 0xaf0421 <tbs+97>
> > <code_gen_buffer+454>: retq
> >
> > I assume these are something to do with translation chaining/unchaining
> > but have been unable to figure out where they come from. I know they get
> > executed are so are not data - valgrind barfs on them.
> >
> > This is on a 64-bit host (Core 2) with qemu-0.9.0 compiled from source by
> > gcc-3.4.6, running an x86 (32-bit) guest.
> >
> > At a guess I'd say the mov $imm,%eax is (created by? to do with?)
> > gen_jmp_im in target-i386/translate.c, but I don't see how the F3
> > got in on the act. Grepping the source for 0xF3 turns up nothing
> > plausible. Any ideas where it comes from and how to get rid of it?
>
> Try -mtune=nocona something like the following

IMHO one should change dyngen. Below is a hack (ELF only, I cannot test the
COFF branch). It works for amd64->amd64 (tested with -no-kqemu), but it is
not safe, because the instruction before the ret may contain the 0xf3 byte
as an immediate operand.

A full solution would disassemble the whole function, determine the
boundaries of each instruction and then decide where to cut the block to
copy. Perhaps one could then also detect multiple returns in a function and
try to rewrite the opcode blocks, replacing the multiple returns with jumps.

Why do two different blocks exist for COFF and ELF for x86/x86_64 hosts?

Axel

Index: dyngen.c
===================================================================
RCS file: /sources/qemu/qemu/dyngen.c,v
retrieving revision 1.49
diff -u -r1.49 dyngen.c
--- dyngen.c 4 Mar 2007 00:52:16 -0000 1.49
+++ dyngen.c 17 Mar 2007 07:19:41 -0000
@@ -1458,6 +1458,8 @@
     error("empty code for %s", name);
 if (p_end[-1] == 0xc3) {
     len--;
+    if ( len>0 && p_end[-2] == 0xf3 )
+        --len;
 } else {
     error("ret or jmp expected at the end of %s", name);
 }
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Johannes Schindelin @ 2007-03-17 9:51 UTC
To: axel; +Cc: qemu-devel

Hi,

On Sat, 17 Mar 2007, axel wrote:
> Why there exist two different blocks for COFF and ELF for x86/x86_64
> hosts?

Because COFF is used by Windows, and ELF by Linux, and they are
substantially different?

> @@ -1458,6 +1458,8 @@
>     error("empty code for %s", name);
> if (p_end[-1] == 0xc3) {
>     len--;
> +    if ( len>0 && p_end[-2] == 0xf3 )
> +        --len;

This is wrong on several counts:

- style (space after opening parentheses and before closing parentheses,
  no space before and after ">", "--" before instead of after "len", just
  see the if clause above)

- if you want to access "p_end[-2]", you must check for "len > 1"

- you most likely want to check "p_end[-1]" anyway

- worst: there is no appropriate explanation why this patch is needed, and
  even more importantly, why it does not break existing code

Hth,
Dscho
* Re: [Qemu-devel] Redundant repz prefixes in generated amd64 code
From: Axel Zeuner @ 2007-03-17 11:16 UTC
To: qemu-devel

Hi,

On Saturday 17 March 2007 10:51, Johannes Schindelin wrote:
> Hi,
>
> On Sat, 17 Mar 2007, axel wrote:
> > Why there exist two different blocks for COFF and ELF for x86/x86_64
> > hosts?
>
> Because COFF is used by Windows, and ELF by Linux, and they are
> substantially different?
>

Sorry, I did not want to criticise the code, I apologise for that. But do
these blocks do different things? They should check the last byte, strip
off trailing padding bytes including the ret instruction, and determine the
size of the block of code to copy. Am I really wrong here?

> > @@ -1458,6 +1458,8 @@
> >     error("empty code for %s", name);
> > if (p_end[-1] == 0xc3) {
> >     len--;
> > +    if ( len>0 && p_end[-2] == 0xf3 )
> > +        --len;
>
> This is wrong in several accounts:
>
> - style (space after opening parentheses and before closing parentheses,
> no space before and after ">", "--" before instead of after "len", just
> see the if clause above)

I agree, sorry for that, next time I will follow the coding rules. Most of
my time I use C++, and there it makes sense to prefer prefix decrement and
prefix increment operations for performance reasons.

> - if you want to access "p_end[-2]", you must check for "len > 1"

I do not agree, because len was decremented in the line above, len is a
signed int, and p_end was not changed. Checking "len > 0" after the
decrement is therefore the same as checking "len > 1" before it.

> - you most likely want to check "p_end[-1]" anyway

No, because p_end[-1] was already checked and is known to be 0xc3.
I want to check the byte before p_end[-1], because repz; ret translates to
0xf3 0xc3.

> - worst: there is no appropriate explanation why this patch is needed, and

The currently generated op_XXX functions are not affected by the stale repz
prefixes at the end of the generated and copied blocks, but the following
scenario is possible, at least in theory:

op_1:
    movl $0,%%ecx
    do_what_ever_but_do_not_change_ecx
    repz; ret

op_2:
    stosd
    ret

Now the op code sequence op_1, op_2 is generated. The resulting code in the
code generation buffer will be

    movl $0,%%ecx
    do_what_ever_but_do_not_change_ecx
    repz;   # stale from op_1
    stosd;  # body of op_2

This is probably not what one wants to execute.

> even more importantly, why it does not break existing code

I agree fully, as I mentioned, this is a HACK and WILL break existing code
sooner or later.

Kind regards
Axel
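
The op_1/op_2 scenario described in this message can be sketched at the byte
level. This is only an illustration under stated assumptions, not dyngen's
code: body_len and emit_pair are hypothetical names, the
"do_what_ever_but_do_not_change_ecx" part of op_1 is omitted, and the
encodings assumed are b9 imm32 for movl $imm,%ecx, ab for stos, f3 for the
repz prefix and c3 for ret.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* op_1 from the example: movl $0,%ecx ; repz ret */
static const unsigned char op_1[] = { 0xb9, 0x00, 0x00, 0x00, 0x00, 0xf3, 0xc3 };
/* op_2 from the example: stos ; ret */
static const unsigned char op_2[] = { 0xab, 0xc3 };

/* Hypothetical helper: number of bytes to copy from an op body.
 * The trailing ret (0xc3) is always dropped; if strip_prefix is non-zero,
 * an 0xf3 byte directly before it is dropped as well. */
static size_t body_len(const unsigned char *code, size_t len, int strip_prefix)
{
    assert(code != NULL);
    if (len == 0 || code[len - 1] != 0xc3)
        return len;
    len--;
    /* len > 0 after the decrement guarantees code[len - 1] is in bounds. */
    if (strip_prefix && len > 0 && code[len - 1] == 0xf3)
        len--;
    return len;
}

/* Hypothetical helper: copy the bodies of op_1 and op_2 back to back into
 * out, the way a code generation buffer is filled, and return the total
 * number of bytes written. out must hold at least 7 bytes. */
static size_t emit_pair(unsigned char *out, int strip_prefix)
{
    size_t n1 = body_len(op_1, sizeof(op_1), strip_prefix);
    size_t n2 = body_len(op_2, sizeof(op_2), strip_prefix);
    memcpy(out, op_1, n1);
    memcpy(out + n1, op_2, n2);
    return n1 + n2;
}
```

With strip_prefix set to 0 the buffer ends in f3 ab, so the stale prefix
from op_1 fuses with the stos from op_2; with strip_prefix set to 1 the f3
is gone. As already noted in the thread, the prefix-aware variant is still
only a heuristic: an f3 byte directly before the ret could also be the last
byte of an immediate operand, and telling the two cases apart would require
decoding the instructions.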