qemu-devel.nongnu.org archive mirror
* [Qemu-devel] patch for qemu with newer gcc-3.4.x (support repz retq optimization for amd processors correctly)
@ 2005-11-09 19:17 Igor Kovalenko
  2005-11-09 19:45 ` Paul Brook
  0 siblings, 1 reply; 9+ messages in thread
From: Igor Kovalenko @ 2005-11-09 19:17 UTC (permalink / raw)
  To: qemu-devel

Hi!

It turns out that newer gcc produces very interesting code
for op_goto_tbX, and possibly for other functions used by dyngen:
it adds a 'rep' prefix to the return instruction.
I have the following code in i386-softmmu/op.o:

00000000000084c0 <op_goto_tb0>:
     84c0:       8b 05 00 00 00 00       mov    0(%rip),%eax        # 84c6 <op_goto_tb0+0x6>
     84c6:       ff e0                   jmpq   *%eax
     84c8:       f3 c3                   repz retq
     84ca:       66                      data16
     84cb:       66                      data16
     84cc:       90                      nop
     84cd:       66                      data16
     84ce:       66                      data16
     84cf:       90                      nop

Quite obviously, stripping only the 'retq' in dyngen won't always
work, because the leftover 'rep' prefix can interfere with the appended code.
I found this while trying to run qemu under valgrind; see the bug page
http://bugs.kde.org/show_bug.cgi?id=115869 for details.
For example, at the very beginning of qemu booting the pc,
the following code is generated:

## ...
## 0x000fe07d:  je     0xfe092
##
0x016f75ec:  cmpb   $0x0,0x2c(%rbp)
0x016f75f0:  jne    0x16f75f7
0x016f75f2:  jmpq   0x16f760f

###the return from call
0x016f75f7:  mov    -13631729(%rip),%eax        # 0x9f750c
0x016f75fd:  jmpq   *%eax

0x016f75ff:  repz mov $0xe07f,%eax
0x016f7605:  mov    %eax,0x20(%rbp)

0x016f7608:  lea    -13631814(%rip),%ebx        # 0x9f74c8
0x016f760e:  retq

###the not zero branch
0x016f760f:  mov    -13631749(%rip),%eax        # 0x9f7510
0x016f7615:  jmpq   *%eax

0x016f7617:  repz mov $0xe092,%eax
0x016f761d:  mov    %eax,0x20(%rbp)

0x016f7620:  lea    -13631837(%rip),%ebx        # 0x9f74c9
0x016f7626:  retq

Notice the 'repz mov' sequence, which seems to be an undocumented
instruction. It seems to work somehow, but it chokes the valgrind decoder.
The following patch (against current CVS) fixes this problem;
please apply:

Index: dyngen.c
===================================================================
RCS file: /cvsroot/qemu/qemu/dyngen.c,v
retrieving revision 1.40
diff -u -r1.40 dyngen.c
--- dyngen.c    27 Apr 2005 19:55:58 -0000      1.40
+++ dyngen.c    9 Nov 2005 19:12:38 -0000
@@ -1387,6 +1387,12 @@
              error("empty code for %s", name);
          if (p_end[-1] == 0xc3) {
              len--;
+            /* This can be 'rep ; ret' optimized return sequence,
+             * need to check further and strip the 'rep' prefix
+             */
+            if (len != 0 && p_end[-2] == 0xf3) {
+                len--;
+            }
          } else {
              error("ret or jmp expected at the end of %s", name);
          }

-- 
Kind regards,
Igor V. Kovalenko

* Re: [Qemu-devel] patch for qemu with newer gcc-3.4.x (support repz retq optimization for amd processors correctly)
  2005-11-09 19:17 [Qemu-devel] patch for qemu with newer gcc-3.4.x (support repz retq optimization for amd processors correctly) Igor Kovalenko
@ 2005-11-09 19:45 ` Paul Brook
  2005-11-09 19:51   ` Igor Kovalenko
  2005-11-10 22:28   ` Igor Kovalenko
  0 siblings, 2 replies; 9+ messages in thread
From: Paul Brook @ 2005-11-09 19:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Igor Kovalenko

> Notice the 'repz mov' sequence, which seems to be undocumented
> instruction. It seems to work somehow but chokes valgrind decoder.
> The following patch (against current CVS) fixes this problem,

This patch is incorrect.

It could match any number of other instructions that happen to end in 0xf3, e.g.:

   0:   c7 45 00 00 00 00 f3    movl   $0xf3000000,0x0(%ebp)
   7:   c3                      ret

IIRC the "rep; ret" sequence is to avoid a pipeline stall on Athlon CPUs.  Try 
tuning for a different CPU.

Paul

* Re: [Qemu-devel] patch for qemu with newer gcc-3.4.x (support repz retq optimization for amd processors correctly)
  2005-11-09 19:45 ` Paul Brook
@ 2005-11-09 19:51   ` Igor Kovalenko
  2005-11-10  1:33     ` Julian Seward
  2005-11-10 22:28   ` Igor Kovalenko
  1 sibling, 1 reply; 9+ messages in thread
From: Igor Kovalenko @ 2005-11-09 19:51 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

Paul Brook wrote:
>> Notice the 'repz mov' sequence, which seems to be undocumented
>> instruction. It seems to work somehow but chokes valgrind decoder.
>> The following patch (against current CVS) fixes this problem,
> 
> This patch is incorrect.
> 
> It could match any number of other instructions that happen to end in 0xf3. eg
> 
>    0:   c7 45 00 00 00 00 f3    movl   $0xf3000000,0x0(%ebp)
>    7:   c3                      ret
> 
> IIRC the "rep; ret" sequence is to avoid a pipeline stall on Athlon CPUs.  Try 
> tuning for a different CPU.
> 
> Paul

OK I missed that...
Then a discussion about gcc-4 turns into something much more interesting :)

-- 
Kind regards,
Igor V. Kovalenko

* Re: [Qemu-devel] patch for qemu with newer gcc-3.4.x (support repz retq optimization for amd processors correctly)
  2005-11-09 19:51   ` Igor Kovalenko
@ 2005-11-10  1:33     ` Julian Seward
  2005-11-10  1:44       ` Jamie Lokier
  2005-11-10  1:54       ` Jim C. Brown
  0 siblings, 2 replies; 9+ messages in thread
From: Julian Seward @ 2005-11-10  1:33 UTC (permalink / raw)
  To: qemu-devel


The use of gcc to generate the back end in QEMU's early days was a 
clever way to get the project up and running quickly.  But surely
now it would be better to transition to a handwritten backend, so
as to be independent of future changes in gcc, and generally more robust?

J

* Re: [Qemu-devel] patch for qemu with newer gcc-3.4.x (support repz retq optimization for amd processors correctly)
  2005-11-10  1:33     ` Julian Seward
@ 2005-11-10  1:44       ` Jamie Lokier
  2005-11-10  3:35         ` Jim C. Brown
  2005-11-11  7:59         ` John R. Hogerhuis
  2005-11-10  1:54       ` Jim C. Brown
  1 sibling, 2 replies; 9+ messages in thread
From: Jamie Lokier @ 2005-11-10  1:44 UTC (permalink / raw)
  To: qemu-devel

> 
> The use of gcc to generate the back end in QEMU's early days was a 
> clever way to get the project up and running quickly.  But surely
> now it would be better to transition to a handwritten backend, so

It should be trivial to take the _currently_ generated GCC code for
all the architectures QEMU is commonly built on, and just distribute
that code with the QEMU source.

Then it would be independent of future changes to GCC.

I understand a handwritten backend is already being written.  But
until a proper one is done, wouldn't that serve as a useful stopgap?

-- Jamie

* Re: [Qemu-devel] patch for qemu with newer gcc-3.4.x (support repz retq optimization for amd processors correctly)
  2005-11-10  1:33     ` Julian Seward
  2005-11-10  1:44       ` Jamie Lokier
@ 2005-11-10  1:54       ` Jim C. Brown
  1 sibling, 0 replies; 9+ messages in thread
From: Jim C. Brown @ 2005-11-10  1:54 UTC (permalink / raw)
  To: Julian Seward; +Cc: qemu-devel

On Thu, Nov 10, 2005 at 01:33:55AM +0000, Julian Seward wrote:
> 
> The use of gcc to generate the back end in QEMU's early days was a 
> clever way to get the project up and running quickly.  But surely
> now it would be better to transition to a handwritten backend, so
> as to be independent future changes in gcc, and generally more robust?
> 
> J
> 

Yes, Paul Brook is working on it.

-- 
Infinite complexity begets infinite beauty.
Infinite precision begets infinite perfection.

* Re: [Qemu-devel] patch for qemu with newer gcc-3.4.x (support repz retq optimization for amd processors correctly)
  2005-11-10  1:44       ` Jamie Lokier
@ 2005-11-10  3:35         ` Jim C. Brown
  2005-11-11  7:59         ` John R. Hogerhuis
  1 sibling, 0 replies; 9+ messages in thread
From: Jim C. Brown @ 2005-11-10  3:35 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: qemu-devel

On Thu, Nov 10, 2005 at 01:44:04AM +0000, Jamie Lokier wrote:
> > 
> > The use of gcc to generate the back end in QEMU's early days was a 
> > clever way to get the project up and running quickly.  But surely
> > now it would be better to transition to a handwritten backend, so
> 
> It should be trivial to take the _currently_ generated GCC code for
> all the architectures QEMU is commonly built on, and just distribute
> that code with the QEMU source.
> 

You mean convert the code with gcc 3 into asm, and then distribute that?

I'm no expert, but I would imagine such a solution would be quite brittle.
That's assuming one can make gcc 3 assembly code work with gcc 4 (5?) code
to form a single object file.

> Then it would be independent of future changes to GCC.

Well, someone would still need to maintain all those assembly files.

Or else keep an old copy of gcc 3 around to regenerate them whenever needed.

> 
> I understand a handwritten backend is already being written.  But
> until a proper one is done, wouldn't that serve as a useful stopgap?
> 

I believe the current version works - but it doesn't implement every possible
op yet. For now, it relies on dyngen to produce the missing ops (until they are
replaced with the hand coded version).

> -- Jamie
> 

-- 
Infinite complexity begets infinite beauty.
Infinite precision begets infinite perfection.

* Re: [Qemu-devel] patch for qemu with newer gcc-3.4.x (support repz retq optimization for amd processors correctly)
  2005-11-09 19:45 ` Paul Brook
  2005-11-09 19:51   ` Igor Kovalenko
@ 2005-11-10 22:28   ` Igor Kovalenko
  1 sibling, 0 replies; 9+ messages in thread
From: Igor Kovalenko @ 2005-11-10 22:28 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

Paul Brook wrote:
>> Notice the 'repz mov' sequence, which seems to be undocumented
>> instruction. It seems to work somehow but chokes valgrind decoder.
>> The following patch (against current CVS) fixes this problem,
> 
> This patch is incorrect.
> 
> It could match any number of other instructions that happen to end in 0xf3. eg
> 
>    0:   c7 45 00 00 00 00 f3    movl   $0xf3000000,0x0(%ebp)
>    7:   c3                      ret
> 
> IIRC the "rep; ret" sequence is to avoid a pipeline stall on Athlon CPUs.  Try 
> tuning for a different CPU.
> 
> Paul

I was able to work around the 'rep ; ret' generation using these gcc switches:
-mcpu=itanium2 -mtune=nocona

My system is running on amd64, and gcc required both -mcpu and -mtune.

-- 
Kind regards,
Igor V. Kovalenko

* Re: [Qemu-devel] patch for qemu with newer gcc-3.4.x (support repz retq optimization for amd processors correctly)
  2005-11-10  1:44       ` Jamie Lokier
  2005-11-10  3:35         ` Jim C. Brown
@ 2005-11-11  7:59         ` John R. Hogerhuis
  1 sibling, 0 replies; 9+ messages in thread
From: John R. Hogerhuis @ 2005-11-11  7:59 UTC (permalink / raw)
  To: qemu-devel

On Thu, 2005-11-10 at 01:44 +0000, Jamie Lokier wrote:
> > 
> > The use of gcc to generate the back end in QEMU's early days was a 
> > clever way to get the project up and running quickly.  But surely
> > now it would be better to transition to a handwritten backend, so
> 
> It should be trivial to take the _currently_ generated GCC code for
> all the architectures QEMU is commonly built on, and just distribute
> that code with the QEMU source.
> 
> Then it would be independent of future changes to GCC.
> 
> I understand a handwritten backend is already being written.  But
> until a proper one is done, wouldn't that serve as a useful stopgap?
> 
> -- Jamie
> 


If you poke around in the archives, this idea was raised before (by me,
maybe). I think it would work, but I didn't hear a lot of enthusiasm for
it. I think it would also open the door for hand-tweaking critical areas
for performance.

Paul Brook has another take, and the benefit of his is you can look at
real life code to see how it will work.

-- John.
