[PATCH] x86: x86-opcode-map.txt: explain CALLW discrepancy between Intel and AMD

public inbox for linux-kernel@vger.kernel.org

* [PATCH] x86: x86-opcode-map.txt: explain CALLW discrepancy between Intel and AMD
@ 2015-02-12 19:06 Denys Vlasenko
  2015-02-13 12:01 ` Borislav Petkov
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Denys Vlasenko @ 2015-02-12 19:06 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Denys Vlasenko, Ingo Molnar, Oleg Nesterov, linux-kernel

In 64-bit mode, AMD and Intel CPUs treat the 0x66 prefix before branch
insns differently. For near branches, it affects decoding too, since
the width of the immediate offset differs.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
CC: Ingo Molnar <mingo@kernel.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: linux-kernel@vger.kernel.org
---
 arch/x86/lib/x86-opcode-map.txt | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 1a2be7c..816488c 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -273,6 +273,9 @@ dd: ESC
 de: ESC
 df: ESC
 # 0xe0 - 0xef
+# Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
+# in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
+# to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.
 e0: LOOPNE/LOOPNZ Jb (f64)
 e1: LOOPE/LOOPZ Jb (f64)
 e2: LOOP Jb (f64)
@@ -281,6 +284,10 @@ e4: IN AL,Ib
 e5: IN eAX,Ib
 e6: OUT Ib,AL
 e7: OUT Ib,eAX
+# With 0x66 prefix in 64-bit mode, for AMD CPUs immediate offset
+# in "near" jumps and calls is 16-bit. For CALL,
+# push of return address is 16-bit wide, RSP is decremented by 2
+# but is not truncated to 16 bits, unlike RIP.
 e8: CALL Jz (f64)
 e9: JMP-near Jz (f64)
 ea: JMP-far Ap (i64)
@@ -456,6 +463,7 @@ AVXcode: 1
 7e: movd/q Ey,Pd | vmovd/q Ey,Vy (66),(v1) | vmovq Vq,Wq (F3),(v1)
 7f: movq Qq,Pq | vmovdqa Wx,Vx (66) | vmovdqu Wx,Vx (F3)
 # 0x0f 0x80-0x8f
+# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
 80: JO Jz (f64)
 81: JNO Jz (f64)
 82: JB/JC/JNAE Jz (f64)
@@ -842,6 +850,7 @@ EndTable
 GrpTable: Grp5
 0: INC Ev
 1: DEC Ev
+# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
 2: CALLN Ev (f64)
 3: CALLF Ep
 4: JMPN Ev (f64)
-- 
1.8.1.4
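
To illustrate why the prefix "affects decode too", here is a minimal
sketch of the length of a near CALL (opcode 0xe8, operand type Jz) in
64-bit mode under the two behaviors described in the patch comments.
The function names and the "intel"/"amd" strings are hypothetical and
made up for this illustration; this is not the kernel decoder.

```python
# Hypothetical model, not the kernel's insn decoder: size of the "Jz"
# immediate of a near CALL/JMP (0xe8/0xe9) in 64-bit mode.

def jz_imm_bytes(has_66_prefix: bool, vendor: str) -> int:
    """Return the immediate size in bytes for a near branch in 64-bit mode."""
    if vendor == "intel":
        # "forced64" behavior: the 0x66 prefix is ignored, rel32 is used.
        return 4
    if vendor == "amd":
        # 0x66 selects a 16-bit operand size, so the offset is rel16.
        return 2 if has_66_prefix else 4
    raise ValueError(f"unknown vendor: {vendor!r}")


def near_call_length(has_66_prefix: bool, vendor: str) -> int:
    """Total length: optional prefix byte + opcode byte + immediate."""
    return int(has_66_prefix) + 1 + jz_imm_bytes(has_66_prefix, vendor)


if __name__ == "__main__":
    for vendor in ("intel", "amd"):
        # Same byte stream "66 e8 ...", different instruction boundaries.
        print(vendor, near_call_length(True, vendor))
```

With the prefix present the two models disagree on where the next
instruction starts, which is why a single decode table cannot describe
both vendors for this byte sequence.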



* Re: [PATCH] x86: x86-opcode-map.txt: explain CALLW discrepancy between Intel and AMD
  2015-02-12 19:06 [PATCH] x86: x86-opcode-map.txt: explain CALLW discrepancy between Intel and AMD Denys Vlasenko
@ 2015-02-13 12:01 ` Borislav Petkov
  2015-02-13 13:25   ` Denys Vlasenko
  2015-02-13 12:52 ` Masami Hiramatsu
  2015-02-19  0:25 ` [tip:x86/asm] x86/asm/decoder: Explain " tip-bot for Denys Vlasenko
  2 siblings, 1 reply; 6+ messages in thread
From: Borislav Petkov @ 2015-02-13 12:01 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Masami Hiramatsu, Ingo Molnar, Oleg Nesterov, linux-kernel

On Thu, Feb 12, 2015 at 08:06:57PM +0100, Denys Vlasenko wrote:
> In 64-bit mode, AMD and Intel CPUs treat 0x66 prefix before branch
> insns differently. For near branches, it affects decode too since
> immediate offset's width is different.
> 
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> CC: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: linux-kernel@vger.kernel.org
> ---
>  arch/x86/lib/x86-opcode-map.txt | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
> index 1a2be7c..816488c 100644
> --- a/arch/x86/lib/x86-opcode-map.txt
> +++ b/arch/x86/lib/x86-opcode-map.txt
> @@ -273,6 +273,9 @@ dd: ESC
>  de: ESC
>  df: ESC
>  # 0xe0 - 0xef
> +# Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
> +# in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
> +# to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.

Well, according to the SDM, Intel truncates too, see the LOOP/LOOPcc
Operation section:

	...
	IF BranchCond = 1
	THEN
	IF OperandSize = 32
	THEN EIP ← EIP + SignExtend(DEST);
	ELSE IF OperandSize = 64
	THEN RIP ← RIP + SignExtend(DEST);
	FI;
	ELSE IF OperandSize = 16
	THEN EIP ← EIP AND 0000FFFFH;		<---

and the text talks about 0x67, but that is the address-size prefix and
it is used to size the rCX register.

So something must be setting OperandSize, and the text doesn't mention
anywhere that 0x66 is ignored.

Or have you been doing some empirical experiments? :-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH] x86: x86-opcode-map.txt: explain CALLW discrepancy between Intel and AMD
  2015-02-12 19:06 [PATCH] x86: x86-opcode-map.txt: explain CALLW discrepancy between Intel and AMD Denys Vlasenko
  2015-02-13 12:01 ` Borislav Petkov
@ 2015-02-13 12:52 ` Masami Hiramatsu
  2015-02-19  0:25 ` [tip:x86/asm] x86/asm/decoder: Explain " tip-bot for Denys Vlasenko
  2 siblings, 0 replies; 6+ messages in thread
From: Masami Hiramatsu @ 2015-02-13 12:52 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Ingo Molnar, Oleg Nesterov, linux-kernel

(2015/02/13 4:06), Denys Vlasenko wrote:
> In 64-bit mode, AMD and Intel CPUs treat 0x66 prefix before branch
> insns differently. For near branches, it affects decode too since
> immediate offset's width is different.

You'd better add a link to your investigation report :)

http://marc.info/?l=linux-kernel&m=139714939728946&w=2

so that anyone can see what actually happens.

Thank you,



-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com




* Re: [PATCH] x86: x86-opcode-map.txt: explain CALLW discrepancy between Intel and AMD
  2015-02-13 12:01 ` Borislav Petkov
@ 2015-02-13 13:25   ` Denys Vlasenko
  2015-02-14  0:28     ` Borislav Petkov
  0 siblings, 1 reply; 6+ messages in thread
From: Denys Vlasenko @ 2015-02-13 13:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Denys Vlasenko, Masami Hiramatsu, Ingo Molnar, Oleg Nesterov,
	Linux Kernel Mailing List

On Fri, Feb 13, 2015 at 1:01 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Feb 12, 2015 at 08:06:57PM +0100, Denys Vlasenko wrote:
>> In 64-bit mode, AMD and Intel CPUs treat 0x66 prefix before branch
>> insns differently. For near branches, it affects decode too since
>> immediate offset's width is different.
>>
>> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>> CC: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
>> CC: Ingo Molnar <mingo@kernel.org>
>> CC: Oleg Nesterov <oleg@redhat.com>
>> CC: linux-kernel@vger.kernel.org
>> ---
>>  arch/x86/lib/x86-opcode-map.txt | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
>> index 1a2be7c..816488c 100644
>> --- a/arch/x86/lib/x86-opcode-map.txt
>> +++ b/arch/x86/lib/x86-opcode-map.txt
>> @@ -273,6 +273,9 @@ dd: ESC
>>  de: ESC
>>  df: ESC
>>  # 0xe0 - 0xef
>> +# Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
>> +# in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
>> +# to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.
>
> Well, according to the SDM, Intel truncates too, see the LOOP/LOOPcc
> Operation section:
>
>         ...
>         IF BranchCond = 1
>         THEN
>         IF OperandSize = 32
>         THEN EIP ← EIP + SignExtend(DEST);
>         ELSE IF OperandSize = 64
>         THEN RIP ← RIP + SignExtend(DEST);
>         FI;
>         ELSE IF OperandSize = 16
>         THEN EIP ← EIP AND 0000FFFFH;           <---
>
> and text talks about 0x67 but that's address size and it is used to size
> the rCX register.
>
> So something must be setting the OperandSize and text doesn't mention
> anywhere about 0x66 being ignored.
>
> Or have you been doing some empirical experiments? :-)

Yes, I did.

32-bit case: Intel CPU truncates EIP to 16 bits:

$ cat t.S
_start:         .globl  _start
1:  .byte 0x66
    loop 1b

$ gcc -nostartfiles -nostdlib -m32 t.S

$ objdump -dr a.out
a.out:     file format elf32-i386
Disassembly of section .text:
08048098 <_start>:
 8048098:    66                       data16
 8048099:    e2 fd                    loop   8048098 <_start>

$ gdb ./a.out
(gdb) run
Program received signal SIGSEGV, Segmentation fault.
0x00008098 in ?? ()


Now let's try the 64-bit version, compiling without -m32:

$ gcc -nostartfiles -nostdlib t.S
$ ./a.out
(runs without SEGV)
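
The faulting address in the 32-bit run can be checked by hand. The
sketch below redoes the branch-target arithmetic for the "e2 fd" LOOP
at 0x8048099 from the objdump listing above; the helper names are
hypothetical, and the 16-bit masking models the truncation that the
32-bit test shows.

```python
# Hypothetical arithmetic check of the LOOP target from the objdump above.

def sign_extend8(value: int) -> int:
    """Interpret an 8-bit value as a signed displacement."""
    return value - 0x100 if value & 0x80 else value


def loop_target(next_ip: int, rel8: int, operand_size: int) -> int:
    """Branch target, truncated to the effective operand size in bits."""
    target = next_ip + sign_extend8(rel8)
    return target & ((1 << operand_size) - 1)


# "e2 fd" sits at 0x8048099 and is 2 bytes long.
NEXT_IP = 0x8048099 + 2

if __name__ == "__main__":
    print(hex(loop_target(NEXT_IP, 0xFD, 32)))  # no prefix effect
    print(hex(loop_target(NEXT_IP, 0xFD, 16)))  # 0x66 honored
```

The 16-bit result is 0x8098, which matches the address gdb reports
for the SIGSEGV.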


* Re: [PATCH] x86: x86-opcode-map.txt: explain CALLW discrepancy between Intel and AMD
  2015-02-13 13:25   ` Denys Vlasenko
@ 2015-02-14  0:28     ` Borislav Petkov
  0 siblings, 0 replies; 6+ messages in thread
From: Borislav Petkov @ 2015-02-14  0:28 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Denys Vlasenko, Masami Hiramatsu, Ingo Molnar, Oleg Nesterov,
	Linux Kernel Mailing List

On Fri, Feb 13, 2015 at 02:25:20PM +0100, Denys Vlasenko wrote:
> > Well, according to the SDM, Intel truncates too, see the LOOP/LOOPcc
> > Operation section:
> >
> >         ...
> >         IF BranchCond = 1
> >         THEN
> >         IF OperandSize = 32
> >         THEN EIP ← EIP + SignExtend(DEST);
> >         ELSE IF OperandSize = 64
> >         THEN RIP ← RIP + SignExtend(DEST);
> >         FI;
> >         ELSE IF OperandSize = 16
> >         THEN EIP ← EIP AND 0000FFFFH;           <---
> >
> > and text talks about 0x67 but that's address size and it is used to size
> > the rCX register.
> >
> > So something must be setting the OperandSize and text doesn't mention
> > anywhere about 0x66 being ignored.
> >
> > Or have you been doing some empirical experiments? :-)
> 
> Yes, I did.
> 
> 32-bit case: Intel CPU truncates EIP to 16 bits:
> 
> $ cat t.S
> _start:         .globl  _start
> 1:  .byte 0x66
>     loop 1b
> 
> $ gcc -nostartfiles -nostdlib -m32 t.S
> 
> $ objdump -dr a.out
> a.out:     file format elf32-i386
> Disassembly of section .text:
> 08048098 <_start>:
>  8048098:    66                       data16
>  8048099:    e2 fd                    loop   8048098 <_start>
> 
> $ gdb ./a.out
> (gdb) run
> Program received signal SIGSEGV, Segmentation fault.
> 0x00008098 in ?? ()
> 
> 
> Now let's try 64-bit version - compiling without -m32:
> 
> $ gcc -nostartfiles -nostdlib t.S
> $ ./a.out
> (runs without SEGV)
> 

AMD CPU always truncates:

32-bit:  a.out[13626]: segfault at 8098 ip 0000000000008098 sp 00000000ffa0ea20 error 14 in a.out[8048000+1000]

64-bit:  a.out[13706]: segfault at d6 ip 00000000000000d6 sp 00007fffec14e870 error 14 in a.out[400000+1000]


Intel CPU:

32-bit:  a.out[3478]: segfault at 8098 ip 0000000000008098 sp 00000000ff959da0 error 14 in a.out[8048000+1000]

64-bit:

Make the loop terminate:

_start:         .globl  _start
    mov $1, %rcx
1:  .byte 0x66
    loop 1b


        a.out[3523]: segfault at 0 ip 00000000004000de sp 00007ffff31674e0 error 6 in a.out[400000+1000]

It segfaults because we don't have the libc glue around it, but rIP is intact.

So it looks like the SDM is wrong.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* [tip:x86/asm] x86/asm/decoder: Explain CALLW discrepancy between Intel and AMD
  2015-02-12 19:06 [PATCH] x86: x86-opcode-map.txt: explain CALLW discrepancy between Intel and AMD Denys Vlasenko
  2015-02-13 12:01 ` Borislav Petkov
  2015-02-13 12:52 ` Masami Hiramatsu
@ 2015-02-19  0:25 ` tip-bot for Denys Vlasenko
  2 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Denys Vlasenko @ 2015-02-19  0:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dvlasenk, hpa, linux-kernel, mingo, tglx, oleg,
	masami.hiramatsu.pt

Commit-ID:  cbb53b9623a70f012e1fdfb6fc0af6878df4762b
Gitweb:     http://git.kernel.org/tip/cbb53b9623a70f012e1fdfb6fc0af6878df4762b
Author:     Denys Vlasenko <dvlasenk@redhat.com>
AuthorDate: Thu, 12 Feb 2015 20:06:57 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 18 Feb 2015 21:01:59 +0100

x86/asm/decoder: Explain CALLW discrepancy between Intel and AMD

In 64-bit mode, AMD and Intel CPUs treat 0x66 prefix before
branch insns differently. For near branches, it affects decode
too since immediate offset's width is different.

See these empirical tests:

  http://marc.info/?l=linux-kernel&m=139714939728946&w=2

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/1423768017-31766-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/lib/x86-opcode-map.txt | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 1a2be7c..816488c 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -273,6 +273,9 @@ dd: ESC
 de: ESC
 df: ESC
 # 0xe0 - 0xef
+# Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
+# in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
+# to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.
 e0: LOOPNE/LOOPNZ Jb (f64)
 e1: LOOPE/LOOPZ Jb (f64)
 e2: LOOP Jb (f64)
@@ -281,6 +284,10 @@ e4: IN AL,Ib
 e5: IN eAX,Ib
 e6: OUT Ib,AL
 e7: OUT Ib,eAX
+# With 0x66 prefix in 64-bit mode, for AMD CPUs immediate offset
+# in "near" jumps and calls is 16-bit. For CALL,
+# push of return address is 16-bit wide, RSP is decremented by 2
+# but is not truncated to 16 bits, unlike RIP.
 e8: CALL Jz (f64)
 e9: JMP-near Jz (f64)
 ea: JMP-far Ap (i64)
@@ -456,6 +463,7 @@ AVXcode: 1
 7e: movd/q Ey,Pd | vmovd/q Ey,Vy (66),(v1) | vmovq Vq,Wq (F3),(v1)
 7f: movq Qq,Pq | vmovdqa Wx,Vx (66) | vmovdqu Wx,Vx (F3)
 # 0x0f 0x80-0x8f
+# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
 80: JO Jz (f64)
 81: JNO Jz (f64)
 82: JB/JC/JNAE Jz (f64)
@@ -842,6 +850,7 @@ EndTable
 GrpTable: Grp5
 0: INC Ev
 1: DEC Ev
+# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
 2: CALLN Ev (f64)
 3: CALLF Ep
 4: JMPN Ev (f64)
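
For reference, the AMD behavior described in the CALL comment above
can be modeled in a few lines. This is only a sketch of what the
comment states (16-bit push, RSP decremented by 2 at full width, RIP
truncated to 16 bits). The function name is hypothetical, the register
values in the example are made up, and it is assumed that the pushed
value is the low 16 bits of the return address.

```python
# Hypothetical model of "66 e8 rel16" in 64-bit mode on AMD, following
# the comment added to x86-opcode-map.txt. Not derived from vendor docs.

MASK64 = (1 << 64) - 1


def amd_callw(next_rip: int, rsp: int, rel16: int):
    """Return (new_rip, new_rsp, pushed_value) for a prefixed near CALL."""
    rel = rel16 - 0x10000 if rel16 & 0x8000 else rel16
    pushed = next_rip & 0xFFFF           # assumed: low 16 bits are pushed
    new_rsp = (rsp - 2) & MASK64         # RSP is not truncated to 16 bits
    new_rip = (next_rip + rel) & 0xFFFF  # RIP is truncated to 16 bits
    return new_rip, new_rsp, pushed


if __name__ == "__main__":
    # Made-up values for illustration only.
    rip, rsp, pushed = amd_callw(0x4000D8, 0x7FFF00001000, 0x0010)
    print(hex(rip), hex(rsp), hex(pushed))
```

The point of the model is the asymmetry: the stack pointer keeps all
64 bits while the instruction pointer loses everything above bit 15.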

