linux-perf-users.vger.kernel.org archive mirror
* [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions
@ 2024-05-02 10:58 Adrian Hunter
  2024-05-02 10:58 ` [PATCH 01/10] x86/insn: Add Key Locker instructions to the opcode map Adrian Hunter
                   ` (10 more replies)
  0 siblings, 11 replies; 18+ messages in thread
From: Adrian Hunter @ 2024-05-02 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chang S. Bae, Masami Hiramatsu, Nikolay Borisov, Borislav Petkov,
	Ingo Molnar, H. Peter Anvin, Dave Hansen, Thomas Gleixner, x86,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Ian Rogers,
	linux-perf-users

Hi

The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.

Note that there are 2 copies of the instruction decoder: one for the
kernel and one for the perf tools. Keeping separate copies is a policy
intended to prevent changes from breaking builds.
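
As a quick sanity check (not part of this series), the two copies are
expected to stay in sync, e.g.:

  $ diff -u arch/x86/lib/x86-opcode-map.txt tools/arch/x86/lib/x86-opcode-map.txt
  $ diff -u arch/x86/lib/insn.c tools/arch/x86/lib/insn.c

The perf build normally warns when the copies drift apart.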

Changes are based upon documents:

      Intel® Advanced Performance Extensions (Intel® APX)
      Architecture Specification
      April, 2024 Revision 4.0 355828-004

      Intel® Architecture Instruction Set Extensions
      and Future Features Programming Reference
      March 2024 319433-052

The patches depend on a patch by Chang S. Bae that was already submitted
as part of the Key Locker patch set.  Nevertheless, that patch is
included in this patch set as well.

The final patch updates the perf tools "x86 instruction decoder - new
instructions" test, which provides test coverage for the new instructions.
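
After applying the series, that test can be run with something like:

  $ perf test -v 'x86 instruction decoder'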


Adrian Hunter (9):
      x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map
      x86/insn: Add VEX versions of VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS
      x86/insn: Add misc new Intel instructions
      x86/insn: Add support for REX2 prefix to the instruction decoder logic
      x86/insn: Add support for REX2 prefix to the instruction decoder opcode map
      x86/insn: Add support for APX EVEX to the instruction decoder logic
      x86/insn: Add support for APX EVEX instructions to the opcode map
      perf intel pt: Add new JMPABS instruction to the Intel PT instruction decoder
      perf tests: Add APX and other new instructions to x86 instruction decoder test

Chang S. Bae (1):
      x86/insn: Add Key Locker instructions to the opcode map

 arch/x86/include/asm/inat.h                        |   17 +-
 arch/x86/include/asm/insn.h                        |   32 +-
 arch/x86/lib/insn.c                                |   29 +
 arch/x86/lib/x86-opcode-map.txt                    |  315 ++++--
 arch/x86/tools/gen-insn-attr-x86.awk               |   15 +-
 tools/arch/x86/include/asm/inat.h                  |   17 +-
 tools/arch/x86/include/asm/insn.h                  |   32 +-
 tools/arch/x86/lib/insn.c                          |   29 +
 tools/arch/x86/lib/x86-opcode-map.txt              |  315 ++++--
 tools/arch/x86/tools/gen-insn-attr-x86.awk         |   15 +-
 tools/perf/arch/x86/tests/insn-x86-dat-32.c        |  116 +++
 tools/perf/arch/x86/tests/insn-x86-dat-64.c        | 1026 ++++++++++++++++++++
 tools/perf/arch/x86/tests/insn-x86-dat-src.c       |  597 ++++++++++++
 .../util/intel-pt-decoder/intel-pt-insn-decoder.c  |    9 +
 14 files changed, 2370 insertions(+), 194 deletions(-)


Regards
Adrian


* [PATCH 01/10] x86/insn: Add Key Locker instructions to the opcode map
  2024-05-02 10:58 [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions Adrian Hunter
@ 2024-05-02 10:58 ` Adrian Hunter
  2024-05-02 10:58 ` [PATCH 02/10] x86/insn: Fix PUSH instruction in x86 instruction decoder " Adrian Hunter
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2024-05-02 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chang S. Bae, Masami Hiramatsu, Nikolay Borisov, Borislav Petkov,
	Ingo Molnar, H. Peter Anvin, Dave Hansen, Thomas Gleixner, x86,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Ian Rogers,
	linux-perf-users

From: "Chang S. Bae" <chang.seok.bae@intel.com>

The x86 instruction decoder needs to know about these new instructions,
which are going to be used in the crypto library as well as in the x86
core code. Add the following:

LOADIWKEY:
	Load a CPU-internal wrapping key.

ENCODEKEY128:
	Wrap a 128-bit AES key to a key handle.

ENCODEKEY256:
	Wrap a 256-bit AES key to a key handle.

AESENC128KL:
	Encrypt a 128-bit block of data using a 128-bit AES key
	indicated by a key handle.

AESENC256KL:
	Encrypt a 128-bit block of data using a 256-bit AES key
	indicated by a key handle.

AESDEC128KL:
	Decrypt a 128-bit block of data using a 128-bit AES key
	indicated by a key handle.

AESDEC256KL:
	Decrypt a 128-bit block of data using a 256-bit AES key
	indicated by a key handle.

AESENCWIDE128KL:
	Encrypt 8 128-bit blocks of data using a 128-bit AES key
	indicated by a key handle.

AESENCWIDE256KL:
	Encrypt 8 128-bit blocks of data using a 256-bit AES key
	indicated by a key handle.

AESDECWIDE128KL:
	Decrypt 8 128-bit blocks of data using a 128-bit AES key
	indicated by a key handle.

AESDECWIDE256KL:
	Decrypt 8 128-bit blocks of data using a 256-bit AES key
	indicated by a key handle.

Details can be found in the Intel Software Developer's Manual.
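
As an illustration only (not part of this patch), decoding one of these
instructions with the kernel decoder would look something like the sketch
below, assuming the encoding f3 0f 38 fa c0 for ENCODEKEY128 %eax,%eax:

	#include <asm/insn.h>

	/* Sketch: returns 0 if ENCODEKEY128 %eax,%eax decodes cleanly */
	static int check_encodekey128(void)
	{
		const unsigned char buf[] = { 0xf3, 0x0f, 0x38, 0xfa, 0xc0 };
		struct insn insn;

		/* insn_decode() returns 0 when the whole instruction was decoded */
		if (insn_decode(&insn, buf, sizeof(buf), INSN_MODE_64))
			return -1;

		return insn.length == sizeof(buf) ? 0 : -1;
	}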

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/lib/x86-opcode-map.txt       | 11 +++++++----
 tools/arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 12af572201a2..c94988d5130d 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 12af572201a2..c94988d5130d 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
-- 
2.34.1



* [PATCH 02/10] x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map
  2024-05-02 10:58 [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions Adrian Hunter
  2024-05-02 10:58 ` [PATCH 01/10] x86/insn: Add Key Locker instructions to the opcode map Adrian Hunter
@ 2024-05-02 10:58 ` Adrian Hunter
  2024-05-02 13:41   ` Arnaldo Carvalho de Melo
  2024-05-02 10:58 ` [PATCH 03/10] x86/insn: Add VEX versions of VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS Adrian Hunter
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Adrian Hunter @ 2024-05-02 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chang S. Bae, Masami Hiramatsu, Nikolay Borisov, Borislav Petkov,
	Ingo Molnar, H. Peter Anvin, Dave Hansen, Thomas Gleixner, x86,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Ian Rogers,
	linux-perf-users

The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.

The opcode 0x68 PUSH instruction is currently defined as 64-bit operand
size only, i.e. (d64). That was based on the Intel SDM opcode map.
However, that is contradicted by the Instruction Set Reference section
for PUSH in the same manual.

Remove 64-bit operand size only annotation from opcode 0x68 PUSH
instruction.

Example:

  $ cat pushw.s
  .global  _start
  .text
  _start:
          pushw   $0x1234
          mov     $0x1,%eax   # system call number (sys_exit)
          int     $0x80
  $ as -o pushw.o pushw.s
  $ ld -s -o pushw pushw.o
  $ objdump -d pushw | tail -4
  0000000000401000 <.text>:
    401000:       66 68 34 12             pushw  $0x1234
    401004:       b8 01 00 00 00          mov    $0x1,%eax
    401009:       cd 80                   int    $0x80
  $ perf record -e intel_pt//u ./pushw
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.014 MB perf.data ]

 Before:

  $ perf script --insn-trace=disasm
  Warning:
  1 instruction trace errors
           pushw   10349 [000] 10586.869237014:            401000 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw)           pushw $0x1234
           pushw   10349 [000] 10586.869237014:            401006 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw)           addb %al, (%rax)
           pushw   10349 [000] 10586.869237014:            401008 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw)           addb %cl, %ch
           pushw   10349 [000] 10586.869237014:            40100a [unknown] (/home/ahunter/git/misc/rtit-tests/pushw)           addb $0x2e, (%rax)
   instruction trace error type 1 time 10586.869237224 cpu 0 pid 10349 tid 10349 ip 0x40100d code 6: Trace doesn't match instruction

 After:

  $ perf script --insn-trace=disasm
             pushw   10349 [000] 10586.869237014:            401000 [unknown] (./pushw)           pushw $0x1234
             pushw   10349 [000] 10586.869237014:            401004 [unknown] (./pushw)           movl $1, %eax

Fixes: eb13296cfaf6 ("x86: Instruction decoder API")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/lib/x86-opcode-map.txt       | 2 +-
 tools/arch/x86/lib/x86-opcode-map.txt | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index c94988d5130d..4ea2e6adb477 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -148,7 +148,7 @@ AVXcode:
 65: SEG=GS (Prefix)
 66: Operand-Size (Prefix)
 67: Address-Size (Prefix)
-68: PUSH Iz (d64)
+68: PUSH Iz
 69: IMUL Gv,Ev,Iz
 6a: PUSH Ib (d64)
 6b: IMUL Gv,Ev,Ib
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index c94988d5130d..4ea2e6adb477 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -148,7 +148,7 @@ AVXcode:
 65: SEG=GS (Prefix)
 66: Operand-Size (Prefix)
 67: Address-Size (Prefix)
-68: PUSH Iz (d64)
+68: PUSH Iz
 69: IMUL Gv,Ev,Iz
 6a: PUSH Ib (d64)
 6b: IMUL Gv,Ev,Ib
-- 
2.34.1



* [PATCH 03/10] x86/insn: Add VEX versions of VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS
  2024-05-02 10:58 [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions Adrian Hunter
  2024-05-02 10:58 ` [PATCH 01/10] x86/insn: Add Key Locker instructions to the opcode map Adrian Hunter
  2024-05-02 10:58 ` [PATCH 02/10] x86/insn: Fix PUSH instruction in x86 instruction decoder " Adrian Hunter
@ 2024-05-02 10:58 ` Adrian Hunter
  2024-05-02 10:58 ` [PATCH 04/10] x86/insn: Add misc new Intel instructions Adrian Hunter
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2024-05-02 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chang S. Bae, Masami Hiramatsu, Nikolay Borisov, Borislav Petkov,
	Ingo Molnar, H. Peter Anvin, Dave Hansen, Thomas Gleixner, x86,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Ian Rogers,
	linux-perf-users

The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.

The Intel Architecture Instruction Set Extensions and Future Features
manual, number 319433-044 of May 2021, documented VEX versions of the
instructions VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS, but the opcode
map still lists them as EVEX-only.

Remove EVEX-only (ev) annotation from instructions VPDPBUSD, VPDPBUSDS,
VPDPWSSD and VPDPWSSDS, which allows them to be decoded with either a VEX
or EVEX prefix.
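
For example, a recent binutils that supports AVX-VNNI accepts the {vex}
pseudo-prefix, which gives a quick way to look at the VEX-encoded form
(sketch, assuming as and objdump are available):

  $ echo '{vex} vpdpbusd %xmm2, %xmm1, %xmm0' | as -o vnni.o -
  $ objdump -d vnni.o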

Fixes: 0153d98f2dd6 ("x86/insn: Add misc instructions to x86 instruction decoder")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/lib/x86-opcode-map.txt       | 8 ++++----
 tools/arch/x86/lib/x86-opcode-map.txt | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 4ea2e6adb477..24941b9d4e06 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -698,10 +698,10 @@ AVXcode: 2
 4d: vrcp14ss/d Vsd,Hpd,Wsd (66),(ev)
 4e: vrsqrt14ps/d Vpd,Wpd (66),(ev)
 4f: vrsqrt14ss/d Vsd,Hsd,Wsd (66),(ev)
-50: vpdpbusd Vx,Hx,Wx (66),(ev)
-51: vpdpbusds Vx,Hx,Wx (66),(ev)
-52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66),(ev) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
-53: vpdpwssds Vx,Hx,Wx (66),(ev) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
+50: vpdpbusd Vx,Hx,Wx (66)
+51: vpdpbusds Vx,Hx,Wx (66)
+52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
+53: vpdpwssds Vx,Hx,Wx (66) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
 54: vpopcntb/w Vx,Wx (66),(ev)
 55: vpopcntd/q Vx,Wx (66),(ev)
 58: vpbroadcastd Vx,Wx (66),(v)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 4ea2e6adb477..24941b9d4e06 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -698,10 +698,10 @@ AVXcode: 2
 4d: vrcp14ss/d Vsd,Hpd,Wsd (66),(ev)
 4e: vrsqrt14ps/d Vpd,Wpd (66),(ev)
 4f: vrsqrt14ss/d Vsd,Hsd,Wsd (66),(ev)
-50: vpdpbusd Vx,Hx,Wx (66),(ev)
-51: vpdpbusds Vx,Hx,Wx (66),(ev)
-52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66),(ev) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
-53: vpdpwssds Vx,Hx,Wx (66),(ev) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
+50: vpdpbusd Vx,Hx,Wx (66)
+51: vpdpbusds Vx,Hx,Wx (66)
+52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
+53: vpdpwssds Vx,Hx,Wx (66) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
 54: vpopcntb/w Vx,Wx (66),(ev)
 55: vpopcntd/q Vx,Wx (66),(ev)
 58: vpbroadcastd Vx,Wx (66),(v)
-- 
2.34.1



* [PATCH 04/10] x86/insn: Add misc new Intel instructions
  2024-05-02 10:58 [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions Adrian Hunter
                   ` (2 preceding siblings ...)
  2024-05-02 10:58 ` [PATCH 03/10] x86/insn: Add VEX versions of VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS Adrian Hunter
@ 2024-05-02 10:58 ` Adrian Hunter
  2024-05-02 10:58 ` [PATCH 05/10] x86/insn: Add support for REX2 prefix to the instruction decoder logic Adrian Hunter
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2024-05-02 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chang S. Bae, Masami Hiramatsu, Nikolay Borisov, Borislav Petkov,
	Ingo Molnar, H. Peter Anvin, Dave Hansen, Thomas Gleixner, x86,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Ian Rogers,
	linux-perf-users

The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.

Add instructions documented in the Intel Architecture Instruction Set
Extensions and Future Features Programming Reference, March 2024,
319433-052, that have not yet been added (an example of the opcode map
entry format follows the list):

	AADD
	AAND
	AOR
	AXOR
	CMPccXADD
	PBNDKB
	RDMSRLIST
	URDMSR
	UWRMSR
	VBCSTNEBF162PS
	VBCSTNESH2PS
	VCVTNEEBF162PS
	VCVTNEEPH2PS
	VCVTNEOBF162PS
	VCVTNEOPH2PS
	VCVTNEPS2BF16
	VPDPB[SU,UU,SS]D[,S]
	VPDPW[SU,US,UU]D[,S]
	VPMADD52HUQ
	VPMADD52LUQ
	VSHA512MSG1
	VSHA512MSG2
	VSHA512RNDS2
	VSM3MSG1
	VSM3MSG2
	VSM3RNDS2
	VSM4KEY4
	VSM4RNDS4
	WRMSRLIST
	TCMMIMFP16PS
	TCMMRLFP16PS
	TDPFP16PS
	PREFETCHIT1
	PREFETCHIT0
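
For reference, the added opcode map entries follow the existing syntax,
e.g. the atomic-add group added at opcode 0xfc of the 0x0f 0x38 map:

	fc: AADD My,Gy | AAND My,Gy (66) | AOR My,Gy (F2) | AXOR My,Gy (F3)

i.e. the opcode byte, then one variant per last-prefix: My,Gy operands
(ModRM memory destination, general register source, operand-size
dependent width), selected by no prefix, 0x66, 0xF2 or 0xF3 respectively.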

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/lib/x86-opcode-map.txt       | 57 +++++++++++++++++++++------
 tools/arch/x86/lib/x86-opcode-map.txt | 57 +++++++++++++++++++++------
 2 files changed, 90 insertions(+), 24 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 24941b9d4e06..4e9f53581b58 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -698,8 +698,8 @@ AVXcode: 2
 4d: vrcp14ss/d Vsd,Hpd,Wsd (66),(ev)
 4e: vrsqrt14ps/d Vpd,Wpd (66),(ev)
 4f: vrsqrt14ss/d Vsd,Hsd,Wsd (66),(ev)
-50: vpdpbusd Vx,Hx,Wx (66)
-51: vpdpbusds Vx,Hx,Wx (66)
+50: vpdpbusd Vx,Hx,Wx (66) | vpdpbssd Vx,Hx,Wx (F2),(v) | vpdpbsud Vx,Hx,Wx (F3),(v) | vpdpbuud Vx,Hx,Wx (v)
+51: vpdpbusds Vx,Hx,Wx (66) | vpdpbssds Vx,Hx,Wx (F2),(v) | vpdpbsuds Vx,Hx,Wx (F3),(v) | vpdpbuuds Vx,Hx,Wx (v)
 52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
 53: vpdpwssds Vx,Hx,Wx (66) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
 54: vpopcntb/w Vx,Wx (66),(ev)
@@ -708,7 +708,7 @@ AVXcode: 2
 59: vpbroadcastq Vx,Wx (66),(v) | vbroadcasti32x2 Vx,Wx (66),(evo)
 5a: vbroadcasti128 Vqq,Mdq (66),(v) | vbroadcasti32x4/64x2 Vx,Wx (66),(evo)
 5b: vbroadcasti32x8/64x4 Vqq,Mdq (66),(ev)
-5c: TDPBF16PS Vt,Wt,Ht (F3),(v1)
+5c: TDPBF16PS Vt,Wt,Ht (F3),(v1) | TDPFP16PS Vt,Wt,Ht (F2),(v1),(o64)
 # Skip 0x5d
 5e: TDPBSSD Vt,Wt,Ht (F2),(v1) | TDPBSUD Vt,Wt,Ht (F3),(v1) | TDPBUSD Vt,Wt,Ht (66),(v1) | TDPBUUD Vt,Wt,Ht (v1)
 # Skip 0x5f-0x61
@@ -718,10 +718,12 @@ AVXcode: 2
 65: vblendmps/d Vx,Hx,Wx (66),(ev)
 66: vpblendmb/w Vx,Hx,Wx (66),(ev)
 68: vp2intersectd/q Kx,Hx,Wx (F2),(ev)
-# Skip 0x69-0x6f
+# Skip 0x69-0x6b
+6c: TCMMIMFP16PS Vt,Wt,Ht (66),(v1),(o64) | TCMMRLFP16PS Vt,Wt,Ht (v1),(o64)
+# Skip 0x6d-0x6f
 70: vpshldvw Vx,Hx,Wx (66),(ev)
 71: vpshldvd/q Vx,Hx,Wx (66),(ev)
-72: vcvtne2ps2bf16 Vx,Hx,Wx (F2),(ev) | vcvtneps2bf16 Vx,Wx (F3),(ev) | vpshrdvw Vx,Hx,Wx (66),(ev)
+72: vcvtne2ps2bf16 Vx,Hx,Wx (F2),(ev) | vcvtneps2bf16 Vx,Wx (F3) | vpshrdvw Vx,Hx,Wx (66),(ev)
 73: vpshrdvd/q Vx,Hx,Wx (66),(ev)
 75: vpermi2b/w Vx,Hx,Wx (66),(ev)
 76: vpermi2d/q Vx,Hx,Wx (66),(ev)
@@ -777,8 +779,10 @@ ac: vfnmadd213ps/d Vx,Hx,Wx (66),(v)
 ad: vfnmadd213ss/d Vx,Hx,Wx (66),(v),(v1)
 ae: vfnmsub213ps/d Vx,Hx,Wx (66),(v)
 af: vfnmsub213ss/d Vx,Hx,Wx (66),(v),(v1)
-b4: vpmadd52luq Vx,Hx,Wx (66),(ev)
-b5: vpmadd52huq Vx,Hx,Wx (66),(ev)
+b0: vcvtneebf162ps Vx,Mx (F3),(!11B),(v) | vcvtneeph2ps Vx,Mx (66),(!11B),(v) | vcvtneobf162ps Vx,Mx (F2),(!11B),(v) | vcvtneoph2ps Vx,Mx (!11B),(v)
+b1: vbcstnebf162ps Vx,Mw (F3),(!11B),(v) | vbcstnesh2ps Vx,Mw (66),(!11B),(v)
+b4: vpmadd52luq Vx,Hx,Wx (66)
+b5: vpmadd52huq Vx,Hx,Wx (66)
 b6: vfmaddsub231ps/d Vx,Hx,Wx (66),(v)
 b7: vfmsubadd231ps/d Vx,Hx,Wx (66),(v)
 b8: vfmadd231ps/d Vx,Hx,Wx (66),(v)
@@ -796,16 +800,35 @@ c7: Grp19 (1A)
 c8: sha1nexte Vdq,Wdq | vexp2ps/d Vx,Wx (66),(ev)
 c9: sha1msg1 Vdq,Wdq
 ca: sha1msg2 Vdq,Wdq | vrcp28ps/d Vx,Wx (66),(ev)
-cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
-cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
-cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
+cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev) | vsha512rnds2 Vqq,Hqq,Udq (F2),(11B),(v)
+cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev) | vsha512msg1 Vqq,Udq (F2),(11B),(v)
+cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev) | vsha512msg2 Vqq,Uqq (F2),(11B),(v)
 cf: vgf2p8mulb Vx,Wx (66)
+d2: vpdpwsud Vx,Hx,Wx (F3),(v) | vpdpwusd Vx,Hx,Wx (66),(v) | vpdpwuud Vx,Hx,Wx (v)
+d3: vpdpwsuds Vx,Hx,Wx (F3),(v) | vpdpwusds Vx,Hx,Wx (66),(v) | vpdpwuuds Vx,Hx,Wx (v)
 d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
+da: vsm3msg1 Vdq,Hdq,Udq (v1) | vsm3msg2 Vdq,Hdq,Udq (66),(v1) | vsm4key4 Vx,Hx,Wx (F3),(v) | vsm4rnds4 Vx,Hx,Wx (F2),(v)
 db: VAESIMC Vdq,Wdq (66),(v1)
 dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
 dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
 de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
 df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
+e0: CMPOXADD   My,Gy,By (66),(v1),(o64)
+e1: CMPNOXADD  My,Gy,By (66),(v1),(o64)
+e2: CMPBXADD   My,Gy,By (66),(v1),(o64)
+e3: CMPNBXADD  My,Gy,By (66),(v1),(o64)
+e4: CMPZXADD   My,Gy,By (66),(v1),(o64)
+e5: CMPNZXADD  My,Gy,By (66),(v1),(o64)
+e6: CMPBEXADD  My,Gy,By (66),(v1),(o64)
+e7: CMPNBEXADD My,Gy,By (66),(v1),(o64)
+e8: CMPSXADD   My,Gy,By (66),(v1),(o64)
+e9: CMPNSXADD  My,Gy,By (66),(v1),(o64)
+ea: CMPPXADD   My,Gy,By (66),(v1),(o64)
+eb: CMPNPXADD  My,Gy,By (66),(v1),(o64)
+ec: CMPLXADD   My,Gy,By (66),(v1),(o64)
+ed: CMPNLXADD  My,Gy,By (66),(v1),(o64)
+ee: CMPLEXADD  My,Gy,By (66),(v1),(o64)
+ef: CMPNLEXADD My,Gy,By (66),(v1),(o64)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -813,10 +836,11 @@ f3: Grp17 (1A)
 f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSSD/Q My,Gy (66)
 f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
-f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
+f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3) | URDMSR Rq,Gq (F2),(11B) | UWRMSR Gq,Rq (F3),(11B)
 f9: MOVDIRI My,Gy
 fa: ENCODEKEY128 Ew,Ew (F3)
 fb: ENCODEKEY256 Ew,Ew (F3)
+fc: AADD My,Gy | AAND My,Gy (66) | AOR My,Gy (F2) | AXOR My,Gy (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
@@ -896,6 +920,7 @@ c2: vcmpph Vx,Hx,Wx,Ib (ev) | vcmpsh Vx,Hx,Wx,Ib (F3),(ev)
 cc: sha1rnds4 Vdq,Wdq,Ib
 ce: vgf2p8affineqb Vx,Wx,Ib (66)
 cf: vgf2p8affineinvqb Vx,Wx,Ib (66)
+de: vsm3rnds2 Vdq,Hdq,Wdq,Ib (66),(v1)
 df: VAESKEYGEN Vdq,Wdq,Ib (66),(v1)
 f0: RORX Gy,Ey,Ib (F2),(v) | HRESET Gv,Ib (F3),(000),(11B)
 EndTable
@@ -978,6 +1003,12 @@ d6: vfcmulcph Vx,Hx,Wx (F2),(ev) | vfmulcph Vx,Hx,Wx (F3),(ev)
 d7: vfcmulcsh Vx,Hx,Wx (F2),(ev) | vfmulcsh Vx,Hx,Wx (F3),(ev)
 EndTable
 
+Table: VEX map 7
+Referrer:
+AVXcode: 7
+f8: URDMSR Rq,Id (F2),(v1),(11B) | UWRMSR Id,Rq (F3),(v1),(11B)
+EndTable
+
 GrpTable: Grp1
 0: ADD
 1: OR
@@ -1054,7 +1085,7 @@ GrpTable: Grp6
 EndTable
 
 GrpTable: Grp7
-0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | WRMSRNS (110),(11B)
+0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | WRMSRNS (110),(11B) | RDMSRLIST (F2),(110),(11B) | WRMSRLIST (F3),(110),(11B) | PBNDKB (111),(11B)
 1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B) | ERETU (F3),(010),(11B) | ERETS (F2),(010),(11B)
 2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
 3: LIDT Ms
@@ -1140,6 +1171,8 @@ GrpTable: Grp16
 1: prefetch T0
 2: prefetch T1
 3: prefetch T2
+6: prefetch IT1
+7: prefetch IT0
 EndTable
 
 GrpTable: Grp17
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 24941b9d4e06..4e9f53581b58 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -698,8 +698,8 @@ AVXcode: 2
 4d: vrcp14ss/d Vsd,Hpd,Wsd (66),(ev)
 4e: vrsqrt14ps/d Vpd,Wpd (66),(ev)
 4f: vrsqrt14ss/d Vsd,Hsd,Wsd (66),(ev)
-50: vpdpbusd Vx,Hx,Wx (66)
-51: vpdpbusds Vx,Hx,Wx (66)
+50: vpdpbusd Vx,Hx,Wx (66) | vpdpbssd Vx,Hx,Wx (F2),(v) | vpdpbsud Vx,Hx,Wx (F3),(v) | vpdpbuud Vx,Hx,Wx (v)
+51: vpdpbusds Vx,Hx,Wx (66) | vpdpbssds Vx,Hx,Wx (F2),(v) | vpdpbsuds Vx,Hx,Wx (F3),(v) | vpdpbuuds Vx,Hx,Wx (v)
 52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
 53: vpdpwssds Vx,Hx,Wx (66) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
 54: vpopcntb/w Vx,Wx (66),(ev)
@@ -708,7 +708,7 @@ AVXcode: 2
 59: vpbroadcastq Vx,Wx (66),(v) | vbroadcasti32x2 Vx,Wx (66),(evo)
 5a: vbroadcasti128 Vqq,Mdq (66),(v) | vbroadcasti32x4/64x2 Vx,Wx (66),(evo)
 5b: vbroadcasti32x8/64x4 Vqq,Mdq (66),(ev)
-5c: TDPBF16PS Vt,Wt,Ht (F3),(v1)
+5c: TDPBF16PS Vt,Wt,Ht (F3),(v1) | TDPFP16PS Vt,Wt,Ht (F2),(v1),(o64)
 # Skip 0x5d
 5e: TDPBSSD Vt,Wt,Ht (F2),(v1) | TDPBSUD Vt,Wt,Ht (F3),(v1) | TDPBUSD Vt,Wt,Ht (66),(v1) | TDPBUUD Vt,Wt,Ht (v1)
 # Skip 0x5f-0x61
@@ -718,10 +718,12 @@ AVXcode: 2
 65: vblendmps/d Vx,Hx,Wx (66),(ev)
 66: vpblendmb/w Vx,Hx,Wx (66),(ev)
 68: vp2intersectd/q Kx,Hx,Wx (F2),(ev)
-# Skip 0x69-0x6f
+# Skip 0x69-0x6b
+6c: TCMMIMFP16PS Vt,Wt,Ht (66),(v1),(o64) | TCMMRLFP16PS Vt,Wt,Ht (v1),(o64)
+# Skip 0x6d-0x6f
 70: vpshldvw Vx,Hx,Wx (66),(ev)
 71: vpshldvd/q Vx,Hx,Wx (66),(ev)
-72: vcvtne2ps2bf16 Vx,Hx,Wx (F2),(ev) | vcvtneps2bf16 Vx,Wx (F3),(ev) | vpshrdvw Vx,Hx,Wx (66),(ev)
+72: vcvtne2ps2bf16 Vx,Hx,Wx (F2),(ev) | vcvtneps2bf16 Vx,Wx (F3) | vpshrdvw Vx,Hx,Wx (66),(ev)
 73: vpshrdvd/q Vx,Hx,Wx (66),(ev)
 75: vpermi2b/w Vx,Hx,Wx (66),(ev)
 76: vpermi2d/q Vx,Hx,Wx (66),(ev)
@@ -777,8 +779,10 @@ ac: vfnmadd213ps/d Vx,Hx,Wx (66),(v)
 ad: vfnmadd213ss/d Vx,Hx,Wx (66),(v),(v1)
 ae: vfnmsub213ps/d Vx,Hx,Wx (66),(v)
 af: vfnmsub213ss/d Vx,Hx,Wx (66),(v),(v1)
-b4: vpmadd52luq Vx,Hx,Wx (66),(ev)
-b5: vpmadd52huq Vx,Hx,Wx (66),(ev)
+b0: vcvtneebf162ps Vx,Mx (F3),(!11B),(v) | vcvtneeph2ps Vx,Mx (66),(!11B),(v) | vcvtneobf162ps Vx,Mx (F2),(!11B),(v) | vcvtneoph2ps Vx,Mx (!11B),(v)
+b1: vbcstnebf162ps Vx,Mw (F3),(!11B),(v) | vbcstnesh2ps Vx,Mw (66),(!11B),(v)
+b4: vpmadd52luq Vx,Hx,Wx (66)
+b5: vpmadd52huq Vx,Hx,Wx (66)
 b6: vfmaddsub231ps/d Vx,Hx,Wx (66),(v)
 b7: vfmsubadd231ps/d Vx,Hx,Wx (66),(v)
 b8: vfmadd231ps/d Vx,Hx,Wx (66),(v)
@@ -796,16 +800,35 @@ c7: Grp19 (1A)
 c8: sha1nexte Vdq,Wdq | vexp2ps/d Vx,Wx (66),(ev)
 c9: sha1msg1 Vdq,Wdq
 ca: sha1msg2 Vdq,Wdq | vrcp28ps/d Vx,Wx (66),(ev)
-cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
-cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
-cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
+cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev) | vsha512rnds2 Vqq,Hqq,Udq (F2),(11B),(v)
+cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev) | vsha512msg1 Vqq,Udq (F2),(11B),(v)
+cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev) | vsha512msg2 Vqq,Uqq (F2),(11B),(v)
 cf: vgf2p8mulb Vx,Wx (66)
+d2: vpdpwsud Vx,Hx,Wx (F3),(v) | vpdpwusd Vx,Hx,Wx (66),(v) | vpdpwuud Vx,Hx,Wx (v)
+d3: vpdpwsuds Vx,Hx,Wx (F3),(v) | vpdpwusds Vx,Hx,Wx (66),(v) | vpdpwuuds Vx,Hx,Wx (v)
 d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
+da: vsm3msg1 Vdq,Hdq,Udq (v1) | vsm3msg2 Vdq,Hdq,Udq (66),(v1) | vsm4key4 Vx,Hx,Wx (F3),(v) | vsm4rnds4 Vx,Hx,Wx (F2),(v)
 db: VAESIMC Vdq,Wdq (66),(v1)
 dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
 dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
 de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
 df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
+e0: CMPOXADD   My,Gy,By (66),(v1),(o64)
+e1: CMPNOXADD  My,Gy,By (66),(v1),(o64)
+e2: CMPBXADD   My,Gy,By (66),(v1),(o64)
+e3: CMPNBXADD  My,Gy,By (66),(v1),(o64)
+e4: CMPZXADD   My,Gy,By (66),(v1),(o64)
+e5: CMPNZXADD  My,Gy,By (66),(v1),(o64)
+e6: CMPBEXADD  My,Gy,By (66),(v1),(o64)
+e7: CMPNBEXADD My,Gy,By (66),(v1),(o64)
+e8: CMPSXADD   My,Gy,By (66),(v1),(o64)
+e9: CMPNSXADD  My,Gy,By (66),(v1),(o64)
+ea: CMPPXADD   My,Gy,By (66),(v1),(o64)
+eb: CMPNPXADD  My,Gy,By (66),(v1),(o64)
+ec: CMPLXADD   My,Gy,By (66),(v1),(o64)
+ed: CMPNLXADD  My,Gy,By (66),(v1),(o64)
+ee: CMPLEXADD  My,Gy,By (66),(v1),(o64)
+ef: CMPNLEXADD My,Gy,By (66),(v1),(o64)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -813,10 +836,11 @@ f3: Grp17 (1A)
 f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSSD/Q My,Gy (66)
 f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
-f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
+f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3) | URDMSR Rq,Gq (F2),(11B) | UWRMSR Gq,Rq (F3),(11B)
 f9: MOVDIRI My,Gy
 fa: ENCODEKEY128 Ew,Ew (F3)
 fb: ENCODEKEY256 Ew,Ew (F3)
+fc: AADD My,Gy | AAND My,Gy (66) | AOR My,Gy (F2) | AXOR My,Gy (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
@@ -896,6 +920,7 @@ c2: vcmpph Vx,Hx,Wx,Ib (ev) | vcmpsh Vx,Hx,Wx,Ib (F3),(ev)
 cc: sha1rnds4 Vdq,Wdq,Ib
 ce: vgf2p8affineqb Vx,Wx,Ib (66)
 cf: vgf2p8affineinvqb Vx,Wx,Ib (66)
+de: vsm3rnds2 Vdq,Hdq,Wdq,Ib (66),(v1)
 df: VAESKEYGEN Vdq,Wdq,Ib (66),(v1)
 f0: RORX Gy,Ey,Ib (F2),(v) | HRESET Gv,Ib (F3),(000),(11B)
 EndTable
@@ -978,6 +1003,12 @@ d6: vfcmulcph Vx,Hx,Wx (F2),(ev) | vfmulcph Vx,Hx,Wx (F3),(ev)
 d7: vfcmulcsh Vx,Hx,Wx (F2),(ev) | vfmulcsh Vx,Hx,Wx (F3),(ev)
 EndTable
 
+Table: VEX map 7
+Referrer:
+AVXcode: 7
+f8: URDMSR Rq,Id (F2),(v1),(11B) | UWRMSR Id,Rq (F3),(v1),(11B)
+EndTable
+
 GrpTable: Grp1
 0: ADD
 1: OR
@@ -1054,7 +1085,7 @@ GrpTable: Grp6
 EndTable
 
 GrpTable: Grp7
-0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | WRMSRNS (110),(11B)
+0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | WRMSRNS (110),(11B) | RDMSRLIST (F2),(110),(11B) | WRMSRLIST (F3),(110),(11B) | PBNDKB (111),(11B)
 1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B) | ERETU (F3),(010),(11B) | ERETS (F2),(010),(11B)
 2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
 3: LIDT Ms
@@ -1140,6 +1171,8 @@ GrpTable: Grp16
 1: prefetch T0
 2: prefetch T1
 3: prefetch T2
+6: prefetch IT1
+7: prefetch IT0
 EndTable
 
 GrpTable: Grp17
-- 
2.34.1



* [PATCH 05/10] x86/insn: Add support for REX2 prefix to the instruction decoder logic
  2024-05-02 10:58 [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions Adrian Hunter
                   ` (3 preceding siblings ...)
  2024-05-02 10:58 ` [PATCH 04/10] x86/insn: Add misc new Intel instructions Adrian Hunter
@ 2024-05-02 10:58 ` Adrian Hunter
  2024-05-02 18:10   ` Ian Rogers
  2024-05-02 10:58 ` [PATCH 06/10] x86/insn: Add support for REX2 prefix to the instruction decoder opcode map Adrian Hunter
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Adrian Hunter @ 2024-05-02 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chang S. Bae, Masami Hiramatsu, Nikolay Borisov, Borislav Petkov,
	Ingo Molnar, H. Peter Anvin, Dave Hansen, Thomas Gleixner, x86,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Ian Rogers,
	linux-perf-users

Intel Advanced Performance Extensions (APX) uses a new 2-byte prefix named
REX2 to select extended general purpose registers (EGPRs) i.e. r16 to r31.

The REX2 prefix is effectively an extended version of the REX prefix.

REX2 and EVEX are also used with PUSH/POP instructions to provide a
Push-Pop Acceleration (PPX) hint. With PPX hints, a CPU will attempt to
fast-forward register data between matching PUSH and POP instructions.

REX2 is valid only with opcodes in maps 0 and 1. A similar extension for
other maps is provided by the EVEX prefix, which is covered in a separate
patch.

Some opcodes in maps 0 and 1 are reserved under REX2. One of these is used
for a new 64-bit absolute direct jump instruction JMPABS.

Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.

Define a code value for the REX2 prefix (INAT_PFX_REX2), add an attribute
flag for opcodes reserved under REX2 (INAT_NO_REX2), and add an attribute
flag to identify opcodes (currently only JMPABS) that require a mandatory
REX2 prefix (INAT_REX2_VARIANT).

Amend logic to read the REX2 prefix and get the opcode attribute for the
map number (0 or 1) encoded in the REX2 prefix.
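
For illustration only (not part of this patch), the new REX2 payload
macros combine with the existing ModRM macros to form 5-bit (E)GPR
indexes, assuming 'p' is the REX2 payload byte (rex_prefix.bytes[1]) and
'modrm' is the ModRM byte:

	/* Sketch: register numbers 0-31 under APX */
	int reg = (!!X86_REX2_R(p) << 4) | (!!X86_REX_R(p) << 3) | X86_MODRM_REG(modrm);
	int rm  = (!!X86_REX2_B(p) << 4) | (!!X86_REX_B(p) << 3) | X86_MODRM_RM(modrm);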

Amend the awk script that generates the attribute tables from the opcode
map to recognise "REX2" as attribute INAT_PFX_REX2, "(!REX2)" as attribute
INAT_NO_REX2, and "(REX2)" as attribute INAT_REX2_VARIANT.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/include/asm/inat.h                | 11 +++++++++-
 arch/x86/include/asm/insn.h                | 25 ++++++++++++++++++----
 arch/x86/lib/insn.c                        | 25 ++++++++++++++++++++++
 arch/x86/tools/gen-insn-attr-x86.awk       | 11 +++++++++-
 tools/arch/x86/include/asm/inat.h          | 11 +++++++++-
 tools/arch/x86/include/asm/insn.h          | 25 ++++++++++++++++++----
 tools/arch/x86/lib/insn.c                  | 25 ++++++++++++++++++++++
 tools/arch/x86/tools/gen-insn-attr-x86.awk | 11 +++++++++-
 8 files changed, 132 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index b56c5741581a..1331bdd39a23 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -35,6 +35,8 @@
 #define INAT_PFX_VEX2	13	/* 2-bytes VEX prefix */
 #define INAT_PFX_VEX3	14	/* 3-bytes VEX prefix */
 #define INAT_PFX_EVEX	15	/* EVEX prefix */
+/* x86-64 REX2 prefix */
+#define INAT_PFX_REX2	16	/* 0xD5 */
 
 #define INAT_LSTPFX_MAX	3
 #define INAT_LGCPFX_MAX	11
@@ -50,7 +52,7 @@
 
 /* Legacy prefix */
 #define INAT_PFX_OFFS	0
-#define INAT_PFX_BITS	4
+#define INAT_PFX_BITS	5
 #define INAT_PFX_MAX    ((1 << INAT_PFX_BITS) - 1)
 #define INAT_PFX_MASK	(INAT_PFX_MAX << INAT_PFX_OFFS)
 /* Escape opcodes */
@@ -77,6 +79,8 @@
 #define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
+#define INAT_NO_REX2	(1 << (INAT_FLAG_OFFS + 8))
+#define INAT_REX2_VARIANT	(1 << (INAT_FLAG_OFFS + 9))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -128,6 +132,11 @@ static inline int inat_is_rex_prefix(insn_attr_t attr)
 	return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
 }
 
+static inline int inat_is_rex2_prefix(insn_attr_t attr)
+{
+	return (attr & INAT_PFX_MASK) == INAT_PFX_REX2;
+}
+
 static inline int inat_last_prefix_id(insn_attr_t attr)
 {
 	if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 1b29f58f730f..95249ec1f24e 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -112,10 +112,15 @@ struct insn {
 #define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
 #define X86_SIB_BASE(sib) ((sib) & 0x07)
 
-#define X86_REX_W(rex) ((rex) & 8)
-#define X86_REX_R(rex) ((rex) & 4)
-#define X86_REX_X(rex) ((rex) & 2)
-#define X86_REX_B(rex) ((rex) & 1)
+#define X86_REX2_M(rex) ((rex) & 0x80)	/* REX2 M0 */
+#define X86_REX2_R(rex) ((rex) & 0x40)	/* REX2 R4 */
+#define X86_REX2_X(rex) ((rex) & 0x20)	/* REX2 X4 */
+#define X86_REX2_B(rex) ((rex) & 0x10)	/* REX2 B4 */
+
+#define X86_REX_W(rex) ((rex) & 8)	/* REX or REX2 W */
+#define X86_REX_R(rex) ((rex) & 4)	/* REX or REX2 R3 */
+#define X86_REX_X(rex) ((rex) & 2)	/* REX or REX2 X3 */
+#define X86_REX_B(rex) ((rex) & 1)	/* REX or REX2 B3 */
 
 /* VEX bit flags  */
 #define X86_VEX_W(vex)	((vex) & 0x80)	/* VEX3 Byte2 */
@@ -161,6 +166,18 @@ static inline void insn_get_attribute(struct insn *insn)
 /* Instruction uses RIP-relative addressing */
 extern int insn_rip_relative(struct insn *insn);
 
+static inline int insn_is_rex2(struct insn *insn)
+{
+	if (!insn->prefixes.got)
+		insn_get_prefixes(insn);
+	return insn->rex_prefix.nbytes == 2;
+}
+
+static inline insn_byte_t insn_rex2_m_bit(struct insn *insn)
+{
+	return X86_REX2_M(insn->rex_prefix.bytes[1]);
+}
+
 static inline int insn_is_avx(struct insn *insn)
 {
 	if (!insn->prefixes.got)
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index 1bb155a0955b..6126ddc6e5f5 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -185,6 +185,17 @@ int insn_get_prefixes(struct insn *insn)
 			if (X86_REX_W(b))
 				/* REX.W overrides opnd_size */
 				insn->opnd_bytes = 8;
+		} else if (inat_is_rex2_prefix(attr)) {
+			insn_set_byte(&insn->rex_prefix, 0, b);
+			b = peek_nbyte_next(insn_byte_t, insn, 1);
+			insn_set_byte(&insn->rex_prefix, 1, b);
+			insn->rex_prefix.nbytes = 2;
+			insn->next_byte += 2;
+			if (X86_REX_W(b))
+				/* REX.W overrides opnd_size */
+				insn->opnd_bytes = 8;
+			insn->rex_prefix.got = 1;
+			goto vex_end;
 		}
 	}
 	insn->rex_prefix.got = 1;
@@ -294,6 +305,20 @@ int insn_get_opcode(struct insn *insn)
 		goto end;
 	}
 
+	/* Check if there is REX2 prefix or not */
+	if (insn_is_rex2(insn)) {
+		if (insn_rex2_m_bit(insn)) {
+			/* map 1 is escape 0x0f */
+			insn_attr_t esc_attr = inat_get_opcode_attribute(0x0f);
+
+			pfx_id = insn_last_prefix_id(insn);
+			insn->attr = inat_get_escape_attribute(op, pfx_id, esc_attr);
+		} else {
+			insn->attr = inat_get_opcode_attribute(op);
+		}
+		goto end;
+	}
+
 	insn->attr = inat_get_opcode_attribute(op);
 	while (inat_is_escape(insn->attr)) {
 		/* Get escaped opcode */
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index af38469afd14..3f43aa7d8fef 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -64,7 +64,9 @@ BEGIN {
 
 	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
 	force64_expr = "\\([df]64\\)"
-	rex_expr = "^REX(\\.[XRWB]+)*"
+	rex_expr = "^((REX(\\.[XRWB]+)+)|(REX$))"
+	rex2_expr = "\\(REX2\\)"
+	no_rex2_expr = "\\(!REX2\\)"
 	fpu_expr = "^ESC" # TODO
 
 	lprefix1_expr = "\\((66|!F3)\\)"
@@ -99,6 +101,7 @@ BEGIN {
 	prefix_num["VEX+1byte"] = "INAT_PFX_VEX2"
 	prefix_num["VEX+2byte"] = "INAT_PFX_VEX3"
 	prefix_num["EVEX"] = "INAT_PFX_EVEX"
+	prefix_num["REX2"] = "INAT_PFX_REX2"
 
 	clear_vars()
 }
@@ -314,6 +317,10 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		if (match(ext, force64_expr))
 			flags = add_flags(flags, "INAT_FORCE64")
 
+		# check REX2 not allowed
+		if (match(ext, no_rex2_expr))
+			flags = add_flags(flags, "INAT_NO_REX2")
+
 		# check REX prefix
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
@@ -351,6 +358,8 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 			lptable3[idx] = add_flags(lptable3[idx],flags)
 			variant = "INAT_VARIANT"
 		}
+		if (match(ext, rex2_expr))
+			table[idx] = add_flags(table[idx], "INAT_REX2_VARIANT")
 		if (!match(ext, lprefix_expr)){
 			table[idx] = add_flags(table[idx],flags)
 		}
diff --git a/tools/arch/x86/include/asm/inat.h b/tools/arch/x86/include/asm/inat.h
index a61051400311..2e65312cae52 100644
--- a/tools/arch/x86/include/asm/inat.h
+++ b/tools/arch/x86/include/asm/inat.h
@@ -35,6 +35,8 @@
 #define INAT_PFX_VEX2	13	/* 2-bytes VEX prefix */
 #define INAT_PFX_VEX3	14	/* 3-bytes VEX prefix */
 #define INAT_PFX_EVEX	15	/* EVEX prefix */
+/* x86-64 REX2 prefix */
+#define INAT_PFX_REX2	16	/* 0xD5 */
 
 #define INAT_LSTPFX_MAX	3
 #define INAT_LGCPFX_MAX	11
@@ -50,7 +52,7 @@
 
 /* Legacy prefix */
 #define INAT_PFX_OFFS	0
-#define INAT_PFX_BITS	4
+#define INAT_PFX_BITS	5
 #define INAT_PFX_MAX    ((1 << INAT_PFX_BITS) - 1)
 #define INAT_PFX_MASK	(INAT_PFX_MAX << INAT_PFX_OFFS)
 /* Escape opcodes */
@@ -77,6 +79,8 @@
 #define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
+#define INAT_NO_REX2	(1 << (INAT_FLAG_OFFS + 8))
+#define INAT_REX2_VARIANT	(1 << (INAT_FLAG_OFFS + 9))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -128,6 +132,11 @@ static inline int inat_is_rex_prefix(insn_attr_t attr)
 	return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
 }
 
+static inline int inat_is_rex2_prefix(insn_attr_t attr)
+{
+	return (attr & INAT_PFX_MASK) == INAT_PFX_REX2;
+}
+
 static inline int inat_last_prefix_id(insn_attr_t attr)
 {
 	if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
index 65c0d9ce1e29..1a7e8fc4d75a 100644
--- a/tools/arch/x86/include/asm/insn.h
+++ b/tools/arch/x86/include/asm/insn.h
@@ -112,10 +112,15 @@ struct insn {
 #define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
 #define X86_SIB_BASE(sib) ((sib) & 0x07)
 
-#define X86_REX_W(rex) ((rex) & 8)
-#define X86_REX_R(rex) ((rex) & 4)
-#define X86_REX_X(rex) ((rex) & 2)
-#define X86_REX_B(rex) ((rex) & 1)
+#define X86_REX2_M(rex) ((rex) & 0x80)	/* REX2 M0 */
+#define X86_REX2_R(rex) ((rex) & 0x40)	/* REX2 R4 */
+#define X86_REX2_X(rex) ((rex) & 0x20)	/* REX2 X4 */
+#define X86_REX2_B(rex) ((rex) & 0x10)	/* REX2 B4 */
+
+#define X86_REX_W(rex) ((rex) & 8)	/* REX or REX2 W */
+#define X86_REX_R(rex) ((rex) & 4)	/* REX or REX2 R3 */
+#define X86_REX_X(rex) ((rex) & 2)	/* REX or REX2 X3 */
+#define X86_REX_B(rex) ((rex) & 1)	/* REX or REX2 B3 */
 
 /* VEX bit flags  */
 #define X86_VEX_W(vex)	((vex) & 0x80)	/* VEX3 Byte2 */
@@ -161,6 +166,18 @@ static inline void insn_get_attribute(struct insn *insn)
 /* Instruction uses RIP-relative addressing */
 extern int insn_rip_relative(struct insn *insn);
 
+static inline int insn_is_rex2(struct insn *insn)
+{
+	if (!insn->prefixes.got)
+		insn_get_prefixes(insn);
+	return insn->rex_prefix.nbytes == 2;
+}
+
+static inline insn_byte_t insn_rex2_m_bit(struct insn *insn)
+{
+	return X86_REX2_M(insn->rex_prefix.bytes[1]);
+}
+
 static inline int insn_is_avx(struct insn *insn)
 {
 	if (!insn->prefixes.got)
diff --git a/tools/arch/x86/lib/insn.c b/tools/arch/x86/lib/insn.c
index ada4b4a79dd4..f761adeb8e8c 100644
--- a/tools/arch/x86/lib/insn.c
+++ b/tools/arch/x86/lib/insn.c
@@ -185,6 +185,17 @@ int insn_get_prefixes(struct insn *insn)
 			if (X86_REX_W(b))
 				/* REX.W overrides opnd_size */
 				insn->opnd_bytes = 8;
+		} else if (inat_is_rex2_prefix(attr)) {
+			insn_set_byte(&insn->rex_prefix, 0, b);
+			b = peek_nbyte_next(insn_byte_t, insn, 1);
+			insn_set_byte(&insn->rex_prefix, 1, b);
+			insn->rex_prefix.nbytes = 2;
+			insn->next_byte += 2;
+			if (X86_REX_W(b))
+				/* REX.W overrides opnd_size */
+				insn->opnd_bytes = 8;
+			insn->rex_prefix.got = 1;
+			goto vex_end;
 		}
 	}
 	insn->rex_prefix.got = 1;
@@ -294,6 +305,20 @@ int insn_get_opcode(struct insn *insn)
 		goto end;
 	}
 
+	/* Check if there is REX2 prefix or not */
+	if (insn_is_rex2(insn)) {
+		if (insn_rex2_m_bit(insn)) {
+			/* map 1 is escape 0x0f */
+			insn_attr_t esc_attr = inat_get_opcode_attribute(0x0f);
+
+			pfx_id = insn_last_prefix_id(insn);
+			insn->attr = inat_get_escape_attribute(op, pfx_id, esc_attr);
+		} else {
+			insn->attr = inat_get_opcode_attribute(op);
+		}
+		goto end;
+	}
+
 	insn->attr = inat_get_opcode_attribute(op);
 	while (inat_is_escape(insn->attr)) {
 		/* Get escaped opcode */
diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
index af38469afd14..3f43aa7d8fef 100644
--- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
@@ -64,7 +64,9 @@ BEGIN {
 
 	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
 	force64_expr = "\\([df]64\\)"
-	rex_expr = "^REX(\\.[XRWB]+)*"
+	rex_expr = "^((REX(\\.[XRWB]+)+)|(REX$))"
+	rex2_expr = "\\(REX2\\)"
+	no_rex2_expr = "\\(!REX2\\)"
 	fpu_expr = "^ESC" # TODO
 
 	lprefix1_expr = "\\((66|!F3)\\)"
@@ -99,6 +101,7 @@ BEGIN {
 	prefix_num["VEX+1byte"] = "INAT_PFX_VEX2"
 	prefix_num["VEX+2byte"] = "INAT_PFX_VEX3"
 	prefix_num["EVEX"] = "INAT_PFX_EVEX"
+	prefix_num["REX2"] = "INAT_PFX_REX2"
 
 	clear_vars()
 }
@@ -314,6 +317,10 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		if (match(ext, force64_expr))
 			flags = add_flags(flags, "INAT_FORCE64")
 
+		# check REX2 not allowed
+		if (match(ext, no_rex2_expr))
+			flags = add_flags(flags, "INAT_NO_REX2")
+
 		# check REX prefix
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
@@ -351,6 +358,8 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 			lptable3[idx] = add_flags(lptable3[idx],flags)
 			variant = "INAT_VARIANT"
 		}
+		if (match(ext, rex2_expr))
+			table[idx] = add_flags(table[idx], "INAT_REX2_VARIANT")
 		if (!match(ext, lprefix_expr)){
 			table[idx] = add_flags(table[idx],flags)
 		}
-- 
2.34.1



* [PATCH 06/10] x86/insn: Add support for REX2 prefix to the instruction decoder opcode map
  2024-05-02 10:58 [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions Adrian Hunter
                   ` (4 preceding siblings ...)
  2024-05-02 10:58 ` [PATCH 05/10] x86/insn: Add support for REX2 prefix to the instruction decoder logic Adrian Hunter
@ 2024-05-02 10:58 ` Adrian Hunter
  2024-05-02 10:58 ` [PATCH 07/10] x86/insn: Add support for APX EVEX to the instruction decoder logic Adrian Hunter
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2024-05-02 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chang S. Bae, Masami Hiramatsu, Nikolay Borisov, Borislav Petkov,
	Ingo Molnar, H. Peter Anvin, Dave Hansen, Thomas Gleixner, x86,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Ian Rogers,
	linux-perf-users

Support for REX2 has been added to the instruction decoder logic and the
awk script that generates the attribute tables from the opcode map.

Add REX2 prefix byte (0xD5) to the opcode map.

Add annotation (!REX2) for map 0/1 opcodes that are reserved under REX2.

Add JMPABS to the opcode map and add the annotation (REX2) to identify
that it has a mandatory REX2 prefix. A separate opcode attribute table is
not needed at this time because JMPABS has the same attribute encoding,
i.e. INAT_MOFFSET, as the MOV instruction with which it shares an opcode.
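
As a sketch (assuming the canonical encoding with an all-zero REX2
payload byte), JMPABS to absolute address 0x1234 in 64-bit mode would be
the REX2 prefix, opcode 0xa1 and a 64-bit target:

	d5 00 a1 34 12 00 00 00 00 00 00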

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/lib/x86-opcode-map.txt       | 148 +++++++++++++-------------
 tools/arch/x86/lib/x86-opcode-map.txt | 148 +++++++++++++-------------
 2 files changed, 152 insertions(+), 144 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 4e9f53581b58..240ef714b64f 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -33,6 +33,10 @@
 #  - (F2): the last prefix is 0xF2
 #  - (!F3) : the last prefix is not 0xF3 (including non-last prefix case)
 #  - (66&F2): Both 0x66 and 0xF2 prefixes are specified.
+#
+# REX2 Prefix
+#  - (!REX2): REX2 is not allowed
+#  - (REX2): REX2 variant e.g. JMPABS
 
 Table: one byte opcode
 Referrer:
@@ -157,22 +161,22 @@ AVXcode:
 6e: OUTS/OUTSB DX,Xb
 6f: OUTS/OUTSW/OUTSD DX,Xz
 # 0x70 - 0x7f
-70: JO Jb
-71: JNO Jb
-72: JB/JNAE/JC Jb
-73: JNB/JAE/JNC Jb
-74: JZ/JE Jb
-75: JNZ/JNE Jb
-76: JBE/JNA Jb
-77: JNBE/JA Jb
-78: JS Jb
-79: JNS Jb
-7a: JP/JPE Jb
-7b: JNP/JPO Jb
-7c: JL/JNGE Jb
-7d: JNL/JGE Jb
-7e: JLE/JNG Jb
-7f: JNLE/JG Jb
+70: JO Jb (!REX2)
+71: JNO Jb (!REX2)
+72: JB/JNAE/JC Jb (!REX2)
+73: JNB/JAE/JNC Jb (!REX2)
+74: JZ/JE Jb (!REX2)
+75: JNZ/JNE Jb (!REX2)
+76: JBE/JNA Jb (!REX2)
+77: JNBE/JA Jb (!REX2)
+78: JS Jb (!REX2)
+79: JNS Jb (!REX2)
+7a: JP/JPE Jb (!REX2)
+7b: JNP/JPO Jb (!REX2)
+7c: JL/JNGE Jb (!REX2)
+7d: JNL/JGE Jb (!REX2)
+7e: JLE/JNG Jb (!REX2)
+7f: JNLE/JG Jb (!REX2)
 # 0x80 - 0x8f
 80: Grp1 Eb,Ib (1A)
 81: Grp1 Ev,Iz (1A)
@@ -208,24 +212,24 @@ AVXcode:
 9e: SAHF
 9f: LAHF
 # 0xa0 - 0xaf
-a0: MOV AL,Ob
-a1: MOV rAX,Ov
-a2: MOV Ob,AL
-a3: MOV Ov,rAX
-a4: MOVS/B Yb,Xb
-a5: MOVS/W/D/Q Yv,Xv
-a6: CMPS/B Xb,Yb
-a7: CMPS/W/D Xv,Yv
-a8: TEST AL,Ib
-a9: TEST rAX,Iz
-aa: STOS/B Yb,AL
-ab: STOS/W/D/Q Yv,rAX
-ac: LODS/B AL,Xb
-ad: LODS/W/D/Q rAX,Xv
-ae: SCAS/B AL,Yb
+a0: MOV AL,Ob (!REX2)
+a1: MOV rAX,Ov (!REX2) | JMPABS O (REX2),(o64)
+a2: MOV Ob,AL (!REX2)
+a3: MOV Ov,rAX (!REX2)
+a4: MOVS/B Yb,Xb (!REX2)
+a5: MOVS/W/D/Q Yv,Xv (!REX2)
+a6: CMPS/B Xb,Yb (!REX2)
+a7: CMPS/W/D Xv,Yv (!REX2)
+a8: TEST AL,Ib (!REX2)
+a9: TEST rAX,Iz (!REX2)
+aa: STOS/B Yb,AL (!REX2)
+ab: STOS/W/D/Q Yv,rAX (!REX2)
+ac: LODS/B AL,Xb (!REX2)
+ad: LODS/W/D/Q rAX,Xv (!REX2)
+ae: SCAS/B AL,Yb (!REX2)
 # Note: The May 2011 Intel manual shows Xv for the second parameter of the
 # next instruction but Yv is correct
-af: SCAS/W/D/Q rAX,Yv
+af: SCAS/W/D/Q rAX,Yv (!REX2)
 # 0xb0 - 0xbf
 b0: MOV AL/R8L,Ib
 b1: MOV CL/R9L,Ib
@@ -266,7 +270,7 @@ d1: Grp2 Ev,1 (1A)
 d2: Grp2 Eb,CL (1A)
 d3: Grp2 Ev,CL (1A)
 d4: AAM Ib (i64)
-d5: AAD Ib (i64)
+d5: AAD Ib (i64) | REX2 (Prefix),(o64)
 d6:
 d7: XLAT/XLATB
 d8: ESC
@@ -281,26 +285,26 @@ df: ESC
 # Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
 # in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
 # to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.
-e0: LOOPNE/LOOPNZ Jb (f64)
-e1: LOOPE/LOOPZ Jb (f64)
-e2: LOOP Jb (f64)
-e3: JrCXZ Jb (f64)
-e4: IN AL,Ib
-e5: IN eAX,Ib
-e6: OUT Ib,AL
-e7: OUT Ib,eAX
+e0: LOOPNE/LOOPNZ Jb (f64) (!REX2)
+e1: LOOPE/LOOPZ Jb (f64) (!REX2)
+e2: LOOP Jb (f64) (!REX2)
+e3: JrCXZ Jb (f64) (!REX2)
+e4: IN AL,Ib (!REX2)
+e5: IN eAX,Ib (!REX2)
+e6: OUT Ib,AL (!REX2)
+e7: OUT Ib,eAX (!REX2)
 # With 0x66 prefix in 64-bit mode, for AMD CPUs immediate offset
 # in "near" jumps and calls is 16-bit. For CALL,
 # push of return address is 16-bit wide, RSP is decremented by 2
 # but is not truncated to 16 bits, unlike RIP.
-e8: CALL Jz (f64)
-e9: JMP-near Jz (f64)
-ea: JMP-far Ap (i64)
-eb: JMP-short Jb (f64)
-ec: IN AL,DX
-ed: IN eAX,DX
-ee: OUT DX,AL
-ef: OUT DX,eAX
+e8: CALL Jz (f64) (!REX2)
+e9: JMP-near Jz (f64) (!REX2)
+ea: JMP-far Ap (i64) (!REX2)
+eb: JMP-short Jb (f64) (!REX2)
+ec: IN AL,DX (!REX2)
+ed: IN eAX,DX (!REX2)
+ee: OUT DX,AL (!REX2)
+ef: OUT DX,eAX (!REX2)
 # 0xf0 - 0xff
 f0: LOCK (Prefix)
 f1:
@@ -386,14 +390,14 @@ AVXcode: 1
 2e: vucomiss Vss,Wss (v1) | vucomisd  Vsd,Wsd (66),(v1)
 2f: vcomiss Vss,Wss (v1) | vcomisd  Vsd,Wsd (66),(v1)
 # 0x0f 0x30-0x3f
-30: WRMSR
-31: RDTSC
-32: RDMSR
-33: RDPMC
-34: SYSENTER
-35: SYSEXIT
+30: WRMSR (!REX2)
+31: RDTSC (!REX2)
+32: RDMSR (!REX2)
+33: RDPMC (!REX2)
+34: SYSENTER (!REX2)
+35: SYSEXIT (!REX2)
 36:
-37: GETSEC
+37: GETSEC (!REX2)
 38: escape # 3-byte escape 1
 39:
 3a: escape # 3-byte escape 2
@@ -473,22 +477,22 @@ AVXcode: 1
 7f: movq Qq,Pq | vmovdqa Wx,Vx (66) | vmovdqa32/64 Wx,Vx (66),(evo) | vmovdqu Wx,Vx (F3) | vmovdqu32/64 Wx,Vx (F3),(evo) | vmovdqu8/16 Wx,Vx (F2),(ev)
 # 0x0f 0x80-0x8f
 # Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
-80: JO Jz (f64)
-81: JNO Jz (f64)
-82: JB/JC/JNAE Jz (f64)
-83: JAE/JNB/JNC Jz (f64)
-84: JE/JZ Jz (f64)
-85: JNE/JNZ Jz (f64)
-86: JBE/JNA Jz (f64)
-87: JA/JNBE Jz (f64)
-88: JS Jz (f64)
-89: JNS Jz (f64)
-8a: JP/JPE Jz (f64)
-8b: JNP/JPO Jz (f64)
-8c: JL/JNGE Jz (f64)
-8d: JNL/JGE Jz (f64)
-8e: JLE/JNG Jz (f64)
-8f: JNLE/JG Jz (f64)
+80: JO Jz (f64) (!REX2)
+81: JNO Jz (f64) (!REX2)
+82: JB/JC/JNAE Jz (f64) (!REX2)
+83: JAE/JNB/JNC Jz (f64) (!REX2)
+84: JE/JZ Jz (f64) (!REX2)
+85: JNE/JNZ Jz (f64) (!REX2)
+86: JBE/JNA Jz (f64) (!REX2)
+87: JA/JNBE Jz (f64) (!REX2)
+88: JS Jz (f64) (!REX2)
+89: JNS Jz (f64) (!REX2)
+8a: JP/JPE Jz (f64) (!REX2)
+8b: JNP/JPO Jz (f64) (!REX2)
+8c: JL/JNGE Jz (f64) (!REX2)
+8d: JNL/JGE Jz (f64) (!REX2)
+8e: JLE/JNG Jz (f64) (!REX2)
+8f: JNLE/JG Jz (f64) (!REX2)
 # 0x0f 0x90-0x9f
 90: SETO Eb | kmovw/q Vk,Wk | kmovb/d Vk,Wk (66)
 91: SETNO Eb | kmovw/q Mv,Vk | kmovb/d Mv,Vk (66)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 4e9f53581b58..240ef714b64f 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -33,6 +33,10 @@
 #  - (F2): the last prefix is 0xF2
 #  - (!F3) : the last prefix is not 0xF3 (including non-last prefix case)
 #  - (66&F2): Both 0x66 and 0xF2 prefixes are specified.
+#
+# REX2 Prefix
+#  - (!REX2): REX2 is not allowed
+#  - (REX2): REX2 variant e.g. JMPABS
 
 Table: one byte opcode
 Referrer:
@@ -157,22 +161,22 @@ AVXcode:
 6e: OUTS/OUTSB DX,Xb
 6f: OUTS/OUTSW/OUTSD DX,Xz
 # 0x70 - 0x7f
-70: JO Jb
-71: JNO Jb
-72: JB/JNAE/JC Jb
-73: JNB/JAE/JNC Jb
-74: JZ/JE Jb
-75: JNZ/JNE Jb
-76: JBE/JNA Jb
-77: JNBE/JA Jb
-78: JS Jb
-79: JNS Jb
-7a: JP/JPE Jb
-7b: JNP/JPO Jb
-7c: JL/JNGE Jb
-7d: JNL/JGE Jb
-7e: JLE/JNG Jb
-7f: JNLE/JG Jb
+70: JO Jb (!REX2)
+71: JNO Jb (!REX2)
+72: JB/JNAE/JC Jb (!REX2)
+73: JNB/JAE/JNC Jb (!REX2)
+74: JZ/JE Jb (!REX2)
+75: JNZ/JNE Jb (!REX2)
+76: JBE/JNA Jb (!REX2)
+77: JNBE/JA Jb (!REX2)
+78: JS Jb (!REX2)
+79: JNS Jb (!REX2)
+7a: JP/JPE Jb (!REX2)
+7b: JNP/JPO Jb (!REX2)
+7c: JL/JNGE Jb (!REX2)
+7d: JNL/JGE Jb (!REX2)
+7e: JLE/JNG Jb (!REX2)
+7f: JNLE/JG Jb (!REX2)
 # 0x80 - 0x8f
 80: Grp1 Eb,Ib (1A)
 81: Grp1 Ev,Iz (1A)
@@ -208,24 +212,24 @@ AVXcode:
 9e: SAHF
 9f: LAHF
 # 0xa0 - 0xaf
-a0: MOV AL,Ob
-a1: MOV rAX,Ov
-a2: MOV Ob,AL
-a3: MOV Ov,rAX
-a4: MOVS/B Yb,Xb
-a5: MOVS/W/D/Q Yv,Xv
-a6: CMPS/B Xb,Yb
-a7: CMPS/W/D Xv,Yv
-a8: TEST AL,Ib
-a9: TEST rAX,Iz
-aa: STOS/B Yb,AL
-ab: STOS/W/D/Q Yv,rAX
-ac: LODS/B AL,Xb
-ad: LODS/W/D/Q rAX,Xv
-ae: SCAS/B AL,Yb
+a0: MOV AL,Ob (!REX2)
+a1: MOV rAX,Ov (!REX2) | JMPABS O (REX2),(o64)
+a2: MOV Ob,AL (!REX2)
+a3: MOV Ov,rAX (!REX2)
+a4: MOVS/B Yb,Xb (!REX2)
+a5: MOVS/W/D/Q Yv,Xv (!REX2)
+a6: CMPS/B Xb,Yb (!REX2)
+a7: CMPS/W/D Xv,Yv (!REX2)
+a8: TEST AL,Ib (!REX2)
+a9: TEST rAX,Iz (!REX2)
+aa: STOS/B Yb,AL (!REX2)
+ab: STOS/W/D/Q Yv,rAX (!REX2)
+ac: LODS/B AL,Xb (!REX2)
+ad: LODS/W/D/Q rAX,Xv (!REX2)
+ae: SCAS/B AL,Yb (!REX2)
 # Note: The May 2011 Intel manual shows Xv for the second parameter of the
 # next instruction but Yv is correct
-af: SCAS/W/D/Q rAX,Yv
+af: SCAS/W/D/Q rAX,Yv (!REX2)
 # 0xb0 - 0xbf
 b0: MOV AL/R8L,Ib
 b1: MOV CL/R9L,Ib
@@ -266,7 +270,7 @@ d1: Grp2 Ev,1 (1A)
 d2: Grp2 Eb,CL (1A)
 d3: Grp2 Ev,CL (1A)
 d4: AAM Ib (i64)
-d5: AAD Ib (i64)
+d5: AAD Ib (i64) | REX2 (Prefix),(o64)
 d6:
 d7: XLAT/XLATB
 d8: ESC
@@ -281,26 +285,26 @@ df: ESC
 # Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
 # in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
 # to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.
-e0: LOOPNE/LOOPNZ Jb (f64)
-e1: LOOPE/LOOPZ Jb (f64)
-e2: LOOP Jb (f64)
-e3: JrCXZ Jb (f64)
-e4: IN AL,Ib
-e5: IN eAX,Ib
-e6: OUT Ib,AL
-e7: OUT Ib,eAX
+e0: LOOPNE/LOOPNZ Jb (f64) (!REX2)
+e1: LOOPE/LOOPZ Jb (f64) (!REX2)
+e2: LOOP Jb (f64) (!REX2)
+e3: JrCXZ Jb (f64) (!REX2)
+e4: IN AL,Ib (!REX2)
+e5: IN eAX,Ib (!REX2)
+e6: OUT Ib,AL (!REX2)
+e7: OUT Ib,eAX (!REX2)
 # With 0x66 prefix in 64-bit mode, for AMD CPUs immediate offset
 # in "near" jumps and calls is 16-bit. For CALL,
 # push of return address is 16-bit wide, RSP is decremented by 2
 # but is not truncated to 16 bits, unlike RIP.
-e8: CALL Jz (f64)
-e9: JMP-near Jz (f64)
-ea: JMP-far Ap (i64)
-eb: JMP-short Jb (f64)
-ec: IN AL,DX
-ed: IN eAX,DX
-ee: OUT DX,AL
-ef: OUT DX,eAX
+e8: CALL Jz (f64) (!REX2)
+e9: JMP-near Jz (f64) (!REX2)
+ea: JMP-far Ap (i64) (!REX2)
+eb: JMP-short Jb (f64) (!REX2)
+ec: IN AL,DX (!REX2)
+ed: IN eAX,DX (!REX2)
+ee: OUT DX,AL (!REX2)
+ef: OUT DX,eAX (!REX2)
 # 0xf0 - 0xff
 f0: LOCK (Prefix)
 f1:
@@ -386,14 +390,14 @@ AVXcode: 1
 2e: vucomiss Vss,Wss (v1) | vucomisd  Vsd,Wsd (66),(v1)
 2f: vcomiss Vss,Wss (v1) | vcomisd  Vsd,Wsd (66),(v1)
 # 0x0f 0x30-0x3f
-30: WRMSR
-31: RDTSC
-32: RDMSR
-33: RDPMC
-34: SYSENTER
-35: SYSEXIT
+30: WRMSR (!REX2)
+31: RDTSC (!REX2)
+32: RDMSR (!REX2)
+33: RDPMC (!REX2)
+34: SYSENTER (!REX2)
+35: SYSEXIT (!REX2)
 36:
-37: GETSEC
+37: GETSEC (!REX2)
 38: escape # 3-byte escape 1
 39:
 3a: escape # 3-byte escape 2
@@ -473,22 +477,22 @@ AVXcode: 1
 7f: movq Qq,Pq | vmovdqa Wx,Vx (66) | vmovdqa32/64 Wx,Vx (66),(evo) | vmovdqu Wx,Vx (F3) | vmovdqu32/64 Wx,Vx (F3),(evo) | vmovdqu8/16 Wx,Vx (F2),(ev)
 # 0x0f 0x80-0x8f
 # Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
-80: JO Jz (f64)
-81: JNO Jz (f64)
-82: JB/JC/JNAE Jz (f64)
-83: JAE/JNB/JNC Jz (f64)
-84: JE/JZ Jz (f64)
-85: JNE/JNZ Jz (f64)
-86: JBE/JNA Jz (f64)
-87: JA/JNBE Jz (f64)
-88: JS Jz (f64)
-89: JNS Jz (f64)
-8a: JP/JPE Jz (f64)
-8b: JNP/JPO Jz (f64)
-8c: JL/JNGE Jz (f64)
-8d: JNL/JGE Jz (f64)
-8e: JLE/JNG Jz (f64)
-8f: JNLE/JG Jz (f64)
+80: JO Jz (f64) (!REX2)
+81: JNO Jz (f64) (!REX2)
+82: JB/JC/JNAE Jz (f64) (!REX2)
+83: JAE/JNB/JNC Jz (f64) (!REX2)
+84: JE/JZ Jz (f64) (!REX2)
+85: JNE/JNZ Jz (f64) (!REX2)
+86: JBE/JNA Jz (f64) (!REX2)
+87: JA/JNBE Jz (f64) (!REX2)
+88: JS Jz (f64) (!REX2)
+89: JNS Jz (f64) (!REX2)
+8a: JP/JPE Jz (f64) (!REX2)
+8b: JNP/JPO Jz (f64) (!REX2)
+8c: JL/JNGE Jz (f64) (!REX2)
+8d: JNL/JGE Jz (f64) (!REX2)
+8e: JLE/JNG Jz (f64) (!REX2)
+8f: JNLE/JG Jz (f64) (!REX2)
 # 0x0f 0x90-0x9f
 90: SETO Eb | kmovw/q Vk,Wk | kmovb/d Vk,Wk (66)
 91: SETNO Eb | kmovw/q Mv,Vk | kmovb/d Mv,Vk (66)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 07/10] x86/insn: Add support for APX EVEX to the instruction decoder logic
  2024-05-02 10:58 [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions Adrian Hunter
                   ` (5 preceding siblings ...)
  2024-05-02 10:58 ` [PATCH 06/10] x86/insn: x86/insn: Add support for REX2 prefix to the instruction decoder opcode map Adrian Hunter
@ 2024-05-02 10:58 ` Adrian Hunter
  2024-05-02 10:58 ` [PATCH 08/10] x86/insn: Add support for APX EVEX instructions to the opcode map Adrian Hunter
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2024-05-02 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chang S. Bae, Masami Hiramatsu, Nikolay Borisov, Borislav Petkov,
	Ingo Molnar, H. Peter Anvin, Dave Hansen, Thomas Gleixner, x86,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Ian Rogers,
	linux-perf-users

Intel Advanced Performance Extensions (APX) extends the EVEX prefix to
support:
 - extended general purpose registers (EGPRs) i.e. r16 to r31
 - Push-Pop Acceleration (PPX) hints
 - new data destination (NDD) register
 - suppress status flags writes (NF) of common instructions
 - new instructions

Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.

The extended EVEX prefix requires no changes to the instruction decoder
logic, except in one area. Some instructions are defined as SCALABLE,
which means the EVEX.W bit and EVEX.pp bits are used to determine operand
size. Specifically, if an instruction is SCALABLE and EVEX.W is zero,
then EVEX.pp value 0 (representing no prefix, NP) means the default
operand size, whereas EVEX.pp value 1 (representing the 66 prefix) means
operand size override i.e. 16 bits.
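
As an illustration, a minimal sketch of that rule (not the kernel code;
the helper name and the default-size parameter are invented for this
example):

	/* Operand size of a SCALABLE instruction when EVEX.W is zero */
	static int scalable_opnd_bytes_w0(int evex_pp, int default_bytes)
	{
		if (evex_pp == 1)	/* pp = 1: 66 prefix encoding */
			return 2;	/* operand size override i.e. 16 bits */
		return default_bytes;	/* pp = 0: no prefix (NP), default size */
	}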

Add an attribute (INAT_EVEX_SCALABLE) to identify such instructions, and
amend the logic appropriately.

Amend the awk script that generates the attribute tables from the opcode
map to recognise "(es)" as attribute INAT_EVEX_SCALABLE.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/include/asm/inat.h                | 6 ++++++
 arch/x86/include/asm/insn.h                | 7 +++++++
 arch/x86/lib/insn.c                        | 4 ++++
 arch/x86/tools/gen-insn-attr-x86.awk       | 4 ++++
 tools/arch/x86/include/asm/inat.h          | 6 ++++++
 tools/arch/x86/include/asm/insn.h          | 7 +++++++
 tools/arch/x86/lib/insn.c                  | 4 ++++
 tools/arch/x86/tools/gen-insn-attr-x86.awk | 4 ++++
 8 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 1331bdd39a23..53e4015242b4 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -81,6 +81,7 @@
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
 #define INAT_NO_REX2	(1 << (INAT_FLAG_OFFS + 8))
 #define INAT_REX2_VARIANT	(1 << (INAT_FLAG_OFFS + 9))
+#define INAT_EVEX_SCALABLE	(1 << (INAT_FLAG_OFFS + 10))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -236,4 +237,9 @@ static inline int inat_must_evex(insn_attr_t attr)
 {
 	return attr & INAT_EVEXONLY;
 }
+
+static inline int inat_evex_scalable(insn_attr_t attr)
+{
+	return attr & INAT_EVEX_SCALABLE;
+}
 #endif
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 95249ec1f24e..7152ea809e6a 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -215,6 +215,13 @@ static inline insn_byte_t insn_vex_p_bits(struct insn *insn)
 		return X86_VEX_P(insn->vex_prefix.bytes[2]);
 }
 
+static inline insn_byte_t insn_vex_w_bit(struct insn *insn)
+{
+	if (insn->vex_prefix.nbytes < 3)
+		return 0;
+	return X86_VEX_W(insn->vex_prefix.bytes[2]);
+}
+
 /* Get the last prefix id from last prefix or VEX prefix */
 static inline int insn_last_prefix_id(struct insn *insn)
 {
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index 6126ddc6e5f5..5952ab41c60f 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -294,6 +294,10 @@ int insn_get_opcode(struct insn *insn)
 		m = insn_vex_m_bits(insn);
 		p = insn_vex_p_bits(insn);
 		insn->attr = inat_get_avx_attribute(op, m, p);
+		/* SCALABLE EVEX uses p bits to encode operand size */
+		if (inat_evex_scalable(insn->attr) && !insn_vex_w_bit(insn) &&
+		    p == INAT_PFX_OPNDSZ)
+			insn->opnd_bytes = 2;
 		if ((inat_must_evex(insn->attr) && !insn_is_evex(insn)) ||
 		    (!inat_accept_vex(insn->attr) &&
 		     !inat_is_group(insn->attr))) {
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index 3f43aa7d8fef..5770c8097f32 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -83,6 +83,8 @@ BEGIN {
 	vexonly_expr = "\\(v\\)"
 	# All opcodes with (ev) superscript supports *only* EVEX prefix
 	evexonly_expr = "\\(ev\\)"
+	# (es) is the same as (ev) but also "SCALABLE" i.e. W and pp determine operand size
+	evex_scalable_expr = "\\(es\\)"
 
 	prefix_expr = "\\(Prefix\\)"
 	prefix_num["Operand-Size"] = "INAT_PFX_OPNDSZ"
@@ -332,6 +334,8 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		# check VEX codes
 		if (match(ext, evexonly_expr))
 			flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY")
+		else if (match(ext, evex_scalable_expr))
+			flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY | INAT_EVEX_SCALABLE")
 		else if (match(ext, vexonly_expr))
 			flags = add_flags(flags, "INAT_VEXOK | INAT_VEXONLY")
 		else if (match(ext, vexok_expr) || match(opcode, vexok_opcode_expr))
diff --git a/tools/arch/x86/include/asm/inat.h b/tools/arch/x86/include/asm/inat.h
index 2e65312cae52..253690eb3c26 100644
--- a/tools/arch/x86/include/asm/inat.h
+++ b/tools/arch/x86/include/asm/inat.h
@@ -81,6 +81,7 @@
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
 #define INAT_NO_REX2	(1 << (INAT_FLAG_OFFS + 8))
 #define INAT_REX2_VARIANT	(1 << (INAT_FLAG_OFFS + 9))
+#define INAT_EVEX_SCALABLE	(1 << (INAT_FLAG_OFFS + 10))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -236,4 +237,9 @@ static inline int inat_must_evex(insn_attr_t attr)
 {
 	return attr & INAT_EVEXONLY;
 }
+
+static inline int inat_evex_scalable(insn_attr_t attr)
+{
+	return attr & INAT_EVEX_SCALABLE;
+}
 #endif
diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
index 1a7e8fc4d75a..0e5abd896ad4 100644
--- a/tools/arch/x86/include/asm/insn.h
+++ b/tools/arch/x86/include/asm/insn.h
@@ -215,6 +215,13 @@ static inline insn_byte_t insn_vex_p_bits(struct insn *insn)
 		return X86_VEX_P(insn->vex_prefix.bytes[2]);
 }
 
+static inline insn_byte_t insn_vex_w_bit(struct insn *insn)
+{
+	if (insn->vex_prefix.nbytes < 3)
+		return 0;
+	return X86_VEX_W(insn->vex_prefix.bytes[2]);
+}
+
 /* Get the last prefix id from last prefix or VEX prefix */
 static inline int insn_last_prefix_id(struct insn *insn)
 {
diff --git a/tools/arch/x86/lib/insn.c b/tools/arch/x86/lib/insn.c
index f761adeb8e8c..a43b37346a22 100644
--- a/tools/arch/x86/lib/insn.c
+++ b/tools/arch/x86/lib/insn.c
@@ -294,6 +294,10 @@ int insn_get_opcode(struct insn *insn)
 		m = insn_vex_m_bits(insn);
 		p = insn_vex_p_bits(insn);
 		insn->attr = inat_get_avx_attribute(op, m, p);
+		/* SCALABLE EVEX uses p bits to encode operand size */
+		if (inat_evex_scalable(insn->attr) && !insn_vex_w_bit(insn) &&
+		    p == INAT_PFX_OPNDSZ)
+			insn->opnd_bytes = 2;
 		if ((inat_must_evex(insn->attr) && !insn_is_evex(insn)) ||
 		    (!inat_accept_vex(insn->attr) &&
 		     !inat_is_group(insn->attr))) {
diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
index 3f43aa7d8fef..5770c8097f32 100644
--- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
@@ -83,6 +83,8 @@ BEGIN {
 	vexonly_expr = "\\(v\\)"
 	# All opcodes with (ev) superscript supports *only* EVEX prefix
 	evexonly_expr = "\\(ev\\)"
+	# (es) is the same as (ev) but also "SCALABLE" i.e. W and pp determine operand size
+	evex_scalable_expr = "\\(es\\)"
 
 	prefix_expr = "\\(Prefix\\)"
 	prefix_num["Operand-Size"] = "INAT_PFX_OPNDSZ"
@@ -332,6 +334,8 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		# check VEX codes
 		if (match(ext, evexonly_expr))
 			flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY")
+		else if (match(ext, evex_scalable_expr))
+			flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY | INAT_EVEX_SCALABLE")
 		else if (match(ext, vexonly_expr))
 			flags = add_flags(flags, "INAT_VEXOK | INAT_VEXONLY")
 		else if (match(ext, vexok_expr) || match(opcode, vexok_opcode_expr))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 08/10] x86/insn: Add support for APX EVEX instructions to the opcode map
  2024-05-02 10:58 [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions Adrian Hunter
                   ` (6 preceding siblings ...)
  2024-05-02 10:58 ` [PATCH 07/10] x86/insn: Add support for APX EVEX to the instruction decoder logic Adrian Hunter
@ 2024-05-02 10:58 ` Adrian Hunter
  2024-05-02 10:58 ` [PATCH 09/10] perf intel pt: Add new JMPABS instruction to the Intel PT instruction decoder Adrian Hunter
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2024-05-02 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chang S. Bae, Masami Hiramatsu, Nikolay Borisov, Borislav Petkov,
	Ingo Molnar, H. Peter Anvin, Dave Hansen, Thomas Gleixner, x86,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Ian Rogers,
	linux-perf-users

To support APX functionality, the EVEX prefix is used to:
 - promote legacy instructions
 - promote VEX instructions
 - add new instructions

Promoted VEX instructions require no extra annotation because the opcodes
do not change and the permissive nature of the instruction decoder already
allows them to have an EVEX prefix.

Promoted legacy instructions and new instructions are placed in map 4,
which has not been used before.

Create a new table for map 4 and add APX instructions.

Annotate SCALABLE instructions with "(es)" - refer to patch "x86/insn: Add
support for APX EVEX to the instruction decoder logic". SCALABLE
instructions must be represented in both no-prefix (NP) and 66 prefix
forms.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 arch/x86/lib/x86-opcode-map.txt       | 93 +++++++++++++++++++++++++++
 tools/arch/x86/lib/x86-opcode-map.txt | 93 +++++++++++++++++++++++++++
 2 files changed, 186 insertions(+)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 240ef714b64f..caedb3ef6688 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -23,6 +23,7 @@
 #
 # AVX Superscripts
 #  (ev): this opcode requires EVEX prefix.
+#  (es): this opcode requires EVEX prefix and is SCALABLE.
 #  (evo): this opcode is changed by EVEX prefix (EVEX opcode)
 #  (v): this opcode requires VEX prefix.
 #  (v1): this opcode only supports 128bit VEX.
@@ -929,6 +930,98 @@ df: VAESKEYGEN Vdq,Wdq,Ib (66),(v1)
 f0: RORX Gy,Ey,Ib (F2),(v) | HRESET Gv,Ib (F3),(000),(11B)
 EndTable
 
+Table: EVEX map 4
+Referrer:
+AVXcode: 4
+00: ADD Eb,Gb (ev)
+01: ADD Ev,Gv (es) | ADD Ev,Gv (66),(es)
+02: ADD Gb,Eb (ev)
+03: ADD Gv,Ev (es) | ADD Gv,Ev (66),(es)
+08: OR Eb,Gb (ev)
+09: OR Ev,Gv (es) | OR Ev,Gv (66),(es)
+0a: OR Gb,Eb (ev)
+0b: OR Gv,Ev (es) | OR Gv,Ev (66),(es)
+10: ADC Eb,Gb (ev)
+11: ADC Ev,Gv (es) | ADC Ev,Gv (66),(es)
+12: ADC Gb,Eb (ev)
+13: ADC Gv,Ev (es) | ADC Gv,Ev (66),(es)
+18: SBB Eb,Gb (ev)
+19: SBB Ev,Gv (es) | SBB Ev,Gv (66),(es)
+1a: SBB Gb,Eb (ev)
+1b: SBB Gv,Ev (es) | SBB Gv,Ev (66),(es)
+20: AND Eb,Gb (ev)
+21: AND Ev,Gv (es) | AND Ev,Gv (66),(es)
+22: AND Gb,Eb (ev)
+23: AND Gv,Ev (es) | AND Gv,Ev (66),(es)
+24: SHLD Ev,Gv,Ib (es) | SHLD Ev,Gv,Ib (66),(es)
+28: SUB Eb,Gb (ev)
+29: SUB Ev,Gv (es) | SUB Ev,Gv (66),(es)
+2a: SUB Gb,Eb (ev)
+2b: SUB Gv,Ev (es) | SUB Gv,Ev (66),(es)
+2c: SHRD Ev,Gv,Ib (es) | SHRD Ev,Gv,Ib (66),(es)
+30: XOR Eb,Gb (ev)
+31: XOR Ev,Gv (es) | XOR Ev,Gv (66),(es)
+32: XOR Gb,Eb (ev)
+33: XOR Gv,Ev (es) | XOR Gv,Ev (66),(es)
+# CCMPSCC instructions are: CCOMB, CCOMBE, CCOMF, CCOML, CCOMLE, CCOMNB, CCOMNBE, CCOMNL, CCOMNLE,
+#			    CCOMNO, CCOMNS, CCOMNZ, CCOMO, CCOMS, CCOMT, CCOMZ
+38: CCMPSCC Eb,Gb (ev)
+39: CCMPSCC Ev,Gv (es) | CCMPSCC Ev,Gv (66),(es)
+3a: CCMPSCC Gv,Ev (ev)
+3b: CCMPSCC Gv,Ev (es) | CCMPSCC Gv,Ev (66),(es)
+40: CMOVO   Gv,Ev (es) | CMOVO   Gv,Ev (66),(es) | CFCMOVO   Ev,Ev (es) | CFCMOVO   Ev,Ev (66),(es) | SETO   Eb (F2),(ev)
+41: CMOVNO  Gv,Ev (es) | CMOVNO  Gv,Ev (66),(es) | CFCMOVNO  Ev,Ev (es) | CFCMOVNO  Ev,Ev (66),(es) | SETNO  Eb (F2),(ev)
+42: CMOVB   Gv,Ev (es) | CMOVB   Gv,Ev (66),(es) | CFCMOVB   Ev,Ev (es) | CFCMOVB   Ev,Ev (66),(es) | SETB   Eb (F2),(ev)
+43: CMOVNB  Gv,Ev (es) | CMOVNB  Gv,Ev (66),(es) | CFCMOVNB  Ev,Ev (es) | CFCMOVNB  Ev,Ev (66),(es) | SETNB  Eb (F2),(ev)
+44: CMOVZ   Gv,Ev (es) | CMOVZ   Gv,Ev (66),(es) | CFCMOVZ   Ev,Ev (es) | CFCMOVZ   Ev,Ev (66),(es) | SETZ   Eb (F2),(ev)
+45: CMOVNZ  Gv,Ev (es) | CMOVNZ  Gv,Ev (66),(es) | CFCMOVNZ  Ev,Ev (es) | CFCMOVNZ  Ev,Ev (66),(es) | SETNZ  Eb (F2),(ev)
+46: CMOVBE  Gv,Ev (es) | CMOVBE  Gv,Ev (66),(es) | CFCMOVBE  Ev,Ev (es) | CFCMOVBE  Ev,Ev (66),(es) | SETBE  Eb (F2),(ev)
+47: CMOVNBE Gv,Ev (es) | CMOVNBE Gv,Ev (66),(es) | CFCMOVNBE Ev,Ev (es) | CFCMOVNBE Ev,Ev (66),(es) | SETNBE Eb (F2),(ev)
+48: CMOVS   Gv,Ev (es) | CMOVS   Gv,Ev (66),(es) | CFCMOVS   Ev,Ev (es) | CFCMOVS   Ev,Ev (66),(es) | SETS   Eb (F2),(ev)
+49: CMOVNS  Gv,Ev (es) | CMOVNS  Gv,Ev (66),(es) | CFCMOVNS  Ev,Ev (es) | CFCMOVNS  Ev,Ev (66),(es) | SETNS  Eb (F2),(ev)
+4a: CMOVP   Gv,Ev (es) | CMOVP   Gv,Ev (66),(es) | CFCMOVP   Ev,Ev (es) | CFCMOVP   Ev,Ev (66),(es) | SETP   Eb (F2),(ev)
+4b: CMOVNP  Gv,Ev (es) | CMOVNP  Gv,Ev (66),(es) | CFCMOVNP  Ev,Ev (es) | CFCMOVNP  Ev,Ev (66),(es) | SETNP  Eb (F2),(ev)
+4c: CMOVL   Gv,Ev (es) | CMOVL   Gv,Ev (66),(es) | CFCMOVL   Ev,Ev (es) | CFCMOVL   Ev,Ev (66),(es) | SETL   Eb (F2),(ev)
+4d: CMOVNL  Gv,Ev (es) | CMOVNL  Gv,Ev (66),(es) | CFCMOVNL  Ev,Ev (es) | CFCMOVNL  Ev,Ev (66),(es) | SETNL  Eb (F2),(ev)
+4e: CMOVLE  Gv,Ev (es) | CMOVLE  Gv,Ev (66),(es) | CFCMOVLE  Ev,Ev (es) | CFCMOVLE  Ev,Ev (66),(es) | SETLE  Eb (F2),(ev)
+4f: CMOVNLE Gv,Ev (es) | CMOVNLE Gv,Ev (66),(es) | CFCMOVNLE Ev,Ev (es) | CFCMOVNLE Ev,Ev (66),(es) | SETNLE Eb (F2),(ev)
+60: MOVBE Gv,Ev (es) | MOVBE Gv,Ev (66),(es)
+61: MOVBE Ev,Gv (es) | MOVBE Ev,Gv (66),(es)
+65: WRUSSD Md,Gd (66),(ev) | WRUSSQ Mq,Gq (66),(ev)
+66: ADCX Gy,Ey (66),(ev) | ADOX Gy,Ey (F3),(ev) | WRSSD Md,Gd (ev) | WRSSQ Mq,Gq (66),(ev)
+69: IMUL Gv,Ev,Iz (es) | IMUL Gv,Ev,Iz (66),(es)
+6b: IMUL Gv,Ev,Ib (es) | IMUL Gv,Ev,Ib (66),(es)
+80: Grp1 Eb,Ib (1A),(ev)
+81: Grp1 Ev,Iz (1A),(es)
+83: Grp1 Ev,Ib (1A),(es)
+# CTESTSCC instructions are: CTESTB, CTESTBE, CTESTF, CTESTL, CTESTLE, CTESTNB, CTESTNBE, CTESTNL,
+#			     CTESTNLE, CTESTNO, CTESTNS, CTESTNZ, CTESTO, CTESTS, CTESTT, CTESTZ
+84: CTESTSCC (ev)
+85: CTESTSCC (es) | CTESTSCC (66),(es)
+88: POPCNT Gv,Ev (es) | POPCNT Gv,Ev (66),(es)
+8f: POP2 Bq,Rq (000),(11B),(ev)
+a5: SHLD Ev,Gv,CL (es) | SHLD Ev,Gv,CL (66),(es)
+ad: SHRD Ev,Gv,CL (es) | SHRD Ev,Gv,CL (66),(es)
+af: IMUL Gv,Ev (es) | IMUL Gv,Ev (66),(es)
+c0: Grp2 Eb,Ib (1A),(ev)
+c1: Grp2 Ev,Ib (1A),(es)
+d0: Grp2 Eb,1 (1A),(ev)
+d1: Grp2 Ev,1 (1A),(es)
+d2: Grp2 Eb,CL (1A),(ev)
+d3: Grp2 Ev,CL (1A),(es)
+f0: CRC32 Gy,Eb (es) | INVEPT Gq,Mdq (F3),(ev)
+f1: CRC32 Gy,Ey (es) | CRC32 Gy,Ey (66),(es) | INVVPID Gy,Mdq (F3),(ev)
+f2: INVPCID Gy,Mdq (F3),(ev)
+f4: TZCNT Gv,Ev (es) | TZCNT Gv,Ev (66),(es)
+f5: LZCNT Gv,Ev (es) | LZCNT Gv,Ev (66),(es)
+f6: Grp3_1 Eb (1A),(ev)
+f7: Grp3_2 Ev (1A),(es)
+f8: MOVDIR64B Gv,Mdqq (66),(ev) | ENQCMD Gv,Mdqq (F2),(ev) | ENQCMDS Gv,Mdqq (F3),(ev) | URDMSR Rq,Gq (F2),(11B),(ev) | UWRMSR Gq,Rq (F3),(11B),(ev)
+f9: MOVDIRI My,Gy (ev)
+fe: Grp4 (1A),(ev)
+ff: Grp5 (1A),(es) | PUSH2 Bq,Rq (110),(11B),(ev)
+EndTable
+
 Table: EVEX map 5
 Referrer:
 AVXcode: 5
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 240ef714b64f..caedb3ef6688 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -23,6 +23,7 @@
 #
 # AVX Superscripts
 #  (ev): this opcode requires EVEX prefix.
+#  (es): this opcode requires EVEX prefix and is SCALABLE.
 #  (evo): this opcode is changed by EVEX prefix (EVEX opcode)
 #  (v): this opcode requires VEX prefix.
 #  (v1): this opcode only supports 128bit VEX.
@@ -929,6 +930,98 @@ df: VAESKEYGEN Vdq,Wdq,Ib (66),(v1)
 f0: RORX Gy,Ey,Ib (F2),(v) | HRESET Gv,Ib (F3),(000),(11B)
 EndTable
 
+Table: EVEX map 4
+Referrer:
+AVXcode: 4
+00: ADD Eb,Gb (ev)
+01: ADD Ev,Gv (es) | ADD Ev,Gv (66),(es)
+02: ADD Gb,Eb (ev)
+03: ADD Gv,Ev (es) | ADD Gv,Ev (66),(es)
+08: OR Eb,Gb (ev)
+09: OR Ev,Gv (es) | OR Ev,Gv (66),(es)
+0a: OR Gb,Eb (ev)
+0b: OR Gv,Ev (es) | OR Gv,Ev (66),(es)
+10: ADC Eb,Gb (ev)
+11: ADC Ev,Gv (es) | ADC Ev,Gv (66),(es)
+12: ADC Gb,Eb (ev)
+13: ADC Gv,Ev (es) | ADC Gv,Ev (66),(es)
+18: SBB Eb,Gb (ev)
+19: SBB Ev,Gv (es) | SBB Ev,Gv (66),(es)
+1a: SBB Gb,Eb (ev)
+1b: SBB Gv,Ev (es) | SBB Gv,Ev (66),(es)
+20: AND Eb,Gb (ev)
+21: AND Ev,Gv (es) | AND Ev,Gv (66),(es)
+22: AND Gb,Eb (ev)
+23: AND Gv,Ev (es) | AND Gv,Ev (66),(es)
+24: SHLD Ev,Gv,Ib (es) | SHLD Ev,Gv,Ib (66),(es)
+28: SUB Eb,Gb (ev)
+29: SUB Ev,Gv (es) | SUB Ev,Gv (66),(es)
+2a: SUB Gb,Eb (ev)
+2b: SUB Gv,Ev (es) | SUB Gv,Ev (66),(es)
+2c: SHRD Ev,Gv,Ib (es) | SHRD Ev,Gv,Ib (66),(es)
+30: XOR Eb,Gb (ev)
+31: XOR Ev,Gv (es) | XOR Ev,Gv (66),(es)
+32: XOR Gb,Eb (ev)
+33: XOR Gv,Ev (es) | XOR Gv,Ev (66),(es)
+# CCMPSCC instructions are: CCOMB, CCOMBE, CCOMF, CCOML, CCOMLE, CCOMNB, CCOMNBE, CCOMNL, CCOMNLE,
+#			    CCOMNO, CCOMNS, CCOMNZ, CCOMO, CCOMS, CCOMT, CCOMZ
+38: CCMPSCC Eb,Gb (ev)
+39: CCMPSCC Ev,Gv (es) | CCMPSCC Ev,Gv (66),(es)
+3a: CCMPSCC Gv,Ev (ev)
+3b: CCMPSCC Gv,Ev (es) | CCMPSCC Gv,Ev (66),(es)
+40: CMOVO   Gv,Ev (es) | CMOVO   Gv,Ev (66),(es) | CFCMOVO   Ev,Ev (es) | CFCMOVO   Ev,Ev (66),(es) | SETO   Eb (F2),(ev)
+41: CMOVNO  Gv,Ev (es) | CMOVNO  Gv,Ev (66),(es) | CFCMOVNO  Ev,Ev (es) | CFCMOVNO  Ev,Ev (66),(es) | SETNO  Eb (F2),(ev)
+42: CMOVB   Gv,Ev (es) | CMOVB   Gv,Ev (66),(es) | CFCMOVB   Ev,Ev (es) | CFCMOVB   Ev,Ev (66),(es) | SETB   Eb (F2),(ev)
+43: CMOVNB  Gv,Ev (es) | CMOVNB  Gv,Ev (66),(es) | CFCMOVNB  Ev,Ev (es) | CFCMOVNB  Ev,Ev (66),(es) | SETNB  Eb (F2),(ev)
+44: CMOVZ   Gv,Ev (es) | CMOVZ   Gv,Ev (66),(es) | CFCMOVZ   Ev,Ev (es) | CFCMOVZ   Ev,Ev (66),(es) | SETZ   Eb (F2),(ev)
+45: CMOVNZ  Gv,Ev (es) | CMOVNZ  Gv,Ev (66),(es) | CFCMOVNZ  Ev,Ev (es) | CFCMOVNZ  Ev,Ev (66),(es) | SETNZ  Eb (F2),(ev)
+46: CMOVBE  Gv,Ev (es) | CMOVBE  Gv,Ev (66),(es) | CFCMOVBE  Ev,Ev (es) | CFCMOVBE  Ev,Ev (66),(es) | SETBE  Eb (F2),(ev)
+47: CMOVNBE Gv,Ev (es) | CMOVNBE Gv,Ev (66),(es) | CFCMOVNBE Ev,Ev (es) | CFCMOVNBE Ev,Ev (66),(es) | SETNBE Eb (F2),(ev)
+48: CMOVS   Gv,Ev (es) | CMOVS   Gv,Ev (66),(es) | CFCMOVS   Ev,Ev (es) | CFCMOVS   Ev,Ev (66),(es) | SETS   Eb (F2),(ev)
+49: CMOVNS  Gv,Ev (es) | CMOVNS  Gv,Ev (66),(es) | CFCMOVNS  Ev,Ev (es) | CFCMOVNS  Ev,Ev (66),(es) | SETNS  Eb (F2),(ev)
+4a: CMOVP   Gv,Ev (es) | CMOVP   Gv,Ev (66),(es) | CFCMOVP   Ev,Ev (es) | CFCMOVP   Ev,Ev (66),(es) | SETP   Eb (F2),(ev)
+4b: CMOVNP  Gv,Ev (es) | CMOVNP  Gv,Ev (66),(es) | CFCMOVNP  Ev,Ev (es) | CFCMOVNP  Ev,Ev (66),(es) | SETNP  Eb (F2),(ev)
+4c: CMOVL   Gv,Ev (es) | CMOVL   Gv,Ev (66),(es) | CFCMOVL   Ev,Ev (es) | CFCMOVL   Ev,Ev (66),(es) | SETL   Eb (F2),(ev)
+4d: CMOVNL  Gv,Ev (es) | CMOVNL  Gv,Ev (66),(es) | CFCMOVNL  Ev,Ev (es) | CFCMOVNL  Ev,Ev (66),(es) | SETNL  Eb (F2),(ev)
+4e: CMOVLE  Gv,Ev (es) | CMOVLE  Gv,Ev (66),(es) | CFCMOVLE  Ev,Ev (es) | CFCMOVLE  Ev,Ev (66),(es) | SETLE  Eb (F2),(ev)
+4f: CMOVNLE Gv,Ev (es) | CMOVNLE Gv,Ev (66),(es) | CFCMOVNLE Ev,Ev (es) | CFCMOVNLE Ev,Ev (66),(es) | SETNLE Eb (F2),(ev)
+60: MOVBE Gv,Ev (es) | MOVBE Gv,Ev (66),(es)
+61: MOVBE Ev,Gv (es) | MOVBE Ev,Gv (66),(es)
+65: WRUSSD Md,Gd (66),(ev) | WRUSSQ Mq,Gq (66),(ev)
+66: ADCX Gy,Ey (66),(ev) | ADOX Gy,Ey (F3),(ev) | WRSSD Md,Gd (ev) | WRSSQ Mq,Gq (66),(ev)
+69: IMUL Gv,Ev,Iz (es) | IMUL Gv,Ev,Iz (66),(es)
+6b: IMUL Gv,Ev,Ib (es) | IMUL Gv,Ev,Ib (66),(es)
+80: Grp1 Eb,Ib (1A),(ev)
+81: Grp1 Ev,Iz (1A),(es)
+83: Grp1 Ev,Ib (1A),(es)
+# CTESTSCC instructions are: CTESTB, CTESTBE, CTESTF, CTESTL, CTESTLE, CTESTNB, CTESTNBE, CTESTNL,
+#			     CTESTNLE, CTESTNO, CTESTNS, CTESTNZ, CTESTO, CTESTS, CTESTT, CTESTZ
+84: CTESTSCC (ev)
+85: CTESTSCC (es) | CTESTSCC (66),(es)
+88: POPCNT Gv,Ev (es) | POPCNT Gv,Ev (66),(es)
+8f: POP2 Bq,Rq (000),(11B),(ev)
+a5: SHLD Ev,Gv,CL (es) | SHLD Ev,Gv,CL (66),(es)
+ad: SHRD Ev,Gv,CL (es) | SHRD Ev,Gv,CL (66),(es)
+af: IMUL Gv,Ev (es) | IMUL Gv,Ev (66),(es)
+c0: Grp2 Eb,Ib (1A),(ev)
+c1: Grp2 Ev,Ib (1A),(es)
+d0: Grp2 Eb,1 (1A),(ev)
+d1: Grp2 Ev,1 (1A),(es)
+d2: Grp2 Eb,CL (1A),(ev)
+d3: Grp2 Ev,CL (1A),(es)
+f0: CRC32 Gy,Eb (es) | INVEPT Gq,Mdq (F3),(ev)
+f1: CRC32 Gy,Ey (es) | CRC32 Gy,Ey (66),(es) | INVVPID Gy,Mdq (F3),(ev)
+f2: INVPCID Gy,Mdq (F3),(ev)
+f4: TZCNT Gv,Ev (es) | TZCNT Gv,Ev (66),(es)
+f5: LZCNT Gv,Ev (es) | LZCNT Gv,Ev (66),(es)
+f6: Grp3_1 Eb (1A),(ev)
+f7: Grp3_2 Ev (1A),(es)
+f8: MOVDIR64B Gv,Mdqq (66),(ev) | ENQCMD Gv,Mdqq (F2),(ev) | ENQCMDS Gv,Mdqq (F3),(ev) | URDMSR Rq,Gq (F2),(11B),(ev) | UWRMSR Gq,Rq (F3),(11B),(ev)
+f9: MOVDIRI My,Gy (ev)
+fe: Grp4 (1A),(ev)
+ff: Grp5 (1A),(es) | PUSH2 Bq,Rq (110),(11B),(ev)
+EndTable
+
 Table: EVEX map 5
 Referrer:
 AVXcode: 5
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 09/10] perf intel pt: Add new JMPABS instruction to the Intel PT instruction decoder
  2024-05-02 10:58 [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions Adrian Hunter
                   ` (7 preceding siblings ...)
  2024-05-02 10:58 ` [PATCH 08/10] x86/insn: Add support for APX EVEX instructions to the opcode map Adrian Hunter
@ 2024-05-02 10:58 ` Adrian Hunter
  2024-05-28 18:14   ` Adrian Hunter
  2024-05-02 10:58 ` [PATCH 10/10] perf tests: Add APX and other new instructions to x86 instruction decoder test Adrian Hunter
  2024-06-26  3:59 ` (subset) [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions Namhyung Kim
  10 siblings, 1 reply; 18+ messages in thread
From: Adrian Hunter @ 2024-05-02 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chang S. Bae, Masami Hiramatsu, Nikolay Borisov, Borislav Petkov,
	Ingo Molnar, H. Peter Anvin, Dave Hansen, Thomas Gleixner, x86,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Ian Rogers,
	linux-perf-users

JMPABS is a 64-bit absolute direct jump instruction, encoded with a mandatory
REX2 prefix. JMPABS is designed to be used in the procedure linkage table
(PLT) to replace indirect jumps, because it has better performance. In that
case the jump target will be amended at run time. To enable Intel PT to
follow the code, a TIP packet is always emitted when JMPABS is traced under
Intel PT.

Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.

Decode JMPABS as an indirect jump: like an indirect jump, it has an
associated TIP packet, and control flow should follow the TIP packet
payload rather than assume it matches the JMPABS target address in the
on-file object code.
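
For reference, a minimal sketch (not the perf code; the helper is
invented for this example) of how JMPABS could be recognized from raw
bytes:

	/*
	 * JMPABS is REX2 (0xd5) + opcode 0xa1 + 64-bit absolute target,
	 * e.g. d5 00 a1 <imm64>, 11 bytes in total.  This assumes bit 7
	 * of the REX2 payload byte selects the opcode map (0 = legacy
	 * one-byte map, where 0xa1 is JMPABS).
	 */
	static int is_jmpabs(const unsigned char *p, size_t len)
	{
		return len >= 11 && p[0] == 0xd5 && !(p[1] & 0x80) && p[2] == 0xa1;
	}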

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
index c5d57027ec23..4407130d91f8 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
@@ -92,6 +92,15 @@ static void intel_pt_insn_decoder(struct insn *insn,
 		op = INTEL_PT_OP_JCC;
 		branch = INTEL_PT_BR_CONDITIONAL;
 		break;
+	case 0xa1:
+		if (insn_is_rex2(insn)) { /* jmpabs */
+			intel_pt_insn->op = INTEL_PT_OP_JMP;
+			/* jmpabs causes a TIP packet like an indirect branch */
+			intel_pt_insn->branch = INTEL_PT_BR_INDIRECT;
+			intel_pt_insn->length = insn->length;
+			return;
+		}
+		break;
 	case 0xc2: /* near ret */
 	case 0xc3: /* near ret */
 	case 0xca: /* far ret */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 10/10] perf tests: Add APX and other new instructions to x86 instruction decoder test
  2024-05-02 10:58 [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions Adrian Hunter
                   ` (8 preceding siblings ...)
  2024-05-02 10:58 ` [PATCH 09/10] perf intel pt: Add new JMPABS instruction to the Intel PT instruction decoder Adrian Hunter
@ 2024-05-02 10:58 ` Adrian Hunter
  2024-06-26  3:59 ` (subset) [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions Namhyung Kim
  10 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2024-05-02 10:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chang S. Bae, Masami Hiramatsu, Nikolay Borisov, Borislav Petkov,
	Ingo Molnar, H. Peter Anvin, Dave Hansen, Thomas Gleixner, x86,
	Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Ian Rogers,
	linux-perf-users

Add samples of APX and other new instructions to the 'x86 instruction
decoder - new instructions' test.

Note the test is only available if the perf tool has been built with
EXTRA_TESTS=1.

Example:

  $ make EXTRA_TESTS=1 -C tools/perf
  $ tools/perf/perf test -F -v 'new ins' |& grep -i 'jmpabs\|popp\|pushp'
  Decoded ok: d5 00 a1 ef cd ab 90 78 56 34 12    jmpabs $0x1234567890abcdef
  Decoded ok: d5 08 53                    pushp  %rbx
  Decoded ok: d5 18 50                    pushp  %r16
  Decoded ok: d5 19 57                    pushp  %r31
  Decoded ok: d5 19 5f                    popp   %r31
  Decoded ok: d5 18 58                    popp   %r16
  Decoded ok: d5 08 5b                    popp   %rbx

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/arch/x86/tests/insn-x86-dat-32.c  |  116 ++
 tools/perf/arch/x86/tests/insn-x86-dat-64.c  | 1026 ++++++++++++++++++
 tools/perf/arch/x86/tests/insn-x86-dat-src.c |  597 ++++++++++
 3 files changed, 1739 insertions(+)

diff --git a/tools/perf/arch/x86/tests/insn-x86-dat-32.c b/tools/perf/arch/x86/tests/insn-x86-dat-32.c
index ba429cadb18f..ce9645edaf68 100644
--- a/tools/perf/arch/x86/tests/insn-x86-dat-32.c
+++ b/tools/perf/arch/x86/tests/insn-x86-dat-32.c
@@ -3107,6 +3107,122 @@
 "62 f5 7c 08 2e ca    \tvucomish %xmm2,%xmm1",},
 {{0x62, 0xf5, 0x7c, 0x08, 0x2e, 0x8c, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
 "62 f5 7c 08 2e 8c c8 78 56 34 12 \tvucomish 0x12345678(%eax,%ecx,8),%xmm1",},
+{{0xf3, 0x0f, 0x38, 0xdc, 0xd1, }, 5, 0, "", "",
+"f3 0f 38 dc d1       \tloadiwkey %xmm1,%xmm2",},
+{{0xf3, 0x0f, 0x38, 0xfa, 0xd0, }, 5, 0, "", "",
+"f3 0f 38 fa d0       \tencodekey128 %eax,%edx",},
+{{0xf3, 0x0f, 0x38, 0xfb, 0xd0, }, 5, 0, "", "",
+"f3 0f 38 fb d0       \tencodekey256 %eax,%edx",},
+{{0xf3, 0x0f, 0x38, 0xdc, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 dc 5a 77    \taesenc128kl 0x77(%edx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xde, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 de 5a 77    \taesenc256kl 0x77(%edx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xdd, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 dd 5a 77    \taesdec128kl 0x77(%edx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xdf, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 df 5a 77    \taesdec256kl 0x77(%edx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x42, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 42 77    \taesencwide128kl 0x77(%edx)",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x52, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 52 77    \taesencwide256kl 0x77(%edx)",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x4a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 4a 77    \taesdecwide128kl 0x77(%edx)",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 5a 77    \taesdecwide256kl 0x77(%edx)",},
+{{0x0f, 0x38, 0xfc, 0x08, }, 4, 0, "", "",
+"0f 38 fc 08          \taadd   %ecx,(%eax)",},
+{{0x0f, 0x38, 0xfc, 0x15, 0x78, 0x56, 0x34, 0x12, }, 8, 0, "", "",
+"0f 38 fc 15 78 56 34 12 \taadd   %edx,0x12345678",},
+{{0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"0f 38 fc 94 c8 78 56 34 12 \taadd   %edx,0x12345678(%eax,%ecx,8)",},
+{{0x66, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"66 0f 38 fc 08       \taand   %ecx,(%eax)",},
+{{0x66, 0x0f, 0x38, 0xfc, 0x15, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"66 0f 38 fc 15 78 56 34 12 \taand   %edx,0x12345678",},
+{{0x66, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"66 0f 38 fc 94 c8 78 56 34 12 \taand   %edx,0x12345678(%eax,%ecx,8)",},
+{{0xf2, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"f2 0f 38 fc 08       \taor    %ecx,(%eax)",},
+{{0xf2, 0x0f, 0x38, 0xfc, 0x15, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"f2 0f 38 fc 15 78 56 34 12 \taor    %edx,0x12345678",},
+{{0xf2, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f2 0f 38 fc 94 c8 78 56 34 12 \taor    %edx,0x12345678(%eax,%ecx,8)",},
+{{0xf3, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"f3 0f 38 fc 08       \taxor   %ecx,(%eax)",},
+{{0xf3, 0x0f, 0x38, 0xfc, 0x15, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"f3 0f 38 fc 15 78 56 34 12 \taxor   %edx,0x12345678",},
+{{0xf3, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f3 0f 38 fc 94 c8 78 56 34 12 \taxor   %edx,0x12345678(%eax,%ecx,8)",},
+{{0xc4, 0xe2, 0x7a, 0xb1, 0x31, }, 5, 0, "", "",
+"c4 e2 7a b1 31       \tvbcstnebf162ps (%ecx),%xmm6",},
+{{0xc4, 0xe2, 0x79, 0xb1, 0x31, }, 5, 0, "", "",
+"c4 e2 79 b1 31       \tvbcstnesh2ps (%ecx),%xmm6",},
+{{0xc4, 0xe2, 0x7a, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 7a b0 31       \tvcvtneebf162ps (%ecx),%xmm6",},
+{{0xc4, 0xe2, 0x79, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 79 b0 31       \tvcvtneeph2ps (%ecx),%xmm6",},
+{{0xc4, 0xe2, 0x7b, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 7b b0 31       \tvcvtneobf162ps (%ecx),%xmm6",},
+{{0xc4, 0xe2, 0x78, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 78 b0 31       \tvcvtneoph2ps (%ecx),%xmm6",},
+{{0x62, 0xf2, 0x7e, 0x08, 0x72, 0xf1, }, 6, 0, "", "",
+"62 f2 7e 08 72 f1    \tvcvtneps2bf16 %xmm1,%xmm6",},
+{{0xc4, 0xe2, 0x6b, 0x50, 0xd9, }, 5, 0, "", "",
+"c4 e2 6b 50 d9       \tvpdpbssd %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6b, 0x51, 0xd9, }, 5, 0, "", "",
+"c4 e2 6b 51 d9       \tvpdpbssds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0x50, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a 50 d9       \tvpdpbsud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0x51, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a 51 d9       \tvpdpbsuds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0x50, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 50 d9       \tvpdpbuud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0x51, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 51 d9       \tvpdpbuuds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0xd2, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a d2 d9       \tvpdpwsud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0xd3, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a d3 d9       \tvpdpwsuds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x69, 0xd2, 0xd9, }, 5, 0, "", "",
+"c4 e2 69 d2 d9       \tvpdpwusd %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x69, 0xd3, 0xd9, }, 5, 0, "", "",
+"c4 e2 69 d3 d9       \tvpdpwusds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0xd2, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 d2 d9       \tvpdpwuud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0xd3, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 d3 d9       \tvpdpwuuds %xmm1,%xmm2,%xmm3",},
+{{0x62, 0xf2, 0xed, 0x08, 0xb5, 0xd9, }, 6, 0, "", "",
+"62 f2 ed 08 b5 d9    \tvpmadd52huq %xmm1,%xmm2,%xmm3",},
+{{0x62, 0xf2, 0xed, 0x08, 0xb4, 0xd9, }, 6, 0, "", "",
+"62 f2 ed 08 b4 d9    \tvpmadd52luq %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x7f, 0xcc, 0xd1, }, 5, 0, "", "",
+"c4 e2 7f cc d1       \tvsha512msg1 %xmm1,%ymm2",},
+{{0xc4, 0xe2, 0x7f, 0xcd, 0xd1, }, 5, 0, "", "",
+"c4 e2 7f cd d1       \tvsha512msg2 %ymm1,%ymm2",},
+{{0xc4, 0xe2, 0x6f, 0xcb, 0xd9, }, 5, 0, "", "",
+"c4 e2 6f cb d9       \tvsha512rnds2 %xmm1,%ymm2,%ymm3",},
+{{0xc4, 0xe2, 0x68, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 da d9       \tvsm3msg1 %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x69, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 69 da d9       \tvsm3msg2 %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe3, 0x69, 0xde, 0xd9, 0xa1, }, 6, 0, "", "",
+"c4 e3 69 de d9 a1    \tvsm3rnds2 $0xa1,%xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a da d9       \tvsm4key4 %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6b, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 6b da d9       \tvsm4rnds4 %xmm1,%xmm2,%xmm3",},
+{{0x0f, 0x0d, 0x00, }, 3, 0, "", "",
+"0f 0d 00             \tprefetch (%eax)",},
+{{0x0f, 0x18, 0x08, }, 3, 0, "", "",
+"0f 18 08             \tprefetcht0 (%eax)",},
+{{0x0f, 0x18, 0x10, }, 3, 0, "", "",
+"0f 18 10             \tprefetcht1 (%eax)",},
+{{0x0f, 0x18, 0x18, }, 3, 0, "", "",
+"0f 18 18             \tprefetcht2 (%eax)",},
+{{0x0f, 0x18, 0x00, }, 3, 0, "", "",
+"0f 18 00             \tprefetchnta (%eax)",},
+{{0x0f, 0x01, 0xc6, }, 3, 0, "", "",
+"0f 01 c6             \twrmsrns",},
 {{0xf3, 0x0f, 0x3a, 0xf0, 0xc0, 0x00, }, 6, 0, "", "",
 "f3 0f 3a f0 c0 00    \threset $0x0",},
 {{0x0f, 0x01, 0xe8, }, 3, 0, "", "",
diff --git a/tools/perf/arch/x86/tests/insn-x86-dat-64.c b/tools/perf/arch/x86/tests/insn-x86-dat-64.c
index 3a47e98fec33..3881fe89df8b 100644
--- a/tools/perf/arch/x86/tests/insn-x86-dat-64.c
+++ b/tools/perf/arch/x86/tests/insn-x86-dat-64.c
@@ -3877,6 +3877,1032 @@
 "62 f5 7c 08 2e 8c c8 78 56 34 12 \tvucomish 0x12345678(%rax,%rcx,8),%xmm1",},
 {{0x67, 0x62, 0xf5, 0x7c, 0x08, 0x2e, 0x8c, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 12, 0, "", "",
 "67 62 f5 7c 08 2e 8c c8 78 56 34 12 \tvucomish 0x12345678(%eax,%ecx,8),%xmm1",},
+{{0xf3, 0x0f, 0x38, 0xdc, 0xd1, }, 5, 0, "", "",
+"f3 0f 38 dc d1       \tloadiwkey %xmm1,%xmm2",},
+{{0xf3, 0x0f, 0x38, 0xfa, 0xd0, }, 5, 0, "", "",
+"f3 0f 38 fa d0       \tencodekey128 %eax,%edx",},
+{{0xf3, 0x0f, 0x38, 0xfb, 0xd0, }, 5, 0, "", "",
+"f3 0f 38 fb d0       \tencodekey256 %eax,%edx",},
+{{0xf3, 0x0f, 0x38, 0xdc, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 dc 5a 77    \taesenc128kl 0x77(%rdx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xde, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 de 5a 77    \taesenc256kl 0x77(%rdx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xdd, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 dd 5a 77    \taesdec128kl 0x77(%rdx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xdf, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 df 5a 77    \taesdec256kl 0x77(%rdx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x42, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 42 77    \taesencwide128kl 0x77(%rdx)",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x52, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 52 77    \taesencwide256kl 0x77(%rdx)",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x4a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 4a 77    \taesdecwide128kl 0x77(%rdx)",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 5a 77    \taesdecwide256kl 0x77(%rdx)",},
+{{0x0f, 0x38, 0xfc, 0x08, }, 4, 0, "", "",
+"0f 38 fc 08          \taadd   %ecx,(%rax)",},
+{{0x41, 0x0f, 0x38, 0xfc, 0x10, }, 5, 0, "", "",
+"41 0f 38 fc 10       \taadd   %edx,(%r8)",},
+{{0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"0f 38 fc 94 c8 78 56 34 12 \taadd   %edx,0x12345678(%rax,%rcx,8)",},
+{{0x41, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"41 0f 38 fc 94 c8 78 56 34 12 \taadd   %edx,0x12345678(%r8,%rcx,8)",},
+{{0x48, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"48 0f 38 fc 08       \taadd   %rcx,(%rax)",},
+{{0x49, 0x0f, 0x38, 0xfc, 0x10, }, 5, 0, "", "",
+"49 0f 38 fc 10       \taadd   %rdx,(%r8)",},
+{{0x48, 0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"48 0f 38 fc 14 25 78 56 34 12 \taadd   %rdx,0x12345678",},
+{{0x48, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"48 0f 38 fc 94 c8 78 56 34 12 \taadd   %rdx,0x12345678(%rax,%rcx,8)",},
+{{0x49, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"49 0f 38 fc 94 c8 78 56 34 12 \taadd   %rdx,0x12345678(%r8,%rcx,8)",},
+{{0x66, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"66 0f 38 fc 08       \taand   %ecx,(%rax)",},
+{{0x66, 0x41, 0x0f, 0x38, 0xfc, 0x10, }, 6, 0, "", "",
+"66 41 0f 38 fc 10    \taand   %edx,(%r8)",},
+{{0x66, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"66 0f 38 fc 94 c8 78 56 34 12 \taand   %edx,0x12345678(%rax,%rcx,8)",},
+{{0x66, 0x41, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"66 41 0f 38 fc 94 c8 78 56 34 12 \taand   %edx,0x12345678(%r8,%rcx,8)",},
+{{0x66, 0x48, 0x0f, 0x38, 0xfc, 0x08, }, 6, 0, "", "",
+"66 48 0f 38 fc 08    \taand   %rcx,(%rax)",},
+{{0x66, 0x49, 0x0f, 0x38, 0xfc, 0x10, }, 6, 0, "", "",
+"66 49 0f 38 fc 10    \taand   %rdx,(%r8)",},
+{{0x66, 0x48, 0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"66 48 0f 38 fc 14 25 78 56 34 12 \taand   %rdx,0x12345678",},
+{{0x66, 0x48, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"66 48 0f 38 fc 94 c8 78 56 34 12 \taand   %rdx,0x12345678(%rax,%rcx,8)",},
+{{0x66, 0x49, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"66 49 0f 38 fc 94 c8 78 56 34 12 \taand   %rdx,0x12345678(%r8,%rcx,8)",},
+{{0xf2, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"f2 0f 38 fc 08       \taor    %ecx,(%rax)",},
+{{0xf2, 0x41, 0x0f, 0x38, 0xfc, 0x10, }, 6, 0, "", "",
+"f2 41 0f 38 fc 10    \taor    %edx,(%r8)",},
+{{0xf2, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f2 0f 38 fc 94 c8 78 56 34 12 \taor    %edx,0x12345678(%rax,%rcx,8)",},
+{{0xf2, 0x41, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f2 41 0f 38 fc 94 c8 78 56 34 12 \taor    %edx,0x12345678(%r8,%rcx,8)",},
+{{0xf2, 0x48, 0x0f, 0x38, 0xfc, 0x08, }, 6, 0, "", "",
+"f2 48 0f 38 fc 08    \taor    %rcx,(%rax)",},
+{{0xf2, 0x49, 0x0f, 0x38, 0xfc, 0x10, }, 6, 0, "", "",
+"f2 49 0f 38 fc 10    \taor    %rdx,(%r8)",},
+{{0xf2, 0x48, 0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f2 48 0f 38 fc 14 25 78 56 34 12 \taor    %rdx,0x12345678",},
+{{0xf2, 0x48, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f2 48 0f 38 fc 94 c8 78 56 34 12 \taor    %rdx,0x12345678(%rax,%rcx,8)",},
+{{0xf2, 0x49, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f2 49 0f 38 fc 94 c8 78 56 34 12 \taor    %rdx,0x12345678(%r8,%rcx,8)",},
+{{0xf3, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"f3 0f 38 fc 08       \taxor   %ecx,(%rax)",},
+{{0xf3, 0x41, 0x0f, 0x38, 0xfc, 0x10, }, 6, 0, "", "",
+"f3 41 0f 38 fc 10    \taxor   %edx,(%r8)",},
+{{0xf3, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f3 0f 38 fc 94 c8 78 56 34 12 \taxor   %edx,0x12345678(%rax,%rcx,8)",},
+{{0xf3, 0x41, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f3 41 0f 38 fc 94 c8 78 56 34 12 \taxor   %edx,0x12345678(%r8,%rcx,8)",},
+{{0xf3, 0x48, 0x0f, 0x38, 0xfc, 0x08, }, 6, 0, "", "",
+"f3 48 0f 38 fc 08    \taxor   %rcx,(%rax)",},
+{{0xf3, 0x49, 0x0f, 0x38, 0xfc, 0x10, }, 6, 0, "", "",
+"f3 49 0f 38 fc 10    \taxor   %rdx,(%r8)",},
+{{0xf3, 0x48, 0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f3 48 0f 38 fc 14 25 78 56 34 12 \taxor   %rdx,0x12345678",},
+{{0xf3, 0x48, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f3 48 0f 38 fc 94 c8 78 56 34 12 \taxor   %rdx,0x12345678(%rax,%rcx,8)",},
+{{0xf3, 0x49, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f3 49 0f 38 fc 94 c8 78 56 34 12 \taxor   %rdx,0x12345678(%r8,%rcx,8)",},
+{{0xc4, 0xc2, 0x61, 0xe6, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e6 09       \tcmpbexadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe2, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e2 09       \tcmpbxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xee, 0x09, }, 5, 0, "", "",
+"c4 c2 61 ee 09       \tcmplexadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xec, 0x09, }, 5, 0, "", "",
+"c4 c2 61 ec 09       \tcmplxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe7, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e7 09       \tcmpnbexadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe3, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e3 09       \tcmpnbxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xef, 0x09, }, 5, 0, "", "",
+"c4 c2 61 ef 09       \tcmpnlexadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xed, 0x09, }, 5, 0, "", "",
+"c4 c2 61 ed 09       \tcmpnlxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe1, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e1 09       \tcmpnoxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xeb, 0x09, }, 5, 0, "", "",
+"c4 c2 61 eb 09       \tcmpnpxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe9, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e9 09       \tcmpnsxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe5, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e5 09       \tcmpnzxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe0, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e0 09       \tcmpoxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xea, 0x09, }, 5, 0, "", "",
+"c4 c2 61 ea 09       \tcmppxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe8, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e8 09       \tcmpsxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe4, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e4 09       \tcmpzxadd %ebx,%ecx,(%r9)",},
+{{0x0f, 0x0d, 0x00, }, 3, 0, "", "",
+"0f 0d 00             \tprefetch (%rax)",},
+{{0x0f, 0x18, 0x08, }, 3, 0, "", "",
+"0f 18 08             \tprefetcht0 (%rax)",},
+{{0x0f, 0x18, 0x10, }, 3, 0, "", "",
+"0f 18 10             \tprefetcht1 (%rax)",},
+{{0x0f, 0x18, 0x18, }, 3, 0, "", "",
+"0f 18 18             \tprefetcht2 (%rax)",},
+{{0x0f, 0x18, 0x00, }, 3, 0, "", "",
+"0f 18 00             \tprefetchnta (%rax)",},
+{{0x0f, 0x18, 0x3d, 0x78, 0x56, 0x34, 0x12, }, 7, 0, "", "",
+"0f 18 3d 78 56 34 12 \tprefetchit0 0x12345678(%rip)        # 1234924e <main+0x1234924e>",},
+{{0x0f, 0x18, 0x35, 0x78, 0x56, 0x34, 0x12, }, 7, 0, "", "",
+"0f 18 35 78 56 34 12 \tprefetchit1 0x12345678(%rip)        # 12349255 <main+0x12349255>",},
+{{0xf2, 0x0f, 0x01, 0xc6, }, 4, 0, "", "",
+"f2 0f 01 c6          \trdmsrlist",},
+{{0xf3, 0x0f, 0x01, 0xc6, }, 4, 0, "", "",
+"f3 0f 01 c6          \twrmsrlist",},
+{{0xf2, 0x0f, 0x38, 0xf8, 0xd0, }, 5, 0, "", "",
+"f2 0f 38 f8 d0       \turdmsr %rdx,%rax",},
+{{0x62, 0xfc, 0x7f, 0x08, 0xf8, 0xd6, }, 6, 0, "", "",
+"62 fc 7f 08 f8 d6    \turdmsr %rdx,%r22",},
+{{0xc4, 0xc7, 0x7b, 0xf8, 0xc4, 0x7f, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"c4 c7 7b f8 c4 7f 00 00 00 \turdmsr $0x7f,%r12",},
+{{0xf3, 0x0f, 0x38, 0xf8, 0xd0, }, 5, 0, "", "",
+"f3 0f 38 f8 d0       \tuwrmsr %rax,%rdx",},
+{{0x62, 0xfc, 0x7e, 0x08, 0xf8, 0xd6, }, 6, 0, "", "",
+"62 fc 7e 08 f8 d6    \tuwrmsr %r22,%rdx",},
+{{0xc4, 0xc7, 0x7a, 0xf8, 0xc4, 0x7f, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"c4 c7 7a f8 c4 7f 00 00 00 \tuwrmsr %r12,$0x7f",},
+{{0xc4, 0xe2, 0x7a, 0xb1, 0x31, }, 5, 0, "", "",
+"c4 e2 7a b1 31       \tvbcstnebf162ps (%rcx),%xmm6",},
+{{0xc4, 0xe2, 0x79, 0xb1, 0x31, }, 5, 0, "", "",
+"c4 e2 79 b1 31       \tvbcstnesh2ps (%rcx),%xmm6",},
+{{0xc4, 0xe2, 0x7a, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 7a b0 31       \tvcvtneebf162ps (%rcx),%xmm6",},
+{{0xc4, 0xe2, 0x79, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 79 b0 31       \tvcvtneeph2ps (%rcx),%xmm6",},
+{{0xc4, 0xe2, 0x7b, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 7b b0 31       \tvcvtneobf162ps (%rcx),%xmm6",},
+{{0xc4, 0xe2, 0x78, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 78 b0 31       \tvcvtneoph2ps (%rcx),%xmm6",},
+{{0x62, 0xf2, 0x7e, 0x08, 0x72, 0xf1, }, 6, 0, "", "",
+"62 f2 7e 08 72 f1    \tvcvtneps2bf16 %xmm1,%xmm6",},
+{{0xf2, 0x0f, 0x01, 0xca, }, 4, 0, "erets", "indirect",
+"f2 0f 01 ca          \terets",},
+{{0xf3, 0x0f, 0x01, 0xca, }, 4, 0, "eretu", "indirect",
+"f3 0f 01 ca          \teretu",},
+{{0xc4, 0xe2, 0x71, 0x6c, 0xda, }, 5, 0, "", "",
+"c4 e2 71 6c da       \ttcmmimfp16ps %tmm1,%tmm2,%tmm3",},
+{{0xc4, 0xe2, 0x70, 0x6c, 0xda, }, 5, 0, "", "",
+"c4 e2 70 6c da       \ttcmmrlfp16ps %tmm1,%tmm2,%tmm3",},
+{{0xc4, 0xe2, 0x73, 0x5c, 0xda, }, 5, 0, "", "",
+"c4 e2 73 5c da       \ttdpfp16ps %tmm1,%tmm2,%tmm3",},
+{{0xd5, 0x10, 0xf6, 0xc2, 0x05, }, 5, 0, "", "",
+"d5 10 f6 c2 05       \ttest   $0x5,%r18b",},
+{{0xd5, 0x10, 0xf7, 0xc2, 0x05, 0x00, 0x00, 0x00, }, 8, 0, "", "",
+"d5 10 f7 c2 05 00 00 00 \ttest   $0x5,%r18d",},
+{{0xd5, 0x18, 0xf7, 0xc2, 0x05, 0x00, 0x00, 0x00, }, 8, 0, "", "",
+"d5 18 f7 c2 05 00 00 00 \ttest   $0x5,%r18",},
+{{0x66, 0xd5, 0x10, 0xf7, 0xc2, 0x05, 0x00, }, 7, 0, "", "",
+"66 d5 10 f7 c2 05 00 \ttest   $0x5,%r18w",},
+{{0x44, 0x0f, 0xaf, 0xf0, }, 4, 0, "", "",
+"44 0f af f0          \timul   %eax,%r14d",},
+{{0xd5, 0xc0, 0xaf, 0xc8, }, 4, 0, "", "",
+"d5 c0 af c8          \timul   %eax,%r17d",},
+{{0xd5, 0x90, 0x62, 0x12, }, 4, 0, "", "",
+"d5 90 62 12          \tpunpckldq %mm2,(%r18)",},
+{{0xd5, 0x40, 0x8d, 0x00, }, 4, 0, "", "",
+"d5 40 8d 00          \tlea    (%rax),%r16d",},
+{{0xd5, 0x44, 0x8d, 0x38, }, 4, 0, "", "",
+"d5 44 8d 38          \tlea    (%rax),%r31d",},
+{{0xd5, 0x20, 0x8d, 0x04, 0x05, 0x00, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"d5 20 8d 04 05 00 00 00 00 \tlea    0x0(,%r16,1),%eax",},
+{{0xd5, 0x22, 0x8d, 0x04, 0x3d, 0x00, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"d5 22 8d 04 3d 00 00 00 00 \tlea    0x0(,%r31,1),%eax",},
+{{0xd5, 0x10, 0x8d, 0x00, }, 4, 0, "", "",
+"d5 10 8d 00          \tlea    (%r16),%eax",},
+{{0xd5, 0x11, 0x8d, 0x07, }, 4, 0, "", "",
+"d5 11 8d 07          \tlea    (%r31),%eax",},
+{{0x4c, 0x8d, 0x38, }, 3, 0, "", "",
+"4c 8d 38             \tlea    (%rax),%r15",},
+{{0xd5, 0x48, 0x8d, 0x00, }, 4, 0, "", "",
+"d5 48 8d 00          \tlea    (%rax),%r16",},
+{{0x49, 0x8d, 0x07, }, 3, 0, "", "",
+"49 8d 07             \tlea    (%r15),%rax",},
+{{0xd5, 0x18, 0x8d, 0x00, }, 4, 0, "", "",
+"d5 18 8d 00          \tlea    (%r16),%rax",},
+{{0x4a, 0x8d, 0x04, 0x3d, 0x00, 0x00, 0x00, 0x00, }, 8, 0, "", "",
+"4a 8d 04 3d 00 00 00 00 \tlea    0x0(,%r15,1),%rax",},
+{{0xd5, 0x28, 0x8d, 0x04, 0x05, 0x00, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"d5 28 8d 04 05 00 00 00 00 \tlea    0x0(,%r16,1),%rax",},
+{{0xd5, 0x1c, 0x03, 0x00, }, 4, 0, "", "",
+"d5 1c 03 00          \tadd    (%r16),%r8",},
+{{0xd5, 0x1c, 0x03, 0x38, }, 4, 0, "", "",
+"d5 1c 03 38          \tadd    (%r16),%r15",},
+{{0xd5, 0x4a, 0x8b, 0x04, 0x0d, 0x00, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"d5 4a 8b 04 0d 00 00 00 00 \tmov    0x0(,%r9,1),%r16",},
+{{0xd5, 0x4a, 0x8b, 0x04, 0x35, 0x00, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"d5 4a 8b 04 35 00 00 00 00 \tmov    0x0(,%r14,1),%r16",},
+{{0xd5, 0x4d, 0x2b, 0x3a, }, 4, 0, "", "",
+"d5 4d 2b 3a          \tsub    (%r10),%r31",},
+{{0xd5, 0x4d, 0x2b, 0x7d, 0x00, }, 5, 0, "", "",
+"d5 4d 2b 7d 00       \tsub    0x0(%r13),%r31",},
+{{0xd5, 0x30, 0x8d, 0x44, 0x28, 0x01, }, 6, 0, "", "",
+"d5 30 8d 44 28 01    \tlea    0x1(%r16,%r21,1),%eax",},
+{{0xd5, 0x76, 0x8d, 0x7c, 0x10, 0x01, }, 6, 0, "", "",
+"d5 76 8d 7c 10 01    \tlea    0x1(%r16,%r26,1),%r31d",},
+{{0xd5, 0x12, 0x8d, 0x84, 0x0d, 0x81, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"d5 12 8d 84 0d 81 00 00 00 \tlea    0x81(%r21,%r9,1),%eax",},
+{{0xd5, 0x57, 0x8d, 0xbc, 0x0a, 0x81, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"d5 57 8d bc 0a 81 00 00 00 \tlea    0x81(%r26,%r9,1),%r31d",},
+{{0xd5, 0x00, 0xa1, 0xef, 0xcd, 0xab, 0x90, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "jmp", "indirect",
+"d5 00 a1 ef cd ab 90 78 56 34 12 \tjmpabs $0x1234567890abcdef",},
+{{0xd5, 0x08, 0x53, }, 3, 0, "", "",
+"d5 08 53             \tpushp  %rbx",},
+{{0xd5, 0x18, 0x50, }, 3, 0, "", "",
+"d5 18 50             \tpushp  %r16",},
+{{0xd5, 0x19, 0x57, }, 3, 0, "", "",
+"d5 19 57             \tpushp  %r31",},
+{{0xd5, 0x19, 0x5f, }, 3, 0, "", "",
+"d5 19 5f             \tpopp   %r31",},
+{{0xd5, 0x18, 0x58, }, 3, 0, "", "",
+"d5 18 58             \tpopp   %r16",},
+{{0xd5, 0x08, 0x5b, }, 3, 0, "", "",
+"d5 08 5b             \tpopp   %rbx",},
+{{0x62, 0x72, 0x34, 0x00, 0xf7, 0xd2, }, 6, 0, "", "",
+"62 72 34 00 f7 d2    \tbextr  %r25d,%edx,%r10d",},
+{{0x62, 0xda, 0x34, 0x00, 0xf7, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 34 00 f7 94 87 23 01 00 00 \tbextr  %r25d,0x123(%r31,%rax,4),%edx",},
+{{0x62, 0x52, 0x84, 0x00, 0xf7, 0xdf, }, 6, 0, "", "",
+"62 52 84 00 f7 df    \tbextr  %r31,%r15,%r11",},
+{{0x62, 0x5a, 0x84, 0x00, 0xf7, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 84 00 f7 bc 87 23 01 00 00 \tbextr  %r31,0x123(%r31,%rax,4),%r15",},
+{{0x62, 0xda, 0x6c, 0x08, 0xf3, 0xd9, }, 6, 0, "", "",
+"62 da 6c 08 f3 d9    \tblsi   %r25d,%edx",},
+{{0x62, 0xda, 0x84, 0x08, 0xf3, 0xdf, }, 6, 0, "", "",
+"62 da 84 08 f3 df    \tblsi   %r31,%r15",},
+{{0x62, 0xda, 0x34, 0x00, 0xf3, 0x9c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 34 00 f3 9c 87 23 01 00 00 \tblsi   0x123(%r31,%rax,4),%r25d",},
+{{0x62, 0xda, 0x84, 0x00, 0xf3, 0x9c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 84 00 f3 9c 87 23 01 00 00 \tblsi   0x123(%r31,%rax,4),%r31",},
+{{0x62, 0xda, 0x6c, 0x08, 0xf3, 0xd1, }, 6, 0, "", "",
+"62 da 6c 08 f3 d1    \tblsmsk %r25d,%edx",},
+{{0x62, 0xda, 0x84, 0x08, 0xf3, 0xd7, }, 6, 0, "", "",
+"62 da 84 08 f3 d7    \tblsmsk %r31,%r15",},
+{{0x62, 0xda, 0x34, 0x00, 0xf3, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 34 00 f3 94 87 23 01 00 00 \tblsmsk 0x123(%r31,%rax,4),%r25d",},
+{{0x62, 0xda, 0x84, 0x00, 0xf3, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 84 00 f3 94 87 23 01 00 00 \tblsmsk 0x123(%r31,%rax,4),%r31",},
+{{0x62, 0xda, 0x6c, 0x08, 0xf3, 0xc9, }, 6, 0, "", "",
+"62 da 6c 08 f3 c9    \tblsr   %r25d,%edx",},
+{{0x62, 0xda, 0x84, 0x08, 0xf3, 0xcf, }, 6, 0, "", "",
+"62 da 84 08 f3 cf    \tblsr   %r31,%r15",},
+{{0x62, 0xda, 0x34, 0x00, 0xf3, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 34 00 f3 8c 87 23 01 00 00 \tblsr   0x123(%r31,%rax,4),%r25d",},
+{{0x62, 0xda, 0x84, 0x00, 0xf3, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 84 00 f3 8c 87 23 01 00 00 \tblsr   0x123(%r31,%rax,4),%r31",},
+{{0x62, 0x72, 0x34, 0x00, 0xf5, 0xd2, }, 6, 0, "", "",
+"62 72 34 00 f5 d2    \tbzhi   %r25d,%edx,%r10d",},
+{{0x62, 0xda, 0x34, 0x00, 0xf5, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 34 00 f5 94 87 23 01 00 00 \tbzhi   %r25d,0x123(%r31,%rax,4),%edx",},
+{{0x62, 0x52, 0x84, 0x00, 0xf5, 0xdf, }, 6, 0, "", "",
+"62 52 84 00 f5 df    \tbzhi   %r31,%r15,%r11",},
+{{0x62, 0x5a, 0x84, 0x00, 0xf5, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 84 00 f5 bc 87 23 01 00 00 \tbzhi   %r31,0x123(%r31,%rax,4),%r15",},
+{{0x62, 0xda, 0x35, 0x00, 0xe6, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e6 94 87 23 01 00 00 \tcmpbexadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe6, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e6 bc 87 23 01 00 00 \tcmpbexadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe2, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e2 94 87 23 01 00 00 \tcmpbxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe2, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e2 bc 87 23 01 00 00 \tcmpbxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xec, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 ec 94 87 23 01 00 00 \tcmplxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xec, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 ec bc 87 23 01 00 00 \tcmplxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe7, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e7 94 87 23 01 00 00 \tcmpnbexadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe7, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e7 bc 87 23 01 00 00 \tcmpnbexadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe3, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e3 94 87 23 01 00 00 \tcmpnbxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe3, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e3 bc 87 23 01 00 00 \tcmpnbxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xef, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 ef 94 87 23 01 00 00 \tcmpnlexadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xef, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 ef bc 87 23 01 00 00 \tcmpnlexadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xed, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 ed 94 87 23 01 00 00 \tcmpnlxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xed, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 ed bc 87 23 01 00 00 \tcmpnlxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe1, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e1 94 87 23 01 00 00 \tcmpnoxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe1, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e1 bc 87 23 01 00 00 \tcmpnoxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xeb, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 eb 94 87 23 01 00 00 \tcmpnpxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xeb, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 eb bc 87 23 01 00 00 \tcmpnpxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe9, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e9 94 87 23 01 00 00 \tcmpnsxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe9, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e9 bc 87 23 01 00 00 \tcmpnsxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe5, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e5 94 87 23 01 00 00 \tcmpnzxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe5, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e5 bc 87 23 01 00 00 \tcmpnzxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe0, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e0 94 87 23 01 00 00 \tcmpoxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe0, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e0 bc 87 23 01 00 00 \tcmpoxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xea, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 ea 94 87 23 01 00 00 \tcmppxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xea, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 ea bc 87 23 01 00 00 \tcmppxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe8, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e8 94 87 23 01 00 00 \tcmpsxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe8, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e8 bc 87 23 01 00 00 \tcmpsxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe4, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e4 94 87 23 01 00 00 \tcmpzxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe4, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e4 bc 87 23 01 00 00 \tcmpzxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xcc, 0xfc, 0x08, 0xf1, 0xf7, }, 6, 0, "", "",
+"62 cc fc 08 f1 f7    \tcrc32  %r31,%r22",},
+{{0x62, 0xcc, 0xfc, 0x08, 0xf1, 0x37, }, 6, 0, "", "",
+"62 cc fc 08 f1 37    \tcrc32q (%r31),%r22",},
+{{0x62, 0xec, 0xfc, 0x08, 0xf0, 0xcb, }, 6, 0, "", "",
+"62 ec fc 08 f0 cb    \tcrc32  %r19b,%r17",},
+{{0x62, 0xec, 0x7c, 0x08, 0xf0, 0xeb, }, 6, 0, "", "",
+"62 ec 7c 08 f0 eb    \tcrc32  %r19b,%r21d",},
+{{0x62, 0xfc, 0x7c, 0x08, 0xf0, 0x1b, }, 6, 0, "", "",
+"62 fc 7c 08 f0 1b    \tcrc32b (%r19),%ebx",},
+{{0x62, 0xcc, 0x7c, 0x08, 0xf1, 0xff, }, 6, 0, "", "",
+"62 cc 7c 08 f1 ff    \tcrc32  %r31d,%r23d",},
+{{0x62, 0xcc, 0x7c, 0x08, 0xf1, 0x3f, }, 6, 0, "", "",
+"62 cc 7c 08 f1 3f    \tcrc32l (%r31),%r23d",},
+{{0x62, 0xcc, 0x7d, 0x08, 0xf1, 0xef, }, 6, 0, "", "",
+"62 cc 7d 08 f1 ef    \tcrc32  %r31w,%r21d",},
+{{0x62, 0xcc, 0x7d, 0x08, 0xf1, 0x2f, }, 6, 0, "", "",
+"62 cc 7d 08 f1 2f    \tcrc32w (%r31),%r21d",},
+{{0x62, 0xe4, 0xfc, 0x08, 0xf1, 0xd0, }, 6, 0, "", "",
+"62 e4 fc 08 f1 d0    \tcrc32  %rax,%r18",},
+{{0x67, 0x62, 0x4c, 0x7f, 0x08, 0xf8, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 12, 0, "", "",
+"67 62 4c 7f 08 f8 8c 87 23 01 00 00 \tenqcmd 0x123(%r31d,%eax,4),%r25d",},
+{{0x62, 0x4c, 0x7f, 0x08, 0xf8, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7f 08 f8 bc 87 23 01 00 00 \tenqcmd 0x123(%r31,%rax,4),%r31",},
+{{0x67, 0x62, 0x4c, 0x7e, 0x08, 0xf8, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 12, 0, "", "",
+"67 62 4c 7e 08 f8 8c 87 23 01 00 00 \tenqcmds 0x123(%r31d,%eax,4),%r25d",},
+{{0x62, 0x4c, 0x7e, 0x08, 0xf8, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7e 08 f8 bc 87 23 01 00 00 \tenqcmds 0x123(%r31,%rax,4),%r31",},
+{{0x62, 0x4c, 0x7e, 0x08, 0xf0, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7e 08 f0 bc 87 23 01 00 00 \tinvept 0x123(%r31,%rax,4),%r31",},
+{{0x62, 0x4c, 0x7e, 0x08, 0xf2, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7e 08 f2 bc 87 23 01 00 00 \tinvpcid 0x123(%r31,%rax,4),%r31",},
+{{0x62, 0x4c, 0x7e, 0x08, 0xf1, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7e 08 f1 bc 87 23 01 00 00 \tinvvpid 0x123(%r31,%rax,4),%r31",},
+{{0x62, 0x61, 0x7d, 0x08, 0x93, 0xcd, }, 6, 0, "", "",
+"62 61 7d 08 93 cd    \tkmovb  %k5,%r25d",},
+{{0x62, 0xd9, 0x7d, 0x08, 0x91, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 7d 08 91 ac 87 23 01 00 00 \tkmovb  %k5,0x123(%r31,%rax,4)",},
+{{0x62, 0xd9, 0x7d, 0x08, 0x92, 0xe9, }, 6, 0, "", "",
+"62 d9 7d 08 92 e9    \tkmovb  %r25d,%k5",},
+{{0x62, 0xd9, 0x7d, 0x08, 0x90, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 7d 08 90 ac 87 23 01 00 00 \tkmovb  0x123(%r31,%rax,4),%k5",},
+{{0x62, 0x61, 0x7f, 0x08, 0x93, 0xcd, }, 6, 0, "", "",
+"62 61 7f 08 93 cd    \tkmovd  %k5,%r25d",},
+{{0x62, 0xd9, 0xfd, 0x08, 0x91, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 fd 08 91 ac 87 23 01 00 00 \tkmovd  %k5,0x123(%r31,%rax,4)",},
+{{0x62, 0xd9, 0x7f, 0x08, 0x92, 0xe9, }, 6, 0, "", "",
+"62 d9 7f 08 92 e9    \tkmovd  %r25d,%k5",},
+{{0x62, 0xd9, 0xfd, 0x08, 0x90, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 fd 08 90 ac 87 23 01 00 00 \tkmovd  0x123(%r31,%rax,4),%k5",},
+{{0x62, 0x61, 0xff, 0x08, 0x93, 0xfd, }, 6, 0, "", "",
+"62 61 ff 08 93 fd    \tkmovq  %k5,%r31",},
+{{0x62, 0xd9, 0xfc, 0x08, 0x91, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 fc 08 91 ac 87 23 01 00 00 \tkmovq  %k5,0x123(%r31,%rax,4)",},
+{{0x62, 0xd9, 0xff, 0x08, 0x92, 0xef, }, 6, 0, "", "",
+"62 d9 ff 08 92 ef    \tkmovq  %r31,%k5",},
+{{0x62, 0xd9, 0xfc, 0x08, 0x90, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 fc 08 90 ac 87 23 01 00 00 \tkmovq  0x123(%r31,%rax,4),%k5",},
+{{0x62, 0x61, 0x7c, 0x08, 0x93, 0xcd, }, 6, 0, "", "",
+"62 61 7c 08 93 cd    \tkmovw  %k5,%r25d",},
+{{0x62, 0xd9, 0x7c, 0x08, 0x91, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 7c 08 91 ac 87 23 01 00 00 \tkmovw  %k5,0x123(%r31,%rax,4)",},
+{{0x62, 0xd9, 0x7c, 0x08, 0x92, 0xe9, }, 6, 0, "", "",
+"62 d9 7c 08 92 e9    \tkmovw  %r25d,%k5",},
+{{0x62, 0xd9, 0x7c, 0x08, 0x90, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 7c 08 90 ac 87 23 01 00 00 \tkmovw  0x123(%r31,%rax,4),%k5",},
+{{0x62, 0xda, 0x7c, 0x08, 0x49, 0x84, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 7c 08 49 84 87 23 01 00 00 \tldtilecfg 0x123(%r31,%rax,4)",},
+{{0x62, 0xfc, 0x7d, 0x08, 0x60, 0xc2, }, 6, 0, "", "",
+"62 fc 7d 08 60 c2    \tmovbe  %r18w,%ax",},
+{{0x62, 0xd4, 0x7d, 0x08, 0x60, 0xc7, }, 6, 0, "", "",
+"62 d4 7d 08 60 c7    \tmovbe  %r15w,%ax",},
+{{0x62, 0xec, 0x7d, 0x08, 0x61, 0x94, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 ec 7d 08 61 94 80 23 01 00 00 \tmovbe  %r18w,0x123(%r16,%rax,4)",},
+{{0x62, 0xcc, 0x7d, 0x08, 0x61, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 cc 7d 08 61 94 87 23 01 00 00 \tmovbe  %r18w,0x123(%r31,%rax,4)",},
+{{0x62, 0xdc, 0x7c, 0x08, 0x60, 0xd1, }, 6, 0, "", "",
+"62 dc 7c 08 60 d1    \tmovbe  %r25d,%edx",},
+{{0x62, 0xd4, 0x7c, 0x08, 0x60, 0xd7, }, 6, 0, "", "",
+"62 d4 7c 08 60 d7    \tmovbe  %r15d,%edx",},
+{{0x62, 0x6c, 0x7c, 0x08, 0x61, 0x8c, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 6c 7c 08 61 8c 80 23 01 00 00 \tmovbe  %r25d,0x123(%r16,%rax,4)",},
+{{0x62, 0x5c, 0xfc, 0x08, 0x60, 0xff, }, 6, 0, "", "",
+"62 5c fc 08 60 ff    \tmovbe  %r31,%r15",},
+{{0x62, 0x54, 0xfc, 0x08, 0x60, 0xf8, }, 6, 0, "", "",
+"62 54 fc 08 60 f8    \tmovbe  %r8,%r15",},
+{{0x62, 0x6c, 0xfc, 0x08, 0x61, 0xbc, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 6c fc 08 61 bc 80 23 01 00 00 \tmovbe  %r31,0x123(%r16,%rax,4)",},
+{{0x62, 0x4c, 0xfc, 0x08, 0x61, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c fc 08 61 bc 87 23 01 00 00 \tmovbe  %r31,0x123(%r31,%rax,4)",},
+{{0x62, 0x6c, 0xfc, 0x08, 0x60, 0xbc, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 6c fc 08 60 bc 80 23 01 00 00 \tmovbe  0x123(%r16,%rax,4),%r31",},
+{{0x62, 0xcc, 0x7d, 0x08, 0x60, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 cc 7d 08 60 94 87 23 01 00 00 \tmovbe  0x123(%r31,%rax,4),%r18w",},
+{{0x62, 0x4c, 0x7c, 0x08, 0x60, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7c 08 60 8c 87 23 01 00 00 \tmovbe  0x123(%r31,%rax,4),%r25d",},
+{{0x67, 0x62, 0x4c, 0x7d, 0x08, 0xf8, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 12, 0, "", "",
+"67 62 4c 7d 08 f8 8c 87 23 01 00 00 \tmovdir64b 0x123(%r31d,%eax,4),%r25d",},
+{{0x62, 0x4c, 0x7d, 0x08, 0xf8, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7d 08 f8 bc 87 23 01 00 00 \tmovdir64b 0x123(%r31,%rax,4),%r31",},
+{{0x62, 0x4c, 0x7c, 0x08, 0xf9, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7c 08 f9 8c 87 23 01 00 00 \tmovdiri %r25d,0x123(%r31,%rax,4)",},
+{{0x62, 0x4c, 0xfc, 0x08, 0xf9, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c fc 08 f9 bc 87 23 01 00 00 \tmovdiri %r31,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x6f, 0x08, 0xf5, 0xd1, }, 6, 0, "", "",
+"62 5a 6f 08 f5 d1    \tpdep   %r25d,%edx,%r10d",},
+{{0x62, 0x5a, 0x87, 0x08, 0xf5, 0xdf, }, 6, 0, "", "",
+"62 5a 87 08 f5 df    \tpdep   %r31,%r15,%r11",},
+{{0x62, 0xda, 0x37, 0x00, 0xf5, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 37 00 f5 94 87 23 01 00 00 \tpdep   0x123(%r31,%rax,4),%r25d,%edx",},
+{{0x62, 0x5a, 0x87, 0x00, 0xf5, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 87 00 f5 bc 87 23 01 00 00 \tpdep   0x123(%r31,%rax,4),%r31,%r15",},
+{{0x62, 0x5a, 0x6e, 0x08, 0xf5, 0xd1, }, 6, 0, "", "",
+"62 5a 6e 08 f5 d1    \tpext   %r25d,%edx,%r10d",},
+{{0x62, 0x5a, 0x86, 0x08, 0xf5, 0xdf, }, 6, 0, "", "",
+"62 5a 86 08 f5 df    \tpext   %r31,%r15,%r11",},
+{{0x62, 0xda, 0x36, 0x00, 0xf5, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 36 00 f5 94 87 23 01 00 00 \tpext   0x123(%r31,%rax,4),%r25d,%edx",},
+{{0x62, 0x5a, 0x86, 0x00, 0xf5, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 86 00 f5 bc 87 23 01 00 00 \tpext   0x123(%r31,%rax,4),%r31,%r15",},
+{{0x62, 0x72, 0x35, 0x00, 0xf7, 0xd2, }, 6, 0, "", "",
+"62 72 35 00 f7 d2    \tshlx   %r25d,%edx,%r10d",},
+{{0x62, 0xda, 0x35, 0x00, 0xf7, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 f7 94 87 23 01 00 00 \tshlx   %r25d,0x123(%r31,%rax,4),%edx",},
+{{0x62, 0x52, 0x85, 0x00, 0xf7, 0xdf, }, 6, 0, "", "",
+"62 52 85 00 f7 df    \tshlx   %r31,%r15,%r11",},
+{{0x62, 0x5a, 0x85, 0x00, 0xf7, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 f7 bc 87 23 01 00 00 \tshlx   %r31,0x123(%r31,%rax,4),%r15",},
+{{0x62, 0x72, 0x37, 0x00, 0xf7, 0xd2, }, 6, 0, "", "",
+"62 72 37 00 f7 d2    \tshrx   %r25d,%edx,%r10d",},
+{{0x62, 0xda, 0x37, 0x00, 0xf7, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 37 00 f7 94 87 23 01 00 00 \tshrx   %r25d,0x123(%r31,%rax,4),%edx",},
+{{0x62, 0x52, 0x87, 0x00, 0xf7, 0xdf, }, 6, 0, "", "",
+"62 52 87 00 f7 df    \tshrx   %r31,%r15,%r11",},
+{{0x62, 0x5a, 0x87, 0x00, 0xf7, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 87 00 f7 bc 87 23 01 00 00 \tshrx   %r31,0x123(%r31,%rax,4),%r15",},
+{{0x62, 0xda, 0x7d, 0x08, 0x49, 0x84, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 7d 08 49 84 87 23 01 00 00 \tsttilecfg 0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x7f, 0x08, 0x4b, 0xb4, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 7f 08 4b b4 87 23 01 00 00 \ttileloadd 0x123(%r31,%rax,4),%tmm6",},
+{{0x62, 0xda, 0x7d, 0x08, 0x4b, 0xb4, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 7d 08 4b b4 87 23 01 00 00 \ttileloaddt1 0x123(%r31,%rax,4),%tmm6",},
+{{0x62, 0xda, 0x7e, 0x08, 0x4b, 0xb4, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 7e 08 4b b4 87 23 01 00 00 \ttilestored %tmm6,0x123(%r31,%rax,4)",},
+{{0x62, 0xfa, 0x7d, 0x28, 0x1a, 0x18, }, 6, 0, "", "",
+"62 fa 7d 28 1a 18    \tvbroadcastf32x4 (%r16),%ymm3",},
+{{0x62, 0xfa, 0x7d, 0x28, 0x5a, 0x18, }, 6, 0, "", "",
+"62 fa 7d 28 5a 18    \tvbroadcasti32x4 (%r16),%ymm3",},
+{{0x62, 0xfb, 0x7d, 0x28, 0x19, 0x18, 0x01, }, 7, 0, "", "",
+"62 fb 7d 28 19 18 01 \tvextractf32x4 $0x1,%ymm3,(%r16)",},
+{{0x62, 0xfb, 0x7d, 0x28, 0x39, 0x18, 0x01, }, 7, 0, "", "",
+"62 fb 7d 28 39 18 01 \tvextracti32x4 $0x1,%ymm3,(%r16)",},
+{{0x62, 0x7b, 0x65, 0x28, 0x18, 0x00, 0x01, }, 7, 0, "", "",
+"62 7b 65 28 18 00 01 \tvinsertf32x4 $0x1,(%r16),%ymm3,%ymm8",},
+{{0x62, 0x7b, 0x65, 0x28, 0x38, 0x00, 0x01, }, 7, 0, "", "",
+"62 7b 65 28 38 00 01 \tvinserti32x4 $0x1,(%r16),%ymm3,%ymm8",},
+{{0x62, 0xdb, 0xfd, 0x08, 0x09, 0x30, 0x01, }, 7, 0, "", "",
+"62 db fd 08 09 30 01 \tvrndscalepd $0x1,(%r24),%xmm6",},
+{{0x62, 0xdb, 0x7d, 0x08, 0x08, 0x30, 0x02, }, 7, 0, "", "",
+"62 db 7d 08 08 30 02 \tvrndscaleps $0x2,(%r24),%xmm6",},
+{{0x62, 0xdb, 0xcd, 0x08, 0x0b, 0x18, 0x03, }, 7, 0, "", "",
+"62 db cd 08 0b 18 03 \tvrndscalesd $0x3,(%r24),%xmm6,%xmm3",},
+{{0x62, 0xdb, 0x4d, 0x08, 0x0a, 0x18, 0x04, }, 7, 0, "", "",
+"62 db 4d 08 0a 18 04 \tvrndscaless $0x4,(%r24),%xmm6,%xmm3",},
+{{0x62, 0x4c, 0x7c, 0x08, 0x66, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7c 08 66 8c 87 23 01 00 00 \twrssd  %r25d,0x123(%r31,%rax,4)",},
+{{0x62, 0x4c, 0xfc, 0x08, 0x66, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c fc 08 66 bc 87 23 01 00 00 \twrssq  %r31,0x123(%r31,%rax,4)",},
+{{0x62, 0x4c, 0x7d, 0x08, 0x65, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7d 08 65 8c 87 23 01 00 00 \twrussd %r25d,0x123(%r31,%rax,4)",},
+{{0x62, 0x4c, 0xfd, 0x08, 0x65, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c fd 08 65 bc 87 23 01 00 00 \twrussq %r31,0x123(%r31,%rax,4)",},
+{{0x62, 0xf4, 0x0d, 0x10, 0x81, 0xd0, 0x34, 0x12, }, 8, 0, "", "",
+"62 f4 0d 10 81 d0 34 12 \tadc    $0x1234,%ax,%r30w",},
+{{0x62, 0x7c, 0x6c, 0x10, 0x10, 0xf9, }, 6, 0, "", "",
+"62 7c 6c 10 10 f9    \tadc    %r15b,%r17b,%r18b",},
+{{0x62, 0x54, 0x6c, 0x10, 0x11, 0x38, }, 6, 0, "", "",
+"62 54 6c 10 11 38    \tadc    %r15d,(%r8),%r18d",},
+{{0x62, 0xc4, 0x3c, 0x18, 0x12, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3c 18 12 04 07 \tadc    (%r15,%rax,1),%r16b,%r8b",},
+{{0x62, 0xc4, 0x3d, 0x18, 0x13, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3d 18 13 04 07 \tadc    (%r15,%rax,1),%r16w,%r8w",},
+{{0x62, 0xfc, 0x5c, 0x10, 0x83, 0x14, 0x83, 0x11, }, 8, 0, "", "",
+"62 fc 5c 10 83 14 83 11 \tadc    $0x11,(%r19,%rax,4),%r20d",},
+{{0x62, 0x54, 0x6d, 0x10, 0x66, 0xc7, }, 6, 0, "", "",
+"62 54 6d 10 66 c7    \tadcx   %r15d,%r8d,%r18d",},
+{{0x62, 0x14, 0xf9, 0x08, 0x66, 0x04, 0x3f, }, 7, 0, "", "",
+"62 14 f9 08 66 04 3f \tadcx   (%r15,%r31,1),%r8",},
+{{0x62, 0x14, 0x69, 0x10, 0x66, 0x04, 0x3f, }, 7, 0, "", "",
+"62 14 69 10 66 04 3f \tadcx   (%r15,%r31,1),%r8d,%r18d",},
+{{0x62, 0xf4, 0x0d, 0x10, 0x81, 0xc0, 0x34, 0x12, }, 8, 0, "", "",
+"62 f4 0d 10 81 c0 34 12 \tadd    $0x1234,%ax,%r30w",},
+{{0x62, 0xd4, 0xfc, 0x10, 0x81, 0xc7, 0x33, 0x44, 0x34, 0x12, }, 10, 0, "", "",
+"62 d4 fc 10 81 c7 33 44 34 12 \tadd    $0x12344433,%r15,%r16",},
+{{0x62, 0xd4, 0x74, 0x10, 0x80, 0xc5, 0x34, }, 7, 0, "", "",
+"62 d4 74 10 80 c5 34 \tadd    $0x34,%r13b,%r17b",},
+{{0x62, 0xf4, 0xbc, 0x18, 0x81, 0xc0, 0x11, 0x22, 0x33, 0xf4, }, 10, 0, "", "",
+"62 f4 bc 18 81 c0 11 22 33 f4 \tadd    $0xfffffffff4332211,%rax,%r8",},
+{{0x62, 0x44, 0xfc, 0x10, 0x01, 0xf8, }, 6, 0, "", "",
+"62 44 fc 10 01 f8    \tadd    %r31,%r8,%r16",},
+{{0x62, 0x44, 0xfc, 0x10, 0x01, 0x38, }, 6, 0, "", "",
+"62 44 fc 10 01 38    \tadd    %r31,(%r8),%r16",},
+{{0x62, 0x44, 0xf8, 0x10, 0x01, 0x3c, 0xc0, }, 7, 0, "", "",
+"62 44 f8 10 01 3c c0 \tadd    %r31,(%r8,%r16,8),%r16",},
+{{0x62, 0x44, 0x7c, 0x10, 0x00, 0xf8, }, 6, 0, "", "",
+"62 44 7c 10 00 f8    \tadd    %r31b,%r8b,%r16b",},
+{{0x62, 0x44, 0x7c, 0x10, 0x01, 0xf8, }, 6, 0, "", "",
+"62 44 7c 10 01 f8    \tadd    %r31d,%r8d,%r16d",},
+{{0x62, 0x44, 0x7d, 0x10, 0x01, 0xf8, }, 6, 0, "", "",
+"62 44 7d 10 01 f8    \tadd    %r31w,%r8w,%r16w",},
+{{0x62, 0x5c, 0xfc, 0x10, 0x03, 0x07, }, 6, 0, "", "",
+"62 5c fc 10 03 07    \tadd    (%r31),%r8,%r16",},
+{{0x62, 0x5c, 0xf8, 0x10, 0x03, 0x84, 0x07, 0x90, 0x90, 0x00, 0x00, }, 11, 0, "", "",
+"62 5c f8 10 03 84 07 90 90 00 00 \tadd    0x9090(%r31,%r16,1),%r8,%r16",},
+{{0x62, 0x44, 0x7c, 0x10, 0x00, 0xf8, }, 6, 0, "", "",
+"62 44 7c 10 00 f8    \tadd    %r31b,%r8b,%r16b",},
+{{0x62, 0x44, 0x7c, 0x10, 0x01, 0xf8, }, 6, 0, "", "",
+"62 44 7c 10 01 f8    \tadd    %r31d,%r8d,%r16d",},
+{{0x62, 0xfc, 0x5c, 0x10, 0x83, 0x04, 0x83, 0x11, }, 8, 0, "", "",
+"62 fc 5c 10 83 04 83 11 \tadd    $0x11,(%r19,%rax,4),%r20d",},
+{{0x62, 0x44, 0xfc, 0x10, 0x01, 0xf8, }, 6, 0, "", "",
+"62 44 fc 10 01 f8    \tadd    %r31,%r8,%r16",},
+{{0x62, 0xd4, 0xfc, 0x10, 0x81, 0x04, 0x8f, 0x33, 0x44, 0x34, 0x12, }, 11, 0, "", "",
+"62 d4 fc 10 81 04 8f 33 44 34 12 \tadd    $0x12344433,(%r15,%rcx,4),%r16",},
+{{0x62, 0x44, 0x7d, 0x10, 0x01, 0xf8, }, 6, 0, "", "",
+"62 44 7d 10 01 f8    \tadd    %r31w,%r8w,%r16w",},
+{{0x62, 0x54, 0x6e, 0x10, 0x66, 0xc7, }, 6, 0, "", "",
+"62 54 6e 10 66 c7    \tadox   %r15d,%r8d,%r18d",},
+{{0x62, 0x5c, 0xfc, 0x10, 0x03, 0xc7, }, 6, 0, "", "",
+"62 5c fc 10 03 c7    \tadd    %r31,%r8,%r16",},
+{{0x62, 0x44, 0xfc, 0x10, 0x01, 0xf8, }, 6, 0, "", "",
+"62 44 fc 10 01 f8    \tadd    %r31,%r8,%r16",},
+{{0x62, 0x14, 0xfa, 0x08, 0x66, 0x04, 0x3f, }, 7, 0, "", "",
+"62 14 fa 08 66 04 3f \tadox   (%r15,%r31,1),%r8",},
+{{0x62, 0x14, 0x6a, 0x10, 0x66, 0x04, 0x3f, }, 7, 0, "", "",
+"62 14 6a 10 66 04 3f \tadox   (%r15,%r31,1),%r8d,%r18d",},
+{{0x62, 0xf4, 0x0d, 0x10, 0x81, 0xe0, 0x34, 0x12, }, 8, 0, "", "",
+"62 f4 0d 10 81 e0 34 12 \tand    $0x1234,%ax,%r30w",},
+{{0x62, 0x7c, 0x6c, 0x10, 0x20, 0xf9, }, 6, 0, "", "",
+"62 7c 6c 10 20 f9    \tand    %r15b,%r17b,%r18b",},
+{{0x62, 0x54, 0x6c, 0x10, 0x21, 0x38, }, 6, 0, "", "",
+"62 54 6c 10 21 38    \tand    %r15d,(%r8),%r18d",},
+{{0x62, 0xc4, 0x3c, 0x18, 0x22, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3c 18 22 04 07 \tand    (%r15,%rax,1),%r16b,%r8b",},
+{{0x62, 0xc4, 0x3d, 0x18, 0x23, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3d 18 23 04 07 \tand    (%r15,%rax,1),%r16w,%r8w",},
+{{0x62, 0xfc, 0x5c, 0x10, 0x83, 0x24, 0x83, 0x11, }, 8, 0, "", "",
+"62 fc 5c 10 83 24 83 11 \tand    $0x11,(%r19,%rax,4),%r20d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x47, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 47 90 90 90 90 90 \tcmova  -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x43, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 43 90 90 90 90 90 \tcmovae -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x42, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 42 90 90 90 90 90 \tcmovb  -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x46, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 46 90 90 90 90 90 \tcmovbe -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x44, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 44 90 90 90 90 90 \tcmove  -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x4f, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 4f 90 90 90 90 90 \tcmovg  -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x4d, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 4d 90 90 90 90 90 \tcmovge -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x4c, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 4c 90 90 90 90 90 \tcmovl  -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x4e, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 4e 90 90 90 90 90 \tcmovle -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x45, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 45 90 90 90 90 90 \tcmovne -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x41, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 41 90 90 90 90 90 \tcmovno -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x4b, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 4b 90 90 90 90 90 \tcmovnp -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x49, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 49 90 90 90 90 90 \tcmovns -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x40, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 40 90 90 90 90 90 \tcmovo  -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x4a, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 4a 90 90 90 90 90 \tcmovp  -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x48, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 48 90 90 90 90 90 \tcmovs  -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x62, 0xf4, 0xf4, 0x10, 0xff, 0xc8, }, 6, 0, "", "",
+"62 f4 f4 10 ff c8    \tdec    %rax,%r17",},
+{{0x62, 0x9c, 0x3c, 0x18, 0xfe, 0x0c, 0x27, }, 7, 0, "", "",
+"62 9c 3c 18 fe 0c 27 \tdec    (%r31,%r12,1),%r8b",},
+{{0x62, 0xb4, 0xb0, 0x10, 0xaf, 0x94, 0xf8, 0x09, 0x09, 0x00, 0x00, }, 11, 0, "", "",
+"62 b4 b0 10 af 94 f8 09 09 00 00 \timul   0x909(%rax,%r31,8),%rdx,%r25",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0xaf, 0x90, 0x09, 0x09, 0x09, 0x00, }, 11, 0, "", "",
+"67 62 f4 3c 18 af 90 09 09 09 00 \timul   0x90909(%eax),%edx,%r8d",},
+{{0x62, 0xdc, 0xfc, 0x10, 0xff, 0xc7, }, 6, 0, "", "",
+"62 dc fc 10 ff c7    \tinc    %r31,%r16",},
+{{0x62, 0xdc, 0xbc, 0x18, 0xff, 0xc7, }, 6, 0, "", "",
+"62 dc bc 18 ff c7    \tinc    %r31,%r8",},
+{{0x62, 0xf4, 0xe4, 0x18, 0xff, 0xc0, }, 6, 0, "", "",
+"62 f4 e4 18 ff c0    \tinc    %rax,%rbx",},
+{{0x62, 0xf4, 0xf4, 0x10, 0xf7, 0xd8, }, 6, 0, "", "",
+"62 f4 f4 10 f7 d8    \tneg    %rax,%r17",},
+{{0x62, 0x9c, 0x3c, 0x18, 0xf6, 0x1c, 0x27, }, 7, 0, "", "",
+"62 9c 3c 18 f6 1c 27 \tneg    (%r31,%r12,1),%r8b",},
+{{0x62, 0xf4, 0xf4, 0x10, 0xf7, 0xd0, }, 6, 0, "", "",
+"62 f4 f4 10 f7 d0    \tnot    %rax,%r17",},
+{{0x62, 0x9c, 0x3c, 0x18, 0xf6, 0x14, 0x27, }, 7, 0, "", "",
+"62 9c 3c 18 f6 14 27 \tnot    (%r31,%r12,1),%r8b",},
+{{0x62, 0xf4, 0x0d, 0x10, 0x81, 0xc8, 0x34, 0x12, }, 8, 0, "", "",
+"62 f4 0d 10 81 c8 34 12 \tor     $0x1234,%ax,%r30w",},
+{{0x62, 0x7c, 0x6c, 0x10, 0x08, 0xf9, }, 6, 0, "", "",
+"62 7c 6c 10 08 f9    \tor     %r15b,%r17b,%r18b",},
+{{0x62, 0x54, 0x6c, 0x10, 0x09, 0x38, }, 6, 0, "", "",
+"62 54 6c 10 09 38    \tor     %r15d,(%r8),%r18d",},
+{{0x62, 0xc4, 0x3c, 0x18, 0x0a, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3c 18 0a 04 07 \tor     (%r15,%rax,1),%r16b,%r8b",},
+{{0x62, 0xc4, 0x3d, 0x18, 0x0b, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3d 18 0b 04 07 \tor     (%r15,%rax,1),%r16w,%r8w",},
+{{0x62, 0xfc, 0x5c, 0x10, 0x83, 0x0c, 0x83, 0x11, }, 8, 0, "", "",
+"62 fc 5c 10 83 0c 83 11 \tor     $0x11,(%r19,%rax,4),%r20d",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xd4, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 d4 02 \trcl    $0x2,%r12b,%r31b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xd0, }, 6, 0, "", "",
+"62 fc 3c 18 d2 d0    \trcl    %cl,%r16b,%r8b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x10, }, 6, 0, "", "",
+"62 f4 04 10 d0 10    \trcl    $1,(%rax),%r31b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x10, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 10 02 \trcl    $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x10, }, 6, 0, "", "",
+"62 f4 05 10 d1 10    \trcl    $1,(%rax),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x14, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 14 83 \trcl    %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xdc, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 dc 02 \trcr    $0x2,%r12b,%r31b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xd8, }, 6, 0, "", "",
+"62 fc 3c 18 d2 d8    \trcr    %cl,%r16b,%r8b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x18, }, 6, 0, "", "",
+"62 f4 04 10 d0 18    \trcr    $1,(%rax),%r31b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x18, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 18 02 \trcr    $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x18, }, 6, 0, "", "",
+"62 f4 05 10 d1 18    \trcr    $1,(%rax),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x1c, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 1c 83 \trcr    %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xc4, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 c4 02 \trol    $0x2,%r12b,%r31b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xc0, }, 6, 0, "", "",
+"62 fc 3c 18 d2 c0    \trol    %cl,%r16b,%r8b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x00, }, 6, 0, "", "",
+"62 f4 04 10 d0 00    \trol    $1,(%rax),%r31b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x00, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 00 02 \trol    $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x00, }, 6, 0, "", "",
+"62 f4 05 10 d1 00    \trol    $1,(%rax),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x04, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 04 83 \trol    %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xcc, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 cc 02 \tror    $0x2,%r12b,%r31b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xc8, }, 6, 0, "", "",
+"62 fc 3c 18 d2 c8    \tror    %cl,%r16b,%r8b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x08, }, 6, 0, "", "",
+"62 f4 04 10 d0 08    \tror    $1,(%rax),%r31b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x08, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 08 02 \tror    $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x08, }, 6, 0, "", "",
+"62 f4 05 10 d1 08    \tror    $1,(%rax),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x0c, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 0c 83 \tror    %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xfc, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 fc 02 \tsar    $0x2,%r12b,%r31b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xf8, }, 6, 0, "", "",
+"62 fc 3c 18 d2 f8    \tsar    %cl,%r16b,%r8b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x38, }, 6, 0, "", "",
+"62 f4 04 10 d0 38    \tsar    $1,(%rax),%r31b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x38, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 38 02 \tsar    $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x38, }, 6, 0, "", "",
+"62 f4 05 10 d1 38    \tsar    $1,(%rax),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x3c, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 3c 83 \tsar    %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xf4, 0x0d, 0x10, 0x81, 0xd8, 0x34, 0x12, }, 8, 0, "", "",
+"62 f4 0d 10 81 d8 34 12 \tsbb    $0x1234,%ax,%r30w",},
+{{0x62, 0x7c, 0x6c, 0x10, 0x18, 0xf9, }, 6, 0, "", "",
+"62 7c 6c 10 18 f9    \tsbb    %r15b,%r17b,%r18b",},
+{{0x62, 0x54, 0x6c, 0x10, 0x19, 0x38, }, 6, 0, "", "",
+"62 54 6c 10 19 38    \tsbb    %r15d,(%r8),%r18d",},
+{{0x62, 0xc4, 0x3c, 0x18, 0x1a, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3c 18 1a 04 07 \tsbb    (%r15,%rax,1),%r16b,%r8b",},
+{{0x62, 0xc4, 0x3d, 0x18, 0x1b, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3d 18 1b 04 07 \tsbb    (%r15,%rax,1),%r16w,%r8w",},
+{{0x62, 0xfc, 0x5c, 0x10, 0x83, 0x1c, 0x83, 0x11, }, 8, 0, "", "",
+"62 fc 5c 10 83 1c 83 11 \tsbb    $0x11,(%r19,%rax,4),%r20d",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xe4, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 e4 02 \tshl    $0x2,%r12b,%r31b",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xe4, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 e4 02 \tshl    $0x2,%r12b,%r31b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xe0, }, 6, 0, "", "",
+"62 fc 3c 18 d2 e0    \tshl    %cl,%r16b,%r8b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xe0, }, 6, 0, "", "",
+"62 fc 3c 18 d2 e0    \tshl    %cl,%r16b,%r8b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x20, }, 6, 0, "", "",
+"62 f4 04 10 d0 20    \tshl    $1,(%rax),%r31b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x20, }, 6, 0, "", "",
+"62 f4 04 10 d0 20    \tshl    $1,(%rax),%r31b",},
+{{0x62, 0x74, 0x84, 0x10, 0x24, 0x20, 0x01, }, 7, 0, "", "",
+"62 74 84 10 24 20 01 \tshld   $0x1,%r12,(%rax),%r31",},
+{{0x62, 0x74, 0x04, 0x10, 0x24, 0x38, 0x02, }, 7, 0, "", "",
+"62 74 04 10 24 38 02 \tshld   $0x2,%r15d,(%rax),%r31d",},
+{{0x62, 0x54, 0x05, 0x10, 0x24, 0xc4, 0x02, }, 7, 0, "", "",
+"62 54 05 10 24 c4 02 \tshld   $0x2,%r8w,%r12w,%r31w",},
+{{0x62, 0x7c, 0xbc, 0x18, 0xa5, 0xe0, }, 6, 0, "", "",
+"62 7c bc 18 a5 e0    \tshld   %cl,%r12,%r16,%r8",},
+{{0x62, 0x7c, 0x05, 0x10, 0xa5, 0x2c, 0x83, }, 7, 0, "", "",
+"62 7c 05 10 a5 2c 83 \tshld   %cl,%r13w,(%r19,%rax,4),%r31w",},
+{{0x62, 0x74, 0x05, 0x10, 0xa5, 0x08, }, 6, 0, "", "",
+"62 74 05 10 a5 08    \tshld   %cl,%r9w,(%rax),%r31w",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x20, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 20 02 \tshl    $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x20, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 20 02 \tshl    $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x20, }, 6, 0, "", "",
+"62 f4 05 10 d1 20    \tshl    $1,(%rax),%r31w",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x20, }, 6, 0, "", "",
+"62 f4 05 10 d1 20    \tshl    $1,(%rax),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x24, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 24 83 \tshl    %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x24, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 24 83 \tshl    %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xec, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 ec 02 \tshr    $0x2,%r12b,%r31b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xe8, }, 6, 0, "", "",
+"62 fc 3c 18 d2 e8    \tshr    %cl,%r16b,%r8b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x28, }, 6, 0, "", "",
+"62 f4 04 10 d0 28    \tshr    $1,(%rax),%r31b",},
+{{0x62, 0x74, 0x84, 0x10, 0x2c, 0x20, 0x01, }, 7, 0, "", "",
+"62 74 84 10 2c 20 01 \tshrd   $0x1,%r12,(%rax),%r31",},
+{{0x62, 0x74, 0x04, 0x10, 0x2c, 0x38, 0x02, }, 7, 0, "", "",
+"62 74 04 10 2c 38 02 \tshrd   $0x2,%r15d,(%rax),%r31d",},
+{{0x62, 0x54, 0x05, 0x10, 0x2c, 0xc4, 0x02, }, 7, 0, "", "",
+"62 54 05 10 2c c4 02 \tshrd   $0x2,%r8w,%r12w,%r31w",},
+{{0x62, 0x7c, 0xbc, 0x18, 0xad, 0xe0, }, 6, 0, "", "",
+"62 7c bc 18 ad e0    \tshrd   %cl,%r12,%r16,%r8",},
+{{0x62, 0x7c, 0x05, 0x10, 0xad, 0x2c, 0x83, }, 7, 0, "", "",
+"62 7c 05 10 ad 2c 83 \tshrd   %cl,%r13w,(%r19,%rax,4),%r31w",},
+{{0x62, 0x74, 0x05, 0x10, 0xad, 0x08, }, 6, 0, "", "",
+"62 74 05 10 ad 08    \tshrd   %cl,%r9w,(%rax),%r31w",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x28, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 28 02 \tshr    $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x28, }, 6, 0, "", "",
+"62 f4 05 10 d1 28    \tshr    $1,(%rax),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x2c, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 2c 83 \tshr    %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xf4, 0x0d, 0x10, 0x81, 0xe8, 0x34, 0x12, }, 8, 0, "", "",
+"62 f4 0d 10 81 e8 34 12 \tsub    $0x1234,%ax,%r30w",},
+{{0x62, 0x7c, 0x6c, 0x10, 0x28, 0xf9, }, 6, 0, "", "",
+"62 7c 6c 10 28 f9    \tsub    %r15b,%r17b,%r18b",},
+{{0x62, 0x54, 0x6c, 0x10, 0x29, 0x38, }, 6, 0, "", "",
+"62 54 6c 10 29 38    \tsub    %r15d,(%r8),%r18d",},
+{{0x62, 0xc4, 0x3c, 0x18, 0x2a, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3c 18 2a 04 07 \tsub    (%r15,%rax,1),%r16b,%r8b",},
+{{0x62, 0xc4, 0x3d, 0x18, 0x2b, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3d 18 2b 04 07 \tsub    (%r15,%rax,1),%r16w,%r8w",},
+{{0x62, 0xfc, 0x5c, 0x10, 0x83, 0x2c, 0x83, 0x11, }, 8, 0, "", "",
+"62 fc 5c 10 83 2c 83 11 \tsub    $0x11,(%r19,%rax,4),%r20d",},
+{{0x62, 0xf4, 0x0d, 0x10, 0x81, 0xf0, 0x34, 0x12, }, 8, 0, "", "",
+"62 f4 0d 10 81 f0 34 12 \txor    $0x1234,%ax,%r30w",},
+{{0x62, 0x7c, 0x6c, 0x10, 0x30, 0xf9, }, 6, 0, "", "",
+"62 7c 6c 10 30 f9    \txor    %r15b,%r17b,%r18b",},
+{{0x62, 0x54, 0x6c, 0x10, 0x31, 0x38, }, 6, 0, "", "",
+"62 54 6c 10 31 38    \txor    %r15d,(%r8),%r18d",},
+{{0x62, 0xc4, 0x3c, 0x18, 0x32, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3c 18 32 04 07 \txor    (%r15,%rax,1),%r16b,%r8b",},
+{{0x62, 0xc4, 0x3d, 0x18, 0x33, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3d 18 33 04 07 \txor    (%r15,%rax,1),%r16w,%r8w",},
+{{0x62, 0xfc, 0x5c, 0x10, 0x83, 0x34, 0x83, 0x11, }, 8, 0, "", "",
+"62 fc 5c 10 83 34 83 11 \txor    $0x11,(%r19,%rax,4),%r20d",},
+{{0x62, 0xf4, 0x3c, 0x1c, 0x00, 0xda, }, 6, 0, "", "",
+"62 f4 3c 1c 00 da    \t{nf} add %bl,%dl,%r8b",},
+{{0x62, 0xf4, 0x35, 0x1c, 0x01, 0xd0, }, 6, 0, "", "",
+"62 f4 35 1c 01 d0    \t{nf} add %dx,%ax,%r9w",},
+{{0x62, 0xd4, 0x6c, 0x1c, 0x02, 0x9c, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 6c 1c 02 9c 80 23 01 00 00 \t{nf} add 0x123(%r8,%rax,4),%bl,%dl",},
+{{0x62, 0xd4, 0x7d, 0x1c, 0x03, 0x94, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 7d 1c 03 94 80 23 01 00 00 \t{nf} add 0x123(%r8,%rax,4),%dx,%ax",},
+{{0x62, 0xf4, 0x3c, 0x1c, 0x08, 0xda, }, 6, 0, "", "",
+"62 f4 3c 1c 08 da    \t{nf} or %bl,%dl,%r8b",},
+{{0x62, 0xf4, 0x35, 0x1c, 0x09, 0xd0, }, 6, 0, "", "",
+"62 f4 35 1c 09 d0    \t{nf} or %dx,%ax,%r9w",},
+{{0x62, 0xd4, 0x6c, 0x1c, 0x0a, 0x9c, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 6c 1c 0a 9c 80 23 01 00 00 \t{nf} or 0x123(%r8,%rax,4),%bl,%dl",},
+{{0x62, 0xd4, 0x7d, 0x1c, 0x0b, 0x94, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 7d 1c 0b 94 80 23 01 00 00 \t{nf} or 0x123(%r8,%rax,4),%dx,%ax",},
+{{0x62, 0xf4, 0x3c, 0x1c, 0x20, 0xda, }, 6, 0, "", "",
+"62 f4 3c 1c 20 da    \t{nf} and %bl,%dl,%r8b",},
+{{0x62, 0xf4, 0x35, 0x1c, 0x21, 0xd0, }, 6, 0, "", "",
+"62 f4 35 1c 21 d0    \t{nf} and %dx,%ax,%r9w",},
+{{0x62, 0xd4, 0x6c, 0x1c, 0x22, 0x9c, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 6c 1c 22 9c 80 23 01 00 00 \t{nf} and 0x123(%r8,%rax,4),%bl,%dl",},
+{{0x62, 0xd4, 0x7d, 0x1c, 0x23, 0x94, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 7d 1c 23 94 80 23 01 00 00 \t{nf} and 0x123(%r8,%rax,4),%dx,%ax",},
+{{0x62, 0xf4, 0x35, 0x1c, 0x24, 0xd0, 0x7b, }, 7, 0, "", "",
+"62 f4 35 1c 24 d0 7b \t{nf} shld $0x7b,%dx,%ax,%r9w",},
+{{0x62, 0xf4, 0x3c, 0x1c, 0x28, 0xda, }, 6, 0, "", "",
+"62 f4 3c 1c 28 da    \t{nf} sub %bl,%dl,%r8b",},
+{{0x62, 0xf4, 0x35, 0x1c, 0x29, 0xd0, }, 6, 0, "", "",
+"62 f4 35 1c 29 d0    \t{nf} sub %dx,%ax,%r9w",},
+{{0x62, 0xd4, 0x6c, 0x1c, 0x2a, 0x9c, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 6c 1c 2a 9c 80 23 01 00 00 \t{nf} sub 0x123(%r8,%rax,4),%bl,%dl",},
+{{0x62, 0xd4, 0x7d, 0x1c, 0x2b, 0x94, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 7d 1c 2b 94 80 23 01 00 00 \t{nf} sub 0x123(%r8,%rax,4),%dx,%ax",},
+{{0x62, 0xf4, 0x35, 0x1c, 0x2c, 0xd0, 0x7b, }, 7, 0, "", "",
+"62 f4 35 1c 2c d0 7b \t{nf} shrd $0x7b,%dx,%ax,%r9w",},
+{{0x62, 0xf4, 0x3c, 0x1c, 0x30, 0xda, }, 6, 0, "", "",
+"62 f4 3c 1c 30 da    \t{nf} xor %bl,%dl,%r8b",},
+{{0x62, 0x4c, 0xfc, 0x0c, 0x31, 0xff, }, 6, 0, "", "",
+"62 4c fc 0c 31 ff    \t{nf} xor %r31,%r31",},
+{{0x62, 0xd4, 0x6c, 0x1c, 0x32, 0x9c, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 6c 1c 32 9c 80 23 01 00 00 \t{nf} xor 0x123(%r8,%rax,4),%bl,%dl",},
+{{0x62, 0xd4, 0x7d, 0x1c, 0x33, 0x94, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 7d 1c 33 94 80 23 01 00 00 \t{nf} xor 0x123(%r8,%rax,4),%dx,%ax",},
+{{0x62, 0x54, 0xfc, 0x0c, 0x69, 0xf9, 0x90, 0xff, 0x00, 0x00, }, 10, 0, "", "",
+"62 54 fc 0c 69 f9 90 ff 00 00 \t{nf} imul $0xff90,%r9,%r15",},
+{{0x62, 0x54, 0xfc, 0x0c, 0x6b, 0xf9, 0x7b, }, 7, 0, "", "",
+"62 54 fc 0c 6b f9 7b \t{nf} imul $0x7b,%r9,%r15",},
+{{0x62, 0xf4, 0x6c, 0x1c, 0x80, 0xf3, 0x7b, }, 7, 0, "", "",
+"62 f4 6c 1c 80 f3 7b \t{nf} xor $0x7b,%bl,%dl",},
+{{0x62, 0xf4, 0x7d, 0x1c, 0x83, 0xf2, 0x7b, }, 7, 0, "", "",
+"62 f4 7d 1c 83 f2 7b \t{nf} xor $0x7b,%dx,%ax",},
+{{0x62, 0x44, 0xfc, 0x0c, 0x88, 0xf9, }, 6, 0, "", "",
+"62 44 fc 0c 88 f9    \t{nf} popcnt %r9,%r31",},
+{{0x62, 0xf4, 0x35, 0x1c, 0xa5, 0xd0, }, 6, 0, "", "",
+"62 f4 35 1c a5 d0    \t{nf} shld %cl,%dx,%ax,%r9w",},
+{{0x62, 0xf4, 0x35, 0x1c, 0xad, 0xd0, }, 6, 0, "", "",
+"62 f4 35 1c ad d0    \t{nf} shrd %cl,%dx,%ax,%r9w",},
+{{0x62, 0x44, 0xa4, 0x1c, 0xaf, 0xf9, }, 6, 0, "", "",
+"62 44 a4 1c af f9    \t{nf} imul %r9,%r31,%r11",},
+{{0x62, 0xf4, 0x6c, 0x1c, 0xc0, 0xfb, 0x7b, }, 7, 0, "", "",
+"62 f4 6c 1c c0 fb 7b \t{nf} sar $0x7b,%bl,%dl",},
+{{0x62, 0xf4, 0x7d, 0x1c, 0xc1, 0xfa, 0x7b, }, 7, 0, "", "",
+"62 f4 7d 1c c1 fa 7b \t{nf} sar $0x7b,%dx,%ax",},
+{{0x62, 0xf4, 0x6c, 0x1c, 0xd0, 0xfb, }, 6, 0, "", "",
+"62 f4 6c 1c d0 fb    \t{nf} sar $1,%bl,%dl",},
+{{0x62, 0xf4, 0x7d, 0x1c, 0xd1, 0xfa, }, 6, 0, "", "",
+"62 f4 7d 1c d1 fa    \t{nf} sar $1,%dx,%ax",},
+{{0x62, 0xf4, 0x6c, 0x1c, 0xd2, 0xfb, }, 6, 0, "", "",
+"62 f4 6c 1c d2 fb    \t{nf} sar %cl,%bl,%dl",},
+{{0x62, 0xf4, 0x7d, 0x1c, 0xd3, 0xfa, }, 6, 0, "", "",
+"62 f4 7d 1c d3 fa    \t{nf} sar %cl,%dx,%ax",},
+{{0x62, 0x52, 0x84, 0x04, 0xf2, 0xd9, }, 6, 0, "", "",
+"62 52 84 04 f2 d9    \t{nf} andn %r9,%r31,%r11",},
+{{0x62, 0xd2, 0x84, 0x04, 0xf3, 0xd9, }, 6, 0, "", "",
+"62 d2 84 04 f3 d9    \t{nf} blsi %r9,%r31",},
+{{0x62, 0x44, 0xfc, 0x0c, 0xf4, 0xf9, }, 6, 0, "", "",
+"62 44 fc 0c f4 f9    \t{nf} tzcnt %r9,%r31",},
+{{0x62, 0x44, 0xfc, 0x0c, 0xf5, 0xf9, }, 6, 0, "", "",
+"62 44 fc 0c f5 f9    \t{nf} lzcnt %r9,%r31",},
+{{0x62, 0xf4, 0x7c, 0x0c, 0xf6, 0xfb, }, 6, 0, "", "",
+"62 f4 7c 0c f6 fb    \t{nf} idiv %bl",},
+{{0x62, 0xf4, 0x7d, 0x0c, 0xf7, 0xfa, }, 6, 0, "", "",
+"62 f4 7d 0c f7 fa    \t{nf} idiv %dx",},
+{{0x62, 0xf4, 0x6c, 0x1c, 0xfe, 0xcb, }, 6, 0, "", "",
+"62 f4 6c 1c fe cb    \t{nf} dec %bl,%dl",},
+{{0x62, 0xf4, 0x7d, 0x1c, 0xff, 0xca, }, 6, 0, "", "",
+"62 f4 7d 1c ff ca    \t{nf} dec %dx,%ax",},
+{{0xf3, 0x0f, 0x38, 0xdc, 0xd1, }, 5, 0, "", "",
+"f3 0f 38 dc d1       \tloadiwkey %xmm1,%xmm2",},
+{{0xf3, 0x0f, 0x38, 0xfa, 0xd0, }, 5, 0, "", "",
+"f3 0f 38 fa d0       \tencodekey128 %eax,%edx",},
+{{0xf3, 0x0f, 0x38, 0xfb, 0xd0, }, 5, 0, "", "",
+"f3 0f 38 fb d0       \tencodekey256 %eax,%edx",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xdc, 0x5a, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 dc 5a 77 \taesenc128kl 0x77(%edx),%xmm3",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xde, 0x5a, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 de 5a 77 \taesenc256kl 0x77(%edx),%xmm3",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xdd, 0x5a, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 dd 5a 77 \taesdec128kl 0x77(%edx),%xmm3",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xdf, 0x5a, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 df 5a 77 \taesdec256kl 0x77(%edx),%xmm3",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xd8, 0x42, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 d8 42 77 \taesencwide128kl 0x77(%edx)",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xd8, 0x52, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 d8 52 77 \taesencwide256kl 0x77(%edx)",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xd8, 0x4a, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 d8 4a 77 \taesdecwide128kl 0x77(%edx)",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xd8, 0x5a, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 d8 5a 77 \taesdecwide256kl 0x77(%edx)",},
+{{0x67, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"67 0f 38 fc 08       \taadd   %ecx,(%eax)",},
+{{0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"0f 38 fc 14 25 78 56 34 12 \taadd   %edx,0x12345678",},
+{{0x67, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"67 0f 38 fc 94 c8 78 56 34 12 \taadd   %edx,0x12345678(%eax,%ecx,8)",},
+{{0x67, 0x66, 0x0f, 0x38, 0xfc, 0x08, }, 6, 0, "", "",
+"67 66 0f 38 fc 08    \taand   %ecx,(%eax)",},
+{{0x66, 0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"66 0f 38 fc 14 25 78 56 34 12 \taand   %edx,0x12345678",},
+{{0x67, 0x66, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"67 66 0f 38 fc 94 c8 78 56 34 12 \taand   %edx,0x12345678(%eax,%ecx,8)",},
+{{0x67, 0xf2, 0x0f, 0x38, 0xfc, 0x08, }, 6, 0, "", "",
+"67 f2 0f 38 fc 08    \taor    %ecx,(%eax)",},
+{{0xf2, 0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f2 0f 38 fc 14 25 78 56 34 12 \taor    %edx,0x12345678",},
+{{0x67, 0xf2, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"67 f2 0f 38 fc 94 c8 78 56 34 12 \taor    %edx,0x12345678(%eax,%ecx,8)",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xfc, 0x08, }, 6, 0, "", "",
+"67 f3 0f 38 fc 08    \taxor   %ecx,(%eax)",},
+{{0xf3, 0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f3 0f 38 fc 14 25 78 56 34 12 \taxor   %edx,0x12345678",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"67 f3 0f 38 fc 94 c8 78 56 34 12 \taxor   %edx,0x12345678(%eax,%ecx,8)",},
+{{0x67, 0xc4, 0xe2, 0x7a, 0xb1, 0x31, }, 6, 0, "", "",
+"67 c4 e2 7a b1 31    \tvbcstnebf162ps (%ecx),%xmm6",},
+{{0x67, 0xc4, 0xe2, 0x79, 0xb1, 0x31, }, 6, 0, "", "",
+"67 c4 e2 79 b1 31    \tvbcstnesh2ps (%ecx),%xmm6",},
+{{0x67, 0xc4, 0xe2, 0x7a, 0xb0, 0x31, }, 6, 0, "", "",
+"67 c4 e2 7a b0 31    \tvcvtneebf162ps (%ecx),%xmm6",},
+{{0x67, 0xc4, 0xe2, 0x79, 0xb0, 0x31, }, 6, 0, "", "",
+"67 c4 e2 79 b0 31    \tvcvtneeph2ps (%ecx),%xmm6",},
+{{0x67, 0xc4, 0xe2, 0x7b, 0xb0, 0x31, }, 6, 0, "", "",
+"67 c4 e2 7b b0 31    \tvcvtneobf162ps (%ecx),%xmm6",},
+{{0x67, 0xc4, 0xe2, 0x78, 0xb0, 0x31, }, 6, 0, "", "",
+"67 c4 e2 78 b0 31    \tvcvtneoph2ps (%ecx),%xmm6",},
+{{0x62, 0xf2, 0x7e, 0x08, 0x72, 0xf1, }, 6, 0, "", "",
+"62 f2 7e 08 72 f1    \tvcvtneps2bf16 %xmm1,%xmm6",},
+{{0xc4, 0xe2, 0x6b, 0x50, 0xd9, }, 5, 0, "", "",
+"c4 e2 6b 50 d9       \tvpdpbssd %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6b, 0x51, 0xd9, }, 5, 0, "", "",
+"c4 e2 6b 51 d9       \tvpdpbssds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0x50, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a 50 d9       \tvpdpbsud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0x51, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a 51 d9       \tvpdpbsuds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0x50, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 50 d9       \tvpdpbuud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0x51, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 51 d9       \tvpdpbuuds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0xd2, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a d2 d9       \tvpdpwsud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0xd3, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a d3 d9       \tvpdpwsuds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x69, 0xd2, 0xd9, }, 5, 0, "", "",
+"c4 e2 69 d2 d9       \tvpdpwusd %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x69, 0xd3, 0xd9, }, 5, 0, "", "",
+"c4 e2 69 d3 d9       \tvpdpwusds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0xd2, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 d2 d9       \tvpdpwuud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0xd3, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 d3 d9       \tvpdpwuuds %xmm1,%xmm2,%xmm3",},
+{{0x62, 0xf2, 0xed, 0x08, 0xb5, 0xd9, }, 6, 0, "", "",
+"62 f2 ed 08 b5 d9    \tvpmadd52huq %xmm1,%xmm2,%xmm3",},
+{{0x62, 0xf2, 0xed, 0x08, 0xb4, 0xd9, }, 6, 0, "", "",
+"62 f2 ed 08 b4 d9    \tvpmadd52luq %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x7f, 0xcc, 0xd1, }, 5, 0, "", "",
+"c4 e2 7f cc d1       \tvsha512msg1 %xmm1,%ymm2",},
+{{0xc4, 0xe2, 0x7f, 0xcd, 0xd1, }, 5, 0, "", "",
+"c4 e2 7f cd d1       \tvsha512msg2 %ymm1,%ymm2",},
+{{0xc4, 0xe2, 0x6f, 0xcb, 0xd9, }, 5, 0, "", "",
+"c4 e2 6f cb d9       \tvsha512rnds2 %xmm1,%ymm2,%ymm3",},
+{{0xc4, 0xe2, 0x68, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 da d9       \tvsm3msg1 %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x69, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 69 da d9       \tvsm3msg2 %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe3, 0x69, 0xde, 0xd9, 0xa1, }, 6, 0, "", "",
+"c4 e3 69 de d9 a1    \tvsm3rnds2 $0xa1,%xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a da d9       \tvsm4key4 %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6b, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 6b da d9       \tvsm4rnds4 %xmm1,%xmm2,%xmm3",},
+{{0x67, 0x0f, 0x0d, 0x00, }, 4, 0, "", "",
+"67 0f 0d 00          \tprefetch (%eax)",},
+{{0x67, 0x0f, 0x18, 0x08, }, 4, 0, "", "",
+"67 0f 18 08          \tprefetcht0 (%eax)",},
+{{0x67, 0x0f, 0x18, 0x10, }, 4, 0, "", "",
+"67 0f 18 10          \tprefetcht1 (%eax)",},
+{{0x67, 0x0f, 0x18, 0x18, }, 4, 0, "", "",
+"67 0f 18 18          \tprefetcht2 (%eax)",},
+{{0x67, 0x0f, 0x18, 0x00, }, 4, 0, "", "",
+"67 0f 18 00          \tprefetchnta (%eax)",},
+{{0x0f, 0x01, 0xc6, }, 3, 0, "", "",
+"0f 01 c6             \twrmsrns",},
 {{0xf3, 0x0f, 0x3a, 0xf0, 0xc0, 0x00, }, 6, 0, "", "",
 "f3 0f 3a f0 c0 00    \threset $0x0",},
 {{0x0f, 0x01, 0xe8, }, 3, 0, "", "",
diff --git a/tools/perf/arch/x86/tests/insn-x86-dat-src.c b/tools/perf/arch/x86/tests/insn-x86-dat-src.c
index a391464c8dee..f55505c75d51 100644
--- a/tools/perf/arch/x86/tests/insn-x86-dat-src.c
+++ b/tools/perf/arch/x86/tests/insn-x86-dat-src.c
@@ -2628,6 +2628,512 @@ int main(void)
 	asm volatile("vucomish 0x12345678(%rax,%rcx,8), %xmm1");
 	asm volatile("vucomish 0x12345678(%eax,%ecx,8), %xmm1");
 
+	/* Key Locker */
+
+	asm volatile("loadiwkey %xmm1, %xmm2");
+	asm volatile("encodekey128 %eax, %edx");
+	asm volatile("encodekey256 %eax, %edx");
+	asm volatile("aesenc128kl 0x77(%rdx), %xmm3");
+	asm volatile("aesenc256kl 0x77(%rdx), %xmm3");
+	asm volatile("aesdec128kl 0x77(%rdx), %xmm3");
+	asm volatile("aesdec256kl 0x77(%rdx), %xmm3");
+	asm volatile("aesencwide128kl	0x77(%rdx)");
+	asm volatile("aesencwide256kl	0x77(%rdx)");
+	asm volatile("aesdecwide128kl	0x77(%rdx)");
+	asm volatile("aesdecwide256kl	0x77(%rdx)");
+
+	/* Remote Atomic Operations */
+
+	asm volatile("aadd %ecx,(%rax)");
+	asm volatile("aadd %edx,(%r8)");
+	asm volatile("aadd %edx,0x12345678(%rax,%rcx,8)");
+	asm volatile("aadd %edx,0x12345678(%r8,%rcx,8)");
+	asm volatile("aadd %rcx,(%rax)");
+	asm volatile("aadd %rdx,(%r8)");
+	asm volatile("aadd %rdx,(0x12345678)");
+	asm volatile("aadd %rdx,0x12345678(%rax,%rcx,8)");
+	asm volatile("aadd %rdx,0x12345678(%r8,%rcx,8)");
+
+	asm volatile("aand %ecx,(%rax)");
+	asm volatile("aand %edx,(%r8)");
+	asm volatile("aand %edx,0x12345678(%rax,%rcx,8)");
+	asm volatile("aand %edx,0x12345678(%r8,%rcx,8)");
+	asm volatile("aand %rcx,(%rax)");
+	asm volatile("aand %rdx,(%r8)");
+	asm volatile("aand %rdx,(0x12345678)");
+	asm volatile("aand %rdx,0x12345678(%rax,%rcx,8)");
+	asm volatile("aand %rdx,0x12345678(%r8,%rcx,8)");
+
+	asm volatile("aor %ecx,(%rax)");
+	asm volatile("aor %edx,(%r8)");
+	asm volatile("aor %edx,0x12345678(%rax,%rcx,8)");
+	asm volatile("aor %edx,0x12345678(%r8,%rcx,8)");
+	asm volatile("aor %rcx,(%rax)");
+	asm volatile("aor %rdx,(%r8)");
+	asm volatile("aor %rdx,(0x12345678)");
+	asm volatile("aor %rdx,0x12345678(%rax,%rcx,8)");
+	asm volatile("aor %rdx,0x12345678(%r8,%rcx,8)");
+
+	asm volatile("axor %ecx,(%rax)");
+	asm volatile("axor %edx,(%r8)");
+	asm volatile("axor %edx,0x12345678(%rax,%rcx,8)");
+	asm volatile("axor %edx,0x12345678(%r8,%rcx,8)");
+	asm volatile("axor %rcx,(%rax)");
+	asm volatile("axor %rdx,(%r8)");
+	asm volatile("axor %rdx,(0x12345678)");
+	asm volatile("axor %rdx,0x12345678(%rax,%rcx,8)");
+	asm volatile("axor %rdx,0x12345678(%r8,%rcx,8)");
+
+	/* VEX CMPxxXADD */
+
+	asm volatile("cmpbexadd %ebx,%ecx,(%r9)");
+	asm volatile("cmpbxadd %ebx,%ecx,(%r9)");
+	asm volatile("cmplexadd %ebx,%ecx,(%r9)");
+	asm volatile("cmplxadd %ebx,%ecx,(%r9)");
+	asm volatile("cmpnbexadd %ebx,%ecx,(%r9)");
+	asm volatile("cmpnbxadd %ebx,%ecx,(%r9)");
+	asm volatile("cmpnlexadd %ebx,%ecx,(%r9)");
+	asm volatile("cmpnlxadd %ebx,%ecx,(%r9)");
+	asm volatile("cmpnoxadd %ebx,%ecx,(%r9)");
+	asm volatile("cmpnpxadd %ebx,%ecx,(%r9)");
+	asm volatile("cmpnsxadd %ebx,%ecx,(%r9)");
+	asm volatile("cmpnzxadd %ebx,%ecx,(%r9)");
+	asm volatile("cmpoxadd %ebx,%ecx,(%r9)");
+	asm volatile("cmppxadd %ebx,%ecx,(%r9)");
+	asm volatile("cmpsxadd %ebx,%ecx,(%r9)");
+	asm volatile("cmpzxadd %ebx,%ecx,(%r9)");
+
+	/* Pre-fetch */
+
+	asm volatile("prefetch (%rax)");
+	asm volatile("prefetcht0 (%rax)");
+	asm volatile("prefetcht1 (%rax)");
+	asm volatile("prefetcht2 (%rax)");
+	asm volatile("prefetchnta (%rax)");
+	asm volatile("prefetchit0 0x12345678(%rip)");
+	asm volatile("prefetchit1 0x12345678(%rip)");
+
+	/* MSR List */
+
+	asm volatile("rdmsrlist");
+	asm volatile("wrmsrlist");
+
+	/* User Read/Write MSR */
+
+	asm volatile("urdmsr %rdx,%rax");
+	asm volatile("urdmsr %rdx,%r22");
+	asm volatile("urdmsr $0x7f,%r12");
+	asm volatile("uwrmsr %rax,%rdx");
+	asm volatile("uwrmsr %r22,%rdx");
+	asm volatile("uwrmsr %r12,$0x7f");
+
+	/* AVX NE Convert */
+
+	asm volatile("vbcstnebf162ps (%rcx),%xmm6");
+	asm volatile("vbcstnesh2ps (%rcx),%xmm6");
+	asm volatile("vcvtneebf162ps (%rcx),%xmm6");
+	asm volatile("vcvtneeph2ps (%rcx),%xmm6");
+	asm volatile("vcvtneobf162ps (%rcx),%xmm6");
+	asm volatile("vcvtneoph2ps (%rcx),%xmm6");
+	asm volatile("vcvtneps2bf16 %xmm1,%xmm6");
+
+	/* FRED */
+
+	asm volatile("erets");	/* Expecting: erets indirect 0 */
+	asm volatile("eretu");	/* Expecting: eretu indirect 0 */
+
+	/* AMX Complex */
+
+	asm volatile("tcmmimfp16ps %tmm1,%tmm2,%tmm3");
+	asm volatile("tcmmrlfp16ps %tmm1,%tmm2,%tmm3");
+
+	/* AMX FP16 */
+
+	asm volatile("tdpfp16ps %tmm1,%tmm2,%tmm3");
+
+	/* REX2 */
+
+	asm volatile("test $0x5, %r18b");
+	asm volatile("test $0x5, %r18d");
+	asm volatile("test $0x5, %r18");
+	asm volatile("test $0x5, %r18w");
+	asm volatile("imull %eax, %r14d");
+	asm volatile("imull %eax, %r17d");
+	asm volatile("punpckldq (%r18), %mm2");
+	asm volatile("leal (%rax), %r16d");
+	asm volatile("leal (%rax), %r31d");
+	asm volatile("leal (,%r16), %eax");
+	asm volatile("leal (,%r31), %eax");
+	asm volatile("leal (%r16), %eax");
+	asm volatile("leal (%r31), %eax");
+	asm volatile("leaq (%rax), %r15");
+	asm volatile("leaq (%rax), %r16");
+	asm volatile("leaq (%r15), %rax");
+	asm volatile("leaq (%r16), %rax");
+	asm volatile("leaq (,%r15), %rax");
+	asm volatile("leaq (,%r16), %rax");
+	asm volatile("add (%r16), %r8");
+	asm volatile("add (%r16), %r15");
+	asm volatile("mov (,%r9), %r16");
+	asm volatile("mov (,%r14), %r16");
+	asm volatile("sub (%r10), %r31");
+	asm volatile("sub (%r13), %r31");
+	asm volatile("leal 1(%r16, %r21), %eax");
+	asm volatile("leal 1(%r16, %r26), %r31d");
+	asm volatile("leal 129(%r21, %r9), %eax");
+	asm volatile("leal 129(%r26, %r9), %r31d");
+	/*
+	 * Have to use .byte for jmpabs because gas does not support the
+	 * mnemonic for some reason, but then it also gets the source line wrong
+	 * with .byte, so the following is a workaround.
+	 */
+	asm volatile(""); /* Expecting: jmp indirect 0 */
+	asm volatile(".byte 0xd5, 0x00, 0xa1, 0xef, 0xcd, 0xab, 0x90, 0x78, 0x56, 0x34, 0x12");
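+	/*
+	 * The .byte sequence above corresponds to "jmpabs $0x1234567890abcdef":
+	 * REX2 prefix (d5 00), opcode a1, then the 64-bit absolute target in
+	 * little-endian byte order.
+	 */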
+	asm volatile("pushp %rbx");
+	asm volatile("pushp %r16");
+	asm volatile("pushp %r31");
+	asm volatile("popp %r31");
+	asm volatile("popp %r16");
+	asm volatile("popp %rbx");
+
+	/* APX */
+
+	asm volatile("bextr %r25d,%edx,%r10d");
+	asm volatile("bextr %r25d,0x123(%r31,%rax,4),%edx");
+	asm volatile("bextr %r31,%r15,%r11");
+	asm volatile("bextr %r31,0x123(%r31,%rax,4),%r15");
+	asm volatile("blsi %r25d,%edx");
+	asm volatile("blsi %r31,%r15");
+	asm volatile("blsi 0x123(%r31,%rax,4),%r25d");
+	asm volatile("blsi 0x123(%r31,%rax,4),%r31");
+	asm volatile("blsmsk %r25d,%edx");
+	asm volatile("blsmsk %r31,%r15");
+	asm volatile("blsmsk 0x123(%r31,%rax,4),%r25d");
+	asm volatile("blsmsk 0x123(%r31,%rax,4),%r31");
+	asm volatile("blsr %r25d,%edx");
+	asm volatile("blsr %r31,%r15");
+	asm volatile("blsr 0x123(%r31,%rax,4),%r25d");
+	asm volatile("blsr 0x123(%r31,%rax,4),%r31");
+	asm volatile("bzhi %r25d,%edx,%r10d");
+	asm volatile("bzhi %r25d,0x123(%r31,%rax,4),%edx");
+	asm volatile("bzhi %r31,%r15,%r11");
+	asm volatile("bzhi %r31,0x123(%r31,%rax,4),%r15");
+	asm volatile("cmpbexadd %r25d,%edx,0x123(%r31,%rax,4)");
+	asm volatile("cmpbexadd %r31,%r15,0x123(%r31,%rax,4)");
+	asm volatile("cmpbxadd %r25d,%edx,0x123(%r31,%rax,4)");
+	asm volatile("cmpbxadd %r31,%r15,0x123(%r31,%rax,4)");
+	asm volatile("cmplxadd %r25d,%edx,0x123(%r31,%rax,4)");
+	asm volatile("cmplxadd %r31,%r15,0x123(%r31,%rax,4)");
+	asm volatile("cmpnbexadd %r25d,%edx,0x123(%r31,%rax,4)");
+	asm volatile("cmpnbexadd %r31,%r15,0x123(%r31,%rax,4)");
+	asm volatile("cmpnbxadd %r25d,%edx,0x123(%r31,%rax,4)");
+	asm volatile("cmpnbxadd %r31,%r15,0x123(%r31,%rax,4)");
+	asm volatile("cmpnlexadd %r25d,%edx,0x123(%r31,%rax,4)");
+	asm volatile("cmpnlexadd %r31,%r15,0x123(%r31,%rax,4)");
+	asm volatile("cmpnlxadd %r25d,%edx,0x123(%r31,%rax,4)");
+	asm volatile("cmpnlxadd %r31,%r15,0x123(%r31,%rax,4)");
+	asm volatile("cmpnoxadd %r25d,%edx,0x123(%r31,%rax,4)");
+	asm volatile("cmpnoxadd %r31,%r15,0x123(%r31,%rax,4)");
+	asm volatile("cmpnpxadd %r25d,%edx,0x123(%r31,%rax,4)");
+	asm volatile("cmpnpxadd %r31,%r15,0x123(%r31,%rax,4)");
+	asm volatile("cmpnsxadd %r25d,%edx,0x123(%r31,%rax,4)");
+	asm volatile("cmpnsxadd %r31,%r15,0x123(%r31,%rax,4)");
+	asm volatile("cmpnzxadd %r25d,%edx,0x123(%r31,%rax,4)");
+	asm volatile("cmpnzxadd %r31,%r15,0x123(%r31,%rax,4)");
+	asm volatile("cmpoxadd %r25d,%edx,0x123(%r31,%rax,4)");
+	asm volatile("cmpoxadd %r31,%r15,0x123(%r31,%rax,4)");
+	asm volatile("cmppxadd %r25d,%edx,0x123(%r31,%rax,4)");
+	asm volatile("cmppxadd %r31,%r15,0x123(%r31,%rax,4)");
+	asm volatile("cmpsxadd %r25d,%edx,0x123(%r31,%rax,4)");
+	asm volatile("cmpsxadd %r31,%r15,0x123(%r31,%rax,4)");
+	asm volatile("cmpzxadd %r25d,%edx,0x123(%r31,%rax,4)");
+	asm volatile("cmpzxadd %r31,%r15,0x123(%r31,%rax,4)");
+	asm volatile("crc32q %r31, %r22");
+	asm volatile("crc32q (%r31), %r22");
+	asm volatile("crc32b %r19b, %r17");
+	asm volatile("crc32b %r19b, %r21d");
+	asm volatile("crc32b (%r19),%ebx");
+	asm volatile("crc32l %r31d, %r23d");
+	asm volatile("crc32l (%r31), %r23d");
+	asm volatile("crc32w %r31w, %r21d");
+	asm volatile("crc32w (%r31),%r21d");
+	asm volatile("crc32 %rax, %r18");
+	asm volatile("enqcmd 0x123(%r31d,%eax,4),%r25d");
+	asm volatile("enqcmd 0x123(%r31,%rax,4),%r31");
+	asm volatile("enqcmds 0x123(%r31d,%eax,4),%r25d");
+	asm volatile("enqcmds 0x123(%r31,%rax,4),%r31");
+	asm volatile("invept 0x123(%r31,%rax,4),%r31");
+	asm volatile("invpcid 0x123(%r31,%rax,4),%r31");
+	asm volatile("invvpid 0x123(%r31,%rax,4),%r31");
+	asm volatile("kmovb %k5,%r25d");
+	asm volatile("kmovb %k5,0x123(%r31,%rax,4)");
+	asm volatile("kmovb %r25d,%k5");
+	asm volatile("kmovb 0x123(%r31,%rax,4),%k5");
+	asm volatile("kmovd %k5,%r25d");
+	asm volatile("kmovd %k5,0x123(%r31,%rax,4)");
+	asm volatile("kmovd %r25d,%k5");
+	asm volatile("kmovd 0x123(%r31,%rax,4),%k5");
+	asm volatile("kmovq %k5,%r31");
+	asm volatile("kmovq %k5,0x123(%r31,%rax,4)");
+	asm volatile("kmovq %r31,%k5");
+	asm volatile("kmovq 0x123(%r31,%rax,4),%k5");
+	asm volatile("kmovw %k5,%r25d");
+	asm volatile("kmovw %k5,0x123(%r31,%rax,4)");
+	asm volatile("kmovw %r25d,%k5");
+	asm volatile("kmovw 0x123(%r31,%rax,4),%k5");
+	asm volatile("ldtilecfg 0x123(%r31,%rax,4)");
+	asm volatile("movbe %r18w,%ax");
+	asm volatile("movbe %r15w,%ax");
+	asm volatile("movbe %r18w,0x123(%r16,%rax,4)");
+	asm volatile("movbe %r18w,0x123(%r31,%rax,4)");
+	asm volatile("movbe %r25d,%edx");
+	asm volatile("movbe %r15d,%edx");
+	asm volatile("movbe %r25d,0x123(%r16,%rax,4)");
+	asm volatile("movbe %r31,%r15");
+	asm volatile("movbe %r8,%r15");
+	asm volatile("movbe %r31,0x123(%r16,%rax,4)");
+	asm volatile("movbe %r31,0x123(%r31,%rax,4)");
+	asm volatile("movbe 0x123(%r16,%rax,4),%r31");
+	asm volatile("movbe 0x123(%r31,%rax,4),%r18w");
+	asm volatile("movbe 0x123(%r31,%rax,4),%r25d");
+	asm volatile("movdir64b 0x123(%r31d,%eax,4),%r25d");
+	asm volatile("movdir64b 0x123(%r31,%rax,4),%r31");
+	asm volatile("movdiri %r25d,0x123(%r31,%rax,4)");
+	asm volatile("movdiri %r31,0x123(%r31,%rax,4)");
+	asm volatile("pdep %r25d,%edx,%r10d");
+	asm volatile("pdep %r31,%r15,%r11");
+	asm volatile("pdep 0x123(%r31,%rax,4),%r25d,%edx");
+	asm volatile("pdep 0x123(%r31,%rax,4),%r31,%r15");
+	asm volatile("pext %r25d,%edx,%r10d");
+	asm volatile("pext %r31,%r15,%r11");
+	asm volatile("pext 0x123(%r31,%rax,4),%r25d,%edx");
+	asm volatile("pext 0x123(%r31,%rax,4),%r31,%r15");
+	asm volatile("shlx %r25d,%edx,%r10d");
+	asm volatile("shlx %r25d,0x123(%r31,%rax,4),%edx");
+	asm volatile("shlx %r31,%r15,%r11");
+	asm volatile("shlx %r31,0x123(%r31,%rax,4),%r15");
+	asm volatile("shrx %r25d,%edx,%r10d");
+	asm volatile("shrx %r25d,0x123(%r31,%rax,4),%edx");
+	asm volatile("shrx %r31,%r15,%r11");
+	asm volatile("shrx %r31,0x123(%r31,%rax,4),%r15");
+	asm volatile("sttilecfg 0x123(%r31,%rax,4)");
+	asm volatile("tileloadd 0x123(%r31,%rax,4),%tmm6");
+	asm volatile("tileloaddt1 0x123(%r31,%rax,4),%tmm6");
+	asm volatile("tilestored %tmm6,0x123(%r31,%rax,4)");
+	asm volatile("vbroadcastf128 (%r16),%ymm3");
+	asm volatile("vbroadcasti128 (%r16),%ymm3");
+	asm volatile("vextractf128 $1,%ymm3,(%r16)");
+	asm volatile("vextracti128 $1,%ymm3,(%r16)");
+	asm volatile("vinsertf128 $1,(%r16),%ymm3,%ymm8");
+	asm volatile("vinserti128 $1,(%r16),%ymm3,%ymm8");
+	asm volatile("vroundpd $1,(%r24),%xmm6");
+	asm volatile("vroundps $2,(%r24),%xmm6");
+	asm volatile("vroundsd $3,(%r24),%xmm6,%xmm3");
+	asm volatile("vroundss $4,(%r24),%xmm6,%xmm3");
+	asm volatile("wrssd %r25d,0x123(%r31,%rax,4)");
+	asm volatile("wrssq %r31,0x123(%r31,%rax,4)");
+	asm volatile("wrussd %r25d,0x123(%r31,%rax,4)");
+	asm volatile("wrussq %r31,0x123(%r31,%rax,4)");
+
+	/* APX new data destination */
+
+	asm volatile("adc $0x1234,%ax,%r30w");
+	asm volatile("adc %r15b,%r17b,%r18b");
+	asm volatile("adc %r15d,(%r8),%r18d");
+	asm volatile("adc (%r15,%rax,1),%r16b,%r8b");
+	asm volatile("adc (%r15,%rax,1),%r16w,%r8w");
+	asm volatile("adcl $0x11,(%r19,%rax,4),%r20d");
+	asm volatile("adcx %r15d,%r8d,%r18d");
+	asm volatile("adcx (%r15,%r31,1),%r8");
+	asm volatile("adcx (%r15,%r31,1),%r8d,%r18d");
+	asm volatile("add $0x1234,%ax,%r30w");
+	asm volatile("add $0x12344433,%r15,%r16");
+	asm volatile("add $0x34,%r13b,%r17b");
+	asm volatile("add $0xfffffffff4332211,%rax,%r8");
+	asm volatile("add %r31,%r8,%r16");
+	asm volatile("add %r31,(%r8),%r16");
+	asm volatile("add %r31,(%r8,%r16,8),%r16");
+	asm volatile("add %r31b,%r8b,%r16b");
+	asm volatile("add %r31d,%r8d,%r16d");
+	asm volatile("add %r31w,%r8w,%r16w");
+	asm volatile("add (%r31),%r8,%r16");
+	asm volatile("add 0x9090(%r31,%r16,1),%r8,%r16");
+	asm volatile("addb %r31b,%r8b,%r16b");
+	asm volatile("addl %r31d,%r8d,%r16d");
+	asm volatile("addl $0x11,(%r19,%rax,4),%r20d");
+	asm volatile("addq %r31,%r8,%r16");
+	asm volatile("addq $0x12344433,(%r15,%rcx,4),%r16");
+	asm volatile("addw %r31w,%r8w,%r16w");
+	asm volatile("adox %r15d,%r8d,%r18d");
+	asm volatile("{load} add %r31,%r8,%r16");
+	asm volatile("{store} add %r31,%r8,%r16");
+	asm volatile("adox (%r15,%r31,1),%r8");
+	asm volatile("adox (%r15,%r31,1),%r8d,%r18d");
+	asm volatile("and $0x1234,%ax,%r30w");
+	asm volatile("and %r15b,%r17b,%r18b");
+	asm volatile("and %r15d,(%r8),%r18d");
+	asm volatile("and (%r15,%rax,1),%r16b,%r8b");
+	asm volatile("and (%r15,%rax,1),%r16w,%r8w");
+	asm volatile("andl $0x11,(%r19,%rax,4),%r20d");
+	asm volatile("cmova 0x90909090(%eax),%edx,%r8d");
+	asm volatile("cmovae 0x90909090(%eax),%edx,%r8d");
+	asm volatile("cmovb 0x90909090(%eax),%edx,%r8d");
+	asm volatile("cmovbe 0x90909090(%eax),%edx,%r8d");
+	asm volatile("cmove 0x90909090(%eax),%edx,%r8d");
+	asm volatile("cmovg 0x90909090(%eax),%edx,%r8d");
+	asm volatile("cmovge 0x90909090(%eax),%edx,%r8d");
+	asm volatile("cmovl 0x90909090(%eax),%edx,%r8d");
+	asm volatile("cmovle 0x90909090(%eax),%edx,%r8d");
+	asm volatile("cmovne 0x90909090(%eax),%edx,%r8d");
+	asm volatile("cmovno 0x90909090(%eax),%edx,%r8d");
+	asm volatile("cmovnp 0x90909090(%eax),%edx,%r8d");
+	asm volatile("cmovns 0x90909090(%eax),%edx,%r8d");
+	asm volatile("cmovo 0x90909090(%eax),%edx,%r8d");
+	asm volatile("cmovp 0x90909090(%eax),%edx,%r8d");
+	asm volatile("cmovs 0x90909090(%eax),%edx,%r8d");
+	asm volatile("dec %rax,%r17");
+	asm volatile("decb (%r31,%r12,1),%r8b");
+	asm volatile("imul 0x909(%rax,%r31,8),%rdx,%r25");
+	asm volatile("imul 0x90909(%eax),%edx,%r8d");
+	asm volatile("inc %r31,%r16");
+	asm volatile("inc %r31,%r8");
+	asm volatile("inc %rax,%rbx");
+	asm volatile("neg %rax,%r17");
+	asm volatile("negb (%r31,%r12,1),%r8b");
+	asm volatile("not %rax,%r17");
+	asm volatile("notb (%r31,%r12,1),%r8b");
+	asm volatile("or $0x1234,%ax,%r30w");
+	asm volatile("or %r15b,%r17b,%r18b");
+	asm volatile("or %r15d,(%r8),%r18d");
+	asm volatile("or (%r15,%rax,1),%r16b,%r8b");
+	asm volatile("or (%r15,%rax,1),%r16w,%r8w");
+	asm volatile("orl $0x11,(%r19,%rax,4),%r20d");
+	asm volatile("rcl $0x2,%r12b,%r31b");
+	asm volatile("rcl %cl,%r16b,%r8b");
+	asm volatile("rclb $0x1,(%rax),%r31b");
+	asm volatile("rcll $0x2,(%rax),%r31d");
+	asm volatile("rclw $0x1,(%rax),%r31w");
+	asm volatile("rclw %cl,(%r19,%rax,4),%r31w");
+	asm volatile("rcr $0x2,%r12b,%r31b");
+	asm volatile("rcr %cl,%r16b,%r8b");
+	asm volatile("rcrb $0x1,(%rax),%r31b");
+	asm volatile("rcrl $0x2,(%rax),%r31d");
+	asm volatile("rcrw $0x1,(%rax),%r31w");
+	asm volatile("rcrw %cl,(%r19,%rax,4),%r31w");
+	asm volatile("rol $0x2,%r12b,%r31b");
+	asm volatile("rol %cl,%r16b,%r8b");
+	asm volatile("rolb $0x1,(%rax),%r31b");
+	asm volatile("roll $0x2,(%rax),%r31d");
+	asm volatile("rolw $0x1,(%rax),%r31w");
+	asm volatile("rolw %cl,(%r19,%rax,4),%r31w");
+	asm volatile("ror $0x2,%r12b,%r31b");
+	asm volatile("ror %cl,%r16b,%r8b");
+	asm volatile("rorb $0x1,(%rax),%r31b");
+	asm volatile("rorl $0x2,(%rax),%r31d");
+	asm volatile("rorw $0x1,(%rax),%r31w");
+	asm volatile("rorw %cl,(%r19,%rax,4),%r31w");
+	asm volatile("sar $0x2,%r12b,%r31b");
+	asm volatile("sar %cl,%r16b,%r8b");
+	asm volatile("sarb $0x1,(%rax),%r31b");
+	asm volatile("sarl $0x2,(%rax),%r31d");
+	asm volatile("sarw $0x1,(%rax),%r31w");
+	asm volatile("sarw %cl,(%r19,%rax,4),%r31w");
+	asm volatile("sbb $0x1234,%ax,%r30w");
+	asm volatile("sbb %r15b,%r17b,%r18b");
+	asm volatile("sbb %r15d,(%r8),%r18d");
+	asm volatile("sbb (%r15,%rax,1),%r16b,%r8b");
+	asm volatile("sbb (%r15,%rax,1),%r16w,%r8w");
+	asm volatile("sbbl $0x11,(%r19,%rax,4),%r20d");
+	asm volatile("shl $0x2,%r12b,%r31b");
+	asm volatile("shl $0x2,%r12b,%r31b");
+	asm volatile("shl %cl,%r16b,%r8b");
+	asm volatile("shl %cl,%r16b,%r8b");
+	asm volatile("shlb $0x1,(%rax),%r31b");
+	asm volatile("shlb $0x1,(%rax),%r31b");
+	asm volatile("shld $0x1,%r12,(%rax),%r31");
+	asm volatile("shld $0x2,%r15d,(%rax),%r31d");
+	asm volatile("shld $0x2,%r8w,%r12w,%r31w");
+	asm volatile("shld %cl,%r12,%r16,%r8");
+	asm volatile("shld %cl,%r13w,(%r19,%rax,4),%r31w");
+	asm volatile("shld %cl,%r9w,(%rax),%r31w");
+	asm volatile("shll $0x2,(%rax),%r31d");
+	asm volatile("shll $0x2,(%rax),%r31d");
+	asm volatile("shlw $0x1,(%rax),%r31w");
+	asm volatile("shlw $0x1,(%rax),%r31w");
+	asm volatile("shlw %cl,(%r19,%rax,4),%r31w");
+	asm volatile("shlw %cl,(%r19,%rax,4),%r31w");
+	asm volatile("shr $0x2,%r12b,%r31b");
+	asm volatile("shr %cl,%r16b,%r8b");
+	asm volatile("shrb $0x1,(%rax),%r31b");
+	asm volatile("shrd $0x1,%r12,(%rax),%r31");
+	asm volatile("shrd $0x2,%r15d,(%rax),%r31d");
+	asm volatile("shrd $0x2,%r8w,%r12w,%r31w");
+	asm volatile("shrd %cl,%r12,%r16,%r8");
+	asm volatile("shrd %cl,%r13w,(%r19,%rax,4),%r31w");
+	asm volatile("shrd %cl,%r9w,(%rax),%r31w");
+	asm volatile("shrl $0x2,(%rax),%r31d");
+	asm volatile("shrw $0x1,(%rax),%r31w");
+	asm volatile("shrw %cl,(%r19,%rax,4),%r31w");
+	asm volatile("sub $0x1234,%ax,%r30w");
+	asm volatile("sub %r15b,%r17b,%r18b");
+	asm volatile("sub %r15d,(%r8),%r18d");
+	asm volatile("sub (%r15,%rax,1),%r16b,%r8b");
+	asm volatile("sub (%r15,%rax,1),%r16w,%r8w");
+	asm volatile("subl $0x11,(%r19,%rax,4),%r20d");
+	asm volatile("xor $0x1234,%ax,%r30w");
+	asm volatile("xor %r15b,%r17b,%r18b");
+	asm volatile("xor %r15d,(%r8),%r18d");
+	asm volatile("xor (%r15,%rax,1),%r16b,%r8b");
+	asm volatile("xor (%r15,%rax,1),%r16w,%r8w");
+	asm volatile("xorl $0x11,(%r19,%rax,4),%r20d");
+
+	/* APX suppress status flags */
+
+	asm volatile("{nf} add %bl,%dl,%r8b");
+	asm volatile("{nf} add %dx,%ax,%r9w");
+	asm volatile("{nf} add 0x123(%r8,%rax,4),%bl,%dl");
+	asm volatile("{nf} add 0x123(%r8,%rax,4),%dx,%ax");
+	asm volatile("{nf} or %bl,%dl,%r8b");
+	asm volatile("{nf} or %dx,%ax,%r9w");
+	asm volatile("{nf} or 0x123(%r8,%rax,4),%bl,%dl");
+	asm volatile("{nf} or 0x123(%r8,%rax,4),%dx,%ax");
+	asm volatile("{nf} and %bl,%dl,%r8b");
+	asm volatile("{nf} and %dx,%ax,%r9w");
+	asm volatile("{nf} and 0x123(%r8,%rax,4),%bl,%dl");
+	asm volatile("{nf} and 0x123(%r8,%rax,4),%dx,%ax");
+	asm volatile("{nf} shld $0x7b,%dx,%ax,%r9w");
+	asm volatile("{nf} sub %bl,%dl,%r8b");
+	asm volatile("{nf} sub %dx,%ax,%r9w");
+	asm volatile("{nf} sub 0x123(%r8,%rax,4),%bl,%dl");
+	asm volatile("{nf} sub 0x123(%r8,%rax,4),%dx,%ax");
+	asm volatile("{nf} shrd $0x7b,%dx,%ax,%r9w");
+	asm volatile("{nf} xor %bl,%dl,%r8b");
+	asm volatile("{nf} xor %r31,%r31");
+	asm volatile("{nf} xor 0x123(%r8,%rax,4),%bl,%dl");
+	asm volatile("{nf} xor 0x123(%r8,%rax,4),%dx,%ax");
+	asm volatile("{nf} imul $0xff90,%r9,%r15");
+	asm volatile("{nf} imul $0x7b,%r9,%r15");
+	asm volatile("{nf} xor $0x7b,%bl,%dl");
+	asm volatile("{nf} xor $0x7b,%dx,%ax");
+	asm volatile("{nf} popcnt %r9,%r31");
+	asm volatile("{nf} shld %cl,%dx,%ax,%r9w");
+	asm volatile("{nf} shrd %cl,%dx,%ax,%r9w");
+	asm volatile("{nf} imul %r9,%r31,%r11");
+	asm volatile("{nf} sar $0x7b,%bl,%dl");
+	asm volatile("{nf} sar $0x7b,%dx,%ax");
+	asm volatile("{nf} sar $1,%bl,%dl");
+	asm volatile("{nf} sar $1,%dx,%ax");
+	asm volatile("{nf} sar %cl,%bl,%dl");
+	asm volatile("{nf} sar %cl,%dx,%ax");
+	asm volatile("{nf} andn %r9,%r31,%r11");
+	asm volatile("{nf} blsi %r9,%r31");
+	asm volatile("{nf} tzcnt %r9,%r31");
+	asm volatile("{nf} lzcnt %r9,%r31");
+	asm volatile("{nf} idiv %bl");
+	asm volatile("{nf} idiv %dx");
+	asm volatile("{nf} dec %bl,%dl");
+	asm volatile("{nf} dec %dx,%ax");
+
 #else  /* #ifdef __x86_64__ */
 
 	/* bound r32, mem (same op code as EVEX prefix) */
@@ -4848,6 +5354,97 @@ int main(void)
 
 #endif /* #ifndef __x86_64__ */
 
+	/* Key Locker */
+
+	asm volatile("	loadiwkey %xmm1, %xmm2");
+	asm volatile("	encodekey128 %eax, %edx");
+	asm volatile("	encodekey256 %eax, %edx");
+	asm volatile("	aesenc128kl 0x77(%edx), %xmm3");
+	asm volatile("	aesenc256kl 0x77(%edx), %xmm3");
+	asm volatile("	aesdec128kl 0x77(%edx), %xmm3");
+	asm volatile("	aesdec256kl 0x77(%edx), %xmm3");
+	asm volatile("	aesencwide128kl	0x77(%edx)");
+	asm volatile("	aesencwide256kl	0x77(%edx)");
+	asm volatile("	aesdecwide128kl	0x77(%edx)");
+	asm volatile("	aesdecwide256kl	0x77(%edx)");
+
+	/* Remote Atomic Operations */
+
+	asm volatile("aadd %ecx,(%eax)");
+	asm volatile("aadd %edx,(0x12345678)");
+	asm volatile("aadd %edx,0x12345678(%eax,%ecx,8)");
+
+	asm volatile("aand %ecx,(%eax)");
+	asm volatile("aand %edx,(0x12345678)");
+	asm volatile("aand %edx,0x12345678(%eax,%ecx,8)");
+
+	asm volatile("aor %ecx,(%eax)");
+	asm volatile("aor %edx,(0x12345678)");
+	asm volatile("aor %edx,0x12345678(%eax,%ecx,8)");
+
+	asm volatile("axor %ecx,(%eax)");
+	asm volatile("axor %edx,(0x12345678)");
+	asm volatile("axor %edx,0x12345678(%eax,%ecx,8)");
+
+	/* AVX NE Convert */
+
+	asm volatile("vbcstnebf162ps (%ecx),%xmm6");
+	asm volatile("vbcstnesh2ps (%ecx),%xmm6");
+	asm volatile("vcvtneebf162ps (%ecx),%xmm6");
+	asm volatile("vcvtneeph2ps (%ecx),%xmm6");
+	asm volatile("vcvtneobf162ps (%ecx),%xmm6");
+	asm volatile("vcvtneoph2ps (%ecx),%xmm6");
+	asm volatile("vcvtneps2bf16 %xmm1,%xmm6");
+
+	/* AVX VNNI INT16 */
+
+	asm volatile("vpdpbssd %xmm1,%xmm2,%xmm3");
+	asm volatile("vpdpbssds %xmm1,%xmm2,%xmm3");
+	asm volatile("vpdpbsud %xmm1,%xmm2,%xmm3");
+	asm volatile("vpdpbsuds %xmm1,%xmm2,%xmm3");
+	asm volatile("vpdpbuud %xmm1,%xmm2,%xmm3");
+	asm volatile("vpdpbuuds %xmm1,%xmm2,%xmm3");
+	asm volatile("vpdpwsud %xmm1,%xmm2,%xmm3");
+	asm volatile("vpdpwsuds %xmm1,%xmm2,%xmm3");
+	asm volatile("vpdpwusd %xmm1,%xmm2,%xmm3");
+	asm volatile("vpdpwusds %xmm1,%xmm2,%xmm3");
+	asm volatile("vpdpwuud %xmm1,%xmm2,%xmm3");
+	asm volatile("vpdpwuuds %xmm1,%xmm2,%xmm3");
+
+	/* AVX IFMA */
+
+	asm volatile("vpmadd52huq %xmm1,%xmm2,%xmm3");
+	asm volatile("vpmadd52luq %xmm1,%xmm2,%xmm3");
+
+	/* AVX SHA512 */
+
+	asm volatile("vsha512msg1 %xmm1,%ymm2");
+	asm volatile("vsha512msg2 %ymm1,%ymm2");
+	asm volatile("vsha512rnds2 %xmm1,%ymm2,%ymm3");
+
+	/* AVX SM3 */
+
+	asm volatile("vsm3msg1 %xmm1,%xmm2,%xmm3");
+	asm volatile("vsm3msg2 %xmm1,%xmm2,%xmm3");
+	asm volatile("vsm3rnds2 $0xa1,%xmm1,%xmm2,%xmm3");
+
+	/* AVX SM4 */
+
+	asm volatile("vsm4key4 %xmm1,%xmm2,%xmm3");
+	asm volatile("vsm4rnds4 %xmm1,%xmm2,%xmm3");
+
+	/* Pre-fetch */
+
+	asm volatile("prefetch (%eax)");
+	asm volatile("prefetcht0 (%eax)");
+	asm volatile("prefetcht1 (%eax)");
+	asm volatile("prefetcht2 (%eax)");
+	asm volatile("prefetchnta (%eax)");
+
+	/* Non-serializing write MSR */
+
+	asm volatile("wrmsrns");
+
 	/* Prediction history reset */
 
 	asm volatile("hreset $0");
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 02/10] x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map
  2024-05-02 10:58 ` [PATCH 02/10] x86/insn: Fix PUSH instruction in x86 instruction decoder " Adrian Hunter
@ 2024-05-02 13:41   ` Arnaldo Carvalho de Melo
  2024-05-02 19:11     ` Adrian Hunter
  0 siblings, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-05-02 13:41 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: linux-kernel, Chang S. Bae, Masami Hiramatsu, Nikolay Borisov,
	Borislav Petkov, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	Thomas Gleixner, x86, Jiri Olsa, Namhyung Kim, Ian Rogers,
	linux-perf-users

On Thu, May 02, 2024 at 01:58:45PM +0300, Adrian Hunter wrote:
> The x86 instruction decoder is used not only for decoding kernel
> instructions. It is also used by perf uprobes (user space probes) and by
> perf tools Intel Processor Trace decoding. Consequently, it needs to
> support instructions executed by user space also.

I wonder if we should do it this way, i.e. updating both tools/ and
kernel source code in the same patch.

I think it is best to update the kernel bits first, then, after that is
merged, do the tooling part.

That would avoid possible, yet unlikely, clashes in linux-next, for instance.

- Arnaldo
 
> Opcode 0x68 PUSH instruction is currently defined as 64-bit operand size
> only i.e. (d64). That was based on Intel SDM Opcode Map. However that is
> contradicted by the Instruction Set Reference section for PUSH in the
> same manual.
> 
> Remove 64-bit operand size only annotation from opcode 0x68 PUSH
> instruction.
> 
> Example:
> 
>   $ cat pushw.s
>   .global  _start
>   .text
>   _start:
>           pushw   $0x1234
>           mov     $0x1,%eax   # system call number (sys_exit)
>           int     $0x80
>   $ as -o pushw.o pushw.s
>   $ ld -s -o pushw pushw.o
>   $ objdump -d pushw | tail -4
>   0000000000401000 <.text>:
>     401000:       66 68 34 12             pushw  $0x1234
>     401004:       b8 01 00 00 00          mov    $0x1,%eax
>     401009:       cd 80                   int    $0x80
>   $ perf record -e intel_pt//u ./pushw
>   [ perf record: Woken up 1 times to write data ]
>   [ perf record: Captured and wrote 0.014 MB perf.data ]
> 
>  Before:
> 
>   $ perf script --insn-trace=disasm
>   Warning:
>   1 instruction trace errors
>            pushw   10349 [000] 10586.869237014:            401000 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw)           pushw $0x1234
>            pushw   10349 [000] 10586.869237014:            401006 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw)           addb %al, (%rax)
>            pushw   10349 [000] 10586.869237014:            401008 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw)           addb %cl, %ch
>            pushw   10349 [000] 10586.869237014:            40100a [unknown] (/home/ahunter/git/misc/rtit-tests/pushw)           addb $0x2e, (%rax)
>    instruction trace error type 1 time 10586.869237224 cpu 0 pid 10349 tid 10349 ip 0x40100d code 6: Trace doesn't match instruction
> 
>  After:
> 
>   $ perf script --insn-trace=disasm
>              pushw   10349 [000] 10586.869237014:            401000 [unknown] (./pushw)           pushw $0x1234
>              pushw   10349 [000] 10586.869237014:            401004 [unknown] (./pushw)           movl $1, %eax
> 
> Fixes: eb13296cfaf6 ("x86: Instruction decoder API")
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  arch/x86/lib/x86-opcode-map.txt       | 2 +-
>  tools/arch/x86/lib/x86-opcode-map.txt | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
> index c94988d5130d..4ea2e6adb477 100644
> --- a/arch/x86/lib/x86-opcode-map.txt
> +++ b/arch/x86/lib/x86-opcode-map.txt
> @@ -148,7 +148,7 @@ AVXcode:
>  65: SEG=GS (Prefix)
>  66: Operand-Size (Prefix)
>  67: Address-Size (Prefix)
> -68: PUSH Iz (d64)
> +68: PUSH Iz
>  69: IMUL Gv,Ev,Iz
>  6a: PUSH Ib (d64)
>  6b: IMUL Gv,Ev,Ib
> diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
> index c94988d5130d..4ea2e6adb477 100644
> --- a/tools/arch/x86/lib/x86-opcode-map.txt
> +++ b/tools/arch/x86/lib/x86-opcode-map.txt
> @@ -148,7 +148,7 @@ AVXcode:
>  65: SEG=GS (Prefix)
>  66: Operand-Size (Prefix)
>  67: Address-Size (Prefix)
> -68: PUSH Iz (d64)
> +68: PUSH Iz
>  69: IMUL Gv,Ev,Iz
>  6a: PUSH Ib (d64)
>  6b: IMUL Gv,Ev,Ib
> -- 
> 2.34.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 05/10] x86/insn: Add support for REX2 prefix to the instruction decoder logic
  2024-05-02 10:58 ` [PATCH 05/10] x86/insn: Add support for REX2 prefix to the instruction decoder logic Adrian Hunter
@ 2024-05-02 18:10   ` Ian Rogers
  2024-05-03  5:09     ` Adrian Hunter
  0 siblings, 1 reply; 18+ messages in thread
From: Ian Rogers @ 2024-05-02 18:10 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: linux-kernel, Chang S. Bae, Masami Hiramatsu, Nikolay Borisov,
	Borislav Petkov, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	Thomas Gleixner, x86, Arnaldo Carvalho de Melo, Jiri Olsa,
	Namhyung Kim, linux-perf-users

On Thu, May 2, 2024 at 3:59 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Intel Advanced Performance Extensions (APX) uses a new 2-byte prefix named
> REX2 to select extended general purpose registers (EGPRs) i.e. r16 to r31.
>
> The REX2 prefix is effectively an extended version of the REX prefix.
>
> REX2 and EVEX are also used with PUSH/POP instructions to provide a
> Push-Pop Acceleration (PPX) hint. With PPX hints, a CPU will attempt to
> fast-forward register data between matching PUSH and POP instructions.
>
> REX2 is valid only with opcodes in maps 0 and 1. Similar extension for
> other maps is provided by the EVEX prefix, covered in a separate patch.
>
> Some opcodes in maps 0 and 1 are reserved under REX2. One of these is used
> for a new 64-bit absolute direct jump instruction JMPABS.
>
> Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
> Specification for details.
>
> Define a code value for the REX2 prefix (INAT_PFX_REX2), and add attribute
> flags for opcodes reserved under REX2 (INAT_NO_REX2) and to identify
> opcodes (only JMPABS) that require a mandatory REX2 prefix
> (INAT_REX2_VARIANT).
>
> Amend logic to read the REX2 prefix and get the opcode attribute for the
> map number (0 or 1) encoded in the REX2 prefix.
>
> Amend the awk script that generates the attribute tables from the opcode
> map, to recognise "REX2" as attribute INAT_PFX_REX2, and "(!REX2)"
> as attribute INAT_NO_REX2, and "(REX2)" as attribute INAT_REX2_VARIANT.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  arch/x86/include/asm/inat.h                | 11 +++++++++-
>  arch/x86/include/asm/insn.h                | 25 ++++++++++++++++++----
>  arch/x86/lib/insn.c                        | 25 ++++++++++++++++++++++
>  arch/x86/tools/gen-insn-attr-x86.awk       | 11 +++++++++-
>  tools/arch/x86/include/asm/inat.h          | 11 +++++++++-
>  tools/arch/x86/include/asm/insn.h          | 25 ++++++++++++++++++----
>  tools/arch/x86/lib/insn.c                  | 25 ++++++++++++++++++++++
>  tools/arch/x86/tools/gen-insn-attr-x86.awk | 11 +++++++++-
>  8 files changed, 132 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
> index b56c5741581a..1331bdd39a23 100644
> --- a/arch/x86/include/asm/inat.h
> +++ b/arch/x86/include/asm/inat.h
> @@ -35,6 +35,8 @@
>  #define INAT_PFX_VEX2  13      /* 2-bytes VEX prefix */
>  #define INAT_PFX_VEX3  14      /* 3-bytes VEX prefix */
>  #define INAT_PFX_EVEX  15      /* EVEX prefix */
> +/* x86-64 REX2 prefix */
> +#define INAT_PFX_REX2  16      /* 0xD5 */
>
>  #define INAT_LSTPFX_MAX        3
>  #define INAT_LGCPFX_MAX        11
> @@ -50,7 +52,7 @@
>
>  /* Legacy prefix */
>  #define INAT_PFX_OFFS  0
> -#define INAT_PFX_BITS  4
> +#define INAT_PFX_BITS  5
>  #define INAT_PFX_MAX    ((1 << INAT_PFX_BITS) - 1)
>  #define INAT_PFX_MASK  (INAT_PFX_MAX << INAT_PFX_OFFS)
>  /* Escape opcodes */
> @@ -77,6 +79,8 @@
>  #define INAT_VEXOK     (1 << (INAT_FLAG_OFFS + 5))
>  #define INAT_VEXONLY   (1 << (INAT_FLAG_OFFS + 6))
>  #define INAT_EVEXONLY  (1 << (INAT_FLAG_OFFS + 7))
> +#define INAT_NO_REX2   (1 << (INAT_FLAG_OFFS + 8))
> +#define INAT_REX2_VARIANT      (1 << (INAT_FLAG_OFFS + 9))
>  /* Attribute making macros for attribute tables */
>  #define INAT_MAKE_PREFIX(pfx)  (pfx << INAT_PFX_OFFS)
>  #define INAT_MAKE_ESCAPE(esc)  (esc << INAT_ESC_OFFS)
> @@ -128,6 +132,11 @@ static inline int inat_is_rex_prefix(insn_attr_t attr)
>         return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
>  }
>
> +static inline int inat_is_rex2_prefix(insn_attr_t attr)
> +{
> +       return (attr & INAT_PFX_MASK) == INAT_PFX_REX2;
> +}
> +
>  static inline int inat_last_prefix_id(insn_attr_t attr)
>  {
>         if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
> diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
> index 1b29f58f730f..95249ec1f24e 100644
> --- a/arch/x86/include/asm/insn.h
> +++ b/arch/x86/include/asm/insn.h
> @@ -112,10 +112,15 @@ struct insn {
>  #define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
>  #define X86_SIB_BASE(sib) ((sib) & 0x07)
>
> -#define X86_REX_W(rex) ((rex) & 8)
> -#define X86_REX_R(rex) ((rex) & 4)
> -#define X86_REX_X(rex) ((rex) & 2)
> -#define X86_REX_B(rex) ((rex) & 1)
> +#define X86_REX2_M(rex) ((rex) & 0x80) /* REX2 M0 */
> +#define X86_REX2_R(rex) ((rex) & 0x40) /* REX2 R4 */
> +#define X86_REX2_X(rex) ((rex) & 0x20) /* REX2 X4 */
> +#define X86_REX2_B(rex) ((rex) & 0x10) /* REX2 B4 */
> +
> +#define X86_REX_W(rex) ((rex) & 8)     /* REX or REX2 W */
> +#define X86_REX_R(rex) ((rex) & 4)     /* REX or REX2 R3 */
> +#define X86_REX_X(rex) ((rex) & 2)     /* REX or REX2 X3 */
> +#define X86_REX_B(rex) ((rex) & 1)     /* REX or REX2 B3 */
>
>  /* VEX bit flags  */
>  #define X86_VEX_W(vex) ((vex) & 0x80)  /* VEX3 Byte2 */
> @@ -161,6 +166,18 @@ static inline void insn_get_attribute(struct insn *insn)
>  /* Instruction uses RIP-relative addressing */
>  extern int insn_rip_relative(struct insn *insn);
>
> +static inline int insn_is_rex2(struct insn *insn)
> +{
> +       if (!insn->prefixes.got)
> +               insn_get_prefixes(insn);
> +       return insn->rex_prefix.nbytes == 2;

It'd be nice to capture that a rex2 prefix is by definition 2 bytes.
Playing devil's advocate, if there were a REX and a REX2 prefix,
couldn't rex_prefix.nbytes be 3? I'm wondering about other prefix
combinations that may confuse this logic; maybe someone dreams up
doing this for, say, alignment reasons, like "rep ret".

Thanks,
Ian

> +}
> +
> +static inline insn_byte_t insn_rex2_m_bit(struct insn *insn)
> +{
> +       return X86_REX2_M(insn->rex_prefix.bytes[1]);
> +}
> +
>  static inline int insn_is_avx(struct insn *insn)
>  {
>         if (!insn->prefixes.got)
> diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
> index 1bb155a0955b..6126ddc6e5f5 100644
> --- a/arch/x86/lib/insn.c
> +++ b/arch/x86/lib/insn.c
> @@ -185,6 +185,17 @@ int insn_get_prefixes(struct insn *insn)
>                         if (X86_REX_W(b))
>                                 /* REX.W overrides opnd_size */
>                                 insn->opnd_bytes = 8;
> +               } else if (inat_is_rex2_prefix(attr)) {
> +                       insn_set_byte(&insn->rex_prefix, 0, b);
> +                       b = peek_nbyte_next(insn_byte_t, insn, 1);
> +                       insn_set_byte(&insn->rex_prefix, 1, b);
> +                       insn->rex_prefix.nbytes = 2;
> +                       insn->next_byte += 2;
> +                       if (X86_REX_W(b))
> +                               /* REX.W overrides opnd_size */
> +                               insn->opnd_bytes = 8;
> +                       insn->rex_prefix.got = 1;
> +                       goto vex_end;
>                 }
>         }
>         insn->rex_prefix.got = 1;
> @@ -294,6 +305,20 @@ int insn_get_opcode(struct insn *insn)
>                 goto end;
>         }
>
> +       /* Check if there is REX2 prefix or not */
> +       if (insn_is_rex2(insn)) {
> +               if (insn_rex2_m_bit(insn)) {
> +                       /* map 1 is escape 0x0f */
> +                       insn_attr_t esc_attr = inat_get_opcode_attribute(0x0f);
> +
> +                       pfx_id = insn_last_prefix_id(insn);
> +                       insn->attr = inat_get_escape_attribute(op, pfx_id, esc_attr);
> +               } else {
> +                       insn->attr = inat_get_opcode_attribute(op);
> +               }
> +               goto end;
> +       }
> +
>         insn->attr = inat_get_opcode_attribute(op);
>         while (inat_is_escape(insn->attr)) {
>                 /* Get escaped opcode */
> diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
> index af38469afd14..3f43aa7d8fef 100644
> --- a/arch/x86/tools/gen-insn-attr-x86.awk
> +++ b/arch/x86/tools/gen-insn-attr-x86.awk
> @@ -64,7 +64,9 @@ BEGIN {
>
>         modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
>         force64_expr = "\\([df]64\\)"
> -       rex_expr = "^REX(\\.[XRWB]+)*"
> +       rex_expr = "^((REX(\\.[XRWB]+)+)|(REX$))"
> +       rex2_expr = "\\(REX2\\)"
> +       no_rex2_expr = "\\(!REX2\\)"
>         fpu_expr = "^ESC" # TODO
>
>         lprefix1_expr = "\\((66|!F3)\\)"
> @@ -99,6 +101,7 @@ BEGIN {
>         prefix_num["VEX+1byte"] = "INAT_PFX_VEX2"
>         prefix_num["VEX+2byte"] = "INAT_PFX_VEX3"
>         prefix_num["EVEX"] = "INAT_PFX_EVEX"
> +       prefix_num["REX2"] = "INAT_PFX_REX2"
>
>         clear_vars()
>  }
> @@ -314,6 +317,10 @@ function convert_operands(count,opnd,       i,j,imm,mod)
>                 if (match(ext, force64_expr))
>                         flags = add_flags(flags, "INAT_FORCE64")
>
> +               # check REX2 not allowed
> +               if (match(ext, no_rex2_expr))
> +                       flags = add_flags(flags, "INAT_NO_REX2")
> +
>                 # check REX prefix
>                 if (match(opcode, rex_expr))
>                         flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
> @@ -351,6 +358,8 @@ function convert_operands(count,opnd,       i,j,imm,mod)
>                         lptable3[idx] = add_flags(lptable3[idx],flags)
>                         variant = "INAT_VARIANT"
>                 }
> +               if (match(ext, rex2_expr))
> +                       table[idx] = add_flags(table[idx], "INAT_REX2_VARIANT")
>                 if (!match(ext, lprefix_expr)){
>                         table[idx] = add_flags(table[idx],flags)
>                 }
> diff --git a/tools/arch/x86/include/asm/inat.h b/tools/arch/x86/include/asm/inat.h
> index a61051400311..2e65312cae52 100644
> --- a/tools/arch/x86/include/asm/inat.h
> +++ b/tools/arch/x86/include/asm/inat.h
> @@ -35,6 +35,8 @@
>  #define INAT_PFX_VEX2  13      /* 2-bytes VEX prefix */
>  #define INAT_PFX_VEX3  14      /* 3-bytes VEX prefix */
>  #define INAT_PFX_EVEX  15      /* EVEX prefix */
> +/* x86-64 REX2 prefix */
> +#define INAT_PFX_REX2  16      /* 0xD5 */
>
>  #define INAT_LSTPFX_MAX        3
>  #define INAT_LGCPFX_MAX        11
> @@ -50,7 +52,7 @@
>
>  /* Legacy prefix */
>  #define INAT_PFX_OFFS  0
> -#define INAT_PFX_BITS  4
> +#define INAT_PFX_BITS  5
>  #define INAT_PFX_MAX    ((1 << INAT_PFX_BITS) - 1)
>  #define INAT_PFX_MASK  (INAT_PFX_MAX << INAT_PFX_OFFS)
>  /* Escape opcodes */
> @@ -77,6 +79,8 @@
>  #define INAT_VEXOK     (1 << (INAT_FLAG_OFFS + 5))
>  #define INAT_VEXONLY   (1 << (INAT_FLAG_OFFS + 6))
>  #define INAT_EVEXONLY  (1 << (INAT_FLAG_OFFS + 7))
> +#define INAT_NO_REX2   (1 << (INAT_FLAG_OFFS + 8))
> +#define INAT_REX2_VARIANT      (1 << (INAT_FLAG_OFFS + 9))
>  /* Attribute making macros for attribute tables */
>  #define INAT_MAKE_PREFIX(pfx)  (pfx << INAT_PFX_OFFS)
>  #define INAT_MAKE_ESCAPE(esc)  (esc << INAT_ESC_OFFS)
> @@ -128,6 +132,11 @@ static inline int inat_is_rex_prefix(insn_attr_t attr)
>         return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
>  }
>
> +static inline int inat_is_rex2_prefix(insn_attr_t attr)
> +{
> +       return (attr & INAT_PFX_MASK) == INAT_PFX_REX2;
> +}
> +
>  static inline int inat_last_prefix_id(insn_attr_t attr)
>  {
>         if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
> diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
> index 65c0d9ce1e29..1a7e8fc4d75a 100644
> --- a/tools/arch/x86/include/asm/insn.h
> +++ b/tools/arch/x86/include/asm/insn.h
> @@ -112,10 +112,15 @@ struct insn {
>  #define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
>  #define X86_SIB_BASE(sib) ((sib) & 0x07)
>
> -#define X86_REX_W(rex) ((rex) & 8)
> -#define X86_REX_R(rex) ((rex) & 4)
> -#define X86_REX_X(rex) ((rex) & 2)
> -#define X86_REX_B(rex) ((rex) & 1)
> +#define X86_REX2_M(rex) ((rex) & 0x80) /* REX2 M0 */
> +#define X86_REX2_R(rex) ((rex) & 0x40) /* REX2 R4 */
> +#define X86_REX2_X(rex) ((rex) & 0x20) /* REX2 X4 */
> +#define X86_REX2_B(rex) ((rex) & 0x10) /* REX2 B4 */
> +
> +#define X86_REX_W(rex) ((rex) & 8)     /* REX or REX2 W */
> +#define X86_REX_R(rex) ((rex) & 4)     /* REX or REX2 R3 */
> +#define X86_REX_X(rex) ((rex) & 2)     /* REX or REX2 X3 */
> +#define X86_REX_B(rex) ((rex) & 1)     /* REX or REX2 B3 */
>
>  /* VEX bit flags  */
>  #define X86_VEX_W(vex) ((vex) & 0x80)  /* VEX3 Byte2 */
> @@ -161,6 +166,18 @@ static inline void insn_get_attribute(struct insn *insn)
>  /* Instruction uses RIP-relative addressing */
>  extern int insn_rip_relative(struct insn *insn);
>
> +static inline int insn_is_rex2(struct insn *insn)
> +{
> +       if (!insn->prefixes.got)
> +               insn_get_prefixes(insn);
> +       return insn->rex_prefix.nbytes == 2;
> +}
> +
> +static inline insn_byte_t insn_rex2_m_bit(struct insn *insn)
> +{
> +       return X86_REX2_M(insn->rex_prefix.bytes[1]);
> +}
> +
>  static inline int insn_is_avx(struct insn *insn)
>  {
>         if (!insn->prefixes.got)
> diff --git a/tools/arch/x86/lib/insn.c b/tools/arch/x86/lib/insn.c
> index ada4b4a79dd4..f761adeb8e8c 100644
> --- a/tools/arch/x86/lib/insn.c
> +++ b/tools/arch/x86/lib/insn.c
> @@ -185,6 +185,17 @@ int insn_get_prefixes(struct insn *insn)
>                         if (X86_REX_W(b))
>                                 /* REX.W overrides opnd_size */
>                                 insn->opnd_bytes = 8;
> +               } else if (inat_is_rex2_prefix(attr)) {
> +                       insn_set_byte(&insn->rex_prefix, 0, b);
> +                       b = peek_nbyte_next(insn_byte_t, insn, 1);
> +                       insn_set_byte(&insn->rex_prefix, 1, b);
> +                       insn->rex_prefix.nbytes = 2;
> +                       insn->next_byte += 2;
> +                       if (X86_REX_W(b))
> +                               /* REX.W overrides opnd_size */
> +                               insn->opnd_bytes = 8;
> +                       insn->rex_prefix.got = 1;
> +                       goto vex_end;
>                 }
>         }
>         insn->rex_prefix.got = 1;
> @@ -294,6 +305,20 @@ int insn_get_opcode(struct insn *insn)
>                 goto end;
>         }
>
> +       /* Check if there is REX2 prefix or not */
> +       if (insn_is_rex2(insn)) {
> +               if (insn_rex2_m_bit(insn)) {
> +                       /* map 1 is escape 0x0f */
> +                       insn_attr_t esc_attr = inat_get_opcode_attribute(0x0f);
> +
> +                       pfx_id = insn_last_prefix_id(insn);
> +                       insn->attr = inat_get_escape_attribute(op, pfx_id, esc_attr);
> +               } else {
> +                       insn->attr = inat_get_opcode_attribute(op);
> +               }
> +               goto end;
> +       }
> +
>         insn->attr = inat_get_opcode_attribute(op);
>         while (inat_is_escape(insn->attr)) {
>                 /* Get escaped opcode */
> diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
> index af38469afd14..3f43aa7d8fef 100644
> --- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
> +++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
> @@ -64,7 +64,9 @@ BEGIN {
>
>         modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
>         force64_expr = "\\([df]64\\)"
> -       rex_expr = "^REX(\\.[XRWB]+)*"
> +       rex_expr = "^((REX(\\.[XRWB]+)+)|(REX$))"
> +       rex2_expr = "\\(REX2\\)"
> +       no_rex2_expr = "\\(!REX2\\)"
>         fpu_expr = "^ESC" # TODO
>
>         lprefix1_expr = "\\((66|!F3)\\)"
> @@ -99,6 +101,7 @@ BEGIN {
>         prefix_num["VEX+1byte"] = "INAT_PFX_VEX2"
>         prefix_num["VEX+2byte"] = "INAT_PFX_VEX3"
>         prefix_num["EVEX"] = "INAT_PFX_EVEX"
> +       prefix_num["REX2"] = "INAT_PFX_REX2"
>
>         clear_vars()
>  }
> @@ -314,6 +317,10 @@ function convert_operands(count,opnd,       i,j,imm,mod)
>                 if (match(ext, force64_expr))
>                         flags = add_flags(flags, "INAT_FORCE64")
>
> +               # check REX2 not allowed
> +               if (match(ext, no_rex2_expr))
> +                       flags = add_flags(flags, "INAT_NO_REX2")
> +
>                 # check REX prefix
>                 if (match(opcode, rex_expr))
>                         flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
> @@ -351,6 +358,8 @@ function convert_operands(count,opnd,       i,j,imm,mod)
>                         lptable3[idx] = add_flags(lptable3[idx],flags)
>                         variant = "INAT_VARIANT"
>                 }
> +               if (match(ext, rex2_expr))
> +                       table[idx] = add_flags(table[idx], "INAT_REX2_VARIANT")
>                 if (!match(ext, lprefix_expr)){
>                         table[idx] = add_flags(table[idx],flags)
>                 }
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 02/10] x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map
  2024-05-02 13:41   ` Arnaldo Carvalho de Melo
@ 2024-05-02 19:11     ` Adrian Hunter
  0 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2024-05-02 19:11 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, Chang S. Bae, Masami Hiramatsu, Nikolay Borisov,
	Borislav Petkov, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	Thomas Gleixner, x86, Jiri Olsa, Namhyung Kim, Ian Rogers,
	linux-perf-users

On 2/05/24 16:41, Arnaldo Carvalho de Melo wrote:
> On Thu, May 02, 2024 at 01:58:45PM +0300, Adrian Hunter wrote:
>> The x86 instruction decoder is used not only for decoding kernel
>> instructions. It is also used by perf uprobes (user space probes) and by
>> perf tools Intel Processor Trace decoding. Consequently, it needs to
>> support instructions executed by user space also.
> 
> I wonder if we should do it this way, i.e. updating both tools/ and
> kernel source code in the same patch.
> 
> I think the best is to update the kernel bits, then, after that is
> merged, do the tooling part.

For objtool purposes, it might be better to do both at the same time.

> 
> To avoid possible, yet unlikely, clashes in linux-next, for instance.

We are always going to find out about clashes at some point.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 05/10] x86/insn: Add support for REX2 prefix to the instruction decoder logic
  2024-05-02 18:10   ` Ian Rogers
@ 2024-05-03  5:09     ` Adrian Hunter
  0 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2024-05-03  5:09 UTC (permalink / raw)
  To: Ian Rogers
  Cc: linux-kernel, Chang S. Bae, Masami Hiramatsu, Nikolay Borisov,
	Borislav Petkov, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	Thomas Gleixner, x86, Arnaldo Carvalho de Melo, Jiri Olsa,
	Namhyung Kim, linux-perf-users

On 2/05/24 21:10, Ian Rogers wrote:
> On Thu, May 2, 2024 at 3:59 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> Intel Advanced Performance Extensions (APX) uses a new 2-byte prefix named
>> REX2 to select extended general purpose registers (EGPRs) i.e. r16 to r31.
>>
>> The REX2 prefix is effectively an extended version of the REX prefix.
>>
>> REX2 and EVEX are also used with PUSH/POP instructions to provide a
>> Push-Pop Acceleration (PPX) hint. With PPX hints, a CPU will attempt to
>> fast-forward register data between matching PUSH and POP instructions.
>>
>> REX2 is valid only with opcodes in maps 0 and 1. Similar extension for
>> other maps is provided by the EVEX prefix, covered in a separate patch.
>>
>> Some opcodes in maps 0 and 1 are reserved under REX2. One of these is used
>> for a new 64-bit absolute direct jump instruction JMPABS.
>>
>> Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
>> Specification for details.
>>
>> Define a code value for the REX2 prefix (INAT_PFX_REX2), and add attribute
>> flags for opcodes reserved under REX2 (INAT_NO_REX2) and to identify
>> opcodes (only JMPABS) that require a mandatory REX2 prefix
>> (INAT_REX2_VARIANT).
>>
>> Amend logic to read the REX2 prefix and get the opcode attribute for the
>> map number (0 or 1) encoded in the REX2 prefix.
>>
>> Amend the awk script that generates the attribute tables from the opcode
>> map, to recognise "REX2" as attribute INAT_PFX_REX2, and "(!REX2)"
>> as attribute INAT_NO_REX2, and "(REX2)" as attribute INAT_REX2_VARIANT.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  arch/x86/include/asm/inat.h                | 11 +++++++++-
>>  arch/x86/include/asm/insn.h                | 25 ++++++++++++++++++----
>>  arch/x86/lib/insn.c                        | 25 ++++++++++++++++++++++
>>  arch/x86/tools/gen-insn-attr-x86.awk       | 11 +++++++++-
>>  tools/arch/x86/include/asm/inat.h          | 11 +++++++++-
>>  tools/arch/x86/include/asm/insn.h          | 25 ++++++++++++++++++----
>>  tools/arch/x86/lib/insn.c                  | 25 ++++++++++++++++++++++
>>  tools/arch/x86/tools/gen-insn-attr-x86.awk | 11 +++++++++-
>>  8 files changed, 132 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
>> index b56c5741581a..1331bdd39a23 100644
>> --- a/arch/x86/include/asm/inat.h
>> +++ b/arch/x86/include/asm/inat.h
>> @@ -35,6 +35,8 @@
>>  #define INAT_PFX_VEX2  13      /* 2-bytes VEX prefix */
>>  #define INAT_PFX_VEX3  14      /* 3-bytes VEX prefix */
>>  #define INAT_PFX_EVEX  15      /* EVEX prefix */
>> +/* x86-64 REX2 prefix */
>> +#define INAT_PFX_REX2  16      /* 0xD5 */
>>
>>  #define INAT_LSTPFX_MAX        3
>>  #define INAT_LGCPFX_MAX        11
>> @@ -50,7 +52,7 @@
>>
>>  /* Legacy prefix */
>>  #define INAT_PFX_OFFS  0
>> -#define INAT_PFX_BITS  4
>> +#define INAT_PFX_BITS  5
>>  #define INAT_PFX_MAX    ((1 << INAT_PFX_BITS) - 1)
>>  #define INAT_PFX_MASK  (INAT_PFX_MAX << INAT_PFX_OFFS)
>>  /* Escape opcodes */
>> @@ -77,6 +79,8 @@
>>  #define INAT_VEXOK     (1 << (INAT_FLAG_OFFS + 5))
>>  #define INAT_VEXONLY   (1 << (INAT_FLAG_OFFS + 6))
>>  #define INAT_EVEXONLY  (1 << (INAT_FLAG_OFFS + 7))
>> +#define INAT_NO_REX2   (1 << (INAT_FLAG_OFFS + 8))
>> +#define INAT_REX2_VARIANT      (1 << (INAT_FLAG_OFFS + 9))
>>  /* Attribute making macros for attribute tables */
>>  #define INAT_MAKE_PREFIX(pfx)  (pfx << INAT_PFX_OFFS)
>>  #define INAT_MAKE_ESCAPE(esc)  (esc << INAT_ESC_OFFS)
>> @@ -128,6 +132,11 @@ static inline int inat_is_rex_prefix(insn_attr_t attr)
>>         return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
>>  }
>>
>> +static inline int inat_is_rex2_prefix(insn_attr_t attr)
>> +{
>> +       return (attr & INAT_PFX_MASK) == INAT_PFX_REX2;
>> +}
>> +
>>  static inline int inat_last_prefix_id(insn_attr_t attr)
>>  {
>>         if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
>> diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
>> index 1b29f58f730f..95249ec1f24e 100644
>> --- a/arch/x86/include/asm/insn.h
>> +++ b/arch/x86/include/asm/insn.h
>> @@ -112,10 +112,15 @@ struct insn {
>>  #define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
>>  #define X86_SIB_BASE(sib) ((sib) & 0x07)
>>
>> -#define X86_REX_W(rex) ((rex) & 8)
>> -#define X86_REX_R(rex) ((rex) & 4)
>> -#define X86_REX_X(rex) ((rex) & 2)
>> -#define X86_REX_B(rex) ((rex) & 1)
>> +#define X86_REX2_M(rex) ((rex) & 0x80) /* REX2 M0 */
>> +#define X86_REX2_R(rex) ((rex) & 0x40) /* REX2 R4 */
>> +#define X86_REX2_X(rex) ((rex) & 0x20) /* REX2 X4 */
>> +#define X86_REX2_B(rex) ((rex) & 0x10) /* REX2 B4 */
>> +
>> +#define X86_REX_W(rex) ((rex) & 8)     /* REX or REX2 W */
>> +#define X86_REX_R(rex) ((rex) & 4)     /* REX or REX2 R3 */
>> +#define X86_REX_X(rex) ((rex) & 2)     /* REX or REX2 X3 */
>> +#define X86_REX_B(rex) ((rex) & 1)     /* REX or REX2 B3 */
>>
>>  /* VEX bit flags  */
>>  #define X86_VEX_W(vex) ((vex) & 0x80)  /* VEX3 Byte2 */
>> @@ -161,6 +166,18 @@ static inline void insn_get_attribute(struct insn *insn)
>>  /* Instruction uses RIP-relative addressing */
>>  extern int insn_rip_relative(struct insn *insn);
>>
>> +static inline int insn_is_rex2(struct insn *insn)
>> +{
>> +       if (!insn->prefixes.got)
>> +               insn_get_prefixes(insn);
>> +       return insn->rex_prefix.nbytes == 2;
> 
> It'd be nice to capture that a rex2 prefix is by definition 2 bytes.
> Playing devil's advocate, if there were a REX and a REX2 prefix,
> couldn't rex_prefix.nbytes be 3? I'm wondering about other prefix
> combinations that may confuse this logic; maybe someone dreams up
> doing this for, say, alignment reasons, like "rep ret".

REX combined with REX2 is not allowed, so rex_prefix.nbytes cannot be 3.
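
For illustration only (this sketch is not part of the posted patch), the
point Ian raises could be made explicit by also checking the stored
prefix byte, which the patch defines as 0xD5 for REX2. The sketch below
reuses only the struct insn fields and helpers that appear in the patch
above:

	/* Illustrative sketch, not from the posted patch */
	static inline int insn_is_rex2(struct insn *insn)
	{
		if (!insn->prefixes.got)
			insn_get_prefixes(insn);
		/* A REX2 prefix is exactly 2 bytes: 0xd5 plus the payload byte */
		return insn->rex_prefix.nbytes == 2 &&
		       insn->rex_prefix.bytes[0] == 0xd5;
	}

Whether the extra byte test is needed is a design choice: in the decoder
as posted, only the REX2 path ever sets rex_prefix.nbytes to 2 (plain
REX sets it to 1), so the length check alone is already unambiguous.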


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 09/10] perf intel pt: Add new JMPABS instruction to the Intel PT instruction decoder
  2024-05-02 10:58 ` [PATCH 09/10] perf intel pt: Add new JMPABS instruction to the Intel PT instruction decoder Adrian Hunter
@ 2024-05-28 18:14   ` Adrian Hunter
  2024-06-18  8:05     ` Adrian Hunter
  0 siblings, 1 reply; 18+ messages in thread
From: Adrian Hunter @ 2024-05-28 18:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, linux-perf-users

On 2/05/24 13:58, Adrian Hunter wrote:
> JMPABS is a 64-bit absolute direct jump instruction, encoded with a mandatory
> REX2 prefix. JMPABS is designed to be used in the procedure linkage table
> (PLT) to replace indirect jumps, because it has better performance. In that
> case the jump target will be amended at run time. To enable Intel PT to
> follow the code, a TIP packet is always emitted when JMPABS is traced under
> Intel PT.
> 
> Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
> Specification for details.
> 
> Decode JMPABS as an indirect jump, because it has an associated TIP packet
> the same as an indirect jump and the control flow should follow the TIP
> packet payload, and not assume it is the same as the on-file object code
> JMPABS target address.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Patches 1-8 are in perf-tools-next now, so this and patch 10
could be applied.

> ---
>  tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
> index c5d57027ec23..4407130d91f8 100644
> --- a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
> +++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
> @@ -92,6 +92,15 @@ static void intel_pt_insn_decoder(struct insn *insn,
>  		op = INTEL_PT_OP_JCC;
>  		branch = INTEL_PT_BR_CONDITIONAL;
>  		break;
> +	case 0xa1:
> +		if (insn_is_rex2(insn)) { /* jmpabs */
> +			intel_pt_insn->op = INTEL_PT_OP_JMP;
> +			/* jmpabs causes a TIP packet like an indirect branch */
> +			intel_pt_insn->branch = INTEL_PT_BR_INDIRECT;
> +			intel_pt_insn->length = insn->length;
> +			return;
> +		}
> +		break;
>  	case 0xc2: /* near ret */
>  	case 0xc3: /* near ret */
>  	case 0xca: /* far ret */


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 09/10] perf intel pt: Add new JMPABS instruction to the Intel PT instruction decoder
  2024-05-28 18:14   ` Adrian Hunter
@ 2024-06-18  8:05     ` Adrian Hunter
  0 siblings, 0 replies; 18+ messages in thread
From: Adrian Hunter @ 2024-06-18  8:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, linux-perf-users

On 28/05/24 21:14, Adrian Hunter wrote:
> On 2/05/24 13:58, Adrian Hunter wrote:
>> JMPABS is a 64-bit absolute direct jump instruction, encoded with a mandatory
>> REX2 prefix. JMPABS is designed to be used in the procedure linkage table
>> (PLT) to replace indirect jumps, because it has better performance. In that
>> case the jump target will be amended at run time. To enable Intel PT to
>> follow the code, a TIP packet is always emitted when JMPABS is traced under
>> Intel PT.
>>
>> Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
>> Specification for details.
>>
>> Decode JMPABS as an indirect jump, because it has an associated TIP packet
>> the same as an indirect jump and the control flow should follow the TIP
>> packet payload, and not assume it is the same as the on-file object code
>> JMPABS target address.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> 
> Patches 1-8 are in perf-tools-next now, so this and patch 10
> could be applied.

They still apply.

> 
>> ---
>>  tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
>> index c5d57027ec23..4407130d91f8 100644
>> --- a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
>> +++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
>> @@ -92,6 +92,15 @@ static void intel_pt_insn_decoder(struct insn *insn,
>>  		op = INTEL_PT_OP_JCC;
>>  		branch = INTEL_PT_BR_CONDITIONAL;
>>  		break;
>> +	case 0xa1:
>> +		if (insn_is_rex2(insn)) { /* jmpabs */
>> +			intel_pt_insn->op = INTEL_PT_OP_JMP;
>> +			/* jmpabs causes a TIP packet like an indirect branch */
>> +			intel_pt_insn->branch = INTEL_PT_BR_INDIRECT;
>> +			intel_pt_insn->length = insn->length;
>> +			return;
>> +		}
>> +		break;
>>  	case 0xc2: /* near ret */
>>  	case 0xc3: /* near ret */
>>  	case 0xca: /* far ret */
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: (subset) [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions
  2024-05-02 10:58 [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions Adrian Hunter
                   ` (9 preceding siblings ...)
  2024-05-02 10:58 ` [PATCH 10/10] perf tests: Add APX and other new instructions to x86 instruction decoder test Adrian Hunter
@ 2024-06-26  3:59 ` Namhyung Kim
  10 siblings, 0 replies; 18+ messages in thread
From: Namhyung Kim @ 2024-06-26  3:59 UTC (permalink / raw)
  To: linux-kernel, Adrian Hunter
  Cc: Chang S. Bae, Masami Hiramatsu, Nikolay Borisov, Borislav Petkov,
	Ingo Molnar, H. Peter Anvin, Dave Hansen, Thomas Gleixner, x86,
	Arnaldo Carvalho de Melo, Jiri Olsa, Ian Rogers, linux-perf-users

On Thu, 02 May 2024 13:58:43 +0300, Adrian Hunter wrote:

> The x86 instruction decoder is used not only for decoding kernel
> instructions. It is also used by perf uprobes (user space probes) and by
> perf tools Intel Processor Trace decoding. Consequently, it needs to
> support instructions executed by user space also.
> 
> It should be noted that there are 2 copies of the instruction decoder.
> One for the kernel and one for tools, which is a policy to prevent
> changes from breaking builds.
> 
> [...]

Applied patches 9 and 10 to perf-tools-next, thanks!

Best regards,
Namhyung

^ permalink raw reply	[flat|nested] 18+ messages in thread
