* [Qemu-devel] [PATCH 0/5] tcg/i386: use movbe instruction in qemu_ldst routines
@ 2013-12-21 16:43 Aurelien Jarno
2013-12-21 16:43 ` [Qemu-devel] [PATCH 1/5] disas/i386.c: disassemble movbe instruction Aurelien Jarno
` (5 more replies)
0 siblings, 6 replies; 13+ messages in thread
From: Aurelien Jarno @ 2013-12-21 16:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
This patchset enables the use of the movbe instruction, available on
Intel Atom and Intel Haswell CPUs, in the qemu_ldst routines, avoiding bswap
instructions before or after loads and stores. The availability of
this instruction is detected at runtime using the cpuid instruction.
The last patch of the series is not fully related, but I spotted the
issue when working on this patchset, so I thought it's a good idea to
fix it.
Aurelien Jarno (5):
disas/i386.c: disassemble movbe instruction
tcg/i386: remove hardcoded P_REXW value
tcg/i386: add support for three-byte opcodes
tcg/i386: use movbe instruction in qemu_ldst routines
tcg/i386: cleanup useless #ifdef
disas/i386.c | 8 +--
tcg/i386/tcg-target.c | 178 ++++++++++++++++++++++++++++++++++---------------
2 files changed, 128 insertions(+), 58 deletions(-)
--
1.7.10.4
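As a rough illustration of what the series replaces (this is not code from the patches): on a little-endian x86 host, a big-endian guest load is a plain load followed by a bswap, and movbe fuses the two into one instruction. The helper names below are hypothetical, and `__builtin_bswap32` assumes GCC or Clang.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: what "mov + bswap" computes on a little-endian host. */
static uint32_t load_be32_mov_bswap(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));      /* plain load, host byte order */
    return __builtin_bswap32(v);   /* separate byte swap */
}

/* Hypothetical helper: the value a single movbe load would produce,
   written byte by byte so it does not depend on host endianness. */
static uint32_t load_be32_reference(const void *p)
{
    const uint8_t *b = p;
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

On a little-endian host both helpers return the same value for the same bytes; the first one models the two-instruction sequence that the series aims to avoid.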
* [Qemu-devel] [PATCH 1/5] disas/i386.c: disassemble movbe instruction
2013-12-21 16:43 [Qemu-devel] [PATCH 0/5] tcg/i386: use movbe instruction in qemu_ldst routines Aurelien Jarno
@ 2013-12-21 16:43 ` Aurelien Jarno
2013-12-22 16:43 ` Richard Henderson
2013-12-21 16:43 ` [Qemu-devel] [PATCH 2/5] tcg/i386: remove hardcoded P_REXW value Aurelien Jarno
` (4 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Aurelien Jarno @ 2013-12-21 16:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
disas/i386.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/disas/i386.c b/disas/i386.c
index 47f1f2e..044e02c 100644
--- a/disas/i386.c
+++ b/disas/i386.c
@@ -2632,17 +2632,17 @@ static const struct dis386 prefix_user_table[][4] = {
/* PREGRP87 */
{
+ { "movbe", { Gv, Ev } },
{ "(bad)", { XX } },
- { "(bad)", { XX } },
- { "(bad)", { XX } },
+ { "movbe", { Gv, Ev } },
{ "crc32", { Gdq, { CRC32_Fixup, b_mode } } },
},
/* PREGRP88 */
{
+ { "movbe", { Ev, Gv } },
{ "(bad)", { XX } },
- { "(bad)", { XX } },
- { "(bad)", { XX } },
+ { "movbe", { Ev, Gv } },
{ "crc32", { Gdq, { CRC32_Fixup, v_mode } } },
},
--
1.7.10.4
* [Qemu-devel] [PATCH 2/5] tcg/i386: remove hardcoded P_REXW value
2013-12-21 16:43 [Qemu-devel] [PATCH 0/5] tcg/i386: use movbe instruction in qemu_ldst routines Aurelien Jarno
2013-12-21 16:43 ` [Qemu-devel] [PATCH 1/5] disas/i386.c: disassemble movbe instruction Aurelien Jarno
@ 2013-12-21 16:43 ` Aurelien Jarno
2013-12-22 16:43 ` Richard Henderson
2013-12-21 16:43 ` [Qemu-devel] [PATCH 3/5] tcg/i386: add support for three-byte opcodes Aurelien Jarno
` (3 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Aurelien Jarno @ 2013-12-21 16:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
P_REXW is defined as a constant at the beginning of i386/tcg-target.c,
but the corresponding bit is later used in a hardcoded way, which defeats
the purpose of a constant.
Fix that by using a conditional expression operator instead of a shift.
On x86 this actually makes the code slightly smaller as GCC does in
practice (opc >> 8) & 8 instead of (opc & 0x800) >> 8 so the constants
are smaller to load.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/i386/tcg-target.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 495b901..753b3a1 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -381,7 +381,7 @@ static void tcg_out_opc(TCGContext *s, int opc, int r, int rm, int x)
}
rex = 0;
- rex |= (opc & P_REXW) >> 8; /* REX.W */
+ rex |= (opc & P_REXW) ? 0x8 : 0x0; /* REX.W */
rex |= (r & 8) >> 1; /* REX.R */
rex |= (x & 8) >> 2; /* REX.X */
rex |= (rm & 8) >> 3; /* REX.B */
--
1.7.10.4
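A small sketch of why the shift was fragile, assuming only the flag values quoted in this series (0x800 before patch 3, 0x1000 after it). The function names are hypothetical and the flag is passed as a parameter so both layouts can be compared.

```c
/* Old form: only correct while the flag sits exactly 8 bits above REX.W. */
static unsigned rex_w_by_shift(unsigned opc, unsigned p_rexw)
{
    return (opc & p_rexw) >> 8;
}

/* New form: independent of where the flag lives in opc. */
static unsigned rex_w_by_test(unsigned opc, unsigned p_rexw)
{
    return (opc & p_rexw) ? 0x8 : 0x0;
}
```

With the flag at 0x800 both forms yield 0x8. Once the flag moves to 0x1000, the shift yields 0x10, which is not the REX.W bit, while the conditional form still yields 0x8.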
* [Qemu-devel] [PATCH 3/5] tcg/i386: add support for three-byte opcodes
2013-12-21 16:43 [Qemu-devel] [PATCH 0/5] tcg/i386: use movbe instruction in qemu_ldst routines Aurelien Jarno
2013-12-21 16:43 ` [Qemu-devel] [PATCH 1/5] disas/i386.c: disassemble movbe instruction Aurelien Jarno
2013-12-21 16:43 ` [Qemu-devel] [PATCH 2/5] tcg/i386: remove hardcoded P_REXW value Aurelien Jarno
@ 2013-12-21 16:43 ` Aurelien Jarno
2013-12-22 16:46 ` Richard Henderson
2013-12-21 16:43 ` [Qemu-devel] [PATCH 4/5] tcg/i386: use movbe instruction in qemu_ldst routines Aurelien Jarno
` (2 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Aurelien Jarno @ 2013-12-21 16:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
Add support for three-byte opcodes, starting with the 0x0f 0x38 prefix.
Use P_EXT2 as the new constant, and shift all other constants so that
P_EXT and P_EXT2 have neighbouring values.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/i386/tcg-target.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 753b3a1..e247829 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -240,13 +240,14 @@ static inline int tcg_target_const_match(tcg_target_long val,
#endif
#define P_EXT 0x100 /* 0x0f opcode prefix */
-#define P_DATA16 0x200 /* 0x66 opcode prefix */
+#define P_EXT2 0x200 /* 0x0f 0x38 opcode prefix */
+#define P_DATA16 0x400 /* 0x66 opcode prefix */
#if TCG_TARGET_REG_BITS == 64
-# define P_ADDR32 0x400 /* 0x67 opcode prefix */
-# define P_REXW 0x800 /* Set REX.W = 1 */
-# define P_REXB_R 0x1000 /* REG field as byte register */
-# define P_REXB_RM 0x2000 /* R/M field as byte register */
-# define P_GS 0x4000 /* gs segment override */
+# define P_ADDR32 0x800 /* 0x67 opcode prefix */
+# define P_REXW 0x1000 /* Set REX.W = 1 */
+# define P_REXB_R 0x2000 /* REG field as byte register */
+# define P_REXB_RM 0x4000 /* R/M field as byte register */
+# define P_GS 0x8000 /* gs segment override */
#else
# define P_ADDR32 0
# define P_REXW 0
@@ -401,6 +402,11 @@ static void tcg_out_opc(TCGContext *s, int opc, int r, int rm, int x)
if (opc & P_EXT) {
tcg_out8(s, 0x0f);
}
+
+ if (opc & P_EXT2) {
+ tcg_out8(s, 0x0f);
+ tcg_out8(s, 0x38);
+ }
tcg_out8(s, opc);
}
#else
@@ -412,6 +418,10 @@ static void tcg_out_opc(TCGContext *s, int opc)
if (opc & P_EXT) {
tcg_out8(s, 0x0f);
}
+ if (opc & P_EXT2) {
+ tcg_out8(s, 0x0f);
+ tcg_out8(s, 0x38);
+ }
tcg_out8(s, opc);
}
/* Discard the register arguments to tcg_out_opc early, so as not to penalize
--
1.7.10.4
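The prefix handling added by this patch can be sketched in isolation as follows. This is a simplified stand-in for tcg_out_opc that writes into a caller-supplied buffer and ignores the REX, 0x66, 0x67 and segment prefixes; `emit_opc` is a hypothetical name, and only the P_EXT and P_EXT2 values are taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define P_EXT   0x100   /* 0x0f opcode prefix */
#define P_EXT2  0x200   /* 0x0f 0x38 opcode prefix */

/* Write the escape bytes selected by the flags, then the low opcode byte.
   Returns the number of bytes written (at most 3 here). */
static size_t emit_opc(uint8_t *buf, int opc)
{
    size_t n = 0;

    if (opc & P_EXT) {
        buf[n++] = 0x0f;
    }
    if (opc & P_EXT2) {
        buf[n++] = 0x0f;
        buf[n++] = 0x38;
    }
    buf[n++] = (uint8_t)opc;
    return n;
}
```

For example, `emit_opc(buf, 0xf0 | P_EXT2)` writes the three bytes 0x0f 0x38 0xf0, which is the opcode part of a movbe load before the ModRM byte.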
* [Qemu-devel] [PATCH 4/5] tcg/i386: use movbe instruction in qemu_ldst routines
2013-12-21 16:43 [Qemu-devel] [PATCH 0/5] tcg/i386: use movbe instruction in qemu_ldst routines Aurelien Jarno
` (2 preceding siblings ...)
2013-12-21 16:43 ` [Qemu-devel] [PATCH 3/5] tcg/i386: add support for three-byte opcodes Aurelien Jarno
@ 2013-12-21 16:43 ` Aurelien Jarno
2013-12-22 16:52 ` Richard Henderson
2013-12-21 16:43 ` [Qemu-devel] [PATCH 5/5] tcg/i386: cleanup useless #ifdef Aurelien Jarno
2013-12-22 11:24 ` [Qemu-devel] [PATCH 0/5] tcg/i386: use movbe instruction in qemu_ldst routines Aurelien Jarno
5 siblings, 1 reply; 13+ messages in thread
From: Aurelien Jarno @ 2013-12-21 16:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
The movbe instruction has been added on some Intel Atom CPUs and on
recent Intel Haswell CPUs. It loads or stores a value and byte-swaps it
at the same time.
This patch detects the availability of this instruction and, when available,
uses it in the qemu load/store routines in place of load/store +
bswap. Note that for 16-bit unsigned loads, movbe + movzw is basically the
same as movzw + bswap, so the patch doesn't touch this case.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/i386/tcg-target.c | 152 ++++++++++++++++++++++++++++++++++---------------
1 file changed, 107 insertions(+), 45 deletions(-)
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index e247829..8fbb0be 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -99,18 +99,31 @@ static const int tcg_target_call_oarg_regs[] = {
# define TCG_REG_L1 TCG_REG_EDX
#endif
+/* The host compiler should supply <cpuid.h> to enable runtime features
+ detection, as we're not going to go so far as our own inline assembly.
+ If not available, default values will be assumed. */
+#if defined(CONFIG_CPUID_H)
+#include <cpuid.h>
+#endif
+
/* For 32-bit, we are going to attempt to determine at runtime whether cmov
- is available. However, the host compiler must supply <cpuid.h>, as we're
- not going to go so far as our own inline assembly. */
+ is available. */
#if TCG_TARGET_REG_BITS == 64
# define have_cmov 1
#elif defined(CONFIG_CPUID_H)
-#include <cpuid.h>
static bool have_cmov;
#else
# define have_cmov 0
#endif
+/* If bit_MOVBE is defined in cpuid.h (added in GCC version 4.6), we are
+ going to attempt to determine at runtime whether movbe is available. */
+#if defined(CONFIG_CPUID_H) && defined(bit_MOVBE)
+static bool have_movbe;
+#else
+# define have_movbe 0
+#endif
+
static uint8_t *tb_ret_addr;
static void patch_reloc(uint8_t *code_ptr, int type,
@@ -280,6 +293,8 @@ static inline int tcg_target_const_match(tcg_target_long val,
#define OPC_MOVB_EvIz (0xc6)
#define OPC_MOVL_EvIz (0xc7)
#define OPC_MOVL_Iv (0xb8)
+#define OPC_MOVBE_GyMy (0xf0 | P_EXT2)
+#define OPC_MOVBE_MyGy (0xf1 | P_EXT2)
#define OPC_MOVSBL (0xbe | P_EXT)
#define OPC_MOVSWL (0xbf | P_EXT)
#define OPC_MOVSLQ (0x63 | P_REXW)
@@ -1363,8 +1378,13 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
break;
case MO_SW:
if (bswap) {
- tcg_out_modrm_offset(s, OPC_MOVZWL + seg, datalo, base, ofs);
- tcg_out_rolw_8(s, datalo);
+ if (have_movbe) {
+ tcg_out_modrm_offset(s, OPC_MOVBE_GyMy + P_DATA16 + seg,
+ datalo, base, ofs);
+ } else {
+ tcg_out_modrm_offset(s, OPC_MOVZWL + seg, datalo, base, ofs);
+ tcg_out_rolw_8(s, datalo);
+ }
tcg_out_modrm(s, OPC_MOVSWL + P_REXW, datalo, datalo);
} else {
tcg_out_modrm_offset(s, OPC_MOVSWL + P_REXW + seg,
@@ -1372,16 +1392,25 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
}
break;
case MO_UL:
- tcg_out_modrm_offset(s, OPC_MOVL_GvEv + seg, datalo, base, ofs);
- if (bswap) {
- tcg_out_bswap32(s, datalo);
+ if (bswap && have_movbe) {
+ tcg_out_modrm_offset(s, OPC_MOVBE_GyMy + seg, datalo, base, ofs);
+ } else {
+ tcg_out_modrm_offset(s, OPC_MOVL_GvEv + seg, datalo, base, ofs);
+ if (bswap) {
+ tcg_out_bswap32(s, datalo);
+ }
}
break;
#if TCG_TARGET_REG_BITS == 64
case MO_SL:
if (bswap) {
- tcg_out_modrm_offset(s, OPC_MOVL_GvEv + seg, datalo, base, ofs);
- tcg_out_bswap32(s, datalo);
+ if (have_movbe) {
+ tcg_out_modrm_offset(s, OPC_MOVBE_GyMy + seg,
+ datalo, base, ofs);
+ } else {
+ tcg_out_modrm_offset(s, OPC_MOVL_GvEv + seg, datalo, base, ofs);
+ tcg_out_bswap32(s, datalo);
+ }
tcg_out_ext32s(s, datalo, datalo);
} else {
tcg_out_modrm_offset(s, OPC_MOVSLQ + seg, datalo, base, ofs);
@@ -1390,29 +1419,34 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
#endif
case MO_Q:
if (TCG_TARGET_REG_BITS == 64) {
- tcg_out_modrm_offset(s, OPC_MOVL_GvEv + P_REXW + seg,
- datalo, base, ofs);
- if (bswap) {
- tcg_out_bswap64(s, datalo);
+ if (bswap && have_movbe) {
+ tcg_out_modrm_offset(s, OPC_MOVBE_GyMy + P_REXW + seg,
+ datalo, base, ofs);
+ } else {
+ tcg_out_modrm_offset(s, OPC_MOVL_GvEv + P_REXW + seg,
+ datalo, base, ofs);
+ if (bswap) {
+ tcg_out_bswap64(s, datalo);
+ }
}
} else {
+ int opc = OPC_MOVL_GvEv;
if (bswap) {
int t = datalo;
datalo = datahi;
datahi = t;
+ if (have_movbe) {
+ opc = OPC_MOVBE_GyMy;
+ }
}
if (base != datalo) {
- tcg_out_modrm_offset(s, OPC_MOVL_GvEv + seg,
- datalo, base, ofs);
- tcg_out_modrm_offset(s, OPC_MOVL_GvEv + seg,
- datahi, base, ofs + 4);
+ tcg_out_modrm_offset(s, opc + seg, datalo, base, ofs);
+ tcg_out_modrm_offset(s, opc + seg, datahi, base, ofs + 4);
} else {
- tcg_out_modrm_offset(s, OPC_MOVL_GvEv + seg,
- datahi, base, ofs + 4);
- tcg_out_modrm_offset(s, OPC_MOVL_GvEv + seg,
- datalo, base, ofs);
+ tcg_out_modrm_offset(s, opc + seg, datahi, base, ofs + 4);
+ tcg_out_modrm_offset(s, opc + seg, datalo, base, ofs);
}
- if (bswap) {
+ if (bswap && opc != OPC_MOVBE_GyMy) {
tcg_out_bswap32(s, datalo);
tcg_out_bswap32(s, datahi);
}
@@ -1506,31 +1540,48 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
datalo, base, ofs);
break;
case MO_16:
- if (bswap) {
- tcg_out_mov(s, TCG_TYPE_I32, scratch, datalo);
- tcg_out_rolw_8(s, scratch);
- datalo = scratch;
+ if (bswap && have_movbe) {
+ tcg_out_modrm_offset(s, OPC_MOVBE_MyGy + P_DATA16 + seg,
+ datalo, base, ofs);
+ } else {
+ if (bswap) {
+ tcg_out_mov(s, TCG_TYPE_I32, scratch, datalo);
+ tcg_out_rolw_8(s, scratch);
+ datalo = scratch;
+ }
+ tcg_out_modrm_offset(s, OPC_MOVL_EvGv + P_DATA16 + seg,
+ datalo, base, ofs);
}
- tcg_out_modrm_offset(s, OPC_MOVL_EvGv + P_DATA16 + seg,
- datalo, base, ofs);
break;
case MO_32:
- if (bswap) {
- tcg_out_mov(s, TCG_TYPE_I32, scratch, datalo);
- tcg_out_bswap32(s, scratch);
- datalo = scratch;
+ if (bswap && have_movbe) {
+ tcg_out_modrm_offset(s, OPC_MOVBE_MyGy + seg, datalo, base, ofs);
+ } else {
+ if (bswap) {
+ tcg_out_mov(s, TCG_TYPE_I32, scratch, datalo);
+ tcg_out_bswap32(s, scratch);
+ datalo = scratch;
+ }
+ tcg_out_modrm_offset(s, OPC_MOVL_EvGv + seg, datalo, base, ofs);
}
- tcg_out_modrm_offset(s, OPC_MOVL_EvGv + seg, datalo, base, ofs);
break;
case MO_64:
if (TCG_TARGET_REG_BITS == 64) {
- if (bswap) {
- tcg_out_mov(s, TCG_TYPE_I64, scratch, datalo);
- tcg_out_bswap64(s, scratch);
- datalo = scratch;
+ if (bswap && have_movbe) {
+ tcg_out_modrm_offset(s, OPC_MOVBE_MyGy + P_REXW + seg,
+ datalo, base, ofs);
+ } else {
+ if (bswap) {
+ tcg_out_mov(s, TCG_TYPE_I64, scratch, datalo);
+ tcg_out_bswap64(s, scratch);
+ datalo = scratch;
+ }
+ tcg_out_modrm_offset(s, OPC_MOVL_EvGv + P_REXW + seg,
+ datalo, base, ofs);
}
- tcg_out_modrm_offset(s, OPC_MOVL_EvGv + P_REXW + seg,
- datalo, base, ofs);
+ } else if (bswap && have_movbe) {
+ tcg_out_modrm_offset(s, OPC_MOVBE_MyGy + seg, datahi, base, ofs);
+ tcg_out_modrm_offset(s, OPC_MOVBE_MyGy + seg, datalo, base, ofs+4);
} else if (bswap) {
tcg_out_mov(s, TCG_TYPE_I32, scratch, datahi);
tcg_out_bswap32(s, scratch);
@@ -2167,13 +2218,24 @@ static void tcg_target_qemu_prologue(TCGContext *s)
static void tcg_target_init(TCGContext *s)
{
- /* For 32-bit, 99% certainty that we're running on hardware that supports
- cmov, but we still need to check. In case cmov is not available, we'll
- use a small forward branch. */
-#ifndef have_cmov
+#if !(defined(have_cmov) && defined(have_movbe))
{
unsigned a, b, c, d;
- have_cmov = (__get_cpuid(1, &a, &b, &c, &d) && (d & bit_CMOV));
+ int ret;
+ ret = __get_cpuid(1, &a, &b, &c, &d);
+
+# ifndef have_cmov
+ /* For 32-bit, 99% certainty that we're running on hardware that
+ supports cmov, but we still need to check. In case cmov is not
+ available, we'll use a small forward branch. */
+ have_cmov = ret && (d & bit_CMOV);
+# endif
+
+# ifndef have_movbe
+ /* MOVBE is only available on Intel Atom and Haswell CPUs, so we
+ need to probe for it. */
+ have_movbe = ret && (c & bit_MOVBE);
+# endif
}
#endif
--
1.7.10.4
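The runtime probe can be tried outside QEMU with a standalone sketch like the one below. It assumes a GCC or Clang host that ships <cpuid.h>; `detect_movbe` and `SKETCH_BIT_MOVBE` are hypothetical names, and the bit position (CPUID leaf 1, ECX bit 22) is spelled out locally instead of relying on bit_MOVBE being defined by the header.

```c
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__)
# include <cpuid.h>
#endif

/* MOVBE is reported in CPUID leaf 1, ECX bit 22. */
#define SKETCH_BIT_MOVBE (1u << 22)

/* Returns true when the host CPU reports MOVBE, false otherwise
   (including on non-x86 hosts, where the probe is skipped). */
static bool detect_movbe(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned a, b, c, d;

    if (!__get_cpuid(1, &a, &b, &c, &d)) {
        return false;   /* leaf 1 not supported */
    }
    return (c & SKETCH_BIT_MOVBE) != 0;
#else
    return false;
#endif
}
```

Unlike the patch, this sketch does not fold the result into a compile-time constant when the header or the bit definition is missing; it only shows the probe itself.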
* [Qemu-devel] [PATCH 5/5] tcg/i386: cleanup useless #ifdef
2013-12-21 16:43 [Qemu-devel] [PATCH 0/5] tcg/i386: use movbe instruction in qemu_ldst routines Aurelien Jarno
` (3 preceding siblings ...)
2013-12-21 16:43 ` [Qemu-devel] [PATCH 4/5] tcg/i386: use movbe instruction in qemu_ldst routines Aurelien Jarno
@ 2013-12-21 16:43 ` Aurelien Jarno
2013-12-22 16:44 ` Richard Henderson
2013-12-22 11:24 ` [Qemu-devel] [PATCH 0/5] tcg/i386: use movbe instruction in qemu_ldst routines Aurelien Jarno
5 siblings, 1 reply; 13+ messages in thread
From: Aurelien Jarno @ 2013-12-21 16:43 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
TCG_TARGET_HAS_movcond_i32 is always defined to 1 in tcg-target.h, so
remove the corresponding #ifdef #endif sequence, left from a previous
refactoring.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/i386/tcg-target.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 8fbb0be..80d2fa3 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -2046,9 +2046,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
{ INDEX_op_setcond_i32, { "q", "r", "ri" } },
{ INDEX_op_deposit_i32, { "Q", "0", "Q" } },
-#if TCG_TARGET_HAS_movcond_i32
{ INDEX_op_movcond_i32, { "r", "r", "ri", "r", "0" } },
-#endif
{ INDEX_op_mulu2_i32, { "a", "d", "a", "r" } },
{ INDEX_op_muls2_i32, { "a", "d", "a", "r" } },
--
1.7.10.4
* Re: [Qemu-devel] [PATCH 0/5] tcg/i386: use movbe instruction in qemu_ldst routines
2013-12-21 16:43 [Qemu-devel] [PATCH 0/5] tcg/i386: use movbe instruction in qemu_ldst routines Aurelien Jarno
` (4 preceding siblings ...)
2013-12-21 16:43 ` [Qemu-devel] [PATCH 5/5] tcg/i386: cleanup useless #ifdef Aurelien Jarno
@ 2013-12-22 11:24 ` Aurelien Jarno
2013-12-22 11:47 ` Aurelien Jarno
5 siblings, 1 reply; 13+ messages in thread
From: Aurelien Jarno @ 2013-12-22 11:24 UTC (permalink / raw)
To: qemu-devel; +Cc: Richard Henderson
I forgot to Cc: Richard on this patch set, doing that now...
On Sat, Dec 21, 2013 at 05:43:39PM +0100, Aurelien Jarno wrote:
> This patchset enables the use of the movbe instruction, available on
> Intel Atom and Intel Haswell CPUs, in the qemu_ldst routines, avoiding bswap
> instructions before or after loads and stores. The availability of
> this instruction is detected at runtime using the cpuid instruction.
>
> The last patch of the series is not fully related, but I spotted the
> issue when working on this patchset, so I thought it's a good idea to
> fix it.
>
> Aurelien Jarno (5):
> disas/i386.c: disassemble movbe instruction
> tcg/i386: remove hardcoded P_REXW value
> tcg/i386: add support for three-byte opcodes
> tcg/i386: use movbe instruction in qemu_ldst routines
> tcg/i386: cleanup useless #ifdef
>
> disas/i386.c | 8 +--
> tcg/i386/tcg-target.c | 178 ++++++++++++++++++++++++++++++++++---------------
> 2 files changed, 128 insertions(+), 58 deletions(-)
>
> --
> 1.7.10.4
>
>
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH 0/5] tcg/i386: use movbe instruction in qemu_ldst routines
2013-12-22 11:24 ` [Qemu-devel] [PATCH 0/5] tcg/i386: use movbe instruction in qemu_ldst routines Aurelien Jarno
@ 2013-12-22 11:47 ` Aurelien Jarno
0 siblings, 0 replies; 13+ messages in thread
From: Aurelien Jarno @ 2013-12-22 11:47 UTC (permalink / raw)
To: qemu-devel; +Cc: Richard Henderson
And now I just realized you sent such a patch before me. I am going to
review yours then.
On Sun, Dec 22, 2013 at 12:24:38PM +0100, Aurelien Jarno wrote:
> I forgot to Cc: Richard on this patch set, doing that now...
>
> On Sat, Dec 21, 2013 at 05:43:39PM +0100, Aurelien Jarno wrote:
> > This patchset enables the use of the movbe instruction, available on
> > Intel Atom and Intel Haswell CPUs, in the qemu_ldst routines, avoiding bswap
> > instructions before or after loads and stores. The availability of
> > this instruction is detected at runtime using the cpuid instruction.
> >
> > The last patch of the series is not fully related, but I spotted the
> > issue when working on this patchset, so I thought it's a good idea to
> > fix it.
> >
> > Aurelien Jarno (5):
> > disas/i386.c: disassemble movbe instruction
> > tcg/i386: remove hardcoded P_REXW value
> > tcg/i386: add support for three-byte opcodes
> > tcg/i386: use movbe instruction in qemu_ldst routines
> > tcg/i386: cleanup useless #ifdef
> >
> > disas/i386.c | 8 +--
> > tcg/i386/tcg-target.c | 178 ++++++++++++++++++++++++++++++++++---------------
> > 2 files changed, 128 insertions(+), 58 deletions(-)
> >
> > --
> > 1.7.10.4
> >
> >
>
> --
> Aurelien Jarno GPG: 1024D/F1BCDB73
> aurelien@aurel32.net http://www.aurel32.net
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH 1/5] disas/i386.c: disassemble movbe instruction
2013-12-21 16:43 ` [Qemu-devel] [PATCH 1/5] disas/i386.c: disassemble movbe instruction Aurelien Jarno
@ 2013-12-22 16:43 ` Richard Henderson
0 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2013-12-22 16:43 UTC (permalink / raw)
To: Aurelien Jarno, qemu-devel
On 12/21/2013 08:43 AM, Aurelien Jarno wrote:
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> disas/i386.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH 2/5] tcg/i386: remove hardcoded P_REXW value
2013-12-21 16:43 ` [Qemu-devel] [PATCH 2/5] tcg/i386: remove hardcoded P_REXW value Aurelien Jarno
@ 2013-12-22 16:43 ` Richard Henderson
0 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2013-12-22 16:43 UTC (permalink / raw)
To: Aurelien Jarno, qemu-devel
On 12/21/2013 08:43 AM, Aurelien Jarno wrote:
> P_REXW is defined as a constant at the beginning of i386/tcg-target.c,
> but the corresponding bit is later used in a hardcoded way, which defeats
> the purpose of a constant.
>
> Fix that by using a conditional expression operator instead of a shift.
> On x86 this actually makes the code slightly smaller as GCC does in
> practice (opc >> 8) & 8 instead of (opc & 0x800) >> 8 so the constants
> are smaller to load.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> tcg/i386/tcg-target.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH 5/5] tcg/i386: cleanup useless #ifdef
2013-12-21 16:43 ` [Qemu-devel] [PATCH 5/5] tcg/i386: cleanup useless #ifdef Aurelien Jarno
@ 2013-12-22 16:44 ` Richard Henderson
0 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2013-12-22 16:44 UTC (permalink / raw)
To: Aurelien Jarno, qemu-devel
On 12/21/2013 08:43 AM, Aurelien Jarno wrote:
> TCG_TARGET_HAS_movcond_i32 is always defined to 1 in tcg-target.h, so
> remove the corresponding #ifdef #endif sequence, left from a previous
> refactoring.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> tcg/i386/tcg-target.c | 2 --
> 1 file changed, 2 deletions(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH 3/5] tcg/i386: add support for three-byte opcodes
2013-12-21 16:43 ` [Qemu-devel] [PATCH 3/5] tcg/i386: add support for three-byte opcodes Aurelien Jarno
@ 2013-12-22 16:46 ` Richard Henderson
0 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2013-12-22 16:46 UTC (permalink / raw)
To: Aurelien Jarno, qemu-devel
On 12/21/2013 08:43 AM, Aurelien Jarno wrote:
> +#define P_EXT2 0x200 /* 0x0f 0x38 opcode prefix */
I'm not keen on the name. It's not like the different extensions are numbered.
r~
* Re: [Qemu-devel] [PATCH 4/5] tcg/i386: use movbe instruction in qemu_ldst routines
2013-12-21 16:43 ` [Qemu-devel] [PATCH 4/5] tcg/i386: use movbe instruction in qemu_ldst routines Aurelien Jarno
@ 2013-12-22 16:52 ` Richard Henderson
0 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2013-12-22 16:52 UTC (permalink / raw)
To: Aurelien Jarno, qemu-devel
On 12/21/2013 08:43 AM, Aurelien Jarno wrote:
> +/* If bit_MOVBE is defined in cpuid.h (added in GCC version 4.6), we are
> + going to attempt to determine at runtime whether movbe is available. */
> +#if defined(CONFIG_CPUID_H) && defined(bit_MOVBE)
> +static bool have_movbe;
> +#else
> +# define have_movbe 0
> +#endif
> +
Good point about checking bit_MOVBE, I missed that in my version.
I do slightly prefer hoisting the mov opcode, as I do in my version.
I think that tidies the 32 and 64-bit paths a bit. Nothing can really help
adding extra conditionals to the 16-bit load paths though.
r~