* [Qemu-devel] [PATCH 00/10] target-i386: fix and enable SSE4.1 and SSE4.2
@ 2013-03-26 19:01 Aurelien Jarno
2013-03-26 19:01 ` [Qemu-devel] [PATCH 01/10] target-i386: SSE4.1: fix pinsrb instruction Aurelien Jarno
` (9 more replies)
0 siblings, 10 replies; 21+ messages in thread
From: Aurelien Jarno @ 2013-03-26 19:01 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
The SSE4.1 and SSE4.2 instruction sets are partly broken, at least enough
to render a recent glibc with ifunc enabled unusable.
This patch series fixes the issues. It has been tested with the valgrind
testsuite in user mode and by booting x86 and x86-64 guests with a
recent glibc in system mode.
Aurelien Jarno (10):
target-i386: SSE4.1: fix pinsrb instruction
target-i386: SSE4.2: fix pcmpgtq instruction
target-i386: SSE4.2: fix pcmpXstri instructions
target-i386: SSE4.2: fix pcmpXstrm instructions
target-i386: SSE4.2: fix pcmpXstrX instructions in "Ranges" mode
target-i386: SSE4.2: fix pcmpXstrX instructions in "Equal each" mode
target-i386: SSE4.2: fix pcmpXstrX instructions in "Equal ordered" mode
target-i386: SSE4.2: fix pcmpXstrX instructions with "Masked(-)" polarity
target-i386: enable SSE4.1 and SSE4.2 in TCG mode
target-i386: SSE4.2: use clz32/ctz32 instead of reinventing the wheel
target-i386/cpu.c | 13 +++++-----
target-i386/fpu_helper.c | 1 +
target-i386/ops_sse.h | 64 +++++++++++++---------------------------------
target-i386/translate.c | 4 +--
4 files changed, 28 insertions(+), 54 deletions(-)
--
1.7.10.4
* [Qemu-devel] [PATCH 01/10] target-i386: SSE4.1: fix pinsrb instruction
2013-03-26 19:01 [Qemu-devel] [PATCH 00/10] target-i386: fix and enable SSE4.1 and SSE4.2 Aurelien Jarno
@ 2013-03-26 19:01 ` Aurelien Jarno
2013-03-27 20:03 ` Richard Henderson
2013-03-26 19:01 ` [Qemu-devel] [PATCH 02/10] target-i386: SSE4.2: fix pcmpgtq instruction Aurelien Jarno
` (8 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Aurelien Jarno @ 2013-03-26 19:01 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
gen_op_mov_TN_reg() loads the value in cpu_T[0], so this temporary should
be used instead of cpu_tmp0.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
target-i386/translate.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 7239696..7596a90 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -4404,9 +4404,9 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
if (mod == 3)
gen_op_mov_TN_reg(OT_LONG, 0, rm);
else
- tcg_gen_qemu_ld8u(cpu_tmp0, cpu_A0,
+ tcg_gen_qemu_ld8u(cpu_T[0], cpu_A0,
(s->mem_index >> 2) - 1);
- tcg_gen_st8_tl(cpu_tmp0, cpu_env, offsetof(CPUX86State,
+ tcg_gen_st8_tl(cpu_T[0], cpu_env, offsetof(CPUX86State,
xmm_regs[reg].XMM_B(val & 15)));
break;
case 0x21: /* insertps */
--
1.7.10.4
* [Qemu-devel] [PATCH 02/10] target-i386: SSE4.2: fix pcmpgtq instruction
2013-03-26 19:01 [Qemu-devel] [PATCH 00/10] target-i386: fix and enable SSE4.1 and SSE4.2 Aurelien Jarno
2013-03-26 19:01 ` [Qemu-devel] [PATCH 01/10] target-i386: SSE4.1: fix pinsrb instruction Aurelien Jarno
@ 2013-03-26 19:01 ` Aurelien Jarno
2013-03-27 20:03 ` Richard Henderson
2013-03-26 19:01 ` [Qemu-devel] [PATCH 03/10] target-i386: SSE4.2: fix pcmpXstri instructions Aurelien Jarno
` (7 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Aurelien Jarno @ 2013-03-26 19:01 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
The "Intel 64 and IA-32 Architectures Software Developer's Manual" (at
least recent versions) clearly says that the comparison is signed.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
target-i386/ops_sse.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/target-i386/ops_sse.h b/target-i386/ops_sse.h
index cad9d75..0136df9 100644
--- a/target-i386/ops_sse.h
+++ b/target-i386/ops_sse.h
@@ -1933,8 +1933,7 @@ void glue(helper_mpsadbw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
}
/* SSE4.2 op helpers */
-/* it's unclear whether signed or unsigned */
-#define FCMPGTQ(d, s) (d > s ? -1 : 0)
+#define FCMPGTQ(d, s) ((int64_t)d > (int64_t)s ? -1 : 0)
SSE_HELPER_Q(helper_pcmpgtq, FCMPGTQ)
static inline int pcmp_elen(CPUX86State *env, int reg, uint32_t ctrl)
--
1.7.10.4
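A minimal sketch of why the signedness matters, assuming the quadword
lanes are held as uint64_t (the function names here are made up for
illustration and are not part of QEMU). A lane with the top bit set is
"large" when compared unsigned but negative when compared signed:

```c
#include <assert.h>
#include <stdint.h>

/* Comparison on the raw unsigned lanes, as the macro behaved before the patch. */
static uint64_t cmpgtq_unsigned(uint64_t d, uint64_t s)
{
    return d > s ? UINT64_MAX : 0;
}

/* Signed comparison, matching the casts added to FCMPGTQ. */
static uint64_t cmpgtq_signed(uint64_t d, uint64_t s)
{
    return (int64_t)d > (int64_t)s ? UINT64_MAX : 0;
}
```

With d = 0xffffffffffffffff (-1 as a signed value) and s = 1, the unsigned
variant reports "greater" while the signed one does not.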
* [Qemu-devel] [PATCH 03/10] target-i386: SSE4.2: fix pcmpXstri instructions
2013-03-26 19:01 [Qemu-devel] [PATCH 00/10] target-i386: fix and enable SSE4.1 and SSE4.2 Aurelien Jarno
2013-03-26 19:01 ` [Qemu-devel] [PATCH 01/10] target-i386: SSE4.1: fix pinsrb instruction Aurelien Jarno
2013-03-26 19:01 ` [Qemu-devel] [PATCH 02/10] target-i386: SSE4.2: fix pcmpgtq instruction Aurelien Jarno
@ 2013-03-26 19:01 ` Aurelien Jarno
2013-03-27 20:10 ` Richard Henderson
2013-03-26 19:01 ` [Qemu-devel] [PATCH 04/10] target-i386: SSE4.2: fix pcmpXstrm instructions Aurelien Jarno
` (6 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Aurelien Jarno @ 2013-03-26 19:01 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
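ffs1 returns the position of the lowest bit set to one, counted from the
most significant bit, so ffs1(res) - 1 is not a valid index.
When bit 6 of the control byte is clear, pcmpXstri must return the index
of the least significant bit set to one, counted from the least
significant bit, which is 32 - ffs1(res).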
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
target-i386/ops_sse.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/target-i386/ops_sse.h b/target-i386/ops_sse.h
index 0136df9..0667c87 100644
--- a/target-i386/ops_sse.h
+++ b/target-i386/ops_sse.h
@@ -2099,7 +2099,7 @@ void glue(helper_pcmpestri, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
pcmp_elen(env, R_EAX, ctrl));
if (res) {
- env->regs[R_ECX] = ((ctrl & (1 << 6)) ? rffs1 : ffs1)(res) - 1;
+ env->regs[R_ECX] = (ctrl & (1 << 6)) ? rffs1(res) - 1 : 32 - ffs1(res);
} else {
env->regs[R_ECX] = 16 >> (ctrl & (1 << 0));
}
@@ -2137,7 +2137,7 @@ void glue(helper_pcmpistri, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
pcmp_ilen(d, ctrl));
if (res) {
- env->regs[R_ECX] = ((ctrl & (1 << 6)) ? rffs1 : ffs1)(res) - 1;
+ env->regs[R_ECX] = (ctrl & (1 << 6)) ? rffs1(res) - 1 : 32 - ffs1(res);
} else {
env->regs[R_ECX] = 16 >> (ctrl & (1 << 0));
}
--
1.7.10.4
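The index selection can be sketched with plain loops. This is only an
illustration under the assumption that res is the comparison bit mask and
ctrl is the immediate byte; lsb_index, msb_index and pcmpxstri_index are
hypothetical names, not QEMU functions:

```c
#include <assert.h>
#include <stdint.h>

/* Index of the least significant set bit; res must be nonzero. */
static int lsb_index(uint32_t res)
{
    int i = 0;

    assert(res != 0);
    while (!(res & 1)) {
        res >>= 1;
        i++;
    }
    return i;
}

/* Index of the most significant set bit; res must be nonzero. */
static int msb_index(uint32_t res)
{
    int i = 0;

    assert(res != 0);
    while (res >>= 1) {
        i++;
    }
    return i;
}

/* Value written to ECX: bit 6 of ctrl selects the most significant match. */
static int pcmpxstri_index(uint32_t res, uint32_t ctrl)
{
    if (!res) {
        /* No match: 16 for byte elements, 8 for word elements. */
        return 16 >> (ctrl & 1);
    }
    return (ctrl & (1u << 6)) ? msb_index(res) : lsb_index(res);
}
```

For res = 0x48 (bits 3 and 6 set) the result is 3 with bit 6 of ctrl clear
and 6 with it set.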
* [Qemu-devel] [PATCH 04/10] target-i386: SSE4.2: fix pcmpXstrm instructions
2013-03-26 19:01 [Qemu-devel] [PATCH 00/10] target-i386: fix and enable SSE4.1 and SSE4.2 Aurelien Jarno
` (2 preceding siblings ...)
2013-03-26 19:01 ` [Qemu-devel] [PATCH 03/10] target-i386: SSE4.2: fix pcmpXstri instructions Aurelien Jarno
@ 2013-03-26 19:01 ` Aurelien Jarno
2013-03-27 20:10 ` Richard Henderson
2013-03-26 19:01 ` [Qemu-devel] [PATCH 05/10] target-i386: SSE4.2: fix pcmpXstrX instructions in "Ranges" mode Aurelien Jarno
` (5 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Aurelien Jarno @ 2013-03-26 19:01 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
The pcmpXstrm instructions return their result in the XMM0 register, not
in the first operand.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
target-i386/ops_sse.h | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/target-i386/ops_sse.h b/target-i386/ops_sse.h
index 0667c87..4a95f41 100644
--- a/target-i386/ops_sse.h
+++ b/target-i386/ops_sse.h
@@ -2116,16 +2116,16 @@ void glue(helper_pcmpestrm, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
if ((ctrl >> 6) & 1) {
if (ctrl & 1) {
for (i = 0; i < 8; i++, res >>= 1) {
- d->W(i) = (res & 1) ? ~0 : 0;
+ env->xmm_regs[0].W(i) = (res & 1) ? ~0 : 0;
}
} else {
for (i = 0; i < 16; i++, res >>= 1) {
- d->B(i) = (res & 1) ? ~0 : 0;
+ env->xmm_regs[0].B(i) = (res & 1) ? ~0 : 0;
}
}
} else {
- d->Q(1) = 0;
- d->Q(0) = res;
+ env->xmm_regs[0].Q(1) = 0;
+ env->xmm_regs[0].Q(0) = res;
}
}
@@ -2154,16 +2154,16 @@ void glue(helper_pcmpistrm, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
if ((ctrl >> 6) & 1) {
if (ctrl & 1) {
for (i = 0; i < 8; i++, res >>= 1) {
- d->W(i) = (res & 1) ? ~0 : 0;
+ env->xmm_regs[0].W(i) = (res & 1) ? ~0 : 0;
}
} else {
for (i = 0; i < 16; i++, res >>= 1) {
- d->B(i) = (res & 1) ? ~0 : 0;
+ env->xmm_regs[0].B(i) = (res & 1) ? ~0 : 0;
}
}
} else {
- d->Q(1) = 0;
- d->Q(0) = res;
+ env->xmm_regs[0].Q(1) = 0;
+ env->xmm_regs[0].Q(0) = res;
}
}
--
1.7.10.4
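The two result layouts written to XMM0 can be sketched as follows, for
byte elements only. xmm_sketch is a hypothetical stand-in for the
register, not QEMU's register type, and the function names are made up:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for XMM0, viewed as 16 bytes (byte 0 is the lowest). */
typedef struct {
    uint8_t b[16];
} xmm_sketch;

/* ctrl bit 6 set, byte elements: expand each result bit to 0xff or 0x00. */
static void store_byte_mask(xmm_sketch *xmm0, uint32_t res)
{
    for (int i = 0; i < 16; i++, res >>= 1) {
        xmm0->b[i] = (res & 1) ? 0xff : 0x00;
    }
}

/* ctrl bit 6 clear: keep res as a bit mask in the low bits, zero the rest. */
static void store_bit_mask(xmm_sketch *xmm0, uint32_t res)
{
    memset(xmm0->b, 0, sizeof(xmm0->b));
    xmm0->b[0] = res & 0xff;
    xmm0->b[1] = (res >> 8) & 0xff;
}
```

In both cases the destination is always the same register, whatever the
first operand of the instruction is.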
* [Qemu-devel] [PATCH 05/10] target-i386: SSE4.2: fix pcmpXstrX instructions in "Ranges" mode
2013-03-26 19:01 [Qemu-devel] [PATCH 00/10] target-i386: fix and enable SSE4.1 and SSE4.2 Aurelien Jarno
` (3 preceding siblings ...)
2013-03-26 19:01 ` [Qemu-devel] [PATCH 04/10] target-i386: SSE4.2: fix pcmpXstrm instructions Aurelien Jarno
@ 2013-03-26 19:01 ` Aurelien Jarno
2013-03-27 20:14 ` Richard Henderson
2013-03-26 19:01 ` [Qemu-devel] [PATCH 06/10] target-i386: SSE4.2: fix pcmpXstrX instructions in "Equal each" mode Aurelien Jarno
` (4 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Aurelien Jarno @ 2013-03-26 19:01 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
Fix the order of the comparisons to match the "Intel 64 and
IA-32 Architectures Software Developer's Manual".
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
target-i386/ops_sse.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/target-i386/ops_sse.h b/target-i386/ops_sse.h
index 4a95f41..51c5fc9 100644
--- a/target-i386/ops_sse.h
+++ b/target-i386/ops_sse.h
@@ -2019,8 +2019,8 @@ static inline unsigned pcmpxstrx(CPUX86State *env, Reg *d, Reg *s,
res <<= 1;
v = pcmp_val(s, ctrl, j);
for (i = ((validd - 1) | 1); i >= 0; i -= 2) {
- res |= (pcmp_val(d, ctrl, i - 0) <= v &&
- pcmp_val(d, ctrl, i - 1) >= v);
+ res |= (pcmp_val(d, ctrl, i - 0) >= v &&
+ pcmp_val(d, ctrl, i - 1) <= v);
}
}
break;
--
1.7.10.4
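A rough sketch of the "Ranges" test after the fix, assuming unsigned byte
elements and that each pair in the first operand is stored as (lower
bound, upper bound). in_any_range is a hypothetical helper and ignores
the validity handling done by the real code:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if v lies inside at least one [lo, hi] pair, 0 otherwise. */
static int in_any_range(const uint8_t *ranges, int npairs, uint8_t v)
{
    assert(npairs >= 0);
    for (int p = 0; p < npairs; p++) {
        uint8_t lo = ranges[2 * p];     /* even index: lower bound */
        uint8_t hi = ranges[2 * p + 1]; /* odd index: upper bound */

        if (lo <= v && v <= hi) {
            return 1;
        }
    }
    return 0;
}
```

With the pairs ('a','z') and ('A','Z'), letters match and digits do not.
With the comparisons swapped, as before the patch, a value would only
match when it equals both bounds.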
* [Qemu-devel] [PATCH 06/10] target-i386: SSE4.2: fix pcmpXstrX instructions in "Equal each" mode
2013-03-26 19:01 [Qemu-devel] [PATCH 00/10] target-i386: fix and enable SSE4.1 and SSE4.2 Aurelien Jarno
` (4 preceding siblings ...)
2013-03-26 19:01 ` [Qemu-devel] [PATCH 05/10] target-i386: SSE4.2: fix pcmpXstrX instructions in "Ranges" mode Aurelien Jarno
@ 2013-03-26 19:01 ` Aurelien Jarno
2013-03-27 20:15 ` Richard Henderson
2013-03-26 19:01 ` [Qemu-devel] [PATCH 07/10] target-i386: SSE4.2: fix pcmpXstrX instructions in "Equal ordered" mode Aurelien Jarno
` (3 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Aurelien Jarno @ 2013-03-26 19:01 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
In "Equal each" mode, pcmpXstrX instructions force the comparison of
element pairs in which both elements are invalid to true. This means
(upper - MAX(valids, validd)) bits should be set to 1, not
(upper - MAX(valids, validd) + 1).
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
target-i386/ops_sse.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target-i386/ops_sse.h b/target-i386/ops_sse.h
index 51c5fc9..2fc5fdd 100644
--- a/target-i386/ops_sse.h
+++ b/target-i386/ops_sse.h
@@ -2025,7 +2025,7 @@ static inline unsigned pcmpxstrx(CPUX86State *env, Reg *d, Reg *s,
}
break;
case 2:
- res = (2 << (upper - MAX(valids, validd))) - 1;
+ res = (1 << (upper - MAX(valids, validd))) - 1;
res <<= MAX(valids, validd) - MIN(valids, validd);
for (i = MIN(valids, validd); i >= 0; i--) {
res <<= 1;
--
1.7.10.4
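The off-by-one is easy to see with concrete numbers. The sketch below
only models the initial mask, not the shifts that follow it in
pcmpxstrx(); the function names are hypothetical, and upper, valids and
validd are assumed to be the highest element index and the last valid
index of each operand:

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the n lowest bits set; n must be in [0, 31]. */
static uint32_t low_bits(int n)
{
    assert(n >= 0 && n < 32);
    return (1u << n) - 1;
}

static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Patched expression: one bit per element that is invalid in both operands. */
static uint32_t both_invalid_mask(int upper, int valids, int validd)
{
    return low_bits(upper - max_int(valids, validd));
}

/* Pre-patch expression, kept for comparison: it sets one bit too many. */
static uint32_t both_invalid_mask_old(int upper, int valids, int validd)
{
    return (2u << (upper - max_int(valids, validd))) - 1;
}
```

With upper = 15 and the longer string ending at index 11, elements 12 to
15 are invalid in both operands, so four bits are expected (0xf); the old
expression yields five (0x1f).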
* [Qemu-devel] [PATCH 07/10] target-i386: SSE4.2: fix pcmpXstrX instructions in "Equal ordered" mode
2013-03-26 19:01 [Qemu-devel] [PATCH 00/10] target-i386: fix and enable SSE4.1 and SSE4.2 Aurelien Jarno
` (5 preceding siblings ...)
2013-03-26 19:01 ` [Qemu-devel] [PATCH 06/10] target-i386: SSE4.2: fix pcmpXstrX instructions in "Equal each" mode Aurelien Jarno
@ 2013-03-26 19:01 ` Aurelien Jarno
2013-03-27 20:19 ` Richard Henderson
2013-03-26 19:01 ` [Qemu-devel] [PATCH 08/10] target-i386: SSE4.2: fix pcmpXstrX instructions with "Masked(-)" polarity Aurelien Jarno
` (2 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Aurelien Jarno @ 2013-03-26 19:01 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
The inner loop should only change the current bit of the result, instead
of the whole result.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
target-i386/ops_sse.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/target-i386/ops_sse.h b/target-i386/ops_sse.h
index 2fc5fdd..77ab410 100644
--- a/target-i386/ops_sse.h
+++ b/target-i386/ops_sse.h
@@ -2036,10 +2036,11 @@ static inline unsigned pcmpxstrx(CPUX86State *env, Reg *d, Reg *s,
case 3:
for (j = valids - validd; j >= 0; j--) {
res <<= 1;
- res |= 1;
+ v = 1;
for (i = MIN(upper - j, validd); i >= 0; i--) {
- res &= (pcmp_val(s, ctrl, i + j) == pcmp_val(d, ctrl, i));
+ v &= (pcmp_val(s, ctrl, i + j) == pcmp_val(d, ctrl, i));
}
+ res |= v;
}
break;
}
--
1.7.10.4
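The effect of the fix can be reproduced with a simplified substring
search. This is a sketch only: it works on plain char arrays, ignores the
partial matches at the end of the string that the real instruction
handles, and both function names are made up:

```c
#include <assert.h>
#include <stdint.h>

/* Patched logic: accumulate each position in v, then OR it into res. */
static uint32_t match_positions(const char *hay, int hlen,
                                const char *needle, int nlen)
{
    uint32_t res = 0;

    for (int j = hlen - nlen; j >= 0; j--) {
        uint32_t v = 1;

        res <<= 1;
        for (int i = nlen - 1; i >= 0; i--) {
            v &= (uint32_t)(hay[i + j] == needle[i]);
        }
        res |= v;
    }
    return res;
}

/* Pre-patch logic: "res &= 0 or 1" also clears every bit above bit 0. */
static uint32_t match_positions_old(const char *hay, int hlen,
                                    const char *needle, int nlen)
{
    uint32_t res = 0;

    for (int j = hlen - nlen; j >= 0; j--) {
        res <<= 1;
        res |= 1;
        for (int i = nlen - 1; i >= 0; i--) {
            res &= (uint32_t)(hay[i + j] == needle[i]);
        }
    }
    return res;
}
```

Searching "abc" in "abcabc" gives 0x9 (matches at offsets 0 and 3) with
the patched logic, while the old logic loses the match at offset 3 and
returns 0x1.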
* [Qemu-devel] [PATCH 08/10] target-i386: SSE4.2: fix pcmpXstrX instructions with "Masked(-)" polarity
2013-03-26 19:01 [Qemu-devel] [PATCH 00/10] target-i386: fix and enable SSE4.1 and SSE4.2 Aurelien Jarno
` (6 preceding siblings ...)
2013-03-26 19:01 ` [Qemu-devel] [PATCH 07/10] target-i386: SSE4.2: fix pcmpXstrX instructions in "Equal ordered" mode Aurelien Jarno
@ 2013-03-26 19:01 ` Aurelien Jarno
2013-03-27 20:20 ` Richard Henderson
2013-03-26 19:01 ` [Qemu-devel] [PATCH 09/10] target-i386: enable SSE4.1 and SSE4.2 in TCG mode Aurelien Jarno
2013-03-26 19:01 ` [Qemu-devel] [PATCH 10/10] target-i386: SSE4.2: use clz32/ctz32 instead of reinventing the wheel Aurelien Jarno
9 siblings, 1 reply; 21+ messages in thread
From: Aurelien Jarno @ 2013-03-26 19:01 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
valids can be equal to -1 if the reg/mem string is empty. Change the
expression so that the xor mask is empty in that case.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
target-i386/ops_sse.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target-i386/ops_sse.h b/target-i386/ops_sse.h
index 77ab410..a0bac07 100644
--- a/target-i386/ops_sse.h
+++ b/target-i386/ops_sse.h
@@ -2050,7 +2050,7 @@ static inline unsigned pcmpxstrx(CPUX86State *env, Reg *d, Reg *s,
res ^= (2 << upper) - 1;
break;
case 3:
- res ^= (2 << valids) - 1;
+ res ^= (1 << (valids + 1)) - 1;
break;
}
--
1.7.10.4
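A short sketch of the patched mask expression, using a hypothetical
helper name. Note that with valids == -1 the old expression shifts by a
negative count, which is undefined behaviour in C, whereas the new one
shifts by zero:

```c
#include <assert.h>
#include <stdint.h>

/* Xor mask covering elements 0..valids; empty when valids is -1. */
static uint32_t polarity_mask(int valids)
{
    assert(valids >= -1 && valids < 31);
    return (1u << (valids + 1)) - 1;
}
```

For valids >= 0 the result is the same as (2 << valids) - 1, so only the
empty-string case changes.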
* [Qemu-devel] [PATCH 09/10] target-i386: enable SSE4.1 and SSE4.2 in TCG mode
2013-03-26 19:01 [Qemu-devel] [PATCH 00/10] target-i386: fix and enable SSE4.1 and SSE4.2 Aurelien Jarno
` (7 preceding siblings ...)
2013-03-26 19:01 ` [Qemu-devel] [PATCH 08/10] target-i386: SSE4.2: fix pcmpXstrX instructions with "Masked(-)" polarity Aurelien Jarno
@ 2013-03-26 19:01 ` Aurelien Jarno
2013-03-27 20:22 ` Richard Henderson
2013-03-26 19:01 ` [Qemu-devel] [PATCH 10/10] target-i386: SSE4.2: use clz32/ctz32 instead of reinventing the wheel Aurelien Jarno
9 siblings, 1 reply; 21+ messages in thread
From: Aurelien Jarno @ 2013-03-26 19:01 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
target-i386/cpu.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index a0640db..4b43759 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -388,16 +388,17 @@ typedef struct x86_def_t {
/* missing:
CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_HT, CPUID_TM, CPUID_PBE */
#define TCG_EXT_FEATURES (CPUID_EXT_SSE3 | CPUID_EXT_MONITOR | \
- CPUID_EXT_SSSE3 | CPUID_EXT_CX16 | CPUID_EXT_POPCNT | \
- CPUID_EXT_MOVBE | CPUID_EXT_HYPERVISOR)
+ CPUID_EXT_SSSE3 | CPUID_EXT_CX16 | CPUID_EXT_SSE41 | \
+ CPUID_EXT_SSE42 | CPUID_EXT_POPCNT | CPUID_EXT_MOVBE | \
+ CPUID_EXT_HYPERVISOR)
/* missing:
CPUID_EXT_PCLMULQDQ, CPUID_EXT_DTES64, CPUID_EXT_DSCPL,
CPUID_EXT_VMX, CPUID_EXT_SMX, CPUID_EXT_EST, CPUID_EXT_TM2,
CPUID_EXT_CID, CPUID_EXT_FMA, CPUID_EXT_XTPR, CPUID_EXT_PDCM,
- CPUID_EXT_PCID, CPUID_EXT_DCA, CPUID_EXT_SSE41, CPUID_EXT_SSE42,
- CPUID_EXT_X2APIC, CPUID_EXT_TSC_DEADLINE_TIMER, CPUID_EXT_AES,
- CPUID_EXT_XSAVE, CPUID_EXT_OSXSAVE, CPUID_EXT_AVX,
- CPUID_EXT_F16C, CPUID_EXT_RDRAND */
+ CPUID_EXT_PCID, CPUID_EXT_DCA, CPUID_EXT_X2APIC,
+ CPUID_EXT_TSC_DEADLINE_TIMER, CPUID_EXT_AES, CPUID_EXT_XSAVE,
+ CPUID_EXT_OSXSAVE, CPUID_EXT_AVX, CPUID_EXT_F16C,
+ CPUID_EXT_RDRAND */
#define TCG_EXT2_FEATURES ((TCG_FEATURES & CPUID_EXT2_AMD_ALIASES) | \
CPUID_EXT2_NX | CPUID_EXT2_MMXEXT | CPUID_EXT2_RDTSCP | \
CPUID_EXT2_3DNOW | CPUID_EXT2_3DNOWEXT)
--
1.7.10.4
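For reference, SSE4.1 and SSE4.2 are reported in CPUID leaf 1, ECX bits
19 and 20. A guest-side check on an ECX value could look like the sketch
below; the macro and function names are local to this example and are
not QEMU's definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX feature bits (names are local to this sketch). */
#define SKETCH_ECX_SSE41 (1u << 19)
#define SKETCH_ECX_SSE42 (1u << 20)

/* True only if both SSE4.1 and SSE4.2 are advertised in ecx. */
static bool ecx_has_sse4(uint32_t ecx)
{
    const uint32_t want = SKETCH_ECX_SSE41 | SKETCH_ECX_SSE42;

    return (ecx & want) == want;
}
```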
* [Qemu-devel] [PATCH 10/10] target-i386: SSE4.2: use clz32/ctz32 instead of reinventing the wheel
2013-03-26 19:01 [Qemu-devel] [PATCH 00/10] target-i386: fix and enable SSE4.1 and SSE4.2 Aurelien Jarno
` (8 preceding siblings ...)
2013-03-26 19:01 ` [Qemu-devel] [PATCH 09/10] target-i386: enable SSE4.1 and SSE4.2 in TCG mode Aurelien Jarno
@ 2013-03-26 19:01 ` Aurelien Jarno
2013-03-27 20:22 ` Richard Henderson
9 siblings, 1 reply; 21+ messages in thread
From: Aurelien Jarno @ 2013-03-26 19:01 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
target-i386/fpu_helper.c | 1 +
target-i386/ops_sse.h | 32 ++------------------------------
2 files changed, 3 insertions(+), 30 deletions(-)
diff --git a/target-i386/fpu_helper.c b/target-i386/fpu_helper.c
index 44f3d27..29a8fb6 100644
--- a/target-i386/fpu_helper.c
+++ b/target-i386/fpu_helper.c
@@ -20,6 +20,7 @@
#include <math.h>
#include "cpu.h"
#include "helper.h"
+#include "qemu/host-utils.h"
#if !defined(CONFIG_USER_ONLY)
#include "exec/softmmu_exec.h"
diff --git a/target-i386/ops_sse.h b/target-i386/ops_sse.h
index a0bac07..a11dba1 100644
--- a/target-i386/ops_sse.h
+++ b/target-i386/ops_sse.h
@@ -2064,34 +2064,6 @@ static inline unsigned pcmpxstrx(CPUX86State *env, Reg *d, Reg *s,
return res;
}
-static inline int rffs1(unsigned int val)
-{
- int ret = 1, hi;
-
- for (hi = sizeof(val) * 4; hi; hi /= 2) {
- if (val >> hi) {
- val >>= hi;
- ret += hi;
- }
- }
-
- return ret;
-}
-
-static inline int ffs1(unsigned int val)
-{
- int ret = 1, hi;
-
- for (hi = sizeof(val) * 4; hi; hi /= 2) {
- if (val << hi) {
- val <<= hi;
- ret += hi;
- }
- }
-
- return ret;
-}
-
void glue(helper_pcmpestri, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
uint32_t ctrl)
{
@@ -2100,7 +2072,7 @@ void glue(helper_pcmpestri, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
pcmp_elen(env, R_EAX, ctrl));
if (res) {
- env->regs[R_ECX] = (ctrl & (1 << 6)) ? rffs1(res) - 1 : 32 - ffs1(res);
+ env->regs[R_ECX] = (ctrl & (1 << 6)) ? 31 - clz32(res) : ctz32(res);
} else {
env->regs[R_ECX] = 16 >> (ctrl & (1 << 0));
}
@@ -2138,7 +2110,7 @@ void glue(helper_pcmpistri, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
pcmp_ilen(d, ctrl));
if (res) {
- env->regs[R_ECX] = (ctrl & (1 << 6)) ? rffs1(res) - 1 : 32 - ffs1(res);
+ env->regs[R_ECX] = (ctrl & (1 << 6)) ? 31 - clz32(res) : ctz32(res);
} else {
env->regs[R_ECX] = 16 >> (ctrl & (1 << 0));
}
--
1.7.10.4
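The equivalence between the removed helpers and the new expressions can
be checked directly. In the sketch below, rffs1 and ffs1 are copied from
the removed lines, while sketch_clz32 and sketch_ctz32 are stand-ins
built on the GCC/Clang builtins (an assumption: they are not the
definitions from qemu/host-utils.h, and unsigned int is taken to be 32
bits wide):

```c
#include <assert.h>
#include <stdint.h>

/* Removed helper: 1-based index of the highest set bit. */
static inline int rffs1(unsigned int val)
{
    int ret = 1, hi;

    for (hi = sizeof(val) * 4; hi; hi /= 2) {
        if (val >> hi) {
            val >>= hi;
            ret += hi;
        }
    }
    return ret;
}

/* Removed helper: 32 minus the 0-based index of the lowest set bit. */
static inline int ffs1(unsigned int val)
{
    int ret = 1, hi;

    for (hi = sizeof(val) * 4; hi; hi /= 2) {
        if (val << hi) {
            val <<= hi;
            ret += hi;
        }
    }
    return ret;
}

/* Stand-ins for clz32/ctz32, returning 32 for a zero input. */
static int sketch_clz32(uint32_t v)
{
    return v ? __builtin_clz(v) : 32;
}

static int sketch_ctz32(uint32_t v)
{
    return v ? __builtin_ctz(v) : 32;
}

/* Nonzero if old and new expressions agree for a nonzero res. */
static int helpers_agree(uint32_t res)
{
    return rffs1(res) - 1 == 31 - sketch_clz32(res) &&
           32 - ffs1(res) == sketch_ctz32(res);
}
```

Since res is only used when it is nonzero, checking the nonzero 16-bit
masks covers every value pcmpxstrx() is expected to return here.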
* Re: [Qemu-devel] [PATCH 01/10] target-i386: SSE4.1: fix pinsrb instruction
2013-03-26 19:01 ` [Qemu-devel] [PATCH 01/10] target-i386: SSE4.1: fix pinsrb instruction Aurelien Jarno
@ 2013-03-27 20:03 ` Richard Henderson
0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2013-03-27 20:03 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On 03/26/2013 12:01 PM, Aurelien Jarno wrote:
> gen_op_mov_TN_reg() loads the value in cpu_T[0], so this temporary should
> be used instead of cpu_tmp0.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> target-i386/translate.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH 02/10] target-i386: SSE4.2: fix pcmpgtq instruction
2013-03-26 19:01 ` [Qemu-devel] [PATCH 02/10] target-i386: SSE4.2: fix pcmpgtq instruction Aurelien Jarno
@ 2013-03-27 20:03 ` Richard Henderson
0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2013-03-27 20:03 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On 03/26/2013 12:01 PM, Aurelien Jarno wrote:
> The "Intel 64 and IA-32 Architectures Software Developer's Manual" (at
> least recent versions) clearly says that the comparison is signed.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> target-i386/ops_sse.h | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH 03/10] target-i386: SSE4.2: fix pcmpXstri instructions
2013-03-26 19:01 ` [Qemu-devel] [PATCH 03/10] target-i386: SSE4.2: fix pcmpXstri instructions Aurelien Jarno
@ 2013-03-27 20:10 ` Richard Henderson
0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2013-03-27 20:10 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On 03/26/2013 12:01 PM, Aurelien Jarno wrote:
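> ffs1 returns the position of the lowest bit set to one, counted from the
> most significant bit, so ffs1(res) - 1 is not a valid index.
>
> When bit 6 of the control byte is clear, pcmpXstri must return the index
> of the least significant bit set to one, counted from the least
> significant bit, which is 32 - ffs1(res).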
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> target-i386/ops_sse.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
I wonder if this ought not just be squashed with patch 10.
It would have made it easier to review, actually. That said,
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH 04/10] target-i386: SSE4.2: fix pcmpXstrm instructions
2013-03-26 19:01 ` [Qemu-devel] [PATCH 04/10] target-i386: SSE4.2: fix pcmpXstrm instructions Aurelien Jarno
@ 2013-03-27 20:10 ` Richard Henderson
0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2013-03-27 20:10 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On 03/26/2013 12:01 PM, Aurelien Jarno wrote:
> The pcmpXstrm instructions return their result in the XMM0 register, not
> in the first operand.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> target-i386/ops_sse.h | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH 05/10] target-i386: SSE4.2: fix pcmpXstrX instructions in "Ranges" mode
2013-03-26 19:01 ` [Qemu-devel] [PATCH 05/10] target-i386: SSE4.2: fix pcmpXstrX instructions in "Ranges" mode Aurelien Jarno
@ 2013-03-27 20:14 ` Richard Henderson
0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2013-03-27 20:14 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On 03/26/2013 12:01 PM, Aurelien Jarno wrote:
> Fix the order of the comparisons to match the "Intel 64 and
> IA-32 Architectures Software Developer's Manual".
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> target-i386/ops_sse.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH 06/10] target-i386: SSE4.2: fix pcmpXstrX instructions in "Equal each" mode
2013-03-26 19:01 ` [Qemu-devel] [PATCH 06/10] target-i386: SSE4.2: fix pcmpXstrX instructions in "Equal each" mode Aurelien Jarno
@ 2013-03-27 20:15 ` Richard Henderson
0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2013-03-27 20:15 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On 03/26/2013 12:01 PM, Aurelien Jarno wrote:
> In "Equal each" mode, pcmpXstrX instructions force the comparison of
> element pairs in which both elements are invalid to true. This means
> (upper - MAX(valids, validd)) bits should be set to 1, not
> (upper - MAX(valids, validd) + 1).
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> target-i386/ops_sse.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH 07/10] target-i386: SSE4.2: fix pcmpXstrX instructions in "Equal ordered" mode
2013-03-26 19:01 ` [Qemu-devel] [PATCH 07/10] target-i386: SSE4.2: fix pcmpXstrX instructions in "Equal ordered" mode Aurelien Jarno
@ 2013-03-27 20:19 ` Richard Henderson
0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2013-03-27 20:19 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On 03/26/2013 12:01 PM, Aurelien Jarno wrote:
> The inner loop should only change the current bit of the result, instead
> of the whole result.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> target-i386/ops_sse.h | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH 08/10] target-i386: SSE4.2: fix pcmpXstrX instructions with "Masked(-)" polarity
2013-03-26 19:01 ` [Qemu-devel] [PATCH 08/10] target-i386: SSE4.2: fix pcmpXstrX instructions with "Masked(-)" polarity Aurelien Jarno
@ 2013-03-27 20:20 ` Richard Henderson
0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2013-03-27 20:20 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On 03/26/2013 12:01 PM, Aurelien Jarno wrote:
> valids can be equal to -1 if the reg/mem string is empty. Change the
> expression so that the xor mask is empty in that case.
>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> target-i386/ops_sse.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH 09/10] target-i386: enable SSE4.1 and SSE4.2 in TCG mode
2013-03-26 19:01 ` [Qemu-devel] [PATCH 09/10] target-i386: enable SSE4.1 and SSE4.2 in TCG mode Aurelien Jarno
@ 2013-03-27 20:22 ` Richard Henderson
0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2013-03-27 20:22 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On 03/26/2013 12:01 PM, Aurelien Jarno wrote:
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> target-i386/cpu.c | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH 10/10] target-i386: SSE4.2: use clz32/ctz32 instead of reinventing the wheel
2013-03-26 19:01 ` [Qemu-devel] [PATCH 10/10] target-i386: SSE4.2: use clz32/ctz32 instead of reinventing the wheel Aurelien Jarno
@ 2013-03-27 20:22 ` Richard Henderson
0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2013-03-27 20:22 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On 03/26/2013 12:01 PM, Aurelien Jarno wrote:
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> target-i386/fpu_helper.c | 1 +
> target-i386/ops_sse.h | 32 ++------------------------------
> 2 files changed, 3 insertions(+), 30 deletions(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~