* [Qemu-devel] [PATCH 0/4] TCG: add some instructions
@ 2009-09-18 22:30 Andre Przywara
2009-09-18 22:30 ` [Qemu-devel] [PATCH 1/4] TCG x86: implement lzcnt emulation Andre Przywara
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Andre Przywara @ 2009-09-18 22:30 UTC (permalink / raw)
To: qemu-devel
Hi,
this patch series adds some instructions to the x86 TCG emulator.
They all originate from AMD, although RDTSCP is also implemented in
recent Intel CPUs. They are:
Patch 1: lzcnt (count leading zero bits in a word)
Patch 2: lock mov cr0 = mov cr8 (access to TPR in 32-bit mode)
Patch 3: extrq, insertq, movntss, movntsd (SSE4a instructions)
Patch 4: rdtscp (read TSC plus aux MSR)
This adds a new field to CPUX86State and thus bumps the
SAVE_VERSION for the CPU.
I am not very experienced in TCG, but I have written tests to check
the functionality of the implementation. Those tests use inline
assembly to trigger the opcodes and check their results with "golden"
precalculated values. The tests showed the same results in linux-user
and on the native hardware.
Please review!
Regards,
Andre.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 1/4] TCG x86: implement lzcnt emulation
2009-09-18 22:30 [Qemu-devel] [PATCH 0/4] TCG: add some instructions Andre Przywara
@ 2009-09-18 22:30 ` Andre Przywara
2009-10-04 12:02 ` Aurelien Jarno
2009-09-18 22:30 ` [Qemu-devel] [PATCH 2/4] TCG x86: add lock mov cr0 = cr8 Andre Przywara
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Andre Przywara @ 2009-09-18 22:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Andre Przywara
lzcnt is an instruction introduced with AMD Phenom/Barcelona that
returns the number of leading zero bits in a word.
As this is similar to the "bsr" instruction, reuse the existing
code. Some further changes are needed, though, as lzcnt always
returns a valid value (in contrast to bsr, which has a special
case when the operand is 0).
lzcnt is guarded by the ABM CPUID bit (Fn8000_0001:ECX[5]).
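The difference between the two instructions can be sketched in plain C.
This is only an illustration, not the code from the patch: `bsr_ref` and
`lzcnt_ref` are hypothetical names, and the operand width (16, 32 or 64)
is passed in explicitly.

```c
#include <stdint.h>

/* Hypothetical reference model: index of the highest set bit, as bsr
 * reports it. Returns -1 for a zero operand, the special case in which
 * bsr has no valid result. */
static int bsr_ref(uint64_t val)
{
    int idx = -1;

    while (val != 0) {
        val >>= 1;
        idx++;
    }
    return idx;
}

/* Hypothetical reference model of lzcnt for an operand of `width` bits
 * (16, 32 or 64). It always returns a valid count; a zero operand yields
 * the operand width. Assumes val has no bits set at or above `width`. */
static int lzcnt_ref(uint64_t val, int width)
{
    return width - 1 - bsr_ref(val);
}
```

For a non-zero operand this is the same `width - 1 - index` relation the
patch uses when it derives the lzcnt result from the bsr loop.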
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
---
target-i386/helper.h | 2 +-
target-i386/op_helper.c | 11 ++++++++---
target-i386/translate.c | 37 +++++++++++++++++++++++++------------
3 files changed, 34 insertions(+), 16 deletions(-)
diff --git a/target-i386/helper.h b/target-i386/helper.h
index 68d57b1..38d0708 100644
--- a/target-i386/helper.h
+++ b/target-i386/helper.h
@@ -191,7 +191,7 @@ DEF_HELPER_2(frstor, void, tl, int)
DEF_HELPER_2(fxsave, void, tl, int)
DEF_HELPER_2(fxrstor, void, tl, int)
DEF_HELPER_1(bsf, tl, tl)
-DEF_HELPER_1(bsr, tl, tl)
+DEF_HELPER_2(bsr, tl, tl, int)
/* MMX/SSE */
diff --git a/target-i386/op_helper.c b/target-i386/op_helper.c
index c3f5af6..9ffda36 100644
--- a/target-i386/op_helper.c
+++ b/target-i386/op_helper.c
@@ -5457,11 +5457,14 @@ target_ulong helper_bsf(target_ulong t0)
return count;
}
-target_ulong helper_bsr(target_ulong t0)
+target_ulong helper_bsr(target_ulong t0, int lzcnt)
{
int count;
target_ulong res, mask;
-
+
+ if (lzcnt > 0 && t0 == 0) {
+ return lzcnt;
+ }
res = t0;
count = TARGET_LONG_BITS - 1;
mask = (target_ulong)1 << (TARGET_LONG_BITS - 1);
@@ -5469,10 +5472,12 @@ target_ulong helper_bsr(target_ulong t0)
count--;
res <<= 1;
}
+ if (lzcnt > 0) {
+ return lzcnt - 1 - count;
+ }
return count;
}
-
static int compute_all_eflags(void)
{
return CC_SRC;
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 335fc08..aaa4492 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -6538,22 +6538,35 @@ static target_ulong disas_insn(DisasContext *s, target_ulong pc_start)
reg = ((modrm >> 3) & 7) | rex_r;
gen_ldst_modrm(s, modrm, ot, OR_TMP0, 0);
gen_extu(ot, cpu_T[0]);
- label1 = gen_new_label();
- tcg_gen_movi_tl(cpu_cc_dst, 0);
t0 = tcg_temp_local_new();
tcg_gen_mov_tl(t0, cpu_T[0]);
- tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, label1);
- if (b & 1) {
- gen_helper_bsr(cpu_T[0], t0);
+ if ((b & 1) && (prefixes & PREFIX_REPZ) &&
+ (s->cpuid_ext3_features & CPUID_EXT3_ABM)) {
+ switch(ot) {
+ case OT_WORD: gen_helper_bsr(cpu_T[0], t0,
+ tcg_const_i32(16)); break;
+ case OT_LONG: gen_helper_bsr(cpu_T[0], t0,
+ tcg_const_i32(32)); break;
+ case OT_QUAD: gen_helper_bsr(cpu_T[0], t0,
+ tcg_const_i32(64)); break;
+ }
+ gen_op_mov_reg_T0(ot, reg);
} else {
- gen_helper_bsf(cpu_T[0], t0);
+ label1 = gen_new_label();
+ tcg_gen_movi_tl(cpu_cc_dst, 0);
+ tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, label1);
+ if (b & 1) {
+ gen_helper_bsr(cpu_T[0], t0, tcg_const_i32(0));
+ } else {
+ gen_helper_bsf(cpu_T[0], t0);
+ }
+ gen_op_mov_reg_T0(ot, reg);
+ tcg_gen_movi_tl(cpu_cc_dst, 1);
+ gen_set_label(label1);
+ tcg_gen_discard_tl(cpu_cc_src);
+ s->cc_op = CC_OP_LOGICB + ot;
+ tcg_temp_free(t0);
}
- gen_op_mov_reg_T0(ot, reg);
- tcg_gen_movi_tl(cpu_cc_dst, 1);
- gen_set_label(label1);
- tcg_gen_discard_tl(cpu_cc_src);
- s->cc_op = CC_OP_LOGICB + ot;
- tcg_temp_free(t0);
}
break;
/************************/
--
1.6.1.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 2/4] TCG x86: add lock mov cr0 = cr8
2009-09-18 22:30 [Qemu-devel] [PATCH 0/4] TCG: add some instructions Andre Przywara
2009-09-18 22:30 ` [Qemu-devel] [PATCH 1/4] TCG x86: implement lzcnt emulation Andre Przywara
@ 2009-09-18 22:30 ` Andre Przywara
2009-10-04 12:06 ` Aurelien Jarno
2009-09-18 22:30 ` [Qemu-devel] [PATCH 3/4] TCG x86: add SSE4a instruction support Andre Przywara
2009-09-18 22:30 ` [Qemu-devel] [PATCH 4/4] TCG x86: add RDTSCP support Andre Przywara
3 siblings, 1 reply; 12+ messages in thread
From: Andre Przywara @ 2009-09-18 22:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Andre Przywara
AMD CPUs feature a shortcut to access CR8 even from 32-bit mode.
If you use the LOCK prefix with "mov CR0", it accesses CR8 instead.
This behavior is guarded by the CR8_LEGACY CPUID bit
(Fn8000_0001:ECX[1]).
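The decoding rule described above can be modelled as a small pure
function. This is a sketch under the assumption that only the three
inputs named in the text matter; `effective_cr` is a hypothetical name
and does not appear in the patch.

```c
#include <stdbool.h>

/* Hypothetical model of the decoder decision: which control register a
 * "mov crN" access targets, given the encoded register number, whether a
 * LOCK prefix was seen, and whether the CR8_LEGACY CPUID bit is set. */
static int effective_cr(int reg, bool lock_prefix, bool cr8_legacy)
{
    if (lock_prefix && reg == 0 && cr8_legacy) {
        return 8;   /* LOCK + CR0 is redirected to CR8 */
    }
    return reg;     /* every other combination is left untouched */
}
```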
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
---
target-i386/translate.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/target-i386/translate.c b/target-i386/translate.c
index aaa4492..134c870 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -7362,6 +7362,10 @@ static target_ulong disas_insn(DisasContext *s, target_ulong pc_start)
ot = OT_QUAD;
else
ot = OT_LONG;
+ if ((prefixes & PREFIX_LOCK) && (reg == 0) &&
+ (s->cpuid_ext3_features & CPUID_EXT3_CR8LEG)) {
+ reg = 8;
+ }
switch(reg) {
case 0:
case 2:
--
1.6.1.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 3/4] TCG x86: add SSE4a instruction support
2009-09-18 22:30 [Qemu-devel] [PATCH 0/4] TCG: add some instructions Andre Przywara
2009-09-18 22:30 ` [Qemu-devel] [PATCH 1/4] TCG x86: implement lzcnt emulation Andre Przywara
2009-09-18 22:30 ` [Qemu-devel] [PATCH 2/4] TCG x86: add lock mov cr0 = cr8 Andre Przywara
@ 2009-09-18 22:30 ` Andre Przywara
2009-10-04 12:10 ` Aurelien Jarno
2009-09-18 22:30 ` [Qemu-devel] [PATCH 4/4] TCG x86: add RDTSCP support Andre Przywara
3 siblings, 1 reply; 12+ messages in thread
From: Andre Przywara @ 2009-09-18 22:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Andre Przywara
This adds support for the AMD Phenom/Barcelona's SSE4a instructions.
These include insertq and extrq, which shift and mask XMM registers
and come in two versions (shift/length values given as immediates or
stored in another XMM register).
Additionally it implements movntss and movntsd, which are scalar
non-temporal stores (avoiding cache thrashing). These are implemented
as normal stores, though.
SSE4a is guarded by the SSE4A CPUID bit (Fn8000_0001:ECX[6]).
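As an illustration of the shift-and-mask behaviour, here is a minimal
two-operand model in C. It is an assumption based on the description
above (a length of 0 meaning the full 64 bits, as in the helpers below),
not a copy of the patch's helpers; `field_mask`, `extrq_ref` and
`insertq_ref` are hypothetical names.

```c
#include <stdint.h>

/* Mask with the low `len` bits set. A length of 0 is taken to mean the
 * full 64 bits. Only the low 6 bits of len are used. */
static uint64_t field_mask(unsigned len)
{
    len &= 0x3f;
    return len == 0 ? ~0ULL : (1ULL << len) - 1;
}

/* Sketch of extrq: take `len` bits of dst starting at bit `shift` and
 * return them right-justified and zero-extended. */
static uint64_t extrq_ref(uint64_t dst, unsigned shift, unsigned len)
{
    return (dst >> (shift & 0x3f)) & field_mask(len);
}

/* Sketch of insertq: replace `len` bits of dst starting at bit `shift`
 * with the low `len` bits of src; all other bits of dst are kept. */
static uint64_t insertq_ref(uint64_t dst, uint64_t src,
                            unsigned shift, unsigned len)
{
    uint64_t mask = field_mask(len);

    shift &= 0x3f;
    return (dst & ~(mask << shift)) | ((src & mask) << shift);
}
```

Keeping the mask computation in one place makes the len == 0 case
explicit for both operations.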
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
---
target-i386/ops_sse.h | 44 ++++++++++++++++++++++++++++++++++++++++++
target-i386/ops_sse_header.h | 4 +++
target-i386/translate.c | 39 +++++++++++++++++++++++++++++++++++-
3 files changed, 85 insertions(+), 2 deletions(-)
diff --git a/target-i386/ops_sse.h b/target-i386/ops_sse.h
index 709732a..3232abd 100644
--- a/target-i386/ops_sse.h
+++ b/target-i386/ops_sse.h
@@ -802,6 +802,50 @@ void helper_rcpss(XMMReg *d, XMMReg *s)
d->XMM_S(0) = approx_rcp(s->XMM_S(0));
}
+static inline uint64_t helper_extrq(uint64_t src, int shift, int len)
+{
+ uint64_t mask;
+
+ if (len == 0) {
+ mask = ~0LL;
+ } else {
+ mask = (1ULL << len) - 1;
+ }
+ return (src >> shift) & mask;
+}
+
+void helper_extrq_r(XMMReg *d, XMMReg *s)
+{
+ d->XMM_Q(0) = helper_extrq(d->XMM_Q(0), s->XMM_B(1), s->XMM_B(0));
+}
+
+void helper_extrq_i(XMMReg *d, int index, int length)
+{
+ d->XMM_Q(0) = helper_extrq(d->XMM_Q(0), index, length);
+}
+
+static inline uint64_t helper_insertq(uint64_t src, int shift, int len)
+{
+ uint64_t mask;
+
+ if (len == 0) {
+ mask = ~0ULL;
+ } else {
+ mask = (1ULL << len) - 1;
+ }
+ return (src & ~(mask << shift)) | ((src & mask) << shift);
+}
+
+void helper_insertq_r(XMMReg *d, XMMReg *s)
+{
+ d->XMM_Q(0) = helper_insertq(s->XMM_Q(0), s->XMM_B(9), s->XMM_B(8));
+}
+
+void helper_insertq_i(XMMReg *d, int index, int length)
+{
+ d->XMM_Q(0) = helper_insertq(d->XMM_Q(0), index, length);
+}
+
void helper_haddps(XMMReg *d, XMMReg *s)
{
XMMReg r;
diff --git a/target-i386/ops_sse_header.h b/target-i386/ops_sse_header.h
index 53add99..a0a6361 100644
--- a/target-i386/ops_sse_header.h
+++ b/target-i386/ops_sse_header.h
@@ -187,6 +187,10 @@ DEF_HELPER_2(rsqrtps, void, XMMReg, XMMReg)
DEF_HELPER_2(rsqrtss, void, XMMReg, XMMReg)
DEF_HELPER_2(rcpps, void, XMMReg, XMMReg)
DEF_HELPER_2(rcpss, void, XMMReg, XMMReg)
+DEF_HELPER_2(extrq_r, void, XMMReg, XMMReg)
+DEF_HELPER_3(extrq_i, void, XMMReg, int, int)
+DEF_HELPER_2(insertq_r, void, XMMReg, XMMReg)
+DEF_HELPER_3(insertq_i, void, XMMReg, int, int)
DEF_HELPER_2(haddps, void, XMMReg, XMMReg)
DEF_HELPER_2(haddpd, void, XMMReg, XMMReg)
DEF_HELPER_2(hsubps, void, XMMReg, XMMReg)
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 134c870..5cbcf07 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -2822,7 +2822,7 @@ static void *sse_op_table1[256][4] = {
[0x28] = { SSE_SPECIAL, SSE_SPECIAL }, /* movaps, movapd */
[0x29] = { SSE_SPECIAL, SSE_SPECIAL }, /* movaps, movapd */
[0x2a] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* cvtpi2ps, cvtpi2pd, cvtsi2ss, cvtsi2sd */
- [0x2b] = { SSE_SPECIAL, SSE_SPECIAL }, /* movntps, movntpd */
+ [0x2b] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movntps, movntpd, movntss, movntsd */
[0x2c] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* cvttps2pi, cvttpd2pi, cvttsd2si, cvttss2si */
[0x2d] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* cvtps2pi, cvtpd2pi, cvtsd2si, cvtss2si */
[0x2e] = { gen_helper_ucomiss, gen_helper_ucomisd },
@@ -2879,6 +2879,8 @@ static void *sse_op_table1[256][4] = {
[0x75] = MMX_OP2(pcmpeqw),
[0x76] = MMX_OP2(pcmpeql),
[0x77] = { SSE_DUMMY }, /* emms */
+ [0x78] = { NULL, SSE_SPECIAL, NULL, SSE_SPECIAL }, /* extrq_i, insertq_i */
+ [0x79] = { NULL, gen_helper_extrq_r, NULL, gen_helper_insertq_r },
[0x7c] = { NULL, gen_helper_haddpd, NULL, gen_helper_haddps },
[0x7d] = { NULL, gen_helper_hsubpd, NULL, gen_helper_hsubps },
[0x7e] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movd, movd, , movq */
@@ -3165,6 +3167,20 @@ static void gen_sse(DisasContext *s, int b, target_ulong pc_start, int rex_r)
gen_lea_modrm(s, modrm, &reg_addr, &offset_addr);
gen_sto_env_A0(s->mem_index, offsetof(CPUX86State,xmm_regs[reg]));
break;
+ case 0x22b: /* movntss */
+ case 0x32b: /* movntsd */
+ if (mod == 3)
+ goto illegal_op;
+ gen_lea_modrm(s, modrm, &reg_addr, &offset_addr);
+ if (b1 & 1) {
+ gen_stq_env_A0(s->mem_index, offsetof(CPUX86State,
+ xmm_regs[reg]));
+ } else {
+ tcg_gen_ld32u_tl(cpu_T[0], cpu_env, offsetof(CPUX86State,
+ xmm_regs[reg].XMM_L(0)));
+ gen_op_st_T0_A0(OT_LONG + s->mem_index);
+ }
+ break;
case 0x6e: /* movd mm, ea */
#ifdef TARGET_X86_64
if (s->dflag == 2) {
@@ -3320,6 +3336,25 @@ static void gen_sse(DisasContext *s, int b, target_ulong pc_start, int rex_r)
gen_op_movl(offsetof(CPUX86State,xmm_regs[reg].XMM_L(2)),
offsetof(CPUX86State,xmm_regs[reg].XMM_L(3)));
break;
+ case 0x178:
+ case 0x378:
+ {
+ int bit_index, field_length;
+
+ if (b1 == 1 && reg != 0)
+ goto illegal_op;
+ field_length = ldub_code(s->pc++) & 0x3F;
+ bit_index = ldub_code(s->pc++) & 0x3F;
+ tcg_gen_addi_ptr(cpu_ptr0, cpu_env,
+ offsetof(CPUX86State,xmm_regs[reg]));
+ if (b1 == 1)
+ gen_helper_extrq_i(cpu_ptr0, tcg_const_i32(bit_index),
+ tcg_const_i32(field_length));
+ else
+ gen_helper_insertq_i(cpu_ptr0, tcg_const_i32(bit_index),
+ tcg_const_i32(field_length));
+ }
+ break;
case 0x7e: /* movd ea, mm */
#ifdef TARGET_X86_64
if (s->dflag == 2) {
@@ -7566,7 +7601,7 @@ static target_ulong disas_insn(DisasContext *s, target_ulong pc_start)
case 0x110 ... 0x117:
case 0x128 ... 0x12f:
case 0x138 ... 0x13a:
- case 0x150 ... 0x177:
+ case 0x150 ... 0x179:
case 0x17c ... 0x17f:
case 0x1c2:
case 0x1c4 ... 0x1c6:
--
1.6.1.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 4/4] TCG x86: add RDTSCP support
2009-09-18 22:30 [Qemu-devel] [PATCH 0/4] TCG: add some instructions Andre Przywara
` (2 preceding siblings ...)
2009-09-18 22:30 ` [Qemu-devel] [PATCH 3/4] TCG x86: add SSE4a instruction support Andre Przywara
@ 2009-09-18 22:30 ` Andre Przywara
2009-10-04 12:47 ` Aurelien Jarno
3 siblings, 1 reply; 12+ messages in thread
From: Andre Przywara @ 2009-09-18 22:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Andre Przywara
RDTSCP reads the time stamp counter and, atomically with it, the
content of a 32-bit MSR, which can be freely set by the OS. This
allows CPU local data to be queried by userspace.
Linux uses this to allow a fast implementation of the getcpu()
syscall, which uses the vsyscall page to avoid a context switch.
AMD CPUs since K8RevF and Intel CPUs since Nehalem support this
instruction.
RDTSCP is guarded by the RDTSCP CPUID bit (Fn8000_0001:EDX[27]).
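The register layout of the result can be sketched as follows. This is a
stand-alone model, not the helper from the patch: `rdtscp_result` and
`rdtscp_model` are hypothetical names, and the TSC and aux MSR values are
passed in as plain arguments.

```c
#include <stdint.h>

/* Hypothetical container for the three output registers. */
struct rdtscp_result {
    uint32_t eax;   /* low 32 bits of the TSC */
    uint32_t edx;   /* high 32 bits of the TSC */
    uint32_t ecx;   /* low 32 bits of the TSC_AUX MSR */
};

/* Model of what RDTSCP returns for a given counter and aux MSR value. */
static struct rdtscp_result rdtscp_model(uint64_t tsc, uint64_t tsc_aux)
{
    struct rdtscp_result r;

    r.eax = (uint32_t)tsc;
    r.edx = (uint32_t)(tsc >> 32);
    r.ecx = (uint32_t)tsc_aux;      /* upper half of the MSR is dropped */
    return r;
}
```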
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
---
target-i386/cpu.h | 5 +++-
target-i386/helper.h | 1 +
target-i386/machine.c | 4 +++
target-i386/op_helper.c | 12 ++++++++++
target-i386/translate.c | 57 ++++++++++++++++++++++++++++++++++------------
5 files changed, 63 insertions(+), 16 deletions(-)
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index b9a6392..f318942 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -322,6 +322,7 @@
#define MSR_FSBASE 0xc0000100
#define MSR_GSBASE 0xc0000101
#define MSR_KERNELGSBASE 0xc0000102
+#define MSR_TSC_AUX 0xc0000103
#define MSR_VM_HSAVE_PA 0xc0010117
@@ -694,6 +695,8 @@ typedef struct CPUX86State {
uint64 mcg_status;
uint64 mcg_ctl;
uint64 *mce_banks;
+
+ uint64_t tsc_aux;
} CPUX86State;
CPUX86State *cpu_x86_init(const char *cpu_model);
@@ -854,7 +857,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
#define cpu_signal_handler cpu_x86_signal_handler
#define cpu_list x86_cpu_list
-#define CPU_SAVE_VERSION 10
+#define CPU_SAVE_VERSION 11
/* MMU modes definitions */
#define MMU_MODE0_SUFFIX _kernel
diff --git a/target-i386/helper.h b/target-i386/helper.h
index 38d0708..ef8d4e1 100644
--- a/target-i386/helper.h
+++ b/target-i386/helper.h
@@ -80,6 +80,7 @@ DEF_HELPER_1(cmpxchg16b, void, tl)
DEF_HELPER_0(single_step, void)
DEF_HELPER_0(cpuid, void)
DEF_HELPER_0(rdtsc, void)
+DEF_HELPER_0(rdtscp, void)
DEF_HELPER_0(rdpmc, void)
DEF_HELPER_0(rdmsr, void)
DEF_HELPER_0(wrmsr, void)
diff --git a/target-i386/machine.c b/target-i386/machine.c
index ab31329..e5a060f 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -171,6 +171,7 @@ void cpu_save(QEMUFile *f, void *opaque)
qemu_put_be64s(f, &env->mce_banks[4*i + 3]);
}
}
+ qemu_put_be64s(f, &env->tsc_aux);
}
#ifdef USE_X86LDOUBLE
@@ -377,6 +378,9 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
}
}
+ if (version_id >= 11) {
+ qemu_get_be64s(f, &env->tsc_aux);
+ }
/* XXX: ensure compatiblity for halted bit ? */
/* XXX: compute redundant hflags bits */
env->hflags = hflags;
diff --git a/target-i386/op_helper.c b/target-i386/op_helper.c
index 9ffda36..a457bdd 100644
--- a/target-i386/op_helper.c
+++ b/target-i386/op_helper.c
@@ -2969,6 +2969,12 @@ void helper_rdtsc(void)
EDX = (uint32_t)(val >> 32);
}
+void helper_rdtscp(void)
+{
+ helper_rdtsc();
+ ECX = (uint32_t)(env->tsc_aux);
+}
+
void helper_rdpmc(void)
{
if ((env->cr[4] & CR4_PCE_MASK) && ((env->hflags & HF_CPL_MASK) != 0)) {
@@ -3107,6 +3113,9 @@ void helper_wrmsr(void)
&& (val == 0 || val == ~(uint64_t)0))
env->mcg_ctl = val;
break;
+ case MSR_TSC_AUX:
+ env->tsc_aux = val;
+ break;
default:
if ((uint32_t)ECX >= MSR_MC0_CTL
&& (uint32_t)ECX < MSR_MC0_CTL + (4 * env->mcg_cap & 0xff)) {
@@ -3177,6 +3186,9 @@ void helper_rdmsr(void)
case MSR_KERNELGSBASE:
val = env->kernelgsbase;
break;
+ case MSR_TSC_AUX:
+ val = env->tsc_aux;
+ break;
#endif
case MSR_MTRRphysBase(0):
case MSR_MTRRphysBase(1):
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 5cbcf07..306685d 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -7217,31 +7217,58 @@ static target_ulong disas_insn(DisasContext *s, target_ulong pc_start)
gen_eob(s);
}
break;
- case 7: /* invlpg */
- if (s->cpl != 0) {
- gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+ case 7:
+ if (mod != 3) { /* invlpg */
+ if (s->cpl != 0) {
+ gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+ } else {
+ if (s->cc_op != CC_OP_DYNAMIC)
+ gen_op_set_cc_op(s->cc_op);
+ gen_jmp_im(pc_start - s->cs_base);
+ gen_lea_modrm(s, modrm, &reg_addr, &offset_addr);
+ gen_helper_invlpg(cpu_A0);
+ gen_jmp_im(s->pc - s->cs_base);
+ gen_eob(s);
+ }
} else {
- if (mod == 3) {
+ switch (rm) {
+ case 0: /* swapgs */
#ifdef TARGET_X86_64
- if (CODE64(s) && rm == 0) {
- /* swapgs */
- tcg_gen_ld_tl(cpu_T[0], cpu_env, offsetof(CPUX86State,segs[R_GS].base));
- tcg_gen_ld_tl(cpu_T[1], cpu_env, offsetof(CPUX86State,kernelgsbase));
- tcg_gen_st_tl(cpu_T[1], cpu_env, offsetof(CPUX86State,segs[R_GS].base));
- tcg_gen_st_tl(cpu_T[0], cpu_env, offsetof(CPUX86State,kernelgsbase));
+ if (CODE64(s)) {
+ if (s->cpl != 0) {
+ gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+ } else {
+ tcg_gen_ld_tl(cpu_T[0], cpu_env,
+ offsetof(CPUX86State,segs[R_GS].base));
+ tcg_gen_ld_tl(cpu_T[1], cpu_env,
+ offsetof(CPUX86State,kernelgsbase));
+ tcg_gen_st_tl(cpu_T[1], cpu_env,
+ offsetof(CPUX86State,segs[R_GS].base));
+ tcg_gen_st_tl(cpu_T[0], cpu_env,
+ offsetof(CPUX86State,kernelgsbase));
+ }
} else
#endif
{
goto illegal_op;
}
- } else {
+ break;
+ case 1: /* rdtscp */
+ if (!(s->cpuid_ext2_features & CPUID_EXT2_RDTSCP))
+ goto illegal_op;
if (s->cc_op != CC_OP_DYNAMIC)
gen_op_set_cc_op(s->cc_op);
gen_jmp_im(pc_start - s->cs_base);
- gen_lea_modrm(s, modrm, &reg_addr, &offset_addr);
- gen_helper_invlpg(cpu_A0);
- gen_jmp_im(s->pc - s->cs_base);
- gen_eob(s);
+ if (use_icount)
+ gen_io_start();
+ gen_helper_rdtscp();
+ if (use_icount) {
+ gen_io_end();
+ gen_jmp(s, s->pc - s->cs_base);
+ }
+ break;
+ default:
+ goto illegal_op;
}
}
break;
--
1.6.1.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 1/4] TCG x86: implement lzcnt emulation
2009-09-18 22:30 ` [Qemu-devel] [PATCH 1/4] TCG x86: implement lzcnt emulation Andre Przywara
@ 2009-10-04 12:02 ` Aurelien Jarno
2009-10-05 8:20 ` [Qemu-devel] [PATCH 1/4 v2] " Andre Przywara
0 siblings, 1 reply; 12+ messages in thread
From: Aurelien Jarno @ 2009-10-04 12:02 UTC (permalink / raw)
To: Andre Przywara; +Cc: qemu-devel
On Sat, Sep 19, 2009 at 12:30:46AM +0200, Andre Przywara wrote:
> lzcnt is an instruction introduced with AMD Phenom/Barcelona that
> returns the number of leading zero bits in a word.
> As this is similar to the "bsr" instruction, reuse the existing
> code. Some further changes are needed, though, as lzcnt always
> returns a valid value (in contrast to bsr, which has a special
> case when the operand is 0).
> lzcnt is guarded by the ABM CPUID bit (Fn8000_0001:ECX[5]).
While it's probably a good idea to reuse the existing bsr code, I think
they should be different helpers. In helper.c, bsr and lzcnt helpers can
then call the same function with different arguments.
> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
> ---
> target-i386/helper.h | 2 +-
> target-i386/op_helper.c | 11 ++++++++---
> target-i386/translate.c | 37 +++++++++++++++++++++++++------------
> 3 files changed, 34 insertions(+), 16 deletions(-)
>
> diff --git a/target-i386/helper.h b/target-i386/helper.h
> index 68d57b1..38d0708 100644
> --- a/target-i386/helper.h
> +++ b/target-i386/helper.h
> @@ -191,7 +191,7 @@ DEF_HELPER_2(frstor, void, tl, int)
> DEF_HELPER_2(fxsave, void, tl, int)
> DEF_HELPER_2(fxrstor, void, tl, int)
> DEF_HELPER_1(bsf, tl, tl)
> -DEF_HELPER_1(bsr, tl, tl)
> +DEF_HELPER_2(bsr, tl, tl, int)
>
> /* MMX/SSE */
>
> diff --git a/target-i386/op_helper.c b/target-i386/op_helper.c
> index c3f5af6..9ffda36 100644
> --- a/target-i386/op_helper.c
> +++ b/target-i386/op_helper.c
> @@ -5457,11 +5457,14 @@ target_ulong helper_bsf(target_ulong t0)
> return count;
> }
>
> -target_ulong helper_bsr(target_ulong t0)
> +target_ulong helper_bsr(target_ulong t0, int lzcnt)
> {
> int count;
> target_ulong res, mask;
> -
> +
> + if (lzcnt > 0 && t0 == 0) {
> + return lzcnt;
> + }
> res = t0;
> count = TARGET_LONG_BITS - 1;
> mask = (target_ulong)1 << (TARGET_LONG_BITS - 1);
> @@ -5469,10 +5472,12 @@ target_ulong helper_bsr(target_ulong t0)
> count--;
> res <<= 1;
> }
> + if (lzcnt > 0) {
> + return lzcnt - 1 - count;
> + }
> return count;
> }
>
> -
> static int compute_all_eflags(void)
> {
> return CC_SRC;
> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index 335fc08..aaa4492 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -6538,22 +6538,35 @@ static target_ulong disas_insn(DisasContext *s, target_ulong pc_start)
> reg = ((modrm >> 3) & 7) | rex_r;
> gen_ldst_modrm(s, modrm, ot, OR_TMP0, 0);
> gen_extu(ot, cpu_T[0]);
> - label1 = gen_new_label();
> - tcg_gen_movi_tl(cpu_cc_dst, 0);
> t0 = tcg_temp_local_new();
> tcg_gen_mov_tl(t0, cpu_T[0]);
> - tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, label1);
> - if (b & 1) {
> - gen_helper_bsr(cpu_T[0], t0);
> + if ((b & 1) && (prefixes & PREFIX_REPZ) &&
> + (s->cpuid_ext3_features & CPUID_EXT3_ABM)) {
> + switch(ot) {
> + case OT_WORD: gen_helper_bsr(cpu_T[0], t0,
> + tcg_const_i32(16)); break;
> + case OT_LONG: gen_helper_bsr(cpu_T[0], t0,
> + tcg_const_i32(32)); break;
> + case OT_QUAD: gen_helper_bsr(cpu_T[0], t0,
> + tcg_const_i32(64)); break;
> + }
> + gen_op_mov_reg_T0(ot, reg);
> } else {
> - gen_helper_bsf(cpu_T[0], t0);
> + label1 = gen_new_label();
> + tcg_gen_movi_tl(cpu_cc_dst, 0);
> + tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, label1);
> + if (b & 1) {
> + gen_helper_bsr(cpu_T[0], t0, tcg_const_i32(0));
> + } else {
> + gen_helper_bsf(cpu_T[0], t0);
> + }
> + gen_op_mov_reg_T0(ot, reg);
> + tcg_gen_movi_tl(cpu_cc_dst, 1);
> + gen_set_label(label1);
> + tcg_gen_discard_tl(cpu_cc_src);
> + s->cc_op = CC_OP_LOGICB + ot;
> + tcg_temp_free(t0);
> }
> - gen_op_mov_reg_T0(ot, reg);
> - tcg_gen_movi_tl(cpu_cc_dst, 1);
> - gen_set_label(label1);
> - tcg_gen_discard_tl(cpu_cc_src);
> - s->cc_op = CC_OP_LOGICB + ot;
> - tcg_temp_free(t0);
> }
> break;
> /************************/
> --
> 1.6.1.3
>
>
>
>
>
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 2/4] TCG x86: add lock mov cr0 = cr8
2009-09-18 22:30 ` [Qemu-devel] [PATCH 2/4] TCG x86: add lock mov cr0 = cr8 Andre Przywara
@ 2009-10-04 12:06 ` Aurelien Jarno
0 siblings, 0 replies; 12+ messages in thread
From: Aurelien Jarno @ 2009-10-04 12:06 UTC (permalink / raw)
To: Andre Przywara; +Cc: qemu-devel
On Sat, Sep 19, 2009 at 12:30:47AM +0200, Andre Przywara wrote:
> AMD CPUs feature a shortcut to access CR8 even from 32-bit mode.
> If you use the LOCK prefix with "mov CR0", it accesses CR8 instead.
> This behavior is guarded by the CR8_LEGACY CPUID bit
> (Fn8000_0001:ECX[1]).
Thanks, applied.
> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
> ---
> target-i386/translate.c | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index aaa4492..134c870 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -7362,6 +7362,10 @@ static target_ulong disas_insn(DisasContext *s, target_ulong pc_start)
> ot = OT_QUAD;
> else
> ot = OT_LONG;
> + if ((prefixes & PREFIX_LOCK) && (reg == 0) &&
> + (s->cpuid_ext3_features & CPUID_EXT3_CR8LEG)) {
> + reg = 8;
> + }
> switch(reg) {
> case 0:
> case 2:
> --
> 1.6.1.3
>
>
>
>
>
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] TCG x86: add SSE4a instruction support
2009-09-18 22:30 ` [Qemu-devel] [PATCH 3/4] TCG x86: add SSE4a instruction support Andre Przywara
@ 2009-10-04 12:10 ` Aurelien Jarno
0 siblings, 0 replies; 12+ messages in thread
From: Aurelien Jarno @ 2009-10-04 12:10 UTC (permalink / raw)
To: Andre Przywara; +Cc: qemu-devel
On Sat, Sep 19, 2009 at 12:30:48AM +0200, Andre Przywara wrote:
> This adds support for the AMD Phenom/Barcelona's SSE4a instructions.
> These include insertq and extrq, which shift and mask XMM registers
> and come in two versions (shift/length values given as immediates or
> stored in another XMM register).
> Additionally it implements movntss and movntsd, which are scalar
> non-temporal stores (avoiding cache thrashing). These are implemented
> as normal stores, though.
> SSE4a is guarded by the SSE4A CPUID bit (Fn8000_0001:ECX[6]).
Thanks, applied.
> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
> ---
> target-i386/ops_sse.h | 44 ++++++++++++++++++++++++++++++++++++++++++
> target-i386/ops_sse_header.h | 4 +++
> target-i386/translate.c | 39 +++++++++++++++++++++++++++++++++++-
> 3 files changed, 85 insertions(+), 2 deletions(-)
>
> diff --git a/target-i386/ops_sse.h b/target-i386/ops_sse.h
> index 709732a..3232abd 100644
> --- a/target-i386/ops_sse.h
> +++ b/target-i386/ops_sse.h
> @@ -802,6 +802,50 @@ void helper_rcpss(XMMReg *d, XMMReg *s)
> d->XMM_S(0) = approx_rcp(s->XMM_S(0));
> }
>
> +static inline uint64_t helper_extrq(uint64_t src, int shift, int len)
> +{
> + uint64_t mask;
> +
> + if (len == 0) {
> + mask = ~0LL;
> + } else {
> + mask = (1ULL << len) - 1;
> + }
> + return (src >> shift) & mask;
> +}
> +
> +void helper_extrq_r(XMMReg *d, XMMReg *s)
> +{
> + d->XMM_Q(0) = helper_extrq(d->XMM_Q(0), s->XMM_B(1), s->XMM_B(0));
> +}
> +
> +void helper_extrq_i(XMMReg *d, int index, int length)
> +{
> + d->XMM_Q(0) = helper_extrq(d->XMM_Q(0), index, length);
> +}
> +
> +static inline uint64_t helper_insertq(uint64_t src, int shift, int len)
> +{
> + uint64_t mask;
> +
> + if (len == 0) {
> + mask = ~0ULL;
> + } else {
> + mask = (1ULL << len) - 1;
> + }
> + return (src & ~(mask << shift)) | ((src & mask) << shift);
> +}
> +
> +void helper_insertq_r(XMMReg *d, XMMReg *s)
> +{
> + d->XMM_Q(0) = helper_insertq(s->XMM_Q(0), s->XMM_B(9), s->XMM_B(8));
> +}
> +
> +void helper_insertq_i(XMMReg *d, int index, int length)
> +{
> + d->XMM_Q(0) = helper_insertq(d->XMM_Q(0), index, length);
> +}
> +
> void helper_haddps(XMMReg *d, XMMReg *s)
> {
> XMMReg r;
> diff --git a/target-i386/ops_sse_header.h b/target-i386/ops_sse_header.h
> index 53add99..a0a6361 100644
> --- a/target-i386/ops_sse_header.h
> +++ b/target-i386/ops_sse_header.h
> @@ -187,6 +187,10 @@ DEF_HELPER_2(rsqrtps, void, XMMReg, XMMReg)
> DEF_HELPER_2(rsqrtss, void, XMMReg, XMMReg)
> DEF_HELPER_2(rcpps, void, XMMReg, XMMReg)
> DEF_HELPER_2(rcpss, void, XMMReg, XMMReg)
> +DEF_HELPER_2(extrq_r, void, XMMReg, XMMReg)
> +DEF_HELPER_3(extrq_i, void, XMMReg, int, int)
> +DEF_HELPER_2(insertq_r, void, XMMReg, XMMReg)
> +DEF_HELPER_3(insertq_i, void, XMMReg, int, int)
> DEF_HELPER_2(haddps, void, XMMReg, XMMReg)
> DEF_HELPER_2(haddpd, void, XMMReg, XMMReg)
> DEF_HELPER_2(hsubps, void, XMMReg, XMMReg)
> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index 134c870..5cbcf07 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -2822,7 +2822,7 @@ static void *sse_op_table1[256][4] = {
> [0x28] = { SSE_SPECIAL, SSE_SPECIAL }, /* movaps, movapd */
> [0x29] = { SSE_SPECIAL, SSE_SPECIAL }, /* movaps, movapd */
> [0x2a] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* cvtpi2ps, cvtpi2pd, cvtsi2ss, cvtsi2sd */
> - [0x2b] = { SSE_SPECIAL, SSE_SPECIAL }, /* movntps, movntpd */
> + [0x2b] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movntps, movntpd, movntss, movntsd */
> [0x2c] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* cvttps2pi, cvttpd2pi, cvttsd2si, cvttss2si */
> [0x2d] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* cvtps2pi, cvtpd2pi, cvtsd2si, cvtss2si */
> [0x2e] = { gen_helper_ucomiss, gen_helper_ucomisd },
> @@ -2879,6 +2879,8 @@ static void *sse_op_table1[256][4] = {
> [0x75] = MMX_OP2(pcmpeqw),
> [0x76] = MMX_OP2(pcmpeql),
> [0x77] = { SSE_DUMMY }, /* emms */
> + [0x78] = { NULL, SSE_SPECIAL, NULL, SSE_SPECIAL }, /* extrq_i, insertq_i */
> + [0x79] = { NULL, gen_helper_extrq_r, NULL, gen_helper_insertq_r },
> [0x7c] = { NULL, gen_helper_haddpd, NULL, gen_helper_haddps },
> [0x7d] = { NULL, gen_helper_hsubpd, NULL, gen_helper_hsubps },
> [0x7e] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movd, movd, , movq */
> @@ -3165,6 +3167,20 @@ static void gen_sse(DisasContext *s, int b, target_ulong pc_start, int rex_r)
> gen_lea_modrm(s, modrm, &reg_addr, &offset_addr);
> gen_sto_env_A0(s->mem_index, offsetof(CPUX86State,xmm_regs[reg]));
> break;
> + case 0x22b: /* movntss */
> + case 0x32b: /* movntsd */
> + if (mod == 3)
> + goto illegal_op;
> + gen_lea_modrm(s, modrm, &reg_addr, &offset_addr);
> + if (b1 & 1) {
> + gen_stq_env_A0(s->mem_index, offsetof(CPUX86State,
> + xmm_regs[reg]));
> + } else {
> + tcg_gen_ld32u_tl(cpu_T[0], cpu_env, offsetof(CPUX86State,
> + xmm_regs[reg].XMM_L(0)));
> + gen_op_st_T0_A0(OT_LONG + s->mem_index);
> + }
> + break;
> case 0x6e: /* movd mm, ea */
> #ifdef TARGET_X86_64
> if (s->dflag == 2) {
> @@ -3320,6 +3336,25 @@ static void gen_sse(DisasContext *s, int b, target_ulong pc_start, int rex_r)
> gen_op_movl(offsetof(CPUX86State,xmm_regs[reg].XMM_L(2)),
> offsetof(CPUX86State,xmm_regs[reg].XMM_L(3)));
> break;
> + case 0x178:
> + case 0x378:
> + {
> + int bit_index, field_length;
> +
> + if (b1 == 1 && reg != 0)
> + goto illegal_op;
> + field_length = ldub_code(s->pc++) & 0x3F;
> + bit_index = ldub_code(s->pc++) & 0x3F;
> + tcg_gen_addi_ptr(cpu_ptr0, cpu_env,
> + offsetof(CPUX86State,xmm_regs[reg]));
> + if (b1 == 1)
> + gen_helper_extrq_i(cpu_ptr0, tcg_const_i32(bit_index),
> + tcg_const_i32(field_length));
> + else
> + gen_helper_insertq_i(cpu_ptr0, tcg_const_i32(bit_index),
> + tcg_const_i32(field_length));
> + }
> + break;
> case 0x7e: /* movd ea, mm */
> #ifdef TARGET_X86_64
> if (s->dflag == 2) {
> @@ -7566,7 +7601,7 @@ static target_ulong disas_insn(DisasContext *s, target_ulong pc_start)
> case 0x110 ... 0x117:
> case 0x128 ... 0x12f:
> case 0x138 ... 0x13a:
> - case 0x150 ... 0x177:
> + case 0x150 ... 0x179:
> case 0x17c ... 0x17f:
> case 0x1c2:
> case 0x1c4 ... 0x1c6:
> --
> 1.6.1.3
>
>
>
>
>
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 4/4] TCG x86: add RDTSCP support
2009-09-18 22:30 ` [Qemu-devel] [PATCH 4/4] TCG x86: add RDTSCP support Andre Przywara
@ 2009-10-04 12:47 ` Aurelien Jarno
0 siblings, 0 replies; 12+ messages in thread
From: Aurelien Jarno @ 2009-10-04 12:47 UTC (permalink / raw)
To: Andre Przywara; +Cc: qemu-devel
On Sat, Sep 19, 2009 at 12:30:49AM +0200, Andre Przywara wrote:
> RDTSCP reads the time stamp counter and, atomically with it, the
> content of a 32-bit MSR, which can be freely set by the OS. This
> allows CPU local data to be queried by userspace.
> Linux uses this to allow a fast implementation of the getcpu()
> syscall, which uses the vsyscall page to avoid a context switch.
> AMD CPUs since K8RevF and Intel CPUs since Nehalem support this
> instruction.
> RDTSCP is guarded by the RDTSCP CPUID bit (Fn8000_0001:EDX[27]).
Thanks, applied.
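As an illustration of the semantics described above, here is a minimal sketch in plain C rather than the TCG helper itself. rdtscp_model and struct rdtscp_result are hypothetical names. The split of the aux value into a CPU number (low 12 bits) and a node number (remaining bits) is only an assumption about how an OS might pack the MSR; the patch itself treats the value as opaque.

```c
#include <stdint.h>

struct rdtscp_result {
    uint32_t eax;  /* low half of the time stamp counter */
    uint32_t edx;  /* high half of the time stamp counter */
    uint32_t ecx;  /* low 32 bits of the TSC_AUX MSR */
};

/* Model of the register state after rdtscp for a given TSC and aux MSR. */
static struct rdtscp_result rdtscp_model(uint64_t tsc, uint64_t tsc_aux)
{
    struct rdtscp_result r;

    r.eax = (uint32_t)tsc;
    r.edx = (uint32_t)(tsc >> 32);
    r.ecx = (uint32_t)tsc_aux;
    return r;
}

/* Assumed packing: cpu in bits 0..11, node in bits 12 and up. */
static unsigned aux_to_cpu(uint32_t aux)
{
    return aux & 0xFFF;
}

static unsigned aux_to_node(uint32_t aux)
{
    return aux >> 12;
}
```

With this packing, a getcpu()-style lookup needs only the ECX result of a single rdtscp, which is why no syscall is required.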
> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
> ---
> target-i386/cpu.h | 5 +++-
> target-i386/helper.h | 1 +
> target-i386/machine.c | 4 +++
> target-i386/op_helper.c | 12 ++++++++++
> target-i386/translate.c | 57 ++++++++++++++++++++++++++++++++++------------
> 5 files changed, 63 insertions(+), 16 deletions(-)
>
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index b9a6392..f318942 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -322,6 +322,7 @@
> #define MSR_FSBASE 0xc0000100
> #define MSR_GSBASE 0xc0000101
> #define MSR_KERNELGSBASE 0xc0000102
> +#define MSR_TSC_AUX 0xc0000103
>
> #define MSR_VM_HSAVE_PA 0xc0010117
>
> @@ -694,6 +695,8 @@ typedef struct CPUX86State {
> uint64 mcg_status;
> uint64 mcg_ctl;
> uint64 *mce_banks;
> +
> + uint64_t tsc_aux;
> } CPUX86State;
>
> CPUX86State *cpu_x86_init(const char *cpu_model);
> @@ -854,7 +857,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
> #define cpu_signal_handler cpu_x86_signal_handler
> #define cpu_list x86_cpu_list
>
> -#define CPU_SAVE_VERSION 10
> +#define CPU_SAVE_VERSION 11
>
> /* MMU modes definitions */
> #define MMU_MODE0_SUFFIX _kernel
> diff --git a/target-i386/helper.h b/target-i386/helper.h
> index 38d0708..ef8d4e1 100644
> --- a/target-i386/helper.h
> +++ b/target-i386/helper.h
> @@ -80,6 +80,7 @@ DEF_HELPER_1(cmpxchg16b, void, tl)
> DEF_HELPER_0(single_step, void)
> DEF_HELPER_0(cpuid, void)
> DEF_HELPER_0(rdtsc, void)
> +DEF_HELPER_0(rdtscp, void)
> DEF_HELPER_0(rdpmc, void)
> DEF_HELPER_0(rdmsr, void)
> DEF_HELPER_0(wrmsr, void)
> diff --git a/target-i386/machine.c b/target-i386/machine.c
> index ab31329..e5a060f 100644
> --- a/target-i386/machine.c
> +++ b/target-i386/machine.c
> @@ -171,6 +171,7 @@ void cpu_save(QEMUFile *f, void *opaque)
> qemu_put_be64s(f, &env->mce_banks[4*i + 3]);
> }
> }
> + qemu_put_be64s(f, &env->tsc_aux);
> }
>
> #ifdef USE_X86LDOUBLE
> @@ -377,6 +378,9 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
> }
> }
>
> + if (version_id >= 11) {
> + qemu_get_be64s(f, &env->tsc_aux);
> + }
> /* XXX: ensure compatiblity for halted bit ? */
> /* XXX: compute redundant hflags bits */
> env->hflags = hflags;
> diff --git a/target-i386/op_helper.c b/target-i386/op_helper.c
> index 9ffda36..a457bdd 100644
> --- a/target-i386/op_helper.c
> +++ b/target-i386/op_helper.c
> @@ -2969,6 +2969,12 @@ void helper_rdtsc(void)
> EDX = (uint32_t)(val >> 32);
> }
>
> +void helper_rdtscp(void)
> +{
> + helper_rdtsc();
> + ECX = (uint32_t)(env->tsc_aux);
> +}
> +
> void helper_rdpmc(void)
> {
> if ((env->cr[4] & CR4_PCE_MASK) && ((env->hflags & HF_CPL_MASK) != 0)) {
> @@ -3107,6 +3113,9 @@ void helper_wrmsr(void)
> && (val == 0 || val == ~(uint64_t)0))
> env->mcg_ctl = val;
> break;
> + case MSR_TSC_AUX:
> + env->tsc_aux = val;
> + break;
> default:
> if ((uint32_t)ECX >= MSR_MC0_CTL
> && (uint32_t)ECX < MSR_MC0_CTL + (4 * env->mcg_cap & 0xff)) {
> @@ -3177,6 +3186,9 @@ void helper_rdmsr(void)
> case MSR_KERNELGSBASE:
> val = env->kernelgsbase;
> break;
> + case MSR_TSC_AUX:
> + val = env->tsc_aux;
> + break;
> #endif
> case MSR_MTRRphysBase(0):
> case MSR_MTRRphysBase(1):
> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index 5cbcf07..306685d 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -7217,31 +7217,58 @@ static target_ulong disas_insn(DisasContext *s, target_ulong pc_start)
> gen_eob(s);
> }
> break;
> - case 7: /* invlpg */
> - if (s->cpl != 0) {
> - gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
> + case 7:
> + if (mod != 3) { /* invlpg */
> + if (s->cpl != 0) {
> + gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
> + } else {
> + if (s->cc_op != CC_OP_DYNAMIC)
> + gen_op_set_cc_op(s->cc_op);
> + gen_jmp_im(pc_start - s->cs_base);
> + gen_lea_modrm(s, modrm, &reg_addr, &offset_addr);
> + gen_helper_invlpg(cpu_A0);
> + gen_jmp_im(s->pc - s->cs_base);
> + gen_eob(s);
> + }
> } else {
> - if (mod == 3) {
> + switch (rm) {
> + case 0: /* swapgs */
> #ifdef TARGET_X86_64
> - if (CODE64(s) && rm == 0) {
> - /* swapgs */
> - tcg_gen_ld_tl(cpu_T[0], cpu_env, offsetof(CPUX86State,segs[R_GS].base));
> - tcg_gen_ld_tl(cpu_T[1], cpu_env, offsetof(CPUX86State,kernelgsbase));
> - tcg_gen_st_tl(cpu_T[1], cpu_env, offsetof(CPUX86State,segs[R_GS].base));
> - tcg_gen_st_tl(cpu_T[0], cpu_env, offsetof(CPUX86State,kernelgsbase));
> + if (CODE64(s)) {
> + if (s->cpl != 0) {
> + gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
> + } else {
> + tcg_gen_ld_tl(cpu_T[0], cpu_env,
> + offsetof(CPUX86State,segs[R_GS].base));
> + tcg_gen_ld_tl(cpu_T[1], cpu_env,
> + offsetof(CPUX86State,kernelgsbase));
> + tcg_gen_st_tl(cpu_T[1], cpu_env,
> + offsetof(CPUX86State,segs[R_GS].base));
> + tcg_gen_st_tl(cpu_T[0], cpu_env,
> + offsetof(CPUX86State,kernelgsbase));
> + }
> } else
> #endif
> {
> goto illegal_op;
> }
> - } else {
> + break;
> + case 1: /* rdtscp */
> + if (!(s->cpuid_ext2_features & CPUID_EXT2_RDTSCP))
> + goto illegal_op;
> if (s->cc_op != CC_OP_DYNAMIC)
> gen_op_set_cc_op(s->cc_op);
> gen_jmp_im(pc_start - s->cs_base);
> - gen_lea_modrm(s, modrm, &reg_addr, &offset_addr);
> - gen_helper_invlpg(cpu_A0);
> - gen_jmp_im(s->pc - s->cs_base);
> - gen_eob(s);
> + if (use_icount)
> + gen_io_start();
> + gen_helper_rdtscp();
> + if (use_icount) {
> + gen_io_end();
> + gen_jmp(s, s->pc - s->cs_base);
> + }
> + break;
> + default:
> + goto illegal_op;
> }
> }
> break;
> --
> 1.6.1.3
>
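The reworked case 7 dispatches on the ModRM byte: a memory operand (mod != 3) is invlpg, while the register forms select swapgs (rm == 0) or rdtscp (rm == 1). A minimal sketch of that decision follows, with the privilege and CPUID checks left out. classify_group7_7 and the enum are hypothetical names used only for illustration.

```c
#include <stdint.h>

enum g7_insn { G7_INVLPG, G7_SWAPGS, G7_RDTSCP, G7_ILLEGAL };

/* Classify a ModRM byte whose reg field is already known to be 7. */
static enum g7_insn classify_group7_7(uint8_t modrm, int code64)
{
    unsigned mod = (modrm >> 6) & 3;
    unsigned rm = modrm & 7;

    if (mod != 3) {
        return G7_INVLPG;                        /* memory operand */
    }
    switch (rm) {
    case 0:
        return code64 ? G7_SWAPGS : G7_ILLEGAL;  /* 64-bit mode only */
    case 1:
        return G7_RDTSCP;
    default:
        return G7_ILLEGAL;
    }
}
```

For example, the byte sequences 0F 01 F8 and 0F 01 F9 carry the ModRM values 0xF8 and 0xF9, which this sketch maps to swapgs and rdtscp respectively.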
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
* [Qemu-devel] [PATCH 1/4 v2] TCG x86: implement lzcnt emulation
2009-10-04 12:02 ` Aurelien Jarno
@ 2009-10-05 8:20 ` Andre Przywara
2009-10-05 13:57 ` [Qemu-devel] " Aurelien Jarno
0 siblings, 1 reply; 12+ messages in thread
From: Andre Przywara @ 2009-10-05 8:20 UTC (permalink / raw)
To: aurelien; +Cc: Andre Przywara, qemu-devel
lzcnt is an instruction added with AMD Phenom/Barcelona that returns
the number of leading zero bits in a word.
As this is similar to the "bsr" instruction, reuse the existing
code. Some further changes are needed, though, as lzcnt always
returns a valid value (in contrast to bsr, which has a special
case when the operand is 0).
lzcnt is guarded by the ABM CPUID bit (Fn8000_0001:ECX[5]).
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
---
target-i386/helper.h | 1 +
target-i386/op_helper.c | 14 ++++++++++++--
target-i386/translate.c | 37 +++++++++++++++++++++++++------------
3 files changed, 38 insertions(+), 14 deletions(-)
Aurelien,
this version addresses your comments.
Thanks for the review (and the other commits)!
Regards,
Andre.
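A minimal, self-contained sketch of the semantics the shared helper is meant to provide, assuming a 64-bit target_ulong and an input already zero-extended to the operand size (as gen_extu does). scan_model and MODEL_LONG_BITS are hypothetical stand-ins, not the code in the patch: wordsize 0 selects the bsr result (index of the highest set bit), any other value selects the lzcnt result.

```c
#include <stdint.h>

#define MODEL_LONG_BITS 64  /* assumption: 64-bit target_ulong */

static uint64_t scan_model(uint64_t t0, int wordsize)
{
    int count = MODEL_LONG_BITS - 1;

    if (t0 == 0) {
        /* lzcnt of zero is the operand width; bsr of zero is undefined
         * on hardware, so 0 is only a placeholder here. */
        return wordsize > 0 ? (uint64_t)wordsize : 0;
    }
    /* Walk down from the top bit to the highest set bit. */
    while (((t0 >> count) & 1) == 0) {
        count--;
    }
    return wordsize > 0 ? (uint64_t)(wordsize - 1 - count) : (uint64_t)count;
}
```

For a non-zero operand the two results are related by lzcnt = wordsize - 1 - bsr, which is why one loop can serve both instructions.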
diff --git a/target-i386/helper.h b/target-i386/helper.h
index ca953f4..6b518ad 100644
--- a/target-i386/helper.h
+++ b/target-i386/helper.h
@@ -193,6 +193,7 @@ DEF_HELPER_2(fxsave, void, tl, int)
DEF_HELPER_2(fxrstor, void, tl, int)
DEF_HELPER_1(bsf, tl, tl)
DEF_HELPER_1(bsr, tl, tl)
+DEF_HELPER_2(lzcnt, tl, tl, int)
/* MMX/SSE */
diff --git a/target-i386/op_helper.c b/target-i386/op_helper.c
index 26fe612..5eea322 100644
--- a/target-i386/op_helper.c
+++ b/target-i386/op_helper.c
@@ -5479,11 +5479,14 @@ target_ulong helper_bsf(target_ulong t0)
return count;
}
-target_ulong helper_bsr(target_ulong t0)
+target_ulong helper_lzcnt(target_ulong t0, int wordsize)
{
int count;
target_ulong res, mask;
-
+
+ if (wordsize > 0 && t0 == 0) {
+ return wordsize;
+ }
res = t0;
count = TARGET_LONG_BITS - 1;
mask = (target_ulong)1 << (TARGET_LONG_BITS - 1);
@@ -5491,9 +5494,16 @@ target_ulong helper_bsr(target_ulong t0)
count--;
res <<= 1;
}
+ if (wordsize > 0) {
+ return wordsize - 1 - count;
+ }
return count;
}
+target_ulong helper_bsr(target_ulong t0)
+{
+ return helper_lzcnt(t0, 0);
+}
static int compute_all_eflags(void)
{
diff --git a/target-i386/translate.c b/target-i386/translate.c
index e3cb49f..5cbdce1 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -6575,22 +6575,35 @@ static target_ulong disas_insn(DisasContext *s, target_ulong pc_start)
reg = ((modrm >> 3) & 7) | rex_r;
gen_ldst_modrm(s, modrm, ot, OR_TMP0, 0);
gen_extu(ot, cpu_T[0]);
- label1 = gen_new_label();
- tcg_gen_movi_tl(cpu_cc_dst, 0);
t0 = tcg_temp_local_new();
tcg_gen_mov_tl(t0, cpu_T[0]);
- tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, label1);
- if (b & 1) {
- gen_helper_bsr(cpu_T[0], t0);
+ if ((b & 1) && (prefixes & PREFIX_REPZ) &&
+ (s->cpuid_ext3_features & CPUID_EXT3_ABM)) {
+ switch(ot) {
+ case OT_WORD: gen_helper_lzcnt(cpu_T[0], t0,
+ tcg_const_i32(16)); break;
+ case OT_LONG: gen_helper_lzcnt(cpu_T[0], t0,
+ tcg_const_i32(32)); break;
+ case OT_QUAD: gen_helper_lzcnt(cpu_T[0], t0,
+ tcg_const_i32(64)); break;
+ }
+ gen_op_mov_reg_T0(ot, reg);
} else {
- gen_helper_bsf(cpu_T[0], t0);
+ label1 = gen_new_label();
+ tcg_gen_movi_tl(cpu_cc_dst, 0);
+ tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, label1);
+ if (b & 1) {
+ gen_helper_bsr(cpu_T[0], t0);
+ } else {
+ gen_helper_bsf(cpu_T[0], t0);
+ }
+ gen_op_mov_reg_T0(ot, reg);
+ tcg_gen_movi_tl(cpu_cc_dst, 1);
+ gen_set_label(label1);
+ tcg_gen_discard_tl(cpu_cc_src);
+ s->cc_op = CC_OP_LOGICB + ot;
+ tcg_temp_free(t0);
}
- gen_op_mov_reg_T0(ot, reg);
- tcg_gen_movi_tl(cpu_cc_dst, 1);
- gen_set_label(label1);
- tcg_gen_discard_tl(cpu_cc_src);
- s->cc_op = CC_OP_LOGICB + ot;
- tcg_temp_free(t0);
}
break;
/************************/
--
1.6.1.3
* [Qemu-devel] Re: [PATCH 1/4 v2] TCG x86: implement lzcnt emulation
2009-10-05 8:20 ` [Qemu-devel] [PATCH 1/4 v2] " Andre Przywara
@ 2009-10-05 13:57 ` Aurelien Jarno
2009-10-23 11:44 ` [Qemu-devel] [PATCH v3] " Andre Przywara
0 siblings, 1 reply; 12+ messages in thread
From: Aurelien Jarno @ 2009-10-05 13:57 UTC (permalink / raw)
To: Andre Przywara; +Cc: qemu-devel
On Mon, Oct 05, 2009 at 10:20:01AM +0200, Andre Przywara wrote:
> lzcnt is an instruction added with AMD Phenom/Barcelona that returns
> the number of leading zero bits in a word.
> As this is similar to the "bsr" instruction, reuse the existing
> code. Some further changes are needed, though, as lzcnt always
> returns a valid value (in contrast to bsr, which has a special
> case when the operand is 0).
> lzcnt is guarded by the ABM CPUID bit (Fn8000_0001:ECX[5]).
>
> Signed-off-by: Andre Przywara <andre.przywara@amd.com>
> ---
> target-i386/helper.h | 1 +
> target-i386/op_helper.c | 14 ++++++++++++--
> target-i386/translate.c | 37 +++++++++++++++++++++++++------------
> 3 files changed, 38 insertions(+), 14 deletions(-)
>
> Aurelien,
>
> this version addresses your comments.
>
> Thanks for the review (and the other commits)!
>
Thanks for the new version. There is still a minor issue that I did not
spot in the first review. See the inline comment.
> diff --git a/target-i386/helper.h b/target-i386/helper.h
> index ca953f4..6b518ad 100644
> --- a/target-i386/helper.h
> +++ b/target-i386/helper.h
> @@ -193,6 +193,7 @@ DEF_HELPER_2(fxsave, void, tl, int)
> DEF_HELPER_2(fxrstor, void, tl, int)
> DEF_HELPER_1(bsf, tl, tl)
> DEF_HELPER_1(bsr, tl, tl)
> +DEF_HELPER_2(lzcnt, tl, tl, int)
>
> /* MMX/SSE */
>
> diff --git a/target-i386/op_helper.c b/target-i386/op_helper.c
> index 26fe612..5eea322 100644
> --- a/target-i386/op_helper.c
> +++ b/target-i386/op_helper.c
> @@ -5479,11 +5479,14 @@ target_ulong helper_bsf(target_ulong t0)
> return count;
> }
>
> -target_ulong helper_bsr(target_ulong t0)
> +target_ulong helper_lzcnt(target_ulong t0, int wordsize)
> {
> int count;
> target_ulong res, mask;
> -
> +
> + if (wordsize > 0 && t0 == 0) {
> + return wordsize;
> + }
> res = t0;
> count = TARGET_LONG_BITS - 1;
> mask = (target_ulong)1 << (TARGET_LONG_BITS - 1);
> @@ -5491,9 +5494,16 @@ target_ulong helper_bsr(target_ulong t0)
> count--;
> res <<= 1;
> }
> + if (wordsize > 0) {
> + return wordsize - 1 - count;
> + }
> return count;
> }
>
> +target_ulong helper_bsr(target_ulong t0)
> +{
> + return helper_lzcnt(t0, 0);
> +}
>
> static int compute_all_eflags(void)
> {
> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index e3cb49f..5cbdce1 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -6575,22 +6575,35 @@ static target_ulong disas_insn(DisasContext *s, target_ulong pc_start)
> reg = ((modrm >> 3) & 7) | rex_r;
> gen_ldst_modrm(s, modrm, ot, OR_TMP0, 0);
> gen_extu(ot, cpu_T[0]);
> - label1 = gen_new_label();
> - tcg_gen_movi_tl(cpu_cc_dst, 0);
> t0 = tcg_temp_local_new();
> tcg_gen_mov_tl(t0, cpu_T[0]);
> - tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, label1);
> - if (b & 1) {
> - gen_helper_bsr(cpu_T[0], t0);
> + if ((b & 1) && (prefixes & PREFIX_REPZ) &&
> + (s->cpuid_ext3_features & CPUID_EXT3_ABM)) {
> + switch(ot) {
> + case OT_WORD: gen_helper_lzcnt(cpu_T[0], t0,
> + tcg_const_i32(16)); break;
> + case OT_LONG: gen_helper_lzcnt(cpu_T[0], t0,
> + tcg_const_i32(32)); break;
> + case OT_QUAD: gen_helper_lzcnt(cpu_T[0], t0,
> + tcg_const_i32(64)); break;
> + }
> + gen_op_mov_reg_T0(ot, reg);
> } else {
> - gen_helper_bsf(cpu_T[0], t0);
> + label1 = gen_new_label();
> + tcg_gen_movi_tl(cpu_cc_dst, 0);
> + tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, label1);
> + if (b & 1) {
> + gen_helper_bsr(cpu_T[0], t0);
> + } else {
> + gen_helper_bsf(cpu_T[0], t0);
> + }
> + gen_op_mov_reg_T0(ot, reg);
> + tcg_gen_movi_tl(cpu_cc_dst, 1);
> + gen_set_label(label1);
> + tcg_gen_discard_tl(cpu_cc_src);
> + s->cc_op = CC_OP_LOGICB + ot;
> + tcg_temp_free(t0);
> }
> - gen_op_mov_reg_T0(ot, reg);
> - tcg_gen_movi_tl(cpu_cc_dst, 1);
> - gen_set_label(label1);
> - tcg_gen_discard_tl(cpu_cc_src);
> - s->cc_op = CC_OP_LOGICB + ot;
> - tcg_temp_free(t0);
The tcg_temp_free(t0) call is missing in the lzcnt path. As it is common
to both paths, it is probably best not to move it from here.
> }
> break;
> /************************/
> --
> 1.6.1.3
>
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
* [Qemu-devel] [PATCH v3] TCG x86: implement lzcnt emulation
2009-10-05 13:57 ` [Qemu-devel] " Aurelien Jarno
@ 2009-10-23 11:44 ` Andre Przywara
0 siblings, 0 replies; 12+ messages in thread
From: Andre Przywara @ 2009-10-23 11:44 UTC (permalink / raw)
To: aurelien; +Cc: Andre Przywara, qemu-devel
lzcnt is an instruction added with AMD Phenom/Barcelona that returns
the number of leading zero bits in a word.
As this is similar to the "bsr" instruction, reuse the existing
code. Some further changes are needed, though, as lzcnt always
returns a valid value (in contrast to bsr, which has a special
case when the operand is 0).
lzcnt is guarded by the ABM CPUID bit (Fn8000_0001:ECX[5]).
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
---
target-i386/helper.h | 1 +
target-i386/op_helper.c | 14 ++++++++++++--
target-i386/translate.c | 37 +++++++++++++++++++++++++------------
3 files changed, 38 insertions(+), 14 deletions(-)
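The translator takes the lzcnt path only when three conditions hold together; otherwise the encoding falls back to bsr/bsf, with the prefix ignored. A minimal sketch of that choice follows. pick_bit_scan and the MODEL_ constants are hypothetical stand-ins for the names used in translate.c, and their numeric values are assumptions made for illustration.

```c
enum bit_scan { SCAN_BSF, SCAN_BSR, SCAN_LZCNT };

#define MODEL_PREFIX_REPZ 0x01u      /* stand-in for PREFIX_REPZ */
#define MODEL_EXT3_ABM    (1u << 5)  /* Fn8000_0001:ECX bit 5 */

/* b is the opcode; its low bit distinguishes bsr (1) from bsf (0). */
static enum bit_scan pick_bit_scan(unsigned b, unsigned prefixes,
                                   unsigned ext3_features)
{
    if ((b & 1) && (prefixes & MODEL_PREFIX_REPZ) &&
        (ext3_features & MODEL_EXT3_ABM)) {
        return SCAN_LZCNT;
    }
    return (b & 1) ? SCAN_BSR : SCAN_BSF;
}
```

Gating on the CPUID bit keeps the guest-visible behaviour consistent with the feature set the emulated CPU advertises.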
diff --git a/target-i386/helper.h b/target-i386/helper.h
index ca953f4..6b518ad 100644
--- a/target-i386/helper.h
+++ b/target-i386/helper.h
@@ -193,6 +193,7 @@ DEF_HELPER_2(fxsave, void, tl, int)
DEF_HELPER_2(fxrstor, void, tl, int)
DEF_HELPER_1(bsf, tl, tl)
DEF_HELPER_1(bsr, tl, tl)
+DEF_HELPER_2(lzcnt, tl, tl, int)
/* MMX/SSE */
diff --git a/target-i386/op_helper.c b/target-i386/op_helper.c
index 26fe612..5eea322 100644
--- a/target-i386/op_helper.c
+++ b/target-i386/op_helper.c
@@ -5479,11 +5479,14 @@ target_ulong helper_bsf(target_ulong t0)
return count;
}
-target_ulong helper_bsr(target_ulong t0)
+target_ulong helper_lzcnt(target_ulong t0, int wordsize)
{
int count;
target_ulong res, mask;
-
+
+ if (wordsize > 0 && t0 == 0) {
+ return wordsize;
+ }
res = t0;
count = TARGET_LONG_BITS - 1;
mask = (target_ulong)1 << (TARGET_LONG_BITS - 1);
@@ -5491,9 +5494,16 @@ target_ulong helper_bsr(target_ulong t0)
count--;
res <<= 1;
}
+ if (wordsize > 0) {
+ return wordsize - 1 - count;
+ }
return count;
}
+target_ulong helper_bsr(target_ulong t0)
+{
+ return helper_lzcnt(t0, 0);
+}
static int compute_all_eflags(void)
{
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 2511943..64bc0a3 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -6573,23 +6573,36 @@ static target_ulong disas_insn(DisasContext *s, target_ulong pc_start)
ot = dflag + OT_WORD;
modrm = ldub_code(s->pc++);
reg = ((modrm >> 3) & 7) | rex_r;
- gen_ldst_modrm(s, modrm, ot, OR_TMP0, 0);
+ gen_ldst_modrm(s,modrm, ot, OR_TMP0, 0);
gen_extu(ot, cpu_T[0]);
- label1 = gen_new_label();
- tcg_gen_movi_tl(cpu_cc_dst, 0);
t0 = tcg_temp_local_new();
tcg_gen_mov_tl(t0, cpu_T[0]);
- tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, label1);
- if (b & 1) {
- gen_helper_bsr(cpu_T[0], t0);
+ if ((b & 1) && (prefixes & PREFIX_REPZ) &&
+ (s->cpuid_ext3_features & CPUID_EXT3_ABM)) {
+ switch(ot) {
+ case OT_WORD: gen_helper_lzcnt(cpu_T[0], t0,
+ tcg_const_i32(16)); break;
+ case OT_LONG: gen_helper_lzcnt(cpu_T[0], t0,
+ tcg_const_i32(32)); break;
+ case OT_QUAD: gen_helper_lzcnt(cpu_T[0], t0,
+ tcg_const_i32(64)); break;
+ }
+ gen_op_mov_reg_T0(ot, reg);
} else {
- gen_helper_bsf(cpu_T[0], t0);
+ label1 = gen_new_label();
+ tcg_gen_movi_tl(cpu_cc_dst, 0);
+ tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, label1);
+ if (b & 1) {
+ gen_helper_bsr(cpu_T[0], t0);
+ } else {
+ gen_helper_bsf(cpu_T[0], t0);
+ }
+ gen_op_mov_reg_T0(ot, reg);
+ tcg_gen_movi_tl(cpu_cc_dst, 1);
+ gen_set_label(label1);
+ tcg_gen_discard_tl(cpu_cc_src);
+ s->cc_op = CC_OP_LOGICB + ot;
}
- gen_op_mov_reg_T0(ot, reg);
- tcg_gen_movi_tl(cpu_cc_dst, 1);
- gen_set_label(label1);
- tcg_gen_discard_tl(cpu_cc_src);
- s->cc_op = CC_OP_LOGICB + ot;
tcg_temp_free(t0);
}
break;
--
1.6.4
end of thread
Thread overview: 12+ messages
2009-09-18 22:30 [Qemu-devel] [PATCH 0/4] TCG: add some instructions Andre Przywara
2009-09-18 22:30 ` [Qemu-devel] [PATCH 1/4] TCG x86: implement lzcnt emulation Andre Przywara
2009-10-04 12:02 ` Aurelien Jarno
2009-10-05 8:20 ` [Qemu-devel] [PATCH 1/4 v2] " Andre Przywara
2009-10-05 13:57 ` [Qemu-devel] " Aurelien Jarno
2009-10-23 11:44 ` [Qemu-devel] [PATCH v3] " Andre Przywara
2009-09-18 22:30 ` [Qemu-devel] [PATCH 2/4] TCG x86: add lock mov cr0 = cr8 Andre Przywara
2009-10-04 12:06 ` Aurelien Jarno
2009-09-18 22:30 ` [Qemu-devel] [PATCH 3/4] TCG x86: add SSE4a instruction support Andre Przywara
2009-10-04 12:10 ` Aurelien Jarno
2009-09-18 22:30 ` [Qemu-devel] [PATCH 4/4] TCG x86: add RDTSCP support Andre Przywara
2009-10-04 12:47 ` Aurelien Jarno