xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] x86emul: FPU handling improvements
@ 2016-12-06 14:04 Jan Beulich
  2016-12-06 14:11 ` [PATCH 1/6] x86emul: extend / amend supported FPU opcodes Jan Beulich
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Jan Beulich @ 2016-12-06 14:04 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

1: extend / amend supported FPU opcodes
2: simplify FPU source operand handling
3: simplify FPU destination operand handling
4: reduce FPU handling code size
5: avoid undefined behavior when dealing with 10-byte FPU operands
6: simplify FPU handling asm() constraints

Signed-off-by: Jan Beulich <jbeulich@suse.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/6] x86emul: extend / amend supported FPU opcodes
  2016-12-06 14:04 [PATCH 0/6] x86emul: FPU handling improvements Jan Beulich
@ 2016-12-06 14:11 ` Jan Beulich
  2016-12-07 14:35   ` Andrew Cooper
  2016-12-06 14:12 ` [PATCH 2/6] x86emul: simplify FPU source operand handling Jan Beulich
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2016-12-06 14:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 9133 bytes --]

First of all there are a number of secondary encodings both Intel and
AMD support, but which aren't formally documented.

Next there are a few more no-ops - instructions which served a purpose
only on 8087 or 287.

Further switch from fail_if() to raising of #UD in a couple of places
(as the decoding of FPU opcodes should now be complete except where
explictly marked as todo).

Also adjust a few comments.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3672,18 +3672,18 @@ x86_emulate(
         host_and_vcpu_must_have(fpu);
         switch ( modrm )
         {
-        case 0xc0 ... 0xc7: /* fadd %stN,%stN */
-        case 0xc8 ... 0xcf: /* fmul %stN,%stN */
-        case 0xd0 ... 0xd7: /* fcom %stN,%stN */
-        case 0xd8 ... 0xdf: /* fcomp %stN,%stN */
-        case 0xe0 ... 0xe7: /* fsub %stN,%stN */
-        case 0xe8 ... 0xef: /* fsubr %stN,%stN */
-        case 0xf0 ... 0xf7: /* fdiv %stN,%stN */
-        case 0xf8 ... 0xff: /* fdivr %stN,%stN */
+        case 0xc0 ... 0xc7: /* fadd %stN,%st */
+        case 0xc8 ... 0xcf: /* fmul %stN,%st */
+        case 0xd0 ... 0xd7: /* fcom %stN,%st */
+        case 0xd8 ... 0xdf: /* fcomp %stN,%st */
+        case 0xe0 ... 0xe7: /* fsub %stN,%st */
+        case 0xe8 ... 0xef: /* fsubr %stN,%st */
+        case 0xf0 ... 0xf7: /* fdiv %stN,%st */
+        case 0xf8 ... 0xff: /* fdivr %stN,%st */
             emulate_fpu_insn_stub(0xd8, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            ASSERT(ea.type == OP_MEM);
             ea.bytes = 4;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3729,6 +3729,7 @@ x86_emulate(
         case 0xc0 ... 0xc7: /* fld %stN */
         case 0xc8 ... 0xcf: /* fxch %stN */
         case 0xd0: /* fnop */
+        case 0xd8 ... 0xdf: /* fstp %stN (alternative encoding) */
         case 0xe0: /* fchs */
         case 0xe1: /* fabs */
         case 0xe4: /* ftst */
@@ -3758,7 +3759,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xd9, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m32fp */
@@ -3781,7 +3782,8 @@ x86_emulate(
                 dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstps", dst.val);
                 break;
-                /* case 4: fldenv - TODO */
+            case 4: /* fldenv - TODO */
+                goto cannot_emulate;
             case 5: /* fldcw m2byte */
                 ea.bytes = 2;
                 src = ea;
@@ -3790,7 +3792,8 @@ x86_emulate(
                     goto done;
                 emulate_fpu_insn_memsrc("fldcw", src.val);
                 break;
-                /* case 6: fstenv - TODO */
+            case 6: /* fnstenv - TODO */
+                goto cannot_emulate;
             case 7: /* fnstcw m2byte */
                 ea.bytes = 2;
                 dst = ea;
@@ -3798,7 +3801,7 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fnstcw", dst.val);
                 break;
             default:
-                goto cannot_emulate;
+                generate_exception_if(true, EXC_UD);
             }
         }
         break;
@@ -3818,7 +3821,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xda, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             ea.bytes = 4;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3867,16 +3870,16 @@ x86_emulate(
             vcpu_must_have_cmov();
             emulate_fpu_insn_stub_eflags(0xdb, modrm);
             break;
+        case 0xe0: /* fneni - 8087 only, ignored by 287 */
+        case 0xe1: /* fndisi - 8087 only, ignored by 287 */
         case 0xe2: /* fnclex */
-            emulate_fpu_insn("fnclex");
-            break;
         case 0xe3: /* fninit */
-            emulate_fpu_insn("fninit");
-            break;
-        case 0xe4: /* fsetpm - 287 only, ignored by 387 */
+        case 0xe4: /* fnsetpm - 287 only, ignored by 387 */
+        /* case 0xe5: frstpm - 287 only, #UD on 387 */
+            emulate_fpu_insn_stub(0xdb, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m32i */
@@ -3921,7 +3924,7 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fstpt", dst.val);
                 break;
             default:
-                goto cannot_emulate;
+                generate_exception_if(true, EXC_UD);
             }
         }
         break;
@@ -3930,16 +3933,18 @@ x86_emulate(
         host_and_vcpu_must_have(fpu);
         switch ( modrm )
         {
-        case 0xc0 ... 0xc7: /* fadd %stN */
-        case 0xc8 ... 0xcf: /* fmul %stN */
-        case 0xe0 ... 0xe7: /* fsubr %stN */
-        case 0xe8 ... 0xef: /* fsub %stN */
-        case 0xf0 ... 0xf7: /* fdivr %stN */
-        case 0xf8 ... 0xff: /* fdiv %stN */
+        case 0xc0 ... 0xc7: /* fadd %st,%stN */
+        case 0xc8 ... 0xcf: /* fmul %st,%stN */
+        case 0xd0 ... 0xd7: /* fcom %stN,%st (alternative encoding) */
+        case 0xd8 ... 0xdf: /* fcomp %stN,%st (alternative encoding) */
+        case 0xe0 ... 0xe7: /* fsubr %st,%stN */
+        case 0xe8 ... 0xef: /* fsub %st,%stN */
+        case 0xf0 ... 0xf7: /* fdivr %st,%stN */
+        case 0xf8 ... 0xff: /* fdiv %st,%stN */
             emulate_fpu_insn_stub(0xdc, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            ASSERT(ea.type == OP_MEM);
             ea.bytes = 8;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3980,6 +3985,7 @@ x86_emulate(
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* ffree %stN */
+        case 0xc8 ... 0xcf: /* fxch %stN (alternative encoding) */
         case 0xd0 ... 0xd7: /* fst %stN */
         case 0xd8 ... 0xdf: /* fstp %stN */
         case 0xe0 ... 0xe7: /* fucom %stN */
@@ -3987,7 +3993,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xdd, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m64fp */;
@@ -4017,6 +4023,9 @@ x86_emulate(
                 dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstpl", dst.val);
                 break;
+            case 4: /* frstor - TODO */
+            case 6: /* fnsave - TODO */
+                goto cannot_emulate;
             case 7: /* fnstsw m2byte */
                 ea.bytes = 2;
                 dst = ea;
@@ -4024,7 +4033,7 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fnstsw", dst.val);
                 break;
             default:
-                goto cannot_emulate;
+                generate_exception_if(true, EXC_UD);
             }
         }
         break;
@@ -4035,6 +4044,7 @@ x86_emulate(
         {
         case 0xc0 ... 0xc7: /* faddp %stN */
         case 0xc8 ... 0xcf: /* fmulp %stN */
+        case 0xd0 ... 0xd7: /* fcomp %stN (alternative encoding) */
         case 0xd9: /* fcompp */
         case 0xe0 ... 0xe7: /* fsubrp %stN */
         case 0xe8 ... 0xef: /* fsubp %stN */
@@ -4043,7 +4053,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xde, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             ea.bytes = 2;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -4090,13 +4100,19 @@ x86_emulate(
             dst.reg = (unsigned long *)&_regs.eax;
             emulate_fpu_insn_memdst("fnstsw", dst.val);
             break;
+        case 0xc0 ... 0xc7: /* ffreep %stN */
         case 0xe8 ... 0xef: /* fucomip %stN */
         case 0xf0 ... 0xf7: /* fcomip %stN */
             vcpu_must_have_cmov();
             emulate_fpu_insn_stub_eflags(0xdf, modrm);
             break;
+        case 0xc8 ... 0xcf: /* fxch %stN (alternative encoding) */
+        case 0xd0 ... 0xd7: /* fstp %stN (alternative encoding) */
+        case 0xd8 ... 0xdf: /* fstp %stN (alternative encoding) */
+            emulate_fpu_insn_stub(0xdf, modrm);
+            break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m16i */



[-- Attachment #2: x86emul-FPU-opcodes.patch --]
[-- Type: text/plain, Size: 9178 bytes --]

x86emul: extend / amend supported FPU opcodes

First of all there are a number of secondary encodings both Intel and
AMD support, but which aren't formally documented.

Next there are a few more no-ops - instructions which served a purpose
only on 8087 or 287.

Further switch from fail_if() to raising of #UD in a couple of places
(as the decoding of FPU opcodes should now be complete except where
explictly marked as todo).

Also adjust a few comments.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3672,18 +3672,18 @@ x86_emulate(
         host_and_vcpu_must_have(fpu);
         switch ( modrm )
         {
-        case 0xc0 ... 0xc7: /* fadd %stN,%stN */
-        case 0xc8 ... 0xcf: /* fmul %stN,%stN */
-        case 0xd0 ... 0xd7: /* fcom %stN,%stN */
-        case 0xd8 ... 0xdf: /* fcomp %stN,%stN */
-        case 0xe0 ... 0xe7: /* fsub %stN,%stN */
-        case 0xe8 ... 0xef: /* fsubr %stN,%stN */
-        case 0xf0 ... 0xf7: /* fdiv %stN,%stN */
-        case 0xf8 ... 0xff: /* fdivr %stN,%stN */
+        case 0xc0 ... 0xc7: /* fadd %stN,%st */
+        case 0xc8 ... 0xcf: /* fmul %stN,%st */
+        case 0xd0 ... 0xd7: /* fcom %stN,%st */
+        case 0xd8 ... 0xdf: /* fcomp %stN,%st */
+        case 0xe0 ... 0xe7: /* fsub %stN,%st */
+        case 0xe8 ... 0xef: /* fsubr %stN,%st */
+        case 0xf0 ... 0xf7: /* fdiv %stN,%st */
+        case 0xf8 ... 0xff: /* fdivr %stN,%st */
             emulate_fpu_insn_stub(0xd8, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            ASSERT(ea.type == OP_MEM);
             ea.bytes = 4;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3729,6 +3729,7 @@ x86_emulate(
         case 0xc0 ... 0xc7: /* fld %stN */
         case 0xc8 ... 0xcf: /* fxch %stN */
         case 0xd0: /* fnop */
+        case 0xd8 ... 0xdf: /* fstp %stN (alternative encoding) */
         case 0xe0: /* fchs */
         case 0xe1: /* fabs */
         case 0xe4: /* ftst */
@@ -3758,7 +3759,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xd9, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m32fp */
@@ -3781,7 +3782,8 @@ x86_emulate(
                 dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstps", dst.val);
                 break;
-                /* case 4: fldenv - TODO */
+            case 4: /* fldenv - TODO */
+                goto cannot_emulate;
             case 5: /* fldcw m2byte */
                 ea.bytes = 2;
                 src = ea;
@@ -3790,7 +3792,8 @@ x86_emulate(
                     goto done;
                 emulate_fpu_insn_memsrc("fldcw", src.val);
                 break;
-                /* case 6: fstenv - TODO */
+            case 6: /* fnstenv - TODO */
+                goto cannot_emulate;
             case 7: /* fnstcw m2byte */
                 ea.bytes = 2;
                 dst = ea;
@@ -3798,7 +3801,7 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fnstcw", dst.val);
                 break;
             default:
-                goto cannot_emulate;
+                generate_exception_if(true, EXC_UD);
             }
         }
         break;
@@ -3818,7 +3821,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xda, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             ea.bytes = 4;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3867,16 +3870,16 @@ x86_emulate(
             vcpu_must_have_cmov();
             emulate_fpu_insn_stub_eflags(0xdb, modrm);
             break;
+        case 0xe0: /* fneni - 8087 only, ignored by 287 */
+        case 0xe1: /* fndisi - 8087 only, ignored by 287 */
         case 0xe2: /* fnclex */
-            emulate_fpu_insn("fnclex");
-            break;
         case 0xe3: /* fninit */
-            emulate_fpu_insn("fninit");
-            break;
-        case 0xe4: /* fsetpm - 287 only, ignored by 387 */
+        case 0xe4: /* fnsetpm - 287 only, ignored by 387 */
+        /* case 0xe5: frstpm - 287 only, #UD on 387 */
+            emulate_fpu_insn_stub(0xdb, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m32i */
@@ -3921,7 +3924,7 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fstpt", dst.val);
                 break;
             default:
-                goto cannot_emulate;
+                generate_exception_if(true, EXC_UD);
             }
         }
         break;
@@ -3930,16 +3933,18 @@ x86_emulate(
         host_and_vcpu_must_have(fpu);
         switch ( modrm )
         {
-        case 0xc0 ... 0xc7: /* fadd %stN */
-        case 0xc8 ... 0xcf: /* fmul %stN */
-        case 0xe0 ... 0xe7: /* fsubr %stN */
-        case 0xe8 ... 0xef: /* fsub %stN */
-        case 0xf0 ... 0xf7: /* fdivr %stN */
-        case 0xf8 ... 0xff: /* fdiv %stN */
+        case 0xc0 ... 0xc7: /* fadd %st,%stN */
+        case 0xc8 ... 0xcf: /* fmul %st,%stN */
+        case 0xd0 ... 0xd7: /* fcom %stN,%st (alternative encoding) */
+        case 0xd8 ... 0xdf: /* fcomp %stN,%st (alternative encoding) */
+        case 0xe0 ... 0xe7: /* fsubr %st,%stN */
+        case 0xe8 ... 0xef: /* fsub %st,%stN */
+        case 0xf0 ... 0xf7: /* fdivr %st,%stN */
+        case 0xf8 ... 0xff: /* fdiv %st,%stN */
             emulate_fpu_insn_stub(0xdc, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            ASSERT(ea.type == OP_MEM);
             ea.bytes = 8;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3980,6 +3985,7 @@ x86_emulate(
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* ffree %stN */
+        case 0xc8 ... 0xcf: /* fxch %stN (alternative encoding) */
         case 0xd0 ... 0xd7: /* fst %stN */
         case 0xd8 ... 0xdf: /* fstp %stN */
         case 0xe0 ... 0xe7: /* fucom %stN */
@@ -3987,7 +3993,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xdd, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m64fp */;
@@ -4017,6 +4023,9 @@ x86_emulate(
                 dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstpl", dst.val);
                 break;
+            case 4: /* frstor - TODO */
+            case 6: /* fnsave - TODO */
+                goto cannot_emulate;
             case 7: /* fnstsw m2byte */
                 ea.bytes = 2;
                 dst = ea;
@@ -4024,7 +4033,7 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fnstsw", dst.val);
                 break;
             default:
-                goto cannot_emulate;
+                generate_exception_if(true, EXC_UD);
             }
         }
         break;
@@ -4035,6 +4044,7 @@ x86_emulate(
         {
         case 0xc0 ... 0xc7: /* faddp %stN */
         case 0xc8 ... 0xcf: /* fmulp %stN */
+        case 0xd0 ... 0xd7: /* fcomp %stN (alternative encoding) */
         case 0xd9: /* fcompp */
         case 0xe0 ... 0xe7: /* fsubrp %stN */
         case 0xe8 ... 0xef: /* fsubp %stN */
@@ -4043,7 +4053,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xde, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             ea.bytes = 2;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -4090,13 +4100,19 @@ x86_emulate(
             dst.reg = (unsigned long *)&_regs.eax;
             emulate_fpu_insn_memdst("fnstsw", dst.val);
             break;
+        case 0xc0 ... 0xc7: /* ffreep %stN */
         case 0xe8 ... 0xef: /* fucomip %stN */
         case 0xf0 ... 0xf7: /* fcomip %stN */
             vcpu_must_have_cmov();
             emulate_fpu_insn_stub_eflags(0xdf, modrm);
             break;
+        case 0xc8 ... 0xcf: /* fxch %stN (alternative encoding) */
+        case 0xd0 ... 0xd7: /* fstp %stN (alternative encoding) */
+        case 0xd8 ... 0xdf: /* fstp %stN (alternative encoding) */
+            emulate_fpu_insn_stub(0xdf, modrm);
+            break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m16i */

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/6] x86emul: simplify FPU source operand handling
  2016-12-06 14:04 [PATCH 0/6] x86emul: FPU handling improvements Jan Beulich
  2016-12-06 14:11 ` [PATCH 1/6] x86emul: extend / amend supported FPU opcodes Jan Beulich
@ 2016-12-06 14:12 ` Jan Beulich
  2016-12-07 14:50   ` Andrew Cooper
  2016-12-06 14:12 ` [PATCH 3/6] x86emul: simplify FPU destination " Jan Beulich
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2016-12-06 14:12 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 7977 bytes --]

Consistently use ea instead of src for passing the memory address to
->read(). This eliminates the need to copy ea to src, resulting in a
couple of hundred bytes smaller binary size.

In addition for opcode DE we can leverage SrcMem16 to eliminate a call
of the ->read() hook. At the same time drop the stray Mov attributes
from D8, DA, DC, and DE: They're meaningful for memory writes only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -159,10 +159,10 @@ static const opcode_desc_t opcode_table[
     ByteOp|DstMem|SrcImplicit|ModRM, DstMem|SrcImplicit|ModRM,
     DstImplicit|SrcImmByte, DstImplicit|SrcImmByte, ImplicitOps, ImplicitOps,
     /* 0xD8 - 0xDF */
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
+    ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+    ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+    ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+    DstImplicit|SrcMem16|ModRM, ImplicitOps|ModRM|Mov,
     /* 0xE0 - 0xE7 */
     DstImplicit|SrcImmByte, DstImplicit|SrcImmByte,
     DstImplicit|SrcImmByte, DstImplicit|SrcImmByte,
@@ -3684,10 +3684,8 @@ x86_emulate(
             break;
         default:
             ASSERT(ea.type == OP_MEM);
-            ea.bytes = 4;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
+            if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                 4, ctxt)) != X86EMUL_OKAY )
                 goto done;
             switch ( modrm_reg & 7 )
             {
@@ -3763,10 +3761,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m32fp */
-                ea.bytes = 4;
-                src = ea;
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                                     4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("flds", src.val);
                 break;
@@ -3785,10 +3781,8 @@ x86_emulate(
             case 4: /* fldenv - TODO */
                 goto cannot_emulate;
             case 5: /* fldcw m2byte */
-                ea.bytes = 2;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldcw", src.val);
                 break;
@@ -3822,10 +3816,8 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
-            ea.bytes = 4;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
+            if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                 4, ctxt)) != X86EMUL_OKAY )
                 goto done;
             switch ( modrm_reg & 7 )
             {
@@ -3883,10 +3875,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m32i */
-                ea.bytes = 4;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildl", src.val);
                 break;
@@ -3910,10 +3900,8 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fistpl", dst.val);
                 break;
             case 5: /* fld m80fp */
-                ea.bytes = 10;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off,
-                                     &src.val, src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldt", src.val);
                 break;
@@ -3945,10 +3933,8 @@ x86_emulate(
             break;
         default:
             ASSERT(ea.type == OP_MEM);
-            ea.bytes = 8;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
+            if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                 8, ctxt)) != X86EMUL_OKAY )
                 goto done;
             switch ( modrm_reg & 7 )
             {
@@ -3997,10 +3983,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m64fp */;
-                ea.bytes = 8;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldl", src.val);
                 break;
@@ -4054,11 +4038,6 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
-            ea.bytes = 2;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
-                goto done;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fiadd m16i */
@@ -4116,10 +4095,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m16i */
-                ea.bytes = 2;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("filds", src.val);
                 break;
@@ -4143,18 +4120,14 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fistps", dst.val);
                 break;
             case 4: /* fbld m80dec */
-                ea.bytes = 10;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off,
-                                     &src.val, src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fbld", src.val);
                 break;
             case 5: /* fild m64i */
-                ea.bytes = 8;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildll", src.val);
                 break;



[-- Attachment #2: x86emul-FPU-simplify-src.patch --]
[-- Type: text/plain, Size: 8022 bytes --]

x86emul: simplify FPU source operand handling

Consistently use ea instead of src for passing the memory address to
->read(). This eliminates the need to copy ea to src, resulting in a
couple of hundred bytes smaller binary size.

In addition for opcode DE we can leverage SrcMem16 to eliminate a call
of the ->read() hook. At the same time drop the stray Mov attributes
from D8, DA, DC, and DE: They're meaningful for memory writes only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -159,10 +159,10 @@ static const opcode_desc_t opcode_table[
     ByteOp|DstMem|SrcImplicit|ModRM, DstMem|SrcImplicit|ModRM,
     DstImplicit|SrcImmByte, DstImplicit|SrcImmByte, ImplicitOps, ImplicitOps,
     /* 0xD8 - 0xDF */
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
+    ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+    ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+    ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+    DstImplicit|SrcMem16|ModRM, ImplicitOps|ModRM|Mov,
     /* 0xE0 - 0xE7 */
     DstImplicit|SrcImmByte, DstImplicit|SrcImmByte,
     DstImplicit|SrcImmByte, DstImplicit|SrcImmByte,
@@ -3684,10 +3684,8 @@ x86_emulate(
             break;
         default:
             ASSERT(ea.type == OP_MEM);
-            ea.bytes = 4;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
+            if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                 4, ctxt)) != X86EMUL_OKAY )
                 goto done;
             switch ( modrm_reg & 7 )
             {
@@ -3763,10 +3761,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m32fp */
-                ea.bytes = 4;
-                src = ea;
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                                     4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("flds", src.val);
                 break;
@@ -3785,10 +3781,8 @@ x86_emulate(
             case 4: /* fldenv - TODO */
                 goto cannot_emulate;
             case 5: /* fldcw m2byte */
-                ea.bytes = 2;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldcw", src.val);
                 break;
@@ -3822,10 +3816,8 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
-            ea.bytes = 4;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
+            if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                 4, ctxt)) != X86EMUL_OKAY )
                 goto done;
             switch ( modrm_reg & 7 )
             {
@@ -3883,10 +3875,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m32i */
-                ea.bytes = 4;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildl", src.val);
                 break;
@@ -3910,10 +3900,8 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fistpl", dst.val);
                 break;
             case 5: /* fld m80fp */
-                ea.bytes = 10;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off,
-                                     &src.val, src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldt", src.val);
                 break;
@@ -3945,10 +3933,8 @@ x86_emulate(
             break;
         default:
             ASSERT(ea.type == OP_MEM);
-            ea.bytes = 8;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
+            if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                 8, ctxt)) != X86EMUL_OKAY )
                 goto done;
             switch ( modrm_reg & 7 )
             {
@@ -3997,10 +3983,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m64fp */;
-                ea.bytes = 8;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldl", src.val);
                 break;
@@ -4054,11 +4038,6 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
-            ea.bytes = 2;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
-                goto done;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fiadd m16i */
@@ -4116,10 +4095,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m16i */
-                ea.bytes = 2;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("filds", src.val);
                 break;
@@ -4143,18 +4120,14 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fistps", dst.val);
                 break;
             case 4: /* fbld m80dec */
-                ea.bytes = 10;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off,
-                                     &src.val, src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fbld", src.val);
                 break;
             case 5: /* fild m64i */
-                ea.bytes = 8;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildll", src.val);
                 break;

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 3/6] x86emul: simplify FPU destination operand handling
  2016-12-06 14:04 [PATCH 0/6] x86emul: FPU handling improvements Jan Beulich
  2016-12-06 14:11 ` [PATCH 1/6] x86emul: extend / amend supported FPU opcodes Jan Beulich
  2016-12-06 14:12 ` [PATCH 2/6] x86emul: simplify FPU source operand handling Jan Beulich
@ 2016-12-06 14:12 ` Jan Beulich
  2016-12-07 14:54   ` Andrew Cooper
  2016-12-06 14:13 ` [PATCH 4/6] x86emul: reduce FPU handling code size Jan Beulich
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2016-12-06 14:12 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 8486 bytes --]

Consolidate the copying of ea to dst: There's no need to set the type
to OP_MEM, and instead the load cases setting it to OP_NONE allows the
copying to be done just once per major opcode.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3758,6 +3758,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m32fp */
@@ -3765,18 +3766,15 @@ x86_emulate(
                                      4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("flds", src.val);
+                dst.type = OP_NONE;
                 break;
             case 2: /* fstp m32fp */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fsts", dst.val);
+                dst.bytes = 4;
                 break;
             case 3: /* fstp m32fp */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstps", dst.val);
+                dst.bytes = 4;
                 break;
             case 4: /* fldenv - TODO */
                 goto cannot_emulate;
@@ -3785,14 +3783,13 @@ x86_emulate(
                                      2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldcw", src.val);
+                dst.type = OP_NONE;
                 break;
             case 6: /* fnstenv - TODO */
                 goto cannot_emulate;
             case 7: /* fnstcw m2byte */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fnstcw", dst.val);
+                dst.bytes = 2;
                 break;
             default:
                 generate_exception_if(true, EXC_UD);
@@ -3872,6 +3869,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m32i */
@@ -3879,37 +3877,31 @@ x86_emulate(
                                      4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildl", src.val);
+                dst.type = OP_NONE;
                 break;
             case 1: /* fisttp m32i */
                 host_and_vcpu_must_have(sse3);
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fisttpl", dst.val);
+                dst.bytes = 4;
                 break;
             case 2: /* fist m32i */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistl", dst.val);
+                dst.bytes = 4;
                 break;
             case 3: /* fistp m32i */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistpl", dst.val);
+                dst.bytes = 4;
                 break;
             case 5: /* fld m80fp */
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldt", src.val);
+                dst.type = OP_NONE;
                 break;
             case 7: /* fstp m80fp */
-                ea.bytes = 10;
-                dst.type = OP_MEM;
-                dst = ea;
                 emulate_fpu_insn_memdst("fstpt", dst.val);
+                dst.bytes = 10;
                 break;
             default:
                 generate_exception_if(true, EXC_UD);
@@ -3980,6 +3972,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m64fp */;
@@ -3987,34 +3980,27 @@ x86_emulate(
                                      8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldl", src.val);
+                dst.type = OP_NONE;
                 break;
             case 1: /* fisttp m64i */
                 host_and_vcpu_must_have(sse3);
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fisttpll", dst.val);
+                dst.bytes = 8;
                 break;
             case 2: /* fst m64fp */
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstl", dst.val);
+                dst.bytes = 8;
                 break;
             case 3: /* fstp m64fp */
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstpl", dst.val);
+                dst.bytes = 8;
                 break;
             case 4: /* frstor - TODO */
             case 6: /* fnsave - TODO */
                 goto cannot_emulate;
             case 7: /* fnstsw m2byte */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fnstsw", dst.val);
+                dst.bytes = 2;
                 break;
             default:
                 generate_exception_if(true, EXC_UD);
@@ -4092,6 +4078,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m16i */
@@ -4099,49 +4086,42 @@ x86_emulate(
                                      2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("filds", src.val);
+                dst.type = OP_NONE;
                 break;
             case 1: /* fisttp m16i */
                 host_and_vcpu_must_have(sse3);
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fisttps", dst.val);
+                dst.bytes = 2;
                 break;
             case 2: /* fist m16i */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fists", dst.val);
+                dst.bytes = 2;
                 break;
             case 3: /* fistp m16i */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistps", dst.val);
+                dst.bytes = 2;
                 break;
             case 4: /* fbld m80dec */
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fbld", src.val);
+                dst.type = OP_NONE;
                 break;
             case 5: /* fild m64i */
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
                                      8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildll", src.val);
+                dst.type = OP_NONE;
                 break;
             case 6: /* fbstp packed bcd */
-                ea.bytes = 10;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fbstp", dst.val);
+                dst.bytes = 10;
                 break;
             case 7: /* fistp m64i */
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistpll", dst.val);
+                dst.bytes = 8;
                 break;
             }
         }



[-- Attachment #2: x86emul-FPU-simplify-dst.patch --]
[-- Type: text/plain, Size: 8536 bytes --]

x86emul: simplify FPU destination operand handling

Consolidate the copying of ea to dst: There's no need to set the type
to OP_MEM, and instead the load cases setting it to OP_NONE allows the
copying to be done just once per major opcode.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3758,6 +3758,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m32fp */
@@ -3765,18 +3766,15 @@ x86_emulate(
                                      4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("flds", src.val);
+                dst.type = OP_NONE;
                 break;
             case 2: /* fstp m32fp */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fsts", dst.val);
+                dst.bytes = 4;
                 break;
             case 3: /* fstp m32fp */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstps", dst.val);
+                dst.bytes = 4;
                 break;
             case 4: /* fldenv - TODO */
                 goto cannot_emulate;
@@ -3785,14 +3783,13 @@ x86_emulate(
                                      2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldcw", src.val);
+                dst.type = OP_NONE;
                 break;
             case 6: /* fnstenv - TODO */
                 goto cannot_emulate;
             case 7: /* fnstcw m2byte */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fnstcw", dst.val);
+                dst.bytes = 2;
                 break;
             default:
                 generate_exception_if(true, EXC_UD);
@@ -3872,6 +3869,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m32i */
@@ -3879,37 +3877,31 @@ x86_emulate(
                                      4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildl", src.val);
+                dst.type = OP_NONE;
                 break;
             case 1: /* fisttp m32i */
                 host_and_vcpu_must_have(sse3);
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fisttpl", dst.val);
+                dst.bytes = 4;
                 break;
             case 2: /* fist m32i */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistl", dst.val);
+                dst.bytes = 4;
                 break;
             case 3: /* fistp m32i */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistpl", dst.val);
+                dst.bytes = 4;
                 break;
             case 5: /* fld m80fp */
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldt", src.val);
+                dst.type = OP_NONE;
                 break;
             case 7: /* fstp m80fp */
-                ea.bytes = 10;
-                dst.type = OP_MEM;
-                dst = ea;
                 emulate_fpu_insn_memdst("fstpt", dst.val);
+                dst.bytes = 10;
                 break;
             default:
                 generate_exception_if(true, EXC_UD);
@@ -3980,6 +3972,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m64fp */;
@@ -3987,34 +3980,27 @@ x86_emulate(
                                      8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldl", src.val);
+                dst.type = OP_NONE;
                 break;
             case 1: /* fisttp m64i */
                 host_and_vcpu_must_have(sse3);
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fisttpll", dst.val);
+                dst.bytes = 8;
                 break;
             case 2: /* fst m64fp */
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstl", dst.val);
+                dst.bytes = 8;
                 break;
             case 3: /* fstp m64fp */
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstpl", dst.val);
+                dst.bytes = 8;
                 break;
             case 4: /* frstor - TODO */
             case 6: /* fnsave - TODO */
                 goto cannot_emulate;
             case 7: /* fnstsw m2byte */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fnstsw", dst.val);
+                dst.bytes = 2;
                 break;
             default:
                 generate_exception_if(true, EXC_UD);
@@ -4092,6 +4078,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m16i */
@@ -4099,49 +4086,42 @@ x86_emulate(
                                      2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("filds", src.val);
+                dst.type = OP_NONE;
                 break;
             case 1: /* fisttp m16i */
                 host_and_vcpu_must_have(sse3);
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fisttps", dst.val);
+                dst.bytes = 2;
                 break;
             case 2: /* fist m16i */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fists", dst.val);
+                dst.bytes = 2;
                 break;
             case 3: /* fistp m16i */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistps", dst.val);
+                dst.bytes = 2;
                 break;
             case 4: /* fbld m80dec */
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fbld", src.val);
+                dst.type = OP_NONE;
                 break;
             case 5: /* fild m64i */
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
                                      8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildll", src.val);
+                dst.type = OP_NONE;
                 break;
             case 6: /* fbstp packed bcd */
-                ea.bytes = 10;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fbstp", dst.val);
+                dst.bytes = 10;
                 break;
             case 7: /* fistp m64i */
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistpll", dst.val);
+                dst.bytes = 8;
                 break;
             }
         }

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 4/6] x86emul: reduce FPU handling code size
  2016-12-06 14:04 [PATCH 0/6] x86emul: FPU handling improvements Jan Beulich
                   ` (2 preceding siblings ...)
  2016-12-06 14:12 ` [PATCH 3/6] x86emul: simplify FPU destination " Jan Beulich
@ 2016-12-06 14:13 ` Jan Beulich
  2016-12-07 15:06   ` Andrew Cooper
  2016-12-06 14:13 ` [PATCH 5/6] x86emul: avoid undefined behavior when dealing with 10-byte FPU operands Jan Beulich
  2016-12-06 14:14 ` [PATCH 6/6] x86emul: simplify FPU handling asm() constraints Jan Beulich
  5 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2016-12-06 14:13 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 8750 bytes --]

Pulling out the {get,put}_fpu() invocations from individual emulation
paths leads to a couple of kb code size reduction in my builds. Note
that this is fine exception-wise:
- #UD and #NM have implementation defined order relative to one
  another,
- data read #GP/#SS/#PF now properly are delivered after #NM/#UD.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -874,59 +874,44 @@ do {
 } while (0)
 
 #define emulate_fpu_insn(_op)                           \
-do{ struct fpu_insn_ctxt fic;                           \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                     \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op "     \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes) : : "memory" );         \
-    put_fpu(&fic);                                      \
-} while (0)
+        : "=m" (fic.insn_bytes) : : "memory" )
 
 #define emulate_fpu_insn_memdst(_op, _arg)              \
-do{ struct fpu_insn_ctxt fic;                           \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                     \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
         : "=m" (fic.insn_bytes), "=m" (_arg)            \
-        : : "memory" );                                 \
-    put_fpu(&fic);                                      \
-} while (0)
+        : : "memory" )
 
 #define emulate_fpu_insn_memsrc(_op, _arg)              \
-do{ struct fpu_insn_ctxt fic;                           \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                     \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
         : "=m" (fic.insn_bytes)                         \
-        : "m" (_arg) : "memory" );                      \
-    put_fpu(&fic);                                      \
-} while (0)
+        : "m" (_arg) : "memory" )
 
 #define emulate_fpu_insn_stub(_bytes...)                                \
 do {                                                                    \
     uint8_t *buf = get_stub(stub);                                      \
     unsigned int _nr = sizeof((uint8_t[]){ _bytes });                   \
-    struct fpu_insn_ctxt fic = { .insn_bytes = _nr };                   \
+    fic.insn_bytes = _nr;                                               \
     memcpy(buf, ((uint8_t[]){ _bytes, 0xc3 }), _nr + 1);                \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                                     \
     stub.func();                                                        \
-    put_fpu(&fic);                                                      \
     put_stub(stub);                                                     \
 } while (0)
 
 #define emulate_fpu_insn_stub_eflags(bytes...)                          \
 do {                                                                    \
     unsigned int nr_ = sizeof((uint8_t[]){ bytes });                    \
-    struct fpu_insn_ctxt fic_ = { .insn_bytes = nr_ };                  \
     unsigned long tmp_;                                                 \
+    fic.insn_bytes = nr_;                                               \
     memcpy(get_stub(stub), ((uint8_t[]){ bytes, 0xc3 }), nr_ + 1);      \
-    get_fpu(X86EMUL_FPU_fpu, &fic_);                                    \
     asm volatile ( _PRE_EFLAGS("[eflags]", "[mask]", "[tmp]")           \
                    "call *%[func];"                                     \
                    _POST_EFLAGS("[eflags]", "[mask]", "[tmp]")          \
@@ -934,7 +919,6 @@ do {
                      [tmp] "=&r" (tmp_)                                 \
                    : [func] "rm" (stub.func),                           \
                      [mask] "i" (EFLG_ZF|EFLG_PF|EFLG_CF) );            \
-    put_fpu(&fic_);                                                     \
     put_stub(stub);                                                     \
 } while (0)
 
@@ -2579,6 +2563,7 @@ x86_emulate(
     struct operand src = { .reg = PTR_POISON };
     struct operand dst = { .reg = PTR_POISON };
     enum x86_swint_type swint_type;
+    struct fpu_insn_ctxt fic;
     struct x86_emulate_stub stub = {};
     DECLARE_ALIGNED(mmval_t, mmval);
 
@@ -3272,15 +3257,12 @@ x86_emulate(
         break;
 
     case 0x9b:  /* wait/fwait */
-    {
-        struct fpu_insn_ctxt fic = { .insn_bytes = 1 };
-
+        fic.insn_bytes = 1;
         host_and_vcpu_must_have(fpu);
         get_fpu(X86EMUL_FPU_wait, &fic);
         asm volatile ( "fwait" ::: "memory" );
         put_fpu(&fic);
         break;
-    }
 
     case 0x9c: /* pushf */
         generate_exception_if((_regs.eflags & EFLG_VM) &&
@@ -3670,6 +3652,7 @@ x86_emulate(
 
     case 0xd8: /* FPU 0xd8 */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fadd %stN,%st */
@@ -3715,10 +3698,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xd9: /* FPU 0xd9 */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xfb: /* fsincos */
@@ -3795,10 +3780,12 @@ x86_emulate(
                 generate_exception_if(true, EXC_UD);
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xda: /* FPU 0xda */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fcmovb %stN */
@@ -3844,10 +3831,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdb: /* FPU 0xdb */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fcmovnb %stN */
@@ -3907,10 +3896,12 @@ x86_emulate(
                 generate_exception_if(true, EXC_UD);
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdc: /* FPU 0xdc */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fadd %st,%stN */
@@ -3956,10 +3947,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdd: /* FPU 0xdd */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* ffree %stN */
@@ -4006,10 +3999,12 @@ x86_emulate(
                 generate_exception_if(true, EXC_UD);
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xde: /* FPU 0xde */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* faddp %stN */
@@ -4052,10 +4047,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdf: /* FPU 0xdf */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xe0:
@@ -4125,6 +4122,7 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xe0 ... 0xe2: /* loop{,z,nz} */ {
@@ -4945,8 +4943,8 @@ x86_emulate(
     case X86EMUL_OPC_VEX_F2(0x0f, 0x11): /* vmovsd xmm,xmm/m64 */
     {
         uint8_t *buf = get_stub(stub);
-        struct fpu_insn_ctxt fic = { .insn_bytes = 5 };
 
+        fic.insn_bytes = 5;
         buf[0] = 0x3e;
         buf[1] = 0x3e;
         buf[2] = 0x0f;
@@ -5215,8 +5213,8 @@ x86_emulate(
     case X86EMUL_OPC_VEX_66(0x0f, 0xd6): /* vmovq xmm,xmm/m64 */
     {
         uint8_t *buf = get_stub(stub);
-        struct fpu_insn_ctxt fic = { .insn_bytes = 5 };
 
+        fic.insn_bytes = 5;
         buf[0] = 0x3e;
         buf[1] = 0x3e;
         buf[2] = 0x0f;



[-- Attachment #2: x86emul-FPU-reduce.patch --]
[-- Type: text/plain, Size: 8788 bytes --]

x86emul: reduce FPU handling code size

Pulling out the {get,put}_fpu() invocations from individual emulation
paths leads to a couple of kb code size reduction in my builds. Note
that this is fine exception-wise:
- #UD and #NM have implementation defined order relative to one
  another,
- data read #GP/#SS/#PF now properly are delivered after #NM/#UD.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -874,59 +874,44 @@ do {
 } while (0)
 
 #define emulate_fpu_insn(_op)                           \
-do{ struct fpu_insn_ctxt fic;                           \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                     \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op "     \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes) : : "memory" );         \
-    put_fpu(&fic);                                      \
-} while (0)
+        : "=m" (fic.insn_bytes) : : "memory" )
 
 #define emulate_fpu_insn_memdst(_op, _arg)              \
-do{ struct fpu_insn_ctxt fic;                           \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                     \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
         : "=m" (fic.insn_bytes), "=m" (_arg)            \
-        : : "memory" );                                 \
-    put_fpu(&fic);                                      \
-} while (0)
+        : : "memory" )
 
 #define emulate_fpu_insn_memsrc(_op, _arg)              \
-do{ struct fpu_insn_ctxt fic;                           \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                     \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
         : "=m" (fic.insn_bytes)                         \
-        : "m" (_arg) : "memory" );                      \
-    put_fpu(&fic);                                      \
-} while (0)
+        : "m" (_arg) : "memory" )
 
 #define emulate_fpu_insn_stub(_bytes...)                                \
 do {                                                                    \
     uint8_t *buf = get_stub(stub);                                      \
     unsigned int _nr = sizeof((uint8_t[]){ _bytes });                   \
-    struct fpu_insn_ctxt fic = { .insn_bytes = _nr };                   \
+    fic.insn_bytes = _nr;                                               \
     memcpy(buf, ((uint8_t[]){ _bytes, 0xc3 }), _nr + 1);                \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                                     \
     stub.func();                                                        \
-    put_fpu(&fic);                                                      \
     put_stub(stub);                                                     \
 } while (0)
 
 #define emulate_fpu_insn_stub_eflags(bytes...)                          \
 do {                                                                    \
     unsigned int nr_ = sizeof((uint8_t[]){ bytes });                    \
-    struct fpu_insn_ctxt fic_ = { .insn_bytes = nr_ };                  \
     unsigned long tmp_;                                                 \
+    fic.insn_bytes = nr_;                                               \
     memcpy(get_stub(stub), ((uint8_t[]){ bytes, 0xc3 }), nr_ + 1);      \
-    get_fpu(X86EMUL_FPU_fpu, &fic_);                                    \
     asm volatile ( _PRE_EFLAGS("[eflags]", "[mask]", "[tmp]")           \
                    "call *%[func];"                                     \
                    _POST_EFLAGS("[eflags]", "[mask]", "[tmp]")          \
@@ -934,7 +919,6 @@ do {
                      [tmp] "=&r" (tmp_)                                 \
                    : [func] "rm" (stub.func),                           \
                      [mask] "i" (EFLG_ZF|EFLG_PF|EFLG_CF) );            \
-    put_fpu(&fic_);                                                     \
     put_stub(stub);                                                     \
 } while (0)
 
@@ -2579,6 +2563,7 @@ x86_emulate(
     struct operand src = { .reg = PTR_POISON };
     struct operand dst = { .reg = PTR_POISON };
     enum x86_swint_type swint_type;
+    struct fpu_insn_ctxt fic;
     struct x86_emulate_stub stub = {};
     DECLARE_ALIGNED(mmval_t, mmval);
 
@@ -3272,15 +3257,12 @@ x86_emulate(
         break;
 
     case 0x9b:  /* wait/fwait */
-    {
-        struct fpu_insn_ctxt fic = { .insn_bytes = 1 };
-
+        fic.insn_bytes = 1;
         host_and_vcpu_must_have(fpu);
         get_fpu(X86EMUL_FPU_wait, &fic);
         asm volatile ( "fwait" ::: "memory" );
         put_fpu(&fic);
         break;
-    }
 
     case 0x9c: /* pushf */
         generate_exception_if((_regs.eflags & EFLG_VM) &&
@@ -3670,6 +3652,7 @@ x86_emulate(
 
     case 0xd8: /* FPU 0xd8 */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fadd %stN,%st */
@@ -3715,10 +3698,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xd9: /* FPU 0xd9 */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xfb: /* fsincos */
@@ -3795,10 +3780,12 @@ x86_emulate(
                 generate_exception_if(true, EXC_UD);
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xda: /* FPU 0xda */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fcmovb %stN */
@@ -3844,10 +3831,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdb: /* FPU 0xdb */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fcmovnb %stN */
@@ -3907,10 +3896,12 @@ x86_emulate(
                 generate_exception_if(true, EXC_UD);
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdc: /* FPU 0xdc */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fadd %st,%stN */
@@ -3956,10 +3947,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdd: /* FPU 0xdd */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* ffree %stN */
@@ -4006,10 +3999,12 @@ x86_emulate(
                 generate_exception_if(true, EXC_UD);
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xde: /* FPU 0xde */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* faddp %stN */
@@ -4052,10 +4047,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdf: /* FPU 0xdf */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xe0:
@@ -4125,6 +4122,7 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xe0 ... 0xe2: /* loop{,z,nz} */ {
@@ -4945,8 +4943,8 @@ x86_emulate(
     case X86EMUL_OPC_VEX_F2(0x0f, 0x11): /* vmovsd xmm,xmm/m64 */
     {
         uint8_t *buf = get_stub(stub);
-        struct fpu_insn_ctxt fic = { .insn_bytes = 5 };
 
+        fic.insn_bytes = 5;
         buf[0] = 0x3e;
         buf[1] = 0x3e;
         buf[2] = 0x0f;
@@ -5215,8 +5213,8 @@ x86_emulate(
     case X86EMUL_OPC_VEX_66(0x0f, 0xd6): /* vmovq xmm,xmm/m64 */
     {
         uint8_t *buf = get_stub(stub);
-        struct fpu_insn_ctxt fic = { .insn_bytes = 5 };
 
+        fic.insn_bytes = 5;
         buf[0] = 0x3e;
         buf[1] = 0x3e;
         buf[2] = 0x0f;

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 5/6] x86emul: avoid undefined behavior when dealing with 10-byte FPU operands
  2016-12-06 14:04 [PATCH 0/6] x86emul: FPU handling improvements Jan Beulich
                   ` (3 preceding siblings ...)
  2016-12-06 14:13 ` [PATCH 4/6] x86emul: reduce FPU handling code size Jan Beulich
@ 2016-12-06 14:13 ` Jan Beulich
  2016-12-07 15:12   ` Andrew Cooper
  2016-12-06 14:14 ` [PATCH 6/6] x86emul: simplify FPU handling asm() constraints Jan Beulich
  5 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2016-12-06 14:13 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 3051 bytes --]

Accessing an 8-byte (or perhaps just 4-byte in the test harness when
built as 32-bit app) field to read/write 10 bytes (leveraging the
successive field) is a latent bug, as the compiler could copy things
around. Use the 32 bytes large SSE/AVX slot instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
The presence of the !op->write checks implies a logical (but not
functional) dependency on the patch making ops->write (and ->cmpxchg)
optional. Without that patch they're just dead code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3882,15 +3882,19 @@ x86_emulate(
                 dst.bytes = 4;
                 break;
             case 5: /* fld m80fp */
-                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
-                emulate_fpu_insn_memsrc("fldt", src.val);
+                emulate_fpu_insn_memsrc("fldt", *mmvalp);
                 dst.type = OP_NONE;
                 break;
             case 7: /* fstp m80fp */
-                emulate_fpu_insn_memdst("fstpt", dst.val);
-                dst.bytes = 10;
+                fail_if(!ops->write);
+                emulate_fpu_insn_memdst("fstpt", *mmvalp);
+                if ( (rc = ops->write(ea.mem.seg, ea.mem.off, mmvalp,
+                                      10, ctxt)) != X86EMUL_OKAY )
+                    goto done;
+                dst.type = OP_NONE;
                 break;
             default:
                 generate_exception_if(true, EXC_UD);
@@ -4099,10 +4103,10 @@ x86_emulate(
                 dst.bytes = 2;
                 break;
             case 4: /* fbld m80dec */
-                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
-                emulate_fpu_insn_memsrc("fbld", src.val);
+                emulate_fpu_insn_memsrc("fbld", *mmvalp);
                 dst.type = OP_NONE;
                 break;
             case 5: /* fild m64i */
@@ -4113,8 +4117,12 @@ x86_emulate(
                 dst.type = OP_NONE;
                 break;
             case 6: /* fbstp packed bcd */
-                emulate_fpu_insn_memdst("fbstp", dst.val);
-                dst.bytes = 10;
+                fail_if(!ops->write);
+                emulate_fpu_insn_memdst("fbstp", *mmvalp);
+                if ( (rc = ops->write(ea.mem.seg, ea.mem.off, mmvalp,
+                                      10, ctxt)) != X86EMUL_OKAY )
+                    goto done;
+                dst.type = OP_NONE;
                 break;
             case 7: /* fistp m64i */
                 emulate_fpu_insn_memdst("fistpll", dst.val);




[-- Attachment #2: x86emul-FPU-10byte.patch --]
[-- Type: text/plain, Size: 3121 bytes --]

x86emul: avoid undefined behavior when dealing with 10-byte FPU operands

Accessing an 8-byte (or perhaps just 4-byte in the test harness when
built as 32-bit app) field to read/write 10 bytes (leveraging the
successive field) is a latent bug, as the compiler could copy things
around. Use the 32 bytes large SSE/AVX slot instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
The presence of the !op->write checks implies a logical (but not
functional) dependency on the patch making ops->write (and ->cmpxchg)
optional. Without that patch they're just dead code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3882,15 +3882,19 @@ x86_emulate(
                 dst.bytes = 4;
                 break;
             case 5: /* fld m80fp */
-                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
-                emulate_fpu_insn_memsrc("fldt", src.val);
+                emulate_fpu_insn_memsrc("fldt", *mmvalp);
                 dst.type = OP_NONE;
                 break;
             case 7: /* fstp m80fp */
-                emulate_fpu_insn_memdst("fstpt", dst.val);
-                dst.bytes = 10;
+                fail_if(!ops->write);
+                emulate_fpu_insn_memdst("fstpt", *mmvalp);
+                if ( (rc = ops->write(ea.mem.seg, ea.mem.off, mmvalp,
+                                      10, ctxt)) != X86EMUL_OKAY )
+                    goto done;
+                dst.type = OP_NONE;
                 break;
             default:
                 generate_exception_if(true, EXC_UD);
@@ -4099,10 +4103,10 @@ x86_emulate(
                 dst.bytes = 2;
                 break;
             case 4: /* fbld m80dec */
-                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
-                emulate_fpu_insn_memsrc("fbld", src.val);
+                emulate_fpu_insn_memsrc("fbld", *mmvalp);
                 dst.type = OP_NONE;
                 break;
             case 5: /* fild m64i */
@@ -4113,8 +4117,12 @@ x86_emulate(
                 dst.type = OP_NONE;
                 break;
             case 6: /* fbstp packed bcd */
-                emulate_fpu_insn_memdst("fbstp", dst.val);
-                dst.bytes = 10;
+                fail_if(!ops->write);
+                emulate_fpu_insn_memdst("fbstp", *mmvalp);
+                if ( (rc = ops->write(ea.mem.seg, ea.mem.off, mmvalp,
+                                      10, ctxt)) != X86EMUL_OKAY )
+                    goto done;
+                dst.type = OP_NONE;
                 break;
             case 7: /* fistp m64i */
                 emulate_fpu_insn_memdst("fistpll", dst.val);

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 6/6] x86emul: simplify FPU handling asm() constraints
  2016-12-06 14:04 [PATCH 0/6] x86emul: FPU handling improvements Jan Beulich
                   ` (4 preceding siblings ...)
  2016-12-06 14:13 ` [PATCH 5/6] x86emul: avoid undefined behavior when dealing with 10-byte FPU operands Jan Beulich
@ 2016-12-06 14:14 ` Jan Beulich
  2016-12-07 15:16   ` Andrew Cooper
  5 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2016-12-06 14:14 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 1856 bytes --]

The memory clobber is rather harsh here. However, fic.exn_raised may be
modified as a side effect, so we need to let the compiler know that all
of "fic" may be changed (preventing it from moving around accesses to
the exn_raised field).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -784,7 +784,7 @@ do {
 })
 
 struct fpu_insn_ctxt {
-    uint8_t insn_bytes;
+    uint8_t insn_bytes; /* Must be first! */
     int8_t exn_raised;
 };
 
@@ -878,23 +878,21 @@ do {
         "movb $2f-1f,%0 \n"                             \
         "1: " _op "     \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes) : : "memory" )
+        : "+m" (fic) )
 
 #define emulate_fpu_insn_memdst(_op, _arg)              \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes), "=m" (_arg)            \
-        : : "memory" )
+        : "+m" (fic), "=m" (_arg) )
 
 #define emulate_fpu_insn_memsrc(_op, _arg)              \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes)                         \
-        : "m" (_arg) : "memory" )
+        : "+m" (fic) : "m" (_arg) )
 
 #define emulate_fpu_insn_stub(_bytes...)                                \
 do {                                                                    \




[-- Attachment #2: x86emul-FPU-constraints.patch --]
[-- Type: text/plain, Size: 1902 bytes --]

x86emul: simplify FPU handling asm() constraints

The memory clobber is rather harsh here. However, fic.exn_raised may be
modified as a side effect, so we need to let the compiler know that all
of "fic" may be changed (preventing it from moving around accesses to
the exn_raised field).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -784,7 +784,7 @@ do {
 })
 
 struct fpu_insn_ctxt {
-    uint8_t insn_bytes;
+    uint8_t insn_bytes; /* Must be first! */
     int8_t exn_raised;
 };
 
@@ -878,23 +878,21 @@ do {
         "movb $2f-1f,%0 \n"                             \
         "1: " _op "     \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes) : : "memory" )
+        : "+m" (fic) )
 
 #define emulate_fpu_insn_memdst(_op, _arg)              \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes), "=m" (_arg)            \
-        : : "memory" )
+        : "+m" (fic), "=m" (_arg) )
 
 #define emulate_fpu_insn_memsrc(_op, _arg)              \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes)                         \
-        : "m" (_arg) : "memory" )
+        : "+m" (fic) : "m" (_arg) )
 
 #define emulate_fpu_insn_stub(_bytes...)                                \
 do {                                                                    \

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] x86emul: extend / amend supported FPU opcodes
  2016-12-06 14:11 ` [PATCH 1/6] x86emul: extend / amend supported FPU opcodes Jan Beulich
@ 2016-12-07 14:35   ` Andrew Cooper
  2016-12-07 15:00     ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Cooper @ 2016-12-07 14:35 UTC (permalink / raw)
  To: Jan Beulich, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2636 bytes --]

On 06/12/16 14:11, Jan Beulich wrote:
> First of all there are a number of secondary encodings both Intel and
> AMD support, but which aren't formally documented.

Where did you get them from then?

(I'm fine with introducing these if they exist, but it would be good to
provide references if possible.)

> @@ -3798,7 +3801,7 @@ x86_emulate(
>                  emulate_fpu_insn_memdst("fnstcw", dst.val);
>                  break;
>              default:
> -                goto cannot_emulate;
> +                generate_exception_if(true, EXC_UD);

You can use generate_exception(EXC_UD); here, and several places below.

>              }
>          }
>          break;
> @@ -4035,6 +4044,7 @@ x86_emulate(
>          {
>          case 0xc0 ... 0xc7: /* faddp %stN */
>          case 0xc8 ... 0xcf: /* fmulp %stN */
> +        case 0xd0 ... 0xd7: /* fcomp %stN (alternative encoding) */
>          case 0xd9: /* fcompp */
>          case 0xe0 ... 0xe7: /* fsubrp %stN */
>          case 0xe8 ... 0xef: /* fsubp %stN */
> @@ -4043,7 +4053,7 @@ x86_emulate(
>              emulate_fpu_insn_stub(0xde, modrm);
>              break;
>          default:
> -            fail_if(modrm >= 0xc0);
> +            generate_exception_if(ea.type != OP_MEM, EXC_UD);
>              ea.bytes = 2;
>              src = ea;
>              if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
> @@ -4090,13 +4100,19 @@ x86_emulate(
>              dst.reg = (unsigned long *)&_regs.eax;
>              emulate_fpu_insn_memdst("fnstsw", dst.val);
>              break;
> +        case 0xc0 ... 0xc7: /* ffreep %stN */

This positioning looks wrong.  ffreep doesn't appear to interact with
FEATURE_CMOV, or eflags.  Did you mean to have it part of the lower
introduced block?

~Andrew

>          case 0xe8 ... 0xef: /* fucomip %stN */
>          case 0xf0 ... 0xf7: /* fcomip %stN */
>              vcpu_must_have_cmov();
>              emulate_fpu_insn_stub_eflags(0xdf, modrm);
>              break;
> +        case 0xc8 ... 0xcf: /* fxch %stN (alternative encoding) */
> +        case 0xd0 ... 0xd7: /* fstp %stN (alternative encoding) */
> +        case 0xd8 ... 0xdf: /* fstp %stN (alternative encoding) */
> +            emulate_fpu_insn_stub(0xdf, modrm);
> +            break;
>          default:
> -            fail_if(modrm >= 0xc0);
> +            generate_exception_if(ea.type != OP_MEM, EXC_UD);
>              switch ( modrm_reg & 7 )
>              {
>              case 0: /* fild m16i */
>
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel


[-- Attachment #1.2: Type: text/html, Size: 3719 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] x86emul: simplify FPU source operand handling
  2016-12-06 14:12 ` [PATCH 2/6] x86emul: simplify FPU source operand handling Jan Beulich
@ 2016-12-07 14:50   ` Andrew Cooper
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Cooper @ 2016-12-07 14:50 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 06/12/16 14:12, Jan Beulich wrote:
> Consistently use ea instead of src for passing the memory address to
> ->read(). This eliminates the need to copy ea to src, resulting in a
> couple of hundred bytes smaller binary size.
>
> In addition for opcode DE we can leverage SrcMem16 to eliminate a call
> of the ->read() hook. At the same time drop the stray Mov attributes
> from D8, DA, DC, and DE: They're meaningful for memory writes only.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/6] x86emul: simplify FPU destination operand handling
  2016-12-06 14:12 ` [PATCH 3/6] x86emul: simplify FPU destination " Jan Beulich
@ 2016-12-07 14:54   ` Andrew Cooper
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Cooper @ 2016-12-07 14:54 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 06/12/16 14:12, Jan Beulich wrote:
> Consolidate the copying of ea to dst: There's no need to set the type
> to OP_MEM, and instead the load cases setting it to OP_NONE allows the
> copying to be done just once per major opcode.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] x86emul: extend / amend supported FPU opcodes
  2016-12-07 14:35   ` Andrew Cooper
@ 2016-12-07 15:00     ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2016-12-07 15:00 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

>>> On 07.12.16 at 15:35, <andrew.cooper3@citrix.com> wrote:
> On 06/12/16 14:11, Jan Beulich wrote:
>> First of all there are a number of secondary encodings both Intel and
>> AMD support, but which aren't formally documented.
> 
> Where did you get them from then?
> 
> (I'm fine with introducing these if they exist, but it would be good to
> provide references if possible.)

From memory, plus re-validation of that memory. I have my own
disassembler library, where years ago I did put all of that in. I can't
tell what - if any - source I may have had back then, other than
having tried them all out on hardware.

>> @@ -3798,7 +3801,7 @@ x86_emulate(
>>                  emulate_fpu_insn_memdst("fnstcw", dst.val);
>>                  break;
>>              default:
>> -                goto cannot_emulate;
>> +                generate_exception_if(true, EXC_UD);
> 
> You can use generate_exception(EXC_UD); here, and several places below.

Oh, yes, I meant to check for those during re-base but then forgot
to do so here.

>> @@ -4090,13 +4100,19 @@ x86_emulate(
>>              dst.reg = (unsigned long *)&_regs.eax;
>>              emulate_fpu_insn_memdst("fnstsw", dst.val);
>>              break;
>> +        case 0xc0 ... 0xc7: /* ffreep %stN */
> 
> This positioning looks wrong.  ffreep doesn't appear to interact with
> FEATURE_CMOV, or eflags.  Did you mean to have it part of the lower
> introduced block?

Yes indeed. Thanks for spotting.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/6] x86emul: reduce FPU handling code size
  2016-12-06 14:13 ` [PATCH 4/6] x86emul: reduce FPU handling code size Jan Beulich
@ 2016-12-07 15:06   ` Andrew Cooper
  2016-12-07 15:16     ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Cooper @ 2016-12-07 15:06 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 06/12/16 14:13, Jan Beulich wrote:
> @@ -3670,6 +3652,7 @@ x86_emulate(
>  
>      case 0xd8: /* FPU 0xd8 */
>          host_and_vcpu_must_have(fpu);
> +        get_fpu(X86EMUL_FPU_fpu, &fic);
>          switch ( modrm )
>          {
>          case 0xc0 ... 0xc7: /* fadd %stN,%st */
> @@ -3715,10 +3698,12 @@ x86_emulate(
>                  break;
>              }
>          }
> +        put_fpu(&fic);
>          break;

This repositioning does mean that we might skip the put_fpu() call,
although we still retain the catch-all _put_fpu().

I think this is still safe as we only skip this put_fpu() in cases where
we didn't emulate an instruction, and there isn't the risk of missing a
pending FPU exception.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/6] x86emul: avoid undefined behavior when dealing with 10-byte FPU operands
  2016-12-06 14:13 ` [PATCH 5/6] x86emul: avoid undefined behavior when dealing with 10-byte FPU operands Jan Beulich
@ 2016-12-07 15:12   ` Andrew Cooper
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Cooper @ 2016-12-07 15:12 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 06/12/16 14:13, Jan Beulich wrote:
> Accessing an 8-byte (or perhaps just 4-byte in the test harness when
> built as 32-bit app) field to read/write 10 bytes (leveraging the
> successive field) is a latent bug, as the compiler could copy things
> around. Use the 32 bytes large SSE/AVX slot instead.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

The name mmval is now rather stale, and it would probably be better to
name it to something neutral, which is guarenteed to be large enough for
any single register we need to interact with.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/6] x86emul: simplify FPU handling asm() constraints
  2016-12-06 14:14 ` [PATCH 6/6] x86emul: simplify FPU handling asm() constraints Jan Beulich
@ 2016-12-07 15:16   ` Andrew Cooper
  2016-12-07 15:36     ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Cooper @ 2016-12-07 15:16 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 06/12/16 14:14, Jan Beulich wrote:
> The memory clobber is rather harsh here.

Does removing it actually make a difference?  I can't spot anything
which could reasonably be reordered around the asm() blocks.

> However, fic.exn_raised may be
> modified as a side effect, so we need to let the compiler know that all
> of "fic" may be changed (preventing it from moving around accesses to
> the exn_raised field).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -784,7 +784,7 @@ do {
>  })
>  
>  struct fpu_insn_ctxt {
> -    uint8_t insn_bytes;
> +    uint8_t insn_bytes; /* Must be first! */

And one single byte.  The compiler would previously have caught an
accidental breaking of this requirement.

As an alternative, how about using ACCESS_ONCE() to read exn_raised?  It
would allow you to drop the memory clobber and also not generalise the
fic.insn_bytes memory parameter to fic.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/6] x86emul: reduce FPU handling code size
  2016-12-07 15:06   ` Andrew Cooper
@ 2016-12-07 15:16     ` Jan Beulich
  2016-12-07 15:21       ` Andrew Cooper
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2016-12-07 15:16 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

>>> On 07.12.16 at 16:06, <andrew.cooper3@citrix.com> wrote:
> On 06/12/16 14:13, Jan Beulich wrote:
>> @@ -3670,6 +3652,7 @@ x86_emulate(
>>  
>>      case 0xd8: /* FPU 0xd8 */
>>          host_and_vcpu_must_have(fpu);
>> +        get_fpu(X86EMUL_FPU_fpu, &fic);
>>          switch ( modrm )
>>          {
>>          case 0xc0 ... 0xc7: /* fadd %stN,%st */
>> @@ -3715,10 +3698,12 @@ x86_emulate(
>>                  break;
>>              }
>>          }
>> +        put_fpu(&fic);
>>          break;
> 
> This repositioning does mean that we might skip the put_fpu() call,
> although we still retain the catch-all _put_fpu().
> 
> I think this is still safe as we only skip this put_fpu() in cases where
> we didn't emulate an instruction, and there isn't the risk of missing a
> pending FPU exception.

Correct - the catch-all one is sufficient for all cases.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/6] x86emul: reduce FPU handling code size
  2016-12-07 15:16     ` Jan Beulich
@ 2016-12-07 15:21       ` Andrew Cooper
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Cooper @ 2016-12-07 15:21 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 07/12/16 15:16, Jan Beulich wrote:
>>>> On 07.12.16 at 16:06, <andrew.cooper3@citrix.com> wrote:
>> On 06/12/16 14:13, Jan Beulich wrote:
>>> @@ -3670,6 +3652,7 @@ x86_emulate(
>>>  
>>>      case 0xd8: /* FPU 0xd8 */
>>>          host_and_vcpu_must_have(fpu);
>>> +        get_fpu(X86EMUL_FPU_fpu, &fic);
>>>          switch ( modrm )
>>>          {
>>>          case 0xc0 ... 0xc7: /* fadd %stN,%st */
>>> @@ -3715,10 +3698,12 @@ x86_emulate(
>>>                  break;
>>>              }
>>>          }
>>> +        put_fpu(&fic);
>>>          break;
>> This repositioning does mean that we might skip the put_fpu() call,
>> although we still retain the catch-all _put_fpu().
>>
>> I think this is still safe as we only skip this put_fpu() in cases where
>> we didn't emulate an instruction, and there isn't the risk of missing a
>> pending FPU exception.
> Correct - the catch-all one is sufficient for all cases.

Ok.  Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/6] x86emul: simplify FPU handling asm() constraints
  2016-12-07 15:16   ` Andrew Cooper
@ 2016-12-07 15:36     ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2016-12-07 15:36 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

>>> On 07.12.16 at 16:16, <andrew.cooper3@citrix.com> wrote:
> On 06/12/16 14:14, Jan Beulich wrote:
>> The memory clobber is rather harsh here.
> 
> Does removing it actually make a difference?  I can't spot anything
> which could reasonably be reordered around the asm() blocks.

It's not so much the re-ordering but the potential dropping of
something that was cached in a register. Indeed there was no
change to the generated code at all with current gcc, but the
compiler gets better over time, and we shouldn't restrict it
unduly.

>> However, fic.exn_raised may be
>> modified as a side effect, so we need to let the compiler know that all
>> of "fic" may be changed (preventing it from moving around accesses to
>> the exn_raised field).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -784,7 +784,7 @@ do {
>>  })
>>  
>>  struct fpu_insn_ctxt {
>> -    uint8_t insn_bytes;
>> +    uint8_t insn_bytes; /* Must be first! */
> 
> And one single byte.  The compiler would previously have caught an
> accidental breaking of this requirement.

There was no such requirement before. Of course we could add an
immediate operand to all the asm()s, specifying the offsetof(). But
then again we already have a hidden dependency on its size anyway.

> As an alternative, how about using ACCESS_ONCE() to read exn_raised?  It
> would allow you to drop the memory clobber and also not generalise the
> fic.insn_bytes memory parameter to fic.

There's no ACCESS_ONCE() in the x86 code, and even less so in
what the emulator code can use. Nor would what you suggest
allow to legitimately drop the memory clobber.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-12-07 15:36 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-06 14:04 [PATCH 0/6] x86emul: FPU handling improvements Jan Beulich
2016-12-06 14:11 ` [PATCH 1/6] x86emul: extend / amend supported FPU opcodes Jan Beulich
2016-12-07 14:35   ` Andrew Cooper
2016-12-07 15:00     ` Jan Beulich
2016-12-06 14:12 ` [PATCH 2/6] x86emul: simplify FPU source operand handling Jan Beulich
2016-12-07 14:50   ` Andrew Cooper
2016-12-06 14:12 ` [PATCH 3/6] x86emul: simplify FPU destination " Jan Beulich
2016-12-07 14:54   ` Andrew Cooper
2016-12-06 14:13 ` [PATCH 4/6] x86emul: reduce FPU handling code size Jan Beulich
2016-12-07 15:06   ` Andrew Cooper
2016-12-07 15:16     ` Jan Beulich
2016-12-07 15:21       ` Andrew Cooper
2016-12-06 14:13 ` [PATCH 5/6] x86emul: avoid undefined behavior when dealing with 10-byte FPU operands Jan Beulich
2016-12-07 15:12   ` Andrew Cooper
2016-12-06 14:14 ` [PATCH 6/6] x86emul: simplify FPU handling asm() constraints Jan Beulich
2016-12-07 15:16   ` Andrew Cooper
2016-12-07 15:36     ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).