From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
Xen-devel <xen-devel@lists.xen.org>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>,
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH 4/6] x86/emulate: Support for emulating software event injection
Date: Wed, 24 Sep 2014 09:01:34 -0400
Message-ID: <5422C0AE.3090404@oracle.com>
In-Reply-To: <1411484611-31027-5-git-send-email-andrew.cooper3@citrix.com>
On 09/23/2014 11:03 AM, Andrew Cooper wrote:
> AMD SVM requires all software events to have their injection emulated if
> hardware lacks NextRIP support. In addition, `icebp` (opcode 0xf1) injection
> requires emulation in all cases, even with hardware NextRIP support.
>
> Emulating full control transfers is overkill for our needs. All that matters
> is that guest userspace can't bypass the descriptor DPL check. Any guest OS
> which would incur other faults as part of injection is going to end up with a
> double fault instead, and won't be in a position to care that the faulting eip
> is wrong.
>
> Reported-by: Andrei LUTAS <vlutas@bitdefender.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> CC: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
> ---
> xen/arch/x86/hvm/emulate.c | 8 +++
> xen/arch/x86/hvm/svm/svm.c | 57 +++++++++++++--
> xen/arch/x86/mm.c | 2 +
> xen/arch/x86/mm/shadow/common.c | 1 +
> xen/arch/x86/x86_emulate/x86_emulate.c | 122 ++++++++++++++++++++++++++++++--
> xen/arch/x86/x86_emulate/x86_emulate.h | 10 +++
> 6 files changed, 191 insertions(+), 9 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 7ee146b..463ccfb 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -21,6 +21,7 @@
> #include <asm/hvm/hvm.h>
> #include <asm/hvm/trace.h>
> #include <asm/hvm/support.h>
> +#include <asm/hvm/svm/svm.h>
>
> static void hvmtrace_io_assist(int is_mmio, ioreq_t *p)
> {
> @@ -1328,6 +1329,13 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
> vio->mmio_retrying = vio->mmio_retry;
> vio->mmio_retry = 0;
>
> + if ( cpu_has_vmx )
> + hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_none;
> + else if ( cpu_has_svm_nrips )
> + hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_icebp;
> + else
> + hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_all;
> +
> rc = x86_emulate(&hvmemul_ctxt->ctxt, ops);
>
> if ( rc == X86EMUL_OKAY && vio->mmio_retry )
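Below is a minimal stand-alone sketch of the mode selection in _hvm_emulate_one() above. The two booleans stand in for Xen's cpu_has_vmx and cpu_has_svm_nrips predicates. The function name select_swint_emulate is hypothetical and is not part of the patch.

```c
#include <stdbool.h>

/* Mirrors the enum this patch adds to x86_emulate.h. */
enum x86_swint_emulation {
    x86_swint_emulate_none,  /* Hardware supports all software injection */
    x86_swint_emulate_icebp, /* Help needed with icebp (0xf1) only */
    x86_swint_emulate_all,   /* Help needed with all software events */
};

/*
 * Hypothetical model of the if/else chain in _hvm_emulate_one():
 * has_vmx ~ cpu_has_vmx, has_svm_nrips ~ cpu_has_svm_nrips.
 */
static enum x86_swint_emulation
select_swint_emulate(bool has_vmx, bool has_svm_nrips)
{
    if ( has_vmx )
        return x86_swint_emulate_none;   /* VMX: no emulation help needed */
    if ( has_svm_nrips )
        return x86_swint_emulate_icebp;  /* SVM with NextRIP: only icebp */
    return x86_swint_emulate_all;        /* SVM without NextRIP: everything */
}
```

VMX is tested first because the patch always selects x86_swint_emulate_none there. The NextRIP distinction only matters on SVM.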
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index de982fd..b6beefc 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -1177,11 +1177,12 @@ static void svm_inject_trap(struct hvm_trap *trap)
> struct vmcb_struct *vmcb = curr->arch.hvm_svm.vmcb;
> eventinj_t event = vmcb->eventinj;
> struct hvm_trap _trap = *trap;
> + const struct cpu_user_regs *regs = guest_cpu_user_regs();
>
> switch ( _trap.vector )
> {
> case TRAP_debug:
> - if ( guest_cpu_user_regs()->eflags & X86_EFLAGS_TF )
> + if ( regs->eflags & X86_EFLAGS_TF )
> {
> __restore_debug_registers(vmcb, curr);
> vmcb_set_dr6(vmcb, vmcb_get_dr6(vmcb) | 0x4000);
> @@ -1209,10 +1210,58 @@ static void svm_inject_trap(struct hvm_trap *trap)
>
> event.bytes = 0;
> event.fields.v = 1;
> - event.fields.type = X86_EVENTTYPE_HW_EXCEPTION;
> event.fields.vector = _trap.vector;
> - event.fields.ev = (_trap.error_code != HVM_DELIVER_NO_ERROR_CODE);
> - event.fields.errorcode = _trap.error_code;
> +
> + /* Refer to AMD Vol 2: System Programming, 15.20 Event Injection. */
> + switch ( _trap.type )
> + {
> + case X86_EVENTTYPE_SW_INTERRUPT: /* int $n */
> + /*
> + * Injection type 4 (software interrupt) is only supported with
> + * NextRIP support. Without NextRIP, the emulator will have performed
> + * DPL and presence checks for us.
> + */
> + if ( cpu_has_svm_nrips )
> + {
> + vmcb->nextrip = regs->eip + _trap.insn_len;
> + event.fields.type = X86_EVENTTYPE_SW_INTERRUPT;
> + }
> + else
> + event.fields.type = X86_EVENTTYPE_HW_EXCEPTION;
> + break;
> +
> + case X86_EVENTTYPE_PRI_SW_EXCEPTION: /* icebp */
> + /*
> + * icebp's injection must always be emulated. Software injection help
> + * in x86_emulate has moved eip forward, but NextRIP (if used) still
> + * needs setting or execution will resume from 0.
> + */
Can you tell me where eip is updated? I don't see any difference between
how, for example, int3 and icebp are emulated.
-boris
> + if ( cpu_has_svm_nrips )
> + vmcb->nextrip = regs->eip;
> + event.fields.type = X86_EVENTTYPE_HW_EXCEPTION;
> + break;
> +
> + case X86_EVENTTYPE_SW_EXCEPTION: /* int3, into */
> + /*
> + * The AMD manual states that .type=3 (HW exception), .vector=3 or 4,
> + * will perform DPL checks. Experimentally, DPL and presence checks
> + * are indeed performed, even without NextRIP support.
> + *
> + * However, without NextRIP support, the event injection still needs
> + * to be fully emulated to get the correct eip in the trap frame, while
> + * still reporting the correct faulting eip should a fault occur.
> + */
> + if ( cpu_has_svm_nrips )
> + vmcb->nextrip = regs->eip + _trap.insn_len;
> + event.fields.type = X86_EVENTTYPE_HW_EXCEPTION;
> + break;
> +
> + default:
> + event.fields.type = X86_EVENTTYPE_HW_EXCEPTION;
> + event.fields.ev = (_trap.error_code != HVM_DELIVER_NO_ERROR_CODE);
> + event.fields.errorcode = _trap.error_code;
> + break;
> + }
>
> vmcb->eventinj = event;
>
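The type and NextRIP mapping in svm_inject_trap() above can be summarised as a pure function. This is a sketch only, under these assumptions:

- The EVT_* constants are assumed to match Xen's X86_EVENTTYPE_* values (3, 4, 5, 6).
- struct svm_inject and model_svm_inject() are hypothetical stand-ins for the VMCB writes.
- The error-code fields are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match Xen's X86_EVENTTYPE_* values used by the patch. */
enum {
    EVT_HW_EXCEPTION     = 3, /* X86_EVENTTYPE_HW_EXCEPTION */
    EVT_SW_INTERRUPT     = 4, /* X86_EVENTTYPE_SW_INTERRUPT: int $n */
    EVT_PRI_SW_EXCEPTION = 5, /* X86_EVENTTYPE_PRI_SW_EXCEPTION: icebp */
    EVT_SW_EXCEPTION     = 6, /* X86_EVENTTYPE_SW_EXCEPTION: int3, into */
};

/* Hypothetical summary of the VMCB writes (error-code fields omitted). */
struct svm_inject {
    unsigned int vmcb_type; /* event type placed in EVENTINJ */
    bool set_nextrip;       /* whether vmcb->nextrip is written */
    uint64_t nextrip;
};

static struct svm_inject
model_svm_inject(unsigned int trap_type, uint64_t eip, uint8_t insn_len,
                 bool has_nrips)
{
    /* Default: inject as a hardware exception and leave NextRIP alone. */
    struct svm_inject r = { EVT_HW_EXCEPTION, false, 0 };

    switch ( trap_type )
    {
    case EVT_SW_INTERRUPT:     /* int $n: type 4 is only usable with NextRIP */
        if ( has_nrips )
        {
            r.vmcb_type = EVT_SW_INTERRUPT;
            r.set_nextrip = true;
            r.nextrip = eip + insn_len;
        }
        break;

    case EVT_PRI_SW_EXCEPTION: /* icebp: eip is used as-is */
        if ( has_nrips )
        {
            r.set_nextrip = true;
            r.nextrip = eip;
        }
        break;

    case EVT_SW_EXCEPTION:     /* int3, into: resume after the instruction */
        if ( has_nrips )
        {
            r.set_nextrip = true;
            r.nextrip = eip + insn_len;
        }
        break;

    default:                   /* genuine hardware exceptions */
        break;
    }

    return r;
}
```

Only `int $n` on NextRIP hardware is injected as a true software interrupt. Everything else goes in as type 3. The icebp case sets NextRIP to the current eip without adding insn_len because, per the patch comment, x86_emulate has already moved eip forward for it.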
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 5b3f06f..bfe9f05 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5096,6 +5096,7 @@ int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
> ptwr_ctxt.ctxt.force_writeback = 0;
> ptwr_ctxt.ctxt.addr_size = ptwr_ctxt.ctxt.sp_size =
> is_pv_32on64_domain(d) ? 32 : BITS_PER_LONG;
> + ptwr_ctxt.ctxt.swint_emulate = x86_swint_emulate_none;
> ptwr_ctxt.cr2 = addr;
> ptwr_ctxt.pte = pte;
>
> @@ -5172,6 +5173,7 @@ int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
> .ctxt.regs = regs,
> .ctxt.addr_size = addr_size,
> .ctxt.sp_size = addr_size,
> + .ctxt.swint_emulate = x86_swint_emulate_none,
> .cr2 = addr
> };
> int rc;
> diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
> index 9115a78..a5eed28 100644
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -366,6 +366,7 @@ const struct x86_emulate_ops *shadow_init_emulation(
>
> sh_ctxt->ctxt.regs = regs;
> sh_ctxt->ctxt.force_writeback = 0;
> + sh_ctxt->ctxt.swint_emulate = x86_swint_emulate_none;
>
> if ( is_pv_vcpu(v) )
> {
> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
> index e06aa60..ffca65a 100644
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -403,6 +403,11 @@ typedef union {
> #define EXC_PF 14
> #define EXC_MF 16
>
> +/* Segment selector error code bits. */
> +#define ECODE_EXT (1 << 0)
> +#define ECODE_IDT (1 << 1)
> +#define ECODE_TI (1 << 2)
> +
> /*
> * Instruction emulation:
> * Most instructions are emulated directly via a fragment of inline assembly
> @@ -1318,6 +1323,115 @@ decode_segment(uint8_t modrm_reg)
> return decode_segment_failed;
> }
>
> +/* Inject a software interrupt/exception, emulating if needed. */
> +static int inject_swint(enum x86_swint_type type,
> + uint8_t vector, uint8_t insn_len,
> + struct x86_emulate_ctxt *ctxt,
> + const struct x86_emulate_ops *ops)
> +{
> + int rc, error_code, fault_type = EXC_GP;
> +
> + fail_if(ops->inject_sw_interrupt == NULL);
> + fail_if(ops->inject_hw_exception == NULL);
> +
> + /*
> + * Without hardware support, injecting software interrupts/exceptions is
> + * problematic.
> + *
> + * All software methods of generating exceptions (other than BOUND) yield
> + * traps, so eip in the exception frame needs to point after the
> + * instruction, not at it.
> + *
> + * However, if injecting it as a hardware exception causes a fault during
> + * delivery, our adjustment of eip will cause the fault to be reported
> + * after the faulting instruction, not pointing to it.
> + *
> + * Therefore, eip can only safely be wound forwards if we are certain that
> + * injecting an equivalent hardware exception won't fault, which means
> + * emulating everything the processor would do on a control transfer.
> + *
> + * However, emulation of complete control transfers is very complicated.
> + * All we care about is that guest userspace cannot avoid the descriptor
> + * DPL check by using the Xen emulator, and successfully invoke DPL=0
> + * descriptors.
> + *
> + * Any OS which would further fault during injection is going to receive a
> + * double fault anyway, and won't be in a position to care that the
> + * faulting eip is incorrect.
> + */
> +
> + if ( (ctxt->swint_emulate == x86_swint_emulate_all) ||
> + ((ctxt->swint_emulate == x86_swint_emulate_icebp) &&
> + (type == x86_swint_icebp)) )
> + {
> + if ( !in_realmode(ctxt, ops) )
> + {
> + unsigned int idte_size = (ctxt->addr_size == 64) ? 16 : 8;
> + unsigned int idte_offset = vector * idte_size;
> + struct segment_register idtr;
> + uint32_t idte_ctl;
> +
> + /* icebp sets the External Event bit despite being an instruction. */
> + error_code = (vector << 3) | ECODE_IDT |
> + (type == x86_swint_icebp ? ECODE_EXT : 0);
> +
> + /*
> + * TODO - this does not cover the v8086 mode with CR4.VME case
> + * correctly, but falls on the safe side from the point of view of
> + * a 32bit OS. Someone with many TUITs can see about reading the
> + * TSS Software Interrupt Redirection bitmap.
> + */
> + if ( (ctxt->regs->eflags & EFLG_VM) &&
> + ((ctxt->regs->eflags & EFLG_IOPL) != EFLG_IOPL) )
> + goto raise_exn;
> +
> + fail_if(ops->read_segment == NULL);
> + fail_if(ops->read == NULL);
> + if ( (rc = ops->read_segment(x86_seg_idtr, &idtr, ctxt)) )
> + goto done;
> +
> + if ( (idte_offset + idte_size - 1) > idtr.limit )
> + goto raise_exn;
> +
> + /*
> + * Should strictly speaking read all 8/16 bytes of an entry,
> + * but we currently only care about the dpl and present bits.
> + */
> + ops->read(x86_seg_none, idtr.base + idte_offset + 4,
> + &idte_ctl, sizeof(idte_ctl), ctxt);
> +
> + /* Is this entry present? */
> + if ( !(idte_ctl & (1u << 15)) )
> + {
> + fault_type = EXC_NP;
> + goto raise_exn;
> + }
> +
> + /* icebp counts as a hardware event, and bypasses the dpl check. */
> + if ( type != x86_swint_icebp )
> + {
> + struct segment_register ss;
> +
> + if ( (rc = ops->read_segment(x86_seg_ss, &ss, ctxt)) )
> + goto done;
> +
> + if ( ss.attr.fields.dpl > ((idte_ctl >> 13) & 3) )
> + goto raise_exn;
> + }
> + }
> +
> + ctxt->regs->eip += insn_len;
> + }
> +
> + rc = ops->inject_sw_interrupt(type, vector, insn_len, ctxt);
> +
> + done:
> + return rc;
> +
> + raise_exn:
> + return ops->inject_hw_exception(fault_type, error_code, ctxt);
> +}
> +
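Below is a self-contained sketch of the protected-mode checks in inject_swint() above. It works on a flat byte copy of a guest IDT rather than on guest memory via ops->read. Points to note:

- The function check_idt_entry() and the SWINT_* results are hypothetical.
- cpl stands in for the SS DPL that the patch reads.
- long_mode stands in for the addr_size == 64 test.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Segment selector error code bits, as defined by the patch. */
#define ECODE_EXT (1 << 0)
#define ECODE_IDT (1 << 1)

enum swint_check { SWINT_OK, SWINT_GP, SWINT_NP };

/*
 * Hypothetical model of inject_swint()'s IDT checks. idt points at a
 * copy of the guest IDT covering idt_limit + 1 bytes.
 */
static enum swint_check
check_idt_entry(const uint8_t *idt, uint32_t idt_limit, uint8_t vector,
                bool long_mode, bool is_icebp, unsigned int cpl,
                uint32_t *error_code)
{
    unsigned int idte_size = long_mode ? 16 : 8;
    unsigned int idte_offset = vector * idte_size;
    uint32_t idte_ctl;

    /* Selector-format error code: IDT bit always, EXT only for icebp. */
    *error_code = ((uint32_t)vector << 3) | ECODE_IDT |
                  (is_icebp ? ECODE_EXT : 0);

    /* The whole entry must lie within the IDT limit. */
    if ( idte_offset + idte_size - 1 > idt_limit )
        return SWINT_GP;

    /*
     * Bytes 4..7 of the entry hold P (bit 15) and DPL (bits 13-14).
     * Guest IDT contents are little-endian; this memcpy assumes the host
     * is too, as x86 is.
     */
    memcpy(&idte_ctl, idt + idte_offset + 4, sizeof(idte_ctl));

    if ( !(idte_ctl & (1u << 15)) )
        return SWINT_NP;                  /* entry not present */

    if ( !is_icebp && cpl > ((idte_ctl >> 13) & 3) )
        return SWINT_GP;                  /* DPL check, skipped for icebp */

    return SWINT_OK;
}
```

As in the patch, only the present bit and the DPL are inspected. The gate type and target selector are not validated.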
> int
> x86_emulate(
> struct x86_emulate_ctxt *ctxt,
> @@ -2637,11 +2751,9 @@ x86_emulate(
> src.val = insn_fetch_type(uint8_t);
> swint_type = x86_swint_int;
> swint:
> - fail_if(!in_realmode(ctxt, ops)); /* XSA-106 */
> - fail_if(ops->inject_sw_interrupt == NULL);
> - rc = ops->inject_sw_interrupt(swint_type, src.val,
> - _regs.eip - ctxt->regs->eip,
> - ctxt) ? : X86EMUL_EXCEPTION;
> + rc = inject_swint(swint_type, src.val,
> + _regs.eip - ctxt->regs->eip,
> + ctxt, ops) ? : X86EMUL_EXCEPTION;
> goto done;
>
> case 0xce: /* into */
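The call site above passes `_regs.eip - ctxt->regs->eip` as the instruction length. inject_swint() winds the guest eip forward only on its emulated path, after any IDT checks pass; in real mode those checks are skipped entirely. Below is a minimal model of that decision. It assumes the checks succeed, and the helper names and the swint_kind enum are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum x86_swint_emulation {
    x86_swint_emulate_none,
    x86_swint_emulate_icebp,
    x86_swint_emulate_all,
};

/* Hypothetical subset of the software event kinds. */
enum swint_kind { SWINT_ICEBP, SWINT_INT3, SWINT_INTO, SWINT_INT };

/* Mirrors the condition guarding the emulated path in inject_swint(). */
static bool swint_needs_emulation(enum x86_swint_emulation mode,
                                  enum swint_kind type)
{
    return mode == x86_swint_emulate_all ||
           (mode == x86_swint_emulate_icebp && type == SWINT_ICEBP);
}

/*
 * decoded_eip is the decoder's private eip (_regs.eip), already past the
 * instruction; guest_eip is ctxt->regs->eip before injection.
 */
static uint64_t eip_after_injection(enum x86_swint_emulation mode,
                                    enum swint_kind type,
                                    uint64_t guest_eip, uint64_t decoded_eip)
{
    uint8_t insn_len = (uint8_t)(decoded_eip - guest_eip);

    if ( swint_needs_emulation(mode, type) )
        return guest_eip + insn_len;   /* emulated: eip points past the insn */
    return guest_eip;                  /* hardware path: eip left unchanged */
}
```

With x86_swint_emulate_icebp, an int3 leaves eip untouched in the emulator, whereas an icebp advances it. The NextRIP handling in svm_inject_trap() depends on that difference.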
> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
> index b336e17..b059341 100644
> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
> @@ -59,6 +59,13 @@ enum x86_swint_type {
> x86_swint_int, /* 0xcd $n */
> };
>
> +/* How much help is required with software event injection? */
> +enum x86_swint_emulation {
> + x86_swint_emulate_none, /* Hardware supports all software injection properly */
> + x86_swint_emulate_icebp,/* Help needed with `icebp` (0xf1) */
> + x86_swint_emulate_all, /* Help needed with all software events */
> +};
> +
> /*
> * Attribute for segment selector. This is a copy of bit 40:47 & 52:55 of the
> * segment descriptor. It happens to match the format of an AMD SVM VMCB.
> @@ -388,6 +395,9 @@ struct x86_emulate_ctxt
> /* Set this if writes may have side effects. */
> uint8_t force_writeback;
>
> + /* Software event injection support. */
> + enum x86_swint_emulation swint_emulate;
> +
> /* Retirement state, set by the emulator (valid only on X86EMUL_OKAY). */
> union {
> struct {
Thread overview: 28+ messages
2014-09-23 15:03 [PATCH 0/6] HVM Emulation and trap injection fixes Andrew Cooper
2014-09-23 15:03 ` [PATCH 1/6] x86emul: fix SYSCALL/SYSENTER/SYSEXIT emulation Andrew Cooper
2014-09-23 15:03 ` [PATCH 2/6] x86/emulate: Provide further information about software events Andrew Cooper
2014-09-23 15:03 ` [PATCH 3/6] x86/hvm: Don't discard the SW/HW event distinction from the emulator Andrew Cooper
2014-09-25 20:57 ` Tian, Kevin
2014-09-26 20:12 ` Boris Ostrovsky
2014-09-23 15:03 ` [PATCH 4/6] x86/emulate: Support for emulating software event injection Andrew Cooper
2014-09-23 22:24 ` Aravind Gopalakrishnan
2014-09-24 9:22 ` Andrew Cooper
2014-09-24 13:01 ` Boris Ostrovsky [this message]
2014-09-24 13:04 ` Andrew Cooper
2014-09-24 13:24 ` Boris Ostrovsky
2014-09-24 14:20 ` Andrew Cooper
2014-09-26 20:13 ` Boris Ostrovsky
2014-09-26 21:09 ` Aravind Gopalakrishnan
2014-09-23 15:03 ` [PATCH 5/6] x86/hvm: Forced Emulation Prefix for debug builds of Xen Andrew Cooper
2014-09-23 15:27 ` Jan Beulich
2014-09-23 16:09 ` [PATCH v2 " Andrew Cooper
2014-09-23 16:21 ` Jan Beulich
2014-09-25 21:04 ` Tian, Kevin
2014-09-23 18:20 ` Boris Ostrovsky
2014-09-23 18:23 ` Andrew Cooper
2014-09-23 20:17 ` Boris Ostrovsky
2014-09-24 12:56 ` Andrew Cooper
2014-09-26 20:14 ` Boris Ostrovsky
2014-09-23 15:03 ` [PATCH 6/6] x86/svm: Misc cleanup Andrew Cooper
2014-09-26 20:15 ` Boris Ostrovsky
2014-09-23 15:19 ` [PATCH 0/6] HVM Emulation and trap injection fixes Jan Beulich