[PATCH V7 for-4.5 0/4] Basic guest memory introspection support

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH V7 for-4.5 0/4] Basic guest memory introspection support
@ 2014-09-11 13:15 Razvan Cojocaru
  2014-09-11 13:15 ` [PATCH V7 for-4.5 1/4] xen: Emulate with no writes Razvan Cojocaru
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Razvan Cojocaru @ 2014-09-11 13:15 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini, eddie.dong,
	ian.jackson, tim, jbeulich, jun.nakajima, andrew.cooper3

We need to be able to detect rootkits in HVM guests, in a way that
allows an application that runs in dom0 (or a similarly privileged
domain) to control what the guest is allowed to do once a threat is
detected. This has been done over the mem_event mechanism.

Here is a summary of the series:

a  1/4 xen: Emulate with no writes
a  2/4 xen: Optimize introspection access to guest state
A  3/4 xen, libxc: Force-enable relevant MSR events
a* 4/4 xen: Handle resumed instruction based on previous mem_event reply

Key to symbols:
 *   Updated in this version of the series.
 +   New patch in this version.
 /   Updated but only to remove changes into a separate patch.
 -   Updated with style changes only.
 a   Acked/reviewed by one reviewer.
 A   Acked/reviewed by more than one reviewer.

This new version has removed the "xen, libxc: Request page fault injection
via libxc" patch - for now we can make due with the original
HVMOP_inject_trap. In patch 4/5 (formerly 5/5) the GLA mem_event filtering
has been removed, and now all mem_events are being sent out. It becomes
the mem_event client's job to filter them. "For-4.5" has been added to the
prefix according to Ian Campbell's recommendation.

We needed to:

1. Be able to execute the current instruction without allowing it to
write to memory. This is done based on new mem_event response
fields sent from the controlling application.

2. Have the guest as responsive as possible amid all the processing. So
we had to cache some information with each mem_event sent.

3. Not allow the hypervisor to disable sending information about
interesting MSR events.

4. Add an additional mem_event type, namely a VMCALL event - in order to
do its work, our application occasionally triggers VMCALLs in the guest
(not included in the current series, but included in the initial RFC
series).

5. Extend libxc to allow triggering page faults in the guest (in order
to be able to bring back in pages swapped out by the guest OS). For now,
this patch has been skipped in favour of plain HVMOP_inject_trap.

Thanks,
Razvan Cojocaru

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V7 for-4.5 1/4] xen: Emulate with no writes
  2014-09-11 13:15 [PATCH V7 for-4.5 0/4] Basic guest memory introspection support Razvan Cojocaru
@ 2014-09-11 13:15 ` Razvan Cojocaru
  2014-09-11 13:15 ` [PATCH V7 for-4.5 2/4] xen: Optimize introspection access to guest state Razvan Cojocaru
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Razvan Cojocaru @ 2014-09-11 13:15 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, Razvan Cojocaru,
	stefano.stabellini, eddie.dong, ian.jackson, tim, jbeulich,
	jun.nakajima, andrew.cooper3

Added support for emulating an instruction with no memory writes.
Additionally, introduced hvm_emulate_one_full(), which inspects
possible return values from the hvm_emulate_one() functions
(EXCEPTION, UNHANDLEABLE) and acts on them.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/emulate.c        |  175 ++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/hvm/emulate.h |    5 ++
 2 files changed, 177 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 86cf432..6ab06e0 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -690,6 +690,94 @@ static int hvmemul_write(
     return X86EMUL_OKAY;
 }
 
+static int hvmemul_write_discard(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    /* Discarding the write. */
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_rep_ins_discard(
+    uint16_t src_port,
+    enum x86_segment dst_seg,
+    unsigned long dst_offset,
+    unsigned int bytes_per_rep,
+    unsigned long *reps,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_rep_movs_discard(
+   enum x86_segment src_seg,
+   unsigned long src_offset,
+   enum x86_segment dst_seg,
+   unsigned long dst_offset,
+   unsigned int bytes_per_rep,
+   unsigned long *reps,
+   struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_rep_outs_discard(
+    enum x86_segment src_seg,
+    unsigned long src_offset,
+    uint16_t dst_port,
+    unsigned int bytes_per_rep,
+    unsigned long *reps,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_cmpxchg_discard(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_old,
+    void *p_new,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_read_io_discard(
+    unsigned int port,
+    unsigned int bytes,
+    unsigned long *val,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_write_io_discard(
+    unsigned int port,
+    unsigned int bytes,
+    unsigned long val,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_write_msr_discard(
+    unsigned long reg,
+    uint64_t val,
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
+static int hvmemul_wbinvd_discard(
+    struct x86_emulate_ctxt *ctxt)
+{
+    return X86EMUL_OKAY;
+}
+
 static int hvmemul_cmpxchg(
     enum x86_segment seg,
     unsigned long offset,
@@ -1140,8 +1228,33 @@ static const struct x86_emulate_ops hvm_emulate_ops = {
     .invlpg        = hvmemul_invlpg
 };
 
-int hvm_emulate_one(
-    struct hvm_emulate_ctxt *hvmemul_ctxt)
+static const struct x86_emulate_ops hvm_emulate_ops_no_write = {
+    .read          = hvmemul_read,
+    .insn_fetch    = hvmemul_insn_fetch,
+    .write         = hvmemul_write_discard,
+    .cmpxchg       = hvmemul_cmpxchg_discard,
+    .rep_ins       = hvmemul_rep_ins_discard,
+    .rep_outs      = hvmemul_rep_outs_discard,
+    .rep_movs      = hvmemul_rep_movs_discard,
+    .read_segment  = hvmemul_read_segment,
+    .write_segment = hvmemul_write_segment,
+    .read_io       = hvmemul_read_io_discard,
+    .write_io      = hvmemul_write_io_discard,
+    .read_cr       = hvmemul_read_cr,
+    .write_cr      = hvmemul_write_cr,
+    .read_msr      = hvmemul_read_msr,
+    .write_msr     = hvmemul_write_msr_discard,
+    .wbinvd        = hvmemul_wbinvd_discard,
+    .cpuid         = hvmemul_cpuid,
+    .inject_hw_exception = hvmemul_inject_hw_exception,
+    .inject_sw_interrupt = hvmemul_inject_sw_interrupt,
+    .get_fpu       = hvmemul_get_fpu,
+    .put_fpu       = hvmemul_put_fpu,
+    .invlpg        = hvmemul_invlpg
+};
+
+static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
+    const struct x86_emulate_ops *ops)
 {
     struct cpu_user_regs *regs = hvmemul_ctxt->ctxt.regs;
     struct vcpu *curr = current;
@@ -1193,7 +1306,7 @@ int hvm_emulate_one(
     vio->mmio_retrying = vio->mmio_retry;
     vio->mmio_retry = 0;
 
-    rc = x86_emulate(&hvmemul_ctxt->ctxt, &hvm_emulate_ops);
+    rc = x86_emulate(&hvmemul_ctxt->ctxt, ops);
 
     if ( rc == X86EMUL_OKAY && vio->mmio_retry )
         rc = X86EMUL_RETRY;
@@ -1241,6 +1354,62 @@ int hvm_emulate_one(
     return X86EMUL_OKAY;
 }
 
+int hvm_emulate_one(
+    struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+    return _hvm_emulate_one(hvmemul_ctxt, &hvm_emulate_ops);
+}
+
+int hvm_emulate_one_no_write(
+    struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+    return _hvm_emulate_one(hvmemul_ctxt, &hvm_emulate_ops_no_write);
+}
+
+void hvm_mem_event_emulate_one(bool_t nowrite, unsigned int trapnr,
+    unsigned int errcode)
+{
+    struct hvm_emulate_ctxt ctx = {{ 0 }};
+    int rc;
+
+    hvm_emulate_prepare(&ctx, guest_cpu_user_regs());
+
+    if ( nowrite )
+        rc = hvm_emulate_one_no_write(&ctx);
+    else
+        rc = hvm_emulate_one(&ctx);
+
+    switch ( rc )
+    {
+    case X86EMUL_RETRY:
+        /*
+         * This function is called when handling an EPT-related mem_event
+         * reply. As such, nothing else needs to be done here, since simply
+         * returning makes the current instruction cause a page fault again,
+         * consistent with X86EMUL_RETRY.
+         */
+        return;
+    case X86EMUL_UNHANDLEABLE:
+        gdprintk(XENLOG_DEBUG, "Emulation failed @ %04x:%lx: "
+               "%02x %02x %02x %02x %02x %02x %02x %02x %02x %02x\n",
+               hvmemul_get_seg_reg(x86_seg_cs, &ctx)->sel,
+               ctx.insn_buf_eip,
+               ctx.insn_buf[0], ctx.insn_buf[1],
+               ctx.insn_buf[2], ctx.insn_buf[3],
+               ctx.insn_buf[4], ctx.insn_buf[5],
+               ctx.insn_buf[6], ctx.insn_buf[7],
+               ctx.insn_buf[8], ctx.insn_buf[9]);
+        hvm_inject_hw_exception(trapnr, errcode);
+        break;
+    case X86EMUL_EXCEPTION:
+        if ( ctx.exn_pending )
+            hvm_inject_hw_exception(ctx.exn_vector, ctx.exn_error_code);
+        break;
+    }
+
+    hvm_emulate_writeback(&ctx);
+}
+
 void hvm_emulate_prepare(
     struct hvm_emulate_ctxt *hvmemul_ctxt,
     struct cpu_user_regs *regs)
diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-x86/hvm/emulate.h
index 00a06cc..efff97e 100644
--- a/xen/include/asm-x86/hvm/emulate.h
+++ b/xen/include/asm-x86/hvm/emulate.h
@@ -37,6 +37,11 @@ struct hvm_emulate_ctxt {
 
 int hvm_emulate_one(
     struct hvm_emulate_ctxt *hvmemul_ctxt);
+int hvm_emulate_one_no_write(
+    struct hvm_emulate_ctxt *hvmemul_ctxt);
+void hvm_mem_event_emulate_one(bool_t nowrite,
+    unsigned int trapnr,
+    unsigned int errcode);
 void hvm_emulate_prepare(
     struct hvm_emulate_ctxt *hvmemul_ctxt,
     struct cpu_user_regs *regs);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V7 for-4.5 2/4] xen: Optimize introspection access to guest state
  2014-09-11 13:15 [PATCH V7 for-4.5 0/4] Basic guest memory introspection support Razvan Cojocaru
  2014-09-11 13:15 ` [PATCH V7 for-4.5 1/4] xen: Emulate with no writes Razvan Cojocaru
@ 2014-09-11 13:15 ` Razvan Cojocaru
  2014-09-11 13:15 ` [PATCH V7 for-4.5 3/4] xen, libxc: Force-enable relevant MSR events Razvan Cojocaru
  2014-09-11 13:15 ` [PATCH V7 for-4.5 4/4] xen: Handle resumed instruction based on previous mem_event reply Razvan Cojocaru
  3 siblings, 0 replies; 12+ messages in thread
From: Razvan Cojocaru @ 2014-09-11 13:15 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, Razvan Cojocaru,
	stefano.stabellini, eddie.dong, ian.jackson, tim, jbeulich,
	jun.nakajima, andrew.cooper3

Speed optimization for introspection purposes: a handful of registers
are sent along with each mem_event. This requires enlargement of the
mem_event_request / mem_event_response stuctures, and additional code
to fill in relevant values. Since the EPT event processing code needs
more data than CR3 or MSR event processors, hvm_mem_event_fill_regs()
fills in less data than p2m_mem_event_fill_regs(), in order to avoid
overhead. Struct hvm_hw_cpu has been considered instead of the custom
struct mem_event_regs_st, but its size would cause quick filling up
of the mem_event ring buffer.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/hvm.c         |   33 +++++++++++++++++++++++
 xen/arch/x86/mm/p2m.c          |   57 ++++++++++++++++++++++++++++++++++++++++
 xen/include/public/mem_event.h |   39 +++++++++++++++++++++++++++
 3 files changed, 129 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 8d905d3..bb45593 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -6149,6 +6149,38 @@ int hvm_debug_op(struct vcpu *v, int32_t op)
     return rc;
 }
 
+static void hvm_mem_event_fill_regs(mem_event_request_t *req)
+{
+    const struct cpu_user_regs *regs = guest_cpu_user_regs();
+    const struct vcpu *curr = current;
+
+    req->x86_regs.rax = regs->eax;
+    req->x86_regs.rcx = regs->ecx;
+    req->x86_regs.rdx = regs->edx;
+    req->x86_regs.rbx = regs->ebx;
+    req->x86_regs.rsp = regs->esp;
+    req->x86_regs.rbp = regs->ebp;
+    req->x86_regs.rsi = regs->esi;
+    req->x86_regs.rdi = regs->edi;
+
+    req->x86_regs.r8  = regs->r8;
+    req->x86_regs.r9  = regs->r9;
+    req->x86_regs.r10 = regs->r10;
+    req->x86_regs.r11 = regs->r11;
+    req->x86_regs.r12 = regs->r12;
+    req->x86_regs.r13 = regs->r13;
+    req->x86_regs.r14 = regs->r14;
+    req->x86_regs.r15 = regs->r15;
+
+    req->x86_regs.rflags = regs->eflags;
+    req->x86_regs.rip    = regs->eip;
+
+    req->x86_regs.msr_efer = curr->arch.hvm_vcpu.guest_efer;
+    req->x86_regs.cr0 = curr->arch.hvm_vcpu.guest_cr[0];
+    req->x86_regs.cr3 = curr->arch.hvm_vcpu.guest_cr[3];
+    req->x86_regs.cr4 = curr->arch.hvm_vcpu.guest_cr[4];
+}
+
 static int hvm_memory_event_traps(long p, uint32_t reason,
                                   unsigned long value, unsigned long old, 
                                   bool_t gla_valid, unsigned long gla) 
@@ -6193,6 +6225,7 @@ static int hvm_memory_event_traps(long p, uint32_t reason,
         req.gla = old;
     }
     
+    hvm_mem_event_fill_regs(&req);
     mem_event_put_request(d, &d->mem_event->access, &req);
     
     return 1;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 32776c3..d0962aa 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1327,6 +1327,61 @@ void p2m_mem_paging_resume(struct domain *d)
     }
 }
 
+static void p2m_mem_event_fill_regs(mem_event_request_t *req)
+{
+    const struct cpu_user_regs *regs = guest_cpu_user_regs();
+    struct segment_register seg;
+    struct hvm_hw_cpu ctxt;
+    struct vcpu *curr = current;
+
+    /* Architecture-specific vmcs/vmcb bits */
+    hvm_funcs.save_cpu_ctxt(curr, &ctxt);
+
+    req->x86_regs.rax = regs->eax;
+    req->x86_regs.rcx = regs->ecx;
+    req->x86_regs.rdx = regs->edx;
+    req->x86_regs.rbx = regs->ebx;
+    req->x86_regs.rsp = regs->esp;
+    req->x86_regs.rbp = regs->ebp;
+    req->x86_regs.rsi = regs->esi;
+    req->x86_regs.rdi = regs->edi;
+
+    req->x86_regs.r8  = regs->r8;
+    req->x86_regs.r9  = regs->r9;
+    req->x86_regs.r10 = regs->r10;
+    req->x86_regs.r11 = regs->r11;
+    req->x86_regs.r12 = regs->r12;
+    req->x86_regs.r13 = regs->r13;
+    req->x86_regs.r14 = regs->r14;
+    req->x86_regs.r15 = regs->r15;
+
+    req->x86_regs.rflags = regs->eflags;
+    req->x86_regs.rip    = regs->eip;
+
+    req->x86_regs.dr7 = curr->arch.debugreg[7];
+    req->x86_regs.cr0 = ctxt.cr0;
+    req->x86_regs.cr2 = ctxt.cr2;
+    req->x86_regs.cr3 = ctxt.cr3;
+    req->x86_regs.cr4 = ctxt.cr4;
+
+    req->x86_regs.sysenter_cs = ctxt.sysenter_cs;
+    req->x86_regs.sysenter_esp = ctxt.sysenter_esp;
+    req->x86_regs.sysenter_eip = ctxt.sysenter_eip;
+
+    req->x86_regs.msr_efer = ctxt.msr_efer;
+    req->x86_regs.msr_star = ctxt.msr_star;
+    req->x86_regs.msr_lstar = ctxt.msr_lstar;
+
+    hvm_get_segment_register(curr, x86_seg_fs, &seg);
+    req->x86_regs.fs_base = seg.base;
+
+    hvm_get_segment_register(curr, x86_seg_gs, &seg);
+    req->x86_regs.gs_base = seg.base;
+
+    hvm_get_segment_register(curr, x86_seg_cs, &seg);
+    req->x86_regs.cs_arbytes = seg.attr.bytes;
+}
+
 bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
                             struct npfec npfec,
                             mem_event_request_t **req_ptr)
@@ -1417,6 +1472,8 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
         req->access_w = npfec.write_access;
         req->access_x = npfec.insn_fetch;
         req->vcpu_id = v->vcpu_id;
+
+        p2m_mem_event_fill_regs(req);
     }
 
     /* Pause the current VCPU */
diff --git a/xen/include/public/mem_event.h b/xen/include/public/mem_event.h
index fc12697..d3dd9c6 100644
--- a/xen/include/public/mem_event.h
+++ b/xen/include/public/mem_event.h
@@ -48,6 +48,44 @@
 #define MEM_EVENT_REASON_MSR         7    /* MSR was hit: gfn is MSR value, gla is MSR address;
                                              does NOT honour HVMPME_onchangeonly */
 
+/* Using a custom struct (not hvm_hw_cpu) so as to not fill
+ * the mem_event ring buffer too quickly. */
+struct mem_event_regs_x86 {
+    uint64_t rax;
+    uint64_t rcx;
+    uint64_t rdx;
+    uint64_t rbx;
+    uint64_t rsp;
+    uint64_t rbp;
+    uint64_t rsi;
+    uint64_t rdi;
+    uint64_t r8;
+    uint64_t r9;
+    uint64_t r10;
+    uint64_t r11;
+    uint64_t r12;
+    uint64_t r13;
+    uint64_t r14;
+    uint64_t r15;
+    uint64_t rflags;
+    uint64_t dr7;
+    uint64_t rip;
+    uint64_t cr0;
+    uint64_t cr2;
+    uint64_t cr3;
+    uint64_t cr4;
+    uint64_t sysenter_cs;
+    uint64_t sysenter_esp;
+    uint64_t sysenter_eip;
+    uint64_t msr_efer;
+    uint64_t msr_star;
+    uint64_t msr_lstar;
+    uint64_t fs_base;
+    uint64_t gs_base;
+    uint32_t cs_arbytes;
+    uint32_t _pad;
+};
+
 typedef struct mem_event_st {
     uint32_t flags;
     uint32_t vcpu_id;
@@ -67,6 +105,7 @@ typedef struct mem_event_st {
     uint16_t available:10;
 
     uint16_t reason;
+    struct mem_event_regs_x86 x86_regs;
 } mem_event_request_t, mem_event_response_t;
 
 DEFINE_RING_TYPES(mem_event, mem_event_request_t, mem_event_response_t);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V7 for-4.5 3/4] xen, libxc: Force-enable relevant MSR events
  2014-09-11 13:15 [PATCH V7 for-4.5 0/4] Basic guest memory introspection support Razvan Cojocaru
  2014-09-11 13:15 ` [PATCH V7 for-4.5 1/4] xen: Emulate with no writes Razvan Cojocaru
  2014-09-11 13:15 ` [PATCH V7 for-4.5 2/4] xen: Optimize introspection access to guest state Razvan Cojocaru
@ 2014-09-11 13:15 ` Razvan Cojocaru
  2014-09-11 13:15 ` [PATCH V7 for-4.5 4/4] xen: Handle resumed instruction based on previous mem_event reply Razvan Cojocaru
  3 siblings, 0 replies; 12+ messages in thread
From: Razvan Cojocaru @ 2014-09-11 13:15 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, Razvan Cojocaru,
	stefano.stabellini, eddie.dong, ian.jackson, tim, jbeulich,
	jun.nakajima, andrew.cooper3

Vmx_disable_intercept_for_msr() will now refuse to disable interception of
MSRs needed for memory introspection. It is not possible to gate this on
mem_access being active for the domain, since by the time mem_access does
become active the interception for the interesting MSRs has already been
disabled (vmx_disable_intercept_for_msr() runs very early on).

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 tools/libxc/xc_mem_access.c        |   10 +++++++++-
 tools/libxc/xc_mem_event.c         |    7 +++++--
 tools/libxc/xc_private.h           |    2 +-
 tools/libxc/xenctrl.h              |    2 ++
 xen/arch/x86/hvm/vmx/vmcs.c        |   25 +++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmx.c         |   13 +++++++++++++
 xen/arch/x86/mm/mem_event.c        |   11 +++++++++++
 xen/include/asm-x86/hvm/domain.h   |    1 +
 xen/include/asm-x86/hvm/hvm.h      |    2 ++
 xen/include/asm-x86/hvm/vmx/vmcs.h |    7 +++++++
 xen/include/public/domctl.h        |    7 ++++---
 11 files changed, 80 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/xc_mem_access.c b/tools/libxc/xc_mem_access.c
index 461f0e9..55d0e9f 100644
--- a/tools/libxc/xc_mem_access.c
+++ b/tools/libxc/xc_mem_access.c
@@ -26,7 +26,15 @@
 
 void *xc_mem_access_enable(xc_interface *xch, domid_t domain_id, uint32_t *port)
 {
-    return xc_mem_event_enable(xch, domain_id, HVM_PARAM_ACCESS_RING_PFN, port);
+    return xc_mem_event_enable(xch, domain_id, HVM_PARAM_ACCESS_RING_PFN,
+                               port, 0);
+}
+
+void *xc_mem_access_enable_introspection(xc_interface *xch, domid_t domain_id,
+                                         uint32_t *port)
+{
+    return xc_mem_event_enable(xch, domain_id, HVM_PARAM_ACCESS_RING_PFN,
+                               port, 1);
 }
 
 int xc_mem_access_disable(xc_interface *xch, domid_t domain_id)
diff --git a/tools/libxc/xc_mem_event.c b/tools/libxc/xc_mem_event.c
index faf1cc6..8c0be4e 100644
--- a/tools/libxc/xc_mem_event.c
+++ b/tools/libxc/xc_mem_event.c
@@ -57,7 +57,7 @@ int xc_mem_event_memop(xc_interface *xch, domid_t domain_id,
 }
 
 void *xc_mem_event_enable(xc_interface *xch, domid_t domain_id, int param,
-                          uint32_t *port)
+                          uint32_t *port, int enable_introspection)
 {
     void *ring_page = NULL;
     uint64_t pfn;
@@ -120,7 +120,10 @@ void *xc_mem_event_enable(xc_interface *xch, domid_t domain_id, int param,
         break;
 
     case HVM_PARAM_ACCESS_RING_PFN:
-        op = XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE;
+        if ( enable_introspection )
+            op = XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE_INTROSPECTION;
+        else
+            op = XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE;
         mode = XEN_DOMCTL_MEM_EVENT_OP_ACCESS;
         break;
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index c50a7c9..94df688 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -381,6 +381,6 @@ int xc_mem_event_memop(xc_interface *xch, domid_t domain_id,
  * param can be HVM_PARAM_PAGING/ACCESS/SHARING_RING_PFN
  */
 void *xc_mem_event_enable(xc_interface *xch, domid_t domain_id, int param,
-                          uint32_t *port);
+                          uint32_t *port, int enable_introspection);
 
 #endif /* __XC_PRIVATE_H__ */
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 1c8aa42..514b241 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -2273,6 +2273,8 @@ int xc_mem_paging_load(xc_interface *xch, domid_t domain_id,
  * Caller has to unmap this page when done.
  */
 void *xc_mem_access_enable(xc_interface *xch, domid_t domain_id, uint32_t *port);
+void *xc_mem_access_enable_introspection(xc_interface *xch, domid_t domain_id,
+                                         uint32_t *port);
 int xc_mem_access_disable(xc_interface *xch, domid_t domain_id);
 int xc_mem_access_resume(xc_interface *xch, domid_t domain_id);
 
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 4a4f4e1..fc1f882 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -39,6 +39,7 @@
 #include <xen/keyhandler.h>
 #include <asm/shadow.h>
 #include <asm/tboot.h>
+#include <asm/mem_event.h>
 
 static bool_t __read_mostly opt_vpid_enabled = 1;
 boolean_param("vpid", opt_vpid_enabled);
@@ -71,6 +72,18 @@ u32 vmx_vmexit_control __read_mostly;
 u32 vmx_vmentry_control __read_mostly;
 u64 vmx_ept_vpid_cap __read_mostly;
 
+const u32 vmx_introspection_force_enabled_msrs[] = {
+    MSR_IA32_SYSENTER_EIP,
+    MSR_IA32_SYSENTER_ESP,
+    MSR_IA32_SYSENTER_CS,
+    MSR_IA32_MC0_CTL,
+    MSR_STAR,
+    MSR_LSTAR
+};
+
+const unsigned int vmx_introspection_force_enabled_msrs_size =
+    ARRAY_SIZE(vmx_introspection_force_enabled_msrs);
+
 static DEFINE_PER_CPU_READ_MOSTLY(struct vmcs_struct *, vmxon_region);
 static DEFINE_PER_CPU(struct vmcs_struct *, current_vmcs);
 static DEFINE_PER_CPU(struct list_head, active_vmcs_list);
@@ -695,11 +708,23 @@ static void vmx_set_host_env(struct vcpu *v)
 void vmx_disable_intercept_for_msr(struct vcpu *v, u32 msr, int type)
 {
     unsigned long *msr_bitmap = v->arch.hvm_vmx.msr_bitmap;
+    struct domain *d = v->domain;
 
     /* VMX MSR bitmap supported? */
     if ( msr_bitmap == NULL )
         return;
 
+    if ( unlikely(d->arch.hvm_domain.introspection_enabled) &&
+         mem_event_check_ring(&d->mem_event->access) )
+    {
+        unsigned int i;
+
+        /* Filter out MSR-s needed for memory introspection */
+        for ( i = 0; i < vmx_introspection_force_enabled_msrs_size; i++ )
+            if ( msr == vmx_introspection_force_enabled_msrs[i] )
+                return;
+    }
+
     /*
      * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals
      * have the write-low and read-high bitmap offsets the wrong way round.
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 26b1ad5..b08664e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1683,6 +1683,18 @@ void vmx_hypervisor_cpuid_leaf(uint32_t sub_idx,
         *eax |= XEN_HVM_CPUID_X2APIC_VIRT;
 }
 
+static void vmx_enable_msr_exit_interception(struct domain *d)
+{
+    struct vcpu *v;
+    unsigned int i;
+
+    /* Enable interception for MSRs needed for memory introspection. */
+    for_each_vcpu ( d, v )
+        for ( i = 0; i < vmx_introspection_force_enabled_msrs_size; i++ )
+            vmx_enable_intercept_for_msr(v, vmx_introspection_force_enabled_msrs[i],
+                                         MSR_TYPE_W);
+}
+
 static struct hvm_function_table __initdata vmx_function_table = {
     .name                 = "VMX",
     .cpu_up_prepare       = vmx_cpu_up_prepare,
@@ -1741,6 +1753,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .handle_eoi           = vmx_handle_eoi,
     .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
     .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
+    .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
diff --git a/xen/arch/x86/mm/mem_event.c b/xen/arch/x86/mm/mem_event.c
index ba7e71e..fdd5ff6 100644
--- a/xen/arch/x86/mm/mem_event.c
+++ b/xen/arch/x86/mm/mem_event.c
@@ -587,6 +587,7 @@ int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec,
         switch( mec->op )
         {
         case XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE:
+        case XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE_INTROSPECTION:
         {
             rc = -ENODEV;
             /* Only HAP is supported */
@@ -600,13 +601,23 @@ int mem_event_domctl(struct domain *d, xen_domctl_mem_event_op_t *mec,
             rc = mem_event_enable(d, mec, med, _VPF_mem_access, 
                                     HVM_PARAM_ACCESS_RING_PFN,
                                     mem_access_notification);
+
+            if ( mec->op != XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE &&
+                 rc == 0 && hvm_funcs.enable_msr_exit_interception )
+            {
+                d->arch.hvm_domain.introspection_enabled = 1;
+                hvm_funcs.enable_msr_exit_interception(d);
+            }
         }
         break;
 
         case XEN_DOMCTL_MEM_EVENT_OP_ACCESS_DISABLE:
         {
             if ( med->ring_page )
+            {
                 rc = mem_event_disable(d, med);
+                d->arch.hvm_domain.introspection_enabled = 0;
+            }
         }
         break;
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 291a2e0..30d4aa3 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -134,6 +134,7 @@ struct hvm_domain {
     bool_t                 mem_sharing_enabled;
     bool_t                 qemu_mapcache_invalidate;
     bool_t                 is_s3_suspended;
+    bool_t                 introspection_enabled;
 
     /*
      * TSC value that VCPUs use to calculate their tsc_offset value.
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 1123857..121d053 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -205,6 +205,8 @@ struct hvm_function_table {
     void (*hypervisor_cpuid_leaf)(uint32_t sub_idx,
                                   uint32_t *eax, uint32_t *ebx,
                                   uint32_t *ecx, uint32_t *edx);
+
+    void (*enable_msr_exit_interception)(struct domain *d);
 };
 
 extern struct hvm_function_table hvm_funcs;
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 215d93c..6a99dca 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -471,6 +471,13 @@ enum vmcs_field {
     HOST_RIP                        = 0x00006c16,
 };
 
+/*
+ * A set of MSR-s that need to be enabled for memory introspection
+ * to work.
+ */
+extern const u32 vmx_introspection_force_enabled_msrs[];
+extern const unsigned int vmx_introspection_force_enabled_msrs_size;
+
 #define VMCS_VPID_WIDTH 16
 
 #define MSR_TYPE_R 1
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 69a8b44..cfa39b3 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -773,10 +773,11 @@ struct xen_domctl_gdbsx_domstatus {
  * ENODEV - host lacks HAP support (EPT/NPT) or HAP is disabled in guest
  * EBUSY  - guest has or had access enabled, ring buffer still active
  */
-#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS            2
+#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS                        2
 
-#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE     0
-#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_DISABLE    1
+#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE                 0
+#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_DISABLE                1
+#define XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE_INTROSPECTION   2
 
 /*
  * Sharing ENOMEM helper.
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V7 for-4.5 4/4] xen: Handle resumed instruction based on previous mem_event reply
  2014-09-11 13:15 [PATCH V7 for-4.5 0/4] Basic guest memory introspection support Razvan Cojocaru
                   ` (2 preceding siblings ...)
  2014-09-11 13:15 ` [PATCH V7 for-4.5 3/4] xen, libxc: Force-enable relevant MSR events Razvan Cojocaru
@ 2014-09-11 13:15 ` Razvan Cojocaru
  2014-09-11 13:25   ` Jan Beulich
  2014-09-11 13:35   ` Jan Beulich
  3 siblings, 2 replies; 12+ messages in thread
From: Razvan Cojocaru @ 2014-09-11 13:15 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, ian.campbell, Razvan Cojocaru,
	stefano.stabellini, eddie.dong, ian.jackson, tim, jbeulich,
	jun.nakajima, andrew.cooper3

In a scenario where a page fault that triggered a mem_event occured,
p2m_mem_access_check() will now be able to either 1) emulate the
current instruction, or 2) emulate it, but don't allow it to perform
any writes.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>

---
Changes since V6:
 - Removed filtering for GLA events: all events are now being sent.

Changes since V4:
 - Removed vmx_exited_by_nested_pagefault() (now using npfec.kind).
---
 xen/arch/x86/domain.c          |    3 ++
 xen/arch/x86/mm/p2m.c          |   76 ++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/domain.h   |    9 +++++
 xen/include/public/mem_event.h |   12 ++++---
 4 files changed, 95 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f7e0e78..7b1dfe6 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -415,6 +415,9 @@ int vcpu_initialise(struct vcpu *v)
 
     v->arch.flags = TF_kernel_mode;
 
+    /* By default, do not emulate */
+    v->arch.mem_event.emulate_flags = 0;
+
     rc = mapcache_vcpu_init(v);
     if ( rc )
         return rc;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d0962aa..a114734 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1395,6 +1395,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
     p2m_access_t p2ma;
     mem_event_request_t *req;
     int rc;
+    unsigned long eip = guest_cpu_user_regs()->eip;
 
     /* First, handle rx2rw conversion automatically.
      * These calls to p2m->set_entry() must succeed: we have the gfn
@@ -1448,6 +1449,28 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
         }
     }
 
+    /* The previous mem_event reply does not match the current state. */
+    if ( v->arch.mem_event.gpa != gpa || v->arch.mem_event.eip != eip )
+    {
+        /* Don't emulate the current instruction, send a new mem_event. */
+        v->arch.mem_event.emulate_flags = 0;
+
+        /* Make sure to mark the current state to match it again against
+         * the new mem_event about to be sent. */
+        v->arch.mem_event.gpa = gpa;
+        v->arch.mem_event.eip = eip;
+    }
+
+    if ( v->arch.mem_event.emulate_flags )
+    {
+        hvm_mem_event_emulate_one((v->arch.mem_event.emulate_flags &
+                                   MEM_EVENT_FLAG_EMULATE_NOWRITE) != 0,
+                                  TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+
+        v->arch.mem_event.emulate_flags = 0;
+        return 1;
+    }
+
     *req_ptr = NULL;
     req = xzalloc(mem_event_request_t);
     if ( req )
@@ -1502,6 +1525,59 @@ void p2m_mem_access_resume(struct domain *d)
 
         v = d->vcpu[rsp.vcpu_id];
 
+        /* Mark vcpu for skipping one instruction upon rescheduling */
+        if ( rsp.flags & MEM_EVENT_FLAG_EMULATE )
+        {
+            xenmem_access_t access;
+            bool_t violation = 1;
+
+            v->arch.mem_event.emulate_flags = 0;
+
+            if ( p2m_get_mem_access(d, rsp.gfn, &access) == 0 )
+            {
+                switch ( access )
+                {
+                case XENMEM_access_n:
+                case XENMEM_access_n2rwx:
+                default:
+                    violation = rsp.access_r || rsp.access_w || rsp.access_x;
+                    break;
+
+                case XENMEM_access_r:
+                    violation = rsp.access_w || rsp.access_x;
+                    break;
+
+                case XENMEM_access_w:
+                    violation = rsp.access_r || rsp.access_x;
+                    break;
+
+                case XENMEM_access_x:
+                    violation = rsp.access_r || rsp.access_w;
+                    break;
+
+                case XENMEM_access_rx:
+                case XENMEM_access_rx2rw:
+                    violation = rsp.access_w;
+                    break;
+
+                case XENMEM_access_wx:
+                    violation = rsp.access_r;
+                    break;
+
+                case XENMEM_access_rw:
+                    violation = rsp.access_x;
+                    break;
+
+                case XENMEM_access_rwx:
+                    violation = 0;
+                    break;
+                }
+            }
+
+            if ( violation )
+                v->arch.mem_event.emulate_flags = rsp.flags;
+        }
+
         /* Unpause domain */
         if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
             mem_event_vcpu_unpause(v);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 83329ed..440aa81 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -458,6 +458,15 @@ struct arch_vcpu
 
     /* A secondary copy of the vcpu time info. */
     XEN_GUEST_HANDLE(vcpu_time_info_t) time_info_guest;
+
+    /* Should we emulate the next matching instruction on VCPU resume
+     * after a mem_event? */
+    struct {
+        uint32_t emulate_flags;
+        unsigned long gpa;
+        unsigned long eip;
+    } mem_event;
+
 } __cacheline_aligned;
 
 smap_check_policy_t smap_policy_change(struct vcpu *v,
diff --git a/xen/include/public/mem_event.h b/xen/include/public/mem_event.h
index d3dd9c6..92c063c 100644
--- a/xen/include/public/mem_event.h
+++ b/xen/include/public/mem_event.h
@@ -31,11 +31,13 @@
 #include "io/ring.h"
 
 /* Memory event flags */
-#define MEM_EVENT_FLAG_VCPU_PAUSED  (1 << 0)
-#define MEM_EVENT_FLAG_DROP_PAGE    (1 << 1)
-#define MEM_EVENT_FLAG_EVICT_FAIL   (1 << 2)
-#define MEM_EVENT_FLAG_FOREIGN      (1 << 3)
-#define MEM_EVENT_FLAG_DUMMY        (1 << 4)
+#define MEM_EVENT_FLAG_VCPU_PAUSED     (1 << 0)
+#define MEM_EVENT_FLAG_DROP_PAGE       (1 << 1)
+#define MEM_EVENT_FLAG_EVICT_FAIL      (1 << 2)
+#define MEM_EVENT_FLAG_FOREIGN         (1 << 3)
+#define MEM_EVENT_FLAG_DUMMY           (1 << 4)
+#define MEM_EVENT_FLAG_EMULATE         (1 << 5)
+#define MEM_EVENT_FLAG_EMULATE_NOWRITE (1 << 6)
 
 /* Reasons for the memory event request */
 #define MEM_EVENT_REASON_UNKNOWN     0    /* typical reason */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH V7 for-4.5 4/4] xen: Handle resumed instruction based on previous mem_event reply
  2014-09-11 13:15 ` [PATCH V7 for-4.5 4/4] xen: Handle resumed instruction based on previous mem_event reply Razvan Cojocaru
@ 2014-09-11 13:25   ` Jan Beulich
  2014-09-11 13:29     ` Razvan Cojocaru
  2014-09-11 13:35   ` Jan Beulich
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2014-09-11 13:25 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, eddie.dong, tim, jun.nakajima, xen-devel,
	ian.jackson

>>> On 11.09.14 at 15:15, <rcojocaru@bitdefender.com> wrote:
> In a scenario where a page fault that triggered a mem_event occured,
> p2m_mem_access_check() will now be able to either 1) emulate the
> current instruction, or 2) emulate it, but don't allow it to perform
> any writes.
> 
> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>
> 
> ---
> Changes since V6:
>  - Removed filtering for GLA events: all events are now being sent.

With a functional change like this not requested by the person
having given their ack under that condition, any earlier ack
necessarily becomes invalid imo.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V7 for-4.5 4/4] xen: Handle resumed instruction based on previous mem_event reply
  2014-09-11 13:25   ` Jan Beulich
@ 2014-09-11 13:29     ` Razvan Cojocaru
  0 siblings, 0 replies; 12+ messages in thread
From: Razvan Cojocaru @ 2014-09-11 13:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, eddie.dong, tim, jun.nakajima, xen-devel,
	ian.jackson

On 09/11/2014 04:25 PM, Jan Beulich wrote:
>>>> On 11.09.14 at 15:15, <rcojocaru@bitdefender.com> wrote:
>> In a scenario where a page fault that triggered a mem_event occured,
>> p2m_mem_access_check() will now be able to either 1) emulate the
>> current instruction, or 2) emulate it, but don't allow it to perform
>> any writes.
>>
>> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
>> Acked-by: Kevin Tian <kevin.tian@intel.com>
>>
>> ---
>> Changes since V6:
>>  - Removed filtering for GLA events: all events are now being sent.
> 
> With a functional change like this not requested by the person
> having given their ack under that condition, any earlier ack
> necessarily becomes invalid imo.

I understand. Please consider it un-Acked in that case, and if a new
series comes about (without Kevin't explicit Ack) I'll make sure to
remove it.


Thanks,
Razvan Cojocaru

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V7 for-4.5 4/4] xen: Handle resumed instruction based on previous mem_event reply
  2014-09-11 13:15 ` [PATCH V7 for-4.5 4/4] xen: Handle resumed instruction based on previous mem_event reply Razvan Cojocaru
  2014-09-11 13:25   ` Jan Beulich
@ 2014-09-11 13:35   ` Jan Beulich
  2014-09-11 14:02     ` Razvan Cojocaru
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2014-09-11 13:35 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, eddie.dong, tim, jun.nakajima, xen-devel,
	ian.jackson

>>> On 11.09.14 at 15:15, <rcojocaru@bitdefender.com> wrote:
> @@ -1448,6 +1449,28 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
>          }
>      }
>  
> +    /* The previous mem_event reply does not match the current state. */
> +    if ( v->arch.mem_event.gpa != gpa || v->arch.mem_event.eip != eip )
> +    {
> +        /* Don't emulate the current instruction, send a new mem_event. */
> +        v->arch.mem_event.emulate_flags = 0;
> +
> +        /* Make sure to mark the current state to match it again against
> +         * the new mem_event about to be sent. */

Coding style.

> +        v->arch.mem_event.gpa = gpa;
> +        v->arch.mem_event.eip = eip;
> +    }
> +
> +    if ( v->arch.mem_event.emulate_flags )
> +    {
> +        hvm_mem_event_emulate_one((v->arch.mem_event.emulate_flags &
> +                                   MEM_EVENT_FLAG_EMULATE_NOWRITE) != 0,
> +                                  TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
> +
> +        v->arch.mem_event.emulate_flags = 0;
> +        return 1;
> +    }
> +
>      *req_ptr = NULL;
>      req = xzalloc(mem_event_request_t);
>      if ( req )
> @@ -1502,6 +1525,59 @@ void p2m_mem_access_resume(struct domain *d)
>  
>          v = d->vcpu[rsp.vcpu_id];
>  
> +        /* Mark vcpu for skipping one instruction upon rescheduling */

Missing stop at end of sentence.

> +        if ( rsp.flags & MEM_EVENT_FLAG_EMULATE )
> +        {
> +            xenmem_access_t access;
> +            bool_t violation = 1;
> +
> +            v->arch.mem_event.emulate_flags = 0;

Do you really need to write this once here and ...

> +
> +            if ( p2m_get_mem_access(d, rsp.gfn, &access) == 0 )
> +            {
> +                switch ( access )
> +                {
> +                case XENMEM_access_n:
> +                case XENMEM_access_n2rwx:
> +                default:
> +                    violation = rsp.access_r || rsp.access_w || rsp.access_x;
> +                    break;
> +
> +                case XENMEM_access_r:
> +                    violation = rsp.access_w || rsp.access_x;
> +                    break;
> +
> +                case XENMEM_access_w:
> +                    violation = rsp.access_r || rsp.access_x;
> +                    break;
> +
> +                case XENMEM_access_x:
> +                    violation = rsp.access_r || rsp.access_w;
> +                    break;
> +
> +                case XENMEM_access_rx:
> +                case XENMEM_access_rx2rw:
> +                    violation = rsp.access_w;
> +                    break;
> +
> +                case XENMEM_access_wx:
> +                    violation = rsp.access_r;
> +                    break;
> +
> +                case XENMEM_access_rw:
> +                    violation = rsp.access_x;
> +                    break;
> +
> +                case XENMEM_access_rwx:
> +                    violation = 0;
> +                    break;
> +                }
> +            }
> +
> +            if ( violation )
> +                v->arch.mem_event.emulate_flags = rsp.flags;

... a second time here (rather making this one simply a conditional
expression)?

And I further wonder whether all the MEM_EVENT_FLAG_* values are
really potentially useful in v->arch.mem_event.emulate_flags - right
now it rather looks like this field could be a simple bool_t (likely with
a different name), which would at once make the
hvm_mem_event_emulate_one() a little better readable.

> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -458,6 +458,15 @@ struct arch_vcpu
>  
>      /* A secondary copy of the vcpu time info. */
>      XEN_GUEST_HANDLE(vcpu_time_info_t) time_info_guest;
> +
> +    /* Should we emulate the next matching instruction on VCPU resume
> +     * after a mem_event? */

Coding style again.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V7 for-4.5 4/4] xen: Handle resumed instruction based on previous mem_event reply
  2014-09-11 13:35   ` Jan Beulich
@ 2014-09-11 14:02     ` Razvan Cojocaru
  2014-09-11 15:02       ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Razvan Cojocaru @ 2014-09-11 14:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, eddie.dong, tim, jun.nakajima, xen-devel,
	ian.jackson

On 09/11/2014 04:35 PM, Jan Beulich wrote:
>>>> On 11.09.14 at 15:15, <rcojocaru@bitdefender.com> wrote:
>> @@ -1448,6 +1449,28 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
>>          }
>>      }
>>  
>> +    /* The previous mem_event reply does not match the current state. */
>> +    if ( v->arch.mem_event.gpa != gpa || v->arch.mem_event.eip != eip )
>> +    {
>> +        /* Don't emulate the current instruction, send a new mem_event. */
>> +        v->arch.mem_event.emulate_flags = 0;
>> +
>> +        /* Make sure to mark the current state to match it again against
>> +         * the new mem_event about to be sent. */
> 
> Coding style.

Thank you for the review. The proper way to write multiline comments in
Xen is to always begin with '/*', then each line after preceded by an
'*', ended by a single '*/' below the next line, is that correct?

/*
 * Multiline comment
 * example.
 */

>> +        if ( rsp.flags & MEM_EVENT_FLAG_EMULATE )
>> +        {
>> +            xenmem_access_t access;
>> +            bool_t violation = 1;
>> +
>> +            v->arch.mem_event.emulate_flags = 0;
> 
> Do you really need to write this once here and ...
> 
>> +
>> +            if ( p2m_get_mem_access(d, rsp.gfn, &access) == 0 )
>> +            {
>> +                switch ( access )
>> +                {
>> +                case XENMEM_access_n:
>> +                case XENMEM_access_n2rwx:
>> +                default:
>> +                    violation = rsp.access_r || rsp.access_w || rsp.access_x;
>> +                    break;
>> +
>> +                case XENMEM_access_r:
>> +                    violation = rsp.access_w || rsp.access_x;
>> +                    break;
>> +
>> +                case XENMEM_access_w:
>> +                    violation = rsp.access_r || rsp.access_x;
>> +                    break;
>> +
>> +                case XENMEM_access_x:
>> +                    violation = rsp.access_r || rsp.access_w;
>> +                    break;
>> +
>> +                case XENMEM_access_rx:
>> +                case XENMEM_access_rx2rw:
>> +                    violation = rsp.access_w;
>> +                    break;
>> +
>> +                case XENMEM_access_wx:
>> +                    violation = rsp.access_r;
>> +                    break;
>> +
>> +                case XENMEM_access_rw:
>> +                    violation = rsp.access_x;
>> +                    break;
>> +
>> +                case XENMEM_access_rwx:
>> +                    violation = 0;
>> +                    break;
>> +                }
>> +            }
>> +
>> +            if ( violation )
>> +                v->arch.mem_event.emulate_flags = rsp.flags;
> 
> ... a second time here (rather making this one simply a conditional
> expression)?

I'll assign to v->arch.mem_event.emulate_flags directly in the switch.

> And I further wonder whether all the MEM_EVENT_FLAG_* values are
> really potentially useful in v->arch.mem_event.emulate_flags - right
> now it rather looks like this field could be a simple bool_t (likely with
> a different name), which would at once make the
> hvm_mem_event_emulate_one() a little better readable.

The value is checked here:

+        hvm_mem_event_emulate_one((v->arch.mem_event.emulate_flags &
+                                   MEM_EVENT_FLAG_EMULATE_NOWRITE) != 0,
+                                  TRAP_invalid_op,
HVM_DELIVER_NO_ERROR_CODE);

where it matters if MEM_EVENT_FLAG_EMULATE_NOWRITE is set. Also, please
bear in mind that in the original series we also had a
MEM_EVENT_FLAG_SKIP flag, so this allows for even more ways to respond
to a mem_event in the future.


Thanks,
Razvan Cojocaru

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V7 for-4.5 4/4] xen: Handle resumed instruction based on previous mem_event reply
  2014-09-11 14:02     ` Razvan Cojocaru
@ 2014-09-11 15:02       ` Jan Beulich
  2014-09-11 15:21         ` Razvan Cojocaru
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2014-09-11 15:02 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, eddie.dong, tim, jun.nakajima, xen-devel,
	ian.jackson

>>> On 11.09.14 at 16:02, <rcojocaru@bitdefender.com> wrote:
> On 09/11/2014 04:35 PM, Jan Beulich wrote:
>>>>> On 11.09.14 at 15:15, <rcojocaru@bitdefender.com> wrote:
>>> @@ -1448,6 +1449,28 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long 
> gla,
>>>          }
>>>      }
>>>  
>>> +    /* The previous mem_event reply does not match the current state. */
>>> +    if ( v->arch.mem_event.gpa != gpa || v->arch.mem_event.eip != eip )
>>> +    {
>>> +        /* Don't emulate the current instruction, send a new mem_event. */
>>> +        v->arch.mem_event.emulate_flags = 0;
>>> +
>>> +        /* Make sure to mark the current state to match it again against
>>> +         * the new mem_event about to be sent. */
>> 
>> Coding style.
> 
> Thank you for the review. The proper way to write multiline comments in
> Xen is to always begin with '/*', then each line after preceded by an
> '*', ended by a single '*/' below the next line, is that correct?
> 
> /*
>  * Multiline comment
>  * example.
>  */

Yes.

>>> +            if ( violation )
>>> +                v->arch.mem_event.emulate_flags = rsp.flags;
>> 
>> ... a second time here (rather making this one simply a conditional
>> expression)?
> 
> I'll assign to v->arch.mem_event.emulate_flags directly in the switch.

I doubt that's going to result in better code.

>> And I further wonder whether all the MEM_EVENT_FLAG_* values are
>> really potentially useful in v->arch.mem_event.emulate_flags - right
>> now it rather looks like this field could be a simple bool_t (likely with
>> a different name), which would at once make the
>> hvm_mem_event_emulate_one() a little better readable.
> 
> The value is checked here:
> 
> +        hvm_mem_event_emulate_one((v->arch.mem_event.emulate_flags &
> +                                   MEM_EVENT_FLAG_EMULATE_NOWRITE) != 0,
> +                                  TRAP_invalid_op,
> HVM_DELIVER_NO_ERROR_CODE);
> 
> where it matters if MEM_EVENT_FLAG_EMULATE_NOWRITE is set.

Right, and this would reduce by quite a bit if you could just pass the
boolean variable.

> Also, please
> bear in mind that in the original series we also had a
> MEM_EVENT_FLAG_SKIP flag, so this allows for even more ways to respond
> to a mem_event in the future.

But that's now gone, with no current need to make provisions for it.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V7 for-4.5 4/4] xen: Handle resumed instruction based on previous mem_event reply
  2014-09-11 15:02       ` Jan Beulich
@ 2014-09-11 15:21         ` Razvan Cojocaru
  2014-09-11 16:04           ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Razvan Cojocaru @ 2014-09-11 15:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, eddie.dong, tim, jun.nakajima, xen-devel,
	ian.jackson

On 09/11/2014 06:02 PM, Jan Beulich wrote:
>>>> On 11.09.14 at 16:02, <rcojocaru@bitdefender.com> wrote:
>> On 09/11/2014 04:35 PM, Jan Beulich wrote:
>>>>>> On 11.09.14 at 15:15, <rcojocaru@bitdefender.com> wrote:

>>>> +            if ( violation )
>>>> +                v->arch.mem_event.emulate_flags = rsp.flags;
>>>
>>> ... a second time here (rather making this one simply a conditional
>>> expression)?
>>
>> I'll assign to v->arch.mem_event.emulate_flags directly in the switch.
> 
> I doubt that's going to result in better code.

You're right, I thought I'd simplify by getting rid of "violation" but
the code looks worse now. I'll remove the first assignment and modify
the second.

>>> And I further wonder whether all the MEM_EVENT_FLAG_* values are
>>> really potentially useful in v->arch.mem_event.emulate_flags - right
>>> now it rather looks like this field could be a simple bool_t (likely with
>>> a different name), which would at once make the
>>> hvm_mem_event_emulate_one() a little better readable.
>>
>> The value is checked here:
>>
>> +        hvm_mem_event_emulate_one((v->arch.mem_event.emulate_flags &
>> +                                   MEM_EVENT_FLAG_EMULATE_NOWRITE) != 0,
>> +                                  TRAP_invalid_op,
>> HVM_DELIVER_NO_ERROR_CODE);
>>
>> where it matters if MEM_EVENT_FLAG_EMULATE_NOWRITE is set.
> 
> Right, and this would reduce by quite a bit if you could just pass the
> boolean variable.

Sorry, I don't quite follow that. Currently,
v->arch.mem_event.emulate_flags serves both as a way to know if
emulation is required (0 if no emulation is required, and != 0 if
emulation is required), and as a way to know which kind of emulation is
required ("regular" emulation, or emulation with writes disabled).

Making it into a bool_t would make it possible to know that emulation is
required, but not which kind of emulation:

+    if ( v->arch.mem_event.emulate_flags )
+    {
+        hvm_mem_event_emulate_one((v->arch.mem_event.emulate_flags &
+                                   MEM_EVENT_FLAG_EMULATE_NOWRITE) != 0,
+                                  TRAP_invalid_op,
HVM_DELIVER_NO_ERROR_CODE);
+
+        v->arch.mem_event.emulate_flags = 0;
+        return 1;
+    }

So even in a scenario where we don't care about MEM_EVENT_FLAG_SKIP, we
still care about two different types response flags:
MEM_EVENT_FLAG_EMULATE for "regular" emulation and
MEM_EVENT_FLAG_EMULATE_NOWRITE that emulates with the dummy write
handlers (or-ed together for an "emulate no write" response).


Thanks,
Razvan Cojocaru

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V7 for-4.5 4/4] xen: Handle resumed instruction based on previous mem_event reply
  2014-09-11 15:21         ` Razvan Cojocaru
@ 2014-09-11 16:04           ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2014-09-11 16:04 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: kevin.tian, keir, ian.campbell, stefano.stabellini,
	andrew.cooper3, eddie.dong, tim, jun.nakajima, xen-devel,
	ian.jackson

>>> On 11.09.14 at 17:21, <rcojocaru@bitdefender.com> wrote:
> Sorry, I don't quite follow that. Currently,
> v->arch.mem_event.emulate_flags serves both as a way to know if
> emulation is required (0 if no emulation is required, and != 0 if
> emulation is required), and as a way to know which kind of emulation is
> required ("regular" emulation, or emulation with writes disabled).
> 
> Making it into a bool_t would make it possible to know that emulation is
> required, but not which kind of emulation:
> 
> +    if ( v->arch.mem_event.emulate_flags )
> +    {
> +        hvm_mem_event_emulate_one((v->arch.mem_event.emulate_flags &
> +                                   MEM_EVENT_FLAG_EMULATE_NOWRITE) != 0,

Ah, right, I somehow managed to overlook this dual purpose.

Jan

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-09-11 16:04 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-11 13:15 [PATCH V7 for-4.5 0/4] Basic guest memory introspection support Razvan Cojocaru
2014-09-11 13:15 ` [PATCH V7 for-4.5 1/4] xen: Emulate with no writes Razvan Cojocaru
2014-09-11 13:15 ` [PATCH V7 for-4.5 2/4] xen: Optimize introspection access to guest state Razvan Cojocaru
2014-09-11 13:15 ` [PATCH V7 for-4.5 3/4] xen, libxc: Force-enable relevant MSR events Razvan Cojocaru
2014-09-11 13:15 ` [PATCH V7 for-4.5 4/4] xen: Handle resumed instruction based on previous mem_event reply Razvan Cojocaru
2014-09-11 13:25   ` Jan Beulich
2014-09-11 13:29     ` Razvan Cojocaru
2014-09-11 13:35   ` Jan Beulich
2014-09-11 14:02     ` Razvan Cojocaru
2014-09-11 15:02       ` Jan Beulich
2014-09-11 15:21         ` Razvan Cojocaru
2014-09-11 16:04           ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.