Xen Security Advisory 260 (CVE-2018-8897)

* Xen Security Advisory 260 (CVE-2018-8897) - x86: mishandling of debug exceptions
@ 2018-05-08 17:00 Xen.org security team
  0 siblings, 0 replies; only message in thread
From: Xen.org security team @ 2018-05-08 17:00 UTC (permalink / raw)
  To: xen-announce, xen-devel, xen-users, oss-security; +Cc: Xen.org security team

[-- Attachment #1: Type: text/plain, Size: 6214 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

            Xen Security Advisory CVE-2018-8897 / XSA-260
                              version 2

                 x86: mishandling of debug exceptions

UPDATES IN VERSION 2
====================

Public release.

Updated .meta file

ISSUE DESCRIPTION
=================

When switching stacks, it is critical to have a matching stack segment
and stack pointer.  To allow an atomic update from what would otherwise
be two adjacent instructions, an update which changes the stack segment
(either a mov or pop instruction with %ss encoded as the destination
register) sets the movss shadow for one instruction.

The exact behaviour of the movss shadow is poorly understood.

In practice, a movss shadow delays some debug exceptions (e.g. from a
hardware breakpoint) until the subsequent instruction has completed.  If
the subsequent instruction normally transitions to supervisor mode
(e.g. a system call), then the debug exception will be taken after the
transition to ring0 is completed.

For most transitions to supervisor mode, this only confuses Xen into
printing a lot of debugging information.  For the syscall instruction
however, the exception gets taken before the syscall handler can move
off the guest stack.

IMPACT
======

A malicious PV guest can escalate their privilege to that of the
hypervisor.

VULNERABLE SYSTEMS
==================

All versions of Xen are vulnerable.

Only x86 systems are vulnerable.  ARM systems are not vulnerable.

Only x86 PV guests can exploit the vulnerability.  x86 HVM and PVH
guests cannot exploit the vulnerability.

An attacker needs to be able to control hardware debugging facilities to
exploit the vulnerability, but such permissions are typically available
to unprivileged users.

MITIGATION
==========

Running only HVM or PVH guests avoids the vulnerability.

Note however that a compromised device model (running in dom0 or a
stub domain) can carry out this attack, so users with HVM domains are
also advised to patch their systems.

CREDITS
=======

This issue was discovered by Andy Lutomirski, and Nick Peterson of Everdox
Tech LLC.

RESOLUTION
==========

Applying the appropriate attached patch resolves this issue.

xsa260-unstable/*.patch xen-unstable
xsa260-4.10/*.patch     Xen 4.10.x
xsa260-4.9/*.patch      Xen 4.9.x
xsa260-4.8/*.patch      Xen 4.8.x
xsa260-4.7/*.patch      Xen 4.7.x
xsa260-4.6/*.patch      Xen 4.6.x

$ sha256sum xsa260* xsa260*/*
f436009ea6d6a30cf9c316e909dcd260c223264884d2e4fc5b74bdaf2e515815  xsa260.meta
0f7e3cfecc59986fc950694bba7bb31ee9680b2390920335d6853fdf83ded9ef  xsa260-unstable/xsa260-1.patch
4df5b9d05a8f02754b1e819b8cad35b3da9ba7fcdaee0fc762d572481ef69f93  xsa260-unstable/xsa260-2.patch
5c3f9cbc777ed7a93a97a4665e0188e1b1a05dd057da830203e018c73e9e5ce7  xsa260-unstable/xsa260-3.patch
4b280ec02418f30f0576e84f23ae565acee4fcc2d398b3828c1e12d9346583af  xsa260-unstable/xsa260-4.patch
2c5ce2851351a40df9ed17fae3c6f7505dcda60209945321b545b6b6e4f065cb  xsa260-4.6/xsa260-1.patch
bfa2eb161f570b0295464ef41fc5add52e10853a1ec81de107f1a9deb945982f  xsa260-4.6/xsa260-2.patch
2f30c4fbebeb77da50caff62a0f28d3afe8993bee19233543170f1955cebdcbc  xsa260-4.6/xsa260-3.patch
363af89377d5819ad1450c8806824707d3e15700c179129aed62128e62ab1a0e  xsa260-4.6/xsa260-4.patch
0c2552a36737975f4f46d7054b49fd018b68c302cef3b39b27c2f17cc60eb531  xsa260-4.7/xsa260-1.patch
a92ef233a83923d6a18d51528ff28630ae3f1134ee76f2347397e22da9c84c24  xsa260-4.7/xsa260-2.patch
8469af8ba5b6722738b27c328eccc1d341af49c2e2bb23fe7b327a3349267b0e  xsa260-4.7/xsa260-3.patch
0327c2ef7984a4aa000849c68a01181fdb01962637e78629c6fb34bb95414a74  xsa260-4.7/xsa260-4.patch
a9be346f111bca3faf98045c089638ba960f291eb9ace03e8922d7b4f8a9b37e  xsa260-4.8/xsa260-1.patch
740c0ee49936430fdf66ae8b75f9f51fe728c71a7c7a56667f845aea7669d344  xsa260-4.8/xsa260-2.patch
94dbb7ad7d409f9170950162904247c7cf0e360cec2a0a1f1a6653ce9ca43283  xsa260-4.8/xsa260-3.patch
db440d76685cf1e8c332aea2aa13e6be43b1b7f68d9225dfe99bb2ee12e18b9e  xsa260-4.8/xsa260-4.patch
11b55f664a4043ed3a79d3e1a07877c68c8c19df6112feffdac1e55547f0002e  xsa260-4.9/xsa260-1.patch
38a762f8cf8db763d70f1ef35a4c2cac23282b694527a97b2eaf100a14f767eb  xsa260-4.9/xsa260-2.patch
18d9ffd273bdbd070e1b613e7f18ed21cdb874dba5f7964e14bb4a3dbc8844ec  xsa260-4.9/xsa260-3.patch
c3d689d581c2ce6beaaa9d955f159a3b5da8007a24a08969b0953e89491f15a5  xsa260-4.9/xsa260-4.patch
ffac7ab75bf65f8286b37d21cb4a4401d898670a4e52af88d8202ce4fe66edef  xsa260-4.10/xsa260-1.patch
fe85832a9b5b1076b3a9bdbd28a2f3be57cd019d66a725ce64698b1bd74145a8  xsa260-4.10/xsa260-2.patch
1955aed73828e23da871ef10e5ec49670ce59bdd06af2772e978f8e817e0319f  xsa260-4.10/xsa260-3.patch
8f504f8fcf100f8a00bece9c4df8b8933dceeaf29b50492317f9cbf74aaf4aa4  xsa260-4.10/xsa260-4.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches and/or mitigations described above (or
others which are substantially similar) is permitted during the
embargo, even on public-facing systems with untrusted guest users and
administrators.

But: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBCAAGBQJa8dQdAAoJEIP+FMlX6CvZwp4H/AxlMq1xyIAiDNGEESGlJpQh
Y0dD9I1dLraUr2tTpaDZM4qUjV2cQ5MRaFeiAxDVCNraNPTLeC5TRStkIMHWc3jK
C8/XzRq0lDdebQA04Usj7648HbtAoxkAV1SOOxsqPSBRHb1jPpa2/jvuA3BzCl+o
gZo0urWinKlIJ032KWOd/9j96M0YgqqdJ+h2bfSg5uBSdXcQ6at5nYc1T4s3fi2R
AQvs8aQ/yylKVsCit+AypcyOMRELNA2jHWEelZ7L18zMGHwTa9qt1NZAL+VM2pMW
SKNphOdrCJxVZdGMJlc6ujzxUBgUC7qdfsqprBrKi/4eT+K5I9CvfV21er+7+BA=
=0+sm
-----END PGP SIGNATURE-----

[-- Attachment #2: xsa260.meta --]
[-- Type: application/octet-stream, Size: 2374 bytes --]

{
  "XSA": 260,
  "SupportedVersions": [
    "master",
    "4.10",
    "4.9",
    "4.8",
    "4.7",
    "4.6"
  ],
  "Trees": [
    "xen"
  ],
  "Recipes": {
    "4.10": {
      "Recipes": {
        "xen": {
          "StableRef": "99e50001bea6f3d777b86bbb9bb41ef66ba47974",
          "Prereqs": [],
          "Patches": [
            "xsa260-4.10/xsa260-1.patch",
            "xsa260-4.10/xsa260-2.patch",
            "xsa260-4.10/xsa260-3.patch",
            "xsa260-4.10/xsa260-4.patch"
          ]
        }
      }
    },
    "4.6": {
      "Recipes": {
        "xen": {
          "StableRef": "927aca70011f83c44294f90275c18a0b3f7d7169",
          "Prereqs": [],
          "Patches": [
            "xsa260-4.6/xsa260-1.patch",
            "xsa260-4.6/xsa260-2.patch",
            "xsa260-4.6/xsa260-3.patch",
            "xsa260-4.6/xsa260-4.patch"
          ]
        }
      }
    },
    "4.7": {
      "Recipes": {
        "xen": {
          "StableRef": "a8ef07566fa8fe9a2e8db745014d93e259b66785",
          "Prereqs": [],
          "Patches": [
            "xsa260-4.7/xsa260-1.patch",
            "xsa260-4.7/xsa260-2.patch",
            "xsa260-4.7/xsa260-3.patch",
            "xsa260-4.7/xsa260-4.patch"
          ]
        }
      }
    },
    "4.8": {
      "Recipes": {
        "xen": {
          "StableRef": "1052a2168ed62999b35319a435c16da884f5f0e2",
          "Prereqs": [],
          "Patches": [
            "xsa260-4.8/xsa260-1.patch",
            "xsa260-4.8/xsa260-2.patch",
            "xsa260-4.8/xsa260-3.patch",
            "xsa260-4.8/xsa260-4.patch"
          ]
        }
      }
    },
    "4.9": {
      "Recipes": {
        "xen": {
          "StableRef": "7866e115f9c624b0669997fcc393b489ef3c38a2",
          "Prereqs": [],
          "Patches": [
            "xsa260-4.9/xsa260-1.patch",
            "xsa260-4.9/xsa260-2.patch",
            "xsa260-4.9/xsa260-3.patch",
            "xsa260-4.9/xsa260-4.patch"
          ]
        }
      }
    },
    "master": {
      "Recipes": {
        "xen": {
          "StableRef": "d80af845de7a4db01a4a3b4d779e0e0dcb5e738b",
          "Prereqs": [],
          "Patches": [
            "xsa260-unstable/xsa260-1.patch",
            "xsa260-unstable/xsa260-2.patch",
            "xsa260-unstable/xsa260-3.patch",
            "xsa260-unstable/xsa260-4.patch"
          ]
        }
      }
    }
  }
}

[-- Attachment #3: xsa260-unstable/xsa260-1.patch --]
[-- Type: application/octet-stream, Size: 3019 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Fix %dr6 handing in #DB handler

Most bits in %dr6 accumulate, rather than being set directly based on the
current source of #DB.  Have the handler follow the manuals guidance, which
avoids leaking hypervisor debugging activities into guest context.

This is part of XSA-260 / CVE-2018-8897

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 63c6569..3d4ac59 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1777,11 +1777,36 @@ static void ler_enable(void)
 
 void do_debug(struct cpu_user_regs *regs)
 {
+    unsigned long dr6;
     struct vcpu *v = current;
 
+    /* Stash dr6 as early as possible. */
+    dr6 = read_debugreg(6);
+
     if ( debugger_trap_entry(TRAP_debug, regs) )
         return;
 
+    /*
+     * At the time of writing (March 2018), on the subject of %dr6:
+     *
+     * The Intel manual says:
+     *   Certain debug exceptions may clear bits 0-3. The remaining contents
+     *   of the DR6 register are never cleared by the processor. To avoid
+     *   confusion in identifying debug exceptions, debug handlers should
+     *   clear the register (except bit 16, which they should set) before
+     *   returning to the interrupted task.
+     *
+     * The AMD manual says:
+     *   Bits 15:13 of the DR6 register are not cleared by the processor and
+     *   must be cleared by software after the contents have been read.
+     *
+     * Some bits are reserved set, some are reserved clear, and some bits
+     * which were previously reserved set are reused and cleared by hardware.
+     * For future compatibility, reset to the default value, which will allow
+     * us to spot any bit being changed by hardware to its non-default value.
+     */
+    write_debugreg(6, X86_DR6_DEFAULT);
+
     if ( !guest_mode(regs) )
     {
         if ( regs->eflags & X86_EFLAGS_TF )
@@ -1814,7 +1839,8 @@ void do_debug(struct cpu_user_regs *regs)
     }
 
     /* Save debug status register where guest OS can peek at it */
-    v->arch.debugreg[6] = read_debugreg(6);
+    v->arch.debugreg[6] |= (dr6 & ~X86_DR6_DEFAULT);
+    v->arch.debugreg[6] &= (dr6 | ~X86_DR6_DEFAULT);
 
     ler_enable();
     pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
diff --git a/xen/include/asm-x86/debugreg.h b/xen/include/asm-x86/debugreg.h
index c57914e..b3b10ea 100644
--- a/xen/include/asm-x86/debugreg.h
+++ b/xen/include/asm-x86/debugreg.h
@@ -24,6 +24,8 @@
 #define DR_STATUS_RESERVED_ZERO (~0xffffeffful) /* Reserved, read as zero */
 #define DR_STATUS_RESERVED_ONE  0xffff0ff0ul /* Reserved, read as one */
 
+#define X86_DR6_DEFAULT 0xffff0ff0ul    /* Default %dr6 value. */
+
 /* Now define a bunch of things for manipulating the control register.
    The top two bytes of the control register consist of 4 fields of 4
    bits - each field corresponds to one of the four debug registers,

[-- Attachment #4: xsa260-unstable/xsa260-2.patch --]
[-- Type: application/octet-stream, Size: 3831 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/pv: Move exception injection into {,compat_}test_all_events()

This allows paths to jump straight to {,compat_}test_all_events() and have
injection of pending exceptions happen automatically, rather than requiring
all calling paths to handle exceptions themselves.

The normal exception path is simplified as a result, and
compat_post_handle_exception() is removed entirely.

This is part of XSA-260 / CVE-2018-8897

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index ae2bb4b..79aeaa3 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -39,6 +39,12 @@ ENTRY(compat_test_all_events)
         leaq  irq_stat+IRQSTAT_softirq_pending(%rip),%rcx
         cmpl  $0,(%rcx,%rax,1)
         jne   compat_process_softirqs
+
+        /* Inject exception if pending. */
+        lea   VCPU_trap_bounce(%rbx), %rdx
+        testb $TBF_EXCEPTION, TRAPBOUNCE_flags(%rdx)
+        jnz   .Lcompat_process_trapbounce
+
         cmpb  $0, VCPU_mce_pending(%rbx)
         jne   compat_process_mce
 .Lcompat_test_guest_nmi:
@@ -68,6 +74,15 @@ compat_process_softirqs:
         call  do_softirq
         jmp   compat_test_all_events
 
+        ALIGN
+/* %rbx: struct vcpu, %rdx: struct trap_bounce */
+.Lcompat_process_trapbounce:
+        sti
+.Lcompat_bounce_exception:
+        call  compat_create_bounce_frame
+        movb  $0, TRAPBOUNCE_flags(%rdx)
+        jmp   compat_test_all_events
+
 	ALIGN
 /* %rbx: struct vcpu */
 compat_process_mce:
@@ -181,15 +196,6 @@ ENTRY(cr4_pv32_restore)
         xor   %eax, %eax
         ret
 
-/* %rdx: trap_bounce, %rbx: struct vcpu */
-ENTRY(compat_post_handle_exception)
-        testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
-        jz    compat_test_all_events
-.Lcompat_bounce_exception:
-        call  compat_create_bounce_frame
-        movb  $0,TRAPBOUNCE_flags(%rdx)
-        jmp   compat_test_all_events
-
         .section .text.entry, "ax", @progbits
 
 /* See lstar_enter for entry register state. */
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 45d9842..ccece08 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -42,6 +42,12 @@ test_all_events:
         leaq  irq_stat+IRQSTAT_softirq_pending(%rip), %rcx
         cmpl  $0, (%rcx, %rax, 1)
         jne   process_softirqs
+
+        /* Inject exception if pending. */
+        lea   VCPU_trap_bounce(%rbx), %rdx
+        testb $TBF_EXCEPTION, TRAPBOUNCE_flags(%rdx)
+        jnz   .Lprocess_trapbounce
+
         cmpb  $0, VCPU_mce_pending(%rbx)
         jne   process_mce
 .Ltest_guest_nmi:
@@ -70,6 +76,15 @@ process_softirqs:
         jmp  test_all_events
 
         ALIGN
+/* %rbx: struct vcpu, %rdx struct trap_bounce */
+.Lprocess_trapbounce:
+        sti
+.Lbounce_exception:
+        call  create_bounce_frame
+        movb  $0, TRAPBOUNCE_flags(%rdx)
+        jmp   test_all_events
+
+        ALIGN
 /* %rbx: struct vcpu */
 process_mce:
         testb $1 << VCPU_TRAP_MCE, VCPU_async_exception_mask(%rbx)
@@ -651,15 +666,9 @@ handle_exception_saved:
         mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
         testb $3,UREGS_cs(%rsp)
         jz    restore_all_xen
-        leaq  VCPU_trap_bounce(%rbx),%rdx
         movq  VCPU_domain(%rbx),%rax
         cmpb  $0, DOMAIN_is_32bit_pv(%rax)
-        jne   compat_post_handle_exception
-        testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
-        jz    test_all_events
-.Lbounce_exception:
-        call  create_bounce_frame
-        movb  $0,TRAPBOUNCE_flags(%rdx)
+        jne   compat_test_all_events
         jmp   test_all_events
 
 /* No special register assumptions. */

[-- Attachment #5: xsa260-unstable/xsa260-3.patch --]
[-- Type: application/octet-stream, Size: 4189 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Use an Interrupt Stack Table for #DB

PV guests can use architectural corner cases to cause #DB to be raised after
transitioning into supervisor mode.

Use an interrupt stack table for #DB to prevent the exception being taken with
a guest controlled stack pointer.

This is part of XSA-260 / CVE-2018-8897

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 0a452ae..8d1e7be 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -757,6 +757,7 @@ void load_system_tables(void)
 			[IST_MCE - 1] = stack_top + IST_MCE * PAGE_SIZE,
 			[IST_DF  - 1] = stack_top + IST_DF  * PAGE_SIZE,
 			[IST_NMI - 1] = stack_top + IST_NMI * PAGE_SIZE,
+			[IST_DB  - 1] = stack_top + IST_DB  * PAGE_SIZE,
 
 			[IST_MAX ... ARRAY_SIZE(tss->ist) - 1] =
 				0x8600111111111111ul,
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 3d4ac59..819a31c 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -353,13 +353,13 @@ static void show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs)
 /*
  * Notes for get_stack_trace_bottom() and get_stack_dump_bottom()
  *
- * Stack pages 0, 1 and 2:
+ * Stack pages 0 - 3:
  *   These are all 1-page IST stacks.  Each of these stacks have an exception
  *   frame and saved register state at the top.  The interesting bound for a
  *   trace is the word adjacent to this, while the bound for a dump is the
  *   very top, including the exception frame.
  *
- * Stack pages 3, 4 and 5:
+ * Stack pages 4 and 5:
  *   None of these are particularly interesting.  With MEMORY_GUARD, page 5 is
  *   explicitly not present, so attempting to dump or trace it is
  *   counterproductive.  Without MEMORY_GUARD, it is possible for a call chain
@@ -380,12 +380,12 @@ unsigned long get_stack_trace_bottom(unsigned long sp)
 {
     switch ( get_stack_page(sp) )
     {
-    case 0 ... 2:
+    case 0 ... 3:
         return ROUNDUP(sp, PAGE_SIZE) -
             offsetof(struct cpu_user_regs, es) - sizeof(unsigned long);
 
 #ifndef MEMORY_GUARD
-    case 3 ... 5:
+    case 4 ... 5:
 #endif
     case 6 ... 7:
         return ROUNDUP(sp, STACK_SIZE) -
@@ -400,11 +400,11 @@ unsigned long get_stack_dump_bottom(unsigned long sp)
 {
     switch ( get_stack_page(sp) )
     {
-    case 0 ... 2:
+    case 0 ... 3:
         return ROUNDUP(sp, PAGE_SIZE) - sizeof(unsigned long);
 
 #ifndef MEMORY_GUARD
-    case 3 ... 5:
+    case 4 ... 5:
 #endif
     case 6 ... 7:
         return ROUNDUP(sp, STACK_SIZE) - sizeof(unsigned long);
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index ccece08..6ae4d6f 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -723,7 +723,7 @@ ENTRY(device_not_available)
 ENTRY(debug)
         pushq $0
         movl  $TRAP_debug,4(%rsp)
-        jmp   handle_exception
+        jmp   handle_ist_exception
 
 ENTRY(int3)
         pushq $0
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index db9988a..6e0e935 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -404,7 +404,8 @@ struct __packed __cacheline_aligned tss_struct {
 #define IST_DF   1UL
 #define IST_NMI  2UL
 #define IST_MCE  3UL
-#define IST_MAX  3UL
+#define IST_DB   4UL
+#define IST_MAX  4UL
 
 /* Set the interrupt stack table used by a particular interrupt
  * descriptor table entry. */
@@ -423,6 +424,7 @@ static inline void enable_each_ist(idt_entry_t *idt)
     set_ist(&idt[TRAP_double_fault],  IST_DF);
     set_ist(&idt[TRAP_nmi],           IST_NMI);
     set_ist(&idt[TRAP_machine_check], IST_MCE);
+    set_ist(&idt[TRAP_debug],         IST_DB);
 }
 
 static inline void disable_each_ist(idt_entry_t *idt)
@@ -430,6 +432,7 @@ static inline void disable_each_ist(idt_entry_t *idt)
     set_ist(&idt[TRAP_double_fault],  IST_NONE);
     set_ist(&idt[TRAP_nmi],           IST_NONE);
     set_ist(&idt[TRAP_machine_check], IST_NONE);
+    set_ist(&idt[TRAP_debug],         IST_NONE);
 }
 
 #define IDT_ENTRIES 256

[-- Attachment #6: xsa260-unstable/xsa260-4.patch --]
[-- Type: application/octet-stream, Size: 2891 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Fix handling of #DB exceptions in hypervisor context

The WARN_ON() can be triggered by guest activities, and emits a full stack
trace without rate limiting.  Swap it out for a ratelimited printk with just
enough information to work out what is going on.

Not all #DB exceptions are traps, so blindly continuing is not a safe action
to take.  We don't let PV guests select these settings in the real %dr7 to
begin with, but for added safety against unexpected situations, detect the
fault cases and crash in an obvious manner.

This is part of XSA-260 / CVE-2018-8897

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index f6099ce..824647d 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1825,16 +1825,44 @@ void do_debug(struct cpu_user_regs *regs)
                 regs->eflags &= ~X86_EFLAGS_TF;
             }
         }
-        else
+
+        /*
+         * Check for fault conditions.  General Detect, and instruction
+         * breakpoints are faults rather than traps, at which point attempting
+         * to ignore and continue will result in a livelock.
+         */
+        if ( dr6 & DR_GENERAL_DETECT )
         {
-            /*
-             * We ignore watchpoints when they trigger within Xen. This may
-             * happen when a buffer is passed to us which previously had a
-             * watchpoint set on it. No need to bump EIP; the only faulting
-             * trap is an instruction breakpoint, which can't happen to us.
-             */
-            WARN_ON(!search_exception_table(regs));
+            printk(XENLOG_ERR "Hit General Detect in Xen context\n");
+            fatal_trap(regs, 0);
         }
+
+        if ( dr6 & (DR_TRAP3 | DR_TRAP2 | DR_TRAP1 | DR_TRAP0) )
+        {
+            unsigned int bp, dr7 = read_debugreg(7) >> DR_CONTROL_SHIFT;
+
+            for ( bp = 0; bp < 4; ++bp )
+            {
+                if ( (dr6 & (1u << bp)) && /* Breakpoint triggered? */
+                     ((dr7 & (3u << (bp * DR_CONTROL_SIZE))) == 0) /* Insn? */ )
+                {
+                    printk(XENLOG_ERR
+                           "Hit instruction breakpoint in Xen context\n");
+                    fatal_trap(regs, 0);
+                }
+            }
+        }
+
+        /*
+         * Whatever caused this #DB should be a trap.  Note it and continue.
+         * Guests can trigger this in certain corner cases, so ensure the
+         * message is ratelimited.
+         */
+        gprintk(XENLOG_WARNING,
+                "Hit #DB in Xen context: %04x:%p [%ps], stk %04x:%p, dr6 %lx\n",
+                regs->cs, _p(regs->rip), _p(regs->rip),
+                regs->ss, _p(regs->rsp), dr6);
+
         goto out;
     }
 

[-- Attachment #7: xsa260-4.6/xsa260-1.patch --]
[-- Type: application/octet-stream, Size: 2785 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Fix %dr6 handing in #DB handler

Most bits in %dr6 accumulate, rather than being set directly based on the
current source of #DB.  Have the handler follow the manuals guidance, which
avoids leaking hypervisor debugging activities into guest context.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -3667,10 +3667,35 @@ static void ler_enable(void)
 
 void do_debug(struct cpu_user_regs *regs)
 {
+    unsigned long dr6;
     struct vcpu *v = current;
 
+    /* Stash dr6 as early as possible. */
+    dr6 = read_debugreg(6);
+
     DEBUGGER_trap_entry(TRAP_debug, regs);
 
+    /*
+     * At the time of writing (March 2018), on the subject of %dr6:
+     *
+     * The Intel manual says:
+     *   Certain debug exceptions may clear bits 0-3. The remaining contents
+     *   of the DR6 register are never cleared by the processor. To avoid
+     *   confusion in identifying debug exceptions, debug handlers should
+     *   clear the register (except bit 16, which they should set) before
+     *   returning to the interrupted task.
+     *
+     * The AMD manual says:
+     *   Bits 15:13 of the DR6 register are not cleared by the processor and
+     *   must be cleared by software after the contents have been read.
+     *
+     * Some bits are reserved set, some are reserved clear, and some bits
+     * which were previously reserved set are reused and cleared by hardware.
+     * For future compatibility, reset to the default value, which will allow
+     * us to spot any bit being changed by hardware to its non-default value.
+     */
+    write_debugreg(6, X86_DR6_DEFAULT);
+
     if ( !guest_mode(regs) )
     {
         if ( regs->eflags & X86_EFLAGS_TF )
@@ -3703,7 +3728,8 @@ void do_debug(struct cpu_user_regs *regs
     }
 
     /* Save debug status register where guest OS can peek at it */
-    v->arch.debugreg[6] = read_debugreg(6);
+    v->arch.debugreg[6] |= (dr6 & ~X86_DR6_DEFAULT);
+    v->arch.debugreg[6] &= (dr6 | ~X86_DR6_DEFAULT);
 
     ler_enable();
     do_guest_trap(TRAP_debug, regs, 0);
--- a/xen/include/asm-x86/debugreg.h
+++ b/xen/include/asm-x86/debugreg.h
@@ -24,6 +24,8 @@
 #define DR_STATUS_RESERVED_ZERO (~0xffffeffful) /* Reserved, read as zero */
 #define DR_STATUS_RESERVED_ONE  0xffff0ff0ul /* Reserved, read as one */
 
+#define X86_DR6_DEFAULT 0xffff0ff0ul    /* Default %dr6 value. */
+
 /* Now define a bunch of things for manipulating the control register.
    The top two bytes of the control register consist of 4 fields of 4
    bits - each field corresponds to one of the four debug registers,

[-- Attachment #8: xsa260-4.6/xsa260-2.patch --]
[-- Type: application/octet-stream, Size: 3600 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/pv: Move exception injection into {,compat_}test_all_events()

This allows paths to jump straight to {,compat_}test_all_events() and have
injection of pending exceptions happen automatically, rather than requiring
all calling paths to handle exceptions themselves.

The normal exception path is simplified as a result, and
compat_post_handle_exception() is removed entirely.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -105,6 +105,12 @@ ENTRY(compat_test_all_events)
         leaq  irq_stat+IRQSTAT_softirq_pending(%rip),%rcx
         cmpl  $0,(%rcx,%rax,1)
         jne   compat_process_softirqs
+
+        /* Inject exception if pending. */
+        lea   VCPU_trap_bounce(%rbx), %rdx
+        testb $TBF_EXCEPTION, TRAPBOUNCE_flags(%rdx)
+        jnz   .Lcompat_process_trapbounce
+
         testb $1,VCPU_mce_pending(%rbx)
         jnz   compat_process_mce
 .Lcompat_test_guest_nmi:
@@ -134,6 +140,15 @@ compat_process_softirqs:
         call  do_softirq
         jmp   compat_test_all_events
 
+        ALIGN
+/* %rbx: struct vcpu, %rdx: struct trap_bounce */
+.Lcompat_process_trapbounce:
+        sti
+.Lcompat_bounce_exception:
+        call  compat_create_bounce_frame
+        movb  $0, TRAPBOUNCE_flags(%rdx)
+        jmp   compat_test_all_events
+
 	ALIGN
 /* %rbx: struct vcpu */
 compat_process_mce:
@@ -290,15 +305,6 @@ ENTRY(cr4_pv32_restore)
         xor   %eax, %eax
         ret
 
-/* %rdx: trap_bounce, %rbx: struct vcpu */
-ENTRY(compat_post_handle_exception)
-        testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
-        jz    compat_test_all_events
-.Lcompat_bounce_exception:
-        call  compat_create_bounce_frame
-        movb  $0,TRAPBOUNCE_flags(%rdx)
-        jmp   compat_test_all_events
-
 /* See lstar_enter for entry register state. */
 ENTRY(cstar_enter)
         sti
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -270,6 +270,12 @@ test_all_events:
         leaq  irq_stat+IRQSTAT_softirq_pending(%rip),%rcx
         cmpl  $0,(%rcx,%rax,1)
         jne   process_softirqs
+
+        /* Inject exception if pending. */
+        lea   VCPU_trap_bounce(%rbx), %rdx
+        testb $TBF_EXCEPTION, TRAPBOUNCE_flags(%rdx)
+        jnz   .Lprocess_trapbounce
+
         testb $1,VCPU_mce_pending(%rbx)
         jnz   process_mce
 .Ltest_guest_nmi:
@@ -298,6 +304,15 @@ process_softirqs:
         jmp  test_all_events
 
         ALIGN
+/* %rbx: struct vcpu, %rdx struct trap_bounce */
+.Lprocess_trapbounce:
+        sti
+.Lbounce_exception:
+        call  create_bounce_frame
+        movb  $0, TRAPBOUNCE_flags(%rdx)
+        jmp   test_all_events
+
+        ALIGN
 /* %rbx: struct vcpu */
 process_mce:
         testb $1 << VCPU_TRAP_MCE,VCPU_async_exception_mask(%rbx)
@@ -718,15 +733,9 @@ handle_exception_saved:
         mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
         testb $3,UREGS_cs(%rsp)
         jz    restore_all_xen
-        leaq  VCPU_trap_bounce(%rbx),%rdx
         movq  VCPU_domain(%rbx),%rax
         testb $1,DOMAIN_is_32bit_pv(%rax)
-        jnz   compat_post_handle_exception
-        testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
-        jz    test_all_events
-.Lbounce_exception:
-        call  create_bounce_frame
-        movb  $0,TRAPBOUNCE_flags(%rdx)
+        jnz   compat_test_all_events
         jmp   test_all_events
 
 /* No special register assumptions. */

[-- Attachment #9: xsa260-4.6/xsa260-3.patch --]
[-- Type: application/octet-stream, Size: 4917 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Use an Interrupt Stack Table for #DB

PV guests can use architectural corner cases to cause #DB to be raised after
transitioning into supervisor mode.

Use an interrupt stack table for #DB to prevent the exception being taken with
a guest controlled stack pointer.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -614,6 +614,7 @@ void __cpuinit load_system_tables(void)
 	tss->ist[IST_MCE - 1] = stack_top + IST_MCE * PAGE_SIZE;
 	tss->ist[IST_DF  - 1] = stack_top + IST_DF  * PAGE_SIZE;
 	tss->ist[IST_NMI - 1] = stack_top + IST_NMI * PAGE_SIZE;
+	tss->ist[IST_DB  - 1] = stack_top + IST_DB  * PAGE_SIZE;
 
 	_set_tssldt_desc(
 		gdt + TSS_ENTRY,
@@ -634,6 +635,7 @@ void __cpuinit load_system_tables(void)
 	set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_DF);
 	set_ist(&idt_tables[cpu][TRAP_nmi],	      IST_NMI);
 	set_ist(&idt_tables[cpu][TRAP_machine_check], IST_MCE);
+	set_ist(&idt_tables[cpu][TRAP_debug],         IST_DB);
 }
 
 /*
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -997,6 +997,7 @@ static void svm_ctxt_switch_from(struct
     set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_DF);
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NMI);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_MCE);
+    set_ist(&idt_tables[cpu][TRAP_debug],         IST_DB);
 }
 
 static void svm_ctxt_switch_to(struct vcpu *v)
@@ -1021,6 +1022,7 @@ static void svm_ctxt_switch_to(struct vc
     set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_NONE);
+    set_ist(&idt_tables[cpu][TRAP_debug],         IST_NONE);
 
     svm_restore_dr(v);
 
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -952,6 +952,7 @@ static int cpu_smpboot_alloc(unsigned in
     set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_NONE);
+    set_ist(&idt_tables[cpu][TRAP_debug],         IST_NONE);
 
     if ( setup_cpu_root_pgt(cpu) )
         goto oom;
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -196,13 +196,13 @@ static void show_guest_stack(struct vcpu
 /*
  * Notes for get_stack_trace_bottom() and get_stack_dump_bottom()
  *
- * Stack pages 0, 1 and 2:
+ * Stack pages 0 - 3:
  *   These are all 1-page IST stacks.  Each of these stacks have an exception
  *   frame and saved register state at the top.  The interesting bound for a
  *   trace is the word adjacent to this, while the bound for a dump is the
  *   very top, including the exception frame.
  *
- * Stack pages 3, 4 and 5:
+ * Stack pages 4 and 5:
  *   None of these are particularly interesting.  With MEMORY_GUARD, page 5 is
  *   explicitly not present, so attempting to dump or trace it is
  *   counterproductive.  Without MEMORY_GUARD, it is possible for a call chain
@@ -223,12 +223,12 @@ unsigned long get_stack_trace_bottom(uns
 {
     switch ( get_stack_page(sp) )
     {
-    case 0 ... 2:
+    case 0 ... 3:
         return ROUNDUP(sp, PAGE_SIZE) -
             offsetof(struct cpu_user_regs, es) - sizeof(unsigned long);
 
 #ifndef MEMORY_GUARD
-    case 3 ... 5:
+    case 4 ... 5:
 #endif
     case 6 ... 7:
         return ROUNDUP(sp, STACK_SIZE) -
@@ -243,11 +243,11 @@ unsigned long get_stack_dump_bottom(unsi
 {
     switch ( get_stack_page(sp) )
     {
-    case 0 ... 2:
+    case 0 ... 3:
         return ROUNDUP(sp, PAGE_SIZE) - sizeof(unsigned long);
 
 #ifndef MEMORY_GUARD
-    case 3 ... 5:
+    case 4 ... 5:
 #endif
     case 6 ... 7:
         return ROUNDUP(sp, STACK_SIZE) - sizeof(unsigned long);
@@ -3847,6 +3847,7 @@ void __init init_idt_traps(void)
     set_ist(&idt_table[TRAP_double_fault],  IST_DF);
     set_ist(&idt_table[TRAP_nmi],           IST_NMI);
     set_ist(&idt_table[TRAP_machine_check], IST_MCE);
+    set_ist(&idt_table[TRAP_debug],         IST_DB);
 
     /* CPU0 uses the master IDT. */
     idt_tables[0] = idt_table;
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -789,7 +789,7 @@ ENTRY(device_not_available)
 ENTRY(debug)
         pushq $0
         movl  $TRAP_debug,4(%rsp)
-        jmp   handle_exception
+        jmp   handle_ist_exception
 
 ENTRY(int3)
         pushq $0
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -453,7 +453,8 @@ struct __packed __cacheline_aligned tss_
 #define IST_DF   1UL
 #define IST_NMI  2UL
 #define IST_MCE  3UL
-#define IST_MAX  3UL
+#define IST_DB   4UL
+#define IST_MAX  4UL
 
 /* Set the interrupt stack table used by a particular interrupt
  * descriptor table entry. */

[-- Attachment #10: xsa260-4.6/xsa260-4.patch --]
[-- Type: application/octet-stream, Size: 2803 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Fix handling of #DB exceptions in hypervisor context

The WARN_ON() can be triggered by guest activities, and emits a full stack
trace without rate limiting.  Swap it out for a ratelimited printk with just
enough information to work out what is going on.

Not all #DB exceptions are traps, so blindly continuing is not a safe action
to take.  We don't let PV guests select these settings in the real %dr7 to
begin with, but for added safety against unexpected situations, detect the
fault cases and crash in an obvious manner.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -3714,16 +3714,44 @@ void do_debug(struct cpu_user_regs *regs
                 regs->eflags &= ~X86_EFLAGS_TF;
             }
         }
-        else
+
+        /*
+         * Check for fault conditions.  General Detect, and instruction
+         * breakpoints are faults rather than traps, at which point attempting
+         * to ignore and continue will result in a livelock.
+         */
+        if ( dr6 & DR_GENERAL_DETECT )
+        {
+            printk(XENLOG_ERR "Hit General Detect in Xen context\n");
+            fatal_trap(regs);
+        }
+
+        if ( dr6 & (DR_TRAP3 | DR_TRAP2 | DR_TRAP1 | DR_TRAP0) )
         {
-            /*
-             * We ignore watchpoints when they trigger within Xen. This may
-             * happen when a buffer is passed to us which previously had a
-             * watchpoint set on it. No need to bump EIP; the only faulting
-             * trap is an instruction breakpoint, which can't happen to us.
-             */
-            WARN_ON(!search_exception_table(regs->eip));
+            unsigned int bp, dr7 = read_debugreg(7) >> DR_CONTROL_SHIFT;
+
+            for ( bp = 0; bp < 4; ++bp )
+            {
+                if ( (dr6 & (1u << bp)) && /* Breakpoint triggered? */
+                     ((dr7 & (3u << (bp * DR_CONTROL_SIZE))) == 0) /* Insn? */ )
+                {
+                    printk(XENLOG_ERR
+                           "Hit instruction breakpoint in Xen context\n");
+                    fatal_trap(regs);
+                }
+            }
         }
+
+        /*
+         * Whatever caused this #DB should be a trap.  Note it and continue.
+         * Guests can trigger this in certain corner cases, so ensure the
+         * message is ratelimited.
+         */
+        gprintk(XENLOG_WARNING,
+                "Hit #DB in Xen context: %04x:%p [%ps], stk %04x:%p, dr6 %lx\n",
+                regs->cs, _p(regs->rip), _p(regs->rip),
+                regs->ss, _p(regs->rsp), dr6);
+
         goto out;
     }
 

[-- Attachment #11: xsa260-4.7/xsa260-1.patch --]
[-- Type: application/octet-stream, Size: 2785 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Fix %dr6 handing in #DB handler

Most bits in %dr6 accumulate, rather than being set directly based on the
current source of #DB.  Have the handler follow the manuals guidance, which
avoids leaking hypervisor debugging activities into guest context.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -3842,10 +3842,35 @@ static void ler_enable(void)
 
 void do_debug(struct cpu_user_regs *regs)
 {
+    unsigned long dr6;
     struct vcpu *v = current;
 
+    /* Stash dr6 as early as possible. */
+    dr6 = read_debugreg(6);
+
     DEBUGGER_trap_entry(TRAP_debug, regs);
 
+    /*
+     * At the time of writing (March 2018), on the subject of %dr6:
+     *
+     * The Intel manual says:
+     *   Certain debug exceptions may clear bits 0-3. The remaining contents
+     *   of the DR6 register are never cleared by the processor. To avoid
+     *   confusion in identifying debug exceptions, debug handlers should
+     *   clear the register (except bit 16, which they should set) before
+     *   returning to the interrupted task.
+     *
+     * The AMD manual says:
+     *   Bits 15:13 of the DR6 register are not cleared by the processor and
+     *   must be cleared by software after the contents have been read.
+     *
+     * Some bits are reserved set, some are reserved clear, and some bits
+     * which were previously reserved set are reused and cleared by hardware.
+     * For future compatibility, reset to the default value, which will allow
+     * us to spot any bit being changed by hardware to its non-default value.
+     */
+    write_debugreg(6, X86_DR6_DEFAULT);
+
     if ( !guest_mode(regs) )
     {
         if ( regs->eflags & X86_EFLAGS_TF )
@@ -3878,7 +3903,8 @@ void do_debug(struct cpu_user_regs *regs
     }
 
     /* Save debug status register where guest OS can peek at it */
-    v->arch.debugreg[6] = read_debugreg(6);
+    v->arch.debugreg[6] |= (dr6 & ~X86_DR6_DEFAULT);
+    v->arch.debugreg[6] &= (dr6 | ~X86_DR6_DEFAULT);
 
     ler_enable();
     do_guest_trap(TRAP_debug, regs, 0);
--- a/xen/include/asm-x86/debugreg.h
+++ b/xen/include/asm-x86/debugreg.h
@@ -24,6 +24,8 @@
 #define DR_STATUS_RESERVED_ZERO (~0xffffeffful) /* Reserved, read as zero */
 #define DR_STATUS_RESERVED_ONE  0xffff0ff0ul /* Reserved, read as one */
 
+#define X86_DR6_DEFAULT 0xffff0ff0ul    /* Default %dr6 value. */
+
 /* Now define a bunch of things for manipulating the control register.
    The top two bytes of the control register consist of 4 fields of 4
    bits - each field corresponds to one of the four debug registers,

[-- Attachment #12: xsa260-4.7/xsa260-2.patch --]
[-- Type: application/octet-stream, Size: 3662 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/pv: Move exception injection into {,compat_}test_all_events()

This allows paths to jump straight to {,compat_}test_all_events() and have
injection of pending exceptions happen automatically, rather than requiring
all calling paths to handle exceptions themselves.

The normal exception path is simplified as a result, and
compat_post_handle_exception() is removed entirely.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -107,6 +107,12 @@ ENTRY(compat_test_all_events)
         leaq  irq_stat+IRQSTAT_softirq_pending(%rip),%rcx
         cmpl  $0,(%rcx,%rax,1)
         jne   compat_process_softirqs
+
+        /* Inject exception if pending. */
+        lea   VCPU_trap_bounce(%rbx), %rdx
+        testb $TBF_EXCEPTION, TRAPBOUNCE_flags(%rdx)
+        jnz   .Lcompat_process_trapbounce
+
         testb $1,VCPU_mce_pending(%rbx)
         jnz   compat_process_mce
 .Lcompat_test_guest_nmi:
@@ -136,6 +142,15 @@ compat_process_softirqs:
         call  do_softirq
         jmp   compat_test_all_events
 
+        ALIGN
+/* %rbx: struct vcpu, %rdx: struct trap_bounce */
+.Lcompat_process_trapbounce:
+        sti
+.Lcompat_bounce_exception:
+        call  compat_create_bounce_frame
+        movb  $0, TRAPBOUNCE_flags(%rdx)
+        jmp   compat_test_all_events
+
 	ALIGN
 /* %rbx: struct vcpu */
 compat_process_mce:
@@ -260,15 +275,6 @@ ENTRY(cr4_pv32_restore)
         xor   %eax, %eax
         ret
 
-/* %rdx: trap_bounce, %rbx: struct vcpu */
-ENTRY(compat_post_handle_exception)
-        testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
-        jz    compat_test_all_events
-.Lcompat_bounce_exception:
-        call  compat_create_bounce_frame
-        movb  $0,TRAPBOUNCE_flags(%rdx)
-        jmp   compat_test_all_events
-
 /* See lstar_enter for entry register state. */
 ENTRY(cstar_enter)
         /* sti could live here when we don't switch page tables below. */
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -253,6 +253,12 @@ test_all_events:
         leaq  irq_stat+IRQSTAT_softirq_pending(%rip),%rcx
         cmpl  $0,(%rcx,%rax,1)
         jne   process_softirqs
+
+        /* Inject exception if pending. */
+        lea   VCPU_trap_bounce(%rbx), %rdx
+        testb $TBF_EXCEPTION, TRAPBOUNCE_flags(%rdx)
+        jnz   .Lprocess_trapbounce
+
         testb $1,VCPU_mce_pending(%rbx)
         jnz   process_mce
 .Ltest_guest_nmi:
@@ -281,6 +287,15 @@ process_softirqs:
         jmp  test_all_events
 
         ALIGN
+/* %rbx: struct vcpu, %rdx struct trap_bounce */
+.Lprocess_trapbounce:
+        sti
+.Lbounce_exception:
+        call  create_bounce_frame
+        movb  $0, TRAPBOUNCE_flags(%rdx)
+        jmp   test_all_events
+
+        ALIGN
 /* %rbx: struct vcpu */
 process_mce:
         testb $1 << VCPU_TRAP_MCE,VCPU_async_exception_mask(%rbx)
@@ -698,15 +713,9 @@ handle_exception_saved:
         mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
         testb $3,UREGS_cs(%rsp)
         jz    restore_all_xen
-        leaq  VCPU_trap_bounce(%rbx),%rdx
         movq  VCPU_domain(%rbx),%rax
         testb $1,DOMAIN_is_32bit_pv(%rax)
-        jnz   compat_post_handle_exception
-        testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
-        jz    test_all_events
-.Lbounce_exception:
-        call  create_bounce_frame
-        movb  $0,TRAPBOUNCE_flags(%rdx)
+        jnz   compat_test_all_events
         jmp   test_all_events
 
 /* No special register assumptions. */

[-- Attachment #13: xsa260-4.7/xsa260-3.patch --]
[-- Type: application/octet-stream, Size: 4973 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Use an Interrupt Stack Table for #DB

PV guests can use architectural corner cases to cause #DB to be raised after
transitioning into supervisor mode.

Use an interrupt stack table for #DB to prevent the exception being taken with
a guest controlled stack pointer.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -672,6 +672,7 @@ void load_system_tables(void)
 	tss->ist[IST_MCE - 1] = stack_top + IST_MCE * PAGE_SIZE;
 	tss->ist[IST_DF  - 1] = stack_top + IST_DF  * PAGE_SIZE;
 	tss->ist[IST_NMI - 1] = stack_top + IST_NMI * PAGE_SIZE;
+	tss->ist[IST_DB  - 1] = stack_top + IST_DB  * PAGE_SIZE;
 
 	_set_tssldt_desc(
 		gdt + TSS_ENTRY,
@@ -692,6 +693,7 @@ void load_system_tables(void)
 	set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_DF);
 	set_ist(&idt_tables[cpu][TRAP_nmi],	      IST_NMI);
 	set_ist(&idt_tables[cpu][TRAP_machine_check], IST_MCE);
+	set_ist(&idt_tables[cpu][TRAP_debug],         IST_DB);
 }
 
 /*
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1045,6 +1045,7 @@ static void svm_ctxt_switch_from(struct
     set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_DF);
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NMI);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_MCE);
+    set_ist(&idt_tables[cpu][TRAP_debug],         IST_DB);
 }
 
 static void svm_ctxt_switch_to(struct vcpu *v)
@@ -1069,6 +1070,7 @@ static void svm_ctxt_switch_to(struct vc
     set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_NONE);
+    set_ist(&idt_tables[cpu][TRAP_debug],         IST_NONE);
 
     svm_restore_dr(v);
 
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -962,6 +962,7 @@ static int cpu_smpboot_alloc(unsigned in
     set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_NONE);
+    set_ist(&idt_tables[cpu][TRAP_debug],         IST_NONE);
 
     for ( stub_page = 0, i = cpu & ~(STUBS_PER_PAGE - 1);
           i < nr_cpu_ids && i <= (cpu | (STUBS_PER_PAGE - 1)); ++i )
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -259,13 +259,13 @@ static void show_guest_stack(struct vcpu
 /*
  * Notes for get_stack_trace_bottom() and get_stack_dump_bottom()
  *
- * Stack pages 0, 1 and 2:
+ * Stack pages 0 - 3:
  *   These are all 1-page IST stacks.  Each of these stacks have an exception
  *   frame and saved register state at the top.  The interesting bound for a
  *   trace is the word adjacent to this, while the bound for a dump is the
  *   very top, including the exception frame.
  *
- * Stack pages 3, 4 and 5:
+ * Stack pages 4 and 5:
  *   None of these are particularly interesting.  With MEMORY_GUARD, page 5 is
  *   explicitly not present, so attempting to dump or trace it is
  *   counterproductive.  Without MEMORY_GUARD, it is possible for a call chain
@@ -286,12 +286,12 @@ unsigned long get_stack_trace_bottom(uns
 {
     switch ( get_stack_page(sp) )
     {
-    case 0 ... 2:
+    case 0 ... 3:
         return ROUNDUP(sp, PAGE_SIZE) -
             offsetof(struct cpu_user_regs, es) - sizeof(unsigned long);
 
 #ifndef MEMORY_GUARD
-    case 3 ... 5:
+    case 4 ... 5:
 #endif
     case 6 ... 7:
         return ROUNDUP(sp, STACK_SIZE) -
@@ -306,11 +306,11 @@ unsigned long get_stack_dump_bottom(unsi
 {
     switch ( get_stack_page(sp) )
     {
-    case 0 ... 2:
+    case 0 ... 3:
         return ROUNDUP(sp, PAGE_SIZE) - sizeof(unsigned long);
 
 #ifndef MEMORY_GUARD
-    case 3 ... 5:
+    case 4 ... 5:
 #endif
     case 6 ... 7:
         return ROUNDUP(sp, STACK_SIZE) - sizeof(unsigned long);
@@ -4022,6 +4022,7 @@ void __init init_idt_traps(void)
     set_ist(&idt_table[TRAP_double_fault],  IST_DF);
     set_ist(&idt_table[TRAP_nmi],           IST_NMI);
     set_ist(&idt_table[TRAP_machine_check], IST_MCE);
+    set_ist(&idt_table[TRAP_debug],         IST_DB);
 
     /* CPU0 uses the master IDT. */
     idt_tables[0] = idt_table;
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -769,7 +769,7 @@ ENTRY(device_not_available)
 ENTRY(debug)
         pushq $0
         movl  $TRAP_debug,4(%rsp)
-        jmp   handle_exception
+        jmp   handle_ist_exception
 
 ENTRY(int3)
         pushq $0
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -498,7 +498,8 @@ struct __packed __cacheline_aligned tss_
 #define IST_DF   1UL
 #define IST_NMI  2UL
 #define IST_MCE  3UL
-#define IST_MAX  3UL
+#define IST_DB   4UL
+#define IST_MAX  4UL
 
 /* Set the interrupt stack table used by a particular interrupt
  * descriptor table entry. */

[-- Attachment #14: xsa260-4.7/xsa260-4.patch --]
[-- Type: application/octet-stream, Size: 2803 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Fix handling of #DB exceptions in hypervisor context

The WARN_ON() can be triggered by guest activities, and emits a full stack
trace without rate limiting.  Swap it out for a ratelimited printk with just
enough information to work out what is going on.

Not all #DB exceptions are traps, so blindly continuing is not a safe action
to take.  We don't let PV guests select these settings in the real %dr7 to
begin with, but for added safety against unexpected situations, detect the
fault cases and crash in an obvious manner.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -3889,16 +3889,44 @@ void do_debug(struct cpu_user_regs *regs
                 regs->eflags &= ~X86_EFLAGS_TF;
             }
         }
-        else
+
+        /*
+         * Check for fault conditions.  General Detect, and instruction
+         * breakpoints are faults rather than traps, at which point attempting
+         * to ignore and continue will result in a livelock.
+         */
+        if ( dr6 & DR_GENERAL_DETECT )
+        {
+            printk(XENLOG_ERR "Hit General Detect in Xen context\n");
+            fatal_trap(regs);
+        }
+
+        if ( dr6 & (DR_TRAP3 | DR_TRAP2 | DR_TRAP1 | DR_TRAP0) )
         {
-            /*
-             * We ignore watchpoints when they trigger within Xen. This may
-             * happen when a buffer is passed to us which previously had a
-             * watchpoint set on it. No need to bump EIP; the only faulting
-             * trap is an instruction breakpoint, which can't happen to us.
-             */
-            WARN_ON(!search_exception_table(regs->eip));
+            unsigned int bp, dr7 = read_debugreg(7) >> DR_CONTROL_SHIFT;
+
+            for ( bp = 0; bp < 4; ++bp )
+            {
+                if ( (dr6 & (1u << bp)) && /* Breakpoint triggered? */
+                     ((dr7 & (3u << (bp * DR_CONTROL_SIZE))) == 0) /* Insn? */ )
+                {
+                    printk(XENLOG_ERR
+                           "Hit instruction breakpoint in Xen context\n");
+                    fatal_trap(regs);
+                }
+            }
         }
+
+        /*
+         * Whatever caused this #DB should be a trap.  Note it and continue.
+         * Guests can trigger this in certain corner cases, so ensure the
+         * message is ratelimited.
+         */
+        gprintk(XENLOG_WARNING,
+                "Hit #DB in Xen context: %04x:%p [%ps], stk %04x:%p, dr6 %lx\n",
+                regs->cs, _p(regs->rip), _p(regs->rip),
+                regs->ss, _p(regs->rsp), dr6);
+
         goto out;
     }
 

[-- Attachment #15: xsa260-4.8/xsa260-1.patch --]
[-- Type: application/octet-stream, Size: 2805 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Fix %dr6 handing in #DB handler

Most bits in %dr6 accumulate, rather than being set directly based on the
current source of #DB.  Have the handler follow the manuals guidance, which
avoids leaking hypervisor debugging activities into guest context.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -4038,11 +4038,36 @@ static void ler_enable(void)
 
 void do_debug(struct cpu_user_regs *regs)
 {
+    unsigned long dr6;
     struct vcpu *v = current;
 
+    /* Stash dr6 as early as possible. */
+    dr6 = read_debugreg(6);
+
     if ( debugger_trap_entry(TRAP_debug, regs) )
         return;
 
+    /*
+     * At the time of writing (March 2018), on the subject of %dr6:
+     *
+     * The Intel manual says:
+     *   Certain debug exceptions may clear bits 0-3. The remaining contents
+     *   of the DR6 register are never cleared by the processor. To avoid
+     *   confusion in identifying debug exceptions, debug handlers should
+     *   clear the register (except bit 16, which they should set) before
+     *   returning to the interrupted task.
+     *
+     * The AMD manual says:
+     *   Bits 15:13 of the DR6 register are not cleared by the processor and
+     *   must be cleared by software after the contents have been read.
+     *
+     * Some bits are reserved set, some are reserved clear, and some bits
+     * which were previously reserved set are reused and cleared by hardware.
+     * For future compatibility, reset to the default value, which will allow
+     * us to spot any bit being changed by hardware to its non-default value.
+     */
+    write_debugreg(6, X86_DR6_DEFAULT);
+
     if ( !guest_mode(regs) )
     {
         if ( regs->eflags & X86_EFLAGS_TF )
@@ -4075,7 +4100,8 @@ void do_debug(struct cpu_user_regs *regs
     }
 
     /* Save debug status register where guest OS can peek at it */
-    v->arch.debugreg[6] = read_debugreg(6);
+    v->arch.debugreg[6] |= (dr6 & ~X86_DR6_DEFAULT);
+    v->arch.debugreg[6] &= (dr6 | ~X86_DR6_DEFAULT);
 
     ler_enable();
     do_guest_trap(TRAP_debug, regs);
--- a/xen/include/asm-x86/debugreg.h
+++ b/xen/include/asm-x86/debugreg.h
@@ -24,6 +24,8 @@
 #define DR_STATUS_RESERVED_ZERO (~0xffffeffful) /* Reserved, read as zero */
 #define DR_STATUS_RESERVED_ONE  0xffff0ff0ul /* Reserved, read as one */
 
+#define X86_DR6_DEFAULT 0xffff0ff0ul    /* Default %dr6 value. */
+
 /* Now define a bunch of things for manipulating the control register.
    The top two bytes of the control register consist of 4 fields of 4
    bits - each field corresponds to one of the four debug registers,

[-- Attachment #16: xsa260-4.8/xsa260-2.patch --]
[-- Type: application/octet-stream, Size: 3614 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/pv: Move exception injection into {,compat_}test_all_events()

This allows paths to jump straight to {,compat_}test_all_events() and have
injection of pending exceptions happen automatically, rather than requiring
all calling paths to handle exceptions themselves.

The normal exception path is simplified as a result, and
compat_post_handle_exception() is removed entirely.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -45,6 +45,12 @@ ENTRY(compat_test_all_events)
         leaq  irq_stat+IRQSTAT_softirq_pending(%rip),%rcx
         cmpl  $0,(%rcx,%rax,1)
         jne   compat_process_softirqs
+
+        /* Inject exception if pending. */
+        lea   VCPU_trap_bounce(%rbx), %rdx
+        testb $TBF_EXCEPTION, TRAPBOUNCE_flags(%rdx)
+        jnz   .Lcompat_process_trapbounce
+
         testb $1,VCPU_mce_pending(%rbx)
         jnz   compat_process_mce
 .Lcompat_test_guest_nmi:
@@ -74,6 +80,15 @@ compat_process_softirqs:
         call  do_softirq
         jmp   compat_test_all_events
 
+        ALIGN
+/* %rbx: struct vcpu, %rdx: struct trap_bounce */
+.Lcompat_process_trapbounce:
+        sti
+.Lcompat_bounce_exception:
+        call  compat_create_bounce_frame
+        movb  $0, TRAPBOUNCE_flags(%rdx)
+        jmp   compat_test_all_events
+
 	ALIGN
 /* %rbx: struct vcpu */
 compat_process_mce:
@@ -194,15 +209,6 @@ ENTRY(cr4_pv32_restore)
         xor   %eax, %eax
         ret
 
-/* %rdx: trap_bounce, %rbx: struct vcpu */
-ENTRY(compat_post_handle_exception)
-        testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
-        jz    compat_test_all_events
-.Lcompat_bounce_exception:
-        call  compat_create_bounce_frame
-        movb  $0,TRAPBOUNCE_flags(%rdx)
-        jmp   compat_test_all_events
-
         .section .text.entry, "ax", @progbits
 
 /* See lstar_enter for entry register state. */
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -41,6 +41,12 @@ test_all_events:
         leaq  irq_stat+IRQSTAT_softirq_pending(%rip), %rcx
         cmpl  $0, (%rcx, %rax, 1)
         jne   process_softirqs
+
+        /* Inject exception if pending. */
+        lea   VCPU_trap_bounce(%rbx), %rdx
+        testb $TBF_EXCEPTION, TRAPBOUNCE_flags(%rdx)
+        jnz   .Lprocess_trapbounce
+
         cmpb  $0, VCPU_mce_pending(%rbx)
         jne   process_mce
 .Ltest_guest_nmi:
@@ -69,6 +75,15 @@ process_softirqs:
         jmp  test_all_events
 
         ALIGN
+/* %rbx: struct vcpu, %rdx struct trap_bounce */
+.Lprocess_trapbounce:
+        sti
+.Lbounce_exception:
+        call  create_bounce_frame
+        movb  $0, TRAPBOUNCE_flags(%rdx)
+        jmp   test_all_events
+
+        ALIGN
 /* %rbx: struct vcpu */
 process_mce:
         testb $1 << VCPU_TRAP_MCE, VCPU_async_exception_mask(%rbx)
@@ -659,15 +674,9 @@ handle_exception_saved:
         mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
         testb $3,UREGS_cs(%rsp)
         jz    restore_all_xen
-        leaq  VCPU_trap_bounce(%rbx),%rdx
         movq  VCPU_domain(%rbx),%rax
         testb $1,DOMAIN_is_32bit_pv(%rax)
-        jnz   compat_post_handle_exception
-        testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
-        jz    test_all_events
-.Lbounce_exception:
-        call  create_bounce_frame
-        movb  $0,TRAPBOUNCE_flags(%rdx)
+        jnz   compat_test_all_events
         jmp   test_all_events
 
 /* No special register assumptions. */

[-- Attachment #17: xsa260-4.8/xsa260-3.patch --]
[-- Type: application/octet-stream, Size: 5017 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Use an Interrupt Stack Table for #DB

PV guests can use architectural corner cases to cause #DB to be raised after
transitioning into supervisor mode.

Use an interrupt stack table for #DB to prevent the exception being taken with
a guest controlled stack pointer.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -681,6 +681,7 @@ void load_system_tables(void)
 	tss->ist[IST_MCE - 1] = stack_top + IST_MCE * PAGE_SIZE;
 	tss->ist[IST_DF  - 1] = stack_top + IST_DF  * PAGE_SIZE;
 	tss->ist[IST_NMI - 1] = stack_top + IST_NMI * PAGE_SIZE;
+	tss->ist[IST_DB  - 1] = stack_top + IST_DB  * PAGE_SIZE;
 
 	_set_tssldt_desc(
 		gdt + TSS_ENTRY,
@@ -701,6 +702,7 @@ void load_system_tables(void)
 	set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_DF);
 	set_ist(&idt_tables[cpu][TRAP_nmi],	      IST_NMI);
 	set_ist(&idt_tables[cpu][TRAP_machine_check], IST_MCE);
+	set_ist(&idt_tables[cpu][TRAP_debug],         IST_DB);
 
 	/*
 	 * Bottom-of-stack must be 16-byte aligned!
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1042,6 +1042,7 @@ static void svm_ctxt_switch_from(struct
     set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_DF);
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NMI);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_MCE);
+    set_ist(&idt_tables[cpu][TRAP_debug],         IST_DB);
 }
 
 static void svm_ctxt_switch_to(struct vcpu *v)
@@ -1063,6 +1064,7 @@ static void svm_ctxt_switch_to(struct vc
     set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_NONE);
+    set_ist(&idt_tables[cpu][TRAP_debug],         IST_NONE);
 
     svm_restore_dr(v);
 
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -960,6 +960,7 @@ static int cpu_smpboot_alloc(unsigned in
     set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_NONE);
+    set_ist(&idt_tables[cpu][TRAP_debug],         IST_NONE);
 
     for ( stub_page = 0, i = cpu & ~(STUBS_PER_PAGE - 1);
           i < nr_cpu_ids && i <= (cpu | (STUBS_PER_PAGE - 1)); ++i )
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -259,13 +259,13 @@ static void show_guest_stack(struct vcpu
 /*
  * Notes for get_stack_trace_bottom() and get_stack_dump_bottom()
  *
- * Stack pages 0, 1 and 2:
+ * Stack pages 0 - 3:
  *   These are all 1-page IST stacks.  Each of these stacks have an exception
  *   frame and saved register state at the top.  The interesting bound for a
  *   trace is the word adjacent to this, while the bound for a dump is the
  *   very top, including the exception frame.
  *
- * Stack pages 3, 4 and 5:
+ * Stack pages 4 and 5:
  *   None of these are particularly interesting.  With MEMORY_GUARD, page 5 is
  *   explicitly not present, so attempting to dump or trace it is
  *   counterproductive.  Without MEMORY_GUARD, it is possible for a call chain
@@ -286,12 +286,12 @@ unsigned long get_stack_trace_bottom(uns
 {
     switch ( get_stack_page(sp) )
     {
-    case 0 ... 2:
+    case 0 ... 3:
         return ROUNDUP(sp, PAGE_SIZE) -
             offsetof(struct cpu_user_regs, es) - sizeof(unsigned long);
 
 #ifndef MEMORY_GUARD
-    case 3 ... 5:
+    case 4 ... 5:
 #endif
     case 6 ... 7:
         return ROUNDUP(sp, STACK_SIZE) -
@@ -306,11 +306,11 @@ unsigned long get_stack_dump_bottom(unsi
 {
     switch ( get_stack_page(sp) )
     {
-    case 0 ... 2:
+    case 0 ... 3:
         return ROUNDUP(sp, PAGE_SIZE) - sizeof(unsigned long);
 
 #ifndef MEMORY_GUARD
-    case 3 ... 5:
+    case 4 ... 5:
 #endif
     case 6 ... 7:
         return ROUNDUP(sp, STACK_SIZE) - sizeof(unsigned long);
@@ -4219,6 +4219,7 @@ void __init init_idt_traps(void)
     set_ist(&idt_table[TRAP_double_fault],  IST_DF);
     set_ist(&idt_table[TRAP_nmi],           IST_NMI);
     set_ist(&idt_table[TRAP_machine_check], IST_MCE);
+    set_ist(&idt_table[TRAP_debug],         IST_DB);
 
     /* CPU0 uses the master IDT. */
     idt_tables[0] = idt_table;
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -731,7 +731,7 @@ ENTRY(device_not_available)
 ENTRY(debug)
         pushq $0
         movl  $TRAP_debug,4(%rsp)
-        jmp   handle_exception
+        jmp   handle_ist_exception
 
 ENTRY(int3)
         pushq $0
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -505,7 +505,8 @@ struct __packed __cacheline_aligned tss_
 #define IST_DF   1UL
 #define IST_NMI  2UL
 #define IST_MCE  3UL
-#define IST_MAX  3UL
+#define IST_DB   4UL
+#define IST_MAX  4UL
 
 /* Set the interrupt stack table used by a particular interrupt
  * descriptor table entry. */

[-- Attachment #18: xsa260-4.8/xsa260-4.patch --]
[-- Type: application/octet-stream, Size: 2809 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Fix handling of #DB exceptions in hypervisor context

The WARN_ON() can be triggered by guest activities, and emits a full stack
trace without rate limiting.  Swap it out for a ratelimited printk with just
enough information to work out what is going on.

Not all #DB exceptions are traps, so blindly continuing is not a safe action
to take.  We don't let PV guests select these settings in the real %dr7 to
begin with, but for added safety against unexpected situations, detect the
fault cases and crash in an obvious manner.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -4086,16 +4086,44 @@ void do_debug(struct cpu_user_regs *regs
                 regs->eflags &= ~X86_EFLAGS_TF;
             }
         }
-        else
+
+        /*
+         * Check for fault conditions.  General Detect, and instruction
+         * breakpoints are faults rather than traps, at which point attempting
+         * to ignore and continue will result in a livelock.
+         */
+        if ( dr6 & DR_GENERAL_DETECT )
+        {
+            printk(XENLOG_ERR "Hit General Detect in Xen context\n");
+            fatal_trap(regs, 0);
+        }
+
+        if ( dr6 & (DR_TRAP3 | DR_TRAP2 | DR_TRAP1 | DR_TRAP0) )
         {
-            /*
-             * We ignore watchpoints when they trigger within Xen. This may
-             * happen when a buffer is passed to us which previously had a
-             * watchpoint set on it. No need to bump EIP; the only faulting
-             * trap is an instruction breakpoint, which can't happen to us.
-             */
-            WARN_ON(!search_exception_table(regs->eip));
+            unsigned int bp, dr7 = read_debugreg(7) >> DR_CONTROL_SHIFT;
+
+            for ( bp = 0; bp < 4; ++bp )
+            {
+                if ( (dr6 & (1u << bp)) && /* Breakpoint triggered? */
+                     ((dr7 & (3u << (bp * DR_CONTROL_SIZE))) == 0) /* Insn? */ )
+                {
+                    printk(XENLOG_ERR
+                           "Hit instruction breakpoint in Xen context\n");
+                    fatal_trap(regs, 0);
+                }
+            }
         }
+
+        /*
+         * Whatever caused this #DB should be a trap.  Note it and continue.
+         * Guests can trigger this in certain corner cases, so ensure the
+         * message is ratelimited.
+         */
+        gprintk(XENLOG_WARNING,
+                "Hit #DB in Xen context: %04x:%p [%ps], stk %04x:%p, dr6 %lx\n",
+                regs->cs, _p(regs->rip), _p(regs->rip),
+                regs->ss, _p(regs->rsp), dr6);
+
         goto out;
     }
 

[-- Attachment #19: xsa260-4.9/xsa260-1.patch --]
[-- Type: application/octet-stream, Size: 2805 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Fix %dr6 handing in #DB handler

Most bits in %dr6 accumulate, rather than being set directly based on the
current source of #DB.  Have the handler follow the manuals guidance, which
avoids leaking hypervisor debugging activities into guest context.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -3766,11 +3766,36 @@ static void ler_enable(void)
 
 void do_debug(struct cpu_user_regs *regs)
 {
+    unsigned long dr6;
     struct vcpu *v = current;
 
+    /* Stash dr6 as early as possible. */
+    dr6 = read_debugreg(6);
+
     if ( debugger_trap_entry(TRAP_debug, regs) )
         return;
 
+    /*
+     * At the time of writing (March 2018), on the subject of %dr6:
+     *
+     * The Intel manual says:
+     *   Certain debug exceptions may clear bits 0-3. The remaining contents
+     *   of the DR6 register are never cleared by the processor. To avoid
+     *   confusion in identifying debug exceptions, debug handlers should
+     *   clear the register (except bit 16, which they should set) before
+     *   returning to the interrupted task.
+     *
+     * The AMD manual says:
+     *   Bits 15:13 of the DR6 register are not cleared by the processor and
+     *   must be cleared by software after the contents have been read.
+     *
+     * Some bits are reserved set, some are reserved clear, and some bits
+     * which were previously reserved set are reused and cleared by hardware.
+     * For future compatibility, reset to the default value, which will allow
+     * us to spot any bit being changed by hardware to its non-default value.
+     */
+    write_debugreg(6, X86_DR6_DEFAULT);
+
     if ( !guest_mode(regs) )
     {
         if ( regs->eflags & X86_EFLAGS_TF )
@@ -3803,7 +3828,8 @@ void do_debug(struct cpu_user_regs *regs
     }
 
     /* Save debug status register where guest OS can peek at it */
-    v->arch.debugreg[6] = read_debugreg(6);
+    v->arch.debugreg[6] |= (dr6 & ~X86_DR6_DEFAULT);
+    v->arch.debugreg[6] &= (dr6 | ~X86_DR6_DEFAULT);
 
     ler_enable();
     do_guest_trap(TRAP_debug, regs);
--- a/xen/include/asm-x86/debugreg.h
+++ b/xen/include/asm-x86/debugreg.h
@@ -24,6 +24,8 @@
 #define DR_STATUS_RESERVED_ZERO (~0xffffeffful) /* Reserved, read as zero */
 #define DR_STATUS_RESERVED_ONE  0xffff0ff0ul /* Reserved, read as one */
 
+#define X86_DR6_DEFAULT 0xffff0ff0ul    /* Default %dr6 value. */
+
 /* Now define a bunch of things for manipulating the control register.
    The top two bytes of the control register consist of 4 fields of 4
    bits - each field corresponds to one of the four debug registers,

[-- Attachment #20: xsa260-4.9/xsa260-2.patch --]
[-- Type: application/octet-stream, Size: 3614 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/pv: Move exception injection into {,compat_}test_all_events()

This allows paths to jump straight to {,compat_}test_all_events() and have
injection of pending exceptions happen automatically, rather than requiring
all calling paths to handle exceptions themselves.

The normal exception path is simplified as a result, and
compat_post_handle_exception() is removed entirely.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -37,6 +37,12 @@ ENTRY(compat_test_all_events)
         leaq  irq_stat+IRQSTAT_softirq_pending(%rip),%rcx
         cmpl  $0,(%rcx,%rax,1)
         jne   compat_process_softirqs
+
+        /* Inject exception if pending. */
+        lea   VCPU_trap_bounce(%rbx), %rdx
+        testb $TBF_EXCEPTION, TRAPBOUNCE_flags(%rdx)
+        jnz   .Lcompat_process_trapbounce
+
         testb $1,VCPU_mce_pending(%rbx)
         jnz   compat_process_mce
 .Lcompat_test_guest_nmi:
@@ -66,6 +72,15 @@ compat_process_softirqs:
         call  do_softirq
         jmp   compat_test_all_events
 
+        ALIGN
+/* %rbx: struct vcpu, %rdx: struct trap_bounce */
+.Lcompat_process_trapbounce:
+        sti
+.Lcompat_bounce_exception:
+        call  compat_create_bounce_frame
+        movb  $0, TRAPBOUNCE_flags(%rdx)
+        jmp   compat_test_all_events
+
 	ALIGN
 /* %rbx: struct vcpu */
 compat_process_mce:
@@ -186,15 +201,6 @@ ENTRY(cr4_pv32_restore)
         xor   %eax, %eax
         ret
 
-/* %rdx: trap_bounce, %rbx: struct vcpu */
-ENTRY(compat_post_handle_exception)
-        testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
-        jz    compat_test_all_events
-.Lcompat_bounce_exception:
-        call  compat_create_bounce_frame
-        movb  $0,TRAPBOUNCE_flags(%rdx)
-        jmp   compat_test_all_events
-
         .section .text.entry, "ax", @progbits
 
 /* See lstar_enter for entry register state. */
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -40,6 +40,12 @@ test_all_events:
         leaq  irq_stat+IRQSTAT_softirq_pending(%rip), %rcx
         cmpl  $0, (%rcx, %rax, 1)
         jne   process_softirqs
+
+        /* Inject exception if pending. */
+        lea   VCPU_trap_bounce(%rbx), %rdx
+        testb $TBF_EXCEPTION, TRAPBOUNCE_flags(%rdx)
+        jnz   .Lprocess_trapbounce
+
         cmpb  $0, VCPU_mce_pending(%rbx)
         jne   process_mce
 .Ltest_guest_nmi:
@@ -68,6 +74,15 @@ process_softirqs:
         jmp  test_all_events
 
         ALIGN
+/* %rbx: struct vcpu, %rdx struct trap_bounce */
+.Lprocess_trapbounce:
+        sti
+.Lbounce_exception:
+        call  create_bounce_frame
+        movb  $0, TRAPBOUNCE_flags(%rdx)
+        jmp   test_all_events
+
+        ALIGN
 /* %rbx: struct vcpu */
 process_mce:
         testb $1 << VCPU_TRAP_MCE, VCPU_async_exception_mask(%rbx)
@@ -664,15 +679,9 @@ handle_exception_saved:
         mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
         testb $3,UREGS_cs(%rsp)
         jz    restore_all_xen
-        leaq  VCPU_trap_bounce(%rbx),%rdx
         movq  VCPU_domain(%rbx),%rax
         testb $1,DOMAIN_is_32bit_pv(%rax)
-        jnz   compat_post_handle_exception
-        testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
-        jz    test_all_events
-.Lbounce_exception:
-        call  create_bounce_frame
-        movb  $0,TRAPBOUNCE_flags(%rdx)
+        jnz   compat_test_all_events
         jmp   test_all_events
 
 /* No special register assumptions. */

[-- Attachment #21: xsa260-4.9/xsa260-3.patch --]
[-- Type: application/octet-stream, Size: 5025 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Use an Interrupt Stack Table for #DB

PV guests can use architectural corner cases to cause #DB to be raised after
transitioning into supervisor mode.

Use an interrupt stack table for #DB to prevent the exception being taken with
a guest controlled stack pointer.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -679,6 +679,7 @@ void load_system_tables(void)
 			[IST_MCE - 1] = stack_top + IST_MCE * PAGE_SIZE,
 			[IST_DF  - 1] = stack_top + IST_DF  * PAGE_SIZE,
 			[IST_NMI - 1] = stack_top + IST_NMI * PAGE_SIZE,
+			[IST_DB  - 1] = stack_top + IST_DB  * PAGE_SIZE,
 
 			[IST_MAX ... ARRAY_SIZE(tss->ist) - 1] =
 				0x8600111111111111ul,
@@ -706,6 +707,7 @@ void load_system_tables(void)
 	set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_DF);
 	set_ist(&idt_tables[cpu][TRAP_nmi],	      IST_NMI);
 	set_ist(&idt_tables[cpu][TRAP_machine_check], IST_MCE);
+	set_ist(&idt_tables[cpu][TRAP_debug],         IST_DB);
 
 	/*
 	 * Bottom-of-stack must be 16-byte aligned!
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1048,6 +1048,7 @@ static void svm_ctxt_switch_from(struct
     set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_DF);
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NMI);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_MCE);
+    set_ist(&idt_tables[cpu][TRAP_debug],         IST_DB);
 }
 
 static void svm_ctxt_switch_to(struct vcpu *v)
@@ -1069,6 +1070,7 @@ static void svm_ctxt_switch_to(struct vc
     set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_NONE);
+    set_ist(&idt_tables[cpu][TRAP_debug],         IST_NONE);
 
     svm_restore_dr(v);
 
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -964,6 +964,7 @@ static int cpu_smpboot_alloc(unsigned in
     set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_NONE);
+    set_ist(&idt_tables[cpu][TRAP_debug],         IST_NONE);
 
     for ( stub_page = 0, i = cpu & ~(STUBS_PER_PAGE - 1);
           i < nr_cpu_ids && i <= (cpu | (STUBS_PER_PAGE - 1)); ++i )
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -259,13 +259,13 @@ static void show_guest_stack(struct vcpu
 /*
  * Notes for get_stack_trace_bottom() and get_stack_dump_bottom()
  *
- * Stack pages 0, 1 and 2:
+ * Stack pages 0 - 3:
  *   These are all 1-page IST stacks.  Each of these stacks have an exception
  *   frame and saved register state at the top.  The interesting bound for a
  *   trace is the word adjacent to this, while the bound for a dump is the
  *   very top, including the exception frame.
  *
- * Stack pages 3, 4 and 5:
+ * Stack pages 4 and 5:
  *   None of these are particularly interesting.  With MEMORY_GUARD, page 5 is
  *   explicitly not present, so attempting to dump or trace it is
  *   counterproductive.  Without MEMORY_GUARD, it is possible for a call chain
@@ -286,12 +286,12 @@ unsigned long get_stack_trace_bottom(uns
 {
     switch ( get_stack_page(sp) )
     {
-    case 0 ... 2:
+    case 0 ... 3:
         return ROUNDUP(sp, PAGE_SIZE) -
             offsetof(struct cpu_user_regs, es) - sizeof(unsigned long);
 
 #ifndef MEMORY_GUARD
-    case 3 ... 5:
+    case 4 ... 5:
 #endif
     case 6 ... 7:
         return ROUNDUP(sp, STACK_SIZE) -
@@ -306,11 +306,11 @@ unsigned long get_stack_dump_bottom(unsi
 {
     switch ( get_stack_page(sp) )
     {
-    case 0 ... 2:
+    case 0 ... 3:
         return ROUNDUP(sp, PAGE_SIZE) - sizeof(unsigned long);
 
 #ifndef MEMORY_GUARD
-    case 3 ... 5:
+    case 4 ... 5:
 #endif
     case 6 ... 7:
         return ROUNDUP(sp, STACK_SIZE) - sizeof(unsigned long);
@@ -3947,6 +3947,7 @@ void __init init_idt_traps(void)
     set_ist(&idt_table[TRAP_double_fault],  IST_DF);
     set_ist(&idt_table[TRAP_nmi],           IST_NMI);
     set_ist(&idt_table[TRAP_machine_check], IST_MCE);
+    set_ist(&idt_table[TRAP_debug],         IST_DB);
 
     /* CPU0 uses the master IDT. */
     idt_tables[0] = idt_table;
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -736,7 +736,7 @@ ENTRY(device_not_available)
 ENTRY(debug)
         pushq $0
         movl  $TRAP_debug,4(%rsp)
-        jmp   handle_exception
+        jmp   handle_ist_exception
 
 ENTRY(int3)
         pushq $0
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -447,7 +447,8 @@ struct __packed __cacheline_aligned tss_
 #define IST_DF   1UL
 #define IST_NMI  2UL
 #define IST_MCE  3UL
-#define IST_MAX  3UL
+#define IST_DB   4UL
+#define IST_MAX  4UL
 
 /* Set the interrupt stack table used by a particular interrupt
  * descriptor table entry. */

[-- Attachment #22: xsa260-4.9/xsa260-4.patch --]
[-- Type: application/octet-stream, Size: 2804 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Fix handling of #DB exceptions in hypervisor context

The WARN_ON() can be triggered by guest activities, and emits a full stack
trace without rate limiting.  Swap it out for a ratelimited printk with just
enough information to work out what is going on.

Not all #DB exceptions are traps, so blindly continuing is not a safe action
to take.  We don't let PV guests select these settings in the real %dr7 to
begin with, but for added safety against unexpected situations, detect the
fault cases and crash in an obvious manner.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -3814,16 +3814,44 @@ void do_debug(struct cpu_user_regs *regs
                 regs->eflags &= ~X86_EFLAGS_TF;
             }
         }
-        else
+
+        /*
+         * Check for fault conditions.  General Detect, and instruction
+         * breakpoints are faults rather than traps, at which point attempting
+         * to ignore and continue will result in a livelock.
+         */
+        if ( dr6 & DR_GENERAL_DETECT )
+        {
+            printk(XENLOG_ERR "Hit General Detect in Xen context\n");
+            fatal_trap(regs, 0);
+        }
+
+        if ( dr6 & (DR_TRAP3 | DR_TRAP2 | DR_TRAP1 | DR_TRAP0) )
         {
-            /*
-             * We ignore watchpoints when they trigger within Xen. This may
-             * happen when a buffer is passed to us which previously had a
-             * watchpoint set on it. No need to bump EIP; the only faulting
-             * trap is an instruction breakpoint, which can't happen to us.
-             */
-            WARN_ON(!search_exception_table(regs));
+            unsigned int bp, dr7 = read_debugreg(7) >> DR_CONTROL_SHIFT;
+
+            for ( bp = 0; bp < 4; ++bp )
+            {
+                if ( (dr6 & (1u << bp)) && /* Breakpoint triggered? */
+                     ((dr7 & (3u << (bp * DR_CONTROL_SIZE))) == 0) /* Insn? */ )
+                {
+                    printk(XENLOG_ERR
+                           "Hit instruction breakpoint in Xen context\n");
+                    fatal_trap(regs, 0);
+                }
+            }
         }
+
+        /*
+         * Whatever caused this #DB should be a trap.  Note it and continue.
+         * Guests can trigger this in certain corner cases, so ensure the
+         * message is ratelimited.
+         */
+        gprintk(XENLOG_WARNING,
+                "Hit #DB in Xen context: %04x:%p [%ps], stk %04x:%p, dr6 %lx\n",
+                regs->cs, _p(regs->rip), _p(regs->rip),
+                regs->ss, _p(regs->rsp), dr6);
+
         goto out;
     }
 

[-- Attachment #23: xsa260-4.10/xsa260-1.patch --]
[-- Type: application/octet-stream, Size: 2969 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Fix %dr6 handing in #DB handler

Most bits in %dr6 accumulate, rather than being set directly based on the
current source of #DB.  Have the handler follow the manuals guidance, which
avoids leaking hypervisor debugging activities into guest context.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/traps.c	2018-04-13 15:29:36.006747135 +0200
+++ b/xen/arch/x86/traps.c	2018-04-13 15:44:57.015516185 +0200
@@ -1761,11 +1761,36 @@ static void ler_enable(void)
 
 void do_debug(struct cpu_user_regs *regs)
 {
+    unsigned long dr6;
     struct vcpu *v = current;
 
+    /* Stash dr6 as early as possible. */
+    dr6 = read_debugreg(6);
+
     if ( debugger_trap_entry(TRAP_debug, regs) )
         return;
 
+    /*
+     * At the time of writing (March 2018), on the subject of %dr6:
+     *
+     * The Intel manual says:
+     *   Certain debug exceptions may clear bits 0-3. The remaining contents
+     *   of the DR6 register are never cleared by the processor. To avoid
+     *   confusion in identifying debug exceptions, debug handlers should
+     *   clear the register (except bit 16, which they should set) before
+     *   returning to the interrupted task.
+     *
+     * The AMD manual says:
+     *   Bits 15:13 of the DR6 register are not cleared by the processor and
+     *   must be cleared by software after the contents have been read.
+     *
+     * Some bits are reserved set, some are reserved clear, and some bits
+     * which were previously reserved set are reused and cleared by hardware.
+     * For future compatibility, reset to the default value, which will allow
+     * us to spot any bit being changed by hardware to its non-default value.
+     */
+    write_debugreg(6, X86_DR6_DEFAULT);
+
     if ( !guest_mode(regs) )
     {
         if ( regs->eflags & X86_EFLAGS_TF )
@@ -1798,7 +1823,8 @@ void do_debug(struct cpu_user_regs *regs
     }
 
     /* Save debug status register where guest OS can peek at it */
-    v->arch.debugreg[6] = read_debugreg(6);
+    v->arch.debugreg[6] |= (dr6 & ~X86_DR6_DEFAULT);
+    v->arch.debugreg[6] &= (dr6 | ~X86_DR6_DEFAULT);
 
     ler_enable();
     pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
--- a/xen/include/asm-x86/debugreg.h	2015-02-11 09:36:29.000000000 +0100
+++ b/xen/include/asm-x86/debugreg.h	2018-04-13 15:44:57.015516185 +0200
@@ -24,6 +24,8 @@
 #define DR_STATUS_RESERVED_ZERO (~0xffffeffful) /* Reserved, read as zero */
 #define DR_STATUS_RESERVED_ONE  0xffff0ff0ul /* Reserved, read as one */
 
+#define X86_DR6_DEFAULT 0xffff0ff0ul    /* Default %dr6 value. */
+
 /* Now define a bunch of things for manipulating the control register.
    The top two bytes of the control register consist of 4 fields of 4
    bits - each field corresponds to one of the four debug registers,

[-- Attachment #24: xsa260-4.10/xsa260-2.patch --]
[-- Type: application/octet-stream, Size: 3614 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/pv: Move exception injection into {,compat_}test_all_events()

This allows paths to jump straight to {,compat_}test_all_events() and have
injection of pending exceptions happen automatically, rather than requiring
all calling paths to handle exceptions themselves.

The normal exception path is simplified as a result, and
compat_post_handle_exception() is removed entirely.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -39,6 +39,12 @@ ENTRY(compat_test_all_events)
         leaq  irq_stat+IRQSTAT_softirq_pending(%rip),%rcx
         cmpl  $0,(%rcx,%rax,1)
         jne   compat_process_softirqs
+
+        /* Inject exception if pending. */
+        lea   VCPU_trap_bounce(%rbx), %rdx
+        testb $TBF_EXCEPTION, TRAPBOUNCE_flags(%rdx)
+        jnz   .Lcompat_process_trapbounce
+
         testb $1,VCPU_mce_pending(%rbx)
         jnz   compat_process_mce
 .Lcompat_test_guest_nmi:
@@ -68,6 +74,15 @@ compat_process_softirqs:
         call  do_softirq
         jmp   compat_test_all_events
 
+        ALIGN
+/* %rbx: struct vcpu, %rdx: struct trap_bounce */
+.Lcompat_process_trapbounce:
+        sti
+.Lcompat_bounce_exception:
+        call  compat_create_bounce_frame
+        movb  $0, TRAPBOUNCE_flags(%rdx)
+        jmp   compat_test_all_events
+
 	ALIGN
 /* %rbx: struct vcpu */
 compat_process_mce:
@@ -189,15 +204,6 @@ ENTRY(cr4_pv32_restore)
         xor   %eax, %eax
         ret
 
-/* %rdx: trap_bounce, %rbx: struct vcpu */
-ENTRY(compat_post_handle_exception)
-        testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
-        jz    compat_test_all_events
-.Lcompat_bounce_exception:
-        call  compat_create_bounce_frame
-        movb  $0,TRAPBOUNCE_flags(%rdx)
-        jmp   compat_test_all_events
-
         .section .text.entry, "ax", @progbits
 
 /* See lstar_enter for entry register state. */
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -42,6 +42,12 @@ test_all_events:
         leaq  irq_stat+IRQSTAT_softirq_pending(%rip), %rcx
         cmpl  $0, (%rcx, %rax, 1)
         jne   process_softirqs
+
+        /* Inject exception if pending. */
+        lea   VCPU_trap_bounce(%rbx), %rdx
+        testb $TBF_EXCEPTION, TRAPBOUNCE_flags(%rdx)
+        jnz   .Lprocess_trapbounce
+
         cmpb  $0, VCPU_mce_pending(%rbx)
         jne   process_mce
 .Ltest_guest_nmi:
@@ -70,6 +76,15 @@ process_softirqs:
         jmp  test_all_events
 
         ALIGN
+/* %rbx: struct vcpu, %rdx struct trap_bounce */
+.Lprocess_trapbounce:
+        sti
+.Lbounce_exception:
+        call  create_bounce_frame
+        movb  $0, TRAPBOUNCE_flags(%rdx)
+        jmp   test_all_events
+
+        ALIGN
 /* %rbx: struct vcpu */
 process_mce:
         testb $1 << VCPU_TRAP_MCE, VCPU_async_exception_mask(%rbx)
@@ -667,15 +682,9 @@ handle_exception_saved:
         mov   %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
         testb $3,UREGS_cs(%rsp)
         jz    restore_all_xen
-        leaq  VCPU_trap_bounce(%rbx),%rdx
         movq  VCPU_domain(%rbx),%rax
         testb $1,DOMAIN_is_32bit_pv(%rax)
-        jnz   compat_post_handle_exception
-        testb $TBF_EXCEPTION,TRAPBOUNCE_flags(%rdx)
-        jz    test_all_events
-.Lbounce_exception:
-        call  create_bounce_frame
-        movb  $0,TRAPBOUNCE_flags(%rdx)
+        jnz   compat_test_all_events
         jmp   test_all_events
 
 /* No special register assumptions. */

[-- Attachment #25: xsa260-4.10/xsa260-3.patch --]
[-- Type: application/octet-stream, Size: 5025 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Use an Interrupt Stack Table for #DB

PV guests can use architectural corner cases to cause #DB to be raised after
transitioning into supervisor mode.

Use an interrupt stack table for #DB to prevent the exception being taken with
a guest controlled stack pointer.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -679,6 +679,7 @@ void load_system_tables(void)
 			[IST_MCE - 1] = stack_top + IST_MCE * PAGE_SIZE,
 			[IST_DF  - 1] = stack_top + IST_DF  * PAGE_SIZE,
 			[IST_NMI - 1] = stack_top + IST_NMI * PAGE_SIZE,
+			[IST_DB  - 1] = stack_top + IST_DB  * PAGE_SIZE,
 
 			[IST_MAX ... ARRAY_SIZE(tss->ist) - 1] =
 				0x8600111111111111ul,
@@ -706,6 +707,7 @@ void load_system_tables(void)
 	set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_DF);
 	set_ist(&idt_tables[cpu][TRAP_nmi],	      IST_NMI);
 	set_ist(&idt_tables[cpu][TRAP_machine_check], IST_MCE);
+	set_ist(&idt_tables[cpu][TRAP_debug],         IST_DB);
 
 	/*
 	 * Bottom-of-stack must be 16-byte aligned!
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1046,6 +1046,7 @@ static void svm_ctxt_switch_from(struct
     set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_DF);
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NMI);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_MCE);
+    set_ist(&idt_tables[cpu][TRAP_debug],         IST_DB);
 }
 
 static void svm_ctxt_switch_to(struct vcpu *v)
@@ -1067,6 +1068,7 @@ static void svm_ctxt_switch_to(struct vc
     set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_NONE);
+    set_ist(&idt_tables[cpu][TRAP_debug],         IST_NONE);
 
     svm_restore_dr(v);
 
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -964,6 +964,7 @@ static int cpu_smpboot_alloc(unsigned in
     set_ist(&idt_tables[cpu][TRAP_double_fault],  IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_nmi],           IST_NONE);
     set_ist(&idt_tables[cpu][TRAP_machine_check], IST_NONE);
+    set_ist(&idt_tables[cpu][TRAP_debug],         IST_NONE);
 
     for ( stub_page = 0, i = cpu & ~(STUBS_PER_PAGE - 1);
           i < nr_cpu_ids && i <= (cpu | (STUBS_PER_PAGE - 1)); ++i )
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -325,13 +325,13 @@ static void show_guest_stack(struct vcpu
 /*
  * Notes for get_stack_trace_bottom() and get_stack_dump_bottom()
  *
- * Stack pages 0, 1 and 2:
+ * Stack pages 0 - 3:
  *   These are all 1-page IST stacks.  Each of these stacks have an exception
  *   frame and saved register state at the top.  The interesting bound for a
  *   trace is the word adjacent to this, while the bound for a dump is the
  *   very top, including the exception frame.
  *
- * Stack pages 3, 4 and 5:
+ * Stack pages 4 and 5:
  *   None of these are particularly interesting.  With MEMORY_GUARD, page 5 is
  *   explicitly not present, so attempting to dump or trace it is
  *   counterproductive.  Without MEMORY_GUARD, it is possible for a call chain
@@ -352,12 +352,12 @@ unsigned long get_stack_trace_bottom(uns
 {
     switch ( get_stack_page(sp) )
     {
-    case 0 ... 2:
+    case 0 ... 3:
         return ROUNDUP(sp, PAGE_SIZE) -
             offsetof(struct cpu_user_regs, es) - sizeof(unsigned long);
 
 #ifndef MEMORY_GUARD
-    case 3 ... 5:
+    case 4 ... 5:
 #endif
     case 6 ... 7:
         return ROUNDUP(sp, STACK_SIZE) -
@@ -372,11 +372,11 @@ unsigned long get_stack_dump_bottom(unsi
 {
     switch ( get_stack_page(sp) )
     {
-    case 0 ... 2:
+    case 0 ... 3:
         return ROUNDUP(sp, PAGE_SIZE) - sizeof(unsigned long);
 
 #ifndef MEMORY_GUARD
-    case 3 ... 5:
+    case 4 ... 5:
 #endif
     case 6 ... 7:
         return ROUNDUP(sp, STACK_SIZE) - sizeof(unsigned long);
@@ -1943,6 +1943,7 @@ void __init init_idt_traps(void)
     set_ist(&idt_table[TRAP_double_fault],  IST_DF);
     set_ist(&idt_table[TRAP_nmi],           IST_NMI);
     set_ist(&idt_table[TRAP_machine_check], IST_MCE);
+    set_ist(&idt_table[TRAP_debug],         IST_DB);
 
     /* CPU0 uses the master IDT. */
     idt_tables[0] = idt_table;
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -739,7 +739,7 @@ ENTRY(device_not_available)
 ENTRY(debug)
         pushq $0
         movl  $TRAP_debug,4(%rsp)
-        jmp   handle_exception
+        jmp   handle_ist_exception
 
 ENTRY(int3)
         pushq $0
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -443,7 +443,8 @@ struct __packed __cacheline_aligned tss_
 #define IST_DF   1UL
 #define IST_NMI  2UL
 #define IST_MCE  3UL
-#define IST_MAX  3UL
+#define IST_DB   4UL
+#define IST_MAX  4UL
 
 /* Set the interrupt stack table used by a particular interrupt
  * descriptor table entry. */

[-- Attachment #26: xsa260-4.10/xsa260-4.patch --]
[-- Type: application/octet-stream, Size: 2804 bytes --]

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/traps: Fix handling of #DB exceptions in hypervisor context

The WARN_ON() can be triggered by guest activities, and emits a full stack
trace without rate limiting.  Swap it out for a ratelimited printk with just
enough information to work out what is going on.

Not all #DB exceptions are traps, so blindly continuing is not a safe action
to take.  We don't let PV guests select these settings in the real %dr7 to
begin with, but for added safety against unexpected situations, detect the
fault cases and crash in an obvious manner.

This is part of XSA-260 / CVE-2018-8897.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1809,16 +1809,44 @@ void do_debug(struct cpu_user_regs *regs
                 regs->eflags &= ~X86_EFLAGS_TF;
             }
         }
-        else
+
+        /*
+         * Check for fault conditions.  General Detect, and instruction
+         * breakpoints are faults rather than traps, at which point attempting
+         * to ignore and continue will result in a livelock.
+         */
+        if ( dr6 & DR_GENERAL_DETECT )
+        {
+            printk(XENLOG_ERR "Hit General Detect in Xen context\n");
+            fatal_trap(regs, 0);
+        }
+
+        if ( dr6 & (DR_TRAP3 | DR_TRAP2 | DR_TRAP1 | DR_TRAP0) )
         {
-            /*
-             * We ignore watchpoints when they trigger within Xen. This may
-             * happen when a buffer is passed to us which previously had a
-             * watchpoint set on it. No need to bump EIP; the only faulting
-             * trap is an instruction breakpoint, which can't happen to us.
-             */
-            WARN_ON(!search_exception_table(regs));
+            unsigned int bp, dr7 = read_debugreg(7) >> DR_CONTROL_SHIFT;
+
+            for ( bp = 0; bp < 4; ++bp )
+            {
+                if ( (dr6 & (1u << bp)) && /* Breakpoint triggered? */
+                     ((dr7 & (3u << (bp * DR_CONTROL_SIZE))) == 0) /* Insn? */ )
+                {
+                    printk(XENLOG_ERR
+                           "Hit instruction breakpoint in Xen context\n");
+                    fatal_trap(regs, 0);
+                }
+            }
         }
+
+        /*
+         * Whatever caused this #DB should be a trap.  Note it and continue.
+         * Guests can trigger this in certain corner cases, so ensure the
+         * message is ratelimited.
+         */
+        gprintk(XENLOG_WARNING,
+                "Hit #DB in Xen context: %04x:%p [%ps], stk %04x:%p, dr6 %lx\n",
+                regs->cs, _p(regs->rip), _p(regs->rip),
+                regs->ss, _p(regs->rsp), dr6);
+
         goto out;
     }
 

[-- Attachment #27: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] only message in thread