From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH 8/8] xen/x86: Additional SMAP modes to work around buggy 32bit PV guests
Date: Thu, 25 Jun 2015 12:53:36 +0100
Message-ID: <558BEBC0.40901@citrix.com>
References: <1435163500-10589-1-git-send-email-andrew.cooper3@citrix.com> <1435163500-10589-9-git-send-email-andrew.cooper3@citrix.com> <558BE39B.7070409@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <558BE39B.7070409@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: David Vrabel, Xen-devel
Cc: Jan Beulich
List-Id: xen-devel@lists.xenproject.org

On 25/06/15 12:18, David Vrabel wrote:
> On 24/06/15 17:31, Andrew Cooper wrote:
>> Experimentally, older Linux guests perform construction of `init` with user
>> pagetable mappings. This is fine for native systems as such a guest would not
>> set CR4.SMAP itself.
>>
>> However if Xen uses SMAP itself, 32bit PV guests (whose kernels run in ring1)
>> are also affected. Older Linux guests end up spinning in a loop assuming that
>> the SMAP violation pagefaults are spurious, and make no further progress.
>>
>> One option is to disable SMAP completely, but this is unreasonable. A better
>> alternative is to disable SMAP only in the context of 32bit PV guests, but
>> reduces the effectiveness SMAP security. A 3rd option is for Xen to fix up
>> behind a 32bit guest if it were SMAP-aware. It is a heuristic, and does
>> result in a guest-visible state change, but allows Xen to keep CR4.SMAP
>> unconditionally enabled.
> [...]
>> --- a/docs/misc/xen-command-line.markdown
>> +++ b/docs/misc/xen-command-line.markdown
>> @@ -1261,11 +1261,32 @@ Set the serial transmit buffer size.
>> Flag to enable Supervisor Mode Execution Protection
>>
>> ### smap
>> -> `= `
>> +> `= | compat | fixup`
>>
>> > Default: `true`
>>
>> -Flag to enable Supervisor Mode Access Prevention
>> +Handling of Supervisor Mode Access Prevention.
>> +
>> +32bit PV guest kernels qualify as supervisor code, as they execute in ring 1.
>> +If Xen uses SMAP protection itself, a PV guest which is not SMAP aware may
>> +suffer unexpected pagefaults which it cannot handle. (Experimentally, there
>> +are 32bit PV guests which fall foul of SMAP enforcement and spin in an
>> +infinite loop taking pagefaults early on boot.)
>> +
>> +Two further SMAP modes are introduced to work around buggy 32bit PV guests to
>> +prevent functional regressions of VMs on newer hardware. At any point if the
>> +guest sets `CR4.SMAP` itself, it is deemed aware, and **compat/fixup** cease
>> +to apply.
> Guests that is not aware of SMAP or do not support it are not "buggy".

Taking and not understanding a SMAP #PF is understandable. The way it
spins in an infinite loop is unquestionably buggy.

>
>> +
>> +A SMAP mode of **compat** causes Xen to disable `CR4.SMAP` in the context of
>> +an unaware 32bit PV guest. This prevents the guest from being subject to SMAP
>> +enforcement, but also prevents Xen from benefiting from the added security
>> +checks.
>> +
>> +A SMAP mode of **fixup** causes Xen to set `EFLAGS.AC` when discovering a SMAP
>> +pagefault in the context of an unaware 32bit PV guest. This allows Xen to
>> +retain the added security from SMAP checks, but results in a guest-visible
>> +state change which it might object to.
> What does the PV ABI say about the use of EFLAGS.AC? Have guests
> historically been allowed to use this bit? If so, does Xen fiddling
> with it potentially break some guests?

If there were an ABI written down anywhere, I might be able to answer
that question.

32bit PV guest kernels cannot make use of AC themselves; alignment
checking is only available in cpl3.
AC is however able to be changed by a popf instruction even in cpl3
(which makes it very curious as to why stac/clac are strictly cpl0
instructions).

Fundamentally, smap=fixup might indeed break a PV guest, but testing
shows that RHEL/CentOS 5/6, SLES 11/12 and Debian 6/7 PV guests are all
fine with it.

~Andrew
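
For illustration only, the compat and fixup behaviours described in the
quoted documentation could be modelled roughly as below. This is a
sketch, not code from the patch: the enum, the struct and all function
names are hypothetical, and the only architectural facts used are that
EFLAGS.AC is bit 18 and CR4.SMAP is bit 21. It also assumes that
awareness can be read from the guest's current view of CR4, and that a
fault taken with AC already set is not one the heuristic can fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_AC (UINT32_C(1) << 18) /* architectural: EFLAGS bit 18 */
#define X86_CR4_SMAP  (UINT32_C(1) << 21) /* architectural: CR4 bit 21 */

/* Hypothetical model of the smap= command line option. */
enum smap_mode { SMAP_OFF, SMAP_ON, SMAP_COMPAT, SMAP_FIXUP };

/* Hypothetical, minimal state of a 32bit PV guest vcpu. */
struct pv32_guest {
    uint32_t guest_cr4; /* the guest's own view of CR4 */
    uint32_t eflags;    /* guest-visible EFLAGS */
};

/* A guest which has set CR4.SMAP itself is deemed aware.
 * Simplification: this looks at the current value only. */
static bool guest_smap_aware(const struct pv32_guest *g)
{
    assert(g);
    return (g->guest_cr4 & X86_CR4_SMAP) != 0;
}

/* Should hardware CR4.SMAP be set while this guest is in context? */
static bool hw_smap_enabled(enum smap_mode mode, const struct pv32_guest *g)
{
    if (mode == SMAP_OFF)
        return false;
    /* compat: drop SMAP only for guests which are not aware of it. */
    if (mode == SMAP_COMPAT && !guest_smap_aware(g))
        return false;
    return true;
}

/* Called for a SMAP pagefault taken by the guest kernel.  Returns true
 * if the fault was fixed up and the faulting access may be retried. */
static bool fixup_smap_fault(enum smap_mode mode, struct pv32_guest *g)
{
    if (mode != SMAP_FIXUP || guest_smap_aware(g))
        return false;
    /* Assumption of this sketch: with AC already set there is nothing
     * left to fix, so report the fault as unhandled. */
    if (g->eflags & X86_EFLAGS_AC)
        return false;
    g->eflags |= X86_EFLAGS_AC; /* the guest-visible state change */
    return true;
}
```

The point of the sketch is the trade-off: compat changes what Xen runs
with, while fixup changes what the guest sees in EFLAGS.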