From: Andrew Cooper
Subject: Re: Xen PV PTE ABI (or lack thereof)
Date: Thu, 21 Jan 2016 11:16:02 +0000
Message-ID: <56A0BDF2.8030308@citrix.com>
References: <569FE999.2080404@citrix.com> <56A0CA0902000078000C9899@prv-mh.provo.novell.com>
In-Reply-To: <56A0CA0902000078000C9899@prv-mh.provo.novell.com>
To: Jan Beulich
Cc: George Dunlap, Huaitong Han, Tim Deegan, Xen-devel List
List-Id: xen-devel@lists.xenproject.org

On 21/01/16 11:07, Jan Beulich wrote:
>>>> On 20.01.16 at 21:10, wrote:
>> First of all, SMEP and SMAP. 32bit PV guests are subject to Xen's
>> SMEP/SMAP choices, because they run in ring 1.
>>
>> SMAP in particular is problematic because older Linux guests do fall
>> foul of it; they don't understand what a SMAP pagefault is, and enter an
>> infinite loop of pagefaults. SMEP is also problematic because it breaks
>> any guest wishing to use a shared address space between kernel and
>> user. (I had some fun getting the test framework to function until I
>> twigged what was happening.)
>>
>> Both of these are regressions; older guests relying on existing
>> behaviour cease to function on newer hardware/Xen despite identical
>> settings.
> And for both of them there simply should be a way for the guest to
> state whether it's compatible (which should be the case for anything
> we can't deal with completely transparently to guests).
>
>> For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug
>> builds (so there is a guest-observable difference between running on a
>> debug and a non-debug Xen), and the comment beside it even identifies
>> that it breaks BSD guests. PTE bits 62:59 are used by hardware if
>> CR4.PKE is set.
>> Currently this means that we are not able to support Protection
>> Keys for PV guests (although this restriction technically only applies
>> to debug builds of the hypervisor).
>>
>> The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52). This bit
>> is used to notice when a 64bit PV guest attempts to override the fixup
>> Xen applies to its PTEs. Xen unilaterally sets _PAGE_GLOBAL for user
>> pages, and clears _PAGE_GLOBAL for supervisor mappings, setting
>> _PAGE_USER in both cases as the PV kernel runs in ring 3. The only thing
>> _PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately
>> tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL mapping, at which point
>> a warning is logged and the kernel's request is overridden.
>>
>> Neither of the used PTE bits exists in the Xen public ABI. Neither of
>> them serves any purpose other than as a debugging aid.
>>
>> I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of
>> "all bits available for guest use".
> And a kernel using any of the conflicting bits would then become
> unusable on a hypervisor with that debug option enabled? I'd
> rather see us document the state things are in...

The comment beside _PAGE_GNTTAB already states:

/*
 * Debug option: Ensure that granted mappings are not implicitly unmapped.
 * WARNING: This will need to be disabled to run OSes that use the spare PTE
 * bits themselves (e.g., *BSD).
 */

I was intending to have CONFIG_PV_PTE_DEBUG as an EXPERT option,
disabled by default even in debug builds. There should not be an ABI
difference between release and "normal" debug builds.

~Andrew
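The conflict between Xen's internal PTE bits and the protection-key field can be made concrete with a small C sketch. It assumes only the bit positions given in the thread (bit 52, bit 62, and the 62:59 key field used when CR4.PKE is set); the macro and function names (PTE_BIT, PTE_GUEST_KERNEL, PTE_GNTTAB, PKEY_MASK, pte_pkey, xen_bits_change_pkey) are hypothetical illustrations, not Xen's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical names for illustration only. The bit positions are the
 * ones quoted in this thread, not Xen's internal macro definitions.
 */
#define PTE_BIT(n)        (UINT64_C(1) << (n))
#define PTE_GUEST_KERNEL  PTE_BIT(52)  /* Xen software bit (64bit PV fixup) */
#define PTE_GNTTAB        PTE_BIT(62)  /* Xen software bit, debug builds */

/* With CR4.PKE set, hardware reads PTE bits 62:59 as a 4-bit key. */
#define PKEY_SHIFT        59
#define PKEY_MASK         (UINT64_C(0xf) << PKEY_SHIFT)

/* Compile-time checks of the overlap described in the thread. */
static_assert((PTE_GNTTAB & PKEY_MASK) != 0,
              "bit 62 lies inside the protection-key field");
static_assert((PTE_GUEST_KERNEL & PKEY_MASK) == 0,
              "bit 52 lies outside the protection-key field");

/* Protection key that hardware would apply for this PTE. */
static unsigned int pte_pkey(uint64_t pte)
{
    return (unsigned int)((pte & PKEY_MASK) >> PKEY_SHIFT);
}

/*
 * True if the hypervisor OR-ing 'xen_bits' into a guest PTE would change
 * the protection key the guest asked for.
 */
static bool xen_bits_change_pkey(uint64_t guest_pte, uint64_t xen_bits)
{
    return pte_pkey(guest_pte) != pte_pkey(guest_pte | xen_bits);
}
```

With these definitions, a guest PTE requesting key 0 ends up with key 8 once bit 62 is also set, because bit 62 is the top bit of the 4-bit key field; setting bit 52 leaves the key unchanged.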