From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: Bisected Xen-unstable: "Segment register inaccessible for d1v0" when starting HVM guest on intel
Date: Wed, 2 Jul 2014 10:44:49 +0100
Message-ID: <53B3D491.1080907@citrix.com>
References: <1057886355.20140628222158@eikelenboom.it>
 <53B1A244020000780001EA4D@mail.emea.novell.com>
 <1081819750.20140630183750@eikelenboom.it>
 <53B19EEB.4060603@citrix.com>
 <53B27902020000780001ED8B@mail.emea.novell.com>
 <53B29E03020000780001EF03@mail.emea.novell.com>
 <53B3CA8A020000780001F4B4@mail.emea.novell.com>
 <53B3D5D2020000780001F4F8@mail.emea.novell.com>
 <53B3ECEE020000780001F61F@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39])
 by lists.xen.org with esmtp (Exim 4.72) id 1X2H5y-0003cs-Ce
 for xen-devel@lists.xenproject.org; Wed, 02 Jul 2014 09:44:54 +0000
In-Reply-To: <53B3ECEE020000780001F61F@mail.emea.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich, Feng Wu
Cc: Sander Eikelenboom, "xen-devel@lists.xenproject.org"
List-Id: xen-devel@lists.xenproject.org

On 02/07/14 10:28, Jan Beulich wrote:
>>>> On 02.07.14 at 11:14, wrote:
>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>> No, you're again looking at the segment register load side, which isn't
>>> what this started with, and which we should put aside. The implicit
>>> supervisor mode accesses we're needing to deal with here are the
>>> ones _not_ resulting from emulation of anything: The update of the
>>> runstate area (which is what Sander stumbled across) and (similar)
>>> the update of time data, i.e. update_secondary_system_time().
>>> Now that I think about it the two are actually different: The latter is
>>> specifically intended to update possibly user mode visible data, so we
>>> need to first determine whether it is correct to apply the SMAP check
>>> here (I think it is since the virtual address given to the kernel
>>> shouldn't be the one exposed to user mode - at least on Linux, so
>>> the question is whether we can assume eventual other OSes making
>>> use of this PV extension to also use distinct virtual addresses here).
>> If I understand it correctly, referring to the two examples you mentioned
>> here, this is about a shared memory between Xen and guests. I have some
>> questions about this:
>> 1. What is the relationship between these operations and implicit
>> supervisor mode accesses? Seems this is not what is defined for implicit
>> supervisor mode accesses in the Spec.
>> 2. For the first case you mentioned above, (v)->runstate_guest is a guest
>> pointer which is set in 'VCPUOP_register_runstate_memory_area' operation,
>> but I only see this pointer is set for domain 0, how is it set for HVM
>> guests? For Sander's case, seems this pointer is set for the HVM guests
>> (d1v0).
> I have no idea where you found this to be set for Dom0 only.
> VCPUOP_register_runstate_memory_area is available to all guests.
>
>> Here is a quote from Intel SDM:
>> "If CR4.SMAP=1, supervisor-mode data accesses are not allowed to linear
>> addresses that are accessible in user mode",
>> So for the second case you listed above, if Xen and user space use
>> different virtual addresses and the virtual address for Xen usage is
>> supervisor-only, no SMAP check will be needed. However, if they use the
>> same virtual address, SMAP check may be needed if this virtual address is
>> user accessible.
> This being a PV extension to the base architecture, the hardware
> specification is meaningless. What we need to do here is _extend_ what
> the hardware has specified for those extra accesses.
> We have three options basically:
> 1) never do any checking on such accesses
> 2) honor CPL and EFLAGS.AC
> 3) always do the checking
> The first one obviously is bad from a security POV. Since the third one is
> more strict than the second and since I assume adding some override is
> going to be the simpler change than altering the point in time when the
> VMCS gets loaded during context switch (the suggestion of which no one
> at all commented on so far), I'd prefer that one, but wouldn't mind
> option 2 to be implemented instead.
>
> Jan

The problem is not the hypervisor check. We are already deep within an
hvm_copy_to_user() which is between a stac()/clac() pair.

The issue is that guest_walk_tables() is checking a Xen access using guest
page tables as if it were a supervisor access given the current context
of the vcpu.

What can/should Xen do if its emulated access fails with a guest SMAP
violation? It certainly can't/shouldn't inject a pagefault, nor should
it actually fail the write.

copy_to_user() is not subject to the guest operating mode and whether we
are writing into guest user or supervisor pages.

~Andrew
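
For illustration only, the three options Jan lists for hypervisor-initiated
writes into guest memory could be modelled as a policy switch. This is a
minimal sketch, not Xen code: the enum, the struct and smap_blocks() are
hypothetical names, and reading option 2 as "exempt when CPL is 3 or
EFLAGS.AC is set" is an assumption based on the SDM wording quoted above.

```c
#include <stdbool.h>

/* Hypothetical policy names; one per option in the list above. */
enum smap_policy {
    SMAP_NEVER_CHECK,   /* option 1: never do any checking */
    SMAP_HONOR_CPL_AC,  /* option 2: honor CPL and EFLAGS.AC */
    SMAP_ALWAYS_CHECK,  /* option 3: always do the checking */
};

/* Hypothetical snapshot of the guest vcpu state relevant to SMAP. */
struct vcpu_ctx {
    bool cr4_smap;      /* guest CR4.SMAP */
    unsigned int cpl;   /* guest current privilege level, 0..3 */
    bool eflags_ac;     /* guest EFLAGS.AC */
};

/*
 * Returns true if an access made on behalf of the guest to a page with
 * the given user-accessibility should be treated as an SMAP violation.
 */
static bool smap_blocks(enum smap_policy policy, const struct vcpu_ctx *v,
                        bool page_user_accessible)
{
    /* SMAP only concerns user-accessible pages, and only when enabled. */
    if (!v->cr4_smap || !page_user_accessible)
        return false;

    switch (policy) {
    case SMAP_NEVER_CHECK:
        return false;
    case SMAP_HONOR_CPL_AC:
        /* Assumed reading: violation only at CPL < 3 with AC clear. */
        return v->cpl < 3 && !v->eflags_ac;
    case SMAP_ALWAYS_CHECK:
        return true;
    }
    return false;
}
```

Under this model option 3 flags every case that option 2 flags, plus the
cases where the guest happens to be at CPL 3 or has EFLAGS.AC set, which
matches the remark that the third option is stricter than the second. It
does not address the separate question raised in the reply of what Xen
should do once such a check fails.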