From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Han, Huaitong" Subject: Re: [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables Date: Fri, 11 Dec 2015 09:23:37 +0000 Message-ID: <1449825823.3868.51.camel@intel.com> References: <1449479780-19146-1-git-send-email-huaitong.han@intel.com> <1449479780-19146-8-git-send-email-huaitong.han@intel.com> <5669C23F.6080203@citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <5669C23F.6080203@citrix.com> Content-Language: en-US Content-ID: <7B25771BB7D3134C856C345C9880EB48@intel.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: "george.dunlap@eu.citrix.com" , "ian.campbell@citrix.com" , "Dong, Eddie" , "andrew.cooper3@citrix.com" , "wei.liu2@citrix.com" , "ian.jackson@eu.citrix.com" , "Tian, Kevin" , "george.dunlap@citrix.com" , "jbeulich@suse.com" , "Nakajima, Jun" , "stefano.stabellini@eu.citrix.com" , "keir@xen.org" Cc: "xen-devel@lists.xen.org" List-Id: xen-devel@lists.xenproject.org On Thu, 2015-12-10 at 18:19 +0000, George Dunlap wrote: > On 07/12/15 09:16, Huaitong Han wrote: > > +{ > > + void *xsave_addr; > > + unsigned int pkru = 0; > > + bool_t pkru_ad, pkru_wd; > > + > > + bool_t uf = !!(pfec & PFEC_user_mode); > > + bool_t wf = !!(pfec & PFEC_write_access); > > + bool_t ff = !!(pfec & PFEC_insn_fetch); > > + bool_t rsvdf = !!(pfec & PFEC_reserved_bit); > > + bool_t pkuf = !!(pfec & PFEC_prot_key); > > So I'm just wondering out loud here -- is there actually any > situation > in which we would want guest_walk_tables to act differently than the > real hardware? > > That is, is there actually any situation where, pku is enabled, the > vcpu > is in long mode, PFEC_write_access and/or PFEC_page_present is set, > and > the pkey is non-zero, that we want guest_walk_tables() to only check > the > write-protect bit for the pte, and not also check the pkru? > > Because if not, it seems like it would be much more robust to simply > *always* check for pkru_ad if PFEC_page_present is set, and for > pkru_wd > if PFEC_write_access is set. > Then in patch 8, you wouldn't need to go around all the __hvm_copy > functions adding in PFEC_prot; instead, you'd just need to add > PFEC_insn_fetch to the "fetch" (as is already done for SMEP and NX), > and > you'd be done. See reply email from Feng discussed with me. > > + > > + if ( !cpu_has_xsave || !pkuf || is_pv_vcpu(vcpu) ) > > + return 0; > > + > > + /* PKRU dom0 is always zero */ > > "dom0" has a very specific meaning in Xen. I think this would be > better > written "pkey 0 always has full access". > > > + if ( likely(!pte_pkeys) ) > > + return 0; > > + > > + /* Update vcpu xsave area */ > > + fpu_xsave(vcpu); > > Is there a reason you're calling fpu_xsave() directly here, rather > than > just calling vcpu_save_fpu()? That saves you actually doing the > xsave > if the fpu hasn't been modified since the last time you read it. use fpu_xsave instead of fpu_xsave because Jan's comment: Which is bogus by itself: That function isn't meant to be used for purposes like the one you have, e.g. due to its side effect of clearing ->fpu_dirtied. You really ought to be using a lower level function here (and I don't think the corresponding struct vcpu should get altered in any way). --Jan And I can add if ( !vcpu->fpu_dirtied ) before fpu_xsave(vcpu); > > + xsave_addr = get_xsave_addr(vcpu->arch.xsave_area, > > fls64(XSTATE_PKRU)-1); > > + if ( !!xsave_addr ) > > + memcpy(&pkru, xsave_addr, sizeof(pkru)); > > There's no need for the !! here. But in any case, isn't there a > better > function for reading the xsave state than manually calculating the > address and doing a memcpy? RDPKRU is disabled by hypervisor CR4 because PV mode must disable CR4.PKE, getting PKRU value only depends on xsave. > > + > > + if ( unlikely(pkru) ) > > + { > > + /* > > + * PKU: additional mechanism by which the paging controls > > + * access to user-mode addresses based on the value in the > > + * PKRU register. A fault is considered as a PKU violation > > if all > > + * of the following conditions are ture: > > + * 1.CR4_PKE=1. > > + * 2.EFER_LMA=1. > > + * 3.page is present with no reserved bit violations. > > + * 4.the access is not an instruction fetch. > > + * 5.the access is to a user page. > > + * 6.PKRU.AD=1 > > + * or The access is a data write and PKRU.WD=1 > > + * and either CR0.WP=1 or it is a user access. > > + */ > > + pkru_ad = read_pkru_ad(pkru, pte_pkeys); > > + pkru_wd = read_pkru_wd(pkru, pte_pkeys); > > + if ( hvm_pku_enabled(vcpu) && hvm_long_mode_enabled(vcpu) > > && > > + !rsvdf && !ff && (pkru_ad || > > + (pkru_wd && wf && (hvm_wp_enabled(vcpu) || uf)))) > > + return 1; > > This statement here is really difficult to read. Why don't you put > the > checks which don't depend on the pkru up before you read it? e.g., > hvm_pku_enabled(), hvm_long_mode_enabled(), rsvdf, ff, &c? > > -George