From: George Dunlap <george.dunlap@citrix.com>
To: "Han, Huaitong" <huaitong.han@intel.com>, "Wu, Feng" <feng.wu@intel.com>
Cc: "tim@xen.org" <tim@xen.org>, "Tian, Kevin" <kevin.tian@intel.com>,
"wei.liu2@citrix.com" <wei.liu2@citrix.com>,
"ian.campbell@citrix.com" <ian.campbell@citrix.com>,
"stefano.stabellini@eu.citrix.com"
<stefano.stabellini@eu.citrix.com>,
"george.dunlap@eu.citrix.com" <george.dunlap@eu.citrix.com>,
"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>,
"Dong, Eddie" <eddie.dong@intel.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
"jbeulich@suse.com" <jbeulich@suse.com>,
"Nakajima, Jun" <jun.nakajima@intel.com>,
"keir@xen.org" <keir@xen.org>,
"ian.jackson@eu.citrix.com" <ian.jackson@eu.citrix.com>
Subject: Re: [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables
Date: Fri, 18 Dec 2015 10:03:32 +0000 [thread overview]
Message-ID: <5673D9F4.5050400@citrix.com> (raw)
In-Reply-To: <1450426913.4557.20.camel@intel.com>
On 18/12/15 08:21, Han, Huaitong wrote:
> On Wed, 2015-12-16 at 15:36 +0000, George Dunlap wrote:
>> [Adding Tim, the previous mm maintainer]
>>
>> On 11/12/15 09:16, Wu, Feng wrote:
>>>>> +{
>>>>> + void *xsave_addr;
>>>>> + unsigned int pkru = 0;
>>>>> + bool_t pkru_ad, pkru_wd;
>>>>> +
>>>>> + bool_t uf = !!(pfec & PFEC_user_mode);
>>>>> + bool_t wf = !!(pfec & PFEC_write_access);
>>>>> + bool_t ff = !!(pfec & PFEC_insn_fetch);
>>>>> + bool_t rsvdf = !!(pfec & PFEC_reserved_bit);
>>>>> + bool_t pkuf = !!(pfec & PFEC_prot_key);
>>>>
>>>> So I'm just wondering out loud here -- is there actually any
>>>> situation
>>>> in which we would want guest_walk_tables to act differently than
>>>> the
>>>> real hardware?
>>>>
>>>> That is, is there actually any situation where, pku is enabled,
>>>> the vcpu
>>>> is in long mode, PFEC_write_access and/or PFEC_page_present is
>>>> set, and
>>>> the pkey is non-zero, that we want guest_walk_tables() to only
>>>> check the
>>>> write-protect bit for the pte, and not also check the pkru?
>>>>
>>>> Because if not, it seems like it would be much more robust to
>>>> simply
>>>> *always* check for pkru_ad if PFEC_page_present is set, and for
>>>> pkru_wd
>>>> if PFEC_write_access is set.
>>>
>>> guest_walk_tables() is also used by shadow code, though we don't
>>> plan to support pkeys for shadow now, however, in that case, the
>>> 'pfec'
>>> is generated by hardware, and the pkuf bit may be 0 or 1 depending
>>> on the real page fault. If we unconditionally check pkeys in
>>> guest_walk_tables(), it is not a good ideas for someone who may
>>> want to implement the pkeys for shadow in future, since we only
>>> need to check pkeys when the pkuf is set in pfec.
>>
>> So you'll have to forgive me for being a bit slow here -- I stepped
>> away
>> from the mm code for a while, and a lot has changed since I agreed to
>> become mm maintainer earlier this year.
>>
>> So here is what I see in the tree; please correct me if I missed
>> something important.
>>
>> The function guest_walk_tables():
>> 1. Accepts a pfec argument which is meant to describe the *access
>> type*
>> that is happening
>> 2. Returns a value with the "wrong" flags (i.e., flags which needed
>> to
>> be set that were missing, or flags that needed to be clear that were
>> set).
>> 3. It checks reserved bits in the pagetable regardless of whether
>> PFEC_reserved_bit is set in the caller, and returns
>> _PAGE_INVALID_BITS
>> if so.
>> 4. If PFEC_insn_fetch is set, it will only check for nx and smep if
>> those features are enabled for the guest.
>>
>> The main callers are the various gva_to_gfn(), which accept a
>> *pointer*
>> to a pfec argument. This pointer is actually bidirectional: its
>> value
>> passed to guest_walk_tables() to determine what access types are
>> checked; what is returned is meant to be a pfec value which can be
>> passed to a guest as part of a fault (and is by several callers).
>> But
>> it may also return the Xen-internal flags, PFEC_page_{paged,shared}.
>> And, importantly, it modifies the pfec value based on the bits that
>> are
>> returned from guest_walk_tables(): It will clear PFEC_present if
>> guest_walk_tables() returns _PAGE_PRESENT, and it will set
>> PFEC_reserved_bit if guest_walk_tables() returns _PAGE_INVALID_BIT.
>>
>> The next logical level up are the
>> hvm_{copy,fetch}_{to,from}_guest_virt(), which also take a pfec
>> argument: but really the only purpose of the pfec argument is to
>> allow
>> the caller to add the PFEC_user_mode flag; the other access-type
>> flags
>> are set automatically by the functions themselves (e.g., "to" sets
>> PFEC_write_access, "fetch" sets PFEC_insn_fetch, &c). Several
>> callers
>> set PFEC_present as well, but callers set any other bits.
>>
>> (hvm_fetch_from_guest_virt() seems to only set PFEC_insn_fetch if nx
>> or
>> smep are enabled in the guest. This seems inconsistent to me with
>> the
>> treatment of PFEC_reserved_bit: it seems like
>> hvm_fetch_from_guest_virt() should always pass in PFEC_insn_fetch,
>> particularly as guest_walk_tables() will already gate the checks
>> based
>> on whether nx or smep is enabled in the guest. Tim, you know of any
>> reason for this?)
>>
>> So there seems to me to be no reason to pass PFEC_prot_key into
>> guest_walk_tables() or gva_to_gfn(). The pfec value passed *into*
>> those
>> should simply indicate the type of memory access being done: present,
>> write, instruction fetch, user.
>>
>> With your current series, guest_walk_tables() already checks for
>> pkeys
>> being enabled in the guest before checking for them in the
>> pagetables.
>> For shadow mode, these will be false, and so no checks will be done.
>> If
>> anyone ever implements pkeys for shadow mode, then these will be
>> enabled, and the checks will be done, without any intervention on the
>> part of the caller.
>
> I have understood it, but, the problem with shadow mode is that pfec
> may come from regs->error_code(hardware), just like:
> rc = sh_walk_guest_tables(v, va, &gw, regs->error_code);
> so, when regs->error_code does not have PFEC_prot_key,
> guest_walk_tables may still check PKEY when codes is writen according
> to what you said, and it maybe return a different result.
Under what situation would the *hardware* not generate PFEC_prot_key,
but guest_walk_tables() would, under exactly the same conditions,
generate PFEC_prot_key? Shouldn't guest_walk_tables() be made to behave
identically to the hardware?
The only situation where that might legitimately happen is when the
shadow pagetables and the guest pagetables diverge; for instance, when a
page had been write-protected because Xen thought it was a pagetable.
Then the guest pte might have write permission, but the pkey read-only
permission; but the shadow pte would not have write permission. In that
case, the hardware might give PFEC_write_access but not PFEC_prot_key, I
suppose (haven't read the manual to be sure); but in that case, we
*want* guest_walk_tables() to detect and add PFEC_prot_key, don't we?
-George
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
next prev parent reply other threads:[~2015-12-18 10:03 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-12-07 9:16 [V3 PATCH 0/9] x86/hvm: pkeys, add memory protection-key support Huaitong Han
2015-12-07 9:16 ` [V3 PATCH 1/9] x86/hvm: pkeys, add the flag to enable Memory Protection Keys Huaitong Han
2015-12-10 15:37 ` George Dunlap
2015-12-07 9:16 ` [V3 PATCH 2/9] x86/hvm: pkeys, add pkeys support when setting CR4 Huaitong Han
2015-12-07 9:16 ` [V3 PATCH 3/9] x86/hvm: pkeys, disable pkeys for guests in non-paging mode Huaitong Han
2015-12-07 9:16 ` [V3 PATCH 4/9] x86/hvm: pkeys, add functions to get pkeys value from PTE Huaitong Han
2015-12-10 15:48 ` George Dunlap
2015-12-10 18:47 ` Andrew Cooper
2015-12-07 9:16 ` [V3 PATCH 5/9] x86/hvm: pkeys, add functions to support PKRU access Huaitong Han
2015-12-10 18:48 ` Andrew Cooper
2015-12-07 9:16 ` [V3 PATCH 6/9] x86/hvm: pkeys, add xstate support for pkeys Huaitong Han
2015-12-10 17:39 ` George Dunlap
2015-12-10 18:57 ` Andrew Cooper
2015-12-11 9:36 ` Jan Beulich
2015-12-07 9:16 ` [V3 PATCH 7/9] x86/hvm: pkeys, add pkeys support for guest_walk_tables Huaitong Han
2015-12-10 18:19 ` George Dunlap
2015-12-11 9:16 ` Wu, Feng
2015-12-11 9:23 ` Jan Beulich
2015-12-16 15:36 ` George Dunlap
2015-12-16 16:28 ` Tim Deegan
2015-12-16 16:34 ` Andrew Cooper
2015-12-16 17:33 ` Tim Deegan
2015-12-16 16:50 ` George Dunlap
2015-12-16 17:21 ` Tim Deegan
2015-12-18 8:21 ` Han, Huaitong
2015-12-18 10:03 ` George Dunlap [this message]
2015-12-18 11:46 ` Tim Deegan
2015-12-11 9:23 ` Han, Huaitong
2015-12-11 9:50 ` Jan Beulich
2015-12-11 9:26 ` Jan Beulich
2015-12-15 8:14 ` Han, Huaitong
2015-12-15 9:02 ` Jan Beulich
2015-12-16 8:16 ` Han, Huaitong
2015-12-16 8:32 ` Jan Beulich
2015-12-16 9:03 ` Han, Huaitong
2015-12-16 9:12 ` Jan Beulich
2015-12-17 9:18 ` Han, Huaitong
2015-12-17 10:05 ` Jan Beulich
2015-12-10 18:59 ` Andrew Cooper
2015-12-11 7:18 ` Han, Huaitong
2015-12-11 8:48 ` Andrew Cooper
2015-12-07 9:16 ` [V3 PATCH 8/9] x86/hvm: pkeys, add pkeys support for gva2gfn funcitons Huaitong Han
2015-12-07 9:16 ` [V3 PATCH 9/9] x86/hvm: pkeys, add pkeys support for cpuid handling Huaitong Han
2015-12-11 9:47 ` Jan Beulich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5673D9F4.5050400@citrix.com \
--to=george.dunlap@citrix.com \
--cc=andrew.cooper3@citrix.com \
--cc=eddie.dong@intel.com \
--cc=feng.wu@intel.com \
--cc=george.dunlap@eu.citrix.com \
--cc=huaitong.han@intel.com \
--cc=ian.campbell@citrix.com \
--cc=ian.jackson@eu.citrix.com \
--cc=jbeulich@suse.com \
--cc=jun.nakajima@intel.com \
--cc=keir@xen.org \
--cc=kevin.tian@intel.com \
--cc=stefano.stabellini@eu.citrix.com \
--cc=tim@xen.org \
--cc=wei.liu2@citrix.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).