From: Xiao Guangrong <guangrong.xiao@linux.intel.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
Huaitong Han <huaitong.han@intel.com>,
gleb@kernel.org
Cc: kvm@vger.kernel.org
Subject: Re: [PATCH V4 3/7] KVM, pkeys: update memeory permission bitmask for pkeys
Date: Tue, 8 Mar 2016 15:35:20 +0800
Message-ID: <56DE80B8.40900@linux.intel.com>
In-Reply-To: <56DCB9BF.4020904@redhat.com>
On 03/07/2016 07:14 AM, Paolo Bonzini wrote:
>
>
> On 06/03/2016 08:42, Xiao Guangrong wrote:
>>>
>>> + rsvdf = pfec & PFERR_RSVD_MASK;
>>
>> No. RSVD is reserved by SMAP and it should not be used to walk the guest
>> page table.
>
> Agreed. You can treat your code as if rsvdf was always false. Reserved
> bits are handled elsewhere.
>
>>> + pkuf = pfec & PFERR_PK_MASK;
>>> /*
>>> * PFERR_RSVD_MASK bit is set in PFEC if the access is not
>>> * subject to SMAP restrictions, and cleared otherwise. The
>>> @@ -3824,12 +3830,34 @@ static void update_permission_bitmask(struct
>>> kvm_vcpu *vcpu,
>>> * clearer.
>>> */
>>> smap = cr4_smap && u && !uf && !ff;
>>> +
>>> + /*
>>> + * PKU:additional mechanism by which the paging
>>> + * controls access to user-mode addresses based
>>> + * on the value in the PKRU register. A fault is
>>> + * considered as a PKU violation if all of the
>>> + * following conditions are true:
>>> + * 1.CR4_PKE=1.
>>> + * 2.EFER_LMA=1.
>>> + * 3.page is present with no reserved bit
>>> + * violations.
>>> + * 4.the access is not an instruction fetch.
>>> + * 5.the access is to a user page.
>>> + * 6.PKRU.AD=1
>>> + * or The access is a data write and
>>> + * PKRU.WD=1 and either CR0.WP=1
>>> + * or it is a user access.
>>> + *
>>> + * The 2nd and 6th conditions are computed
>>> + * dynamically in permission_fault.
>>> + */
>>
>> It is not good as there are branches in the next patch.
>
> It's important to note that branches in general are _not_ a problem.
> Only badly-predicted branches are a problem;
I agree on this point.
> well-predicted branches are
> _faster_ than branchless code.
Er, I do not understand this. If the two cases see the same cache hits,
how can a branch be faster?
> For example, take is_last_gpte. The
> branchy way to write it in walk_addr_generic would be (excluding the
> 32-bit !PSE case) something like:
>
> do {
> } while (level > PT_PAGE_TABLE_LEVEL &&
> (!(gpte & PT_PAGE_SIZE_MASK) ||
> level == mmu->root_level));
>
> Here none of the branches is easily predicted, so we want to get rid of
> them.
>
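For reference, a minimal standalone sketch of how such a check could be made branchless. This is an illustration only, not the code in the tree: the function names are made up, and it assumes 1 <= level <= root_level with root_level >= 2 (the 32-bit !PSE case is excluded, as above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_PAGE_TABLE_LEVEL 1
#define PT_PAGE_SIZE_MASK   (1ull << 7)	/* PS bit of a non-leaf entry */

/* Branchy form: the negation of the loop condition quoted above. */
static bool last_gpte_branchy(unsigned level, uint64_t gpte, unsigned root_level)
{
	if (level <= PT_PAGE_TABLE_LEVEL)
		return true;
	if (level == root_level)
		return false;
	return gpte & PT_PAGE_SIZE_MASK;
}

/* Branchless form: fold both level checks into bit 7 with arithmetic. */
static bool last_gpte_branchless(unsigned level, uint64_t gpte, unsigned root_level)
{
	/* level - 2 wraps to all-ones only at level 1, which forces bit 7 on. */
	gpte |= (uint64_t)level - PT_PAGE_TABLE_LEVEL - 1;
	/* level - root_level is 0 at the root (clears bit 7), negative below it. */
	gpte &= (uint64_t)level - root_level;
	return gpte & PT_PAGE_SIZE_MASK;
}

/* Compare both forms for every valid level, with and without the PS bit. */
static void check_last_gpte(void)
{
	for (unsigned root = 2; root <= 4; root++)
		for (unsigned level = 1; level <= root; level++) {
			assert(last_gpte_branchy(level, 0, root) ==
			       last_gpte_branchless(level, 0, root));
			assert(last_gpte_branchy(level, PT_PAGE_SIZE_MASK, root) ==
			       last_gpte_branchless(level, PT_PAGE_SIZE_MASK, root));
		}
}
```

Calling check_last_gpte() walks all valid (root, level) pairs and asserts that the two forms agree.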
> The next patch adds three branches, and they are not all equal:
>
> - is_long_vcpu is well predicted to true (or even for 32-bit OSes it
> should be well predicted if the host is not overcommitted).
>
But in production, CPU overcommit is the normal case...
> - pkru != 0 should be well-predicted to false, at least for a few
> years... and perhaps even later considering that most MMIO access
> happens in the kernel.
>
> - !wf || (!uf && !is_write_protection(vcpu)) is badly predicted and
> should be removed
>
> So only the last one is a problem.
>
>> However, I do not think we need a new byte index for PK. The conditions
>> detecting PK enablement
>> can be fully found in current vcpu content (i.e., CR4, EFER and U/S access).
>
> Adding a new byte index lets you cache CR4.PKE (and actually EFER.LMA
> too, though Huaitong's patch doesn't do that). It's a good thing to do.
> U/S is also handled by adding a new byte index, see Huaitong's
That is not the same thing. U/S is the type of the memory access, which
depends on vCPU runtime state. Whether PKEY is enabled, however, depends
entirely on the CPU environment, and we should _always_ check PKEY
even if PFEC_PKEY is not set.
As PKEY is not enabled on softmmu, gva_to_gpa requests mostly come from inside
KVM, which means we should always set PFEC.PKEY for all gva_to_gpa requests.
Wasting a bit on that is really unnecessary.
And it is always better to move more work from permission_fault() to
update_permission_bitmask(), as the former is much hotter than the latter.
>
> pku = cr4_pku && !ff && u;
>
> If this is improved to
>
> pku = cr4_pku && long_mode_vcpu && !ff && u;
>
> one branch goes away in permission_fault. The read_pkru() branch, if
> well predicted, lets you optimize away the pkru tests. I think it
> _would_ be well predicted, so I think it should remain.
>
> The "(!wf || (!uf && !is_write_protection(vcpu)))" is indeed the worst
> of the three. I was lenient in my previous review because this code
> won't run on any system being sold now and in the next 1-2 (?) years.
> However, we can indeed get rid of the branch, so let's do it. :)
>
> I don't like the idea of making permissions[] four times larger.
Okay, then let's introduce a separate new field for PKEY. Your approach,
fault_u1w0, looks good to me.
> Instead, we can find the value of the expression elsewhere in
> mmu->permissions (!), or cache it separately in a different field of mmu.
>
> If I interpret the rules correctly, WD works like this. First, we take
> the PTE and imagine that it had W=0. Then, if this access would not
> fault, WD is ignored. This is because:
>
> - on reads, WD is always ignored
>
> - on writes, WD is ignored in supervisor mode if !CR0.WP
>
> ... and this is how W=0 pages work, isn't it?
>
Yes, it is.
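To spell out the equivalence, here is a minimal standalone sketch (not KVM code; the helper names are made up for illustration, and SMAP and instruction fetches are ignored). It checks the branchy condition from the patch against "would a U=1, W=0 page fault on this access" for every combination of write fault, user fault and CR0.WP:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the branchy test from the patch, true when WD is ignored. */
static bool wd_ignored(bool wf, bool uf, bool cr0_wp)
{
	return !wf || (!uf && !cr0_wp);
}

/* Hypothetical helper: does a present U=1, W=0 page fault on this access? */
static bool u1w0_page_faults(bool wf, bool uf, bool cr0_wp)
{
	/* Reads never fault on W=0; supervisor writes fault only if CR0.WP=1. */
	return wf && (uf || cr0_wp);
}

/* WD must be ignored exactly when the U=1 W=0 page would not fault. */
static void check_wd_equivalence(void)
{
	for (int i = 0; i < 8; i++) {
		bool wf = i & 1, uf = i & 2, wp = i & 4;

		assert(wd_ignored(wf, uf, wp) == !u1w0_page_faults(wf, uf, wp));
	}
}
```

Calling check_wd_equivalence() covers all eight cases, so the two formulations can be used interchangeably.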
> If so, I think it's something like this in code:
>
> - if (!wf || (!uf && !is_write_protection(vcpu)))
> - pkru_bits &= ~(1 << PKRU_WRITE);
> + /* Only testing writes, so ignore SMAP and fetch. */
> + pfec_uw = pfec & (PFERR_WRITE_MASK|PFERR_USER_MASK);
> + fault_uw = mmu->permissions[pfec_uw >> 1];
> + /*
> + * This page has U=1, so check if a U=1 W=0 page faults
> + * on this access; if not ignore WD.
> + */
> + pkru_bits &= ~(1 << PKRU_WRITE) |
> + (fault_uw >> (ACC_USER_MASK - PKRU_WRITE));
>
This is tricky, but I finally understand it. Yeah, it works. :) Except that
I do not think PFEC.PKEY should be part of the index, as I explained above.
> I think I even prefer if update_permission_bitmask sets up a separate
> bitmask:
>
> mmu->fault_u1w0 |= (wf && !w) << byte;
>
> and then this other bitmap can be tested in permission_fault:
>
>
> - if (!wf || (!uf && !is_write_protection(vcpu)))
> - pkru_bits &= ~(1 << PKRU_WRITE);
> + /*
> + * fault_u1w0 ignores SMAP and PKRU, so use the
> + * partially-computed PFEC that we were given.
> + */
> + fault_uw = (mmu->fault_u1w0 >> (pfec >> 1)) & 1;
> + pkru_bits &= ~(1 << PKRU_WRITE) |
> + (fault_uw << PKRU_WRITE);
>
It looks good to me!
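
A rough standalone sketch of how the fault_u1w0 idea could fit together, to make sure we mean the same thing. Everything here is an assumption for illustration: struct sketch_mmu and the sketch_* functions are stand-ins rather than the real kvm_mmu code, WD is assumed to be bit 1 of the 2-bit PKRU field, and pfec is assumed to fit in 5 bits so that pfec >> 1 indexes a 16-bit bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page-fault error code bits (architectural positions). */
#define PFERR_WRITE_MASK (1u << 1)
#define PFERR_USER_MASK  (1u << 2)

/* Assumed position of WD inside a 2-bit PKRU field (AD would be bit 0). */
#define PKRU_WRITE 1

/* Stand-in for the proposed mmu field; not the real struct kvm_mmu. */
struct sketch_mmu {
	uint16_t fault_u1w0;	/* bit (pfec >> 1): a U=1 W=0 page faults */
};

/* Slow path: precompute one bit per PFEC value when CR0.WP changes. */
static void sketch_update_fault_u1w0(struct sketch_mmu *mmu, bool cr0_wp)
{
	mmu->fault_u1w0 = 0;
	for (unsigned byte = 0; byte < 16; byte++) {
		unsigned pfec = byte << 1;
		bool wf = pfec & PFERR_WRITE_MASK;
		bool uf = pfec & PFERR_USER_MASK;
		/* A W=0 page is writable only by the supervisor with CR0.WP=0. */
		bool w = !uf && !cr0_wp;

		mmu->fault_u1w0 |= (uint16_t)((wf && !w) << byte);
	}
}

/* Hot path: clear WD from pkru_bits without a branch when it must be ignored. */
static unsigned sketch_mask_pkru_bits(const struct sketch_mmu *mmu,
				      unsigned pfec, unsigned pkru_bits)
{
	unsigned fault_uw = (mmu->fault_u1w0 >> (pfec >> 1)) & 1;

	return pkru_bits & (~(1u << PKRU_WRITE) | (fault_uw << PKRU_WRITE));
}
```

With CR0.WP=0, a supervisor write (pfec = PFERR_WRITE_MASK) with pkru_bits = 3 comes back as 1, so WD is dropped, while a user write keeps both bits. All the CR0.WP-dependent logic stays in the update path, and permission_fault only does a shift and a mask.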