From: Xiao Guangrong <guangrong.xiao@linux.intel.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
Huaitong Han <huaitong.han@intel.com>,
gleb@kernel.org
Cc: kvm@vger.kernel.org
Subject: Re: [PATCH V4 3/7] KVM, pkeys: update memeory permission bitmask for pkeys
Date: Tue, 8 Mar 2016 17:19:26 +0800
Message-ID: <56DE991E.8070007@linux.intel.com>
In-Reply-To: <56DE8D7D.5010302@redhat.com>
On 03/08/2016 04:29 PM, Paolo Bonzini wrote:
>
>
> On 08/03/2016 08:35, Xiao Guangrong wrote:
>>> well-predicted branches are _faster_ than branchless code.
>>
>> Er, I do not understand this. If the two cases have the same cache hit rate,
>> how can a branch be faster?
>
> Because branchy code typically executes fewer instructions.
>
> Take the same example here:
>
>>> do {
>>> } while (level > PT_PAGE_TABLE_LEVEL &&
>>> (!(gpte & PT_PAGE_SIZE_MASK) ||
>>> level == mmu->root_level));
>
> The assembly looks like (assuming %level, %gpte and %mmu are registers)
>
> cmp $1, %level
> jbe 1f
> test $128, %gpte
> jz beginning_of_loop
> cmpb ROOT_LEVEL_OFFSET(%mmu), %level
> je beginning_of_loop
> 1:
>
> These are two to six instructions, with no dependencies between them, which the
> processor can fuse into one to three macro-ops. For the branchless
> code (I posted a patch to implement this algorithm yesterday):
>
> lea -2(%level), %temp1
> orl %temp1, %gpte
> movzbl LAST_NONLEAF_LEVEL_OFFSET(%mmu), %temp1
> movl %level, %temp2
> subl %temp1, %temp2
> andl %temp2, %gpte
> test $128, %gpte
> jz beginning_of_loop
>
> These are eight instructions, with some dependencies between them too.
> In some cases branchless code throws away the result of 10-15
> instructions (because in the end it's ANDed with 0, for example). If it
> weren't for mispredictions, the branchy code would be faster.
>
Good lesson, thank you, Paolo. :)
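
To make the comparison above concrete, here is a minimal C sketch of the two forms of the loop-termination test. The function names, the `last_nonleaf_level` parameter, and the constant values are illustrative assumptions modeled on the identifiers in the quoted assembly; this is not code taken from the patch. The branchless form relies on unsigned wraparound: a "negative" difference has bit 7 set, so it can be combined with the page-size bit (bit 7, the `$128` in the assembly) using plain AND/OR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PT_PAGE_TABLE_LEVEL 1u
#define PT_PAGE_SIZE_MASK   (1u << 7)   /* the $128 tested in the assembly */

/* Branchy form: up to three conditional jumps, few instructions. */
static bool is_last_gpte_branchy(unsigned level, uint64_t gpte,
                                 unsigned last_nonleaf_level)
{
    if (level <= PT_PAGE_TABLE_LEVEL)
        return true;                    /* lowest level always terminates */
    if (!(gpte & PT_PAGE_SIZE_MASK))
        return false;                   /* no large-page bit: keep walking */
    return level < last_nonleaf_level;  /* PS bit only counts below this level */
}

/* Branchless form: more instructions, but a single final test. */
static bool is_last_gpte_branchless(unsigned level, uint64_t gpte,
                                    unsigned last_nonleaf_level)
{
    uint32_t bits = (uint32_t)gpte;

    /* Wraps to 0xFFFFFFFx (bit 7 set) iff level < last_nonleaf_level. */
    bits &= level - last_nonleaf_level;
    /* Wraps to 0xFFFFFFFx (bit 7 set) iff level <= PT_PAGE_TABLE_LEVEL. */
    bits |= level - PT_PAGE_TABLE_LEVEL - 1;
    return (bits & PT_PAGE_SIZE_MASK) != 0;
}

/* Compare both forms over the small range of levels a walker would see. */
static void check_forms_agree(void)
{
    const uint64_t gptes[] = { 0, PT_PAGE_SIZE_MASK,
                               ~(uint64_t)PT_PAGE_SIZE_MASK, ~(uint64_t)0 };

    for (unsigned level = 1; level <= 4; level++)
        for (unsigned last = 1; last <= 5; last++)
            for (size_t i = 0; i < sizeof(gptes) / sizeof(gptes[0]); i++)
                assert(is_last_gpte_branchy(level, gptes[i], last) ==
                       is_last_gpte_branchless(level, gptes[i], last));
}
```

Both functions return the same answer; which one is faster depends on how well the branches in the first form are predicted, which is exactly the trade-off discussed above.
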
>>> Here none of the branches is easily predicted, so we want to get rid of
>>> them.
>>>
>>> The next patch adds three branches, and they are not all equal:
>>>
>>> - is_long_vcpu is well predicted to true (or even for 32-bit OSes it
>>> should be well predicted if the host is not overcommitted).
>>
>> But in production, CPU overcommit is the normal case...
>
> It depends on the workload. I would guess that 32-bit OSes are more
> common where you have a single legacy guest because e.g. it doesn't have
> drivers for recent hardware.
>
>>>> However, I do not think we need a new byte index for PK. The conditions
>>>> for detecting PK enablement can all be found in the current vCPU context
>>>> (i.e., CR4, EFER and U/S access).
>>>
>>> Adding a new byte index lets you cache CR4.PKE (and actually EFER.LMA
>>> too, though Huaitong's patch doesn't do that). It's a good thing to do.
>>> U/S is also handled by adding a new byte index, see Huaitong's
>>
>> That is not the same thing: U/S is the type of memory access, which
>> depends on the vCPU's runtime state.
>
> Do you mean the type of page (ACC_USER_MASK)? Only U=1 pages are
> subject to PKRU, even in the kernel. The processor CPL
> (PFERR_USER_MASK) only matters if CR0.WP=0.
No. The index is:

    | Byte index: page fault error code [4:1]

So the type I mentioned is the type of memory access issued by the CPU, e.g. the
CPU is writing to memory or executing from memory.
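
As an illustration of the indexing scheme just described, the following C sketch derives the byte index from bits [4:1] of the page fault error code and uses the PTE access bits as the bit index within that byte. It is a deliberately simplified model: it ignores CR0.WP, SMEP, SMAP and NX, and the struct and function names are hypothetical. The `PFERR_*` and `ACC_*` values are assumed to follow the usual x86 encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page fault error code bits (assumed standard x86 layout). */
#define PFERR_WRITE_MASK (1u << 1)
#define PFERR_USER_MASK  (1u << 2)
#define PFERR_RSVD_MASK  (1u << 3)
#define PFERR_FETCH_MASK (1u << 4)

/* PTE access bits, used as the bit index inside each byte. */
#define ACC_EXEC_MASK  1u
#define ACC_WRITE_MASK 2u
#define ACC_USER_MASK  4u

/* Hypothetical container: one byte per error-code pattern. */
struct perm_table {
    uint8_t permissions[16];
};

/* Byte index: page fault error code bits [4:1]. */
static unsigned pfec_byte_index(uint32_t pfec)
{
    return (pfec >> 1) & 0xf;
}

/* Cold path: precompute, for every access type, which PTE access
 * combinations fault. Simplified: no WP/SMEP/SMAP/NX handling. */
static void fill_perm_table(struct perm_table *t)
{
    for (unsigned byte = 0; byte < 16; byte++) {
        uint32_t pfec = byte << 1;
        uint8_t map = 0;

        for (unsigned acc = 0; acc < 8; acc++) {
            bool fault = false;

            if ((pfec & PFERR_WRITE_MASK) && !(acc & ACC_WRITE_MASK))
                fault = true;
            if ((pfec & PFERR_USER_MASK) && !(acc & ACC_USER_MASK))
                fault = true;
            if ((pfec & PFERR_FETCH_MASK) && !(acc & ACC_EXEC_MASK))
                fault = true;
            if (fault)
                map |= (uint8_t)(1u << acc);
        }
        t->permissions[byte] = map;
    }
}

/* Hot path: one shift, one load, one mask. */
static bool permission_fault_sketch(const struct perm_table *t,
                                    uint32_t pfec, unsigned pte_access)
{
    assert(pte_access < 8);
    return (t->permissions[pfec_byte_index(pfec)] >> pte_access) & 1;
}
```

The split mirrors the argument in this thread: all the conditional logic sits in the table fill, which runs rarely, so the per-access lookup stays branch-free.
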
>
>> But whether PKEY is enabled or not
>> depends entirely on the CPU environment, and we should _always_
>> check PKEY even if PFEC_PKEY is not set.
>>
>> As PKEY is not enabled on the soft MMU, gva_to_gpa requests mostly come from
>> within KVM, which means we should always set PFEC.PKEY for all gva_to_gpa requests.
>> Wasting a bit is really unnecessary.
>>
>> And it is always better to move more workload from permission_fault() to
>> update_permission_bitmask() as the former is much hotter than the latter.
>
> I agree, but I'm not sure why you say that adding a bit adds more work
> to permission_fault().
A branch to check PFEC.PKEY, which is not well predicted on the soft MMU. (With
EPT it should always be set, as the page table walk is done in software;
so if we only consider EPT, we can assume it is always true.)
>
> Adding a bit lets us skip CR4.PKE and EFER.LMA checks in
> permission_fault() and in all gva_to_gpa() callers.
The point is when we can clear this bit to skip these checks. We should
_always_ check PKEY even if PFEC.PKEY = 0, because:
1) all gva_to_gpa() calls issued by KVM should always check PKEY. This is the
   EPT-only case.
2) if the feature is enabled on the soft MMU, the shadow page table may change
   its behavior; for example, an MMIO access causes a reserved-bit #PF, which
   may clear PFEC.PKEY.
And skipping these checks is not really necessary, as we can take them into
account when we update the bitmask.
>
> So my proposal is to compute the "effective" PKRU bits (i.e. extract the
> relevant AD and WD bits, and mask away WD if irrelevant) in
> update_permission_bitmask(), and add PFERR_PK_MASK to the error code if
> they are nonzero.
>
> PFERR_PK_MASK must be computed in permission_fault(). It's a runtime
> condition that is not known beforehand.
>
Yes, you are right.
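
For reference, the proposal quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `pk_ctx` struct and both function names are hypothetical, the logic is written with explicit branches for readability rather than as a precomputed mask, and the rules it encodes (AD and WD are two bits per key in PKRU, only U=1 pages and data accesses are subject to protection keys, WD applies to supervisor writes only when CR0.WP=1) reflect my reading of the discussion rather than the final patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PFERR_WRITE_MASK (1u << 1)
#define PFERR_USER_MASK  (1u << 2)
#define PFERR_FETCH_MASK (1u << 4)
#define PFERR_PK_MASK    (1u << 5)

#define PKRU_AD 1u   /* access-disable bit for a key */
#define PKRU_WD 2u   /* write-disable bit for a key */

/* Hypothetical snapshot of the vCPU state that gates protection keys. */
struct pk_ctx {
    bool cr4_pke;
    bool long_mode;   /* EFER.LMA */
    bool cr0_wp;
};

/* Extract the AD/WD bits for this key and mask away whatever is
 * irrelevant for this particular access. */
static uint32_t effective_pkru_bits(const struct pk_ctx *c, uint32_t pkru,
                                    unsigned pkey, uint32_t pfec,
                                    bool user_page)
{
    assert(pkey < 16);

    if (!c->cr4_pke || !c->long_mode)
        return 0;                       /* feature not active */
    if (!user_page)
        return 0;                       /* only U=1 pages are subject to PKRU */
    if (pfec & PFERR_FETCH_MASK)
        return 0;                       /* instruction fetches are exempt */

    uint32_t bits = (pkru >> (2 * pkey)) & 3;
    bool write = (pfec & PFERR_WRITE_MASK) != 0;
    bool user = (pfec & PFERR_USER_MASK) != 0;

    /* WD is irrelevant for reads, and for supervisor writes when WP=0. */
    if (!write || (!user && !c->cr0_wp))
        bits &= ~PKRU_WD;
    return bits;
}

/* Runtime part: PFERR_PK_MASK is added only if some bit survived. */
static uint32_t pk_error_code(uint32_t effective_bits)
{
    return effective_bits ? PFERR_PK_MASK : 0;
}
```

Everything except the PKRU read and the final nonzero test depends only on CR0/CR4/EFER and the access type, which is why it could be folded into the bitmask update instead of being evaluated on every access.
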