From mboxrd@z Thu Jan  1 00:00:00 1970
From: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Subject: Re: [PATCH V4 3/7] KVM, pkeys: update memeory permission bitmask for
 pkeys
Date: Wed, 9 Mar 2016 13:03:33 +0800
Message-ID: <56DFAEA5.9080604@linux.intel.com>
References: <1457177252-7577-1-git-send-email-huaitong.han@intel.com>
 <1457177252-7577-4-git-send-email-huaitong.han@intel.com>
 <56DBDF57.4030607@linux.intel.com> <56DCB9BF.4020904@redhat.com>
 <56DE80B8.40900@linux.intel.com> <56DE8D7D.5010302@redhat.com>
 <56DE991E.8070007@linux.intel.com> <56DEA2FA.70302@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Paolo Bonzini <pbonzini@redhat.com>,
	Huaitong Han <huaitong.han@intel.com>, gleb@kernel.org
Return-path: <kvm-owner@vger.kernel.org>
Received: from mga14.intel.com ([192.55.52.115]:24832 "EHLO mga14.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750710AbcCIFD7 (ORCPT <rfc822;kvm@vger.kernel.org>);
	Wed, 9 Mar 2016 00:03:59 -0500
In-Reply-To: <56DEA2FA.70302@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>


On 03/08/2016 06:01 PM, Paolo Bonzini wrote:
>
>
> On 08/03/2016 10:19, Xiao Guangrong wrote:
>>>>
>>>> As PKEY is not enabled on softmmu, the gva_to_gpa mostly comes from
>>>> internal
>>>> KVM, that means we should always set PFEC.PKEY for all the gva_to_gpa
>>>> request.
>>>> Wasting a bit is really unnecessary.
>>>>
>>>> And it is always better to move more workload from permission_fault() to
>>>> update_permission_bitmask() as the former is much hotter than the
>>>> latter.
>>>
>>> I agree, but I'm not sure why you say that adding a bits adds more work
>>> to permission_fault().
>>
>> A branch to check PFEC.PKEY, which is not well predictable on soft mmu. (It
>> should always be set in EPT as the page table walking is done by software,
>> however, if we only consider EPT we can assume it is always true).
>
> Ok, I agree we shouldn't add such a branch.
>
> It's really messy that PKERR_PK_MASK is computed at runtime.  It means
> we need to check U=1 in permission_fault, and we need to rethink all of
> the handling of PKERR_PK_MASK in fact.  I think we should extend the
> algorithm from earlier in the thread to handle the AD bit too:
>
> 	/* Bit 0: should PKRU.AD be considered if PFEC=0 or 1?
> 	 * Bit 1: should PKRU.WD be considered if PFEC=0 or 1?
> 	 * Bit 2: should PKRU.AD be considered if PFEC=2 or 3?
> 	 * Bit 3: should PKRU.WD be considered if PFEC=2 or 3?
> 	 * Ignore PKRU if U=0.
> 	 * etc.
> 	 */
> 	u32 pkru_mask;
>
> In update_permission_bitmask, the bits of mmu->pkru_mask only depend on
> the byte, not on the bit.
>
> Bits 0, 2, ... are computed from pku, i.e. cr4_pku && long_mode && !ff.
> Bits 1, 3, ... are computed from pku && (wf &&
> (is_write_protection(vcpu) || uf)).
>
> The permissions[] array can keep 16 entries, because PFERR_PK_MASK is
> never passed to permission_fault (like PFERR_RSVD_MASK).  The
> permission_fault code is something like:
>
> 	WARN_ON(pfec & PFERR_PK_MASK);
> 	pkru_mask = pte_access & ACC_USER_MASK ? mmu->pkru_mask : 0;
>          pkru_bits &= pkru_mask >> (pfec & ~1);

                                       ^ it should be pfec >> 1? As we
never take PFEC.P into account?

Actually pkru_mask can be u16 since only three bits that are PFEC.W, PFEC.U
and PFEC.F need to be taken into account and every offset occupies 2 bits.

> 	pfec |= -pkru_bits & PFERR_PK_MASK;
>
> What do you think?

Very good. :)

>
> I still believe that the above checks should be wrapped in a
>
> 	if (unlikely(mmu->pkru_mask) {
> 		...
> 	}
>
> It should be predicted well and removes the potential performance cost
> of branchless code on !PKE OSes.

I am considering we can use 'pte_access & ACC_USER_MASK' replacing PFEC.RSVD
in your algorithm (then pkru_mask is 32 bit), permission_fault() is like this:

WARN_ON(pfec & PFERR_PK_MASK);

offset = pfec & (PFERR_WRITE_MASK | PFERR_USER_MASK);
offset |=  !!(pte_access & ACC_USER_MASK) << PFERR_RSVD_BIT;
offset >>= 1;

pkru_bits &= pkru_mask >> offset;
pfec |= -pkru_bits & PFERR_PK_MASK;

Now, this is no branch and no much overload introduced in update_permission_bitmask().