From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [patch] kvm with mmu notifier v18
Date: Fri, 06 Jun 2008 11:49:23 +0300
Message-ID: <4848FA13.6040204@qumranet.com>
References: <20080605002626.GA15502@duo.random> <48480C36.6050309@qumranet.com> <20080605164717.GH15502@duo.random>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Andrea Arcangeli
Return-path:
Received: from il.qumranet.com ([212.179.150.194]:26063 "EHLO il.qumranet.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751709AbYFFItT (ORCPT ); Fri, 6 Jun 2008 04:49:19 -0400
In-Reply-To: <20080605164717.GH15502@duo.random>
Sender: kvm-owner@vger.kernel.org
List-ID:

Andrea Arcangeli wrote:
>
>> One day we want to sort the slots according to size. We'll need better
>> locking then (rcu, likely).
>>
>
> I think it's more interesting to sort them on their start/end gfn
> address. A prio_tree would be the optimal structure for me to use in
> the mmu notifier invalidates, as I could ask the prio tree to show me,
> in O(log(N)) (N = number of slots), all the slots that overlap with the
> invalidated start/end range. However, I'm afraid there aren't enough
> slots for this to matter... but OTOH there aren't frequent
> modifications either, so it may be a micro-optimization (if there were
> frequent modifications with such a small number of slots, it would
> likely be slower than a list).
>

There are only two interesting slots: 1G-end_of_mem and 0-pci_hole. A
sorted array gives an average of fewer than two lookups to find the
slot, with the smallest possible constant. Any other algorithm will take
more lookups with a far higher constant. Sometimes linear search is the
best algorithm.
>>> @@ -1235,9 +1340,9 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
>>>  	int i;
>>>  	struct kvm_mmu_page *sp;
>>>
>>> -	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
>>> -		return;
>>>  	spin_lock(&vcpu->kvm->mmu_lock);
>>> +	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
>>> +		goto out;
>>>
>>
>> vcpu-> stuff is protected by the vcpu mutex.
>>
>
> That was to run it in ->release, but I'm undoing that as I'm removing
> release.
>

Surely ->release can't happen while vcpus are still running?

>> Please move the registration to virt/kvm/kvm_main.c, and provide
>> stubs for non-x86. This is definitely something that we want to do
>> cross-arch (except s390, which has it in hardware).
>>
>
> Sorry, but I did this originally in kvm/kvm_main.c and I got complaints
> from s390 that they didn't want the common code polluted with stuff
> s390 doesn't need. So what should I do?
>

x86, ia64, and ppc all need mmu notifiers, as will other new
architectures, so this had better be in common code. We can #ifdef it so
s390 doesn't suffer a performance impact.

> +
> +static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp)
> +{
> +	u64 *spte;
> +	int young = 0;
> +
> +	spte = rmap_next(kvm, rmapp, NULL);
> +	while (spte) {
> +		int _young;
> +		u64 _spte = *spte;
> +		BUG_ON(!(_spte & PT_PRESENT_MASK));
> +		_young = _spte & PT_ACCESSED_MASK;
>

I forgot this last round: EPT doesn't have PT_ACCESSED_MASK, so you're
reading (and clearing) some random bit.

We can easily qualify the test by looking at shadow_accessed_mask; if it
is zero, the hardware doesn't have a young bit.

There's a bigger question of what to do in the EPT case. We can always
return true, or return true and also unmap. The first means we lie to
the vm; the second means we take unnecessary exits to maintain age
information.

-- 
Do not meddle in the internals of kernels, for they are subtle and
quick to panic.