Re: [patch] kvm with mmu notifier v18

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: Avi Kivity <avi@qumranet.com>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: kvm@vger.kernel.org
Subject: Re: [patch] kvm with mmu notifier v18
Date: Fri, 06 Jun 2008 19:37:48 +0300	[thread overview]
Message-ID: <484967DC.901@qumranet.com> (raw)
In-Reply-To: <20080606125019.GN15502@duo.random>

Andrea Arcangeli wrote:
> On Fri, Jun 06, 2008 at 11:49:23AM +0300, Avi Kivity wrote:
>   
>> Andrea Arcangeli wrote:
>>     
>>>> One day we want to sort the slots according to size.  We'll need better 
>>>> locking then (rcu, likely).
>>>>     
>>>>         
>>> I think it's more interesting to sort them on their start/end gfn
>>> address, prio_tree would be the optimal structure for me to use in the
>>> mmu notifier invalidates as I could ask the prio tree to show me in
>>> O(log(N)) (N number of slots) all the slots that overlap with the
>>> invalidated start/end range. However I'm afraid there aren't enough
>>> slots for this to matter... but OTOH there aren't frequent
>>> modifications either, so it may be a microoptimization (if there were
>>> frequent modifications with such a small number of slots it likely
>>> would be slower than a list).
>>>   
>>>       
>> There are only two interesting slots, 1G-end_of_mem and 0-pci_hole.  A 
>> sorted array gives an average of less than two lookups to find the slot, 
>> with the smallest possible constant.  Any other algorithm will give more 
>> lookups with a far higher constant.
>>
>> Sometimes linear search is the best algorithm.
>>     
>
> But we can't break the loop after the first match! I think you're not
> taking into account the aliasing done with overlapping gfn in the
> memslots (which is not forbidden apparently and you planned to
> obsolete the explicit aliasing logic with it)

For the gfn->hva (the common case) we can break immediately.  For the 
hva->gfn, in general we cannot, but we can add an "unaliased" flag to 
the memslot and set it for slots which do not have aliases.  That makes 
the loop terminate soon again.

> . Only prio_tree can
> reduce the number of walks in the most optimal way because of the gfn
> overlaps and search by start-end address. Sorting by size won't be
> useful because we can't break the loop after the first found match. If
> prio tree is too heavyweight the next best would be sorting with a
> single-index tree like rbtree by start address as it'll at least
> eliminate all the memslots that have a start address higher than the
> end of the invalidate range. But prio_tree will eliminate them all in
> O(log(N)) so on the paper prio_tree is best (but perhaps in practice
> rbtree is better, dunno).
>   

Any pointer-based data structure is bound to be much slower than a list 
with such a small number of elements.

btw, on 64-bit userspace we can arrange the entire physical address 
space in a contiguous region (some of it unmapped) and have just one 
slot for the guest.

>> I forgot, last round:  EPT doesn't have PT_ACCESSED_MASK, so you're reading 
>> (and clearing) some random bit.  We can easily qualify the test by looking 
>> at shadow_accessed_mask; if zero, the hardware doesn't have a young bit.
>>
>> There's a bigger question of what to do in the case of EPT.  We can always 
>> return true, or return true and also unmap.  The first means we lie to the 
>> vm, the second means we take unnecessary exits to maintain age information.
>>     
>
> If there's no access bit (and we can't use the guest pte access bit
> because EPT represent a physical page and not a guest pte anymore)
> we're forced to return 0 without doing anything at all. Returning 1 is
> unsafe as it'd pin the page in host ram.
>
> If there's no access bit all we can do is to wait the VM to unmap the
> page, mark the spte not present during the invalidate_page mmu
> notifier, and wait the guest to trigger a minor fault from swapcache.
>   

Okay.  It's sad, but I don't see any choice.

If anyone from Intel is listening, please give us an accessed bit (and a 
dirty bit, too).

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

next prev parent reply	other threads:[~2008-06-06 16:37 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-05  0:26 [patch] kvm with mmu notifier v18 Andrea Arcangeli
2008-06-05 15:54 ` Avi Kivity
2008-06-05 16:47   ` Andrea Arcangeli
2008-06-06  8:49     ` Avi Kivity
2008-06-06 12:50       ` Andrea Arcangeli
2008-06-06 16:37         ` Avi Kivity [this message]
2008-06-06 17:37           ` Andrea Arcangeli
2008-06-06 20:09             ` Avi Kivity
2008-06-10 20:41     ` Marcelo Tosatti
2008-06-12  1:33       ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=484967DC.901@qumranet.com \
    --to=avi@qumranet.com \
    --cc=andrea@qumranet.com \
    --cc=kvm@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox