public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@qumranet.com>
To: Avi Kivity <avi@qumranet.com>
Cc: kvm@vger.kernel.org
Subject: Re: [patch] kvm with mmu notifier v18
Date: Fri, 6 Jun 2008 14:50:19 +0200	[thread overview]
Message-ID: <20080606125019.GN15502@duo.random> (raw)
In-Reply-To: <4848FA13.6040204@qumranet.com>

On Fri, Jun 06, 2008 at 11:49:23AM +0300, Avi Kivity wrote:
> Andrea Arcangeli wrote:
>>
>>> One day we want to sort the slots according to size.  We'll need better 
>>> locking then (rcu, likely).
>>>     
>>
>> I think it's more interesting to sort them on their start/end gfn
>> address, prio_tree would be the optimal structure for me to use in the
>> mmu notifier invalidates as I could ask the prio tree to show me in
>> O(log(N)) (N number of slots) all the slots that overlap with the
>> invalidated start/end range. However I'm afraid there aren't enough
>> slots for this to matter... but OTOH there aren't frequent
>> modifications either, so it may be a microoptimization (if there were
>> frequent modifications with such a small number of slots it likely
>> would be slower than a list).
>>   
>
> There are only two interesting slots, 1G-end_of_mem and 0-pci_hole.  A 
> sorted array gives an average of less than two lookups to find the slot, 
> with the smallest possible constant.  Any other algorithm will give more 
> lookups with a far higher constant.
>
> Sometimes linear search is the best algorithm.

But we can't break the loop after the first match! I think you're not
taking into account the aliasing done with overlapping gfn in the
memslots (which is not forbidden apparently and you planned to
obsolete the explicit aliasing logic with it). Only prio_tree can
reduce the number of walks in the most optimal way because of the gfn
overlaps and search by start-end address. Sorting by size won't be
useful because we can't break the loop after the first found match. If
prio tree is too heavyweight the next best would be sorting with a
single-index tree like rbtree by start address as it'll at least
eliminate all the memslots that have a start address higher than the
end of the invalidate range. But prio_tree will eliminate them all in
O(log(N)) so on the paper prio_tree is best (but perhaps in practice
rbtree is better, dunno).

> Surely ->release can't happen while vcpus are still running?

Exactly.

> x86, ia64, and ppc all need mmu notifiers as well as other new 
> architectures, so this better be in common code.  We can #ifdef it so s390 
> doesn't suffer a performance impact.

Ok, that's fine with me (I'll undo the changes post the s390 comment ;).

> I forgot, last round:  EPT doesn't have PT_ACCESSED_MASK, so you're reading 
> (and clearing) some random bit.  We can easily qualify the test by looking 
> at shadow_accessed_mask; if zero, the hardware doesn't have a young bit.
>
> There's a bigger question of what to do in the case of EPT.  We can always 
> return true, or return true and also unmap.  The first means we lie to the 
> vm, the second means we take unnecessary exits to maintain age information.

If there's no access bit (and we can't use the guest pte access bit
because EPT represent a physical page and not a guest pte anymore)
we're forced to return 0 without doing anything at all. Returning 1 is
unsafe as it'd pin the page in host ram.

If there's no access bit all we can do is to wait the VM to unmap the
page, mark the spte not present during the invalidate_page mmu
notifier, and wait the guest to trigger a minor fault from swapcache.

  reply	other threads:[~2008-06-06 12:50 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-05  0:26 [patch] kvm with mmu notifier v18 Andrea Arcangeli
2008-06-05 15:54 ` Avi Kivity
2008-06-05 16:47   ` Andrea Arcangeli
2008-06-06  8:49     ` Avi Kivity
2008-06-06 12:50       ` Andrea Arcangeli [this message]
2008-06-06 16:37         ` Avi Kivity
2008-06-06 17:37           ` Andrea Arcangeli
2008-06-06 20:09             ` Avi Kivity
2008-06-10 20:41     ` Marcelo Tosatti
2008-06-12  1:33       ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080606125019.GN15502@duo.random \
    --to=andrea@qumranet.com \
    --cc=avi@qumranet.com \
    --cc=kvm@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox