From: Avi Kivity <avi@qumranet.com>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: kvm@vger.kernel.org
Subject: Re: [patch] kvm with mmu notifier v18
Date: Fri, 06 Jun 2008 19:37:48 +0300 [thread overview]
Message-ID: <484967DC.901@qumranet.com> (raw)
In-Reply-To: <20080606125019.GN15502@duo.random>
Andrea Arcangeli wrote:
> On Fri, Jun 06, 2008 at 11:49:23AM +0300, Avi Kivity wrote:
>
>> Andrea Arcangeli wrote:
>>
>>>> One day we want to sort the slots according to size. We'll need better
>>>> locking then (rcu, likely).
>>>>
>>>>
>>> I think it's more interesting to sort them on their start/end gfn
>>> address, prio_tree would be the optimal structure for me to use in the
>>> mmu notifier invalidates as I could ask the prio tree to show me in
>>> O(log(N)) (N number of slots) all the slots that overlap with the
>>> invalidated start/end range. However I'm afraid there aren't enough
>>> slots for this to matter... but OTOH there aren't frequent
>>> modifications either, so it may be a microoptimization (if there were
>>> frequent modifications with such a small number of slots it likely
>>> would be slower than a list).
>>>
>>>
>> There are only two interesting slots, 1G-end_of_mem and 0-pci_hole. A
>> sorted array gives an average of less than two lookups to find the slot,
>> with the smallest possible constant. Any other algorithm will give more
>> lookups with a far higher constant.
>>
>> Sometimes linear search is the best algorithm.
>>
>
> But we can't break the loop after the first match! I think you're not
> taking into account the aliasing done with overlapping gfn in the
> memslots (which is not forbidden apparently and you planned to
> obsolete the explicit aliasing logic with it)
For the gfn->hva (the common case) we can break immediately. For the
hva->gfn, in general we cannot, but we can add an "unaliased" flag to
the memslot and set it for slots which do not have aliases. That makes
the loop terminate soon again.
> . Only prio_tree can
> reduce the number of walks in the most optimal way because of the gfn
> overlaps and search by start-end address. Sorting by size won't be
> useful because we can't break the loop after the first found match. If
> prio tree is too heavyweight the next best would be sorting with a
> single-index tree like rbtree by start address as it'll at least
> eliminate all the memslots that have a start address higher than the
> end of the invalidate range. But prio_tree will eliminate them all in
> O(log(N)) so on the paper prio_tree is best (but perhaps in practice
> rbtree is better, dunno).
>
Any pointer-based data structure is bound to be much slower than a list
with such a small number of elements.
btw, on 64-bit userspace we can arrange the entire physical address
space in a contiguous region (some of it unmapped) and have just one
slot for the guest.
>> I forgot, last round: EPT doesn't have PT_ACCESSED_MASK, so you're reading
>> (and clearing) some random bit. We can easily qualify the test by looking
>> at shadow_accessed_mask; if zero, the hardware doesn't have a young bit.
>>
>> There's a bigger question of what to do in the case of EPT. We can always
>> return true, or return true and also unmap. The first means we lie to the
>> vm, the second means we take unnecessary exits to maintain age information.
>>
>
> If there's no access bit (and we can't use the guest pte access bit
> because EPT represent a physical page and not a guest pte anymore)
> we're forced to return 0 without doing anything at all. Returning 1 is
> unsafe as it'd pin the page in host ram.
>
> If there's no access bit all we can do is to wait the VM to unmap the
> page, mark the spte not present during the invalidate_page mmu
> notifier, and wait the guest to trigger a minor fault from swapcache.
>
Okay. It's sad, but I don't see any choice.
If anyone from Intel is listening, please give us an accessed bit (and a
dirty bit, too).
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
next prev parent reply other threads:[~2008-06-06 16:37 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-06-05 0:26 [patch] kvm with mmu notifier v18 Andrea Arcangeli
2008-06-05 15:54 ` Avi Kivity
2008-06-05 16:47 ` Andrea Arcangeli
2008-06-06 8:49 ` Avi Kivity
2008-06-06 12:50 ` Andrea Arcangeli
2008-06-06 16:37 ` Avi Kivity [this message]
2008-06-06 17:37 ` Andrea Arcangeli
2008-06-06 20:09 ` Avi Kivity
2008-06-10 20:41 ` Marcelo Tosatti
2008-06-12 1:33 ` Andrea Arcangeli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=484967DC.901@qumranet.com \
--to=avi@qumranet.com \
--cc=andrea@qumranet.com \
--cc=kvm@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox