From: Andrea Arcangeli <andrea-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
To: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
Subject: Re: [PATCH] kvm swapping with mmu notifiers + age_page
Date: Tue, 22 Jan 2008 18:41:14 +0100
Message-ID: <20080122174114.GI7331@v2.random>
In-Reply-To: <47960371.8020709-atKUWr5tajBWk0Htik3J/w@public.gmane.org>

On Tue, Jan 22, 2008 at 04:53:37PM +0200, Avi Kivity wrote:
> Andrea Arcangeli wrote:
>> On Tue, Jan 22, 2008 at 04:08:16PM +0200, Avi Kivity wrote:
>>
>>> Andrea Arcangeli wrote:
>>>
>>>> This is the same as before but it uses the age_page callback to
>>>> prevent the guest OS working set to be swapped out. It works well here
>>>> so far. This depends on the memslot locking with mmu lock patch and on
>>>> the mmu notifiers #v3 patch that I'll post in CC with linux-mm shortly
>>>> that implements the age_page callback and that changes follow_page to
>>>> set the young bit in the pte instead of setting the referenced bit (so
>>>> the age_page will be called again later when the VM clears the young
>>>> bit).
>>>>
>>>> +static void unmap_spte(struct kvm *kvm, u64 *spte)
>>>> +{
>>>> + struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >>
>>>> PAGE_SHIFT);
>>>> + get_page(page);
>>>> + rmap_remove(kvm, spte);
>>>> + set_shadow_pte(spte, shadow_trap_nonpresent_pte);
>>>> + kvm_flush_remote_tlbs(kvm);
>>>> + __free_page(page);
>>>> +}
>>>>
>>> Why is get_page()/__free_page() needed here? Isn't kvm_release_page_*()
>>> sufficient?
>>>
>>
>> The other-cpus-tlb have to be flushed _before_ the page is visible in
>> the host kernel freelist, otherwise other host-cpus with tlbs still
>> mapping the page with write-access would be able to modify the page
>> even after it's queued in the freelist.
>
> Right. But doesn't this apply to other callers of rmap_remove()? Perhaps
> we need to put the flush in set_spte() or rmap_remove() and
> rmap_write_protect().
>
> Oh, rmap_write_protect() already has the flush.

rmap_write_protect is the only obviously safe one: it doesn't
decrease the reference count, and it flushes the tlb only to get rid
of any write-enabled tlb entry.

The problem is only with the rmap_remove callers.

Ironically, I think invalidate_page is ok with flushing the tlb after
put_page, because the caller of ptep_clear_flush holds a pin on the
page while ptep_clear_flush is invoked.

invalidate_range is not ok with flushing the tlb _after_ put_page.

All other rmap_remove callers must take into account that after
rmap_remove returns, in between put_page and the tlb flush, another
cpu may be running in the VM and free the page the moment the pin on
the page is gone. This is especially true for readonly swapcache,
which doesn't need to be swapped out before it is put in the freelist.

So yes, it may be a generic race for the rmap_remove callers.

I'm not exactly sure why I was getting crashes w/o doing
get_page/tlbflush/__free_page; the only logical explanation at this
point is invalidate_range.

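
To make the ordering argument concrete, here is a minimal single-threaded
userspace model of the race, not kernel code. Everything in it (struct
model_page, the model_* helpers, the starting refcount of 2 for "one pin
from the spte, one from the host VM") is a hypothetical stand-in chosen
for illustration. It only shows why taking an extra pin before
rmap_remove and dropping it after the tlb flush keeps the page off the
freelist while a remote tlb entry may still map it.

```c
#include <stdbool.h>

/* Userspace model only: these are NOT the kernel's types or helpers. */
struct model_page {
    int refcount;           /* stands in for the page reference count */
    bool stale_tlb;         /* a remote cpu may still hold a writable tlb entry */
    bool on_freelist;       /* page has been handed back to the allocator */
    bool freed_while_stale; /* the bug: freed while a stale tlb entry existed */
};

static void model_get(struct model_page *p)
{
    p->refcount++;
}

/* Drop one reference; the last reference puts the page on the freelist. */
static void model_put(struct model_page *p)
{
    if (--p->refcount == 0) {
        p->on_freelist = true;
        if (p->stale_tlb)
            p->freed_while_stale = true;
    }
}

static void model_flush_remote_tlbs(struct model_page *p)
{
    p->stale_tlb = false;
}

/* Stands in for another host cpu dropping the host VM's reference. */
static void model_other_cpu_reclaim(struct model_page *p)
{
    model_put(p);
}

/* Assumed starting state: one pin from the spte, one from the host VM. */
static struct model_page model_mapped_page(void)
{
    struct model_page p = { 2, true, false, false };
    return p;
}

/* Ordering under suspicion: drop the pin first, flush afterwards. */
static bool unsafe_unmap(void)
{
    struct model_page p = model_mapped_page();

    model_put(&p);               /* rmap_remove drops the spte's pin */
    model_other_cpu_reclaim(&p); /* race window: last reference goes away */
    model_flush_remote_tlbs(&p); /* too late, page is already free */
    return p.freed_while_stale;
}

/* Ordering used by unmap_spte: extra pin, remove, flush, then release. */
static bool safe_unmap(void)
{
    struct model_page p = model_mapped_page();

    model_get(&p);               /* extra pin held across the flush */
    model_put(&p);               /* rmap_remove drops the spte's pin */
    model_other_cpu_reclaim(&p); /* same race window, page stays pinned */
    model_flush_remote_tlbs(&p);
    model_put(&p);               /* release the extra pin after the flush */
    return p.freed_while_stale;
}
```

In this model unsafe_unmap() reports that the page reached the freelist
with a stale tlb entry outstanding, while safe_unmap() does not. A real
kernel has concurrency and memory ordering that this sketch ignores.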
> I'm afraid I don't really understand the difference in semantics between
> put_page() and __free_page(). Maybe we need to switch kvm_release_page_*()
> to __free_page()?

put_page and __free_page will both work fine in practice for kvm.
__free_page is faster, so yes, I think kvm_release_page_*() should be
changed to use __free_page, but this is only a microoptimization. The
only real issue is the tlb flush on smp: whether or not it can happen
after put_page/__free_page.