From: Avi Kivity <avi@qumranet.com>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: kvm-devel@lists.sourceforge.net, Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: [PATCH] KVM: MMU: Fix rmap_remove() race
Date: Mon, 31 Mar 2008 09:35:00 +0300 [thread overview]
Message-ID: <47F08614.8080501@qumranet.com> (raw)
In-Reply-To: <20080328140113.GA9215@duo.random>
Andrea Arcangeli wrote:
> I thought some more about this.
>
> BTW, for completeness: normally (with exception of vm_destroy) the
> put_page run by rmap_remove won't be the last one, but still the page
> can go in the freelist a moment after put_page runs (leading to the
> same problem). The VM is prevented to free the page while it's pinned,
> but the VM can do the final free on the page before rmap_remove
> returns. And w/o mmu notifiers there's no serialization that makes the
> core VM stop on the mmu_lock to wait the tlb flush to run, before the
> VM finally executes the last free of the page. mmu notifiers fixes
> this race for regular swapping as the core VM will block on the
> mmu_lock waiting the tlb flush (for this to work the tlb flush must
> always happen inside the mmu_lock unless the order is exactly "spte =
> nonpresent; tlbflush; put_page"). A VM_LOCKED on the vmas backing the
> anonymous memory will fix this for regolar swapping too (I did
> something like this in a patch at the end as a band-aid).
>
> But thinking more the moment we pretend to allow anybody to randomly
> __munmap__ any part of the guest physical address space like for
> ballooning while the guest runs (think unprivileged user owning
> /dev/kvm and running munmap at will), not even VM_LOCKED (ignored by
> munmap) and not even the mmu notifiers, can prevent the page to be
> queued in the kernel freelists immediately after rmap_remove returns,
> this is because rmap_remove may run in a different host-cpu in between
> unmap_vmas and invalidate_range_end.
>
> Running the ioctl before munmap won't help to prevent the race as the
> guest can still re-instantiate the sptes with page faults between the
> ioctl and munmap.
>
> However we've invalidate_range_begin. If we invalidate all sptes in
> invalidate_range_begin and we hold off the page faults in between
> _begin/_end, then we can fix this with the mmu notifiers.
>
This can be done by taking mmu_lock in _begin and releasing it in _end,
unless there's a lock dependency issue.
> So I think I can allow munmap safely (to unprivileged user too) by
> using _range_begin somehow. For this to work any relevant tlb flush
> must happen inside the _same_ mmu_lock critical section where
> spte=nonpresent and rmap_remove run too (thanks to the mmu_lock the
> ordering of those operations won't matter anymore, and no queue will
> be needed).
>
>
I don't understand your conclusion: you prove that mlock() is not good
enough, then post a patch to do it?
I'll take another shot at fixing rmap_remove(), I don't like to cripple
swapping for 2.6.25 (though it will only be really dependable in .26).
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
next prev parent reply other threads:[~2008-03-31 6:35 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-03-26 15:02 [PATCH] KVM: MMU: Fix rmap_remove() race Avi Kivity
2008-03-26 15:15 ` Avi Kivity
2008-03-26 17:51 ` Marcelo Tosatti
2008-03-26 18:12 ` Andrea Arcangeli
2008-03-26 19:01 ` Marcelo Tosatti
2008-03-27 8:01 ` Avi Kivity
2008-03-26 19:22 ` Andrea Arcangeli
2008-03-26 19:27 ` Andrea Arcangeli
2008-03-27 8:06 ` Avi Kivity
2008-03-27 8:11 ` Avi Kivity
2008-03-27 13:52 ` Andrea Arcangeli
2008-03-27 13:56 ` Avi Kivity
2008-03-27 14:26 ` Andrea Arcangeli
2008-03-27 14:35 ` Avi Kivity
2008-03-27 14:50 ` Andrea Arcangeli
2008-03-27 14:56 ` Avi Kivity
2008-03-28 14:01 ` Andrea Arcangeli
2008-03-28 20:07 ` Andrea Arcangeli
2008-03-31 6:35 ` Avi Kivity [this message]
2008-03-31 9:25 ` Andrea Arcangeli
2008-03-27 15:26 ` Andi Kleen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=47F08614.8080501@qumranet.com \
--to=avi@qumranet.com \
--cc=andrea@qumranet.com \
--cc=kvm-devel@lists.sourceforge.net \
--cc=mtosatti@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox