From: Avi Kivity <avi@qumranet.com>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: kvm-devel@lists.sourceforge.net, Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: [PATCH] KVM: MMU: Fix rmap_remove() race
Date: Mon, 31 Mar 2008 09:35:00 +0300 [thread overview]
Message-ID: <47F08614.8080501@qumranet.com> (raw)
In-Reply-To: <20080328140113.GA9215@duo.random>
Andrea Arcangeli wrote:
> I thought some more about this.
>
> BTW, for completeness: normally (with exception of vm_destroy) the
> put_page run by rmap_remove won't be the last one, but still the page
> can go in the freelist a moment after put_page runs (leading to the
> same problem). The VM is prevented to free the page while it's pinned,
> but the VM can do the final free on the page before rmap_remove
> returns. And w/o mmu notifiers there's no serialization that makes the
> core VM stop on the mmu_lock to wait the tlb flush to run, before the
> VM finally executes the last free of the page. mmu notifiers fixes
> this race for regular swapping as the core VM will block on the
> mmu_lock waiting the tlb flush (for this to work the tlb flush must
> always happen inside the mmu_lock unless the order is exactly "spte =
> nonpresent; tlbflush; put_page"). A VM_LOCKED on the vmas backing the
> anonymous memory will fix this for regolar swapping too (I did
> something like this in a patch at the end as a band-aid).
>
> But thinking more the moment we pretend to allow anybody to randomly
> __munmap__ any part of the guest physical address space like for
> ballooning while the guest runs (think unprivileged user owning
> /dev/kvm and running munmap at will), not even VM_LOCKED (ignored by
> munmap) and not even the mmu notifiers, can prevent the page to be
> queued in the kernel freelists immediately after rmap_remove returns,
> this is because rmap_remove may run in a different host-cpu in between
> unmap_vmas and invalidate_range_end.
>
> Running the ioctl before munmap won't help to prevent the race as the
> guest can still re-instantiate the sptes with page faults between the
> ioctl and munmap.
>
> However we've invalidate_range_begin. If we invalidate all sptes in
> invalidate_range_begin and we hold off the page faults in between
> _begin/_end, then we can fix this with the mmu notifiers.
>
This can be done by taking mmu_lock in _begin and releasing it in _end,
unless there's a lock dependency issue.
> So I think I can allow munmap safely (to unprivileged user too) by
> using _range_begin somehow. For this to work any relevant tlb flush
> must happen inside the _same_ mmu_lock critical section where
> spte=nonpresent and rmap_remove run too (thanks to the mmu_lock the
> ordering of those operations won't matter anymore, and no queue will
> be needed).
>
>
I don't understand your conclusion: you prove that mlock() is not good
enough, then post a patch to do it?
I'll take another shot at fixing rmap_remove(), I don't like to cripple
swapping for 2.6.25 (though it will only be really dependable in .26).
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
next prev parent reply other threads:[~2008-03-31 6:35 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-03-26 15:02 [PATCH] KVM: MMU: Fix rmap_remove() race Avi Kivity
2008-03-26 15:15 ` Avi Kivity
2008-03-26 17:51 ` Marcelo Tosatti
2008-03-26 18:12 ` Andrea Arcangeli
2008-03-26 19:01 ` Marcelo Tosatti
2008-03-27 8:01 ` Avi Kivity
2008-03-26 19:22 ` Andrea Arcangeli
2008-03-26 19:27 ` Andrea Arcangeli
2008-03-27 8:06 ` Avi Kivity
2008-03-27 8:11 ` Avi Kivity
2008-03-27 13:52 ` Andrea Arcangeli
2008-03-27 13:56 ` Avi Kivity
2008-03-27 14:26 ` Andrea Arcangeli
2008-03-27 14:35 ` Avi Kivity
2008-03-27 14:50 ` Andrea Arcangeli
2008-03-27 14:56 ` Avi Kivity
2008-03-28 14:01 ` Andrea Arcangeli
2008-03-28 20:07 ` Andrea Arcangeli
2008-03-31 6:35 ` Avi Kivity [this message]
2008-03-31 9:25 ` Andrea Arcangeli
2008-03-27 15:26 ` Andi Kleen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=47F08614.8080501@qumranet.com \
--to=avi@qumranet.com \
--cc=andrea@qumranet.com \
--cc=kvm-devel@lists.sourceforge.net \
--cc=mtosatti@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.