From: Andrea Arcangeli <aarcange@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>,
avi@redhat.com, kvm@vger.kernel.org,
xiaoguangrong@linux.vnet.ibm.com
Subject: Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock
Date: Tue, 14 Feb 2012 19:53:56 +0100 [thread overview]
Message-ID: <20120214185356.GQ9440@redhat.com> (raw)
In-Reply-To: <20120214172947.GA22362@amt.cnet>
On Tue, Feb 14, 2012 at 03:29:47PM -0200, Marcelo Tosatti wrote:
> The problem the patch is fixing is not related to page freeing, but
> rmap_write_protect. From 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d
Can't find the commit on kvm.git.
> (replace "A (get_dirty_log)" with "mmu_notifier_invalidate_page"):
>
>
> During protecting pages for dirty logging, other threads may also try
> to protect a page in mmu_sync_children() or kvm_mmu_get_page().
>
> In such a case, because get_dirty_log releases mmu_lock before flushing
> TLB's, the following race condition can happen:
>
> A (get_dirty_log) B (another thread)
>
> lock(mmu_lock)
> clear pte.w
> unlock(mmu_lock)
> lock(mmu_lock)
> pte.w is already cleared
> unlock(mmu_lock)
> skip TLB flush
Not sure which tree it is, but in kvm and upstream I see an
unconditional tlb flush here, not skip (both
kvm_mmu_slot_remove_write_access and kvm_mmu_rmap_write_protect). So I
assume this has been updated in your tree to eb conditional.
Also note kvm_mmu_rmap_write_protect, flushes outside of the mmu_lock
in the kvm_mmu_rmap_write_protect case (like in quoted description),
so two write_protect_slot in parallel against each other may not be
ok, but that may be enforced by design if qemu won't ever call that
ioctl from two different userland threads (it doesn't sounds security
related so it should be ok to enforce its safety by userland design).
> return
> ...
> TLB flush
>
> Though thread B assumes the page has already been protected when it
> returns, the remaining TLB entry will break that assumption.
Now I get the question of why not running the TLB flush inside the
mmu_lock only if the spte was writable :).
kvm_mmu_get_page as long as it only runs in the context of a kvm page
fault is ok, because the page fault would be inhibited by the mmu
notifier invalidates, so maybe it's safe.
mmu_sync_children seems to have a problem instead, in your tree
get_dirty_log also has an issue if it has been updated to skip the
flush on readonly sptes, like I guess.
Interesting how the spte is already non present, the page is just
being freed shortly later, but yet we still need to trigger write
faults synchronously and prevent other CPUs in guest mode to further
modify the page to avoid losing dirty bits updates or updates on
pagetables that maps pagetables in the not NPT/EPT case. If the page
was really only going to be freed it would be ok if the other cpus
would still write to it for a little longer until the page was freed.
Like I wrote in previous email, I was thinking if we'd change the mmu
notifier methods to do an unconditional flush, then every other flush
could also run outside of the mmu_lock. But then I didn't think enough
about this to be sure. My guess is we could move all flushes outside
the mmu_lock if we stop flushling the tlb conditonally to the current
spte values. It'd clearly be slower for an UP guest though :). Large
SMP guests might benefit, if that is feasible at all... It depends how
problematic the mmu_lock is on the large SMP guests and how much we're
saving by doing conditional TLB flushes.
next prev parent reply other threads:[~2012-02-14 18:54 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-10 6:28 [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock Takuya Yoshikawa
2012-02-10 6:29 ` [PATCH 2/2] KVM: MMU: Flush TLBs only once in invlpg() " Takuya Yoshikawa
2012-02-10 6:55 ` Xiao Guangrong
2012-02-10 7:21 ` Takuya Yoshikawa
2012-02-10 7:42 ` Xiao Guangrong
2012-02-14 4:36 ` Takuya Yoshikawa
2012-02-14 4:56 ` Takuya Yoshikawa
2012-02-14 17:21 ` Andrea Arcangeli
2012-02-10 7:52 ` [PATCH 1/2] KVM: mmu_notifier: Flush TLBs " Xiao Guangrong
2012-02-13 6:00 ` Takuya Yoshikawa
2012-02-14 17:27 ` Andrea Arcangeli
2012-02-10 17:26 ` Marcelo Tosatti
2012-02-14 17:10 ` Andrea Arcangeli
2012-02-14 17:29 ` Marcelo Tosatti
2012-02-14 18:53 ` Andrea Arcangeli [this message]
2012-02-14 19:43 ` Marcelo Tosatti
2012-02-15 9:18 ` Avi Kivity
2012-02-15 9:47 ` Avi Kivity
2012-02-15 11:37 ` Xiao Guangrong
2012-02-15 14:07 ` Avi Kivity
2012-02-15 19:16 ` Andrea Arcangeli
2012-02-16 4:50 ` Xiao Guangrong
2012-02-16 11:57 ` Avi Kivity
2012-02-17 2:36 ` Xiao Guangrong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120214185356.GQ9440@redhat.com \
--to=aarcange@redhat.com \
--cc=avi@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=xiaoguangrong@linux.vnet.ibm.com \
--cc=yoshikawa.takuya@oss.ntt.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.