From: Avi Kivity <avi@redhat.com>
To: Takuya Yoshikawa <takuya.yoshikawa@gmail.com>
Cc: kvm-ppc@vger.kernel.org,
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Xiao Guangrong <xiaoguangrong.eric@gmail.com>,
LKML <linux-kernel@vger.kernel.org>, KVM <kvm@vger.kernel.org>
Subject: Re: [PATCH 00/13] KVM: MMU: fast page fault
Date: Mon, 16 Apr 2012 19:02:12 +0300
Message-ID: <4F8C4284.9080201@redhat.com>
In-Reply-To: <20120417004935.a9a39d951b3c24588e29edd2@gmail.com>
On 04/16/2012 06:49 PM, Takuya Yoshikawa wrote:
> > This doesn't work for EPT, which lacks a dirty bit. But we can emulate
> > it: take a free bit and call it spte.NOTDIRTY, when it is set, we also
> > clear spte.WRITE, and teach the mmu that if it sees spte.NOTDIRTY it
> > can just set spte.WRITE and clear spte.NOTDIRTY. Now that looks exactly
> > like Xiao's lockless write enabling.
>
> How do we sync with dirty_bitmap?
In Xiao's patch we call mark_page_dirty() at fault time. With the
write-protect-less approach, we look at spte.DIRTY (or spte.NOTDIRTY)
during GET_DIRTY_LOG, or when the spte is torn down.
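
To make the spte.NOTDIRTY emulation and the GET_DIRTY_LOG-time sync concrete, here is a minimal user-space sketch. It is not taken from Xiao's patches or from the KVM source: the bit positions, the function names (spte_mark_clean, spte_fix_write_fault, harvest_dirty) and the flat sptes[] array are all assumptions made for illustration, and a real implementation would also need TLB flushes and an rmap walk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define SPTE_WRITE    (1ull << 1)
#define SPTE_NOTDIRTY (1ull << 62)  /* assumed to be a free software bit */

#define BITS_PER_ULONG (8 * sizeof(unsigned long))

/* Arm tracking: the page is "clean", so it must not be writable. */
static inline uint64_t spte_mark_clean(uint64_t spte)
{
    return (spte | SPTE_NOTDIRTY) & ~SPTE_WRITE;
}

/*
 * Write-fault path. If the spte is only read-only because of NOTDIRTY,
 * flip the two bits with a compare-and-swap and report success.
 * Returns false when NOTDIRTY is not set, i.e. the fault has some other
 * cause and must take the slow path.
 */
static bool spte_fix_write_fault(uint64_t *sptep)
{
    uint64_t old = __atomic_load_n(sptep, __ATOMIC_RELAXED);

    while (old & SPTE_NOTDIRTY) {
        uint64_t fixed = (old | SPTE_WRITE) & ~SPTE_NOTDIRTY;

        /* On failure, 'old' is reloaded and the condition rechecked. */
        if (__atomic_compare_exchange_n(sptep, &old, fixed, false,
                                        __ATOMIC_ACQ_REL, __ATOMIC_RELAXED))
            return true;
    }
    return false;
}

/*
 * GET_DIRTY_LOG-time harvest. Assumes sptes[i] maps page i of the slot
 * and that every entry is tracked. A cleared NOTDIRTY bit means the page
 * was written since the last harvest: record it and re-arm the spte.
 */
static void harvest_dirty(uint64_t *sptes, size_t npages,
                          unsigned long *bitmap)
{
    for (size_t i = 0; i < npages; i++) {
        uint64_t old = __atomic_load_n(&sptes[i], __ATOMIC_RELAXED);

        if (old & SPTE_NOTDIRTY)
            continue;                       /* still clean */

        while (!__atomic_compare_exchange_n(&sptes[i], &old,
                                            spte_mark_clean(old), false,
                                            __ATOMIC_ACQ_REL,
                                            __ATOMIC_RELAXED))
            ;                               /* 'old' was refreshed, retry */

        bitmap[i / BITS_PER_ULONG] |= 1ul << (i % BITS_PER_ULONG);
    }
}
```

The point of the sketch is that the fault path never touches the dirty bitmap; all bitmap updates happen in harvest_dirty(), which is the difference from calling mark_page_dirty() at fault time.
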
> > Another note: O(1) write protection is not mutually exclusive with rmap
> > based write protection. In GET_DIRTY_LOG, you write protect everything,
> > and proceed to write enable on faults. When you reach the page table
> > level, you perform the rmap check to see if you should write protect or
> > not. With role.direct=1 the check is very cheap (and sometimes you can
> > drop the entire page table and replace it with a large spte).
>
> I understand that there are many possible combinations.
>
> But the question is whether the complexity is really worth it.
We don't know yet. I'm just throwing ideas around.
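
For reference, the combination quoted above (write protect at the top, write enable on faults, decide at the page table level) can be modelled with a toy two-level table. This is only a sketch under stated assumptions: the struct layout, the fan-out of 4, the PTE_W bit position and all function names are hypothetical, and the rmap check is reduced to a comment.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES 4               /* toy fan-out; real tables have 512 */
#define PTE_W   (1ull << 1)     /* hypothetical writable bit */

struct leaf_table {
    uint64_t pte[ENTRIES];
};

struct root_table {
    uint64_t pde[ENTRIES];
    struct leaf_table *leaf[ENTRIES];
};

/*
 * GET_DIRTY_LOG side: only the top-level entries are touched, so the
 * cost does not grow with the amount of memory mapped underneath.
 */
static void write_protect_all(struct root_table *root)
{
    for (int i = 0; i < ENTRIES; i++)
        root->pde[i] &= ~PTE_W;
}

/* Effective write permission is the AND of W along the walk. */
static bool can_write(const struct root_table *root, int i, int j)
{
    return (root->pde[i] & PTE_W) && (root->leaf[i]->pte[j] & PTE_W);
}

/*
 * Fault side: push the protection one level down, re-enable the upper
 * entry, then write enable only the faulting leaf entry.
 */
static void handle_write_fault(struct root_table *root, int i, int j)
{
    struct leaf_table *lt = root->leaf[i];

    if (!(root->pde[i] & PTE_W)) {
        for (int k = 0; k < ENTRIES; k++)
            lt->pte[k] &= ~PTE_W;
        root->pde[i] |= PTE_W;
    }

    /* A real mmu would do the rmap check and dirty marking here. */
    lt->pte[j] |= PTE_W;
}
```

The work that write_protect_all() skips is paid later in handle_write_fault(), on the vcpu path, which is the overhead being questioned in this thread.
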
> Once, when we were searching for a way to do an atomic bitmap switch, you said
> to me that we should do our best not to add overheads to VCPU threads.
>
> From then, I tried my best to mitigate the latency problem without adding
> code to VCPU thread paths: if we add cond_resched patch, we will get a simple
> solution to the current known problem -- probably 64GB guests will work well
> without big latencies, once QEMU gets improved.
Sure, I'm not advocating doing the most nifty idea. After all I'm the
one that suffers most from it. Everything should be proven to improve,
and the improvement should be material, not just a random measurement
that doesn't matter to anyone.
>
> I also surveyed other known hypervisors internally. We can easily see
> hundreds of ms latency during migration. But people rarely complain
> about that if they are stable and usable in most situations.
There is also the unavoidable latency during the final stop-and-copy
phase, at least without post-copy. And the migration thread (when we
have one) is hardly latency sensitive.
> Although O(1) is actually O(1) for GET_DIRTY_LOG thread, it adds some
> overheads to page fault handling. We may need to hold mmu_lock for properly
> handling O(1)'s write protection and ~500 write protections will not be so
> cheap. And there is no answer to the question of how to achieve slot-wise write
> protection.
>
> Of course, we may need such a tree-wide write protection when we want to
> support guests with hundreds of GB, or TB, of memory. Sadly it's not now.
>
>
> Well, if you need the best answer now, we should discuss the whole design:
> KVM Forum may be a good place for that.
We don't need the best answer now, I'm satisfied with incremental
improvements. But it's good to have the ideas out in the open, maybe
some of them will be adopted, or maybe they'll trigger a better idea.
(btw O(1) write protection is equally applicable to ordinary fork())
--
error compiling committee.c: too many arguments to function