linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Avi Kivity <avi@redhat.com>,
	Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>, KVM <kvm@vger.kernel.org>
Subject: Re: [PATCH 00/13] KVM: MMU: fast page fault
Date: Tue, 10 Apr 2012 02:26:27 +0800	[thread overview]
Message-ID: <4F8329D3.7000605@gmail.com> (raw)
In-Reply-To: <20120409175829.GB21894@amt.cnet>

On 04/10/2012 01:58 AM, Marcelo Tosatti wrote:

> On Mon, Apr 09, 2012 at 04:12:46PM +0300, Avi Kivity wrote:
>> On 03/29/2012 11:20 AM, Xiao Guangrong wrote:
>>> * Idea
>>> The present bit of page fault error code (EFEC.P) indicates whether the
>>> page table is populated on all levels, if this bit is set, we can know
>>> the page fault is caused by the page-protection bits (e.g. W/R bit) or
>>> the reserved bits.
>>>
>>> In KVM, in most cases, all this kind of page fault (EFEC.P = 1) can be
>>> simply fixed: the page fault caused by reserved bit
>>> (EFFC.P = 1 && EFEC.RSV = 1) has already been filtered out in fast mmio
>>> path. What we need do to fix the rest page fault (EFEC.P = 1 && RSV != 1)
>>> is just increasing the corresponding access on the spte.
>>>
>>> This pachset introduces a fast path to fix this kind of page fault: it
>>> is out of mmu-lock and need not walk host page table to get the mapping
>>> from gfn to pfn.
>>>
>>>
>>
>> This patchset is really worrying to me.
>>
>> It introduces a lot of concurrency into data structures that were not
>> designed for it.  Even if it is correct, it will be very hard to
>> convince ourselves that it is correct, and if it isn't, to debug those
>> subtle bugs.  It will also be much harder to maintain the mmu code than
>> it is now.
>>
>> There are a lot of things to check.  Just as an example, we need to be
>> sure that if we use rcu_dereference() twice in the same code path, that
>> any inconsistencies due to a write in between are benign.  Doing that is
>> a huge task.
>>
>> But I appreciate the performance improvement and would like to see a
>> simpler version make it in.  This needs to reduce the amount of data
>> touched in the fast path so it is easier to validate, and perhaps reduce
>> the number of cases that the fast path works on.
>>
>> I would like to see the fast path as simple as
>>
>>   rcu_read_lock();
>>
>>   (lockless shadow walk)
>>   spte = ACCESS_ONCE(*sptep);
>>
>>   if (!(spte & PT_MAY_ALLOW_WRITES))
>>         goto slow;
>>
>>   gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->sptes)
>>   mark_page_dirty(kvm, gfn);
>>
>>   new_spte = spte & ~(PT64_MAY_ALLOW_WRITES | PT_WRITABLE_MASK);
>>   if (cmpxchg(sptep, spte, new_spte) != spte)
>>        goto slow;
>>
>>   rcu_read_unlock();
>>   return;
>>
>> slow:
>>   rcu_read_unlock();
>>   slow_path();
>>
>> It now becomes the responsibility of the slow path to maintain *sptep &
>> PT_MAY_ALLOW_WRITES, but that path has a simpler concurrency model.  It
>> can be as simple as a clear_bit() before we update sp->gfns[] or if we
>> add host write protection.
>>
>> Sorry, it's too complicated for me.  Marcelo, what's your take?
> 
> The improvement is small and limited to special cases (migration should
> be rare and framebuffer memory accounts for a small percentage of total
> memory).


Actually, although the framebuffer is small but it is modified really
frequently, and another unlucky things is that dirty-log is also
very frequently and need hold mmu-lock to do write-protect.

Yes, if Xwindow is not enabled, the benefit is limited. :)

  parent reply	other threads:[~2012-04-09 18:26 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-29  9:20 [PATCH 00/13] KVM: MMU: fast page fault Xiao Guangrong
2012-03-29  9:20 ` [PATCH 01/13] KVM: MMU: properly assert spte on rmap_next path Xiao Guangrong
2012-03-29  9:21 ` [PATCH 02/13] KVM: MMU: abstract spte write-protect Xiao Guangrong
2012-03-29 11:11   ` Avi Kivity
2012-03-29 11:51     ` Xiao Guangrong
2012-03-29  9:22 ` [PATCH 03/13] KVM: MMU: split FNAME(prefetch_invalid_gpte) Xiao Guangrong
2012-03-29 13:00   ` Avi Kivity
2012-03-30  3:51     ` Xiao Guangrong
2012-03-29  9:22 ` [PATCH 04/13] KVM: MMU: introduce FNAME(get_sp_gpa) Xiao Guangrong
2012-03-29 13:07   ` Avi Kivity
2012-03-30  5:01     ` Xiao Guangrong
2012-04-01 12:42       ` Avi Kivity
2012-03-29  9:23 ` [PATCH 05/13] KVM: MMU: reset shadow_mmio_mask Xiao Guangrong
2012-03-29 13:10   ` Avi Kivity
2012-03-29 15:28     ` Avi Kivity
2012-03-29 16:24       ` Avi Kivity
2012-03-29  9:23 ` [PATCH 06/13] KVM: VMX: export PFEC.P bit on ept Xiao Guangrong
2012-03-29  9:24 ` [PATCH 07/13] KVM: MMU: store more bits in rmap Xiao Guangrong
2012-03-29  9:25 ` [PATCH 08/13] KVM: MMU: fask check whether page is writable Xiao Guangrong
2012-03-29 15:49   ` Avi Kivity
2012-03-30  5:10     ` Xiao Guangrong
2012-04-01 15:52   ` Avi Kivity
2012-04-05 17:54     ` Xiao Guangrong
2012-04-12 23:08       ` Marcelo Tosatti
2012-04-13 10:26         ` Xiao Guangrong
2012-03-29  9:25 ` [PATCH 09/13] KVM: MMU: get expected spte out of mmu-lock Xiao Guangrong
2012-04-01 15:53   ` Avi Kivity
2012-04-05 18:25     ` Xiao Guangrong
2012-04-09 12:28       ` Avi Kivity
2012-04-09 13:16         ` Takuya Yoshikawa
2012-04-09 13:21           ` Avi Kivity
2012-03-29  9:26 ` [PATCH 10/13] KVM: MMU: store vcpu id in spte to notify page write-protect path Xiao Guangrong
2012-03-29  9:27 ` [PATCH 11/13] KVM: MMU: fast path of handling guest page fault Xiao Guangrong
2012-03-31 12:24   ` Xiao Guangrong
2012-04-01 16:23   ` Avi Kivity
2012-04-03 13:04     ` Avi Kivity
2012-04-05 19:39     ` Xiao Guangrong
2012-03-29  9:27 ` [PATCH 12/13] KVM: MMU: trace fast " Xiao Guangrong
2012-03-29  9:28 ` [PATCH 13/13] KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint Xiao Guangrong
2012-03-29 10:18 ` [PATCH 00/13] KVM: MMU: fast page fault Avi Kivity
2012-03-29 11:40   ` Xiao Guangrong
2012-03-29 12:57     ` Avi Kivity
2012-03-30  9:18       ` Xiao Guangrong
2012-03-31 13:12         ` Xiao Guangrong
2012-04-01 12:58         ` Avi Kivity
2012-04-05 21:57           ` Xiao Guangrong
2012-04-06  5:24             ` Xiao Guangrong
2012-04-09 13:20               ` Avi Kivity
2012-04-09 13:59                 ` Xiao Guangrong
2012-04-09 13:12 ` Avi Kivity
2012-04-09 13:55   ` Xiao Guangrong
2012-04-09 14:01     ` Xiao Guangrong
2012-04-09 14:25     ` Avi Kivity
2012-04-09 17:58   ` Marcelo Tosatti
2012-04-09 18:13     ` Xiao Guangrong
2012-04-09 19:31       ` Marcelo Tosatti
2012-04-09 18:26     ` Xiao Guangrong [this message]
2012-04-09 19:46       ` Marcelo Tosatti
2012-04-10  3:06         ` Xiao Guangrong
2012-04-10 10:04         ` Avi Kivity
2012-04-11  1:47           ` Marcelo Tosatti
2012-04-11  9:15             ` Avi Kivity
2012-04-10 10:39         ` Avi Kivity
2012-04-10 11:40           ` Takuya Yoshikawa
2012-04-10 11:58             ` Xiao Guangrong
2012-04-11 12:15               ` Takuya Yoshikawa
2012-04-11 12:38                 ` Xiao Guangrong
2012-04-11 14:14                   ` Takuya Yoshikawa
2012-04-11 14:21                     ` Avi Kivity
2012-04-11 22:26                       ` Takuya Yoshikawa
2012-04-13 14:25                     ` Takuya Yoshikawa
2012-04-15  9:32                       ` Avi Kivity
2012-04-16 15:49                         ` Takuya Yoshikawa
2012-04-16 16:02                           ` Avi Kivity
2012-04-17  6:26                           ` Xiao Guangrong
2012-04-17  7:51                             ` Avi Kivity
2012-04-17 12:37                               ` Takuya Yoshikawa
2012-04-17 12:41                                 ` Avi Kivity
2012-04-17 14:54                                   ` Takuya Yoshikawa
2012-04-17 14:56                                     ` Avi Kivity
2012-04-18 13:42                                       ` Takuya Yoshikawa
2012-04-17  6:16                         ` Xiao Guangrong
2012-04-10 10:10       ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F8329D3.7000605@gmail.com \
    --to=xiaoguangrong.eric@gmail.com \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=xiaoguangrong@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).