All of lore.kernel.org
 help / color / mirror / Atom feed
From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>,
	Avi Kivity <avi@redhat.com>, LKML <linux-kernel@vger.kernel.org>,
	KVM <kvm@vger.kernel.org>
Subject: Re: [PATCH 00/13] KVM: MMU: fast page fault
Date: Tue, 10 Apr 2012 11:06:14 +0800	[thread overview]
Message-ID: <4F83A3A6.4070705@linux.vnet.ibm.com> (raw)
In-Reply-To: <20120409194614.GB23053@amt.cnet>

On 04/10/2012 03:46 AM, Marcelo Tosatti wrote:

> On Tue, Apr 10, 2012 at 02:26:27AM +0800, Xiao Guangrong wrote:
>> On 04/10/2012 01:58 AM, Marcelo Tosatti wrote:
>>
>>> On Mon, Apr 09, 2012 at 04:12:46PM +0300, Avi Kivity wrote:
>>>> On 03/29/2012 11:20 AM, Xiao Guangrong wrote:
>>>>> * Idea
>>>>> The present bit of page fault error code (EFEC.P) indicates whether the
>>>>> page table is populated on all levels, if this bit is set, we can know
>>>>> the page fault is caused by the page-protection bits (e.g. W/R bit) or
>>>>> the reserved bits.
>>>>>
>>>>> In KVM, in most cases, all this kind of page fault (EFEC.P = 1) can be
>>>>> simply fixed: the page fault caused by reserved bit
>>>>> (EFFC.P = 1 && EFEC.RSV = 1) has already been filtered out in fast mmio
>>>>> path. What we need do to fix the rest page fault (EFEC.P = 1 && RSV != 1)
>>>>> is just increasing the corresponding access on the spte.
>>>>>
>>>>> This pachset introduces a fast path to fix this kind of page fault: it
>>>>> is out of mmu-lock and need not walk host page table to get the mapping
>>>>> from gfn to pfn.
>>>>>
>>>>>
>>>>
>>>> This patchset is really worrying to me.
>>>>
>>>> It introduces a lot of concurrency into data structures that were not
>>>> designed for it.  Even if it is correct, it will be very hard to
>>>> convince ourselves that it is correct, and if it isn't, to debug those
>>>> subtle bugs.  It will also be much harder to maintain the mmu code than
>>>> it is now.
>>>>
>>>> There are a lot of things to check.  Just as an example, we need to be
>>>> sure that if we use rcu_dereference() twice in the same code path, that
>>>> any inconsistencies due to a write in between are benign.  Doing that is
>>>> a huge task.
>>>>
>>>> But I appreciate the performance improvement and would like to see a
>>>> simpler version make it in.  This needs to reduce the amount of data
>>>> touched in the fast path so it is easier to validate, and perhaps reduce
>>>> the number of cases that the fast path works on.
>>>>
>>>> I would like to see the fast path as simple as
>>>>
>>>>   rcu_read_lock();
>>>>
>>>>   (lockless shadow walk)
>>>>   spte = ACCESS_ONCE(*sptep);
>>>>
>>>>   if (!(spte & PT_MAY_ALLOW_WRITES))
>>>>         goto slow;
>>>>
>>>>   gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->sptes)
>>>>   mark_page_dirty(kvm, gfn);
>>>>
>>>>   new_spte = spte & ~(PT64_MAY_ALLOW_WRITES | PT_WRITABLE_MASK);
>>>>   if (cmpxchg(sptep, spte, new_spte) != spte)
>>>>        goto slow;
>>>>
>>>>   rcu_read_unlock();
>>>>   return;
>>>>
>>>> slow:
>>>>   rcu_read_unlock();
>>>>   slow_path();
>>>>
>>>> It now becomes the responsibility of the slow path to maintain *sptep &
>>>> PT_MAY_ALLOW_WRITES, but that path has a simpler concurrency model.  It
>>>> can be as simple as a clear_bit() before we update sp->gfns[] or if we
>>>> add host write protection.
>>>>
>>>> Sorry, it's too complicated for me.  Marcelo, what's your take?
>>>
>>> The improvement is small and limited to special cases (migration should
>>> be rare and framebuffer memory accounts for a small percentage of total
>>> memory).
>>
>>
>> Actually, although the framebuffer is small but it is modified really
>> frequently, and another unlucky things is that dirty-log is also
>> very frequently and need hold mmu-lock to do write-protect.
>>
>> Yes, if Xwindow is not enabled, the benefit is limited. :)
> 
> Ignoring that fact, the safety of lockless set_spte and friends is not
> clear.
> 


That is why AVI suggested me to simplify the whole things. :)


> Perhaps the mmu_lock hold times by get_dirty are a large component here?
> If that can be alleviated, not only RO->RW faults benefit.
> 

Yes.

  reply	other threads:[~2012-04-10  3:06 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-29  9:20 [PATCH 00/13] KVM: MMU: fast page fault Xiao Guangrong
2012-03-29  9:20 ` [PATCH 01/13] KVM: MMU: properly assert spte on rmap_next path Xiao Guangrong
2012-03-29  9:21 ` [PATCH 02/13] KVM: MMU: abstract spte write-protect Xiao Guangrong
2012-03-29 11:11   ` Avi Kivity
2012-03-29 11:51     ` Xiao Guangrong
2012-03-29  9:22 ` [PATCH 03/13] KVM: MMU: split FNAME(prefetch_invalid_gpte) Xiao Guangrong
2012-03-29 13:00   ` Avi Kivity
2012-03-30  3:51     ` Xiao Guangrong
2012-03-29  9:22 ` [PATCH 04/13] KVM: MMU: introduce FNAME(get_sp_gpa) Xiao Guangrong
2012-03-29 13:07   ` Avi Kivity
2012-03-30  5:01     ` Xiao Guangrong
2012-04-01 12:42       ` Avi Kivity
2012-03-29  9:23 ` [PATCH 05/13] KVM: MMU: reset shadow_mmio_mask Xiao Guangrong
2012-03-29 13:10   ` Avi Kivity
2012-03-29 15:28     ` Avi Kivity
2012-03-29 16:24       ` Avi Kivity
2012-03-29  9:23 ` [PATCH 06/13] KVM: VMX: export PFEC.P bit on ept Xiao Guangrong
2012-03-29  9:24 ` [PATCH 07/13] KVM: MMU: store more bits in rmap Xiao Guangrong
2012-03-29  9:25 ` [PATCH 08/13] KVM: MMU: fask check whether page is writable Xiao Guangrong
2012-03-29 15:49   ` Avi Kivity
2012-03-30  5:10     ` Xiao Guangrong
2012-04-01 15:52   ` Avi Kivity
2012-04-05 17:54     ` Xiao Guangrong
2012-04-12 23:08       ` Marcelo Tosatti
2012-04-13 10:26         ` Xiao Guangrong
2012-03-29  9:25 ` [PATCH 09/13] KVM: MMU: get expected spte out of mmu-lock Xiao Guangrong
2012-04-01 15:53   ` Avi Kivity
2012-04-05 18:25     ` Xiao Guangrong
2012-04-09 12:28       ` Avi Kivity
2012-04-09 13:16         ` Takuya Yoshikawa
2012-04-09 13:21           ` Avi Kivity
2012-03-29  9:26 ` [PATCH 10/13] KVM: MMU: store vcpu id in spte to notify page write-protect path Xiao Guangrong
2012-03-29  9:27 ` [PATCH 11/13] KVM: MMU: fast path of handling guest page fault Xiao Guangrong
2012-03-31 12:24   ` Xiao Guangrong
2012-04-01 16:23   ` Avi Kivity
2012-04-03 13:04     ` Avi Kivity
2012-04-05 19:39     ` Xiao Guangrong
2012-03-29  9:27 ` [PATCH 12/13] KVM: MMU: trace fast " Xiao Guangrong
2012-03-29  9:28 ` [PATCH 13/13] KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint Xiao Guangrong
2012-03-29 10:18 ` [PATCH 00/13] KVM: MMU: fast page fault Avi Kivity
2012-03-29 11:40   ` Xiao Guangrong
2012-03-29 12:57     ` Avi Kivity
2012-03-30  9:18       ` Xiao Guangrong
2012-03-31 13:12         ` Xiao Guangrong
2012-04-01 12:58         ` Avi Kivity
2012-04-05 21:57           ` Xiao Guangrong
2012-04-06  5:24             ` Xiao Guangrong
2012-04-09 13:20               ` Avi Kivity
2012-04-09 13:59                 ` Xiao Guangrong
2012-04-09 13:12 ` Avi Kivity
2012-04-09 13:55   ` Xiao Guangrong
2012-04-09 14:01     ` Xiao Guangrong
2012-04-09 14:25     ` Avi Kivity
2012-04-09 17:58   ` Marcelo Tosatti
2012-04-09 18:13     ` Xiao Guangrong
2012-04-09 19:31       ` Marcelo Tosatti
2012-04-09 18:26     ` Xiao Guangrong
2012-04-09 19:46       ` Marcelo Tosatti
2012-04-10  3:06         ` Xiao Guangrong [this message]
2012-04-10 10:04         ` Avi Kivity
2012-04-11  1:47           ` Marcelo Tosatti
2012-04-11  9:15             ` Avi Kivity
2012-04-10 10:39         ` Avi Kivity
2012-04-10 11:40           ` Takuya Yoshikawa
2012-04-10 11:58             ` Xiao Guangrong
2012-04-11 12:15               ` Takuya Yoshikawa
2012-04-11 12:38                 ` Xiao Guangrong
2012-04-11 14:14                   ` Takuya Yoshikawa
2012-04-11 14:21                     ` Avi Kivity
2012-04-11 22:26                       ` Takuya Yoshikawa
2012-04-13 14:25                     ` Takuya Yoshikawa
2012-04-15  9:32                       ` Avi Kivity
2012-04-16 15:49                         ` Takuya Yoshikawa
2012-04-16 15:49                           ` Takuya Yoshikawa
2012-04-16 16:02                           ` Avi Kivity
2012-04-16 16:02                             ` Avi Kivity
2012-04-17  6:26                           ` Xiao Guangrong
2012-04-17  6:26                             ` Xiao Guangrong
2012-04-17  7:51                             ` Avi Kivity
2012-04-17  7:51                               ` Avi Kivity
2012-04-17 12:37                               ` Takuya Yoshikawa
2012-04-17 12:37                                 ` Takuya Yoshikawa
2012-04-17 12:41                                 ` Avi Kivity
2012-04-17 12:41                                   ` Avi Kivity
2012-04-17 14:54                                   ` Takuya Yoshikawa
2012-04-17 14:54                                     ` Takuya Yoshikawa
2012-04-17 14:56                                     ` Avi Kivity
2012-04-17 14:56                                       ` Avi Kivity
2012-04-18 13:42                                       ` Takuya Yoshikawa
2012-04-18 13:42                                         ` Takuya Yoshikawa
2012-04-17  6:16                         ` Xiao Guangrong
2012-04-10 10:10       ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F83A3A6.4070705@linux.vnet.ibm.com \
    --to=xiaoguangrong@linux.vnet.ibm.com \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=xiaoguangrong.eric@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.