Message-ID: <4F8329D3.7000605@gmail.com>
Date: Tue, 10 Apr 2012 02:26:27 +0800
From: Xiao Guangrong
To: Marcelo Tosatti
CC: Avi Kivity, Xiao Guangrong, LKML, KVM
Subject: Re: [PATCH 00/13] KVM: MMU: fast page fault
References: <4F742951.7080003@linux.vnet.ibm.com> <4F82E04E.6000900@redhat.com> <20120409175829.GB21894@amt.cnet>
In-Reply-To: <20120409175829.GB21894@amt.cnet>

On 04/10/2012 01:58 AM, Marcelo Tosatti wrote:
> On Mon, Apr 09, 2012 at 04:12:46PM +0300, Avi Kivity wrote:
>> On 03/29/2012 11:20 AM, Xiao Guangrong wrote:
>>> * Idea
>>> The present bit of the page fault error code (EFEC.P) indicates whether
>>> the page table is populated on all levels; if this bit is set, we know
>>> the page fault was caused by the page-protection bits (e.g. the W/R bit)
>>> or by the reserved bits.
>>>
>>> In KVM, in most cases, this kind of page fault (EFEC.P = 1) can be
>>> fixed simply: page faults caused by reserved bits
>>> (EFEC.P = 1 && EFEC.RSV = 1) have already been filtered out in the fast
>>> mmio path. All we need to do to fix the remaining page faults
>>> (EFEC.P = 1 && RSV != 1) is to restore the corresponding access
>>> permission on the spte.
>>>
>>> This patchset introduces a fast path to fix this kind of page fault: it
>>> runs outside the mmu-lock and does not need to walk the host page table
>>> to get the mapping from gfn to pfn.
>>>
>>
>> This patchset is really worrying to me.
>>
>> It introduces a lot of concurrency into data structures that were not
>> designed for it. Even if it is correct, it will be very hard to
>> convince ourselves that it is correct, and if it isn't, to debug those
>> subtle bugs. It will also be much harder to maintain the mmu code than
>> it is now.
>>
>> There are a lot of things to check. Just as an example, we need to be
>> sure that if we use rcu_dereference() twice in the same code path, that
>> any inconsistencies due to a write in between are benign. Doing that is
>> a huge task.
>>
>> But I appreciate the performance improvement and would like to see a
>> simpler version make it in. This needs to reduce the amount of data
>> touched in the fast path so it is easier to validate, and perhaps reduce
>> the number of cases that the fast path works on.
>>
>> I would like to see the fast path as simple as
>>
>>     rcu_read_lock();
>>
>>     (lockless shadow walk)
>>     spte = ACCESS_ONCE(*sptep);
>>
>>     if (!(spte & PT_MAY_ALLOW_WRITES))
>>             goto slow;
>>
>>     gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->sptes);
>>     mark_page_dirty(kvm, gfn);
>>
>>     new_spte = spte & ~(PT64_MAY_ALLOW_WRITES | PT_WRITABLE_MASK);
>>     if (cmpxchg(sptep, spte, new_spte) != spte)
>>             goto slow;
>>
>>     rcu_read_unlock();
>>     return;
>>
>> slow:
>>     rcu_read_unlock();
>>     slow_path();
>>
>> It now becomes the responsibility of the slow path to maintain *sptep &
>> PT_MAY_ALLOW_WRITES, but that path has a simpler concurrency model.
>> It can be as simple as a clear_bit() before we update sp->gfns[] or if
>> we add host write protection.
>>
>> Sorry, it's too complicated for me. Marcelo, what's your take?
>
> The improvement is small and limited to special cases (migration should
> be rare and framebuffer memory accounts for a small percentage of total
> memory).

Actually, although the framebuffer is small, it is modified really
frequently. Another unlucky thing is that the dirty log is also
retrieved very frequently, and write-protecting the framebuffer pages
for it has to be done under mmu-lock.

Yes, if Xwindow is not enabled, the benefit is limited. :)
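
For reference, the contention pattern looks roughly like this (a
simplified sketch only, not the actual code paths; write_protect_slot()
below is just an illustrative name, not a real KVM symbol):

    /* userspace (e.g. the VGA emulation) polls every display refresh */
    ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
        spin_lock(&kvm->mmu_lock);
        /* copy and clear the dirty bitmap, then write-protect the
         * sptes mapping the framebuffer slot again */
        write_protect_slot(kvm, memslot);
        spin_unlock(&kvm->mmu_lock);

    /* the guest then writes to the framebuffer again */
    write page fault (EFEC.P = 1)
        spin_lock(&kvm->mmu_lock);
        /* walk the shadow page table and make the spte writable */
        spin_unlock(&kvm->mmu_lock);

Both halves of this loop serialize on mmu-lock at display-refresh
frequency, which is what the lockless fast path is meant to avoid.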