From: Gleb Natapov
Subject: Re: [patch 4/5] KVM: MMU: reload request from GET_DIRTY_LOG path
Date: Tue, 1 Jul 2014 09:27:19 +0300
Message-ID: <20140701062719.GK18167@minantech.com>
References: <20140618231203.846608908@amt.cnet>
	<20140618231521.718959400@amt.cnet>
	<20140619081719.GD10948@minantech.com>
	<20140619184031.GC32410@amt.cnet>
	<20140620104610.GD20764@minantech.com>
	<20140630205902.GB26566@amt.cnet>
To: Marcelo Tosatti
Cc: kvm@vger.kernel.org, ak@linux.intel.com, pbonzini@redhat.com,
	xiaoguangrong@linux.vnet.ibm.com, avi@cloudius-systems.com
In-Reply-To: <20140630205902.GB26566@amt.cnet>

On Mon, Jun 30, 2014 at 05:59:02PM -0300, Marcelo Tosatti wrote:
> On Fri, Jun 20, 2014 at 01:46:10PM +0300, Gleb Natapov wrote:
> > On Thu, Jun 19, 2014 at 03:40:31PM -0300, Marcelo Tosatti wrote:
> > > On Thu, Jun 19, 2014 at 11:17:19AM +0300, Gleb Natapov wrote:
> > > > On Wed, Jun 18, 2014 at 08:12:07PM -0300, mtosatti@redhat.com wrote:
> > > > > Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before
> > > > > deleting a pinned spte.
> > > > >
> > > > > Signed-off-by: Marcelo Tosatti
> > > > >
> > > > > ---
> > > > >  arch/x86/kvm/mmu.c |    3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c
> > > > > ===================================================================
> > > > > --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c	2014-06-13 16:50:50.040140594 -0300
> > > > > +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c	2014-06-13 16:51:05.620104451 -0300
> > > > > @@ -1247,6 +1247,9 @@
> > > > >  	spte &= ~SPTE_MMU_WRITEABLE;
> > > > >  	spte = spte & ~PT_WRITABLE_MASK;
> > > > >
> > > > > +	if (is_pinned_spte(spte))
> > > > > +		mmu_reload_pinned_vcpus(kvm);
> > > > > +
> > > > Why write protect it at all? mmu_reload_pinned_vcpus() will unprotect it
> > > > anyway on the next vmentry. Isn't it better to just report all pinned
> > > > pages as dirty always?
> > >
> > > That was the initial plan, however it is awkward to stop vcpus, execute
> > > get_dirty_log twice, and have pages marked as dirty on the second
> > > execution.
> > Indeed, but I think it may happen today with vhost (or even other devices
> > that emulate DMA), so userspace should be able to deal with it already.
> >
> > > That is, it is "incorrect" to report pages as dirty when they are
> > > clean.
> > In the case of the PEBS area we cannot know. The CPU writes there directly
> > without KVM even knowing about it, so the only sane thing is to assume
> > that after a vmentry the PEBS area is dirty. This is what happens with
> > this patch BTW: mmu_reload_pinned_vcpus() will force ->page_fault(FERR_WRITE),
> > which will mark all pinned pages as dirty even if the pages are never
> > written to. You can achieve the same by having vcpu->pinned_page_dirty,
> > which is set to true on each vmentry and to false on each GET_DIRTY_LOG.
> >
> > > Moreover, if the number of pinned pages is larger than the dirty
> > > threshold to stop the VM and migrate, you'll never migrate. If vcpus are
> > > in HLT and don't VM-enter immediately, the pages should not be refaulted
> > > right away.
> > We should not allow that many pinned pages for security reasons. And
> > having a lot of pages to fault in on the next vmentry after each
> > GET_DIRTY_LOG will slow down the guest during migration considerably.
> >
> > > Do you think the optimization is worthwhile?
> >
> > I think it's simpler, but even if we go with your current approach it
> > should be improved: there is no point sending an IPI to all vcpus in
> > spte_write_protect() like you do here, since the IPI will be sent anyway
> > at the end of write protection because of the ptes' writable->nonwritable
> > transition. So all you have to do here is set KVM_REQ_MMU_RELOAD; there
> > is no need for an IPI.
> No, you have to make sure the vcpu is out of guest mode.
>
> > In fact this makes me think that write protecting a pinned page here is
> > incorrect, because the old translation may not be in the TLB, and if the
> > CPU tries to write a PEBS entry after the pte is write protected but
> > before the MMU is reloaded, it will encounter a non-writable pte and god
> > knows what will happen; the SDM does not tell. So spte_write_protect()
> > should be something like that, IMO:
> The vcpu will never see a read-only spte because the VM-exit (due to the
> IPI) guarantees the vcpu is outside of guest mode _before_ it is write
> protected.

Right. Now I see why you absolutely have to send the IPI in
mmu_reload_pinned_vcpus() before marking the pte as read-only. And
kvm->mmu_lock is what will prevent a vcpu from re-entering guest mode
again before the pte is marked read-only, right?

> So I ask you: do you still hold the "current approach should be
> improved" position?
>
As I said, IMO what I proposed is much simpler and not as tricky as what
you have here. It also has the advantage of not slowing down the next
guest entry after GET_DIRTY_LOG, because it does not require an MMU
reload and faulting in the pinned pages.

--
			Gleb.