From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marcelo Tosatti Subject: Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable Date: Wed, 20 Nov 2013 17:47:27 -0200 Message-ID: <20131120194727.GA3353@amt.cnet> References: <1375189330-24066-1-git-send-email-xiaoguangrong@linux.vnet.ibm.com> <1375189330-24066-5-git-send-email-xiaoguangrong@linux.vnet.ibm.com> <20130807014828.GA4781@amt.cnet> <5201C7D9.20004@linux.vnet.ibm.com> <20131120002920.GA14230@amt.cnet> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Gleb Natapov , avi.kivity@gmail.com, "pbonzini@redhat.com Bonzini" , linux-kernel@vger.kernel.org, kvm@vger.kernel.org To: Xiao Guangrong Return-path: Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org List-Id: kvm.vger.kernel.org On Wed, Nov 20, 2013 at 10:20:09PM +0800, Xiao Guangrong wrote: > > But what guarantee does userspace require, from GET_DIRTY_LOG, whil= e vcpus are > > executing?=20 >=20 > Aha. Single calling GET_DIRTY_LOG is useless since new dirty page can= be generated > when GET_DIRTY_LOG is being returned. If user wants to get exact dirt= y pages the vcpus > should be stopped.=20 >=20 > >=20 > > With fast page fault: > >=20 > > if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) =3D=3D spt= e) > > /* The window in here... */ > > mark_page_dirty(vcpu->kvm, gfn); > >=20 > > And the $SUBJECT set_spte reordering, the rule becomes > >=20 > > A call to GET_DIRTY_LOG guarantees to return correct information ab= out=20 > > dirty pages before invocation of the previous GET_DIRTY_LOG call. > >=20 > > (see example 1: the next GET_DIRTY_LOG will return the dirty inform= ation > > there). > >=20 >=20 > It seems no. >=20 > The first GET_DIRTY_LOG can happen before fast-page-fault=EF=BC=8C > the second GET_DIRTY_LOG happens in the window between cmpxchg() > and mark_page_dirty(), for the second one, the information is still =E2= =80=9Cincorrect=E2=80=9D. Its correct for the previous GET_DIRTY_LOG call. > > The rule for sptes that is, because kvm_write_guest does not match = the > > documentation at all. >=20 > You mean the case of =E2=80=9Ckvm_write_guest=E2=80=9D is valid (I do= not know why it is)? > Or anything else? >=20 > >=20 > > So before example 1 and this patch, the rule (well for sptes at lea= st) was > >=20 > > "Given a memory slot, return a bitmap containing any pages dirtied > > since the last call to this ioctl. Bit 0 is the first page in the > > memory slot. Ensure the entire structure is cleared to avoid paddi= ng > > issues." > >=20 > > Can you explain why it is OK to relax this rule? >=20 > It=E2=80=99s because: > 1) it doesn=E2=80=99t break current use cases, i.e. Live migration an= d FB-flushing. > 2) the current code, like kvm_write_guest has already broken the doc= umentation > (the guest page has been written but missed in the dirty bitmap). > 3) it=E2=80=99s needless to implement a exact get-dirty-pages since t= he dirty pages can > no be exactly got except stopping vcpus.=20 >=20 > So i think we'd document this case instead. No? Lets figure out the requirements, then. I don't understand why =46B-flushing is safe (think kvm-autotest: one pixel off the entire test fails). Before fast page fault: Pages are either write protected or the corresponding dirty bitmap bit is set. Any write faults to dirty logged sptes while GET_DIRTY log is executing in the protected section are allowed to instantiate writeable spte after GET_DIRTY log is finished. After fast page fault: Pages can be writeable and the dirty bitmap not set. Therefore data can be dirty before GET_DIRTY executes and still fail to be returned in the bitmap. Since this patchset does not introduce change in behaviour (fast pf did), no reason to avoid merging this. BTW, since GET_DIRTY log does not require to report concurrent (to GET_DIRTY_LOG) write faults, it is not necessary to rewalk the spte list, is it?