From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marcelo Tosatti Subject: Re: [PATCH v2 12/15] KVM: MMU: allow locklessly access shadow page table out of vcpu thread Date: Tue, 8 Oct 2013 22:56:27 -0300 Message-ID: <20131009015627.GA4816@amt.cnet> References: <1378376958-27252-1-git-send-email-xiaoguangrong@linux.vnet.ibm.com> <1378376958-27252-13-git-send-email-xiaoguangrong@linux.vnet.ibm.com> <20131008012355.GA3588@amt.cnet> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Xiao Guangrong , gleb@redhat.com, avi.kivity@gmail.com, pbonzini@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org To: Xiao Guangrong Return-path: Received: from mx1.redhat.com ([209.132.183.28]:29453 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756658Ab3JIB5l (ORCPT ); Tue, 8 Oct 2013 21:57:41 -0400 Content-Disposition: inline In-Reply-To: Sender: kvm-owner@vger.kernel.org List-ID: On Tue, Oct 08, 2013 at 12:02:32PM +0800, Xiao Guangrong wrote: >=20 > Hi Marcelo, >=20 > On Oct 8, 2013, at 9:23 AM, Marcelo Tosatti wro= te: >=20 > >>=20 > >> + if (kvm->arch.rcu_free_shadow_page) { > >> + kvm_mmu_isolate_pages(invalid_list); > >> + sp =3D list_first_entry(invalid_list, struct kvm_mmu_page, link= ); > >> + list_del_init(invalid_list); > >> + call_rcu(&sp->rcu, free_pages_rcu); > >> + return; > >> + } > >=20 > > This is unbounded (there was a similar problem with early fast page= fault > > implementations): > >=20 > > From RCU/checklist.txt: > >=20 > > " An especially important property of the synchronize_rcu() > > primitive is that it automatically self-limits: if grace per= iods > > are delayed for whatever reason, then the synchronize_rcu() > > primitive will correspondingly delay updates. In contrast, > > code using call_rcu() should explicitly limit update rate in > > cases where grace periods are delayed, as failing to do so c= an > > result in excessive realtime latencies or even OOM condition= s. > > " >=20 > I understand what you are worrying about=E2=80=A6 Hmm, can it be avoi= ded by > just using kvm->arch.rcu_free_shadow_page in a small window? - Then > there are slight chance that the page need to be freed by call_rcu. The point that must be addressed is that you cannot allow an unlimited number of sp's to be freed via call_rcu between two grace periods. So something like: - For every 17MB worth of shadow pages. - Guarantee a grace period has passed. If you control kvm->arch.rcu_free_shadow_page, you could periodically verify how many MBs worth of shadow pages are in the queue for RCU freeing and force grace period after a certain number. > > Moreover, freeing pages differently depending on some state should=20 > > be avoided. > >=20 > > Alternatives: > >=20 > > - Disable interrupts at write protect sites. >=20 > The write-protection can be triggered by KVM ioctl that is not in the= VCPU > context, if we do this, we also need to send IPI to the KVM thread wh= en do > TLB flush. Yes. However for the case being measured, simultaneous page freeing by = vcpus=20 should be minimal (therefore not affecting the latency of GET_DIRTY_LOG= ). > And we can not do much work while interrupt is disabled due to > interrupt latency. >=20 > > - Rate limit the number of pages freed via call_rcu > > per grace period. >=20 > Seems complex. :( >=20 > > - Some better alternative. >=20 > Gleb has a idea that uses RCU_DESTORY to protect the shadow page tabl= e > and encodes the page-level into the spte (since we need to check if t= he spte > is the last-spte. ). How about this? Pointer please? Why is DESTROY_SLAB_RCU any safer than call_rcu with regards to limitation? (maybe it is). > I planned to do it after this patchset merged, if you like it and if = you think > that "using kvm->arch.rcu_free_shadow_page in a small window" can not= avoid > the issue, i am happy to do it in the next version. :) Unfortunately the window can be large (as it depends on the size of the memslot), so it would be best if this problem can be addressed before=20 merging. What is your idea for reducing rcu_free_shadow_page=3D1 window= ? Thank you for the good work.