From: Marcelo Tosatti
Subject: Re: [PATCH v4 0/8] KVM paravirt remote flush tlb
Date: Thu, 23 Aug 2012 08:48:56 -0300
Message-ID: <20120823114856.GB4747@amt.cnet>
In-Reply-To: <20120821112346.3512.99814.stgit@abhimanyu.in.ibm.com>
References: <20120821112346.3512.99814.stgit@abhimanyu.in.ibm.com>
To: "Nikunj A. Dadhania", Peter Zijlstra, Avi Kivity
Cc: avi@redhat.com, raghukt@linux.vnet.ibm.com, alex.shi@intel.com,
    kvm@vger.kernel.org, stefano.stabellini@eu.citrix.com,
    peterz@infradead.org, hpa@zytor.com, vsrivatsa@gmail.com, mingo@elte.hu

On Tue, Aug 21, 2012 at 04:55:52PM +0530, Nikunj A. Dadhania wrote:
> Remote flushing api's does a busy wait which is fine in bare-metal
> scenario. But within the guest, the vcpus might have been pre-empted
> or blocked. In this scenario, the initiator vcpu would end up
> busy-waiting for a long amount of time.
>
> This was discovered in our gang scheduling test and other way to solve
> this is by para-virtualizing the flush_tlb_others_ipi(now shows up as
> smp_call_function_many after Alex Shi's TLB optimization)
>
> This patch set implements para-virt flush tlbs making sure that it
> does not wait for vcpus that are sleeping. And all the sleeping vcpus
> flush the tlb on guest enter. Idea was discussed here:
> https://lkml.org/lkml/2012/2/20/157
>
> This also brings one more dependency for lock-less page walk that is
> performed by get_user_pages_fast(gup_fast). gup_fast disables the
> interrupt and assumes that the pages will not be freed during that
> period.
> And this was fine as the flush_tlb_others_ipi would wait for
> all the IPI to be processed and return back. With the new approach of
> not waiting for the sleeping vcpus, this assumption is not valid
> anymore. So now HAVE_RCU_TABLE_FREE is used to free the pages. This
> will make sure that all the cpus would at least process smp_callback
> before the pages are freed.
>
> Changelog from v3:
> • Add helper for cleaning up vcpu_state information (Marcelo)
> • Fix code for checking vs_page and leaking page refs (Marcelo)
>
> Changelog from v2:
> • Rebase to 3.5 based linus(commit - f7da9cd) kernel.
> • Port PV-Flush to new TLB-Optimization code by Alex Shi
> • Use pinned pages to avoid overhead during guest enter/exit (Marcelo)
> • Remove kick, as this is not improving much
> • Use bit fields in the state(flush_on_enter and vcpu_running) flag to
>   avoid smp barriers (Marcelo)
>
> Changelog from v1:
> • Race fixes reported by Vatsa
> • Address gup_fast dependency using PeterZ's rcu table free patch
> • Fix rcu_table_free for hw pagetable walkers
>
> Here are the results from PLE hardware. Here is the setup details:
> • 32 CPUs (HT disabled)
> • 64-bit VM
> • 32vcpus
> • 8GB RAM
>
> base = 3.6-rc1 + ple handler optimization patch
> pvflushv4 = 3.6-rc1 + ple handler optimization patch + pvflushv4 patch
>
> kernbench(lower is better)
> ==========================
>          base       pvflushv4   %improvement
> 1VM     48.5800      46.8513      3.55846
> 2VM    108.1823     104.6410      3.27346
> 3VM    183.2733     163.3547     10.86825
>
> ebizzy(higher is better)
> ========================
>          base       pvflushv4   %improvement
> 1VM   2414.5000    2089.8750    -13.44481
> 2VM   2167.6250    2371.7500      9.41699
> 3VM   1600.1111    2102.5556     31.40060
>
> Thanks Raghu for running the tests.
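For reference, the %improvement column in the tables above appears to be the relative change against base, with the sign flipped for kernbench because lower is better there. A minimal sketch of that arithmetic follows; the function names are hypothetical and are not taken from the test scripts.

```c
/* Percent improvement when a lower score is better (kernbench). */
static double pct_gain_lower_is_better(double base, double patched)
{
    return (base - patched) / base * 100.0;
}

/* Percent improvement when a higher score is better (ebizzy). */
static double pct_gain_higher_is_better(double base, double patched)
{
    return (patched - base) / base * 100.0;
}
```

Applied to the 1VM rows, this reproduces the reported 3.55846 for kernbench and the -13.44481 regression for ebizzy.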
>
> [1] http://article.gmane.org/gmane.linux.kernel/1329752
>
> ---
>
> Nikunj A. Dadhania (6):
>       KVM Guest: Add VCPU running/pre-empted state for guest
>       KVM-HV: Add VCPU running/pre-empted state for guest
>       KVM Guest: Add paravirt kvm_flush_tlb_others
>       KVM-HV: Add flush_on_enter before guest enter
>       Enable HAVE_RCU_TABLE_FREE for kvm when PARAVIRT_TLB_FLUSH is enabled
>       KVM-doc: Add paravirt tlb flush document
>
> Peter Zijlstra (2):
>       mm, x86: Add HAVE_RCU_TABLE_FREE support
>       mm: Add missing TLB invalidate to RCU page-table freeing
>
>
>  Documentation/virtual/kvm/msr.txt                |    4 +
>  Documentation/virtual/kvm/paravirt-tlb-flush.txt |   53 ++++++++++++++
>  arch/Kconfig                                     |    3 +
>  arch/powerpc/Kconfig                             |    1
>  arch/sparc/Kconfig                               |    1
>  arch/x86/Kconfig                                 |   11 +++
>  arch/x86/include/asm/kvm_host.h                  |    7 ++
>  arch/x86/include/asm/kvm_para.h                  |   13 +++
>  arch/x86/include/asm/tlb.h                       |    1
>  arch/x86/include/asm/tlbflush.h                  |   11 +++
>  arch/x86/kernel/kvm.c                            |   38 ++++++++++
>  arch/x86/kvm/cpuid.c                             |    1
>  arch/x86/kvm/x86.c                               |   84 +++++++++++++++++++++-
>  arch/x86/mm/pgtable.c                            |    6 +-
>  arch/x86/mm/tlb.c                                |   36 +++++++++
>  include/asm-generic/tlb.h                        |    9 ++
>  mm/memory.c                                      |   43 ++++++++++-
>  17 files changed, 311 insertions(+), 11 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/paravirt-tlb-flush.txt

Avi, PeterZ can you please review?
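For reviewers skimming the cover letter, here is a rough user-space sketch of the handshake it describes: the initiator skips vcpus that are not running and marks them to flush on the next guest entry, so it only has to wait for running vcpus. This is an illustration using C11 atomics, not code from the series; the struct layout, the bit values and all function names below are assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout for the shared per-vcpu state word. */
#define VCPU_RUNNING    (1u << 0)
#define FLUSH_ON_ENTER  (1u << 1)

struct vcpu_state {
    _Atomic unsigned int state;
};

/*
 * Guest side, called by the flush initiator for each target vcpu.
 * Returns true if the target is running and must be sent an IPI,
 * false if the flush was deferred to the target's next guest entry.
 */
static bool pv_flush_needs_ipi(struct vcpu_state *vs)
{
    unsigned int old = atomic_load(&vs->state);

    for (;;) {
        if (old & VCPU_RUNNING)
            return true;
        /* Target is preempted or blocked: request a flush on entry. */
        if (atomic_compare_exchange_weak(&vs->state, &old,
                                         old | FLUSH_ON_ENTER))
            return false;
        /* CAS failed and reloaded 'old'; re-check the running bit. */
    }
}

/*
 * Host side, called just before entering the guest. Marks the vcpu
 * running, clears any pending request, and returns true if the host
 * must flush the TLB before entry.
 */
static bool host_guest_enter(struct vcpu_state *vs)
{
    unsigned int old = atomic_exchange(&vs->state, VCPU_RUNNING);

    return (old & FLUSH_ON_ENTER) != 0;
}

/* Host side, called when the vcpu is preempted or blocks. */
static void host_guest_exit(struct vcpu_state *vs)
{
    atomic_fetch_and(&vs->state, ~VCPU_RUNNING);
}
```

Keeping both flags in one word means the running check and the deferred-flush request are a single atomic update, so a request cannot be lost if the target starts running between the check and the write. The sketch says nothing about page-table freeing, which is why the series still needs HAVE_RCU_TABLE_FREE for gup_fast.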