From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler
Date: Wed, 03 Oct 2012 19:05:27 +0200
Message-ID: <506C7057.6000102@redhat.com>
References: <20120921115942.27611.67488.sendpatchset@codeblue>
 <20120921120000.27611.71321.sendpatchset@codeblue>
 <505C654B.2050106@redhat.com>
 <505CA2EB.7050403@linux.vnet.ibm.com>
 <50607F1F.2040704@redhat.com>
 <20121003122209.GA9076@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Rik van Riel, Peter Zijlstra, "H. Peter Anvin", Ingo Molnar,
 Marcelo Tosatti, Srikar, "Nikunj A. Dadhania", KVM, Jiannan Ouyang,
 chegu vinod, "Andrew M. Theurer", LKML, Srivatsa Vaddagiri, Gleb Natapov
To: Raghavendra K T
Return-path:
In-Reply-To: <20121003122209.GA9076@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On 10/03/2012 02:22 PM, Raghavendra K T wrote:
>> So I think it's worth trying again with ple_window of 20000-40000.
>>
>
> Hi Avi,
>
> I ran different benchmarks increasing ple_window, and results does not
> seem to be encouraging for increasing ple_window.

Thanks for testing!  Comments below.

> Results:
> 16 core PLE machine with 16 vcpu guest.
>
> base kernel = 3.6-rc5 + ple handler optimization patch
> base_pleopt_8k = base kernel + ple window = 8k
> base_pleopt_16k = base kernel + ple window = 16k
> base_pleopt_32k = base kernel + ple window = 32k
>
>
> Percentage improvements of benchmarks w.r.t base_pleopt with ple_window = 4096
>
>                 base_pleopt_8k  base_pleopt_16k  base_pleopt_32k
> -----------------------------------------------------------------
> kernbench_1x    -5.54915        -15.94529        -44.31562
> kernbench_2x    -7.89399        -17.75039        -37.73498

So, 44% degradation even with no overcommit?  That's surprising.

> I also got perf top output to analyse the difference. Difference comes
> because of flushtlb (and also spinlock).

That's in the guest, yes?

>
> Ebizzy run for 4k ple_window
> -  87.20%  [kernel]  [k] arch_local_irq_restore
>    - arch_local_irq_restore
>       - 100.00% _raw_spin_unlock_irqrestore
>          + 52.89% release_pages
>          + 47.10% pagevec_lru_move_fn
> -   5.71%  [kernel]  [k] arch_local_irq_restore
>    - arch_local_irq_restore
>       + 86.03% default_send_IPI_mask_allbutself_phys
>       + 13.96% default_send_IPI_mask_sequence_phys
> -   3.10%  [kernel]  [k] smp_call_function_many
>      smp_call_function_many
>
>
> Ebizzy run for 32k ple_window
>
> -  91.40%  [kernel]  [k] arch_local_irq_restore
>    - arch_local_irq_restore
>       - 100.00% _raw_spin_unlock_irqrestore
>          + 53.13% release_pages
>          + 46.86% pagevec_lru_move_fn
> -   4.38%  [kernel]  [k] smp_call_function_many
>      smp_call_function_many
> -   2.51%  [kernel]  [k] arch_local_irq_restore
>    - arch_local_irq_restore
>       + 90.76% default_send_IPI_mask_allbutself_phys
>       + 9.24% default_send_IPI_mask_sequence_phys
>

Both the 4k and the 32k results are crazy.  Why is
arch_local_irq_restore() so prominent?  Do you have a very high
interrupt rate in the guest?

-- 
error compiling committee.c: too many arguments to function