From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753215Ab1LYK6o (ORCPT ); Sun, 25 Dec 2011 05:58:44 -0500
Received: from mx1.redhat.com ([209.132.183.28]:23487 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752156Ab1LYK6l (ORCPT ); Sun, 25 Dec 2011 05:58:41 -0500
Message-ID: <4EF701C7.9080907@redhat.com>
Date: Sun, 25 Dec 2011 12:58:15 +0200
From: Avi Kivity
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:8.0) Gecko/20111115 Thunderbird/8.0
MIME-Version: 1.0
To: Ingo Molnar
CC: Nikunj A Dadhania, peterz@infradead.org, linux-kernel@vger.kernel.org,
        vatsa@linux.vnet.ibm.com, bharata@linux.vnet.ibm.com
Subject: Re: [RFC PATCH 0/4] Gang scheduling in CFS
References: <20111219083141.32311.9429.stgit@abhimanyu.in.ibm.com>
        <20111219112326.GA15090@elte.hu> <87sjke1a53.fsf@abhimanyu.in.ibm.com>
        <4EF1B85F.7060105@redhat.com> <877h1o9dp7.fsf@linux.vnet.ibm.com>
        <20111223103620.GD4749@elte.hu>
In-Reply-To: <20111223103620.GD4749@elte.hu>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/23/2011 12:36 PM, Ingo Molnar wrote:
> * Nikunj A Dadhania wrote:
>
> > Here some interesting perf reports from inside the guest:
> >
> > Baseline:
> >     29.79%  ebizzy  [kernel.kallsyms]  [k] native_flush_tlb_others
> >     18.70%  ebizzy  libc-2.12.so       [.] __GI_memcpy
> >      7.23%  ebizzy  [kernel.kallsyms]  [k] get_page_from_freelist
> >      5.38%  ebizzy  [kernel.kallsyms]  [k] __do_page_fault
> >      4.50%  ebizzy  [kernel.kallsyms]  [k] ____pagevec_lru_add
> >      3.58%  ebizzy  [kernel.kallsyms]  [k] default_send_IPI_mask_logical
> >      3.26%  ebizzy  [kernel.kallsyms]  [k] native_flush_tlb_single
> >      2.82%  ebizzy  [kernel.kallsyms]  [k] handle_pte_fault
> >      2.16%  ebizzy  [kernel.kallsyms]  [k] kunmap_atomic
> >      2.10%  ebizzy  [kernel.kallsyms]  [k] _spin_unlock_irqrestore
> >      1.90%  ebizzy  [kernel.kallsyms]  [k] down_read_trylock
> >      1.65%  ebizzy  [kernel.kallsyms]  [k] __mem_cgroup_commit_charge.clone.4
> >      1.60%  ebizzy  [kernel.kallsyms]  [k] up_read
> >      1.24%  ebizzy  [kernel.kallsyms]  [k] __alloc_pages_nodemask
> >
> > Gang:
> >     22.53%  ebizzy  libc-2.12.so       [.] __GI_memcpy
> >      9.73%  ebizzy  [kernel.kallsyms]  [k] ____pagevec_lru_add
> >      8.22%  ebizzy  [kernel.kallsyms]  [k] get_page_from_freelist
> >      7.80%  ebizzy  [kernel.kallsyms]  [k] default_send_IPI_mask_logical
> >      7.68%  ebizzy  [kernel.kallsyms]  [k] native_flush_tlb_others
> >      6.22%  ebizzy  [kernel.kallsyms]  [k] __do_page_fault
> >      5.54%  ebizzy  [kernel.kallsyms]  [k] native_flush_tlb_single
> >      4.44%  ebizzy  [kernel.kallsyms]  [k] _spin_unlock_irqrestore
> >      2.90%  ebizzy  [kernel.kallsyms]  [k] kunmap_atomic
> >      2.78%  ebizzy  [kernel.kallsyms]  [k] __mem_cgroup_commit_charge.clone.4
> >      2.76%  ebizzy  [kernel.kallsyms]  [k] handle_pte_fault
> >      2.16%  ebizzy  [kernel.kallsyms]  [k] __mem_cgroup_uncharge_common
> >      1.59%  ebizzy  [kernel.kallsyms]  [k] down_read_trylock
> >      1.43%  ebizzy  [kernel.kallsyms]  [k] up_read
> >
> > I see the main difference between both the reports is:
> > native_flush_tlb_others.
>
> So it would be important to figure out why ebizzy gets into so
> many TLB flushes and why gang scheduling makes it go away.

The second part is easy - a remote TLB flush involves IPIs to many other
vcpus (possibly waking them up and scheduling them), then busy-waiting
until they acknowledge the flush.
Gang scheduling is really good here since it shortens the busy wait; it
would be even better if we also scheduled halted vcpus (see the
yield_on_hlt module parameter, set to 0). Directed yield on PLE should
provide intermediate results between doing nothing and gang sched.

The first part appears to be unrelated to ebizzy itself - it's the
kunmap_atomic() flushing ptes. It could be eliminated by switching to a
non-highmem kernel, or by allocating more PTEs for kmap_atomic() and
batching the flush.

btw you can get an additional speedup by enabling x2apic, for
default_send_IPI_mask_logical().

-- 
error compiling committee.c: too many arguments to function
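
To illustrate the busy-wait point about remote TLB flushes, here is a toy
model. It is only a sketch, not kernel or KVM code: the Vcpu class,
flush_busy_wait_us(), gang_scheduled() and every microsecond figure are
made up for illustration. The one assumption taken from the discussion is
that the initiating vcpu spins until the slowest target acknowledges, and
that a preempted target cannot acknowledge until the host runs it again.

```python
from dataclasses import dataclass


@dataclass
class Vcpu:
    running: bool           # is the vcpu on a physical CPU at flush time?
    wakeup_delay_us: float  # hypothetical delay until the host schedules it


def flush_busy_wait_us(targets, ipi_ack_us=5.0):
    """Time the initiator spins: it waits for the slowest target's ack."""
    if not targets:
        return 0.0
    return max(
        ipi_ack_us + (0.0 if t.running else t.wakeup_delay_us)
        for t in targets
    )


def gang_scheduled(targets):
    """Model gang scheduling as: every target is running at flush time."""
    return [Vcpu(running=True, wakeup_delay_us=t.wakeup_delay_us)
            for t in targets]


# Seven remote vcpus, two of them preempted by the host at flush time.
baseline = [Vcpu(True, 0.0)] * 5 + [Vcpu(False, 3000.0), Vcpu(False, 1200.0)]

print(flush_busy_wait_us(baseline))                  # slowest preempted vcpu dominates
print(flush_busy_wait_us(gang_scheduled(baseline)))  # only the IPI round trip is left
```

In this model a directed yield would sit in between the two cases: it
would shrink wakeup_delay_us for the preempted targets rather than remove
it.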