From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3])
	by kanga.kvack.org (Postfix) with SMTP id 866D96B01EE
	for ; Tue, 6 Apr 2010 05:56:49 -0400 (EDT)
Message-ID: <4BBB052D.8040307@redhat.com>
Date: Tue, 06 Apr 2010 12:55:57 +0300
From: Avi Kivity
MIME-Version: 1.0
Subject: Re: [PATCH 00 of 41] Transparent Hugepage Support #17
References: <20100405120906.0abe8e58.akpm@linux-foundation.org>
	<20100405193616.GA5125@elte.hu>
	<20100405232115.GM5825@random.random>
	<20100406011345.GT5825@random.random>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: Linus Torvalds
Cc: Andrea Arcangeli, Pekka Enberg, Ingo Molnar, Andrew Morton,
	linux-mm@kvack.org, Marcelo Tosatti, Adam Litke, Izik Eidus,
	Hugh Dickins, Nick Piggin, Rik van Riel, Mel Gorman, Dave Hansen,
	Benjamin Herrenschmidt, Mike Travis, KAMEZAWA Hiroyuki,
	Christoph Lameter, Chris Wright, bpicco@redhat.com,
	KOSAKI Motohiro, Balbir Singh, Arnd Bergmann,
	"Michael S. Tsirkin", Peter Zijlstra, Johannes Weiner,
	Daisuke Nishimura
List-ID:

On 04/06/2010 05:23 AM, Linus Torvalds wrote:
>
> On Mon, 5 Apr 2010, Linus Torvalds wrote:
>
>> So I thought it was a more interesting load than it was. The
>> virtualization "TLB miss is expensive" load I can't find it in myself to
>> care about. "Get a better CPU" is my answer to that one,
>>
> [ Btw, I do realize that "better CPU" in this case may be "future CPU". I
>    just think that this is where better TLB's and using ASID's etc is
>    likely to be a much bigger deal than adding VM complexity. Kind of the
>

For virtualization the tlb miss cost comes from two parts: first, there
are the 24 memory accesses needed for a tlb fill (instead of the usual
4); these can indeed be improved by various intermediate tlbs (and
current processors already do have those caches).
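
To spell out where the 24 comes from, here is a small sketch of the arithmetic. It assumes a two-dimensional (nested) page walk with 4-level guest and 4-level host page tables and no intermediate caching at all. The function names are made up for illustration and do not come from any kernel or hypervisor code.

```python
def native_walk_refs(levels: int) -> int:
    """Memory references for a native page walk: one read per level."""
    return levels


def nested_walk_refs(guest_levels: int, host_levels: int) -> int:
    """Worst-case memory references for a two-dimensional page walk.

    Assumption: nothing is cached, so every step goes to memory.
    """
    # For each guest level, the guest-physical address of the guest table
    # is translated through all host levels, then the guest entry is read.
    per_guest_level = host_levels + 1
    # The guest-physical address of the data itself needs one more host walk.
    final_translation = host_levels
    return guest_levels * per_guest_level + final_translation


if __name__ == "__main__":
    print(native_walk_refs(4))     # 4
    print(nested_walk_refs(4, 4))  # 24
```

With 4 levels on each side this gives 4 * (4 + 1) + 4 = 24, against 4 for the native walk.
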

However, something that cannot be solved by the tlb is the accesses to
the last level of the page table hierarchy: as soon as the page tables
exceed the cache size, you take two cache misses for each tlb miss.

Note virtualization only increases the hit; it also shows with
non-virtualized loads, but there your cache utilization is halved and
you only need one memory access for your last level page table.

Here is a microbenchmark demonstrating the hit (non-virtualized); it
simulates a pointer-chasing application with a varying working set. It
is easy to see when the working set overflows the various caches, and
later when the page tables overflow the caches. For virtualization the
hit will be a factor of 3 instead of 2, and will come earlier since the
page tables are bigger.

   size    4k (ns)    2M (ns)
     4k       4.9        4.9
    16k       4.9        4.9
    64k       7.6        7.6
   256k      15.1        8.1
     1M      28.5       23.9
     4M      31.8       25.3
    16M      94.8       79.0
    64M     260.9      224.2
   256M     269.8      248.8
     1G     278.1      246.3
     4G     330.9      252.6
    16G     436.3      243.8
    64G     486.0      253.3

--
error compiling committee.c: too many arguments to function
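
As a rough companion to the table, the sketch below estimates how many bytes of last-level page table entries are needed to map a given working set. It assumes x86-64 with 8-byte entries, 4k base pages and 2M large pages. The helper name is hypothetical, and the cache sizes of the machine that produced the numbers are not stated in the mail, so this only shows the footprint, not where the knee must fall.

```python
PTE_BYTES = 8  # assumption: x86-64, 8-byte page table entries

KIB = 1 << 10
MIB = 1 << 20
GIB = 1 << 30


def last_level_bytes(working_set: int, page_size: int) -> int:
    """Bytes of last-level page table entries needed to map working_set."""
    pages = -(-working_set // page_size)  # ceiling division
    return pages * PTE_BYTES


if __name__ == "__main__":
    for size in (256 * MIB, 1 * GIB, 4 * GIB, 16 * GIB, 64 * GIB):
        small = last_level_bytes(size, 4 * KIB)
        large = last_level_bytes(size, 2 * MIB)
        print(f"{size // MIB:>6} MiB working set: "
              f"{small // KIB:>7} KiB of 4k entries, "
              f"{large // KIB:>4} KiB of 2M entries")
```

Under these assumptions the 4k entries grow from 2 MiB at a 1G working set to 128 MiB at 64G, while the 2M entries reach only 256 KiB at 64G. That is at least consistent with the 4k column pulling away from the 2M column from 4G upward.
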