From: Avi Kivity
Date: Sat, 10 Apr 2010 23:24:36 +0300
Subject: Re: [PATCH 00 of 41] Transparent Hugepage Support #17
To: Ingo Molnar
Cc: Mike Galbraith, Jason Garrett-Glaser, Andrea Arcangeli, Linus Torvalds,
    Pekka Enberg, Andrew Morton, linux-mm@kvack.org, Marcelo Tosatti,
    Adam Litke, Izik Eidus, Hugh Dickins, Nick Piggin, Rik van Riel,
    Mel Gorman, Dave Hansen, Benjamin Herrenschmidt, Mike Travis,
    KAMEZAWA Hiroyuki, Christoph Lameter, Chris Wright, bpicco@redhat.com,
    KOSAKI Motohiro, Balbir Singh, Arnd Bergmann, "Michael S. Tsirkin",
    Peter Zijlstra, Johannes Weiner, Daisuke Nishimura
Message-ID: <4BC0DE84.3090305@redhat.com>
In-Reply-To: <20100410194751.GA23751@elte.hu>

On 04/10/2010 10:47 PM, Ingo Molnar wrote:
> * Avi Kivity wrote:
>
>>> I think what would be needed is some non-virtualization speedup example
>>> of a 'non-special' workload, running on the native/host kernel.  'sort'
>>> is an interesting use case - could it be patched to use hugepages if it
>>> has to sort through lots of data?
>>
>> In fact it works well unpatched; the 6% I measured was with the system
>> sort.
>
> Yes - but you intentionally sorted something large - the question is, how
> big is the slowdown with small sizes (if there's a slowdown), and where is
> the break-even point (if any)?

There shouldn't be a slowdown as far as I can tell.  The danger IMO is
pinning down unused pages inside a huge page and so increasing memory
pressure artificially.

The point where this starts to win would be more or less when the page
tables mapping the working set reach the size of the last-level cache,
multiplied by some loading factor (guess: 0.5).  So if you have a 4MB
cache, the win should start at around a 1GB working set.
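Back-of-envelope, assuming the usual x86-64 numbers (4KB base pages,
8 bytes per PTE):

  $ echo $(( (1 << 30) / 4096 * 8 ))   # PTE bytes for a 1GB working set
  2097152                              # 2MB, i.e. 0.5 * a 4MB cache
  $ echo $(( (20 << 20) / 4096 * 8 ))  # same for a 20MB image (see below)
  40960                                # 40KB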
>>> Something like GIMP calculations would be a lot more representative of
>>> the speedup potential.  Is it possible to run the GIMP with transparent
>>> hugepages enabled for it?
>>
>> I thought of it, but raster work is too regular, so speculative execution
>> should hide the TLB fill latency.  It's also easy to code in a way that
>> hides cache effects (no idea if it is actually coded that way).  Sort
>> showed a speedup since it defeats branch prediction, so the processor
>> cannot pipeline the loop.
>
> Would be nice to try, because there are a lot of transformations within
> Gimp - and Gimp can be scripted.  It's also a test for negatives: if there
> is an across-the-board _lack_ of speedups, it shows that it's not really
> general purpose but more specialized.

Right, but I don't think I can tell which transforms are likely to be sped
up.  Also, do people manipulate 500MB images regularly?  A 20MB image won't
see a significant improvement (40KB of page tables - that's chickenfeed).

> If the optimization is specialized, then that's somewhat of an argument
> against automatic/transparent handling.  (Even though, even if the
> beneficiaries turn out to be only special workloads, transparency still
> has advantages.)

Well, we know that databases, virtualization, and server-side Java win
from this.  (Oracle won't benefit from this implementation, since it wants
shared rather than anonymous memory, but other databases may.)  I'm
guessing large C++ compiles, and perhaps the new link-time optimization
feature, will also see a nice speedup.  Desktops will only benefit when
they bloat to ~8GB RAM and 1-2GB firefox RSS - probably not so far in the
future.

>> I thought ray tracers with large scenes should show a nice speedup, but
>> setting this up is beyond my capabilities.
>
> Oh, this tickled some memories: x264 compressed encoding can be very
> cache and TLB intensive.  Something like the encoding of a 350 MB video
> file:
>
>   wget http://media.xiph.org/video/derf/y4m/soccer_4cif.y4m  # NOTE: 350 MB!
>   x264 --crf 20 --quiet soccer_4cif.y4m -o /dev/null --threads 4
>
> would be another thing worth trying with transparent hugepages enabled.

I'll try it out.
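A minimal sketch of the comparison run, assuming the sysfs knob this patch
series exposes (the exact path and accepted values may differ between
versions of the series):

  # enable transparent hugepages (as root), then encode
  echo always > /sys/kernel/mm/transparent_hugepage/enabled
  time x264 --crf 20 --quiet soccer_4cif.y4m -o /dev/null --threads 4

  # and once more with them disabled, for the baseline
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
  time x264 --crf 20 --quiet soccer_4cif.y4m -o /dev/null --threads 4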