From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gordan Bobic
Subject: Re: Xen 4.2.2 / KVM / VirtualBox benchmark on Haswell
Date: Thu, 11 Jul 2013 18:49:00 +0100
Message-ID: <51DEF00C.9080400@bobich.net>
References: <51DC2BE3.7000009@xen.org> <1373540028.12772.31.camel@Solace> <1373560062.12772.48.camel@Solace>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <1373560062.12772.48.camel@Solace>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Dario Faggioli
Cc: George Dunlap, Lars Kurth, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 07/11/2013 05:27 PM, Dario Faggioli wrote:
> On gio, 2013-07-11 at 17:23 +0100, George Dunlap wrote:
>> On Thu, Jul 11, 2013 at 11:53 AM, Dario Faggioli
>>> When I tried to use kernel compile as a benchmark for the NUMA
>>> effects, it did not turn out that useful to me (and that's why I
>>> switched to SpecJBB), but perhaps it was me that was doing
>>> something wrong...
>>
>> In my experience, kernel-build has excellent memory locality. One
>> effect is that the impact of nested paging on TLB time is almost
>> nil; I'm not surprised that the caches make the effect of NUMA
>> almost nil as well.
>>
> Not to mention I/O, unless you set up a ramfs-backed build
> environment. Again, when I tried, that was my intention, but perhaps
> I failed right at that... Gordan, what about you?

IIRC, the disk I/O in my tests was relatively minimal. If you read the
details here:

http://www.altechnative.net/2012/08/04/virtual-performance-part-1-vmware/

you may notice that I primed the test by catting everything to
/dev/null, so all the reads should have been coming from the page
cache (a rough sketch of that priming pass is appended at the end of
this message). I didn't have enough RAM in the machine (only 8GB) at
the time to also fit all the produced binaries in tmpfs.

I don't think that had a large impact, though: iowait stayed at about
0% throughout, because there were always plenty of threads with
productive compiling work to do while others were waiting to commit
to disk.

Since this was on a C2Q, there was no NUMA in play, so if I had to
guess at the major cause of the performance degradation, I would say
it was related to context switching; having said that, I never got
around to doing any in-depth profiling to be able to tell for sure.
(Speaking of which, how would one go about profiling things at the
bare-metal hypervisor level?)

I will re-run the test on a new machine at some point and see how it
compares, and this time I will have enough RAM for the whole lot to
fit.

Gordan
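
[Appended for reference: a minimal Python sketch of the cache-priming
pass described above. It is only an illustration of the idea (the
actual priming was a plain shell "cat everything to /dev/null" loop),
and the source-tree path below is a hypothetical placeholder.]

#!/usr/bin/env python
# Warm the page cache by reading every file under a source tree and
# discarding the data (same idea as
# `find . -type f -exec cat {} + > /dev/null`).

import os

TREE = "/usr/src/linux"  # hypothetical: point at the tree being built

def prime_page_cache(root):
    """Read every regular file under root in 1 MiB chunks."""
    files_read = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    # The data is thrown away; the point is only to
                    # pull the blocks into the page cache before the
                    # timed compile starts.
                    while f.read(1024 * 1024):
                        pass
                files_read += 1
            except (IOError, OSError):
                pass  # skip unreadable entries (sockets, dangling links)
    return files_read

if __name__ == "__main__":
    print("primed %d files under %s" % (prime_page_cache(TREE), TREE))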