From: Dennis Luehring
Date: Wed, 19 Aug 2015 06:28:58 +0200
Subject: Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
To: Karel Gardas, qemu-devel@nongnu.org, Artyom Tarasenko, Aurelien Jarno
Message-ID: <55D4060A.4060705@gmx.net>
References: <55B734A7.8040108@gmx.net> <55B870A9.4090008@gmx.net>
 <20150729150147.GO11361@aurel32.net> <55B99F95.8010603@gmx.net>
 <20150730075252.GT11361@aurel32.net> <55B9DD60.8020801@gmx.net>
 <20150730085500.GV11361@aurel32.net> <55BF7FF7.8080308@gmx.net>
 <55BFC62E.4030001@gmx.net> <55D2B3B5.80309@gmx.net>

On 18.08.2015 at 21:06, Karel Gardas wrote:
> Thanks a lot for doing this. It looks like g++ is memory-bound in this
> case, isn't it? What does stream[1] benchmark tell on host and
> emulated as 32/64bit sparc binary? Let's see if the ratio is kind of
> similar to the time you get...
>
> [1]: https://www.cs.virginia.edu/stream/

-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------

==>host Ubuntu 15.04 x64
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 14147 microseconds.
   (= 14147 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            8877.1     0.018049     0.018024     0.018074
Scale:           8842.7     0.018206     0.018094     0.018749
Add:            10312.9     0.023367     0.023272     0.023901
Triad:          10114.3     0.023758     0.023729     0.023871
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

qemu 2.4.50 x64 build

==>netbsd-guest NetBSD 6.1.5 SPARC64 (pure 64bit) running from ramdisk
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 42 microseconds.
Each test below will take on the order of 330428 microseconds.
   (= 7867 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             771.5     0.214717     0.207377     0.244214
Scale:            288.1     0.573320     0.555401     0.660161
Add:              423.5     0.633523     0.566661     1.092067
Triad:            242.9     1.053032     0.987970     1.499563
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

==>debian-guest 7.8.0 SPARC64 (mixed 32/64bit) running from ramdisk

!!32bit version!!
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 41 microseconds.
Each test below will take on the order of 394519 microseconds.
   (= 9622 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             629.4     0.280860     0.254224     0.401105
Scale:            231.7     0.733338     0.690452     0.868741
Add:              346.9     0.747893     0.691890     0.889102
Triad:            201.4     1.239293     1.191786     1.394918
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

!!64bit version!!
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 40 microseconds.
Each test below will take on the order of 395364 microseconds.
   (= 9884 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             651.3     0.251320     0.245668     0.274346
Scale:            240.3     0.694808     0.665834     0.770982
Add:              353.0     0.690291     0.679792     0.715228
Triad:            201.5     1.207881     1.191054     1.256001
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
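
To relate these results to the ratio Karel asked about, here is a minimal Python sketch (not part of the original measurements) that divides the host's best rate by each guest's best rate, per STREAM kernel. The rates are copied from the four result tables in this mail; the labels "netbsd64", "debian32", "debian64" and the helper name `slowdown` are made up for illustration.

```python
# Best-rate figures (MB/s) copied from the STREAM tables in this mail.
# The dictionary keys are illustrative labels, not names used by STREAM or qemu.
RATES = {
    "host":     {"Copy": 8877.1, "Scale": 8842.7, "Add": 10312.9, "Triad": 10114.3},
    "netbsd64": {"Copy": 771.5,  "Scale": 288.1,  "Add": 423.5,   "Triad": 242.9},
    "debian32": {"Copy": 629.4,  "Scale": 231.7,  "Add": 346.9,   "Triad": 201.4},
    "debian64": {"Copy": 651.3,  "Scale": 240.3,  "Add": 353.0,   "Triad": 201.5},
}


def slowdown(host, guest):
    """Return the host/guest bandwidth ratio for each STREAM kernel."""
    return {kernel: host[kernel] / guest[kernel] for kernel in host}


# One set of factors per guest; the host itself is skipped.
FACTORS = {
    name: slowdown(RATES["host"], rates)
    for name, rates in RATES.items()
    if name != "host"
}

for name, factors in FACTORS.items():
    cells = "  ".join(f"{kernel}: {value:5.1f}x" for kernel, value in factors.items())
    print(f"{name:10s} {cells}")
```

A larger factor means the guest is slower relative to the host for that kernel. Whether these factors match the g++ timing ratio is a comparison against the earlier numbers in this thread, which the sketch does not include.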