From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:34759)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ZTBvA-0006dE-LH for qemu-devel@nongnu.org;
	Sat, 22 Aug 2015 12:45:33 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1ZTBv9-0001cI-5r for qemu-devel@nongnu.org;
	Sat, 22 Aug 2015 12:45:32 -0400
Received: from mail-lb0-x233.google.com ([2a00:1450:4010:c04::233]:34763)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ZTBv8-0001bu-Ru for qemu-devel@nongnu.org;
	Sat, 22 Aug 2015 12:45:31 -0400
Received: by lbbtg9 with SMTP id tg9so59654546lbb.1 for ;
	Sat, 22 Aug 2015 09:45:28 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <55D60C1B.9010502@twiddle.net>
References: <55B9DD60.8020801@gmx.net>
	<20150730085500.GV11361@aurel32.net>
	<20150730155003.GE30591@aurel32.net>
	<20150731154323.GD23508@aurel32.net>
	<20150803091716.GF30591@aurel32.net>
	<55D37189.3010809@twiddle.net>
	<20150819110010.GJ23508@aurel32.net>
	<55D60C1B.9010502@twiddle.net>
From: Artyom Tarasenko
Date: Sat, 22 Aug 2015 18:45:09 +0200
Content-Type: text/plain; charset=UTF-8
Subject: Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
To: Richard Henderson
Cc: Dennis Luehring, Aurelien Jarno, qemu-devel

On Thu, Aug 20, 2015 at 7:19 PM, Richard Henderson wrote:
> On 08/19/2015 07:41 AM, Artyom Tarasenko wrote:
>>
>> Without the patch:
>>
>> time g++ -DHAVE_CONFIG_H -I.
>> -I../binutils-gdb/gold
>> -I../binutils-gdb/gold -I../binutils-gdb/gold/../include
>> -I../binutils-gdb/gold/../elfcpp
>> -DLOCALEDIR="\"/usr/local/share/locale\""
>> -DBINDIR="\"/usr/local/bin\"" -DTOOLBINDIR="\"/usr/local//bin\""
>> -DTOOLLIBDIR="\"/usr/local//lib\"" -W -Wall -Werror
>> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -frandom-seed=tilegx.o
>> -I../binutils-gdb/gold/../zlib -g -O2 -MT tilegx.o -MD -MP -MF
>> .deps/tilegx.Tpo -c -o tilegx.o ../binutils-gdb/gold/tilegx.cc
>>
>> real 18m31.407s
>> user 18m23.661s
>> sys 0m6.784s
>>
>> The patch surely improves the situation, tcg_optimize in the perf top
>> takes ~7% (instead of ~12%), and the only function marked red by
>> perf-top is init_temp_info(). So with the patch:
>>
>> real 17m46.380s
>> user 17m37.522s
>> sys 0m7.120s
>>
>>
>> And if I completely disable optimizer (// #define
>> USE_TCG_OPTIMIZATIONS in tcg.c), it's still quite faster:
>>
>> real 14m17.668s
>> user 14m10.241s
>> sys 0m6.060s
>
>
> This isn't surprising, because at the moment tcg optimizations are almost
> completely ineffective for sparc. The way the register windows are
> implemented means that there are very few proper tcg temporaries to
> optimize.
>
> I've just updated an old branch that attempts to cure this. It creates
> proper tcg temporaries for the windowed registers, and uses a bit of
> recursion to find the place at which they should be stored.
>
> git://github.com/rth7680/qemu.git tcg-indirect
>
> With a few quick unscientific tests, it appears to help. It would be nice
> to put that branch side-by-side with your tests above.

Sorry for the delay with testing.
For my test case, tcg-indirect brings a larger performance gain than it
did for Dennis:

git master:                    18m31s
tcg-indirect:                  16m50s (16m50.161s)
#undef USE_TCG_OPTIMIZATIONS:  14m18s

JIT statistics before starting the test:

(qemu) info jit
Translation buffer state:
gen code size       31851136/314448896
TB count            128224/2457592
TB avg target size  18 max=704 bytes
TB avg host size    248 bytes (expansion ratio: 13.4)
cross page TB count 0 (0%)
direct jump count   83840 (65%) (2 jumps=64730 50%)

Statistics:
TB flush count      5
TB invalidate count 317160
TLB flush count     1180769
[TCG profiler not compiled]

After:

(qemu) info jit
Translation buffer state:
gen code size       282903344/314448896
TB count            1139744/2457592
TB avg target size  17 max=704 bytes
TB avg host size    248 bytes (expansion ratio: 14.0)
cross page TB count 0 (0%)
direct jump count   739828 (64%) (2 jumps=569074 49%)

Statistics:
TB flush count      5
TB invalidate count 324362
TLB flush count     2050744

So the TB invalidate count grew by only ~7000 (317160 to 324362). Yet
tcg_optimize is ~7% in the perf top, and tcg_liveness_analysis ~3%.
Why do we translate so much?

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu
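
The figures quoted in this message can be cross-checked with a short script. This is only a sketch for doing the arithmetic: the helper name `to_seconds` and the dictionary layout are illustrative assumptions, and the numbers are copied by hand from the rounded timings and the two `info jit` dumps above rather than read from QEMU.

```python
def to_seconds(text):
    """Convert a duration such as '18m31s' into whole seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


# Wall-clock build times, rounded as in the message.
timings = {
    "git master": "18m31s",
    "tcg-indirect": "16m50s",
    "#undef USE_TCG_OPTIMIZATIONS": "14m18s",
}

baseline = to_seconds(timings["git master"])

# Percentage of wall-clock time saved relative to git master.
savings = {
    name: 100.0 * (baseline - to_seconds(value)) / baseline
    for name, value in timings.items()
}

# Counters copied from the two "info jit" dumps (before and after the test).
before = {
    "gen code size": 31851136,
    "TB count": 128224,
    "TB invalidate count": 317160,
    "TLB flush count": 1180769,
}
after = {
    "gen code size": 282903344,
    "TB count": 1139744,
    "TB invalidate count": 324362,
    "TLB flush count": 2050744,
}

# Growth of each counter over the test run.
deltas = {key: after[key] - before[key] for key in before}

# Average host code bytes per TB added during the run.
new_tb_host_bytes = deltas["gen code size"] / deltas["TB count"]

for name, percent in savings.items():
    print(f"{name}: {percent:.1f}% faster than git master")
for key, delta in deltas.items():
    print(f"{key}: +{delta}")
print(f"host bytes per new TB: {new_tb_host_bytes:.1f}")
```

On these numbers the TB count grows by 1011520 over the run while the TB invalidate count grows by 7202, and the code added per new TB matches the reported average host size of 248 bytes.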