From: Al Viro
Date: Tue, 8 Jul 2014 18:20:02 +0100
Subject: Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
To: Peter Maydell
Cc: Alex Bennée, QEMU Developers, Richard Henderson
Message-ID: <20140708172002.GD18016@ZenIV.linux.org.uk>

On Tue, Jul 08, 2014 at 05:33:16PM +0100, Peter Maydell wrote:
> > Incidentally, the combination of --enable-gprof and (default) --enable-pie
> > won't build - it dies with ld(1) complaining about relocs in gcrt1.o.
>
> This sounds like a toolchain bug to me :-)

Debian stable/amd64, gcc 4.7.2, binutils 2.22. A Google search turns up
this, for example: http://osdir.com/ml/qemu-devel/2013-05/msg00710.html.
That one has gcc 4.4.3. Anyway, adding --disable-pie to --enable-gprof gets
it to build, but as I said, gprof is no better than perf and oprofile; same
problem.
The stats I quoted were from qemu-system-alpha booting debian/lenny (5.10)
and going through their kernel package build. I have the perf report in
front of me right now; the top entries are

 41.77%  qemu-system-alp  perf-24701.map      [.] 0x7fbbee558930
 11.78%  qemu-system-alp  qemu-system-alpha   [.] cpu_alpha_exec
  4.95%  qemu-system-alp  [vdso]              [.] 0x7fffdd7ff8de
  2.40%  qemu-system-alp  qemu-system-alpha   [.] phys_page_find
  1.49%  qemu-system-alp  qemu-system-alpha   [.] address_space_translate_internal
  1.34%  qemu-system-alp  [kernel.kallsyms]   [k] read_hpet
  1.26%  qemu-system-alp  qemu-system-alpha   [.] tlb_set_page
  1.23%  qemu-system-alp  qemu-system-alpha   [.] find_next_bit
  1.04%  qemu-system-alp  qemu-system-alpha   [.] get_page_addr_code
  1.01%  qemu-system-alp  libpthread-2.13.so  [.] pthread_mutex_lock
  0.88%  qemu-system-alp  qemu-system-alpha   [.] helper_cmpbge
  0.80%  qemu-system-alp  libc-2.13.so        [.] __memset_sse2
  0.72%  qemu-system-alp  libpthread-2.13.so  [.] __pthread_mutex_unlock_usercnt
  0.70%  qemu-system-alp  qemu-system-alpha   [.] get_physical_address
  0.69%  qemu-system-alp  qemu-system-alpha   [.] address_space_translate
  0.68%  qemu-system-alp  qemu-system-alpha   [.] tcg_optimize
  0.67%  qemu-system-alp  qemu-system-alpha   [.] ldq_phys
  0.63%  qemu-system-alp  qemu-system-alpha   [.] qemu_get_ram_ptr
  0.62%  qemu-system-alp  qemu-system-alpha   [.] helper_le_ldq_mmu
  0.57%  qemu-system-alp  qemu-system-alpha   [.] memory_region_is_ram

and cpu_alpha_exec() spends most of its time in the inlined tb_find_fast().
It might be worth checking the actual distribution of the hash of the virtual
address used by that sucker. Alpha instructions are 4 bytes long and 4-byte
aligned, so the low two bits of every PC are zero; I wonder if dividing its
argument by 4 would improve things, but I don't have stats on the actual
frequency of conflicts, etc.

In any case, the first lump (42%, the anonymous mapping perf labels
perf-24701.map, i.e. the TCG-generated code) seems to be tastier ;-)
There are all kinds of microoptimizations possible (e.g. helper_cmpbge()
could be done by a couple of MMX insns on an amd64 host[1]), but it would be
nice to have some details on what we spend the time on in TCG output...
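A cheap first look at the hash question above, before instrumenting QEMU itself, could be a toy model along these lines. Everything in it is an assumption for illustration: CACHE_BITS, hash_plain(), hash_div4(), count_slots_used(), count_conflicts() and make_pcs() are hypothetical names, the 4096-slot direct-mapped cache only loosely mimics a jump cache, QEMU's real hash function differs from both variants, and the PCs are synthetic rather than taken from a guest. It only demonstrates the mechanism: when every PC is 4-byte aligned, a hash that keeps the low two bits can reach at most a quarter of the slots.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model, NOT QEMU's code: a direct-mapped cache of 2^CACHE_BITS
 * slots indexed by a hash of the guest PC.  Both hash functions are
 * hypothetical and only illustrate the effect of 4-byte-aligned PCs. */
#define CACHE_BITS 12
#define CACHE_SIZE (1u << CACHE_BITS)

/* Keep the low PC bits as-is (bits 0-1 are always 0 on alpha). */
static unsigned hash_plain(uint64_t pc) { return (unsigned)(pc & (CACHE_SIZE - 1)); }

/* Divide by 4 first, so every slot is reachable. */
static unsigned hash_div4(uint64_t pc) { return (unsigned)((pc >> 2) & (CACHE_SIZE - 1)); }

/* Number of distinct slots the given PCs map to. */
static unsigned count_slots_used(const uint64_t *pcs, unsigned n,
                                 unsigned (*hash)(uint64_t))
{
    static unsigned char used[CACHE_SIZE];
    unsigned slots = 0;

    memset(used, 0, sizeof(used));
    for (unsigned i = 0; i < n; i++) {
        unsigned h = hash(pcs[i]);
        if (!used[h]) {
            used[h] = 1;
            slots++;
        }
    }
    return slots;
}

/* Replay the lookups; count those that find their slot holding a
 * different PC (a conflict miss), then install the new PC there. */
static unsigned count_conflicts(const uint64_t *pcs, unsigned n,
                                unsigned (*hash)(uint64_t))
{
    static uint64_t owner[CACHE_SIZE];
    static unsigned char used[CACHE_SIZE];
    unsigned conflicts = 0;

    memset(used, 0, sizeof(used));
    for (unsigned i = 0; i < n; i++) {
        unsigned h = hash(pcs[i]);
        assert(h < CACHE_SIZE);
        if (used[h] && owner[h] != pcs[i])
            conflicts++;
        used[h] = 1;
        owner[h] = pcs[i];
    }
    return conflicts;
}

/* Synthetic 4-byte-aligned PCs from xorshift64; the 0x120000000 base
 * is only an illustrative user text address. */
static void make_pcs(uint64_t *pcs, unsigned n, uint64_t seed)
{
    uint64_t x = seed ? seed : 1;   /* xorshift state must be nonzero */

    for (unsigned i = 0; i < n; i++) {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        pcs[i] = 0x120000000ULL + ((x & 0xFFFFF) << 2);
    }
}
```

With hash_plain() the slot index is always a multiple of 4, so at most 1024 of the 4096 slots are ever used and conflicts pile up accordingly; hash_div4() spreads the same PCs over the whole table. Replacing make_pcs() with a real list of TB start addresses, however collected, would turn this into the actual conflict statistics.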
[1] The reason why helper_cmpbge() shows up is that string functions on alpha use that insn a lot; it _might_ be worth optimizing.
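For reference, what helper_cmpbge() has to compute is small: CMPBGE compares the eight bytes of its two operands as unsigned values and sets bit i of the result when byte i of the first is >= byte i of the second. The sketch below gives a plain C reference version and an SSE2-intrinsics version (SSE2 rather than MMX, since SSE2 is baseline on every amd64 CPU). Both are illustrations with made-up names (cmpbge_ref(), cmpbge_sse2(), first_nul_byte(), cmpbge_selfcheck()), not QEMU's helper. The zero-byte case hints at why string functions lean on it: comparing 0 against a word sets exactly the bits of its NUL bytes, so string code can scan 8 bytes per instruction.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Plain C reference: bit i of the result is set iff byte i of a is
 * >= byte i of b, both taken as unsigned 8-bit values.  Bits 8..63
 * of the result are always zero. */
static uint64_t cmpbge_ref(uint64_t a, uint64_t b)
{
    uint64_t res = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t ab = (uint8_t)(a >> (8 * i));
        uint8_t bb = (uint8_t)(b >> (8 * i));
        if (ab >= bb)
            res |= (uint64_t)1 << i;
    }
    return res;
}

/* SSE2 variant: per byte, max(a, b) == a exactly when a >= b. */
static uint64_t cmpbge_sse2(uint64_t a, uint64_t b)
{
#if defined(__SSE2__)
    __m128i va = _mm_set_epi64x(0, (long long)a);
    __m128i vb = _mm_set_epi64x(0, (long long)b);
    __m128i ge = _mm_cmpeq_epi8(_mm_max_epu8(va, vb), va);
    /* movemask collects one bit per byte lane; the upper 8 lanes
     * compare 0 against 0, so keep only the low 8 bits. */
    return (uint64_t)(_mm_movemask_epi8(ge) & 0xff);
#else
    return cmpbge_ref(a, b);   /* no SSE2: fall back to the loop */
#endif
}

/* String-style use: CMPBGE of 0 against w flags the NUL bytes of w
 * (byte 0 is the first character on little-endian alpha). */
static int first_nul_byte(uint64_t w)
{
    uint64_t m = cmpbge_ref(0, w);
    for (int i = 0; i < 8; i++)
        if (m & ((uint64_t)1 << i))
            return i;
    return -1;   /* no NUL in this word */
}

/* Cross-check both versions on pseudo-random inputs (xorshift64). */
static void cmpbge_selfcheck(void)
{
    uint64_t x = 88172645463325252ULL;
    for (int i = 0; i < 10000; i++) {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        uint64_t y = x * 0x9E3779B97F4A7C15ULL;
        assert(cmpbge_sse2(x, y) == cmpbge_ref(x, y));
        assert(cmpbge_sse2(x, x) == 0xff);
    }
}
```

The SSE2 path is three vector operations plus the mask extraction, which is roughly the "couple of insns" scale; whether that actually beats the current helper once call overhead is counted would need measuring.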