Message-ID: <4AE1BFFC.8030509@aurel32.net>
Date: Fri, 23 Oct 2009 16:38:52 +0200
From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
References: <20091015211452.GC7071@volta.aurel32.net> <20091023003417.GA31360@miranda.arrow> <4AE15595.8050709@aurel32.net> <20091023124745.GA32401@miranda.arrow>
In-Reply-To: <20091023124745.GA32401@miranda.arrow>
To: Stuart Brady
Cc: qemu-devel@nongnu.org

Stuart Brady wrote:
> On Fri, Oct 23, 2009 at 09:04:53AM +0200, Aurelien Jarno wrote:
>> Stuart Brady wrote:
>>> Just a quick note that the implementation of clz, ctz and popcnt is
>>> still listed in the TCG TODO list.  The last time I looked, I noticed
>>> that quite a few architectures have clz/ctz instructions:
>>>
>>> http://lkml.indiana.edu/hypermail/linux/kernel/0601.3/1683.html
>> OTOH, a dump shows that those instructions are not used that often, so I
>> am not sure it is worth implementing them.
>
> Really?
> I'm surprised, as I gather that optimised ffs/fls/hweight
> functions in the kernel do give a modest gain...  I suppose I'll have
> to try it on several different targets and see! :-)

I had a quick look at MIPS, and at least there, it is used often.

>>> For those that don't, I think a combination of the following two hacks at
>>> http://graphics.stanford.edu/~seander/bithacks.html could be used:
>> The best is probably to use a helper in that case, calling clz32(x).
>
> Yes, you're right.
>
> There are several other places that should also call clz32()/ctz32().
> The ones that I can see are helper_neon_cls_s32() for ARM, helper_bsf()
> and helper_bsr() for X86, helper_ff1() for M68K.  (I'm not sure about
> 'do_clz8' and 'do_clz16', though.)
>
> At some point, possibly next weekend, I'll submit patches to add clz
> and ctz helpers to tcg-runtime.c, and to convert Alpha, ARM, CRIS, M68K,
> MIPS, PowerPC and x86 (any others I've missed?) to use those helpers.

The main problem I see for a TCG implementation is the definition of
clz/ctz. Some targets define clz(0) or ctz(0) as returning 32, while
others leave the result "undefined".

If we go for the common denominator for the TCG op, that is
clz(0) = undefined, a test with brcond has to be added in the targets
using clz(0) = 32, and this is likely to cost more in slowdown than the
op gains in speed.

If we go for clz(0) = 32, the test has to be implemented in TCG, which
might be complicated for some hosts.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net
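
To illustrate the clz(0) trade-off described above, here is a minimal
sketch in plain C. It is not QEMU or TCG code: the names clz32_defined(),
clz32_undef0() and target_clz32() are hypothetical and exist only for this
example. The first function returns 32 for a zero input, the second stands
in for a primitive whose result is unspecified at 0, and the third shows
the extra test (the counterpart of a brcond) that a target with
clz(0) = 32 would need on top of such a primitive.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-leading-zeros with a defined result: clz(0) == 32.
 * Narrows the position of the top set bit by halves. */
static inline int clz32_defined(uint32_t x)
{
    int n = 0;

    if (x == 0) {
        return 32;
    }
    if (!(x & 0xFFFF0000u)) { n += 16; x <<= 16; }
    if (!(x & 0xFF000000u)) { n += 8;  x <<= 8;  }
    if (!(x & 0xF0000000u)) { n += 4;  x <<= 4;  }
    if (!(x & 0xC0000000u)) { n += 2;  x <<= 2;  }
    if (!(x & 0x80000000u)) { n += 1; }
    return n;
}

/* Hypothetical stand-in for an op whose result is unspecified for 0.
 * The assert documents the precondition the caller must guarantee. */
static inline int clz32_undef0(uint32_t x)
{
    assert(x != 0);
#if defined(__GNUC__)
    /* GCC/Clang builtin, undefined for 0; assumes a 32-bit unsigned int. */
    return __builtin_clz(x);
#else
    return clz32_defined(x);
#endif
}

/* What a target defining clz(0) == 32 has to do when the underlying
 * primitive is undefined at 0: the explicit test plays the role of the
 * brcond mentioned above. */
static inline int target_clz32(uint32_t x)
{
    return x ? clz32_undef0(x) : 32;
}
```

The conditional in target_clz32() is the per-use cost of the
"clz(0) = undefined" choice. With the "clz(0) = 32" choice, the same test
moves into the implementation of the primitive instead.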