From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1Illca-0004d2-Jh
	for qemu-devel@nongnu.org; Sat, 27 Oct 2007 09:22:36 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1IllcZ-0004cg-4K
	for qemu-devel@nongnu.org; Sat, 27 Oct 2007 09:22:36 -0400
Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1IllcY-0004cb-Sl
	for qemu-devel@nongnu.org; Sat, 27 Oct 2007 09:22:34 -0400
Received: from bangui.magic.fr ([195.154.194.245])
	by monty-python.gnu.org with esmtps
	(TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60)
	(envelope-from <l_indien@magic.fr>) id 1IllcY-0001up-FS
	for qemu-devel@nongnu.org; Sat, 27 Oct 2007 09:22:34 -0400
Subject: Re: [Qemu-devel] Mips 64 emulation not compiling
From: "J. Mayer" <l_indien@magic.fr>
In-Reply-To: <f43fc5580710270601haf985dem6bc7d92f40cf3082@mail.gmail.com>
References: <1193222474.16781.236.camel@rapid>
	<20071027111939.GH29176@networkno.de>
	<1193487891.16781.280.camel@rapid>
	<f43fc5580710270601haf985dem6bc7d92f40cf3082@mail.gmail.com>
Content-Type: text/plain
Date: Sat, 27 Oct 2007 15:22:04 +0200
Message-Id: <1193491325.16781.298.camel@rapid>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Blue Swirl <blauwirbel@gmail.com>
Cc: qemu-devel@nongnu.org

On Sat, 2007-10-27 at 16:01 +0300, Blue Swirl wrote:
> On 10/27/07, J. Mayer <l_indien@magic.fr> wrote:
> > I also got optimized versions of bit population count which could also
> > be shared:
> > static always_inline int ctpop32 (uint32_t val)
> > {
> >     int i;
> >
> >     for (i = 0; val != 0; i++)
> >         val = val & (val - 1);
> >
> >     return i;
> > }
> >
> > If you prefer, I can add those shared functions (ctz32, ctz64, cto32,
> > cto64, ctpop32, ctpop64) later, as they do not seem as widely used as
> > clxxx functions.
> 
> This would be interesting for Sparc64. Could you compare your version
> to do_popc() in target-sparc/op_helper.c?

My feeling is:
my implementation loops n times, n being the number of bits set in the
word, so it should be faster than yours when only a few bits are
set.
Your implementation could be better because:
- it has a fixed cost
- it does not do any tests / jumps / loops
The drawback of your implementation is that it generates a lot of code,
so it could never be used directly in micro-ops: on my amd64 host, my
implementation compiles to 36 bytes of code and the 64-bit version does
not generate more code than the 32-bit one. Your (64-bit only)
implementation compiles to 217 bytes of code. On x86, my 32-bit
version is 49 bytes long, the 64-bit one is 79 bytes long and yours is
323 bytes long.
But code size would never be a problem when called from a helper.

So I'm not really sure which is the best choice here.
We may have to run tests to see which of the two implementations is
more efficient in practice.
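
To make the trade-off concrete, here is a minimal sketch of the two
styles being compared. The loop version clears the lowest set bit on
each pass, so it iterates once per set bit. The branch-free version is
a generic parallel bit count written only for illustration; it is an
assumption and not a copy of do_popc() in target-sparc/op_helper.c.
The names ctpop32_loop, ctpop32_parallel and ctpop32_selfcheck are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Loop version: each pass clears the lowest set bit, so the loop runs
 * once per set bit (data-dependent cost, small code). */
static inline int ctpop32_loop(uint32_t val)
{
    int i;

    for (i = 0; val != 0; i++)
        val &= val - 1;

    return i;
}

/* Branch-free version: adds bit counts in progressively wider fields
 * (fixed cost, no tests or jumps, more code). Illustrative only. */
static inline int ctpop32_parallel(uint32_t val)
{
    val = (val & 0x55555555u) + ((val >> 1) & 0x55555555u);
    val = (val & 0x33333333u) + ((val >> 2) & 0x33333333u);
    val = (val & 0x0f0f0f0fu) + ((val >> 4) & 0x0f0f0f0fu);
    val = (val & 0x00ff00ffu) + ((val >> 8) & 0x00ff00ffu);
    val = (val & 0x0000ffffu) + ((val >> 16) & 0x0000ffffu);

    return (int)val;
}

/* Hypothetical helper: asserts that both versions agree on a few
 * sample inputs, including the all-zero and all-one words. */
static void ctpop32_selfcheck(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x80000000u, 0xffu, 0xdeadbeefu, 0xffffffffu
    };
    unsigned i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        assert(ctpop32_loop(samples[i]) == ctpop32_parallel(samples[i]));
}
```

Calling ctpop32_selfcheck() once is a cheap sanity check before timing
the two functions against each other on inputs with few and many bits
set.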

-- 
J. Mayer <l_indien@magic.fr>
Never organized