* [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
@ 2009-10-15 21:14 Aurelien Jarno
2009-10-18 14:21 ` Laurent Desnogues
2009-10-23 0:34 ` Stuart Brady
0 siblings, 2 replies; 6+ messages in thread
From: Aurelien Jarno @ 2009-10-15 21:14 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
target-arm/helper.c | 6 ++----
1 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 701629a..656b5df 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -7,6 +7,7 @@
 #include "gdbstub.h"
 #include "helpers.h"
 #include "qemu-common.h"
+#include "host-utils.h"
 
 static uint32_t cortexa8_cp15_c0_c1[8] =
 { 0x1031, 0x11, 0x400, 0, 0x31100003, 0x20000000, 0x01202000, 0x11 };
@@ -394,10 +395,7 @@ uint32_t HELPER(uxtb16)(uint32_t x)
 
 uint32_t HELPER(clz)(uint32_t x)
 {
-    int count;
-    for (count = 32; x; count--)
-        x >>= 1;
-    return count;
+    return clz32(x);
 }
 
 int32_t HELPER(sdiv)(int32_t num, int32_t den)
--
1.6.1.3
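
For readers following the patch: the change only preserves behaviour if
clz32() agrees with the removed loop on every input, including x == 0,
where the loop returns 32 (which is also what the ARM CLZ instruction
returns for zero). The sketch below is not the code from host-utils.h;
clz_loop() and clz32_sketch() are hypothetical names, and the binary
search is just one plausible portable way to write such a function.

```c
#include <stdint.h>

/* The loop removed by the patch: one iteration per significant bit. */
static uint32_t clz_loop(uint32_t x)
{
    int count;
    for (count = 32; x; count--)
        x >>= 1;
    return count;
}

/* Hypothetical stand-in for clz32(): binary search over the top bits.
 * Returns 32 for x == 0, matching the loop above. */
static int clz32_sketch(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    if (!(x & 0xFFFF0000u)) { n += 16; x <<= 16; }
    if (!(x & 0xFF000000u)) { n += 8;  x <<= 8; }
    if (!(x & 0xF0000000u)) { n += 4;  x <<= 4; }
    if (!(x & 0xC0000000u)) { n += 2;  x <<= 2; }
    if (!(x & 0x80000000u)) { n += 1; }
    return n;
}
```

The loop runs up to 32 iterations, while the stand-in needs at most five
tests, which is the kind of saving the patch is after.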
* Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
From: Laurent Desnogues @ 2009-10-18 14:21 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On Thu, Oct 15, 2009 at 11:14 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> target-arm/helper.c | 6 ++----
> 1 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index 701629a..656b5df 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -7,6 +7,7 @@
> #include "gdbstub.h"
> #include "helpers.h"
> #include "qemu-common.h"
> +#include "host-utils.h"
>
> static uint32_t cortexa8_cp15_c0_c1[8] =
> { 0x1031, 0x11, 0x400, 0, 0x31100003, 0x20000000, 0x01202000, 0x11 };
> @@ -394,10 +395,7 @@ uint32_t HELPER(uxtb16)(uint32_t x)
>
> uint32_t HELPER(clz)(uint32_t x)
> {
> - int count;
> - for (count = 32; x; count--)
> - x >>= 1;
> - return count;
> + return clz32(x);
> }
>
> int32_t HELPER(sdiv)(int32_t num, int32_t den)
> --
> 1.6.1.3
Acked-by: Laurent Desnogues <laurent.desnogues@gmail.com>
* Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
From: Stuart Brady @ 2009-10-23 0:34 UTC (permalink / raw)
To: qemu-devel
On Thu, Oct 15, 2009 at 11:14:52PM +0200, Aurelien Jarno wrote:
> @@ -394,10 +395,7 @@ uint32_t HELPER(uxtb16)(uint32_t x)
>
> uint32_t HELPER(clz)(uint32_t x)
> {
> - int count;
> - for (count = 32; x; count--)
> - x >>= 1;
> - return count;
> + return clz32(x);
> }
>
> int32_t HELPER(sdiv)(int32_t num, int32_t den)
Just a quick note that the implementation of clz, ctz and popcnt is
still listed in the TCG TODO list. The last time I looked, I noticed
that quite a few architectures have clz/ctz instructions:
http://lkml.indiana.edu/hypermail/linux/kernel/0601.3/1683.html
For those that don't, I think a combination of the following two hacks at
http://graphics.stanford.edu/~seander/bithacks.html could be used:
'Round up to the next highest power of 2'
'Counting bits set, in parallel'
With this, it should be possible to implement clz and ctz without too
many operations for both 32-bit and 64-bit integers, without requiring
floats, lookup tables or branches. Of course, __builtin_clz() might
well do a better job...
BTW, it may be worth pointing out:
B[4] = 0x0000ffff;
B[3] = B[4] ^ (B[4] << 8) => 0x00ff00ff
B[2] = B[3] ^ (B[3] << 4) => 0x0f0f0f0f
B[1] = B[2] ^ (B[2] << 2) => 0x33333333
B[0] = B[1] ^ (B[1] << 1) => 0x55555555
In reality, I wonder if five separate loads would be quicker, though.
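
The recurrence above can be checked with a small sketch. It assumes
32-bit unsigned arithmetic, so bits shifted past bit 31 are discarded;
make_masks() and popcount32_derived() are hypothetical names used only
for illustration.

```c
#include <stdint.h>

/* Derive the five masks from B[4] using the XOR/shift recurrence. */
static void make_masks(uint32_t B[5])
{
    B[4] = 0x0000ffffu;
    B[3] = B[4] ^ (B[4] << 8);
    B[2] = B[3] ^ (B[3] << 4);
    B[1] = B[2] ^ (B[2] << 2);
    B[0] = B[1] ^ (B[1] << 1);
}

/* Parallel bit count using the derived masks: step i adds adjacent
 * fields that are (1 << i) bits wide. */
static uint32_t popcount32_derived(uint32_t v)
{
    uint32_t B[5];
    int i;

    make_masks(B);
    for (i = 0; i < 5; i++)
        v = (v & B[i]) + ((v >> (1u << i)) & B[i]);
    return v;
}
```

Each derived mask costs a shift and an XOR, whereas a literal constant
costs one load or immediate, which is the trade-off in question here.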
Cheers,
--
Stuart Brady
* Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
From: Aurelien Jarno @ 2009-10-23 7:04 UTC (permalink / raw)
To: Stuart Brady; +Cc: qemu-devel
Stuart Brady wrote:
> On Thu, Oct 15, 2009 at 11:14:52PM +0200, Aurelien Jarno wrote:
>> @@ -394,10 +395,7 @@ uint32_t HELPER(uxtb16)(uint32_t x)
>>
>> uint32_t HELPER(clz)(uint32_t x)
>> {
>> - int count;
>> - for (count = 32; x; count--)
>> - x >>= 1;
>> - return count;
>> + return clz32(x);
>> }
>>
>> int32_t HELPER(sdiv)(int32_t num, int32_t den)
>
> Just a quick note that the implementation of clz, ctz and popcnt is
> still listed in the TCG TODO list. The last time I looked, I noticed
> that quite a few architectures have clz/ctz instructions:
>
> http://lkml.indiana.edu/hypermail/linux/kernel/0601.3/1683.html
OTOH, a dump shows that those instructions are not used that often, so I
am not sure it is worth implementing them.
> For those that don't, I think a combination of the following two hacks at
> http://graphics.stanford.edu/~seander/bithacks.html could be used:
The best option is probably to use a helper in that case, calling clz32(x).
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
From: Stuart Brady @ 2009-10-23 12:47 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel
On Fri, Oct 23, 2009 at 09:04:53AM +0200, Aurelien Jarno wrote:
> Stuart Brady wrote:
> > Just a quick note that the implementation of clz, ctz and popcnt is
> > still listed in the TCG TODO list. The last time I looked, I noticed
> > that quite a few architectures have clz/ctz instructions:
> >
> > http://lkml.indiana.edu/hypermail/linux/kernel/0601.3/1683.html
>
> OTOH, a dump shows that those instructions are not used that often, so I
> am not sure it is worth implementing them.
Really? I'm surprised, as I gather that optimised ffs/fls/hweight
functions in the kernel do give a modest gain... I suppose I'll have
to try it on several different targets and see! :-)
> > For those that don't, I think a combination of the following two hacks at
> > http://graphics.stanford.edu/~seander/bithacks.html could be used:
>
> The best option is probably to use a helper in that case, calling clz32(x).
Yes, you're right.
There are several other places that should also call clz32()/ctz32().
The ones that I can see are helper_neon_cls_s32() for ARM, helper_bsf()
and helper_bsr() for X86, helper_ff1() for M68K. (I'm not sure about
'do_clz8' and 'do_clz16', though.)
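
As a rough illustration of why these operations reduce to clz32() and
ctz32(), the sketch below expresses bit-scan and count-leading-sign-bits
operations in terms of them. It is not the code of helper_bsf(),
helper_bsr() or helper_neon_cls_s32(); all names are hypothetical, the
stand-ins assume a GCC-compatible compiler, and the bit-scan results
are only meaningful for non-zero inputs (x86 leaves the destination
undefined when the source is zero).

```c
#include <stdint.h>

/* Stand-ins for clz32()/ctz32(), assuming GCC-style builtins and a
 * result of 32 for a zero input. */
static int clz32_standin(uint32_t x) { return x ? __builtin_clz(x) : 32; }
static int ctz32_standin(uint32_t x) { return x ? __builtin_ctz(x) : 32; }

/* Bit scan reverse: index of the highest set bit (x must be non-zero). */
static int bsr32_sketch(uint32_t x)
{
    return 31 - clz32_standin(x);
}

/* Bit scan forward: index of the lowest set bit (x must be non-zero). */
static int bsf32_sketch(uint32_t x)
{
    return ctz32_standin(x);
}

/* Count leading sign bits, not counting the sign bit itself: flip the
 * value when it is negative so the sign run becomes a run of zeros. */
static int cls32_sketch(int32_t x)
{
    uint32_t u = (uint32_t)x;
    uint32_t sign = (u >> 31) ? 0xffffffffu : 0u;
    return clz32_standin(u ^ sign) - 1;
}
```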
At some point, possibly next weekend, I'll submit patches to add clz
and ctz helpers to tcg-runtime.c, and to convert Alpha, ARM, CRIS, M68K,
MIPS, PowerPC and x86 (any others I've missed?) to use those helpers.
Cheers,
--
Stuart Brady
* Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
From: Aurelien Jarno @ 2009-10-23 14:38 UTC (permalink / raw)
To: Stuart Brady; +Cc: qemu-devel
Stuart Brady wrote:
> On Fri, Oct 23, 2009 at 09:04:53AM +0200, Aurelien Jarno wrote:
>> Stuart Brady wrote:
>>> Just a quick note that the implementation of clz, ctz and popcnt is
>>> still listed in the TCG TODO list. The last time I looked, I noticed
>>> that quite a few architectures have clz/ctz instructions:
>>>
>>> http://lkml.indiana.edu/hypermail/linux/kernel/0601.3/1683.html
>> OTOH, a dump shows that those instructions are not used that often, so I
>> am not sure it is worth implementing them.
>
> Really? I'm surprised, as I gather that optimised ffs/fls/hweight
> functions in the kernel do give a modest gain... I suppose I'll have
> to try it on several different targets and see! :-)
I took a quick look at MIPS, and at least there, it is used often.
>>> For those that don't, I think a combination of the following two hacks at
>>> http://graphics.stanford.edu/~seander/bithacks.html could be used:
>> The best option is probably to use a helper in that case, calling clz32(x).
>
> Yes, you're right.
>
> There are several other places that should also call clz32()/ctz32().
> The ones that I can see are helper_neon_cls_s32() for ARM, helper_bsf()
> and helper_bsr() for X86, helper_ff1() for M68K. (I'm not sure about
> 'do_clz8' and 'do_clz16', though.)
>
> At some point, possibly next weekend, I'll submit patches to add clz
> and ctz helpers to tcg-runtime.c, and to convert Alpha, ARM, CRIS, M68K,
> MIPS, PowerPC and x86 (any others I've missed?) to use those helpers.
The main problem I see for a TCG implementation is the definition of
clz/ctz. Some targets define clz(0) or ctz(0) as returning 32; others
leave the result "undefined".
If we go for the common denominator for the TCG op, that is clz(0) =
undefined, a test with brcond has to be added in the targets that
need clz(0) = 32, and this is likely to cost more in slowdown than it
gains in speed.
If we go for clz(0) = 32, the test has to be implemented in TCG,
which might be complicated for some hosts.
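
The trade-off can be pictured with a C analogy; this is not TCG code.
GCC's __builtin_clz() has the "undefined for zero" contract, so a
caller that needs clz(0) = 32 has to wrap it in a test, which plays the
role of the brcond mentioned above. The function names are
hypothetical, and a GCC-compatible compiler is assumed.

```c
#include <stdint.h>

/* Primitive with the common-denominator contract: the result is
 * undefined when x == 0, so callers must never pass zero. */
static inline int clz32_undef0(uint32_t x)
{
    return __builtin_clz(x);
}

/* A target that needs clz(0) = 32 pays for an extra test on every
 * use of the primitive. */
static inline int clz32_zero_is_32(uint32_t x)
{
    return x ? clz32_undef0(x) : 32;
}
```

A target whose clz(0) is undefined could use either function
unchanged, which is why the choice only costs something on one side.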
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net