* [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
@ 2009-10-15 21:14 Aurelien Jarno
  2009-10-18 14:21 ` Laurent Desnogues
  2009-10-23 0:34 ` Stuart Brady
  0 siblings, 2 replies; 6+ messages in thread
From: Aurelien Jarno @ 2009-10-15 21:14 UTC
To: qemu-devel

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target-arm/helper.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 701629a..656b5df 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -7,6 +7,7 @@
 #include "gdbstub.h"
 #include "helpers.h"
 #include "qemu-common.h"
+#include "host-utils.h"
 
 static uint32_t cortexa8_cp15_c0_c1[8] =
 { 0x1031, 0x11, 0x400, 0, 0x31100003, 0x20000000, 0x01202000, 0x11 };
@@ -394,10 +395,7 @@ uint32_t HELPER(uxtb16)(uint32_t x)
 
 uint32_t HELPER(clz)(uint32_t x)
 {
-    int count;
-    for (count = 32; x; count--)
-        x >>= 1;
-    return count;
+    return clz32(x);
 }
 
 int32_t HELPER(sdiv)(int32_t num, int32_t den)
-- 
1.6.1.3
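The removed loop returns 32 when x is 0, so whatever replaces it has to do the same. The sketch below puts the old loop next to a clz32()-style function and checks that they agree. clz32_sketch() is a hypothetical stand-in written for illustration, assuming a GCC-compatible compiler for __builtin_clz(); it is not a copy of the clz32() in QEMU's host-utils.h.

```c
#include <assert.h>
#include <stdint.h>

/* The loop removed by the patch: decrements the count from 32 once for
 * every significant bit of x, shifting x right until it becomes zero. */
static uint32_t clz_loop(uint32_t x)
{
    int count;
    for (count = 32; x; count--)
        x >>= 1;
    return count;
}

/* Hypothetical clz32()-style replacement.  __builtin_clz() is undefined
 * for 0, so zero is special-cased to return 32, matching the loop. */
static int clz32_sketch(uint32_t x)
{
    return x ? __builtin_clz(x) : 32;
}

/* Checks that both versions agree on some edge cases and on every
 * single-bit value. */
void check_clz_equivalence(void)
{
    static const uint32_t samples[] = {
        0, 1, 2, 3, 0x00010000u, 0x80000000u, 0xffffffffu
    };
    unsigned i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        assert(clz_loop(samples[i]) == (uint32_t)clz32_sketch(samples[i]));
    for (i = 0; i < 32; i++)
        assert(clz_loop(1u << i) == (uint32_t)clz32_sketch(1u << i));
}
```

The loop costs one iteration per significant bit (up to 32), while the builtin typically compiles to a single host instruction plus the zero test.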
* Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
  2009-10-15 21:14 [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop Aurelien Jarno
@ 2009-10-18 14:21 ` Laurent Desnogues
  2009-10-23 0:34 ` Stuart Brady
  1 sibling, 0 replies; 6+ messages in thread
From: Laurent Desnogues @ 2009-10-18 14:21 UTC
To: Aurelien Jarno; +Cc: qemu-devel

On Thu, Oct 15, 2009 at 11:14 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  target-arm/helper.c |    6 ++----
>  1 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index 701629a..656b5df 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -7,6 +7,7 @@
>  #include "gdbstub.h"
>  #include "helpers.h"
>  #include "qemu-common.h"
> +#include "host-utils.h"
>
>  static uint32_t cortexa8_cp15_c0_c1[8] =
>  { 0x1031, 0x11, 0x400, 0, 0x31100003, 0x20000000, 0x01202000, 0x11 };
> @@ -394,10 +395,7 @@ uint32_t HELPER(uxtb16)(uint32_t x)
>
>  uint32_t HELPER(clz)(uint32_t x)
>  {
> -    int count;
> -    for (count = 32; x; count--)
> -        x >>= 1;
> -    return count;
> +    return clz32(x);
>  }
>
>  int32_t HELPER(sdiv)(int32_t num, int32_t den)
> --
> 1.6.1.3

Acked-by: Laurent Desnogues <laurent.desnogues@gmail.com>
* Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
  2009-10-15 21:14 [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop Aurelien Jarno
  2009-10-18 14:21 ` Laurent Desnogues
@ 2009-10-23 0:34 ` Stuart Brady
  2009-10-23 7:04 ` Aurelien Jarno
  1 sibling, 1 reply; 6+ messages in thread
From: Stuart Brady @ 2009-10-23 0:34 UTC
To: qemu-devel

On Thu, Oct 15, 2009 at 11:14:52PM +0200, Aurelien Jarno wrote:
> @@ -394,10 +395,7 @@ uint32_t HELPER(uxtb16)(uint32_t x)
>
>  uint32_t HELPER(clz)(uint32_t x)
>  {
> -    int count;
> -    for (count = 32; x; count--)
> -        x >>= 1;
> -    return count;
> +    return clz32(x);
>  }
>
>  int32_t HELPER(sdiv)(int32_t num, int32_t den)

Just a quick note that the implementation of clz, ctz and popcnt is
still listed in the TCG TODO list. The last time I looked, I noticed
that quite a few architectures have clz/ctz instructions:

http://lkml.indiana.edu/hypermail/linux/kernel/0601.3/1683.html

For those that don't, I think a combination of the following two hacks at
http://graphics.stanford.edu/~seander/bithacks.html could be used:

'Round up to the next highest power of 2'
'Counting bits set, in parallel'

With this, it should be possible to implement clz and ctz without too
many operations for both 32-bit and 64-bit integers, without requiring
floats, lookup tables or branches.

Of course, __builtin_clz() might well do a better job...

BTW, it may be worth pointing out that the masks can be derived from
one another:

B[4] = 0x0000ffff;
B[3] = B[4] ^ (B[4] << 8)  => 0x00ff00ff
B[2] = B[3] ^ (B[3] << 4)  => 0x0f0f0f0f
B[1] = B[2] ^ (B[2] << 2)  => 0x33333333
B[0] = B[1] ^ (B[1] << 1)  => 0x55555555

In reality, I wonder if five separate loads would be quicker, though.

Cheers,
-- 
Stuart Brady
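A minimal sketch of how the two bit hacks might combine, assuming 32-bit operands: the OR-shift cascade from the round-up hack (without its surrounding decrement and increment) copies the highest set bit into every lower position, so the smeared value has exactly 32 - clz(x) bits set, and the parallel bit count then yields clz with no branches, tables or floats. For ctz, (x & -x) - 1 leaves a mask of exactly ctz(x) low bits. All function names here are hypothetical; build_popcount_masks() just replays the B[] derivation above.

```c
#include <stdint.h>

/* "Counting bits set, in parallel": add adjacent bit fields of width
 * 1, 2, 4, 8 and 16 using the masks B[0]..B[4]. */
static uint32_t popcount32(uint32_t v)
{
    v = (v & 0x55555555u) + ((v >> 1) & 0x55555555u);   /* B[0] */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);   /* B[1] */
    v = (v & 0x0f0f0f0fu) + ((v >> 4) & 0x0f0f0f0fu);   /* B[2] */
    v = (v & 0x00ff00ffu) + ((v >> 8) & 0x00ff00ffu);   /* B[3] */
    v = (v & 0x0000ffffu) + ((v >> 16) & 0x0000ffffu);  /* B[4] */
    return v;
}

/* OR-shift core of "Round up to the next highest power of 2": fills every
 * bit below the highest set bit, e.g. 0x00012345 -> 0x0001ffff. */
static uint32_t smear_right(uint32_t v)
{
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    return v;
}

/* Branch-free clz: 32 minus the number of bits in the smeared value.
 * smear_right(0) == 0, so clz of 0 comes out as 32. */
uint32_t clz32_branchless(uint32_t x)
{
    return 32 - popcount32(smear_right(x));
}

/* Branch-free ctz: (x & -x) isolates the lowest set bit and subtracting 1
 * leaves ctz(x) ones below it.  For x == 0 this is all ones, giving 32. */
uint32_t ctz32_branchless(uint32_t x)
{
    return popcount32((x & (0u - x)) - 1);
}

/* Rebuilds B[0]..B[4] from B[4] with the XOR/shift steps from the mail. */
void build_popcount_masks(uint32_t B[5])
{
    int i;

    B[4] = 0x0000ffffu;
    for (i = 3; i >= 0; i--)
        B[i] = B[i + 1] ^ (B[i + 1] << (1 << i));
}
```

Here the masks are written as immediate constants, which corresponds to the "five separate loads" option on hosts that cannot encode them directly; deriving them at run time trades those loads for four shift/XOR pairs.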
* Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
  2009-10-23 0:34 ` Stuart Brady
@ 2009-10-23 7:04 ` Aurelien Jarno
  2009-10-23 12:47 ` Stuart Brady
  0 siblings, 1 reply; 6+ messages in thread
From: Aurelien Jarno @ 2009-10-23 7:04 UTC
To: Stuart Brady; +Cc: qemu-devel

Stuart Brady wrote:
> On Thu, Oct 15, 2009 at 11:14:52PM +0200, Aurelien Jarno wrote:
>> @@ -394,10 +395,7 @@ uint32_t HELPER(uxtb16)(uint32_t x)
>>
>>  uint32_t HELPER(clz)(uint32_t x)
>>  {
>> -    int count;
>> -    for (count = 32; x; count--)
>> -        x >>= 1;
>> -    return count;
>> +    return clz32(x);
>>  }
>>
>>  int32_t HELPER(sdiv)(int32_t num, int32_t den)
>
> Just a quick note that the implementation of clz, ctz and popcnt is
> still listed in the TCG TODO list. The last time I looked, I noticed
> that quite a few architectures have clz/ctz instructions:
>
> http://lkml.indiana.edu/hypermail/linux/kernel/0601.3/1683.html

OTOH, a dump shows that those instructions are not used that often, so I
am not sure it is worth implementing them.

> For those that don't, I think a combination of the following two hacks at
> http://graphics.stanford.edu/~seander/bithacks.html could be used:

The best is probably to use a helper in that case, calling clz32(x).

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net
* Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
  2009-10-23 7:04 ` Aurelien Jarno
@ 2009-10-23 12:47 ` Stuart Brady
  2009-10-23 14:38 ` Aurelien Jarno
  0 siblings, 1 reply; 6+ messages in thread
From: Stuart Brady @ 2009-10-23 12:47 UTC
To: Aurelien Jarno; +Cc: qemu-devel

On Fri, Oct 23, 2009 at 09:04:53AM +0200, Aurelien Jarno wrote:
> Stuart Brady wrote:
> > Just a quick note that the implementation of clz, ctz and popcnt is
> > still listed in the TCG TODO list. The last time I looked, I noticed
> > that quite a few architectures have clz/ctz instructions:
> >
> > http://lkml.indiana.edu/hypermail/linux/kernel/0601.3/1683.html
>
> OTOH, a dump shows that those instructions are not used that often, so I
> am not sure it is worth implementing them.

Really? I'm surprised, as I gather that optimised ffs/fls/hweight
functions in the kernel do give a modest gain... I suppose I'll have
to try it on several different targets and see! :-)

> > For those that don't, I think a combination of the following two hacks at
> > http://graphics.stanford.edu/~seander/bithacks.html could be used:
>
> The best is probably to use a helper in that case, calling clz32(x).

Yes, you're right.

There are several other places that should also call clz32()/ctz32().
The ones that I can see are helper_neon_cls_s32() for ARM, helper_bsf()
and helper_bsr() for x86, and helper_ff1() for M68K. (I'm not sure about
'do_clz8' and 'do_clz16', though.)

At some point, possibly next weekend, I'll submit patches to add clz
and ctz helpers to tcg-runtime.c, and to convert Alpha, ARM, CRIS, M68K,
MIPS, PowerPC and x86 (any others I've missed?) to use those helpers.

Cheers,
-- 
Stuart Brady
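As an illustration of how some of the helpers listed above could reduce to clz32()/ctz32(), here is a sketch under stated assumptions: clz32_s() and ctz32_s() are hypothetical zero-safe stand-ins built on the GCC/Clang builtins, and the three functions model the instruction semantics (ARM VCLS per 32-bit element, which is presumably what helper_neon_cls_s32() computes, and x86 BSR/BSF) rather than QEMU's actual helper code. helper_ff1() is left out here.

```c
#include <stdint.h>

/* Hypothetical zero-safe stand-ins for clz32()/ctz32(). */
static int clz32_s(uint32_t x) { return x ? __builtin_clz(x) : 32; }
static int ctz32_s(uint32_t x) { return x ? __builtin_ctz(x) : 32; }

/* Count leading sign bits: how many bits below the sign bit equal it.
 * XOR with a mask of the sign bit turns those copies into leading zeros;
 * subtracting 1 discounts the sign bit itself.  cls(0) == cls(~0) == 31. */
uint32_t cls32_sketch(uint32_t x)
{
    uint32_t sign_mask = (x & 0x80000000u) ? 0xffffffffu : 0;
    return (uint32_t)clz32_s(x ^ sign_mask) - 1;
}

/* x86 BSR: bit index of the most significant set bit.  Only meaningful
 * for x != 0; the instruction leaves its destination undefined for 0. */
uint32_t bsr32_sketch(uint32_t x)
{
    return 31 - (uint32_t)clz32_s(x);
}

/* x86 BSF: bit index of the least significant set bit (x != 0 only). */
uint32_t bsf32_sketch(uint32_t x)
{
    return (uint32_t)ctz32_s(x);
}
```

The XOR trick avoids a separate loop for the negative case: after flipping, both signs are handled by the same leading-zero count.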
* Re: [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop
  2009-10-23 12:47 ` Stuart Brady
@ 2009-10-23 14:38 ` Aurelien Jarno
  0 siblings, 0 replies; 6+ messages in thread
From: Aurelien Jarno @ 2009-10-23 14:38 UTC
To: Stuart Brady; +Cc: qemu-devel

Stuart Brady wrote:
> On Fri, Oct 23, 2009 at 09:04:53AM +0200, Aurelien Jarno wrote:
>> Stuart Brady wrote:
>>> Just a quick note that the implementation of clz, ctz and popcnt is
>>> still listed in the TCG TODO list. The last time I looked, I noticed
>>> that quite a few architectures have clz/ctz instructions:
>>>
>>> http://lkml.indiana.edu/hypermail/linux/kernel/0601.3/1683.html
>> OTOH, a dump shows that those instructions are not used that often, so I
>> am not sure it is worth implementing them.
>
> Really? I'm surprised, as I gather that optimised ffs/fls/hweight
> functions in the kernel do give a modest gain... I suppose I'll have
> to try it on several different targets and see! :-)

I took a quick look at MIPS, and at least there, those instructions are
used often.

>>> For those that don't, I think a combination of the following two hacks at
>>> http://graphics.stanford.edu/~seander/bithacks.html could be used:
>> The best is probably to use a helper in that case, calling clz32(x).
>
> Yes, you're right.
>
> There are several other places that should also call clz32()/ctz32().
> The ones that I can see are helper_neon_cls_s32() for ARM, helper_bsf()
> and helper_bsr() for x86, and helper_ff1() for M68K. (I'm not sure about
> 'do_clz8' and 'do_clz16', though.)
>
> At some point, possibly next weekend, I'll submit patches to add clz
> and ctz helpers to tcg-runtime.c, and to convert Alpha, ARM, CRIS, M68K,
> MIPS, PowerPC and x86 (any others I've missed?) to use those helpers.

The main problem I see for a TCG implementation is the definition of
clz/ctz. Some targets define clz(0) and ctz(0) to return 32, while
others leave the result undefined.

If we go for the common denominator for the TCG op, that is clz(0) =
undefined, a test with brcond has to be added in the targets using
clz(0) = 32, and this is likely to cause more slowdown than the op
gains in speed. If we go for clz(0) = 32, the test has to be
implemented in TCG, which might be complicated for some hosts.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net
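To make the two choices concrete, the sketch below models a primitive that is undefined at zero (GCC's __builtin_clz() plays that role here, much as x86 BSR does in hardware) and two ways of recovering clz(0) = 32 on top of it. clz32_with_test() corresponds to the explicit test each target would have to add. clz32_sentinel() is one possible branch-free trick, named plainly as an assumption rather than anything proposed in the thread: it only helps on hosts with a cheap 64-bit count. All names are hypothetical.

```c
#include <stdint.h>

/* Models a primitive whose result is undefined for 0; callers must
 * guarantee x != 0. */
static int clz32_undef_at_zero(uint32_t x)
{
    return __builtin_clz(x);
}

/* Option 1: an explicit test around the primitive, the equivalent of the
 * brcond a target needing clz(0) = 32 would have to emit. */
uint32_t clz32_with_test(uint32_t x)
{
    if (x == 0)
        return 32;
    return (uint32_t)clz32_undef_at_zero(x);
}

/* Option 2 (assumes a 64-bit count is available): place x in the upper
 * half and plant a sentinel bit just below it.  The 64-bit operand is
 * never zero, leading zeros of a non-zero x are unchanged, and x == 0
 * stops at the sentinel, giving exactly 32. */
uint32_t clz32_sentinel(uint32_t x)
{
    uint64_t wide = ((uint64_t)x << 32) | 0x80000000u;
    return (uint32_t)__builtin_clzll(wide);
}
```

Option 2 does not make the host problem go away; it only moves it onto the availability of a 64-bit count instruction, which is the kind of per-host detail that makes a TCG-level clz(0) = 32 awkward.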
end of thread

Thread overview: 6+ messages
2009-10-15 21:14 [Qemu-devel] [PATCH] target-arm: use clz32() instead of a for loop Aurelien Jarno
2009-10-18 14:21 ` Laurent Desnogues
2009-10-23 0:34 ` Stuart Brady
2009-10-23 7:04   ` Aurelien Jarno
2009-10-23 12:47     ` Stuart Brady
2009-10-23 14:38       ` Aurelien Jarno