* [Qemu-devel] Mips 64 emulation not compiling
From: J. Mayer @ 2007-10-24 10:41 UTC
To: qemu-devel

The latest patches to clo make gcc 3.4.6 fail to build the mips64
targets on my amd64 host (looks like a register allocation clash in the
optimizer code).
Furthermore, the clz micro-op for Mips seems very suspect to me,
according to the changes made in the clo implementation.

I did change the clz / clo implementation to use the same code as the
one used for the PowerPC implementation. It seems to me that the result
would be correct... And it compiles...

Please take a look at the following patch:

Index: target-mips/op.c
===================================================================
RCS file: /sources/qemu/qemu/target-mips/op.c,v
retrieving revision 1.80
diff -u -d -d -p -r1.80 op.c
--- target-mips/op.c    24 Oct 2007 00:10:32 -0000    1.80
+++ target-mips/op.c    24 Oct 2007 10:38:26 -0000
@@ -535,37 +535,44 @@ void op_rotrv (void)
     RETURN();
 }

-void op_clo (void)
+static always_inline int _do_cntlzw (uint32_t val)
 {
-    int n;
-
-    if (T0 == ~((target_ulong)0)) {
-        T0 = 32;
-    } else {
-        for (n = 0; n < 32; n++) {
-            if (!(((int32_t)T0) & (1 << 31)))
-                break;
-            T0 <<= 1;
-        }
-        T0 = n;
+    int cnt = 0;
+    if (!(val & 0xFFFF0000UL)) {
+        cnt += 16;
+        val <<= 16;
+    }
+    if (!(val & 0xFF000000UL)) {
+        cnt += 8;
+        val <<= 8;
     }
+    if (!(val & 0xF0000000UL)) {
+        cnt += 4;
+        val <<= 4;
+    }
+    if (!(val & 0xC0000000UL)) {
+        cnt += 2;
+        val <<= 2;
+    }
+    if (!(val & 0x80000000UL)) {
+        cnt++;
+        val <<= 1;
+    }
+    if (!(val & 0x80000000UL)) {
+        cnt++;
+    }
+    return cnt;
+}
+
+void op_clo (void)
+{
+    T0 = _do_cntlzw(~T0);
     RETURN();
 }

 void op_clz (void)
 {
-    int n;
-
-    if (T0 == 0) {
-        T0 = 32;
-    } else {
-        for (n = 0; n < 32; n++) {
-            if (T0 & (1 << 31))
-                break;
-            T0 <<= 1;
-        }
-        T0 = n;
-    }
+    T0 = _do_cntlzw(T0);
     RETURN();
 }

-- 
J. Mayer <l_indien@magic.fr>
Never organized
* Re: [Qemu-devel] Mips 64 emulation not compiling
From: Thiemo Seufer @ 2007-10-27 11:19 UTC
To: J. Mayer; +Cc: qemu-devel

J. Mayer wrote:
> The latest patches to clo make gcc 3.4.6 fail to build the mips64
> targets on my amd64 host (looks like a register allocation clash in the
> optimizer code).

Your version is likely faster as well.

> Furthermore, the clz micro-op for Mips seems very suspect to me,
> according to the changes made in the clo implementation.

It is correct, the sign-extension bits are zero in that case.

> I did change the clz / clo implementation to use the same code as the
> one used for the PowerPC implementation. It seems to me that the result
> would be correct... And it compiles...
>
> Please take a look at the following patch:

We now have clz/clo in several places, so I expanded your patch a
bit. For now it is only used for the mips target. Comments?


Thiemo


Index: qemu-work/host-utils.h
===================================================================
--- /dev/null    1970-01-01 00:00:00.000000000 +0000
+++ qemu-work/host-utils.h    2007-10-27 12:13:30.000000000 +0100
@@ -0,0 +1,104 @@
+/*
+ * Utility compute operations used by translated code.
+ *
+ * Copyright (c) 2007 Thiemo Seufer
+ * Copyright (c) 2007 Jocelyn Mayer
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/* Note that some of those functions may end up calling libgcc functions,
+   depending on the host machine. It is up to the target emulation to
+   cope with that. */
+
+/* Binary search for leading zeros.  */
+
+static always_inline int clz32(uint32_t val)
+{
+    int cnt = 0;
+
+    if (!(val & 0xFFFF0000U)) {
+        cnt += 16;
+        val <<= 16;
+    }
+    if (!(val & 0xFF000000U)) {
+        cnt += 8;
+        val <<= 8;
+    }
+    if (!(val & 0xF0000000U)) {
+        cnt += 4;
+        val <<= 4;
+    }
+    if (!(val & 0xC0000000U)) {
+        cnt += 2;
+        val <<= 2;
+    }
+    if (!(val & 0x80000000U)) {
+        cnt++;
+        val <<= 1;
+    }
+    if (!(val & 0x80000000U)) {
+        cnt++;
+    }
+    return cnt;
+}
+
+static always_inline int clo32(uint32_t val)
+{
+    return clz32(~val);
+}
+
+static always_inline int clz64(uint64_t val)
+{
+    int cnt = 0;
+
+    if (!(val & 0xFFFFFFFF00000000ULL)) {
+        cnt += 32;
+        val <<= 32;
+    }
+    if (!(val & 0xFFFF000000000000ULL)) {
+        cnt += 16;
+        val <<= 16;
+    }
+    if (!(val & 0xFF00000000000000ULL)) {
+        cnt += 8;
+        val <<= 8;
+    }
+    if (!(val & 0xF000000000000000ULL)) {
+        cnt += 4;
+        val <<= 4;
+    }
+    if (!(val & 0xC000000000000000ULL)) {
+        cnt += 2;
+        val <<= 2;
+    }
+    if (!(val & 0x8000000000000000ULL)) {
+        cnt++;
+        val <<= 1;
+    }
+    if (!(val & 0x8000000000000000ULL)) {
+        cnt++;
+    }
+    return cnt;
+}
+
+static always_inline int clo64(uint64_t val)
+{
+    return clz64(~val);
+}
Index: qemu-work/target-mips/exec.h
===================================================================
--- qemu-work.orig/target-mips/exec.h    2007-10-26 22:42:15.000000000 +0100
+++ qemu-work/target-mips/exec.h    2007-10-27 12:13:30.000000000 +0100
@@ -70,6 +70,8 @@
 void do_dsrav (void);
 void do_dsrlv (void);
 void do_drotrv (void);
+void do_dclo (void);
+void do_dclz (void);
 #endif

 #endif
Index: qemu-work/target-mips/op.c
===================================================================
--- qemu-work.orig/target-mips/op.c    2007-10-26 22:42:14.000000000 +0100
+++ qemu-work/target-mips/op.c    2007-10-27 12:13:30.000000000 +0100
@@ -22,6 +22,7 @@
 #include "config.h"

 #include "exec.h"
+#include "host-utils.h"

 #ifndef CALL_FROM_TB0
 #define CALL_FROM_TB0(func) func()
@@ -537,35 +538,13 @@

 void op_clo (void)
 {
-    int n;
-
-    if (T0 == ~((target_ulong)0)) {
-        T0 = 32;
-    } else {
-        for (n = 0; n < 32; n++) {
-            if (!(((int32_t)T0) & (1 << 31)))
-                break;
-            T0 <<= 1;
-        }
-        T0 = n;
-    }
+    T0 = clo32(T0);
     RETURN();
 }

 void op_clz (void)
 {
-    int n;
-
-    if (T0 == 0) {
-        T0 = 32;
-    } else {
-        for (n = 0; n < 32; n++) {
-            if (T0 & (1 << 31))
-                break;
-            T0 <<= 1;
-        }
-        T0 = n;
-    }
+    T0 = clz32(T0);
     RETURN();
 }

@@ -645,6 +624,18 @@
     RETURN();
 }

+void op_dclo (void)
+{
+    CALL_FROM_TB0(do_dclo);
+    RETURN();
+}
+
+void op_dclz (void)
+{
+    CALL_FROM_TB0(do_dclz);
+    RETURN();
+}
+
 #else /* TARGET_LONG_BITS > HOST_LONG_BITS */

 void op_dsll (void)
@@ -735,41 +726,19 @@
     T0 = T1;
     RETURN();
 }
-#endif /* TARGET_LONG_BITS > HOST_LONG_BITS */

 void op_dclo (void)
 {
-    int n;
-
-    if (T0 == ~((target_ulong)0)) {
-        T0 = 64;
-    } else {
-        for (n = 0; n < 64; n++) {
-            if (!(T0 & (1ULL << 63)))
-                break;
-            T0 <<= 1;
-        }
-        T0 = n;
-    }
+    T0 = clo64(T0);
     RETURN();
 }

 void op_dclz (void)
 {
-    int n;
-
-    if (T0 == 0) {
-        T0 = 64;
-    } else {
-        for (n = 0; n < 64; n++) {
-            if (T0 & (1ULL << 63))
-                break;
-            T0 <<= 1;
-        }
-        T0 = n;
-    }
+    T0 = clz64(T0);
     RETURN();
 }
+#endif /* TARGET_LONG_BITS > HOST_LONG_BITS */
 #endif /* TARGET_MIPSN32 || TARGET_MIPS64 */

 /* 64 bits arithmetic */
Index: qemu-work/target-mips/op_helper.c
===================================================================
--- qemu-work.orig/target-mips/op_helper.c    2007-10-26 22:42:15.000000000 +0100
+++ qemu-work/target-mips/op_helper.c    2007-10-27 12:13:30.000000000 +0100
@@ -20,6 +20,8 @@
 #include <stdlib.h>
 #include "exec.h"

+#include "host-utils.h"
+
 #define GETPC() (__builtin_return_address(0))

 /*****************************************************************************/
@@ -141,6 +143,17 @@
     } else
         T0 = T1;
 }
+
+void do_dclo (void)
+{
+    T0 = clo64(T0);
+}
+
+void do_dclz (void)
+{
+    T0 = clz64(T0);
+}
+
 #endif /* TARGET_LONG_BITS > HOST_LONG_BITS */
 #endif /* TARGET_MIPSN32 || TARGET_MIPS64 */
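One way to gain confidence in the binary-search helpers is to compare them
with a plain bit-by-bit scan. The sketch below is not part of the patch:
ref_clz32 and check_clz are hypothetical names introduced for illustration,
and plain "static inline" stands in for QEMU's always_inline macro.

```c
#include <stdint.h>

/* Binary search for leading zeros, same steps as clz32() in the patch. */
static inline int clz32(uint32_t val)
{
    int cnt = 0;

    if (!(val & 0xFFFF0000U)) { cnt += 16; val <<= 16; }
    if (!(val & 0xFF000000U)) { cnt += 8;  val <<= 8; }
    if (!(val & 0xF0000000U)) { cnt += 4;  val <<= 4; }
    if (!(val & 0xC0000000U)) { cnt += 2;  val <<= 2; }
    if (!(val & 0x80000000U)) { cnt++;     val <<= 1; }
    if (!(val & 0x80000000U)) { cnt++; }
    return cnt;
}

/* Leading ones are the leading zeros of the inverted operand. */
static inline int clo32(uint32_t val)
{
    return clz32(~val);
}

/* Hypothetical reference: scan down from bit 31 until a set bit is found. */
static int ref_clz32(uint32_t val)
{
    int n = 0;

    while (n < 32 && !(val & (UINT32_C(1) << (31 - n))))
        n++;
    return n;
}

/* Hypothetical check: returns 1 if both versions agree on a few edge cases. */
static int check_clz(void)
{
    static const uint32_t samples[] = {
        0, 1, 0x7FFFFFFFU, 0x80000000U, 0xFFFFFFFFU, 0x00010000U, 0x0000FFFFU
    };
    unsigned i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        if (clz32(samples[i]) != ref_clz32(samples[i]))
            return 0;
        if (clo32(samples[i]) != ref_clz32(~samples[i]))
            return 0;
    }
    return 1;
}
```

The same comparison carries over to clz64() by widening the types and
scanning from bit 63.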
* Re: [Qemu-devel] Mips 64 emulation not compiling
From: J. Mayer @ 2007-10-27 12:24 UTC
To: Thiemo Seufer; +Cc: qemu-devel

On Sat, 2007-10-27 at 12:19 +0100, Thiemo Seufer wrote:
> J. Mayer wrote:
> > The latest patches to clo make gcc 3.4.6 fail to build the mips64
> > targets on my amd64 host (looks like a register allocation clash in the
> > optimizer code).
>
> Your version is likely faster as well.
>
> > Furthermore, the clz micro-op for Mips seems very suspect to me,
> > according to the changes made in the clo implementation.
>
> It is correct, the sign-extension bits are zero in that case.

OK, you know better than me...

> > I did change the clz / clo implementation to use the same code as the
> > one used for the PowerPC implementation. It seems to me that the result
> > would be correct... And it compiles...
> >
> > Please take a look at the following patch:
>
> We now have clz/clo in several places, so I expanded your patch a
> bit. For now it is only used for the mips target. Comments?

I fully agree with the idea of sharing this code, if it's OK according
to all target specifications. Please commit and I'll update the PowerPC
and Alpha targets to use them.

Oh, I did an optimisation for clz64 when used on 32-bit hosts, avoiding
the use of 64-bit logical operations:

static always_inline int clz64(uint64_t val)
{
    int cnt = 0;
#if HOST_LONG_BITS == 64
    if (!(val & 0xFFFFFFFF00000000ULL)) {
        cnt += 32;
        val <<= 32;
    }
    if (!(val & 0xFFFF000000000000ULL)) {
        cnt += 16;
        val <<= 16;
    }
    if (!(val & 0xFF00000000000000ULL)) {
        cnt += 8;
        val <<= 8;
    }
    if (!(val & 0xF000000000000000ULL)) {
        cnt += 4;
        val <<= 4;
    }
    if (!(val & 0xC000000000000000ULL)) {
        cnt += 2;
        val <<= 2;
    }
    if (!(val & 0x8000000000000000ULL)) {
        cnt++;
        val <<= 1;
    }
    if (!(val & 0x8000000000000000ULL)) {
        cnt++;
    }
#else
    /* Make it easier on 32 bits host machines */
    if (!(val >> 32))
        cnt = _do_cntlzw(val) + 32;
    else
        cnt = _do_cntlzw(val >> 32);
#endif
    return cnt;
}

If gcc is really clever, this would not lead to better code, but it
seemed that the 32-bit implementation led to more optimized code on
32-bit hosts. Maybe this implementation could also be used for 64-bit
hosts, avoiding the #ifdef.

Count trailing zeros is also implemented on Alpha; it may be a good idea
to share the implementation, if needed:

static always_inline int ctz32 (uint32_t val)
{
    int cnt = 0;

    if (!(val & 0x0000FFFFUL)) {
        cnt += 16;
        val >>= 16;
    }
    if (!(val & 0x000000FFUL)) {
        cnt += 8;
        val >>= 8;
    }
    if (!(val & 0x0000000FUL)) {
        cnt += 4;
        val >>= 4;
    }
    if (!(val & 0x00000003UL)) {
        cnt += 2;
        val >>= 2;
    }
    if (!(val & 0x00000001UL)) {
        cnt++;
        val >>= 1;
    }
    if (!(val & 0x00000001UL)) {
        cnt++;
    }
    return cnt;
}

static always_inline int ctz64 (uint64_t val)
{
    int cnt = 0;

    if (!(val & 0x00000000FFFFFFFFULL)) {
        cnt += 32;
        val >>= 32;
    }
    /* Make it easier for 32 bits hosts */
    cnt += ctz32(val);
    return cnt;
}

And of course cto32 and cto64 could also be added.

I also have optimized versions of bit population count which could also
be shared:

static always_inline int ctpop32 (uint32_t val)
{
    int i;

    for (i = 0; val != 0; i++)
        val = val ^ (val - 1);

    return i;
}

If you prefer, I can add those shared functions (ctz32, ctz64, cto32,
cto64, ctpop32, ctpop64) later, as they do not seem as widely used as
the clxxx functions.

-- 
J. Mayer <l_indien@magic.fr>
Never organized
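The count-trailing-ones helpers mentioned in this message follow from the
same identity the shared header uses for clo32(): invert the operand and
count zeros. A minimal sketch, assuming the ctz helpers return int rather
than void; the cto32/cto64 bodies are an illustration, not code taken from
any QEMU target, and "static inline" stands in for always_inline.

```c
#include <stdint.h>

/* Binary search for trailing zeros; returns 32 for val == 0. */
static inline int ctz32(uint32_t val)
{
    int cnt = 0;

    if (!(val & 0x0000FFFFU)) { cnt += 16; val >>= 16; }
    if (!(val & 0x000000FFU)) { cnt += 8;  val >>= 8; }
    if (!(val & 0x0000000FU)) { cnt += 4;  val >>= 4; }
    if (!(val & 0x00000003U)) { cnt += 2;  val >>= 2; }
    if (!(val & 0x00000001U)) { cnt++;     val >>= 1; }
    if (!(val & 0x00000001U)) { cnt++; }
    return cnt;
}

/* Skip an all-zero low word, then reuse the 32-bit search. */
static inline int ctz64(uint64_t val)
{
    int cnt = 0;

    if (!(val & 0x00000000FFFFFFFFULL)) {
        cnt += 32;
        val >>= 32;
    }
    return cnt + ctz32((uint32_t)val);
}

/* Trailing ones are the trailing zeros of the inverted operand. */
static inline int cto32(uint32_t val)
{
    return ctz32(~val);
}

static inline int cto64(uint64_t val)
{
    return ctz64(~val);
}
```

With these definitions an all-ones operand gives 32 (or 64) and an even
operand gives 0, which mirrors how clo32() behaves at the other end of
the word.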
* Re: [Qemu-devel] Mips 64 emulation not compiling
From: Blue Swirl @ 2007-10-27 13:01 UTC
To: qemu-devel

On 10/27/07, J. Mayer <l_indien@magic.fr> wrote:
> I also got optimized versions of bit population count which could also
> be shared:
> static always_inline int ctpop32 (uint32_t val)
> {
>     int i;
>
>     for (i = 0; val != 0; i++)
>         val = val ^ (val - 1);
>
>     return i;
> }
>
> If you prefer, I can add those shared functions (ctz32, ctz64, cto32,
> cto64, ctpop32, ctpop64) later, as they do not seem as widely used as
> clxxx functions.

This would be interesting for Sparc64. Could you compare your version
to do_popc() in target-sparc/op_helper.c?
* Re: [Qemu-devel] Mips 64 emulation not compiling
From: J. Mayer @ 2007-10-27 13:22 UTC
To: Blue Swirl; +Cc: qemu-devel

On Sat, 2007-10-27 at 16:01 +0300, Blue Swirl wrote:
> On 10/27/07, J. Mayer <l_indien@magic.fr> wrote:
> > I also got optimized versions of bit population count which could also
> > be shared:
> > static always_inline int ctpop32 (uint32_t val)
> > {
> >     int i;
> >
> >     for (i = 0; val != 0; i++)
> >         val = val ^ (val - 1);
> >
> >     return i;
> > }
> >
> > If you prefer, I can add those shared functions (ctz32, ctz64, cto32,
> > cto64, ctpop32, ctpop64) later, as they do not seem as widely used as
> > clxxx functions.
>
> This would be interesting for Sparc64. Could you compare your version
> to do_popc() in target-sparc/op_helper.c?

My feeling is:
my implementation does n loops, n being the number of bits set in the
word, so it will always be faster than yours when only a few bits are
set.
Your implementation could be better because:
- it has a fixed cost
- it does not do any tests / jumps / loops
The drawback of your implementation is that it generates a lot of code,
and thus could never be used directly in micro-ops: on my amd64 host, my
implementation compiles to 36 bytes of code and the 64-bit version
does not generate more code than the 32-bit one. Your (64-bit only)
implementation compiles to 217 bytes of code. On x86, my 32-bit
version is 49 bytes long, the 64-bit one is 79 bytes long and yours is
323 bytes long. But this would never be a problem when called from a
helper.

So I'm not really sure what the best choice is here....
We may have to run tests to see which of the 2 implementations is
more efficient.

-- 
J. Mayer <l_indien@magic.fr>
Never organized
* Re: [Qemu-devel] Mips 64 emulation not compiling
From: Christian "Eddie" Dost @ 2007-10-27 13:27 UTC
To: qemu-devel

The sparc64 popc works in O(lg(n)), the "optimized" code below works in
O(n). It could be better to generalize the sparc64 code, like this:

static always_inline int ctpop32 (uint32_t val)
{
    uint32_t i;

    i = (val & 0x55555555) + ((val >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    i = (i & 0x0f0f0f0f) + ((i >> 4) & 0x0f0f0f0f);
    i = (i & 0x00ff00ff) + ((i >> 8) & 0x00ff00ff);
    i = (i & 0x0000ffff) + ((i >> 16) & 0x0000ffff);

    return i;
}

For the 64 bit version see target-sparc/op_helper.c

Best regards,
Eddie

Blue Swirl wrote:
> On 10/27/07, J. Mayer <l_indien@magic.fr> wrote:
>> I also got optimized versions of bit population count which could also
>> be shared:
>> static always_inline int ctpop32 (uint32_t val)
>> {
>>     int i;
>>
>>     for (i = 0; val != 0; i++)
>>         val = val ^ (val - 1);
>>
>>     return i;
>> }
>>
>> If you prefer, I can add those shared functions (ctz32, ctz64, cto32,
>> cto64, ctpop32, ctpop64) later, as they do not seem as widely used as
>> clxxx functions.
>
> This would be interesting for Sparc64. Could you compare your version
> to do_popc() in target-sparc/op_helper.c?
>
>

-- 
___________________________________________________brainaid_____________
Eddie C. Dost         Rue de la Chapelle 51     phone +32 87 788817
                      B-4850 Moresnet           fax   +32 87 788818
ecd@brainaid.de       Belgium                   cell  +49 172 9312808
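Widening the same pairwise-sum idea to 64 bits only needs wider masks and
one more folding step. The sketch below is an assumption about how such a
helper could look; it is not copied from do_popc() in
target-sparc/op_helper.c, and "static inline" stands in for always_inline.

```c
#include <stdint.h>

/* Fixed-cost population count: add neighbouring bit fields, doubling
 * the field width at each step (1, 2, 4, 8, 16, 32 bits). */
static inline int ctpop64(uint64_t val)
{
    val = (val & 0x5555555555555555ULL) + ((val >> 1)  & 0x5555555555555555ULL);
    val = (val & 0x3333333333333333ULL) + ((val >> 2)  & 0x3333333333333333ULL);
    val = (val & 0x0f0f0f0f0f0f0f0fULL) + ((val >> 4)  & 0x0f0f0f0f0f0f0f0fULL);
    val = (val & 0x00ff00ff00ff00ffULL) + ((val >> 8)  & 0x00ff00ff00ff00ffULL);
    val = (val & 0x0000ffff0000ffffULL) + ((val >> 16) & 0x0000ffff0000ffffULL);
    val = (val & 0x00000000ffffffffULL) + ((val >> 32) & 0x00000000ffffffffULL);

    return (int)val;
}
```

The number of steps depends only on the word width, never on the operand,
which is the fixed-cost property discussed in this thread.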
* Re: [Qemu-devel] Mips 64 emulation not compiling
From: J. Mayer @ 2007-10-27 14:12 UTC
To: qemu-devel; +Cc: Blue Swirl, Christian Eddie Dost

On Sat, 2007-10-27 at 15:27 +0200, Christian "Eddie" Dost wrote:
> The sparc64 popc works in O(lg(n))

No, it has a fixed cost, whatever the operand is.
It has another advantage: it does not need any intermediate variable,
which is great when running on a CISC host in the Qemu execution
environment.

> , the "optimized" code below work in
> O(n).

Yes. But it's wrong.... It should be val &= val - 1 instead of
val ^= val - 1...

[...]

I ran tests on my PC, which will imho close the debate: the Sparc
implementation is at least 50 % faster. I generated 2 ^ 29 random
numbers for this test (and checked that the distribution was OK).

-- 
J. Mayer <l_indien@magic.fr>
Never organized
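For reference, the corrected loop clears the lowest set bit on each pass,
so it runs once per set bit. The XOR form quoted earlier in the thread
cannot reach zero for a nonzero operand, because val and val - 1 always
differ in bit 0. The sketch below puts both variants side by side;
ctpop32_loop, ctpop32_sum and ctpop32_agree are hypothetical names, and the
xorshift generator is only an assumed stand-in for whatever random source
was used in the test described above.

```c
#include <stdint.h>

/* One iteration per set bit: val &= val - 1 clears the lowest set bit. */
static inline int ctpop32_loop(uint32_t val)
{
    int i;

    for (i = 0; val != 0; i++)
        val &= val - 1;
    return i;
}

/* Fixed-cost pairwise sums, same steps as the generalized sparc64 code. */
static inline int ctpop32_sum(uint32_t val)
{
    val = (val & 0x55555555U) + ((val >> 1)  & 0x55555555U);
    val = (val & 0x33333333U) + ((val >> 2)  & 0x33333333U);
    val = (val & 0x0f0f0f0fU) + ((val >> 4)  & 0x0f0f0f0fU);
    val = (val & 0x00ff00ffU) + ((val >> 8)  & 0x00ff00ffU);
    val = (val & 0x0000ffffU) + ((val >> 16) & 0x0000ffffU);
    return (int)val;
}

/* Hypothetical check: returns 1 if both variants agree on n
 * pseudo-random inputs produced by a xorshift32 generator. */
static int ctpop32_agree(uint32_t seed, unsigned n)
{
    uint32_t x = seed ? seed : 1;   /* xorshift state must be nonzero */

    while (n--) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        if (ctpop32_loop(x) != ctpop32_sum(x))
            return 0;
    }
    return 1;
}
```

This only checks that the two variants return the same counts; it says
nothing about their relative speed.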
* Re: [Qemu-devel] Mips 64 emulation not compiling
From: Thiemo Seufer @ 2007-10-27 13:08 UTC
To: J. Mayer; +Cc: qemu-devel

J. Mayer wrote:
>
> On Sat, 2007-10-27 at 12:19 +0100, Thiemo Seufer wrote:
> > J. Mayer wrote:
> > > The latest patches to clo make gcc 3.4.6 fail to build the mips64
> > > targets on my amd64 host (looks like a register allocation clash in the
> > > optimizer code).
> >
> > Your version is likely faster as well.
> >
> > > Furthermore, the clz micro-op for Mips seems very suspect to me,
> > > according to the changes made in the clo implementation.
> >
> > It is correct, the sign-extension bits are zero in that case.
>
> OK, you know better than me...
>
> > > I did change the clz / clo implementation to use the same code as the
> > > one used for the PowerPC implementation. It seems to me that the result
> > > would be correct... And it compiles...
> > >
> > > Please take a look at the following patch:
> >
> > We now have clz/clo in several places, so I expanded your patch a
> > bit. For now it is only used for the mips target. Comments?
>
> I fully agree with the idea of sharing this code, if it's OK according
> to all target specifications. Please commit and I'll update the PowerPC
> and Alpha targets to use them.
> Oh, I did an optimisation for clz64 when used on 32-bit hosts, avoiding
> the use of 64-bit logical operations:

[snip: a lot more nifty things]

> If you prefer, I can add those shared functions (ctz32, ctz64, cto32,
> cto64, ctpop32, ctpop64) later, as they do not seem as widely used as
> the clxxx functions.

For now I just committed the patch. Feel free to enhance it as you see
fit. :-)


Thiemo