* Endianness macros capitalization @ 2008-07-05 13:27 Javier Martín 2008-07-05 21:30 ` Pavel Roskin 0 siblings, 1 reply; 20+ messages in thread From: Javier Martín @ 2008-07-05 13:27 UTC (permalink / raw) To: The development of GRUB 2 [-- Attachment #1: Type: text/plain, Size: 254 bytes --] Just my two (euro) cents: why are the endianness macros written like functions? I'm talking about the grub_Xe_to_cpuNN family, which look like function calls instead of the macros they are. Shouldn't they be capitalized to GRUB_LE_TO_CPU32 and such? [-- Attachment #2: Esta parte del mensaje está firmada digitalmente --] [-- Type: application/pgp-signature, Size: 827 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-05 13:27 Endianness macros capitalization Javier Martín @ 2008-07-05 21:30 ` Pavel Roskin 2008-07-05 22:54 ` Javier Martín 0 siblings, 1 reply; 20+ messages in thread From: Pavel Roskin @ 2008-07-05 21:30 UTC (permalink / raw) To: The development of GRUB 2 On Sat, 2008-07-05 at 15:27 +0200, Javier Martín wrote: > Just my two (euro) cents: why are the endianness macros written like > functions? I'm talking about the grub_Xe_to_cpuNN family, which look > like function calls instead of the macros they are. Shouldn't they be > capitalized to GRUB_LE_TO_CPU32 and such? They probably should be functions. We may want to sparse annotate GRUB one day, and then inline functions in the only way to go. I prefer capitalized names only for macros that cannot be functions or have non-trivial size effects. -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-05 21:30 ` Pavel Roskin @ 2008-07-05 22:54 ` Javier Martín 2008-07-05 23:14 ` Pavel Roskin 2008-07-06 18:30 ` Robert Millan 0 siblings, 2 replies; 20+ messages in thread From: Javier Martín @ 2008-07-05 22:54 UTC (permalink / raw) To: The development of GRUB 2 [-- Attachment #1: Type: text/plain, Size: 746 bytes --] El sáb, 05-07-2008 a las 17:30 -0400, Pavel Roskin escribió: > They probably should be functions. We may want to sparse annotate GRUB > one day, and then inline functions in the only way to go. Hmm... you mean changing this #define grub_swap_bytes16(x) \ ({ \ grub_uint16_t _x = (x); \ (grub_uint16_t) ((_x << 8) | (_x >> 8)); \ }) ...for this inline grub_uint16_t grub_swap_bytes16(uint16_t x) { return (x << 8) | (x >> 8); } and such? The pro is that we get rid of the ugly hack in the macro version that ensures single evaluation, but a con is that we cannot _force_ any random compiler to inline anything, so we might end up with gross numbers of function calls. By the way, what is "sparse annotate"? [-- Attachment #2: Esta parte del mensaje está firmada digitalmente --] [-- Type: application/pgp-signature, Size: 827 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-05 22:54 ` Javier Martín @ 2008-07-05 23:14 ` Pavel Roskin 2008-07-06 18:30 ` Robert Millan 1 sibling, 0 replies; 20+ messages in thread From: Pavel Roskin @ 2008-07-05 23:14 UTC (permalink / raw) To: The development of GRUB 2 On Sun, 2008-07-06 at 00:54 +0200, Javier Martín wrote: > El sáb, 05-07-2008 a las 17:30 -0400, Pavel Roskin escribió: > > They probably should be functions. We may want to sparse annotate GRUB > > one day, and then inline functions in the only way to go. > Hmm... you mean changing this > > #define grub_swap_bytes16(x) \ > ({ \ > grub_uint16_t _x = (x); \ > (grub_uint16_t) ((_x << 8) | (_x >> 8)); \ > }) > > ...for this > > inline grub_uint16_t grub_swap_bytes16(uint16_t x) > { > return (x << 8) | (x >> 8); > } > > and such? Actually, grub_swap_bytes16 should be avoided in the general code. We need to convert macros like grub_cpu_to_le16() to functions. > The pro is that we get rid of the ugly hack in the macro > version that ensures single evaluation, but a con is that we cannot > _force_ any random compiler to inline anything, so we might end up with > gross numbers of function calls. We can force inlining in gcc with always_inline. But we don't need to force anything. Modern versions of gcc can decide for us, but we can influence the choice by using optimization flags, such as -O2 and -Os. In come cases, function calls would take less space than the actual code for byte swapping. > By the way, what is "sparse annotate"? There is a utility called sparse: git://git.kernel.org/pub/scm/linux/kernel/git/josh/sparse.git It acts like a compiler, but it only produces warnings. It can distinguish bitwise types that cannot be treated like normal integers for arithmetic operations like addition. If we mark little-endian and big-endian variables as bitwise, sparse will warn if such variables are used without byte swapping. Linux uses it a lot. Sparse is really a very good checker for that purpose. Sparse does other checks too. -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-05 22:54 ` Javier Martín 2008-07-05 23:14 ` Pavel Roskin @ 2008-07-06 18:30 ` Robert Millan 2008-07-06 20:02 ` Javier Martín 1 sibling, 1 reply; 20+ messages in thread From: Robert Millan @ 2008-07-06 18:30 UTC (permalink / raw) To: The development of GRUB 2 On Sun, Jul 06, 2008 at 12:54:58AM +0200, Javier Martín wrote: > El sáb, 05-07-2008 a las 17:30 -0400, Pavel Roskin escribió: > > They probably should be functions. We may want to sparse annotate GRUB > > one day, and then inline functions in the only way to go. > Hmm... you mean changing this > > #define grub_swap_bytes16(x) \ > ({ \ > grub_uint16_t _x = (x); \ > (grub_uint16_t) ((_x << 8) | (_x >> 8)); \ > }) > > ...for this > > inline grub_uint16_t grub_swap_bytes16(uint16_t x) > { > return (x << 8) | (x >> 8); > } I know I get to be annoying about this, but which of these two (plus the non-inline version) would result in _smaller_ code? Function calls on i386-pc are cheap (because we use the regparm hack), so maybe it'd work better using normal functions. -- Robert Millan <GPLv2> I know my rights; I want my phone call! <DRM> What good is a phone call… if you are unable to speak? (as seen on /.) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-06 18:30 ` Robert Millan @ 2008-07-06 20:02 ` Javier Martín 2008-07-07 19:25 ` Christian Franke 0 siblings, 1 reply; 20+ messages in thread From: Javier Martín @ 2008-07-06 20:02 UTC (permalink / raw) To: The development of GRUB 2 [-- Attachment #1: Type: text/plain, Size: 1090 bytes --] El dom, 06-07-2008 a las 20:30 +0200, Robert Millan escribió: > On Sun, Jul 06, 2008 at 12:54:58AM +0200, Javier Martín wrote: > > El sáb, 05-07-2008 a las 17:30 -0400, Pavel Roskin escribió: > > > They probably should be functions. We may want to sparse annotate GRUB > > > one day, and then inline functions in the only way to go. > > Hmm... you mean changing this > > > > #define grub_swap_bytes16(x) \ > > ({ \ > > grub_uint16_t _x = (x); \ > > (grub_uint16_t) ((_x << 8) | (_x >> 8)); \ > > }) > > > > ...for this > > > > inline grub_uint16_t grub_swap_bytes16(uint16_t x) > > { > > return (x << 8) | (x >> 8); > > } > > I know I get to be annoying about this, but which of these two (plus the > non-inline version) would result in _smaller_ code? > > Function calls on i386-pc are cheap (because we use the regparm hack), so > maybe it'd work better using normal functions. > If we are to take the space-saving route, the best we can do is turn them to functions, maybe even _without_ the "inline" keyword, and GCC will do what's best. [-- Attachment #2: Esta parte del mensaje está firmada digitalmente --] [-- Type: application/pgp-signature, Size: 827 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-06 20:02 ` Javier Martín @ 2008-07-07 19:25 ` Christian Franke 2008-07-08 18:04 ` Christian Franke 0 siblings, 1 reply; 20+ messages in thread From: Christian Franke @ 2008-07-07 19:25 UTC (permalink / raw) To: The development of GRUB 2 Javier Martín wrote: > El dom, 06-07-2008 a las 20:30 +0200, Robert Millan escribió: > >> On Sun, Jul 06, 2008 at 12:54:58AM +0200, Javier Martín wrote: >> >>> El sáb, 05-07-2008 a las 17:30 -0400, Pavel Roskin escribió: >>> >>>> They probably should be functions. We may want to sparse annotate GRUB >>>> one day, and then inline functions in the only way to go. >>>> >>> Hmm... you mean changing this >>> >>> #define grub_swap_bytes16(x) \ >>> ({ \ >>> grub_uint16_t _x = (x); \ >>> (grub_uint16_t) ((_x << 8) | (_x >> 8)); \ >>> }) >>> >>> ...for this >>> >>> inline grub_uint16_t grub_swap_bytes16(uint16_t x) >>> { >>> return (x << 8) | (x >> 8); >>> } >>> >> I know I get to be annoying about this, but which of these two (plus the >> non-inline version) would result in _smaller_ code? >> >> Function calls on i386-pc are cheap (because we use the regparm hack), so >> maybe it'd work better using normal functions. >> >> Assembly code for grub_swap_bytes16 from Debian gcc 4.1.2-7: Macro or Inline: 4 bytes (minus possible additional benefit from register level optimizations) 66 c1 c0 08 rol $0x8,%ax Function call: 11 bytes 0f b7 c0 movzwl %ax,%eax e8 xx xx xx xx call grub_swap_bytes16 0f b7 c0 movzwl %ax,%eax The break even is possibly at grub_swap_bytes64() :-) > If we are to take the space-saving route, the best we can do is turn > them to functions, maybe even _without_ the "inline" keyword, and GCC > will do what's best. > > How can this be accomplished for functions in include files? If non-inline functions are declared non-static in an include file, duplicate symbols will result (Even with 'inline' you have to be careful due to different inline modes). If declared static and GCC decides to emit a function instead of inline, the code will be duplicated in all modules which use this function. AFAIK, GCC supports 'vague linkage' code (each put in an extra section with '.linkonce' attribute) only for C++, but not for C. Or is there a GCC specific function attribute to accomplish this? Christian ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-07 19:25 ` Christian Franke @ 2008-07-08 18:04 ` Christian Franke 2008-07-09 6:47 ` Pavel Roskin 0 siblings, 1 reply; 20+ messages in thread From: Christian Franke @ 2008-07-08 18:04 UTC (permalink / raw) To: The development of GRUB 2 [-- Attachment #1: Type: text/plain, Size: 1633 bytes --] Christian Franke wrote: >>> > > Assembly code for grub_swap_bytes16 from Debian gcc 4.1.2-7: > > Macro or Inline: 4 bytes (minus possible additional benefit from > register level optimizations) > > 66 c1 c0 08 rol $0x8,%ax > > > Function call: 11 bytes > > 0f b7 c0 movzwl %ax,%eax > e8 xx xx xx xx call grub_swap_bytes16 > 0f b7 c0 movzwl %ax,%eax > > The break even is possibly at grub_swap_bytes64() :-) > I take that back. For i386, even grub_swap_bytes32() should possibly be a function. The attached script compares the sizes of 8 inline expansions of grub_swap_bytesNN() vs. function calls. Sample output (Debian gcc 4.1.2-7): /* 16 bit: inline=88, function=128 */ /* 32 bit: inline=357, function=104 */ #define GRUB_DONT_INLINE_GRUB_SWAP_BYTES32 1 /* 64 bit: inline=2621, function=167 */ #define GRUB_DONT_INLINE_GRUB_SWAP_BYTES64 1 With old gcc versions without the "rol" optimization, even the 16 bit swap should be a function: (Cygwin gcc 3.4.4): /* 16 bit: inline=148, function=116 */ #define GRUB_DONT_INLINE_GRUB_SWAP_BYTES16 1 /* 32 bit: inline=340, function=96 */ #define GRUB_DONT_INLINE_GRUB_SWAP_BYTES32 1 /* 64 bit: inline=2876, function=164 */ #define GRUB_DONT_INLINE_GRUB_SWAP_BYTES64 1 Interestingly, the 64bit inline result is much smaller if '-fomit-frame-pointer' is added: (Cygwin gcc 3.4.4): /* 16 bit: inline=144, function=112 */ #define GRUB_DONT_INLINE_GRUB_SWAP_BYTES16 1 /* 32 bit: inline=336, function=92 */ #define GRUB_DONT_INLINE_GRUB_SWAP_BYTES32 1 /* 64 bit: inline=1372, function=160 */ #define GRUB_DONT_INLINE_GRUB_SWAP_BYTES64 1 Christian [-- Attachment #2: testsizes.sh.txt --] [-- Type: text/plain, Size: 996 bytes --] #!/bin/sh set -e srcdir=.. cat <<EOF > test.c #include <grub/types.h> #ifdef extf type extf(type); #define func extf #else #define func inlf #endif extern type x1,x2,x3,x4; type test(type x, type *p) { x4 = func(func(x1) + func(x2) + func(x3)); *p = func(func(*p) * 13); return func(func(x) + 42); } EOF CC="gcc -c -I. -Iinclude -I${srcdir}/include -Os -falign-jumps=1 -falign-loops=1 -falign-functions=1 -mregparm=3 -mrtd $*" for bits in 16 32 64; do ${CC} -Dtype=grub_uint${bits}_t -Dinlf=grub_swap_bytes${bits} -o test${bits}i.o test.c si=$(size test${bits}i.o | sed -n '2s,^ *\([0-9]*\).*$,\1,p') ${CC} -Dtype=grub_uint${bits}_t -Dextf=grub_swap_bytes${bits}_f -o test${bits}f.o test.c sf=$(size test${bits}f.o | sed -n '2s,^ *\([0-9]*\).*$,\1,p') echo "/* ${bits} bit: inline=$si, function=$sf */" [ $si -gt $sf ] && echo "#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES${bits} 1" #rm -f test${bits}{i,f}.o done rm -f test.c ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-08 18:04 ` Christian Franke @ 2008-07-09 6:47 ` Pavel Roskin 2008-07-09 12:50 ` Christian Franke 0 siblings, 1 reply; 20+ messages in thread From: Pavel Roskin @ 2008-07-09 6:47 UTC (permalink / raw) To: The development of GRUB 2 On Tue, 2008-07-08 at 20:04 +0200, Christian Franke wrote: > With old gcc versions without the "rol" optimization, even the 16 bit > swap should be a function: Or better yet, an asm statement. We should consider optimized assembly vs function call. Even the 32-bit swap could be shorter: a: 86 c4 xchg %al,%ah c: c1 c0 10 rol $0x10,%eax f: 86 c4 xchg %al,%ah 11: That's 7 bytes! And if written properly, it could work with any of the registers that allow access to the lower two bytes (%eax, %ebx, %ecx and %edx), thus giving more flexibility to the optimizer. -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-09 6:47 ` Pavel Roskin @ 2008-07-09 12:50 ` Christian Franke 2008-07-09 17:57 ` Pavel Roskin 2008-07-20 13:45 ` Christian Franke 0 siblings, 2 replies; 20+ messages in thread From: Christian Franke @ 2008-07-09 12:50 UTC (permalink / raw) To: The development of GRUB 2 Pavel Roskin wrote: > On Tue, 2008-07-08 at 20:04 +0200, Christian Franke wrote: > > > With old gcc versions without the "rol" optimization, even the 16 > > bit swap should be a function: > > > > Or better yet, an asm statement. > > We should consider optimized assembly vs function call. Even the > 32-bit swap could be shorter: > > a: 86 c4 xchg %al,%ah > c: c1 c0 10 rol $0x10,%eax > f: 86 c4 xchg %al,%ah > 11: > > That's 7 bytes! ... But the function call in the 32-bit case requires only 5 bytes :-) (The 16-bit case requires more due to short->int propagation) Overall size from inline asm would only be smaller if there is any benefit from additional optimizations. Here is a possibly working draft for include/grub/i386/types.h (requires a #ifdef in include/grub/types.h): #define grub_swap_bytes32(x) \ ({ \ grub_uint32_t _x = (x), _y; asm ( \ "xchg %%al,%%ah\n" \ "roll $0x10,%%eax\n" \ "xchg %%al,%%ah\n" \ : "=a"(_y) : "0"(_x) \ ); \ _y; \ }) Result with the test script from my last mail: Debian gcc 4.1.2-7: inline (portable)=357, inline (asm)=126, function=104 Cygwin gcc 3.4.4: inline (portable)=340, inline (asm)=124, function=96 Function call is still better. The only candidate for inline is probably grub_swap_bytes16(). > .... And if written properly, it could work with any of > the registers that allow access to the lower two bytes (%eax, %ebx, > %ecx and %edx), thus giving more flexibility to the optimizer. > This would require support to access the Rl and Rh parts of eRx for each R in [a-d]. Something like: asm ( "xchg %0:l,%0:h\n" "roll $0x10,%0\n" "xchg %0:l,%0:h\n" : "=r"(_y) : "0"(_x) \ ); Do gcc or gas provide a syntax to do this? Christian ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-09 12:50 ` Christian Franke @ 2008-07-09 17:57 ` Pavel Roskin 2008-07-10 19:25 ` Christian Franke 2008-07-20 13:45 ` Christian Franke 1 sibling, 1 reply; 20+ messages in thread From: Pavel Roskin @ 2008-07-09 17:57 UTC (permalink / raw) To: The development of GRUB 2 On Wed, 2008-07-09 at 14:50 +0200, Christian Franke wrote: > Overall size from inline asm would only be smaller if there is any > benefit from additional optimizations. Correct. Function calls use fixed registers for everything, but macros don't have to. > Result with the test script from my last mail: > > Debian gcc 4.1.2-7: > inline (portable)=357, inline (asm)=126, function=104 > > Cygwin gcc 3.4.4: > inline (portable)=340, inline (asm)=124, function=96 > > Function call is still better. The only candidate for inline is probably > grub_swap_bytes16(). But we are getting closer! > > .... And if written properly, it could work with any of > > the registers that allow access to the lower two bytes (%eax, %ebx, > > %ecx and %edx), thus giving more flexibility to the optimizer. > > > > This would require support to access the Rl and Rh parts of eRx for each > R in [a-d]. Something like: > > asm ( > "xchg %0:l,%0:h\n" > "roll $0x10,%0\n" > "xchg %0:l,%0:h\n" > : "=r"(_y) : "0"(_x) \ > ); > > Do gcc or gas provide a syntax to do this? Yes. That's %b0 and %h0. Use "=q" for all registers with "upper halves" (%ah-%dh). Also, please be consistent and use width suffixes everywhere. xchg should be xchgb. -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-09 17:57 ` Pavel Roskin @ 2008-07-10 19:25 ` Christian Franke 2008-07-10 19:59 ` Pavel Roskin 2008-07-11 8:53 ` Pavel Roskin 0 siblings, 2 replies; 20+ messages in thread From: Christian Franke @ 2008-07-10 19:25 UTC (permalink / raw) To: The development of GRUB 2 Pavel Roskin wrote: > On Wed, 2008-07-09 at 14:50 +0200, Christian Franke wrote: > > >> Result with the test script from my last mail: >> >> Debian gcc 4.1.2-7: >> inline (portable)=357, inline (asm)=126, function=104 >> >> Cygwin gcc 3.4.4: >> inline (portable)=340, inline (asm)=124, function=96 >> >> Function call is still better. The only candidate for inline is probably >> grub_swap_bytes16(). >> > > But we are getting closer! > > But probably not close enough in the average case, see below :-) >>> .... And if written properly, it could work with any of >>> the registers that allow access to the lower two bytes (%eax, %ebx, >>> %ecx and %edx), thus giving more flexibility to the optimizer. >>> >>> >> This would require support to access the Rl and Rh parts of eRx for each >> R in [a-d]. Something like: >> >> asm ( >> "xchg %0:l,%0:h\n" >> "roll $0x10,%0\n" >> "xchg %0:l,%0:h\n" >> : "=r"(_y) : "0"(_x) \ >> ); >> >> Do gcc or gas provide a syntax to do this? >> > > Yes. That's %b0 and %h0. Use "=q" for all registers with "upper > halves" (%ah-%dh). > > Thanks for the info. I tried this in the same script: #define grub_swap_bytes32(x) \ ({ \ grub_uint32_t _x = (x), _y; \ asm ( \ "xchgb %b0,%h0\n" \ "roll $0x10,%0\n" \ "xchgb %b0,%h0\n" \ : "=q"(_y) : "0"(_x) \ ); \ _y; \ }) GCC optimizer does a good job optimizing register use, but function call is still better: Debian gcc 4.1.2-7: inline (portable)=357,asm (%%eax)=126, asm (%0)=107, function=104 Cygwin gcc 3.4.4: inline (portable)=340, asm (%%eax)=124, asm (%0)=104, function=96 Inline asm is only better is rare cases, e.g. with this test function: type test(type *x) { return func(x[0]) + func(x[1]) + ... + func(x[N]); } Result with Cygwin gcc 3.4.4: N=0: asm=14, function=11 N=1: asm=28, function=32 N=2: asm=40, function=41 N=3: asm=52, function=51 N=4: asm=64, function=61 N=5: asm=76, function=72 To test which files would possibly be affected by any size optimization of grub_swap_bytes*(), I 've done a test compilation with these macros replaced by dummies '(x)'. Only the size of following modules changed: affs.mod afs.mod amiga.mod apple.mod ata.mod hfs.mod hfsplus.mod iso9660.mod jpeg.mod png.mod sfs.mod sun.mod xfs.mod Remaining modules and kernel.img are identical. Christian ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-10 19:25 ` Christian Franke @ 2008-07-10 19:59 ` Pavel Roskin 2008-07-11 7:06 ` Jordi Mallach 2008-07-11 8:53 ` Pavel Roskin 1 sibling, 1 reply; 20+ messages in thread From: Pavel Roskin @ 2008-07-10 19:59 UTC (permalink / raw) To: grub-devel Quoting Christian Franke <Christian.Franke@t-online.de>: > GCC optimizer does a good job optimizing register use, but function > call is still better: > > Debian gcc 4.1.2-7: > inline (portable)=357,asm (%%eax)=126, asm (%0)=107, function=104 > > Cygwin gcc 3.4.4: > inline (portable)=340, asm (%%eax)=124, asm (%0)=104, function=96 Thank you for testing! We could do better with the bswap instruction, but it appeared in 486. I think we still need to support 386. I understand you are counting the function body, so the calling/inlining code is bigger by more than just 3 or 8 bytes. If the code grows, so will the number of the calls or the inline sites. Yet the function size will remain the same. Chances are that if we use 3 "rol"s and allow all general purpose registers, we may save a couple more bytes, because the assembler will still convert rol to xchg when possible, whereas gcc will be able to use non-e*x registers if needed. But I don't want to ask you to spend any more time on it. I think we'll have to go with the function for 32 bit. -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-10 19:59 ` Pavel Roskin @ 2008-07-11 7:06 ` Jordi Mallach 0 siblings, 0 replies; 20+ messages in thread From: Jordi Mallach @ 2008-07-11 7:06 UTC (permalink / raw) To: grub-devel On Thu, Jul 10, 2008 at 03:59:14PM -0400, Pavel Roskin wrote: > Thank you for testing! We could do better with the bswap instruction, > but it appeared in 486. I think we still need to support 386. Do we, or better yet, can we? I thought most distros had dropped 386 support from kernels and toolchains. At least Debian has, and Debian isn't know for dropping old hardware that easily. We even had libc5 toolchain until before the etch release, iirc. :) -- Jordi Mallach Pérez -- Debian developer http://www.debian.org/ jordi@sindominio.net jordi@debian.org http://www.sindominio.net/ GnuPG public key information available at http://oskuro.net/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-10 19:25 ` Christian Franke 2008-07-10 19:59 ` Pavel Roskin @ 2008-07-11 8:53 ` Pavel Roskin 2008-07-11 9:07 ` Pavel Roskin 1 sibling, 1 reply; 20+ messages in thread From: Pavel Roskin @ 2008-07-11 8:53 UTC (permalink / raw) To: The development of GRUB 2 Quoting Christian Franke <Christian.Franke@t-online.de>: > #define grub_swap_bytes32(x) \ > ({ \ > grub_uint32_t _x = (x), _y; \ > asm ( \ > "xchgb %b0,%h0\n" \ > "roll $0x10,%0\n" \ > "xchgb %b0,%h0\n" \ > : "=q"(_y) : "0"(_x) \ > ); \ > _y; \ > }) Results are much better for the inline implementation with gcc 4.3.1 (Fedora development AKA Rawhide). I tried the idea I suggested (let gcc use any register and optimize in the assembler). The result was a true monstrosity, and it failed to beat your approach, where gcc is forced to use the first four registers. The size of all modules together increased by 8 bytes. However, this code beats everything in my tests by a wide margin: #define grub_swap_lo(x) \ ({ \ grub_uint32_t _x = (x), _y; \ asm ( \ "xchgb %b0,%h0\n" \ : "=q"(_y) : "0"(_x) \ ); \ _y; \ }) #define grub_swap_hi(x) \ ({ \ grub_uint32_t _x = (x), _y; \ asm ( \ "roll $0x10,%0\n" \ : "=r"(_y) : "0"(_x) \ ); \ _y; \ }) #define grub_swap_bytes32(x) grub_swap_lo(grub_swap_hi(grub_swap_lo(x))) It needs some cleanups to avoid warnings and use better names, but I think it should go in. Telling gcc the little details really helps. Sum of module sizes ("du -b -S"): 366976 original 364988 "=q", xchgb 365460 "=q", rolw $8,%w0 365420 "=r", rolw $8,%w0 364996 monstrosity ("=r", conditional xchgb/rolw) 364728 "little details" With my compiler, everything beats the code we have. -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-11 8:53 ` Pavel Roskin @ 2008-07-11 9:07 ` Pavel Roskin 2008-07-11 13:21 ` Christian Franke 0 siblings, 1 reply; 20+ messages in thread From: Pavel Roskin @ 2008-07-11 9:07 UTC (permalink / raw) To: grub-devel Quoting Pavel Roskin <proski@gnu.org>: > 366976 original > 364988 "=q", xchgb > 365460 "=q", rolw $8,%w0 > 365420 "=r", rolw $8,%w0 > 364996 monstrosity ("=r", conditional xchgb/rolw) > 364728 "little details" Marginally better (by just 12 bytes): 364716 "little details" + "monstrosity" #define grub_swap_lo(x) \ ({ \ grub_uint32_t _x = (x), _y; \ asm ( \ ".ifc %0,%%eax\n" \ "xchgb %b0,%h0\n" \ ".else\n" \ ".ifc %0,%%ebx\n" \ "xchgb %b0,%h0\n" \ ".else\n" \ ".ifc %0,%%ecx\n" \ "xchgb %b0,%h0\n" \ ".else\n" \ ".ifc %0,%%edx\n" \ "xchgb %b0,%h0\n" \ ".else\n" \ "rolw $8,%w0\n" \ ".endif\n" \ ".endif\n" \ ".endif\n" \ ".endif\n" \ : "=r"(_y) : "0"(_x) \ ); \ _y; \ }) #define grub_swap_hi(x) \ ({ \ grub_uint32_t _x = (x), _y; \ asm ( \ "roll $0x10,%0\n" \ : "=r"(_y) : "0"(_x) \ ); \ _y; \ }) #define grub_swap_bytes32(x) grub_swap_lo(grub_swap_hi(grub_swap_lo(x))) -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-11 9:07 ` Pavel Roskin @ 2008-07-11 13:21 ` Christian Franke 2008-07-11 18:33 ` Pavel Roskin 0 siblings, 1 reply; 20+ messages in thread From: Christian Franke @ 2008-07-11 13:21 UTC (permalink / raw) To: The development of GRUB 2 [-- Attachment #1: Type: text/plain, Size: 738 bytes --] Pavel Roskin wrote: > Quoting Pavel Roskin <proski@gnu.org>: > >> 366976 original >> 364988 "=q", xchgb >> 365460 "=q", rolw $8,%w0 >> 365420 "=r", rolw $8,%w0 >> 364996 monstrosity ("=r", conditional xchgb/rolw) >> 364728 "little details" > > Marginally better (by just 12 bytes): > 364716 "little details" + "monstrosity" > Very nice optimization idea! The practical advantage is probably limited by the fact that kernel.img and most of the modules typically included in a i386-pc core.img are not affected. I attached a diff of "size *.mod" outputs for the test mentioned in my last mail. '-' original code, '+' grub_swap_*() replaced by dummies. Large savings potential is in afs.mod, hfs*.mod, and xfs.mod. Christian [-- Attachment #2: sizediff.txt --] [-- Type: text/plain, Size: 4751 bytes --] --- sizes.original 2008-07-10 20:33:57.968750000 +0200 +++ sizes.noswap 2008-07-10 20:34:03.046875000 +0200 @@ -1,78 +1,78 @@ text data bss dec hex filename 4352 24 160 4536 11b8 _bsd.mod 564 8 48 620 26c _chain.mod 2128 8 48 2184 888 _linux.mod 6528 48 128 6704 1a30 _multiboot.mod 820 28 0 848 350 acorn.mod - 2552 44 16 2612 a34 affs.mod - 5532 40 16 5588 15d4 afs.mod - 960 28 16 1004 3ec amiga.mod + 2136 44 16 2196 894 affs.mod + 2656 40 16 2712 a98 afs.mod + 692 28 16 736 2e0 amiga.mod 212 8 0 220 dc aout.mod - 1000 28 16 1044 414 apple.mod - 3028 36 16 3080 c08 ata.mod + 888 28 16 932 3a4 apple.mod + 2948 36 16 3000 bb8 ata.mod 1756 44 16 1816 718 biosdisk.mod 876 8 16 900 384 bitmap.mod 764 20 0 784 310 blocklist.mod 148 16 0 164 a4 boot.mod 628 16 0 644 284 bsd.mod 424 16 0 440 1b8 cat.mod 340 24 0 364 16c chain.mod 684 16 0 700 2bc cmp.mod 396 20 0 416 1a0 configfile.mod 1228 36 16 1280 500 cpio.mod 332 16 16 364 16c cpuid.mod 620 16 0 636 27c echo.mod 2292 12 0 2304 900 elf.mod 2044 44 16 2104 838 ext2.mod 2816 32 16 2864 b30 fat.mod 832 32 16 880 370 font.mod 1208 8 0 1216 4c0 fshelp.mod 3480 132 224 3836 efc gfxterm.mod 1032 32 16 1080 438 gpt.mod 5808 488 0 6296 1898 gzio.mod 284 16 0 300 12c halt.mod 116 16 0 132 84 hello.mod 480 16 0 496 1f0 help.mod 1220 20 0 1240 4d8 hexdump.mod - 4252 32 64 4348 10fc hfs.mod - 4184 44 16 4244 1094 hfsplus.mod - 3128 44 16 3188 c74 iso9660.mod + 3220 32 64 3316 cf4 hfs.mod + 2628 44 16 2688 a80 hfsplus.mod + 3108 44 16 3168 c60 iso9660.mod 2916 32 16 2964 b94 jfs.mod - 3968 40 0 4008 fa8 jpeg.mod + 3956 40 0 3996 f9c jpeg.mod 232 24 0 256 100 linux.mod 1076 52 16 1144 478 loopback.mod 1452 12 0 1464 5b8 ls.mod 2244 20 0 2264 8d8 lspci.mod 3260 36 32 3328 d00 lvm.mod 484 40 32 556 22c memdisk.mod 2512 36 16 2564 a04 minix.mod 264 32 0 296 128 multiboot.mod 25268 104 1936 27308 6aac normal.mod 5576 44 32 5652 1614 ntfs.mod 1880 20 0 1900 76c ntfscomp.mod 1604 24 16 1644 66c pc.mod 188 4 0 192 c0 pci.mod 604 16 0 620 26c play.mod - 4144 24 0 4168 1048 png.mod + 4108 24 0 4132 1024 png.mod 3312 40 16 3368 d28 raid.mod 276 16 0 292 124 read.mod 108 16 0 124 7c reboot.mod 7308 48 16 7372 1ccc reiserfs.mod 1164 16 0 1180 49c search.mod 2492 216 96 2804 af4 serial.mod - 2284 40 16 2340 924 sfs.mod + 1836 40 16 1892 764 sfs.mod 540 16 32 588 24c sleep.mod - 1056 24 16 1096 448 sun.mod + 948 24 16 988 3dc sun.mod 416 20 0 436 1b4 terminal.mod 5188 28 512 5728 1660 terminfo.mod 264 16 0 280 118 test.mod 1468 24 0 1492 5d4 tga.mod 2724 52 16 2792 ae8 udf.mod 2936 52 16 3004 bbc ufs.mod 8832 204 1952 10988 2aec vbe.mod 624 20 0 644 284 vbeinfo.mod 1172 20 0 1192 4a8 vbetest.mod 1668 108 13584 15360 3c00 vga.mod 1284 8 32 1324 52c video.mod 732 32 0 764 2fc videotest.mod - 4980 40 16 5036 13ac xfs.mod + 2924 40 16 2980 ba4 xfs.mod ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-11 13:21 ` Christian Franke @ 2008-07-11 18:33 ` Pavel Roskin 0 siblings, 0 replies; 20+ messages in thread From: Pavel Roskin @ 2008-07-11 18:33 UTC (permalink / raw) To: The development of GRUB 2 On Fri, 2008-07-11 at 15:21 +0200, Christian Franke wrote: > Very nice optimization idea! > > The practical advantage is probably limited by the fact that kernel.img > and most of the modules typically included in a i386-pc core.img are not > affected. > > I attached a diff of "size *.mod" outputs for the test mentioned in my > last mail. '-' original code, '+' grub_swap_*() replaced by dummies. > Large savings potential is in afs.mod, hfs*.mod, and xfs.mod. ... > - 4980 40 16 5036 13ac xfs.mod > + 2924 40 16 2980 ba4 xfs.mod It is a big deal for xfs users. Even more so for gpt+raid+lvm+xfs users, who are against the wall :-) -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-09 12:50 ` Christian Franke 2008-07-09 17:57 ` Pavel Roskin @ 2008-07-20 13:45 ` Christian Franke 2008-07-20 23:31 ` Pavel Roskin 1 sibling, 1 reply; 20+ messages in thread From: Christian Franke @ 2008-07-20 13:45 UTC (permalink / raw) To: The development of GRUB 2 Christian Franke wrote: > Pavel Roskin wrote: > >> On Tue, 2008-07-08 at 20:04 +0200, Christian Franke wrote: >> >> >>> With old gcc versions without the "rol" optimization, even the 16 >>> bit swap should be a function: >>> >>> >> Or better yet, an asm statement. >> >> We should consider optimized assembly vs function call. Even the >> 32-bit swap could be shorter: >> >> a: 86 c4 xchg %al,%ah >> c: c1 c0 10 rol $0x10,%eax >> f: 86 c4 xchg %al,%ah >> 11: >> >> That's 7 bytes! ... >> > > But the function call in the 32-bit case requires only 5 bytes :-) > Sorry, I was wrong here. The assumption about function call size was only true for module-local calls. If a module calls a function in kernel, each 5 byte call requires another 8 bytes for the ELF relocation table entry. Christian ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization 2008-07-20 13:45 ` Christian Franke @ 2008-07-20 23:31 ` Pavel Roskin 0 siblings, 0 replies; 20+ messages in thread From: Pavel Roskin @ 2008-07-20 23:31 UTC (permalink / raw) To: The development of GRUB 2 On Sun, 2008-07-20 at 15:45 +0200, Christian Franke wrote: > Christian Franke wrote: > > But the function call in the 32-bit case requires only 5 bytes :-) > > > > Sorry, I was wrong here. The assumption about function call size was > only true for module-local calls. If a module calls a function in > kernel, each 5 byte call requires another 8 bytes for the ELF relocation > table entry. Actually, the new versions of gcc have __builtin_bswap32 and __builtin_bswap64, which are optimized even better. There is no __builtin_bswap16 because it is said that any correct implementation will be optimized anyway: http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00079.html gcc 4.3.0 from Fedora is working fine with them. I'm not sure about gcc 4.2. There is only one bug. Suppose I use this: #define grub_swap_bytes32(x) __builtin_bswap32(x) #define grub_swap_bytes64(x) __builtin_bswap64(x) Then I get this warning: partmap/apple.c: In function 'apple_partition_map_iterate': partmap/apple.c:133: warning: format '%x' expects type 'unsigned int', but argument 8 has type 'unsigned int' partmap/apple.c:133: warning: format '%x' expects type 'unsigned int', but argument 9 has type 'unsigned int' Those arguments are produced by grub_swap_bytes32(). But if I use wrapper functions, then there are no warnings, and the code side is approximately the same. We'll need wrappers to support sparse. -- Regards, Pavel Roskin ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2008-07-20 23:31 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-07-05 13:27 Endianness macros capitalization Javier Martín 2008-07-05 21:30 ` Pavel Roskin 2008-07-05 22:54 ` Javier Martín 2008-07-05 23:14 ` Pavel Roskin 2008-07-06 18:30 ` Robert Millan 2008-07-06 20:02 ` Javier Martín 2008-07-07 19:25 ` Christian Franke 2008-07-08 18:04 ` Christian Franke 2008-07-09 6:47 ` Pavel Roskin 2008-07-09 12:50 ` Christian Franke 2008-07-09 17:57 ` Pavel Roskin 2008-07-10 19:25 ` Christian Franke 2008-07-10 19:59 ` Pavel Roskin 2008-07-11 7:06 ` Jordi Mallach 2008-07-11 8:53 ` Pavel Roskin 2008-07-11 9:07 ` Pavel Roskin 2008-07-11 13:21 ` Christian Franke 2008-07-11 18:33 ` Pavel Roskin 2008-07-20 13:45 ` Christian Franke 2008-07-20 23:31 ` Pavel Roskin
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.