* Endianness macros capitalization
@ 2008-07-05 13:27 Javier Martín
2008-07-05 21:30 ` Pavel Roskin
0 siblings, 1 reply; 20+ messages in thread
From: Javier Martín @ 2008-07-05 13:27 UTC (permalink / raw)
To: The development of GRUB 2
[-- Attachment #1: Type: text/plain, Size: 254 bytes --]
Just my two (euro) cents: why are the endianness macros written like
functions? I'm talking about the grub_Xe_to_cpuNN family, which look
like function calls instead of the macros they are. Shouldn't they be
capitalized to GRUB_LE_TO_CPU32 and such?
[-- Attachment #2: Esta parte del mensaje está firmada digitalmente --]
[-- Type: application/pgp-signature, Size: 827 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-05 13:27 Endianness macros capitalization Javier Martín
@ 2008-07-05 21:30 ` Pavel Roskin
2008-07-05 22:54 ` Javier Martín
0 siblings, 1 reply; 20+ messages in thread
From: Pavel Roskin @ 2008-07-05 21:30 UTC (permalink / raw)
To: The development of GRUB 2
On Sat, 2008-07-05 at 15:27 +0200, Javier Martín wrote:
> Just my two (euro) cents: why are the endianness macros written like
> functions? I'm talking about the grub_Xe_to_cpuNN family, which look
> like function calls instead of the macros they are. Shouldn't they be
> capitalized to GRUB_LE_TO_CPU32 and such?
They probably should be functions. We may want to sparse annotate GRUB
one day, and then inline functions in the only way to go.
I prefer capitalized names only for macros that cannot be functions or
have non-trivial size effects.
--
Regards,
Pavel Roskin
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-05 21:30 ` Pavel Roskin
@ 2008-07-05 22:54 ` Javier Martín
2008-07-05 23:14 ` Pavel Roskin
2008-07-06 18:30 ` Robert Millan
0 siblings, 2 replies; 20+ messages in thread
From: Javier Martín @ 2008-07-05 22:54 UTC (permalink / raw)
To: The development of GRUB 2
[-- Attachment #1: Type: text/plain, Size: 746 bytes --]
El sáb, 05-07-2008 a las 17:30 -0400, Pavel Roskin escribió:
> They probably should be functions. We may want to sparse annotate GRUB
> one day, and then inline functions in the only way to go.
Hmm... you mean changing this
#define grub_swap_bytes16(x) \
({ \
grub_uint16_t _x = (x); \
(grub_uint16_t) ((_x << 8) | (_x >> 8)); \
})
...for this
inline grub_uint16_t grub_swap_bytes16(uint16_t x)
{
return (x << 8) | (x >> 8);
}
and such? The pro is that we get rid of the ugly hack in the macro
version that ensures single evaluation, but a con is that we cannot
_force_ any random compiler to inline anything, so we might end up with
gross numbers of function calls.
By the way, what is "sparse annotate"?
[-- Attachment #2: Esta parte del mensaje está firmada digitalmente --]
[-- Type: application/pgp-signature, Size: 827 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-05 22:54 ` Javier Martín
@ 2008-07-05 23:14 ` Pavel Roskin
2008-07-06 18:30 ` Robert Millan
1 sibling, 0 replies; 20+ messages in thread
From: Pavel Roskin @ 2008-07-05 23:14 UTC (permalink / raw)
To: The development of GRUB 2
On Sun, 2008-07-06 at 00:54 +0200, Javier Martín wrote:
> El sáb, 05-07-2008 a las 17:30 -0400, Pavel Roskin escribió:
> > They probably should be functions. We may want to sparse annotate GRUB
> > one day, and then inline functions in the only way to go.
> Hmm... you mean changing this
>
> #define grub_swap_bytes16(x) \
> ({ \
> grub_uint16_t _x = (x); \
> (grub_uint16_t) ((_x << 8) | (_x >> 8)); \
> })
>
> ...for this
>
> inline grub_uint16_t grub_swap_bytes16(uint16_t x)
> {
> return (x << 8) | (x >> 8);
> }
>
> and such?
Actually, grub_swap_bytes16 should be avoided in the general code. We
need to convert macros like grub_cpu_to_le16() to functions.
> The pro is that we get rid of the ugly hack in the macro
> version that ensures single evaluation, but a con is that we cannot
> _force_ any random compiler to inline anything, so we might end up with
> gross numbers of function calls.
We can force inlining in gcc with always_inline. But we don't need to
force anything. Modern versions of gcc can decide for us, but we can
influence the choice by using optimization flags, such as -O2 and -Os.
In come cases, function calls would take less space than the actual code
for byte swapping.
> By the way, what is "sparse annotate"?
There is a utility called sparse:
git://git.kernel.org/pub/scm/linux/kernel/git/josh/sparse.git
It acts like a compiler, but it only produces warnings. It can
distinguish bitwise types that cannot be treated like normal integers
for arithmetic operations like addition.
If we mark little-endian and big-endian variables as bitwise, sparse
will warn if such variables are used without byte swapping. Linux uses
it a lot. Sparse is really a very good checker for that purpose.
Sparse does other checks too.
--
Regards,
Pavel Roskin
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-05 22:54 ` Javier Martín
2008-07-05 23:14 ` Pavel Roskin
@ 2008-07-06 18:30 ` Robert Millan
2008-07-06 20:02 ` Javier Martín
1 sibling, 1 reply; 20+ messages in thread
From: Robert Millan @ 2008-07-06 18:30 UTC (permalink / raw)
To: The development of GRUB 2
On Sun, Jul 06, 2008 at 12:54:58AM +0200, Javier Martín wrote:
> El sáb, 05-07-2008 a las 17:30 -0400, Pavel Roskin escribió:
> > They probably should be functions. We may want to sparse annotate GRUB
> > one day, and then inline functions in the only way to go.
> Hmm... you mean changing this
>
> #define grub_swap_bytes16(x) \
> ({ \
> grub_uint16_t _x = (x); \
> (grub_uint16_t) ((_x << 8) | (_x >> 8)); \
> })
>
> ...for this
>
> inline grub_uint16_t grub_swap_bytes16(uint16_t x)
> {
> return (x << 8) | (x >> 8);
> }
I know I get to be annoying about this, but which of these two (plus the
non-inline version) would result in _smaller_ code?
Function calls on i386-pc are cheap (because we use the regparm hack), so
maybe it'd work better using normal functions.
--
Robert Millan
<GPLv2> I know my rights; I want my phone call!
<DRM> What good is a phone call… if you are unable to speak?
(as seen on /.)
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-06 18:30 ` Robert Millan
@ 2008-07-06 20:02 ` Javier Martín
2008-07-07 19:25 ` Christian Franke
0 siblings, 1 reply; 20+ messages in thread
From: Javier Martín @ 2008-07-06 20:02 UTC (permalink / raw)
To: The development of GRUB 2
[-- Attachment #1: Type: text/plain, Size: 1090 bytes --]
El dom, 06-07-2008 a las 20:30 +0200, Robert Millan escribió:
> On Sun, Jul 06, 2008 at 12:54:58AM +0200, Javier Martín wrote:
> > El sáb, 05-07-2008 a las 17:30 -0400, Pavel Roskin escribió:
> > > They probably should be functions. We may want to sparse annotate GRUB
> > > one day, and then inline functions in the only way to go.
> > Hmm... you mean changing this
> >
> > #define grub_swap_bytes16(x) \
> > ({ \
> > grub_uint16_t _x = (x); \
> > (grub_uint16_t) ((_x << 8) | (_x >> 8)); \
> > })
> >
> > ...for this
> >
> > inline grub_uint16_t grub_swap_bytes16(uint16_t x)
> > {
> > return (x << 8) | (x >> 8);
> > }
>
> I know I get to be annoying about this, but which of these two (plus the
> non-inline version) would result in _smaller_ code?
>
> Function calls on i386-pc are cheap (because we use the regparm hack), so
> maybe it'd work better using normal functions.
>
If we are to take the space-saving route, the best we can do is turn
them to functions, maybe even _without_ the "inline" keyword, and GCC
will do what's best.
[-- Attachment #2: Esta parte del mensaje está firmada digitalmente --]
[-- Type: application/pgp-signature, Size: 827 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-06 20:02 ` Javier Martín
@ 2008-07-07 19:25 ` Christian Franke
2008-07-08 18:04 ` Christian Franke
0 siblings, 1 reply; 20+ messages in thread
From: Christian Franke @ 2008-07-07 19:25 UTC (permalink / raw)
To: The development of GRUB 2
Javier Martín wrote:
> El dom, 06-07-2008 a las 20:30 +0200, Robert Millan escribió:
>
>> On Sun, Jul 06, 2008 at 12:54:58AM +0200, Javier Martín wrote:
>>
>>> El sáb, 05-07-2008 a las 17:30 -0400, Pavel Roskin escribió:
>>>
>>>> They probably should be functions. We may want to sparse annotate GRUB
>>>> one day, and then inline functions in the only way to go.
>>>>
>>> Hmm... you mean changing this
>>>
>>> #define grub_swap_bytes16(x) \
>>> ({ \
>>> grub_uint16_t _x = (x); \
>>> (grub_uint16_t) ((_x << 8) | (_x >> 8)); \
>>> })
>>>
>>> ...for this
>>>
>>> inline grub_uint16_t grub_swap_bytes16(uint16_t x)
>>> {
>>> return (x << 8) | (x >> 8);
>>> }
>>>
>> I know I get to be annoying about this, but which of these two (plus the
>> non-inline version) would result in _smaller_ code?
>>
>> Function calls on i386-pc are cheap (because we use the regparm hack), so
>> maybe it'd work better using normal functions.
>>
>>
Assembly code for grub_swap_bytes16 from Debian gcc 4.1.2-7:
Macro or Inline: 4 bytes (minus possible additional benefit from
register level optimizations)
66 c1 c0 08 rol $0x8,%ax
Function call: 11 bytes
0f b7 c0 movzwl %ax,%eax
e8 xx xx xx xx call grub_swap_bytes16
0f b7 c0 movzwl %ax,%eax
The break even is possibly at grub_swap_bytes64() :-)
> If we are to take the space-saving route, the best we can do is turn
> them to functions, maybe even _without_ the "inline" keyword, and GCC
> will do what's best.
>
>
How can this be accomplished for functions in include files?
If non-inline functions are declared non-static in an include file,
duplicate symbols will result (Even with 'inline' you have to be careful
due to different inline modes).
If declared static and GCC decides to emit a function instead of inline,
the code will be duplicated in all modules which use this function.
AFAIK, GCC supports 'vague linkage' code (each put in an extra section
with '.linkonce' attribute) only for C++, but not for C. Or is there a
GCC specific function attribute to accomplish this?
Christian
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-07 19:25 ` Christian Franke
@ 2008-07-08 18:04 ` Christian Franke
2008-07-09 6:47 ` Pavel Roskin
0 siblings, 1 reply; 20+ messages in thread
From: Christian Franke @ 2008-07-08 18:04 UTC (permalink / raw)
To: The development of GRUB 2
[-- Attachment #1: Type: text/plain, Size: 1633 bytes --]
Christian Franke wrote:
>>>
>
> Assembly code for grub_swap_bytes16 from Debian gcc 4.1.2-7:
>
> Macro or Inline: 4 bytes (minus possible additional benefit from
> register level optimizations)
>
> 66 c1 c0 08 rol $0x8,%ax
>
>
> Function call: 11 bytes
>
> 0f b7 c0 movzwl %ax,%eax
> e8 xx xx xx xx call grub_swap_bytes16
> 0f b7 c0 movzwl %ax,%eax
>
> The break even is possibly at grub_swap_bytes64() :-)
>
I take that back. For i386, even grub_swap_bytes32() should possibly be
a function.
The attached script compares the sizes of 8 inline expansions of
grub_swap_bytesNN() vs. function calls.
Sample output (Debian gcc 4.1.2-7):
/* 16 bit: inline=88, function=128 */
/* 32 bit: inline=357, function=104 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES32 1
/* 64 bit: inline=2621, function=167 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES64 1
With old gcc versions without the "rol" optimization, even the 16 bit
swap should be a function:
(Cygwin gcc 3.4.4):
/* 16 bit: inline=148, function=116 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES16 1
/* 32 bit: inline=340, function=96 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES32 1
/* 64 bit: inline=2876, function=164 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES64 1
Interestingly, the 64bit inline result is much smaller if
'-fomit-frame-pointer' is added:
(Cygwin gcc 3.4.4):
/* 16 bit: inline=144, function=112 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES16 1
/* 32 bit: inline=336, function=92 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES32 1
/* 64 bit: inline=1372, function=160 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES64 1
Christian
[-- Attachment #2: testsizes.sh.txt --]
[-- Type: text/plain, Size: 996 bytes --]
#!/bin/sh
set -e
srcdir=..
cat <<EOF > test.c
#include <grub/types.h>
#ifdef extf
type extf(type);
#define func extf
#else
#define func inlf
#endif
extern type x1,x2,x3,x4;
type test(type x, type *p)
{
x4 = func(func(x1) + func(x2) + func(x3));
*p = func(func(*p) * 13);
return func(func(x) + 42);
}
EOF
CC="gcc -c -I. -Iinclude -I${srcdir}/include -Os -falign-jumps=1 -falign-loops=1 -falign-functions=1 -mregparm=3 -mrtd $*"
for bits in 16 32 64; do
${CC} -Dtype=grub_uint${bits}_t -Dinlf=grub_swap_bytes${bits} -o test${bits}i.o test.c
si=$(size test${bits}i.o | sed -n '2s,^ *\([0-9]*\).*$,\1,p')
${CC} -Dtype=grub_uint${bits}_t -Dextf=grub_swap_bytes${bits}_f -o test${bits}f.o test.c
sf=$(size test${bits}f.o | sed -n '2s,^ *\([0-9]*\).*$,\1,p')
echo "/* ${bits} bit: inline=$si, function=$sf */"
[ $si -gt $sf ] && echo "#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES${bits} 1"
#rm -f test${bits}{i,f}.o
done
rm -f test.c
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-08 18:04 ` Christian Franke
@ 2008-07-09 6:47 ` Pavel Roskin
2008-07-09 12:50 ` Christian Franke
0 siblings, 1 reply; 20+ messages in thread
From: Pavel Roskin @ 2008-07-09 6:47 UTC (permalink / raw)
To: The development of GRUB 2
On Tue, 2008-07-08 at 20:04 +0200, Christian Franke wrote:
> With old gcc versions without the "rol" optimization, even the 16 bit
> swap should be a function:
Or better yet, an asm statement.
We should consider optimized assembly vs function call. Even the 32-bit
swap could be shorter:
a: 86 c4 xchg %al,%ah
c: c1 c0 10 rol $0x10,%eax
f: 86 c4 xchg %al,%ah
11:
That's 7 bytes! And if written properly, it could work with any of the
registers that allow access to the lower two bytes (%eax, %ebx, %ecx and
%edx), thus giving more flexibility to the optimizer.
--
Regards,
Pavel Roskin
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-09 6:47 ` Pavel Roskin
@ 2008-07-09 12:50 ` Christian Franke
2008-07-09 17:57 ` Pavel Roskin
2008-07-20 13:45 ` Christian Franke
0 siblings, 2 replies; 20+ messages in thread
From: Christian Franke @ 2008-07-09 12:50 UTC (permalink / raw)
To: The development of GRUB 2
Pavel Roskin wrote:
> On Tue, 2008-07-08 at 20:04 +0200, Christian Franke wrote:
>
> > With old gcc versions without the "rol" optimization, even the 16
> > bit swap should be a function:
> >
>
> Or better yet, an asm statement.
>
> We should consider optimized assembly vs function call. Even the
> 32-bit swap could be shorter:
>
> a: 86 c4 xchg %al,%ah
> c: c1 c0 10 rol $0x10,%eax
> f: 86 c4 xchg %al,%ah
> 11:
>
> That's 7 bytes! ...
But the function call in the 32-bit case requires only 5 bytes :-)
(The 16-bit case requires more due to short->int propagation)
Overall size from inline asm would only be smaller if there is any
benefit from additional optimizations.
Here is a possibly working draft for include/grub/i386/types.h (requires
a #ifdef in include/grub/types.h):
#define grub_swap_bytes32(x) \
({ \
grub_uint32_t _x = (x), _y;
asm ( \
"xchg %%al,%%ah\n" \
"roll $0x10,%%eax\n" \
"xchg %%al,%%ah\n" \
: "=a"(_y) : "0"(_x) \
); \
_y; \
})
Result with the test script from my last mail:
Debian gcc 4.1.2-7:
inline (portable)=357, inline (asm)=126, function=104
Cygwin gcc 3.4.4:
inline (portable)=340, inline (asm)=124, function=96
Function call is still better. The only candidate for inline is probably
grub_swap_bytes16().
> .... And if written properly, it could work with any of
> the registers that allow access to the lower two bytes (%eax, %ebx,
> %ecx and %edx), thus giving more flexibility to the optimizer.
>
This would require support to access the Rl and Rh parts of eRx for each
R in [a-d]. Something like:
asm (
"xchg %0:l,%0:h\n"
"roll $0x10,%0\n"
"xchg %0:l,%0:h\n"
: "=r"(_y) : "0"(_x) \
);
Do gcc or gas provide a syntax to do this?
Christian
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-09 12:50 ` Christian Franke
@ 2008-07-09 17:57 ` Pavel Roskin
2008-07-10 19:25 ` Christian Franke
2008-07-20 13:45 ` Christian Franke
1 sibling, 1 reply; 20+ messages in thread
From: Pavel Roskin @ 2008-07-09 17:57 UTC (permalink / raw)
To: The development of GRUB 2
On Wed, 2008-07-09 at 14:50 +0200, Christian Franke wrote:
> Overall size from inline asm would only be smaller if there is any
> benefit from additional optimizations.
Correct. Function calls use fixed registers for everything, but macros
don't have to.
> Result with the test script from my last mail:
>
> Debian gcc 4.1.2-7:
> inline (portable)=357, inline (asm)=126, function=104
>
> Cygwin gcc 3.4.4:
> inline (portable)=340, inline (asm)=124, function=96
>
> Function call is still better. The only candidate for inline is probably
> grub_swap_bytes16().
But we are getting closer!
> > .... And if written properly, it could work with any of
> > the registers that allow access to the lower two bytes (%eax, %ebx,
> > %ecx and %edx), thus giving more flexibility to the optimizer.
> >
>
> This would require support to access the Rl and Rh parts of eRx for each
> R in [a-d]. Something like:
>
> asm (
> "xchg %0:l,%0:h\n"
> "roll $0x10,%0\n"
> "xchg %0:l,%0:h\n"
> : "=r"(_y) : "0"(_x) \
> );
>
> Do gcc or gas provide a syntax to do this?
Yes. That's %b0 and %h0. Use "=q" for all registers with "upper
halves" (%ah-%dh).
Also, please be consistent and use width suffixes everywhere. xchg
should be xchgb.
--
Regards,
Pavel Roskin
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-09 17:57 ` Pavel Roskin
@ 2008-07-10 19:25 ` Christian Franke
2008-07-10 19:59 ` Pavel Roskin
2008-07-11 8:53 ` Pavel Roskin
0 siblings, 2 replies; 20+ messages in thread
From: Christian Franke @ 2008-07-10 19:25 UTC (permalink / raw)
To: The development of GRUB 2
Pavel Roskin wrote:
> On Wed, 2008-07-09 at 14:50 +0200, Christian Franke wrote:
>
>
>> Result with the test script from my last mail:
>>
>> Debian gcc 4.1.2-7:
>> inline (portable)=357, inline (asm)=126, function=104
>>
>> Cygwin gcc 3.4.4:
>> inline (portable)=340, inline (asm)=124, function=96
>>
>> Function call is still better. The only candidate for inline is probably
>> grub_swap_bytes16().
>>
>
> But we are getting closer!
>
>
But probably not close enough in the average case, see below :-)
>>> .... And if written properly, it could work with any of
>>> the registers that allow access to the lower two bytes (%eax, %ebx,
>>> %ecx and %edx), thus giving more flexibility to the optimizer.
>>>
>>>
>> This would require support to access the Rl and Rh parts of eRx for each
>> R in [a-d]. Something like:
>>
>> asm (
>> "xchg %0:l,%0:h\n"
>> "roll $0x10,%0\n"
>> "xchg %0:l,%0:h\n"
>> : "=r"(_y) : "0"(_x) \
>> );
>>
>> Do gcc or gas provide a syntax to do this?
>>
>
> Yes. That's %b0 and %h0. Use "=q" for all registers with "upper
> halves" (%ah-%dh).
>
>
Thanks for the info.
I tried this in the same script:
#define grub_swap_bytes32(x) \
({ \
grub_uint32_t _x = (x), _y; \
asm ( \
"xchgb %b0,%h0\n" \
"roll $0x10,%0\n" \
"xchgb %b0,%h0\n" \
: "=q"(_y) : "0"(_x) \
); \
_y; \
})
GCC optimizer does a good job optimizing register use, but function call
is still better:
Debian gcc 4.1.2-7:
inline (portable)=357,asm (%%eax)=126, asm (%0)=107, function=104
Cygwin gcc 3.4.4:
inline (portable)=340, asm (%%eax)=124, asm (%0)=104, function=96
Inline asm is only better is rare cases, e.g. with this test function:
type test(type *x)
{
return func(x[0]) + func(x[1]) + ... + func(x[N]);
}
Result with Cygwin gcc 3.4.4:
N=0: asm=14, function=11
N=1: asm=28, function=32
N=2: asm=40, function=41
N=3: asm=52, function=51
N=4: asm=64, function=61
N=5: asm=76, function=72
To test which files would possibly be affected by any size optimization
of grub_swap_bytes*(), I 've done a test compilation with these macros
replaced by dummies '(x)'. Only the size of following modules changed:
affs.mod
afs.mod
amiga.mod
apple.mod
ata.mod
hfs.mod
hfsplus.mod
iso9660.mod
jpeg.mod
png.mod
sfs.mod
sun.mod
xfs.mod
Remaining modules and kernel.img are identical.
Christian
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-10 19:25 ` Christian Franke
@ 2008-07-10 19:59 ` Pavel Roskin
2008-07-11 7:06 ` Jordi Mallach
2008-07-11 8:53 ` Pavel Roskin
1 sibling, 1 reply; 20+ messages in thread
From: Pavel Roskin @ 2008-07-10 19:59 UTC (permalink / raw)
To: grub-devel
Quoting Christian Franke <Christian.Franke@t-online.de>:
> GCC optimizer does a good job optimizing register use, but function
> call is still better:
>
> Debian gcc 4.1.2-7:
> inline (portable)=357,asm (%%eax)=126, asm (%0)=107, function=104
>
> Cygwin gcc 3.4.4:
> inline (portable)=340, asm (%%eax)=124, asm (%0)=104, function=96
Thank you for testing! We could do better with the bswap instruction,
but it appeared in 486. I think we still need to support 386.
I understand you are counting the function body, so the
calling/inlining code is bigger by more than just 3 or 8 bytes. If
the code grows, so will the number of the calls or the inline sites.
Yet the function size will remain the same.
Chances are that if we use 3 "rol"s and allow all general purpose
registers, we may save a couple more bytes, because the assembler will
still convert rol to xchg when possible, whereas gcc will be able to
use non-e*x registers if needed.
But I don't want to ask you to spend any more time on it. I think
we'll have to go with the function for 32 bit.
--
Regards,
Pavel Roskin
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-10 19:59 ` Pavel Roskin
@ 2008-07-11 7:06 ` Jordi Mallach
0 siblings, 0 replies; 20+ messages in thread
From: Jordi Mallach @ 2008-07-11 7:06 UTC (permalink / raw)
To: grub-devel
On Thu, Jul 10, 2008 at 03:59:14PM -0400, Pavel Roskin wrote:
> Thank you for testing! We could do better with the bswap instruction,
> but it appeared in 486. I think we still need to support 386.
Do we, or better yet, can we? I thought most distros had dropped 386
support from kernels and toolchains.
At least Debian has, and Debian isn't know for dropping old hardware
that easily. We even had libc5 toolchain until before the etch release,
iirc. :)
--
Jordi Mallach Pérez -- Debian developer http://www.debian.org/
jordi@sindominio.net jordi@debian.org http://www.sindominio.net/
GnuPG public key information available at http://oskuro.net/
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-10 19:25 ` Christian Franke
2008-07-10 19:59 ` Pavel Roskin
@ 2008-07-11 8:53 ` Pavel Roskin
2008-07-11 9:07 ` Pavel Roskin
1 sibling, 1 reply; 20+ messages in thread
From: Pavel Roskin @ 2008-07-11 8:53 UTC (permalink / raw)
To: The development of GRUB 2
Quoting Christian Franke <Christian.Franke@t-online.de>:
> #define grub_swap_bytes32(x) \
> ({ \
> grub_uint32_t _x = (x), _y; \
> asm ( \
> "xchgb %b0,%h0\n" \
> "roll $0x10,%0\n" \
> "xchgb %b0,%h0\n" \
> : "=q"(_y) : "0"(_x) \
> ); \
> _y; \
> })
Results are much better for the inline implementation with gcc 4.3.1
(Fedora development AKA Rawhide).
I tried the idea I suggested (let gcc use any register and optimize in
the assembler). The result was a true monstrosity, and it failed to
beat your approach, where gcc is forced to use the first four
registers. The size of all modules together increased by 8 bytes.
However, this code beats everything in my tests by a wide margin:
#define grub_swap_lo(x) \
({ \
grub_uint32_t _x = (x), _y; \
asm ( \
"xchgb %b0,%h0\n" \
: "=q"(_y) : "0"(_x) \
); \
_y; \
})
#define grub_swap_hi(x) \
({ \
grub_uint32_t _x = (x), _y; \
asm ( \
"roll $0x10,%0\n" \
: "=r"(_y) : "0"(_x) \
); \
_y; \
})
#define grub_swap_bytes32(x) grub_swap_lo(grub_swap_hi(grub_swap_lo(x)))
It needs some cleanups to avoid warnings and use better names, but I
think it should go in. Telling gcc the little details really helps.
Sum of module sizes ("du -b -S"):
366976 original
364988 "=q", xchgb
365460 "=q", rolw $8,%w0
365420 "=r", rolw $8,%w0
364996 monstrosity ("=r", conditional xchgb/rolw)
364728 "little details"
With my compiler, everything beats the code we have.
--
Regards,
Pavel Roskin
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-11 8:53 ` Pavel Roskin
@ 2008-07-11 9:07 ` Pavel Roskin
2008-07-11 13:21 ` Christian Franke
0 siblings, 1 reply; 20+ messages in thread
From: Pavel Roskin @ 2008-07-11 9:07 UTC (permalink / raw)
To: grub-devel
Quoting Pavel Roskin <proski@gnu.org>:
> 366976 original
> 364988 "=q", xchgb
> 365460 "=q", rolw $8,%w0
> 365420 "=r", rolw $8,%w0
> 364996 monstrosity ("=r", conditional xchgb/rolw)
> 364728 "little details"
Marginally better (by just 12 bytes):
364716 "little details" + "monstrosity"
#define grub_swap_lo(x) \
({ \
grub_uint32_t _x = (x), _y; \
asm ( \
".ifc %0,%%eax\n" \
"xchgb %b0,%h0\n" \
".else\n" \
".ifc %0,%%ebx\n" \
"xchgb %b0,%h0\n" \
".else\n" \
".ifc %0,%%ecx\n" \
"xchgb %b0,%h0\n" \
".else\n" \
".ifc %0,%%edx\n" \
"xchgb %b0,%h0\n" \
".else\n" \
"rolw $8,%w0\n" \
".endif\n" \
".endif\n" \
".endif\n" \
".endif\n" \
: "=r"(_y) : "0"(_x) \
); \
_y; \
})
#define grub_swap_hi(x) \
({ \
grub_uint32_t _x = (x), _y; \
asm ( \
"roll $0x10,%0\n" \
: "=r"(_y) : "0"(_x) \
); \
_y; \
})
#define grub_swap_bytes32(x) grub_swap_lo(grub_swap_hi(grub_swap_lo(x)))
--
Regards,
Pavel Roskin
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-11 9:07 ` Pavel Roskin
@ 2008-07-11 13:21 ` Christian Franke
2008-07-11 18:33 ` Pavel Roskin
0 siblings, 1 reply; 20+ messages in thread
From: Christian Franke @ 2008-07-11 13:21 UTC (permalink / raw)
To: The development of GRUB 2
[-- Attachment #1: Type: text/plain, Size: 738 bytes --]
Pavel Roskin wrote:
> Quoting Pavel Roskin <proski@gnu.org>:
>
>> 366976 original
>> 364988 "=q", xchgb
>> 365460 "=q", rolw $8,%w0
>> 365420 "=r", rolw $8,%w0
>> 364996 monstrosity ("=r", conditional xchgb/rolw)
>> 364728 "little details"
>
> Marginally better (by just 12 bytes):
> 364716 "little details" + "monstrosity"
>
Very nice optimization idea!
The practical advantage is probably limited by the fact that kernel.img
and most of the modules typically included in a i386-pc core.img are not
affected.
I attached a diff of "size *.mod" outputs for the test mentioned in my
last mail. '-' original code, '+' grub_swap_*() replaced by dummies.
Large savings potential is in afs.mod, hfs*.mod, and xfs.mod.
Christian
[-- Attachment #2: sizediff.txt --]
[-- Type: text/plain, Size: 4751 bytes --]
--- sizes.original 2008-07-10 20:33:57.968750000 +0200
+++ sizes.noswap 2008-07-10 20:34:03.046875000 +0200
@@ -1,78 +1,78 @@
text data bss dec hex filename
4352 24 160 4536 11b8 _bsd.mod
564 8 48 620 26c _chain.mod
2128 8 48 2184 888 _linux.mod
6528 48 128 6704 1a30 _multiboot.mod
820 28 0 848 350 acorn.mod
- 2552 44 16 2612 a34 affs.mod
- 5532 40 16 5588 15d4 afs.mod
- 960 28 16 1004 3ec amiga.mod
+ 2136 44 16 2196 894 affs.mod
+ 2656 40 16 2712 a98 afs.mod
+ 692 28 16 736 2e0 amiga.mod
212 8 0 220 dc aout.mod
- 1000 28 16 1044 414 apple.mod
- 3028 36 16 3080 c08 ata.mod
+ 888 28 16 932 3a4 apple.mod
+ 2948 36 16 3000 bb8 ata.mod
1756 44 16 1816 718 biosdisk.mod
876 8 16 900 384 bitmap.mod
764 20 0 784 310 blocklist.mod
148 16 0 164 a4 boot.mod
628 16 0 644 284 bsd.mod
424 16 0 440 1b8 cat.mod
340 24 0 364 16c chain.mod
684 16 0 700 2bc cmp.mod
396 20 0 416 1a0 configfile.mod
1228 36 16 1280 500 cpio.mod
332 16 16 364 16c cpuid.mod
620 16 0 636 27c echo.mod
2292 12 0 2304 900 elf.mod
2044 44 16 2104 838 ext2.mod
2816 32 16 2864 b30 fat.mod
832 32 16 880 370 font.mod
1208 8 0 1216 4c0 fshelp.mod
3480 132 224 3836 efc gfxterm.mod
1032 32 16 1080 438 gpt.mod
5808 488 0 6296 1898 gzio.mod
284 16 0 300 12c halt.mod
116 16 0 132 84 hello.mod
480 16 0 496 1f0 help.mod
1220 20 0 1240 4d8 hexdump.mod
- 4252 32 64 4348 10fc hfs.mod
- 4184 44 16 4244 1094 hfsplus.mod
- 3128 44 16 3188 c74 iso9660.mod
+ 3220 32 64 3316 cf4 hfs.mod
+ 2628 44 16 2688 a80 hfsplus.mod
+ 3108 44 16 3168 c60 iso9660.mod
2916 32 16 2964 b94 jfs.mod
- 3968 40 0 4008 fa8 jpeg.mod
+ 3956 40 0 3996 f9c jpeg.mod
232 24 0 256 100 linux.mod
1076 52 16 1144 478 loopback.mod
1452 12 0 1464 5b8 ls.mod
2244 20 0 2264 8d8 lspci.mod
3260 36 32 3328 d00 lvm.mod
484 40 32 556 22c memdisk.mod
2512 36 16 2564 a04 minix.mod
264 32 0 296 128 multiboot.mod
25268 104 1936 27308 6aac normal.mod
5576 44 32 5652 1614 ntfs.mod
1880 20 0 1900 76c ntfscomp.mod
1604 24 16 1644 66c pc.mod
188 4 0 192 c0 pci.mod
604 16 0 620 26c play.mod
- 4144 24 0 4168 1048 png.mod
+ 4108 24 0 4132 1024 png.mod
3312 40 16 3368 d28 raid.mod
276 16 0 292 124 read.mod
108 16 0 124 7c reboot.mod
7308 48 16 7372 1ccc reiserfs.mod
1164 16 0 1180 49c search.mod
2492 216 96 2804 af4 serial.mod
- 2284 40 16 2340 924 sfs.mod
+ 1836 40 16 1892 764 sfs.mod
540 16 32 588 24c sleep.mod
- 1056 24 16 1096 448 sun.mod
+ 948 24 16 988 3dc sun.mod
416 20 0 436 1b4 terminal.mod
5188 28 512 5728 1660 terminfo.mod
264 16 0 280 118 test.mod
1468 24 0 1492 5d4 tga.mod
2724 52 16 2792 ae8 udf.mod
2936 52 16 3004 bbc ufs.mod
8832 204 1952 10988 2aec vbe.mod
624 20 0 644 284 vbeinfo.mod
1172 20 0 1192 4a8 vbetest.mod
1668 108 13584 15360 3c00 vga.mod
1284 8 32 1324 52c video.mod
732 32 0 764 2fc videotest.mod
- 4980 40 16 5036 13ac xfs.mod
+ 2924 40 16 2980 ba4 xfs.mod
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-11 13:21 ` Christian Franke
@ 2008-07-11 18:33 ` Pavel Roskin
0 siblings, 0 replies; 20+ messages in thread
From: Pavel Roskin @ 2008-07-11 18:33 UTC (permalink / raw)
To: The development of GRUB 2
On Fri, 2008-07-11 at 15:21 +0200, Christian Franke wrote:
> Very nice optimization idea!
>
> The practical advantage is probably limited by the fact that kernel.img
> and most of the modules typically included in a i386-pc core.img are not
> affected.
>
> I attached a diff of "size *.mod" outputs for the test mentioned in my
> last mail. '-' original code, '+' grub_swap_*() replaced by dummies.
> Large savings potential is in afs.mod, hfs*.mod, and xfs.mod.
...
> - 4980 40 16 5036 13ac xfs.mod
> + 2924 40 16 2980 ba4 xfs.mod
It is a big deal for xfs users. Even more so for gpt+raid+lvm+xfs
users, who are against the wall :-)
--
Regards,
Pavel Roskin
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-09 12:50 ` Christian Franke
2008-07-09 17:57 ` Pavel Roskin
@ 2008-07-20 13:45 ` Christian Franke
2008-07-20 23:31 ` Pavel Roskin
1 sibling, 1 reply; 20+ messages in thread
From: Christian Franke @ 2008-07-20 13:45 UTC (permalink / raw)
To: The development of GRUB 2
Christian Franke wrote:
> Pavel Roskin wrote:
>
>> On Tue, 2008-07-08 at 20:04 +0200, Christian Franke wrote:
>>
>>
>>> With old gcc versions without the "rol" optimization, even the 16
>>> bit swap should be a function:
>>>
>>>
>> Or better yet, an asm statement.
>>
>> We should consider optimized assembly vs function call. Even the
>> 32-bit swap could be shorter:
>>
>> a: 86 c4 xchg %al,%ah
>> c: c1 c0 10 rol $0x10,%eax
>> f: 86 c4 xchg %al,%ah
>> 11:
>>
>> That's 7 bytes! ...
>>
>
> But the function call in the 32-bit case requires only 5 bytes :-)
>
Sorry, I was wrong here. The assumption about function call size was
only true for module-local calls. If a module calls a function in
kernel, each 5 byte call requires another 8 bytes for the ELF relocation
table entry.
Christian
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Endianness macros capitalization
2008-07-20 13:45 ` Christian Franke
@ 2008-07-20 23:31 ` Pavel Roskin
0 siblings, 0 replies; 20+ messages in thread
From: Pavel Roskin @ 2008-07-20 23:31 UTC (permalink / raw)
To: The development of GRUB 2
On Sun, 2008-07-20 at 15:45 +0200, Christian Franke wrote:
> Christian Franke wrote:
> > But the function call in the 32-bit case requires only 5 bytes :-)
> >
>
> Sorry, I was wrong here. The assumption about function call size was
> only true for module-local calls. If a module calls a function in
> kernel, each 5 byte call requires another 8 bytes for the ELF relocation
> table entry.
Actually, the new versions of gcc have __builtin_bswap32 and
__builtin_bswap64, which are optimized even better. There is no
__builtin_bswap16 because it is said that any correct implementation
will be optimized anyway:
http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00079.html
gcc 4.3.0 from Fedora is working fine with them. I'm not sure about gcc
4.2. There is only one bug. Suppose I use this:
#define grub_swap_bytes32(x) __builtin_bswap32(x)
#define grub_swap_bytes64(x) __builtin_bswap64(x)
Then I get this warning:
partmap/apple.c: In function 'apple_partition_map_iterate':
partmap/apple.c:133: warning: format '%x' expects type 'unsigned int',
but argument 8 has type 'unsigned int'
partmap/apple.c:133: warning: format '%x' expects type 'unsigned int',
but argument 9 has type 'unsigned int'
Those arguments are produced by grub_swap_bytes32(). But if I use
wrapper functions, then there are no warnings, and the code side is
approximately the same. We'll need wrappers to support sparse.
--
Regards,
Pavel Roskin
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2008-07-20 23:31 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-05 13:27 Endianness macros capitalization Javier Martín
2008-07-05 21:30 ` Pavel Roskin
2008-07-05 22:54 ` Javier Martín
2008-07-05 23:14 ` Pavel Roskin
2008-07-06 18:30 ` Robert Millan
2008-07-06 20:02 ` Javier Martín
2008-07-07 19:25 ` Christian Franke
2008-07-08 18:04 ` Christian Franke
2008-07-09 6:47 ` Pavel Roskin
2008-07-09 12:50 ` Christian Franke
2008-07-09 17:57 ` Pavel Roskin
2008-07-10 19:25 ` Christian Franke
2008-07-10 19:59 ` Pavel Roskin
2008-07-11 7:06 ` Jordi Mallach
2008-07-11 8:53 ` Pavel Roskin
2008-07-11 9:07 ` Pavel Roskin
2008-07-11 13:21 ` Christian Franke
2008-07-11 18:33 ` Pavel Roskin
2008-07-20 13:45 ` Christian Franke
2008-07-20 23:31 ` Pavel Roskin
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.