All of lore.kernel.org
 help / color / mirror / Atom feed
* Endianness macros capitalization
@ 2008-07-05 13:27 Javier Martín
  2008-07-05 21:30 ` Pavel Roskin
  0 siblings, 1 reply; 20+ messages in thread
From: Javier Martín @ 2008-07-05 13:27 UTC (permalink / raw)
  To: The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 254 bytes --]

Just my two (euro) cents: why are the endianness macros written like
functions? I'm talking about the grub_Xe_to_cpuNN family, which look
like function calls instead of the macros they are. Shouldn't they be
capitalized to GRUB_LE_TO_CPU32 and such?

[-- Attachment #2: Esta parte del mensaje está firmada digitalmente --]
[-- Type: application/pgp-signature, Size: 827 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-05 13:27 Endianness macros capitalization Javier Martín
@ 2008-07-05 21:30 ` Pavel Roskin
  2008-07-05 22:54   ` Javier Martín
  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Roskin @ 2008-07-05 21:30 UTC (permalink / raw)
  To: The development of GRUB 2

On Sat, 2008-07-05 at 15:27 +0200, Javier Martín wrote:
> Just my two (euro) cents: why are the endianness macros written like
> functions? I'm talking about the grub_Xe_to_cpuNN family, which look
> like function calls instead of the macros they are. Shouldn't they be
> capitalized to GRUB_LE_TO_CPU32 and such?

They probably should be functions.  We may want to sparse annotate GRUB
one day, and then inline functions in the only way to go.

I prefer capitalized names only for macros that cannot be functions or
have non-trivial size effects.

-- 
Regards,
Pavel Roskin



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-05 21:30 ` Pavel Roskin
@ 2008-07-05 22:54   ` Javier Martín
  2008-07-05 23:14     ` Pavel Roskin
  2008-07-06 18:30     ` Robert Millan
  0 siblings, 2 replies; 20+ messages in thread
From: Javier Martín @ 2008-07-05 22:54 UTC (permalink / raw)
  To: The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 746 bytes --]

El sáb, 05-07-2008 a las 17:30 -0400, Pavel Roskin escribió:
> They probably should be functions.  We may want to sparse annotate GRUB
> one day, and then inline functions in the only way to go.
Hmm... you mean changing this

#define grub_swap_bytes16(x)	\
({ \
   grub_uint16_t _x = (x); \
   (grub_uint16_t) ((_x << 8) | (_x >> 8)); \
})

...for this

inline grub_uint16_t grub_swap_bytes16(uint16_t x)
{
  return (x << 8) | (x >> 8);
}

and such? The pro is that we get rid of the ugly hack in the macro
version that ensures single evaluation, but a con is that we cannot
_force_ any random compiler to inline anything, so we might end up with
gross numbers of function calls.

By the way, what is "sparse annotate"?

[-- Attachment #2: Esta parte del mensaje está firmada digitalmente --]
[-- Type: application/pgp-signature, Size: 827 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-05 22:54   ` Javier Martín
@ 2008-07-05 23:14     ` Pavel Roskin
  2008-07-06 18:30     ` Robert Millan
  1 sibling, 0 replies; 20+ messages in thread
From: Pavel Roskin @ 2008-07-05 23:14 UTC (permalink / raw)
  To: The development of GRUB 2

On Sun, 2008-07-06 at 00:54 +0200, Javier Martín wrote:
> El sáb, 05-07-2008 a las 17:30 -0400, Pavel Roskin escribió:
> > They probably should be functions.  We may want to sparse annotate GRUB
> > one day, and then inline functions in the only way to go.
> Hmm... you mean changing this
> 
> #define grub_swap_bytes16(x)	\
> ({ \
>    grub_uint16_t _x = (x); \
>    (grub_uint16_t) ((_x << 8) | (_x >> 8)); \
> })
> 
> ...for this
> 
> inline grub_uint16_t grub_swap_bytes16(uint16_t x)
> {
>   return (x << 8) | (x >> 8);
> }
> 
> and such?

Actually, grub_swap_bytes16 should be avoided in the general code.  We
need to convert macros like grub_cpu_to_le16() to functions.

>  The pro is that we get rid of the ugly hack in the macro
> version that ensures single evaluation, but a con is that we cannot
> _force_ any random compiler to inline anything, so we might end up with
> gross numbers of function calls.

We can force inlining in gcc with always_inline.  But we don't need to
force anything.  Modern versions of gcc can decide for us, but we can
influence the choice by using optimization flags, such as -O2 and -Os.
In come cases, function calls would take less space than the actual code
for byte swapping.

> By the way, what is "sparse annotate"?

There is a utility called sparse:
git://git.kernel.org/pub/scm/linux/kernel/git/josh/sparse.git

It acts like a compiler, but it only produces warnings.  It can
distinguish bitwise types that cannot be treated like normal integers
for arithmetic operations like addition.

If we mark little-endian and big-endian variables as bitwise, sparse
will warn if such variables are used without byte swapping.  Linux uses
it a lot.  Sparse is really a very good checker for that purpose.
Sparse does other checks too.

-- 
Regards,
Pavel Roskin



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-05 22:54   ` Javier Martín
  2008-07-05 23:14     ` Pavel Roskin
@ 2008-07-06 18:30     ` Robert Millan
  2008-07-06 20:02       ` Javier Martín
  1 sibling, 1 reply; 20+ messages in thread
From: Robert Millan @ 2008-07-06 18:30 UTC (permalink / raw)
  To: The development of GRUB 2

On Sun, Jul 06, 2008 at 12:54:58AM +0200, Javier Martín wrote:
> El sáb, 05-07-2008 a las 17:30 -0400, Pavel Roskin escribió:
> > They probably should be functions.  We may want to sparse annotate GRUB
> > one day, and then inline functions in the only way to go.
> Hmm... you mean changing this
> 
> #define grub_swap_bytes16(x)	\
> ({ \
>    grub_uint16_t _x = (x); \
>    (grub_uint16_t) ((_x << 8) | (_x >> 8)); \
> })
> 
> ...for this
> 
> inline grub_uint16_t grub_swap_bytes16(uint16_t x)
> {
>   return (x << 8) | (x >> 8);
> }

I know I get to be annoying about this, but which of these two (plus the
non-inline version) would result in _smaller_ code?

Function calls on i386-pc are cheap (because we use the regparm hack), so
maybe it'd work better using normal functions.

-- 
Robert Millan

<GPLv2> I know my rights; I want my phone call!
<DRM> What good is a phone call… if you are unable to speak?
(as seen on /.)



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-06 18:30     ` Robert Millan
@ 2008-07-06 20:02       ` Javier Martín
  2008-07-07 19:25         ` Christian Franke
  0 siblings, 1 reply; 20+ messages in thread
From: Javier Martín @ 2008-07-06 20:02 UTC (permalink / raw)
  To: The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 1090 bytes --]

El dom, 06-07-2008 a las 20:30 +0200, Robert Millan escribió:
> On Sun, Jul 06, 2008 at 12:54:58AM +0200, Javier Martín wrote:
> > El sáb, 05-07-2008 a las 17:30 -0400, Pavel Roskin escribió:
> > > They probably should be functions.  We may want to sparse annotate GRUB
> > > one day, and then inline functions in the only way to go.
> > Hmm... you mean changing this
> > 
> > #define grub_swap_bytes16(x)	\
> > ({ \
> >    grub_uint16_t _x = (x); \
> >    (grub_uint16_t) ((_x << 8) | (_x >> 8)); \
> > })
> > 
> > ...for this
> > 
> > inline grub_uint16_t grub_swap_bytes16(uint16_t x)
> > {
> >   return (x << 8) | (x >> 8);
> > }
> 
> I know I get to be annoying about this, but which of these two (plus the
> non-inline version) would result in _smaller_ code?
> 
> Function calls on i386-pc are cheap (because we use the regparm hack), so
> maybe it'd work better using normal functions.
> 
If we are to take the space-saving route, the best we can do is turn
them to functions, maybe even _without_ the "inline" keyword, and GCC
will do what's best.

[-- Attachment #2: Esta parte del mensaje está firmada digitalmente --]
[-- Type: application/pgp-signature, Size: 827 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-06 20:02       ` Javier Martín
@ 2008-07-07 19:25         ` Christian Franke
  2008-07-08 18:04           ` Christian Franke
  0 siblings, 1 reply; 20+ messages in thread
From: Christian Franke @ 2008-07-07 19:25 UTC (permalink / raw)
  To: The development of GRUB 2

Javier Martín wrote:
> El dom, 06-07-2008 a las 20:30 +0200, Robert Millan escribió:
>   
>> On Sun, Jul 06, 2008 at 12:54:58AM +0200, Javier Martín wrote:
>>     
>>> El sáb, 05-07-2008 a las 17:30 -0400, Pavel Roskin escribió:
>>>       
>>>> They probably should be functions.  We may want to sparse annotate GRUB
>>>> one day, and then inline functions in the only way to go.
>>>>         
>>> Hmm... you mean changing this
>>>
>>> #define grub_swap_bytes16(x)	\
>>> ({ \
>>>    grub_uint16_t _x = (x); \
>>>    (grub_uint16_t) ((_x << 8) | (_x >> 8)); \
>>> })
>>>
>>> ...for this
>>>
>>> inline grub_uint16_t grub_swap_bytes16(uint16_t x)
>>> {
>>>   return (x << 8) | (x >> 8);
>>> }
>>>       
>> I know I get to be annoying about this, but which of these two (plus the
>> non-inline version) would result in _smaller_ code?
>>
>> Function calls on i386-pc are cheap (because we use the regparm hack), so
>> maybe it'd work better using normal functions.
>>
>>     

Assembly code for grub_swap_bytes16 from Debian gcc 4.1.2-7:

Macro or Inline: 4 bytes (minus possible additional benefit from 
register level optimizations)

66 c1 c0 08       rol    $0x8,%ax


Function call:  11 bytes

0f b7 c0          movzwl %ax,%eax
e8 xx xx xx xx    call  grub_swap_bytes16
0f b7 c0          movzwl %ax,%eax

The break even is possibly at grub_swap_bytes64() :-)

> If we are to take the space-saving route, the best we can do is turn
> them to functions, maybe even _without_ the "inline" keyword, and GCC
> will do what's best.
>   
>   

How can this be accomplished for functions in include files?

If non-inline functions are declared non-static in an include file, 
duplicate symbols will result (Even with 'inline' you have to be careful 
due to different inline modes).

If declared static and GCC decides to emit a function instead of inline, 
the code will be duplicated in all modules which use this function.

AFAIK, GCC supports 'vague linkage' code (each put in an extra section 
with '.linkonce' attribute) only for C++, but not for C. Or is there a 
GCC specific function attribute to accomplish this?


Christian




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-07 19:25         ` Christian Franke
@ 2008-07-08 18:04           ` Christian Franke
  2008-07-09  6:47             ` Pavel Roskin
  0 siblings, 1 reply; 20+ messages in thread
From: Christian Franke @ 2008-07-08 18:04 UTC (permalink / raw)
  To: The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 1633 bytes --]

Christian Franke wrote:
>>>  
>
> Assembly code for grub_swap_bytes16 from Debian gcc 4.1.2-7:
>
> Macro or Inline: 4 bytes (minus possible additional benefit from 
> register level optimizations)
>
> 66 c1 c0 08       rol    $0x8,%ax
>
>
> Function call:  11 bytes
>
> 0f b7 c0          movzwl %ax,%eax
> e8 xx xx xx xx    call  grub_swap_bytes16
> 0f b7 c0          movzwl %ax,%eax
>
> The break even is possibly at grub_swap_bytes64() :-)
>

I take that back. For i386, even grub_swap_bytes32() should possibly be 
a function.

The attached script compares the sizes of 8 inline expansions of 
grub_swap_bytesNN() vs. function calls.

Sample output (Debian gcc 4.1.2-7):
/* 16 bit: inline=88, function=128 */
/* 32 bit: inline=357, function=104 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES32 1
/* 64 bit: inline=2621, function=167 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES64 1

With old gcc versions without the "rol" optimization, even the 16 bit 
swap should be a function:

(Cygwin gcc 3.4.4):
/* 16 bit: inline=148, function=116 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES16 1
/* 32 bit: inline=340, function=96 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES32 1
/* 64 bit: inline=2876, function=164 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES64 1


Interestingly, the 64bit inline result is much smaller if 
'-fomit-frame-pointer' is added:

(Cygwin gcc 3.4.4):
/* 16 bit: inline=144, function=112 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES16 1
/* 32 bit: inline=336, function=92 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES32 1
/* 64 bit: inline=1372, function=160 */
#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES64 1


Christian


[-- Attachment #2: testsizes.sh.txt --]
[-- Type: text/plain, Size: 996 bytes --]

#!/bin/sh

set -e

srcdir=..

cat <<EOF > test.c
#include <grub/types.h>

#ifdef extf
type extf(type);
#define func extf
#else
#define func inlf
#endif

extern type x1,x2,x3,x4;

type test(type x, type *p)
{
  x4 = func(func(x1) + func(x2) + func(x3));
  *p = func(func(*p) * 13);
  return func(func(x) + 42); 
}
EOF

CC="gcc -c -I. -Iinclude -I${srcdir}/include -Os -falign-jumps=1 -falign-loops=1 -falign-functions=1 -mregparm=3 -mrtd $*"

for bits in 16 32 64; do
  ${CC} -Dtype=grub_uint${bits}_t -Dinlf=grub_swap_bytes${bits}   -o test${bits}i.o test.c
  si=$(size test${bits}i.o | sed -n '2s,^ *\([0-9]*\).*$,\1,p')
  ${CC} -Dtype=grub_uint${bits}_t -Dextf=grub_swap_bytes${bits}_f -o test${bits}f.o test.c
  sf=$(size test${bits}f.o | sed -n '2s,^ *\([0-9]*\).*$,\1,p')
  echo "/* ${bits} bit: inline=$si, function=$sf */"
  [ $si -gt $sf ] && echo "#define GRUB_DONT_INLINE_GRUB_SWAP_BYTES${bits} 1"
  #rm -f test${bits}{i,f}.o
done

rm -f test.c


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-08 18:04           ` Christian Franke
@ 2008-07-09  6:47             ` Pavel Roskin
  2008-07-09 12:50               ` Christian Franke
  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Roskin @ 2008-07-09  6:47 UTC (permalink / raw)
  To: The development of GRUB 2

On Tue, 2008-07-08 at 20:04 +0200, Christian Franke wrote:

> With old gcc versions without the "rol" optimization, even the 16 bit 
> swap should be a function:

Or better yet, an asm statement.

We should consider optimized assembly vs function call.  Even the 32-bit
swap could be shorter:

   a:   86 c4                   xchg   %al,%ah
   c:   c1 c0 10                rol    $0x10,%eax
   f:   86 c4                   xchg   %al,%ah
  11:

That's 7 bytes!  And if written properly, it could work with any of the
registers that allow access to the lower two bytes (%eax, %ebx, %ecx and
%edx), thus giving more flexibility to the optimizer.

-- 
Regards,
Pavel Roskin



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-09  6:47             ` Pavel Roskin
@ 2008-07-09 12:50               ` Christian Franke
  2008-07-09 17:57                 ` Pavel Roskin
  2008-07-20 13:45                 ` Christian Franke
  0 siblings, 2 replies; 20+ messages in thread
From: Christian Franke @ 2008-07-09 12:50 UTC (permalink / raw)
  To: The development of GRUB 2

Pavel Roskin wrote:
> On Tue, 2008-07-08 at 20:04 +0200, Christian Franke wrote:
> 
> > With old gcc versions without the "rol" optimization, even the 16
> > bit swap should be a function:
> > 
> 
> Or better yet, an asm statement.
> 
> We should consider optimized assembly vs function call.  Even the
> 32-bit swap could be shorter:
> 
> a:   86 c4                   xchg   %al,%ah
> c:   c1 c0 10                rol    $0x10,%eax
> f:   86 c4                   xchg   %al,%ah
> 11:
> 
> That's 7 bytes!  ...

But the function call in the 32-bit case requires only 5 bytes :-)
(The 16-bit case requires more due to short->int propagation)

Overall size from inline asm would only be smaller if there is any
benefit from additional optimizations.

Here is a possibly working draft for include/grub/i386/types.h (requires
a #ifdef in include/grub/types.h):

#define grub_swap_bytes32(x) \
({ \
   grub_uint32_t _x = (x), _y;
   asm ( \
        "xchg %%al,%%ah\n" \
        "roll $0x10,%%eax\n" \
        "xchg %%al,%%ah\n" \
        : "=a"(_y) : "0"(_x) \
    ); \
    _y; \
})

Result with the test script from my last mail:

Debian gcc 4.1.2-7:
inline (portable)=357, inline (asm)=126, function=104

Cygwin gcc 3.4.4:
inline (portable)=340, inline (asm)=124, function=96

Function call is still better. The only candidate for inline is probably
 grub_swap_bytes16().

> .... And if written properly, it could work with any of
> the registers that allow access to the lower two bytes (%eax, %ebx,
> %ecx and %edx), thus giving more flexibility to the optimizer.
> 

This would require support to access the Rl and Rh parts of eRx for each
R in [a-d]. Something like:

   asm (
        "xchg %0:l,%0:h\n"
        "roll $0x10,%0\n"
        "xchg %0:l,%0:h\n"
        : "=r"(_y) : "0"(_x) \
    );

Do gcc or gas provide a syntax to do this?


Christian






^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-09 12:50               ` Christian Franke
@ 2008-07-09 17:57                 ` Pavel Roskin
  2008-07-10 19:25                   ` Christian Franke
  2008-07-20 13:45                 ` Christian Franke
  1 sibling, 1 reply; 20+ messages in thread
From: Pavel Roskin @ 2008-07-09 17:57 UTC (permalink / raw)
  To: The development of GRUB 2

On Wed, 2008-07-09 at 14:50 +0200, Christian Franke wrote:

> Overall size from inline asm would only be smaller if there is any
> benefit from additional optimizations.

Correct.  Function calls use fixed registers for everything, but macros
don't have to.

> Result with the test script from my last mail:
> 
> Debian gcc 4.1.2-7:
> inline (portable)=357, inline (asm)=126, function=104
> 
> Cygwin gcc 3.4.4:
> inline (portable)=340, inline (asm)=124, function=96
> 
> Function call is still better. The only candidate for inline is probably
>  grub_swap_bytes16().

But we are getting closer!

> > .... And if written properly, it could work with any of
> > the registers that allow access to the lower two bytes (%eax, %ebx,
> > %ecx and %edx), thus giving more flexibility to the optimizer.
> > 
> 
> This would require support to access the Rl and Rh parts of eRx for each
> R in [a-d]. Something like:
> 
>    asm (
>         "xchg %0:l,%0:h\n"
>         "roll $0x10,%0\n"
>         "xchg %0:l,%0:h\n"
>         : "=r"(_y) : "0"(_x) \
>     );
> 
> Do gcc or gas provide a syntax to do this?

Yes.  That's %b0 and %h0.  Use "=q" for all registers with "upper
halves" (%ah-%dh).

Also, please be consistent and use width suffixes everywhere.  xchg
should be xchgb.

-- 
Regards,
Pavel Roskin



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-09 17:57                 ` Pavel Roskin
@ 2008-07-10 19:25                   ` Christian Franke
  2008-07-10 19:59                     ` Pavel Roskin
  2008-07-11  8:53                     ` Pavel Roskin
  0 siblings, 2 replies; 20+ messages in thread
From: Christian Franke @ 2008-07-10 19:25 UTC (permalink / raw)
  To: The development of GRUB 2

Pavel Roskin wrote:
> On Wed, 2008-07-09 at 14:50 +0200, Christian Franke wrote:
>   
>
>> Result with the test script from my last mail:
>>
>> Debian gcc 4.1.2-7:
>> inline (portable)=357, inline (asm)=126, function=104
>>
>> Cygwin gcc 3.4.4:
>> inline (portable)=340, inline (asm)=124, function=96
>>
>> Function call is still better. The only candidate for inline is probably
>>  grub_swap_bytes16().
>>     
>
> But we are getting closer!
>
>   

But probably not close enough in the average case, see below :-)


>>> .... And if written properly, it could work with any of
>>> the registers that allow access to the lower two bytes (%eax, %ebx,
>>> %ecx and %edx), thus giving more flexibility to the optimizer.
>>>
>>>       
>> This would require support to access the Rl and Rh parts of eRx for each
>> R in [a-d]. Something like:
>>
>>    asm (
>>         "xchg %0:l,%0:h\n"
>>         "roll $0x10,%0\n"
>>         "xchg %0:l,%0:h\n"
>>         : "=r"(_y) : "0"(_x) \
>>     );
>>
>> Do gcc or gas provide a syntax to do this?
>>     
>
> Yes.  That's %b0 and %h0.  Use "=q" for all registers with "upper
> halves" (%ah-%dh).
>
>   

Thanks for the info.

I tried this in the same script:

#define grub_swap_bytes32(x) \
({ \
   grub_uint32_t _x = (x), _y; \
   asm ( \
        "xchgb %b0,%h0\n" \
        "roll $0x10,%0\n" \
        "xchgb %b0,%h0\n" \
        : "=q"(_y) : "0"(_x) \
    ); \
    _y; \
})


GCC optimizer does a good job optimizing register use, but function call 
is still better:

Debian gcc 4.1.2-7:
inline (portable)=357,asm (%%eax)=126, asm (%0)=107, function=104

Cygwin gcc 3.4.4:
inline (portable)=340, asm (%%eax)=124, asm (%0)=104, function=96


Inline asm is only better is rare cases, e.g. with this test function:

type test(type *x)
{
  return func(x[0]) + func(x[1]) + ... + func(x[N]);
}

Result with Cygwin gcc 3.4.4:

N=0: asm=14, function=11
N=1: asm=28, function=32
N=2: asm=40, function=41
N=3: asm=52, function=51
N=4: asm=64, function=61
N=5: asm=76, function=72


To test which files would possibly be affected by any size optimization 
of grub_swap_bytes*(), I 've done a test compilation with these macros 
replaced by dummies '(x)'. Only the size of following modules changed:

affs.mod
afs.mod
amiga.mod
apple.mod
ata.mod
hfs.mod
hfsplus.mod
iso9660.mod
jpeg.mod
png.mod
sfs.mod
sun.mod
xfs.mod

Remaining modules and kernel.img are identical.

Christian




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-10 19:25                   ` Christian Franke
@ 2008-07-10 19:59                     ` Pavel Roskin
  2008-07-11  7:06                       ` Jordi Mallach
  2008-07-11  8:53                     ` Pavel Roskin
  1 sibling, 1 reply; 20+ messages in thread
From: Pavel Roskin @ 2008-07-10 19:59 UTC (permalink / raw)
  To: grub-devel

Quoting Christian Franke <Christian.Franke@t-online.de>:

> GCC optimizer does a good job optimizing register use, but function
> call is still better:
>
> Debian gcc 4.1.2-7:
> inline (portable)=357,asm (%%eax)=126, asm (%0)=107, function=104
>
> Cygwin gcc 3.4.4:
> inline (portable)=340, asm (%%eax)=124, asm (%0)=104, function=96

Thank you for testing!  We could do better with the bswap instruction,  
but it appeared in 486.  I think we still need to support 386.

I understand you are counting the function body, so the  
calling/inlining code is bigger by more than just 3 or 8 bytes.  If  
the code grows, so will the number of the calls or the inline sites.   
Yet the function size will remain the same.

Chances are that if we use 3 "rol"s and allow all general purpose  
registers, we may save a couple more bytes, because the assembler will  
still convert rol to xchg when possible, whereas gcc will be able to  
use non-e*x registers if needed.

But I don't want to ask you to spend any more time on it.  I think  
we'll have to go with the function for 32 bit.

-- 
Regards,
Pavel Roskin



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-10 19:59                     ` Pavel Roskin
@ 2008-07-11  7:06                       ` Jordi Mallach
  0 siblings, 0 replies; 20+ messages in thread
From: Jordi Mallach @ 2008-07-11  7:06 UTC (permalink / raw)
  To: grub-devel

On Thu, Jul 10, 2008 at 03:59:14PM -0400, Pavel Roskin wrote:
> Thank you for testing!  We could do better with the bswap instruction,  
> but it appeared in 486.  I think we still need to support 386.

Do we, or better yet, can we? I thought most distros had dropped 386
support from kernels and toolchains.

At least Debian has, and Debian isn't know for dropping old hardware
that easily. We even had libc5 toolchain until before the etch release,
iirc. :)

-- 
Jordi Mallach Pérez  --  Debian developer     http://www.debian.org/
jordi@sindominio.net     jordi@debian.org     http://www.sindominio.net/
GnuPG public key information available at http://oskuro.net/



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-10 19:25                   ` Christian Franke
  2008-07-10 19:59                     ` Pavel Roskin
@ 2008-07-11  8:53                     ` Pavel Roskin
  2008-07-11  9:07                       ` Pavel Roskin
  1 sibling, 1 reply; 20+ messages in thread
From: Pavel Roskin @ 2008-07-11  8:53 UTC (permalink / raw)
  To: The development of GRUB 2

Quoting Christian Franke <Christian.Franke@t-online.de>:

> #define grub_swap_bytes32(x) \
> ({ \
>   grub_uint32_t _x = (x), _y; \
>   asm ( \
>        "xchgb %b0,%h0\n" \
>        "roll $0x10,%0\n" \
>        "xchgb %b0,%h0\n" \
>        : "=q"(_y) : "0"(_x) \
>    ); \
>    _y; \
> })

Results are much better for the inline implementation with gcc 4.3.1  
(Fedora development AKA Rawhide).

I tried the idea I suggested (let gcc use any register and optimize in  
the assembler).  The result was a true monstrosity, and it failed to  
beat your approach, where gcc is forced to use the first four  
registers.  The size of all modules together increased by 8 bytes.

However, this code beats everything in my tests by a wide margin:

#define grub_swap_lo(x) \
({ \
   grub_uint32_t _x = (x), _y; \
   asm ( \
        "xchgb %b0,%h0\n" \
        : "=q"(_y) : "0"(_x) \
    ); \
    _y; \
})
#define grub_swap_hi(x) \
({ \
   grub_uint32_t _x = (x), _y; \
   asm ( \
        "roll $0x10,%0\n" \
        : "=r"(_y) : "0"(_x) \
    ); \
    _y; \
})
#define grub_swap_bytes32(x) grub_swap_lo(grub_swap_hi(grub_swap_lo(x)))

It needs some cleanups to avoid warnings and use better names, but I  
think it should go in.  Telling gcc the little details really helps.

Sum of module sizes ("du -b -S"):

366976  original
364988  "=q", xchgb
365460  "=q", rolw $8,%w0
365420  "=r", rolw $8,%w0
364996  monstrosity ("=r", conditional xchgb/rolw)
364728  "little details"

With my compiler, everything beats the code we have.

-- 
Regards,
Pavel Roskin



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-11  8:53                     ` Pavel Roskin
@ 2008-07-11  9:07                       ` Pavel Roskin
  2008-07-11 13:21                         ` Christian Franke
  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Roskin @ 2008-07-11  9:07 UTC (permalink / raw)
  To: grub-devel

Quoting Pavel Roskin <proski@gnu.org>:

> 366976  original
> 364988  "=q", xchgb
> 365460  "=q", rolw $8,%w0
> 365420  "=r", rolw $8,%w0
> 364996  monstrosity ("=r", conditional xchgb/rolw)
> 364728  "little details"

Marginally better (by just 12 bytes):
364716  "little details" + "monstrosity"

#define grub_swap_lo(x) \
({ \
   grub_uint32_t _x = (x), _y; \
   asm ( \
        ".ifc %0,%%eax\n" \
        "xchgb %b0,%h0\n" \
        ".else\n" \
        ".ifc %0,%%ebx\n" \
        "xchgb %b0,%h0\n" \
        ".else\n" \
        ".ifc %0,%%ecx\n" \
        "xchgb %b0,%h0\n" \
        ".else\n" \
        ".ifc %0,%%edx\n" \
        "xchgb %b0,%h0\n" \
        ".else\n" \
        "rolw $8,%w0\n" \
        ".endif\n" \
        ".endif\n" \
        ".endif\n" \
        ".endif\n" \
        : "=r"(_y) : "0"(_x) \
    ); \
    _y; \
})
#define grub_swap_hi(x) \
({ \
   grub_uint32_t _x = (x), _y; \
   asm ( \
        "roll $0x10,%0\n" \
        : "=r"(_y) : "0"(_x) \
    ); \
    _y; \
})
#define grub_swap_bytes32(x) grub_swap_lo(grub_swap_hi(grub_swap_lo(x)))

-- 
Regards,
Pavel Roskin



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-11  9:07                       ` Pavel Roskin
@ 2008-07-11 13:21                         ` Christian Franke
  2008-07-11 18:33                           ` Pavel Roskin
  0 siblings, 1 reply; 20+ messages in thread
From: Christian Franke @ 2008-07-11 13:21 UTC (permalink / raw)
  To: The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 738 bytes --]

Pavel Roskin wrote:
> Quoting Pavel Roskin <proski@gnu.org>:
>
>> 366976  original
>> 364988  "=q", xchgb
>> 365460  "=q", rolw $8,%w0
>> 365420  "=r", rolw $8,%w0
>> 364996  monstrosity ("=r", conditional xchgb/rolw)
>> 364728  "little details"
>
> Marginally better (by just 12 bytes):
> 364716  "little details" + "monstrosity"
>

Very nice optimization idea!

The practical advantage is probably limited by the fact that kernel.img 
and most of the modules typically included in a i386-pc core.img are not 
affected.

I attached a diff of "size *.mod" outputs for the test mentioned in my 
last mail. '-' original code, '+' grub_swap_*() replaced by dummies. 
Large savings potential is in afs.mod, hfs*.mod, and xfs.mod.

Christian


[-- Attachment #2: sizediff.txt --]
[-- Type: text/plain, Size: 4751 bytes --]

--- sizes.original	2008-07-10 20:33:57.968750000 +0200
+++ sizes.noswap	2008-07-10 20:34:03.046875000 +0200
@@ -1,78 +1,78 @@
    text	   data	    bss	    dec	    hex	filename
    4352	     24	    160	   4536	   11b8	_bsd.mod
     564	      8	     48	    620	    26c	_chain.mod
    2128	      8	     48	   2184	    888	_linux.mod
    6528	     48	    128	   6704	   1a30	_multiboot.mod
     820	     28	      0	    848	    350	acorn.mod
-   2552	     44	     16	   2612	    a34	affs.mod
-   5532	     40	     16	   5588	   15d4	afs.mod
-    960	     28	     16	   1004	    3ec	amiga.mod
+   2136	     44	     16	   2196	    894	affs.mod
+   2656	     40	     16	   2712	    a98	afs.mod
+    692	     28	     16	    736	    2e0	amiga.mod
     212	      8	      0	    220	     dc	aout.mod
-   1000	     28	     16	   1044	    414	apple.mod
-   3028	     36	     16	   3080	    c08	ata.mod
+    888	     28	     16	    932	    3a4	apple.mod
+   2948	     36	     16	   3000	    bb8	ata.mod
    1756	     44	     16	   1816	    718	biosdisk.mod
     876	      8	     16	    900	    384	bitmap.mod
     764	     20	      0	    784	    310	blocklist.mod
     148	     16	      0	    164	     a4	boot.mod
     628	     16	      0	    644	    284	bsd.mod
     424	     16	      0	    440	    1b8	cat.mod
     340	     24	      0	    364	    16c	chain.mod
     684	     16	      0	    700	    2bc	cmp.mod
     396	     20	      0	    416	    1a0	configfile.mod
    1228	     36	     16	   1280	    500	cpio.mod
     332	     16	     16	    364	    16c	cpuid.mod
     620	     16	      0	    636	    27c	echo.mod
    2292	     12	      0	   2304	    900	elf.mod
    2044	     44	     16	   2104	    838	ext2.mod
    2816	     32	     16	   2864	    b30	fat.mod
     832	     32	     16	    880	    370	font.mod
    1208	      8	      0	   1216	    4c0	fshelp.mod
    3480	    132	    224	   3836	    efc	gfxterm.mod
    1032	     32	     16	   1080	    438	gpt.mod
    5808	    488	      0	   6296	   1898	gzio.mod
     284	     16	      0	    300	    12c	halt.mod
     116	     16	      0	    132	     84	hello.mod
     480	     16	      0	    496	    1f0	help.mod
    1220	     20	      0	   1240	    4d8	hexdump.mod
-   4252	     32	     64	   4348	   10fc	hfs.mod
-   4184	     44	     16	   4244	   1094	hfsplus.mod
-   3128	     44	     16	   3188	    c74	iso9660.mod
+   3220	     32	     64	   3316	    cf4	hfs.mod
+   2628	     44	     16	   2688	    a80	hfsplus.mod
+   3108	     44	     16	   3168	    c60	iso9660.mod
    2916	     32	     16	   2964	    b94	jfs.mod
-   3968	     40	      0	   4008	    fa8	jpeg.mod
+   3956	     40	      0	   3996	    f9c	jpeg.mod
     232	     24	      0	    256	    100	linux.mod
    1076	     52	     16	   1144	    478	loopback.mod
    1452	     12	      0	   1464	    5b8	ls.mod
    2244	     20	      0	   2264	    8d8	lspci.mod
    3260	     36	     32	   3328	    d00	lvm.mod
     484	     40	     32	    556	    22c	memdisk.mod
    2512	     36	     16	   2564	    a04	minix.mod
     264	     32	      0	    296	    128	multiboot.mod
   25268	    104	   1936	  27308	   6aac	normal.mod
    5576	     44	     32	   5652	   1614	ntfs.mod
    1880	     20	      0	   1900	    76c	ntfscomp.mod
    1604	     24	     16	   1644	    66c	pc.mod
     188	      4	      0	    192	     c0	pci.mod
     604	     16	      0	    620	    26c	play.mod
-   4144	     24	      0	   4168	   1048	png.mod
+   4108	     24	      0	   4132	   1024	png.mod
    3312	     40	     16	   3368	    d28	raid.mod
     276	     16	      0	    292	    124	read.mod
     108	     16	      0	    124	     7c	reboot.mod
    7308	     48	     16	   7372	   1ccc	reiserfs.mod
    1164	     16	      0	   1180	    49c	search.mod
    2492	    216	     96	   2804	    af4	serial.mod
-   2284	     40	     16	   2340	    924	sfs.mod
+   1836	     40	     16	   1892	    764	sfs.mod
     540	     16	     32	    588	    24c	sleep.mod
-   1056	     24	     16	   1096	    448	sun.mod
+    948	     24	     16	    988	    3dc	sun.mod
     416	     20	      0	    436	    1b4	terminal.mod
    5188	     28	    512	   5728	   1660	terminfo.mod
     264	     16	      0	    280	    118	test.mod
    1468	     24	      0	   1492	    5d4	tga.mod
    2724	     52	     16	   2792	    ae8	udf.mod
    2936	     52	     16	   3004	    bbc	ufs.mod
    8832	    204	   1952	  10988	   2aec	vbe.mod
     624	     20	      0	    644	    284	vbeinfo.mod
    1172	     20	      0	   1192	    4a8	vbetest.mod
    1668	    108	  13584	  15360	   3c00	vga.mod
    1284	      8	     32	   1324	    52c	video.mod
     732	     32	      0	    764	    2fc	videotest.mod
-   4980	     40	     16	   5036	   13ac	xfs.mod
+   2924	     40	     16	   2980	    ba4	xfs.mod

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-11 13:21                         ` Christian Franke
@ 2008-07-11 18:33                           ` Pavel Roskin
  0 siblings, 0 replies; 20+ messages in thread
From: Pavel Roskin @ 2008-07-11 18:33 UTC (permalink / raw)
  To: The development of GRUB 2

On Fri, 2008-07-11 at 15:21 +0200, Christian Franke wrote:

> Very nice optimization idea!
> 
> The practical advantage is probably limited by the fact that kernel.img 
> and most of the modules typically included in a i386-pc core.img are not 
> affected.
> 
> I attached a diff of "size *.mod" outputs for the test mentioned in my 
> last mail. '-' original code, '+' grub_swap_*() replaced by dummies. 
> Large savings potential is in afs.mod, hfs*.mod, and xfs.mod.
...
> -   4980	     40	     16	   5036	   13ac	xfs.mod
> +   2924	     40	     16	   2980	    ba4	xfs.mod

It is a big deal for xfs users.  Even more so for gpt+raid+lvm+xfs
users, who are against the wall :-)

-- 
Regards,
Pavel Roskin



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-09 12:50               ` Christian Franke
  2008-07-09 17:57                 ` Pavel Roskin
@ 2008-07-20 13:45                 ` Christian Franke
  2008-07-20 23:31                   ` Pavel Roskin
  1 sibling, 1 reply; 20+ messages in thread
From: Christian Franke @ 2008-07-20 13:45 UTC (permalink / raw)
  To: The development of GRUB 2

Christian Franke wrote:
> Pavel Roskin wrote:
>   
>> On Tue, 2008-07-08 at 20:04 +0200, Christian Franke wrote:
>>
>>     
>>> With old gcc versions without the "rol" optimization, even the 16
>>> bit swap should be a function:
>>>
>>>       
>> Or better yet, an asm statement.
>>
>> We should consider optimized assembly vs function call.  Even the
>> 32-bit swap could be shorter:
>>
>> a:   86 c4                   xchg   %al,%ah
>> c:   c1 c0 10                rol    $0x10,%eax
>> f:   86 c4                   xchg   %al,%ah
>> 11:
>>
>> That's 7 bytes!  ...
>>     
>
> But the function call in the 32-bit case requires only 5 bytes :-)
>   

Sorry, I was wrong here. The assumption about function call size was 
only true for module-local calls. If a module calls a function in 
kernel, each 5 byte call requires another 8 bytes for the ELF relocation 
table entry.

Christian




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Endianness macros capitalization
  2008-07-20 13:45                 ` Christian Franke
@ 2008-07-20 23:31                   ` Pavel Roskin
  0 siblings, 0 replies; 20+ messages in thread
From: Pavel Roskin @ 2008-07-20 23:31 UTC (permalink / raw)
  To: The development of GRUB 2

On Sun, 2008-07-20 at 15:45 +0200, Christian Franke wrote:
> Christian Franke wrote:

> > But the function call in the 32-bit case requires only 5 bytes :-)
> >   
> 
> Sorry, I was wrong here. The assumption about function call size was 
> only true for module-local calls. If a module calls a function in 
> kernel, each 5 byte call requires another 8 bytes for the ELF relocation 
> table entry.

Actually, the new versions of gcc have __builtin_bswap32 and
__builtin_bswap64, which are optimized even better.  There is no
__builtin_bswap16 because it is said that any correct implementation
will be optimized anyway:
http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00079.html

gcc 4.3.0 from Fedora is working fine with them.  I'm not sure about gcc
4.2.  There is only one bug.  Suppose I use this:

#define grub_swap_bytes32(x) __builtin_bswap32(x)
#define grub_swap_bytes64(x) __builtin_bswap64(x)

Then I get this warning:

partmap/apple.c: In function 'apple_partition_map_iterate':
partmap/apple.c:133: warning: format '%x' expects type 'unsigned int',
but argument 8 has type 'unsigned int'
partmap/apple.c:133: warning: format '%x' expects type 'unsigned int',
but argument 9 has type 'unsigned int'

Those arguments are produced by grub_swap_bytes32().  But if I use
wrapper functions, then there are no warnings, and the code side is
approximately the same.  We'll need wrappers to support sparse.

-- 
Regards,
Pavel Roskin



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2008-07-20 23:31 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-05 13:27 Endianness macros capitalization Javier Martín
2008-07-05 21:30 ` Pavel Roskin
2008-07-05 22:54   ` Javier Martín
2008-07-05 23:14     ` Pavel Roskin
2008-07-06 18:30     ` Robert Millan
2008-07-06 20:02       ` Javier Martín
2008-07-07 19:25         ` Christian Franke
2008-07-08 18:04           ` Christian Franke
2008-07-09  6:47             ` Pavel Roskin
2008-07-09 12:50               ` Christian Franke
2008-07-09 17:57                 ` Pavel Roskin
2008-07-10 19:25                   ` Christian Franke
2008-07-10 19:59                     ` Pavel Roskin
2008-07-11  7:06                       ` Jordi Mallach
2008-07-11  8:53                     ` Pavel Roskin
2008-07-11  9:07                       ` Pavel Roskin
2008-07-11 13:21                         ` Christian Franke
2008-07-11 18:33                           ` Pavel Roskin
2008-07-20 13:45                 ` Christian Franke
2008-07-20 23:31                   ` Pavel Roskin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.