public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] include/linux/etherdevice.h, kernel 2.6.14
@ 2005-10-29 14:10 Michal Srajer
  2005-10-29 14:17 ` Russell King
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Srajer @ 2005-10-29 14:10 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 643 bytes --]

Description: Very small optimization patch for include/linux/etherdevice.h in 2.6.14 kernel.

Patch:
---------------cut-here---------------
--- linux-2.6.14/include/linux/etherdevice.h    2005-10-28 00:02:08.000000000 +0000
+++ linux/include/linux/etherdevice.h   2005-10-29 14:57:20.000000000 +0000
@@ -53,7 +53,7 @@
  */
 static inline int is_zero_ether_addr(const u8 *addr)
 {
-       return !(addr[0] | addr[1] | addr[2] | addr[3] | addr[4] | addr[5]);
+       return !(addr[0] || addr[1] || addr[2] || addr[3] || addr[4] || addr[5]);
 }

 /**
---------------cut-here---------------

Michal Srajer
michal@post.pl, michal@mat.uni.torun.pl


[-- Attachment #2: Type: application/pgp-signature, Size: 185 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] include/linux/etherdevice.h, kernel 2.6.14
  2005-10-29 14:10 [PATCH] include/linux/etherdevice.h, kernel 2.6.14 Michal Srajer
@ 2005-10-29 14:17 ` Russell King
       [not found]   ` <20051029154027.GC17715@ultra60.mat.uni.torun.pl>
  0 siblings, 1 reply; 6+ messages in thread
From: Russell King @ 2005-10-29 14:17 UTC (permalink / raw)
  To: Michal Srajer; +Cc: linux-kernel

On Sat, Oct 29, 2005 at 04:10:46PM +0200, Michal Srajer wrote:
> Description: Very small optimization patch for include/linux/etherdevice.h in 2.6.14 kernel.

How is this an optimisation?

typedef unsigned char u8;
 
static int is_zero_ether_addr1(const u8 *addr)
{
       return !(addr[0] | addr[1] | addr[2] | addr[3] | addr[4] | addr[5]);
}
 
static int is_zero_ether_addr2(const u8 *addr)
{
       return !(addr[0] || addr[1] || addr[2] || addr[3] || addr[4] || addr[5]);
}

produces on x86:

is_zero_ether_addr1:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %edx
        movb    1(%edx), %al
        orb     (%edx), %al
        orb     2(%edx), %al
        orb     3(%edx), %al
        orb     4(%edx), %al
        orb     5(%edx), %al
        sete    %al
        movzbl  %al, %eax
        leave
        ret

is_zero_ether_addr2:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %edx
        xorl    %eax, %eax
        cmpb    $0, (%edx)
        jne     .L3
        cmpb    $0, 1(%edx)
        jne     .L3
        cmpb    $0, 2(%edx)
        jne     .L3
        cmpb    $0, 3(%edx)
        jne     .L3
        cmpb    $0, 4(%edx)
        jne     .L3
        cmpb    $0, 5(%edx)
        jne     .L3
        movl    $1, %eax
.L3:
        leave
        ret

and on ARM:

is_zero_ether_addr1:
        ldrb    r1, [r0, #1]    @ zero_extendqisi2
        ldrb    r3, [r0, #0]    @ zero_extendqisi2
        ldrb    r2, [r0, #2]    @ zero_extendqisi2
        orr     r3, r3, r1
        ldrb    r1, [r0, #3]    @ zero_extendqisi2
        orr     r2, r2, r3
        ldrb    r3, [r0, #4]    @ zero_extendqisi2
        orr     r1, r1, r2
        ldrb    r2, [r0, #5]    @ zero_extendqisi2
        orr     r3, r3, r1
        orrs    r2, r2, r3
        movne   r0, #0
        moveq   r0, #1
        mov     pc, lr

is_zero_ether_addr2:
        ldrb    r3, [r0, #0]    @ zero_extendqisi2
        mov     r2, #0
        cmp     r3, r2
        bne     .L3
        ldrb    r3, [r0, #1]    @ zero_extendqisi2
        cmp     r3, r2
        bne     .L3
        ldrb    r3, [r0, #2]    @ zero_extendqisi2
        cmp     r3, r2
        bne     .L3
        ldrb    r3, [r0, #3]    @ zero_extendqisi2
        cmp     r3, r2
        bne     .L3
        ldrb    r3, [r0, #4]    @ zero_extendqisi2
        cmp     r3, r2
        bne     .L3
        ldrb    r3, [r0, #5]    @ zero_extendqisi2
        cmp     r3, r2
        movne   r2, #0
        moveq   r2, #1
.L3:
        mov     r0, r2
        mov     pc, lr

The former looks far more optimised in both cases.  In fact, the
latter on ARM is many times less efficient due to the LDR result
delays being incurred for every test.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] include/linux/etherdevice.h, kernel 2.6.14
       [not found]   ` <20051029154027.GC17715@ultra60.mat.uni.torun.pl>
@ 2005-10-29 16:00     ` Russell King
  2005-10-29 21:36       ` J.A. Magallon
  0 siblings, 1 reply; 6+ messages in thread
From: Russell King @ 2005-10-29 16:00 UTC (permalink / raw)
  To: Michal Srajer; +Cc: Linux Kernel List

Please do not drop CC's from discussions on mailing lists.

On Sat, Oct 29, 2005 at 05:40:27PM +0200, Michal Srajer wrote:
> On Sat, Oct 29, 2005 at 03:17:57PM +0100, Russell King wrote:
> > On Sat, Oct 29, 2005 at 04:10:46PM +0200, Michal Srajer wrote:
> > > Description: Very small optimization patch for include/linux/etherdevice.h in 2.6.14 kernel.
> > 
> > How is this an optimisation?
> 
> I wrote C proggram which is about two times faster 
> when using is_zero_ether_addr2 than is_zero_ether_addr1.
> 
> --------cut--------
> typedef unsigned char u8;
> 
> static inline int is_zero_ether_addr1(const u8 *addr)
> {
>        return !(addr[0] | addr[1] | addr[2] | addr[3] | addr[4] | addr[5]);
> }
> 
> static inline int is_zero_ether_addr2(const u8 *addr)
> {
>        return !(addr[0] || addr[1] || addr[2] || addr[3] || addr[4] || addr[5]);
> }
> 
> main () {
>         long i;
>         u8 test_data[6] = {0x00,0x12,0xF0,0x0E,0xC9,0xDE};
>         u8 test_data0[6] = {0x00,0x00,0x00,0x00,0x00,0x00};
>         for (i=0; i<50000000; i++) {
>                 is_zero_ether_addr1(test_data);
>                 is_zero_ether_addr1(test_data0);
>         }
>         return 0;
> }
> --------cut--------
> $ time ./is_zero_ether_addr1_test
> real    0m5.986s
> user    0m5.976s
> sys     0m0.004s
> $ time ./is_zero_ether_addr2_test
> real    0m3.092s
> user    0m3.076s
> sys     0m0.004s
> 
> I use gcc 4.0.3.
> $ gcc is_zero_ether_addr1_test.c -o is_zero_ether_addr1_test
> Should I use some special gcc options?

The test is data dependent.  is_zero_ether_addr1() provides a determinstic
execution time irrespective of the supplied data.

is_zero_ether_addr2() depends on the data supplied, and whether the
architecture is able to optimise it sufficiently well (x86 may be able
to, RISC architectures less so.)

Therefore, the existing code is far more preferable, at least to me.
This is what I get on ARM:

$ /usr/bin/time ./t1
0.66user 0.02system 0:00.68elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+106minor)pagefaults 0swaps
$ /usr/bin/time ./t2
1.10user 0.02system 0:01.13elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+106minor)pagefaults 0swaps
$ /usr/bin/time ./t1
0.67user 0.01system 0:00.68elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+106mino^[[Ar)pagefaults 0swaps
$ /usr/bin/time ./t2
1.11user 0.02system 0:01.12elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+106minor)pagefaults 0swaps
$ /usr/bin/time ./t1
0.67user 0.02system 0:00.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+106minor)pagefaults 0swaps
$ /usr/bin/time ./t2
1.11user 0.01system 0:01.12elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+106minor)pagefaults 0swaps

where t1 is using is_zero_ether_addr1 and t2 is using
is_zero_ether_addr2.  That's almost twice as long for your "optimised"
version than for the present version.

-- 
Russell King

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] include/linux/etherdevice.h, kernel 2.6.14
  2005-10-29 16:00     ` Russell King
@ 2005-10-29 21:36       ` J.A. Magallon
  2005-10-30 10:22         ` Eric Piel
  2005-10-30 10:46         ` Andreas Schwab
  0 siblings, 2 replies; 6+ messages in thread
From: J.A. Magallon @ 2005-10-29 21:36 UTC (permalink / raw)
  To: Russell King; +Cc: Michal Srajer, Linux Kernel List


On 2005.10.29, at 18:00, Russell King wrote:

> Please do not drop CC's from discussions on mailing lists.
>
> On Sat, Oct 29, 2005 at 05:40:27PM +0200, Michal Srajer wrote:
>
>> On Sat, Oct 29, 2005 at 03:17:57PM +0100, Russell King wrote:
>>
>>> On Sat, Oct 29, 2005 at 04:10:46PM +0200, Michal Srajer wrote:
>>>
>>>> Description: Very small optimization patch for include/linux/ 
>>>> etherdevice.h in 2.6.14 kernel.
>>>>
>>>
>>> How is this an optimisation?
>>>
>>
>> I wrote C proggram which is about two times faster
>> when using is_zero_ether_addr2 than is_zero_ether_addr1.
>>
>> --------cut--------
>> typedef unsigned char u8;
>>
>> static inline int is_zero_ether_addr1(const u8 *addr)
>> {
>>        return !(addr[0] | addr[1] | addr[2] | addr[3] | addr[4] |  
>> addr[5]);
>> }
>>
>> static inline int is_zero_ether_addr2(const u8 *addr)
>> {
>>        return !(addr[0] || addr[1] || addr[2] || addr[3] || addr 
>> [4] || addr[5]);
>> }
>>
>> main () {
>>         long i;
>>         u8 test_data[6] = {0x00,0x12,0xF0,0x0E,0xC9,0xDE};
>>         u8 test_data0[6] = {0x00,0x00,0x00,0x00,0x00,0x00};
>>         for (i=0; i<50000000; i++) {
>>                 is_zero_ether_addr1(test_data);
>>                 is_zero_ether_addr1(test_data0);
>>         }
>>         return 0;
>> }
>> --------cut--------
>> $ time ./is_zero_ether_addr1_test
>> real    0m5.986s
>> user    0m5.976s
>> sys     0m0.004s
>> $ time ./is_zero_ether_addr2_test
>> real    0m3.092s
>> user    0m3.076s
>> sys     0m0.004s
>>
>> I use gcc 4.0.3.
>> $ gcc is_zero_ether_addr1_test.c -o is_zero_ether_addr1_test
>> Should I use some special gcc options?
>>
>
> The test is data dependent.  is_zero_ether_addr1() provides a  
> determinstic
> execution time irrespective of the supplied data.
>
> is_zero_ether_addr2() depends on the data supplied, and whether the
> architecture is able to optimise it sufficiently well (x86 may be able
> to, RISC architectures less so.)
>
> Therefore, the existing code is far more preferable, at least to me.
> This is what I get on ARM:
>
> $ /usr/bin/time ./t1
> 0.66user 0.02system 0:00.68elapsed 98%CPU (0avgtext+0avgdata  
> 0maxresident)k
> 0inputs+0outputs (0major+106minor)pagefaults 0swaps
> $ /usr/bin/time ./t2
> 1.10user 0.02system 0:01.13elapsed 99%CPU (0avgtext+0avgdata  
> 0maxresident)k
> 0inputs+0outputs (0major+106minor)pagefaults 0swaps
> $ /usr/bin/time ./t1
> 0.67user 0.01system 0:00.68elapsed 98%CPU (0avgtext+0avgdata  
> 0maxresident)k
> 0inputs+0outputs (0major+106mino^[[Ar)pagefaults 0swaps
> $ /usr/bin/time ./t2
> 1.11user 0.02system 0:01.12elapsed 100%CPU (0avgtext+0avgdata  
> 0maxresident)k
> 0inputs+0outputs (0major+106minor)pagefaults 0swaps
> $ /usr/bin/time ./t1
> 0.67user 0.02system 0:00.69elapsed 99%CPU (0avgtext+0avgdata  
> 0maxresident)k
> 0inputs+0outputs (0major+106minor)pagefaults 0swaps
> $ /usr/bin/time ./t2
> 1.11user 0.01system 0:01.12elapsed 99%CPU (0avgtext+0avgdata  
> 0maxresident)k
> 0inputs+0outputs (0major+106minor)pagefaults 0swaps
>
> where t1 is using is_zero_ether_addr1 and t2 is using
> is_zero_ether_addr2.  That's almost twice as long for your "optimised"
> version than for the present version.

Just for curiosity, could you both benchmark this also:

int is_zero_ether_addr0(const unsigned char *addr)
{
     return !(((unsigned long *)addr)[0] | ((unsigned short*)addr)[2]);
}

Assembler in x86 is

is_zero_ether_addr0:
     pushl   %ebp
     movl    %esp, %ebp
     movl    8(%ebp), %edx
     movzwl  4(%edx), %eax
     orl (%edx), %eax
     sete    %al
     movzbl  %al, %eax
     popl    %ebp
     ret


--
J.A. Magallon <jamagallon()able!es>   \          Software is like sex:
wolverine                              \    It's better when it's free
MacOS X 10.4.2, Darwin Kernel Version 8.2.0



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] include/linux/etherdevice.h, kernel 2.6.14
  2005-10-29 21:36       ` J.A. Magallon
@ 2005-10-30 10:22         ` Eric Piel
  2005-10-30 10:46         ` Andreas Schwab
  1 sibling, 0 replies; 6+ messages in thread
From: Eric Piel @ 2005-10-30 10:22 UTC (permalink / raw)
  To: J.A. Magallon; +Cc: Russell King, Michal Srajer, Linux Kernel List

J.A. Magallon wrote:
> 
> Just for curiosity, could you both benchmark this also:
> 
> int is_zero_ether_addr0(const unsigned char *addr)
> {
>     return !(((unsigned long *)addr)[0] | ((unsigned short*)addr)[2]);
> }
> 

This is probably safer (wrt 64 bits systems):

int is_zero_ether_addr0(const unsigned char *addr)
{
     return !(((u32*)addr)[0] | ((u16*)addr)[2]);
}

Eric

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] include/linux/etherdevice.h, kernel 2.6.14
  2005-10-29 21:36       ` J.A. Magallon
  2005-10-30 10:22         ` Eric Piel
@ 2005-10-30 10:46         ` Andreas Schwab
  1 sibling, 0 replies; 6+ messages in thread
From: Andreas Schwab @ 2005-10-30 10:46 UTC (permalink / raw)
  To: J.A. Magallon; +Cc: Russell King, Michal Srajer, Linux Kernel List

"J.A. Magallon" <jamagallon@able.es> writes:

> Just for curiosity, could you both benchmark this also:
>
> int is_zero_ether_addr0(const unsigned char *addr)
> {
>     return !(((unsigned long *)addr)[0] | ((unsigned short*)addr)[2]);
> }

It's probably slower when addr is unaligned, especially when unaligned
accesses need to be emulated.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2005-10-30 10:46 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-10-29 14:10 [PATCH] include/linux/etherdevice.h, kernel 2.6.14 Michal Srajer
2005-10-29 14:17 ` Russell King
     [not found]   ` <20051029154027.GC17715@ultra60.mat.uni.torun.pl>
2005-10-29 16:00     ` Russell King
2005-10-29 21:36       ` J.A. Magallon
2005-10-30 10:22         ` Eric Piel
2005-10-30 10:46         ` Andreas Schwab

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox