Re: Using unsigned int for loop counters - better performance for Architectures - urban hacker legend?


From: "Luis R. Rodriguez" <lrodriguez@atheros.com>
To: Doug Dahlby <Doug.Dahlby@atheros.com>
Cc: Luis Rodriguez <Luis.Rodriguez@atheros.com>,
	<davidn@davidnewall.com>, <linux-kernel@vger.kernel.org>,
	<mcgrof@gmail.com>, <jirislaby@gmail.com>
Subject: Re: Using unsigned int for loop counters - better performance for Architectures - urban hacker legend?
Date: Mon, 2 Aug 2010 13:48:27 -0700
Message-ID: <20100802204827.GF8920@tux>
In-Reply-To: <B7132A25476D334D9130FE7532F2A563109FED151A@SC1EXMB-MBCL.global.atheros.com>

Doug, I'm adding your response to lkml as it's the best answer I've gotten so far.

On Mon, Aug 02, 2010 at 01:10:01PM -0700, Doug Dahlby wrote:
> Luis,
> 
> Just out of curiosity, I looked at what gcc does on my own x86 computer.
> When compiled with the default settings (no -O flag), the loop bodies are
> practically identical:
> 
> $ more loop_test1.c loop_test1.s loop_test2.c loop_test2.s
> ::::::::::::::
> loop_test1.c
> ::::::::::::::
> int foo(int limit)
> {
>     int i = 0;
>     for (; limit > 0; limit--) {
>         i += 1;
>     }
>     return i;
> }
> ::::::::::::::
> loop_test1.s
> ::::::::::::::
>         .file   "loop_test1.c"
>         .text
> .globl _foo
>         .def    _foo;   .scl    2;      .type   32;     .endef
> _foo:
>         pushl   %ebp
>         movl    %esp, %ebp
>         subl    $4, %esp
>         movl    $0, -4(%ebp)
> L2:
>         cmpl    $0, 8(%ebp)
>         jle     L3
>         leal    -4(%ebp), %eax
>         incl    (%eax)
>         decl    8(%ebp)
>         jmp     L2
> L3:
>         movl    -4(%ebp), %eax
>         leave
>         ret
> ::::::::::::::
> loop_test2.c
> ::::::::::::::
> int foo(unsigned limit)
> {
>     int i = 0;
>     for (; limit > 0; limit--) {
>         i += 1;
>     }
>     return i;
> }
> ::::::::::::::
> loop_test2.s
> ::::::::::::::
>         .file   "loop_test2.c"
>         .text
> .globl _foo
>         .def    _foo;   .scl    2;      .type   32;     .endef
> _foo:
>         pushl   %ebp
>         movl    %esp, %ebp
>         subl    $4, %esp
>         movl    $0, -4(%ebp)
> L2:
>         cmpl    $0, 8(%ebp)
>         je      L3
>         leal    -4(%ebp), %eax
>         incl    (%eax)
>         decl    8(%ebp)
>         jmp     L2
> L3:
>         movl    -4(%ebp), %eax
>         leave
>         ret
> 
> but when I compile with -O3, there is a little difference:
> 
> ::::::::::::::
> loop_test1.s
> ::::::::::::::
>         .file   "loop_test1.c"
>         .text
>         .p2align 4,,15
> .globl _foo
>         .def    _foo;   .scl    2;      .type   32;     .endef
> _foo:
>         pushl   %ebp
>         xorl    %eax, %eax
>         movl    %esp, %ebp
>         movl    8(%ebp), %edx
>         jmp     L10
>         .p2align 4,,7
> L12:
>         incl    %eax
>         decl    %edx
> L10:
>         testl   %edx, %edx
>         jg      L12
>         popl    %ebp
>         ret
> ::::::::::::::
> loop_test2.s
> ::::::::::::::
>         .file   "loop_test2.c"
>         .text
>         .p2align 4,,15
> .globl _foo
>         .def    _foo;   .scl    2;      .type   32;     .endef
> _foo:
>         pushl   %ebp
>         xorl    %eax, %eax
>         movl    %esp, %ebp
>         movl    8(%ebp), %edx
>         testl   %edx, %edx
>         jmp     L10
>         .p2align 4,,7
> L12:
>         incl    %eax
>         decl    %edx
> L10:
>         jne     L12
>         popl    %ebp
>         ret
> 
> Looks like the compiler explicitly tests the signed counter against
> zero on every iteration, but for the unsigned case it reuses the status
> flags set as a byproduct of the loop counter decrement.  When I run these
> 2 functions repeatedly, the unsigned counter takes about 70% of
> the time of the signed counter.  This roughly matches the ratio
> of the 3 loop-body instructions in the unsigned case to the 4
> instructions in the signed case.  This is not a rigorous test, and
> it may be specific to my architecture and my compiler settings
> (default + -O3), but it appears that there is some validity to
> making a general habit of using unsigned loop counters rather
> than signed.  That being said, I'd be surprised if we have loops that
>
> (a) are dominated by the looping overhead rather than the operations
> in the loop body, and
> (b) iterate such a large number of times that they take up a non-negligible
> amount of the driver's CPU use.
>
> So it looks to me like this is a good policy to recommend, but not one
> that needs across-the-board adherence.
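
For anyone who wants to repeat the timing comparison described above, here
is a minimal sketch. It is not the harness Doug used (that was not posted).
The names foo_signed, foo_unsigned and unsigned_to_signed_ratio are made up
for illustration, and the empty asm statement is a GCC/Clang extension added
on the assumption that a current compiler at -O3 would otherwise fold the
loop into a closed-form result and leave nothing to measure.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Signed counter, as in loop_test1.c (renamed so both fit in one file). */
int foo_signed(int limit)
{
    int i = 0;
    for (; limit > 0; limit--) {
        /* Empty asm (GCC/Clang extension): forces the loop to really run. */
        __asm__ volatile("" : "+r"(i));
        i += 1;
    }
    return i;
}

/* Unsigned counter, as in loop_test2.c. */
int foo_unsigned(unsigned limit)
{
    int i = 0;
    for (; limit > 0; limit--) {
        __asm__ volatile("" : "+r"(i));
        i += 1;
    }
    return i;
}

/*
 * Hypothetical helper: calls each variant `reps` times with the same
 * positive limit and returns unsigned CPU time divided by signed CPU time.
 * Returns 0.0 when the signed run is too short for clock() to resolve.
 */
double unsigned_to_signed_ratio(int limit, int reps)
{
    volatile int sink = 0;  /* volatile store keeps the calls alive */

    assert(limit > 0 && reps > 0);

    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        sink = foo_signed(limit);
    clock_t t1 = clock();
    for (int r = 0; r < reps; r++)
        sink = foo_unsigned((unsigned)limit);
    clock_t t2 = clock();
    (void)sink;

    double s = (double)(t1 - t0) / CLOCKS_PER_SEC;
    double u = (double)(t2 - t1) / CLOCKS_PER_SEC;
    printf("signed: %.3fs  unsigned: %.3fs\n", s, u);
    return s > 0.0 ? u / s : 0.0;
}
```

Calling unsigned_to_signed_ratio(1000000, 1000) from main() and building
with gcc -O3 gives one data point. The result depends on compiler version
and CPU, so it may not reproduce the 70% figure. Note also that the two
functions only agree for non-negative limits: foo_signed() returns 0 for a
negative limit, while the same bit pattern passed as unsigned is a very
large count.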

Awesome, thanks!

  Luis
