All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Luis R. Rodriguez" <lrodriguez@atheros.com>
To: Doug Dahlby <Doug.Dahlby@atheros.com>
Cc: Luis Rodriguez <Luis.Rodriguez@atheros.com>,
	<davidn@davidnewall.com>, <linux-kernel@vger.kernel.org>,
	<mcgrof@gmail.com>, <jirislaby@gmail.com>
Subject: Re: Using unsigned int for loop counters - better performance for Architectures - urban hacker legend?
Date: Mon, 2 Aug 2010 13:48:27 -0700	[thread overview]
Message-ID: <20100802204827.GF8920@tux> (raw)
In-Reply-To: <B7132A25476D334D9130FE7532F2A563109FED151A@SC1EXMB-MBCL.global.atheros.com>

Doug, I'm adding your response to lkml as its the best answer I've gotten so far.

On Mon, Aug 02, 2010 at 01:10:01PM -0700, Doug Dahlby wrote:
> Luis,
> 
> Just out of curiousity, I looked at what gcc does on my own x86 computer.
> When compiled regularly, the loop bodies are practically identical:
> 
> $ more loop_test1.c loop_test1.s loop_test2.c loop_test2.s
> ::::::::::::::
> loop_test1.c
> ::::::::::::::
> int foo(int limit)
> {
>     int i = 0;
>     for (; limit > 0; limit--) {
>         i += 1;
>     }
>     return i;
> }
> ::::::::::::::
> loop_test1.s
> ::::::::::::::
>         .file   "loop_test1.c"
>         .text
> .globl _foo
>         .def    _foo;   .scl    2;      .type   32;     .endef
> _foo:
>         pushl   %ebp
>         movl    %esp, %ebp
>         subl    $4, %esp
>         movl    $0, -4(%ebp)
> L2:
>         cmpl    $0, 8(%ebp)
>         jle     L3
>         leal    -4(%ebp), %eax
>         incl    (%eax)
>         decl    8(%ebp)
>         jmp     L2
> L3:
>         movl    -4(%ebp), %eax
>         leave
>         ret
> ::::::::::::::
> loop_test2.c
> ::::::::::::::
> int foo(unsigned limit)
> {
>     int i = 0;
>     for (; limit > 0; limit--) {
>         i += 1;
>     }
>     return i;
> }
> ::::::::::::::
> loop_test2.s
> ::::::::::::::
>         .file   "loop_test2.c"
>         .text
> .globl _foo
>         .def    _foo;   .scl    2;      .type   32;     .endef
> _foo:
>         pushl   %ebp
>         movl    %esp, %ebp
>         subl    $4, %esp
>         movl    $0, -4(%ebp)
> L2:
>         cmpl    $0, 8(%ebp)
>         je      L3
>         leal    -4(%ebp), %eax
>         incl    (%eax)
>         decl    8(%ebp)
>         jmp     L2
> L3:
>         movl    -4(%ebp), %eax
>         leave
>         ret
> 
> but when I compile with -O3, there is a little difference:
> 
> ::::::::::::::
> loop_test1.s
> ::::::::::::::
>         .file   "loop_test1.c"
>         .text
>         .p2align 4,,15
> .globl _foo
>         .def    _foo;   .scl    2;      .type   32;     .endef
> _foo:
>         pushl   %ebp
>         xorl    %eax, %eax
>         movl    %esp, %ebp
>         movl    8(%ebp), %edx
>         jmp     L10
>         .p2align 4,,7
> L12:
>         incl    %eax
>         decl    %edx
> L10:
>         testl   %edx, %edx
>         jg      L12
>         popl    %ebp
>         ret
> ::::::::::::::
> loop_test2.s
> ::::::::::::::
>         .file   "loop_test2.c"
>         .text
>         .p2align 4,,15
> .globl _foo
>         .def    _foo;   .scl    2;      .type   32;     .endef
> _foo:
>         pushl   %ebp
>         xorl    %eax, %eax
>         movl    %esp, %ebp
>         movl    8(%ebp), %edx
>         testl   %edx, %edx
>         jmp     L10
>         .p2align 4,,7
> L12:
>         incl    %eax
>         decl    %edx
> L10:
>         jne     L12
>         popl    %ebp
>         ret
> 
> Looks like the compiler is explicity testing the unsigned counter
> against zero, but uses the status bits set as a byproduct of the
> loop counter decrement for the unsigned case.  When I run these
> 2 functions repeatedly, the unsigned counter takes about 70% of
> the time of the signed counter.  This roughly matches the ratio
> of the 3 loop body statements in the unsigned case to the 4
> statements in the signed case.  This is not a rigorous test, and
> this may be specific to my architecture and my compiler settings
> (default + -O3), but it appears that there is some validity to
> make a general habit of using unsigned loop counters rather
> than signed. That being said, I'd be surprised if we have loops that
>
> (a) are dominated by the looping overhead rather than the operations
> in the loop body, and
> (b) iterate such a large number of times that they take up an non-negligible
> amount of the driver's CPU use. 
>
> So it looks to me like this is a good policy to recommend, but not one
> that needs across-the-board adherence.

Awesome, thanks!

  Luis

  parent reply	other threads:[~2010-08-02 20:48 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-30 23:53 Using unsigned int for loop counters - better performance for Architectures - urban hacker legend? Luis R. Rodriguez
2010-07-31  9:38 ` David Newall
     [not found]   ` <B7132A25476D334D9130FE7532F2A563109FED150D@SC1EXMB-MBCL.global.atheros.com>
     [not found]     ` <20100802193712.GB8920@tux>
     [not found]       ` <B7132A25476D334D9130FE7532F2A563109FED151A@SC1EXMB-MBCL.global.atheros.com>
2010-08-02 20:48         ` Luis R. Rodriguez [this message]
2010-08-03 10:06           ` David Newall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100802204827.GF8920@tux \
    --to=lrodriguez@atheros.com \
    --cc=Doug.Dahlby@atheros.com \
    --cc=Luis.Rodriguez@atheros.com \
    --cc=davidn@davidnewall.com \
    --cc=jirislaby@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.