From: "Luis R. Rodriguez" <lrodriguez@atheros.com>
To: Doug Dahlby <Doug.Dahlby@atheros.com>
Cc: Luis Rodriguez <Luis.Rodriguez@atheros.com>,
<davidn@davidnewall.com>, <linux-kernel@vger.kernel.org>,
<mcgrof@gmail.com>, <jirislaby@gmail.com>
Subject: Re: Using unsigned int for loop counters - better performance for Architectures - urban hacker legend?
Date: Mon, 2 Aug 2010 13:48:27 -0700
Message-ID: <20100802204827.GF8920@tux>
In-Reply-To: <B7132A25476D334D9130FE7532F2A563109FED151A@SC1EXMB-MBCL.global.atheros.com>
Doug, I'm adding your response to lkml as it's the best answer I've gotten so far.
On Mon, Aug 02, 2010 at 01:10:01PM -0700, Doug Dahlby wrote:
> Luis,
>
> Just out of curiosity, I looked at what gcc does on my own x86 computer.
> When compiled regularly, the loop bodies are practically identical:
>
> $ more loop_test1.c loop_test1.s loop_test2.c loop_test2.s
> ::::::::::::::
> loop_test1.c
> ::::::::::::::
> int foo(int limit)
> {
>         int i = 0;
>         for (; limit > 0; limit--) {
>                 i += 1;
>         }
>         return i;
> }
> ::::::::::::::
> loop_test1.s
> ::::::::::::::
> .file "loop_test1.c"
> .text
> .globl _foo
> .def _foo; .scl 2; .type 32; .endef
> _foo:
> pushl %ebp
> movl %esp, %ebp
> subl $4, %esp
> movl $0, -4(%ebp)
> L2:
> cmpl $0, 8(%ebp)
> jle L3
> leal -4(%ebp), %eax
> incl (%eax)
> decl 8(%ebp)
> jmp L2
> L3:
> movl -4(%ebp), %eax
> leave
> ret
> ::::::::::::::
> loop_test2.c
> ::::::::::::::
> int foo(unsigned limit)
> {
>         int i = 0;
>         for (; limit > 0; limit--) {
>                 i += 1;
>         }
>         return i;
> }
> ::::::::::::::
> loop_test2.s
> ::::::::::::::
> .file "loop_test2.c"
> .text
> .globl _foo
> .def _foo; .scl 2; .type 32; .endef
> _foo:
> pushl %ebp
> movl %esp, %ebp
> subl $4, %esp
> movl $0, -4(%ebp)
> L2:
> cmpl $0, 8(%ebp)
> je L3
> leal -4(%ebp), %eax
> incl (%eax)
> decl 8(%ebp)
> jmp L2
> L3:
> movl -4(%ebp), %eax
> leave
> ret
>
> but when I compile with -O3, there is a little difference:
>
> ::::::::::::::
> loop_test1.s
> ::::::::::::::
> .file "loop_test1.c"
> .text
> .p2align 4,,15
> .globl _foo
> .def _foo; .scl 2; .type 32; .endef
> _foo:
> pushl %ebp
> xorl %eax, %eax
> movl %esp, %ebp
> movl 8(%ebp), %edx
> jmp L10
> .p2align 4,,7
> L12:
> incl %eax
> decl %edx
> L10:
> testl %edx, %edx
> jg L12
> popl %ebp
> ret
> ::::::::::::::
> loop_test2.s
> ::::::::::::::
> .file "loop_test2.c"
> .text
> .p2align 4,,15
> .globl _foo
> .def _foo; .scl 2; .type 32; .endef
> _foo:
> pushl %ebp
> xorl %eax, %eax
> movl %esp, %ebp
> movl 8(%ebp), %edx
> testl %edx, %edx
> jmp L10
> .p2align 4,,7
> L12:
> incl %eax
> decl %edx
> L10:
> jne L12
> popl %ebp
> ret
>
> Looks like the compiler is explicitly testing the signed counter
> against zero on every iteration, but uses the status bits set as a
> byproduct of the loop counter decrement for the unsigned case. When I run these
> 2 functions repeatedly, the unsigned counter takes about 70% of
> the time of the signed counter. This roughly matches the ratio
> of the 3 loop body instructions in the unsigned case to the 4
> instructions in the signed case. This is not a rigorous test, and
> this may be specific to my architecture and my compiler settings
> (default + -O3), but it appears that there is some validity to
> making a general habit of using unsigned loop counters rather
> than signed. That being said, I'd be surprised if we have loops that
>
> (a) are dominated by the looping overhead rather than the operations
> in the loop body, and
> (b) iterate such a large number of times that they take up a non-negligible
> amount of the driver's CPU use.
>
> So it looks to me like this is a good policy to recommend, but not one
> that needs across-the-board adherence.
Awesome, thanks!
Luis
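
For anyone who wants to repeat the timing comparison Doug describes, here is
a minimal sketch of a harness. It is not the code Doug ran. The names
foo_signed, foo_unsigned, call_signed, call_unsigned, loop_fn and time_calls
are made up for illustration, and clock() is only a coarse measure of CPU
time.

```c
#include <assert.h>
#include <time.h>

/* Signed down-counter, same shape as loop_test1.c. */
int foo_signed(int limit)
{
        int i = 0;
        for (; limit > 0; limit--)
                i += 1;
        return i;
}

/* Unsigned down-counter, same shape as loop_test2.c. */
int foo_unsigned(unsigned limit)
{
        int i = 0;
        for (; limit > 0; limit--)
                i += 1;
        return i;
}

/* Hypothetical adapters so both variants share one function-pointer type. */
typedef int (*loop_fn)(int);
int call_signed(int limit)   { return foo_signed(limit); }
int call_unsigned(int limit) { return foo_unsigned((unsigned)limit); }

/*
 * Hypothetical helper: CPU seconds spent in `reps` calls of fn(limit).
 * The volatile sink keeps the results observable so the calls are not
 * discarded outright; it does not stop the compiler from simplifying
 * the loop inside foo_*() itself.
 */
double time_calls(loop_fn fn, int limit, int reps)
{
        volatile unsigned long sink = 0;
        clock_t start;

        assert(fn != NULL && reps >= 0);

        start = clock();
        for (int r = 0; r < reps; r++)
                sink += (unsigned long)fn(limit);

        return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

The idea is to compare time_calls(call_signed, limit, reps) against
time_calls(call_unsigned, limit, reps) for the same arguments. A newer
compiler at -O3 may reduce both loops to a closed form, in which case the
timings say nothing about the counter type, so it is worth checking the
output of gcc -S first. Note also that the two functions only agree for
non-negative limits: a negative int converted to unsigned becomes a very
large iteration count.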