All of lore.kernel.org
 help / color / mirror / Atom feed
From: Zachary Amsden <zach@vmware.com>
To: Linus Torvalds <torvalds@osdl.org>, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Remove some divide instructions
Date: Thu, 28 Oct 2004 17:47:54 -0700	[thread overview]
Message-ID: <4181933A.5000402@vmware.com> (raw)
In-Reply-To: <Pine.LNX.4.58.0410271731010.28839@ppc970.osdl.org>

Tried this.  I found a problem -- ide-cd.c calls a function to compute 
base, which was caught by sparse.  This is easy enough to workaround, 
but my adventures went further than expected..

I tested all powers of two and found gcc doesn't like to perform the 
optimization for 0x80000000 as a divisor.  Ok, worked around that.  Now 
Everything appears to work great, until I started using modules:

  MODPOST
*** Warning: "__udivdi3" [drivers/scsi/ipr.ko] undefined!
*** Warning: "__umoddi3" [drivers/scsi/ipr.ko] undefined!
*** Warning: "__udivdi3" [drivers/scsi/dpt_i2o.ko] undefined!
*** Warning: "__umoddi3" [drivers/scsi/dpt_i2o.ko] undefined!
*** Warning: "__udivdi3" [drivers/base/firmware_class.ko] undefined!
*** Warning: "__umoddi3" [drivers/base/firmware_class.ko] undefined!

That doesn't look good.  Trying to load the modules confirms that 
__uxxxdi3 is missing.  How did this happen?  If you look at do_div, it 
is clear that __udivdi3 should not be needed, since we will always hit 
the optimization path.  Nevertheless, gcc inserts an extraneous ".globl 
__udivdi3" in all output to the assembler from .c files which include 
asm/div64.h, even if the do_div function is never directly referenced 
and no instances of it appear in the assembler output (libcrc32c.c is 
the easiest to verify).  Apparently, this happens somewhere before the 
optimization phase, and the external reference never gets dropped after 
that.  Since div64.h is included from sched.h, that happens to be quite 
a few files.

#define do_div(n,base) ({ \
        unsigned long __mod; \
        if (unlikely(__builtin_constant_p(base) && !((base) & 
((base)-1)) && \
            (long)(base) > 0)) { \
                __mod = ((uint64_t)(n)) % ((unsigned long)(base));      \
                (n) = ((uint64_t)(n)) / ((unsigned long)(base)); \
        } else { \

The kernel proper is ok - since the optimization is done correctly and 
udivdi3 is never actually referenced, it gets dropped during the link 
phase.  Modules are not - the unresolved symbol stays there.

This leaves several options:

1) Forget the optimization altogether
2) Go back to the (base == 1) check
3) In the module post phase, strip extraneous unused external references 
from modules
4) Fixup module loading to work around the problem
5) Do an enumeration case for each possible constant divisor
6) Upgrade my silly old compiler and tools

This seems like a lot of work for a trivial optimization; for i386, 
perhaps #2 is the most appropriate - with a sufficiently new GCC, this 
optimization should be automatic for all architectures not hardcoding 
do_div as inline assembler.

Seems to have come full circle - the trivial extension turns out to have 
non-trivial side effects.  If only GCC were as easily extensible as 
sparse!  A __builtin_highest_one_bit() function would make it possible 
to use inline assembler without degenerating to individual cases for 
each bit.

Zachary Amsden
zach@vmware.com

Linus Torvalds wrote:

>On Wed, 27 Oct 2004, Linus Torvalds wrote:
>  
>
>>I could add a sparse check for "no side effects", if anybody cares (so 
>>that you could do
>>
>>	__builtin_warning(
>>		!__builtin_nosideeffects(base),
>>		"expression has side effects");
>>
>>in macros like these.. Sparse already has the logic internally..
>>    
>>
>
>Done. Except I called it __builtin_safe_p(). 
>
>The kernel sources already know about "__builtin_warning()" (and 
>pre-process it away on gcc), so if you have a new sparse setup (as of two 
>minutes ago ;), you can use this thing to check that arguments to macros 
>do not have side effects.
>
>Useful? You be the judge. But it was just a couple of lines in sparse, and
>doing so also made it obvious how to clean up __builtin_constant_p() a lot
>at the same time by just re-organizing things a bit.
>
>My inliner and statement simplificator isn't perfect, so inline functions
>sadly are not considered constant (or safe) even if they _do_ end up
>returning a constant value (or be safe internally).
>
>		Linus
>  
>


  reply	other threads:[~2004-10-29  0:54 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-10-27 16:14 [PATCH] Remove some divide instructions Zachary Amsden
2004-10-27 16:28 ` Linus Torvalds
2004-10-27 18:05   ` Zachary Amsden
2004-10-27 20:16   ` Zachary Amsden
2004-10-27 21:24     ` Linus Torvalds
2004-10-27 22:08   ` Thayne Harbaugh
2004-10-27 22:14   ` Zachary Amsden
2004-10-28  0:11     ` Linus Torvalds
2004-10-28  0:47       ` Linus Torvalds
2004-10-29  0:47         ` Zachary Amsden [this message]
2004-10-29  4:52           ` Linus Torvalds
2004-10-29 19:10             ` Geert Uytterhoeven
2004-10-28  0:59       ` Maciej W. Rozycki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4181933A.5000402@vmware.com \
    --to=zach@vmware.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.