public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* undefined reference to __udivdi3 (gcc-4.3)
@ 2008-05-04 10:59 Christian Kujau
  0 siblings, 0 replies; 18+ messages in thread
From: Christian Kujau @ 2008-05-04 10:59 UTC (permalink / raw)
  To: LKML

Hi,

while trying to compile current -git or 2.6.25.1 with gcc (Debian 4.3.0-4) 
4.3.1 20080501 (prerelease), the following error occurs during linking:

arch/x86/xen/built-in.o: In function `xen_timer_interrupt':
time.c:(.text+0x2992): undefined reference to `__udivdi3'
time.c:(.text+0x2a13): undefined reference to `__udivdi3'
make: *** [.tmp_vmlinux1] Error 1


Searching the archives brought up old issues with gcc-3.4 [0] and not so 
old posting from 01/2008 [1], but I could not find a solution (or a 
consense about (not to) include libgcc.a).

Full make.log and .config: http://nerdbynature.de/bits/2.6.25.1/xen/

Hints welcome :)

Thanks,
Christian.

[0] http://gcc.gnu.org/ml/gcc/2004-01/msg01975.html
[1] http://www.nabble.com/gcc-4.3:-Kernel-build-fails-td15135607.html
-- 
BOFH excuse #328:

Fiber optics caused gas main leak

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: undefined reference to __udivdi3 (gcc-4.3)
       [not found] <fa.QTbvQYXhEm5VNP5dvkl5JG7NHYQ@ifi.uio.no>
@ 2008-05-04 17:35 ` Robert Hancock
  2008-05-04 22:19   ` Segher Boessenkool
  0 siblings, 1 reply; 18+ messages in thread
From: Robert Hancock @ 2008-05-04 17:35 UTC (permalink / raw)
  To: Christian Kujau; +Cc: LKML

Christian Kujau wrote:
> Hi,
> 
> while trying to compile current -git or 2.6.25.1 with gcc (Debian 
> 4.3.0-4) 4.3.1 20080501 (prerelease), the following error occurs during 
> linking:
> 
> arch/x86/xen/built-in.o: In function `xen_timer_interrupt':
> time.c:(.text+0x2992): undefined reference to `__udivdi3'
> time.c:(.text+0x2a13): undefined reference to `__udivdi3'
> make: *** [.tmp_vmlinux1] Error 1
> 
> 
> Searching the archives brought up old issues with gcc-3.4 [0] and not so 
> old posting from 01/2008 [1], but I could not find a solution (or a 
> consense about (not to) include libgcc.a).
> 
> Full make.log and .config: http://nerdbynature.de/bits/2.6.25.1/xen/
> 
> Hints welcome :)
> 
> Thanks,
> Christian.
> 
> [0] http://gcc.gnu.org/ml/gcc/2004-01/msg01975.html
> [1] http://www.nabble.com/gcc-4.3:-Kernel-build-fails-td15135607.html

I assume it's one or both of these loops in arch/x86/xen/time.c 
do_stolen_accounting() that are being optimized into a divide which 
generates a libgcc call:

	while (stolen >= NS_PER_TICK) {
		ticks++;
		stolen -= NS_PER_TICK;
	}

or

	while (blocked >= NS_PER_TICK) {
		ticks++;
		blocked -= NS_PER_TICK;
	}

I seem to recall in one previous case we added some dummy assembly code 
to stop gcc from doing this. Not sure if that is a sustainable fix, though..

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: undefined reference to __udivdi3 (gcc-4.3)
  2008-05-04 17:35 ` undefined reference to __udivdi3 (gcc-4.3) Robert Hancock
@ 2008-05-04 22:19   ` Segher Boessenkool
  2008-05-07  9:29     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 18+ messages in thread
From: Segher Boessenkool @ 2008-05-04 22:19 UTC (permalink / raw)
  To: Robert Hancock; +Cc: Christian Kujau, LKML

> I assume it's one or both of these loops in arch/x86/xen/time.c 
> do_stolen_accounting() that are being optimized into a divide which 
> generates a libgcc call:
>
> 	while (stolen >= NS_PER_TICK) {
> 		ticks++;
> 		stolen -= NS_PER_TICK;
> 	}
>
> or
>
> 	while (blocked >= NS_PER_TICK) {
> 		ticks++;
> 		blocked -= NS_PER_TICK;
> 	}

That looks plausible.

> I seem to recall in one previous case we added some dummy assembly 
> code to stop gcc from doing this.

I think you refer to 38332cb9, "time: prevent the loop in
timespec_add_ns() from being optimised away".


         while(unlikely(ns >= NSEC_PER_SEC)) {
+               /* The following asm() prevents the compiler from
+                * optimising this loop into a modulo operation.  */
+               asm("" : "+r"(ns));
+
                 ns -= NSEC_PER_SEC;
                 a->tv_sec++;
         }


> Not sure if that is a sustainable fix, though..

It should be.  The asm() arg tells GCC that the asm() could modify
"ns" in some way, so GCC cannot optimise away the loop, since it
doesn't have the required info about the induction variable to do
that.


Segher


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: undefined reference to __udivdi3 (gcc-4.3)
  2008-05-04 22:19   ` Segher Boessenkool
@ 2008-05-07  9:29     ` Jeremy Fitzhardinge
  2008-05-08 15:16       ` [PATCH] common implementation of iterative div/mod Jeremy Fitzhardinge
  0 siblings, 1 reply; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-07  9:29 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Robert Hancock, Christian Kujau, LKML, Ingo Molnar

Segher Boessenkool wrote:
>> I assume it's one or both of these loops in arch/x86/xen/time.c 
>> do_stolen_accounting() that are being optimized into a divide which 
>> generates a libgcc call:
>>
>>     while (stolen >= NS_PER_TICK) {
>>         ticks++;
>>         stolen -= NS_PER_TICK;
>>     }
>>
>> or
>>
>>     while (blocked >= NS_PER_TICK) {
>>         ticks++;
>>         blocked -= NS_PER_TICK;
>>     }
>
> That looks plausible.

Yep.  Probably both.

>> Not sure if that is a sustainable fix, though..
>
> It should be.  The asm() arg tells GCC that the asm() could modify
> "ns" in some way, so GCC cannot optimise away the loop, since it
> doesn't have the required info about the induction variable to do
> that. 

Yep, it's guaranteed to work.  But it's an ugly hack to work around an 
over-enthusiastic compiler, and so is an inherent maintainability burden.

I think the correct fix here is to introduce an iter_div_rem() function 
which contains this hack, so we can avoid scattering it all over the 
place.  I'll cook up a patch.

    J

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] common implementation of iterative div/mod
  2008-05-07  9:29     ` Jeremy Fitzhardinge
@ 2008-05-08 15:16       ` Jeremy Fitzhardinge
  2008-05-08 20:26         ` Andrew Morton
                           ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-08 15:16 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Robert Hancock, Christian Kujau, LKML, Ingo Molnar,
	Thomas Gleixner, Andrew Morton, john stultz

We have a few instances of the open-coded iterative div/mod loop, used
when we don't expcet the dividend to be much bigger than the divisor.
Unfortunately modern gcc's have the tendency to strength "reduce" this
into a full mod operation, which isn't necessarily any faster, and
even if it were, doesn't exist if gcc implements it in libgcc.

The workaround is to put a dummy asm statement in the loop to prevent
gcc from performing the transformation.

This patch creates a single implementation of this loop, and uses it
to replace the open-coded versions I know about.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Robert Hancock <hancockr@shaw.ca>
---
 arch/x86/xen/time.c    |   13 +++----------
 include/linux/math64.h |    2 ++
 include/linux/time.h   |   11 ++---------
 lib/div64.c            |   23 +++++++++++++++++++++++
 4 files changed, 30 insertions(+), 19 deletions(-)

===================================================================
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -12,6 +12,7 @@
 #include <linux/clocksource.h>
 #include <linux/clockchips.h>
 #include <linux/kernel_stat.h>
+#include <linux/math64.h>
 
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
@@ -150,11 +151,7 @@ static void do_stolen_accounting(void)
 	if (stolen < 0)
 		stolen = 0;
 
-	ticks = 0;
-	while (stolen >= NS_PER_TICK) {
-		ticks++;
-		stolen -= NS_PER_TICK;
-	}
+	ticks = iter_div_u64_rem(stolen, NS_PER_TICK, &stolen);
 	__get_cpu_var(residual_stolen) = stolen;
 	account_steal_time(NULL, ticks);
 
@@ -166,11 +163,7 @@ static void do_stolen_accounting(void)
 	if (blocked < 0)
 		blocked = 0;
 
-	ticks = 0;
-	while (blocked >= NS_PER_TICK) {
-		ticks++;
-		blocked -= NS_PER_TICK;
-	}
+	ticks = iter_div_u64_rem(blocked, NS_PER_TICK, &blocked);
 	__get_cpu_var(residual_blocked) = blocked;
 	account_steal_time(idle_task(smp_processor_id()), ticks);
 }
===================================================================
--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -81,4 +81,6 @@ static inline s64 div_s64(s64 dividend, 
 }
 #endif
 
+unsigned iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder);
+
 #endif /* _LINUX_MATH64_H */
===================================================================
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -2,6 +2,7 @@
 #define _LINUX_TIME_H
 
 #include <linux/types.h>
+#include <linux/math64.h>
 
 #ifdef __KERNEL__
 # include <linux/cache.h>
@@ -172,15 +173,7 @@ extern struct timeval ns_to_timeval(cons
  */
 static inline void timespec_add_ns(struct timespec *a, u64 ns)
 {
-	ns += a->tv_nsec;
-	while(unlikely(ns >= NSEC_PER_SEC)) {
-		/* The following asm() prevents the compiler from
-		 * optimising this loop into a modulo operation.  */
-		asm("" : "+r"(ns));
-
-		ns -= NSEC_PER_SEC;
-		a->tv_sec++;
-	}
+	a->tv_sec += iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
 	a->tv_nsec = ns;
 }
 #endif /* __KERNEL__ */
===================================================================
--- a/lib/div64.c
+++ b/lib/div64.c
@@ -98,3 +98,26 @@ EXPORT_SYMBOL(div64_u64);
 #endif
 
 #endif /* BITS_PER_LONG == 32 */
+
+/*
+ * Iterative div/mod for use when dividend is not expected to be much
+ * bigger than divisor.
+ */
+unsigned iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
+{
+	unsigned ret = 0;
+
+	while(dividend >= divisor) {
+		/* The following asm() prevents the compiler from
+		   optimising this loop into a modulo operation.  */
+		asm("" : "+rm"(dividend));
+
+		dividend -= divisor;
+		ret++;
+	}
+
+	*remainder = dividend;
+
+	return ret;
+}
+EXPORT_SYMBOL(iter_div_u64_rem);



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] common implementation of iterative div/mod
  2008-05-08 15:16       ` [PATCH] common implementation of iterative div/mod Jeremy Fitzhardinge
@ 2008-05-08 20:26         ` Andrew Morton
  2008-05-08 22:00           ` Jeremy Fitzhardinge
  2008-05-08 20:52         ` Segher Boessenkool
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2008-05-08 20:26 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: segher, hancockr, lists, linux-kernel, mingo, tglx, johnstul

On Thu, 08 May 2008 16:16:41 +0100
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> We have a few instances of the open-coded iterative div/mod loop, used
> when we don't expcet the dividend to be much bigger than the divisor.
> Unfortunately modern gcc's have the tendency to strength "reduce" this
> into a full mod operation, which isn't necessarily any faster, and
> even if it were, doesn't exist if gcc implements it in libgcc.
> 
> The workaround is to put a dummy asm statement in the loop to prevent
> gcc from performing the transformation.
> 
> This patch creates a single implementation of this loop, and uses it
> to replace the open-coded versions I know about.

Fair enough.  I'll plan on feeding this into 2.6.26 soon.

>  #endif /* BITS_PER_LONG == 32 */
> +
> +/*
> + * Iterative div/mod for use when dividend is not expected to be much
> + * bigger than divisor.
> + */
> +unsigned iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
> +{
> +	unsigned ret = 0;
> +
> +	while(dividend >= divisor) {
> +		/* The following asm() prevents the compiler from
> +		   optimising this loop into a modulo operation.  */
> +		asm("" : "+rm"(dividend));
> +
> +		dividend -= divisor;
> +		ret++;
> +	}
> +
> +	*remainder = dividend;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(iter_div_u64_rem);
> 

I think it would be better to do s/unsigned/u32/ here.  It's cosmetic, but
all this sort of code is pretty formal about the sizes of the types which
it uses, and it sure needs to be.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] common implementation of iterative div/mod
  2008-05-08 15:16       ` [PATCH] common implementation of iterative div/mod Jeremy Fitzhardinge
  2008-05-08 20:26         ` Andrew Morton
@ 2008-05-08 20:52         ` Segher Boessenkool
  2008-05-08 21:57           ` Jeremy Fitzhardinge
  2008-05-09 11:45         ` Christian Kujau
  2008-05-14  6:46         ` Andrew Morton
  3 siblings, 1 reply; 18+ messages in thread
From: Segher Boessenkool @ 2008-05-08 20:52 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: LKML, Andrew Morton, Ingo Molnar, Christian Kujau, Robert Hancock,
	john stultz, Thomas Gleixner

> We have a few instances of the open-coded iterative div/mod loop, used
> when we don't expcet the dividend to be much bigger than the divisor.
> Unfortunately modern gcc's have the tendency to strength "reduce" this
> into a full mod operation, which isn't necessarily any faster, and
> even if it were, doesn't exist if gcc implements it in libgcc.
>
> The workaround is to put a dummy asm statement in the loop to prevent
> gcc from performing the transformation.

It's not a "dummy" asm, it actually does something: it tells the
compiler that it has to iterate the loop exactly as written, and
not do something else.  I.e., exactly the behaviour we want here.

> +	ticks = iter_div_u64_rem(blocked, NS_PER_TICK, &blocked);

What a terrible function name.

> static inline void timespec_add_ns(struct timespec *a, u64 ns)
> {
> -	ns += a->tv_nsec;
> -	while(unlikely(ns >= NSEC_PER_SEC)) {
> -		/* The following asm() prevents the compiler from
> -		 * optimising this loop into a modulo operation.  */
> -		asm("" : "+r"(ns));
> -
> -		ns -= NSEC_PER_SEC;
> -		a->tv_sec++;
> -	}
> +	a->tv_sec += iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
> 	a->tv_nsec = ns;
> }

...and now the "meat" of this function isn't inline anymore.  If we
cared about not doing a divide here, you'll have to explain why
taking this trivial loop out of line is a good idea.

> +unsigned iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
> +{
> +	unsigned ret = 0;
> +
> +	while(dividend >= divisor) {

You removed the unlikely() here.  Why?

> +		/* The following asm() prevents the compiler from
> +		   optimising this loop into a modulo operation.  */
> +		asm("" : "+rm"(dividend));

You changed "+r" to "+rm" here.  Why?  Also, "rm" is an x86-ism,
and this is generic code (it does work here, but why is it better
than "r"?)

> +EXPORT_SYMBOL(iter_div_u64_rem);

Does this need to be exported?


Can I suggest an alternative approach?  Define a macro (with a
good, descriptive name!) for just the asm("" : "+r"(x)), and use
that.  Much smaller patch, much clearer code, and doesn't result
in different (and worse) code generation, so it's a much safer
change.


Segher


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] common implementation of iterative div/mod
  2008-05-08 20:52         ` Segher Boessenkool
@ 2008-05-08 21:57           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-08 21:57 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: LKML, Andrew Morton, Ingo Molnar, Christian Kujau, Robert Hancock,
	john stultz, Thomas Gleixner

Segher Boessenkool wrote:
>> The workaround is to put a dummy asm statement in the loop to prevent
>> gcc from performing the transformation.
>
> It's not a "dummy" asm, it actually does something: it tells the
> compiler that it has to iterate the loop exactly as written, and
> not do something else.  I.e., exactly the behaviour we want here.

No, it has no function of its own.  It's bullying gcc into not 
performing an optimisation by giving the impression its doing something.

>> +    ticks = iter_div_u64_rem(blocked, NS_PER_TICK, &blocked);
>
> What a terrible function name.

It's consistent with the other functions defined here.  I agree it isn't 
pretty.  If you have a better suggestion, I'm all ears.

>> static inline void timespec_add_ns(struct timespec *a, u64 ns)
>> {
>> -    ns += a->tv_nsec;
>> -    while(unlikely(ns >= NSEC_PER_SEC)) {
>> -        /* The following asm() prevents the compiler from
>> -         * optimising this loop into a modulo operation.  */
>> -        asm("" : "+r"(ns));
>> -
>> -        ns -= NSEC_PER_SEC;
>> -        a->tv_sec++;
>> -    }
>> +    a->tv_sec += iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
>>     a->tv_nsec = ns;
>> }
>
> ...and now the "meat" of this function isn't inline anymore.  If we
> cared about not doing a divide here, you'll have to explain why
> taking this trivial loop out of line is a good idea.

On x86-32 it compiles to 26 instructions and 47 bytes of code.  
Admittedly it might be smaller inline - or on a 64-bit machine - but I 
seriously doubt its suffering a huge performance hit from being out of 
line.  These days the inline threshold is very small - probably under 10 
instructions.  A direct call/return is essentially free, since it can be 
trivially prefetched.

>> +unsigned iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
>> +{
>> +    unsigned ret = 0;
>> +
>> +    while(dividend >= divisor) {
>
> You removed the unlikely() here.  Why?

Because it didn't seem all that unlikely.  Besides, it makes not one bit 
of difference to the code generated by my compiler.

>> +        /* The following asm() prevents the compiler from
>> +           optimising this loop into a modulo operation.  */
>> +        asm("" : "+rm"(dividend));
>
> You changed "+r" to "+rm" here.  Why?  Also, "rm" is an x86-ism,
> and this is generic code (it does work here, but why is it better
> than "r"?)

"rm" isn't x86-specific.  I just wanted to give the compiler the freedom 
to put the value in either register or memory if it wanted to.

>> +EXPORT_SYMBOL(iter_div_u64_rem);
>
> Does this need to be exported?

Everything else in the file is exported.

> Can I suggest an alternative approach?  Define a macro (with a
> good, descriptive name!) for just the asm("" : "+r"(x)), and use
> that.  Much smaller patch, much clearer code, and doesn't result
> in different (and worse) code generation, so it's a much safer
> change.

Uh, could you suggest a name?  Something along the lines of 
prevent_gcc_from_strength_reducing_this_subtraction_loop_into_a_modulo_operation_thanks_oh_and_remember_to_use_it_in_all_the_right_places() 
springs to mind.

Rather than putting this unsightly (though with a smear of lipstick) 
hack into every open-coded iterative div-mod loop, we may as well 
implement it once and just look out for places which should be using it.

I don't think the "worse" code generation is much of an issue, since it 
isn't used anywhere performance critical, and moving the code out of 
line means there should be an overall reduction in code size.

    J

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] common implementation of iterative div/mod
  2008-05-08 20:26         ` Andrew Morton
@ 2008-05-08 22:00           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-08 22:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: segher, hancockr, lists, linux-kernel, mingo, tglx, johnstul

Andrew Morton wrote:
> On Thu, 08 May 2008 16:16:41 +0100
> Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>   
>> We have a few instances of the open-coded iterative div/mod loop, used
>> when we don't expcet the dividend to be much bigger than the divisor.
>>     
If you get a chance, could you fix the tpyo.

>> Unfortunately modern gcc's have the tendency to strength "reduce" this
>> into a full mod operation, which isn't necessarily any faster, and
>> even if it were, doesn't exist if gcc implements it in libgcc.
>>
>> The workaround is to put a dummy asm statement in the loop to prevent
>> gcc from performing the transformation.
>>
>> This patch creates a single implementation of this loop, and uses it
>> to replace the open-coded versions I know about.
>>     
>
> Fair enough.  I'll plan on feeding this into 2.6.26 soon.
>
>   
>>  #endif /* BITS_PER_LONG == 32 */
>> +
>> +/*
>> + * Iterative div/mod for use when dividend is not expected to be much
>> + * bigger than divisor.
>> + */
>> +unsigned iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
>> +{
>> +	unsigned ret = 0;
>> +
>> +	while(dividend >= divisor) {
>> +		/* The following asm() prevents the compiler from
>> +		   optimising this loop into a modulo operation.  */
>> +		asm("" : "+rm"(dividend));
>> +
>> +		dividend -= divisor;
>> +		ret++;
>> +	}
>> +
>> +	*remainder = dividend;
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(iter_div_u64_rem);
>>
>>     
>
> I think it would be better to do s/unsigned/u32/ here.  It's cosmetic, but
> all this sort of code is pretty formal about the sizes of the types which
> it uses, and it sure needs to be.
>   

OK.

    J


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] common implementation of iterative div/mod
  2008-05-08 15:16       ` [PATCH] common implementation of iterative div/mod Jeremy Fitzhardinge
  2008-05-08 20:26         ` Andrew Morton
  2008-05-08 20:52         ` Segher Boessenkool
@ 2008-05-09 11:45         ` Christian Kujau
  2008-05-14  6:46         ` Andrew Morton
  3 siblings, 0 replies; 18+ messages in thread
From: Christian Kujau @ 2008-05-09 11:45 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Segher Boessenkool, Robert Hancock, Christian Kujau, LKML,
	Ingo Molnar, Thomas Gleixner, Andrew Morton, john stultz

Hi Jeremy,

On Thu, May 8, 2008 17:16, Jeremy Fitzhardinge wrote:
> This patch creates a single implementation of this loop, and uses it
> to replace the open-coded versions I know about.

Thanks for the patch. Applying to 2.6.26-rc1 allowed me to compile with
gcc-4.3. Feel free to add a Tested-by :-)

Thanks,
Christian.
-- 
BOFH excuse #442:

Trojan horse ran out of hay


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] common implementation of iterative div/mod
  2008-05-08 15:16       ` [PATCH] common implementation of iterative div/mod Jeremy Fitzhardinge
                           ` (2 preceding siblings ...)
  2008-05-09 11:45         ` Christian Kujau
@ 2008-05-14  6:46         ` Andrew Morton
  2008-05-14  7:33           ` Jeremy Fitzhardinge
  3 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2008-05-14  6:46 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Segher Boessenkool, Robert Hancock, Christian Kujau, LKML,
	Ingo Molnar, Thomas Gleixner, john stultz

On Thu, 08 May 2008 16:16:41 +0100 Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> We have a few instances of the open-coded iterative div/mod loop, used
> when we don't expcet the dividend to be much bigger than the divisor.
> Unfortunately modern gcc's have the tendency to strength "reduce" this
> into a full mod operation, which isn't necessarily any faster, and
> even if it were, doesn't exist if gcc implements it in libgcc.
> 
> The workaround is to put a dummy asm statement in the loop to prevent
> gcc from performing the transformation.
> 
> This patch creates a single implementation of this loop, and uses it
> to replace the open-coded versions I know about.

Believe it or not, this patch causes one of my test machines to fail to
find its disk.

good dmesg: http://userweb.kernel.org/~akpm/dmesg-t61p.txt
bad dmesg: http://userweb.kernel.org/~akpm/dmesg-t61p-dead.txt
config: http://userweb.kernel.org/~akpm/config-t61p.txt

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] common implementation of iterative div/mod
  2008-05-14  6:46         ` Andrew Morton
@ 2008-05-14  7:33           ` Jeremy Fitzhardinge
  2008-05-14  8:33             ` Andi Kleen
  0 siblings, 1 reply; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-14  7:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Segher Boessenkool, Robert Hancock, Christian Kujau, LKML,
	Ingo Molnar, Thomas Gleixner, john stultz, Andi Kleen

Andrew Morton wrote:
> On Thu, 08 May 2008 16:16:41 +0100 Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>   
>> We have a few instances of the open-coded iterative div/mod loop, used
>> when we don't expcet the dividend to be much bigger than the divisor.
>> Unfortunately modern gcc's have the tendency to strength "reduce" this
>> into a full mod operation, which isn't necessarily any faster, and
>> even if it were, doesn't exist if gcc implements it in libgcc.
>>
>> The workaround is to put a dummy asm statement in the loop to prevent
>> gcc from performing the transformation.
>>
>> This patch creates a single implementation of this loop, and uses it
>> to replace the open-coded versions I know about.
>>     
>
> Believe it or not, this patch causes one of my test machines to fail to
> find its disk.
>
> good dmesg: http://userweb.kernel.org/~akpm/dmesg-t61p.txt
> bad dmesg: http://userweb.kernel.org/~akpm/dmesg-t61p-dead.txt
> config: http://userweb.kernel.org/~akpm/config-t61p.txt

Uh, OK.  Do your other test machines work with it?  Are there any 
relevant config differences (x86-64 vs 32?).

Hm, it's not that it can't find its disk, I think it's this:

init[1]: segfault at ffffffffff7008d2 ip ffffffffff7008d2 sp 7fff86e67488 error 14
init[1]: segfault at ffffffffff7008d2 ip 311ac07628 sp 7fff86e66cb0 error 4 in libgcc_s-4.1.2-20070925.so.1[311ac00000+d000]


Is it that the vsyscall page is trying to call into the kernel?

    notrace static noinline int do_realtime(struct timespec *ts)
    {
    	unsigned long seq, ns;
    	do {
    		seq = read_seqbegin(&gtod->lock);
    		ts->tv_sec = gtod->wall_time_sec;
    		ts->tv_nsec = gtod->wall_time_nsec;
    		ns = vgetns();
    	} while (unlikely(read_seqretry(&gtod->lock, seq)));
    	timespec_add_ns(ts, ns);
    	return 0;
    }
      

timespec_add_ns() used to be entirely inlined, but now it contains the 
call to iter_div_u64_rem().

arch/x86/vdso/vclock_gettime.c has its own copies of other kernel 
functions which it can't directly call.  We could add 
timespec_add_ns()/iter_div_u64_rem() to that list, though its pretty 
hacky.  Could we link lib/div64.o into the vdso?

    J

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] common implementation of iterative div/mod
  2008-05-14  7:33           ` Jeremy Fitzhardinge
@ 2008-05-14  8:33             ` Andi Kleen
  2008-05-14  9:55               ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2008-05-14  8:33 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Andrew Morton, Segher Boessenkool, Robert Hancock,
	Christian Kujau, LKML, Ingo Molnar, Thomas Gleixner, john stultz

Jeremy Fitzhardinge wrote:

> 
> arch/x86/vdso/vclock_gettime.c has its own copies of other kernel
> functions which it can't directly call.  We could add
> timespec_add_ns()/iter_div_u64_rem() to that list, though its pretty
> hacky.  Could we link lib/div64.o into the vdso?

You would need to annotate it and have a separate object file for the
different sections. Also it would need to be compiled PIC.

Inlining is better.

-Andi


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] common implementation of iterative div/mod
  2008-05-14  8:33             ` Andi Kleen
@ 2008-05-14  9:55               ` Jeremy Fitzhardinge
  2008-05-14 10:50                 ` Andi Kleen
  0 siblings, 1 reply; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-14  9:55 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Segher Boessenkool, Robert Hancock,
	Christian Kujau, LKML, Ingo Molnar, Thomas Gleixner, john stultz

Andi Kleen wrote:
> You would need to annotate it and have a separate object file for the
> different sections. Also it would need to be compiled PIC.
>
> Inlining is better.

BTW, I'm seeing the memcpy() in __vdso_gettimeofday() not being inlined.

    J

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] common implementation of iterative div/mod
  2008-05-14  9:55               ` Jeremy Fitzhardinge
@ 2008-05-14 10:50                 ` Andi Kleen
  2008-05-14 10:52                   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2008-05-14 10:50 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Andrew Morton, Segher Boessenkool, Robert Hancock,
	Christian Kujau, LKML, Ingo Molnar, Thomas Gleixner, john stultz

Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
>> You would need to annotate it and have a separate object file for the
>> different sections. Also it would need to be compiled PIC.
>>
>> Inlining is better.
> 
> BTW, I'm seeing the memcpy() in __vdso_gettimeofday() not being inlined.

Hmm works here. What compiler do you use? Normally gcc should 
recognize the memcpy is just two constant stores and always inline even with -Os.

nm --dynamic arch/x86/vdso/vdso.so
0000000000000000 A LINUX_2.6
ffffffffff7007e0 T __vdso_clock_gettime
ffffffffff700820 T __vdso_getcpu
ffffffffff700750 T __vdso_gettimeofday
ffffffffff7007e0 W clock_gettime
ffffffffff700820 W getcpu
ffffffffff700750 W gettimeofday

Anyways if your compiler or config cannot get that right it would need
to switch to __inline_memcpy(), but that would be slower or explicit
copying field by field.

If it's a common problem we could also implement a build time check,
but normally such problems should be already caught in code review.

-Andi


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] common implementation of iterative div/mod
  2008-05-14 10:50                 ` Andi Kleen
@ 2008-05-14 10:52                   ` Jeremy Fitzhardinge
  2008-05-14 11:21                     ` Andi Kleen
  0 siblings, 1 reply; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-14 10:52 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Segher Boessenkool, Robert Hancock,
	Christian Kujau, LKML, Ingo Molnar, Thomas Gleixner, john stultz

Andi Kleen wrote:
> Jeremy Fitzhardinge wrote:
>   
>> Andi Kleen wrote:
>>     
>>> You would need to annotate it and have a separate object file for the
>>> different sections. Also it would need to be compiled PIC.
>>>
>>> Inlining is better.
>>>       
>> BTW, I'm seeing the memcpy() in __vdso_gettimeofday() not being inlined.
>>     
>
> Hmm works here. What compiler do you use? Normally gcc should 
> recognize the memcpy is just two constant stores and always inline even with -Os.
>   

It's:
jeremy@cosworth:~/hg/xen/paravirt/linux-x86_64$ gcc -v
Reading specs from /usr/lib/gcc/x86_64-linux/3.4.4/specs
Configured with: ../src/configure -v 
--enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr 
--libexecdir=/usr/lib --with-gxx-include-dir=/usr/include/c++/3.4 
--enable-shared --with-system-zlib --enable-nls 
--without-included-gettext --program-suffix=-3.4 --enable-__cxa_atexit 
--enable-libstdcxx-allocator=mt --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-java-gc=boehm --enable-java-awt=gtk 
--disable-werror x86_64-linux
Thread model: posix
gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)

I think this is still a supported compiler, isn't it?

    J

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] common implementation of iterative div/mod
  2008-05-14 10:52                   ` Jeremy Fitzhardinge
@ 2008-05-14 11:21                     ` Andi Kleen
  2008-05-14 12:58                       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2008-05-14 11:21 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Andrew Morton, Segher Boessenkool, Robert Hancock,
	Christian Kujau, LKML, Ingo Molnar, Thomas Gleixner, john stultz

[-- Attachment #1: Type: text/plain, Size: 1026 bytes --]


> It's:
> jeremy@cosworth:~/hg/xen/paravirt/linux-x86_64$ gcc -v
> Reading specs from /usr/lib/gcc/x86_64-linux/3.4.4/specs
> Configured with: ../src/configure -v
> --enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr
> --libexecdir=/usr/lib --with-gxx-include-dir=/usr/include/c++/3.4
> --enable-shared --with-system-zlib --enable-nls
> --without-included-gettext --program-suffix=-3.4 --enable-__cxa_atexit
> --enable-libstdcxx-allocator=mt --enable-clocale=gnu
> --enable-libstdcxx-debug --enable-java-gc=boehm --enable-java-awt=gtk
> --disable-werror x86_64-linux
> Thread model: posix
> gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)
> 
> I think this is still a supported compiler, isn't it?

Yes. Here's a patch. You probably didn't see problems because you either
don't have a glibc that supports the vdso or none of your programs
gets the timezone from gettimeofday() [that is very obscure obsolete
functionality anyways, normally it should be gotten from the disk locales]

-Andi





[-- Attachment #2: vdso-explicit-copy --]
[-- Type: text/plain, Size: 955 bytes --]

Use explicit copy in vdso_gettimeofday()

Jeremy's gcc 3.4 seems to be unable to inline a 8 byte memcpy. But
the vdso doesn't support external references. Copy the structure
members of struct timezone explicitely instead.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

Index: linux/arch/x86/vdso/vclock_gettime.c
===================================================================
--- linux.orig/arch/x86/vdso/vclock_gettime.c
+++ linux/arch/x86/vdso/vclock_gettime.c
@@ -106,9 +106,9 @@ int __vdso_gettimeofday(struct timeval *
 		do_realtime((struct timespec *)tv);
 		tv->tv_usec /= 1000;
 		if (tz != NULL) {
-			/* This relies on gcc inlining the memcpy. We'll notice
-			   if it ever fails to do so. */
-			memcpy(tz, &gtod->sys_tz, sizeof(struct timezone));
+			/* Don't use memcpy. Some old compilers fail to inline it */
+			tz->tz_minuteswest = gtod->sys_tz.tz_minuteswest;
+			tz->tz_dsttime = gtod->sys_tz.tz_dsttime;
 		}
 		return 0;
 	}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] common implementation of iterative div/mod
  2008-05-14 11:21                     ` Andi Kleen
@ 2008-05-14 12:58                       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-14 12:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Segher Boessenkool, Robert Hancock,
	Christian Kujau, LKML, Ingo Molnar, Thomas Gleixner, john stultz

Andi Kleen wrote:
>> It's:
>> jeremy@cosworth:~/hg/xen/paravirt/linux-x86_64$ gcc -v
>> Reading specs from /usr/lib/gcc/x86_64-linux/3.4.4/specs
>> Configured with: ../src/configure -v
>> --enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr
>> --libexecdir=/usr/lib --with-gxx-include-dir=/usr/include/c++/3.4
>> --enable-shared --with-system-zlib --enable-nls
>> --without-included-gettext --program-suffix=-3.4 --enable-__cxa_atexit
>> --enable-libstdcxx-allocator=mt --enable-clocale=gnu
>> --enable-libstdcxx-debug --enable-java-gc=boehm --enable-java-awt=gtk
>> --disable-werror x86_64-linux
>> Thread model: posix
>> gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)
>>
>> I think this is still a supported compiler, isn't it?
>>     
>
> Yes. Here's a patch. You probably didn't see problems because you either
> don't have a glibc that supports the vdso or none of your programs
> gets the timezone from gettimeofday() [that is very obscure obsolete
> functionality anyways, normally it should be gotten from the disk locales]
>   

Patch looks fine to me.  I just noticed the call to memcpy by 
inspection; I haven't tried to run this.

    J

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2008-05-14 12:59 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <fa.QTbvQYXhEm5VNP5dvkl5JG7NHYQ@ifi.uio.no>
2008-05-04 17:35 ` undefined reference to __udivdi3 (gcc-4.3) Robert Hancock
2008-05-04 22:19   ` Segher Boessenkool
2008-05-07  9:29     ` Jeremy Fitzhardinge
2008-05-08 15:16       ` [PATCH] common implementation of iterative div/mod Jeremy Fitzhardinge
2008-05-08 20:26         ` Andrew Morton
2008-05-08 22:00           ` Jeremy Fitzhardinge
2008-05-08 20:52         ` Segher Boessenkool
2008-05-08 21:57           ` Jeremy Fitzhardinge
2008-05-09 11:45         ` Christian Kujau
2008-05-14  6:46         ` Andrew Morton
2008-05-14  7:33           ` Jeremy Fitzhardinge
2008-05-14  8:33             ` Andi Kleen
2008-05-14  9:55               ` Jeremy Fitzhardinge
2008-05-14 10:50                 ` Andi Kleen
2008-05-14 10:52                   ` Jeremy Fitzhardinge
2008-05-14 11:21                     ` Andi Kleen
2008-05-14 12:58                       ` Jeremy Fitzhardinge
2008-05-04 10:59 undefined reference to __udivdi3 (gcc-4.3) Christian Kujau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox