* [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too)
@ 2010-08-19 2:24 Stephen Boyd
2010-08-19 2:24 ` [PATCH 1/3] [ARM] Translate delay.S into (mostly) C Stephen Boyd
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Stephen Boyd @ 2010-08-19 2:24 UTC (permalink / raw)
To: linux-arm-kernel
These patches are another attempt at fixing the udelay()
issue pointed out on arm-lkml[1][2]. A quick recap: some SMP
machines can scale their CPU frequencies independent of one
another. loops_per_jiffy is calibrated globally and used in
__const_udelay(). If one CPU is running faster than what the
loops_per_jiffy is calculated (or scaled) for, udelay() will
be incorrect and not wait long enough (or too long). A similar
problem occurs if the cpu frequency is scaled during a udelay()
call.
We could fix this issue a couple ways, wholesale replacement
of __udelay() and __const_udelay() (see [2] for that approach),
or replacement of __delay() (this series). Option 1 can fail if
anybody uses udelay() before memory is mapped and also duplicates
most of the code in asm/delay.h. It also needs to hardcode the
timer tick frequency, which can sometimes be inaccurate. The
benefit is that loops_per_jiffy stays the same and thus BogoMIPS
is unchanged. Option 2 can't fail since the __delay() loop is
replaced after memory is mapped in, but it suffers from a low
BogoMIPS when timers are clocked slowly. It also more accurately
calculates the timer tick frequency through the use of
calibrate_delay_direct().
-- Reference --
[1] http://article.gmane.org/gmane.linux.kernel/977567
[2] http://article.gmane.org/gmane.linux.ports.arm.kernel/78496
Stephen Boyd (3):
[ARM] Translate delay.S into (mostly) C
[ARM] Allow machines to override __delay()
[ARM] Implement a timer based __delay() loop
arch/arm/include/asm/delay.h | 5 ++-
arch/arm/kernel/armksyms.c | 4 --
arch/arm/lib/delay.S | 65 ------------------------------
arch/arm/lib/delay.c | 90 ++++++++++++++++++++++++++++++++++++++++++
4 files changed, 94 insertions(+), 70 deletions(-)
delete mode 100644 arch/arm/lib/delay.S
create mode 100644 arch/arm/lib/delay.c
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C
2010-08-19 2:24 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd
@ 2010-08-19 2:24 ` Stephen Boyd
2010-08-19 2:24 ` [PATCH 2/3] [ARM] Allow machines to override __delay() Stephen Boyd
2010-08-19 2:24 ` [PATCH 3/3] [ARM] Implement a timer based __delay() loop Stephen Boyd
2 siblings, 0 replies; 16+ messages in thread
From: Stephen Boyd @ 2010-08-19 2:24 UTC (permalink / raw)
To: linux-arm-kernel
In the next patch we're going to allow machines to override the
__delay() implementation at runtime so they can implement a timer
based __delay() routine. It's easier to do this using C, so lets
write udelay and friends in C.
We lose the #if 0 code, which according to Russell is used "to
make the delay loop more stable and predictable on older CPUs"
(see http://article.gmane.org/gmane.linux.kernel/888867 for more
info). We shouldn't be too worried though, since the next patch
adds functionality to allow a machine to set the __delay() loop
themselves, therefore allowing machines to resurrect the
commented out code if they need it.
bloat-o-meter shows an increase of 12 bytes. Further inspection
of the assembly shows GCC copying the loops_per_jiffy pointer and
the magic HZ value to the ends of __const_udelay() and _delay()
thus contributing an extra 4 and 8 bytes of data to each
function. These two values weren't taken into account in the
delay.S version since they weren't part of the function in nm's
eyes. This means we only really gained an extra 4 bytes due to
GCC's decision to duplicate the loops_per_jiffy pointer in
__const_udelay.
$ scripts/bloat-o-meter vmlinux.orig vmlinux.new
add/remove: 0/0 grow/shrink: 2/0 up/down: 12/0 (12)
function old new delta
__udelay 48 56 +8
__const_udelay 40 44 +4
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
So maybe we can make the magic HZ value into a #define? UDELAY_SCALAR?
UDELAY_MAGIC?
arch/arm/include/asm/delay.h | 2 +-
arch/arm/kernel/armksyms.c | 4 --
arch/arm/lib/delay.S | 65 ------------------------------------------
arch/arm/lib/delay.c | 57 ++++++++++++++++++++++++++++++++++++
4 files changed, 58 insertions(+), 70 deletions(-)
delete mode 100644 arch/arm/lib/delay.S
create mode 100644 arch/arm/lib/delay.c
diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index b2deda1..ccc5ed5 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -8,7 +8,7 @@
#include <asm/param.h> /* HZ */
-extern void __delay(int loops);
+extern void __delay(unsigned long loops);
/*
* This function intentionally does not exist; if you see references to
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 8214bfe..259e549 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -52,10 +52,6 @@ extern void fpundefinstr(void);
EXPORT_SYMBOL(__backtrace);
- /* platform dependent support */
-EXPORT_SYMBOL(__udelay);
-EXPORT_SYMBOL(__const_udelay);
-
/* networking */
EXPORT_SYMBOL(csum_partial);
EXPORT_SYMBOL(csum_partial_copy_from_user);
diff --git a/arch/arm/lib/delay.S b/arch/arm/lib/delay.S
deleted file mode 100644
index 8d6a876..0000000
--- a/arch/arm/lib/delay.S
+++ /dev/null
@@ -1,65 +0,0 @@
-/*
- * linux/arch/arm/lib/delay.S
- *
- * Copyright (C) 1995, 1996 Russell King
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-#include <linux/linkage.h>
-#include <asm/assembler.h>
-#include <asm/param.h>
- .text
-
-.LC0: .word loops_per_jiffy
-.LC1: .word (2199023*HZ)>>11
-
-/*
- * r0 <= 2000
- * lpj <= 0x01ffffff (max. 3355 bogomips)
- * HZ <= 1000
- */
-
-ENTRY(__udelay)
- ldr r2, .LC1
- mul r0, r2, r0
-ENTRY(__const_udelay) @ 0 <= r0 <= 0x7fffff06
- ldr r2, .LC0
- ldr r2, [r2] @ max = 0x01ffffff
- mov r0, r0, lsr #14 @ max = 0x0001ffff
- mov r2, r2, lsr #10 @ max = 0x00007fff
- mul r0, r2, r0 @ max = 2^32-1
- movs r0, r0, lsr #6
- moveq pc, lr
-
-/*
- * loops = r0 * HZ * loops_per_jiffy / 1000000
- *
- * Oh, if only we had a cycle counter...
- */
-
-@ Delay routine
-ENTRY(__delay)
- subs r0, r0, #1
-#if 0
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
-#endif
- bhi __delay
- mov pc, lr
-ENDPROC(__udelay)
-ENDPROC(__const_udelay)
-ENDPROC(__delay)
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
new file mode 100644
index 0000000..5ee0adc
--- /dev/null
+++ b/arch/arm/lib/delay.c
@@ -0,0 +1,57 @@
+/*
+ * Originally from linux/arch/arm/lib/delay.S
+ *
+ * Copyright (C) 1995, 1996 Russell King
+ * Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/delay.h>
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ *
+ * Oh, if only we had a cycle counter...
+ */
+void __delay(unsigned long loops)
+{
+ asm volatile(
+ "1: subs %0, %0, #1 \n"
+ " bhi 1b \n"
+ : /* No output */
+ : "r" (loops)
+ );
+}
+EXPORT_SYMBOL(__delay);
+
+/*
+ * 0 <= xloops <= 0x7fffff06
+ * loops_per_jiffy <= 0x01ffffff (max. 3355 bogomips)
+ */
+void __const_udelay(unsigned long xloops)
+{
+ unsigned long lpj;
+ unsigned long loops;
+
+ xloops >>= 14; /* max = 0x01ffffff */
+ lpj = loops_per_jiffy >> 10; /* max = 0x0001ffff */
+ loops = lpj * xloops; /* max = 0x00007fff */
+ loops >>= 6; /* max = 2^32-1 */
+
+ if (loops)
+ __delay(loops);
+}
+EXPORT_SYMBOL(__const_udelay);
+
+/*
+ * usecs <= 2000
+ * HZ <= 1000
+ */
+void __udelay(unsigned long usecs)
+{
+ __const_udelay(usecs * ((2199023*HZ)>>11));
+}
+EXPORT_SYMBOL(__udelay);
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/3] [ARM] Allow machines to override __delay()
2010-08-19 2:24 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd
2010-08-19 2:24 ` [PATCH 1/3] [ARM] Translate delay.S into (mostly) C Stephen Boyd
@ 2010-08-19 2:24 ` Stephen Boyd
2010-08-19 2:24 ` [PATCH 3/3] [ARM] Implement a timer based __delay() loop Stephen Boyd
2 siblings, 0 replies; 16+ messages in thread
From: Stephen Boyd @ 2010-08-19 2:24 UTC (permalink / raw)
To: linux-arm-kernel
Some machines want to implement their own __delay() routine based
on fixed timers. Expose functionality to set the __delay()
routine at runtime. This should allow two machines with different
__delay() routines to happily co-exist within the same kernel
with minimal overhead.
Russell expressed concern that using a timer based __delay()
would cause problems where an iomapped device isn't mapped in
before a delay call was made (see
http://article.gmane.org/gmane.linux.ports.arm.kernel/78543 for
more info). We can sidestep that issue with this approach since
the __delay() routine _should_ only be pointed to a timer based
delay once the timer has been properly mapped. Up until that
point __delay() and udelay() will use delay_loop() which is
always safe to call.
This patch is inspired by x86's delay.c
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
arch/arm/include/asm/delay.h | 2 ++
arch/arm/lib/delay.c | 21 ++++++++++++++++++---
2 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index ccc5ed5..7c732b5 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -40,5 +40,7 @@ extern void __const_udelay(unsigned long);
__const_udelay((n) * ((2199023U*HZ)>>11))) : \
__udelay(n))
+extern void set_delay_fn(void (*fn)(unsigned long));
+
#endif /* defined(_ARM_DELAY_H) */
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index 5ee0adc..b307fcc 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -3,6 +3,8 @@
*
* Copyright (C) 1995, 1996 Russell King
* Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ * Copyright (C) 1993 Linus Torvalds
+ * Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
@@ -12,11 +14,9 @@
#include <linux/delay.h>
/*
- * loops = usecs * HZ * loops_per_jiffy / 1000000
- *
* Oh, if only we had a cycle counter...
*/
-void __delay(unsigned long loops)
+void delay_loop(unsigned long loops)
{
asm volatile(
"1: subs %0, %0, #1 \n"
@@ -25,6 +25,21 @@ void __delay(unsigned long loops)
: "r" (loops)
);
}
+
+static void (*delay_fn)(unsigned long) = delay_loop;
+
+void set_delay_fn(void (*fn)(unsigned long))
+{
+ delay_fn = fn;
+}
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ */
+void __delay(unsigned long loops)
+{
+ delay_fn(loops);
+}
EXPORT_SYMBOL(__delay);
/*
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/3] [ARM] Implement a timer based __delay() loop
2010-08-19 2:24 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd
2010-08-19 2:24 ` [PATCH 1/3] [ARM] Translate delay.S into (mostly) C Stephen Boyd
2010-08-19 2:24 ` [PATCH 2/3] [ARM] Allow machines to override __delay() Stephen Boyd
@ 2010-08-19 2:24 ` Stephen Boyd
2 siblings, 0 replies; 16+ messages in thread
From: Stephen Boyd @ 2010-08-19 2:24 UTC (permalink / raw)
To: linux-arm-kernel
udelay() can be incorrect on SMP machines that scale their CPU
frequencies independently of one another (as pointed out here
http://article.gmane.org/gmane.linux.kernel/977567). The delay
loop can either be too fast or too slow depending on which CPU the
loops_per_jiffy counter is calibrated on and which CPU the delay
loop is running on. udelay() can also be incorrect if the
CPU frequency switches during the __delay() loop, causing the loop
to either terminate too early, or too late.
We'd rather not fix udelay() by forcing it to run on one CPU or
take the penalty of a rather large loops_per_jiffy being used in
udelay() when the CPU is actually running slower. Instead we
solve the problem by making __delay() into a timer based loop
which should be unaffected by CPU frequency scaling. Therefore,
implement a timer based __delay() loop which can be used in place
of the current register based __delay() if a machine so chooses.
The kernel is already prepared for a timer based approach
(evident by the read_current_timer() function). If an arch
implements read_current_timer(), calibrate_delay() will use a
timer based function, calibrate_delay_direct(), to calculate
loops_per_jiffy (in which case loops_per_jiffy should really be
renamed to timer_ticks_per_jiffy). Since the loops_per_jiffy will
be based on timer ticks, __delay() should be implemented as a
loop around read_current_timer().
Doing this makes the expensive loops_per_jiffy calculation go
away (saving ~150ms on boot time on my machine) and fixes
udelay() by making it safe in the face of independently scaling
CPUs. The only prerequisite is that read_current_timer() is
monotonically increasing across calls (and doesn't overflow
within ~2000us).
There is a downside to this approach though. BogoMIPS is no
longer "accurate" in that it reflects the BogoMIPS of the timer
and not the CPU. On most SoC's the timer isn't running anywhere
near as fast as the CPU so BogoMIPS will be ridiculously low (my
timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my
CPU's 800). This shouldn't be too much of a concern though since
BogoMIPS are bogus anyway (hence the name).
This loop is pretty much a copy of AVR's version made more generic.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
arch/arm/include/asm/delay.h | 1 +
arch/arm/lib/delay.c | 18 ++++++++++++++++++
2 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index 7c732b5..5c6b9a3 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long);
__udelay(n))
extern void set_delay_fn(void (*fn)(unsigned long));
+extern void read_current_timer_delay_loop(unsigned long loops);
#endif /* defined(_ARM_DELAY_H) */
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index b307fcc..59fdf64 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -5,6 +5,7 @@
* Copyright (c) 2010, Code Aurora Forum. All rights reserved.
* Copyright (C) 1993 Linus Torvalds
* Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz>
+ * Copyright (C) 2005-2006 Atmel Corporation
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
@@ -12,6 +13,7 @@
*/
#include <linux/module.h>
#include <linux/delay.h>
+#include <linux/timex.h>
/*
* Oh, if only we had a cycle counter...
@@ -26,6 +28,22 @@ void delay_loop(unsigned long loops)
);
}
+#ifdef ARCH_HAS_READ_CURRENT_TIMER
+/*
+ * Assuming read_current_timer() is monotonically increasing
+ * across calls.
+ */
+void read_current_timer_delay_loop(unsigned long loops)
+{
+ unsigned long bclock, now;
+
+ read_current_timer(&bclock);
+ do {
+ read_current_timer(&now);
+ } while ((now - bclock) < loops);
+}
+#endif
+
static void (*delay_fn)(unsigned long) = delay_loop;
void set_delay_fn(void (*fn)(unsigned long))
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C
2010-09-07 18:23 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd
@ 2010-09-07 18:23 ` Stephen Boyd
0 siblings, 0 replies; 16+ messages in thread
From: Stephen Boyd @ 2010-09-07 18:23 UTC (permalink / raw)
To: linux-arm-kernel
In the next patch we're going to allow machines to override the
__delay() implementation at runtime so they can implement a timer
based __delay() routine. It's easier to do this using C, so lets
write udelay and friends in C.
We lose the #if 0 code, which according to Russell is used "to
make the delay loop more stable and predictable on older CPUs"
(see http://article.gmane.org/gmane.linux.kernel/888867 for more
info). We shouldn't be too worried though, since the next patch
adds functionality to allow a machine to set the __delay() loop
themselves, therefore allowing machines to resurrect the
commented out code if they need it.
bloat-o-meter shows an increase of 12 bytes. Further inspection
of the assembly shows GCC copying the loops_per_jiffy pointer and
the magic HZ value to the ends of __const_udelay() and _delay()
thus contributing an extra 4 and 8 bytes of data to each
function. These two values weren't taken into account in the
delay.S version since they weren't part of the function in nm's
eyes. This means we only really gained an extra 4 bytes due to
GCC's decision to duplicate the loops_per_jiffy pointer in
__const_udelay.
$ scripts/bloat-o-meter vmlinux.orig vmlinux.new
add/remove: 0/0 grow/shrink: 2/0 up/down: 12/0 (12)
function old new delta
__udelay 48 56 +8
__const_udelay 40 44 +4
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
So maybe we can make the magic HZ value into a #define? UDELAY_SCALAR?
UDELAY_MAGIC?
arch/arm/include/asm/delay.h | 2 +-
arch/arm/kernel/armksyms.c | 4 --
arch/arm/lib/delay.S | 65 ------------------------------------------
arch/arm/lib/delay.c | 57 ++++++++++++++++++++++++++++++++++++
4 files changed, 58 insertions(+), 70 deletions(-)
delete mode 100644 arch/arm/lib/delay.S
create mode 100644 arch/arm/lib/delay.c
diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index b2deda1..ccc5ed5 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -8,7 +8,7 @@
#include <asm/param.h> /* HZ */
-extern void __delay(int loops);
+extern void __delay(unsigned long loops);
/*
* This function intentionally does not exist; if you see references to
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 8214bfe..259e549 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -52,10 +52,6 @@ extern void fpundefinstr(void);
EXPORT_SYMBOL(__backtrace);
- /* platform dependent support */
-EXPORT_SYMBOL(__udelay);
-EXPORT_SYMBOL(__const_udelay);
-
/* networking */
EXPORT_SYMBOL(csum_partial);
EXPORT_SYMBOL(csum_partial_copy_from_user);
diff --git a/arch/arm/lib/delay.S b/arch/arm/lib/delay.S
deleted file mode 100644
index 8d6a876..0000000
--- a/arch/arm/lib/delay.S
+++ /dev/null
@@ -1,65 +0,0 @@
-/*
- * linux/arch/arm/lib/delay.S
- *
- * Copyright (C) 1995, 1996 Russell King
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-#include <linux/linkage.h>
-#include <asm/assembler.h>
-#include <asm/param.h>
- .text
-
-.LC0: .word loops_per_jiffy
-.LC1: .word (2199023*HZ)>>11
-
-/*
- * r0 <= 2000
- * lpj <= 0x01ffffff (max. 3355 bogomips)
- * HZ <= 1000
- */
-
-ENTRY(__udelay)
- ldr r2, .LC1
- mul r0, r2, r0
-ENTRY(__const_udelay) @ 0 <= r0 <= 0x7fffff06
- ldr r2, .LC0
- ldr r2, [r2] @ max = 0x01ffffff
- mov r0, r0, lsr #14 @ max = 0x0001ffff
- mov r2, r2, lsr #10 @ max = 0x00007fff
- mul r0, r2, r0 @ max = 2^32-1
- movs r0, r0, lsr #6
- moveq pc, lr
-
-/*
- * loops = r0 * HZ * loops_per_jiffy / 1000000
- *
- * Oh, if only we had a cycle counter...
- */
-
-@ Delay routine
-ENTRY(__delay)
- subs r0, r0, #1
-#if 0
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
-#endif
- bhi __delay
- mov pc, lr
-ENDPROC(__udelay)
-ENDPROC(__const_udelay)
-ENDPROC(__delay)
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
new file mode 100644
index 0000000..5ee0adc
--- /dev/null
+++ b/arch/arm/lib/delay.c
@@ -0,0 +1,57 @@
+/*
+ * Originally from linux/arch/arm/lib/delay.S
+ *
+ * Copyright (C) 1995, 1996 Russell King
+ * Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/delay.h>
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ *
+ * Oh, if only we had a cycle counter...
+ */
+void __delay(unsigned long loops)
+{
+ asm volatile(
+ "1: subs %0, %0, #1 \n"
+ " bhi 1b \n"
+ : /* No output */
+ : "r" (loops)
+ );
+}
+EXPORT_SYMBOL(__delay);
+
+/*
+ * 0 <= xloops <= 0x7fffff06
+ * loops_per_jiffy <= 0x01ffffff (max. 3355 bogomips)
+ */
+void __const_udelay(unsigned long xloops)
+{
+ unsigned long lpj;
+ unsigned long loops;
+
+ xloops >>= 14; /* max = 0x01ffffff */
+ lpj = loops_per_jiffy >> 10; /* max = 0x0001ffff */
+ loops = lpj * xloops; /* max = 0x00007fff */
+ loops >>= 6; /* max = 2^32-1 */
+
+ if (loops)
+ __delay(loops);
+}
+EXPORT_SYMBOL(__const_udelay);
+
+/*
+ * usecs <= 2000
+ * HZ <= 1000
+ */
+void __udelay(unsigned long usecs)
+{
+ __const_udelay(usecs * ((2199023*HZ)>>11));
+}
+EXPORT_SYMBOL(__udelay);
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C
2010-09-28 3:33 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd
@ 2010-09-28 3:33 ` Stephen Boyd
2010-10-05 17:22 ` Daniel Walker
0 siblings, 1 reply; 16+ messages in thread
From: Stephen Boyd @ 2010-09-28 3:33 UTC (permalink / raw)
To: linux-arm-kernel
In the next patch we're going to allow machines to override the
__delay() implementation at runtime so they can implement a timer
based __delay() routine. It's easier to do this using C, so lets
write udelay and friends in C.
We lose the #if 0 code, which according to Russell is used "to
make the delay loop more stable and predictable on older CPUs"
(see http://article.gmane.org/gmane.linux.kernel/888867 for more
info). We shouldn't be too worried though, since the next patch
adds functionality to allow a machine to set the __delay() loop
themselves, therefore allowing machines to resurrect the
commented out code if they need it.
bloat-o-meter shows an increase of 12 bytes. Further inspection
of the assembly shows GCC copying the loops_per_jiffy pointer and
the magic HZ value to the ends of __const_udelay() and _delay()
thus contributing an extra 4 and 8 bytes of data to each
function. These two values weren't taken into account in the
delay.S version since they weren't part of the function in nm's
eyes. This means we only really gained an extra 4 bytes due to
GCC's decision to duplicate the loops_per_jiffy pointer in
__const_udelay.
$ scripts/bloat-o-meter vmlinux.orig vmlinux.new
add/remove: 0/0 grow/shrink: 2/0 up/down: 12/0 (12)
function old new delta
__udelay 48 56 +8
__const_udelay 40 44 +4
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
---
So maybe we can make the magic HZ value into a #define? UDELAY_SCALAR?
UDELAY_MAGIC?
arch/arm/include/asm/delay.h | 2 +-
arch/arm/kernel/armksyms.c | 4 --
arch/arm/lib/delay.S | 65 ------------------------------------------
arch/arm/lib/delay.c | 57 ++++++++++++++++++++++++++++++++++++
4 files changed, 58 insertions(+), 70 deletions(-)
delete mode 100644 arch/arm/lib/delay.S
create mode 100644 arch/arm/lib/delay.c
diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index b2deda1..ccc5ed5 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -8,7 +8,7 @@
#include <asm/param.h> /* HZ */
-extern void __delay(int loops);
+extern void __delay(unsigned long loops);
/*
* This function intentionally does not exist; if you see references to
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 8214bfe..259e549 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -52,10 +52,6 @@ extern void fpundefinstr(void);
EXPORT_SYMBOL(__backtrace);
- /* platform dependent support */
-EXPORT_SYMBOL(__udelay);
-EXPORT_SYMBOL(__const_udelay);
-
/* networking */
EXPORT_SYMBOL(csum_partial);
EXPORT_SYMBOL(csum_partial_copy_from_user);
diff --git a/arch/arm/lib/delay.S b/arch/arm/lib/delay.S
deleted file mode 100644
index 8d6a876..0000000
--- a/arch/arm/lib/delay.S
+++ /dev/null
@@ -1,65 +0,0 @@
-/*
- * linux/arch/arm/lib/delay.S
- *
- * Copyright (C) 1995, 1996 Russell King
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-#include <linux/linkage.h>
-#include <asm/assembler.h>
-#include <asm/param.h>
- .text
-
-.LC0: .word loops_per_jiffy
-.LC1: .word (2199023*HZ)>>11
-
-/*
- * r0 <= 2000
- * lpj <= 0x01ffffff (max. 3355 bogomips)
- * HZ <= 1000
- */
-
-ENTRY(__udelay)
- ldr r2, .LC1
- mul r0, r2, r0
-ENTRY(__const_udelay) @ 0 <= r0 <= 0x7fffff06
- ldr r2, .LC0
- ldr r2, [r2] @ max = 0x01ffffff
- mov r0, r0, lsr #14 @ max = 0x0001ffff
- mov r2, r2, lsr #10 @ max = 0x00007fff
- mul r0, r2, r0 @ max = 2^32-1
- movs r0, r0, lsr #6
- moveq pc, lr
-
-/*
- * loops = r0 * HZ * loops_per_jiffy / 1000000
- *
- * Oh, if only we had a cycle counter...
- */
-
-@ Delay routine
-ENTRY(__delay)
- subs r0, r0, #1
-#if 0
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
- movls pc, lr
- subs r0, r0, #1
-#endif
- bhi __delay
- mov pc, lr
-ENDPROC(__udelay)
-ENDPROC(__const_udelay)
-ENDPROC(__delay)
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
new file mode 100644
index 0000000..5ee0adc
--- /dev/null
+++ b/arch/arm/lib/delay.c
@@ -0,0 +1,57 @@
+/*
+ * Originally from linux/arch/arm/lib/delay.S
+ *
+ * Copyright (C) 1995, 1996 Russell King
+ * Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/delay.h>
+
+/*
+ * loops = usecs * HZ * loops_per_jiffy / 1000000
+ *
+ * Oh, if only we had a cycle counter...
+ */
+void __delay(unsigned long loops)
+{
+ asm volatile(
+ "1: subs %0, %0, #1 \n"
+ " bhi 1b \n"
+ : /* No output */
+ : "r" (loops)
+ );
+}
+EXPORT_SYMBOL(__delay);
+
+/*
+ * 0 <= xloops <= 0x7fffff06
+ * loops_per_jiffy <= 0x01ffffff (max. 3355 bogomips)
+ */
+void __const_udelay(unsigned long xloops)
+{
+ unsigned long lpj;
+ unsigned long loops;
+
+ xloops >>= 14; /* max = 0x01ffffff */
+ lpj = loops_per_jiffy >> 10; /* max = 0x0001ffff */
+ loops = lpj * xloops; /* max = 0x00007fff */
+ loops >>= 6; /* max = 2^32-1 */
+
+ if (loops)
+ __delay(loops);
+}
+EXPORT_SYMBOL(__const_udelay);
+
+/*
+ * usecs <= 2000
+ * HZ <= 1000
+ */
+void __udelay(unsigned long usecs)
+{
+ __const_udelay(usecs * ((2199023*HZ)>>11));
+}
+EXPORT_SYMBOL(__udelay);
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C
2010-09-28 3:33 ` [PATCH 1/3] [ARM] Translate delay.S into (mostly) C Stephen Boyd
@ 2010-10-05 17:22 ` Daniel Walker
2010-10-06 3:36 ` Stephen Boyd
0 siblings, 1 reply; 16+ messages in thread
From: Daniel Walker @ 2010-10-05 17:22 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, 2010-09-27 at 20:33 -0700, Stephen Boyd wrote:
> In the next patch we're going to allow machines to override the
> __delay() implementation at runtime so they can implement a timer
> based __delay() routine. It's easier to do this using C, so lets
> write udelay and friends in C.
You shouldn't really reference this series in this comment. You have to
look at this as a changeset comment. So you really want to describe what
this change is doing not what the over all series is doing.
It would be better to say something like,
"We're implementing this in C to give us further flexibility in allowing
overrides."
But don't references "next patch" or "this series" , but you can do that
in the intro email.
> We lose the #if 0 code, which according to Russell is used "to
> make the delay loop more stable and predictable on older CPUs"
> (see http://article.gmane.org/gmane.linux.kernel/888867 for more
> info). We shouldn't be too worried though, since the next patch
> adds functionality to allow a machine to set the __delay() loop
> themselves, therefore allowing machines to resurrect the
> commented out code if they need it.
>
> bloat-o-meter shows an increase of 12 bytes. Further inspection
> of the assembly shows GCC copying the loops_per_jiffy pointer and
> the magic HZ value to the ends of __const_udelay() and _delay()
> thus contributing an extra 4 and 8 bytes of data to each
> function. These two values weren't taken into account in the
> delay.S version since they weren't part of the function in nm's
> eyes. This means we only really gained an extra 4 bytes due to
> GCC's decision to duplicate the loops_per_jiffy pointer in
> __const_udelay.
>
> $ scripts/bloat-o-meter vmlinux.orig vmlinux.new
> add/remove: 0/0 grow/shrink: 2/0 up/down: 12/0 (12)
> function old new delta
> __udelay 48 56 +8
> __const_udelay 40 44 +4
I think the "size" command might be better for this type of stuff. It
should give you the same output (or similar).. It's just used more
frequently.
Is this an -Os kernel, or -O2 ?
> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
> Reviewed-by: Saravana Kannan <skannan@codeaurora.org>
> ---
>
> So maybe we can make the magic HZ value into a #define? UDELAY_SCALAR?
> UDELAY_MAGIC?
>
> arch/arm/include/asm/delay.h | 2 +-
> arch/arm/kernel/armksyms.c | 4 --
> arch/arm/lib/delay.S | 65 ------------------------------------------
> arch/arm/lib/delay.c | 57 ++++++++++++++++++++++++++++++++++++
> 4 files changed, 58 insertions(+), 70 deletions(-)
> delete mode 100644 arch/arm/lib/delay.S
> create mode 100644 arch/arm/lib/delay.c
>
> diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
> index b2deda1..ccc5ed5 100644
> --- a/arch/arm/include/asm/delay.h
> +++ b/arch/arm/include/asm/delay.h
> @@ -8,7 +8,7 @@
>
> #include <asm/param.h> /* HZ */
>
> -extern void __delay(int loops);
> +extern void __delay(unsigned long loops);
>
> /*
> * This function intentionally does not exist; if you see references to
> diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
> index 8214bfe..259e549 100644
> --- a/arch/arm/kernel/armksyms.c
> +++ b/arch/arm/kernel/armksyms.c
> @@ -52,10 +52,6 @@ extern void fpundefinstr(void);
>
> EXPORT_SYMBOL(__backtrace);
>
> - /* platform dependent support */
> -EXPORT_SYMBOL(__udelay);
> -EXPORT_SYMBOL(__const_udelay);
> -
> /* networking */
> EXPORT_SYMBOL(csum_partial);
> EXPORT_SYMBOL(csum_partial_copy_from_user);
> diff --git a/arch/arm/lib/delay.S b/arch/arm/lib/delay.S
> deleted file mode 100644
> index 8d6a876..0000000
> --- a/arch/arm/lib/delay.S
> +++ /dev/null
> @@ -1,65 +0,0 @@
> -/*
> - * linux/arch/arm/lib/delay.S
> - *
> - * Copyright (C) 1995, 1996 Russell King
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 as
> - * published by the Free Software Foundation.
> - */
> -#include <linux/linkage.h>
> -#include <asm/assembler.h>
> -#include <asm/param.h>
> - .text
> -
> -.LC0: .word loops_per_jiffy
> -.LC1: .word (2199023*HZ)>>11
> -
> -/*
> - * r0 <= 2000
> - * lpj <= 0x01ffffff (max. 3355 bogomips)
> - * HZ <= 1000
> - */
> -
> -ENTRY(__udelay)
> - ldr r2, .LC1
> - mul r0, r2, r0
> -ENTRY(__const_udelay) @ 0 <= r0 <= 0x7fffff06
> - ldr r2, .LC0
> - ldr r2, [r2] @ max = 0x01ffffff
> - mov r0, r0, lsr #14 @ max = 0x0001ffff
> - mov r2, r2, lsr #10 @ max = 0x00007fff
> - mul r0, r2, r0 @ max = 2^32-1
> - movs r0, r0, lsr #6
> - moveq pc, lr
> -
> -/*
> - * loops = r0 * HZ * loops_per_jiffy / 1000000
> - *
> - * Oh, if only we had a cycle counter...
> - */
> -
> -@ Delay routine
> -ENTRY(__delay)
> - subs r0, r0, #1
> -#if 0
> - movls pc, lr
> - subs r0, r0, #1
> - movls pc, lr
> - subs r0, r0, #1
> - movls pc, lr
> - subs r0, r0, #1
> - movls pc, lr
> - subs r0, r0, #1
> - movls pc, lr
> - subs r0, r0, #1
> - movls pc, lr
> - subs r0, r0, #1
> - movls pc, lr
> - subs r0, r0, #1
> -#endif
> - bhi __delay
> - mov pc, lr
> -ENDPROC(__udelay)
> -ENDPROC(__const_udelay)
> -ENDPROC(__delay)
> diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
> new file mode 100644
> index 0000000..5ee0adc
> --- /dev/null
> +++ b/arch/arm/lib/delay.c
> @@ -0,0 +1,57 @@
> +/*
> + * Originally from linux/arch/arm/lib/delay.S
> + *
> + * Copyright (C) 1995, 1996 Russell King
> + * Copyright (c) 2010, Code Aurora Forum. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#include <linux/module.h>
> +#include <linux/delay.h>
> +
> +/*
> + * loops = usecs * HZ * loops_per_jiffy / 1000000
> + *
> + * Oh, if only we had a cycle counter...
> + */
> +void __delay(unsigned long loops)
> +{
> + asm volatile(
> + "1: subs %0, %0, #1 \n"
> + " bhi 1b \n"
> + : /* No output */
> + : "r" (loops)
> + );
> +}
> +EXPORT_SYMBOL(__delay);
> +
> +/*
> + * 0 <= xloops <= 0x7fffff06
> + * loops_per_jiffy <= 0x01ffffff (max. 3355 bogomips)
As long as your doing a re-write may as well add some real text here,
and for the others.
> + */
> +void __const_udelay(unsigned long xloops)
> +{
> + unsigned long lpj;
> + unsigned long loops;
> +
> + xloops >>= 14; /* max = 0x01ffffff */
> + lpj = loops_per_jiffy >> 10; /* max = 0x0001ffff */
> + loops = lpj * xloops; /* max = 0x00007fff */
> + loops >>= 6; /* max = 2^32-1 */
> +
> + if (loops)
> + __delay(loops);
likely/unlikely ?
> +}
> +EXPORT_SYMBOL(__const_udelay);
> +
> +/*
> + * usecs <= 2000
> + * HZ <= 1000
> + */
> +void __udelay(unsigned long usecs)
> +{
> + __const_udelay(usecs * ((2199023*HZ)>>11));
> +}
> +EXPORT_SYMBOL(__udelay);
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C
2010-10-05 17:22 ` Daniel Walker
@ 2010-10-06 3:36 ` Stephen Boyd
2010-10-06 13:38 ` Daniel Walker
2010-10-06 14:26 ` Nicolas Pitre
0 siblings, 2 replies; 16+ messages in thread
From: Stephen Boyd @ 2010-10-06 3:36 UTC (permalink / raw)
To: linux-arm-kernel
On 10/05/2010 10:22 AM, Daniel Walker wrote:
> You shouldn't really reference this series in this comment. You have to
> look at this as a changeset comment. So you really want to describe what
> this change is doing not what the over all series is doing.
>
> It would be better to say something like,
>
> "We're implementing this in C to give us further flexibility in allowing
> overrides."
>
> But don't references "next patch" or "this series" , but you can do that
> in the intro email.
>
Why not? This is a common thing to do when multiple patches are related
to one topic. Doing a git log and searching for "next patch" shows
others doing the same.
>>
>> $ scripts/bloat-o-meter vmlinux.orig vmlinux.new
>> add/remove: 0/0 grow/shrink: 2/0 up/down: 12/0 (12)
>> function old new delta
>> __udelay 48 56 +8
>> __const_udelay 40 44 +4
>
> I think the "size" command might be better for this type of stuff. It
> should give you the same output (or similar).. It's just used more
> frequently.
Ok.
$ size vmlinux.orig
text data bss dec hex filename
6533503 530232 1228296 8292031 7e86bf vmlinux.orig
$ size vmlinux.new
text data bss dec hex filename
6533607 530232 1228296 8292135 7e8727 vmlinux.new
I should mention GCC decided to inline __delay() into __udelay() and
__const_udelay() and also decided to inline __const_udelay() into
__delay() meaning we lost the function interleaving the assembly file
had. I'll make a note of that and sorry for being misleading.
> Is this an -Os kernel, or -O2 ?
-Os.
>> +/*
>> + * 0 <= xloops <= 0x7fffff06
>> + * loops_per_jiffy <= 0x01ffffff (max. 3355 bogomips)
>
> As long as your doing a re-write may as well add some real text here,
> and for the others.
Any suggestions on what to add? I hoped the original comments would be
good enough already considering the code is unchanged.
>> + */
>> +void __const_udelay(unsigned long xloops)
>> +{
>> + unsigned long lpj;
>> + unsigned long loops;
>> +
>> + xloops >>= 14; /* max = 0x01ffffff */
>> + lpj = loops_per_jiffy >> 10; /* max = 0x0001ffff */
>> + loops = lpj * xloops; /* max = 0x00007fff */
>> + loops >>= 6; /* max = 2^32-1 */
>> +
>> + if (loops)
>> + __delay(loops);
>
> likely/unlikely ?
likely. Although I'm doubtful on how much it will help considering the
assembly and the code are equivalent already for this part of the code.
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C
2010-10-06 3:36 ` Stephen Boyd
@ 2010-10-06 13:38 ` Daniel Walker
2010-10-06 14:26 ` Nicolas Pitre
1 sibling, 0 replies; 16+ messages in thread
From: Daniel Walker @ 2010-10-06 13:38 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 2010-10-05 at 20:36 -0700, Stephen Boyd wrote:
> On 10/05/2010 10:22 AM, Daniel Walker wrote:
> > You shouldn't really reference this series in this comment. You have to
> > look at this as a changeset comment. So you really want to describe what
> > this change is doing not what the over all series is doing.
> >
> > It would be better to say something like,
> >
> > "We're implementing this in C to give us further flexibility in allowing
> > overrides."
> >
> > But don't references "next patch" or "this series" , but you can do that
> > in the intro email.
> >
>
> Why not? This is a common thing to do when multiple patches are related
> to one topic. Doing a git log and searching for "next patch" shows
> others doing the same.
I would not say it's common .. I don't recall seeing many other series
which do this. You shouldn't do it because the patch description doesn't
stand on it own. What if this patch is accepted in isolation, and the
others rejected would your description make sense? Also people don't
always look at a "series" in git history, sometimes you only look at one
commit so saying "next patch" or "this series" can be confusing.
You can have an intro email where you can describe the whole series,
including what your intending for each patch to do.
> >>
> >> $ scripts/bloat-o-meter vmlinux.orig vmlinux.new
> >> add/remove: 0/0 grow/shrink: 2/0 up/down: 12/0 (12)
> >> function old new delta
> >> __udelay 48 56 +8
> >> __const_udelay 40 44 +4
> >
> > I think the "size" command might be better for this type of stuff. It
> > should give you the same output (or similar).. It's just used more
> > frequently.
>
> Ok.
>
> $ size vmlinux.orig
> text data bss dec hex filename
> 6533503 530232 1228296 8292031 7e86bf vmlinux.orig
> $ size vmlinux.new
> text data bss dec hex filename
> 6533607 530232 1228296 8292135 7e8727 vmlinux.new
>
> I should mention GCC decided to inline __delay() into __udelay() and
> __const_udelay() and also decided to inline __const_udelay() into
> __delay() meaning we lost the function interleaving the assembly file
> had. I'll make a note of that and sorry for being misleading.
>
> > Is this an -Os kernel, or -O2 ?
>
> -Os.
>
> >> +/*
> >> + * 0 <= xloops <= 0x7fffff06
> >> + * loops_per_jiffy <= 0x01ffffff (max. 3355 bogomips)
> >
> > As long as your doing a re-write may as well add some real text here,
> > and for the others.
>
> Any suggestions on what to add? I hoped the original comments would be
> good enough already considering the code is unchanged.
you might want to say what the purpose of the function is ..
> >> + */
> >> +void __const_udelay(unsigned long xloops)
> >> +{
> >> + unsigned long lpj;
> >> + unsigned long loops;
> >> +
> >> + xloops >>= 14; /* max = 0x01ffffff */
> >> + lpj = loops_per_jiffy >> 10; /* max = 0x0001ffff */
> >> + loops = lpj * xloops; /* max = 0x00007fff */
> >> + loops >>= 6; /* max = 2^32-1 */
> >> +
> >> + if (loops)
> >> + __delay(loops);
> >
> > likely/unlikely ?
>
> likely. Although I'm doubtful on how much it will help considering the
> assembly and the code are equivalent already for this part of the code.
I don't know, shouldn't hurt tho.
Daniel
--
Sent by an consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C
2010-10-06 3:36 ` Stephen Boyd
2010-10-06 13:38 ` Daniel Walker
@ 2010-10-06 14:26 ` Nicolas Pitre
2010-10-06 18:30 ` Stephen Boyd
1 sibling, 1 reply; 16+ messages in thread
From: Nicolas Pitre @ 2010-10-06 14:26 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, 5 Oct 2010, Stephen Boyd wrote:
> On 10/05/2010 10:22 AM, Daniel Walker wrote:
> > I think the "size" command might be better for this type of stuff. It
> > should give you the same output (or similar).. It's just used more
> > frequently.
>
> Ok.
>
> $ size vmlinux.orig
> text data bss dec hex filename
> 6533503 530232 1228296 8292031 7e86bf vmlinux.orig
> $ size vmlinux.new
> text data bss dec hex filename
> 6533607 530232 1228296 8292135 7e8727 vmlinux.new
If you modified only one source file then I'd suggest you run 'size'
only on the affected .o file instead.
Nicolas
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C
2010-10-06 14:26 ` Nicolas Pitre
@ 2010-10-06 18:30 ` Stephen Boyd
2010-10-06 19:35 ` Daniel Walker
0 siblings, 1 reply; 16+ messages in thread
From: Stephen Boyd @ 2010-10-06 18:30 UTC (permalink / raw)
To: linux-arm-kernel
On 10/06/2010 07:26 AM, Nicolas Pitre wrote:
>> Ok.
>>
>> $ size vmlinux.orig
>> text data bss dec hex filename
>> 6533503 530232 1228296 8292031 7e86bf vmlinux.orig
>> $ size vmlinux.new
>> text data bss dec hex filename
>> 6533607 530232 1228296 8292135 7e8727 vmlinux.new
>
> If you modified only one source file then I'd suggest you run 'size'
> only on the affected .o file instead.
Ok.
$ size delay.o.orig
text data bss dec hex filename
56 0 0 56 38 delay.o.orig
$ size delay.o.new
text data bss dec hex filename
192 0 0 192 c0 delay.o.new
Perhaps I should mark __delay and __const_udelay as noinline?
$ size delay.o.noinline
text data bss dec hex filename
144 0 0 144 90 delay.o.noinline
Now we get backwards branching.
$ objdump -d delay.o.noinline
Disassembly of section .text:
00000000 <__delay>:
0: e2500001 subs r0, r0, #1 ; 0x1
4: 8afffffd bhi 0 <__delay>
8: e12fff1e bx lr
0000000c <__const_udelay>:
c: e59f3018 ldr r3, [pc, #24] ; 2c <__const_udelay+0x20>
10: e1a00720 lsr r0, r0, #14
14: e5933000 ldr r3, [r3]
18: e1a03523 lsr r3, r3, #10
1c: e0000093 mul r0, r3, r0
20: e1b00320 lsrs r0, r0, #6
24: 012fff1e bxeq lr
28: eafffffe b 0 <__delay>
2c: 00000000 .word 0x00000000
00000030 <__udelay>:
30: e59f3004 ldr r3, [pc, #4] ; 3c <__udelay+0xc>
34: e0000093 mul r0, r3, r0
38: eafffffe b c <__const_udelay>
3c: 0001a36e .word 0x0001a36e
Is there some way to force GCC to do what I want (interleave the
functions)? It seems happy to inline them and then optimize the register
usage and instruction ordering. Perhaps that is OK though and we're
wasting our time trying to be conservative in code size.
$ objdump -d delay.o.new
Disassembly of section .text:
00000000 <__delay>:
0: e2500001 subs r0, r0, #1 ; 0x1
4: 8afffffd bhi 0 <__delay>
8: e12fff1e bx lr
0000000c <__const_udelay>:
c: e59f3020 ldr r3, [pc, #32] ; 34 <__const_udelay+0x28>
10: e1a00720 lsr r0, r0, #14
14: e5933000 ldr r3, [r3]
18: e1a03523 lsr r3, r3, #10
1c: e0000093 mul r0, r3, r0
20: e1b00320 lsrs r0, r0, #6
24: 012fff1e bxeq lr
28: e2500001 subs r0, r0, #1 ; 0x1
2c: 8afffffd bhi 28 <__const_udelay+0x1c>
30: e12fff1e bx lr
34: 00000000 .word 0x00000000
00000038 <__udelay>:
38: e59f3028 ldr r3, [pc, #40] ; 68 <__udelay+0x30>
3c: e59f2028 ldr r2, [pc, #40] ; 6c <__udelay+0x34>
40: e0030093 mul r3, r3, r0
44: e5922000 ldr r2, [r2]
48: e1a02522 lsr r2, r2, #10
4c: e1a03723 lsr r3, r3, #14
50: e0030392 mul r3, r2, r3
54: e1b03323 lsrs r3, r3, #6
58: 012fff1e bxeq lr
5c: e2533001 subs r3, r3, #1 ; 0x1
60: 8afffffd bhi 5c <__udelay+0x24>
64: e12fff1e bx lr
68: 0001a36e .word 0x0001a36e
6c: 00000000 .word 0x00000000
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C
2010-10-06 18:30 ` Stephen Boyd
@ 2010-10-06 19:35 ` Daniel Walker
2010-10-06 20:05 ` Nicolas Pitre
2010-10-06 20:24 ` Stephen Boyd
0 siblings, 2 replies; 16+ messages in thread
From: Daniel Walker @ 2010-10-06 19:35 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 2010-10-06 at 11:30 -0700, Stephen Boyd wrote:
> Is there some way to force GCC to do what I want (interleave the
> functions)? It seems happy to inline them and then optimize the
> register
> usage and instruction ordering. Perhaps that is OK though and we're
> wasting our time trying to be conservative in code size.
Is it possible to do all this in assembly ? Can't you have the default
implementation using this assembly with different function names, then
just set the assembly function names in C code someplace?
Daniel
--
Sent by an consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C
2010-10-06 19:35 ` Daniel Walker
@ 2010-10-06 20:05 ` Nicolas Pitre
2010-10-08 0:11 ` Stephen Boyd
2010-10-06 20:24 ` Stephen Boyd
1 sibling, 1 reply; 16+ messages in thread
From: Nicolas Pitre @ 2010-10-06 20:05 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 6 Oct 2010, Daniel Walker wrote:
> On Wed, 2010-10-06 at 11:30 -0700, Stephen Boyd wrote:
> > Is there some way to force GCC to do what I want (interleave the
> > functions)? It seems happy to inline them and then optimize the
> > register
> > usage and instruction ordering. Perhaps that is OK though and we're
> > wasting our time trying to be conservative in code size.
You could use the noinline qualifier from <linux/compiler.h> with those
functions you don't want inlined.
> Is it possible to do all this in assembly ? Can't you have the default
> implementation using this assembly with different function names, then
> just set the assembly function names in C code someplace?
That weould be my preference too. Being in assembly means that this
code is unlikely to change with different optimization levels and/or gcc
versions which would otherwise require different calibration values.
Relying on stable calibration is necessary for the lpj kernel cmdline
parameter to have some meaning.
Nicolas
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C
2010-10-06 19:35 ` Daniel Walker
2010-10-06 20:05 ` Nicolas Pitre
@ 2010-10-06 20:24 ` Stephen Boyd
1 sibling, 0 replies; 16+ messages in thread
From: Stephen Boyd @ 2010-10-06 20:24 UTC (permalink / raw)
To: linux-arm-kernel
On 10/06/2010 12:35 PM, Daniel Walker wrote:
> Is it possible to do all this in assembly ? Can't you have the default
> implementation using this assembly with different function names, then
> just set the assembly function names in C code someplace?
Sure we could do that. I went this route because adding the timer based
delay code was a copy paste instead of a copy translate. Actually, after
adding the set_delay_fn code __const_udelay and __delay aren't inlined
into __udelay anymore so we're back to the noinline behavior except
we're missing interleaving.
Finally, I thought it would be clearer what was going on if it was in C
as opposed to assembly. How bad is a branch as opposed to fall through.
And more importantly, how bad is a push/pop?
00000000 <delay_loop>:
0: e2500001 subs r0, r0, #1 ; 0x1
4: 8afffffd bhi 0 <delay_loop>
8: e12fff1e bx lr
0000000c <set_delay_fn>:
c: e59f3004 ldr r3, [pc, #4] ; 18 <set_delay_fn+0xc>
10: e5830000 str r0, [r3]
14: e12fff1e bx lr
18: 00000000 .word 0x00000000
0000001c <__delay>:
1c: e92d4010 push {r4, lr}
20: e59f3008 ldr r3, [pc, #8] ; 30 <__delay+0x14>
24: e1a0e00f mov lr, pc
28: e593f000 ldr pc, [r3]
2c: e8bd8010 pop {r4, pc}
30: 00000000 .word 0x00000000
00000034 <__const_udelay>:
34: e59f3018 ldr r3, [pc, #24] ; 54 <__const_udelay+0x20>
38: e1a00720 lsr r0, r0, #14
3c: e5933000 ldr r3, [r3]
40: e1a03523 lsr r3, r3, #10
44: e0000093 mul r0, r3, r0
48: e1b00320 lsrs r0, r0, #6
4c: 012fff1e bxeq lr
50: eafffffe b 1c <__delay>
54: 00000000 .word 0x00000000
00000058 <__udelay>:
58: e59f3004 ldr r3, [pc, #4] ; 64 <__udelay+0xc>
5c: e0000093 mul r0, r3, r0
60: eafffffe b 34 <__const_udelay>
64: 0001a36e .word 0x0001a36e
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C
2010-10-06 20:05 ` Nicolas Pitre
@ 2010-10-08 0:11 ` Stephen Boyd
2010-10-08 1:12 ` Nicolas Pitre
0 siblings, 1 reply; 16+ messages in thread
From: Stephen Boyd @ 2010-10-08 0:11 UTC (permalink / raw)
To: linux-arm-kernel
On 10/06/2010 01:05 PM, Nicolas Pitre wrote:
> You could use the noinline qualifier from <linux/compiler.h> with those
> functions you don't want inlined.
>
That won't help me for the interleaving behavior though.
>> Is it possible to do all this in assembly ? Can't you have the default
>> implementation using this assembly with different function names, then
>> just set the assembly function names in C code someplace?
> That weould be my preference too. Being in assembly means that this
> code is unlikely to change with different optimization levels and/or gcc
> versions which would otherwise require different calibration values.
> Relying on stable calibration is necessary for the lpj kernel cmdline
> parameter to have some meaning.
Why doesn't any other architecture use assembly for their lpj code? They
may use headers with assembly in them or C code with assembly in them,
but they don't write all of the delay code in assembly and rely on
function interleaving. This leads me to believe other arches aren't
concerned about compiler optimizations breaking lpj cmdline parameters,
so why should ARM be concerned?
I tested the theory out and scaled down the CPU frequency to 19.2 MHz
and then called calibrate_delay(). Before and after applying this series
I got the same results.
Calibrating delay loop... 12.67 BogoMIPS (lpj=63360)
Jumping up to 1.2 GHz and calling calibrate_delay() gives me the same
before and after
Calibrating delay loop... 792.98 BogoMIPS (lpj=3964928)
I don't have access to a machine capable of running slower than 19.2
MHz. Maybe machines running in the KHz range would experience differences?
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C
2010-10-08 0:11 ` Stephen Boyd
@ 2010-10-08 1:12 ` Nicolas Pitre
0 siblings, 0 replies; 16+ messages in thread
From: Nicolas Pitre @ 2010-10-08 1:12 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 7 Oct 2010, Stephen Boyd wrote:
> Why doesn't any other architecture use assembly for their lpj code? They
> may use headers with assembly in them or C code with assembly in them,
> but they don't write all of the delay code in assembly and rely on
> function interleaving. This leads me to believe other arches aren't
> concerned about compiler optimizations breaking lpj cmdline parameters,
> so why should ARM be concerned?
>
> I tested the theory out and scaled down the CPU frequency to 19.2 MHz
> and then called calibrate_delay(). Before and after applying this series
> I got the same results.
>
> Calibrating delay loop... 12.67 BogoMIPS (lpj=63360)
>
> Jumping up to 1.2 GHz and calling calibrate_delay() gives me the same
> before and after
>
> Calibrating delay loop... 792.98 BogoMIPS (lpj=3964928)
OK, fair enough.
> I don't have access to a machine capable of running slower than 19.2
> MHz. Maybe machines running in the KHz range would experience differences?
Don't worry, I doubt any ARM processor capable of running Linux ever was
that slow.
Nicolas
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2010-10-08 1:12 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-19 2:24 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd
2010-08-19 2:24 ` [PATCH 1/3] [ARM] Translate delay.S into (mostly) C Stephen Boyd
2010-08-19 2:24 ` [PATCH 2/3] [ARM] Allow machines to override __delay() Stephen Boyd
2010-08-19 2:24 ` [PATCH 3/3] [ARM] Implement a timer based __delay() loop Stephen Boyd
-- strict thread matches above, loose matches on Subject: below --
2010-09-07 18:23 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd
2010-09-07 18:23 ` [PATCH 1/3] [ARM] Translate delay.S into (mostly) C Stephen Boyd
2010-09-28 3:33 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd
2010-09-28 3:33 ` [PATCH 1/3] [ARM] Translate delay.S into (mostly) C Stephen Boyd
2010-10-05 17:22 ` Daniel Walker
2010-10-06 3:36 ` Stephen Boyd
2010-10-06 13:38 ` Daniel Walker
2010-10-06 14:26 ` Nicolas Pitre
2010-10-06 18:30 ` Stephen Boyd
2010-10-06 19:35 ` Daniel Walker
2010-10-06 20:05 ` Nicolas Pitre
2010-10-08 0:11 ` Stephen Boyd
2010-10-08 1:12 ` Nicolas Pitre
2010-10-06 20:24 ` Stephen Boyd
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).