* [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) @ 2010-08-19 2:24 Stephen Boyd 2010-08-19 2:24 ` [PATCH 1/3] [ARM] Translate delay.S into (mostly) C Stephen Boyd ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Stephen Boyd @ 2010-08-19 2:24 UTC (permalink / raw) To: linux-arm-kernel These patches are another attempt at fixing the udelay() issue pointed out on arm-lkml[1][2]. A quick recap: some SMP machines can scale their CPU frequencies independent of one another. loops_per_jiffy is calibrated globally and used in __const_udelay(). If one CPU is running faster than what the loops_per_jiffy is calculated (or scaled) for, udelay() will be incorrect and not wait long enough (or too long). A similar problem occurs if the cpu frequency is scaled during a udelay() call. We could fix this issue a couple ways, wholesale replacement of __udelay() and __const_udelay() (see [2] for that approach), or replacement of __delay() (this series). Option 1 can fail if anybody uses udelay() before memory is mapped and also duplicates most of the code in asm/delay.h. It also needs to hardcode the timer tick frequency, which can sometimes be inaccurate. The benefit is that loops_per_jiffy stays the same and thus BogoMIPS is unchanged. Option 2 can't fail since the __delay() loop is replaced after memory is mapped in, but it suffers from a low BogoMIPS when timers are clocked slowly. It also more accurately calculates the timer tick frequency through the use of calibrate_delay_direct(). -- Reference -- [1] http://article.gmane.org/gmane.linux.kernel/977567 [2] http://article.gmane.org/gmane.linux.ports.arm.kernel/78496 Stephen Boyd (3): [ARM] Translate delay.S into (mostly) C [ARM] Allow machines to override __delay() [ARM] Implement a timer based __delay() loop arch/arm/include/asm/delay.h | 5 ++- arch/arm/kernel/armksyms.c | 4 -- arch/arm/lib/delay.S | 65 ------------------------------ arch/arm/lib/delay.c | 90 ++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 94 insertions(+), 70 deletions(-) delete mode 100644 arch/arm/lib/delay.S create mode 100644 arch/arm/lib/delay.c -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/3] [ARM] Translate delay.S into (mostly) C 2010-08-19 2:24 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd @ 2010-08-19 2:24 ` Stephen Boyd 2010-08-19 2:24 ` [PATCH 2/3] [ARM] Allow machines to override __delay() Stephen Boyd 2010-08-19 2:24 ` [PATCH 3/3] [ARM] Implement a timer based __delay() loop Stephen Boyd 2 siblings, 0 replies; 9+ messages in thread From: Stephen Boyd @ 2010-08-19 2:24 UTC (permalink / raw) To: linux-arm-kernel In the next patch we're going to allow machines to override the __delay() implementation at runtime so they can implement a timer based __delay() routine. It's easier to do this using C, so lets write udelay and friends in C. We lose the #if 0 code, which according to Russell is used "to make the delay loop more stable and predictable on older CPUs" (see http://article.gmane.org/gmane.linux.kernel/888867 for more info). We shouldn't be too worried though, since the next patch adds functionality to allow a machine to set the __delay() loop themselves, therefore allowing machines to resurrect the commented out code if they need it. bloat-o-meter shows an increase of 12 bytes. Further inspection of the assembly shows GCC copying the loops_per_jiffy pointer and the magic HZ value to the ends of __const_udelay() and _delay() thus contributing an extra 4 and 8 bytes of data to each function. These two values weren't taken into account in the delay.S version since they weren't part of the function in nm's eyes. This means we only really gained an extra 4 bytes due to GCC's decision to duplicate the loops_per_jiffy pointer in __const_udelay. $ scripts/bloat-o-meter vmlinux.orig vmlinux.new add/remove: 0/0 grow/shrink: 2/0 up/down: 12/0 (12) function old new delta __udelay 48 56 +8 __const_udelay 40 44 +4 Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Reviewed-by: Saravana Kannan <skannan@codeaurora.org> --- So maybe we can make the magic HZ value into a #define? UDELAY_SCALAR? UDELAY_MAGIC? arch/arm/include/asm/delay.h | 2 +- arch/arm/kernel/armksyms.c | 4 -- arch/arm/lib/delay.S | 65 ------------------------------------------ arch/arm/lib/delay.c | 57 ++++++++++++++++++++++++++++++++++++ 4 files changed, 58 insertions(+), 70 deletions(-) delete mode 100644 arch/arm/lib/delay.S create mode 100644 arch/arm/lib/delay.c diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h index b2deda1..ccc5ed5 100644 --- a/arch/arm/include/asm/delay.h +++ b/arch/arm/include/asm/delay.h @@ -8,7 +8,7 @@ #include <asm/param.h> /* HZ */ -extern void __delay(int loops); +extern void __delay(unsigned long loops); /* * This function intentionally does not exist; if you see references to diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c index 8214bfe..259e549 100644 --- a/arch/arm/kernel/armksyms.c +++ b/arch/arm/kernel/armksyms.c @@ -52,10 +52,6 @@ extern void fpundefinstr(void); EXPORT_SYMBOL(__backtrace); - /* platform dependent support */ -EXPORT_SYMBOL(__udelay); -EXPORT_SYMBOL(__const_udelay); - /* networking */ EXPORT_SYMBOL(csum_partial); EXPORT_SYMBOL(csum_partial_copy_from_user); diff --git a/arch/arm/lib/delay.S b/arch/arm/lib/delay.S deleted file mode 100644 index 8d6a876..0000000 --- a/arch/arm/lib/delay.S +++ /dev/null @@ -1,65 +0,0 @@ -/* - * linux/arch/arm/lib/delay.S - * - * Copyright (C) 1995, 1996 Russell King - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - */ -#include <linux/linkage.h> -#include <asm/assembler.h> -#include <asm/param.h> - .text - -.LC0: .word loops_per_jiffy -.LC1: .word (2199023*HZ)>>11 - -/* - * r0 <= 2000 - * lpj <= 0x01ffffff (max. 3355 bogomips) - * HZ <= 1000 - */ - -ENTRY(__udelay) - ldr r2, .LC1 - mul r0, r2, r0 -ENTRY(__const_udelay) @ 0 <= r0 <= 0x7fffff06 - ldr r2, .LC0 - ldr r2, [r2] @ max = 0x01ffffff - mov r0, r0, lsr #14 @ max = 0x0001ffff - mov r2, r2, lsr #10 @ max = 0x00007fff - mul r0, r2, r0 @ max = 2^32-1 - movs r0, r0, lsr #6 - moveq pc, lr - -/* - * loops = r0 * HZ * loops_per_jiffy / 1000000 - * - * Oh, if only we had a cycle counter... - */ - -@ Delay routine -ENTRY(__delay) - subs r0, r0, #1 -#if 0 - movls pc, lr - subs r0, r0, #1 - movls pc, lr - subs r0, r0, #1 - movls pc, lr - subs r0, r0, #1 - movls pc, lr - subs r0, r0, #1 - movls pc, lr - subs r0, r0, #1 - movls pc, lr - subs r0, r0, #1 - movls pc, lr - subs r0, r0, #1 -#endif - bhi __delay - mov pc, lr -ENDPROC(__udelay) -ENDPROC(__const_udelay) -ENDPROC(__delay) diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c new file mode 100644 index 0000000..5ee0adc --- /dev/null +++ b/arch/arm/lib/delay.c @@ -0,0 +1,57 @@ +/* + * Originally from linux/arch/arm/lib/delay.S + * + * Copyright (C) 1995, 1996 Russell King + * Copyright (c) 2010, Code Aurora Forum. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ +#include <linux/module.h> +#include <linux/delay.h> + +/* + * loops = usecs * HZ * loops_per_jiffy / 1000000 + * + * Oh, if only we had a cycle counter... + */ +void __delay(unsigned long loops) +{ + asm volatile( + "1: subs %0, %0, #1 \n" + " bhi 1b \n" + : /* No output */ + : "r" (loops) + ); +} +EXPORT_SYMBOL(__delay); + +/* + * 0 <= xloops <= 0x7fffff06 + * loops_per_jiffy <= 0x01ffffff (max. 3355 bogomips) + */ +void __const_udelay(unsigned long xloops) +{ + unsigned long lpj; + unsigned long loops; + + xloops >>= 14; /* max = 0x01ffffff */ + lpj = loops_per_jiffy >> 10; /* max = 0x0001ffff */ + loops = lpj * xloops; /* max = 0x00007fff */ + loops >>= 6; /* max = 2^32-1 */ + + if (loops) + __delay(loops); +} +EXPORT_SYMBOL(__const_udelay); + +/* + * usecs <= 2000 + * HZ <= 1000 + */ +void __udelay(unsigned long usecs) +{ + __const_udelay(usecs * ((2199023*HZ)>>11)); +} +EXPORT_SYMBOL(__udelay); -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/3] [ARM] Allow machines to override __delay() 2010-08-19 2:24 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd 2010-08-19 2:24 ` [PATCH 1/3] [ARM] Translate delay.S into (mostly) C Stephen Boyd @ 2010-08-19 2:24 ` Stephen Boyd 2010-08-19 2:24 ` [PATCH 3/3] [ARM] Implement a timer based __delay() loop Stephen Boyd 2 siblings, 0 replies; 9+ messages in thread From: Stephen Boyd @ 2010-08-19 2:24 UTC (permalink / raw) To: linux-arm-kernel Some machines want to implement their own __delay() routine based on fixed timers. Expose functionality to set the __delay() routine at runtime. This should allow two machines with different __delay() routines to happily co-exist within the same kernel with minimal overhead. Russell expressed concern that using a timer based __delay() would cause problems where an iomapped device isn't mapped in before a delay call was made (see http://article.gmane.org/gmane.linux.ports.arm.kernel/78543 for more info). We can sidestep that issue with this approach since the __delay() routine _should_ only be pointed to a timer based delay once the timer has been properly mapped. Up until that point __delay() and udelay() will use delay_loop() which is always safe to call. This patch is inspired by x86's delay.c Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Reviewed-by: Saravana Kannan <skannan@codeaurora.org> --- arch/arm/include/asm/delay.h | 2 ++ arch/arm/lib/delay.c | 21 ++++++++++++++++++--- 2 files changed, 20 insertions(+), 3 deletions(-) diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h index ccc5ed5..7c732b5 100644 --- a/arch/arm/include/asm/delay.h +++ b/arch/arm/include/asm/delay.h @@ -40,5 +40,7 @@ extern void __const_udelay(unsigned long); __const_udelay((n) * ((2199023U*HZ)>>11))) : \ __udelay(n)) +extern void set_delay_fn(void (*fn)(unsigned long)); + #endif /* defined(_ARM_DELAY_H) */ diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c index 5ee0adc..b307fcc 100644 --- a/arch/arm/lib/delay.c +++ b/arch/arm/lib/delay.c @@ -3,6 +3,8 @@ * * Copyright (C) 1995, 1996 Russell King * Copyright (c) 2010, Code Aurora Forum. All rights reserved. + * Copyright (C) 1993 Linus Torvalds + * Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz> * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as @@ -12,11 +14,9 @@ #include <linux/delay.h> /* - * loops = usecs * HZ * loops_per_jiffy / 1000000 - * * Oh, if only we had a cycle counter... */ -void __delay(unsigned long loops) +void delay_loop(unsigned long loops) { asm volatile( "1: subs %0, %0, #1 \n" @@ -25,6 +25,21 @@ void __delay(unsigned long loops) : "r" (loops) ); } + +static void (*delay_fn)(unsigned long) = delay_loop; + +void set_delay_fn(void (*fn)(unsigned long)) +{ + delay_fn = fn; +} + +/* + * loops = usecs * HZ * loops_per_jiffy / 1000000 + */ +void __delay(unsigned long loops) +{ + delay_fn(loops); +} EXPORT_SYMBOL(__delay); /* -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 3/3] [ARM] Implement a timer based __delay() loop 2010-08-19 2:24 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd 2010-08-19 2:24 ` [PATCH 1/3] [ARM] Translate delay.S into (mostly) C Stephen Boyd 2010-08-19 2:24 ` [PATCH 2/3] [ARM] Allow machines to override __delay() Stephen Boyd @ 2010-08-19 2:24 ` Stephen Boyd 2 siblings, 0 replies; 9+ messages in thread From: Stephen Boyd @ 2010-08-19 2:24 UTC (permalink / raw) To: linux-arm-kernel udelay() can be incorrect on SMP machines that scale their CPU frequencies independently of one another (as pointed out here http://article.gmane.org/gmane.linux.kernel/977567). The delay loop can either be too fast or too slow depending on which CPU the loops_per_jiffy counter is calibrated on and which CPU the delay loop is running on. udelay() can also be incorrect if the CPU frequency switches during the __delay() loop, causing the loop to either terminate too early, or too late. We'd rather not fix udelay() by forcing it to run on one CPU or take the penalty of a rather large loops_per_jiffy being used in udelay() when the CPU is actually running slower. Instead we solve the problem by making __delay() into a timer based loop which should be unaffected by CPU frequency scaling. Therefore, implement a timer based __delay() loop which can be used in place of the current register based __delay() if a machine so chooses. The kernel is already prepared for a timer based approach (evident by the read_current_timer() function). If an arch implements read_current_timer(), calibrate_delay() will use a timer based function, calibrate_delay_direct(), to calculate loops_per_jiffy (in which case loops_per_jiffy should really be renamed to timer_ticks_per_jiffy). Since the loops_per_jiffy will be based on timer ticks, __delay() should be implemented as a loop around read_current_timer(). Doing this makes the expensive loops_per_jiffy calculation go away (saving ~150ms on boot time on my machine) and fixes udelay() by making it safe in the face of independently scaling CPUs. The only prerequisite is that read_current_timer() is monotonically increasing across calls (and doesn't overflow within ~2000us). There is a downside to this approach though. BogoMIPS is no longer "accurate" in that it reflects the BogoMIPS of the timer and not the CPU. On most SoC's the timer isn't running anywhere near as fast as the CPU so BogoMIPS will be ridiculously low (my timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my CPU's 800). This shouldn't be too much of a concern though since BogoMIPS are bogus anyway (hence the name). This loop is pretty much a copy of AVR's version made more generic. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Reviewed-by: Saravana Kannan <skannan@codeaurora.org> --- arch/arm/include/asm/delay.h | 1 + arch/arm/lib/delay.c | 18 ++++++++++++++++++ 2 files changed, 19 insertions(+), 0 deletions(-) diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h index 7c732b5..5c6b9a3 100644 --- a/arch/arm/include/asm/delay.h +++ b/arch/arm/include/asm/delay.h @@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long); __udelay(n)) extern void set_delay_fn(void (*fn)(unsigned long)); +extern void read_current_timer_delay_loop(unsigned long loops); #endif /* defined(_ARM_DELAY_H) */ diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c index b307fcc..59fdf64 100644 --- a/arch/arm/lib/delay.c +++ b/arch/arm/lib/delay.c @@ -5,6 +5,7 @@ * Copyright (c) 2010, Code Aurora Forum. All rights reserved. * Copyright (C) 1993 Linus Torvalds * Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz> + * Copyright (C) 2005-2006 Atmel Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as @@ -12,6 +13,7 @@ */ #include <linux/module.h> #include <linux/delay.h> +#include <linux/timex.h> /* * Oh, if only we had a cycle counter... @@ -26,6 +28,22 @@ void delay_loop(unsigned long loops) ); } +#ifdef ARCH_HAS_READ_CURRENT_TIMER +/* + * Assuming read_current_timer() is monotonically increasing + * across calls. + */ +void read_current_timer_delay_loop(unsigned long loops) +{ + unsigned long bclock, now; + + read_current_timer(&bclock); + do { + read_current_timer(&now); + } while ((now - bclock) < loops); +} +#endif + static void (*delay_fn)(unsigned long) = delay_loop; void set_delay_fn(void (*fn)(unsigned long)) -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) @ 2010-09-07 18:23 Stephen Boyd 2010-09-07 18:23 ` [PATCH 3/3] [ARM] Implement a timer based __delay() loop Stephen Boyd 0 siblings, 1 reply; 9+ messages in thread From: Stephen Boyd @ 2010-09-07 18:23 UTC (permalink / raw) To: linux-arm-kernel (Sorry, resending due to stray comma...) These patches are another attempt at fixing the udelay() issue pointed out on arm-lkml[1][2]. A quick recap: some SMP machines can scale their CPU frequencies independent of one another. loops_per_jiffy is calibrated globally and used in __const_udelay(). If one CPU is running faster than what the loops_per_jiffy is calculated (or scaled) for, udelay() will be incorrect and not wait long enough (or too long). A similar problem occurs if the cpu frequency is scaled during a udelay() call. We could fix this issue a couple ways, wholesale replacement of __udelay() and __const_udelay() (see [2] for that approach), or replacement of __delay() (this series). Option 1 can fail if anybody uses udelay() before memory is mapped and also duplicates most of the code in asm/delay.h. It also needs to hardcode the timer tick frequency, which can sometimes be inaccurate. The benefit is that loops_per_jiffy stays the same and thus BogoMIPS is unchanged. Option 2 can't fail since the __delay() loop is replaced after memory is mapped in, but it suffers from a low BogoMIPS when timers are clocked slowly. It also more accurately calculates the timer tick frequency through the use of calibrate_delay_direct(). -- Reference -- [1] http://article.gmane.org/gmane.linux.kernel/977567 [2] http://article.gmane.org/gmane.linux.ports.arm.kernel/78496 Stephen Boyd (3): [ARM] Translate delay.S into (mostly) C [ARM] Allow machines to override __delay() [ARM] Implement a timer based __delay() loop arch/arm/include/asm/delay.h | 5 ++- arch/arm/kernel/armksyms.c | 4 -- arch/arm/lib/delay.S | 65 ------------------------------ arch/arm/lib/delay.c | 90 ++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 94 insertions(+), 70 deletions(-) delete mode 100644 arch/arm/lib/delay.S create mode 100644 arch/arm/lib/delay.c -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/3] [ARM] Implement a timer based __delay() loop 2010-09-07 18:23 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd @ 2010-09-07 18:23 ` Stephen Boyd 0 siblings, 0 replies; 9+ messages in thread From: Stephen Boyd @ 2010-09-07 18:23 UTC (permalink / raw) To: linux-arm-kernel udelay() can be incorrect on SMP machines that scale their CPU frequencies independently of one another (as pointed out here http://article.gmane.org/gmane.linux.kernel/977567). The delay loop can either be too fast or too slow depending on which CPU the loops_per_jiffy counter is calibrated on and which CPU the delay loop is running on. udelay() can also be incorrect if the CPU frequency switches during the __delay() loop, causing the loop to either terminate too early, or too late. We'd rather not fix udelay() by forcing it to run on one CPU or take the penalty of a rather large loops_per_jiffy being used in udelay() when the CPU is actually running slower. Instead we solve the problem by making __delay() into a timer based loop which should be unaffected by CPU frequency scaling. Therefore, implement a timer based __delay() loop which can be used in place of the current register based __delay() if a machine so chooses. The kernel is already prepared for a timer based approach (evident by the read_current_timer() function). If an arch implements read_current_timer(), calibrate_delay() will use a timer based function, calibrate_delay_direct(), to calculate loops_per_jiffy (in which case loops_per_jiffy should really be renamed to timer_ticks_per_jiffy). Since the loops_per_jiffy will be based on timer ticks, __delay() should be implemented as a loop around read_current_timer(). Doing this makes the expensive loops_per_jiffy calculation go away (saving ~150ms on boot time on my machine) and fixes udelay() by making it safe in the face of independently scaling CPUs. The only prerequisite is that read_current_timer() is monotonically increasing across calls (and doesn't overflow within ~2000us). There is a downside to this approach though. BogoMIPS is no longer "accurate" in that it reflects the BogoMIPS of the timer and not the CPU. On most SoC's the timer isn't running anywhere near as fast as the CPU so BogoMIPS will be ridiculously low (my timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my CPU's 800). This shouldn't be too much of a concern though since BogoMIPS are bogus anyway (hence the name). This loop is pretty much a copy of AVR's version made more generic. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Reviewed-by: Saravana Kannan <skannan@codeaurora.org> --- arch/arm/include/asm/delay.h | 1 + arch/arm/lib/delay.c | 18 ++++++++++++++++++ 2 files changed, 19 insertions(+), 0 deletions(-) diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h index 7c732b5..5c6b9a3 100644 --- a/arch/arm/include/asm/delay.h +++ b/arch/arm/include/asm/delay.h @@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long); __udelay(n)) extern void set_delay_fn(void (*fn)(unsigned long)); +extern void read_current_timer_delay_loop(unsigned long loops); #endif /* defined(_ARM_DELAY_H) */ diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c index b307fcc..59fdf64 100644 --- a/arch/arm/lib/delay.c +++ b/arch/arm/lib/delay.c @@ -5,6 +5,7 @@ * Copyright (c) 2010, Code Aurora Forum. All rights reserved. * Copyright (C) 1993 Linus Torvalds * Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz> + * Copyright (C) 2005-2006 Atmel Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as @@ -12,6 +13,7 @@ */ #include <linux/module.h> #include <linux/delay.h> +#include <linux/timex.h> /* * Oh, if only we had a cycle counter... @@ -26,6 +28,22 @@ void delay_loop(unsigned long loops) ); } +#ifdef ARCH_HAS_READ_CURRENT_TIMER +/* + * Assuming read_current_timer() is monotonically increasing + * across calls. + */ +void read_current_timer_delay_loop(unsigned long loops) +{ + unsigned long bclock, now; + + read_current_timer(&bclock); + do { + read_current_timer(&now); + } while ((now - bclock) < loops); +} +#endif + static void (*delay_fn)(unsigned long) = delay_loop; void set_delay_fn(void (*fn)(unsigned long)) -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) @ 2010-09-28 3:33 Stephen Boyd 2010-09-28 3:33 ` [PATCH 3/3] [ARM] Implement a timer based __delay() loop Stephen Boyd 0 siblings, 1 reply; 9+ messages in thread From: Stephen Boyd @ 2010-09-28 3:33 UTC (permalink / raw) To: linux-arm-kernel These patches are another attempt at fixing the udelay() issue pointed out on arm-lkml[1][2]. A quick recap: some SMP machines can scale their CPU frequencies independent of one another. loops_per_jiffy is calibrated globally and used in __const_udelay(). If one CPU is running faster than what the loops_per_jiffy is calculated (or scaled) for, udelay() will be incorrect and not wait long enough (or too long). A similar problem occurs if the cpu frequency is scaled during a udelay() call. We could fix this issue a couple ways, wholesale replacement of __udelay() and __const_udelay() (see [2] for that approach), or replacement of __delay() (this series). Option 1 can fail if anybody uses udelay() before memory is mapped and also duplicates most of the code in asm/delay.h. It also needs to hardcode the timer tick frequency, which can sometimes be inaccurate. The benefit is that loops_per_jiffy stays the same and thus BogoMIPS is unchanged. Option 2 can't fail since the __delay() loop is replaced after memory is mapped in, but it suffers from a low BogoMIPS when timers are clocked slowly. It also more accurately calculates the timer tick frequency through the use of calibrate_delay_direct(). -- Reference -- [1] http://article.gmane.org/gmane.linux.kernel/977567 [2] http://article.gmane.org/gmane.linux.ports.arm.kernel/78496 Stephen Boyd (3): [ARM] Translate delay.S into (mostly) C [ARM] Allow machines to override __delay() [ARM] Implement a timer based __delay() loop arch/arm/include/asm/delay.h | 5 ++- arch/arm/kernel/armksyms.c | 4 -- arch/arm/lib/delay.S | 65 ------------------------------ arch/arm/lib/delay.c | 90 ++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 94 insertions(+), 70 deletions(-) delete mode 100644 arch/arm/lib/delay.S create mode 100644 arch/arm/lib/delay.c -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/3] [ARM] Implement a timer based __delay() loop 2010-09-28 3:33 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd @ 2010-09-28 3:33 ` Stephen Boyd 2010-10-05 17:38 ` Daniel Walker 0 siblings, 1 reply; 9+ messages in thread From: Stephen Boyd @ 2010-09-28 3:33 UTC (permalink / raw) To: linux-arm-kernel udelay() can be incorrect on SMP machines that scale their CPU frequencies independently of one another (as pointed out here http://article.gmane.org/gmane.linux.kernel/977567). The delay loop can either be too fast or too slow depending on which CPU the loops_per_jiffy counter is calibrated on and which CPU the delay loop is running on. udelay() can also be incorrect if the CPU frequency switches during the __delay() loop, causing the loop to either terminate too early, or too late. We'd rather not fix udelay() by forcing it to run on one CPU or take the penalty of a rather large loops_per_jiffy being used in udelay() when the CPU is actually running slower. Instead we solve the problem by making __delay() into a timer based loop which should be unaffected by CPU frequency scaling. Therefore, implement a timer based __delay() loop which can be used in place of the current register based __delay() if a machine so chooses. The kernel is already prepared for a timer based approach (evident by the read_current_timer() function). If an arch implements read_current_timer(), calibrate_delay() will use a timer based function, calibrate_delay_direct(), to calculate loops_per_jiffy (in which case loops_per_jiffy should really be renamed to timer_ticks_per_jiffy). Since the loops_per_jiffy will be based on timer ticks, __delay() should be implemented as a loop around read_current_timer(). Doing this makes the expensive loops_per_jiffy calculation go away (saving ~150ms on boot time on my machine) and fixes udelay() by making it safe in the face of independently scaling CPUs. The only prerequisite is that read_current_timer() is monotonically increasing across calls (and doesn't overflow within ~2000us). There is a downside to this approach though. BogoMIPS is no longer "accurate" in that it reflects the BogoMIPS of the timer and not the CPU. On most SoC's the timer isn't running anywhere near as fast as the CPU so BogoMIPS will be ridiculously low (my timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my CPU's 800). This shouldn't be too much of a concern though since BogoMIPS are bogus anyway (hence the name). This loop is pretty much a copy of AVR's version made more generic. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Reviewed-by: Saravana Kannan <skannan@codeaurora.org> --- arch/arm/include/asm/delay.h | 1 + arch/arm/lib/delay.c | 18 ++++++++++++++++++ 2 files changed, 19 insertions(+), 0 deletions(-) diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h index 7c732b5..5c6b9a3 100644 --- a/arch/arm/include/asm/delay.h +++ b/arch/arm/include/asm/delay.h @@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long); __udelay(n)) extern void set_delay_fn(void (*fn)(unsigned long)); +extern void read_current_timer_delay_loop(unsigned long loops); #endif /* defined(_ARM_DELAY_H) */ diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c index b307fcc..59fdf64 100644 --- a/arch/arm/lib/delay.c +++ b/arch/arm/lib/delay.c @@ -5,6 +5,7 @@ * Copyright (c) 2010, Code Aurora Forum. All rights reserved. * Copyright (C) 1993 Linus Torvalds * Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz> + * Copyright (C) 2005-2006 Atmel Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as @@ -12,6 +13,7 @@ */ #include <linux/module.h> #include <linux/delay.h> +#include <linux/timex.h> /* * Oh, if only we had a cycle counter... @@ -26,6 +28,22 @@ void delay_loop(unsigned long loops) ); } +#ifdef ARCH_HAS_READ_CURRENT_TIMER +/* + * Assuming read_current_timer() is monotonically increasing + * across calls. + */ +void read_current_timer_delay_loop(unsigned long loops) +{ + unsigned long bclock, now; + + read_current_timer(&bclock); + do { + read_current_timer(&now); + } while ((now - bclock) < loops); +} +#endif + static void (*delay_fn)(unsigned long) = delay_loop; void set_delay_fn(void (*fn)(unsigned long)) -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 3/3] [ARM] Implement a timer based __delay() loop 2010-09-28 3:33 ` [PATCH 3/3] [ARM] Implement a timer based __delay() loop Stephen Boyd @ 2010-10-05 17:38 ` Daniel Walker 2010-10-06 3:36 ` Stephen Boyd 0 siblings, 1 reply; 9+ messages in thread From: Daniel Walker @ 2010-10-05 17:38 UTC (permalink / raw) To: linux-arm-kernel On Mon, 2010-09-27 at 20:33 -0700, Stephen Boyd wrote: > udelay() can be incorrect on SMP machines that scale their CPU > frequencies independently of one another (as pointed out here > http://article.gmane.org/gmane.linux.kernel/977567). The delay > loop can either be too fast or too slow depending on which CPU the > loops_per_jiffy counter is calibrated on and which CPU the delay > loop is running on. udelay() can also be incorrect if the > CPU frequency switches during the __delay() loop, causing the loop > to either terminate too early, or too late. > > We'd rather not fix udelay() by forcing it to run on one CPU or > take the penalty of a rather large loops_per_jiffy being used in > udelay() when the CPU is actually running slower. Instead we > solve the problem by making __delay() into a timer based loop > which should be unaffected by CPU frequency scaling. Therefore, > implement a timer based __delay() loop which can be used in place > of the current register based __delay() if a machine so chooses. > > The kernel is already prepared for a timer based approach > (evident by the read_current_timer() function). If an arch > implements read_current_timer(), calibrate_delay() will use a > timer based function, calibrate_delay_direct(), to calculate > loops_per_jiffy (in which case loops_per_jiffy should really be > renamed to timer_ticks_per_jiffy). Since the loops_per_jiffy will > be based on timer ticks, __delay() should be implemented as a > loop around read_current_timer(). > > Doing this makes the expensive loops_per_jiffy calculation go > away (saving ~150ms on boot time on my machine) and fixes > udelay() by making it safe in the face of independently scaling > CPUs. The only prerequisite is that read_current_timer() is > monotonically increasing across calls (and doesn't overflow > within ~2000us). > > There is a downside to this approach though. BogoMIPS is no > longer "accurate" in that it reflects the BogoMIPS of the timer > and not the CPU. On most SoC's the timer isn't running anywhere > near as fast as the CPU so BogoMIPS will be ridiculously low (my > timer runs at 4.8 MHz and thus my BogoMIPS is 9.6 compared to my > CPU's 800). This shouldn't be too much of a concern though since > BogoMIPS are bogus anyway (hence the name). > > This loop is pretty much a copy of AVR's version made more generic. > > Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> > Reviewed-by: Saravana Kannan <skannan@codeaurora.org> > --- > arch/arm/include/asm/delay.h | 1 + > arch/arm/lib/delay.c | 18 ++++++++++++++++++ > 2 files changed, 19 insertions(+), 0 deletions(-) > > diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h > index 7c732b5..5c6b9a3 100644 > --- a/arch/arm/include/asm/delay.h > +++ b/arch/arm/include/asm/delay.h > @@ -41,6 +41,7 @@ extern void __const_udelay(unsigned long); > __udelay(n)) > > extern void set_delay_fn(void (*fn)(unsigned long)); > +extern void read_current_timer_delay_loop(unsigned long loops); > > #endif /* defined(_ARM_DELAY_H) */ > > diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c > index b307fcc..59fdf64 100644 > --- a/arch/arm/lib/delay.c > +++ b/arch/arm/lib/delay.c > @@ -5,6 +5,7 @@ > * Copyright (c) 2010, Code Aurora Forum. All rights reserved. > * Copyright (C) 1993 Linus Torvalds > * Copyright (C) 1997 Martin Mares <mj@atrey.karlin.mff.cuni.cz> > + * Copyright (C) 2005-2006 Atmel Corporation > * > * This program is free software; you can redistribute it and/or modify > * it under the terms of the GNU General Public License version 2 as > @@ -12,6 +13,7 @@ > */ > #include <linux/module.h> > #include <linux/delay.h> > +#include <linux/timex.h> > > /* > * Oh, if only we had a cycle counter... > @@ -26,6 +28,22 @@ void delay_loop(unsigned long loops) > ); > } > > +#ifdef ARCH_HAS_READ_CURRENT_TIMER > +/* > + * Assuming read_current_timer() is monotonically increasing > + * across calls. You should add more comments here. You assuming that it's monotonic over a 2000us (2ms) period .. I'm not sure this is a good assumption this timer may not be monotonically increasing all the time, what happens then? > +void read_current_timer_delay_loop(unsigned long loops) > +{ > + unsigned long bclock, now; > + > + read_current_timer(&bclock); > + do { > + read_current_timer(&now); > + } while ((now - bclock) < loops); Have you looked@time_before()/time_after() ? Daniel -- Sent by a consultant of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/3] [ARM] Implement a timer based __delay() loop 2010-10-05 17:38 ` Daniel Walker @ 2010-10-06 3:36 ` Stephen Boyd 2010-10-06 13:44 ` Daniel Walker 0 siblings, 1 reply; 9+ messages in thread From: Stephen Boyd @ 2010-10-06 3:36 UTC (permalink / raw) To: linux-arm-kernel On 10/05/2010 10:38 AM, Daniel Walker wrote: >> >> +#ifdef ARCH_HAS_READ_CURRENT_TIMER >> +/* >> + * Assuming read_current_timer() is monotonically increasing >> + * across calls. > You should add more comments here. You assuming that it's monotonic over > a 2000us (2ms) period .. I'm not sure this is a good assumption this > timer may not be monotonically increasing all the time, what happens > then? Ok I'll add that it shouldn't wrap more than once within 2000us (or should I say 5ms since mdelay uses udelay?). Is that what you're saying by it not being monotonically increasing? If a timer isn't increasing the tick count it's broken and this call will loop forever. If the timer wraps, we'll be safe due to unsigned maths as long as it wraps only once. >> +void read_current_timer_delay_loop(unsigned long loops) >> +{ >> + unsigned long bclock, now; >> + >> + read_current_timer(&bclock); >> + do { >> + read_current_timer(&now); >> + } while ((now - bclock) < loops); > Have you looked at time_before()/time_after() ? Nope. Wouldn't that require an addition though to make it work? I'd rather just leave it like it is. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/3] [ARM] Implement a timer based __delay() loop 2010-10-06 3:36 ` Stephen Boyd @ 2010-10-06 13:44 ` Daniel Walker 0 siblings, 0 replies; 9+ messages in thread From: Daniel Walker @ 2010-10-06 13:44 UTC (permalink / raw) To: linux-arm-kernel On Tue, 2010-10-05 at 20:36 -0700, Stephen Boyd wrote: > On 10/05/2010 10:38 AM, Daniel Walker wrote: > >> > >> +#ifdef ARCH_HAS_READ_CURRENT_TIMER > >> +/* > >> + * Assuming read_current_timer() is monotonically increasing > >> + * across calls. > > You should add more comments here. You assuming that it's monotonic over > > a 2000us (2ms) period .. I'm not sure this is a good assumption this > > timer may not be monotonically increasing all the time, what happens > > then? > > Ok I'll add that it shouldn't wrap more than once within 2000us (or > should I say 5ms since mdelay uses udelay?). Is that what you're saying > by it not being monotonically increasing? If a timer isn't increasing > the tick count it's broken and this call will loop forever. If the timer > wraps, we'll be safe due to unsigned maths as long as it wraps only once. I was saying it could be near wrapping when this process starts, then wrap in the middle. The unsigned math thing should work I think, but add a comment .. Your basically saying it's OK if it wraps, but the comment says it needs to be "monotonically increasing" which actually doesn't appear to be true. It needs to not wrap twice during this process. > >> +void read_current_timer_delay_loop(unsigned long loops) > >> +{ > >> + unsigned long bclock, now; > >> + > >> + read_current_timer(&bclock); > >> + do { > >> + read_current_timer(&now); > >> + } while ((now - bclock) < loops); > > Have you looked at time_before()/time_after() ? > > Nope. Wouldn't that require an addition though to make it work? I'd > rather just leave it like it is. I'm not saying you should change it, I'm saying look at those macro's to make sure your doing thinks correctly. Daniel -- Sent by an consultant of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-10-06 13:44 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-08-19 2:24 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd 2010-08-19 2:24 ` [PATCH 1/3] [ARM] Translate delay.S into (mostly) C Stephen Boyd 2010-08-19 2:24 ` [PATCH 2/3] [ARM] Allow machines to override __delay() Stephen Boyd 2010-08-19 2:24 ` [PATCH 3/3] [ARM] Implement a timer based __delay() loop Stephen Boyd -- strict thread matches above, loose matches on Subject: below -- 2010-09-07 18:23 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd 2010-09-07 18:23 ` [PATCH 3/3] [ARM] Implement a timer based __delay() loop Stephen Boyd 2010-09-28 3:33 [PATCH 0/3] Fixing udelay() on SMP (and non-SMP too) Stephen Boyd 2010-09-28 3:33 ` [PATCH 3/3] [ARM] Implement a timer based __delay() loop Stephen Boyd 2010-10-05 17:38 ` Daniel Walker 2010-10-06 3:36 ` Stephen Boyd 2010-10-06 13:44 ` Daniel Walker
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).