* [PATCH][MIPS] Use CP0 Count register to implement more granular ndelay
@ 2009-03-12 3:28 VomLehn
2009-03-13 9:29 ` Thomas Bogendoerfer
0 siblings, 1 reply; 10+ messages in thread
From: VomLehn @ 2009-03-12 3:28 UTC (permalink / raw)
To: Linux MIPS Mailing List; +Cc: Ralf Baechle
The default implementation of ndelay uses udelay, which will result in the
rounding up of any requested interval to the next highest number of
microseconds. This may be a much longer delay than was desired. However,
if the tick rate of the CP0 Count register is known, it is possible to
implement an accurate ndelay that works on multiple MIPS processors.
To use this, enable CONFIG_CP0_COUNT_NDELAY and modify the platform startup
code to call init_ndelay as early as possible. A good place to call it
is probably the prom_init function. The argument to init_ndelay should be
the CP0 Count register tick rate, in kHz. The tick rate is typically half
the processor clock rate so, if you have a 700 MHz processor, the CP0 Count
register would tick at 350 MHz and you would pass 350000 to init_ndelay.
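As a rough illustration of the unit conversion (a standalone sketch, not part
of the patch; the helper name count_khz_from_cpu_hz is hypothetical, and it
assumes the Count register ticks at exactly half the processor clock, which
should be checked for the specific core):

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch: derive the CP0 Count tick
 * rate in kHz from the CPU clock in Hz, ASSUMING Count increments once
 * every two CPU cycles.
 */
unsigned int count_khz_from_cpu_hz(unsigned long cpu_hz)
{
	assert(cpu_hz >= 2000);		/* below this the result would be 0 kHz */
	return (unsigned int)(cpu_hz / 2 / 1000);
}
```

For a 700 MHz processor this gives 350000, which is the value to hand to
init_ndelay under the half-rate assumption.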
This is version 3. Changes from version 2 include:
o Correct the BUG_ON comparison as it has a reversed sense from assert.
Sorry, dumb mistake.
o Remove unnecessary comparison to see if an unsigned long variable
is bigger than ULONG_MAX.
o Created a "safe" version of ndelay that disables interrupts to avoid
the possibility of a long interrupt processing interval turning a
very short delay into a much longer delay. If you know interrupts are
disabled, you can use no_interrupt_ndelay.
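The wrap problem that motivates the interrupt-disabling variant can be seen
in a small standalone sketch. This is only an illustration, assuming a
free-running 32-bit counter; the function name count_before is hypothetical
and the cast relies on the usual two's-complement conversion:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: nonzero while 'now' has not
 * yet reached 'end' on a free-running 32-bit counter. The unsigned
 * subtraction wraps naturally; the result is only meaningful while the
 * two samples are less than 2^31 ticks apart.
 */
int count_before(uint32_t now, uint32_t end)
{
	return (int32_t)(end - now) > 0;
}
```

The comparison survives the counter rolling over from 0xFFFFFFFF to 0, but
if the caller is held off for more than half the counter range the
difference turns positive again and the loop keeps waiting, which is why the
default ndelay runs with interrupts disabled.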
Changes from version 1 include:
o Added definitions for MIPS1, MIPS2, MIPS3, and MIPS4 configurations
o Restricted use of CP0 Count register-based ndelay to MIPS2, MIPS3,
MIPS32, and MIPS64 configurations.
o Replaced assert code with BUG_ON
o Corrected name of preprocessor symbol used to avoid multiple inclusions
of fast-ratio.h and delay.h
o Used '/**' to mark comments intended to be automatically parsed.
Signed-off-by: David VomLehn <dvomlehn@cisco.com>
---
arch/mips/Kconfig | 30 +++++
arch/mips/include/asm/delay.h | 100 +++++++++++++++++++
arch/mips/include/asm/fast-ratio.h | 53 ++++++++++
arch/mips/lib/Makefile | 6 -
arch/mips/lib/delay.c | 59 +++++++++++
arch/mips/lib/fast-ratio.c | 187 +++++++++++++++++++++++++++++++++++++
6 files changed, 433 insertions(+), 2 deletions(-)
Index: linux-2.6/arch/mips/Kconfig
===================================================================
--- linux-2.6.orig/arch/mips/Kconfig
+++ linux-2.6/arch/mips/Kconfig
@@ -1371,6 +1371,26 @@ config WEAK_REORDERING_BEYOND_LLSC
endmenu
#
+# Collect various processors by instruction family
+#
+config MIPS1
+ bool
+ default y if CPU_R3000 || CPU_TX39XX
+
+config MIPS2
+ bool
+ default y if CPU_R6000
+
+config MIPS3
+ bool
+ default y if CPU_LOONGSON2 || CPU_R4300 || CPU_R4X00 || CPU_TX49XX || \
+ CPU_VR41XX
+
+config MIPS4
+ bool
+ default y if CPU_R8000 || CPU_R10000
+
+#
# These two indicate any level of the MIPS32 and MIPS64 architecture
#
config CPU_MIPS32
@@ -1876,6 +1896,16 @@ config NR_CPUS
source "kernel/time/Kconfig"
+config CP0_COUNT_NDELAY
+ bool "Use coprocessor 0 Count register for ndelay functionality"
+ depends on MIPS3 || MIPS4 || CPU_MIPS32 || CPU_MIPS64
+ default n
+ help
+ Implements the ndelay function using the coprocessor 0 Count
+ register. Using this requires including a call to init_ndelay
+ with the Count register increment frequency, in KHz, in one
+ of the early initialization functions.
+
#
# Timer Interrupt Frequency Configuration
#
Index: linux-2.6/arch/mips/include/asm/delay.h
===================================================================
--- linux-2.6.orig/arch/mips/include/asm/delay.h
+++ linux-2.6/arch/mips/include/asm/delay.h
@@ -109,4 +109,104 @@ static inline void __udelay(unsigned lon
#define MAX_UDELAY_MS (1000 / HZ)
#endif
+#ifdef CONFIG_CP0_COUNT_NDELAY
+/*
+ * Definitions for using MIPS CP0 Count register-based ndelay. If
+ * CONFIG_CP0_COUNT_NDELAY is not defined, ndelay will default to using
+ * udelay.
+ */
+
+#include <linux/kernel.h>
+#include <asm/fast-ratio.h>
+#include <asm/mipsregs.h>
+
+/* Maximum amount of time that will be handled with ndelay, in nanoseconds.
+ * Values bigger than this will be bounced up to udelay. */
+#define _MAX_DIRECT_NDELAY 65535
+
+#define ndelay(n) _safe_ndelay(n)
+
+extern struct fast_ratio _ndelay_param;
+
+/*
+ * Compute the number of CP0 Count ticks corresponding to the interval
+ * @nsecs: Interval, expressed in nanoseconds
+ * Breaking this out as its own function makes it easier to test.
+ */
+static inline unsigned int _ndelay_ticks(unsigned int nsecs)
+{
+ return fast_ratio(nsecs, &_ndelay_param);
+}
+
+/**
+ * Delay for at least the given number of nanoseconds
+ * @nsecs: Number of nanoseconds to delay
+ *
+ * This function uses the CP0 Count register to give a pretty accurate delay
+ * for very short delay periods. Very small delays will, unavoidably, be
+ * dominated by the instructions in this function but this should converge
+ * to the true delay reasonably quickly before nsecs gets very large.
+ *
+ * NOTE: Failure to call init_ndelay will result in *very* long delay times.
+ * This is done deliberately to ensure that, if you use ndelay and forget to
+ * call init_ndelay first, you will notice your mistake quickly.
+ *
+ * NOTE: If we are interrupted for so long that the Count register advances
+ * by more than half of its total range, the test will wrap and you will wind
+ * up with a much longer delay than you expect. So, only call this if you:
+ * o Have interrupts disabled, or
+ * o Are sure this can never be interrupted for more than half the time
+ * it takes for the Count register to wrap.
+ * Otherwise, use ndelay.
+ */
+static inline void no_interrupt_ndelay(unsigned long nsecs)
+{
+ int start;
+
+ /* The expected thing would be to do the first read of the Count
+ * register later, just before entering the delay loop. Reading here
+ * ensures that very short intervals will exit the first time through
+ * that loop. */
+ start = read_c0_count();
+
+ if (unlikely(nsecs > _MAX_DIRECT_NDELAY))
+ udelay(DIV_ROUND_UP(nsecs, 1000)); /* Would overflow counter */
+
+ else {
+ int end;
+ int now;
+
+ end = start + _ndelay_ticks(nsecs);
+
+ do {
+ now = read_c0_count();
+ } while (end - now > 0);
+ }
+}
+
+/**
+ * Delay for at least the given number of nanoseconds
+ * @nsecs: Number of nanoseconds to delay
+ *
+ * This is the safe version that disables interrupts to avoid the possibility
+ * of very long interrupts causing the comparison to wrap. Don't use this
+ * directly; use ndelay.
+ */
+static inline void _safe_ndelay(unsigned long nsecs)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ no_interrupt_ndelay(nsecs);
+ local_irq_restore(flags);
+}
+
+extern int init_ndelay(unsigned int count_freq);
+#else
+static inline int init_ndelay(unsigned int count_freq)
+{
+ return 0;
+}
+#endif
+
#endif /* _ASM_DELAY_H */
Index: linux-2.6/arch/mips/include/asm/fast-ratio.h
===================================================================
--- /dev/null
+++ linux-2.6/arch/mips/include/asm/fast-ratio.h
@@ -0,0 +1,53 @@
+/*
+ * fast-ratio.h
+ *
+ * Definitions for using fast evaluator for expressions of the form:
+ * a
+ * x * -
+ * b
+ *
+ * where x can be constrained to some maximum value and a and b are constants.
+ *
+ * Copyright (C) 2009 Cisco Systems, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _ASM_FAST_RATIO_H_
+#define _ASM_FAST_RATIO_H_
+
+/* Instances of this structure will normally be declared with the attribute
+ * __read_mostly since it only makes sense to use the fast-ratio code if
+ * you fill in the structure once for many calls to evaluate the result. */
+struct fast_ratio {
+ unsigned long k;
+ unsigned int s;
+ unsigned long r;
+};
+
+/**
+ * Evaluate x * (a / b), a and b constant, as transformed for speed.
+ * @x: Value to multiply by a / b
+ * @fr: Pointer to &struct fast_ratio with transformed values for a and b
+ * Returns x * (a / b), rounded up in an unsigned long value
+ */
+static inline unsigned long fast_ratio(unsigned long x, struct fast_ratio *fr)
+{
+ return (x * fr->k + fr->r) >> fr->s;
+}
+
+extern int init_fast_ratio(unsigned int max_x, unsigned long a,
+ unsigned long b, struct fast_ratio *fr);
+#endif
Index: linux-2.6/arch/mips/lib/Makefile
===================================================================
--- linux-2.6.orig/arch/mips/lib/Makefile
+++ linux-2.6/arch/mips/lib/Makefile
@@ -2,8 +2,8 @@
# Makefile for MIPS-specific library files..
#
-lib-y += csum_partial.o memcpy.o memcpy-inatomic.o memset.o strlen_user.o \
- strncpy_user.o strnlen_user.o uncached.o
+lib-y += csum_partial.o fast-ratio.o memcpy.o memcpy-inatomic.o memset.o \
+ strlen_user.o strncpy_user.o strnlen_user.o uncached.o
obj-y += iomap.o
obj-$(CONFIG_PCI) += iomap-pci.o
@@ -29,5 +29,7 @@ obj-$(CONFIG_CPU_TX49XX) += dump_tlb.o
obj-$(CONFIG_CPU_VR41XX) += dump_tlb.o
obj-$(CONFIG_CPU_CAVIUM_OCTEON) += dump_tlb.o
+obj-$(CONFIG_CP0_COUNT_NDELAY) += delay.o
+
# libgcc-style stuff needed in the kernel
obj-y += ashldi3.o ashrdi3.o cmpdi2.o lshrdi3.o ucmpdi2.o
Index: linux-2.6/arch/mips/lib/delay.c
===================================================================
--- /dev/null
+++ linux-2.6/arch/mips/lib/delay.c
@@ -0,0 +1,59 @@
+/*
+ * delay.c
+ *
+ * Code implementing ndelay using the MIPS CP0 Count register.
+ *
+ * Copyright (C) 2009 Cisco Systems, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <linux/cache.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <asm/delay.h>
+
+/* These elements are initialized to a value that will cause huge delays to
+ * arise from use of ndelay before calling init_ndelay. This should make such
+ * mistakes obvious enough to easily find and correct. */
+struct fast_ratio _ndelay_param __read_mostly = {
+ .k = 0,
+ .s = 0,
+ .r = ULONG_MAX / 2,
+};
+EXPORT_SYMBOL(_ndelay_param);
+
+/**
+ * Called to initialize the values for the ndelay function
+ * @f: Frequency, in KHz, of the CP0 Count register increment rate
+ */
+int __init init_ndelay(unsigned int f)
+{
+ int ret;
+
+ ret = init_fast_ratio(_MAX_DIRECT_NDELAY, f, 1000000, &_ndelay_param);
+
+ if (ret)
+ pr_err("Unable to initialize ndelay parameters, errno %d\n",
+ ret);
+ else
+ pr_info("Set ndelay fast_ratio parameters: k %lu s %u r %lu\n",
+ _ndelay_param.k, _ndelay_param.s, _ndelay_param.r);
+
+ return ret;
+}
Index: linux-2.6/arch/mips/lib/fast-ratio.c
===================================================================
--- /dev/null
+++ linux-2.6/arch/mips/lib/fast-ratio.c
@@ -0,0 +1,187 @@
+/*
+ * fast-ratio.c
+ *
+ * Code implementing fast ratio calculator.
+ *
+ * Copyright (C) 2009 Cisco Systems, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <linux/cache.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/log2.h>
+#include <asm-generic/bug.h>
+#include <asm/fast-ratio.h>
+
+#ifdef DEBUG
+#define dbg(fmt, ...) pr_crit(fmt, ## __VA_ARGS__)
+#else
+#define dbg(fmt, ...) do { } while (0)
+#endif
+
+#ifndef BITS_PER_LLONG
+#define BITS_PER_LLONG ((BITS_PER_LONG * sizeof(long long)) / sizeof(long))
+#endif
+
+/* Type for intermediate calculations, along with the number of bits and
+ * the maximum size. This should be the biggest unsigned type for which
+ * division and modulus by unsigned long are defined on this
+ * architecture. */
+#ifdef CONFIG_HAVE_ULLONG_DIV_AND_MOD
+typedef unsigned long long intermediate_t;
+#define BITS_PER_ACC BITS_PER_LLONG
+#define ACC_MAX ULLONG_MAX
+#else
+typedef unsigned long intermediate_t;
+#define BITS_PER_ACC BITS_PER_LONG
+#define ACC_MAX ULONG_MAX
+#endif
+
+/**
+ * Compute transform of equation (x * a)/b for fast computation
+ * @max_x: Maximum value of x
+ * @a: Value of a
+ * @b: value b
+ * @fr: Pointer to a &struct fast_ratio to hold transformed parameters
+ * Returns a zero on success, otherwise a negative errno value. Errno values
+ * are:
+ * -EDOM Parameter b is zero
+ * -EINVAL Either max_x is too large or max_x is zero
+ * -ERANGE The rounded up intermediate value of x * a would not fit
+ * in an unsigned long.
+ *
+ * Mathematically, as long as the ratios:
+ * a k
+ * - = ---
+ * b 2^s
+ *
+ * are equal, the specific values of k and s don't matter. There are
+ * two constraints, however:
+ *
+ * o The value of s must be less than BITS_PER_LONG
+ * o With a rounding constant of r = 2^s - 1, we must have
+ * x * k + r <= ULONG_MAX
+ *
+ * We want k to be as large as possible so that
+ * it has the maximum precision. Getting the largest k means
+ * getting the smallest shift.
+ *
+ * Note that this is designed to work on both 32-bit systems and 64-bit systems
+ * using the LP64 model.
+ */
+int init_fast_ratio(unsigned int max_x, unsigned long a,
+ unsigned long b, struct fast_ratio *fr)
+{
+#define SHIFT_ROUND_UP(_v, _n) (((_n) < 0) ? \
+ (((unsigned long long) (_v)) << -(_n)) : \
+ (((_v) + ((1ull << (_n)) - 1)) >> (_n)))
+#define ROUNDING_CONST(_s) (((_s) < 0) ? 0 : ((1ull << (_s)) - 1))
+ intermediate_t scaled_a;
+ intermediate_t k0;
+ int s0;
+ int min_s;
+ int k_msb;
+ int s;
+ int si;
+ unsigned long long k;
+ unsigned long long r;
+ unsigned long long dividend;
+
+ if (b == 0)
+ return -EDOM; /* Divide by zero */
+
+ if (max_x == 0)
+ return -EINVAL;
+
+ if (a == 0) {
+ fr->k = 0; /* Trivial, result is always zero */
+ fr->s = 0;
+ fr->r = 0;
+ return 0;
+ }
+
+ /* Calculate the rounded up value of a / b with the most precision we
+ * can easily obtain by shifting the value a by n bits to the left.
+ * This means that the value we get is (a / b) * 2^n. We could get
+ * an overflow if we used the usual (a + (b - 1))/ b, so we compute the
+ * rounding value explicitly. If the scale value of a modulus b is
+ * not zero, we need to increase the result by one. */
+ s0 = (BITS_PER_ACC - 1) - ilog2(a);
+ scaled_a = ((intermediate_t) a) << s0;
+
+ k0 = (scaled_a / b) + ((scaled_a % b == 0) ? 0 : 1);
+ k_msb = ilog2(k0) + 1;
+ dbg("scaled_a %llx scaled_a %% b %llx k0 %llx s0 %d k_msb %d\n",
+ (unsigned long long) scaled_a,
+ (unsigned long long) scaled_a % b,
+ (unsigned long long) k0, s0, k_msb);
+
+ /* Find a shift that yields the largest value of k that will avoid an
+ * overflow on an unsigned long when multiplied by max_x, and rounded
+ * up. */
+ min_s = k_msb;
+
+ for (;;) {
+ int shft;
+ unsigned long long ri;
+ unsigned long long ki;
+ unsigned long long p;
+
+ shft = min_s - 1;
+ si = s0 - shft;
+ ki = SHIFT_ROUND_UP(k0, shft);
+ ri = ROUNDING_CONST(si);
+
+ /* We must be sure that max_x is smaller than p or the
+ * following calculation will eventually overflow */
+ BUG_ON(sizeof(max_x) > sizeof(p));
+ p = max_x * ki;
+ dividend = p + ri;
+ dbg("min_s %d shft %d si %d ri %llx ki %llx max_x %x p %llx "
+ "dividend %llx\n",
+ min_s, shft, si, ri, ki, max_x, p, dividend);
+ if ((si > BITS_PER_LONG || dividend > ULONG_MAX))
+ break;
+ min_s--;
+ }
+
+ s = s0 - min_s;
+ k = SHIFT_ROUND_UP(k0, min_s);
+ r = ROUNDING_CONST(s);
+ dbg("min_s %d s %d k %llx max_x * k %llx r %llx dividend %llx\n",
+ min_s, s, k, max_x * k, r, max_x * k + r);
+
+ /* If we have a negative shift, we couldn't find a k that would avoid
+ * an overflow. If that's true, or we have an overflow at the current
+ * shift, we return an error. */
+ if (s < 0 || max_x * k + r > ULONG_MAX)
+ return -ERANGE;
+
+ /* If the shift we came up with would shift the final result out
+ * of the register, we've underflowed the result */
+ if (s >= BITS_PER_LONG)
+ return -ERANGE;
+
+ fr->s = s;
+ fr->k = k;
+ fr->r = r;
+
+ return 0;
+#undef SHIFT_ROUND_UP
+#undef ROUNDING_CONST
+}
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH][MIPS] Use CP0 Count register to implement more granular ndelay
2009-03-12 3:28 [PATCH][MIPS] Use CP0 Count register to implement more granular ndelay VomLehn
@ 2009-03-13 9:29 ` Thomas Bogendoerfer
2009-03-13 11:32 ` Ralf Baechle
2009-03-13 17:35 ` VomLehn
0 siblings, 2 replies; 10+ messages in thread
From: Thomas Bogendoerfer @ 2009-03-13 9:29 UTC (permalink / raw)
To: VomLehn; +Cc: Linux MIPS Mailing List, Ralf Baechle
On Wed, Mar 11, 2009 at 08:28:50PM -0700, VomLehn wrote:
> #
> +# Collect various processors by instruction family
> +#
> +config MIPS1
> + bool
> + default y if CPU_R3000 || CPU_TX39XX
> +
> +config MIPS2
> + bool
> + default y if CPU_R6000
> +
> +config MIPS3
> + bool
> + default y if CPU_LOONGSON2 || CPU_R4300 || CPU_R4X00 || CPU_TX49XX || \
> + CPU_VR41XX
> +
> +config MIPS4
> + bool
> + default y if CPU_R8000 || CPU_R10000
> +
what about all the R5k CPUs ?
Thomas.
--
Crap can work. Given enough thrust pigs will fly, but it's not necessary a
good idea. [ RFC1925, 2.3 ]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH][MIPS] Use CP0 Count register to implement more granular ndelay
2009-03-13 9:29 ` Thomas Bogendoerfer
@ 2009-03-13 11:32 ` Ralf Baechle
2009-03-13 17:35 ` VomLehn
1 sibling, 0 replies; 10+ messages in thread
From: Ralf Baechle @ 2009-03-13 11:32 UTC (permalink / raw)
To: Thomas Bogendoerfer; +Cc: VomLehn, Linux MIPS Mailing List
On Fri, Mar 13, 2009 at 10:29:07AM +0100, Thomas Bogendoerfer wrote:
> > +config MIPS4
> > + bool
> > + default y if CPU_R8000 || CPU_R10000
> > +
>
> what about all the R5k CPUs ?
There is cpu_has_counter, which tells you whether a processor actually has
a cp0 counter, and cpu_has_mfc0_count_bug(), which indicates usability of
the counter. The cp0 counter should rather not be used on early R4000
processors, for example.
Ralf
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH][MIPS] Use CP0 Count register to implement more granular ndelay
2009-03-13 9:29 ` Thomas Bogendoerfer
2009-03-13 11:32 ` Ralf Baechle
@ 2009-03-13 17:35 ` VomLehn
1 sibling, 0 replies; 10+ messages in thread
From: VomLehn @ 2009-03-13 17:35 UTC (permalink / raw)
To: Thomas Bogendoerfer; +Cc: Linux MIPS Mailing List, Ralf Baechle
On Fri, Mar 13, 2009 at 10:29:07AM +0100, Thomas Bogendoerfer wrote:
> On Wed, Mar 11, 2009 at 08:28:50PM -0700, VomLehn wrote:
> > #
> > +# Collect various processors by instruction family
> > +#
> > +config MIPS1
> > + bool
> > + default y if CPU_R3000 || CPU_TX39XX
> > +
> > +config MIPS2
> > + bool
> > + default y if CPU_R6000
> > +
> > +config MIPS3
> > + bool
> > + default y if CPU_LOONGSON2 || CPU_R4300 || CPU_R4X00 || CPU_TX49XX || \
> > + CPU_VR41XX
> > +
> > +config MIPS4
> > + bool
> > + default y if CPU_R8000 || CPU_R10000
> > +
>
> what about all the R5k CPUs ?
Excellent question. What are their names and what is their ISA called?
^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <web-5716826@test.infohit.si>]
* Re: [PATCH][MIPS] Use CP0 Count register to implement more granular ndelay
[not found] <web-5716826@test.infohit.si>
@ 2009-03-25 17:37 ` David VomLehn
0 siblings, 0 replies; 10+ messages in thread
From: David VomLehn @ 2009-03-25 17:37 UTC (permalink / raw)
To: Mauricio Culibrk; +Cc: linux-mips
Mauricio Culibrk wrote:
> Hi there!
>
> I'm really sorry for bothering you.... I noticed your posts and patch on
> the linux-mips mailing list....
> and I'm very interested in ndelay patch you proposed.
>
> I'm currently using some embedded mips-based boards for bit-banging SPI
> and I2C implementations... but the current code uses udelay which is way
> too "long" for the purpose...
>
> I'm wondering if you have any updated/fixed patch available as you
> mention that on the list (that you'll fix your patch a little regarding
> all the comments received)
>
> Anyway... I'm using some "consumer" boards based on Atheros WiSoCs
> AR2315, AR2317... and Broadcom chips which should have MIPS32 MIPS4K
> cores...
> this cpu should have a "functioning cr0 register", right? (as I have
> absolutely no "datasheets" available to check for that)
The latest version of the ndelay patch is version 3, which disables
interrupts to ensure the Count register doesn't wrap. If you have
interrupts disabled already, there is another function you can call.
You can get the version 3 patch from the mailing list archive at
http://www.linux-mips.org/archives/linux-mips/2009-03/msg00073.html
I haven't received much feedback since the first version of the patch,
and it's something we're already using, so I think it's in pretty good
shape. As I understand it, R4000-series processors should be fine as far as
having a working Count register goes. Part of the version 3 patch adds
dependencies so that it will only appear in your favorite configuration
tool if you've selected a processor on which it will work. Ralf has
mentioned some issues with some R4000 processors, though.
As the code stands on my 24K processor, a requested delay of 100 nsec
ends up as an actual delay of a bit over 130 nsec. Not perfect, but
definitely less than 1000 nsec. I have been thinking about adding in a
calibration constant that should get it closer, but it hasn't been
important yet and I wanted the basic patch to get accepted before I did
anything *really* obscure.
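One possible shape for such a calibration constant, purely as a hypothetical
sketch (nothing like this is in the patch; the name calibrated_nsecs and the
idea of expressing the overhead in nanoseconds are assumptions made for
illustration): subtract a measured fixed overhead from the request before
converting it to ticks, clamping at zero.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch: reduce the requested delay
 * by a fixed, previously measured call overhead, never going below zero.
 * Both values are in nanoseconds.
 */
unsigned long calibrated_nsecs(unsigned long requested_ns,
			       unsigned long overhead_ns)
{
	assert(overhead_ns <= 1000);	/* sanity bound for this sketch only */
	if (requested_ns <= overhead_ns)
		return 0;
	return requested_ns - overhead_ns;
}
```

With the figures above (100 nsec requested, about 130 nsec observed) the
overhead would be roughly 30 nsec. Any such constant would have to be
measured conservatively per platform, since over-correcting would break the
"at least this long" guarantee of ndelay.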
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH][MIPS] Use CP0 Count register to implement more granular ndelay
@ 2009-02-27 23:09 VomLehn
2009-02-28 21:10 ` Paul Gortmaker
0 siblings, 1 reply; 10+ messages in thread
From: VomLehn @ 2009-02-27 23:09 UTC (permalink / raw)
To: linux-mips; +Cc: ralf
The default implementation of ndelay uses udelay, which will result in the
rounding up of any requested interval to the next highest number of
microseconds. This may be a much longer delay than was desired. However,
if the tick rate of the CP0 Count register is known, it is possible to
implement an accurate ndelay that works on multiple MIPS processors.
To use this, enable CONFIG_CP0_COUNT_NDELAY and modify the platform startup
code to call init_ndelay as early as possible. A good place to call it
is probably the prom_init function. The argument to init_ndelay should be
the CP0 Count register tick rate, in kHz. The tick rate is typically half
the processor clock rate so, if you have a 700 MHz processor, the CP0 Count
register would tick at 350 MHz and you would pass 350000 to init_ndelay.
At the risk of being obvious, you will need to ensure that ndelay isn't used
until after the call to init_ndelay. There are no checks to enforce this as
it would increase the latency in ndelay, but, in order to make it obvious
that init_ndelay hasn't been called early enough, the initial ndelay
parameters are set to cause a really large delay.
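The effect of the default parameters can be illustrated outside the kernel.
The sketch below copies the (x * k + r) >> s evaluation from the patch into
a standalone form; the names ratio_params, ratio_eval, uninit_params and
sketch_350mhz are hypothetical, and the 350 MHz values are hand-picked for
illustration (k = ceil(0.35 * 2^16), s = 16, r = 2^16 - 1), not what
init_fast_ratio would actually compute:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Standalone stand-in for struct fast_ratio from the patch */
struct ratio_params {
	unsigned long k;	/* multiplier */
	unsigned int s;		/* right shift */
	unsigned long r;	/* rounding constant */
};

/* Same arithmetic as fast_ratio(): (x * k + r) >> s */
unsigned long ratio_eval(unsigned long x, const struct ratio_params *p)
{
	assert(p != NULL);
	return (x * p->k + p->r) >> p->s;
}

/* Defaults used before init_ndelay runs: every request maps to a huge count */
const struct ratio_params uninit_params = {
	.k = 0, .s = 0, .r = ULONG_MAX / 2,
};

/* Hand-picked values approximating 350000 kHz / 1000000 (an assumption) */
const struct ratio_params sketch_350mhz = {
	.k = 22938, .s = 16, .r = 65535,
};
```

With the defaults, any request evaluates to ULONG_MAX / 2 ticks regardless
of x, which is the deliberately huge delay described above. With the
hand-picked 350 MHz values, a 100 nsec request evaluates to 36 ticks, one
more than the exact 35 because both k and the final result are rounded up.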
Signed-off-by: David VomLehn <dvomlehn@cisco.com>
---
arch/mips/Kconfig | 9 ++
arch/mips/include/asm/delay.h | 75 ++++++++++++++
arch/mips/include/asm/fast-ratio.h | 53 ++++++++++
arch/mips/lib/Makefile | 6 +-
arch/mips/lib/delay.c | 59 +++++++++++
arch/mips/lib/fast-ratio.c | 196 ++++++++++++++++++++++++++++++++++++
6 files changed, 396 insertions(+), 2 deletions(-)
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 0b5f16b..1568b91 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1807,6 +1807,15 @@ config NR_CPUS
source "kernel/time/Kconfig"
+config CP0_COUNT_NDELAY
+ bool "Use coprocessor 0 Count register for ndelay functionality"
+ default n
+ help
+ Implements the ndelay function using the coprocessor 0 Count
+ register. Using this requires including a call to init_ndelay
+ with the Count register increment frequency, in KHz, in one
+ of the early initialization functions.
+
#
# Timer Interrupt Frequency Configuration
#
diff --git a/arch/mips/include/asm/delay.h b/arch/mips/include/asm/delay.h
index b0bccd2..fc41c46 100644
--- a/arch/mips/include/asm/delay.h
+++ b/arch/mips/include/asm/delay.h
@@ -109,4 +109,79 @@ static inline void __udelay(unsigned long usecs, unsigned long lpj)
#define MAX_UDELAY_MS (1000 / HZ)
#endif
+#ifdef CONFIG_CP0_COUNT_NDELAY
+/*
+ * Definitions for using MIPS CP0 Count register-based ndelay. If
+ * CONFIG_CP0_COUNT_NDELAY is not defined, ndelay will default to using
+ * udelay.
+ */
+
+#include <linux/kernel.h>
+#include <asm/fast-ratio.h>
+#include <asm/mipsregs.h>
+
+/* Maximum amount of time that will be handled with ndelay, in nanoseconds.
+ * Values bigger than this will be bounced up to udelay. */
+#define _MAX_DIRECT_NDELAY 65535
+
+#define ndelay(n) _ndelay(n)
+
+extern struct fast_ratio _ndelay_param;
+
+/*
+ * Compute the number of CP0 Count ticks corresponding to the interval
+ * @nsecs: Interval, expressed in nanoseconds
+ * Breaking this out as its own function makes it easier to test.
+ */
+static inline unsigned int _ndelay_ticks(unsigned int nsecs)
+{
+ return fast_ratio(nsecs, &_ndelay_param);
+}
+
+/*
+ * Delay for at least the given number of nanoseconds
+ * @nsecs: Number of nanoseconds to delay
+ *
+ * This function uses the CP0 Count register to give a pretty accurate delay
+ * for very short delay periods. Very small delays will, unavoidably, be
+ * dominated by the instructions in this function but this should converge
+ * to the true delay reasonably quickly before nsecs gets very large.
+ *
+ * NOTE: Failure to call init_ndelay will result in *very* long delay times.
+ * This is done deliberately to ensure that, if you use ndelay and forget to
+ * call init_ndelay first, you will notice your mistake quickly.
+ */
+static inline void _ndelay(unsigned long nsecs)
+{
+ int start;
+
+ /* The expected thing would be to do the first read of the Count
+ * register later, just before entering the delay loop. Reading here
+ * ensures that very short intervals will exit the first time through
+ * that loop. */
+ start = read_c0_count();
+
+ if (unlikely(nsecs > _MAX_DIRECT_NDELAY))
+ udelay(DIV_ROUND_UP(nsecs, 1000)); /* Would overflow counter */
+
+ else {
+ int end;
+ int now;
+
+ end = start + _ndelay_ticks(nsecs);
+
+ do {
+ now = read_c0_count();
+ } while (end - now > 0);
+ }
+}
+
+extern int init_ndelay(unsigned int count_freq);
+#else
+static inline int init_ndelay(unsigned int count_freq)
+{
+ return 0;
+}
+#endif
+
#endif /* _ASM_DELAY_H */
diff --git a/arch/mips/include/asm/fast-ratio.h b/arch/mips/include/asm/fast-ratio.h
new file mode 100644
index 0000000..84ac286
--- /dev/null
+++ b/arch/mips/include/asm/fast-ratio.h
@@ -0,0 +1,53 @@
+/*
+ * fast-ratio.h
+ *
+ * Definitions for using fast evaluator for expressions of the form:
+ * a
+ * x * -
+ * b
+ *
+ * where x can be constrained to some maximum value and a and b are constants.
+ *
+ * Copyright (C) 2009 Cisco Systems, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef _ASM_MACH_POWERTV_FAST_RATIO_H_
+#define _ASM_MACH_POWERTV_FAST_RATIO_H_
+
+/* Instances of this structure will normally be declared with the attribute
+ * __read_mostly since it only makes sense to use the fast-ratio code if
+ * you fill in the structure once for many calls to evaluate the result. */
+struct fast_ratio {
+ unsigned long k;
+ unsigned int s;
+ unsigned long r;
+};
+
+/*
+ * Evaluate x * (a / b), a and b constant, as transformed for speed.
+ * @x: Value to multiply by a / b
+ * @fr: Pointer to &struct fast_ratio with transformed values for a and b
+ * Returns x * (a / b), rounded up in an unsigned long value
+ */
+static inline unsigned long fast_ratio(unsigned long x, struct fast_ratio *fr)
+{
+ return (x * fr->k + fr->r) >> fr->s;
+}
+
+extern int init_fast_ratio(unsigned int max_x, unsigned long a,
+ unsigned long b, struct fast_ratio *fr);
+#endif
diff --git a/arch/mips/lib/Makefile b/arch/mips/lib/Makefile
index dbcf651..e5c4ee5 100644
--- a/arch/mips/lib/Makefile
+++ b/arch/mips/lib/Makefile
@@ -2,8 +2,8 @@
# Makefile for MIPS-specific library files..
#
-lib-y += csum_partial.o memcpy.o memcpy-inatomic.o memset.o strlen_user.o \
- strncpy_user.o strnlen_user.o uncached.o
+lib-y += csum_partial.o fast-ratio.o memcpy.o memcpy-inatomic.o memset.o \
+ strlen_user.o strncpy_user.o strnlen_user.o uncached.o
obj-y += iomap.o
obj-$(CONFIG_PCI) += iomap-pci.o
@@ -28,5 +28,7 @@ obj-$(CONFIG_CPU_TX39XX) += r3k_dump_tlb.o
obj-$(CONFIG_CPU_TX49XX) += dump_tlb.o
obj-$(CONFIG_CPU_VR41XX) += dump_tlb.o
+obj-$(CONFIG_CP0_COUNT_NDELAY) += delay.o
+
# libgcc-style stuff needed in the kernel
obj-y += ashldi3.o ashrdi3.o cmpdi2.o lshrdi3.o ucmpdi2.o
diff --git a/arch/mips/lib/delay.c b/arch/mips/lib/delay.c
new file mode 100644
index 0000000..0d74543
--- /dev/null
+++ b/arch/mips/lib/delay.c
@@ -0,0 +1,59 @@
+/*
+ * delay.c
+ *
+ * Code implementing ndelay using the MIPS CP0 Count register.
+ *
+ * Copyright (C) 2009 Cisco Systems, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <linux/cache.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <asm/delay.h>
+
+/* These elements are initialized to a value that will cause huge delays to
+ * arise from use of ndelay before calling init_ndelay. This should make such
+ * mistakes obvious enough to easily find and correct. */
+struct fast_ratio _ndelay_param __read_mostly = {
+ .k = 0,
+ .s = 0,
+ .r = ULONG_MAX / 2,
+};
+EXPORT_SYMBOL(_ndelay_param);
+
+/*
+ * Called to initialize the values for the ndelay function
+ * @f: Frequency, in KHz, of the CP0 Count register increment rate
+ */
+int __init init_ndelay(unsigned int f)
+{
+ int ret;
+
+ ret = init_fast_ratio(_MAX_DIRECT_NDELAY, f, 1000000, &_ndelay_param);
+
+ if (ret)
+ pr_err("Unable to initialize ndelay parameters, errno %d\n",
+ ret);
+ else
+ pr_info("Set ndelay fast_ratio parameters: k %lu s %u r %lu\n",
+ _ndelay_param.k, _ndelay_param.s, _ndelay_param.r);
+
+ return ret;
+}
diff --git a/arch/mips/lib/fast-ratio.c b/arch/mips/lib/fast-ratio.c
new file mode 100644
index 0000000..c630adf
--- /dev/null
+++ b/arch/mips/lib/fast-ratio.c
@@ -0,0 +1,196 @@
+/*
+ * fast-ratio.c
+ *
+ * Code implementing fast ratio calculator.
+ *
+ * Copyright (C) 2009 Cisco Systems, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <linux/cache.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/log2.h>
+#include <asm/fast-ratio.h>
+
+#ifndef __KERNEL__
+#include <assert.h>
+#else
+#include <asm-generic/bug.h>
+
+#ifndef assert
+#define assert(x) BUG_ON(!(x))
+#endif
+#endif
+
+#ifdef DEBUG
+#define dbg(fmt, ...) pr_crit(fmt, ## __VA_ARGS__)
+#else
+#define dbg(fmt, ...) do { } while (0)
+#endif
+
+#ifndef BITS_PER_LLONG
+#define BITS_PER_LLONG ((BITS_PER_LONG * sizeof(long long)) / sizeof(long))
+#endif
+
+/* Type for intermediate calculations, along with the number of bits and
+ * the maximum size. This should be the biggest unsigned type for which
+ * division and modulus by unsigned long are defined on this
+ * architecture. */
+#ifdef CONFIG_HAVE_ULLONG_DIV_AND_MOD
+typedef unsigned long long intermediate_t;
+#define BITS_PER_ACC BITS_PER_LLONG
+#define ACC_MAX ULLONG_MAX
+#else
+typedef unsigned long intermediate_t;
+#define BITS_PER_ACC BITS_PER_LONG
+#define ACC_MAX ULONG_MAX
+#endif
+
+/*
+ * Compute transform of equation (x * a)/b for fast computation
+ * @max_x: Maximum value of x
+ * @a: Value of a
+ * @b: Value of b
+ * @fr: Pointer to a &struct fast_ratio to hold transformed parameters
+ * Returns a zero on success, otherwise a negative errno value. Errno values
+ * are:
+ * -EDOM Parameter b is zero
+ * -EINVAL max_x is zero
+ * -ERANGE The rounded up intermediate value of x * a would not fit
+ * in an unsigned long.
+ *
+ * Mathematically, as long as the ratios:
+ * a k
+ * - = ---
+ * b 2^s
+ *
+ * are equal, the specific values of k and s don't matter. There are
+ * two constraints, however:
+ *
+ * o The value of s must be less than BITS_PER_LONG
+ * o With a rounding constant of r = 2^s - 1, we must have
+ * x * k + r <= ULONG_MAX
+ *
+ * We want k to be as large as possible so that
+ * it has the maximum precision. Getting the largest k means
+ * getting the smallest shift.
+ *
+ * Note that this is designed to work on both 32-bit systems and 64-bit systems
+ * using the LP64 model.
+ */
+int init_fast_ratio(unsigned int max_x, unsigned long a,
+ unsigned long b, struct fast_ratio *fr)
+{
+#define SHIFT_ROUND_UP(_v, _n) (((_n) < 0) ? \
+ (((unsigned long long) (_v)) << -(_n)) : \
+ (((_v) + ((1ull << (_n)) - 1)) >> (_n)))
+#define ROUNDING_CONST(_s) (((_s) < 0) ? 0 : ((1ull << (_s)) - 1))
+ intermediate_t scaled_a;
+ intermediate_t k0;
+ int s0;
+ int min_s;
+ int k_msb;
+ int s;
+ int si;
+ unsigned long long k;
+ unsigned long long r;
+ unsigned long long dividend;
+
+ if (b == 0)
+ return -EDOM; /* Divide by zero */
+
+ if (max_x == 0)
+ return -EINVAL;
+
+ if (a == 0) {
+ fr->k = 0; /* Trivial, result is always zero */
+ fr->s = 0;
+ fr->r = 0;
+ return 0;
+ }
+
+ /* Calculate the rounded up value of a / b with the most precision we
+ * can easily obtain by shifting the value a by n bits to the left.
+ * This means that the value we get is (a / b) * 2^n. We could get
+ * an overflow if we used the usual (a + (b - 1))/ b, so we compute the
+ * rounding value explicitly. If the scaled value of a modulo b is
+ * not zero, we need to increase the result by one. */
+ s0 = (BITS_PER_ACC - 1) - ilog2(a);
+ scaled_a = ((intermediate_t) a) << s0;
+
+ k0 = (scaled_a / b) + ((scaled_a % b == 0) ? 0 : 1);
+ k_msb = ilog2(k0) + 1;
+ dbg("scaled_a %llx scaled_a %% b %llx k0 %llx s0 %d k_msb %d\n",
+ (unsigned long long) scaled_a,
+ (unsigned long long) scaled_a % b,
+ (unsigned long long) k0, s0, k_msb);
+
+ /* Find a shift that yields the largest value of k that will avoid an
+ * overflow on an unsigned long when multiplied by max_x, and rounded
+ * up. */
+ min_s = k_msb;
+
+ for (;;) {
+ int shft;
+ unsigned long long ri;
+ unsigned long long ki;
+ unsigned long long p;
+
+ shft = min_s - 1;
+ si = s0 - shft;
+ ki = SHIFT_ROUND_UP(k0, shft);
+ ri = ROUNDING_CONST(si);
+
+ /* The type of max_x must be narrower than the type of p or the
+ * following multiplication could overflow */
+ assert(sizeof(max_x) < sizeof(p));
+ p = max_x * ki;
+ dividend = p + ri;
+ dbg("min_s %d shft %d si %d ri %llx ki %llx max_x %x p %llx "
+ "dividend %llx\n",
+ min_s, shft, si, ri, ki, max_x, p, dividend);
+ if ((si > BITS_PER_LONG || dividend > ULONG_MAX))
+ break;
+ min_s--;
+ }
+
+ s = s0 - min_s;
+ k = SHIFT_ROUND_UP(k0, min_s);
+ r = ROUNDING_CONST(s);
+ dbg("min_s %d s %d k %llx max_x * k %llx r %llx dividend %llx\n",
+ min_s, s, k, max_x * k, r, max_x * k + r);
+
+ /* If we have a negative shift, we couldn't find a k that would avoid
+ * an overflow. If that's true, or we have an overflow at the current
+ * shift, we return an error. */
+ if (s < 0 || max_x * k + r > ULONG_MAX)
+ return -ERANGE;
+
+ /* If the shift we came up with would shift the final result out
+ * of the register, we've underflowed the result */
+ if (s >= BITS_PER_LONG)
+ return -ERANGE;
+
+ fr->s = s;
+ fr->k = k;
+ fr->r = r;
+
+ return 0;
+#undef SHIFT_ROUND_UP
+#undef ROUNDING_CONST
+}
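
For anyone who wants to experiment with the multiply-and-shift idea outside the kernel, here is a minimal user-space sketch. It is not the code in the patch: it assumes a 64-bit accumulator, caps the shift at 31, and the names ratio_sketch, ratio_init and ratio_eval are hypothetical.

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for struct fast_ratio. */
struct ratio_sketch {
	uint64_t k;	/* multiplier: (a / b) * 2^s, rounded up */
	unsigned int s;	/* right shift applied after the multiply */
	uint64_t r;	/* rounding constant: 2^s - 1 */
};

/* Choose the largest shift s <= 31 for which max_x * k + r fits in 64 bits. */
static int ratio_init(uint32_t max_x, uint32_t a, uint32_t b,
		      struct ratio_sketch *rs)
{
	int s;

	if (b == 0 || max_x == 0)
		return -1;

	for (s = 31; s >= 0; s--) {
		uint64_t scaled = (uint64_t)a << s;	/* a < 2^32, so this fits */
		uint64_t k = scaled / b + (scaled % b != 0);	/* round up */
		uint64_t r = (UINT64_C(1) << s) - 1;

		if (k <= (UINT64_MAX - r) / max_x) {
			rs->k = k;
			rs->s = (unsigned int)s;
			rs->r = r;
			return 0;
		}
	}
	return -1;
}

/* Evaluate x * a / b, rounded up, with one multiply, one add and one shift.
 * The caller must keep x <= max_x as passed to ratio_init. */
static uint64_t ratio_eval(uint32_t x, const struct ratio_sketch *rs)
{
	return ((uint64_t)x * rs->k + rs->r) >> rs->s;
}
```

Because k itself is rounded up, the result can come out one tick above the exact ratio. With a = 350000 and b = 1000000, ratio_eval(1000, ...) gives 351 rather than 350. For a delay that must last at least the requested time, that seems an acceptable direction to err in.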
* Re: [PATCH][MIPS] Use CP0 Count register to implement more granular ndelay
@ 2009-02-28 21:10 ` Paul Gortmaker
0 siblings, 0 replies; 10+ messages in thread
From: Paul Gortmaker @ 2009-02-28 21:10 UTC (permalink / raw)
To: VomLehn; +Cc: linux-mips, ralf
On Fri, Feb 27, 2009 at 6:09 PM, VomLehn <dvomlehn@cisco.com> wrote:
> The default implementation of ndelay uses udelay, which will result in the
> rounding up of any requested interval to the next highest number of
> microseconds. This may be a much longer delay than was desired. However,
> if the tick rate of the CP0 Count register is known, it is possible to
> implement an accurate ndelay that works on multiple MIPS processors.
Presumably the only case where this would ever matter is for delays
that need to be much shorter than udelay(1) -- are there a lot of these
out there? According to git grep, the only user of ndelay in
arch/mips is lasat. And, if what is there is working now, then one
could argue that the calls for the short delays are not explicitly
required to be less than udelay(1).
>
> To use this, enable CONFIG_CP0_COUNT_NDELAY and modify the platform startup
> code to call init_ndelay as early as possible. A good place to call it
> is probably the prom_init function. The argument to init_ndelay should be
> the CP0 Count register tick rate, in kHz. The tick rate is typically half
> the processor clock rate so, if you have a 700 MHz processor, the CP0 Count
> register would tick at 350 MHz and you would pass 350000 to init_ndelay.
>
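As a units check, a Count register ticking at 350 MHz corresponds to 350000 kHz. A minimal sketch of the conversion, assuming the Count register increments once every two CPU clock cycles (the helper name is hypothetical and not part of the patch):

```c
#include <stdint.h>

/* Hypothetical helper: Count register rate in kHz, assuming the register
 * increments once every two CPU clock cycles. */
static uint32_t count_khz_from_cpu_hz(uint32_t cpu_hz)
{
	return (cpu_hz / 2) / 1000;
}
```

For a 700 MHz part, count_khz_from_cpu_hz(700000000) is the value that would be handed to init_ndelay.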
> At the risk of being obvious, you will need to ensure that ndelay isn't used
> until after the call to init_ndelay. There are no checks to enforce this as
> it would increase the latency in ndelay, but, in order to make it obvious
> that init_ndelay hasn't been called early enough, the initial ndelay
> parameters are set to cause a really large delay.
I didn't see the arch_initcall for the init_ndelay placed anywhere in
this patch.
>
> Signed-off-by: David VomLehn <dvomlehn@cisco.com>
> ---
> arch/mips/Kconfig | 9 ++
> arch/mips/include/asm/delay.h | 75 ++++++++++++++
> arch/mips/include/asm/fast-ratio.h | 53 ++++++++++
> arch/mips/lib/Makefile | 6 +-
> arch/mips/lib/delay.c | 59 +++++++++++
> arch/mips/lib/fast-ratio.c | 196 ++++++++++++++++++++++++++++++++++++
> 6 files changed, 396 insertions(+), 2 deletions(-)
>
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 0b5f16b..1568b91 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -1807,6 +1807,15 @@ config NR_CPUS
>
> source "kernel/time/Kconfig"
>
> +config CP0_COUNT_NDELAY
> + bool "Use coprocessor 0 Count register for ndelay functionality"
> + default n
Does there need to be some sort of depends here to cover off any
limitations where it is known that it won't work?
> + help
> + Implements the ndelay function using the coprocessor 0 Count
> + register. Using this requires including a call to init_ndelay
> + with the Count register increment frequency, in KHz, in one
> + of the early initialization functions.
> +
> #
> # Timer Interrupt Frequency Configuration
> #
> diff --git a/arch/mips/include/asm/delay.h b/arch/mips/include/asm/delay.h
> index b0bccd2..fc41c46 100644
> --- a/arch/mips/include/asm/delay.h
> +++ b/arch/mips/include/asm/delay.h
> @@ -109,4 +109,79 @@ static inline void __udelay(unsigned long usecs, unsigned long lpj)
> #define MAX_UDELAY_MS (1000 / HZ)
> #endif
>
> +#ifdef CONFIG_CP0_COUNT_NDELAY
> +/*
> + * Definitions for using MIPS CP0 Count register-based ndelay. If
> + * CONFIG_CP0_COUNT_NDELAY is not defined, ndelay will default to using
> + * udelay.
> + */
> +
> +#include <linux/kernel.h>
> +#include <asm/fast-ratio.h>
> +#include <asm/mipsregs.h>
> +
> +/* Maximum amount of time that will be handled with ndelay, in nanoseconds.
> + * Values bigger than this will be bounced up to udelay. */
> +#define _MAX_DIRECT_NDELAY 65535
Why the leading underscore here? Maybe MAX_CP0_NDELAY would be a
better choice if it has to be changed anyway?
> +
> +#define ndelay(n) _ndelay(n)
> +
> +extern struct fast_ratio _ndelay_param;
...and here ; not sure why the leading underscore.
> +
> +/*
> + * Compute the number of CP0 Count ticks corresponding to the interval
> + * @nsecs: Interval, expressed in nanoseconds
> + * Breaking this out as its own function makes it easier to test.
> + */
> +static inline unsigned int _ndelay_ticks(unsigned int nsecs)
> +{
> + return fast_ratio(nsecs, &_ndelay_param);
> +}
> +
> +/*
> + * Delay for at least the given number of nanoseconds
> + * @nsecs: Number of nanoseconds to delay
> + *
> + * This function uses the CP0 Count register to give a pretty accurate delay
> + * for very short delay periods. Very small delays will, unavoidably, be
> + * dominated by the instructions in this function but this should converge
> + * to the true delay reasonably quickly before nsecs gets very large.
> + *
> + * NOTE: Failure to call init_ndelay will result in *very* long delay times.
> + * This is done deliberately to ensure that, if you use ndelay and forget to
> + * call init_delay first, you will notice your mistake quickly.
> + */
> +static inline void _ndelay(unsigned long nsecs)
> +{
> + int start;
> +
> + /* The expected thing would be to do the first read of the Count
> + * register later, just before entering the delay loop. Reading here
> + * ensures that very short intervals will exit the first time through
> + * that loop. */
> + start = read_c0_count();
Is this really going to all work on mips64? I've spent hours
debugging silent boot death on mips64 due to bad variable choices used
for stuff playing with read_c0_count when mips went to generic
clockevents on r4k, and it wasn't fun.
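One way to take the size of int out of the picture is to compare Count samples through fixed-width types. A minimal sketch, assuming the Count register is 32 bits wide; count_before is a hypothetical helper, not something in the patch:

```c
#include <stdint.h>

/* Returns nonzero while `now` has not yet reached `end`, treating both as
 * samples of a free-running 32-bit counter that may wrap around.
 * The unsigned subtraction wraps by definition; converting the difference
 * to int32_t is implementation-defined in ISO C but wraps on GCC. */
static int count_before(uint32_t now, uint32_t end)
{
	return (int32_t)(end - now) > 0;
}
```

This only gives the right answer while the two samples are less than 2^31 ticks apart, which is presumably why long delays are handed off to udelay.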
> +
> + if (unlikely(nsecs > _MAX_DIRECT_NDELAY))
> + udelay(DIV_ROUND_UP(nsecs, 1000)); /* Would overflow counter */
> +
> + else {
> + int end;
> + int now;
> +
> + end = start + _ndelay_ticks(nsecs);
> +
> + do {
> + now = read_c0_count();
> + } while (end - now > 0);
> + }
> +}
> +
> +extern int init_ndelay(unsigned int count_freq);
> +#else
> +static inline int init_ndelay(unsigned int count_freq)
> +{
> + return 0;
> +}
> +#endif
> +
> #endif /* _ASM_DELAY_H */
> diff --git a/arch/mips/include/asm/fast-ratio.h b/arch/mips/include/asm/fast-ratio.h
> new file mode 100644
> index 0000000..84ac286
> --- /dev/null
> +++ b/arch/mips/include/asm/fast-ratio.h
> @@ -0,0 +1,53 @@
> +/*
> + * fast-ratio.h
> + *
> + * Definitions for using fast evaluator for expressions of the form:
> + * a
> + * x * -
> + * b
> + *
> + * where x can be constrained to some maximum value and a and b are constants.
> + *
> + * Copyright (C) 2009 Cisco Systems, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef _ASM_MACH_POWERTV_FAST_RATIO_H_
> +#define _ASM_MACH_POWERTV_FAST_RATIO_H_
s/MACH_POWERTV_// is probably what you wanted to do here.
> +
> +/* Instances of this structure will normally be declared with the attribute
> + * __read_mostly since it only makes sense to use the fast-ratio code if
> + * you fill in the structure once for many calls to evalue the result. */
> +struct fast_ratio {
> + unsigned long k;
> + unsigned int s;
> + unsigned long r;
> +};
Use of "int" again tends to make me nervous.
> +
> +/*
> + * Evaluate x * (a / b), a and b constant, as transformed for speed.
> + * @x: Value to multiply by a / b
> + * @fr: Pointer to &struct fast_ratio with transformed values for a and b
> + * Returns x * (a / b), rounded up in an unsigned long value
> + */
> +static inline unsigned long fast_ratio(unsigned long x, struct fast_ratio *fr)
> +{
> + return (x * fr->k + fr->r) >> fr->s;
> +}
> +
> +extern int init_fast_ratio(unsigned int max_x, unsigned long a,
> + unsigned long b, struct fast_ratio *fr);
> +#endif
> diff --git a/arch/mips/lib/Makefile b/arch/mips/lib/Makefile
> index dbcf651..e5c4ee5 100644
> --- a/arch/mips/lib/Makefile
> +++ b/arch/mips/lib/Makefile
> @@ -2,8 +2,8 @@
> # Makefile for MIPS-specific library files..
> #
>
> -lib-y += csum_partial.o memcpy.o memcpy-inatomic.o memset.o strlen_user.o \
> - strncpy_user.o strnlen_user.o uncached.o
> +lib-y += csum_partial.o fast-ratio.o memcpy.o memcpy-inatomic.o memset.o \
> + strlen_user.o strncpy_user.o strnlen_user.o uncached.o
>
> obj-y += iomap.o
> obj-$(CONFIG_PCI) += iomap-pci.o
> @@ -28,5 +28,7 @@ obj-$(CONFIG_CPU_TX39XX) += r3k_dump_tlb.o
> obj-$(CONFIG_CPU_TX49XX) += dump_tlb.o
> obj-$(CONFIG_CPU_VR41XX) += dump_tlb.o
>
> +obj-$(CONFIG_CP0_COUNT_NDELAY) += delay.o
> +
> # libgcc-style stuff needed in the kernel
> obj-y += ashldi3.o ashrdi3.o cmpdi2.o lshrdi3.o ucmpdi2.o
> diff --git a/arch/mips/lib/delay.c b/arch/mips/lib/delay.c
> new file mode 100644
> index 0000000..0d74543
> --- /dev/null
> +++ b/arch/mips/lib/delay.c
> @@ -0,0 +1,59 @@
> +/*
> + * delay.c
> + *
> + * Code implementing ndelay using the MIPS CP0 Count register.
> + *
> + * Copyright (C) 2009 Cisco Systems, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include <linux/cache.h>
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/errno.h>
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <asm/delay.h>
> +
> +/* This elements are initialized to a value that will cause huge delays to
> + * arise from use of ndelay before calling init_ndelay. This should make such
> + * mistakes obvious enough to easily find and correct. */
I think it would be better to have something like:
if (unlikely(not_calibrated))
WARN_ON_ONCE(...)
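To spell out the kind of check meant here, a rough user-space model follows. The names calibrated, mark_calibrated and ndelay_ready are made up for illustration; in the kernel the one-time warning would come from WARN_ON_ONCE rather than fprintf:

```c
#include <stdbool.h>
#include <stdio.h>

static bool calibrated;		/* would be set by the init routine */
static unsigned int warnings;	/* number of warnings actually emitted */

static void mark_calibrated(void)
{
	calibrated = true;
}

/* Returns true if the calibrated path may be used; warns only once if not. */
static bool ndelay_ready(void)
{
	static bool warned;

	if (calibrated)
		return true;
	if (!warned) {
		warned = true;
		warnings++;
		fprintf(stderr, "ndelay used before calibration\n");
	}
	return false;
}
```

The cost is one extra load and branch per call, which is the latency trade-off the patch description mentions.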
> +struct fast_ratio _ndelay_param __read_mostly = {
> + .k = 0,
> + .s = 0,
> + .r = ULONG_MAX / 2,
> +};
> +EXPORT_SYMBOL(_ndelay_param);
Not sure why the leading underscore here either.
> +
> +/*
> + * Called to initialize the values for the ndelay function
> + * @f: Frequency, in KHz, of the CP0 Count register increment rate
> + */
> +int __init init_ndelay(unsigned int f)
> +{
> + int ret;
> +
> + ret = init_fast_ratio(_MAX_DIRECT_NDELAY, f, 1000000, &_ndelay_param);
> +
> + if (ret)
> + pr_err("Unable to initialize ndelay parameters, errno %d\n",
> + ret);
> + else
> + pr_info("Set ndelay fast_ratio parameters: k %u s %u r %u\n",
> + _ndelay_param.k, _ndelay_param.s, _ndelay_param.r);
> +
> + return ret;
> +}
> diff --git a/arch/mips/lib/fast-ratio.c b/arch/mips/lib/fast-ratio.c
> new file mode 100644
> index 0000000..c630adf
> --- /dev/null
> +++ b/arch/mips/lib/fast-ratio.c
> @@ -0,0 +1,196 @@
> +/*
> + * fast-ratio.c
> + *
> + * Code implementing fast ratio calculator.
> + *
> + * Copyright (C) 2009 Cisco Systems, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include <linux/cache.h>
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/errno.h>
> +#include <linux/log2.h>
> +#include <asm/fast-ratio.h>
> +
> +#ifndef __KERNEL__
> +#include <assert.h>
> +#else
> +#include <asm-generic/bug.h>
> +
> +#ifndef assert
> +#define assert(x) BUG_ON(!(x))
> +#endif
> +#endif
I suspect the thing you will be asked to do here is dump the whole
__KERNEL__ test and assert usage and simply use BUG_ON directly in
the code.
> +
> +#ifdef DEBUG
> +#define dbg(fmt, ...) pr_crit(fmt, ## __VA_ARGS__)
> +#else
> +#define dbg(fmt, ...) do { } while (0)
> +#endif
> +
> +#ifndef BITS_PER_LLONG
> +#define BITS_PER_LLONG ((BITS_PER_LONG * sizeof(long long)) / sizeof(long))
> +#endif
Is BITS_PER_LLONG really defined anywhere for anything?
> +
> +/* Type for intermediate calculations, along with the number of bits and
> + * the maximum size. This should be the biggest unsigned type for which
> + * division and modulus by unsigned long are defined on this
> + * architecture. */
> +#ifdef CONFIG_HAVE_ULLONG_DIV_AND_MOD
I've not seen any instances of this CONFIG option either.
> +typedef unsigned long long intermediate_t;
> +#define BITS_PER_ACC BITS_PER_LLONG
> +#define ACC_MAX ULLONG_MAX
> +#else
> +typedef unsigned long intermediate_t;
> +#define BITS_PER_ACC BITS_PER_LONG
> +#define ACC_MAX ULLONG_MAX
> +#endif
This might fall into the loophole of Documentation/CodingStyle --
chapter 5; i.e:
...but if there is a clear reason for why it under certain circumstances
might be an "unsigned int" and under other configurations might be
"unsigned long", then by all means go ahead and use a typedef.
> +
> +/*
Don't these need to start with /** if you want to have them
automagically parsed?
> + * Compute transform of equation (x * a)/b for fast computation
> + * @max_x: Maximum value of x
> + * @a: Value of a
> + * @b: value b
> + * @fr: Pointer to a &struct fast_ratio to hold transformed parameters
> + * Returns a zero on success, otherwise a negative errno value. Errno values
> + * are:
> + * -EDOM Parameter b is zero
> + * -EINVAL Either max_x is too large or max_x is zero
> + * -ERANGE The rounded up intermediate value of x * a would not fit
> + * in an unsigned long.
> + *
> + * Mathematically, as long as the ratios:
> + * a k
> + * - = ---
> + * b 2^s
> + *
> + * are equal, the specific values of k and s don't matter. There are
> + * two constraints, however:
> + *
> + * o The value of s must be less than BIT_PER_LONG
> + * o With a rounding constant of r = 2^s - 1, we must have
> + * x * k + r <= ULONG_MAX
> + *
> + * We want k to be as large as possible so that
> + * it has the maximum precision. Getting the largest k means
> + * getting the smallest shift.
> + *
> + * Note that this is designed to work on both 32-bit systems and 64-bit systems
> + * using the LP64 model.
> + */
> +int init_fast_ratio(unsigned int max_x, unsigned long a,
> + unsigned long b, struct fast_ratio *fr)
> +{
> +#define SHIFT_ROUND_UP(_v, _n) (((_n) < 0) ? \
> + (((unsigned long long) (_v)) << -(_n)) : \
> + (((_v) + ((1ull << (_n)) - 1)) >> (_n)))
> +#define ROUNDING_CONST(_s) (((_s) < 0) ? 0 : ((1ull << (_s)) - 1))
You've created a fast_ratio.h ; any reason why these #defines don't
live there rather than in the middle of this function? And if there
is any implicit trickery being used that might not be obvious to Joe
Average, then a comment or two wouldn't go amiss. I know that it is
over my head to parse on the fly, but that doesn't say much. :-)
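As a reading aid, the two macros can be written out as plain functions. This is only a sketch of what they appear to compute, assuming shift counts stay below 64 and the addition does not overflow; the function names are hypothetical:

```c
#include <stdint.h>

/* Shift v right by n bits, rounding up; a negative n shifts left by -n. */
static uint64_t shift_round_up(uint64_t v, int n)
{
	if (n < 0)
		return v << -n;
	return (v + ((UINT64_C(1) << n) - 1)) >> n;
}

/* Rounding constant for a right shift by s bits: 2^s - 1, or 0 if s < 0. */
static uint64_t rounding_const(int s)
{
	return (s < 0) ? 0 : (UINT64_C(1) << s) - 1;
}
```

Adding 2^n - 1 before shifting right by n is the usual way to turn a truncating shift into a round-up division by 2^n.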
> + intermediate_t scaled_a;
> + intermediate_t k0;
> + int s0;
> + int min_s;
> + int k_msb;
> + int s;
> + int si;
> + unsigned long long k;
> + unsigned long long r;
> + unsigned long long dividend;
> +
> + if (b == 0)
> + return -EDOM; /* Divide by zero */
> +
> + if (max_x > ULONG_MAX || max_x == 0)
> + return -EINVAL;
> +
> + if (a == 0) {
> + fr->k = 0; /* Trivial, result is always zero */
> + fr->s = 0;
> + fr->r = 0;
> + return 0;
> + }
> +
> + /* Calculate the rounded up value of a / b with the most precision we
> + * can easily obtain by shifting the value a by n bits to the left.
> + * This means that the value we get is (a / b) * 2^n. We could get
> + * an overflow if we used the usual (a + (b - 1))/ b, so we compute the
> + * rounding value explicitly. If the scale value of a modulus b is
> + * not zero, we need to increase the result by one. */
> + s0 = (BITS_PER_ACC - 1) - ilog2(a);
> + scaled_a = ((intermediate_t) a) << s0;
> +
> + k0 = (scaled_a / b) + ((scaled_a % b == 0) ? 0 : 1);
> + k_msb = ilog2(k0) + 1;
> + dbg("scaled_a %llx scaled_a %% b %llx k0 %llx s0 %d k_msb %d\n",
> + (unsigned long long) scaled_a,
> + (unsigned long long) scaled_a % b,
> + (unsigned long long) k0, s0, k_msb);
> +
> + /* Find a shift that yields the largest value of k that will avoid an
> + * overflow on an unsigned long when multiplied by max_x, and rounded
> + * up. */
> + min_s = k_msb;
> +
> + for (;;) {
> + int shft;
> + unsigned long long ri;
> + unsigned long long ki;
> + unsigned long long p;
> +
> + shft = min_s - 1;
> + si = s0 - shft;
> + ki = SHIFT_ROUND_UP(k0, shft);
> + ri = ROUNDING_CONST(si);
> +
> + /* We must be sure that max_x is smaller than p or the
> + * following calculation will eventually overflow */
> + assert(sizeof(max_x) < sizeof(p));
> + p = max_x * ki;
> + dividend = p + ri;
> + dbg("min_s %d shft %d si %d ri %llx ki %llx max_x %x p %llx "
> + "dividend %llx\n",
> + min_s, shft, si, ri, ki, max_x, p, dividend);
> + if ((si > BITS_PER_LONG || dividend > ULONG_MAX))
> + break;
> + min_s--;
> + }
> +
> + s = s0 - min_s;
> + k = SHIFT_ROUND_UP(k0, min_s);
> + r = ROUNDING_CONST(s);
> + dbg("min_s %d s %d k %llx max_x * k %llx r %llx dividend %llx\n",
> + min_s, s, k, max_x * k, r, max_x * k + r);
> +
> + /* If we have a negative shift, we couldn't find a k that would avoid
> + * an overflow. If that's true, or we have an overflow at the current
> + * shift, we return an error. */
> + if (s < 0 || max_x * k + r > ULONG_MAX)
> + return -ERANGE;
> +
> + /* If the shift we came up with would shift the final result out
> + * of the register, we've underflowed the result */
> + if (s >= BITS_PER_LONG)
> + return -ERANGE;
> +
> + fr->s = s;
> + fr->k = k;
> + fr->r = r;
> +
> + return 0;
> +#undef SHIFT_ROUND_UP
> +#undef ROUNDING_CONST
This is a .c file, so I don't see the need to undef anything at the end.
Generally speaking, I think this could be two separate commits -- one
that implements the basic concept of ndelay using read_c0_count() (if
really required), and then an add-on commit that does the uber
optimization of the whole ratio thing. Then if it turns out that one
hunk gets the green light, and the other doesn't, well then at least
you get to see that one part of your work goes forward.
Paul.
> +}
>
>
* Re: [PATCH][MIPS] Use CP0 Count register to implement more granular ndelay
@ 2009-02-28 21:10 ` Paul Gortmaker
0 siblings, 0 replies; 10+ messages in thread
From: Paul Gortmaker @ 2009-02-28 21:10 UTC (permalink / raw)
To: VomLehn; +Cc: linux-mips, ralf
On Fri, Feb 27, 2009 at 6:09 PM, VomLehn <dvomlehn@cisco.com> wrote:
> The default implementation of ndelay uses udelay, which will result in the
> rounding up of any requested interval to the next highest number of
> microseconds. This may be a much longer delay than was desired. However,
> if the tick rate of the CP0 Count register is known, it is possible to
> implement an accurate ndelay that works on multiple MIPS processors.
Presumably the only case where this would ever matter is for delays
that need to be much shorter than udelay(1) -- are there a lot of these
out there? According to git grep, the only user of ndelay in
arch/mips is lasat. And, if what is there is working now, then one
could argue that the calls for the short delays are not explicitly
required to be less than udelay(1).
>
> To use this, enable CONFIG_CP0_COUNT_NDELAY and modify the platform startup
> code to call init_ndelay as early as possible. A good place to call it
> is probably the prom_init function. The argument to init_ndelay should be
> the CP0 Count register tick rate, in kHz. The tick rate is typically half
> the processor clock rate so, if you have a 700 MHz processor, the CP0 Count
> register would tick at 350 MHz and you would pass 350000 to init_ndelay.
>
> At the risk of being obvious, you will need to ensure that ndelay isn't used
> until after the call to init_ndelay. There are no checks to enforce this as
> it would increase the latency in ndelay, but, in order to make it obvious
> that init_ndelay hasn't been called early enough, the initial ndelay
> parameters are set to cause a really large delay.
I didn't see the arch_initcall for the init_ndelay placed anywhere in
this patch.
>
> Signed-off-by: David VomLehn <dvomlehn@cisco.com>
> ---
> arch/mips/Kconfig | 9 ++
> arch/mips/include/asm/delay.h | 75 ++++++++++++++
> arch/mips/include/asm/fast-ratio.h | 53 ++++++++++
> arch/mips/lib/Makefile | 6 +-
> arch/mips/lib/delay.c | 59 +++++++++++
> arch/mips/lib/fast-ratio.c | 196 ++++++++++++++++++++++++++++++++++++
> 6 files changed, 396 insertions(+), 2 deletions(-)
>
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 0b5f16b..1568b91 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -1807,6 +1807,15 @@ config NR_CPUS
>
> source "kernel/time/Kconfig"
>
> +config CP0_COUNT_NDELAY
> + bool "Use coprocessor 0 Count register for ndelay functionality"
> + default n
Does there need to be some sort of depends here to cover off any
limitations where it is known that it won't work?
> + help
> + Implements the ndelay function using the coprocessor 0 Count
> + register. Using this requires including a call to init_ndelay
> + with the Count register increment frequency, in KHz, in one
> + of the early initialization functions.
> +
> #
> # Timer Interrupt Frequency Configuration
> #
> diff --git a/arch/mips/include/asm/delay.h b/arch/mips/include/asm/delay.h
> index b0bccd2..fc41c46 100644
> --- a/arch/mips/include/asm/delay.h
> +++ b/arch/mips/include/asm/delay.h
> @@ -109,4 +109,79 @@ static inline void __udelay(unsigned long usecs, unsigned long lpj)
> #define MAX_UDELAY_MS (1000 / HZ)
> #endif
>
> +#ifdef CONFIG_CP0_COUNT_NDELAY
> +/*
> + * Definitions for using MIPS CP0 Count register-based ndelay. If
> + * CONFIG_CP0_COUNT_NDELAY is not defined, ndelay will default to using
> + * udelay.
> + */
> +
> +#include <linux/kernel.h>
> +#include <asm/fast-ratio.h>
> +#include <asm/mipsregs.h>
> +
> +/* Maximum amount of time that will be handled with ndelay, in nanoseconds.
> + * Values bigger than this will be bounced up to udelay. */
> +#define _MAX_DIRECT_NDELAY 65535
Why the leading underscore here? Maybe MAX_CP0_NDELAY would be a
better choice if it has to be changed anyway?
> +
> +#define ndelay(n) _ndelay(n)
> +
> +extern struct fast_ratio _ndelay_param;
...and here ; not sure why the leading underscore.
> +
> +/*
> + * Compute the number of CP0 Count ticks corresponding to the interval
> + * @nsecs: Interval, expressed in nanoseconds
> + * Breaking this out as its own function makes it easier to test.
> + */
> +static inline unsigned int _ndelay_ticks(unsigned int nsecs)
> +{
> + return fast_ratio(nsecs, &_ndelay_param);
> +}
> +
> +/*
> + * Delay for at least the given number of nanoseconds
> + * @nsecs: Number of nanoseconds to delay
> + *
> + * This function uses the CP0 Count register to give a pretty accurate delay
> + * for very short delay periods. Very small delays will, unavoidably, be
> + * dominated by the instructions in this function but this should converge
> + * to the true delay reasonably quickly before nsecs gets very large.
> + *
> + * NOTE: Failure to call init_ndelay will result in *very* long delay times.
> + * This is done deliberately to ensure that, if you use ndelay and forget to
> + * call init_delay first, you will notice your mistake quickly.
> + */
> +static inline void _ndelay(unsigned long nsecs)
> +{
> + int start;
> +
> + /* The expected thing would be to do the first read of the Count
> + * register later, just before entering the delay loop. Reading here
> + * ensures that very short intervals will exit the first time through
> + * that loop. */
> + start = read_c0_count();
Is this really going to all work on mips64? I've spent hours
debugging silent boot death on mips64 due to bad variable choices used
for stuff playing with read_c0_count when mips went to generic
clockevents on r4k, and it wasn't fun.
> +
> + if (unlikely(nsecs > _MAX_DIRECT_NDELAY))
> + udelay(DIV_ROUND_UP(nsecs, 1000)); /* Would overflow counter */
> +
> + else {
> + int end;
> + int now;
> +
> + end = start + _ndelay_ticks(nsecs);
> +
> + do {
> + now = read_c0_count();
> + } while (end - now > 0);
> + }
> +}
> +
> +extern int init_ndelay(unsigned int count_freq);
> +#else
> +static inline int init_ndelay(unsigned int count_freq)
> +{
> + return 0;
> +}
> +#endif
> +
> #endif /* _ASM_DELAY_H */
> diff --git a/arch/mips/include/asm/fast-ratio.h b/arch/mips/include/asm/fast-ratio.h
> new file mode 100644
> index 0000000..84ac286
> --- /dev/null
> +++ b/arch/mips/include/asm/fast-ratio.h
> @@ -0,0 +1,53 @@
> +/*
> + * fast-ratio.h
> + *
> + * Definitions for using fast evaluator for expressions of the form:
> + * a
> + * x * -
> + * b
> + *
> + * where x can be constrained to some maximum value and a and b are constants.
> + *
> + * Copyright (C) 2009 Cisco Systems, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef _ASM_MACH_POWERTV_FAST_RATIO_H_
> +#define _ASM_MACH_POWERTV_FAST_RATIO_H_
s/MACH_POWERTV_// is probably what you wanted to do here.
> +
> +/* Instances of this structure will normally be declared with the attribute
> + * __read_mostly since it only makes sense to use the fast-ratio code if
> + * you fill in the structure once for many calls to evaluate the result. */
> +struct fast_ratio {
> + unsigned long k;
> + unsigned int s;
> + unsigned long r;
> +};
Use of "int" again tends to make me nervous.
> +
> +/*
> + * Evaluate x * (a / b), a and b constant, as transformed for speed.
> + * @x: Value to multiply by a / b
> + * @fr: Pointer to &struct fast_ratio with transformed values for a and b
> + * Returns x * (a / b), rounded up in an unsigned long value
> + */
> +static inline unsigned long fast_ratio(unsigned long x, struct fast_ratio *fr)
> +{
> + return (x * fr->k + fr->r) >> fr->s;
> +}
> +
> +extern int init_fast_ratio(unsigned int max_x, unsigned long a,
> + unsigned long b, struct fast_ratio *fr);
> +#endif
> diff --git a/arch/mips/lib/Makefile b/arch/mips/lib/Makefile
> index dbcf651..e5c4ee5 100644
> --- a/arch/mips/lib/Makefile
> +++ b/arch/mips/lib/Makefile
> @@ -2,8 +2,8 @@
> # Makefile for MIPS-specific library files..
> #
>
> -lib-y += csum_partial.o memcpy.o memcpy-inatomic.o memset.o strlen_user.o \
> - strncpy_user.o strnlen_user.o uncached.o
> +lib-y += csum_partial.o fast-ratio.o memcpy.o memcpy-inatomic.o memset.o \
> + strlen_user.o strncpy_user.o strnlen_user.o uncached.o
>
> obj-y += iomap.o
> obj-$(CONFIG_PCI) += iomap-pci.o
> @@ -28,5 +28,7 @@ obj-$(CONFIG_CPU_TX39XX) += r3k_dump_tlb.o
> obj-$(CONFIG_CPU_TX49XX) += dump_tlb.o
> obj-$(CONFIG_CPU_VR41XX) += dump_tlb.o
>
> +obj-$(CONFIG_CP0_COUNT_NDELAY) += delay.o
> +
> # libgcc-style stuff needed in the kernel
> obj-y += ashldi3.o ashrdi3.o cmpdi2.o lshrdi3.o ucmpdi2.o
> diff --git a/arch/mips/lib/delay.c b/arch/mips/lib/delay.c
> new file mode 100644
> index 0000000..0d74543
> --- /dev/null
> +++ b/arch/mips/lib/delay.c
> @@ -0,0 +1,59 @@
> +/*
> + * delay.c
> + *
> + * Code implementing ndelay using the MIPS CP0 Count register.
> + *
> + * Copyright (C) 2009 Cisco Systems, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include <linux/cache.h>
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/errno.h>
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <asm/delay.h>
> +
> +/* These elements are initialized to a value that will cause huge delays to
> + * arise from use of ndelay before calling init_ndelay. This should make such
> + * mistakes obvious enough to easily find and correct. */
I think it would be better to have something like:
if (unlikely(not_calibrated))
WARN_ON_ONCE(...)
> +struct fast_ratio _ndelay_param __read_mostly = {
> + .k = 0,
> + .s = 0,
> + .r = ULONG_MAX / 2,
> +};
> +EXPORT_SYMBOL(_ndelay_param);
Not sure why the leading underscore here either.
> +
> +/*
> + * Called to initialize the values for the ndelay function
> + * @f: Frequency, in KHz, of the CP0 Count register increment rate
> + */
> +int __init init_ndelay(unsigned int f)
> +{
> + int ret;
> +
> + ret = init_fast_ratio(_MAX_DIRECT_NDELAY, f, 1000000, &_ndelay_param);
> +
> + if (ret)
> + pr_err("Unable to initialize ndelay parameters, errno %d\n",
> + ret);
> + else
> + pr_info("Set ndelay fast_ratio parameters: k %lu s %u r %lu\n",
> + _ndelay_param.k, _ndelay_param.s, _ndelay_param.r);
> +
> + return ret;
> +}
> diff --git a/arch/mips/lib/fast-ratio.c b/arch/mips/lib/fast-ratio.c
> new file mode 100644
> index 0000000..c630adf
> --- /dev/null
> +++ b/arch/mips/lib/fast-ratio.c
> @@ -0,0 +1,196 @@
> +/*
> + * fast-ratio.c
> + *
> + * Code implementing fast ratio calculator.
> + *
> + * Copyright (C) 2009 Cisco Systems, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include <linux/cache.h>
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/errno.h>
> +#include <linux/log2.h>
> +#include <asm/fast-ratio.h>
> +
> +#ifndef __KERNEL__
> +#include <assert.h>
> +#else
> +#include <asm-generic/bug.h>
> +
> +#ifndef assert
> +#define assert(x) BUG_ON(!(x))
> +#endif
> +#endif
I suspect the thing you will be asked to do here is dump the whole
__KERNEL__ test and assert usage and simply just use BUG_ON right in
the code.
> +
> +#ifdef DEBUG
> +#define dbg(fmt, ...) pr_crit(fmt, ## __VA_ARGS__)
> +#else
> +#define dbg(fmt, ...) do { } while (0)
> +#endif
> +
> +#ifndef BITS_PER_LLONG
> +#define BITS_PER_LLONG ((BITS_PER_LONG * sizeof(long long)) / sizeof(long))
> +#endif
Is BITS_PER_LLONG really defined anywhere for anything?
> +
> +/* Type for intermediate calculations, along with the number of bits and
> + * the maximum size. This should be the biggest unsigned type for which
> + * division and modulus by unsigned long are defined on this
> + * architecture. */
> +#ifdef CONFIG_HAVE_ULLONG_DIV_AND_MOD
I've not seen any instances of this CONFIG option either.
> +typedef unsigned long long intermediate_t;
> +#define BITS_PER_ACC BITS_PER_LLONG
> +#define ACC_MAX ULLONG_MAX
> +#else
> +typedef unsigned long intermediate_t;
> +#define BITS_PER_ACC BITS_PER_LONG
> +#define ACC_MAX ULONG_MAX
> +#endif
This might fall into the loophole of Documentation/CodingStyle --
chapter 5; i.e:
...but if there is a clear reason for why it under certain circumstances
might be an "unsigned int" and under other configurations might be
"unsigned long", then by all means go ahead and use a typedef.
> +
> +/*
Don't these need to start with /** if you want to have them
automagically parsed?
> + * Compute transform of equation (x * a)/b for fast computation
> + * @max_x: Maximum value of x
> + * @a: Value of a
> + * @b: value b
> + * @fr: Pointer to a &struct fast_ratio to hold transformed parameters
> + * Returns a zero on success, otherwise a negative errno value. Errno values
> + * are:
> + * -EDOM Parameter b is zero
> + * -EINVAL Either max_x is too large or max_x is zero
> + * -ERANGE The rounded up intermediate value of x * a would not fit
> + * in an unsigned long.
> + *
> + * Mathematically, as long as the ratios:
> + * a k
> + * - = ---
> + * b 2^s
> + *
> + * are equal, the specific values of k and s don't matter. There are
> + * two constraints, however:
> + *
> + * o The value of s must be less than BITS_PER_LONG
> + * o With a rounding constant of r = 2^s - 1, we must have
> + * x * k + r <= ULONG_MAX
> + *
> + * We want k to be as large as possible so that
> + * it has the maximum precision. Getting the largest k means
> + * getting the smallest shift.
> + *
> + * Note that this is designed to work on both 32-bit systems and 64-bit systems
> + * using the LP64 model.
> + */
> +int init_fast_ratio(unsigned int max_x, unsigned long a,
> + unsigned long b, struct fast_ratio *fr)
> +{
> +#define SHIFT_ROUND_UP(_v, _n) (((_n) < 0) ? \
> + (((unsigned long long) (_v)) << -(_n)) : \
> + (((_v) + ((1ull << (_n)) - 1)) >> (_n)))
> +#define ROUNDING_CONST(_s) (((_s) < 0) ? 0 : ((1ull << (_s)) - 1))
You've created a fast_ratio.h ; any reason why these #defines don't
live there rather than in the middle of this function? And if there
is any implicit trickery being used that might not be obvious to Joe
Average, then a comment or two wouldn't go amiss. I know that it is
over my head to parse on the fly, but that doesn't say much. :-)
> + intermediate_t scaled_a;
> + intermediate_t k0;
> + int s0;
> + int min_s;
> + int k_msb;
> + int s;
> + int si;
> + unsigned long long k;
> + unsigned long long r;
> + unsigned long long dividend;
> +
> + if (b == 0)
> + return -EDOM; /* Divide by zero */
> +
> + if (max_x > ULONG_MAX || max_x == 0)
> + return -EINVAL;
> +
> + if (a == 0) {
> + fr->k = 0; /* Trivial, result is always zero */
> + fr->s = 0;
> + fr->r = 0;
> + return 0;
> + }
> +
> + /* Calculate the rounded up value of a / b with the most precision we
> + * can easily obtain by shifting the value a by n bits to the left.
> + * This means that the value we get is (a / b) * 2^n. We could get
> + * an overflow if we used the usual (a + (b - 1))/ b, so we compute the
> + * rounding value explicitly. If the scaled value of a modulo b is
> + * not zero, we need to increase the result by one. */
> + s0 = (BITS_PER_ACC - 1) - ilog2(a);
> + scaled_a = ((intermediate_t) a) << s0;
> +
> + k0 = (scaled_a / b) + ((scaled_a % b == 0) ? 0 : 1);
> + k_msb = ilog2(k0) + 1;
> + dbg("scaled_a %llx scaled_a %% b %llx k0 %llx s0 %d k_msb %d\n",
> + (unsigned long long) scaled_a,
> + (unsigned long long) scaled_a % b,
> + (unsigned long long) k0, s0, k_msb);
> +
> + /* Find a shift that yields the largest value of k that will avoid an
> + * overflow on an unsigned long when multiplied by max_x, and rounded
> + * up. */
> + min_s = k_msb;
> +
> + for (;;) {
> + int shft;
> + unsigned long long ri;
> + unsigned long long ki;
> + unsigned long long p;
> +
> + shft = min_s - 1;
> + si = s0 - shft;
> + ki = SHIFT_ROUND_UP(k0, shft);
> + ri = ROUNDING_CONST(si);
> +
> + /* We must be sure that max_x is smaller than p or the
> + * following calculation will eventually overflow */
> + assert(sizeof(max_x) < sizeof(p));
> + p = max_x * ki;
> + dividend = p + ri;
> + dbg("min_s %d shft %d si %d ri %llx ki %llx max_x %x p %llx "
> + "dividend %llx\n",
> + min_s, shft, si, ri, ki, max_x, p, dividend);
> + if ((si > BITS_PER_LONG || dividend > ULONG_MAX))
> + break;
> + min_s--;
> + }
> +
> + s = s0 - min_s;
> + k = SHIFT_ROUND_UP(k0, min_s);
> + r = ROUNDING_CONST(s);
> + dbg("min_s %d s %d k %llx max_x * k %llx r %llx dividend %llx\n",
> + min_s, s, k, max_x * k, r, max_x * k + r);
> +
> + /* If we have a negative shift, we couldn't find a k that would avoid
> + * an overflow. If that's true, or we have an overflow at the current
> + * shift, we return an error. */
> + if (s < 0 || max_x * k + r > ULONG_MAX)
> + return -ERANGE;
> +
> + /* If the shift we came up with would shift the final result out
> + * of the register, we've underflowed the result */
> + if (s >= BITS_PER_LONG)
> + return -ERANGE;
> +
> + fr->s = s;
> + fr->k = k;
> + fr->r = r;
> +
> + return 0;
> +#undef SHIFT_ROUND_UP
> +#undef ROUNDING_CONST
This is a .c file, so I don't see the need to undef anything at the end.
Generally speaking, I think this could be two separate commits -- one
that implements the basic ndelay uses read_c0_count() concept (if
really required), and then an add-on commit that does the uber
optimization of the whole ratio thing? Then if it turns out that one
hunk gets the green light, and the other doesn't, well then at least
you get to see that one part of your work goes forward.
Paul.
> +}
>
>
* Re: [PATCH][MIPS] Use CP0 Count register to implement more granular ndelay
2009-02-28 21:10 ` Paul Gortmaker
(?)
@ 2009-03-10 19:18 ` VomLehn
2009-03-10 20:12 ` Ralf Baechle
-1 siblings, 1 reply; 10+ messages in thread
From: VomLehn @ 2009-03-10 19:18 UTC (permalink / raw)
To: Paul Gortmaker; +Cc: linux-mips, ralf
On Sat, Feb 28, 2009 at 04:10:46PM -0500, Paul Gortmaker wrote:
> On Fri, Feb 27, 2009 at 6:09 PM, VomLehn <dvomlehn@cisco.com> wrote:
> > The default implementation of ndelay uses udelay, which will result in the
> > rounding up of any requested interval to the next highest number of
> > microseconds. This may be a much longer delay than was desired. However,
> > if the tick rate of the CP0 Count register is known, it is possible to
> > implement an accurate ndelay that works on multiple MIPS processors.
>
> Presumably the only case where this would ever matter is for delays
> that needed to be much less than udelay(1) -- are there a lot of these
> out there? According to git grep, the only user of ndelay in
> arch/mips is lasat. And, if what is there is working now, then one
> could argue that the calls for the short delays are not explicitly
> required to be less than udelay(1).
I'm working on patches that have a requirement for a 100 nsec delay, which
was the motivation for this. I'm trying to mainline code that has been
sitting outside the kernel tree for something like 3-1/2 years, but this
requires hitting a moving target with a moving gun, so it's taking a while...
> > To use this, enable CONFIG_CP0_COUNT_NDELAY and modify the platform startup
> > code to call init_ndelay as early as possible. A good place to call it
> > is probably the prom_init function. The argument to init_ndelay should be
> > the CP0 Count register tick rate, in kHz. The tick rate is typically half
> > the processor clock rate so, if you have a 700 MHz processor, the CP0 Count
> > register would tick at 350 MHz and you would pass 350000 to init_ndelay.
> >
> > At the risk of being obvious, you will need to ensure that ndelay isn't used
> > until after the call to init_ndelay. There are no checks to enforce this as
> > it would increase the latency in ndelay, but, in order to make it obvious
> > that init_ndelay hasn't been called early enough, the initial ndelay
> > parameters are set to cause a really large delay.
>
> I didn't see the arch_initcall for the init_ndelay placed anywhere in
> this patch.
Two reasons for this:
1. I haven't convinced myself that an arch_initcall is early enough; you
might need it earlier. I'm open to feedback about this.
2. Doing this as an arch_initcall requires that init_ndelay call some
currently undefined function to get the CP0 Count tick rate, whereas
it is presently called with that value. Again, I'm open to feedback.
> > +config CP0_COUNT_NDELAY
> > + bool "Use coprocessor 0 Count register for ndelay functionality"
> > + default n
>
> Does there need to be some sort of depends here to cover off any
> limitations where it is known that it won't work?
I don't have the breadth of knowledge required to say what processors have
a CP0 Count register. Any suggestions?
> > +/* Maximum amount of time that will be handled with ndelay, in nanoseconds.
> > + * Values bigger than this will be bounced up to udelay. */
> > +#define _MAX_DIRECT_NDELAY 65535
>
> Why the leading underscore here? Maybe MAX_CP0_NDELAY would be a
> better choice if it has to be changed anyway?
I've spent lots of time doing standards and am following the C convention
of "hiding" things that aren't part of the published interface with an
underscore. The name you suggested has the downside that it implies you
can't call ndelay with a bigger value, which isn't true. This is just the
cut-over to using udelay.
> > +
> > +#define ndelay(n) _ndelay(n)
> > +
> > +extern struct fast_ratio _ndelay_param;
>
>
> ...and here ; not sure why the leading underscore.
The previous comment about hiding things that aren't part of the published
interface applies.
> > +static inline void _ndelay(unsigned long nsecs)
> > +{
> > + int start;
> > +
> > + /* The expected thing would be to do the first read of the Count
> > + * register later, just before entering the delay loop. Reading here
> > + * ensures that very short intervals will exit the first time through
> > + * that loop. */
> > + start = read_c0_count();
>
> Is this really going to all work on mips64? I've spent hours
> debugging silent boot death on mips64 due to bad variable choices used
> for stuff playing with read_c0_count when mips went to generic
> clockevents on r4k, and it wasn't fun.
I don't have a MIPS64 box to play with, but the manuals make it look like
I've got a 32-bit Count register. It is also recommended that, though you
*can* set Count, you don't. I'm assuming this has been followed. If not,
then your suggestion about a dependency in Kconfig should limit this to
32-bit systems.
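A minimal userspace sketch of the wrap-safe comparison the delay loop relies on, assuming Count is a free-running 32-bit counter. The helper name count_before is hypothetical and not part of the patch, and the unsigned-to-signed cast assumes two's-complement conversion, as gcc provides:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return nonzero while the deadline "end" is still ahead of "now".
 * Both values are samples of a free-running 32-bit counter, so the
 * subtraction is done modulo 2^32 and then interpreted as signed.
 * This is only valid for intervals shorter than 2^31 ticks.
 */
static int count_before(uint32_t now, uint32_t end)
{
	return (int32_t)(end - now) > 0;
}
```

With fixed-width types the result does not depend on whether long is 32 or 64 bits, which is the kind of variable-size problem raised above.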
> > +#ifndef _ASM_MACH_POWERTV_FAST_RATIO_H_
> > +#define _ASM_MACH_POWERTV_FAST_RATIO_H_
>
> s/MACH_POWERTV_// is probably what you wanted to do here.
Yes, this is bleed-through from the out-of-tree implementation and should
be changed.
> > +/* Instances of this structure will normally be declared with the attribute
> > + * __read_mostly since it only makes sense to use the fast-ratio code if
> > + * you fill in the structure once for many calls to evaluate the result. */
> > +struct fast_ratio {
> > + unsigned long k;
> > + unsigned int s;
> > + unsigned long r;
> > +};
>
> Use of "int" again tends to make me nervous.
Not to worry. The variable s is a shift count. Since the C standard assures
us that ints can hold values up to 32767, this should work until we hit the
MIPS32768 architecture. :-)
> > +/* These elements are initialized to a value that will cause huge delays to
> > + * arise from use of ndelay before calling init_ndelay. This should make such
> > + * mistakes obvious enough to easily find and correct. */
>
> I think it would be better to have something like:
>
> if (unlikely(not_calibrated))
> WARN_ON_ONCE(...)
If you have to use ndelay instead of udelay, you may very well care about the
extra few nanoseconds this would take.
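To make the effect of the sentinel concrete, here is a small userspace sketch, assuming a 32-bit unsigned long. The names ratio32, ratio32_eval and uncalibrated are made up for illustration; only the k/s/r values come from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct fast_ratio on a 32-bit system (hypothetical name) */
struct ratio32 {
	uint32_t k;
	unsigned int s;
	uint32_t r;
};

/* Same expression as the patch's fast_ratio(): (x * k + r) >> s */
static uint32_t ratio32_eval(uint32_t x, const struct ratio32 *fr)
{
	return (x * fr->k + fr->r) >> fr->s;
}

/* Initial values used before init_ndelay is called: k = 0, s = 0,
 * r = ULONG_MAX / 2, so every request maps to the same tick count. */
static const struct ratio32 uncalibrated = {
	.k = 0,
	.s = 0,
	.r = UINT32_MAX / 2,
};
```

Any request evaluates to 0x7fffffff ticks, which at a 350 MHz Count rate is roughly six seconds per ndelay call, with no extra test in the fast path.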
> > +struct fast_ratio _ndelay_param __read_mostly = {
> > + .k = 0,
> > + .s = 0,
> > + .r = ULONG_MAX / 2,
> > +};
> > +EXPORT_SYMBOL(_ndelay_param);
>
> Not sure why the leading underscore here either.
Same reason as above.
> > +#ifndef assert
> > +#define assert(x) BUG_ON(!(x))
> > +#endif
> > +#endif
>
> I suspect the thing you will be asked to do here is dump the whole
> __KERNEL__ test and assert usage and simply just use BUG_ON right in
> the code.
Agreed, this is left-over cruft from testing and should be removed.
> > +#ifndef BITS_PER_LLONG
> > +#define BITS_PER_LLONG ((BITS_PER_LONG * sizeof(long long)) / sizeof(long))
> > +#endif
>
> Is BITS_PER_LLONG really defined anywhere for anything?
See below.
> > +/* Type for intermediate calculations, along with the number of bits and
> > + * the maximum size. This should be the biggest unsigned type for which
> > + * division and modulus by unsigned long are defined on this
> > + * architecture. */
> > +#ifdef CONFIG_HAVE_ULLONG_DIV_AND_MOD
>
> I've not seen any instances of this CONFIG option either.
>
> > +typedef unsigned long long intermediate_t;
> > +#define BITS_PER_ACC BITS_PER_LLONG
> > +#define ACC_MAX ULLONG_MAX
> > +#else
> > +typedef unsigned long intermediate_t;
> > +#define BITS_PER_ACC BITS_PER_LONG
> > +#define ACC_MAX ULONG_MAX
> > +#endif
>
> This might fall into the loophole of Documentation/CodingStyle --
> chapter 5; i.e:
>
> ...but if there is a clear reason for why it under certain circumstances
> might be an "unsigned int" and under other configurations might be
> "unsigned long", then by all means go ahead and use a typedef.
This is fuzzy. The code was developed and tested in userspace for both
sizes, but it turns out we don't support division and modulus operations
of unsigned long longs by unsigned longs in kernel space, at least at present.
Supporting this would allow additional precision. So, I scratched my head a bit
and left it in to see what comments it might provoke. I'm still undecided as
to what I should do with it...
> > +
> > +/*
>
> Don't these need to start with /** if you want to have them
> automagically parsed?
Good point.
> > +int init_fast_ratio(unsigned int max_x, unsigned long a,
> > + unsigned long b, struct fast_ratio *fr)
> > +{
> > +#define SHIFT_ROUND_UP(_v, _n) (((_n) < 0) ? \
> > + (((unsigned long long) (_v)) << -(_n)) : \
> > + (((_v) + ((1ull << (_n)) - 1)) >> (_n)))
> > +#define ROUNDING_CONST(_s) (((_s) < 0) ? 0 : ((1ull << (_s)) - 1))
>
> You've created a fast_ratio.h ; any reason why these #defines don't
> live there rather than in the middle of this function? And if there
> is any implicit trickery being used that might not be obvious to Joe
> Average, then a comment or two wouldn't go amiss. I know that it is
> over my head to parse on the fly, but that doesn't say much. :-)
The macros aren't part of the interface, so I don't want to put them in
the header file. They are only used in this particular function, so I only have
them defined here.
> > +#undef SHIFT_ROUND_UP
> > +#undef ROUNDING_CONST
>
> This is a .c file, so I don't see the need to undef anything at the end.
Information hiding: they aren't used outside the function, so they are not
defined outside the function. Nested function definitions or scoping rules for
#defines would accomplish the same thing.
> Generally speaking, I think this could be two separate commits -- one
> that implements the basic ndelay uses read_c0_count() concept (if
> really required), and then an add-on commit that does the uber
> optimization of the whole ratio thing? Then if it turns out that one
> hunk gets the green light, and the other doesn't, well then at least
> you get to see that one part of your work goes forward.
It could be, but I don't know of a current user of the fast-ratio work outside
of ndelay, and I like the extra dynamic range and precision it gives you for
ndelay. If there is an objection to the fast-ratio stuff, I could recast
ndelay as stand-alone but my preference is to have both.
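For readers who want to see the idea without the full search in init_fast_ratio, here is a simplified userspace sketch. It is not the patch's algorithm: it assumes a 32-bit unsigned long, uses 64-bit intermediates freely, and simply tries shifts from largest to smallest. The names ratio32, ratio32_init and ratio32_eval are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

struct ratio32 {
	uint32_t k;
	unsigned int s;
	uint32_t r;
};

/*
 * Find k and s with k / 2^s ~= a / b, choosing the largest s for which
 * max_x * k + r still fits in 32 bits, where r = 2^s - 1 rounds up.
 * Returns 0 on success, -1 if no usable shift exists.
 */
static int ratio32_init(uint32_t max_x, uint32_t a, uint32_t b,
			struct ratio32 *fr)
{
	int s;

	if (b == 0 || max_x == 0)
		return -1;

	for (s = 31; s >= 0; s--) {
		/* k = ceil(a * 2^s / b), computed in 64 bits */
		uint64_t k = (((uint64_t)a << s) + b - 1) / b;
		uint64_t r = ((uint64_t)1 << s) - 1;

		if (k <= UINT32_MAX &&
		    (uint64_t)max_x * k + r <= UINT32_MAX) {
			fr->k = (uint32_t)k;
			fr->s = (unsigned int)s;
			fr->r = (uint32_t)r;
			return 0;
		}
	}
	return -1;
}

/* Evaluate x * a / b, rounded up, using only a multiply, add and shift */
static uint32_t ratio32_eval(uint32_t x, const struct ratio32 *fr)
{
	return (x * fr->k + fr->r) >> fr->s;
}
```

For a 350000 kHz Count rate (a = 350000, b = 1000000) and max_x = 65535 this sketch settles on s = 17 and k = 45876, so a 1000 nsec request becomes 351 ticks. The result can be one tick high because both k and the final shift round up, which is consistent with ndelay delaying for at least the requested time.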
I'll tweak this patch and resend a new version.
David VomLehn
* Re: [PATCH][MIPS] Use CP0 Count register to implement more granular ndelay
2009-03-10 19:18 ` VomLehn
@ 2009-03-10 20:12 ` Ralf Baechle
0 siblings, 0 replies; 10+ messages in thread
From: Ralf Baechle @ 2009-03-10 20:12 UTC (permalink / raw)
To: VomLehn; +Cc: Paul Gortmaker, linux-mips
On Tue, Mar 10, 2009 at 12:18:28PM -0700, VomLehn wrote:
> > > +config CP0_COUNT_NDELAY
> > > + bool "Use coprocessor 0 Count register for ndelay functionality"
> > > + default n
> >
> > Does there need to be some sort of depends here to cover off any
> > limitations where it is known that it won't work?
>
> I don't have the breadth of knowledge required to say what processors have
> a CP0 Count register. Any suggestions?
All MIPS III, MIPS IV, MIPS32 and MIPS64 processors have a 32-bit count
register which typically is clocked at half the maximum instruction issue
rate, more rarely at the full rate. A few processors like the RM53230
family can select the increment rate at reset-time to either the full or
half instruction issue rate. Others have the option of totally halting
it in special low-power, low-performance modes. The count rate might also
be affected by clock scaling.
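As a rough sketch of how platform code might derive the init_ndelay argument from these rates (the helper name count_khz and its divisor parameter are hypothetical; the patch itself just takes the value in kHz):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a CPU clock in Hz to the Count register rate in kHz.
 * divisor is 2 for the common half-rate case and 1 for parts that
 * count at the full instruction issue rate.
 */
static uint32_t count_khz(uint32_t cpu_hz, uint32_t divisor)
{
	return cpu_hz / divisor / 1000;
}
```

A 700 MHz part counting at half rate gives count_khz(700000000, 2), i.e. 350000. A platform that scales its clock would presumably need to redo the calculation whenever the rate changes.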
Ralf
end of thread, other threads:[~2009-03-25 17:38 UTC | newest]
Thread overview: 10+ messages
2009-03-12 3:28 [PATCH][MIPS] Use CP0 Count register to implement more granular ndelay VomLehn
2009-03-13 9:29 ` Thomas Bogendoerfer
2009-03-13 11:32 ` Ralf Baechle
2009-03-13 17:35 ` VomLehn
[not found] <web-5716826@test.infohit.si>
2009-03-25 17:37 ` David VomLehn
-- strict thread matches above, loose matches on Subject: below --
2009-02-27 23:09 VomLehn
2009-02-28 21:10 ` Paul Gortmaker
2009-02-28 21:10 ` Paul Gortmaker
2009-03-10 19:18 ` VomLehn
2009-03-10 20:12 ` Ralf Baechle