* [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required @ 2010-06-27 13:44 Shinya Kuribayashi 2010-06-30 17:48 ` David VomLehn 2010-06-30 22:01 ` David VomLehn 0 siblings, 2 replies; 13+ messages in thread From: Shinya Kuribayashi @ 2010-06-27 13:44 UTC (permalink / raw) To: dvomlehn; +Cc: linux-mips fls()/__fls() defined at <asm/bitops.h>, doesn't use CLZ unless it's explicitly requested via <cpu-features-overrides.h>. In other words, as long as depending on cpu_data[0].isa_level, CLZ is nerver used for fls()/__fls(). Looking at leftover clz() in asic_int.c, PowerTV used to use Malta's clz() and irq_ffs() as-is, then for some reason made a decision not to use clz(). It's MIPS32 machine and luckily clz() is left there, then let's go back to the original shape. Signed-off-by: Shinya Kuribayashi <skuribay@pobox.com> --- Compile checked, and now CLZ is back. arch/mips/powertv/asic/asic_int.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/mips/powertv/asic/asic_int.c b/arch/mips/powertv/asic/asic_int.c index 529c44a..e3c08a2 100644 --- a/arch/mips/powertv/asic/asic_int.c +++ b/arch/mips/powertv/asic/asic_int.c @@ -86,7 +86,7 @@ static inline int clz(unsigned long x) */ static inline unsigned int irq_ffs(unsigned int pending) { - return fls(pending) - 1 + CAUSEB_IP; + return -clz(pending) + 31 - CAUSEB_IP; } /* -- 1.7.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required 2010-06-27 13:44 [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required Shinya Kuribayashi @ 2010-06-30 17:48 ` David VomLehn 2010-06-30 22:01 ` David VomLehn 1 sibling, 0 replies; 13+ messages in thread From: David VomLehn @ 2010-06-30 17:48 UTC (permalink / raw) To: Shinya Kuribayashi; +Cc: linux-mips On Sun, Jun 27, 2010 at 08:44:03AM -0500, Shinya Kuribayashi wrote: > fls()/__fls() defined at <asm/bitops.h>, doesn't use CLZ unless it's > explicitly requested via <cpu-features-overrides.h>. In other words, > as long as depending on cpu_data[0].isa_level, CLZ is nerver used for > fls()/__fls(). > > Looking at leftover clz() in asic_int.c, PowerTV used to use Malta's > clz() and irq_ffs() as-is, then for some reason made a decision not to > use clz(). > > It's MIPS32 machine and luckily clz() is left there, then let's go back > to the original shape. > > Signed-off-by: Shinya Kuribayashi <skuribay@pobox.com> > --- > > Compile checked, and now CLZ is back. > > arch/mips/powertv/asic/asic_int.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/arch/mips/powertv/asic/asic_int.c b/arch/mips/powertv/asic/asic_int.c > index 529c44a..e3c08a2 100644 > --- a/arch/mips/powertv/asic/asic_int.c > +++ b/arch/mips/powertv/asic/asic_int.c > @@ -86,7 +86,7 @@ static inline int clz(unsigned long x) > */ > static inline unsigned int irq_ffs(unsigned int pending) > { > - return fls(pending) - 1 + CAUSEB_IP; > + return -clz(pending) + 31 - CAUSEB_IP; > } > > /* Thanks, I'll try this out. -- David VL ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required 2010-06-27 13:44 [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required Shinya Kuribayashi 2010-06-30 17:48 ` David VomLehn @ 2010-06-30 22:01 ` David VomLehn 2010-07-02 14:13 ` Shinya Kuribayashi 1 sibling, 1 reply; 13+ messages in thread From: David VomLehn @ 2010-06-30 22:01 UTC (permalink / raw) To: Shinya Kuribayashi; +Cc: linux-mips On Sun, Jun 27, 2010 at 08:44:03AM -0500, Shinya Kuribayashi wrote: > fls()/__fls() defined at <asm/bitops.h>, doesn't use CLZ unless it's > explicitly requested via <cpu-features-overrides.h>. In other words, > as long as depending on cpu_data[0].isa_level, CLZ is nerver used for > fls()/__fls(). ... > It's MIPS32 machine and luckily clz() is left there, then let's go back > to the original shape. > > Signed-off-by: Shinya Kuribayashi <skuribay@pobox.com> Thanks! You are correct in your analysis and make a good point that clz should be used in interrupt handling. I think, though, that it's better to go ahead and supply a full-blown cpu-features-override.h rather than focusing on this one case. This way fls() will be optimized to use clz everywhere and any other optimizations that depend on constant cpu_has_* values will also be used. The following patch provides cpu-features-override.h and removes the unused clz() function in asic_int.c. Signed-off-by: David VomLehn <dvomlehn@cisco.com> --- .../asm/mach-powertv/cpu-feature-overrides.h | 59 ++++++++++++++++++++ arch/mips/powertv/asic/asic_int.c | 13 ---- 2 files changed, 59 insertions(+), 13 deletions(-) diff --git a/arch/mips/include/asm/mach-powertv/cpu-feature-overrides.h b/arch/mips/include/asm/mach-powertv/cpu-feature-overrides.h new file mode 100644 index 0000000..f751e3e --- /dev/null +++ b/arch/mips/include/asm/mach-powertv/cpu-feature-overrides.h @@ -0,0 +1,59 @@ +/* + * Copyright (C) 2010 Cisco Systems, Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef _ASM_MACH_POWERTV_CPU_FEATURE_OVERRIDES_H_ +#define _ASM_MACH_POWERTV_CPU_FEATURE_OVERRIDES_H_ +#define cpu_has_tlb 1 +#define cpu_has_4kex 1 +#define cpu_has_3k_cache 0 +#define cpu_has_4k_cache 1 +#define cpu_has_tx39_cache 0 +#define cpu_has_fpu 0 +#define cpu_has_counter 1 +#define cpu_has_watch 1 +#define cpu_has_divec 1 +#define cpu_has_vce 0 +#define cpu_has_cache_cdex_p 0 +#define cpu_has_cache_cdex_s 0 +#define cpu_has_mcheck 1 +#define cpu_has_ejtag 1 +#define cpu_has_llsc 1 +#define cpu_has_mips16 0 +#define cpu_has_mdmx 0 +#define cpu_has_mips3d 0 +#define cpu_has_smartmips 0 +#define cpu_has_vtag_icache 0 +#define cpu_has_dc_aliases 0 +#define cpu_has_ic_fills_f_dc 0 +#define cpu_has_mips32r1 0 +#define cpu_has_mips32r2 1 +#define cpu_has_mips64r1 0 +#define cpu_has_mips64r2 0 +#define cpu_has_dsp 0 +#define cpu_has_mipsmt 0 +#define cpu_has_userlocal 0 +#define cpu_has_nofpuex 0 +#define cpu_has_64bits 0 +#define cpu_has_64bit_zero_reg 0 +#define cpu_has_vint 1 +#define cpu_has_veic 1 +#define cpu_has_inclusive_pcaches 0 + +#define cpu_dcache_line_size() 32 +#define cpu_icache_line_size() 32 +#endif diff --git a/arch/mips/powertv/asic/asic_int.c b/arch/mips/powertv/asic/asic_int.c index 529c44a..7362f63 100644 --- a/arch/mips/powertv/asic/asic_int.c +++ b/arch/mips/powertv/asic/asic_int.c @@ -68,19 +68,6 @@ static void asic_irqdispatch(void) do_IRQ(irq); } -static inline int clz(unsigned long x) -{ - __asm__( - " .set push \n" - " .set mips32 \n" - " clz %0, %1 \n" - " .set pop \n" - : "=r" (x) - : "r" (x)); - - return x; -} - /* * Version of ffs that only looks at bits 12..15. */ ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required 2010-06-30 22:01 ` David VomLehn @ 2010-07-02 14:13 ` Shinya Kuribayashi 2010-07-02 21:32 ` David VomLehn 2010-07-11 14:01 ` Shinya Kuribayashi 0 siblings, 2 replies; 13+ messages in thread From: Shinya Kuribayashi @ 2010-07-02 14:13 UTC (permalink / raw) To: David VomLehn; +Cc: linux-mips Hi, On 07/01/2010 07:01 AM, David VomLehn wrote: > Thanks! You are correct in your analysis and make a good point that > clz should be used in interrupt handling. I think, though, that it's > better to go ahead and supply a full-blown cpu-features-override.h > rather than focusing on this one case. This way fls() will be optimized > to use clz everywhere and any other optimizations that depend on constant > cpu_has_* values will also be used. Your choice, either one will be fine :-) By the way, Malta's clz() and irq_ffs() are very nice, and there are two followers; MIPSSim and PowerTV. And now I'm going to make use of them for emma2rh, too. I've prepared a consolidation patch like this, but have two concerns: 1) irq_ffs() is used to dispatch IRQs, so we'd like to give preference to CONFIG_CPU_xxx over cpu_has_clo_clz, to optimize with CLZ. It's somewhat different for usual fls() and ffs() cases. Or, 2) would it be better to check __builtin_constant_p(cpu_has_clo_clz)? Or, any other good alternatives? diff --git a/arch/mips/include/asm/irq_cpu.h b/arch/mips/include/asm/irq_cpu.h index ef6a07c..ad6b416 100644 --- a/arch/mips/include/asm/irq_cpu.h +++ b/arch/mips/include/asm/irq_cpu.h @@ -17,4 +17,47 @@ extern void mips_cpu_irq_init(void); extern void rm7k_cpu_irq_init(void); extern void rm9k_cpu_irq_init(void); +/* + * Version of ffs that only looks at bits 12..15. + */ +static inline unsigned int irq_ffs(unsigned int pending) +{ +#if defined(CONFIG_CPU_MIPS32) || defined(CONFIG_CPU_MIPS64) + int x; + + __asm__( + " .set push \n" + " .set mips32 \n" + " clz %0, %1 \n" + " .set pop \n" + : "=r" (x) + : "r" (pending)); + + return -x + 31 - CAUSEB_IP; +#else + unsigned int a0 = 7; + unsigned int t0; + + t0 = pending & 0xf000; + t0 = t0 < 1; + t0 = t0 << 2; + a0 = a0 - t0; + pending = pending << t0; + + t0 = pending & 0xc000; + t0 = t0 < 1; + t0 = t0 << 1; + a0 = a0 - t0; + pending = pending << t0; + + t0 = pending & 0x8000; + t0 = t0 < 1; + /* t0 = t0 << 2; */ + a0 = a0 - t0; + /* pending = pending << t0; */ + + return a0; +#endif +} + #endif /* _ASM_IRQ_CPU_H */ diff --git a/arch/mips/mipssim/sim_int.c b/arch/mips/mipssim/sim_int.c index 5c779be..5813e7e 100644 --- a/arch/mips/mipssim/sim_int.c +++ b/arch/mips/mipssim/sim_int.c @@ -22,52 +22,6 @@ #include <asm/mips-boards/simint.h> #include <asm/irq_cpu.h> -static inline int clz(unsigned long x) -{ - __asm__( - " .set push \n" - " .set mips32 \n" - " clz %0, %1 \n" - " .set pop \n" - : "=r" (x) - : "r" (x)); - - return x; -} - -/* - * Version of ffs that only looks at bits 12..15. - */ -static inline unsigned int irq_ffs(unsigned int pending) -{ -#if defined(CONFIG_CPU_MIPS32) || defined(CONFIG_CPU_MIPS64) - return -clz(pending) + 31 - CAUSEB_IP; -#else - unsigned int a0 = 7; - unsigned int t0; - - t0 = s0 & 0xf000; - t0 = t0 < 1; - t0 = t0 << 2; - a0 = a0 - t0; - s0 = s0 << t0; - - t0 = s0 & 0xc000; - t0 = t0 < 1; - t0 = t0 << 1; - a0 = a0 - t0; - s0 = s0 << t0; - - t0 = s0 & 0x8000; - t0 = t0 < 1; - /* t0 = t0 << 2; */ - a0 = a0 - t0; - /* s0 = s0 << t0; */ - - return a0; -#endif -} - asmlinkage void plat_irq_dispatch(void) { unsigned int pending = read_c0_cause() & read_c0_status() & ST0_IM; diff --git a/arch/mips/mti-malta/malta-int.c b/arch/mips/mti-malta/malta-int.c index 15949b0..656cecf 100644 --- a/arch/mips/mti-malta/malta-int.c +++ b/arch/mips/mti-malta/malta-int.c @@ -197,52 +197,6 @@ static void corehi_irqdispatch(void) die("CoreHi interrupt", regs); } -static inline int clz(unsigned long x) -{ - __asm__( - " .set push \n" - " .set mips32 \n" - " clz %0, %1 \n" - " .set pop \n" - : "=r" (x) - : "r" (x)); - - return x; -} - -/* - * Version of ffs that only looks at bits 12..15. - */ -static inline unsigned int irq_ffs(unsigned int pending) -{ -#if defined(CONFIG_CPU_MIPS32) || defined(CONFIG_CPU_MIPS64) - return -clz(pending) + 31 - CAUSEB_IP; -#else - unsigned int a0 = 7; - unsigned int t0; - - t0 = pending & 0xf000; - t0 = t0 < 1; - t0 = t0 << 2; - a0 = a0 - t0; - pending = pending << t0; - - t0 = pending & 0xc000; - t0 = t0 < 1; - t0 = t0 << 1; - a0 = a0 - t0; - pending = pending << t0; - - t0 = pending & 0x8000; - t0 = t0 < 1; - /* t0 = t0 << 2; */ - a0 = a0 - t0; - /* pending = pending << t0; */ - - return a0; -#endif -} - /* * IRQs on the Malta board look basically (barring software IRQs which we * don't use at all and all external interrupt sources are combined together diff --git a/arch/mips/powertv/asic/asic_int.c b/arch/mips/powertv/asic/asic_int.c index e3c08a2..06602e7 100644 --- a/arch/mips/powertv/asic/asic_int.c +++ b/arch/mips/powertv/asic/asic_int.c @@ -68,27 +68,6 @@ static void asic_irqdispatch(void) do_IRQ(irq); } -static inline int clz(unsigned long x) -{ - __asm__( - " .set push \n" - " .set mips32 \n" - " clz %0, %1 \n" - " .set pop \n" - : "=r" (x) - : "r" (x)); - - return x; -} - -/* - * Version of ffs that only looks at bits 12..15. - */ -static inline unsigned int irq_ffs(unsigned int pending) -{ - return -clz(pending) + 31 - CAUSEB_IP; -} - /* * TODO: check how it works under EIC mode. */ ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required 2010-07-02 14:13 ` Shinya Kuribayashi @ 2010-07-02 21:32 ` David VomLehn 2010-07-03 14:31 ` Shinya Kuribayashi 2010-07-11 14:01 ` Shinya Kuribayashi 1 sibling, 1 reply; 13+ messages in thread From: David VomLehn @ 2010-07-02 21:32 UTC (permalink / raw) To: Shinya Kuribayashi; +Cc: linux-mips On Fri, Jul 02, 2010 at 09:13:59AM -0500, Shinya Kuribayashi wrote: > Hi, > > On 07/01/2010 07:01 AM, David VomLehn wrote: > > Thanks! You are correct in your analysis and make a good point that > > clz should be used in interrupt handling. I think, though, that it's > > better to go ahead and supply a full-blown cpu-features-override.h > > rather than focusing on this one case. This way fls() will be optimized > > to use clz everywhere and any other optimizations that depend on constant > > cpu_has_* values will also be used. > > Your choice, either one will be fine :-) I think the cpu-features-override is a a better solution because it allows better code throughout the kernel. > By the way, Malta's clz() and irq_ffs() are very nice, and there are > two followers; MIPSSim and PowerTV. And now I'm going to make use of > them for emma2rh, too. > > I've prepared a consolidation patch like this, but have two concerns: > > 1) irq_ffs() is used to dispatch IRQs, so we'd like to give preference > to CONFIG_CPU_xxx over cpu_has_clo_clz, to optimize with CLZ. It's > somewhat different for usual fls() and ffs() cases. Or, > > 2) would it be better to check __builtin_constant_p(cpu_has_clo_clz)? > > Or, any other good alternatives? Usually it's better to control things on a feature-by-feature basis rather than rely on things like CPU model. This allows you to easily handle case where, for example, you have a different CPU that normally doesn't have a feature but a particular variant does have it. IIRC, the MIPS family has examples of this. So, I think it's better to go with the: __builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz used in fls(). -- David VL ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required 2010-07-02 21:32 ` David VomLehn @ 2010-07-03 14:31 ` Shinya Kuribayashi 2010-07-03 17:03 ` Maciej W. Rozycki 2010-07-06 1:22 ` Ralf Baechle 0 siblings, 2 replies; 13+ messages in thread From: Shinya Kuribayashi @ 2010-07-03 14:31 UTC (permalink / raw) To: David VomLehn; +Cc: linux-mips On 07/03/2010 06:32 AM, David VomLehn wrote: > Usually it's better to control things on a feature-by-feature basis rather > than rely on things like CPU model. This allows you to easily handle case > where, for example, you have a different CPU that normally doesn't have > a feature but a particular variant does have it. IIRC, the MIPS family has > examples of this. So, I think it's better to go with the: > __builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz > used in fls(). Ok, now I've come to the same conclusion. Revised patch will be like this. Malta is a development platform supporting various types of MIPS32/MIPS64 cores, hence use cpu_has_clo_clz directly. The same goes to MIPSSim. Another concern is that, I'm not really sure whether cpu_has_clo_clz is acceptable or not for Malta (and MIPSSim). Hopefully Ralf will help us make things in the right direction. Shinya diff --git a/arch/mips/include/asm/irq_cpu.h b/arch/mips/include/asm/irq_cpu.h index ef6a07c..8fa5c2f 100644 --- a/arch/mips/include/asm/irq_cpu.h +++ b/arch/mips/include/asm/irq_cpu.h @@ -13,6 +13,52 @@ #ifndef _ASM_IRQ_CPU_H #define _ASM_IRQ_CPU_H +/** + * irq_ffs - Version of ffs that only looks at bits 12..15 + * @pending: pending interrupt status + * + * @pending is expected to be 32-bit wide, where unwanted bits are + * filtered out using ST0_IM beforehand. + */ +static inline unsigned int irq_ffs(unsigned int pending) +{ + unsigned int a0 = 7; + unsigned int t0; + + if (__builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz) { + int x; + __asm__( + " .set push \n" + " .set mips32 \n" + " clz %0, %1 \n" + " .set pop \n" + : "=r" (x) + : "r" (pending)); + + return -x + 31 - CAUSEB_IP; + } + + t0 = pending & 0xf000; + t0 = t0 < 1; + t0 = t0 << 2; + a0 = a0 - t0; + pending = pending << t0; + + t0 = pending & 0xc000; + t0 = t0 < 1; + t0 = t0 << 1; + a0 = a0 - t0; + pending = pending << t0; + + t0 = pending & 0x8000; + t0 = t0 < 1; + /* t0 = t0 << 2; */ + a0 = a0 - t0; + /* pending = pending << t0; */ + + return a0; +} + extern void mips_cpu_irq_init(void); extern void rm7k_cpu_irq_init(void); extern void rm9k_cpu_irq_init(void); diff --git a/arch/mips/include/asm/mach-malta/cpu-feature-overrides.h b/arch/mips/include/asm/mach-malta/cpu-feature-overrides.h index 2848cea..37e3583 100644 --- a/arch/mips/include/asm/mach-malta/cpu-feature-overrides.h +++ b/arch/mips/include/asm/mach-malta/cpu-feature-overrides.h @@ -32,6 +32,7 @@ /* #define cpu_has_vtag_icache ? */ /* #define cpu_has_dc_aliases ? */ /* #define cpu_has_ic_fills_f_dc ? */ +#define cpu_has_clo_clz 1 #define cpu_has_nofpuex 0 /* #define cpu_has_64bits ? */ /* #define cpu_has_64bit_zero_reg ? */ @@ -58,6 +59,7 @@ /* #define cpu_has_vtag_icache ? */ /* #define cpu_has_dc_aliases ? */ /* #define cpu_has_ic_fills_f_dc ? */ +#define cpu_has_clo_clz 1 #define cpu_has_nofpuex 0 /* #define cpu_has_64bits ? */ /* #define cpu_has_64bit_zero_reg ? */ diff --git a/arch/mips/include/asm/mach-mipssim/cpu-feature-overrides.h b/arch/mips/include/asm/mach-mipssim/cpu-feature-overrides.h index 779b022..27aaaa5 100644 --- a/arch/mips/include/asm/mach-mipssim/cpu-feature-overrides.h +++ b/arch/mips/include/asm/mach-mipssim/cpu-feature-overrides.h @@ -31,6 +31,7 @@ /* #define cpu_has_vtag_icache ? */ /* #define cpu_has_dc_aliases ? */ /* #define cpu_has_ic_fills_f_dc ? */ +#define cpu_has_clo_clz 1 #define cpu_has_nofpuex 0 /* #define cpu_has_64bits ? */ /* #define cpu_has_64bit_zero_reg ? */ @@ -56,6 +57,7 @@ /* #define cpu_has_vtag_icache ? */ /* #define cpu_has_dc_aliases ? */ /* #define cpu_has_ic_fills_f_dc ? */ +#define cpu_has_clo_clz 1 #define cpu_has_nofpuex 0 /* #define cpu_has_64bits ? */ /* #define cpu_has_64bit_zero_reg ? */ diff --git a/arch/mips/mipssim/sim_int.c b/arch/mips/mipssim/sim_int.c index 5c779be..5813e7e 100644 --- a/arch/mips/mipssim/sim_int.c +++ b/arch/mips/mipssim/sim_int.c @@ -22,52 +22,6 @@ #include <asm/mips-boards/simint.h> #include <asm/irq_cpu.h> -static inline int clz(unsigned long x) -{ - __asm__( - " .set push \n" - " .set mips32 \n" - " clz %0, %1 \n" - " .set pop \n" - : "=r" (x) - : "r" (x)); - - return x; -} - -/* - * Version of ffs that only looks at bits 12..15. - */ -static inline unsigned int irq_ffs(unsigned int pending) -{ -#if defined(CONFIG_CPU_MIPS32) || defined(CONFIG_CPU_MIPS64) - return -clz(pending) + 31 - CAUSEB_IP; -#else - unsigned int a0 = 7; - unsigned int t0; - - t0 = s0 & 0xf000; - t0 = t0 < 1; - t0 = t0 << 2; - a0 = a0 - t0; - s0 = s0 << t0; - - t0 = s0 & 0xc000; - t0 = t0 < 1; - t0 = t0 << 1; - a0 = a0 - t0; - s0 = s0 << t0; - - t0 = s0 & 0x8000; - t0 = t0 < 1; - /* t0 = t0 << 2; */ - a0 = a0 - t0; - /* s0 = s0 << t0; */ - - return a0; -#endif -} - asmlinkage void plat_irq_dispatch(void) { unsigned int pending = read_c0_cause() & read_c0_status() & ST0_IM; diff --git a/arch/mips/mti-malta/malta-int.c b/arch/mips/mti-malta/malta-int.c index 15949b0..656cecf 100644 --- a/arch/mips/mti-malta/malta-int.c +++ b/arch/mips/mti-malta/malta-int.c @@ -197,52 +197,6 @@ static void corehi_irqdispatch(void) die("CoreHi interrupt", regs); } -static inline int clz(unsigned long x) -{ - __asm__( - " .set push \n" - " .set mips32 \n" - " clz %0, %1 \n" - " .set pop \n" - : "=r" (x) - : "r" (x)); - - return x; -} - -/* - * Version of ffs that only looks at bits 12..15. - */ -static inline unsigned int irq_ffs(unsigned int pending) -{ -#if defined(CONFIG_CPU_MIPS32) || defined(CONFIG_CPU_MIPS64) - return -clz(pending) + 31 - CAUSEB_IP; -#else - unsigned int a0 = 7; - unsigned int t0; - - t0 = pending & 0xf000; - t0 = t0 < 1; - t0 = t0 << 2; - a0 = a0 - t0; - pending = pending << t0; - - t0 = pending & 0xc000; - t0 = t0 < 1; - t0 = t0 << 1; - a0 = a0 - t0; - pending = pending << t0; - - t0 = pending & 0x8000; - t0 = t0 < 1; - /* t0 = t0 << 2; */ - a0 = a0 - t0; - /* pending = pending << t0; */ - - return a0; -#endif -} - /* * IRQs on the Malta board look basically (barring software IRQs which we * don't use at all and all external interrupt sources are combined together diff --git a/arch/mips/powertv/asic/asic_int.c b/arch/mips/powertv/asic/asic_int.c index 529c44a..06602e7 100644 --- a/arch/mips/powertv/asic/asic_int.c +++ b/arch/mips/powertv/asic/asic_int.c @@ -68,27 +68,6 @@ static void asic_irqdispatch(void) do_IRQ(irq); } -static inline int clz(unsigned long x) -{ - __asm__( - " .set push \n" - " .set mips32 \n" - " clz %0, %1 \n" - " .set pop \n" - : "=r" (x) - : "r" (x)); - - return x; -} - -/* - * Version of ffs that only looks at bits 12..15. - */ -static inline unsigned int irq_ffs(unsigned int pending) -{ - return fls(pending) - 1 + CAUSEB_IP; -} - /* * TODO: check how it works under EIC mode. */ ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required 2010-07-03 14:31 ` Shinya Kuribayashi @ 2010-07-03 17:03 ` Maciej W. Rozycki 2010-07-05 0:33 ` Shinya Kuribayashi 2010-07-06 1:22 ` Ralf Baechle 1 sibling, 1 reply; 13+ messages in thread From: Maciej W. Rozycki @ 2010-07-03 17:03 UTC (permalink / raw) To: Shinya Kuribayashi; +Cc: David VomLehn, linux-mips On Sat, 3 Jul 2010, Shinya Kuribayashi wrote: > Ok, now I've come to the same conclusion. Revised patch will be like > this. Malta is a development platform supporting various types of > MIPS32/MIPS64 cores, hence use cpu_has_clo_clz directly. The same goes > to MIPSSim. Malta also supports a couple of MIPS IV processors too, so please be careful about such assumptions. > + if (__builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz) { > + int x; > + __asm__( > + " .set push \n" > + " .set mips32 \n" > + " clz %0, %1 \n" > + " .set pop \n" > + : "=r" (x) > + : "r" (pending)); > + > + return -x + 31 - CAUSEB_IP; > + } Hmm, ".set mips32" looks dodgy here. For pre-MIPS32/64 platforms this code should never make it to the assembler and if it did, then a build-time error is better than a run-time problem. It might be simpler just to use __builtin_ffs() for this variant though. Inline assembly is better avoided unless absolutely required. Not even mentioning readability. Maciej ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required 2010-07-03 17:03 ` Maciej W. Rozycki @ 2010-07-05 0:33 ` Shinya Kuribayashi 2010-07-05 11:43 ` Ralf Baechle 2010-07-05 13:35 ` Maciej W. Rozycki 0 siblings, 2 replies; 13+ messages in thread From: Shinya Kuribayashi @ 2010-07-05 0:33 UTC (permalink / raw) To: Maciej W. Rozycki; +Cc: Shinya Kuribayashi, David VomLehn, linux-mips Hi, On 7/4/2010 2:03 AM, Maciej W. Rozycki wrote: > Malta also supports a couple of MIPS IV processors too, so please be > careful about such assumptions. Ah, that's the answer I'm looking for, thanks! So current irq_ffs() form (clz() is enabled only when CONFIG_CPU_MIPS32/64 is selected) is well-suited for Malta platform, and it seems better to leave them as they are. I'll drop the patch from my list. >> + if (__builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz) { >> + int x; >> + __asm__( >> + " .set push \n" >> + " .set mips32 \n" >> + " clz %0, %1 \n" >> + " .set pop \n" >> + : "=r" (x) >> + : "r" (pending)); >> + >> + return -x + 31 - CAUSEB_IP; >> + } > > Hmm, ".set mips32" looks dodgy here. For pre-MIPS32/64 platforms this > code should never make it to the assembler and if it did, then a > build-time error is better than a run-time problem. I see, cpu_has_clo_clz doesn't work well for platforms such as Malta. Malta can support several ISAs at a time, which is very valuable, but hard to be optimized :-) > It might be simpler just to use __builtin_ffs() for this variant though. > Inline assembly is better avoided unless absolutely required. Not even > mentioning readability. Hm. It might be simpler, but it's not the purpose of irq_ffs(), IMHO. -- Shinya Kuribayashi Renesas Electronics ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required 2010-07-05 0:33 ` Shinya Kuribayashi @ 2010-07-05 11:43 ` Ralf Baechle 2010-07-05 13:35 ` Maciej W. Rozycki 1 sibling, 0 replies; 13+ messages in thread From: Ralf Baechle @ 2010-07-05 11:43 UTC (permalink / raw) To: Shinya Kuribayashi Cc: Maciej W. Rozycki, Shinya Kuribayashi, David VomLehn, linux-mips On Mon, Jul 05, 2010 at 09:33:36AM +0900, Shinya Kuribayashi wrote: > On 7/4/2010 2:03 AM, Maciej W. Rozycki wrote: > > Malta also supports a couple of MIPS IV processors too, so please be > > careful about such assumptions. > > Ah, that's the answer I'm looking for, thanks! So current irq_ffs() > form (clz() is enabled only when CONFIG_CPU_MIPS32/64 is selected) is > well-suited for Malta platform, and it seems better to leave them as > they are. I'll drop the patch from my list. > > >> + if (__builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz) { > >> + int x; > >> + __asm__( > >> + " .set push \n" > >> + " .set mips32 \n" > >> + " clz %0, %1 \n" > >> + " .set pop \n" > >> + : "=r" (x) > >> + : "r" (pending)); > >> + > >> + return -x + 31 - CAUSEB_IP; > >> + } > > > > Hmm, ".set mips32" looks dodgy here. For pre-MIPS32/64 platforms this > > code should never make it to the assembler and if it did, then a > > build-time error is better than a run-time problem. For pedantic accuracy - the IDT RC32364 introduced CLO and CLZ; in an act of uglyness the RC64574 then inherited these two instructions but did not add. though it was 64-bit not DCLO and DCLZ; the NEC VR5500 has the full complement of CLO, CLZ, DCLO and DCLZ. > I see, cpu_has_clo_clz doesn't work well for platforms such as Malta. > Malta can support several ISAs at a time, which is very valuable, but > hard to be optimized :-) While MIPS IV CPU cards for the Malta are available hardly anybody is using on of those cards. Thus cpu_has_clo_clz defaults to cpu_has_mips_r and ideally and platform should see cpu_has_mips_r to a constant to allow best possible optimization. Malta doesn't ... > > It might be simpler just to use __builtin_ffs() for this variant though. > > Inline assembly is better avoided unless absolutely required. Not even > > mentioning readability. > > Hm. It might be simpler, but it's not the purpose of irq_ffs(), IMHO. Indeed. Ralf ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required 2010-07-05 0:33 ` Shinya Kuribayashi 2010-07-05 11:43 ` Ralf Baechle @ 2010-07-05 13:35 ` Maciej W. Rozycki 2010-07-06 11:17 ` Shinya Kuribayashi 1 sibling, 1 reply; 13+ messages in thread From: Maciej W. Rozycki @ 2010-07-05 13:35 UTC (permalink / raw) To: Shinya Kuribayashi; +Cc: Shinya Kuribayashi, David VomLehn, linux-mips On Mon, 5 Jul 2010, Shinya Kuribayashi wrote: > Ah, that's the answer I'm looking for, thanks! So current irq_ffs() > form (clz() is enabled only when CONFIG_CPU_MIPS32/64 is selected) is > well-suited for Malta platform, and it seems better to leave them as > they are. I'll drop the patch from my list. You still can and probably want to optimise. You can have configuration dependencies in override files. > >> + if (__builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz) { > >> + int x; > >> + __asm__( > >> + " .set push \n" > >> + " .set mips32 \n" > >> + " clz %0, %1 \n" > >> + " .set pop \n" > >> + : "=r" (x) > >> + : "r" (pending)); > >> + > >> + return -x + 31 - CAUSEB_IP; > >> + } > > > > Hmm, ".set mips32" looks dodgy here. For pre-MIPS32/64 platforms this > > code should never make it to the assembler and if it did, then a > > build-time error is better than a run-time problem. > > I see, cpu_has_clo_clz doesn't work well for platforms such as Malta. > Malta can support several ISAs at a time, which is very valuable, but > hard to be optimized :-) Well, <asm/mach-malta/cpu-feature-overrides.h> seems to be getting this right. MIPS IV options are not included as quite rare compared to MIPS32/64 ones and run-time determined defaults apply. For MIPS32/64 configurations compile-time optimisations work as usually. > > It might be simpler just to use __builtin_ffs() for this variant though. > > Inline assembly is better avoided unless absolutely required. Not even > > mentioning readability. > > Hm. It might be simpler, but it's not the purpose of irq_ffs(), IMHO. My point is whenever cpu_has_clo_clz is hardcoded to 1 GCC will expand __builtin_ffs() to CLZ as expected and may potentially be able to optimise it further. For the fallback path I agree you do not want to use __builtin_ffs() as you want to save processing time of the unneeded bits -- there's no 8-bit FFS intrinsic let alone a 4-bit one (I'm assuming there is a reason this piece of code does not check lower 4 interrupt inputs). Maciej ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required 2010-07-05 13:35 ` Maciej W. Rozycki @ 2010-07-06 11:17 ` Shinya Kuribayashi 0 siblings, 0 replies; 13+ messages in thread From: Shinya Kuribayashi @ 2010-07-06 11:17 UTC (permalink / raw) To: Maciej W. Rozycki; +Cc: Shinya Kuribayashi, David VomLehn, linux-mips On 7/5/2010 10:35 PM, Maciej W. Rozycki wrote: > Well, <asm/mach-malta/cpu-feature-overrides.h> seems to be getting this > right. MIPS IV options are not included as quite rare compared to > MIPS32/64 ones and run-time determined defaults apply. For MIPS32/64 > configurations compile-time optimisations work as usually. Good catch, I missed that point. Then CONFIG_CPU_32/64 used at irq_ffs() can be replaced with cpu_has_clo_clz without problems. (eell-designed, anyway) >>> It might be simpler just to use __builtin_ffs() for this variant though. >>> Inline assembly is better avoided unless absolutely required. Not even >>> mentioning readability. >> >> Hm. It might be simpler, but it's not the purpose of irq_ffs(), IMHO. > > My point is whenever cpu_has_clo_clz is hardcoded to 1 GCC will expand > __builtin_ffs() to CLZ as expected and may potentially be able to optimise > it further. For the fallback path I agree you do not want to use > __builtin_ffs() as you want to save processing time of the unneeded bits Got it. I was too lazy, thanks for kind clarification. As for __builtin_ffs(), in addition to Ralf's comments on GCC versions, I was thinkg about two things: 1) Can we really get irq_ffs() optimized using __builtin_ffs/fls()? __builtin_ffs/fls() will emit CLZ if available, that's fine. But we don't want ffs/fls itself here. First, contrary to its name, current irq_ffs() implementation is very similar to __fls(). It's like a super-subset of __fls() only menat for CP0.Status check. And ffs is basically achieved by fls(word & -word). So __builtin_ffs/fls() could never be smaller than current irq_ffs(). irq_ffs() < __fls() < __fls(w & -w) == __ffs() # <asm/bitops.h> Consequently we can't obtain smaller irq_ffs() using __builtin_ffs/fls(). IIUC, current form of irq_ffs() will be useful in the future. 2) How cpu_has_clo_clz should be handled in the kernel This is not limited to cpu_has_clo_clz, but more general issue. Using __builtin_foo features allow us to get a variety of GCC services, optimizations. But also means that, it increases dependency on the GCC, strictly speaking, dependency on processor specifiers (-march=). For instance, when cpu_has_clo_clz is specified, we'd like to make it to the assembler CLZ, even if it's not supported. Or we should not do that? This is already pointed out from Maciej in the previous comment: > ".set mips32" looks dodgy here. For pre-MIPS32/64 platforms this > code should never make it to the assembler and if it did, then a > build-time error is better than a run-time problem. I know there's not always one answer, everything should be handled on a case by case basis. I'm just wondering about such things for a while (no need to reply) > -- there's no 8-bit FFS intrinsic let alone a 4-bit one (I'm assuming > there is a reason this piece of code does not check lower 4 interrupt > inputs). Failed to understand what you say (sorry!), but as far as I examined irq_ffs(), it returns from 7 to 0. Perhaps the following "bits 12..15" could be replaced with "bits 8..15"? Or am I missing something? > /* > * Version of ffs that only looks at bits 12..15. > */ Once things become clear (for me), I'll make a patchset incorporating cpu-feature-overrieds.h against PowerTV (with David's SOB of course). -- Shinya Kuribayashi Renesas Electronics ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required 2010-07-03 14:31 ` Shinya Kuribayashi 2010-07-03 17:03 ` Maciej W. Rozycki @ 2010-07-06 1:22 ` Ralf Baechle 1 sibling, 0 replies; 13+ messages in thread From: Ralf Baechle @ 2010-07-06 1:22 UTC (permalink / raw) To: Shinya Kuribayashi; +Cc: David VomLehn, linux-mips On Sat, Jul 03, 2010 at 11:31:44PM +0900, Shinya Kuribayashi wrote: > On 07/03/2010 06:32 AM, David VomLehn wrote: > > Usually it's better to control things on a feature-by-feature basis rather > > than rely on things like CPU model. This allows you to easily handle case > > where, for example, you have a different CPU that normally doesn't have > > a feature but a particular variant does have it. IIRC, the MIPS family has > > examples of this. So, I think it's better to go with the: > > __builtin_constant_p(cpu_has_clo_clz) && cpu_has_clo_clz > > used in fls(). > > Ok, now I've come to the same conclusion. Revised patch will be like > this. Malta is a development platform supporting various types of > MIPS32/MIPS64 cores, hence use cpu_has_clo_clz directly. The same goes > to MIPSSim. > > Another concern is that, I'm not really sure whether cpu_has_clo_clz is > acceptable or not for Malta (and MIPSSim). Hopefully Ralf will help us > make things in the right direction. My grief with this patch at this moment is: o The suggestion of using __builtin_ffs or similar is nice but these functions appear to have introduced in GCC 3.4 but we unfortunately support GCC >= 3.2. o no Signed-off-by: line. So can you sen me one? Thanks Ralf ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required 2010-07-02 14:13 ` Shinya Kuribayashi 2010-07-02 21:32 ` David VomLehn @ 2010-07-11 14:01 ` Shinya Kuribayashi 1 sibling, 0 replies; 13+ messages in thread From: Shinya Kuribayashi @ 2010-07-11 14:01 UTC (permalink / raw) To: David VomLehn; +Cc: linux-mips Hi David, Before submitting irq_ffs() cleanup patchset, it seems I/we need to sort out PowerTV's IRQ code properly. Please find my comments below. On 07/02/2010 11:13 PM, Shinya Kuribayashi wrote: > On 07/01/2010 07:01 AM, David VomLehn wrote: >> Thanks! You are correct in your analysis and make a good point that >> clz should be used in interrupt handling. I think, though, that it's >> better to go ahead and supply a full-blown cpu-features-override.h >> rather than focusing on this one case. This way fls() will be optimized >> to use clz everywhere and any other optimizations that depend on constant >> cpu_has_* values will also be used. > > Your choice, either one will be fine :-) Double-checking the generated code, current PowerTV IRQ code is slightly different from what I expected: 1) PowerTV doesn't use mips_cpu_irq_init(), so IRQ #0-7 are not allocated for MIPS CPU IRQs, but used for its ASIC interrupts. All irq_desc[0.. 126] (NR_IRQS=127) are meant for ASIC interrupts, a bit surprising ;-) Presumably it intentionally skips primary CP0.Status decoding. Just check CP0.Status, and if it's flagged, then jump into ASIC dispatcher directly. 2) PowerTV's irq_ffs() behaves differently from Malta or MISPSim one. Without CLZ optimization, current PowerTV's irq_ffs() returns: PowerTV ------- status=0x 100 => 9 status=0x 300 => 10 status=0x 700 => 11 status=0x f00 => 12 status=0x1f00 => 13 status=0x3f00 => 14 status=0x7f00 => 15 status=0xff00 => 16 while Malta and MIPSSim would return: Malta, MIPSSim ------- status=0x 100 => 0 status=0x 300 => 1 status=0x 700 => 2 status=0x f00 => 3 status=0x1f00 => 4 status=0x3f00 => 5 status=0x7f00 => 6 status=0xff00 => 7 3) In addition to 2), the most questionable part is (irq == CAUSEF_IP3): > asmlinkage void plat_irq_dispatch(void) > { > unsigned int pending = read_c0_cause() & read_c0_status() & ST0_IM; > int irq; > > irq = irq_ffs(pending); > > if (irq == CAUSEF_IP3) > asic_irqdispatch(); > else if (irq >= 0) > do_IRQ(irq); > else > spurious_interrupt(); > } CAUSEF_IP3 is 0x0800, while irq_ffs() returns (9..16). This implies that asic_irqdispatch() is not used here, and all interrupts are forwarded to 'else-if (irq >= 0) do_IRQ(irq);' path. Remember that all irq_desc[0..126] are meant for ASIC interrupts, and irq_ffs() returns (9..16). How do you handle rest of interrupts? I'm lost here. Taking a closer look, PowerTV has the code registering VI- or EIC- handlers. Asic_irqdispatch() might be directly strapped via VI- or EIC-mode, and plat_irq_dispatch() is not used completely, hmm. --- For example, the patch like this still works for PowerTV? I'd like to make sure whether PowerTV still require irq_ffs() or not, as it prevents irq_ffs() consolidation patch from being submitted. But no need to hurry, I can hold the patch for weeks, for months. Shinya diff --git a/arch/mips/powertv/asic/asic_int.c b/arch/mips/powertv/asic/asic_int.c index 529c44a..2a8fd99 100644 --- a/arch/mips/powertv/asic/asic_int.c +++ b/arch/mips/powertv/asic/asic_int.c @@ -33,7 +33,6 @@ #include <asm/irq_cpu.h> #include <linux/io.h> -#include <asm/irq_regs.h> #include <asm/mips-boards/generic.h> #include <asm/mach-powertv/asic_regs.h> @@ -68,40 +67,15 @@ static void asic_irqdispatch(void) do_IRQ(irq); } -static inline int clz(unsigned long x) -{ - __asm__( - " .set push \n" - " .set mips32 \n" - " clz %0, %1 \n" - " .set pop \n" - : "=r" (x) - : "r" (x)); - - return x; -} - -/* - * Version of ffs that only looks at bits 12..15. - */ -static inline unsigned int irq_ffs(unsigned int pending) -{ - return fls(pending) - 1 + CAUSEB_IP; -} - /* * TODO: check how it works under EIC mode. */ asmlinkage void plat_irq_dispatch(void) { - unsigned int pending = read_c0_cause() & read_c0_status() & ST0_IM; int irq; - irq = irq_ffs(pending); - - if (irq == CAUSEF_IP3) - asic_irqdispatch(); - else if (irq >= 0) + irq = get_int(); + if (irq >= 0) do_IRQ(irq); else spurious_interrupt(); ^ permalink raw reply related [flat|nested] 13+ messages in thread
end of thread, other threads:[~2010-07-11 14:02 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-06-27 13:44 [PATCH] MIPS: PowerTV: Use fls() carefully where static optimization is required Shinya Kuribayashi 2010-06-30 17:48 ` David VomLehn 2010-06-30 22:01 ` David VomLehn 2010-07-02 14:13 ` Shinya Kuribayashi 2010-07-02 21:32 ` David VomLehn 2010-07-03 14:31 ` Shinya Kuribayashi 2010-07-03 17:03 ` Maciej W. Rozycki 2010-07-05 0:33 ` Shinya Kuribayashi 2010-07-05 11:43 ` Ralf Baechle 2010-07-05 13:35 ` Maciej W. Rozycki 2010-07-06 11:17 ` Shinya Kuribayashi 2010-07-06 1:22 ` Ralf Baechle 2010-07-11 14:01 ` Shinya Kuribayashi
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.