* [PATCH] fast path for rdhwr emulation for TLS
From: Atsushi Nemoto @ 2006-07-07 15:00 UTC
To: linux-mips; +Cc: ralf

Add a special short path for emulating RDHWR, which is used to support
TLS.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index b563811..545bcb1 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -357,7 +357,7 @@ #endif
 	BUILD_HANDLER ibe be cli silent			/* #6  */
 	BUILD_HANDLER dbe be cli silent			/* #7  */
 	BUILD_HANDLER bp bp sti silent			/* #9  */
-	BUILD_HANDLER ri ri sti silent			/* #10 */
+	BUILD_HANDLER ri_slow ri sti silent		/* #10 */
 	BUILD_HANDLER cpu cpu sti silent		/* #11 */
 	BUILD_HANDLER ov ov sti silent			/* #12 */
 	BUILD_HANDLER tr tr sti silent			/* #13 */
@@ -369,6 +369,39 @@ #endif
 	BUILD_HANDLER dsp dsp sti silent		/* #26 */
 	BUILD_HANDLER reserved reserved sti verbose	/* others */
 
+	.align	5
+	LEAF(handle_ri)
+	.set	push
+	.set	noat
+	mfc0	k0, CP0_CAUSE
+	MFC0	k1, CP0_EPC
+	bltz	k0, handle_ri_slow	/* if delay slot */
+	lw	k0, (k1)
+	li	k1, 0x7c03e83b	/* rdhwr v1,$29 */
+	bne	k0, k1, handle_ri_slow	/* if not ours */
+	get_saved_sp	/* k1 := current_thread_info */
+	MFC0	k0, CP0_EPC
+	LONG_ADDIU	k0, 4
+	.set	noreorder
+#if defined(CONFIG_CPU_R3000) || defined(CONFIG_CPU_TX39XX)
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	jr	k0
+	rfe
+#else
+	/* I hope three instructions between MTC0 and ERET are enough... */
+	MTC0	k0, CP0_EPC
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	.set	mips3
+	eret
+	.set	mips0
+#endif
+	.set	pop
+	END(handle_ri)
+
 #ifdef CONFIG_64BIT
 /* A temporary overflow handler used by check_daddi(). */

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Maciej W. Rozycki @ 2006-07-07 15:22 UTC
To: Atsushi Nemoto; +Cc: linux-mips, ralf

On Sat, 8 Jul 2006, Atsushi Nemoto wrote:

> Add a special short path for emulating RDHWR, which is used to
> support TLS.

 You need to take care of VIVT I-caches.

> @@ -369,6 +369,39 @@ #endif
>  	BUILD_HANDLER dsp dsp sti silent		/* #26 */
>  	BUILD_HANDLER reserved reserved sti verbose	/* others */
>
> +	.align	5
> +	LEAF(handle_ri)
> +	.set	push
> +	.set	noat
> +	mfc0	k0, CP0_CAUSE
> +	MFC0	k1, CP0_EPC
> +	bltz	k0, handle_ri_slow	/* if delay slot */
> +	lw	k0, (k1)

 For a VIVT I-cache this can result in a TLB exception.  TLB handlers are
not currently prepared for being called at the exception level.

 Also I am fairly sure gas won't fill the branch delay slot above -- a
trivial rearrangement of code would save a cycle here (and this is a fast
path, so we do not want to waste time).

> +	li	k1, 0x7c03e83b	/* rdhwr v1,$29 */
> +	bne	k0, k1, handle_ri_slow	/* if not ours */
> +	get_saved_sp	/* k1 := current_thread_info */
> +	MFC0	k0, CP0_EPC
> +	LONG_ADDIU	k0, 4

 I suggest moving MFC0 ahead of get_saved_sp to avoid a stall.  It would
fit in the branch delay slot nicely.

  Maciej
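
The constant 0x7c03e83b that the fast path compares against is the encoding of "rdhwr v1,$29". As a rough illustration only (the struct and function names below are hypothetical and are not kernel code), the word can be split into the usual MIPS32 instruction fields to show that the comparison matches exactly this one instruction with this one register pair:

```c
#include <stdint.h>

/* Standard MIPS32 register-format fields of a 32-bit instruction word. */
struct mips_fields {
    unsigned opcode;  /* bits 31..26 */
    unsigned rs;      /* bits 25..21 */
    unsigned rt;      /* bits 20..16 */
    unsigned rd;      /* bits 15..11 */
    unsigned sa;      /* bits 10..6  */
    unsigned funct;   /* bits  5..0  */
};

/* Hypothetical helper: split an instruction word into its fields. */
static struct mips_fields decode_fields(uint32_t insn)
{
    struct mips_fields f;

    f.opcode = (insn >> 26) & 0x3f;
    f.rs     = (insn >> 21) & 0x1f;
    f.rt     = (insn >> 16) & 0x1f;
    f.rd     = (insn >> 11) & 0x1f;
    f.sa     = (insn >> 6)  & 0x1f;
    f.funct  = insn & 0x3f;
    return f;
}

/* Hypothetical helper: is this word exactly "rdhwr v1,$29"?
 * opcode 0x1f (SPECIAL3), rt = 3 (v1), rd = 29, funct 0x3b (RDHWR). */
static int is_tls_rdhwr(uint32_t insn)
{
    return insn == 0x7c03e83bu;
}
```

Because the handler compares the whole word, an rdhwr with any other destination register or hardware register number still takes the slow path.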

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Atsushi Nemoto @ 2006-07-07 16:12 UTC
To: macro; +Cc: linux-mips, ralf

On Fri, 7 Jul 2006 16:22:46 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org> wrote:
> > +	.align	5
> > +	LEAF(handle_ri)
> > +	.set	push
> > +	.set	noat
> > +	mfc0	k0, CP0_CAUSE
> > +	MFC0	k1, CP0_EPC
> > +	bltz	k0, handle_ri_slow	/* if delay slot */
> > +	lw	k0, (k1)
>
>  For a VIVT I-cache this can result in a TLB exception.  TLB handlers are
> not currently prepared for being called at the exception level.

Thanks, now I understand the problem.  Are there any good solutions?
The only one I can think of now is using handle_ri_slow for such CPUs.

>  Also I am fairly sure gas won't fill the branch delay slot above -- a
> trivial rearrangement of code would save a cycle here (and this is a fast
> path, so we do not want to waste time).

Well, here is the code as assembled by binutils 2.17.  This version of
gas can put the MFC0 in the delay slot.  But it might be better to use
noreorder myself.

80012a80 <handle_ri>:
80012a80:	401a6800	mfc0	k0,c0_cause
80012a84:	0740fd2e	bltz	k0,80011f40 <handle_ri_slow>
80012a88:	401b7000	mfc0	k1,c0_epc
80012a8c:	8f7a0000	lw	k0,0(k1)
80012a90:	3c1b7c03	lui	k1,0x7c03
80012a94:	377be83b	ori	k1,k1,0xe83b
80012a98:	175bfd29	bne	k0,k1,80011f40 <handle_ri_slow>
80012a9c:	00000000	nop
80012aa0:	3c1b801b	lui	k1,0x801b
80012aa4:	8f7b4008	lw	k1,16392(k1)
80012aa8:	401a7000	mfc0	k0,c0_epc
80012aac:	275a0004	addiu	k0,k0,4
80012ab0:	409a7000	mtc0	k0,c0_epc
80012ab4:	377b1fff	ori	k1,k1,0x1fff
80012ab8:	3b7b1fff	xori	k1,k1,0x1fff
80012abc:	8f63000c	lw	v1,12(k1)
80012ac0:	42000018	eret

> > +	li	k1, 0x7c03e83b	/* rdhwr v1,$29 */
> > +	bne	k0, k1, handle_ri_slow	/* if not ours */
> > +	get_saved_sp	/* k1 := current_thread_info */
> > +	MFC0	k0, CP0_EPC
> > +	LONG_ADDIU	k0, 4
>
>  I suggest moving MFC0 ahead of get_saved_sp to avoid a stall.  It would
> fit in the branch delay slot nicely.

The MFC0 cannot be moved: the SMP version of get_saved_sp uses both k0
and k1.  Of course I could use #ifdef CONFIG_SMP, but such assumptions
make the code a bit fragile.  Another performance vs. maintenance cost
issue...

---
Atsushi Nemoto

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Atsushi Nemoto @ 2006-07-07 16:43 UTC
To: macro; +Cc: linux-mips, ralf

On Sat, 08 Jul 2006 01:12:45 +0900 (JST), Atsushi Nemoto <anemo@mba.ocn.ne.jp> wrote:
> >  For a VIVT I-cache this can result in a TLB exception.  TLB handlers are
> > not currently prepared for being called at the exception level.
>
> Thanks, now I understand the problem.  Are there any good solutions?
> The only one I can think of now is using handle_ri_slow for such CPUs.

Can we use Index_Load_Data_I to load the instruction code from the
icache?  Just an idea...

---
Atsushi Nemoto

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Maciej W. Rozycki @ 2006-07-07 17:04 UTC
To: Atsushi Nemoto; +Cc: linux-mips, ralf

On Sat, 8 Jul 2006, Atsushi Nemoto wrote:

> Can we use Index_Load_Data_I to load the instruction code from icache?

 No need to go through such a hassle when we have a proper architectural
way of handling it.  Remember MIPS TLB-based MMUs (the two variations I
know well, anyway) were designed to support a paged kernel.

  Maciej

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Ralf Baechle @ 2006-07-07 18:22 UTC
To: Atsushi Nemoto; +Cc: macro, linux-mips

On Sat, Jul 08, 2006 at 01:43:39AM +0900, Atsushi Nemoto wrote:

> > >  For a VIVT I-cache this can result in a TLB exception.  TLB handlers are
> > > not currently prepared for being called at the exception level.
> >
> > Thanks, now I understand the problem.  Are there any good solutions?
> > The only one I can think of now is using handle_ri_slow for such CPUs.
>
> Can we use Index_Load_Data_I to load the instruction code from the
> icache?  Just an idea...

In addition to what Maciej said - the format of instructions in the
I-cache is not necessarily the same as in memory.  Many processors store
pre-decoded instructions in the I-cache.

  Ralf

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Maciej W. Rozycki @ 2006-07-07 16:58 UTC
To: Atsushi Nemoto; +Cc: linux-mips, ralf

On Sat, 8 Jul 2006, Atsushi Nemoto wrote:

> >  For a VIVT I-cache this can result in a TLB exception.  TLB handlers are
> > not currently prepared for being called at the exception level.
>
> Thanks, now I understand the problem.  Are there any good solutions?
> The only one I can think of now is using handle_ri_slow for such CPUs.

 I have implemented an appropriate update to the TLB handlers (or actually
it's enough to care for this case for the TLBL exception), but it predates
the current synthesized ones.  There is a small impact resulting from
this change and the synthesized handlers have the advantage of making it
only necessary for these chips that do need such handling.

 There are two possible ways of handling TLB exceptions from the exception
level, both requiring checking cp0.index.p (which we do not do at the
moment under the assumption a TLB refill exception has already been taken
and handled) and if a failure is indicated either:

1. jumping to the TLB refill handler, or:

2. executing "tlbwr" rather than "tlbwi".

Both are good, but I have not benchmarked them -- note that a failure is
expected to be an extremely rare event, so it's the performance for the
probe success that matters.

> >  Also I am fairly sure gas won't fill the branch delay slot above -- a
> > trivial rearrangement of code would save a cycle here (and this is a fast
> > path, so we do not want to waste time).
>
> Well, here is the code as assembled by binutils 2.17.  This version of
> gas can put the MFC0 in the delay slot.  But it might be better to use
> noreorder myself.
>
> 80012a80 <handle_ri>:
> 80012a80:	401a6800	mfc0	k0,c0_cause
> 80012a84:	0740fd2e	bltz	k0,80011f40 <handle_ri_slow>
> 80012a88:	401b7000	mfc0	k1,c0_epc
> 80012a8c:	8f7a0000	lw	k0,0(k1)

 Still bad -- you have a stall on $k1 here.  And on $k0 two instructions
earlier.

> 80012a90:	3c1b7c03	lui	k1,0x7c03
> 80012a94:	377be83b	ori	k1,k1,0xe83b
> 80012a98:	175bfd29	bne	k0,k1,80011f40 <handle_ri_slow>
> 80012a9c:	00000000	nop

 And this "nop" is a waste of time.

> 80012aa0:	3c1b801b	lui	k1,0x801b
> 80012aa4:	8f7b4008	lw	k1,16392(k1)
> 80012aa8:	401a7000	mfc0	k0,c0_epc
> 80012aac:	275a0004	addiu	k0,k0,4
> 80012ab0:	409a7000	mtc0	k0,c0_epc
> 80012ab4:	377b1fff	ori	k1,k1,0x1fff
> 80012ab8:	3b7b1fff	xori	k1,k1,0x1fff
> 80012abc:	8f63000c	lw	v1,12(k1)
> 80012ac0:	42000018	eret

 I'd restructure the code more or less like this, taking care of (almost)
all stalls resulting from interlocks on coprocessor moves and memory loads
and likewise avoiding the need for "nop" fillers there for MIPS I
processors:

	.set	push
	.set	noat
	.set	noreorder
	mfc0	k0, CP0_CAUSE
	MFC0	k1, CP0_EPC
	bltz	k0, handle_ri_slow	/* if delay slot */
	lui	k0, 0x7c03
	lw	k1, (k1)
	ori	k0, 0xe83b	/* k0 := rdhwr v1,$29 */
	bne	k0, k1, handle_ri_slow	/* if not ours */
	get_saved_sp	/* k1 := current_thread_info */
	MFC0	k0, CP0_EPC
#if defined(CONFIG_CPU_R3000) || defined(CONFIG_CPU_TX39XX)
	ori	k1, _THREAD_MASK
	xori	k1, _THREAD_MASK
	LONG_L	v1, TI_FLAGS(k1)
	PTR_ADDIU	k0, 4
	jr	k0
	rfe
#else
	PTR_ADDIU	k0, 4	/* stall on $k0 */
	MTC0	k0, CP0_EPC
	ori	k1, _THREAD_MASK
	xori	k1, _THREAD_MASK
	LONG_L	v1, TI_FLAGS(k1)
	eret
#endif
	.set	pop

I hope I got this right. ;-)

  Maciej
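
The second option described above (use "tlbwr" rather than "tlbwi" when the probe failed) comes down to testing the P bit, bit 31 of the CP0 Index register, after "tlbp". A minimal sketch of that decision in C, with hypothetical names that are not taken from the kernel:

```c
#include <stdint.h>

/* Which TLB write instruction a handler would issue. */
enum tlb_write {
    TLB_WRITE_INDEXED,  /* tlbwi: overwrite the entry the probe found */
    TLB_WRITE_RANDOM    /* tlbwr: no matching entry, pick a random slot */
};

/* Hypothetical helper: after tlbp, bit 31 (P) of CP0 Index is set when
 * no TLB entry matched EntryHi.  A clear P bit means the low bits hold
 * a valid index, including index 0. */
static enum tlb_write choose_tlb_write(uint32_t cp0_index)
{
    return ((cp0_index >> 31) & 1u) ? TLB_WRITE_RANDOM
                                    : TLB_WRITE_INDEXED;
}
```

In assembly the same test is a sign check on the Index value (for example a bltz or bgez), so on the common path where the probe succeeds it should cost only the extra mfc0 and branch.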

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Atsushi Nemoto @ 2006-07-08 16:12 UTC
To: macro; +Cc: linux-mips, ralf

On Fri, 7 Jul 2006 17:58:44 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org> wrote:
> > Thanks, now I understand the problem.  Are there any good solutions?
> > The only one I can think of now is using handle_ri_slow for such CPUs.
>
>  I have implemented an appropriate update to the TLB handlers (or actually
> it's enough to care for this case for the TLBL exception), but it predates
> the current synthesized ones.  There is a small impact resulting from
> this change and the synthesized handlers have the advantage of making it
> only necessary for these chips that do need such handling.

Do you still have the code?  Could you post it for reference?

>  I'd restructure the code more or less like this, taking care of (almost)
> all stalls resulting from interlocks on coprocessor moves and memory loads
> and likewise avoiding the need for "nop" fillers there for MIPS I
> processors:

Thanks.  I'll look into it closely.

> 	bne	k0, k1, handle_ri_slow	/* if not ours */
> 	get_saved_sp	/* k1 := current_thread_info */

Unfortunately, get_saved_sp is not a single instruction...

---
Atsushi Nemoto

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Atsushi Nemoto @ 2006-07-10 14:40 UTC
To: linux-mips; +Cc: ralf, macro

Take 2.  Comments (especially from pipeline wizards) are welcome.

Add a special short path for emulating RDHWR, which is used to support
TLS.  The handle_tlbl synthesizer takes care of the cpu_has_vtag_icache
case.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index 37fda3d..dfceea9 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -375,6 +375,43 @@ #endif
 	BUILD_HANDLER dsp dsp sti silent		/* #26 */
 	BUILD_HANDLER reserved reserved sti verbose	/* others */
 
+	.align	5
+	LEAF(handle_ri_rdhwr)
+	.set	push
+	.set	noat
+	.set	noreorder
+	/* 0x7c03e83b: rdhwr v1,$29 */
+	MFC0	k1, CP0_EPC
+	lui	k0, 0x7c03
+	lw	k1, (k1)
+	ori	k0, 0xe83b
+	.set	reorder
+	bne	k0, k1, handle_ri	/* if not ours */
+	/* The insn is rdhwr.  No need to check CAUSE.BD here. */
+	get_saved_sp	/* k1 := current_thread_info */
+	.set	noreorder
+	MFC0	k0, CP0_EPC
+#if defined(CONFIG_CPU_R3000) || defined(CONFIG_CPU_TX39XX)
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	LONG_ADDIU	k0, 4
+	jr	k0
+	rfe
+#else
+	LONG_ADDIU	k0, 4		/* stall on $k0 */
+	MTC0	k0, CP0_EPC
+	/* I hope three instructions between MTC0 and ERET are enough... */
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	.set	mips3
+	eret
+	.set	mips0
+#endif
+	.set	pop
+	END(handle_ri_rdhwr)
+
 #ifdef CONFIG_64BIT
 /* A temporary overflow handler used by check_daddi().
*/ diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c index 954a198..46eba9f 100644 --- a/arch/mips/kernel/traps.c +++ b/arch/mips/kernel/traps.c @@ -52,6 +52,7 @@ extern asmlinkage void handle_dbe(void); extern asmlinkage void handle_sys(void); extern asmlinkage void handle_bp(void); extern asmlinkage void handle_ri(void); +extern asmlinkage void handle_ri_rdhwr(void); extern asmlinkage void handle_cpu(void); extern asmlinkage void handle_ov(void); extern asmlinkage void handle_tr(void); @@ -1381,6 +1382,15 @@ #endif memcpy((void *)(uncached_ebase + offset), addr, size); } +int __initdata rdhwr_noopt; +static int __init set_rdhwr_noopt(char *str) +{ + rdhwr_noopt = 1; + return 1; +} + +__setup("rdhwr_noopt", set_rdhwr_noopt); + void __init trap_init(void) { extern char except_vec3_generic, except_vec3_r4000; @@ -1460,7 +1470,7 @@ void __init trap_init(void) set_except_vector(8, handle_sys); set_except_vector(9, handle_bp); - set_except_vector(10, handle_ri); + set_except_vector(10, rdhwr_noopt ? handle_ri : handle_ri_rdhwr); set_except_vector(11, handle_cpu); set_except_vector(12, handle_ov); set_except_vector(13, handle_tr); diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c index 375e099..3f53fa7 100644 --- a/arch/mips/mm/tlbex.c +++ b/arch/mips/mm/tlbex.c @@ -817,9 +817,10 @@ static __init void __attribute__((unused * Write random or indexed TLB entry, and care about the hazards from * the preceeding mtc0 and for the following eret. 
*/ -enum tlb_write_entry { tlb_random, tlb_indexed }; +enum tlb_write_entry { tlb_random, tlb_indexed, tlb_arbitrary }; -static __init void build_tlb_write_entry(u32 **p, struct label **l, +static __init void build_tlb_write_entry(u32 **p, unsigned int tmp, + struct label **l, struct reloc **r, enum tlb_write_entry wmode) { @@ -828,6 +829,11 @@ static __init void build_tlb_write_entry switch (wmode) { case tlb_random: tlbw = i_tlbwr; break; case tlb_indexed: tlbw = i_tlbwi; break; + case tlb_arbitrary: + /* tmp contains CP0_INDEX. see build_update_entries(). */ + /* if tmp <= 0, use tlbwr instead of tlbwi */ + tlbw = i_tlbwr; + break; } switch (current_cpu_data.cputype) { @@ -841,6 +847,10 @@ static __init void build_tlb_write_entry * This branch uses up a mtc0 hazard nop slot and saves * two nops after the tlbw instruction. */ + if (wmode == tlb_arbitrary) { + il_bgezl(p, r, tmp, label_tlbw_hazard); + i_tlbwi(p); + } il_bgezl(p, r, 0, label_tlbw_hazard); tlbw(p); l_tlbw_hazard(l, *p); @@ -851,8 +861,13 @@ static __init void build_tlb_write_entry case CPU_R4700: case CPU_R5000: case CPU_R5000A: - i_nop(p); + if (wmode == tlb_arbitrary) { + il_bgezl(p, r, tmp, label_tlbw_hazard); + i_tlbwi(p); + } else + i_nop(p); tlbw(p); + l_tlbw_hazard(l, *p); i_nop(p); break; @@ -865,8 +880,13 @@ static __init void build_tlb_write_entry case CPU_AU1550: case CPU_AU1200: case CPU_PR4450: - i_nop(p); + if (wmode == tlb_arbitrary) { + il_bgezl(p, r, tmp, label_tlbw_hazard); + i_tlbwi(p); + } else + i_nop(p); tlbw(p); + l_tlbw_hazard(l, *p); break; case CPU_R10000: @@ -878,15 +898,24 @@ static __init void build_tlb_write_entry case CPU_4KSC: case CPU_20KC: case CPU_25KF: + if (wmode == tlb_arbitrary) { + il_bgezl(p, r, tmp, label_tlbw_hazard); + i_tlbwi(p); + } tlbw(p); + l_tlbw_hazard(l, *p); break; case CPU_NEVADA: - i_nop(p); /* QED specifies 2 nops hazard */ /* * This branch uses up a mtc0 hazard nop slot and saves * a nop after the tlbw instruction. 
*/ + if (wmode == tlb_arbitrary) { + il_bgezl(p, r, tmp, label_tlbw_hazard); + i_tlbwi(p); + } else + i_nop(p); /* QED specifies 2 nops hazard */ il_bgezl(p, r, 0, label_tlbw_hazard); tlbw(p); l_tlbw_hazard(l, *p); @@ -896,8 +925,13 @@ static __init void build_tlb_write_entry i_nop(p); i_nop(p); i_nop(p); - i_nop(p); + if (wmode == tlb_arbitrary) { + il_bgezl(p, r, tmp, label_tlbw_hazard); + i_tlbwi(p); + } else + i_nop(p); tlbw(p); + l_tlbw_hazard(l, *p); break; case CPU_4KEC: @@ -905,7 +939,12 @@ static __init void build_tlb_write_entry case CPU_34K: case CPU_74K: i_ehb(p); + if (wmode == tlb_arbitrary) { + il_bgezl(p, r, tmp, label_tlbw_hazard); + i_tlbwi(p); + } tlbw(p); + l_tlbw_hazard(l, *p); break; case CPU_RM9000: @@ -918,8 +957,13 @@ static __init void build_tlb_write_entry i_ssnop(p); i_ssnop(p); i_ssnop(p); - i_ssnop(p); + if (wmode == tlb_arbitrary) { + il_bgezl(p, r, tmp, label_tlbw_hazard); + i_tlbwi(p); + } else + i_ssnop(p); tlbw(p); + l_tlbw_hazard(l, *p); i_ssnop(p); i_ssnop(p); i_ssnop(p); @@ -932,8 +976,13 @@ static __init void build_tlb_write_entry case CPU_VR4181: case CPU_VR4181A: i_nop(p); - i_nop(p); + if (wmode == tlb_arbitrary) { + il_bgezl(p, r, tmp, label_tlbw_hazard); + i_tlbwi(p); + } else + i_nop(p); tlbw(p); + l_tlbw_hazard(l, *p); i_nop(p); i_nop(p); break; @@ -942,8 +991,13 @@ static __init void build_tlb_write_entry case CPU_VR4133: case CPU_R5432: i_nop(p); - i_nop(p); + if (wmode == tlb_arbitrary) { + il_bgezl(p, r, tmp, label_tlbw_hazard); + i_tlbwi(p); + } else + i_nop(p); tlbw(p); + l_tlbw_hazard(l, *p); break; default: @@ -1123,7 +1177,7 @@ static __init void build_get_ptep(u32 ** } static __init void build_update_entries(u32 **p, unsigned int tmp, - unsigned int ptep) + unsigned int ptep, int loadindex) { /* * 64bit address support (36bit on a 32bit CPU) in a 32bit @@ -1136,6 +1190,8 @@ #ifdef CONFIG_64BIT_PHYS_ADDR i_dsrl(p, tmp, tmp, 6); /* convert to entrylo0 */ i_mtc0(p, tmp, C0_ENTRYLO0); /* load it */ i_dsrl(p, ptep, 
ptep, 6); /* convert to entrylo1 */ + if (loadindex) + i_mfc0(p, tmp, C0_INDEX); /* used by tlb_arbitrary */ i_mtc0(p, ptep, C0_ENTRYLO1); /* load it */ } else { int pte_off_even = sizeof(pte_t) / 2; @@ -1145,6 +1201,8 @@ #ifdef CONFIG_64BIT_PHYS_ADDR i_lw(p, tmp, pte_off_even, ptep); /* get even pte */ i_mtc0(p, tmp, C0_ENTRYLO0); /* load it */ i_lw(p, ptep, pte_off_odd, ptep); /* get odd pte */ + if (loadindex) + i_mfc0(p, tmp, C0_INDEX); /* used by tlb_arbitrary */ i_mtc0(p, ptep, C0_ENTRYLO1); /* load it */ } #else @@ -1157,8 +1215,8 @@ #else i_mtc0(p, 0, C0_ENTRYLO0); i_mtc0(p, tmp, C0_ENTRYLO0); /* load it */ i_SRL(p, ptep, ptep, 6); /* convert to entrylo1 */ - if (r45k_bvahwbug()) - i_mfc0(p, tmp, C0_INDEX); + if (r45k_bvahwbug() || loadindex) + i_mfc0(p, tmp, C0_INDEX); /* used by tlb_arbitrary */ if (r4k_250MHZhwbug()) i_mtc0(p, 0, C0_ENTRYLO1); i_mtc0(p, ptep, C0_ENTRYLO1); /* load it */ @@ -1198,8 +1256,8 @@ #else #endif build_get_ptep(&p, K0, K1); - build_update_entries(&p, K0, K1); - build_tlb_write_entry(&p, &l, &r, tlb_random); + build_update_entries(&p, K0, K1, 0); + build_tlb_write_entry(&p, K0, &l, &r, tlb_random); l_leave(&l, p); i_eret(&p); /* return from trap */ @@ -1647,12 +1705,13 @@ # endif static void __init build_r4000_tlbchange_handler_tail(u32 **p, struct label **l, struct reloc **r, unsigned int tmp, - unsigned int ptr) + unsigned int ptr, + enum tlb_write_entry wmode) { i_ori(p, ptr, ptr, sizeof(pte_t)); i_xori(p, ptr, ptr, sizeof(pte_t)); - build_update_entries(p, tmp, ptr); - build_tlb_write_entry(p, l, r, tlb_indexed); + build_update_entries(p, tmp, ptr, wmode == tlb_arbitrary); + build_tlb_write_entry(p, tmp, l, r, wmode); l_leave(l, *p); i_eret(p); /* return from trap */ @@ -1667,6 +1726,9 @@ static void __init build_r4000_tlb_load_ struct label *l = labels; struct reloc *r = relocs; int i; + extern int rdhwr_noopt; + enum tlb_write_entry wmode = (!rdhwr_noopt && cpu_has_vtag_icache) ? 
+ tlb_arbitrary : tlb_indexed; memset(handle_tlbl, 0, sizeof(handle_tlbl)); memset(labels, 0, sizeof(labels)); @@ -1684,7 +1746,7 @@ static void __init build_r4000_tlb_load_ build_r4000_tlbchange_handler_head(&p, &l, &r, K0, K1); build_pte_present(&p, &l, &r, K0, K1, label_nopage_tlbl); build_make_valid(&p, &r, K0, K1); - build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1); + build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1, wmode); l_nopage_tlbl(&l, p); i_j(&p, (unsigned long)tlb_do_page_fault_0 & 0x0fffffff); @@ -1718,7 +1780,7 @@ static void __init build_r4000_tlb_store build_r4000_tlbchange_handler_head(&p, &l, &r, K0, K1); build_pte_writable(&p, &l, &r, K0, K1, label_nopage_tlbs); build_make_write(&p, &r, K0, K1); - build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1); + build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1, tlb_indexed); l_nopage_tlbs(&l, p); i_j(&p, (unsigned long)tlb_do_page_fault_1 & 0x0fffffff); @@ -1753,7 +1815,7 @@ static void __init build_r4000_tlb_modif build_pte_modifiable(&p, &l, &r, K0, K1, label_nopage_tlbm); /* Present and writable bits set, set accessed and dirty bits. */ build_make_write(&p, &r, K0, K1); - build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1); + build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1, tlb_indexed); l_nopage_tlbm(&l, p); i_j(&p, (unsigned long)tlb_do_page_fault_1 & 0x0fffffff); ^ permalink raw reply related [flat|nested] 26+ messages in thread

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Ralf Baechle @ 2006-09-14 17:28 UTC
To: Atsushi Nemoto; +Cc: linux-mips, macro

On Mon, Jul 10, 2006 at 11:40:10PM +0900, Atsushi Nemoto wrote:

> Add a special short path for emulating RDHWR, which is used to support
> TLS.  The handle_tlbl synthesizer takes care of the cpu_has_vtag_icache
> case.

I'm just wondering if we actually need such optimizations.  Have you run
any application benchmarks?

  Ralf

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Atsushi Nemoto @ 2006-09-15 3:09 UTC
To: ralf; +Cc: linux-mips, macro

On Thu, 14 Sep 2006 18:28:05 +0100, Ralf Baechle <ralf@linux-mips.org> wrote:
> > Add a special short path for emulating RDHWR, which is used to support
> > TLS.  The handle_tlbl synthesizer takes care of the cpu_has_vtag_icache
> > case.
>
> I'm just wondering if we actually need such optimizations.  Have you run
> any application benchmarks?

I've measured the time of an NPTL pthread_mutex_lock/pthread_mutex_unlock
loop.

	pthread_mutex_init(&m, NULL);
	gettimeofday(&start, NULL);
	for (i = 0; i < 1000000; i++) {
		pthread_mutex_lock(&m);
		pthread_mutex_unlock(&m);
	}
	gettimeofday(&end, NULL);

Without optimization:	0.826407 sec / 1000000 loops
With optimization:	0.415667 sec / 1000000 loops

It would be worth doing.

---
Atsushi Nemoto
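
For anyone wanting to repeat the measurement, the fragment above can be wrapped into a small self-contained function. This is only a sketch: the function name and the choice to return elapsed seconds as a double are assumptions, and the numbers it produces depend entirely on the machine and libc in use.

```c
#include <stddef.h>
#include <pthread.h>
#include <sys/time.h>

/* Hypothetical helper: time `iterations` uncontended lock/unlock pairs
 * on a default mutex and return the elapsed wall-clock time in seconds. */
static double time_mutex_loop(long iterations)
{
    pthread_mutex_t m;
    struct timeval start, end;
    long i;

    pthread_mutex_init(&m, NULL);
    gettimeofday(&start, NULL);
    for (i = 0; i < iterations; i++) {
        pthread_mutex_lock(&m);
        pthread_mutex_unlock(&m);
    }
    gettimeofday(&end, NULL);
    pthread_mutex_destroy(&m);

    /* Combine the seconds and microseconds parts of the difference. */
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}
```

Calling time_mutex_loop(1000000) on kernels with and without the fast path would correspond to the two figures reported above, which differ by roughly a factor of two (0.826407 / 0.415667 is about 1.99).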

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Atsushi Nemoto @ 2006-07-10 14:55 UTC
To: macro; +Cc: linux-mips, ralf

On Fri, 7 Jul 2006 17:58:44 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org> wrote:
> 	mfc0	k0, CP0_CAUSE
> 	MFC0	k1, CP0_EPC
> 	bltz	k0, handle_ri_slow	/* if delay slot */
> 	lui	k0, 0x7c03

I noticed that checking CP0_CAUSE.BD is unnecessary, since we are
checking the instruction code anyway and "rdhwr" does not have a delay
slot.  I removed the check in the "take 2" patch I just sent.

---
Atsushi Nemoto

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Daniel Jacobowitz @ 2006-07-11 2:53 UTC
To: Atsushi Nemoto; +Cc: macro, linux-mips, ralf

On Mon, Jul 10, 2006 at 11:55:53PM +0900, Atsushi Nemoto wrote:
> On Fri, 7 Jul 2006 17:58:44 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org> wrote:
> > 	mfc0	k0, CP0_CAUSE
> > 	MFC0	k1, CP0_EPC
> > 	bltz	k0, handle_ri_slow	/* if delay slot */
> > 	lui	k0, 0x7c03
>
> I noticed that checking CP0_CAUSE.BD is unnecessary, since we are
> checking the instruction code anyway and "rdhwr" does not have a delay
> slot.  I removed the check in the "take 2" patch I just sent.

Isn't BD "this instruction is in a delay slot", not "this instruction
has a delay slot"?  It affects where we go when we return.

BTW, if the fast emulation can't handle rdhwr in a delay slot, please
report a bug on GCC asking it not to put rdhwr in delay slots by
default.  It's probably worthwhile.

-- 
Daniel Jacobowitz
CodeSourcery

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Atsushi Nemoto @ 2006-07-11 3:20 UTC
To: dan; +Cc: macro, linux-mips, ralf

On Mon, 10 Jul 2006 22:53:42 -0400, Daniel Jacobowitz <dan@debian.org> wrote:
> > I noticed that checking CP0_CAUSE.BD is unnecessary, since we are
> > checking the instruction code anyway and "rdhwr" does not have a delay
> > slot.  I removed the check in the "take 2" patch I just sent.
>
> Isn't BD "this instruction is in a delay slot", not "this instruction
> has a delay slot"?  It affects where we go when we return.

Well, BD means "the exception occurred in the delay slot of this
instruction (the one EPC points to)".  If rdhwr was in a delay slot, EPC
points to the preceding jump/branch instruction.  This fast path reads
the instruction at EPC (regardless of BD), so in that case it will not
be "rdhwr" and the code falls back to the slow path.

> BTW, if the fast emulation can't handle rdhwr in a delay slot, please
> report a bug on GCC asking it not to put rdhwr in delay slots by
> default.  It's probably worthwhile.

If rdhwr is in a delay slot, the slow emulation becomes even slower.
So I think rdhwr should not be put in a delay slot anyway, regardless of
the fast emulation.

I asked on GCC bugzilla a few days ago but have not got any feedback yet.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126

---
Atsushi Nemoto
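
The relationship between Cause.BD and EPC discussed here can be written down as a small C sketch. The macro and function names are hypothetical (the kernel has its own definitions); only the bit position of BD, bit 31 of the Cause register, is taken as given.

```c
#include <stdint.h>

/* Cause.BD (bit 31): the exception was taken in a branch delay slot. */
#define CAUSE_BD (1u << 31)

/* Hypothetical helper: address of the instruction that actually raised
 * the exception.  With BD set, EPC holds the address of the branch and
 * the faulting instruction is the one in its delay slot, 4 bytes on. */
static uint32_t faulting_pc(uint32_t cause, uint32_t epc)
{
    return (cause & CAUSE_BD) ? epc + 4 : epc;
}

/* Hypothetical helper mirroring the fast-path test: it looks only at
 * the word stored at EPC.  When BD is set that word is a branch or
 * jump, so it can never equal the rdhwr encoding and the slow path
 * is taken without reading Cause at all. */
static int fast_path_applies(uint32_t insn_at_epc)
{
    return insn_at_epc == 0x7c03e83bu;  /* rdhwr v1,$29 */
}
```

This is also why "bltz k0, handle_ri_slow" worked as the BD test in the first version of the patch: BD is the sign bit of the Cause value.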

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Nigel Stephens @ 2006-09-08 17:39 UTC
To: Atsushi Nemoto, ralf; +Cc: dan, macro, linux-mips

Atsushi Nemoto wrote:
>
>> BTW, if the fast emulation can't handle rdhwr in a delay slot, please
>> report a bug on GCC asking it not to put rdhwr in delay slots by
>> default.  It's probably worthwhile.
>>
>
> If rdhwr is in a delay slot, the slow emulation becomes even slower.
> So I think rdhwr should not be put in a delay slot anyway, regardless of
> the fast emulation.
>
> I asked on GCC bugzilla a few days ago but have not got any feedback yet.
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126
>

In spite of the GCC issue, is this patch now at the point where it could
be applied, or at least queued?

Nigel

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Atsushi Nemoto @ 2006-09-09 13:56 UTC
To: nigel; +Cc: ralf, dan, macro, linux-mips

On Fri, 08 Sep 2006 18:39:08 +0100, Nigel Stephens <nigel@mips.com> wrote:
> > I asked on GCC bugzilla a few days ago but have not got any feedback yet.
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126
> >
>
> In spite of the GCC issue, is this patch now at the point where it could
> be applied, or at least queued?

GCC 4.2 no longer puts RDHWR in a delay slot.  Also, there is a
"hackish fix" (from Richard Sandiford) to prevent gcc from moving a
RDHWR outside of a conditional.

On the kernel side, my patch can still be applied to the current git
tree as is.

But I'm still looking for a better solution (silver bullet?) for the
cpu_has_vtag_icache case.

How about something like this (without touching tlbex.c)?

	LEAF(handle_ri_rdhwr_vivt)
	.set	push
	.set	noat
	.set	noreorder
	/* check if TLB contains a entry for EPC */
	MFC0	k1, CP0_ENTRYHI
	andi	k1, ASID_MASK
	MFC0	k0, CP0_EPC
	andi	k0, PAGE_MASK << 1
	or	k1, k0
	MTC0	k1, CP0_ENTRYHI
	tlbp
	mfc0	k1, CP0_INDEX
	bltz	k1, handle_ri	/* slow path */
	nop
	/* fall thru */
	LEAF(handle_ri_rdhwr)

I'm wondering if this could work in the CONFIG_MIPS_MT_SMTC case...

---
Atsushi Nemoto
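
The prologue above builds an EntryHi value for "tlbp" out of the current ASID and the page pair containing EPC. As a hedged sketch in C of the same computation (32-bit only; the function name is hypothetical, PAGE_SHIFT is assumed to be 12 for 4 KiB pages, and the 8-bit ASID mask is an assumption about the CPU):

```c
#include <stdint.h>

#define PAGE_SHIFT 12      /* assumption: 4 KiB pages */
#define ASID_MASK  0xffu   /* assumption: 8-bit ASID in EntryHi */

/* Hypothetical helper: EntryHi value to load before tlbp in order to
 * ask "is the page holding EPC mapped for the current address space?".
 * Each TLB entry maps an even/odd pair of pages, hence PAGE_SHIFT + 1:
 * the low bits of the address within the pair are cleared. */
static uint32_t probe_entryhi(uint32_t old_entryhi, uint32_t epc)
{
    uint32_t asid = old_entryhi & ASID_MASK;
    uint32_t vpn2 = (epc >> (PAGE_SHIFT + 1)) << (PAGE_SHIFT + 1);

    return vpn2 | asid;
}
```

Clearing the low bits with a shift pair rather than a mask keeps the operation within what a single-instruction immediate can express, since andi takes only a 16-bit zero-extended immediate; whether that matters for the assembly above depends on how the assembler expands the macro.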

* Re: [PATCH] fast path for rdhwr emulation for TLS
From: Nigel Stephens @ 2006-09-10 22:30 UTC
To: Atsushi Nemoto; +Cc: ralf, dan, macro, linux-mips

Atsushi Nemoto wrote:
> On Fri, 08 Sep 2006 18:39:08 +0100, Nigel Stephens <nigel@mips.com> wrote:
>
>>> I asked on GCC bugzilla a few days ago but have not got any feedback yet.
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126
>>>
>>>
>> In spite of the GCC issue, is this patch now at the point where it could
>> be applied, or at least queued?
>>
>
> GCC 4.2 no longer puts RDHWR in a delay slot.  Also, there is a
> "hackish fix" (from Richard Sandiford) to prevent gcc from moving a
> RDHWR outside of a conditional.
>
> On the kernel side, my patch can still be applied to the current git
> tree as is.
>
> But I'm still looking for a better solution (silver bullet?) for the
> cpu_has_vtag_icache case.
>
> How about something like this (without touching tlbex.c)?
>
> 	LEAF(handle_ri_rdhwr_vivt)
> 	.set	push
> 	.set	noat
> 	.set	noreorder
> 	/* check if TLB contains a entry for EPC */
> 	MFC0	k1, CP0_ENTRYHI
> 	andi	k1, ASID_MASK
> 	MFC0	k0, CP0_EPC
> 	andi	k0, PAGE_MASK << 1
> 	or	k1, k0
> 	MTC0	k1, CP0_ENTRYHI
> 	tlbp
> 	mfc0	k1, CP0_INDEX
> 	bltz	k1, handle_ri	/* slow path */
> 	nop
> 	/* fall thru */
> 	LEAF(handle_ri_rdhwr)
>
> I'm wondering if this could work in the CONFIG_MIPS_MT_SMTC case...
>
>

No, that wouldn't be reliable for CONFIG_MIPS_MT_SMTC, but then again
the only CPU which currently runs SMTC has VIPT caches.

Nigel
* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-10 22:30 ` Nigel Stephens
@ 2006-09-11 5:04 ` Atsushi Nemoto
  2006-09-11 8:50 ` Atsushi Nemoto
  0 siblings, 1 reply; 26+ messages in thread
From: Atsushi Nemoto @ 2006-09-11 5:04 UTC
To: nigel; +Cc: ralf, dan, macro, linux-mips

On Sun, 10 Sep 2006 23:30:18 +0100, Nigel Stephens <nigel@mips.com> wrote:
> > LEAF(handle_ri_rdhwr_vivt)
...
> >
> > I'm wondering if this could work in the CONFIG_MIPS_MT_SMTC case...
>
> No, that wouldn't be reliable for CONFIG_MIPS_MT_SMTC, but then again
> the only CPU which currently runs SMTC has VIPT caches

Then would this be better than the "take 2" patch?  This adds some
overhead to the fast RDHWR emulation path but no overhead to the TLB
refill path.

The tlb_probe_hazard macro does not exist in the main branch for now but
already exists in the queue branch.


Take 3.  Comments (especially from pipeline wizards) are welcome.

Add a special short path for emulating RDHWR, which is used to support
TLS.  Add an extra prologue for the cpu_has_vtag_icache case.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index 37fda3d..55e090e 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -19,6 +19,7 @@ #include <asm/fpregdef.h>
 #include <asm/mipsregs.h>
 #include <asm/stackframe.h>
 #include <asm/war.h>
+#include <asm/page.h>
 
 #define PANIC_PIC(msg)					\
 		.set push;				\
@@ -375,6 +376,72 @@ #endif
 	BUILD_HANDLER dsp dsp sti silent		/* #26 */
 	BUILD_HANDLER reserved reserved sti verbose	/* others */
 
+	.align	5
+	LEAF(handle_ri_rdhwr_vivt)
+#ifdef CONFIG_MIPS_MT_SMTC
+	PANIC_PIC("handle_ri_rdhwr_vivt called")
+#else
+	.set	push
+	.set	noat
+	.set	noreorder
+	/* check if TLB contains a entry for EPC */
+	MFC0	k1, CP0_ENTRYHI
+	andi	k1, 0xff	/* ASID_MASK */
+	MFC0	k0, CP0_EPC
+	PTR_SRL	k0, PAGE_SHIFT + 1
+	PTR_SLL	k0, PAGE_SHIFT + 1
+	or	k1, k0
+	MTC0	k1, CP0_ENTRYHI
+	mtc0_tlbw_hazard
+	tlbp
+#ifdef CONFIG_CPU_MIPSR2
+	_ehb			/* tlb_probe_hazard */
+#else
+	nop; nop; nop; nop; nop; nop	/* tlb_probe_hazard */
+#endif
+	mfc0	k1, CP0_INDEX
+	.set	pop
+	bltz	k1, handle_ri	/* slow path */
+	/* fall thru */
+#endif
+	END(handle_ri_rdhwr_vivt)
+
+	LEAF(handle_ri_rdhwr)
+	.set	push
+	.set	noat
+	.set	noreorder
+	/* 0x7c03e83b: rdhwr v1,$29 */
+	MFC0	k1, CP0_EPC
+	lui	k0, 0x7c03
+	lw	k1, (k1)
+	ori	k0, 0xe83b
+	.set	reorder
+	bne	k0, k1, handle_ri	/* if not ours */
+	/* The insn is rdhwr.  No need to check CAUSE.BD here. */
+	get_saved_sp	/* k1 := current_thread_info */
+	.set	noreorder
+	MFC0	k0, CP0_EPC
+#if defined(CONFIG_CPU_R3000) || defined(CONFIG_CPU_TX39XX)
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	LONG_ADDIU	k0, 4
+	jr	k0
+	 rfe
+#else
+	LONG_ADDIU	k0, 4		/* stall on $k0 */
+	MTC0	k0, CP0_EPC
+	/* I hope three instructions between MTC0 and ERET are enough... */
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	.set	mips3
+	eret
+	.set	mips0
+#endif
+	.set	pop
+	END(handle_ri_rdhwr)
+
 #ifdef CONFIG_64BIT
 /* A temporary overflow handler used by check_daddi(). */
 
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e51d8fd..7ae454a 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -53,6 +53,8 @@ extern asmlinkage void handle_dbe(void);
 extern asmlinkage void handle_sys(void);
 extern asmlinkage void handle_bp(void);
 extern asmlinkage void handle_ri(void);
+extern asmlinkage void handle_ri_rdhwr_vivt(void);
+extern asmlinkage void handle_ri_rdhwr(void);
 extern asmlinkage void handle_cpu(void);
 extern asmlinkage void handle_ov(void);
 extern asmlinkage void handle_tr(void);
@@ -1453,6 +1455,15 @@ #endif
 	memcpy((void *)(uncached_ebase + offset), addr, size);
 }
 
+int __initdata rdhwr_noopt;
+static int __init set_rdhwr_noopt(char *str)
+{
+	rdhwr_noopt = 1;
+	return 1;
+}
+
+__setup("rdhwr_noopt", set_rdhwr_noopt);
+
 void __init trap_init(void)
 {
 	extern char except_vec3_generic, except_vec3_r4000;
@@ -1532,7 +1543,9 @@ void __init trap_init(void)
 
 	set_except_vector(8, handle_sys);
 	set_except_vector(9, handle_bp);
-	set_except_vector(10, handle_ri);
+	set_except_vector(10, rdhwr_noopt ? handle_ri :
+			  (cpu_has_vtag_icache ?
+			   handle_ri_rdhwr_vivt : handle_ri_rdhwr));
 	set_except_vector(11, handle_cpu);
 	set_except_vector(12, handle_ov);
 	set_except_vector(13, handle_tr);
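The constant 0x7c03e83b that the fast path compares against can be derived from the RDHWR instruction format (SPECIAL3 major opcode, rt and rd register fields, function code 0x3b). The following is a minimal sketch in C; encode_rdhwr() and is_tls_rdhwr() are hypothetical names used only for this illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define SPECIAL3_OPCODE 0x1fu   /* major opcode, bits 31..26 */
#define RDHWR_FUNCT     0x3bu   /* function field, bits 5..0 */

/* Hypothetical helper: encode "rdhwr rt, $rd" as a 32-bit instruction word. */
static inline uint32_t encode_rdhwr(unsigned rt, unsigned rd)
{
    assert(rt < 32 && rd < 32);
    return (SPECIAL3_OPCODE << 26) |
           ((uint32_t)rt << 16) |
           ((uint32_t)rd << 11) |
           RDHWR_FUNCT;
}

/*
 * Hypothetical helper: the only instruction the fast path accepts is
 * "rdhwr v1, $29", where v1 is general register 3.
 */
static inline int is_tls_rdhwr(uint32_t insn)
{
    return insn == encode_rdhwr(3, 29);
}
```

Because the handler compares the whole instruction word, an RDHWR with any other destination register or hardware register number falls through to the regular handle_ri path.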
* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-11 5:04 ` Atsushi Nemoto
@ 2006-09-11 8:50 ` Atsushi Nemoto
  2006-09-11 9:49 ` Thiemo Seufer
  0 siblings, 1 reply; 26+ messages in thread
From: Atsushi Nemoto @ 2006-09-11 8:50 UTC
To: nigel; +Cc: ralf, dan, macro, linux-mips

On Mon, 11 Sep 2006 14:04:03 +0900 (JST), Atsushi Nemoto <anemo@mba.ocn.ne.jp> wrote:
> Then would this be better than the "take 2" patch?  This adds some
> overhead to the fast RDHWR emulation path but no overhead to the TLB
> refill path.
>
> The tlb_probe_hazard macro does not exist in the main branch for now but
> already exists in the queue branch.
>
>
> Take 3.  Comments (especially from pipeline wizards) are welcome.

Oops, "rdhwr_noopt" should be static in this take.  Revised.


Take 3 (revised).

Add a special short path for emulating RDHWR, which is used to support
TLS.  Add an extra prologue for the cpu_has_vtag_icache case.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index 37fda3d..55e090e 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -19,6 +19,7 @@ #include <asm/fpregdef.h>
 #include <asm/mipsregs.h>
 #include <asm/stackframe.h>
 #include <asm/war.h>
+#include <asm/page.h>
 
 #define PANIC_PIC(msg)					\
 		.set push;				\
@@ -375,6 +376,72 @@ #endif
 	BUILD_HANDLER dsp dsp sti silent		/* #26 */
 	BUILD_HANDLER reserved reserved sti verbose	/* others */
 
+	.align	5
+	LEAF(handle_ri_rdhwr_vivt)
+#ifdef CONFIG_MIPS_MT_SMTC
+	PANIC_PIC("handle_ri_rdhwr_vivt called")
+#else
+	.set	push
+	.set	noat
+	.set	noreorder
+	/* check if TLB contains a entry for EPC */
+	MFC0	k1, CP0_ENTRYHI
+	andi	k1, 0xff	/* ASID_MASK */
+	MFC0	k0, CP0_EPC
+	PTR_SRL	k0, PAGE_SHIFT + 1
+	PTR_SLL	k0, PAGE_SHIFT + 1
+	or	k1, k0
+	MTC0	k1, CP0_ENTRYHI
+	mtc0_tlbw_hazard
+	tlbp
+#ifdef CONFIG_CPU_MIPSR2
+	_ehb			/* tlb_probe_hazard */
+#else
+	nop; nop; nop; nop; nop; nop	/* tlb_probe_hazard */
+#endif
+	mfc0	k1, CP0_INDEX
+	.set	pop
+	bltz	k1, handle_ri	/* slow path */
+	/* fall thru */
+#endif
+	END(handle_ri_rdhwr_vivt)
+
+	LEAF(handle_ri_rdhwr)
+	.set	push
+	.set	noat
+	.set	noreorder
+	/* 0x7c03e83b: rdhwr v1,$29 */
+	MFC0	k1, CP0_EPC
+	lui	k0, 0x7c03
+	lw	k1, (k1)
+	ori	k0, 0xe83b
+	.set	reorder
+	bne	k0, k1, handle_ri	/* if not ours */
+	/* The insn is rdhwr.  No need to check CAUSE.BD here. */
+	get_saved_sp	/* k1 := current_thread_info */
+	.set	noreorder
+	MFC0	k0, CP0_EPC
+#if defined(CONFIG_CPU_R3000) || defined(CONFIG_CPU_TX39XX)
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	LONG_ADDIU	k0, 4
+	jr	k0
+	 rfe
+#else
+	LONG_ADDIU	k0, 4		/* stall on $k0 */
+	MTC0	k0, CP0_EPC
+	/* I hope three instructions between MTC0 and ERET are enough... */
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	.set	mips3
+	eret
+	.set	mips0
+#endif
+	.set	pop
+	END(handle_ri_rdhwr)
+
 #ifdef CONFIG_64BIT
 /* A temporary overflow handler used by check_daddi(). */
 
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e51d8fd..e56b02f 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -53,6 +53,8 @@ extern asmlinkage void handle_dbe(void);
 extern asmlinkage void handle_sys(void);
 extern asmlinkage void handle_bp(void);
 extern asmlinkage void handle_ri(void);
+extern asmlinkage void handle_ri_rdhwr_vivt(void);
+extern asmlinkage void handle_ri_rdhwr(void);
 extern asmlinkage void handle_cpu(void);
 extern asmlinkage void handle_ov(void);
 extern asmlinkage void handle_tr(void);
@@ -1453,6 +1455,15 @@ #endif
 	memcpy((void *)(uncached_ebase + offset), addr, size);
 }
 
+static int __initdata rdhwr_noopt;
+static int __init set_rdhwr_noopt(char *str)
+{
+	rdhwr_noopt = 1;
+	return 1;
+}
+
+__setup("rdhwr_noopt", set_rdhwr_noopt);
+
 void __init trap_init(void)
 {
 	extern char except_vec3_generic, except_vec3_r4000;
@@ -1532,7 +1543,9 @@ void __init trap_init(void)
 
 	set_except_vector(8, handle_sys);
 	set_except_vector(9, handle_bp);
-	set_except_vector(10, handle_ri);
+	set_except_vector(10, rdhwr_noopt ? handle_ri :
+			  (cpu_has_vtag_icache ?
+			   handle_ri_rdhwr_vivt : handle_ri_rdhwr));
 	set_except_vector(11, handle_cpu);
 	set_except_vector(12, handle_ov);
 	set_except_vector(13, handle_tr);
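The ori/xori pair in handle_ri_rdhwr rounds the saved kernel stack pointer down to the base of the stack area, where thread_info lives: setting the low bits and then flipping them leaves them cleared, which equals an AND with the inverted mask. Both instructions take the mask as a 16-bit immediate, so presumably no extra register (and no $at, which is off limits under .set noat) is needed. A minimal sketch in C follows; clear_low_bits() is a hypothetical name, and the mask value 0x1fff (an 8 KiB stack area) is an assumption chosen for the example, not a value taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 8 KiB kernel stack area, so the mask is size - 1. */
#define EXAMPLE_THREAD_MASK ((uintptr_t)0x1fff)

/*
 * Hypothetical helper mirroring "ori k1, MASK; xori k1, MASK":
 * OR sets every masked bit, XOR then clears exactly those bits.
 */
static inline uintptr_t clear_low_bits(uintptr_t sp, uintptr_t mask)
{
    uintptr_t result = (sp | mask) ^ mask;

    /* Same result as the more familiar AND with the inverted mask. */
    assert(result == (sp & ~mask));
    return result;
}
```

Under these assumptions a saved stack pointer of 0x8045bf3c maps to 0x8045a000, the address from which the handler then loads TI_TP_VALUE.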
* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-11 8:50 ` Atsushi Nemoto
@ 2006-09-11 9:49 ` Thiemo Seufer
  2006-09-11 14:13 ` Atsushi Nemoto
  0 siblings, 1 reply; 26+ messages in thread
From: Thiemo Seufer @ 2006-09-11 9:49 UTC
To: Atsushi Nemoto; +Cc: nigel, ralf, dan, macro, linux-mips

Atsushi Nemoto wrote:
[snip]
> @@ -375,6 +376,72 @@ #endif
>  	BUILD_HANDLER dsp dsp sti silent		/* #26 */
>  	BUILD_HANDLER reserved reserved sti verbose	/* others */
>
> +	.align	5
> +	LEAF(handle_ri_rdhwr_vivt)
> +#ifdef CONFIG_MIPS_MT_SMTC
> +	PANIC_PIC("handle_ri_rdhwr_vivt called")
> +#else
> +	.set	push
> +	.set	noat
> +	.set	noreorder
> +	/* check if TLB contains a entry for EPC */
> +	MFC0	k1, CP0_ENTRYHI
> +	andi	k1, 0xff	/* ASID_MASK */
> +	MFC0	k0, CP0_EPC
> +	PTR_SRL	k0, PAGE_SHIFT + 1
> +	PTR_SLL	k0, PAGE_SHIFT + 1
> +	or	k1, k0
> +	MTC0	k1, CP0_ENTRYHI
> +	mtc0_tlbw_hazard
> +	tlbp

This needs a .set mips3/.set mips0 pair.

> +#ifdef CONFIG_CPU_MIPSR2
> +	_ehb			/* tlb_probe_hazard */
> +#else
> +	nop; nop; nop; nop; nop; nop	/* tlb_probe_hazard */
> +#endif

What about a mtc0_tlbp_hazard macro here?


Thiemo
* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-11 9:49 ` Thiemo Seufer
@ 2006-09-11 14:13 ` Atsushi Nemoto
  2006-09-11 15:17 ` Thiemo Seufer
  0 siblings, 1 reply; 26+ messages in thread
From: Atsushi Nemoto @ 2006-09-11 14:13 UTC
To: ths; +Cc: nigel, ralf, dan, macro, linux-mips

On Mon, 11 Sep 2006 10:49:05 +0100, Thiemo Seufer <ths@networkno.de> wrote:
> > +	tlbp
>
> This needs a .set mips3/.set mips0 pair.

TLBP belongs to the MIPS I ISA, doesn't it?

> > +#ifdef CONFIG_CPU_MIPSR2
> > +	_ehb			/* tlb_probe_hazard */
> > +#else
> > +	nop; nop; nop; nop; nop; nop	/* tlb_probe_hazard */
> > +#endif
>
> What about a mtc0_tlbp_hazard macro here?

You mean mtc0_tlbw_hazard?  I took them from the tlb_probe_hazard macro
in the queue branch.

And it looks like the current mtc0_tlbw_hazard asm macro does not match
its C equivalent ...

	.macro	mtc0_tlbw_hazard
	b	. + 8
	.endm

#define mtc0_tlbw_hazard()				\
	__asm__ __volatile__(				\
	"	.set	noreorder	\n"		\
	"	nop			\n"		\
	"	nop			\n"		\
	"	nop			\n"		\
	"	nop			\n"		\
	"	nop			\n"		\
	"	nop			\n"		\
	"	.set	reorder		\n")

---
Atsushi Nemoto
* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-11 14:13 ` Atsushi Nemoto
@ 2006-09-11 15:17 ` Thiemo Seufer
  0 siblings, 0 replies; 26+ messages in thread
From: Thiemo Seufer @ 2006-09-11 15:17 UTC
To: Atsushi Nemoto; +Cc: nigel, ralf, dan, macro, linux-mips

Atsushi Nemoto wrote:
> On Mon, 11 Sep 2006 10:49:05 +0100, Thiemo Seufer <ths@networkno.de> wrote:
> > > +	tlbp
> >
> > This needs a .set mips3/.set mips0 pair.
>
> TLBP belongs to the MIPS I ISA, doesn't it?

Uh, right.  I wasn't awake when I wrote that mail. :-)

> > > +#ifdef CONFIG_CPU_MIPSR2
> > > +	_ehb			/* tlb_probe_hazard */
> > > +#else
> > > +	nop; nop; nop; nop; nop; nop	/* tlb_probe_hazard */
> > > +#endif
> >
> > What about a mtc0_tlbp_hazard macro here?
>
> You mean mtc0_tlbw_hazard?  I took them from the tlb_probe_hazard macro
> in the queue branch.

Actually, I meant an equivalent to the build_tlb_probe_entry in tlbex.c,
plus a tlb_use_hazard.

> And it looks like the current mtc0_tlbw_hazard asm macro does not match
> its C equivalent ...
>
> 	.macro	mtc0_tlbw_hazard
> 	b	. + 8
> 	.endm
>
> #define mtc0_tlbw_hazard()				\
> 	__asm__ __volatile__(				\
> 	"	.set	noreorder	\n"		\
> 	"	nop			\n"		\
> 	"	nop			\n"		\
> 	"	nop			\n"		\
> 	"	nop			\n"		\
> 	"	nop			\n"		\
> 	"	nop			\n"		\
> 	"	.set	reorder		\n")

It also lacks a case for R2 CPUs, where IIRC _ehb is the way approved
by the spec.


Thiemo
* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-09 13:56 ` Atsushi Nemoto
  2006-09-10 22:30 ` Nigel Stephens
@ 2006-09-11 13:09 ` Maciej W. Rozycki
  2006-09-11 14:30 ` Atsushi Nemoto
  1 sibling, 1 reply; 26+ messages in thread
From: Maciej W. Rozycki @ 2006-09-11 13:09 UTC
To: Atsushi Nemoto; +Cc: nigel, ralf, dan, linux-mips

On Sat, 9 Sep 2006, Atsushi Nemoto wrote:

> But I'm still looking for better solution (silver bullet?) for
> cpu_has_vtag_icache case.

 What's wrong with just letting a TLB fault happen?

  Maciej
* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-11 13:09 ` Maciej W. Rozycki
@ 2006-09-11 14:30 ` Atsushi Nemoto
  2006-09-11 17:53 ` Maciej W. Rozycki
  0 siblings, 1 reply; 26+ messages in thread
From: Atsushi Nemoto @ 2006-09-11 14:30 UTC
To: macro; +Cc: nigel, ralf, dan, linux-mips

On Mon, 11 Sep 2006 14:09:20 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org> wrote:
> > But I'm still looking for better solution (silver bullet?) for
> > cpu_has_vtag_icache case.
>
> What's wrong with just letting a TLB fault happen?

It might add a little overhead to the usual TLB refill handling.  The
overhead might be negligible, but I'm not sure.

---
Atsushi Nemoto
* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-11 14:30 ` Atsushi Nemoto
@ 2006-09-11 17:53 ` Maciej W. Rozycki
  2006-09-12 1:55 ` Atsushi Nemoto
  0 siblings, 1 reply; 26+ messages in thread
From: Maciej W. Rozycki @ 2006-09-11 17:53 UTC
To: Atsushi Nemoto; +Cc: nigel, ralf, dan, linux-mips

On Mon, 11 Sep 2006, Atsushi Nemoto wrote:

> > What's wrong with just letting a TLB fault happen?
>
> It might add a little overhead to the usual TLB refill handling.  The
> overhead might be negligible, but I'm not sure.

 There is no need to change the refill handler -- only the general TLBL
exception has to be modified.  And this one may be not too critical -- the
change required is in the path to mark pages accessed.  Is the path
frequent enough to seek a complex solution while a simple one would just
work?

  Maciej
* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-11 17:53 ` Maciej W. Rozycki
@ 2006-09-12 1:55 ` Atsushi Nemoto
  0 siblings, 0 replies; 26+ messages in thread
From: Atsushi Nemoto @ 2006-09-12 1:55 UTC
To: macro; +Cc: nigel, ralf, dan, linux-mips

On Mon, 11 Sep 2006 18:53:29 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org> wrote:
> > It might add a little overhead to the usual TLB refill handling.  The
> > overhead might be negligible, but I'm not sure.
>
> There is no need to change the refill handler -- only the general TLBL
> exception has to be modified.  And this one may be not too critical -- the
> change required is in the path to mark pages accessed.  Is the path
> frequent enough to seek a complex solution while a simple one would just
> work?

Yes, my description was wrong.  I meant general TLBL handling, not TLB
refill handling.

Hmm, it seems not so critical indeed.  Then the "take 2" patch would be
exactly what you preferred.

http://www.linux-mips.org/cgi-bin/mesg.cgi?a=linux-mips&i=20060710.234010.07457279.anemo%40mba.ocn.ne.jp

Any comments about that?

---
Atsushi Nemoto
end of thread, other threads:[~2006-09-15 3:09 UTC | newest]

Thread overview: 26+ messages
2006-07-07 15:00 [PATCH] fast path for rdhwr emulation for TLS Atsushi Nemoto
2006-07-07 15:22 ` Maciej W. Rozycki
2006-07-07 16:12 ` Atsushi Nemoto
2006-07-07 16:43 ` Atsushi Nemoto
2006-07-07 17:04 ` Maciej W. Rozycki
2006-07-07 18:22 ` Ralf Baechle
2006-07-07 16:58 ` Maciej W. Rozycki
2006-07-08 16:12 ` Atsushi Nemoto
2006-07-10 14:40 ` Atsushi Nemoto
2006-09-14 17:28 ` Ralf Baechle
2006-09-15 3:09 ` Atsushi Nemoto
2006-07-10 14:55 ` Atsushi Nemoto
2006-07-11 2:53 ` Daniel Jacobowitz
2006-07-11 3:20 ` Atsushi Nemoto
2006-09-08 17:39 ` Nigel Stephens
2006-09-09 13:56 ` Atsushi Nemoto
2006-09-10 22:30 ` Nigel Stephens
2006-09-11 5:04 ` Atsushi Nemoto
2006-09-11 8:50 ` Atsushi Nemoto
2006-09-11 9:49 ` Thiemo Seufer
2006-09-11 14:13 ` Atsushi Nemoto
2006-09-11 15:17 ` Thiemo Seufer
2006-09-11 13:09 ` Maciej W. Rozycki
2006-09-11 14:30 ` Atsushi Nemoto
2006-09-11 17:53 ` Maciej W. Rozycki
2006-09-12 1:55 ` Atsushi Nemoto